Picking the Next Word
After processing the input, how does a language model choose which word comes next? Let's explore how models sample from probability distributions to generate text.
The model assigns a score (a logit) to every token in its vocabulary, reflecting how strongly it favors each candidate. The softmax function converts these logits into a probability distribution, and instead of always choosing the top token, the model samples from that distribution, adding controlled randomness to the output. Two parameters, temperature and top-k, shape the distribution before sampling.
Temperature controls how deterministic the sampling is by dividing every logit before the softmax. Lower values sharpen the distribution, making the model more confident and conservative, while higher values flatten it, making output more creative but potentially less coherent.
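To make this concrete, here is a minimal sketch (the helper names and logit values are illustrative, not taken from any particular model or library) of how temperature rescales the logits before the softmax:

```python
import math

def softmax(logits):
    # Exponentiate (shifted by the max for numerical stability) and normalize.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def with_temperature(logits, temperature):
    # Dividing by the temperature sharpens (T < 1) or flattens (T > 1)
    # the resulting probability distribution.
    return softmax([x / temperature for x in logits])

logits = [5.2, 4.1, 3.7, 2.9, 2.3]    # example logits
print(with_temperature(logits, 0.5))   # sharper: the top token dominates
print(with_temperature(logits, 1.0))   # the unmodified distribution
print(with_temperature(logits, 2.0))   # flatter: more randomness
```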
Top-k sampling restricts selection to the k most likely tokens, discarding the rest and renormalizing the remaining probabilities, so that low-probability options can never be chosen.
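A minimal sketch of top-k filtering (again with illustrative values; a real implementation operates over the full vocabulary): keep the k highest-probability tokens, renormalize, and sample only among them.

```python
import random

def top_k_sample(probs, k):
    # Keep the k most likely tokens, renormalize their probabilities,
    # and sample one of them; every other token gets zero chance.
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    tokens = [t for t, _ in top]
    weights = [p / total for _, p in top]
    return random.choices(tokens, weights=weights, k=1)[0]

probs = {"adapts": 0.584, "grows": 0.195, "evolves": 0.130,
         "innovates": 0.059, "improves": 0.032}
print(top_k_sample(probs, k=3))  # only adapts/grows/evolves can be chosen
```

The table below walks through how a set of logits becomes this kind of probability distribution in the first place.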
| Token | Logit | e^logit | Probability |
|---|---|---|---|
| adapts | 5.20 | 181.27 | 58.4% |
| grows | 4.10 | 60.34 | 19.5% |
| evolves | 3.70 | 40.45 | 13.0% |
| innovates | 2.90 | 18.17 | 5.9% |
| improves | 2.30 | 9.97 | 3.2% |
| Sum | | 310.20 | 100.0% |
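As a quick check of the numbers above (a minimal sketch; the token strings and logits are simply the five rows of the table), the probabilities follow directly from exponentiating each logit and dividing by the sum, and sampling then draws the next token in proportion to them:

```python
import math
import random

# Logits from the table above; softmax turns them into probabilities.
logits = {"adapts": 5.2, "grows": 4.1, "evolves": 3.7, "innovates": 2.9, "improves": 2.3}

exps = {tok: math.exp(x) for tok, x in logits.items()}
total = sum(exps.values())                       # ~310.20, matching the Sum row
probs = {tok: e / total for tok, e in exps.items()}

for tok in logits:
    print(f"{tok:10s} e^logit = {exps[tok]:7.2f}  probability = {probs[tok]:6.1%}")

# Sampling picks the next token in proportion to these probabilities.
next_token = random.choices(list(probs), weights=list(probs.values()), k=1)[0]
print("sampled:", next_token)
```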