Picking the Next Word

After processing the input, how does a language model choose which word comes next? Let's explore how models sample from probability distributions to generate text.

Token Sampling

The model assigns a score (logit) to every token in its vocabulary, reflecting how strongly it believes each one should come next. Instead of always choosing the top-scoring token, it samples from a probability distribution shaped by the temperature and top-k settings described below, which adds controlled randomness to the output.
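As a rough sketch of that idea (a hand-rolled softmax over a few candidate tokens, not any particular model's API), turning logits into probabilities and sampling from them might look like this:

```python
import numpy as np

def softmax(logits):
    """Turn raw logit scores into a probability distribution."""
    # Subtracting the max is a standard trick for numerical stability.
    exps = np.exp(np.asarray(logits) - np.max(logits))
    return exps / exps.sum()

# Candidate tokens and logits taken from the example table further down.
tokens = ["adapts", "grows", "evolves", "innovates", "improves"]
logits = [5.2, 4.1, 3.7, 2.9, 2.3]

probs = softmax(logits)
next_token = np.random.choice(tokens, p=probs)
print(dict(zip(tokens, probs.round(3))))   # "adapts" ends up near 0.584
print("sampled:", next_token)
```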

Temperature

Temperature controls how deterministic the sampling is: lower values make the model more confident and conservative, while higher values make it more creative but potentially less coherent. The example that follows uses a temperature of 1.0, which leaves the logits unscaled.
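A common way to apply temperature is to divide the logits by it before the softmax; a minimal sketch under that assumption:

```python
import numpy as np

def sample_probs(logits, temperature=1.0):
    """Softmax over temperature-scaled logits."""
    scaled = np.asarray(logits) / temperature   # low T sharpens, high T flattens
    exps = np.exp(scaled - np.max(scaled))
    return exps / exps.sum()

logits = [5.2, 4.1, 3.7, 2.9, 2.3]
print(sample_probs(logits, temperature=0.5).round(3))  # very peaked: top token dominates
print(sample_probs(logits, temperature=1.0).round(3))  # matches the table below
print(sample_probs(logits, temperature=1.5).round(3))  # flatter: more randomness
```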

Top-k tokens

Top-k sampling restricts selection to the k most likely tokens, filtering out low-probability options. The example that follows uses k = 5, so only the five highest-scoring tokens are considered.
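Top-k can be sketched the same way: keep the k highest logits, drop the rest, and renormalize. The two extra low-scoring logits here are made up purely to show the cutoff:

```python
import numpy as np

def top_k_probs(logits, k):
    """Restrict sampling to the k most likely tokens, then renormalize."""
    logits = np.asarray(logits, dtype=float)
    keep = np.argsort(logits)[-k:]           # indices of the k largest logits
    exps = np.zeros_like(logits)
    exps[keep] = np.exp(logits[keep] - logits[keep].max())
    return exps / exps.sum()

logits = [5.2, 4.1, 3.7, 2.9, 2.3, 0.4, -1.2]
print(top_k_probs(logits, k=5).round(3))     # the last two tokens are filtered to 0
```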

Token        Logit    e^logit    Probability
adapts       5.20     181.27     58.4%
grows        4.10      60.34     19.5%
evolves      3.70      40.45     13.0%
innovates    2.90      18.17      5.9%
improves     2.30       9.97      3.2%
Sum                   310.21    100.0%
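The table's arithmetic can be reproduced directly; this short check (not part of the original demo) recomputes the e^logit column and the probabilities:

```python
import numpy as np

tokens = ["adapts", "grows", "evolves", "innovates", "improves"]
logits = np.array([5.20, 4.10, 3.70, 2.90, 2.30])

exps = np.exp(logits)          # the e^logit column
probs = exps / exps.sum()      # each token's share of the total

for tok, e, p in zip(tokens, exps, probs):
    print(f"{tok:10s} {e:8.2f} {p:7.1%}")
print(f"{'Sum':10s} {exps.sum():8.2f} {probs.sum():7.1%}")
# "adapts" comes out at about 58.4% and the e^logit values sum to roughly 310.21.
```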
Completing the sentence:
The company quickly changes strategies and adapts
(the token "adapts" was sampled, carrying a 58% probability)