Picking the Next Word
After processing the input, how does a language model choose which word comes next? Let's explore how models sample from probability distributions to generate text.
The model assigns a score (a logit) to every token in its vocabulary, reflecting how strongly it favors each candidate. The softmax function converts these logits into a probability distribution, and instead of always choosing the top token, the model samples from that distribution, adding controlled randomness to the output. Two parameters, temperature and top-k, shape the distribution before sampling.
Temperature controls how deterministic the sampling is by dividing every logit before the softmax. Lower values sharpen the distribution, making the model more confident and conservative, while higher values flatten it, making output more creative but potentially less coherent.
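To make this concrete, here is a minimal sketch (the helper names and logit values are illustrative, not taken from any particular model or library) of how temperature rescales the logits before the softmax:

```python
import math

def softmax(logits):
    # Exponentiate (shifted by the max for numerical stability) and normalize.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def with_temperature(logits, temperature):
    # Dividing by the temperature sharpens (T < 1) or flattens (T > 1)
    # the resulting probability distribution.
    return softmax([x / temperature for x in logits])

logits = [5.2, 4.1, 3.7, 2.9, 2.3]    # example logits
print(with_temperature(logits, 0.5))   # sharper: the top token dominates
print(with_temperature(logits, 1.0))   # the unmodified distribution
print(with_temperature(logits, 2.0))   # flatter: more randomness
```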
Top-k sampling restricts selection to the k most likely tokens, discarding the rest and renormalizing the remaining probabilities, so that low-probability options can never be chosen.
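A minimal sketch of top-k filtering (again with illustrative values; a real implementation operates over the full vocabulary): keep the k highest-probability tokens, renormalize, and sample only among them.

```python
import random

def top_k_sample(probs, k):
    # Keep the k most likely tokens, renormalize their probabilities,
    # and sample one of them; every other token gets zero chance.
    top = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in top)
    tokens = [t for t, _ in top]
    weights = [p / total for _, p in top]
    return random.choices(tokens, weights=weights, k=1)[0]

probs = {"adapts": 0.584, "grows": 0.195, "evolves": 0.130,
         "innovates": 0.059, "improves": 0.032}
print(top_k_sample(probs, k=3))  # only adapts/grows/evolves can be chosen
```

The table below walks through how a set of logits becomes this kind of probability distribution in the first place.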
| Token | Logit | e^logit | Probability |
|---|---|---|---|
| adapts | 5.20 | 181.27 | 58.4% |
| grows | 4.10 | 60.34 | 19.5% |
| evolves | 3.70 | 40.45 | 13.0% |
| innovates | 2.90 | 18.17 | 5.9% |
| improves | 2.30 | 9.97 | 3.2% |
| Sum | | 310.20 | 100.0% |
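As a quick check of the numbers above (a minimal sketch; the token strings and logits are simply the five rows of the table), the probabilities follow directly from exponentiating each logit and dividing by the sum, and sampling then draws the next token in proportion to them:

```python
import math
import random

# Logits from the table above; softmax turns them into probabilities.
logits = {"adapts": 5.2, "grows": 4.1, "evolves": 3.7, "innovates": 2.9, "improves": 2.3}

exps = {tok: math.exp(x) for tok, x in logits.items()}
total = sum(exps.values())                       # ~310.20, matching the Sum row
probs = {tok: e / total for tok, e in exps.items()}

for tok in logits:
    print(f"{tok:10s} e^logit = {exps[tok]:7.2f}  probability = {probs[tok]:6.1%}")

# Sampling picks the next token in proportion to these probabilities.
next_token = random.choices(list(probs), weights=list(probs.values()), k=1)[0]
print("sampled:", next_token)
```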