Multi-Token Prediction Performance on GSM8K Mathematical Reasoning
Table of Links
Abstract and 1. Introduction
2. Method
3. Experiments on real data
3.1. Benefits scale with model size and 3.2. Faster inference
3.3. Learning global patterns with multi-byte prediction...