Mechanism of In-Context Learning in Transformers

Comparing three hypothesized mechanisms by which transformers perform in-context learning (ICL) without parameter updates

Best ICL accuracy: 0.936 (task retrieval, k=8)
Mechanisms compared: 3
Layers analyzed: 4
Tasks evaluated: 50

Figures: ICL accuracy vs. number of demonstrations; layer-wise mechanism contributions; task retrieval probability; mechanism comparison at k=8.

Key Findings