Mechanism of In-Context Learning in Transformers

Best ICL Accuracy (Task Retrieval, k=8): 0.936

Mechanisms Compared

Layers Analyzed

Tasks Evaluated

Task retrieval achieves the highest accuracy of any single mechanism (0.936), matching the oracle at k=8 demonstrations.
Implicit gradient descent improves steadily as demonstrations are added but requires many more examples than task retrieval.
Induction heads provide a robust but lower-accuracy baseline via pattern matching.
Layer-wise analysis reveals depth-dependent specialization: early layers handle task identification, while later layers handle pattern matching.
ICL is best understood as a multi-mechanism process, not a single algorithm.
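
To make two of the mechanisms above concrete, here is a minimal toy sketch. It is not the analysis code behind these results. The function names (`induction_head_predict`, `implicit_gd_predict`), the toy demonstrations, and the learning rate are all illustrative assumptions. The first function mimics induction-head pattern matching by copying whatever followed the most recent earlier occurrence of the current token. The second mimics implicit gradient descent by running a few gradient steps on a least-squares loss over the in-context demonstrations before answering the query.

```python
import numpy as np


def induction_head_predict(tokens):
    """Toy induction-head rule (assumed form): find the most recent earlier
    occurrence of the last token and copy the token that followed it.
    Returns None when the last token has not appeared before."""
    current = tokens[-1]
    # Scan backwards over earlier positions, skipping the last token itself.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None


def implicit_gd_predict(xs, ys, x_query, steps=1, lr=0.5):
    """Toy implicit gradient descent (assumed form): start from w = 0, take
    `steps` gradient steps on the mean squared error over the demonstrations
    (xs, ys), then predict for x_query with the resulting linear model."""
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    w = np.zeros(xs.shape[1])
    for _ in range(steps):
        # Gradient of 0.5 * mean((xs @ w - ys) ** 2) with respect to w.
        grad = xs.T @ (xs @ w - ys) / len(xs)
        w = w - lr * grad
    return float(x_query @ w)


# Hypothetical demonstrations generated by y = 2*x1 - x2 (illustration only).
DEMO_XS = [[1, 0], [0, 1], [1, 1], [1, -1]]
DEMO_YS = [2, -1, 1, 3]

if __name__ == "__main__":
    print(induction_head_predict(["A", "B", "C", "A"]))
    for n_steps in (1, 5, 50):
        print(n_steps, implicit_gd_predict(DEMO_XS, DEMO_YS, [1, 1], steps=n_steps))
```

In this toy setting the pattern-matching rule answers immediately from a single earlier occurrence but can only copy what it has already seen, whereas the gradient-descent rule approaches the true value for the query only gradually. This is a loose analogy to the trade-off described above, not a reproduction of it.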