A three-layer framework combining structured kernel design spaces, Hierarchical Constrained Monte Carlo Tree Search (HC-MCTS), and mixed-initiative interaction for GPU kernel optimization.
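GPU kernel optimization means navigating a vast combinatorial design space. Purely autonomous agents waste compute on configurations that experts would reject, while purely manual tuning cannot scale. CGAS addresses this with three layers: a structured kernel design space, HC-MCTS search, and mixed-initiative interaction. The table below shows how successive constraints shrink the design space and change the mean throughput of the remaining configurations:
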
| Constraints (cumulative) | Design space size (configurations) | Reduction vs. unconstrained | Mean TFLOPS |
|---|---|---|---|
| None | 230,400 | 0.0% | 8.04 |
| Tile M >= 64 | 138,240 | 40.0% | 8.89 |
| + Vec >= 2 | 103,680 | 55.0% | 10.82 |
| + Tile N >= 64 | 62,208 | 73.0% | 12.09 |
| + Block >= 64 | 46,656 | 79.8% | 12.75 |
| + Tile K >= 16 | 34,992 | 84.8% | 12.33 |
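
The reduction column can be recomputed from the sizes alone. The sketch below is only a sanity check on the table's arithmetic, not part of CGAS; the helper name `reduction_table` is hypothetical, and the sizes are copied verbatim from the rows above.

```python
# Design space sizes copied from the table, in the order the constraints are added.
sizes = [
    ("None", 230_400),
    ("Tile M >= 64", 138_240),
    ("+ Vec >= 2", 103_680),
    ("+ Tile N >= 64", 62_208),
    ("+ Block >= 64", 46_656),
    ("+ Tile K >= 16", 34_992),
]


def reduction_table(sizes):
    """Return rows of (label, size, cumulative reduction in %, fraction kept by this step).

    Hypothetical helper for illustration; it only re-derives the table's columns.
    """
    baseline = sizes[0][1]
    rows = []
    previous = baseline
    for label, size in sizes:
        cumulative = 100.0 * (baseline - size) / baseline  # reduction vs. unconstrained space
        kept = size / previous                             # share surviving this one constraint
        rows.append((label, size, cumulative, kept))
        previous = size
    return rows


for label, size, cumulative, kept in reduction_table(sizes):
    print(f"{label:<16} {size:>8,} {cumulative:5.1f}% {kept:.2f}")
```

Each added constraint keeps either 60% or 75% of the space that survived the previous step, which is how five fairly mild constraints compound into a reduction of about 85%.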

Both agent-only and human-assisted strategies reach 15 TFLOPS within 15 evaluations and 19+ TFLOPS within ~21 evaluations. The primary benefit of human expertise is therefore design space reduction and more focused evaluation, not faster convergence.
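
One way to make "reaches X TFLOPS within N evaluations" measurable is to record the throughput of each evaluated configuration and find the first evaluation that meets the target. The following is a minimal sketch under that assumption. The function name `evaluations_to_reach` is hypothetical, and `toy_trace` is a made-up placeholder, not data from the experiments above.

```python
def evaluations_to_reach(trace, target):
    """Return the 1-based index of the first evaluation with throughput >= target.

    `trace` is a sequence of per-evaluation TFLOPS values in the order they
    were measured. Returns None if the target is never reached.
    """
    for index, tflops in enumerate(trace, start=1):
        if tflops >= target:
            return index
    return None


# Placeholder values invented for illustration only; NOT measurements from the source.
toy_trace = [2.0, 6.0, 5.0, 15.5, 12.0, 19.3]

print(evaluations_to_reach(toy_trace, 15.0))
print(evaluations_to_reach(toy_trace, 19.0))
print(evaluations_to_reach(toy_trace, 25.0))
```

Running the same function over an agent-only trace and a human-assisted trace gives directly comparable evaluation counts for each target.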