KernelEval: Robust Evaluation for AI-Driven Kernel Generation

A comprehensive framework for evaluating GPU kernel generators on shape robustness, operator coverage, and hardware portability.

Composite Score = Median_Speedup × (1 − Shape_CV) × Operator_Coverage × Hardware_Portability
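The formula can be sketched in Python as follows. The page does not define how Shape_CV is computed, so this sketch assumes it is the coefficient of variation (population standard deviation divided by mean) of a generator's mean speedup in each shape category, and that Median_Speedup is the median over all benchmarked cases. The function names, category names, and speedup values are hypothetical.

```python
from statistics import mean, median, pstdev


def shape_cv(speedups_by_shape: dict[str, list[float]]) -> float:
    """Coefficient of variation of the per-category mean speedups.

    Assumed definition (not given on this page): population standard
    deviation of the category means divided by their mean.
    """
    category_means = [mean(v) for v in speedups_by_shape.values()]
    return pstdev(category_means) / mean(category_means)


def composite_score(
    median_speedup: float,
    cv: float,
    operator_coverage: float,
    hardware_portability: float,
) -> float:
    """Median_Speedup x (1 - Shape_CV) x Operator_Coverage x Hardware_Portability."""
    return median_speedup * (1.0 - cv) * operator_coverage * hardware_portability


if __name__ == "__main__":
    # Hypothetical speedups (generated kernel vs. reference) per shape category.
    speedups = {
        "tiny": [0.9, 1.1],
        "large_p2": [1.2, 1.4],
        "large_non_p2": [1.0, 1.2],
    }
    all_speedups = [s for values in speedups.values() for s in values]
    cv = shape_cv(speedups)
    # Coverage and portability are placeholder inputs here, not measured values.
    score = composite_score(median(all_speedups), cv, 0.90, 0.75)
    print(f"median={median(all_speedups):.2f} cv={cv:.3f} composite={score:.3f}")
```

Because robustness enters as (1 − Shape_CV), a generator whose speedup varies widely across shape categories is penalized even when its median speedup is high, which is the pattern of the Fragile row in the comparison table.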
3 Evaluation Axes: Shape, Operator, Hardware
8 Shape Categories: Tiny to Large, P2/Non-P2
7 Operator Categories: Forward + Backward
4 Hardware Backends: CUDA, ROCm, Metal, CPU
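One possible reading of the coverage and portability axes is a pass rate, sketched below under assumptions this page does not confirm: Operator_Coverage as the fraction of (operator category, direction) pairs, with forward and backward counted separately, for which the generator produces a correct kernel, and Hardware_Portability as the fraction of the four backends on which its kernels run correctly. The operator category names are hypothetical placeholders, and a real portability metric may instead weight backends by relative performance.

```python
BACKENDS = ("cuda", "rocm", "metal", "cpu")
DIRECTIONS = ("forward", "backward")


def operator_coverage(passed: set[tuple[str, str]], categories: list[str]) -> float:
    """Fraction of (category, direction) pairs with a correct kernel (assumed definition)."""
    total = len(categories) * len(DIRECTIONS)
    hits = sum((c, d) in passed for c in categories for d in DIRECTIONS)
    return hits / total


def hardware_portability(passed_backends: set[str]) -> float:
    """Fraction of the four backends where kernels run correctly (assumed definition)."""
    return len(passed_backends & set(BACKENDS)) / len(BACKENDS)


if __name__ == "__main__":
    # Hypothetical names for the seven operator categories.
    categories = ["matmul", "conv", "attention", "norm", "reduction", "elementwise", "softmax"]
    # Every forward kernel passes; only two backward kernels do.
    passed = {(c, "forward") for c in categories} | {("matmul", "backward"), ("norm", "backward")}
    print(operator_coverage(passed, categories))
    print(hardware_portability({"cuda", "rocm", "cpu"}))
```

Counting forward and backward separately lets a generator that only emits forward kernels score at most 0.5 on coverage under this reading.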

[Charts not reproduced: Composite Score Comparison; Score Component Breakdown; Shape Robustness: Speedup by Category; Operator Coverage: Forward vs Backward; Hardware Portability: Relative Performance]

Detailed Generator Comparison

Generator  Median Speedup  Shape CV  Robustness (1-CV)  Op Coverage  HW Portability  Composite
Baseline   1.05x           0.15      0.85               0.85         0.75            0.420
Fragile    1.40x           0.65      0.35               0.70         0.50            0.240
Robust     1.20x           0.10      0.90               0.90         0.75            0.520
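As a check, the composite formula can be applied directly to the component columns of the comparison table. Multiplying them gives roughly 0.57 (Baseline), 0.17 (Fragile), and 0.73 (Robust): the ranking Robust > Baseline > Fragile matches the Composite column, but the absolute values do not, so the listed composites may involve a normalization step not described here.

```python
# Component values copied from the Detailed Generator Comparison table:
# (median speedup, shape CV, operator coverage, hardware portability).
ROWS = {
    "Baseline": (1.05, 0.15, 0.85, 0.75),
    "Fragile": (1.40, 0.65, 0.70, 0.50),
    "Robust": (1.20, 0.10, 0.90, 0.75),
}


def composite(median_speedup, shape_cv, op_coverage, hw_portability):
    """Median_Speedup x (1 - Shape_CV) x Operator_Coverage x Hardware_Portability."""
    return median_speedup * (1.0 - shape_cv) * op_coverage * hw_portability


if __name__ == "__main__":
    scores = {name: composite(*vals) for name, vals in ROWS.items()}
    # Print generators from highest to lowest composite score.
    for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{name:<8} {score:.3f}")
```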