Compute Sufficiency in PIM

Quantifying the Processing-in-Memory compute gap for LLM inference under DRAM process constraints across nodes, power budgets, model sizes, and precision formats.

PIM Sufficiency (7B, 2W): 7.3%
PIM Sufficiency (70B, 2W): 0.8%
Max PIM (5W, 10nm, INT4): 18.4%
PNM Sufficiency (7B): 1.40
Improvement Needed (7B): 13.7x
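
The "Improvement Needed" figure appears to be the reciprocal of the sufficiency ratio: 1 / 0.073 is about 13.7. The page does not state this definition, so the sketch below treats it as an assumption; `improvement_needed` is a hypothetical helper name, not something taken from the source.

```python
def improvement_needed(sufficiency: float) -> float:
    """Factor by which effective compute must grow to reach a sufficiency of 1.0.

    Assumption: improvement needed = 1 / sufficiency (not stated in the source,
    but consistent with the 7B figures: 0.073 -> 13.7x).
    """
    if sufficiency <= 0:
        raise ValueError("sufficiency must be positive")
    return 1.0 / sufficiency


pim_7b = 0.073   # PIM sufficiency, 7B model, 2W (7.3%)
pim_70b = 0.008  # PIM sufficiency, 70B model, 2W (0.8%)

print(f"7B:  {improvement_needed(pim_7b):.1f}x")
print(f"70B: {improvement_needed(pim_70b):.1f}x")
```

Under the same assumption, the 70B case would need roughly a 125x improvement, although the source only reports the 7B figure.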

Architecture Comparison

[Charts: Compute Sufficiency Ratio by Architecture (log scale); PIM Power Budget Sweep (7B model); PIM Sufficiency: Power Budget vs Model Size; Energy Efficiency (GOPS/W)]

Detailed Results

Architecture Comparison (14nm, 2W, INT8)

Model  Arch  GOPS  Latency (ms)  Sufficiency  GOPS/W
7B     PIM   7.1   684           0.073        5.06
7B     PNM   200   35.7          1.399        50.6
7B     GPU   623K  2.3           21.86        1558
70B    PIM   7.0   6513          0.008        5.06
70B    PNM   200   315           0.159        50.6
70B    GPU   624K  20.2          2.477        1561
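
The page does not define the sufficiency ratio, but the rows above suggest a reading: latency multiplied by sufficiency comes out near 50 ms for every architecture and model size, as if sufficiency were a fixed per-token latency target divided by the achieved latency. The sketch below checks that arithmetic. The row values are as read from the flattened table, and the 50 ms target is an inference, not a number the source states.

```python
# (model, arch, latency_ms, sufficiency), as read from the comparison table.
rows = [
    ("7B", "PIM", 684.0, 0.073),
    ("7B", "PNM", 35.7, 1.399),
    ("7B", "GPU", 2.3, 21.86),
    ("70B", "PIM", 6513.0, 0.008),
    ("70B", "PNM", 315.0, 0.159),
    ("70B", "GPU", 20.2, 2.477),
]

# Assumption: sufficiency = target_latency / achieved_latency, so the product
# latency * sufficiency should recover the (unstated) target latency in ms.
implied_targets = [latency * suff for _, _, latency, suff in rows]

for (model, arch, _, _), target in zip(rows, implied_targets):
    print(f"{model:>3} {arch}: implied target = {target:.1f} ms")
```

The products are not identical because the table values are rounded, most visibly for the 70B PIM row where the sufficiency has a single significant digit.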

PIM Sufficiency vs Power Budget

Power (W)  7B     13B    30B    70B
0.5        0.018  0.010  0.004  0.002
1.0        0.037  0.019  0.008  0.004
2.0        0.073  0.038  0.016  0.008
3.0        0.110  0.058  0.023  0.012
5.0        0.184  0.096  0.039  0.020
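
In the sweep above, PIM sufficiency grows almost linearly with the power budget. A minimal sketch of that observation, fitting a line through the origin to the 7B column, follows. The linear model is an assumption made for illustration, not the source's method, and it says nothing about power budgets outside the 0.5 W to 5 W range shown.

```python
# Power budget (W) and PIM sufficiency for the 7B model, from the sweep table.
power_w = [0.5, 1.0, 2.0, 3.0, 5.0]
suff_7b = [0.018, 0.037, 0.073, 0.110, 0.184]

# Least-squares slope for a line through the origin: sum(p*s) / sum(p*p).
slope = sum(p * s for p, s in zip(power_w, suff_7b)) / sum(p * p for p in power_w)


def predicted_sufficiency(power: float) -> float:
    """Sufficiency predicted by the assumed linear model at a given power (W)."""
    return slope * power


print(f"slope: {slope:.4f} sufficiency per watt")
for p, s in zip(power_w, suff_7b):
    print(f"{p:>4.1f} W  table={s:.3f}  linear fit={predicted_sufficiency(p):.3f}")
```

The fitted slope is roughly 0.037 per watt, and the fit reproduces each tabulated value to within about 0.001.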

Precision Format Impact (14nm, 2W)

Precision  7B     13B    30B    70B
FP16       0.073  0.038  0.016  0.008
INT8       0.074  0.038  0.016  0.008
INT4       0.073  0.038  0.015  0.008