Feasibility of Extracting Copyrighted Text from Production LLMs

A computational analysis of attack-defense dynamics in memorization, extraction, and safety measures across production language model configurations.

Average standard extraction rate: 0.1257
Average jailbreak extraction rate: 0.3251
Average defense effectiveness: 0.8377
Average jailbreak uplift: 0.1993
Memorization scaling exponent: 0.42

Production Model Extraction Results

[Figure: Standard vs Jailbreak Extraction Rates]

[Figure: Defense Effectiveness vs Memorization]
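The Defense Effectiveness vs Memorization chart plots the two quantities against each other for the four production models. The minimal sketch below puts a single number on that relationship with a sample Pearson correlation. It uses the per-model memorization and defense-effectiveness values from the production results table, copied into the code. With only four models, the coefficient (about 0.72 on these values) is descriptive only and carries no meaningful significance test.

```python
import math

# Per-model values copied from the production results table (Model-A..Model-D).
MEMORIZATION = [0.780, 0.946, 0.432, 0.974]
DEFENSE_EFF = [0.8265, 0.8454, 0.8307, 0.8482]


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(MEMORIZATION, DEFENSE_EFF)
print(f"Pearson r (memorization vs defense effectiveness): {r:.3f}")
```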

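Model | Size (parameters) | Standard extraction rate | Jailbreak extraction rate | Defense effectiveness | Memorization | Jailbreak uplift (jailbreak rate minus standard rate)
Model-A | 175B | 0.1326 | 0.3396 | 0.8265 | 0.780 | 0.207
Model-B | 540B | 0.1434 | 0.3782 | 0.8454 | 0.946 | 0.235
Model-C | 65B | 0.0780 | 0.1998 | 0.8307 | 0.432 | 0.122
Model-D | 1000B | 0.1488 | 0.3826 | 0.8482 | 0.974 | 0.234
Average | n/a | 0.1257 | 0.3251 | 0.8377 | 0.783 | 0.199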
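The jailbreak uplift column equals each model's jailbreak extraction rate minus its standard extraction rate. The Average row is the unweighted mean over the four models, and both readings reproduce the reported figures, including the 0.1993 average uplift in the summary. The sketch below recomputes them from the table values. The names `RESULTS` and `jailbreak_uplift` are illustrative and not from the source.

```python
# (standard rate, jailbreak rate) per model, copied from the results table.
RESULTS = {
    "Model-A": (0.1326, 0.3396),
    "Model-B": (0.1434, 0.3782),
    "Model-C": (0.0780, 0.1998),
    "Model-D": (0.1488, 0.3826),
}


def jailbreak_uplift(results):
    """Uplift = jailbreak extraction rate minus standard extraction rate."""
    return {model: jb - std for model, (std, jb) in results.items()}


def mean(values):
    """Unweighted arithmetic mean of an iterable."""
    values = list(values)
    return sum(values) / len(values)


uplift = jailbreak_uplift(RESULTS)
avg_std = mean(std for std, _ in RESULTS.values())
avg_jb = mean(jb for _, jb in RESULTS.values())
avg_uplift = mean(uplift.values())

for model, u in uplift.items():
    print(f"{model}: uplift {u:.4f}")
print(f"averages: std {avg_std:.4f}, jailbreak {avg_jb:.4f}, uplift {avg_uplift:.4f}")
```

Because the mean is linear, the average uplift is also the average jailbreak rate minus the average standard rate.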

Defense Configuration Analysis

[Figure: Defense Effectiveness by Configuration]

[Figure: Defense Cost (False-Positive Rate and Jailbreak Vulnerability)]

Configuration | Effectiveness | False-positive rate | Quality loss | Jailbreak vulnerability
No defense | 0.1577 | 0.069 | 0.000 | 0.100
Output filter | 0.7069 | 0.120 | 0.000 | 0.100
Activation cap | 0.3365 | 0.069 | 0.016 | 0.100
RLHF alignment | 0.8110 | 0.069 | 0.000 | 0.456
Refusal training | 0.7732 | 0.241 | 0.000 | 0.061
Filter + RLHF | 0.9016 | 0.120 | 0.000 | 0.456
Filter + refusal | 0.8885 | 0.283 | 0.000 | 0.061
RLHF + refusal | 0.8709 | 0.241 | 0.000 | 0.279
Full stack | 0.8427 | 0.283 | 0.016 | 0.279
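One way to read the defense configuration table is as a multi-objective trade-off, where effectiveness is maximized and the three cost columns are minimized. The sketch below finds the configurations that no other configuration dominates. It assumes all four columns are objectives of equal standing, which the source does not state. The helper names are hypothetical.

```python
# (effectiveness, false-positive rate, quality loss, jailbreak vulnerability),
# copied from the defense configuration table.
CONFIGS = {
    "No defense": (0.1577, 0.069, 0.000, 0.100),
    "Output filter": (0.7069, 0.120, 0.000, 0.100),
    "Activation cap": (0.3365, 0.069, 0.016, 0.100),
    "RLHF alignment": (0.8110, 0.069, 0.000, 0.456),
    "Refusal training": (0.7732, 0.241, 0.000, 0.061),
    "Filter + RLHF": (0.9016, 0.120, 0.000, 0.456),
    "Filter + refusal": (0.8885, 0.283, 0.000, 0.061),
    "RLHF + refusal": (0.8709, 0.241, 0.000, 0.279),
    "Full stack": (0.8427, 0.283, 0.016, 0.279),
}


def as_costs(metrics):
    """Negate effectiveness so that every objective is 'lower is better'."""
    effectiveness, *costs = metrics
    return (-effectiveness, *costs)


def dominates(a, b):
    """True if a is no worse than b on every objective and strictly better on one."""
    ca, cb = as_costs(a), as_costs(b)
    return all(x <= y for x, y in zip(ca, cb)) and any(x < y for x, y in zip(ca, cb))


def pareto_front(configs):
    """Names of configurations that no other configuration dominates."""
    return [
        name
        for name, metrics in configs.items()
        if not any(
            dominates(other, metrics)
            for other_name, other in configs.items()
            if other_name != name
        )
    ]


front = pareto_front(CONFIGS)
dominated = [name for name in CONFIGS if name not in front]
print("non-dominated:", front)
print("dominated:", dominated)
```

Under this equal-weight reading, only the full stack is dominated. Filter + refusal, for example, matches its false-positive rate with higher effectiveness, no quality loss, and lower jailbreak vulnerability.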

Memorization Scaling Analysis

[Figure: Memorization vs Model Size (Power Law)]

[Figure: Extraction vs Defense Strength by Model Size]
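The summary reports a memorization scaling exponent of 0.42 but does not say how it was estimated. A common way to estimate a power-law exponent b in M = a·N^b is ordinary least squares on log-transformed values, and the sketch below shows that technique. It checks the fit on synthetic data generated with b = 0.42 and a hypothetical prefactor a = 0.05, at the model sizes listed in the results table. The memorization scores in that table approach 1.0 at the largest sizes, which a pure power law cannot capture. For that reason, this sketch is not claimed to reproduce the reported exponent from those scores.

```python
import math


def fit_power_law(sizes, values):
    """Least-squares fit of values = a * sizes**b on log-log axes; returns (a, b)."""
    if len(sizes) != len(values) or len(sizes) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in values]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Slope of the log-log regression line is the exponent b.
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    # Intercept gives log(a).
    a = math.exp(my - b * mx)
    return a, b


# Model sizes in billions of parameters, from the results table.
SIZES_B = [65, 175, 540, 1000]
# Synthetic memorization curve (hypothetical prefactor 0.05, exponent 0.42).
synthetic = [0.05 * n ** 0.42 for n in SIZES_B]

a_hat, b_hat = fit_power_law(SIZES_B, synthetic)
print(f"fitted prefactor {a_hat:.4f}, exponent {b_hat:.4f}")
```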

Statistical Tests

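Model 1 | Model 2 | Standard extraction rate (Model 1) | Standard extraction rate (Model 2) | z-statistic | p-value | Significant | Cohen's h
Model-A | Model-B | 0.1326 | 0.1434 | -0.700 | 0.484 | No | 0.031
Model-A | Model-C | 0.1326 | 0.0780 | 3.978 | <0.001 | Yes | 0.179
Model-A | Model-D | 0.1326 | 0.1488 | -1.042 | 0.298 | No | 0.047
Model-B | Model-C | 0.1434 | 0.0780 | 4.661 | <0.001 | Yes | 0.211
Model-B | Model-D | 0.1434 | 0.1488 | -0.342 | 0.732 | No | 0.015
Model-C | Model-D | 0.0780 | 0.1488 | -4.993 | <0.001 | Yes | 0.226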
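The statistical tests table does not state the test used, the sample sizes, or the significance threshold. A pooled two-proportion z-test with an assumed 1,000 prompts per model reproduces every reported z-statistic and p-value. The Cohen's h column matches the absolute difference of the arcsine-transformed rates, 2·asin(√p). The sketch below is one such reconstruction. `N_PER_MODEL = 1000` and `ALPHA = 0.05` are assumptions, not values from the source.

```python
import math
from itertools import combinations

# Standard extraction rates from the production results table.
RATES = {"Model-A": 0.1326, "Model-B": 0.1434, "Model-C": 0.0780, "Model-D": 0.1488}
N_PER_MODEL = 1000  # ASSUMPTION: sample size per model is not stated in the source.
ALPHA = 0.05        # ASSUMPTION: significance threshold is not stated in the source.


def two_proportion_z(p1, p2, n1, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), the two-sided p-value.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def cohens_h(p1, p2):
    """Absolute Cohen's h effect size for two proportions."""
    return abs(2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2)))


def pairwise_tests(rates, n, alpha):
    """Test every pair of models in insertion order, as in the table."""
    rows = []
    for m1, m2 in combinations(rates, 2):
        z, p = two_proportion_z(rates[m1], rates[m2], n, n)
        rows.append((m1, m2, z, p, p < alpha, cohens_h(rates[m1], rates[m2])))
    return rows


rows = pairwise_tests(RATES, N_PER_MODEL, ALPHA)
for m1, m2, z, p, significant, h in rows:
    print(f"{m1} vs {m2}: z={z:.3f} p={p:.3f} significant={significant} h={h:.3f}")
```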