Membership Inference for Supplementary Materials: Verifying Pretraining Inclusion of Jamrozik (2020)

A four-technique computational framework for assessing whether specific documents were included in an LLM's pretraining data, applied to the Jamrozik et al. (2020) supplementary case raised by Lupyan et al. (2026).

Category: CL (Computation and Language)
Track: Research

Problem Statement

Lupyan et al. (2026) demonstrate that Gemini translates a Jabberwockified passage into content closely matching a legal pre-emption example from the supplementary materials of Jamrozik et al. (2020). This raises the question: does the model output reflect genuine pattern-based reconstruction, or retrieval of memorized pretraining text? We develop a multi-technique membership inference framework to quantify the evidence for or against pretraining inclusion.

Methods

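1. N-gram Overlap

Precision, recall, F1, and Jaccard similarity between the target passage and the model output for n = 1..8. The decay profile of F1 as n grows separates the two hypotheses: verbatim memorization keeps overlap high at long n, whereas reconstruction makes it fall off rapidly.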
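A minimal sketch of the n-gram overlap metrics, assuming lowercase whitespace tokenization and sets of unique n-grams. The set treatment is consistent with the Jaccard values in the n-gram table, but the study's actual tokenizer is not specified; `ngrams`, `overlap_metrics`, and `decay_profile` are hypothetical names.

```python
def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """Unique n-grams of a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_metrics(target: str, output: str, n: int) -> dict[str, float]:
    """Precision, recall, F1 and Jaccard between the unique n-gram sets.

    Tokenization is lowercase whitespace splitting, an assumption made
    for this sketch.
    """
    t = ngrams(target.lower().split(), n)
    o = ngrams(output.lower().split(), n)
    shared = len(t & o)
    union = len(t | o)
    precision = shared / len(o) if o else 0.0
    recall = shared / len(t) if t else 0.0
    f1 = 2 * precision * recall / (precision + recall) if shared else 0.0
    jaccard = shared / union if union else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "jaccard": jaccard}


def decay_profile(target: str, output: str, max_n: int = 8) -> list[float]:
    """F1 for n = 1..max_n; slow decay suggests verbatim reproduction."""
    return [overlap_metrics(target, output, n)["f1"] for n in range(1, max_n + 1)]


# Usage with two made-up sentences (not the study's passages)
target = "the court held that federal law pre-empts the state statute"
output = "the court ruled that federal law pre-empts the state rule"
print(decay_profile(target, output, max_n=3))  # F1 falls as n grows
```

With unique n-gram sets, F1 simplifies to 2·|shared| / (|target| + |output|); the n = 1 row of the n-gram table, for example, gives 2·31 / (38 + 37) ≈ 0.827.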

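2. Perplexity & Min-K%

Compare the model's perplexity on the target passage with its perplexity on control passages. Min-K% averages the log-probabilities of only the K% least likely tokens in a passage; because memorized text rarely contains very low-probability tokens, restricting attention to them amplifies the memorization signal.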
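A minimal sketch of both scores, assuming the per-token natural-log probabilities of a passage have already been obtained from the model under test (that step is model-specific and omitted). The default `k=20` and the floor-with-minimum-one rule for choosing how many tokens to keep are assumptions, not the framework's documented settings.

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """Perplexity = exp(-mean natural-log probability per token)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def min_k_percent(token_logprobs: list[float], k: float = 20.0) -> float:
    """Mean log-probability of the k% least likely tokens (hypothetical default k=20).

    Less negative values mean that even the hardest tokens are easy for the
    model, which is the pattern Min-K% associates with memorized text.
    """
    n_keep = max(1, int(len(token_logprobs) * k / 100))  # floor, but keep at least one token
    lowest = sorted(token_logprobs)[:n_keep]
    return sum(lowest) / n_keep


# Usage with made-up per-token log-probabilities (not data from this study)
lp = [-0.1, -0.5, -2.0, -4.0, -0.3]
print(perplexity(lp))
print(min_k_percent(lp, k=20))  # -4.0: only the single lowest token is kept
print(min_k_percent(lp, k=40))  # -3.0: mean of -4.0 and -2.0
```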

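3. Perturbation Detection

Generate 25 paraphrases of the target and score each with the model. A z-score locates the original's perplexity within the paraphrase distribution; a strongly negative value means the model prefers the exact original wording over its perturbations.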
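A minimal sketch of the z-score computation, assuming the perplexities of the original passage and of its paraphrases have already been computed with the same model. Using the sample standard deviation (`statistics.stdev`) is an assumption, the paraphrase generator is not shown, and `perturbation_z_score`, `perplexity_ratio`, and `prefers_original` are hypothetical names.

```python
import statistics


def perturbation_z_score(original_ppl: float, perturbed_ppls: list[float]) -> float:
    """How many standard deviations the original's perplexity sits
    below (negative) or above (positive) the paraphrase mean."""
    mu = statistics.mean(perturbed_ppls)
    sigma = statistics.stdev(perturbed_ppls)  # sample std; needs at least 2 paraphrases
    return (original_ppl - mu) / sigma


def perplexity_ratio(original_ppl: float, perturbed_ppls: list[float]) -> float:
    """Original perplexity over mean paraphrase perplexity (< 1 favours the original)."""
    return original_ppl / statistics.mean(perturbed_ppls)


def prefers_original(z: float, threshold: float = -2.0) -> bool:
    """Flag a strong preference for the exact wording (z below the -2 cut-off)."""
    return z < threshold
```

The Ratio column of the perturbation table matches this ratio; for the strong-memorization scenario, 2.445 / 6.960 ≈ 0.351.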

4. Reconstruction Fidelity

Token accuracy, longest-common-subsequence (LCS) ratio, normalized edit distance, and semantic preservation, each computed between the target passage and the model output.
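A minimal sketch of three of the four fidelity metrics on token lists, under assumed definitions: token accuracy as position-wise matches over the longer sequence, LCS ratio as LCS length over the longer sequence, and token-level Levenshtein distance normalized by the longer sequence. The framework's exact normalizations are not stated, and semantic preservation is omitted because it depends on an embedding model not specified here.

```python
def token_accuracy(target: list[str], output: list[str]) -> float:
    """Share of aligned positions with identical tokens, over the longer length."""
    longest = max(len(target), len(output))
    if longest == 0:
        return 1.0
    return sum(a == b for a, b in zip(target, output)) / longest


def lcs_length(a: list[str], b: list[str]) -> int:
    """Longest common subsequence length via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_ratio(a: list[str], b: list[str]) -> float:
    """LCS length divided by the longer sequence length (assumed normalization)."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


def normalized_edit_distance(a: list[str], b: list[str]) -> float:
    """Token-level Levenshtein distance divided by the longer sequence length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # substitute (free if equal)
        prev = curr
    return prev[-1] / longest
```

Under these definitions, token accuracy is position-sensitive: one inserted word at the start of `["x", "a", "b", "c"]` compared with `["a", "b", "c"]` drops it to 0.0, while the LCS ratio stays at 0.75. The two metrics can therefore diverge sharply on the same pair.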

Aggregate Scoring

$$S = 0.20\,S_{\text{ngram}} + 0.25\,S_{\text{ppl}} + 0.30\,S_{\text{pert}} + 0.25\,S_{\text{fid}}$$

Thresholds: S > 0.65 → LIKELY_SEEN; 0.35 < S ≤ 0.65 → UNCERTAIN; S ≤ 0.35 → LIKELY_UNSEEN.
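A minimal sketch of the aggregation and verdict rule, taking the four component scores as given; how each technique's raw output is mapped onto [0, 1] is not specified here, so that step is omitted. The weights and thresholds are those of the aggregate-scoring formula; the dictionary keys are hypothetical labels.

```python
# Weights from the aggregate-scoring formula; keys are hypothetical labels.
WEIGHTS = {"ngram": 0.20, "ppl": 0.25, "pert": 0.30, "fid": 0.25}


def aggregate_score(components: dict[str, float]) -> float:
    """Weighted sum S of the four component scores (taken as given, in [0, 1])."""
    return sum(weight * components[name] for name, weight in WEIGHTS.items())


def verdict(score: float) -> str:
    """Map S onto the three verdict bands."""
    if score > 0.65:
        return "LIKELY_SEEN"
    if score > 0.35:
        return "UNCERTAIN"
    return "LIKELY_UNSEEN"


# Component scores for boost 0.0, copied from the verdict-sensitivity table
s = aggregate_score({"ngram": 0.190, "ppl": 0.656, "pert": 0.000, "fid": 0.588})
print(f"{s:.3f} {verdict(s)}")  # 0.349 LIKELY_UNSEEN
```

In the sensitivity table the n-gram (0.190) and fidelity (0.588) components stay fixed across boosts, so under these weights they contribute a constant 0.185 and any change in verdict is driven by the perplexity and perturbation terms.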

Key Statistics

Bigram F1 (n = 2): 0.667
LCS ratio: 0.824
Semantic preservation: 0.773
Token accuracy: 0.353
Perturbation z-score, strong memorization (Seen Strong, boost 1.0): -11.71
Perturbation z-score, no memorization (Unseen, boost 0.0): -1.11

Figures

N-gram F1 Decay Profile

F1 between the Jamrozik target and the model output, compared with control passages, across n-gram lengths (n = 1..8). Rapid decay indicates non-verbatim reproduction; slow decay would indicate memorization.

Aggregate Score vs. Memorization Boost

The aggregate membership score rises from 0.349 (LIKELY_UNSEEN) at boost 0.0 through 0.463 (UNCERTAIN) at boost 0.1 to 0.706 (LIKELY_SEEN) at boost 0.2, then saturates at 0.735 from boost 0.4 onward.

Perplexity: Target vs. Controls

Perplexity of the target passage under varying memorization boost compared with control passages.

Perturbation z-scores by Scenario

z-scores of the original passage's perplexity relative to its paraphrases, by scenario. Values below -2 strongly indicate memorization.

Min-K% Sensitivity Across Thresholds

Min-K% scores for seen vs. unseen text across K thresholds; the seen-unseen gap stays between 0.63 and 0.97 at every K tested.

Data Tables

N-gram Overlap: Target vs. Model Output

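| n | Target n-grams | Output n-grams | Shared | Precision | Recall | F1 | Jaccard |
|---|---|---|---|---|---|---|---|
| 1 | 38 | 37 | 31 | 0.838 | 0.816 | 0.827 | 0.705 |
| 2 | 47 | 46 | 31 | 0.674 | 0.660 | 0.667 | 0.500 |
| 3 | 48 | 47 | 25 | 0.532 | 0.521 | 0.526 | 0.357 |
| 4 | 48 | 47 | 18 | 0.383 | 0.375 | 0.379 | 0.234 |
| 5 | 47 | 46 | 12 | 0.261 | 0.255 | 0.258 | 0.148 |
| 6 | 46 | 45 | 8 | 0.178 | 0.174 | 0.176 | 0.096 |
| 7 | 45 | 44 | 4 | 0.091 | 0.089 | 0.090 | 0.047 |
| 8 | 44 | 43 | 2 | 0.047 | 0.045 | 0.046 | 0.024 |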

Perplexity & Min-K% Scores

| Passage / Boost | PPL | Mean LogP | Min-K% |
|---|---|---|---|
| Target (0.0) | 6.551 | -1.880 | -2.704 |
| Target (0.4) | 4.795 | -1.567 | -2.453 |
| Target (0.8) | 3.293 | -1.192 | -1.970 |
| Target (1.0) | 2.568 | -0.943 | -1.625 |
| Target (1.5) | 1.670 | -0.513 | -1.377 |
| Legal Control 1 | 9.614 | -2.263 | -2.994 |
| Legal Control 2 | 9.286 | -2.228 | -2.860 |
| Legal Control 3 | 9.276 | -2.227 | -3.053 |
| Unrelated Control 1 | 10.321 | -2.334 | -3.055 |
| Unrelated Control 2 | 9.768 | -2.279 | -3.001 |
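The PPL and Mean LogP columns of the Perplexity & Min-K% Scores table are consistent with PPL = exp(-mean log-probability) using natural logarithms. A small check, recomputing PPL from the rounded Mean LogP values; the residual differences, all under 0.1%, come from that rounding.

```python
import math

# (passage, reported PPL, reported mean log-probability), copied from the table
ROWS = [
    ("Target (0.0)", 6.551, -1.880),
    ("Target (0.4)", 4.795, -1.567),
    ("Target (0.8)", 3.293, -1.192),
    ("Target (1.0)", 2.568, -0.943),
    ("Target (1.5)", 1.670, -0.513),
    ("Legal Control 1", 9.614, -2.263),
    ("Legal Control 2", 9.286, -2.228),
    ("Legal Control 3", 9.276, -2.227),
    ("Unrelated Control 1", 10.321, -2.334),
    ("Unrelated Control 2", 9.768, -2.279),
]

for name, ppl, mean_logp in ROWS:
    implied = math.exp(-mean_logp)  # perplexity implied by the mean log-probability
    rel_err = abs(implied - ppl) / ppl
    print(f"{name:<20} reported={ppl:7.3f} implied={implied:7.3f} rel_err={rel_err:.4%}")
```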

Perturbation-Based Detection

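| Scenario | Orig PPL | Mean Pert PPL | Ratio | z-score |
|---|---|---|---|---|
| Seen Strong (1.0) | 2.445 | 6.960 | 0.351 | -11.71 |
| Seen Moderate (0.6) | 4.122 | 6.945 | 0.593 | -5.44 |
| Seen Weak (0.3) | 5.102 | 6.943 | 0.735 | -4.36 |
| Unseen (0.0) | 6.373 | 6.931 | 0.919 | -1.11 |
| Control: Legal 1 | | | | -0.28 |
| Control: Legal 2 | | | | +0.43 |
| Control: Legal 3 | | | | +0.77 |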

Reconstruction Fidelity

| Metric | Target vs. Output | Best Control |
|---|---|---|
| Token Accuracy | 0.353 | 0.059 |
| LCS Ratio | 0.824 | 0.235 |
| Normalized Edit Distance (lower is closer) | 0.176 | 0.880 |
| Semantic Preservation | 0.773 | 0.190 |

Min-K% Threshold Sensitivity

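| K (%) | Seen | Unseen | Control | Gap (Seen - Unseen) |
|---|---|---|---|---|
| 5 | -2.168 | -3.031 | -3.078 | 0.862 |
| 10 | -2.128 | -2.900 | -3.165 | 0.772 |
| 20 | -1.961 | -2.593 | -3.088 | 0.631 |
| 30 | -1.867 | -2.686 | -2.895 | 0.819 |
| 40 | -1.614 | -2.581 | -2.709 | 0.967 |
| 50 | -1.593 | -2.307 | -2.623 | 0.714 |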

Verdict Sensitivity: Aggregate Score by Memorization Boost

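| Boost | N-gram | Perplexity | Perturbation | Fidelity | Aggregate | Verdict |
|---|---|---|---|---|---|---|
| 0.0 | 0.190 | 0.656 | 0.000 | 0.588 | 0.349 | LIKELY_UNSEEN |
| 0.1 | 0.190 | 0.695 | 0.349 | 0.588 | 0.463 | UNCERTAIN |
| 0.2 | 0.190 | 0.885 | 1.000 | 0.588 | 0.706 | LIKELY_SEEN |
| 0.3 | 0.190 | 0.986 | 1.000 | 0.588 | 0.732 | LIKELY_SEEN |
| 0.4 | 0.190 | 1.000 | 1.000 | 0.588 | 0.735 | LIKELY_SEEN |
| 0.5 | 0.190 | 1.000 | 1.000 | 0.588 | 0.735 | LIKELY_SEEN |
| 0.6 | 0.190 | 1.000 | 1.000 | 0.588 | 0.735 | LIKELY_SEEN |
| 0.8 | 0.190 | 1.000 | 1.000 | 0.588 | 0.735 | LIKELY_SEEN |
| 1.0 | 0.190 | 1.000 | 1.000 | 0.588 | 0.735 | LIKELY_SEEN |
| 1.5 | 0.190 | 1.000 | 1.000 | 0.588 | 0.735 | LIKELY_SEEN |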

Key Findings