Membership Inference for Supplementary Materials: Verifying Pretraining Inclusion of Jamrozik et al. (2020)

Problem Statement

Lupyan et al. (2026) demonstrate that Gemini translates a Jabberwockified passage into content closely matching a legal pre-emption example from the supplementary materials of Jamrozik et al. (2020). This raises the question: does the model output reflect genuine pattern-based reconstruction, or retrieval of memorized pretraining text? We develop a multi-technique membership inference framework to quantify the evidence for or against pretraining inclusion.

Methods

N-gram Overlap

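We compute precision, recall, F1, and Jaccard similarity between the target passage and the model output for n-gram lengths n = 1 to 8. The decay profile, meaning how quickly these scores fall as n grows, distinguishes memorization (slow decay) from reconstruction (rapid decay).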
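Below is a minimal sketch of the n-gram overlap metrics. It assumes lowercase whitespace tokenization and computes overlap over sets of distinct n-grams. Both choices are assumptions, since the original tokenizer is not specified. Precision is measured against the output and recall against the target. The helper names `ngram_set`, `overlap_metrics`, and `f1_decay_profile` are illustrative, not taken from the study.

```python
def ngram_set(tokens, n):
    """Return the set of distinct n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_metrics(target, output, n):
    """Precision, recall, F1, and Jaccard over distinct n-grams.

    Precision is relative to the output's n-grams, recall to the target's.
    """
    t, o = ngram_set(target, n), ngram_set(output, n)
    shared = len(t & o)
    union = len(t | o)
    return {
        "shared": shared,
        "precision": shared / len(o) if o else 0.0,
        "recall": shared / len(t) if t else 0.0,
        # 2PR / (P + R) simplifies to 2 * shared / (|T| + |O|)
        "f1": 2 * shared / (len(t) + len(o)) if union else 0.0,
        "jaccard": shared / union if union else 0.0,
    }


def f1_decay_profile(target, output, max_n=8):
    """F1 for n = 1..max_n; slow decay would suggest verbatim memorization."""
    return [overlap_metrics(target, output, n)["f1"] for n in range(1, max_n + 1)]


# Hypothetical usage with simple lowercase whitespace tokenization.
target = "the cat sat on the mat".split()
output = "the cat sat on a mat".split()
print(f1_decay_profile(target, output, max_n=3))
```

A single substituted word already removes two bigrams and two trigrams from the shared set. This is why F1 falls faster at larger n whenever the reproduction is not verbatim.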

Perplexity & Min-K%

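We compare the model's perplexity on the target passage with its perplexity on legal and unrelated control passages. Min-K% (the mean log-probability of the K% least likely tokens) amplifies the memorization signal, because memorized text contains few surprising, low-probability tokens.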
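The sketch below shows how perplexity and a Min-K% score could be computed from per-token log-probabilities. It assumes natural-log probabilities are already available from the model under test. How they are obtained is not specified here, so the `logprobs` list is hypothetical. The Min-K% score follows the Min-K% Prob idea: it averages only the K% least likely tokens, rounding the token count down but keeping at least one token.

```python
import math


def perplexity(token_logprobs):
    """exp of the negative mean per-token log-probability (natural log)."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def min_k_percent(token_logprobs, k=20.0):
    """Mean log-probability of the k% least likely tokens (at least one token).

    Memorized text tends to contain few very unlikely tokens, so this score is
    higher (less negative) for seen passages than for unseen ones.
    """
    count = max(1, int(len(token_logprobs) * k / 100))
    lowest = sorted(token_logprobs)[:count]
    return sum(lowest) / count


# Hypothetical per-token log-probabilities from some scoring model.
logprobs = [-0.1, -0.2, -3.0, -0.5, -4.0, -0.2, -0.1, -0.3, -2.0, -0.6]
print(round(perplexity(logprobs), 3), min_k_percent(logprobs, k=20))
```

Computing both scores for the target and for each control passage gives the kind of contrast reported in the Perplexity & Min-K% Scores table.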

Perturbation Detection

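We generate 25 paraphrases of the target passage and compute a z-score for the original passage's perplexity relative to the distribution of paraphrase perplexities. A strongly negative z-score means the model prefers the exact original wording over its perturbations.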
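Here is a hedged sketch of the perturbation statistic. The exact formula is not given, so this version assumes z = (PPL_orig - mean(PPL_pert)) / sd(PPL_pert), using the sample standard deviation over the paraphrases. Generating the paraphrases is out of scope, and the perplexity values are hypothetical. The Ratio column of the Perturbation-Based Detection table is consistent with original perplexity divided by mean perturbed perplexity (for example, 2.445 / 6.960 ≈ 0.351).

```python
import statistics


def perturbation_z(orig_ppl, perturbed_ppls):
    """z-score of the original passage's perplexity against its paraphrases.

    Assumes z = (PPL_orig - mean(PPL_pert)) / sd(PPL_pert) with the sample
    standard deviation, so at least two paraphrases are required. Strongly
    negative values mean the exact wording is far more likely than rewordings.
    """
    mu = statistics.mean(perturbed_ppls)
    sd = statistics.stdev(perturbed_ppls)
    return (orig_ppl - mu) / sd


def perplexity_ratio(orig_ppl, perturbed_ppls):
    """Original perplexity divided by mean perturbed perplexity."""
    return orig_ppl / statistics.mean(perturbed_ppls)


# Hypothetical perplexities; the study would use 25 paraphrases.
pert = [6.0, 7.0, 8.0]
print(perturbation_z(2.0, pert), round(perplexity_ratio(2.0, pert), 3))
```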

Reconstruction Fidelity

We measure token accuracy, longest-common-subsequence (LCS) ratio, normalized edit distance, and semantic preservation between the target passage and the model output.
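The lexical fidelity metrics can be sketched as follows. The definitions are assumptions. Token accuracy is taken as position-by-position agreement over the target length. Both the LCS ratio and the edit distance are normalized by the longer sequence. Semantic preservation would need a sentence-embedding model and is omitted. The example shows why a single inserted token can leave the LCS ratio high while driving position-wise accuracy to zero.

```python
def token_accuracy(target, output):
    """Fraction of target positions matched by the output token at the same index."""
    if not target:
        return 0.0
    return sum(1 for a, b in zip(target, output) if a == b) / len(target)


def lcs_length(a, b):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def edit_distance(a, b):
    """Levenshtein distance over tokens (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution or match
        prev = curr
    return prev[-1]


def fidelity(target, output):
    """Lexical fidelity metrics; semantic preservation is not implemented here."""
    denom = max(len(target), len(output)) or 1
    return {
        "token_accuracy": token_accuracy(target, output),
        "lcs_ratio": lcs_length(target, output) / denom,
        "norm_edit_distance": edit_distance(target, output) / denom,
    }


# Hypothetical example: one inserted word shifts every later position.
print(fidelity("the cat sat on the mat".split(), "so the cat sat on the mat".split()))
```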

Aggregate Scoring

The four component scores are combined into a weighted aggregate S (the weights sum to 1):

S = 0.20 · S_ngram + 0.25 · S_ppl + 0.30 · S_pert + 0.25 · S_fid

Thresholds:
S > 0.65: LIKELY_SEEN
0.35 < S ≤ 0.65: UNCERTAIN
S ≤ 0.35: LIKELY_UNSEEN
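The aggregate score and verdict mapping translate directly into code. This sketch uses the stated weights and thresholds. The component dictionary keys are illustrative names, and each component is assumed to be pre-scaled to [0, 1], as in the Verdict Sensitivity table.

```python
WEIGHTS = {"ngram": 0.20, "ppl": 0.25, "pert": 0.30, "fid": 0.25}


def aggregate_score(components):
    """Weighted sum of the four component scores (weights sum to 1)."""
    return sum(w * components[name] for name, w in WEIGHTS.items())


def verdict(score):
    """Map an aggregate score to a verdict using the stated thresholds."""
    if score > 0.65:
        return "LIKELY_SEEN"
    if score > 0.35:
        return "UNCERTAIN"
    return "LIKELY_UNSEEN"


# Component scores for boost 0.0, copied from the Verdict Sensitivity table.
row = {"ngram": 0.190, "ppl": 0.656, "pert": 0.000, "fid": 0.588}
s = aggregate_score(row)
print(round(s, 3), verdict(s))
```

With the boost 0.0 row this prints 0.349 LIKELY_UNSEEN, matching the table. The boost 0.1 and 0.2 rows give 0.463 (UNCERTAIN) and 0.706 (LIKELY_SEEN) within rounding. Because the perturbation component carries the largest weight, its jump from 0.349 to 1.000 accounts for most of the verdict change.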

Key Statistics

Bigram F1 (n = 2): 0.667
LCS Ratio: 0.824
Semantic Preservation: 0.773
Token Accuracy: 0.353
Perturbation z-score, strong memorization scenario: -11.71
Perturbation z-score, no-memorization scenario: -1.11

Charts

N-gram F1 Decay Profile

F1 score between the Jamrozik target and the model output, compared with control passages, across n-gram lengths. Rapid decay indicates non-verbatim reproduction; slow decay would indicate memorization.

Aggregate Score vs. Memorization Boost

The membership score rises sharply from 0.349 (LIKELY_UNSEEN) at zero boost, through 0.463 (UNCERTAIN) at boost 0.1, to 0.706 (LIKELY_SEEN) at boost 0.2, and plateaus at 0.735 from boost 0.4 onward.

Perplexity: Target vs. Controls

Perplexity of the target passage under varying memorization boost compared with control passages.

Perturbation z-scores by Scenario

z-scores of the original text's perplexity relative to its paraphrases, by scenario. Values below -2 strongly indicate memorization.

Min-K% Sensitivity Across Thresholds

Seen vs. unseen Min-K% scores across different K thresholds, showing consistent separation.

Data Tables

N-gram Overlap: Target vs. Model Output

n	Target N-grams	Output N-grams	Shared	Precision	Recall	F1	Jaccard
1	38	37	31	0.838	0.816	0.827	0.705
2	47	46	31	0.674	0.660	0.667	0.500
3	48	47	25	0.532	0.521	0.526	0.357
4	48	47	18	0.383	0.375	0.379	0.234
5	47	46	12	0.261	0.255	0.258	0.148
6	46	45	8	0.178	0.174	0.176	0.096
7	45	44	4	0.091	0.089	0.090	0.047
8	44	43	2	0.047	0.045	0.046	0.024
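The count columns in the table above fully determine the metric columns: precision = shared / output, recall = shared / target, F1 = 2 · shared / (target + output), and Jaccard = shared / (target + output - shared). The short check below recomputes every row from the counts. Each value agrees with the table to three decimals.

```python
# (target n-grams, output n-grams, shared) per n, copied from the table above.
ROWS = {
    1: (38, 37, 31),
    2: (47, 46, 31),
    3: (48, 47, 25),
    4: (48, 47, 18),
    5: (47, 46, 12),
    6: (46, 45, 8),
    7: (45, 44, 4),
    8: (44, 43, 2),
}


def metrics_from_counts(n_target, n_output, n_shared):
    """Recover precision, recall, F1, and Jaccard from n-gram counts."""
    precision = n_shared / n_output
    recall = n_shared / n_target
    f1 = 2 * n_shared / (n_target + n_output)
    jaccard = n_shared / (n_target + n_output - n_shared)
    return precision, recall, f1, jaccard


for n, counts in ROWS.items():
    p, r, f1, j = metrics_from_counts(*counts)
    print(f"n={n}  P={p:.3f}  R={r:.3f}  F1={f1:.3f}  J={j:.3f}")
```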

Perplexity & Min-K% Scores

Passage / Boost	PPL	Mean LogP	Min-K%
Target (0.0)	6.551	-1.880	-2.704
Target (0.4)	4.795	-1.567	-2.453
Target (0.8)	3.293	-1.192	-1.970
Target (1.0)	2.568	-0.943	-1.625
Target (1.5)	1.670	-0.513	-1.377
Legal Control 1	9.614	-2.263	-2.994
Legal Control 2	9.286	-2.228	-2.860
Legal Control 3	9.276	-2.227	-3.053
Unrelated Ctrl 1	10.321	-2.334	-3.055
Unrelated Ctrl 2	9.768	-2.279	-3.001
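The PPL and Mean LogP columns are consistent with the standard relation PPL = exp(-mean log p), using natural logarithms. The check below recomputes PPL from the Mean LogP column. Every row agrees to within 0.1%, which is consistent with three-decimal rounding of the reported values.

```python
import math

# (passage, reported PPL, reported mean log-probability), copied from the table above.
ROWS = [
    ("Target (0.0)", 6.551, -1.880),
    ("Target (0.4)", 4.795, -1.567),
    ("Target (0.8)", 3.293, -1.192),
    ("Target (1.0)", 2.568, -0.943),
    ("Target (1.5)", 1.670, -0.513),
    ("Legal Control 1", 9.614, -2.263),
    ("Legal Control 2", 9.286, -2.228),
    ("Legal Control 3", 9.276, -2.227),
    ("Unrelated Ctrl 1", 10.321, -2.334),
    ("Unrelated Ctrl 2", 9.768, -2.279),
]

for name, ppl, mean_logp in ROWS:
    implied = math.exp(-mean_logp)  # perplexity implied by the mean log-probability
    print(f"{name:<18} reported={ppl:.3f} implied={implied:.3f}")
```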

Perturbation-Based Detection

Scenario	Original PPL	Mean Perturbed PPL	Ratio (Original / Perturbed)	z-score
Seen Strong (1.0)	2.445	6.960	0.351	-11.71
Seen Moderate (0.6)	4.122	6.945	0.593	-5.44
Seen Weak (0.3)	5.102	6.943	0.735	-4.36
Unseen (0.0)	6.373	6.931	0.919	-1.11
Control: Legal 1				-0.28
Control: Legal 2				+0.43
Control: Legal 3				+0.77

Reconstruction Fidelity

Metric	Target vs. Output	Best Control
Token Accuracy	0.353	0.059
LCS Ratio	0.824	0.235
Norm. Edit Distance	0.176	0.880
Semantic Preservation	0.773	0.190

Min-K% Threshold Sensitivity

K (%)	Seen	Unseen	Control	Gap (Seen - Unseen)
5	-2.168	-3.031	-3.078	0.862
10	-2.128	-2.900	-3.165	0.772
20	-1.961	-2.593	-3.088	0.631
30	-1.867	-2.686	-2.895	0.819
40	-1.614	-2.581	-2.709	0.967
50	-1.593	-2.307	-2.623	0.714

Verdict Sensitivity: Aggregate Score by Memorization Boost

Boost	N-gram	Perplexity	Perturbation	Fidelity	Aggregate	Verdict
0.0	0.190	0.656	0.000	0.588	0.349	LIKELY_UNSEEN
0.1	0.190	0.695	0.349	0.588	0.463	UNCERTAIN
0.2	0.190	0.885	1.000	0.588	0.706	LIKELY_SEEN
0.3	0.190	0.986	1.000	0.588	0.732	LIKELY_SEEN
0.4	0.190	1.000	1.000	0.588	0.735	LIKELY_SEEN
0.5	0.190	1.000	1.000	0.588	0.735	LIKELY_SEEN
0.6	0.190	1.000	1.000	0.588	0.735	LIKELY_SEEN
0.8	0.190	1.000	1.000	0.588	0.735	LIKELY_SEEN
1.0	0.190	1.000	1.000	0.588	0.735	LIKELY_SEEN
1.5	0.190	1.000	1.000	0.588	0.735	LIKELY_SEEN

Key Findings

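Ambiguous evidence: The aggregate membership inference score rises from 0.349 (LIKELY_UNSEEN) at zero memorization boost to 0.735 (LIKELY_SEEN) at boosts of 0.4 and above. Because the verdict flips within a narrow boost range (0.1 to 0.2), small differences in assumed memorization strength change the outcome, which places the Jamrozik case in the ambiguous region.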
Non-verbatim reproduction: N-gram F1 decays from 0.827 (unigrams) to 0.046 (8-grams), indicating partial but not verbatim reproduction. All control passages achieve F1 = 0 at n ≥ 2.
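Perturbation analysis is diagnostic but needs ground truth: The z-score ranges from -11.71 (strong memorization) to -1.11 (no memorization), and even weak memorization (boost 0.3) yields -4.36, well below the -2 threshold. The method is therefore highly sensitive when memorization is present, but a z-score near -1 is ambiguous on its own, since the legal controls range from -0.28 to +0.77.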
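Structural preservation with lexical variation: The LCS ratio of 0.824, alongside a token accuracy of only 0.353, indicates that the model preserves the passage's structure and word order while varying its wording. Given the low normalized edit distance (0.176), the low token accuracy may reflect small insertions or deletions that shift token alignment as much as outright synonym substitution.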
Consistent Min-K% separation: The gap between seen and unseen scores is robust across K thresholds (0.631 to 0.967), with the largest separation at K = 40%.
Sharp transition regime: The framework reliably detects memorization once the boost reaches 0.2 (aggregate 0.706), but the observed output falls in the zone where both explanations are plausible.