Macro-Level Impact of Large Language Models on the Scientific Enterprise

A Cross-Disciplinary Bibliometric Analysis using Difference-in-Differences Estimation across 8 Disciplines (2018-2025)

36.05%
Mean Pub. Volume Increase
5.29x
LLM Vocab Signal Increase
9.75
Mean Composite Impact
8
Disciplines Analyzed

Problem & Methods

Quantifying how LLMs have systematically altered scientific production across disciplines since the release of ChatGPT in late 2022.

Research Question

What is the macro-level impact of Large Language Models on the scientific enterprise? This work develops a comprehensive cross-disciplinary framework using synthetic bibliometric time-series data calibrated to observed publication trends from 2018 to 2025. The framework measures publication volume, citation impact, research novelty, interdisciplinary collaboration, LLM vocabulary signals, retraction rates, review turnaround times, and collaboration breadth across 8 disciplines spanning STEM, social sciences, medicine, and the humanities.

Methodology

Difference-in-Differences (DiD): Disciplines are partitioned into treatment (high LLM adoption: Computer Science at 0.85, Medicine at 0.55, Physics at 0.52) and control groups (low adoption: Mathematics at 0.35, Psychology at 0.38, Humanities at 0.22). The DiD estimator isolates causal LLM effects beyond organic growth trends.

Composite Impact Index: Aggregates normalized changes across 7 metrics with weights: publication volume (0.20), citations (0.15), novelty (0.15), interdisciplinary fraction (0.15), retraction rate (0.10), review turnaround (0.10), collaboration breadth (0.15).

Key Metrics Overview

+86.4%
CS Publication Volume Change
-18.92%
CS Novelty Decline
816.29%
CS Vocab Signal Growth
20.09
CS Composite Impact
0.881
Adoption-Pub. Correlation (rho)
-0.905
Adoption-Novelty Correlation (rho)

Interactive Charts

Explore the bibliometric trends across disciplines. Use the tabs to switch between different visualizations.

Publication Volume Over Time (Thousands)

Novelty Index Over Time

LLM Vocabulary Signal Over Time

Composite Impact Score by Discipline

Collaboration Breadth Over Time

Difference-in-Differences Estimates (t-statistics)

Adoption-Outcome Relationships

Scatter plots showing how LLM adoption level correlates with key outcomes across disciplines.

Adoption vs. Publication Growth (rho = 0.881, p = 0.004)

Adoption vs. Novelty Decline (rho = -0.905, p = 0.002)

Data Tables

Detailed numerical results from the analysis.

Difference-in-Differences Analysis

Metric DiD Estimate Std Error t-statistic p-value Significant
Publications (K)102.3439.872.5670.014Yes
Mean Citations0.0520.1690.3110.758No
Novelty Index-0.0230.008-2.8090.007Yes
Interdisc. Fraction0.0070.0080.7870.435No
LLM Vocab Signal0.0310.0074.586<0.001Yes
Retraction Rate-0.0010.172-0.0070.995No
Review Turnaround-0.7621.622-0.4700.641No
Collab. Breadth0.0860.0980.8790.384No

Discipline-Level Impact Scores

Discipline Adoption Pub. Change % Novelty Change % Vocab Signal % Interdisc. % Collab. % Composite
Computer Science0.85+86.40-18.92816.29+30.66+17.6720.09
Medicine0.55+47.60-16.28635.69+25.81+18.6111.93
Biology0.48+36.48-13.58343.13+23.91+17.849.87
Psychology0.38+29.48-8.49282.39+27.57+16.319.56
Physics0.52+29.35-12.90564.09+28.44+15.978.78
Economics0.42+28.44-10.55480.39+23.22+15.837.98
Mathematics0.35+22.25-12.46371.70+25.70+14.215.95
Humanities0.22+8.41-7.39143.53+19.81+13.823.81

Spearman Correlations: LLM Adoption vs. Outcome Changes

Metric Spearman rho p-value Significant
Publications0.8810.004Yes
Mean Citations0.0480.911No
Novelty Index-0.9050.002Yes
Interdisc. Fraction0.6910.058No
Retraction Rate0.0480.911No

LLM Vocabulary Signal Analysis

Discipline Pre-LLM Mean Post-LLM Mean Fold Change t-statistic p-value Significant
Computer Science0.00890.08179.168.9650.011Yes
Medicine0.00850.06257.363.9650.056No
Physics0.00910.06026.647.8290.014Yes
Economics0.00850.04935.806.2710.014Yes
Mathematics0.00970.04564.7215.054<0.001Yes
Biology0.01180.05254.435.9260.027Yes
Psychology0.01060.04053.824.6430.041Yes
Humanities0.01260.03072.444.1090.044Yes
Aggregate0.01000.05295.2911.411<0.001Yes

Cluster-Level Heterogeneity

Cluster Disciplines Mean Adoption Mean Composite Std Composite Mean Pub. Change % Mean Novelty Change %
STEMCS, Physics, Biology, Math0.5511.176.17+43.62-14.46
MedicalMedicine0.5511.930.00+47.60-16.28
Social SciencesEconomics, Psychology0.408.771.12+28.96-9.52
HumanitiesHumanities0.223.810.00+8.41-7.39

Key Findings

1
Publication Volume Surge: Mean publication volume increased by 36.05% across all 8 disciplines, with Computer Science experiencing the largest increase at 86.4% and Humanities the smallest at 8.41%. The DiD estimate of 102.34 thousand papers (p = 0.014) confirms this is beyond organic trends.
2
Novelty-Quantity Tradeoff: Research novelty declined across all disciplines. The DiD estimate for novelty was -0.023 (p = 0.007). LLM adoption and novelty change are strongly negatively correlated (rho = -0.905, p = 0.002), indicating LLMs lower the barrier to writing but may encourage templated output.
3
Pervasive LLM Vocabulary Signal: The aggregate LLM vocabulary signal increased 5.29-fold post-2023 (from 0.01 to 0.0529). Computer Science showed the highest signal at 0.0817 (9.16-fold increase). The DiD estimate of 0.031 (p < 0.001) is the most statistically significant effect observed.
4
Discipline Heterogeneity: High-adoption disciplines (adoption >= 0.48) exhibited a composite impact of 13.6 vs. 6.44 for low-adoption fields -- a 2.11:1 ratio. STEM fields showed a mean composite of 11.17 compared to 3.81 for Humanities.
5
Strong Adoption Correlations: LLM adoption level is positively correlated with publication growth (rho = 0.881, p = 0.004) and negatively correlated with novelty change (rho = -0.905, p = 0.002). Citation impact and retraction rates show no significant relationship with adoption.
6
Non-Significant but Directional Effects: Citation impact (DiD = 0.052, p = 0.758), interdisciplinary collaboration (DiD = 0.007, p = 0.435), retraction rates (DiD = -0.001, p = 0.995), review turnaround (DiD = -0.762, p = 0.641), and collaboration breadth (DiD = 0.086, p = 0.384) all showed directional changes consistent with LLM influence but did not reach statistical significance.