The Skill Formation Paradox

How AI Coding Tools Boost Productivity While Impeding Novice Developer Learning

The Open Problem

AI coding assistants provide substantial productivity gains to novice software developers. But do these gains come at the cost of genuine skill development? This is one of the most consequential open questions in computing education and workforce development.

Shen et al. (2026) document that junior developers receive disproportionately large productivity boosts from AI tools, yet explicitly identify the effect on skill formation as unknown. Our work addresses this gap through computational cognitive modeling.

Research Question: What effect do AI coding tools have on the skill formation of junior or less experienced software developers who gain productivity from AI assistance?

Theoretical Foundation

Retrieval-Based Strengthening

Skills consolidate through active recall and application. AI tools that provide ready-made solutions may bypass this retrieval process (Bjork & Bjork, 1992).

Desirable Difficulty

Moderate challenge during practice enhances long-term retention, even at the cost of immediate performance. AI reconfigures this trade-off (Bjork, 1994).

Skill Compilation

Declarative knowledge becomes procedural through practice. If AI handles the procedural step, the compilation process is interrupted (Anderson, 1982).

Methodology

We simulate a 12-month, three-arm randomized trial with 80 novice developers per condition (240 total). Each developer encounters 5 coding tasks per day over 252 working days, with monthly tool-removed skill assessments.
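For reference, the trial design reduces to a handful of constants. The sketch below (Python) records them with hypothetical identifier names; only the numbers come from the design above.

```python
# Simulated trial design (identifier names are illustrative).
N_PER_ARM = 80            # novice developers per condition
N_ARMS = 3                # Control, Unrestricted AI, Scaffolded AI (240 total)
TASKS_PER_DAY = 5         # coding tasks per developer per day
WORKING_DAYS = 252        # 12 simulated months of workdays
ASSESSMENT_INTERVAL = 21  # working days between tool-removed assessments (~monthly)

TASKS_PER_DEVELOPER = TASKS_PER_DAY * WORKING_DAYS  # 1,260 practice opportunities each
```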

Three Experimental Conditions

Control (No AI)

Developers work without any AI assistance. Full cognitive engagement on every task. This is the baseline for skill development.

Unrestricted AI

Full access to an AI coding assistant, used with passive acceptance of its output. Task difficulty is reduced, and cognitive processing depth drops to ~15% of the unaided baseline.

Scaffolded AI

AI access with mandatory engagement: developers must read, modify, and explain AI output before proceeding. Processing depth is maintained at ~70% of the unaided baseline.
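A minimal sketch of how the three conditions might enter the learning model, assuming (consistent with the sensitivity analysis below) that each condition acts on skill growth chiefly through its processing-depth parameter. The update rule and learning rate are our illustrative assumptions, not the paper's calibrated equations.

```python
# Condition-level processing depths (from the experimental design above).
PROCESSING_DEPTH = {
    "control": 1.00,          # full cognitive engagement on every task
    "unrestricted_ai": 0.15,  # passive acceptance of AI output
    "scaffolded_ai": 0.70,    # mandatory read / modify / explain engagement
}

def practice_update(skill: float, depth: float, learning_rate: float = 0.001) -> float:
    """One task's worth of skill growth under a saturating update rule.

    Gains scale with processing depth and shrink as skill approaches the
    ceiling of 1.0, mirroring retrieval-based strengthening and skill
    compilation. Illustrative form only, not the model's exact equation.
    """
    return skill + learning_rate * depth * (1.0 - skill)
```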

Six Skill Dimensions

Dimension | Description | AI Weight
Syntactic Fluency | Writing correct code from specifications | 0.80
Algorithmic Reasoning | Solving novel computational problems | 0.50
Debugging | Locating and fixing defects | 0.35
Code Comprehension | Reading and predicting code behavior | 0.25
Architectural Judgment | System-level design evaluation | 0.15
Autonomous Learning | Learning new frameworks independently | 0.10
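The AI-weight column feeds into the model as each dimension's degree of automatability. A plain mapping captures it (values from the table above; key names are ours):

```python
# Degree to which AI assistance automates each skill dimension.
AI_WEIGHT = {
    "syntactic_fluency":      0.80,  # writing correct code from specifications
    "algorithmic_reasoning":  0.50,  # solving novel computational problems
    "debugging":              0.35,  # locating and fixing defects
    "code_comprehension":     0.25,  # reading and predicting code behavior
    "architectural_judgment": 0.15,  # system-level design evaluation
    "autonomous_learning":    0.10,  # learning new frameworks independently
}
```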

Key Results

Cohen's d, Unrestricted AI vs Control: -1.04 (large negative effect on skill)
Cohen's d, Scaffolded AI vs Control: -0.04 (negligible effect on skill)
Crossover threshold: 0.75 (processing depth at which AI becomes beneficial)
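The effect sizes reported here are Cohen's d; assuming the conventional pooled-standard-deviation form over per-developer final skill scores, the computation is:

```python
import numpy as np

def cohens_d(treatment: np.ndarray, control: np.ndarray) -> float:
    """Cohen's d with pooled standard deviation (n1 + n2 - 2 degrees of freedom)."""
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * treatment.var(ddof=1)
                  + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2)
    return (treatment.mean() - control.mean()) / np.sqrt(pooled_var)
```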

Skill Growth Over 12 Months

[Figure: Overall skill trajectories over 12 months (tool-removed assessments) for the No AI (Control), Unrestricted AI, and Scaffolded AI conditions.]
Condition | Initial Skill | Final Skill | Growth | Cohen's d vs Control
Control (No AI) | 0.238 | 0.643 | +0.404 | --
Unrestricted AI | 0.228 | 0.562 | +0.334 | -1.04
Scaffolded AI | 0.236 | 0.641 | +0.405 | -0.04

Dimension-Specific Analysis

The impact of AI is not uniform across skill dimensions. Skills that AI automates most effectively suffer the greatest impairment under unrestricted use.

[Figure: Effect sizes by skill dimension (Cohen's d vs Control) for the Unrestricted AI and Scaffolded AI conditions.]
[Figure: Final skill levels by dimension and condition (No AI Control, Unrestricted AI, Scaffolded AI).]
Dimension | AI Weight | Control | Unrestricted | Scaffolded | d (Unrestricted) | d (Scaffolded)
Syntactic Fluency | 0.80 | 0.651 | 0.390 | 0.650 | -5.10 | -0.02
Algorithmic Reasoning | 0.50 | 0.648 | 0.566 | 0.660 | -2.07 | +0.34
Debugging | 0.35 | 0.666 | 0.615 | 0.647 | -1.28 | -0.59
Code Comprehension | 0.25 | 0.662 | 0.620 | 0.649 | -1.21 | -0.42
Architectural Judgment | 0.15 | 0.664 | 0.648 | 0.656 | -0.44 | -0.22
Autonomous Learning | 0.10 | 0.566 | 0.535 | 0.582 | -0.72 | +0.30

The Spearman correlation between AI automation weight and the unrestricted AI effect size is -0.94, confirming that AI most impairs the very skills where it provides the most help.
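This correlation can be checked directly from the dimension table above; the snippet below reproduces the reported value.

```python
from scipy.stats import spearmanr

ai_weight      = [0.80, 0.50, 0.35, 0.25, 0.15, 0.10]
d_unrestricted = [-5.10, -2.07, -1.28, -1.21, -0.44, -0.72]

rho, _ = spearmanr(ai_weight, d_unrestricted)
print(round(rho, 2))  # -0.94
```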

The Productivity-Skill Paradox

The central finding is a dissociation between observed productivity and underlying skill. Unrestricted AI users appear more productive in daily work, yet possess weaker skills when assessed without AI tools. This creates a "dependency trap" invisible under continued AI access.

Tasks per day: 3.69 (Unrestricted AI group, with AI assistance) vs 3.21 (Control group, unaided)
Skill without AI: 0.562 (Unrestricted AI group) vs 0.643 (Control group)

[Figure: Productivity vs underlying skill at month 12.]
Implication: Organizations evaluating developer performance based on AI-assisted output metrics will systematically overestimate the capability of developers who rely heavily on AI tools.

Dependency Index Over Time

[Figure: AI Dependency Index over time (higher = more dependent on AI tools) for the Unrestricted AI and Scaffolded AI conditions.]
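The construction of the dependency index is not spelled out here. One natural definition, offered purely as an assumption, is the fraction of AI-assisted performance that vanishes when the tool is removed:

```python
def dependency_index(assisted: float, unassisted: float) -> float:
    """Hypothetical dependency index (assumed definition, not the model's):
    the share of assisted performance lost when the AI tool is removed.
    0 = no dependence; values near 1 = performance collapses without AI."""
    return max(0.0, (assisted - unassisted) / assisted)
```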

Sensitivity Analysis: The Crossover Threshold

By systematically varying the cognitive processing depth parameter, we identify the threshold at which AI assistance transitions from skill-harming to skill-enhancing.

At a processing depth of 0.15 (the unrestricted-AI default), the skill delta is negative: AI assistance harms skill formation.

The crossover occurs at ~0.75. Below this, AI reduces net skill development. Above it, the benefits of reduced difficulty and increased success rate outweigh the cost of reduced cognitive effort.
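Why should a crossover exist at all? A toy decomposition makes the trade-off concrete: AI raises the task success rate (and only successful tasks strengthen skill) while lowering processing depth (which scales per-task gains). With the illustrative success rates below, chosen so that the break-even point lands at 0.6 / 0.8 = 0.75, the sign of the per-task learning delta flips exactly at the reported threshold. This is a sketch of the mechanism under stated assumptions, not the paper's calibrated model.

```python
# Toy decomposition of the crossover (assumed values, not the paper's model).
P_SUCCESS_UNAIDED = 0.6   # assumed task success rate without AI
P_SUCCESS_WITH_AI = 0.8   # assumed (higher) success rate with AI assistance

def per_task_learning(depth: float, with_ai: bool) -> float:
    """Expected learning per task: only successful tasks strengthen skill,
    and gains scale with cognitive processing depth."""
    p_success = P_SUCCESS_WITH_AI if with_ai else P_SUCCESS_UNAIDED
    return depth * p_success

def skill_delta(depth: float) -> float:
    """Expected learning difference, AI condition minus unaided control.
    Negative below depth = 0.75, zero at 0.75, positive above it."""
    return per_task_learning(depth, with_ai=True) - per_task_learning(1.0, with_ai=False)
```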

[Figure: Skill delta (AI - Control) as a function of processing depth.]
Design Implication: AI tools that ensure developers engage with at least 75% of the cognitive depth of unaided work are predicted to produce net-positive skill outcomes. Current frictionless code completion (~15% processing depth) falls far below this threshold.

Implications & Conclusion

For Tool Designers

Incorporate scaffolding features that promote active engagement: explain-before-accept prompts, modification requirements, and progressive withdrawal of assistance as skills develop.

For Engineering Managers

Supplement AI-assisted productivity metrics with periodic tool-removed skill assessments. The gap between measured productivity and genuine skill is a hidden organizational risk.

For Educators

Integrate AI tools into curricula with explicit scaffolding protocols rather than unrestricted access. Teach students to evaluate and modify rather than merely accept AI output.

For Researchers

Prioritize empirical studies that disentangle productivity from skill, measure multiple skill dimensions, and test engagement-mode interventions. We recommend a Randomized Longitudinal Skill Assessment (RLSA) design as the most direct path to validating these predictions.

Testable Prediction 1

AI-induced skill deficits are largest for syntactic/algorithmic skills and smallest for architectural/meta-cognitive skills.

Testable Prediction 2

Active engagement protocols substantially reduce or eliminate the skill deficit across all dimensions.

Testable Prediction 3

Tool-removed assessments reveal skill gaps invisible in AI-assisted performance metrics.

Testable Prediction 4

Interventions pushing processing depth above ~0.75 flip the AI effect from negative to positive.

Key Takeaway: The skill formation paradox is not an argument against AI coding tools -- it is an argument for designing them thoughtfully, with attention to the cognitive processes that drive genuine skill development.