Prevalence of Security Vulnerabilities in Agent Skills

A large-scale simulation study of 2,500 synthetic agent skill packages across 8 categories and 10 vulnerability classes, calibrated to publicly reported ecosystem properties.

Category: cs.CR - Cryptography and Security

Overall prevalence: 75.96%
Skills scanned: 2,500
Total vulnerabilities: 3,863
Critical prevalence: 27.48%
Mean vulnerabilities per skill: 1.5452

Problem & Methods

Agent skills are modular packages, containing SKILL.md instructions and optional bundled scripts, that are distributed via public marketplaces. Despite rapid adoption, the security posture of these skill packages remains largely uncharacterized.

Simulation Framework

We model a marketplace of N = 2,500 agent skill packages. Each skill is characterized by its category, code complexity, number of bundled scripts, requested permissions, popularity tier, and vetting status. A simulated multi-layer vulnerability scanner evaluates each skill against 10 vulnerability classes. The simulation uses a fixed random seed for full reproducibility.


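For each skill-vulnerability pair, the detection probability is computed as

\[ p_v = \min\!\left(0.95,\; r_v \cdot m_{c,v} \cdot \frac{\log(\mathrm{complexity} + 1)}{\log(101)} \cdot \left(1 + 0.08\, n_{\mathrm{perms}}\right) \cdot f_{\mathrm{vet}}\right) \]

where r_v is the base rate for vulnerability class v, m_{c,v} is the multiplier for skill category c and class v, complexity is the skill's code complexity, n_perms is the number of requested permissions, and f_vet is the vetting reduction factor (1.0 for unreviewed, 0.65 for auto-scanned, 0.30 for human-reviewed).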
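The formula above can be sketched in Python as follows. This is a minimal illustration, not the study's code: the function name, argument names, and the example inputs are hypothetical, and the scale of the complexity value is not stated in the source (the log(101) denominator only implies that a complexity of 100 yields a term of exactly 1).

```python
import math

# Vetting reduction factors as stated in the text.
VETTING_FACTOR = {
    "unreviewed": 1.0,
    "auto-scanned": 0.65,
    "human-reviewed": 0.30,
}


def detection_probability(base_rate, category_multiplier, complexity, n_perms, vetting):
    """Per-class detection probability, capped at 0.95.

    Argument names are illustrative; the source does not publish its code.
    """
    complexity_term = math.log(complexity + 1) / math.log(101)
    permission_term = 1 + 0.08 * n_perms
    p = (base_rate * category_multiplier * complexity_term
         * permission_term * VETTING_FACTOR[vetting])
    return min(0.95, p)


# Hypothetical inputs: base rate 0.30, neutral multiplier, complexity 100, no permissions.
p_unreviewed = detection_probability(0.30, 1.0, 100, 0, "unreviewed")
p_reviewed = detection_probability(0.30, 1.0, 100, 0, "human-reviewed")
print(p_unreviewed, p_reviewed)
```

With all other inputs held equal, human review scales the probability by 0.30, and each requested permission adds 8% of the unpermissioned value before the cap is applied.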

Interactive Results

The interactive charts break down vulnerability prevalence by vulnerability class, skill category, vetting status, popularity tier, and code complexity.

Figures (not reproduced here): Prevalence by Vulnerability Class; Instance Count by Vulnerability Class; Prevalence by Skill Category; Mean Vulnerabilities per Skill by Category; Prevalence by Vetting Status; Vetting Pipeline Distribution; Prevalence by Popularity Tier; Prevalence by Code Complexity; Severity Distribution per Vulnerability Class; Top Vulnerability Co-occurrence Pairs (Conditional Probability).

Data Tables

Detailed numerical results from the simulation study.

Overall Vulnerability Prevalence (N = 2,500)

| Metric | Value |
| --- | --- |
| Skills scanned | 2,500 |
| Vulnerable skills | 1,899 |
| Overall prevalence | 0.7596 |
| Critical prevalence | 0.2748 |
| High-or-critical prevalence | 0.5216 |
| Total vulnerabilities | 3,863 |
| Mean vulns per skill | 1.5452 |
| Mean vulns per vulnerable skill | 2.0342 |
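The ratio metrics in this table follow directly from the three raw counts. The short sketch below recomputes them; the variable names are illustrative, and only the counts are taken from the table.

```python
# Raw counts from the table above.
skills_scanned = 2500
vulnerable_skills = 1899
total_vulnerabilities = 3863

# Share of skills with at least one vulnerability.
overall_prevalence = vulnerable_skills / skills_scanned

# Average over all skills, including those with no findings.
mean_per_skill = total_vulnerabilities / skills_scanned

# Average over only the skills that have at least one finding.
mean_per_vulnerable_skill = total_vulnerabilities / vulnerable_skills

print(f"{overall_prevalence:.4f}")         # 0.7596
print(f"{mean_per_skill:.4f}")             # 1.5452
print(f"{mean_per_vulnerable_skill:.4f}")  # 2.0342
```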

Prevalence and Severity by Vulnerability Class

| Vulnerability Class | Prevalence | Critical | High | Medium | Low | Count |
| --- | --- | --- | --- | --- | --- | --- |
| Missing input validation | 0.2992 | 0.0481 | 0.1832 | 0.4398 | 0.3289 | 748 |
| Excessive permissions | 0.2932 | 0.1173 | 0.2606 | 0.3752 | 0.2469 | 733 |
| Supply chain integrity | 0.2044 | 0.2505 | 0.3190 | 0.3190 | 0.1115 | 511 |
| Prompt injection | 0.1680 | 0.2405 | 0.3810 | 0.2929 | 0.0857 | 420 |
| Credential leakage | 0.1636 | 0.3374 | 0.3227 | 0.2445 | 0.0954 | 409 |
| Path traversal | 0.1216 | 0.1842 | 0.3487 | 0.3026 | 0.1645 | 304 |
| Data exfiltration | 0.1196 | 0.3579 | 0.2408 | 0.2843 | 0.1171 | 299 |
| Arbitrary code execution | 0.0860 | 0.4186 | 0.3442 | 0.2093 | 0.0279 | 215 |
| Dependency confusion | 0.0572 | 0.3077 | 0.3217 | 0.2937 | 0.0769 | 143 |
| Insecure deserialization | 0.0324 | 0.3333 | 0.2593 | 0.2963 | 0.1111 | 81 |
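As a consistency check on this table, the per-class counts can be summed and divided by the number of skills. In the sketch below, each class's prevalence equals its count divided by 2,500, which is consistent with each class being recorded at most once per skill (an inference from the numbers, not something the source states). The dictionary is copied from the table; the variable names are illustrative.

```python
N_SKILLS = 2500

# Instance counts per vulnerability class, copied from the table above.
counts = {
    "Missing input validation": 748,
    "Excessive permissions": 733,
    "Supply chain integrity": 511,
    "Prompt injection": 420,
    "Credential leakage": 409,
    "Path traversal": 304,
    "Data exfiltration": 299,
    "Arbitrary code execution": 215,
    "Dependency confusion": 143,
    "Insecure deserialization": 81,
}

# The counts should add up to the reported total of 3,863 vulnerabilities.
total_instances = sum(counts.values())

# Per-class prevalence recomputed as count / number of skills.
prevalence = {name: count / N_SKILLS for name, count in counts.items()}

print(total_instances)
for name, value in prevalence.items():
    print(f"{name}: {value:.4f}")
```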

Vulnerability Prevalence by Skill Category

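| Category | N | Vulnerable | Prevalence | Critical | Mean Vulns |
| --- | --- | --- | --- | --- | --- |
| Security tools | 153 | 124 | 0.8105 | 0.3203 | 1.7386 |
| System admin | 292 | 233 | 0.7979 | 0.3185 | 1.8014 |
| Web automation | 361 | 284 | 0.7867 | 0.2659 | 1.6205 |
| Data analysis | 408 | 314 | 0.7696 | 0.2794 | 1.6152 |
| File management | 243 | 183 | 0.7531 | 0.2551 | 1.4897 |
| Misc | 232 | 172 | 0.7414 | 0.2457 | 1.4828 |
| Communication | 247 | 183 | 0.7409 | 0.2794 | 1.4170 |
| Coding | 564 | 406 | 0.7199 | 0.2606 | 1.3670 |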

Vulnerability Prevalence by Vetting Status

| Vetting Status | N | Prevalence | Critical |
| --- | --- | --- | --- |
| Unreviewed | 1,386 | 0.8586 | 0.3341 |
| Auto-scanned | 771 | 0.7302 | 0.2374 |
| Human-reviewed | 343 | 0.4257 | 0.1195 |

Vulnerability Prevalence by Popularity Tier

| Popularity | N | Prevalence | Critical |
| --- | --- | --- | --- |
| Low | 1,403 | 0.8076 | 0.2937 |
| Medium | 698 | 0.7249 | 0.2564 |
| High | 272 | 0.6691 | 0.2537 |
| Very High | 127 | 0.6142 | 0.2126 |

Vulnerability Prevalence by Code Complexity

| Complexity Tier | N | Prevalence |
| --- | --- | --- |
| Tiny (<50 lines) | 774 | 0.6370 |
| Small (50-200) | 1,124 | 0.7891 |
| Medium (200-500) | 391 | 0.8568 |
| Large (500-2000) | 194 | 0.8608 |
| Very Large (2000+) | 17 | 1.0000 |

Key Findings

Primary findings from the simulation-based measurement study of agent skill security.

High Overall Prevalence

75.96% of all skills (1,899 of 2,500) contain at least one vulnerability, with a mean of 1.5452 vulnerabilities per skill and 2.0342 per vulnerable skill. This is substantially worse than mature package ecosystems such as npm (10-15%).

Critical Severity Concentration

27.48% of skills contain at least one critical-severity vulnerability, and 52.16% contain at least one high- or critical-severity issue. Arbitrary code execution has the highest share of critical instances: 41.86% of its 215 findings are rated critical.

Dominant Vulnerability Classes

Missing input validation (29.92% prevalence) and excessive permissions (29.32%) are the most common. Supply chain integrity gaps affect 20.44% of skills.

Category Risk Variation

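Security tools (81.05%) and system administration (79.79%) skills have the highest prevalence, while coding skills have the lowest (71.99%). Paradoxically, the category intended to improve security is the most frequently vulnerable in the ecosystem. System administration skills carry the most vulnerabilities per skill (mean 1.8014).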

Vetting Effectiveness

Human-reviewed skills show 42.57% prevalence versus 85.86% for unreviewed skills, a 43.29 percentage-point absolute reduction. However, only 13.7% of marketplace skills (343 of 2,500) have human review.
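The arithmetic behind this finding can be reproduced from the vetting-status table. The sketch below uses the table's counts and prevalence values; the relative reduction it prints is a derived figure that the source does not report, and the variable names are illustrative.

```python
# Counts and prevalence values from the vetting-status table.
n_by_status = {"Unreviewed": 1386, "Auto-scanned": 771, "Human-reviewed": 343}
prevalence = {"Unreviewed": 0.8586, "Auto-scanned": 0.7302, "Human-reviewed": 0.4257}

total = sum(n_by_status.values())

# Share of marketplace skills that received human review.
human_share = n_by_status["Human-reviewed"] / total

# Absolute reduction in percentage points (unreviewed minus human-reviewed).
absolute_reduction_pp = (prevalence["Unreviewed"] - prevalence["Human-reviewed"]) * 100

# Relative reduction: derived here for illustration, not reported in the source.
relative_reduction = 1 - prevalence["Human-reviewed"] / prevalence["Unreviewed"]

print(f"Human-reviewed share: {human_share:.1%}")                              # 13.7%
print(f"Absolute reduction: {absolute_reduction_pp:.2f} percentage points")    # 43.29
print(f"Relative reduction: {relative_reduction:.1%}")                         # 50.4%
```

The absolute figure describes how many percentage points separate the two groups, while the relative figure describes the fraction of unreviewed-level prevalence that is absent among human-reviewed skills.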

Complexity-Prevalence Gradient

Prevalence increases from 63.70% for tiny skills (<50 lines) to 86.08% for large skills (500-2000 lines), and reaches 100% for very large skills (2000+ lines), although that tier contains only 17 skills.