SQL-Executable Pipeline & Cross-Domain Generalization

Investigating the tension between data fidelity and cross-domain robustness in tool-augmented multi-turn dialogue systems.

SQL In-Domain Success: 0.974
SQL Cross-Domain Success: 0.100
Template In-Domain Success: 0.707
Template Cross-Domain Success: 0.194

Generalization Gap

[Figures: In-Domain vs Cross-Domain Performance; Generalization Gap by Metric; Fidelity vs Environment Coupling; Relative Generalization Gap]

Detailed Results

Pipeline Comparison

Metric            SQL In-Domain  SQL Cross-Domain  SQL Gap  Template In-Domain  Template Cross-Domain  Template Gap
Dialogue Success  0.974          0.100             0.874    0.707               0.194                  0.513
Tool Accuracy     0.998          0.065             0.933    0.747               0.156                  0.592
State Tracking    1.000          0.027             0.974    0.686               0.088                  0.598
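
The gap columns are consistent with a simple difference, in-domain score minus cross-domain score. The sketch below recomputes them from the rounded values in the table. The names `SCORES` and `generalization_gap` are illustrative and not taken from the original pipeline. Two recomputed cells (SQL State Tracking and Template Tool Accuracy) differ from the table by 0.001, presumably because the reported gaps were derived from unrounded scores.

```python
# Scores copied from the Pipeline Comparison table as (in-domain, cross-domain).
# SCORES and generalization_gap are hypothetical names used for illustration.
SCORES = {
    "SQL": {
        "Dialogue Success": (0.974, 0.100),
        "Tool Accuracy": (0.998, 0.065),
        "State Tracking": (1.000, 0.027),
    },
    "Template": {
        "Dialogue Success": (0.707, 0.194),
        "Tool Accuracy": (0.747, 0.156),
        "State Tracking": (0.686, 0.088),
    },
}


def generalization_gap(in_domain: float, cross_domain: float) -> float:
    """Absolute gap: how much a score drops when moving out of domain."""
    return in_domain - cross_domain


# Recompute the gap for every pipeline and metric, rounded to three decimals
# to match the precision of the table.
gaps = {
    pipeline: {
        metric: round(generalization_gap(in_dom, cross_dom), 3)
        for metric, (in_dom, cross_dom) in metrics.items()
    }
    for pipeline, metrics in SCORES.items()
}

for pipeline, metrics in gaps.items():
    for metric, gap in metrics.items():
        print(f"{pipeline:8s} {metric:16s} gap = {gap:.3f}")
```

On every metric the SQL gap is larger than the template gap, which is the pattern the table reports.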

Pipeline Properties

Property              SQL-Executable  Template-Based
Data Fidelity         0.92            0.71
Environment Coupling  0.75            0.20
Relative Gap          89.75%          72.57%
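
The source does not state how the Relative Gap row is defined. One reading that matches the reported values closely is the Dialogue Success gap divided by the in-domain Dialogue Success score, expressed as a percentage. The sketch below works under that assumption. `relative_gap` is a hypothetical helper, and the inputs are the rounded Dialogue Success figures from the Pipeline Comparison table.

```python
def relative_gap(in_domain: float, cross_domain: float) -> float:
    """Share of in-domain performance lost out of domain, in percent.

    Assumed definition (not confirmed by the source):
        100 * (in_domain - cross_domain) / in_domain
    """
    if in_domain <= 0:
        raise ValueError("in_domain score must be positive")
    return 100.0 * (in_domain - cross_domain) / in_domain


# Rounded Dialogue Success scores from the Pipeline Comparison table.
sql_relative = relative_gap(0.974, 0.100)
template_relative = relative_gap(0.707, 0.194)

print(f"SQL-Executable relative gap: {sql_relative:.2f}%")
print(f"Template-Based relative gap: {template_relative:.2f}%")
```

With the rounded inputs, both results land within about 0.02 percentage points of the reported 89.75% and 72.57%. The small difference is consistent with the reported figures having been computed from unrounded scores.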