Investigating the tension between data fidelity and cross-domain robustness in tool-augmented multi-turn dialogue systems.

| Metric | SQL In-Domain | SQL Cross-Domain | SQL Gap | Template In-Domain | Template Cross-Domain | Template Gap |
|---|---|---|---|---|---|---|
| Dialogue Success | 0.974 | 0.100 | 0.874 | 0.707 | 0.194 | 0.513 |
| Tool Accuracy | 0.998 | 0.065 | 0.933 | 0.747 | 0.156 | 0.592 |
| State Tracking | 1.000 | 0.027 | 0.974 | 0.686 | 0.088 | 0.598 |

| Property | SQL-Executable | Template-Based |
|---|---|---|
| Data Fidelity | 0.92 | 0.71 |
| Environment Coupling | 0.75 | 0.20 |
| Relative Gap | 89.75% | 72.57% |
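
The tables do not state how the Gap and Relative Gap values are computed. The sketch below assumes that Gap is the in-domain score minus the cross-domain score, and that Relative Gap is the Dialogue Success gap divided by the in-domain score. Both formulas and all names in the code (`SCORES`, `absolute_gap`, `relative_gap`) are assumptions for illustration, not taken from the source. Under these assumptions the rounded table entries give roughly 89.73% and 72.56%, close to the reported 89.75% and 72.57%; the small differences, like the 0.001 mismatches in two Gap cells, are consistent with the reported values having been computed from unrounded scores.

```python
# Scores copied from the results table as (in_domain, cross_domain) pairs.
SCORES = {
    "SQL": {
        "Dialogue Success": (0.974, 0.100),
        "Tool Accuracy": (0.998, 0.065),
        "State Tracking": (1.000, 0.027),
    },
    "Template": {
        "Dialogue Success": (0.707, 0.194),
        "Tool Accuracy": (0.747, 0.156),
        "State Tracking": (0.686, 0.088),
    },
}


def absolute_gap(in_domain: float, cross_domain: float) -> float:
    """Assumed definition: in-domain score minus cross-domain score."""
    return in_domain - cross_domain


def relative_gap(in_domain: float, cross_domain: float) -> float:
    """Assumed definition: absolute gap as a percentage of the in-domain score."""
    return 100.0 * (in_domain - cross_domain) / in_domain


if __name__ == "__main__":
    for pipeline, metrics in SCORES.items():
        for metric, (ind, cross) in metrics.items():
            gap = absolute_gap(ind, cross)
            rel = relative_gap(ind, cross)
            print(f"{pipeline:<8} {metric:<16} gap={gap:.3f} relative={rel:.2f}%")
```

Read this way, the SQL-executable data loses a larger share of its in-domain performance under domain shift than the template-based data on every metric, even though its in-domain scores are higher.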