Standardized benchmarks for measuring SOP compliance in LLM-based customer support

| Agent | UJCS | Adherence | Step Compl. | Depend. |
|---|---|---|---|---|
| Claude-3.5 | 0.829 | 0.847 | 0.870 | 0.790 |
| GPT-4o | 0.793 | 0.810 | 0.836 | 0.754 |
| Gemini-Pro | 0.759 | 0.776 | 0.801 | 0.720 |
| Mistral-Large | 0.715 | 0.731 | 0.758 | 0.677 |
| Llama-70B | 0.679 | 0.695 | 0.720 | 0.639 |