Compare

Comparing on: Finance · earnings Q&A

412 cases · ran 14 minutes ago
Metric                Model A (v2.4)   Model B (v1.9)   Model C (v3.1)
Avg score             81               89               64
Faithfulness          82               91 (best)        68
Factual consistency   79               89 (best)        64
Answer relevance      88 (best)        86               79
Hallucination rate    71               88 (best)        52
Citation accuracy     84               90 (best)        58
Cost / 1k             $0.42            $0.96            $0.18
p50 latency           1.4s             2.1s             0.8s
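
The reported avg score for each model is consistent with an unweighted mean of its five quality metrics, rounded to the nearest integer. This is an inference from the numbers shown, not a documented formula, and the "best" markers are likewise assumed to go to the highest value per metric. A minimal sketch that checks both assumptions against the figures above (the dictionary keys and function names are illustrative, not part of any tool):

```python
# Per-metric scores copied from the comparison above.
SCORES = {
    "Model A": {"Faithfulness": 82, "Factual consistency": 79,
                "Answer relevance": 88, "Hallucination rate": 71,
                "Citation accuracy": 84},
    "Model B": {"Faithfulness": 91, "Factual consistency": 89,
                "Answer relevance": 86, "Hallucination rate": 88,
                "Citation accuracy": 90},
    "Model C": {"Faithfulness": 68, "Factual consistency": 64,
                "Answer relevance": 79, "Hallucination rate": 52,
                "Citation accuracy": 58},
}

# Avg scores as displayed on the page.
REPORTED_AVG = {"Model A": 81, "Model B": 89, "Model C": 64}


def rounded_mean(metrics):
    """Unweighted mean of the metric values, rounded to an integer.

    Assumption: the dashboard averages the five quality metrics equally.
    """
    return round(sum(metrics.values()) / len(metrics))


def best_per_metric(scores):
    """Map each metric name to the model with the highest value.

    Assumption: higher is better for every metric, including
    'Hallucination rate', which is what the 'best' markers suggest.
    """
    metric_names = next(iter(scores.values())).keys()
    return {
        name: max(scores, key=lambda model: scores[model][name])
        for name in metric_names
    }


if __name__ == "__main__":
    for model, metrics in SCORES.items():
        computed = rounded_mean(metrics)
        status = "matches" if computed == REPORTED_AVG[model] else "differs"
        print(f"{model}: computed {computed}, reported {REPORTED_AVG[model]} ({status})")
    for metric, model in best_per_metric(SCORES).items():
        print(f"{metric}: best is {model}")
```

If the real scoring weights the metrics differently, the match here would be coincidental, so treat the check as a sanity test rather than a definition of the score.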