| Group | Model | IND Accuracy | IND M-Rec | IND M-Pre | IND M-F1 | IND #failed | OOD Accuracy | OOD M-Rec | OOD M-Pre | OOD M-F1 | OOD #failed |
|---|---|---|---|---|---|---|---|---|---|---|---|
| General-purpose LLMs | GPT-4 Turbo | 0.384 | 0.487 | 0.381 | 0.326 | 0 | 0.488 | 0.496 | 0.392 | 0.392 | 0 |
| | Gemini Pro | 0.128 | 0.385 | 0.352 | 0.136 | 1 | 0.147 | 0.359 | 0.390 | 0.123 | 0 |
| | Claude 2.1 | 0.508 | 0.347 | 0.344 | 0.275 | 10 | 0.362 | 0.394 | 0.400 | 0.277 | 4 |
| | Llama-2 13B-chat | 0.473 | 0.333 | 0.333 | 0.333 | 0 | 0.419 | 0.338 | 0.339 | 0.324 | 0 |
| | Mistral-7B-Instruct-v0.2 | 0.442 | 0.323 | 0.325 | 0.324 | 0 | 0.422 | 0.338 | 0.351 | 0.327 | 0 |
| E-commerce LLM | EcomGPT | 0.147 | 0.101 | 0.101 | 0.091 | 444 | 0.125 | 0.125 | 0.092 | 0.096 | 455 |
| SoTA task-specific model | DeBERTaV3 | 0.762 | 0.575 | 0.620 | 0.588 | 0 | 0.658 | 0.514 | 0.570 | 0.507 | 0 |
| | RGCN | 0.615 | 0.665 | 0.637 | 0.506 | 0 | 0.576 | 0.373 | 0.372 | 0.356 | 0 |
| eCeLLM (Task-specific) | Flan-T5 XXL | 0.754 | 0.516 | 0.511 | 0.508 | 0 | 0.663 | 0.506 | 0.468 | 0.466 | 0 |
| | Llama-2 13B-chat | 0.769 | 0.530 | 0.517 | 0.521 | 0 | 0.690 | 0.520 | 0.472 | 0.483 | 0 |
| | Llama-2 7B-chat | 0.774 | 0.541 | 0.628 | 0.537 | 0 | 0.695 | 0.526 | 0.803 | 0.498 | 0 |
| | Mistral-7B-Instruct-v0.2 | 0.782 | 0.547 | 0.689 | 0.543 | 0 | 0.711 | 0.532 | 0.808 | 0.502 | 0 |
| | Flan-T5 XL | 0.704 | 0.467 | 0.496 | 0.460 | 0 | 0.592 | 0.471 | 0.625 | 0.427 | 0 |
| | Phi-2 | 0.584 | 0.372 | 0.379 | 0.348 | 0 | 0.406 | 0.349 | 0.334 | 0.251 | 0 |
| eCeLLM (Generalist) | Flan-T5 XXL | 0.769 | 0.531 | 0.517 | 0.522 | 0 | 0.703 | 0.533 | 0.648 | 0.499 | 0 |
| | Llama-2 13B-chat | 0.775 | 0.599 | 0.635 | 0.611 | 0 | 0.726 | 0.564 | 0.611 | 0.558 | 0 |
| | Llama-2 7B-chat | 0.797 | 0.586 | 0.661 | 0.595 | 0 | 0.703 | 0.533 | 0.648 | 0.499 | 0 |
| | Mistral-7B-Instruct-v0.2 | 0.788 | 0.555 | 0.644 | 0.558 | 0 | 0.707 | 0.537 | 0.596 | 0.502 | 0 |
| | Flan-T5 XL | 0.757 | 0.517 | 0.515 | 0.511 | 0 | 0.678 | 0.521 | 0.587 | 0.489 | 0 |
| | Phi-2 | 0.747 | 0.524 | 0.552 | 0.518 | 0 | 0.710 | 0.541 | 0.611 | 0.520 | 0 |
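The table reports accuracy alongside M-Rec, M-Pre and M-F1. The sketch below shows one way such scores can be computed, assuming the "M-" prefix denotes macro-averaging over classes and that #failed counts responses from which no valid label could be parsed. Under macro averaging, F1 is computed per class and then averaged, so a row's M-F1 can fall below both its M-Pre and its M-Rec (for example, GPT-4 Turbo on IND: M-Rec 0.487, M-Pre 0.381, M-F1 0.326). The names `score` and `Scores`, the use of `None` for a failed response, taking the class set from the gold labels only, and scoring 0/0 as 0 are all hypothetical conventions for illustration, not the evaluation code behind the table.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Scores:
    accuracy: float
    macro_recall: float
    macro_precision: float
    macro_f1: float
    failed: int


def _safe_div(num: float, den: float) -> float:
    # Assumed convention: an undefined ratio such as 0/0 is scored as 0.
    return num / den if den else 0.0


def score(gold: Sequence[str], pred: Sequence[Optional[str]]) -> Scores:
    """Accuracy plus macro-averaged recall, precision and F1 (hypothetical helper).

    A prediction of None stands for a failed (unparseable) response: it is
    counted in `failed`, scored as wrong, and adds only a false negative.
    Classes are taken from the gold labels only (an assumption).
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    classes = sorted(set(gold))
    recalls, precisions, f1s = [], [], []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        prec = _safe_div(tp, tp + fp)
        rec = _safe_div(tp, tp + fn)
        precisions.append(prec)
        recalls.append(rec)
        # Per-class F1 is computed first and then averaged (macro F1);
        # it is not the harmonic mean of macro precision and macro recall.
        f1s.append(_safe_div(2 * prec * rec, prec + rec))
    n = len(classes)
    return Scores(
        accuracy=_safe_div(sum(g == p for g, p in zip(gold, pred)), len(gold)),
        macro_recall=_safe_div(sum(recalls), n),
        macro_precision=_safe_div(sum(precisions), n),
        macro_f1=_safe_div(sum(f1s), n),
        failed=sum(p is None for p in pred),
    )


if __name__ == "__main__":
    # Class "a": P = 1.0, R = 0.25; class "b": P = 0.25, R = 1.0.
    # Both per-class F1 scores are 0.4, so M-F1 = 0.4 while M-Pre = M-Rec = 0.625.
    s = score(["a", "a", "a", "a", "b"], ["a", "b", "b", "b", "b"])
    print(s)
```

In the usage example, macro precision and macro recall are both 0.625 while macro F1 is only 0.4, the same pattern visible in several rows of the table. Treating a failed response as a wrong answer is one reasonable choice; excluding failed items from the denominator instead would inflate scores for models with many failures, such as EcomGPT.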