IND:

| Model | | Accuracy | M-Rec | M-Pre | M-F1 | #failed |
|---|---|---|---|---|---|---|
| General-purpose LLMs | GPT-4 Turbo | 0.611 | 0.527 | 0.540 | 0.487 | 0 |
| | Gemini Pro | 0.584 | 0.471 | 0.414 | 0.425 | 2 |
| | Claude 2.1 | 0.655 | 0.464 | 0.419 | 0.435 | 13 |
| | Llama-2 13B-chat | 0.504 | 0.250 | 0.251 | 0.250 | 0 |
| | Mistral-7B-Instruct-v0.2 | 0.529 | 0.395 | 0.384 | 0.365 | 0 |
| E-commerce LLM | EcomGPT | 0.540 | 0.265 | 0.218 | 0.223 | 2 |
| SoTA task-specific model | BERT | 0.661 | 0.381 | 0.423 | 0.393 | 0 |
| | DeBERTaV3 | 0.703 | 0.436 | 0.472 | 0.448 | 0 |
| eCeLLM (Task-specific) | Flan-T5 XXL | 0.666 | 0.438 | 0.412 | 0.346 | 0 |
| | Llama-2 13B-chat | 0.655 | 0.399 | 0.410 | 0.349 | 0 |
| | Llama-2 7B-chat | 0.659 | 0.399 | 0.531 | 0.330 | 0 |
| | Mistral-7B Instruct-v0.2 | 0.681 | 0.406 | 0.423 | 0.387 | 0 |
| | Flan-T5 XL | 0.648 | 0.425 | 0.361 | 0.327 | 0 |
| | Phi-2 | 0.646 | 0.387 | 0.316 | 0.321 | 0 |
| eCeLLM (Generalist) | Flan-T5 XXL | 0.680 | 0.431 | 0.416 | 0.364 | 0 |
| | Llama-2 13B-chat | 0.684 | 0.440 | 0.435 | 0.414 | 0 |
| | Llama-2 7B-chat | 0.679 | 0.427 | 0.434 | 0.398 | 0 |
| | Mistral-7B Instruct-v0.2 | 0.696 | 0.450 | 0.456 | 0.443 | 0 |
| | Flan-T5 XL | 0.663 | 0.395 | 0.533 | 0.332 | 0 |
| | Phi-2 | 0.650 | 0.397 | 0.410 | 0.335 | 0 |
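The metric abbreviations are not expanded in this section. Assuming M-Rec, M-Pre, and M-F1 denote macro-averaged recall, precision, and F1, and that #failed counts model outputs that could not be parsed into a valid label (both assumptions, not claims from the table), the sketch below shows one way such columns could be computed with scikit-learn; the `score_predictions` helper and its `fallback` handling are hypothetical, not the paper's evaluation code.

```python
# Minimal sketch: computing Accuracy, M-Rec, M-Pre, M-F1, and #failed.
# Assumption: unparseable outputs are mapped to a fallback label and counted as failures.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def score_predictions(gold, raw_predictions, labels, fallback="invalid"):
    """Compute the table's metric columns from gold labels and raw model outputs."""
    # Outputs that are not a valid label are replaced by a fallback string,
    # so they count as wrong predictions rather than being dropped.
    parsed = [p if p in labels else fallback for p in raw_predictions]
    n_failed = sum(p not in labels for p in raw_predictions)
    precision, recall, f1, _ = precision_recall_fscore_support(
        gold, parsed, labels=labels, average="macro", zero_division=0
    )
    return {
        "Accuracy": accuracy_score(gold, parsed),
        "M-Rec": recall,
        "M-Pre": precision,
        "M-F1": f1,
        "#failed": n_failed,
    }

# Toy usage with a binary label set; the last output fails to parse.
gold = ["yes", "no", "yes", "no"]
preds = ["yes", "no", "no", "I am not sure"]
print(score_predictions(gold, preds, labels=["yes", "no"]))
```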