| Group | Model | IND Acc | IND M-Rec | IND M-Pre | IND M-F1 | IND #failed | OOD Acc | OOD M-Rec | OOD M-Pre | OOD M-F1 | OOD #failed |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| General-purpose LLMs | GPT-4 Turbo | 0.595 | 0.575 | 0.544 | 0.516 | 0 | 0.556 | 0.586 | 0.544 | 0.510 | 0 |
| | Gemini Pro | 0.609 | 0.521 | 0.453 | 0.470 | 2 | 0.572 | 0.511 | 0.444 | 0.454 | 1 |
| | Claude 2.1 | 0.375 | 0.510 | 0.474 | 0.415 | 2 | 0.328 | 0.466 | 0.447 | 0.369 | 1 |
| | Llama-2 13B-chat | 0.406 | 0.188 | 0.191 | 0.188 | 0 | 0.384 | 0.179 | 0.180 | 0.178 | 0 |
| | Mistral-7B-Instruct-v0.2 | 0.633 | 0.532 | 0.551 | 0.470 | 0 | 0.594 | 0.531 | 0.494 | 0.438 | 0 |
| E-commerce LLM | EcomGPT | 0.191 | 0.362 | 0.341 | 0.188 | 6 | 0.196 | 0.375 | 0.336 | 0.178 | 13 |
| SoTA task-specific model | BERTweet | 0.733 | 0.503 | 0.530 | 0.511 | 0 | 0.729 | 0.507 | 0.524 | 0.513 | 0 |
| | DeBERTaV3 | 0.768 | 0.567 | 0.607 | 0.573 | 0 | 0.764 | 0.565 | 0.591 | 0.567 | 0 |
| | P5 | 0.611 | 0.199 | 0.157 | 0.156 | 0 | 0.620 | 0.200 | 0.124 | 0.153 | 0 |
| eCeLLM (Task-specific) | Flan-T5 XXL | 0.783 | 0.619 | 0.618 | 0.612 | 0 | 0.770 | 0.604 | 0.601 | 0.600 | 0 |
| | Llama-2 13B-chat | 0.791 | 0.616 | 0.641 | 0.616 | 0 | 0.781 | 0.627 | 0.645 | 0.629 | 0 |
| | Llama-2 7B-chat | 0.790 | 0.620 | 0.652 | 0.634 | 0 | 0.769 | 0.583 | 0.599 | 0.589 | 0 |
| | Mistral-7B Instruct-v0.2 | 0.801 | 0.643 | 0.676 | 0.655 | 0 | 0.789 | 0.619 | 0.650 | 0.632 | 0 |
| | Flan-T5 XL | 0.771 | 0.645 | 0.638 | 0.620 | 0 | 0.743 | 0.594 | 0.592 | 0.582 | 0 |
| | Phi-2 | 0.779 | 0.611 | 0.618 | 0.608 | 0 | 0.754 | 0.576 | 0.594 | 0.583 | 0 |
| eCeLLM (Generalist) | Flan-T5 XXL | 0.797 | 0.629 | 0.646 | 0.628 | 0 | 0.787 | 0.619 | 0.624 | 0.619 | 0 |
| | Llama-2 13B-chat | 0.796 | 0.641 | 0.661 | 0.648 | 0 | 0.785 | 0.621 | 0.638 | 0.629 | 0 |
| | Llama-2 7B-chat | 0.768 | 0.579 | 0.589 | 0.580 | 0 | 0.776 | 0.599 | 0.626 | 0.606 | 0 |
| | Mistral-7B Instruct-v0.2 | 0.781 | 0.630 | 0.654 | 0.639 | 0 | 0.784 | 0.630 | 0.653 | 0.640 | 0 |
| | Flan-T5 XL | 0.782 | 0.654 | 0.655 | 0.648 | 0 | 0.753 | 0.604 | 0.598 | 0.598 | 0 |
| | Phi-2 | 0.780 | 0.588 | 0.619 | 0.596 | 0 | 0.758 | 0.552 | 0.590 | 0.565 | 0 |