| Group | Model | SR IND HR@1 | SR IND #failed | SR OOD HR@1 | SR OOD #failed | QPR IND NDCG | QPR IND #failed |
|---|---|---|---|---|---|---|---|
| General-purpose LLMs | GPT-4 Turbo | 0.387 | 0 | 0.198 | 0 | 0.875 | 14 |
| General-purpose LLMs | Gemini Pro | 0.269 | 2 | 0.116 | 3 | 0.821 | 52 |
| General-purpose LLMs | Claude 2.1 | 0.066 | 34 | 0.036 | 42 | 0.821 | 26 |
| General-purpose LLMs | Llama-2 13B-chat | 0.056 | 0 | 0.050 | 0 | 0.815 | 0 |
| General-purpose LLMs | Mistral-7B-Instruct-v0.2 | 0.164 | 1 | 0.108 | 0 | 0.842 | 4 |
| E-commerce LLM | EcomGPT | 0.042 | 344 | 0.023 | 391 | 0.000 | 1000 |
| SoTA task-specific models | gSASRec / BERT | 0.249 | 0 | 0.065 | 0 | 0.839 | 0 |
| SoTA task-specific models | Recformer / DeBERTaV3 | 0.265 | 0 | 0.081 | 0 | 0.859 | 0 |
| eCeLLM (task-specific) | Flan-T5 XXL | 0.467 | 0 | 0.252 | 0 | 0.881 | 0 |
| eCeLLM (task-specific) | Llama-2 13B-chat | 0.518 | 0 | 0.263 | 0 | 0.879 | 0 |
| eCeLLM (task-specific) | Llama-2 7B-chat | 0.517 | 0 | 0.228 | 0 | 0.867 | 0 |
| eCeLLM (task-specific) | Mistral-7B-Instruct-v0.2 | 0.535 | 0 | 0.268 | 0 | 0.883 | 0 |
| eCeLLM (task-specific) | Flan-T5 XL | 0.436 | 0 | 0.226 | 0 | 0.875 | 0 |
| eCeLLM (task-specific) | Phi-2 | 0.413 | 5 | 0.219 | 10 | 0.858 | 0 |
| eCeLLM (generalist) | Flan-T5 XXL | 0.512 | 0 | 0.262 | 0 | 0.885 | 0 |
| eCeLLM (generalist) | Llama-2 13B-chat | 0.526 | 0 | 0.273 | 0 | 0.870 | 0 |
| eCeLLM (generalist) | Llama-2 7B-chat | 0.517 | 0 | 0.261 | 0 | 0.868 | 0 |
| eCeLLM (generalist) | Mistral-7B-Instruct-v0.2 | 0.542 | 0 | 0.280 | 0 | 0.876 | 0 |
| eCeLLM (generalist) | Flan-T5 XL | 0.463 | 0 | 0.256 | 0 | 0.868 | 0 |
| eCeLLM (generalist) | Phi-2 | 0.479 | 5 | 0.241 | 8 | 0.870 | 0 |
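For reference, HR@1 and NDCG carry their standard meanings: HR@1 is the fraction of test cases whose top-ranked prediction matches the ground-truth item, and NDCG is the discounted cumulative gain of the produced ranking normalized by that of the ideal ranking. (#failed presumably counts test cases where the model's output could not be parsed into a valid answer.) The sketch below illustrates those standard formulas on toy, hypothetical data; it is not the paper's evaluation script.

```python
import math

def hit_rate_at_1(ranked_lists, ground_truths):
    """HR@1: fraction of queries whose top-ranked item is the ground truth."""
    hits = sum(1 for ranked, gt in zip(ranked_lists, ground_truths)
               if ranked and ranked[0] == gt)
    return hits / len(ground_truths)

def ndcg(relevances):
    """NDCG for one ranked list of graded relevance gains,
    with the usual log2 position discount."""
    dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))
    ideal = sum(rel / math.log2(i + 2)
                for i, rel in enumerate(sorted(relevances, reverse=True)))
    return dcg / ideal if ideal > 0 else 0.0

# Toy example (hypothetical item IDs and relevance gains, not from the paper):
print(hit_rate_at_1([["B01", "B02"], ["B07", "B03"]], ["B01", "B03"]))  # 0.5
print(ndcg([1.0, 0.1, 0.0, 1.0]))  # < 1.0: a fully relevant item sits at rank 4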