                              SR (IND)         SR (OOD)         QPR (IND)
Model                         HR@1  #failed    HR@1  #failed    NDCG  #failed

General-purpose LLMs
  GPT-4 Turbo                 0.387        0   0.198        0   0.875       14
  Gemini Pro                  0.269        2   0.116        3   0.821       52
  Claude 2.1                  0.066       34   0.036       42   0.821       26
  Llama-2 13B-chat            0.056        0   0.050        0   0.815        0
  Mistral-7B-Instruct-v0.2    0.164        1   0.108        0   0.842        4

E-commerce LLM
  EcomGPT                     0.042      344   0.023      391   0.000     1000

SoTA task-specific model
  gSASRec / BERT              0.249        0   0.065        0   0.839        0
  Recformer / DeBERTaV3       0.265        0   0.081        0   0.859        0

eCeLLM, Task-specific
  Flan-T5 XXL                 0.467        0   0.252        0   0.881        0
  Llama-2 13B-chat            0.518        0   0.263        0   0.879        0
  Llama-2 7B-chat             0.517        0   0.228        0   0.867        0
  Mistral-7B Instruct-v0.2    0.535        0   0.268        0   0.883        0
  Flan-T5 XL                  0.436        0   0.226        0   0.875        0
  Phi-2                       0.413        5   0.219       10   0.858        0

eCeLLM, Generalist
  Flan-T5 XXL                 0.512        0   0.262        0   0.885        0
  Llama-2 13B-chat            0.526        0   0.273        0   0.870        0
  Llama-2 7B-chat             0.517        0   0.261        0   0.868        0
  Mistral-7B Instruct-v0.2    0.542        0   0.280        0   0.876        0
  Flan-T5 XL                  0.463        0   0.256        0   0.868        0
  Phi-2                       0.479        5   0.241        8   0.870        0