Model                       AVE       PRP       PM        SA        SR        MPC       PSI       QPR       AP        AG
Metric                      F1*       Macro F1  F1        Macro F1  HR@1      Accuracy  F1        NDCG      F1        F_BERT

GPT-4 Turbo                 0.495     0.326     0.753     0.516     0.387     0.611     0.195     0.875     0.649     0.858
Gemini Pro                  0.396     0.136     0.867     0.470     0.269     0.584     0.248     0.821     0.506     0.855
Claude 2.1                  0.381     0.275     0.523     0.415     0.066     0.655     0.273     0.821     0.280     0.841
Llama-2 13B-chat            0.002     0.333     0.434     0.188     0.056     0.504     0.252     0.815     0.623     0.811
Mistral-7B Instruct-v0.2    0.369     0.324     0.613     0.470     0.164     0.529     0.305     0.842     0.588     0.853

EcomGPT                     0.000     0.091     0.648     0.188     0.042     0.540     0.170     0.000     0.086     0.669

SoTA task-specific model    0.546     0.588     0.995     0.573     0.265     0.703     0.389     0.859     0.830     0.858
eCeLLM-L                    0.582     0.611     0.995     0.648     0.526     0.684     0.501     0.870     0.851     0.841
eCeLLM-M                    0.662     0.558     0.995     0.639     0.542     0.696     0.305     0.876     0.846     0.842
eCeLLM-S                    0.509     0.518     0.991     0.596     0.479     0.650     0.392     0.870     0.846     0.842
improvement (%, avg: 10.7)  21.2      3.9       0.0       13.1      40.1      -1.0      28.8      0.1       2.5       -1.9
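
The table does not say how the improvement row is computed. The numbers are consistent with one reading: for each task, the relative gain (in percent) of the best eCeLLM variant over the best non-eCeLLM model, with the SoTA task-specific model counted among the baselines. The sketch below checks that reading against the scores in the table. The formula and the names `BASELINES`, `ECELLM`, `best_per_task`, and `relative_improvement` are assumptions made for illustration, not taken from the source.

```python
# Scores copied from the table; column order follows the task header.
TASKS = ["AVE", "PRP", "PM", "SA", "SR", "MPC", "PSI", "QPR", "AP", "AG"]

# Every non-eCeLLM row (assumption: all of them count as baselines).
BASELINES = {
    "GPT-4 Turbo": [0.495, 0.326, 0.753, 0.516, 0.387, 0.611, 0.195, 0.875, 0.649, 0.858],
    "Gemini Pro": [0.396, 0.136, 0.867, 0.470, 0.269, 0.584, 0.248, 0.821, 0.506, 0.855],
    "Claude 2.1": [0.381, 0.275, 0.523, 0.415, 0.066, 0.655, 0.273, 0.821, 0.280, 0.841],
    "Llama-2 13B-chat": [0.002, 0.333, 0.434, 0.188, 0.056, 0.504, 0.252, 0.815, 0.623, 0.811],
    "Mistral-7B Instruct-v0.2": [0.369, 0.324, 0.613, 0.470, 0.164, 0.529, 0.305, 0.842, 0.588, 0.853],
    "EcomGPT": [0.000, 0.091, 0.648, 0.188, 0.042, 0.540, 0.170, 0.000, 0.086, 0.669],
    "SoTA task-specific model": [0.546, 0.588, 0.995, 0.573, 0.265, 0.703, 0.389, 0.859, 0.830, 0.858],
}

ECELLM = {
    "eCeLLM-L": [0.582, 0.611, 0.995, 0.648, 0.526, 0.684, 0.501, 0.870, 0.851, 0.841],
    "eCeLLM-M": [0.662, 0.558, 0.995, 0.639, 0.542, 0.696, 0.305, 0.876, 0.846, 0.842],
    "eCeLLM-S": [0.509, 0.518, 0.991, 0.596, 0.479, 0.650, 0.392, 0.870, 0.846, 0.842],
}


def best_per_task(rows):
    """Return the highest score in each task column across the given rows."""
    return [max(column) for column in zip(*rows.values())]


def relative_improvement(ours, baseline):
    """Percentage gain of `ours` over `baseline`, rounded to one decimal."""
    return [round(100 * (o - b) / b, 1) for o, b in zip(ours, baseline)]


improvement = relative_improvement(best_per_task(ECELLM), best_per_task(BASELINES))
average = round(sum(improvement) / len(improvement), 1)

for task, gain in zip(TASKS, improvement):
    print(f"{task}: {gain:+.1f}%")
print(f"average: {average}")
```

Under these assumptions the sketch reproduces the reported row and its 10.7 average. The two negative entries (MPC and AG) are the tasks where a baseline still holds the best score.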