Model | AVE (F1*) | PRP (M-F1) | SA (M-F1) | SR (HR@1) | AP (F1) | AG (F_BERT) |
--- | --- | --- | --- | --- | --- | --- |
GPT-4 Turbo | 0.397 | 0.392 | 0.510 | 0.198 | 0.680 | 0.860 |
Gemini Pro | 0.275 | 0.123 | 0.454 | 0.116 | 0.552 | 0.856 |
Claude 2.1 | 0.410 | 0.277 | 0.369 | 0.036 | 0.245 | 0.842 |
Llama-2 13B-chat | 0.000 | 0.324 | 0.178 | 0.050 | 0.644 | 0.808 |
Mistral-7B Instruct-v0.2 | 0.264 | 0.327 | 0.438 | 0.108 | 0.608 | 0.851 |
EcomGPT | 0.001 | 0.096 | 0.178 | 0.023 | 0.140 | 0.722 |
SoTA task-specific model | 0.269 | 0.507 | 0.567 | 0.081 | 0.853 | 0.860 |
eCeLLM-L | 0.335 | 0.558 | 0.629 | 0.273 | 0.867 | 0.841 |
eCeLLM-M | 0.367 | 0.502 | 0.640 | 0.280 | 0.878 | 0.840 |
eCeLLM-S | 0.302 | 0.520 | 0.565 | 0.241 | 0.879 | 0.840 |
Improvement of best eCeLLM over best baseline (%, avg: 9.3) | -10.5 | 10.1 | 14.1 | 41.4 | 3.0 | -2.2 |
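The improvement row can be sanity-checked directly from the table. A minimal sketch, assuming the row is the relative gain of the best eCeLLM variant over the best baseline (general-purpose LLMs, EcomGPT, and the SoTA task-specific model) in each column:

```python
# Recompute the per-task improvement row from the table values above.
# Assumption: improvement = 100 * (best eCeLLM score - best baseline score) / best baseline score.

tasks = ["AVE", "PRP", "SA", "SR", "AP", "AG"]

# Baseline scores per task, in row order: GPT-4 Turbo, Gemini Pro, Claude 2.1,
# Llama-2 13B-chat, Mistral-7B Instruct-v0.2, EcomGPT, SoTA task-specific model.
baselines = [
    [0.397, 0.275, 0.410, 0.000, 0.264, 0.001, 0.269],  # AVE (F1*)
    [0.392, 0.123, 0.277, 0.324, 0.327, 0.096, 0.507],  # PRP (M-F1)
    [0.510, 0.454, 0.369, 0.178, 0.438, 0.178, 0.567],  # SA (M-F1)
    [0.198, 0.116, 0.036, 0.050, 0.108, 0.023, 0.081],  # SR (HR@1)
    [0.680, 0.552, 0.245, 0.644, 0.608, 0.140, 0.853],  # AP (F1)
    [0.860, 0.856, 0.842, 0.808, 0.851, 0.722, 0.860],  # AG (F_BERT)
]

# eCeLLM scores per task, in row order: eCeLLM-L, eCeLLM-M, eCeLLM-S.
ecellm = [
    [0.335, 0.367, 0.302],  # AVE
    [0.558, 0.502, 0.520],  # PRP
    [0.629, 0.640, 0.565],  # SA
    [0.273, 0.280, 0.241],  # SR
    [0.867, 0.878, 0.879],  # AP
    [0.841, 0.840, 0.840],  # AG
]

for task, base, ours in zip(tasks, baselines, ecellm):
    gain = 100 * (max(ours) - max(base)) / max(base)
    print(f"{task}: {gain:+.1f}%")
```

Under this assumption the computed gains match the reported row for AVE (-10.5), PRP (10.1), SR (41.4), AP (3.0), and AG (-2.2); for SA it yields 12.9% rather than the reported 14.1%, so the paper's underlying (unrounded) SoTA score for SA presumably differs slightly from the three-decimal value shown here.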