I am trying to evaluate GritLM-7B on MTEB datasets using the provided script.
#!/bin/bash
python /home/e/e1347696/unified_encoder_decoder/src/eval/MTEB/eval_mteb.py \
--model_name_or_path /home/e/e1347696/unified_encoder_decoder/model/GritLM-7B \
--output_folder /home/e/e1347696/unified_encoder_decoder/src/results/GritLM-7B-mteb \
--task_types Classification,Clustering,PairClassification,Reranking,Retrieval,STS,Summarization \
--batch_size 32
However, the model appears to have been evaluated on only the following datasets:
AmazonCounterfactualClassification
AmazonReviewsClassification
MassiveIntentClassification
MassiveScenarioClassification
MTOPDomainClassification
MTOPIntentClassification
STS17
STS22
Other datasets seem to be skipped. The output log is shown here:
Created GritLM: torch.bfloat16 dtype, mean pool, embedding mode, bbcc attn
GritLM-7B instruction for AmazonCounterfactualClassification: <|user|>
Classify a given Amazon customer review text as either counterfactual or not-counterfactual
<|embed|>
─────────────────────────────── Selected tasks ────────────────────────────────
Classification
- AmazonCounterfactualClassification, s2s, multilingual 1 / 4 Subsets
GritLM-7B instruction for AmazonReviewsClassification: <|user|>
Classify the given Amazon review into its appropriate rating category
<|embed|>
─────────────────────────────── Selected tasks ────────────────────────────────
Classification
- AmazonReviewsClassification, s2s, multilingual 1 / 6 Subsets
Skipping task: MasakhaNEWSClassification
GritLM-7B instruction for MassiveIntentClassification: <|user|>
Given a user utterance as query, find the user intents
<|embed|>
─────────────────────────────── Selected tasks ────────────────────────────────
Classification
- MassiveIntentClassification, s2s, multilingual 1 / 51 Subsets
GritLM-7B instruction for MassiveScenarioClassification: <|user|>
Given a user utterance as query, find the user scenarios
<|embed|>
─────────────────────────────── Selected tasks ────────────────────────────────
Classification
- MassiveScenarioClassification, s2s, multilingual 1 / 51 Subsets
GritLM-7B instruction for MTOPDomainClassification: <|user|>
Classify the intent domain of the given utterance in task-oriented conversation
<|embed|>
─────────────────────────────── Selected tasks ────────────────────────────────
Classification
- MTOPDomainClassification, s2s, multilingual 1 / 6 Subsets
GritLM-7B instruction for MTOPIntentClassification: <|user|>
Classify the intent of the given utterance in task-oriented conversation
<|embed|>
─────────────────────────────── Selected tasks ────────────────────────────────
Classification
- MTOPIntentClassification, s2s, multilingual 1 / 6 Subsets
Skipping task: MultiHateClassification
Skipping task: MultilingualSentimentClassification
Skipping task: NusaX-senti
Skipping task: SIB200Classification
Skipping task: SouthAfricanLangClassification
Skipping task: MasakhaNEWSClusteringP2P
Skipping task: MasakhaNEWSClusteringS2S
Skipping task: SIB200ClusteringS2S
Skipping task: BelebeleRetrieval
Skipping task: MIRACLRetrieval
Skipping task: MIRACLRetrievalHardNegatives
Skipping task: MLQARetrieval
Skipping task: MultiLongDocRetrieval
Skipping task: WikipediaRetrievalMultilingual
Skipping task: XMarket
Skipping task: XQuADRetrieval
Skipping task: OpusparcusPC
Skipping task: PawsXPairClassification
Skipping task: RTE3
Skipping task: XNLI
Skipping task: MIRACLReranking
Skipping task: WikipediaRerankingMultilingual
Skipping task: SemRel24STS
GritLM-7B instruction for STS17: <|user|>
Retrieve semantically similar text.
<|embed|>
─────────────────────────────── Selected tasks ────────────────────────────────
STS
- STS17, s2s, multilingual 1 / 11 Subsets
Skipping task: STS22.v2
GritLM-7B instruction for STS22: <|user|>
Retrieve semantically similar text.
<|embed|>
─────────────────────────────── Selected tasks ────────────────────────────────
STS
- STS22, p2p, multilingual 1 / 18 Subsets
Skipping task: STSBenchmarkMultilingualSTS
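To make the split between evaluated and skipped tasks easier to see, the log can be tallied with a small script. This is only a minimal sketch: `summarize_mteb_log` is a hypothetical helper (not part of GritLM or MTEB), and it assumes the two line formats that appear in the log above ("Skipping task: NAME" and "- NAME, s2s, ..."). The sample lines are copied from that log.

```python
import re


def summarize_mteb_log(log_text):
    """Return (evaluated, skipped) task-name lists parsed from the output log."""
    evaluated, skipped = [], []
    for line in log_text.splitlines():
        line = line.strip()
        # Lines such as "Skipping task: XNLI" mark tasks that were not run.
        match = re.match(r"Skipping task:\s*(\S+)", line)
        if match:
            skipped.append(match.group(1))
            continue
        # Lines such as "- STS17, s2s, multilingual 1 / 11 Subsets" mark tasks that ran.
        match = re.match(r"-\s*([^,\s]+),", line)
        if match:
            evaluated.append(match.group(1))
    return evaluated, skipped


# A few lines taken from the log above, used here as sample input.
sample = """\
- AmazonCounterfactualClassification, s2s, multilingual 1 / 4 Subsets
Skipping task: MasakhaNEWSClassification
- STS17, s2s, multilingual 1 / 11 Subsets
Skipping task: STS22.v2
"""

evaluated, skipped = summarize_mteb_log(sample)
print("evaluated:", evaluated)
print("skipped:", skipped)
```

Run against the full log, this should list the 8 evaluated tasks and the skipped ones separately, provided the log format matches the assumption above.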
The error log also contains warnings such as:
The `batch_size` argument is deprecated and will be removed in the next release. Please use `encode_kwargs = {'batch_size': ...}` to set the batch size instead.
Failed to extract metadata from model: 'GritLM' object has no attribute 'model_card_data'. Upgrading to sentence-transformers v3.0.0 or above is recommended.
The `task_langs` argument is deprecated and will be removed in the next release. Please use `tasks = mteb.get_tasks(... languages = [...])` to filter tasks instead. Note that this uses 3 letter language codes (ISO 639-3).
Passing task names as strings is deprecated and will be removed in the next release. Please use `tasks = mteb.get_tasks(tasks=[...])` method to get tasks instead.
The `batch_size` argument is deprecated and will be removed in the next release. Please use `encode_kwargs = {'batch_size': ...}` to set the batch size instead.
Failed to extract metadata from model: 'GritLM' object has no attribute 'model_card_data'. Upgrading to sentence-transformers v3.0.0 or above is recommended.
Dataset 'STS22' is superseeded by 'STS22.v2', you might consider using the newer version of the dataset.
I would really appreciate any help with this. Thank you so much!