Evaluate GritLM-7B on MTEB datasets

I am trying to evaluate GritLM-7B on the MTEB datasets using the provided evaluation script:
```
#!/bin/bash

python /home/e/e1347696/unified_encoder_decoder/src/eval/MTEB/eval_mteb.py \
    --model_name_or_path /home/e/e1347696/unified_encoder_decoder/model/GritLM-7B \
    --output_folder /home/e/e1347696/unified_encoder_decoder/src/results/GritLM-7B-mteb \
    --task_types Classification,Clustering,PairClassification,Reranking,Retrieval,STS,Summarization \
    --batch_size 32
```
However, the model appears to have been evaluated only on the following datasets:
* ``AmazonCounterfactualClassification``
* ``AmazonReviewsClassification``
* ``MassiveIntentClassification``
* ``MassiveScenarioClassification``
* ``MTOPDomainClassification``
* ``MTOPIntentClassification``
* ``STS17``
* ``STS22``

All other datasets appear to be skipped. The output log is shown here:
```
Created GritLM: torch.bfloat16 dtype, mean pool, embedding mode, bbcc attn
GritLM-7B instruction for AmazonCounterfactualClassification:  <|user|>
Classify a given Amazon customer review text as either counterfactual or not-counterfactual
<|embed|>

─────────────────────────────── Selected tasks  ────────────────────────────────
Classification
    - AmazonCounterfactualClassification, s2s, multilingual 1 / 4 Subsets


GritLM-7B instruction for AmazonReviewsClassification:  <|user|>
Classify the given Amazon review into its appropriate rating category
<|embed|>

─────────────────────────────── Selected tasks  ────────────────────────────────
Classification
    - AmazonReviewsClassification, s2s, multilingual 1 / 6 Subsets


Skipping task: MasakhaNEWSClassification
GritLM-7B instruction for MassiveIntentClassification:  <|user|>
Given a user utterance as query, find the user intents
<|embed|>

─────────────────────────────── Selected tasks  ────────────────────────────────
Classification
    - MassiveIntentClassification, s2s, multilingual 1 / 51 Subsets


GritLM-7B instruction for MassiveScenarioClassification:  <|user|>
Given a user utterance as query, find the user scenarios
<|embed|>

─────────────────────────────── Selected tasks  ────────────────────────────────
Classification
    - MassiveScenarioClassification, s2s, multilingual 1 / 51 Subsets


GritLM-7B instruction for MTOPDomainClassification:  <|user|>
Classify the intent domain of the given utterance in task-oriented conversation
<|embed|>

─────────────────────────────── Selected tasks  ────────────────────────────────
Classification
    - MTOPDomainClassification, s2s, multilingual 1 / 6 Subsets


GritLM-7B instruction for MTOPIntentClassification:  <|user|>
Classify the intent of the given utterance in task-oriented conversation
<|embed|>

─────────────────────────────── Selected tasks  ────────────────────────────────
Classification
    - MTOPIntentClassification, s2s, multilingual 1 / 6 Subsets


Skipping task: MultiHateClassification
Skipping task: MultilingualSentimentClassification
Skipping task: NusaX-senti
Skipping task: SIB200Classification
Skipping task: SouthAfricanLangClassification
Skipping task: MasakhaNEWSClusteringP2P
Skipping task: MasakhaNEWSClusteringS2S
Skipping task: SIB200ClusteringS2S
Skipping task: BelebeleRetrieval
Skipping task: MIRACLRetrieval
Skipping task: MIRACLRetrievalHardNegatives
Skipping task: MLQARetrieval
Skipping task: MultiLongDocRetrieval
Skipping task: WikipediaRetrievalMultilingual
Skipping task: XMarket
Skipping task: XQuADRetrieval
Skipping task: OpusparcusPC
Skipping task: PawsXPairClassification
Skipping task: RTE3
Skipping task: XNLI
Skipping task: MIRACLReranking
Skipping task: WikipediaRerankingMultilingual
Skipping task: SemRel24STS
GritLM-7B instruction for STS17:  <|user|>
Retrieve semantically similar text.
<|embed|>

─────────────────────────────── Selected tasks  ────────────────────────────────
STS
    - STS17, s2s, multilingual 1 / 11 Subsets


Skipping task: STS22.v2
GritLM-7B instruction for STS22:  <|user|>
Retrieve semantically similar text.
<|embed|>

─────────────────────────────── Selected tasks  ────────────────────────────────
STS
    - STS22, p2p, multilingual 1 / 18 Subsets


Skipping task: STSBenchmarkMultilingualSTS
```
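To make the evaluated/skipped split easier to see, here is a minimal sketch of a log summarizer. It is not part of the GritLM repository; the function name `summarize_mteb_log` is hypothetical, and it assumes the log keeps the two line shapes shown above (`Skipping task: <name>` and `<model> instruction for <name>:`).

```python
import re

def summarize_mteb_log(log_text):
    """Split task names found in an eval log into (evaluated, skipped) lists.

    Assumes the two line formats seen in the log above; any other
    line is ignored.
    """
    evaluated, skipped = [], []
    for line in log_text.splitlines():
        line = line.strip()
        # Lines such as "Skipping task: MIRACLRetrieval"
        m = re.match(r"Skipping task: (\S+)", line)
        if m:
            skipped.append(m.group(1))
            continue
        # Lines such as "GritLM-7B instruction for STS17:  <|user|>"
        m = re.match(r"\S+ instruction for (\S+):", line)
        if m:
            evaluated.append(m.group(1))
    return evaluated, skipped

# Short excerpt copied from the log above
sample = """Skipping task: SemRel24STS
GritLM-7B instruction for STS17:  <|user|>
Retrieve semantically similar text.
<|embed|>
Skipping task: STS22.v2
GritLM-7B instruction for STS22:  <|user|>
"""

evaluated, skipped = summarize_mteb_log(sample)
print("evaluated:", evaluated)
print("skipped:", skipped)
```

Running it over the full log should reproduce the eight evaluated datasets listed earlier and collect every `Skipping task:` entry in one place.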
The error log also contains warnings such as:
```
The `batch_size` argument is deprecated and will be removed in the next release. Please use `encode_kwargs = {'batch_size': ...}` to set the batch size instead.
Failed to extract metadata from model: 'GritLM' object has no attribute 'model_card_data'. Upgrading to sentence-transformers v3.0.0 or above is recommended.
The `task_langs` argument is deprecated and will be removed in the next release. Please use `tasks = mteb.get_tasks(... languages = [...])` to filter tasks instead. Note that this uses 3 letter language codes (ISO 639-3).
Passing task names as strings is deprecated and will be removed in the next release. Please use `tasks = mteb.get_tasks(tasks=[...])` method to get tasks instead.
The `batch_size` argument is deprecated and will be removed in the next release. Please use `encode_kwargs = {'batch_size': ...}` to set the batch size instead.
Failed to extract metadata from model: 'GritLM' object has no attribute 'model_card_data'. Upgrading to sentence-transformers v3.0.0 or above is recommended.
Dataset 'STS22' is superseeded by 'STS22.v2', you might consider using the newer version of the dataset.
```
I would really appreciate it if you could help me with this. Thank you so much!
