Clarifications regarding datasets and task #459

shreeya-dhakal · 2024-04-20T05:07:45Z


I have a couple of questions regarding datasets. My understanding is that, since the purpose of MTEB is to benchmark text embedding models, any HF dataset we add must have either a test or a validation split. Or can we also add, say, "n" rows from the train split? Also, is there a restriction on the dataset's license, or would any dataset on HF qualify?

As for the task, does a new subtask within the main task qualify as a new task contribution?

Answered by KennethEnevoldsen


KennethEnevoldsen · 2024-04-20T13:17:05Z

Maintainer

@shreeya-dhakal, you can use whatever split you want. However, if there is a dev or test split, we encourage using that.

There are no restrictions on the license (as long as it permits us to refer to the dataset). We also include datasets with no license attached. Users can use the task metadata to filter out tasks without a suitable license.
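To illustrate the two points above, here is a minimal sketch. It does not use MTEB's actual API: `TaskInfo`, `pick_eval_split`, and `filter_by_license` are hypothetical names, and the preference order (test, then dev or validation, then train) is only an assumption about what "encouraged" might look like in practice.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for a task's metadata; not MTEB's real class.
@dataclass
class TaskInfo:
    name: str
    splits: list[str]
    license: Optional[str]  # None means no license is attached


def pick_eval_split(splits: list[str]) -> str:
    """Prefer a test or dev/validation split; otherwise fall back to train."""
    for preferred in ("test", "dev", "validation"):  # assumed order
        if preferred in splits:
            return preferred
    return "train" if "train" in splits else splits[0]


def filter_by_license(tasks: list[TaskInfo], allowed: set[str]) -> list[TaskInfo]:
    """Keep only tasks whose license is in the caller's allowed set."""
    return [t for t in tasks if t.license in allowed]


# Made-up example tasks, for illustration only.
tasks = [
    TaskInfo("task_a", ["train", "test"], "mit"),
    TaskInfo("task_b", ["train"], None),
    TaskInfo("task_c", ["train", "validation"], "cc-by-nc-4.0"),
]

selected = filter_by_license(tasks, allowed={"mit", "cc-by-4.0"})
print([t.name for t in selected])                  # ['task_a']
print([pick_eval_split(t.splits) for t in tasks])  # ['test', 'train', 'validation']
```

Tasks with no license attached (`None`) are dropped here only because `None` is not in the allowed set; a user who is fine with unlicensed datasets could add `None` to `allowed`.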

1 reply

KennethEnevoldsen Apr 20, 2024
Maintainer

> As for the task, does a new subtask within the main task qualify as a new task contribution?

I suspect you are referring to a dataset, then?
