-
I find it interesting to have probing tasks. It would provide useful insight for people who work on XAI and interpretability.
Could you please open a discussion instead of an issue?
-
I will convert this to a discussion, but feel free to keep it going.
-
I've suggested (#562) incorporating Linguistic Probing tasks as outlined by Conneau et al. These tasks serve as a method for evaluating how accurately embeddings capture linguistic characteristics of sentences. They fall into three categories: Surface Information, Syntactic Information, and Semantic Information; please see the paper for further details. All probing tasks are cast as classification tasks, and detailed descriptions of them are on this page.
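To make the setup concrete, here is a minimal sketch of a probing evaluation in the spirit of Conneau et al.: a simple classifier is trained on frozen sentence embeddings to predict a linguistic property (here a hypothetical 6-bin sentence-length label, as in their SentLen task). The random embeddings and labels are stand-ins for a real embedding model and a real probing dataset; nothing here reflects the actual MTEB API.

```python
# Sketch of a probing task cast as classification: fit a shallow probe
# on frozen sentence embeddings and report its accuracy. The embeddings
# and labels below are random stand-ins, not real data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_sentences, dim, n_classes = 1000, 64, 6

# In a real run, embeddings come from the model under evaluation and
# labels (e.g. sentence-length bins) from the probing dataset.
embeddings = rng.normal(size=(n_sentences, dim))
labels = rng.integers(0, n_classes, size=n_sentences)

X_train, X_test, y_train, y_test = train_test_split(
    embeddings, labels, test_size=0.2, random_state=0
)

probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
accuracy = accuracy_score(y_test, probe.predict(X_test))
print(f"probe accuracy: {accuracy:.3f}")
```

With random embeddings the probe should land near chance (1/6 here); the interesting signal is how far above chance a real model's embeddings get.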
Discussion in PR #562
As pointed out by @x-tabdeveloping in #562, not all of these tasks fit a classification framing naturally, such as sentence length prediction or, most notably, the Word Count dataset, which would amount to 1000-way classification. Instead, we could evaluate some tasks differently, for example using word retrieval instead of classification for Word Count, as proposed by @x-tabdeveloping. Let's use this discussion to work out possible evaluation strategies for probing tasks. Please share your thoughts.
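One alternative to 1000-way classification that might be worth discussing is treating Word Count as a regression probe and reporting mean absolute error. The sketch below is purely illustrative, again with random stand-in embeddings and labels; the retrieval-based formulation from the discussion would be another option entirely.

```python
# Sketch of a Word Count style probe framed as regression rather than
# 1000-way classification: predict the count from frozen embeddings
# and score with mean absolute error. All data here are random stand-ins.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 64))        # stand-in embeddings
word_counts = rng.integers(1, 1001, size=1000)  # stand-in count labels

X_tr, X_te, y_tr, y_te = train_test_split(
    embeddings, word_counts, test_size=0.2, random_state=0
)
reg = Ridge().fit(X_tr, y_tr)
mae = mean_absolute_error(y_te, reg.predict(X_te))
print(f"word-count probe MAE: {mae:.1f}")
```

A regression framing sidesteps the huge label space, though it does change what "capturing word count" means compared to exact classification, which is part of what needs deciding here.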
Probing as a new task category in MMTEB
Regardless of the evaluation strategy, probing is fundamentally different from the task categories we currently have, as the paper's abstract beautifully puts it: "Although much effort has recently been devoted to training high-quality sentence embeddings, we still have a poor understanding of what they are capturing. 'Downstream' tasks, often based on sentence classification, are commonly used to evaluate the quality of sentence representations. The complexity of the tasks makes it however difficult to infer what kind of information is present in the representations. We introduce here 10 probing tasks designed to capture simple linguistic features of sentences, and we use them to study embeddings generated by three different encoders trained in eight distinct ways, uncovering intriguing properties of both encoders and training methods." Therefore I propose adding Probing as a new task category. @Muennighoff @KennethEnevoldsen @isaac-chung @imenelydiaker and anyone else I have missed tagging.