Add task 1160 from MRS #375
base: master
Yeah, the data is quite noisy. I'm leaning towards a "no", unless we can somehow clean it up around a particular subject.
I like this task, but I agree that noise is a concern.
I checked again, and I think there's noisy data in the short instances too.
What I meant above was to retain only the longer instances. Longer instances seem to contain less noise.
Oh, sorry, I didn't read it carefully!
Sorry for being late. I think it makes sense to keep the longer instances (not sure about the threshold, though). Should I add other languages too?
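For reference, a minimal sketch of the length filter discussed above. It assumes the task JSON stores examples in an "Instances" list whose items have an "input" string, and it measures length in whitespace-separated words of the input only. `MIN_WORDS = 20` is a hypothetical placeholder because the threshold is still undecided, and `keep_long_instances` / `filter_task_file` are illustrative names, not existing helpers in this repo.

```python
import json

# Hypothetical cutoff: the thread leaves the threshold open, so 20 words is only a placeholder.
MIN_WORDS = 20


def keep_long_instances(instances, min_words=MIN_WORDS):
    """Return only instances whose "input" has at least `min_words` whitespace-separated words."""
    return [inst for inst in instances if len(inst["input"].split()) >= min_words]


def filter_task_file(path_in, path_out, min_words=MIN_WORDS):
    """Rewrite a task file keeping only long instances; returns (count_before, count_after).

    Assumes the file is a JSON object with an "Instances" list (an assumption, not verified here).
    """
    with open(path_in, encoding="utf-8") as f:
        task = json.load(f)
    before = len(task["Instances"])
    task["Instances"] = keep_long_instances(task["Instances"], min_words)
    with open(path_out, "w", encoding="utf-8") as f:
        json.dump(task, f, ensure_ascii=False, indent=4)
    return before, len(task["Instances"])
```

Printing the before/after counts for a few candidate thresholds would show how much data each cutoff discards before settling on one.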
Yes, if you have time, feel free to add them. It's also fine if you skip this and focus on the other ToDos we have in this project.
I agree with Swaroop. If cleaning up this PR will take more than 1 hr, I would say it's not worth it.
This task is created from the MRS dataset proposed in issue #283.
However, I am in doubt whether this is a good addition or not. The data is derived from Reddit replies, and they're not good-quality examples to learn from. I've cleaned them as much as I could, but there is still a lot of nonsense in them. I'm submitting the English task to get other people's opinions. If it's good enough, or if there's a good way to filter out the nonsense, I will go on and add the other languages too.
@swarooprm @danyaljj