sbmaruf changed the title from "MC4 Pre-processing" to "mC4 sampling & pre-processing" on Aug 17, 2022
Hi @TevenLeScao,
I think there are some confusing and broken links in the mC4 data preprocessing section. Could you take a look?
Both of the links are broken here,
The original link should be,
In addition, the multinomial data processing code that creates the different language splits is in this pull request: bigscience-workshop/Megatron-DeepSpeed#9
Here are a few things:
Did you use this data for any of your experiments?
For reference, if you want to keep the code, I'm happy to open a pull request here. If not, I'll close the pull request in the bigscience-workshop/Megatron-DeepSpeed repo.
Let me know what you think.
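For context on the multinomial sampling mentioned above, here is a minimal sketch of how per-language sampling probabilities are often computed, with each language's share raised to an exponent and renormalized. This is not the code from the pull request: the function name `multinomial_probs`, the exponent `alpha=0.3`, and the document counts are all assumptions made for illustration.

```python
def multinomial_probs(doc_counts, alpha=0.3):
    """Return per-language sampling probabilities p_i proportional to (n_i / N) ** alpha.

    alpha = 1 keeps the original proportions; alpha < 1 upweights
    low-resource languages; alpha = 0 gives a uniform distribution.
    The default of 0.3 is an assumed value, not one taken from the PR.
    """
    total = sum(doc_counts.values())
    # Raise each language's share of the corpus to the power alpha.
    weights = {lang: (n / total) ** alpha for lang, n in doc_counts.items()}
    # Renormalize so the probabilities sum to 1.
    norm = sum(weights.values())
    return {lang: w / norm for lang, w in weights.items()}


# Hypothetical document counts per language (illustrative only).
counts = {"en": 1_000_000, "fr": 100_000, "sw": 1_000}
probs = multinomial_probs(counts, alpha=0.3)
```

With `alpha` below 1, a low-resource language such as `sw` in this example gets a larger sampling probability than its raw share of the corpus, which is the usual motivation for this kind of rebalancing.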