I ran get_text.py. The output is:
o Restored text for DISRPT format in 6 .tok files and 6 .rels files in .../gum//rst/disrpt/
o Restored text for DISRPT format in 0 .tok files and 235 .rels files in .../gum//rst/gdtb/disrpt/
o Processing 18 files in .../gum//rst/gdtb/pdtb/raw/00/...
o Processing 18 files in .../gum//const/...
I can see the Reddit text recovered in the eng.pdtb.gum_... files in .../gum//rst/disrpt/,
but not in .../gum//rst/gdtb/disrpt/, even though the files there changed after I ran get_text.py. Those files still contain dashes.
However, the text is recovered for the full-text files in .../gum//rst/gdtb/pdtb/raw
Hm, you're right, I can reproduce this bug - thanks for reporting it! I should be able to push a fix to the dev branch soon. A new stable release of GUM is expected in early winter, so I would probably hold off on merging the fix for a little while longer.
* was failing if running from root before process_reddit.py has been run since src/dep/ is not yet restored
* now checks for restored dep files and uses top level dep/ instead if running from get_text.py
* fixes #197
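For anyone curious what the fallback described in the commit notes could look like, here is a minimal Python sketch. It is not the actual code in underscores_disrpt.py: the function names, the `*reddit*.conllu` file pattern, and the heuristic for deciding that a dep file is "restored" (some token FORM contains a letter or digit rather than only placeholder characters) are all assumptions made for illustration.

```python
from pathlib import Path


def looks_restored(conllu_path: Path) -> bool:
    """Assumed heuristic: a dep file counts as restored if any token's
    FORM column (2nd tab-separated field) contains a letter or digit."""
    for line in conllu_path.read_text(encoding="utf-8").splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        if len(cols) > 1 and any(ch.isalnum() for ch in cols[1]):
            return True
    return False


def choose_dep_dir(src_dep: Path, top_dep: Path,
                   pattern: str = "*reddit*.conllu") -> Path:
    """Use src_dep if its Reddit files look restored; otherwise fall back
    to the top-level dep directory (hypothetical stand-in for the fix)."""
    reddit_files = sorted(src_dep.glob(pattern)) if src_dep.is_dir() else []
    if reddit_files and all(looks_restored(f) for f in reddit_files):
        return src_dep
    return top_dep
```

Called as `choose_dep_dir(Path("src/dep"), Path("dep"))` from the repository root, this would return the top-level directory whenever src/dep/ is missing or still holds placeholder text, which is the situation described when get_text.py runs before process_reddit.py.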
OK, you should be able to use this fix. Either pull from the dev branch or just patch the file _build/utils/get_reddit/underscores_disrpt.py based on dev.
Leaving this issue open until the next stable release.