Issue with Split File Handling in ACE2005 Preprocessing Code #24

Open
zdhgreat opened this issue Oct 13, 2024 · 2 comments

Comments

@zdhgreat

Hello, I tried running your generate_data.py code and found that the ACE2005 dataset requires preprocessing. However, the preprocessing code tries to read doc.txt files from a split folder. The original data does not contain a split folder, and I could not find any part of the preprocessing code that generates those files. Is this part of the code missing, or have I not followed the process correctly? I would greatly appreciate a response. Thank you very much!

@osainz59
Member

Hi @zdhgreat,

Sorry about that. You can find the split folder in the code released with the OneIE paper (from which we obtained the preprocessing script): http://blender.cs.illinois.edu/software/oneie/

Thank you for pointing this out; I will add it to the README.

@zdhgreat
Author

Thank you very much for your help. Your suggestion resolved my issue with the ACE2005 split. However, I still have a few questions and would greatly appreciate further clarification.

The link you provided for the E3C dataset no longer seems to work. I found the E3C dataset on GitHub as test.txt and train.txt files, but there is no dev file, the files are not in the tsv format expected by the config file, and there is no processing script to convert them.

The CASIE processing files do not include any split functionality. Can I create the split myself?

The DIANN data consists of txt files that need processing, and I am not sure how to handle them. As with E3C, they also need to be converted into tsv files.

Once again, I would be very grateful if you could address these questions. Thank you very much!
