Failure with mmpdb fragment for some specific smiles #30
Comments
Hi Cheng, thanks for pointing out this issue. mmpdb does have functionality to skip erroneous SMILES, but this one seems to be another problem - the SMILES is complicated, but chemically correct. The most likely explanation I have so far is that there is an issue with the ring perception for bonds in RDKit. I will do some further tests to make sure I am on the right track, and if I am right, file a bug report in RDKit to solve the issue. Will keep you posted as this continues. Bests,
Hi Christian, Thank you so much for looking into this issue. I agree that it might have something to do with the complicated ring system. Thanks,
Hi Pablo, I no longer develop mmpdb personally. It is now in the hands of @adalke and Jerome Hert. Maybe they can comment? Bests,
For @chengthefang, I cannot reproduce the problem using mmpdb3, available from https://github.com/adalke/mmpdb . Perhaps some of the changes I made for version 3 resolve your issue?

For @PARODBE, your comment is not connected to this issue. Please open a new issue instead. It doesn't appear that your problem is connected to mmpdb; it appears to be a general RDKit question. At the very least, you don't describe how "cdk2.fragdb" was generated, or which step produced that error message. My guess is that you're showing me how you exported the SDF to SMILES format, which you then converted to a "fragdb" using mmpdb v3. Version 2 used a text format to store the fragmentations; version 3 switched to sqlite3. You cannot use text processing to read an SQLite3 file, as it's a binary format which includes non-UTF8 byte sequences.
Thanks, @adalke! So, in what format were the saved SMILES provided?
It's an SQLite3 file. This is the format specified by the SQLite embedded relational database, and accessible from Python via the sqlite3 module. The specific schema is at https://github.com/adalke/mmpdb/blob/v3-dev/mmpdblib/fragment_schema.sql . Your question is not related to issue #30, so please do not continue asking questions in this thread. Also, I am not willing to provide additional support on how to use SQL or SQLite. There are many existing teaching resources for those topics.
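For readers landing here with the same confusion, a minimal sketch of the point above: an SQLite3 file is opened with Python's standard sqlite3 module rather than read as text. The demo database, its `example` table, and the helper names `list_tables` and `looks_like_sqlite` are made up for illustration; this does not reproduce mmpdb's schema, which is defined in the linked fragment_schema.sql.

```python
import os
import sqlite3
import tempfile


def list_tables(db_path):
    """Return the names of the tables in an SQLite database file."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


def looks_like_sqlite(db_path):
    """Check the 16-byte magic header that SQLite 3 files start with."""
    with open(db_path, "rb") as handle:
        return handle.read(16) == b"SQLite format 3\x00"


# Demo on a throwaway database (hypothetical table, not mmpdb's schema).
demo_path = os.path.join(tempfile.mkdtemp(), "demo.sqlite")
conn = sqlite3.connect(demo_path)
conn.execute("CREATE TABLE example (id INTEGER PRIMARY KEY, smiles TEXT)")
conn.commit()
conn.close()

tables = list_tables(demo_path)
is_sqlite = looks_like_sqlite(demo_path)
```

Pointing `list_tables` at a real fragment database should show which tables it contains, after which the schema file explains the columns.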
Hi all,
I am using mmpdb fragment to process a subset of the SureChEMBL database, and I found that mmpdb fragment fails for some specific SMILES. I wonder if we could add some error handling to deal with such problematic structures.
Here is the example of test.smi.
I ran "python mmpdb/mmpdb fragment test.smi -o test_data.fragments". It fails while parsing the first SMILES and does not skip it to continue with the rest. The error is shown below:
Failure: file 'test.smi', line 1, record #1: first line starts 'C[C@]12CCC3c4c5cc(O)cc4[C@@]4(CC[C@@]1(C ...'
Traceback (most recent call last):
  File "mmpdb/mmpdb", line 11, in <module>
    commandline.main()
  File "/mmpdb/mmpdblib/commandline.py", line 1054, in main
    parsed_args.command(parsed_args.subparser, parsed_args)
  File "/mmpdb/mmpdblib/commandline.py", line 181, in fragment_command
    do_fragment.fragment_command(parser, args)
  File "/mmpdb/mmpdblib/do_fragment.py", line 581, in fragment_command
    writer.write_records(records)
  File "/mmpdb/mmpdblib/fragment_io.py", line 404, in write_records
    for rec in fragment_records:
  File "/mmpdb/mmpdblib/do_fragment.py", line 475, in make_fragment_records
    fragments = result.get()
  File "anaconda2/lib/python2.7/multiprocessing/pool.py", line 572, in get
    raise self._value
ValueError: need more than 1 value to unpack
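The traceback above shows the ValueError surfacing from result.get(), which is where multiprocessing re-raises an exception that occurred inside a worker. A minimal sketch of the kind of per-record error handling being asked for follows. It is not mmpdb's code: `fragment_one` is a hypothetical worker that merely imitates the unpacking failure, `fragment_all` is an illustrative name, and a ThreadPool is used instead of a process pool only so the sketch runs anywhere without pickling concerns (its get() re-raises worker exceptions in the same way).

```python
from multiprocessing.pool import ThreadPool


def fragment_one(smiles):
    """Hypothetical stand-in for a per-record worker (not mmpdb's code)."""
    # Imitates the failure mode: unpacking two values from a shorter list.
    first, second = smiles.split(".")
    return (first, second)


def fragment_all(smiles_list):
    """Return (results, failures), skipping records whose worker raised."""
    results, failures = [], []
    pool = ThreadPool(2)
    try:
        pending = [
            (smi, pool.apply_async(fragment_one, (smi,))) for smi in smiles_list
        ]
        for smi, async_result in pending:
            try:
                # get() re-raises any exception raised inside the worker.
                results.append(async_result.get())
            except ValueError as err:
                # Record the failure and move on to the next record.
                failures.append((smi, str(err)))
    finally:
        pool.close()
        pool.join()
    return results, failures


results, failures = fragment_all(["CC.O", "CCO", "N.Cl"])
```

Here the middle record fails inside the worker, is collected in `failures` together with its error message, and the remaining records are still processed. Whether catching only ValueError is the right choice for a real tool is a judgment call; a broader except clause hides more bugs.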
Appreciate any suggestions or ideas.
Thanks,
Cheng