Duplicate Molecule Not (Or Spuriously) Recognised

User 7910dcb734

25-04-2013 17:04:31

Hi,


For testing purposes, I am using the JChemManager GUI to import molecules into a database. The Standardizer settings are attached (there are not many).


Attached is a set of three molecules (duplicates.sdf) that all go into a structure table. All three are inserted without a problem.


After importing these three, I try to insert a new molecule (failed.mrv, attached). It is flagged as a duplicate of every molecule in duplicates.sdf.


If Molecule A is a duplicate of Molecule B, and molecule A is a duplicate of Molecule C, shouldn't Molecule B be recognised as a duplicate of Molecule C?


Cheers,


Brendan

ChemAxon abe887c64e

26-04-2013 08:59:35

Hi Brendan,


Thank you for reporting this. You are right: in duplicate search (duplicate filtering during import), molecule B should be recognised as a duplicate of molecule C under the given conditions.
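The consistency expectation discussed here (if A matches both B and C, then B and C should match each other) can be checked mechanically on any list of reported duplicate hits. The following is only a minimal sketch: the labels `failed`, `dup1`, `dup2` and `dup3` are hypothetical stand-ins for failed.mrv and the three records in duplicates.sdf, and the function name is an assumption, not part of JChem.

```python
from itertools import combinations


def transitivity_violations(duplicate_pairs):
    """Return triples (a, b, c) where a~b and a~c are reported but b~c is not."""
    # The duplicate relation is treated as symmetric, so pairs are stored unordered.
    pairs = {frozenset(p) for p in duplicate_pairs}

    # Map every molecule to the set of molecules it was reported as a duplicate of.
    neighbours = {}
    for a, b in duplicate_pairs:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    violations = []
    for a, linked in neighbours.items():
        # Any two molecules that both match `a` should also match each other.
        for b, c in combinations(sorted(linked), 2):
            if frozenset((b, c)) not in pairs:
                violations.append((a, b, c))
    return violations


# Hypothetical hit list mirroring the report: the new molecule matches all three
# stored records, but the stored records were never flagged against each other.
reported = [("failed", "dup1"), ("failed", "dup2"), ("failed", "dup3")]
violations = transitivity_violations(reported)
for a, b, c in violations:
    print(f"{a} matches {b} and {c}, but {b} and {c} are not reported as duplicates")
```

A non-empty result means the reported hits cannot all be correct at the same time, which is the situation described in this thread.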


We have started investigating how our search process handles the structures you sent.


Could you tell us which JChem version you are using?


Best regards,


Krisztina

User 7910dcb734

26-04-2013 11:22:38

Hi Krisztina,


 


The version is 5.12.0 (I did not notice the new version; potentially this has been solved already).


 


Cheers,


 


Brendan

ChemAxon abe887c64e

29-04-2013 11:43:14

Hi Brendan,


We were able to reproduce the reported erroneous search behavior in version 5.12. Unfortunately, this issue is present in the newer JChem versions, too.


However, we found that failed.mrv is not identified as a duplicate of any structure in duplicates.sdf at MolSearch level. So, as a workaround until we fix the above bug, we can only recommend removing the duplicate filtering setting from the table and prefiltering the structures with a duplicate search in files before import. For example:


jcsearch -q failed.mrv duplicates.sdf -t:d --standardize "standardizerConfig.xml"
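When many query files have to be prefiltered, the command above can be wrapped in a small script. This is a sketch only: it assumes `jcsearch` is on the PATH, it reuses exactly the arguments quoted above and adds no others, and the two function names are hypothetical. The output format of `jcsearch` is not described in this thread, so the text is returned unparsed.

```python
import subprocess


def jcsearch_duplicate_command(query_file, target_file, standardizer_config):
    """Build the argument list for the duplicate search quoted in the post above."""
    # Same arguments, in the same order, as the command line in the thread.
    return [
        "jcsearch",
        "-q", query_file,
        target_file,
        "-t:d",
        "--standardize", standardizer_config,
    ]


def run_duplicate_search(query_file, target_file, standardizer_config):
    """Run the search and return its raw standard output (requires jcsearch)."""
    cmd = jcsearch_duplicate_command(query_file, target_file, standardizer_config)
    # check=True raises CalledProcessError if jcsearch exits with an error code.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


# Building the command does not need JChem to be installed.
command = jcsearch_duplicate_command(
    "failed.mrv", "duplicates.sdf", "standardizerConfig.xml"
)
print(" ".join(command))
```

Calling `run_duplicate_search` once per query file gives one result per candidate structure, which can then be inspected before the import.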


Best regards,


Krisztina


 

User 7910dcb734

29-04-2013 11:54:17

Hi Krisztina,


At least the bug is being tracked now.


Unfortunately, in the production environment I am dealing with SD files containing several million compounds and a database containing tens of millions. I don't think disabling duplicate filtering on the table and pre-filtering the structures would be computationally viable. (Please correct me if I'm wrong; I have not tested this or done any calculations to confirm my expectation.)


Presently I am just excluding molecules that cause these issues from the database.


Cheers,


Brendan


 



ChemAxon abe887c64e

30-04-2013 07:07:20

Hi Brendan,


We will let you know when the fix is ready.


Best regards,


Krisztina

User 7910dcb734

03-07-2013 14:40:06

Hi,


I noticed a new release of JChem last month; was this bug fixed in the update?


 


Best wishes,


Brendan

ChemAxon abe887c64e

03-07-2013 14:53:46

Hi Brendan,


Sorry to say, but the fix is not ready yet. We may only be able to release it in version 6.2, in late autumn.


Best regards,
Krisztina

ChemAxon abe887c64e

19-07-2013 13:45:33

Hi Brendan,


We would like to inform you that the bugfix will be included in the next major release of JChem (version 6.1). If needed, the beta version will be available for testing in the near future.


Best regards,


Krisztina

ChemAxon abe887c64e

13-09-2013 09:30:51

Hi Brendan,


Please be informed that JChem version 6.1, which contains the bugfix, is now available for download.


Krisztina