When compiling a database from a subset of PubChem, I performed an overlap analysis to identify duplicate structures. PubChem compound 5312911 was flagged as a duplicate of both 5312912 and 5312913, although it is in fact a different structure (5312911 has no methyl group between the carbonyl and the double bond).
Indeed, if I create a new structure table containing only 5312912 and 5312913, with duplicate filtering and tautomer duplicate checking enabled, it will not let me add 5312911.
Can you reproduce this? I am attaching my structures for your reference.
This is on IJC 6.0.2, 64-bit Windows.
I can't reproduce it with local DB. Do you have any standardizers defined on the entity?
Thanks for your report!
Yes, I can reproduce it; the bug is in the tautomer duplicate checking feature.
Thanks again for the report!
For the sake of completeness: this is in a Derby local database, and I have no standardization defined.
Our tautomer duplicate search compares the generic tautomers of the structures. In the examples you sent, all three structures have the same generic tautomer form, which is why we identify them as tautomers of each other.
See our documentation about tautomers: https://www.chemaxon.com/marvin/help/calculations/tautomers.html
In Marvin Sketch you can generate the generic tautomer from Calculations->Isomers->Tautomers.
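To make the behaviour described in the reply concrete, here is a minimal Python sketch of a duplicate check that compares structures by a generic-tautomer key rather than by exact identity. This is not IJC's implementation. The key function `generic_key` is a hypothetical stand-in for whatever computes the generic tautomer, and the labels "A", "B", "C" with their keys are invented toy data, not the PubChem compounds from this thread. The point it illustrates is that any two structures mapping to the same key are reported as duplicates, so a candidate can be rejected even when it differs from the stored entry.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple


def find_tautomer_duplicate(
    candidate: str,
    table: Iterable[str],
    generic_key: Callable[[str], Hashable],
) -> Optional[str]:
    """Return the first entry in `table` whose generic-tautomer key equals
    the candidate's key, or None if no entry matches."""
    target = generic_key(candidate)
    for existing in table:
        if generic_key(existing) == target:
            return existing
    return None


def insert_with_tautomer_check(
    table: List[str],
    candidate: str,
    generic_key: Callable[[str], Hashable],
) -> Tuple[bool, Optional[str]]:
    """Append `candidate` to `table` unless an entry with the same
    generic-tautomer key already exists.

    Returns (inserted, blocking_entry)."""
    blocker = find_tautomer_duplicate(candidate, table, generic_key)
    if blocker is not None:
        return False, blocker  # rejected: same key as an existing entry
    table.append(candidate)
    return True, None


if __name__ == "__main__":
    # Hypothetical precomputed generic-tautomer keys (toy data only);
    # a real system would compute these with a cheminformatics toolkit.
    toy_keys: Dict[str, str] = {"A": "G1", "B": "G1", "C": "G2"}
    table = ["B"]
    print(insert_with_tautomer_check(table, "A", toy_keys.__getitem__))
    print(insert_with_tautomer_check(table, "C", toy_keys.__getitem__))
```

With a table that already holds "B", inserting "A" is refused because both share the toy key "G1", while "C" is accepted. The check is only as discriminating as the key: whatever the generic form does not encode, the duplicate filter cannot tell apart.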