Strange behavior in substructure search

User 46924c5277

28-12-2015 04:00:20

One of our clients recently brought this to my attention.


In this schema, the table holding chemical structures is CHEM_STRUCTURE, and the MDL molfile blocks are stored in the STRUCTURE_MOLFILE column.  There is an index on that column.


This search returns no hits:


select chem_structure_id from chem_structure where jc_compare(structure_molfile,'
  MJ151207                      

  7  7  0  0  0  0  0  0  0  0999 V2000
   -0.3125    1.6955    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0269    1.2830    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0269    0.4579    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3125    0.0454    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    0.4019    0.4579    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.4019    1.2830    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.7414    0.0455    0.0000 Sn  0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  5  6  2  0  0  0  0
  6  1  1  0  0  0  0
  3  4  2  0  0  0  0
  4  5  1  0  0  0  0
  3  7  1  0  0  0  0
M  END
', 't:s') = 1;


However, this search returns lots of hits:


select chem_structure_id from chem_structure where jc_compare(structure_molfile,'
  MJ151207                      

 10 10  0  0  0  0  0  0  0  0999 V2000
   -0.3125    1.6955    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0269    1.2830    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.0269    0.4579    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -0.3125    0.0454    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    0.4019    0.4579    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.4019    1.2830    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -1.7414    0.0455    0.0000 Sn  0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  2  0  0  0  0
  2  3  1  0  0  0  0
  5  6  2  0  0  0  0
  6  1  1  0  0  0  0
  3  4  2  0  0  0  0
  4  5  1  0  0  0  0
  3  7  1  0  0  0  0
  7  8  1  0  0  0  0
  7  9  1  0  0  0  0
  7 10  1  0  0  0  0
M  END
', 't:s') = 1;


Ignore the fact that the last three carbons are at the origin; it was just easier to add the lines manually than to redraw the structure and convert it to a molblock (which gives the same results anyway).


I can't reproduce this internally on our server. There, since the structure in the first search is a substructure of the one in the second, the first search (correctly) returns all the structures the second search does.
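As a quick sanity check of the substructure claim, here is a minimal Python sketch (not part of the original queries). It assumes the atom and bond lines copied from the two searches above and compares them directly. Because the second molblock was built by appending lines to the first, a simple prefix/subset test is enough here; this is not a general substructure matcher, and the names `parse_molblock` and `make_block` are only illustrative.

```python
# Atom and bond lines copied from the two queries; the second only appends lines.
ATOMS = [
    "   -0.3125    1.6955    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -1.0269    1.2830    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -1.0269    0.4579    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -0.3125    0.0454    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.4019    0.4579    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    0.4019    1.2830    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -1.7414    0.0455    0.0000 Sn  0  0  0  0  0  0  0  0  0  0  0  0",
]
EXTRA_ATOMS = ["    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0"] * 3
BONDS = [
    "  1  2  2  0  0  0  0", "  2  3  1  0  0  0  0", "  5  6  2  0  0  0  0",
    "  6  1  1  0  0  0  0", "  3  4  2  0  0  0  0", "  4  5  1  0  0  0  0",
    "  3  7  1  0  0  0  0",
]
EXTRA_BONDS = ["  7  8  1  0  0  0  0", "  7  9  1  0  0  0  0", "  7 10  1  0  0  0  0"]


def make_block(atoms, bonds):
    """Assemble a V2000 molblock: three header lines, counts line, atoms, bonds."""
    counts = f"{len(atoms):3d}{len(bonds):3d}  0  0  0  0  0  0  0  0999 V2000"
    return "\n".join(["", "  MJ151207", "", counts] + atoms + bonds + ["M  END"])


def parse_molblock(molblock):
    """Return (atom symbols, set of (atom1, atom2, bond order)) from a V2000 molblock."""
    lines = molblock.split("\n")
    counts = lines[3]  # the counts line follows the three header lines
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atom_lines = lines[4:4 + n_atoms]
    bond_lines = lines[4 + n_atoms:4 + n_atoms + n_bonds]
    # Whitespace splitting is a simplification that works for small molblocks like these.
    symbols = [ln.split()[3] for ln in atom_lines]
    bonds = set()
    for ln in bond_lines:
        a, b, order = (int(t) for t in ln.split()[:3])
        bonds.add((min(a, b), max(a, b), order))
    return symbols, bonds


QUERY_1 = make_block(ATOMS, BONDS)
QUERY_2 = make_block(ATOMS + EXTRA_ATOMS, BONDS + EXTRA_BONDS)

symbols_1, bonds_1 = parse_molblock(QUERY_1)
symbols_2, bonds_2 = parse_molblock(QUERY_2)

# Same atom numbering in both blocks, so a prefix and subset test suffices.
first_is_contained_in_second = (
    symbols_2[:len(symbols_1)] == symbols_1 and bonds_1 <= bonds_2
)
print(first_is_contained_in_second)
```

If the check holds, every structure matching the second query also contains the first query's atoms and bonds, so a purely topological substructure search should return it for the first query too.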


I initially thought it might be a JChem cartridge version issue since they were using an outdated version, but after updating to the latest version and rebuilding the index on the table, I still get the same results.  I also briefly thought it might be the chemical structure sketcher passing in something strange, but I've eliminated that possibility by framing the query with the molblock explicitly included rather than going through the GUI. Any ideas what might be happening here?


Thanks,


Bob

ChemAxon abe887c64e

28-12-2015 10:18:05

Dear Bob,


Thank you for reporting this strange search result with JChem Cartridge.


The root of the missing hits is very possibly a bug in our molfile handling relating to some metal atoms such as Sn. Unfortunately, a superfluous valence property (valence 2) seems to be assigned to the Sn atom of the query, which prevents targets with valence 4 Sn atoms from matching this query.


We are investigating this issue further and will get back to you soon.


Best regards,


Krisztina


ChemAxon cbb451ac1e

07-01-2016 14:20:53

Hello,


Actually, this is not really a bug. By default, metal atoms are exported with their valence electrons into MDL's mol format. Consequently, implicit hydrogens are added upon export. By default, Sn is exported with 2 valence electrons. If this causes a problem, I advise changing the valence back to zero. The MRV format works as expected.
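For readers who want to see where the valence lives in a molfile, here is a minimal Python sketch. In the V2000 atom block, the valence is the sixth numeric field after the atom symbol: 0 means no marking, 1 to 14 mean that explicit valence, and 15 means zero valence. The helper names `valence_field` and `describe` are illustrative only, and the second atom line is a hypothetical variant, not something taken from the thread.

```python
def valence_field(atom_line):
    """Return (symbol, valence field) from one V2000 atom line, split on whitespace."""
    tokens = atom_line.split()
    # tokens: x, y, z, symbol, mass difference, charge, stereo parity,
    # hydrogen count, stereo care box, valence, ...
    return tokens[3], int(tokens[9])


def describe(vvv):
    """Translate the V2000 valence field into words."""
    if vvv == 0:
        return "no marking: the reading software applies its own default"
    if vvv == 15:
        return "explicit zero valence"
    return f"explicit valence {vvv}"


# The Sn atom line exactly as it appears in both queries in the first post.
SN_LINE = "   -1.7414    0.0455    0.0000 Sn  0  0  0  0  0  0  0  0  0  0  0  0"

# Hypothetical variant with an explicit valence of 4, for comparison only.
SN_VALENCE_4 = "   -1.7414    0.0455    0.0000 Sn  0  0  0  0  0  4  0  0  0  0  0  0"

for line in (SN_LINE, SN_VALENCE_4):
    symbol, vvv = valence_field(line)
    print(symbol, vvv, describe(vvv))
```

The Sn line in the posted queries carries 0 in that field, so the query text itself sets no valence. Any valence of 2 would then have to come from a default applied by the software reading the molblock, which seems consistent with the explanations in this thread, though that reading is an inference and not something the replies state outright.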


Best regards,


Krisztian