N[C@H](CO)C(O)=O losing stereo information after search

User 8b97537f87

20-01-2009 16:51:31

Hi there and a happy new year:-)





we are inserting N[C@H](CO)C(O)=O and we are retrieving N[CH](CO)C(O)=O. What are we doing wrong?





inserting into our db...:





    try {
      updateHandler = new UpdateHandler(connectionHandler, UpdateHandler.INSERT,
          Persister.dbproperties.substanceTable, "");
      log.debug("Substance: " + substanceImpl.getSmiles());

      try {
        byte[] exportToBinFormat = substanceImpl.getMolecule().exportToBinFormat("sdf");
        log.debug("Substance - sdf: " + new String(exportToBinFormat));
        updateHandler.setStructure(exportToBinFormat);
      }
      catch (MolExportException e) {
        e.printStackTrace();
      }
      substanceImpl.setID(updateHandler.execute(returnNewID));
      return substanceImpl.getID();
    }





... and as far as we see it works fine:





DEBUG Persister:172 - Substance: N[C@H](CO)C(O)=O
DEBUG Persister:176 - Substance - sdf:
  Marvin  01200917330D         

  7  6  0  0  1  0            999 V2000
    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
  3  4  1  0  0  0  0
  2  5  1  0  0  0  0
  5  6  1  0  0  0  0
  5  7  2  0  0  0  0
M  END
$$$$








when retrieving the structure:





    ...
    search.setStructureTable(table);
    search.setFilterIDList(new int[] { id });
    // search.setCacheExpirationTime(0);
    ...
    try {
      search.run();
      ...
          molecule = search.getHitsAsMolecules(results, null, null, null)[0];
          log.debug("molecule smiles: " + molecule.toFormat("smiles"));
          log.debug("molecule sdf: " + new String(molecule.toBinFormat("sdf")));
    ...


DEBUG Persister:351 - molecule smiles: NC(CO)C(O)=O
DEBUG Persister:352 - molecule sdf:
  Marvin  01200917380D         

  7  6  0  0  1  0            999 V2000
    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  2  3  1  0  0  0  0
  3  4  1  0  0  0  0
  2  5  1  0  0  0  0
  5  6  1  0  0  0  0
  5  7  2  0  0  0  0
M  END
$$$$





Thank you for your help in advance





vincent

ChemAxon 9c0afc9aaf

20-01-2009 18:38:15

Hi Vincent, happy new year :)





As a quick test I have inserted your structure into the "editaxample" table in this JSP example using our latest release:





http://chemaxon.com/jchem/examples/db_search/index.jsp





It seems to display well amongst the hits with or without coloring, alignment, etc.





Before we can proceed with the investigation:

- please tell us your exact JChem version
- please make sure there is not another structure without stereo information already in the database
- you may also send us the query structure (if it seems relevant for reproducing the problem)





Best regards,





Szilard

User 8b97537f87

21-01-2009 11:23:25

Hi Szilard,





our version was 5.0.3.

Meanwhile we updated to 5.1.4, unfortunately with the same result :-(
Quote:
- please make sure there is not another structure without stereo information already in the database
    





As we are searching by the primary key 'cd_id'


Code:
search.setFilterIDList(new int[] { id });
    


(and the cd_id in this unit test is always 131), it is impossible to pick the wrong structure.
Quote:
- you may also send us the query structure (if it seems relevant for reproducing the problem)
    


as mentioned above we are searching by id (=131)





a


Code:
select cd_smiles, cd_structure from jchem_substances where cd_id = 131
    


results in


Code:
"N[C@H](CO)C(O)=O";





"\012  Marvin  01210911240D          \012\012  7  6  0  0  1  0            999 V2000\012000W000W70\012000W000W64\012000W000W60\012000W000W80\012000W000W60\012000W000W80\012000W000W80\01210201\01220301\01230401\01220501\01250601\01250702\012M  END\012"
   





showing that the stereo information has been written to the db. It is the reading from the db that fails.





btw:





using the search in your jsp-example on our structure by searching by its cd_id (4310) leads to no result.





greetings





Vincent

ChemAxon 9c0afc9aaf

22-01-2009 22:54:06

Hi Vincent,





We have managed to reproduce the error.





It seems to be connected to the mol/SDF compression/decompression that is performed on the sources stored in cd_structure.





As a temporary workaround, please uncheck the "Compress ..." checkbox in the options dialog:





http://www.chemaxon.com/jchem/doc/admin/#options





After this, newly inserted structures (or old structures that have been "updated") will not have this problem.





We are investigating the problem and will get back to you soon.
Quote:



using the search in your jsp-example on our structure by searching by its cd_id (4310) leads to no result.
 





It works fine for me.





Best regards,





Szilard

ChemAxon 9c0afc9aaf

22-01-2009 23:17:58

Vincent,





I spoke too soon ...





I have just noticed that your input structure is in 0D (contains no coordinates).





MDL mol/SDfiles should be at least in 2D to properly retain stereo information.
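The 0D marker can be read straight from the molfile header. The following is a minimal sketch in plain Java (no JChem classes involved), assuming the standard V2000 header layout in which the second line holds user initials, program name, a date/time stamp and then a two-character dimensional code. The class and method names are made up for illustration.

```java
/** Reads the two-character dimensional code ("0D", "2D", "3D") from a molfile header. */
final class MolfileDimension {

    // Second header line layout: IIPPPPPPPPMMDDYYHHmmdd...
    // II = initials (0..1), P = program name (2..9), date/time (10..19), dd = dimension (20..21).
    private static final int DIM_START = 20;
    private static final int DIM_END = 22;

    static String dimensionCode(String molfile) {
        String[] lines = molfile.split("\r?\n", -1);
        if (lines.length < 2 || lines[1].length() < DIM_END) {
            return ""; // header missing or too short to carry a dimensional code
        }
        return lines[1].substring(DIM_START, DIM_END);
    }

    public static void main(String[] args) {
        // Header copied from the log output in this thread (first line is the empty title line).
        String mol = "\n  Marvin  01200917330D          \n\n  7  6  0  0  1  0            999 V2000\n";
        System.out.println("dimension code: " + dimensionCode(mol));
    }
}
```

A check like this could be run before export to decide whether a structure needs 2D cleaning or a different storage format.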





To preserve stereo information of a 0D structure





1. you should either clean in 2D before exporting to mol/SDF





(this requires some CPU time)





or





2. you should use SMILES /  CXSMILES





(this might not be able to store some other features that molfiles can)





or





3. you should use MRV (Marvin Document) format, e.g. toBinFormat("mrv")





This format stores all possible features either coming from the MDL or Daylight world.





So what difference does the compression setting make, and how can the wedge appear at all?





If you look closely, there is a "2" in the atom block among a lot of zeroes.





This indicates the parity. MDL's documentation states for this property "Ignored when read.", as coordinates take precedence.
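To make the position of that "2" concrete, here is a small sketch in plain Java that pulls the parity field out of a single V2000 atom line. It assumes the fixed-width atom line layout of the MDL format (three 10-character coordinates, a space, a 3-character symbol, 2-character mass difference, 3-character charge, then the 3-character stereo parity field). The class name is hypothetical and this is not how JChem itself reads the file.

```java
/** Reads the atom stereo parity field from one V2000 atom-block line. */
final class AtomLineParity {

    // Layout: xxxxx.xxxxyyyyy.yyyyzzzzz.zzzz aaaddcccsss...
    // The "sss" parity field sits at 0-based positions 39..41.
    private static final int PARITY_START = 39;
    private static final int PARITY_END = 42;

    /** Returns 0 (not stereo), 1 (odd), 2 (even) or 3 (either/unmarked). */
    static int parityOf(String atomLine) {
        if (atomLine.length() < PARITY_END) {
            return 0; // field not present: treat as "not stereo"
        }
        String field = atomLine.substring(PARITY_START, PARITY_END).trim();
        return field.isEmpty() ? 0 : Integer.parseInt(field);
    }

    public static void main(String[] args) {
        // Atom lines copied from the SDF that was written to the database.
        String nitrogen = "    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0";
        String carbon = "    0.0000    0.0000    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0";
        System.out.println("N parity: " + parityOf(nitrogen));
        System.out.println("C parity: " + parityOf(carbon));
    }
}
```

Applied to the atom lines of the retrieved structure, the same method returns 0 for every atom, which is exactly the loss reported in this thread.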





So maybe the compression ignores these negligible features, while our import is a bit over-eager to use them when there are no coordinates at all. I will ask my colleagues about this, but it doesn't seem to be a bug.





Best regards,








Szilard

User 8b97537f87

27-01-2009 09:23:14

Dear Szilard,














thank you very much, our problem is solved:)


Nevertheless, there is still a problem in your software:

in the SD file we are committing:





Code:
...Marvin  01200917330D         





  7  6  0  0  1  0            999 V2000


    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0


    0.0000    0.0000    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0...






^C 0 0 2 - is the stereo-information.














Inside the db-blob you can find





Code:
...V2000\012000W000W70\012000W000W64\012000W000W60...






^000W64 - is the carbon with stereo-information dumped correctly into the db.





But it is not possible to load the sd-structure correctly from the db.





Best regards





Vincent

ChemAxon 9c0afc9aaf

27-01-2009 17:30:06

Hi,





Yes, we have already noticed this.


So far it seems that if the structure is read directly from the compressed molfile, the parity is not imported in the case of 0D molecules, but if one explicitly decompresses it before import (with chemaxon.formats.MdlCompressor), the stereo information appears normally.





So no information is lost during the database import.





Regardless of this discrepancy it should be stated that 0D molfiles are not standard, and the import only uses this parity information as a last resort - in normal situations the coordinates and the wedges determine the stereo information.
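The precedence described here can be illustrated with a short plain-Java sketch. It is only a model of the stated rule (coordinates first, parity field as a last resort) applied to raw V2000 atom lines, and it is not ChemAxon's actual import code. The class, enum and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.List;

/** Models which source of stereo information a reader would fall back on. */
final class StereoSource {

    enum Source { COORDINATES, PARITY, NONE }

    /** True if any x, y or z value in the atom lines is non-zero. */
    static boolean hasCoordinates(List<String> atomLines) {
        for (String line : atomLines) {
            for (int start = 0; start + 10 <= 30 && start + 10 <= line.length(); start += 10) {
                if (Double.parseDouble(line.substring(start, start + 10).trim()) != 0.0) {
                    return true;
                }
            }
        }
        return false;
    }

    /** True if any atom line carries a non-zero parity field (0-based positions 39..41). */
    static boolean hasParity(List<String> atomLines) {
        for (String line : atomLines) {
            if (line.length() >= 42) {
                String field = line.substring(39, 42).trim();
                if (!field.isEmpty() && Integer.parseInt(field) != 0) {
                    return true;
                }
            }
        }
        return false;
    }

    static Source choose(List<String> atomLines) {
        if (hasCoordinates(atomLines)) {
            return Source.COORDINATES; // normal case: coordinates and wedges decide
        }
        return hasParity(atomLines) ? Source.PARITY : Source.NONE; // 0D fallback
    }

    public static void main(String[] args) {
        List<String> zeroD = Arrays.asList(
            "    0.0000    0.0000    0.0000 N   0  0  0  0  0  0  0  0  0  0  0  0",
            "    0.0000    0.0000    0.0000 C   0  0  2  0  0  0  0  0  0  0  0  0");
        System.out.println(choose(zeroD));
    }
}
```

For the 0D structure from this thread only the parity field is available, so any reader that drops it has nothing left to reconstruct the stereo center from.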





So we advise using 2D mol/SDfiles or another format regardless of this.





Because of the above we do not treat this as very high priority. We will get back to this topic and let you know when it is fixed.





Best regards,





Szilard