How to check and select inaccurate structures in an SDF file?

User 22c88daf92

30-05-2013 07:15:14

Hello!


We use Instant JChem to import an SDF file into a database, but the file contains some inaccurate structures for which InChI cannot be generated. How can we check the SDF file and batch-select these inaccurate structures in Instant JChem?


Are there any details on how to do this?


thanks!

ChemAxon 2bdd02d1e5

30-05-2013 08:32:54

Hi,


If the SDF file contains structures that cannot be imported, you can find them in a file called yourfilename.sdf.errors, located in the same directory as the file you imported.


Or is it something else? Could you send me the report from the log file?


Thanks,


Filip

User 22c88daf92

31-05-2013 02:47:11

We can't find any *.sdf.errors file, either in the directory of the imported file or in the project directory. I have attached three example mol files.


For stru_1.mol and stru_2.mol:


InChI cannot be generated for stru_1.mol because of the nonstandard line "    0.9542  -20.8708    0.0000 CHO 0  0"; stru_2.mol has the same problem in the line "  -1.4625   -9.3292    0.0000 NO2 0  0".


How can we select or standardize these kinds of structures in Instant JChem or another ChemAxon tool?




For stru_3.mol:


The structure is too complicated, and we have the following questions:


1. stru_3.mol can't be opened by MarvinSketch 5.12.0, but it can be opened by MarvinSketch 5.10.0. Why?


2. When we use MarvinView to import stru_3.mol, the panel turns gray and stops responding. How can we select or standardize these kinds of structures in Instant JChem or another ChemAxon tool?





The last question:


When we use MarvinView to open an SDF file containing 6000 compounds in total (including structures like stru_1.mol, stru_2.mol, stru_3.mol and so on) and then save it as SDF, only 4000 compounds remain. Why?

User 22c88daf92

31-05-2013 03:12:50

I tried importing the SDF file in MarvinView again and got an sdf._error file, which I have attached.


InChI cannot be generated for these structures. How can we select or standardize them?


thanks!

ChemAxon 2bdd02d1e5

03-06-2013 13:36:23

For question #1) It looks like the name converter does not handle these groups in the SDF file correctly. I'm not sure about a workaround.


#2) This looks like a bug.


#3) The compounds that were not imported should be stored in sdf._error, shouldn't they?
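One way to check whether any records were lost without trace (6000 in, 4000 out) is to count the `$$$$` record terminators in the input SDF, the saved SDF and the error file. The sketch below is not a ChemAxon tool. It assumes that the sdf._error file is itself in SDF format with one `$$$$` per record (not confirmed in this thread), and that every record, including the last one, ends with `$$$$`.

```python
def count_sdf_records(sdf_text: str) -> int:
    """Count SDF records by their '$$$$' terminator lines."""
    return sum(1 for line in sdf_text.splitlines() if line.strip() == "$$$$")


def reconcile(input_text: str, output_text: str, error_text: str = "") -> dict:
    """Compare record counts: input should equal output plus errors."""
    n_in = count_sdf_records(input_text)
    n_out = count_sdf_records(output_text)
    n_err = count_sdf_records(error_text)
    return {
        "input": n_in,
        "output": n_out,
        "errors": n_err,
        # Records that appear in neither the saved file nor the error file.
        "unaccounted": n_in - n_out - n_err,
    }
```

Reading the three files (names of your choice) and passing their contents to `reconcile` shows whether the 2000 missing compounds are all in the error file or whether some disappeared entirely.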

ChemAxon 2cd598e7ad

04-06-2013 10:53:14

Hi iamyyang,


About the import of stru_3.mol:
as far as I can see, neither 5.12 nor 5.10.0 was able to import it.
About the InChI generation of stru_1.mol (or stru_2.mol):
they don't seem to be valid mol files according to the latest CTFile Format Specification, as the "CHO" atom symbol is not permitted. I think that's why the InChI generation fails.
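To batch-select records like these before import, a plain-text scan can be enough: in a V2000 molfile the fourth whitespace-separated field of each atom-block line is the atom symbol, so anything that is neither a periodic-table symbol nor a permitted generic symbol can be flagged. The following is a minimal sketch, not a ChemAxon tool. It handles V2000 records only, and the set of allowed generic symbols (`GENERIC`) is an assumption that should be checked against the CTFile specification.

```python
# Periodic-table symbols, Z = 1..118.
ELEMENTS = set(
    "H He Li Be B C N O F Ne Na Mg Al Si P S Cl Ar K Ca Sc Ti V Cr Mn Fe "
    "Co Ni Cu Zn Ga Ge As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag Cd In "
    "Sn Sb Te I Xe Cs Ba La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm Yb Lu Hf "
    "Ta W Re Os Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Ac Th Pa U Np Pu Am "
    "Cm Bk Cf Es Fm Md No Lr Rf Db Sg Bh Hs Mt Ds Rg Cn Nh Fl Mc Lv Ts Og".split()
)
# Generic/query symbols ASSUMED to be legal in V2000 (verify against the spec).
GENERIC = {"A", "Q", "*", "L", "LP", "R#", "D", "T"}


def split_records(sdf_text):
    """Split SDF text into lists of lines, one list per '$$$$'-terminated record."""
    records, current = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):  # last record without '$$$$'
        records.append(current)
    return records


def bad_atom_symbols(mol_lines):
    """Return the non-standard atom symbols found in one V2000 molfile block."""
    counts = mol_lines[3]                # 4th line: the counts line
    if "V3000" in counts:
        raise ValueError("V3000 records are not handled by this sketch")
    n_atoms = int(counts[0:3])           # first 3 columns = number of atoms
    bad = []
    for line in mol_lines[4:4 + n_atoms]:
        symbol = line.split()[3]         # x, y, z, then the atom symbol
        if symbol not in ELEMENTS and symbol not in GENERIC:
            bad.append(symbol)
    return bad


def find_suspect_records(sdf_text):
    """Map 0-based record index -> list of offending symbols."""
    suspects = {}
    for index, record in enumerate(split_records(sdf_text)):
        try:
            bad = bad_atom_symbols(record)
        except (ValueError, IndexError):
            bad = ["<unparsable>"]       # truncated or non-V2000 record
        if bad:
            suspects[index] = bad
    return suspects
```

Run on the text of a whole SDF file, `find_suspect_records` returns a dict mapping record indices to symbols such as 'CHO' or 'NO2'; those indices can then be used to split the suspect records into a separate file.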

Best regards,
Domi

User 22c88daf92

05-06-2013 00:56:38

thanks!


But our SDF file contains many of these structures. How can we batch-standardize them with ChemAxon tools?

ChemAxon afdac7b783

05-06-2013 11:11:33

Hi iamyang,


How would you like to 'standardize' your structures?



BR,


Viktoria

User 22c88daf92

06-06-2013 00:39:12

We want to know:


1. How can we batch-expand shorthand formulae like "CHO" with ChemAxon tools?


2. How can we expand these shorthand formulae and save the result as a new, normal mol file?

ChemAxon afdac7b783

06-06-2013 11:02:58

As my colleague replied, the shorthand formulas in your files, like "CHO", are not valid atom symbols, so they are imported into ChemAxon products (via molImporter) as pseudo atoms.


After the structures are imported, you can use ChemAxon's Standardizer or Structure Checker to convert these pseudo atoms to groups, and then expand or ungroup the converted groups.


In summary, I would recommend using ChemAxon's Standardizer command-line tool to expand these shorthand formulas and then save the result as a normal molfile.


The following command will convert aliases or pseudo atoms to groups, and then expand all contracted groups:


standardize -c "aliastogroup..expandsgroups" stru_3.mol stru_2.mol stru_1.mol -g
CC1=CC(=CC=C1Br)[N+]([O-])=O
[H]N1C(C)=C(N2[Se]C3=C(C=CC(Cl)=C3)C2=S)C(C)=C1C=O


If you want to save the results into an SDF file, use the following command:


standardize -c "aliastogroup..expandsgroups" stru_3.mol stru_2.mol stru_1.mol -g -f sdf -o results.sdf
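When there are many input files, the same command can be assembled from a script. This is a minimal Python sketch, assuming `standardize` is on the PATH and using only the options shown in this thread (-c, -g, -f, -o); the helper names are hypothetical, and the exit-code handling is an assumption since the thread does not describe Standardizer's exit codes.

```python
import shlex
import subprocess

# Action chain from the reply above: aliases/pseudo atoms -> groups, then expand.
ACTIONS = "aliastogroup..expandsgroups"


def build_standardize_cmd(inputs, output=None, fmt="sdf"):
    """Assemble a Standardizer command line as an argument list."""
    cmd = ["standardize", "-c", ACTIONS, *inputs, "-g"]  # -g: skip failing molecules
    if output is not None:
        cmd += ["-f", fmt, "-o", output]
    return cmd


def run_standardize(inputs, output):
    """Print and run the command; returns the process exit code."""
    cmd = build_standardize_cmd(inputs, output)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd).returncode
```

For example, `run_standardize(sorted(glob.glob("*.mol")), "results.sdf")` (after `import glob`) would process every mol file in the current directory into one SDF file. Passing an argument list rather than a single string avoids shell-quoting problems with unusual file names.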


Regarding stru_3.mol: the error during import of this file might be a bug (since importing it works from the API); the relevant team is investigating the root cause.
However, with the Standardizer command line you can use the -g (--ignore-error) option, described as "continue with next molecule on error". It skips the erroneous molecule, as you can see in the example above.
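The behaviour described for -g can be pictured as a per-record try/except loop: a failure is noted and processing continues with the next molecule instead of aborting the run. The sketch below only illustrates that idea in Python; it is not how Standardizer is implemented, and `process` stands for any hypothetical per-molecule step.

```python
def process_all(records, process):
    """Apply `process` to every record, collecting failures instead of aborting."""
    results, failures = [], []
    for index, record in enumerate(records):
        try:
            results.append(process(record))
        except Exception as exc:  # like -g: note the error, move to the next record
            failures.append((index, str(exc)))
    return results, failures
```

Keeping the failing indices (rather than silently dropping them) makes it possible to write those records to a separate error file afterwards.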


Standardizer command line help: http://www.chemaxon.com/jchem/doc/user/standardizer_cline.html


Alias to Group action: http://www.chemaxon.com/jchem/doc/user/standardizer_actions.html#aliastogroup


Expand S-groups action: http://www.chemaxon.com/jchem/doc/user/standardizer_actions.html#expandgroup


BR,


Viktoria

User 22c88daf92

07-06-2013 00:51:07

We used the JChem Base command line to standardize these SDF files.


Thank you very much!