About importing SMILES

User 0f28873a29

06-05-2008 19:56:58

Hi:


I’m trying to import some chemical compounds into an in-house database. Some of the structures cannot be inserted, and the program reports the following error:








edu.emory.mathcs.backport.java.util.concurrent.ExecutionException: chemaxon.jchem.db.UpdateHandlerException: Inserting a query or Markush structure is not allowed for table: "jcomp"
    at chemaxon.util.concurrent.processors.WorkUnitData.getResult(WorkUnitData.java:65)
    at chemaxon.util.concurrent.processors.ScheduledWorkUnitData.getResult(ScheduledWorkUnitData.java:53)
    at chemaxon.util.concurrent.processors.WorkUnitDataIterator.getNext(WorkUnitDataIterator.java:74)
    at chemaxon.jchem.db.ParallelStructTableUpdater.importFile(ParallelStructTableUpdater.java:358)
    at chemaxon.jchem.db.FileToSQLHandler.importFile(FileToSQLHandler.java:129)





These compounds are included in some of the most important catalogs of chemical compounds (Acros, Alfa, Aurora, Bachem and some others; see the attached file if you are interested).





Some of the examples:





The original compound (Fig. A): Cn/1c2ccccc2s\c1=N\N.O.Cl


Catalog ID: 15681


Vendor: ACROS





I ran the Standardizer program from the ChemAxon package with the following options:

<Aromatize ID="aromatize" Type="general"/>
<Mesomerize ID="mesomerize"/>
<Removal ID="removal" Method="keepLargest" Measure="atomCount"/>
<Tautomerize ID="tautomerize"/>
<RemoveExplicitH ID="removeexplicith"/>
<Neutralize ID="neutralize"/>


The processed compound from A is (Fig. B): Cn1\c(SC2=CC=CC=C12)=N/N





What is the problem with structure B that prevents the program from inserting it into the database?





Is this problem related to the removal of counterions?





Another problem is the * character in the SMILES representation. It also appears on the vendor's home page, but the ChemAxon import program does not accept it.





Thanks for everything.

ChemAxon a3d59b832c

07-05-2008 14:55:37

Hi,





Check out this related problem:


http://www.chemaxon.com/forum/ftopic3726.html





Regards,


Szabolcs.

ChemAxon 9c0afc9aaf

09-05-2008 09:33:27

Hi,





I have checked the attached file.


Most of the structures contain any-atoms (denoted by *), so they are clearly not exactly defined compounds. They can only be inserted into "Query", "Markush" or "Any structures" tables.
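As a rough illustration of how such records could be spotted before import, here is a minimal text-level sketch in plain Java. It does not use the ChemAxon API: the class and method names are hypothetical, and it assumes each input line starts with the SMILES string, optionally followed by whitespace and extra fields such as a catalog ID.

```java
// Hypothetical helper, not part of JChem or Marvin: a plain text scan for the '*' any-atom symbol.
final class SmilesWildcardCheck {
    private SmilesWildcardCheck() {}

    /**
     * Returns true if the SMILES part of the line contains '*'.
     * Assumption: the SMILES is the first whitespace-delimited token of the line.
     */
    static boolean hasAnyAtom(String line) {
        if (line == null) {
            return false;
        }
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return false;
        }
        // Only inspect the structure token, so a '*' in a name field is ignored.
        String smiles = trimmed.split("\\s+", 2)[0];
        return smiles.indexOf('*') >= 0;
    }
}
```

This only catches the any-atom case. It says nothing about other query features, such as aromatic atoms in non-aromatic rings, which need a real structure parser to detect.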





It is a bit less obvious for the depicted structure.


The structure contains aromatic atoms (lower case) in non-aromatic rings.


This is invalid SMILES; however, we can interpret it as SMARTS, in which case the mentioned atoms get the aromatic SMARTS query feature "(a)".


Since the structure contains query features now, it is rejected from the "Molecule" table.





The original SMILES was:


Cn/1c2ccccc2s\c1=N\N.O.Cl


The "correct" SMILES form is (not sure what was the intention of the content provider):


CN/1c2ccccc2S\C1=N\N.O.Cl





Standardized smiles:


Cn1\c(SC2=CC=CC=C12)=N/N


Corrected form:


CN1\C(SC2=CC=CC=C12)=N/N





I attach a screenshot of the uncorrected standardized SMILES in MarvinView, where you can see the "(a)" query feature on the atoms.





Best regards,





Szilard

User 0f28873a29

15-05-2008 20:06:15

hi:
Quote:



I have checked the attached file.


Most of the structures contain any atoms (denoted by *) so it's obvious that they are not exactly defined compounds, they can only be inserted into "Query" "Markush" or "Any structures" tables.


Thanks for this answer. I have decided to delete this type of structure.
Quote:



It is a bit less obvious for the depicted structure.


The structure contains aromatic atoms (lower case) in non-aromatic rings.


This is invalid for SMILES, however we can interpret these as SMARTS, and the mentioned atoms get the aromatic SMARTS query feature "(a)" here.


Since the structure contains query features now, it is rejected from the "Molecule" table.


The original SMILES was:


Cn/1c2ccccc2s\c1=N\N.O.Cl


The "correct" SMILES form is (not sure what was the intention of the content provider):


CN/1c2ccccc2S\C1=N\N.O.Cl





Standardized smiles:


Cn1\c(SC2=CC=CC=C12)=N/N


Corrected form:


CN1\C(SC2=CC=CC=C12)=N/N


Is it possible to have the Standardizer program, or another ChemAxon program, write the correct form? Or is there a function in the API that would let me check a file for this kind of error?





Thanks for all

ChemAxon 42004978e8

19-05-2008 10:18:25

Hi,





The original molecule


Cn/1c2ccccc2s\c1=N\N


might have another problem as well. It can be an aromatic molecule, but it carries double bond stereo information in one of the aromatic rings (the "/" and "\" signs). If you omit the two incorrect signs:


Cn1c2ccccc2sc1=N\N


then you can import and visualize it.





Regarding standardizer:


Standardizer aims to handle correctly drawn molecules: it brings molecules that were drawn differently, or that carry different stereo, charge, tautomerization, etc. information, to the same form.


(see http://www.chemaxon.com/jchem/doc/user/Standardizer.html)





Since this is an incorrect molecule, Standardizer can't remove the double bond stereo information.





Bye,


Robert

User 0f28873a29

19-05-2008 21:47:07

Hi:


Thanks for your quick answers. I'm sure that my SMILES structures are wrong, but I need a function (if one exists) to write a program that I can run on my files and that tells me whether the structures are "not valid Markush structure". Such a program would let me pre-process my database to find out which structures I can insert into my "molecule table".





Thanks

ChemAxon e274e1bada

21-05-2008 10:02:24

Hi,





If you switch off the option "halt if an error occurs" on the import panel of JChem Manager, only acceptable molecules will be stored, and the process will not stop when a molecule with a wrong format is encountered.





Regards, Edvard

ChemAxon 42004978e8

21-05-2008 10:23:48

Hi,





You can use the Molecule.hasValenceError() function. It will check the molecule for valence and query property errors.


http://www.chemaxon.com/marvin/help/developer/beans/api/chemaxon/struc/MoleculeGraph.html#hasValenceError()





You can use MolImport to import the molecules into memory and check the Molecule objects there with hasValenceError(). You can then save the error-free ones to a file and import them into the DB with Importer.
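The filtering step of this workflow might be sketched as follows. To keep the example free of dependencies, it does not call the ChemAxon classes: the validity check is passed in as a predicate, and in real use that predicate would wrap the import and hasValenceError() calls mentioned above. The class name and fields are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper: splits structure records into accepted and rejected lists.
final class StructurePreFilter {
    final List<String> accepted = new ArrayList<>();
    final List<String> rejected = new ArrayList<>();

    /**
     * Applies the supplied check to every record, preserving input order.
     * The predicate is a placeholder for a real structure check
     * (for example, one built on the library calls named in this thread).
     */
    static StructurePreFilter partition(List<String> records, Predicate<String> isValid) {
        StructurePreFilter result = new StructurePreFilter();
        for (String record : records) {
            boolean ok;
            try {
                ok = isValid.test(record);
            } catch (RuntimeException e) {
                // A record that makes the check fail is treated as invalid, not fatal.
                ok = false;
            }
            if (ok) {
                result.accepted.add(record);
            } else {
                result.rejected.add(record);
            }
        }
        return result;
    }
}
```

The accepted list can then be written to a file for import, while the rejected list gives a report of the structures that need manual correction.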





If hasValenceError() is not sufficient for you please let us know.


Bye,


Robert