Chemaxon sanity issues with using MolImporter

User 45b4c9a16b

21-10-2016 13:51:16

I'm not sure what the issue is here. I use the Chemaxon functions to output a series of structures in SMILES format, such as this:


O=C(Nc1ccncc1)c1ncnc1C(=O)OC1CCCCC1


Now this is actually invalid, because of the ambiguity around the five-membered ring, but I can't seem to find a combination of functions that will let me fix this.


The reason I bring it up here is that if you import this string into MarvinSketch, a valid structure is produced, and indeed Chemaxon renders it, but copying the SMILES back out of the sketch gives O=C(Nc1ccncc1)c1[nH]cnc1C(=O)OC1CCCCC1, which is perfectly valid.


Is there a snippet of code, or something else, that can clean this up within a node? I've tried the various options of MolImporter to copy the structure into a new Molecule instance, but the invalid form remains.


Is this an actual bug? 


ChemAxon 25dcd765a3

24-10-2016 12:54:50

Hi,


You are right: the ambiguity around the five-membered ring arises because the implicit hydrogen is missing from a nitrogen atom. I think it is missing from the original structure, and that is why it is still missing after export. What is the original structure, and what transformations did you apply before exporting to SMILES?
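
To see why the missing hydrogen matters, here is a rough Hückel electron count for the five-membered ring in the two SMILES strings above. It is only a textbook simplification and not ChemAxon's actual aromaticity algorithm; the class and method names are hypothetical and it uses no ChemAxon API. It assumes an aromatic carbon or a pyridine-type nitrogen (n) donates one pi electron, while a pyrrole-type nitrogen ([nH]) donates two.

```java
final class HuckelCount {

    // Pi electrons donated by one ring atom, written as its SMILES token.
    // Simplified model: only the three atom types in this thread are handled.
    static int piElectrons(String atom) {
        switch (atom) {
            case "c":    return 1; // aromatic carbon
            case "n":    return 1; // pyridine-type nitrogen, no hydrogen
            case "[nH]": return 2; // pyrrole-type nitrogen, lone pair in the ring
            default:
                throw new IllegalArgumentException("Unsupported ring atom: " + atom);
        }
    }

    // Total pi electrons for the ring atoms listed in ring order.
    static int ringPiElectrons(String... ringAtoms) {
        int total = 0;
        for (String atom : ringAtoms) {
            total += piElectrons(atom);
        }
        return total;
    }

    // Hueckel's rule: an aromatic ring holds 4n + 2 pi electrons (2, 6, 10, ...).
    static boolean satisfiesHuckel(int count) {
        return count >= 2 && (count - 2) % 4 == 0;
    }

    public static void main(String[] args) {
        // Ring c1ncnc1 from the exported SMILES.
        int asExported = ringPiElectrons("c", "n", "c", "n", "c");
        // Ring c1[nH]cnc1 from the SMILES copied out of MarvinSketch.
        int asCorrected = ringPiElectrons("c", "[nH]", "c", "n", "c");

        System.out.println("c1ncnc1:    " + asExported + " pi electrons, Hueckel: "
                + satisfiesHuckel(asExported));
        System.out.println("c1[nH]cnc1: " + asCorrected + " pi electrons, Hueckel: "
                + satisfiesHuckel(asCorrected));
    }
}
```

Under these assumptions the exported ring has five pi electrons and cannot be aromatic as written, while the ring with [nH] has six. The count does not say which of the two nitrogens should carry the hydrogen, which is the ambiguity discussed in this thread.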

User 45b4c9a16b

24-10-2016 13:09:46

It's a structure enumerator, so there is no 'original' structure as such; the structure is assembled in the program itself, so I can't get access to it to manipulate it until after generation. Is there a way of manipulating the graph (or the nitrogen atom directly within the atom graph) to fix this? I don't think there's any other way I can get in to fix it.

ChemAxon 25dcd765a3

24-10-2016 13:57:40

Are you using Chemaxon's enumeration tool or your own one?


During the enumeration, do you work with structures in Kekule form? I think that if you use the Kekule form and then change to the aromatic representation after the enumeration process, you should not see this issue. So you should probably feed the enumerator with structures in Kekule form.


However, if you are using ChemAxon's enumerator, then we should fix this bug in our code. In that case, could you send us some structures to reproduce the issue?

User 45b4c9a16b

24-10-2016 14:01:44

My own. I work with things in the aromatic (non-Kekule) form, combining fragments in a templated manner, and then outputting the results as Molecule objects.

ChemAxon 25dcd765a3

25-10-2016 06:05:11

In this case, it seems that your process does not set the implicit hydrogen count properly. Please note that the valence check cannot set the implicit hydrogen count for a nitrogen atom with aromatic bonds, because it is ambiguous: the atom can have either one implicit hydrogen or none.


My suggestion is to start with the Kekule form: call the molecule.dearomatize() method before your process.


If you need the aromatic form at the end, you can optionally call the aromatization method as the last step.
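
The reason the Kekule form avoids the ambiguity can be shown with simple arithmetic: once every bond has an integer order, the implicit hydrogen count of a neutral atom is just its valence minus the bond orders already used. The sketch below is only an illustration of that subtraction for neutral nitrogen (valence 3, charges ignored); the class and method names are hypothetical and it is not how ChemAxon's valence check is implemented.

```java
final class KekuleHydrogens {

    // Assumed standard valence of a neutral nitrogen atom.
    static final int NEUTRAL_NITROGEN_VALENCE = 3;

    // Implicit hydrogens = valence minus the sum of the explicit (integer) bond orders.
    static int implicitHydrogens(int valence, int... bondOrders) {
        int used = 0;
        for (int order : bondOrders) {
            used += order;
        }
        if (used > valence) {
            throw new IllegalArgumentException("Bond orders exceed valence: " + used);
        }
        return valence - used;
    }

    public static void main(String[] args) {
        // Ring nitrogen with one double and one single bond (pyridine type).
        int pyridineType = implicitHydrogens(NEUTRAL_NITROGEN_VALENCE, 2, 1);
        // Ring nitrogen with two single bonds (pyrrole type).
        int pyrroleType = implicitHydrogens(NEUTRAL_NITROGEN_VALENCE, 1, 1);

        System.out.println("N with bonds (2, 1): " + pyridineType + " implicit H");
        System.out.println("N with bonds (1, 1): " + pyrroleType + " implicit H");
    }
}
```

With aromatic bonds there are no integer bond orders to subtract, so both nitrogen types look identical, and the hydrogen count has to be stored explicitly or recovered some other way.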

User 45b4c9a16b

25-10-2016 13:51:15

That would seem to be the best way. I'll talk to the people I work with about the enumeration process to see if we can make that work.

User 45b4c9a16b

18-11-2016 08:56:57

Since my last message, I've gone through the various parts of my enumeration code to check that they are working as intended, and fixed a few situations where certain functional groups were being removed from the SMILES strings as part of an overzealous attempt to simplify the molecular output form. I still seem to have a few issues left over, though, all seemingly to do with the presence of hydrogen on the nitrogen atoms within the rings.

What I'm not sure I'm implementing correctly is the use of MolExporter and MolImporter to create molecules for output as SMILES. For example, I can connect fragments together and receive a SMILES string that Marvin is prepared to render, such as:

Cc1cnc(n1)c1ccc(CN)cc1

If I took that same SMILES string and pasted it into a MarvinSketch window, it would immediately be corrected to the more structurally correct form:

Cc1c[nH]c(n1)-c1ccc(CN)cc1

Clearly there's some internal call that can be made to clean these molecules up, because running the same input string through a MolImporter - MolExporter chain leaves it unchanged.

Is there something accessible in these classes I can use to get the same effect as the MarvinSketch 'correction' that doesn't involve copying the strings directly into a Sketch window? Obviously for batch processing such an intermediary step would be impossible, so I'd like to do it in situ if at all possible.

ChemAxon 25dcd765a3

18-11-2016 09:23:47

Hi,


As far as I know, MarvinSketch runs valenceCheck with global aromatic checking.


So you can get the same result using the following code snippet:


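// m is the Molecule to clean up. Turning local aromatic checking off makes the
// valence check use global aromatic checking.
ValenceCheckOptions valenceCheckOptions = new ValenceCheckOptions.Builder().setLocalAromaticChecking(false).build();
m.setValenceCheckOptions(valenceCheckOptions);
// Export to SMILES once the options have been set on the molecule.
MolExporter.exportToFormat(m, "smiles");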

User 45b4c9a16b

18-11-2016 09:26:13

Let me give this a try. If it is this straightforward, that would help a great deal.

User 45b4c9a16b

18-11-2016 09:50:39

Yes, that seems to do what we need. If I insert that into our process chain, it cleans everything up at each stage and reduces our error rate considerably.