Leveraging chemaxon plugins to assess structure definition

User a18e201107

20-04-2010 14:53:30

Hello


We currently use ChemAxon JChem tools for most of the processing in our compound registration system, and we would like to do 100% of the processing with ChemAxon (currently we bounce out to Pipeline Pilot for some additional processing). One of the items we bounce out is what we call our "ambiguity detector." This process takes a molecule, determines whether it has any undefined stereocenters and, if so, flags the molecule. In our registration system, compounds receive an "A" if any stereochemistry is undefined and a "K" otherwise.


In an attempt to replicate this behavior in ChemAxon, I have written some code (below) that determines the number of asymmetric atoms in the molecule and then checks the chirality of each carbon (whether it is R, S, neither, or UndefinedParity). If there are no asymmetric atoms, the compound is given a "K". If the count of R+S equals the asymmetric atom count OR there are no carbons with UndefinedParity (this is a "3" chirality flag), the compound is marked "K". Otherwise it is an "A".
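The decision rule in the paragraph above can be exercised without the toolkit, which makes it easy to unit-test. The sketch below is a minimal model. It assumes that the element symbol and chirality code of every atom, plus the asymmetric atom count, have already been read from the molecule. The class AmbiguityRule and its letter method are hypothetical, and the codes 16, 8 and 3 are simply the values used in the code of this post.

```java
/**
 * Hypothetical, toolkit-free model of the "A"/"K" rule described above.
 * Inputs are assumed to come from the toolkit: one element symbol and one
 * chirality code per atom, plus the asymmetric atom count of the molecule.
 */
final class AmbiguityRule {
    // Chirality codes as used in this thread (16 and 8 = R/S, 3 = undefined parity).
    static final int CHIRALITY_R = 16;
    static final int CHIRALITY_S = 8;
    static final int UNDEFINED_PARITY = 3;

    private AmbiguityRule() {}

    static char letter(String[] symbols, int[] chirality, int asymmetricAtomCount) {
        if (symbols.length != chirality.length) {
            throw new IllegalArgumentException("one chirality code per atom is required");
        }
        if (asymmetricAtomCount == 0) {
            return 'K'; // no asymmetric atoms at all
        }
        int rAndSCount = 0;
        boolean undefinedParity = false;
        for (int i = 0; i < symbols.length; i++) {
            if (!"C".equals(symbols[i])) {
                continue; // only carbons are inspected, as in the original code
            }
            if (chirality[i] == CHIRALITY_R || chirality[i] == CHIRALITY_S) {
                rAndSCount++;
            } else if (chirality[i] == UNDEFINED_PARITY) {
                undefinedParity = true;
            }
        }
        // "K" if every asymmetric atom is R/S or no carbon has undefined parity
        return (rAndSCount == asymmetricAtomCount || !undefinedParity) ? 'K' : 'A';
    }
}
```

Consider the call letter(new String[] {"N", "C", "C"}, new int[] {16, 3, 3}, 1). This is a hypothetical molecule whose only asymmetric atom is a nitrogen, while two carbons are reported with code 3. The call returns 'A': the nitrogen is skipped, R+S stays at 0, and the spurious undefined parities trigger the flag. This is the false positive described in the next paragraph.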


This process seems to work pretty well, except when the molecule has an asymmetric nitrogen AND an adamantane group. Here the adamantane gets flagged as having a few carbons with undefinedparity (3), and the asymmetric atom count does not match the R+S count. As you can see, this logic is a bit clunky (and likely not the best way to proceed). I was wondering:


1) whether there is a way to ask if an individual atom is asymmetric, so that we can get a count of carbons only


2) whether you are developing anything that looks at a structure and determines that it does not have defined stereochemistry



One thing I did not mention is that we would also like to check whether the double bond stereochemistry of compounds is defined. I believe I have a way of doing that, which I have not described here, but I may wish to add this issue to this conversation.
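One way to fold double bonds into the same check is to treat every stereogenic element uniformly and flag "A" as soon as any of them is undefined. The sketch below assumes the toolkit can report whether the configuration is specified for each stereogenic tetrahedral center (of any element, not only carbon) and for each stereogenic double bond. StereoRegistrationRule, StereoElement and Kind are hypothetical names, not ChemAxon API.

```java
import java.util.List;

/**
 * Hypothetical generalised registration rule: "A" if any stereogenic element
 * (tetrahedral center of any element, or double bond) is left undefined.
 */
final class StereoRegistrationRule {
    enum Kind { TETRAHEDRAL_CENTER, DOUBLE_BOND }

    static final class StereoElement {
        final Kind kind;
        final boolean defined; // true if R/S or cis/trans is specified in the input

        StereoElement(Kind kind, boolean defined) {
            this.kind = kind;
            this.defined = defined;
        }
    }

    private StereoRegistrationRule() {}

    /** Number of stereogenic elements of the given kind whose configuration is undefined. */
    static int undefinedCount(List<StereoElement> elements, Kind kind) {
        int count = 0;
        for (StereoElement element : elements) {
            if (element.kind == kind && !element.defined) {
                count++;
            }
        }
        return count;
    }

    /** "K" only if every stereogenic center and double bond has a specified configuration. */
    static char letter(List<StereoElement> elements) {
        int undefined = undefinedCount(elements, Kind.TETRAHEDRAL_CENTER)
                + undefinedCount(elements, Kind.DOUBLE_BOND);
        return undefined == 0 ? 'K' : 'A';
    }
}
```

Keeping the per-kind counts separate leaves room to report why a compound got an "A": an undefined center versus an undefined double bond.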


I have attached an sdf with some examples of the kinds of compounds I have been testing against and the results of the code below.  Thank you for any assistance you can provide.


 


import java.util.List;

import chemaxon.calculations.TopologyAnalyser;
import chemaxon.formats.MolFormatException;
import chemaxon.formats.MolImporter;
import chemaxon.struc.MolAtom;
import chemaxon.struc.Molecule;

// Chirality codes returned by Molecule.getChirality(int), as used in this post
private static final int CHIRALITY_R = 16;
private static final int CHIRALITY_S = 8;
private static final int UNDEFINED_PARITY = 3;

public static boolean isAmbiguous(String smiles, String letter, List columns)
        throws MolFormatException
{
    if (smiles == null || smiles.trim().length() == 0) {
        throw new IllegalArgumentException("smiles can't be blank");
    }

    // convert SMILES to a molecule and clean it in 2D
    Molecule molecule = MolImporter.importMol(smiles);
    molecule.clean(2, null);

    // use the TopologyAnalyser to count the asymmetric atoms
    TopologyAnalyser topologyPlugin = new TopologyAnalyser();
    topologyPlugin.setMolecule(molecule);
    int asymmetricAtomCount = topologyPlugin.asymmetricAtomCount();

    // determine carbon ambiguity
    return isTetrahedralAmbiguous(molecule, asymmetricAtomCount);
}

private static boolean isTetrahedralAmbiguous(Molecule molecule, int asymmetricAtomCount)
{
    if (asymmetricAtomCount == 0) {
        return false; // no asymmetric atoms: "K"
    }

    boolean undefinedParity = false;
    int rAndSCount = 0;

    for (int i = 0; i < molecule.getAtomCount(); i++) {
        MolAtom molAtom = molecule.getAtom(i);
        if ("C".equals(molAtom.getSymbol())) {
            int chirality = molecule.getChirality(i);
            if (chirality == CHIRALITY_R || chirality == CHIRALITY_S) {
                ++rAndSCount;
            } else if (chirality == UNDEFINED_PARITY) {
                undefinedParity = true;
            }
        }
    }

    // "A" only if not every asymmetric atom is R/S AND some carbon has undefined parity
    return rAndSCount != asymmetricAtomCount && undefinedParity;
}
       

ChemAxon 25dcd765a3

21-04-2010 10:59:55

Hi,


Let's discuss the examples you have attached.


1- The molecule should get "K", as all chiral atoms have a specified chirality value.


2- The molecule should get "A", as the carbon with atom index 5 doesn't have a specified chirality.


3- The molecule has no chiral center; however, it has cis/trans stereoisomerism in the ring. So the molecule itself could be chiral if the cyclohexane had wedges.


4- I guess this is the case you mentioned: "an asymmetric nitrogen AND an adamantane group".


The adamantane is a rigid ring system, so the chirality of substituted adamantane systems can be deduced. This algorithm is already in our plans.


"Here the adamantane gets flagged as having a few carbons with undefinedparity (3)"


Yes, you are right: atoms 4, 6, 8 and 10 get undefinedparity, which is a bug. These atoms should get a chirality value of 0. (This bug has no connection to the presence of the N atom.)


It should get the letter "K".


5- Atom indexes 4 and 10 should get undefinedparity; the other atoms should get 0. It should get "A".


6- It seems that it should get the letter "K".


Is this what you would expect?


Andras

User a18e201107

21-04-2010 12:54:45

Andras


Thank you very much for your reply. 


Your analysis is pretty much 100% in line with what I was expecting. I am happy to hear that you are aware of the bug with respect to structure 4. I believe that fix will solve my issue; is there any idea when it will be released?


In truth, after posting I went back and considered evaluating not just carbon atoms for chirality, and I may open it up to N, S and P as well. It seems as though the exposed functionality already handles this quite well.


I have one other question I am hoping you can help me with.  I would like to access the doubleBondStereoisomerCount through the API, but it appears as though the Stereochemical plugin does not expose this functionality. Is there something I am missing? Are there any plans to expose this more advanced functionality?


Thank you again; as always, you guys are most helpful.


Dennis


ChemAxon e08c317633

22-04-2010 09:43:43

dmoccia wrote:

I have one other question I am hoping you can help me with.  I would like to access the doubleBondStereoisomerCount through the API, but it appears as though the Stereochemical plugin does not expose this functionality. Is there something I am missing? Are there any plans to expose this more advanced functionality?



To get the doubleBondStereoisomerCount through the API set StereoisomerPlugin.setStereoisomerismType(int) to StereoisomerPlugin.DOUBLE_BOND and then call the StereoisomerPlugin.getStereoisomerCount() method.


Zsolt

ChemAxon 25dcd765a3

22-04-2010 10:43:27

Hi Dennis,


I'm currently revising/rewriting the stereochemistry recognition in Marvin.


It will (hopefully) be ready in 5.4.


Chirality recognition is also supported for atoms other than carbon.


For example, nitrogen is under normal circumstances not a stereocenter, since it is flexible enough to invert; but if the atom itself is in a ring and all its ligands are also in rings smaller than size 12, this flexibility vanishes.
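Read literally, the rule above can be sketched as follows. This is one possible reading, in which both the nitrogen and every neighbour must sit in a ring of size below 12; it is not Marvin's implementation. The ring sizes are assumed to be the smallest ring containing each atom, with 0 for an acyclic atom, and NitrogenInversionRule is a hypothetical name. Blocked inversion is only a precondition: whether the nitrogen is then a stereocenter still depends on its substituents.

```java
/**
 * Hypothetical sketch of the nitrogen-inversion rule as read from the post:
 * inversion is treated as blocked only if the N and all of its neighbours
 * lie in rings smaller than size 12.
 */
final class NitrogenInversionRule {
    static final int MAX_RING_SIZE_EXCLUSIVE = 12;

    private NitrogenInversionRule() {}

    /** Smallest ring size 0 means "not in any ring". */
    static boolean inSmallRing(int smallestRingSize) {
        return smallestRingSize > 0 && smallestRingSize < MAX_RING_SIZE_EXCLUSIVE;
    }

    /**
     * @param nitrogenRingSize smallest ring containing the nitrogen (0 if acyclic)
     * @param ligandRingSizes  smallest ring containing each neighbour (0 if acyclic)
     * @return true if, under this reading, nitrogen inversion is treated as blocked
     */
    static boolean inversionBlocked(int nitrogenRingSize, int[] ligandRingSizes) {
        if (!inSmallRing(nitrogenRingSize) || ligandRingSizes.length == 0) {
            return false;
        }
        for (int size : ligandRingSizes) {
            if (!inSmallRing(size)) {
                return false; // one flexible (acyclic or large-ring) ligand allows inversion
            }
        }
        return true;
    }
}
```

With this reading, a bridgehead nitrogen whose three neighbours all lie in six-membered rings gives true. An N-methylpiperidine-type nitrogen gives false, because the methyl neighbour is acyclic.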


All the best


Andras

User a18e201107

22-04-2010 17:24:30

Andras & Zsolt



Thank you for the responses.  I will look forward to the patch.



Following your advice on doubleBondStereoisomerCount, I was able to get at the counts. However, I have noticed some discrepancy between IJC and calling stereoDoubleBondCount() from the topology plugin.


In IJC I see


[H]\C(c1c(-c2ccccc2)n(C)c2ccccc12)=C1\C(=O)Oc2ccccc2C1=O   stereoDoubleBondCount() = 1


[H]C(c1c(-c2ccccc2)n(C)c2ccccc12)=C1C(=O)Oc2ccccc2C1=O   stereoDoubleBondCount() = 0


but when I call the count in the following code


CODE START


TopologyAnalyser topologyPlugin = new TopologyAnalyser();
topologyPlugin.setMolecule(molecule);           


int stereoDoubleBondCount = topologyPlugin.stereoDoubleBondCount();
log.info("StereoDoubleBondCount = " + stereoDoubleBondCount);



CODE END


I receive a  stereoDoubleBondCount() = 0 for both instances.  Any ideas why this might be the case?


Also, the rules for counting stereoDoubleBondCount() seem a bit odd, and I was hoping you could elaborate on them. I am not sure why the first example has a stereoDoubleBondCount() of 0 while IJC calculates 2 double bond stereoisomers for it.


In IJC


Cn1c2ccc3ccccc3c2s\c1=N\C(=O)c1ccc(cc1)S(=O)(=O)N1CCCCC1


doublebondStereoisomerCount() = 2

stereoDoubleBondCount() = 0


 


C\C(=N\C(=O)c1ccc(cc1)S(=O)(=O)N1CCCCC1)c1ccc2ccccc2c1


doublebondStereoisomerCount() = 2


stereoDoubleBondCount() = 1


 


Using the topology plugin, and the following code (below) for the double bond stereoisomer count, I get:


Cn1c2ccc3ccccc3c2s\c1=N\C(=O)c1ccc(cc1)S(=O)(=O)N1CCCCC1


doublebondStereoisomerCount() = 1



stereoDoubleBondCount() = 0


C\C(=N\C(=O)c1ccc(cc1)S(=O)(=O)N1CCCCC1)c1ccc2ccccc2c1


doublebondStereoisomerCount() = 1


stereoDoubleBondCount() = 1


 


CODE START


StereoisomerPlugin isomerPlugin = new StereoisomerPlugin();
// set input molecule
isomerPlugin.setMolecule(molecule);

// set plugin parameters
isomerPlugin.setStereoisomerismType(2);
isomerPlugin.setCheck3DStereo(true);
isomerPlugin.setIn3D(true);
log.info("here");
// run plugin
isomerPlugin.run();
// get count
int stereoisomerCount = isomerPlugin.getStereoisomerCount();


CODE END


I can provide more examples if you wish,


Thank you again for all your help


Dennis


 

ChemAxon e08c317633

23-04-2010 16:39:13

dmoccia wrote:


Following your advice on doubleBondStereoisomerCount, I was able to get at the counts. However, I have noticed some discrepancy between IJC and calling stereoDoubleBondCount() from the topology plugin.


In IJC I see


[H]\C(c1c(-c2ccccc2)n(C)c2ccccc12)=C1\C(=O)Oc2ccccc2C1=O   stereoDoubleBondCount() = 1


[H]C(c1c(-c2ccccc2)n(C)c2ccccc12)=C1C(=O)Oc2ccccc2C1=O   stereoDoubleBondCount() = 0


but when I call the count in the following code


CODE START


TopologyAnalyser topologyPlugin = new TopologyAnalyser();
topologyPlugin.setMolecule(molecule);           


int stereoDoubleBondCount = topologyPlugin.stereoDoubleBondCount();
log.info("StereoDoubleBondCount = " + stereoDoubleBondCount);



CODE END


I receive a  stereoDoubleBondCount() = 0 for both instances.  Any ideas why this might be the case?



Dennis, I get the same results with the API (Marvin 5.3.2) as in IJC.


My code:


import chemaxon.calculations.TopologyAnalyser;
import chemaxon.formats.MolImporter;

public class TopologyAnalyserTest {

    private static final String[] MOLS = new String[] {
        "[H]\\C(c1c(-c2ccccc2)n(C)c2ccccc12)=C1\\C(=O)Oc2ccccc2C1=O",
        "[H]C(c1c(-c2ccccc2)n(C)c2ccccc12)=C1C(=O)Oc2ccccc2C1=O"
    };

    public static void main(String[] args) throws Exception {
        TopologyAnalyser topologyPlugin = new TopologyAnalyser();
        topologyPlugin.setMolecule(MolImporter.importMol(MOLS[0]));
        System.out.println("1. StereoDoubleBondCount = " + topologyPlugin.stereoDoubleBondCount());
        topologyPlugin.setMolecule(MolImporter.importMol(MOLS[1]));
        System.out.println("2. StereoDoubleBondCount = " + topologyPlugin.stereoDoubleBondCount());
    }
}

The output:


1. StereoDoubleBondCount = 1
2. StereoDoubleBondCount = 0

Please attach your whole Java code so we can examine it.

dmoccia wrote:

Also, the rules for counting stereoDoubleBondCount() seem a bit odd, and I was hoping you could elaborate on them. I am not sure why the first example has a stereoDoubleBondCount() of 0 while IJC calculates 2 double bond stereoisomers for it.


In IJC


Cn1c2ccc3ccccc3c2s\c1=N\C(=O)c1ccc(cc1)S(=O)(=O)N1CCCCC1


doublebondStereoisomerCount() = 2

stereoDoubleBondCount() = 0


 


C\C(=N\C(=O)c1ccc(cc1)S(=O)(=O)N1CCCCC1)c1ccc2ccccc2c1


doublebondStereoisomerCount() = 2


stereoDoubleBondCount() = 1



All values except stereoDoubleBondCount() for the molecule "Cn1c2ccc3ccccc3c2s\c1=N\C(=O)c1ccc(cc1)S(=O)(=O)N1CCCCC1" seem to be OK. In this case, TopologyAnalyser cannot identify the double bond between the N and the aromatic ring carbon as a cis or trans double bond. We are working on a fix.

dmoccia wrote:

Using the topology plugin, and the following code (below) for the double bond stereoisomer count, I get:


Cn1c2ccc3ccccc3c2s\c1=N\C(=O)c1ccc(cc1)S(=O)(=O)N1CCCCC1


doublebondStereoisomerCount() = 1



stereoDoubleBondCount() = 0


C\C(=N\C(=O)c1ccc(cc1)S(=O)(=O)N1CCCCC1)c1ccc2ccccc2c1


doublebondStereoisomerCount() = 1


stereoDoubleBondCount() = 1


 


CODE START


StereoisomerPlugin isomerPlugin = new StereoisomerPlugin();
// set input molecule
isomerPlugin.setMolecule(molecule);

// set plugin parameters
isomerPlugin.setStereoisomerismType(2);
isomerPlugin.setCheck3DStereo(true);
isomerPlugin.setIn3D(true);
log.info("here");
// run plugin
isomerPlugin.run();
// get count
int stereoisomerCount = isomerPlugin.getStereoisomerCount();


CODE END



There is a bug in the StereoisomerPlugin API: doublebondStereoisomerCount() should be 2 in both cases. MarvinSketch, cxcalc, Chemical Terms and IJC are not affected by this bug. We will fix it.


Thanks for the detailed bug report.
Zsolt

ChemAxon e08c317633

28-04-2010 07:47:22


We identified the bug: by default, StereoisomerPlugin.setProtectDoubleBondStereo(boolean) is set to true, so double bonds with a specified cis or trans configuration are not allowed to change their stereo configuration. This will be fixed in Marvin 5.3.3, where the default will be false, as specified in the javadoc. Until then, please insert the line


isomerPlugin.setProtectDoubleBondStereo(false);

into your code, and the API will return the same result as IJC. Code:


StereoisomerPlugin isomerPlugin = new StereoisomerPlugin();
//set input molecule
isomerPlugin.setMolecule(molecule);
        
//set plugin parameters
isomerPlugin.setStereoisomerismType(2);
isomerPlugin.setProtectDoubleBondStereo(false);
isomerPlugin.setCheck3DStereo(true);              
isomerPlugin.setIn3D(true);  
log.info("here");
//run plugin
isomerPlugin.run();
//get count
int stereoisomerCount = isomerPlugin.getStereoisomerCount();
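A back-of-the-envelope model helps explain the 1 versus 2 results. Each stereogenic double bond that the enumeration is free to flip doubles the number of double bond stereoisomers, while a protected bond keeps its specified configuration. The sketch assumes independent double bonds and ignores symmetry, so it gives only an upper bound. DoubleBondIsomerEstimate is a hypothetical helper and does not reflect how StereoisomerPlugin works internally.

```java
/**
 * Hypothetical upper-bound estimate of double bond stereoisomers:
 * 2^(stereogenic double bonds that may flip). Protected bonds keep their
 * specified cis/trans configuration and therefore do not multiply the count.
 * Symmetry and coupled double bonds can make the real count smaller.
 */
final class DoubleBondIsomerEstimate {
    private DoubleBondIsomerEstimate() {}

    static long upperBound(int stereoDoubleBonds, int protectedBonds) {
        if (stereoDoubleBonds < 0 || protectedBonds < 0 || protectedBonds > stereoDoubleBonds) {
            throw new IllegalArgumentException("need 0 <= protectedBonds <= stereoDoubleBonds");
        }
        int free = stereoDoubleBonds - protectedBonds;
        if (free > 62) {
            throw new ArithmeticException("count does not fit in a long");
        }
        return 1L << free; // each freely enumerated double bond contributes cis and trans
    }
}
```

Each example molecule has a single specified C=N double bond. Enumerating it freely gives upperBound(1, 0) = 2, the IJC value. Protecting that specified bond gives upperBound(1, 1) = 1, the value the API returned before setProtectDoubleBondStereo(false) was added.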

Regards,
Zsolt

User a18e201107

28-04-2010 10:54:35

Zsolt


Thank you for the update; I will make that change. I still owe you my code using the Topology plugin; I will post it later today.


 


Dennis