Exceptions thrown by standardizer

User 07c4f121e4

13-07-2009 15:35:50

Hi,


We're currently working to standardize all the compounds in our database and have found several hundred which cause the standardizer to throw an exception.  We're using JChem 5.2.2 and the config parameters "neutralize..tautomerize..aromatize:b..removeexplicitH", but I've also tested a number with 5.2.3 and get the same errors.  A couple of examples are:



standardize --config neutralize..tautomerize..aromatize:b..removeexplicitH "B1(C2=C(C=NN1S(=O)(=O)CCC)SC(=C2)C)O"
Exception in thread "main" chemaxon.marvin.io.MolExportException: The following atom cannot be aromatic according to the SMILES definition: B
at chemaxon.marvin.io.formats.smiles.SmilesExport.generateSmilesString(SmilesExport.java:1989)
at chemaxon.marvin.io.formats.smiles.SmilesExport.singleMolToSMILES(SmilesExport.java:849)
at chemaxon.marvin.io.formats.smiles.SmilesExport.toSMILES(SmilesExport.java:709)
at chemaxon.marvin.io.formats.smiles.SmilesExport.convert(SmilesExport.java:578)
at chemaxon.formats.MolExporter.write(MolExporter.java:383)
at chemaxon.util.ConfigUtils.writeMol(ConfigUtils.java:596)
at chemaxon.reaction.ConcurrentStandardizerProcessor.run(ConcurrentStandardizerProcessor.java:561)
at chemaxon.reaction.ConcurrentStandardizerProcessor.main(ConcurrentStandardizerProcessor.java:665)



and



standardize --config neutralize..tautomerize..aromatize:b..removeexplicitH "[H+].[H+].CC1=C(C2=CC3=C(C(=C([N-]3)C=C4C(=C(C(=N4)C=C5C(=C(C(=N5)C=C1[N-]2)C=C)C)C=C)C)C)CCC(=O)[O-])CCC(=O)[O-].CN1C=CN=C1.[Fe+2]"
chemaxon.reaction.StandardizerException: Concurrent processing error.
Caused by:
java.lang.ArrayIndexOutOfBoundsException: 87
Caused by:
87
chemaxon.reaction.StandardizerException: Concurrent processing error.
Caused by:
java.lang.ArrayIndexOutOfBoundsException: 87
Caused by:
87
at chemaxon.reaction.ConcurrentStandardizerProcessor.standardize(ConcurrentStandardizerProcessor.java:394)
at chemaxon.reaction.ConcurrentStandardizerProcessor.run(ConcurrentStandardizerProcessor.java:560)
at chemaxon.reaction.ConcurrentStandardizerProcessor.main(ConcurrentStandardizerProcessor.java:665)
Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 87
at chemaxon.util.concurrent.processors.WorkUnitData.getResult(WorkUnitData.java:65)
at chemaxon.util.concurrent.processors.ScheduledWorkUnitData.getResult(ScheduledWorkUnitData.java:53)
at chemaxon.util.concurrent.processors.WorkUnitDataIterator.getNext(WorkUnitDataIterator.java:74)
at chemaxon.reaction.ConcurrentStandardizerProcessor.standardize(ConcurrentStandardizerProcessor.java:383)
... 2 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 87
at chemaxon.calculations.Tautomerization.initCalc(Tautomerization.java:939)
at chemaxon.calculations.Tautomerization.calculateDACouples(Tautomerization.java:8191)
at chemaxon.calculations.Tautomerization.createDACouples(Tautomerization.java:8183)
at chemaxon.calculations.CanonicTautomer.calcCanonicalTautomer(CanonicTautomer.java:318)
at chemaxon.calculations.Tautomerization.createCanonicTautomer(Tautomerization.java:421)
at chemaxon.marvin.calculations.TautomerizationPlugin.run(TautomerizationPlugin.java:644)
at chemaxon.reaction.Standardizer.performTautomerize(Standardizer.java:1943)
at chemaxon.reaction.Standardizer.standardizeComponent(Standardizer.java:2146)
at chemaxon.reaction.Standardizer.standardize(Standardizer.java:2252)
at chemaxon.reaction.ConcurrentStandardizerProcessor$StandardizerWorkUnit.call(ConcurrentStandardizerProcessor.java:261)
at chemaxon.util.concurrent.processors.WorkUnitProcessorBase.process(WorkUnitProcessorBase.java:200)
at chemaxon.util.concurrent.processors.WorkUnitProcessorBase$Worker.call(WorkUnitProcessorBase.java:377)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)



These seem to be the only two types of exception we're getting, though the reported atom and array index vary.  Dropping the aromatize option eliminates the first error, but I would expect the standardizer code to handle this automatically.
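
With several hundred failures, it may help to sort them by type before deciding what to do with each. A minimal sketch in Python, assuming the console output of each failed run has been captured as text; the function name and the category labels are hypothetical, and the matching relies only on the exception names that appear in the traces above.

```python
def classify_failure(console_text: str) -> str:
    """Return a coarse label for the console output of a failed run.

    The labels are illustrative only; they are not part of JChem.
    """
    # First error type: the structure was standardized, but SMILES export failed.
    if "MolExportException" in console_text and "cannot be aromatic" in console_text:
        return "smiles-export"
    # Second error type: the standardizer itself failed with an index error.
    if "ArrayIndexOutOfBoundsException" in console_text:
        return "array-index"
    return "other"
```

Counting the labels over all failed compounds would confirm whether any third type of exception is hiding in the batch.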

ChemAxon e08c317633

13-07-2009 16:46:47

Hi,


Please use the --ignore-error command line option to ignore the SMILES export exception.


"If the command line parameter --ignore-error is specified, then import/export errors will not stop the processing but the error is written to the console and the molecule is skipped. By default, the program exits in case of molecule import/export errors."


$ standardize --ignore-error --config aromatize:b "B1(C2=C(C=NN1S(=O)(=O)CCC)SC(=C2)C)O" 


 


The second exception (ArrayIndexOutOfBoundsException) is caused by a bug; we will fix it in JChem 5.2.4.


Zsolt

ChemAxon e08c317633

13-07-2009 17:33:55

It seems that tautomerize throws the ArrayIndexOutOfBoundsException only when a multifragment molecule containing H+ ions is neutralized before the tautomerize step.


Workaround (until we fix the error): start with tautomerize.


$ standardize --config tautomerize..neutralize..aromatize:b..removeexplicitH "[H+].[H+].CC1=C(C2=CC3=C(C(=C([N-]3)C=C4C(=C(C(=N4)C=C5C(=C(C(=N5)C=C1[N-]2)C=C)C)C=C)C)C)CCC(=O)[O-])CCC(=O)[O-].CN1C=CN=C1.[Fe+2]"
[H+].[H+].[Fe++].Cn1ccnc1.Cc1c(CCC(O)=O)c2cc3nc(cc4nc(cc5nc(cc1n2)c(C)c5C=C)c(C)c4C=C)c(C)c3CCC(O)=O
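
In a batch run, the choice between the two step orders could be made per input. A minimal sketch in Python, assuming the failure is limited to multifragment SMILES that contain [H+] ions as described above; the helper name choose_config is hypothetical, and splitting on "." is a simplification that treats every dot as a fragment separator.

```python
# Step orders taken from the commands in this thread.
DEFAULT_CONFIG = "neutralize..tautomerize..aromatize:b..removeexplicitH"
WORKAROUND_CONFIG = "tautomerize..neutralize..aromatize:b..removeexplicitH"


def choose_config(smiles: str) -> str:
    """Pick the step order for one input SMILES string."""
    fragments = smiles.split(".")
    # Multifragment input with at least one bare proton: tautomerize first.
    if len(fragments) > 1 and "[H+]" in fragments:
        return WORKAROUND_CONFIG
    return DEFAULT_CONFIG
```

Whether the reordered steps give the same standard form as the original order for unaffected molecules is not established here, so applying the workaround only to affected inputs seems the more cautious option.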


I hope this helps.


Zsolt

User 07c4f121e4

14-07-2009 08:07:04

Thanks, but the exception itself isn't really causing me an issue - the problem is that I'm not getting a standardized form out for this compound (the whole point of doing this is to get a standard form out for every compound).  If the atom cannot be aromatic, then why is the standardizer aromatizing it in the first place?  Does this mean that the aromatize stage should just be a no-op (in which case I can catch this exception and rerun without the aromatize stage), or should some level of aromatization actually take place (forgive my vagueness, but I'm not a chemist)?


Thanks again,


Robin

User 07c4f121e4

14-07-2009 08:11:51

Okay, thanks. I'll switch the config around when we hit this exception.


Cheers,


Robin

ChemAxon e08c317633

14-07-2009 11:30:12

There are several aromatization methods (we have three; see Aromaticity Detection), and some of them also aromatize boron. The aromatic molecule created by our basic aromatization method is correct, but it cannot be saved in SMILES format because of limitations of that format. Please consider using another format for such molecules.
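
One way to act on this in a script is to try SMILES output first and fall back to mol output only when that fails. A minimal sketch in Python, under two assumptions that are not confirmed in this thread: that a failed export shows up as a non-zero exit code or empty standard output, and that the -f mol option (used later in this thread) can simply be appended to the command. The function names are hypothetical.

```python
import subprocess
from typing import Callable, List, Tuple

CONFIG = "neutralize..tautomerize..aromatize:b..removeexplicitH"


def run_cli(args: List[str]) -> Tuple[int, str]:
    """Run a command and return (exit code, standard output)."""
    proc = subprocess.run(args, capture_output=True, text=True)
    return proc.returncode, proc.stdout


def standardize_with_fallback(
    smiles: str,
    run: Callable[[List[str]], Tuple[int, str]] = run_cli,
) -> Tuple[str, str]:
    """Return (format, text), preferring SMILES and falling back to mol."""
    base = ["standardize", "--config", CONFIG, smiles]
    code, out = run(base)
    # Assumption: success means exit code 0 and non-empty output.
    if code == 0 and out.strip():
        return "smiles", out.strip()
    # SMILES export failed; ask for mol output instead.
    code, out = run(base + ["-f", "mol"])
    if code == 0 and out.strip():
        return "mol", out
    raise RuntimeError("standardize failed for both output formats: " + smiles)
```

The run parameter exists only so the logic can be exercised without JChem installed; in normal use it is left at its default.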


In JChem 5.3 we will introduce a new product, called Structure Checker, that can automatically fix file format related issues.


Regards,


Zsolt

User 07c4f121e4

14-07-2009 13:31:16

Okay, thanks. I've checked, and it is written out as a mol file without any errors.  What I find odd is that I can then take that mol file, load it into Marvin, and save it as a SMILES structure without any errors.  Why is this, and can I do the same thing via the API?

ChemAxon e08c317633

14-07-2009 16:53:54

I will forward this issue to the developer who is working on SMILES export; he will answer in a few days.


It's strange: I cannot convert the created molfile to SMILES with molconvert.


$ standardize --config neutralize..tautomerize..aromatize:b..removeexplicitH "B1(C2=C(C=NN1S(=O)(=O)CCC)SC(=C2)C)O" -f mol | molconvert smiles
40: cannot convert molecule to smiles: The following atom cannot be aromatic according to the SMILES definition: B


Zsolt

User 07c4f121e4

15-07-2009 09:47:25

Many thanks.  Just to provide some more info: when I referred to Marvin, I meant the MarvinSketch applet (v5.1.2), and the resultant SMILES was "CCCS(=O)(=O)n1ncc2sc(C)cc21O".


Cheers,


Robin

ChemAxon 25dcd765a3

15-07-2009 12:37:40

Hi,


As Zsolt pointed out, SMILES has some limitations. One of them is that it is not possible to describe aromatic boron compounds.


From the SMILES definition:


" Only atoms on the following list can be considered aromatic:
C, N, O, P, S, As, Se, and * (wildcard)."
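
The list above translates directly into a pre-export check. A minimal sketch in Python, assuming the element symbols of the atoms flagged as aromatic are already available from whatever toolkit is in use; the function name is hypothetical and no JChem call is shown.

```python
# Elements that may be aromatic according to the SMILES definition quoted above.
SMILES_AROMATIC_ELEMENTS = {"C", "N", "O", "P", "S", "As", "Se", "*"}


def unsupported_aromatic_atoms(aromatic_symbols):
    """Return the sorted element symbols that SMILES cannot write as aromatic."""
    return sorted({s for s in aromatic_symbols if s not in SMILES_AROMATIC_ELEMENTS})
```

For the boron compound in this thread, a list of aromatic symbols that includes "B" would return ["B"], which signals that another output format is needed.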


 


I suggest using another format. For example, you can use the cxsmiles format, which would generate:


molconvert cxsmiles:a_bas -s "B1(C2=C(C=NN1S(=O)(=O)CCC)SC(=C2)C)O"
CCCS(=O)(=O)n1ncc2sc(C)cc21O


As you can see, the generated string is not truly SMILES compliant, but we can import it.


Andras

User 07c4f121e4

16-07-2009 09:07:42

Thanks, I'm all clear on this now.  We'll have a ponder over which approach to take with these compounds.


We've now finished running everything through the standardizer, and have found another case which is causing an exception:



standardize --config tautomerize..neutralize..aromatize:b..removeexplicitH "COC1=C(C=C(C=C1)Cl)C(=O)NCCC2=CC=C(C=C2)S(=O)(=O)NC(=O)NC3CCCC(C3)O"
chemaxon.reaction.StandardizerException: Concurrent processing error.
Caused by:
java.lang.ArrayIndexOutOfBoundsException: -1
Caused by:
-1
chemaxon.reaction.StandardizerException: Concurrent processing error.
Caused by:
java.lang.ArrayIndexOutOfBoundsException: -1
Caused by:
-1
at chemaxon.reaction.ConcurrentStandardizerProcessor.standardize(ConcurrentStandardizerProcessor.java:394)
at chemaxon.reaction.ConcurrentStandardizerProcessor.run(ConcurrentStandardizerProcessor.java:560)
at chemaxon.reaction.ConcurrentStandardizerProcessor.main(ConcurrentStandardizerProcessor.java:665)
Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: -1
at chemaxon.util.concurrent.processors.WorkUnitData.getResult(WorkUnitData.java:65)
at chemaxon.util.concurrent.processors.ScheduledWorkUnitData.getResult(ScheduledWorkUnitData.java:53)
at chemaxon.util.concurrent.processors.WorkUnitDataIterator.getNext(WorkUnitDataIterator.java:74)
at chemaxon.reaction.ConcurrentStandardizerProcessor.standardize(ConcurrentStandardizerProcessor.java:383)
... 2 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: -1
at chemaxon.calculations.PolarGroups.setAtomicGroups(PolarGroups.java:1193)
at chemaxon.calculations.PolarGroups.setGroups(PolarGroups.java:1153)
at chemaxon.calculations.Charge.setCarboxylGroup(Charge.java:5622)
at chemaxon.calculations.Charge.initChargeCalc(Charge.java:436)
at chemaxon.calculations.Charge.calcCharges(Charge.java:728)
at chemaxon.calculations.CanonicTautomer.setScoreOfTautomers(CanonicTautomer.java:421)
at chemaxon.calculations.CanonicTautomer.calcCanonicalTautomer(CanonicTautomer.java:333)
at chemaxon.calculations.Tautomerization.createCanonicTautomer(Tautomerization.java:421)
at chemaxon.marvin.calculations.TautomerizationPlugin.run(TautomerizationPlugin.java:644)
at chemaxon.reaction.Standardizer.performTautomerize(Standardizer.java:1943)
at chemaxon.reaction.Standardizer.standardizeComponent(Standardizer.java:2146)
at chemaxon.reaction.Standardizer.standardize(Standardizer.java:2252)
at chemaxon.reaction.ConcurrentStandardizerProcessor$StandardizerWorkUnit.call(ConcurrentStandardizerProcessor.java:261)
at chemaxon.util.concurrent.processors.WorkUnitProcessorBase.process(WorkUnitProcessorBase.java:200)
at chemaxon.util.concurrent.processors.WorkUnitProcessorBase$Worker.call(WorkUnitProcessorBase.java:377)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)



Thanks,


Robin

ChemAxon e08c317633

16-07-2009 11:15:33

It's a bug in tautomerize; we will fix it. Thanks for reporting, and sorry for the inconvenience.


Zsolt