Error performing search on standardized structures

User 818520b6b8

02-01-2006 08:10:38

Hi,





I have two copies of the same structure table, one with default standardization and the second one with custom standardization.





Some searches print warnings to system output and throw an exception when run against the custom-standardized table, but work fine when run against the default-standardized table.





The warnings at system output are the following:


WARNING: Chiral center has wrong connectivity at atom 155 in smiles: [H]N[C@@H](C)C(=O)N[C@@H](C(C)O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)NC(CCCNC(N)=N)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@H](CC2=CC=C(O)C=C2)[C@@H](=O)N[C@@H](CC(C)C)C(=O)NCC(=O)N[C@@H](C)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC3=CNC4=C3C=CC=C4)C(=O)N[C@@H](CC(O)=O)C(=O)NC(CC5=CC=C(O)C=C5)C(=O)N[C@@H](CCSC)C(=O)[N@H]C(CCC(N)=[O@@])C(O)=O |@:25,48,78,129,149|


The chirality is ignored.


WARNING: Chiral center has wrong connectivity at atom 36 in smiles: [H]NC(C(C)CC)C(=O)N[C@@H](CC(C)C)C(=O)N1CCC[C@H]1C(=O)N[C@H](CC2=CNC3=C2C=CC=[C@H]3)C(=O)NC(CCCCN)C(=O)N[C@@H]([C@@H2]C4=CN[C@@H]5=C4C=CC=C5)C(=O)N6CCC[C@H]6C(=O)NC(CC7=C[N@@H]C8=C7C=CC=C8)C(=O)NC(CC9=CNC%10=[C@@H]9C=CC=C%10)C(=O)N%11CCC[C@H]%11C(=O)NC(CC%12=CNC%13=[C@H]%12C=CC=C%13)C(=O)N[C@@H](CCCNC(N)=N)C(=O)NC(CC[C@@H2]NC(N)=N)C(O)=O |@:2,3,25,39,69,83,104,129|


The chirality is ignored.


WARNING: Chiral center has wrong connectivity at atom 145 in smiles: [H]N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CC1=CNC2=C1C=CC=C2)C(=O)N[C@@H](CS[H])C(=O)N[C@@H](CC3=CC=C(O)C=C3)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCCCN)C(=O)N[C@H](CCCCN)C(=O)N4CCC[C@@H]4C(=O)N[C@@H](CC5=CC=C(O)C=C5)C(=O)N[C@@H](CCCNC(N)=N)C(=O)N[C@@H](CCCNC(N)=O)C(=O)N[C@@H](CS[H])C(=O)NC(CCCNC(N)=N)C(O)=[O@@] |@:89,134|


The chirality is ignored.


WARNING: Chiral center has wrong connectivity at atom 57 in smiles: [H]N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(=O)NC(CC1=CNC2=C1C=CC=C2)C(=O)[N@@H][C@@H](CCCCN)C(=O)N[C@@H](C(C)O)C(=O)[N@@H][C@@H](CC(C)C)C(=O)NC(CC([C@@H3])C)C(=O)N[C@@H](CCCCN)C(=O)NC(C[C@@H2]CCN)C(=O)N[C@@H](C(C)C)C(=O)NC(CC(C)C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](C)C(=O)N[C@@H](C)C(=O)N[C@@H](C)C(=O)N[C@@H](CCCCN)C(=O)NC([C@@H3])C(=O)N[C@@H](C)C(=O)NC([C@H2]C(C)C)C(=O)N[C@@H](CCCCN)C(=O)NC([C@@H3])C(=O)NC(C(C)C)C(=O)NC(CC([C@@H3])C)[C@H](=O)N[C@@H](C(C)C)C(=O)NCC(=O)N[C@@H](C)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](C)C(O)=O |@:15,53,70,80,86,127,137,154,159,160,166,175|


The chirality is ignored.


WARNING: Chiral center has wrong connectivity at atom 170 in smiles: [H]N[C@@H](C)C(=O)N[C@@H](CC(C)C)C(=O)NC(CC1=CNC2=C1C=CC=C2)C(=O)[N@@H][C@@H](CCCCN)C(=O)N[C@@H](C(C)O)C(=O)[N@@H][C@@H](CC(C)C)C(=O)NC(CC([C@@H3])C)C(=O)N[C@@H](CCCCN)C(=O)NC(C[C@@H2]CCN)C(=O)N[C@@H](C(C)C)C(=O)NC(CC(C)C)C(=O)N[C@@H](CCCCN)C(=O)N[C@@H](C)C(=O)N[C@@H](C)C(=O)N[C@@H](C)C(=O)N[C@@H](CCCCN)C(=O)NC([C@@H3])C(=O)N[C@@H](C)C(=O)NC([C@H2]C(C)C)C(=O)N[C@@H](CCCCN)C(=O)NC([C@@H3])C(=O)NC(C(C)C)C(=O)NC(CC([C@@H3])C)[C@H](=O)N[C@@H](C(C)C)C(=O)NCC(=O)N[C@@H](C)C(=O)N[C@@H](CC(N)=O)C(=O)N[C@@H](C)C(O)=O |@:15,53,70,80,86,127,137,154,159,160,166,175|


The chirality is ignored.


WARNING: Chiral center has wrong connectivity at atom 29 in smiles: CC(C)CC(NC(=O)[C@H](CCCCN)NC(=O)[C@@H](NC(=O)C(CC(C)C)[N@H][C@H](=[O@])C1CCCN1C(=O)C(CC(O)=O)NC(=O)C(CC2=CC=C(O)C=C2)[N@H]C(=O)[C@@H](CC3=CNC4=C3C=[C@H]C=C4)NC(=O)[C@H](CC(N)=O)NC(=O)C(C)NC(C)=O)C(C)O)[C@@H](=[O@@])NC(CC5=CNC6=C5C=CC=C6)C(=O)N[C@@H](CC(C)C)C(=O)N7CCC[C@H]7C(=O)N[C@@H](CC(O)=O)C(=O)NCC(=O)NCC(=O)NC(CCCCN)[C@H](O)=O |@:90,135,@@:4,21,29,36,44,56,78|


The chirality is ignored.


WARNING: Chiral center has wrong connectivity at atom 89 in smiles: CC(C)CC(NC(=O)[C@H](CCCCN)NC(=O)[C@@H](NC(=O)C(CC(C)C)[N@H][C@H](=[O@])C1CCCN1C(=O)C(CC(O)=O)NC(=O)C(CC2=CC=C(O)C=C2)[N@H]C(=O)[C@@H](CC3=CNC4=C3C=[C@H]C=C4)NC(=O)[C@H](CC(N)=O)NC(=O)C(C)NC(C)=O)C(C)O)[C@@H](=[O@@])NC(CC5=CNC6=C5C=CC=C6)C(=O)N[C@@H](CC(C)C)C(=O)N7CCC[C@H]7C(=O)N[C@@H](CC(O)=O)C(=O)NCC(=O)NCC(=O)NC(CCCCN)[C@H](O)=O |@:90,135,@@:4,21,29,36,44,56,78|


The chirality is ignored.





The exception is the following:


java.lang.ArrayIndexOutOfBoundsException: -1
    at java.util.Vector.elementAt(Vector.java:434)
    at chemaxon.struc.Molecule.addChildSgroupClonesRecursively(Molecule.java:648)
    at chemaxon.struc.Molecule.clonecopySgroups(Molecule.java:725)
    at chemaxon.struc.Molecule.clonecopy(Molecule.java:626)
    at chemaxon.struc.Molecule.cloneMolecule(Molecule.java:767)
    at chemaxon.struc.Molecule.clone(Molecule.java:776)
    at chemaxon.sss.search.StructureSearch.setTarget(StructureSearch.java:705)
    at chemaxon.sss.search.StructureSearch.setTarget(StructureSearch.java:727)
    at chemaxon.sss.search.MolSearch.initSearch(MolSearch.java:1126)
    at chemaxon.sss.search.MolSearch.isMatching(MolSearch.java:719)
    at chemaxon.jchem.db.JChemSearch.isMatching(JChemSearch.java:3491)
    at chemaxon.jchem.db.JChemSearch.retrieveBatchAndSearch(JChemSearch.java:3437)
    at chemaxon.jchem.db.JChemSearch.searchInDB(JChemSearch.java:3319)
    at chemaxon.jchem.db.JChemSearch.search1(JChemSearch.java:2106)
    at chemaxon.jchem.db.JChemSearch.search(JChemSearch.java:1897)
    at chemaxon.jchem.db.JChemSearch.setRunning(JChemSearch.java:1784)
    at chemaxon.jchem.db.JChemSearch.run(JChemSearch.java:1804)
    ...........





I used a simple custom standardization file to standardize structures. I'm attaching it.





This table I'm working on is a smaller copy of our production table.





Many thanks.

ChemAxon a3d59b832c

02-01-2006 08:32:51

Hello,





Thanks for the bug reports. To investigate the problems further, we need the content of the cd_structure column for the molecules involved.





Could you provide them, please?


Which version do you currently use?





One remark about the custom standardization configuration: it lacks aromatization, which is essential for structure searching to work properly. See:





http://www.chemaxon.com/jchem/doc/user/Query.html#noteonaromatic





Best regards,


Szabolcs

User 818520b6b8

02-01-2006 08:53:23

Hi,





I cannot understand you. I'm using JChemSearch class to perform searches, so I imagine structures are aromatized before searching. And I'm searching on a standardized table.





Where should I set aromatization?





I'm using JChem 3.1.3





If you provide me with an FTP account on one of your servers, I can send you an Oracle export of my structure table. I cannot identify the structures that cause the error, but I can send you the table and the structure I'm using to perform the search.

ChemAxon a3d59b832c

02-01-2006 10:56:05

prous wrote:
I cannot understand you. I'm using JChemSearch class to perform searches, so I imagine structures are aromatized before searching. And I'm searching on a standardized table.
That is only true with the default standardization. Once you provide a custom configuration, only the actions listed in it are performed.
prous wrote:
Where should I set aromatization?
You should put it as the first action:


Code:
<StandardizerConfiguration Version="0.1">
    <Actions>
        <Aromatize ID="aromatize"/>
        <Removal ID="removal" Method="keepLargest" Measure="atomCount"/>
        <ClearStereo ID="clearstereo"/>
    </Actions>
</StandardizerConfiguration>






I will send you the FTP server details by mail.





Szabolcs

ChemAxon d76e6e95eb

02-01-2006 11:57:12

Just a note: it is even simpler if you leave out the IDs. The default measure for the Removal action is atomCount and the default method is keepLargest, so you can leave both of those out as well. The resulting XML is very simple:





Code:
<StandardizerConfiguration>
    <Actions>
        <Aromatize/>
        <Removal/>
        <ClearStereo/>
    </Actions>
</StandardizerConfiguration>
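
As a quick sanity check before standardizing a table with a custom configuration, one could verify that Aromatize is the first action listed. The following is only a sketch using the JDK's built-in XML parser: the class and method names are hypothetical and not part of JChem, and it assumes the configuration layout shown in the examples above (action elements directly under a single Actions element).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper (not a JChem class): inspects a Standardizer configuration. */
final class StandardizerConfigCheck {

    /** Returns the element names found directly under the first Actions element, in order. */
    static List<String> actionNames(String configXml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(configXml)));
            List<String> names = new ArrayList<>();
            NodeList actionBlocks = doc.getElementsByTagName("Actions");
            if (actionBlocks.getLength() == 0) {
                return names; // no Actions element: nothing would be performed
            }
            NodeList children = actionBlocks.item(0).getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node child = children.item(i);
                // Skip whitespace text nodes and comments; keep only action elements.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    names.add(child.getNodeName());
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    /** True when the first listed action is Aromatize, as recommended in this thread. */
    static boolean aromatizesFirst(String configXml) {
        List<String> names = actionNames(configXml);
        return !names.isEmpty() && names.get(0).equals("Aromatize");
    }
}
```

Calling `StandardizerConfigCheck.aromatizesFirst(xml)` on the original configuration (Removal and ClearStereo only) would return false, flagging the missing aromatization before any search is run.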

ChemAxon a3d59b832c

03-01-2006 17:15:47

Hi Gerard,





I got the file, thanks. I have identified the molecule causing the exception, and my colleague is investigating the problem now.

However, I could not reproduce the warnings. It now seems to me that you can ignore them: they do not cause errors in structure searching or in data representation. Just to double-check, could you send me the contents of the cd_structure column for the rows where the cd_smiles column equals a SMILES string reported in the warnings? (It seems that they were not in mol format.)





You should use SQL statements like the following:





Code:
select cd_structure from <table> where cd_smiles = '[H]N[C@@H](C)C(=O)N[C@@H](C(C)O)C(=O)N[C@@H](CCCNC(N)=N)C(=O)NC(CCCNC(N)=N)C(=O)N[C@@H](CC1=CC=C(O)C=C1)C(=O)N[C@H](CC2=CC=C(O)C=C2)[C@@H](=O)N[C@@H](CC(C)C)C(=O)NCC(=O)N[C@@H](C)C(=O)N[C@@H](C(C)C)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CC(C)C)C(=O)N[C@@H](CO)C(=O)N[C@@H](CC3=CNC4=C3C=CC=C4)C(=O)N[C@@H](CC(O)=O)C(=O)NC(CC5=CC=C(O)C=C5)C(=O)N[C@@H](CCSC)C(=O)[N@H]C(CCC(N)=[O@@])C(O)=O |@:25,48,78,129,149|'
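
Since several warnings repeat the same SMILES string, it may be easier to collect the distinct strings from the log and bind each one as a query parameter instead of pasting them into SQL by hand. The sketch below uses only standard JDBC; the class and method names are hypothetical, and it assumes that cd_structure can be read as bytes and that the warning lines have exactly the format shown earlier in this thread.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not a JChem class): looks up structures named in chirality warnings. */
final class WarningLookup {
    private static final String MARKER = " in smiles: ";

    /** Collects the distinct SMILES strings from warning lines, keeping first-seen order. */
    static Set<String> extractSmiles(List<String> logLines) {
        Set<String> smiles = new LinkedHashSet<>();
        for (String line : logLines) {
            int pos = line.indexOf(MARKER);
            if (line.startsWith("WARNING: Chiral center") && pos >= 0) {
                // Keep everything after the marker, including the trailing |@:...| part,
                // because the query compares against the full reported string.
                smiles.add(line.substring(pos + MARKER.length()).trim());
            }
        }
        return smiles;
    }

    /** Builds the parameterized query; the table name cannot be bound, so it is validated. */
    static String buildQuery(String table) {
        if (!table.matches("[A-Za-z][A-Za-z0-9_$#]*")) {
            throw new IllegalArgumentException("Unexpected table name: " + table);
        }
        return "SELECT cd_structure FROM " + table + " WHERE cd_smiles = ?";
    }

    /** Fetches cd_structure for one SMILES; assumes the column is readable via getBytes. */
    static List<byte[]> fetchStructures(Connection conn, String table, String smiles)
            throws SQLException {
        List<byte[]> result = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(buildQuery(table))) {
            ps.setString(1, smiles);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    result.add(rs.getBytes(1));
                }
            }
        }
        return result;
    }
}
```

Binding the SMILES with `setString` sidesteps any quoting issues with the brackets, `@` signs and `|` characters in the string.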

ChemAxon 9c0afc9aaf

04-01-2006 07:19:38

Hi Gerard,





In the meantime we could also reproduce the warnings, so there's no need to send the result of the SQL statements Szabolcs requested in the last post.





Best regards,





Szilard

ChemAxon a3d59b832c

05-01-2006 07:49:08

The exception was raised on molecules with nested Sgroups. These are not officially supported, but JChem/Marvin can import and handle them to some extent. The exception has now been fixed; the fix will be in the next minor Marvin release (4.0.4) and in the next JChem release, which will include that Marvin version.

User 818520b6b8

05-01-2006 08:11:24

OK. Which JChem version will that be, 3.2? And when is it expected to be available for download?





Thanks.

ChemAxon a3d59b832c

05-01-2006 08:31:35

It will be JChem 3.1.5, and it should be released within the next two weeks, hopefully.