Substructure searching in Vitic

User 4ad39ca939

07-01-2011 12:16:51

When performing a substructure search of our Vitic database using the Accord chemistry cartridge, 2 compounds are retrieved that are not retrieved when using the JChem cartridge.


I imagine this is due to differences in the way the two cartridges work, but I would like to understand why it happens and whether it can be changed, as this behaviour is undesired.


The substructure used was paracetamol and the structures that are not retrieved when using the JChem cartridge are CAS 113241-47-7 and CAS 143343-83-3.


Thank you


Liz

ChemAxon a3d59b832c

07-01-2011 13:38:25

Dear Liz,


 


Could you send us the molecules in the format that was used in the JChem database?


(Both the query and the two missing database molecules.)


 


Furthermore, can you tell us which JChem version you are using?


 


Most likely it would be a difference in the aromatization method. There are ways to change it in JChem Cartridge, but first it would be best if we could reproduce the behaviour.


 


Thanks,


Szabolcs

User 4ad39ca939

11-01-2011 12:45:40

Hi Szabolcs


The query structure was paracetamol. It was entered in molfile format, and I also drew the molecule in MarvinSketch 5.2.4, which is embedded in our Vitic database.


 


The 2 molecules that were not retrieved in our structure database have been exported and attached to this email. I believe they were originally entered into our database in Accord molfile format.


We currently use JChem 5.2.


 


Thanks,


Liz

ChemAxon a3d59b832c

12-01-2011 18:47:12

Hi Liz,


Thank you for the structures!

Indeed, 143343-83-3 is missed because of the different aromatization method.

Our default aromatization method (general) aromatizes both rings of the fused ring system.

To avoid that, you could use a custom standardization on the table with a different aromatization method. (For example, basic aromatization.)

Alternatively, the search option vague bond level 3 or 4 ("All bonds 'or aromatic'" and "Ignore bond types") will also retrieve the structure. (So will using "single or aromatic" bonds instead of single bonds in the query.)

See more details here:

http://www.chemaxon.com/marvin/help/sci/aromatization-doc.html
http://www.chemaxon.com/jchem/doc/dev/dbconcepts/index.html#standardizerintegration
http://www.chemaxon.com/jchem/doc/user/query_searchoptions.html#vaguebond

However, for me the other structure (113241-47-7) is found by your query.

I have inserted the structures into our public web application example.

You can check it yourself here using table editexample:

http://www.chemaxon.com/jchem/examples/db_search/

(It is using the latest JChem version - 5.4.0.1, but I think that JChem 5.2 should work the same way in this regard.)

So this needs to be investigated further.

A few more questions:

What is the file format that you use to transfer the query?

From the symptoms, I suspect that it may be SMILES, which would be a problem. SMILES was designed to represent molecules, not substructures. If a substructure query arrives in a SMILES-like format, we expect it to be SMARTS. (SMARTS is the substructure language extension of SMILES, and its semantics are different.)

Instead of SMILES, I recommend using the mrv, mol, cxsmarts or smarts formats. (They are listed in order of decreasing information content.)


If the transfer format is not smiles: Do you have any standardization on the table or JChem index?


Thanks,

Szabolcs

User 44b96ebc9a

13-01-2011 12:55:32

Hi Szabolcs


(I am now logging in with our generic support team details so that my colleagues can also track this request).


What is the file format that you use to transfer the query?  Molfile


If the transfer format is not smiles: Do you have any standardization on the table or JChem index?


 


There is a JChem index on the structures, yes. The command used to create the index is:


 


create index struct_jchem_idx on structure_table (structure)
  indextype is jchem.jc_idxtype
  parameters('tableType=anyStructures,haltOnError=n,duplicateFiltering=n');


 


Thanks,


Liz

ChemAxon a3d59b832c

19-01-2011 08:26:08

Dear Liz,


I am sorry for the delay.


I have checked this issue in JChem Base 5.2.5.1, which uses the same import and search engine as the cartridge:


$ ./jcman c any_5251 --t:any



$ ./jcman a any_5251 Structures/113241-47-7.mol
Collecting file information ...
 Done.
Importing structures from Structures/113241-47-7.mol ...

Total number of processed molecules: 1
Not imported (error): 0
Successfully imported: 1
Elapsed time: 2 seconds
 Done.

$ ./jcman a any_5251 Structures/143343-83-3.mol
Collecting file information ...
 Done.
Importing structures from Structures/143343-83-3.mol ...

Total number of processed molecules: 1
Not imported (error): 0
Successfully imported: 1
Elapsed time: 1 seconds
 Done.

$ ./jcsearch -q Structures/Query.mol DB:any_5251
Cl.COC1=C(OC)C=C(CCN(C)CCCN(C(=O)C2=CC=C(C=C2)[N+]([O-])=O)C2=CC(OC)=C(OC)C=C2)C=C1


 


And when I use basic aromatization:


$ ./jcman c any_5251 --t:any --stconfig Structures/basic_aromatization.xml

$ ./jcman a any_5251 Structures/113241-47-7.mol
Collecting file information ...
 Done.
Importing structures from Structures/113241-47-7.mol ...

Total number of processed molecules: 1
Not imported (error): 0
Successfully imported: 1
Elapsed time: 1 seconds
 Done.

$ ./jcman a any_5251 Structures/143343-83-3.mol
Collecting file information ...
 Done.
Importing structures from Structures/143343-83-3.mol ...

Total number of processed molecules: 1
Not imported (error): 0
Successfully imported: 1
Elapsed time: 1 seconds
 Done.

$ ./jcsearch -q Structures/Query.mol DB:any_5251
Cl.COC1=C(OC)C=C(CCN(C)CCCN(C(=O)C2=CC=C(C=C2)[N+]([O-])=O)C2=CC(OC)=C(OC)C=C2)C=C1
COC1=CC=C(CNCC(O)COC2=CC=C3NC(=O)C=CC3=C2)C=C1OC

 


The latter finds both structures. We will check whether there is any difference in the Cartridge.


 


Could you send us the actual SQL query that is used to submit the search?


 


Thanks,


Szabolcs

ChemAxon a3d59b832c

19-01-2011 08:27:09

Here I attach basic_aromatization.xml.

User 44b96ebc9a

19-01-2011 12:40:09

Typically the query would look something like this:


 


create table LOGIC_SET_13024
  as select 'TEST_DB' as db_name, subst_id, luid, luid as structure_luid
   from TEST_DB.STRUCTURES
  where  jc_compare(TEST_DB.STRUCTURES.STRUCTURE, 'C1CCCCC1', 't:s ') = 1


 


but of course the TEST_DB.STRUCTURES part would change according to the database and table name in question...


 


This is for a substructure search.  The "t:s" part is different if it is an exact match / similarity search.  

ChemAxon a3d59b832c

19-01-2011 13:57:37

Thanks for the information.


The quoted SQL select statement looks fine. I assume that 'C1CCCCC1' is a placeholder for the query itself, given either as the molfile content or as a PL/SQL variable holding it. Is that correct?




Please note that if the molfile is converted to smiles, then only one result will be found:


$ ./molconvert smiles Structures/Query.mol


CC(=O)NC1=CC=C(O)C=C1

$ ./jcsearch -q 'CC(=O)NC1=CC=C(O)C=C1' DB:any_5251
COC1=CC=C(CNCC(O)COC2=CC=C3NC(=O)C=CC3=C2)C=C1OC

This is because jc_compare expects a SMARTS expression in the case of a substructure search:


$ ./molconvert smarts Structures/Query.mol
[#6]C(=O)NC1=CC=C(O)C=C1

$ ./jcsearch -q '[#6]C(=O)NC1=CC=C(O)C=C1' DB:any_5251
Cl.COC1=C(OC)C=C(CCN(C)CCCN(C(=O)C2=CC=C(C=C2)[N+]([O-])=O)C2=CC(OC)=C(OC)C=C2)C=C1
COC1=CC=C(CNCC(O)COC2=CC=C3NC(=O)C=CC3=C2)C=C1OC

(All of the above was executed on the table that uses basic aromatization.)
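
As a rough sketch only, this is how the SMARTS form of the query might be passed to jc_compare in SQL. The table and column names (TEST_DB.STRUCTURES, STRUCTURE, subst_id, luid) are simply taken from the statement quoted earlier in this thread and will differ per database. The query string is the SMARTS printed by molconvert above, and 't:s' is the substructure search option already in use.

```sql
-- Sketch under the assumptions stated above, not a tested statement.
-- The second argument is the SMARTS form of the query: [#6] matches any
-- carbon, whereas a plain C read as SMARTS matches aliphatic carbon only.
select subst_id, luid
  from TEST_DB.STRUCTURES
 where jc_compare(TEST_DB.STRUCTURES.STRUCTURE,
                  '[#6]C(=O)NC1=CC=C(O)C=C1',
                  't:s') = 1;
```

If the query is sent as molfile content instead, as recommended earlier, no SMILES or SMARTS conversion should be needed at all.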


 


In the meantime, we have tested the cartridge as well.


5.2.5.1 and 5.2.6 both work the same way as JChem Base above. (Which is the expected behaviour.)


 


Best regards,


Szabolcs

User 44b96ebc9a

20-01-2011 15:54:44

Thank you for the information.


I think we have enough here to understand what is happening. I will need to discuss with colleagues whether basic aromatization needs to be incorporated into our DB.