Substructure Search: explicit hydrogens

User dedc9c3574

26-02-2014 02:38:48

I am experiencing different substructure search behavior between a JChem Base setup and command-line JChem. My example is the following SMILES string (query.smi): [nH]1c2ncccc2cc1.


When I use the command-line query of "jcsearch -q query.smi database.smi", it returns hits where the explicit hydrogen can be replaced by any atom. When I use JChem Base with the same SMILES string, I get a subset of these hits, where the explicit hydrogen must be matched and not substituted by other groups, such as a methyl. In the case of JChem Base, I am using a database to store "database.smi". I have tried to set "implicit_h_matching_ignore" so that JChem Base ignores the explicit hydrogen, but it makes no difference.


Any ideas, please? Thanks.

ChemAxon abe887c64e

26-02-2014 10:48:44

The apparently strange search result is caused by the ambiguous interpretation of the string "[nH]1c2ncccc2cc1", which is valid both as SMILES and as SMARTS. If we interpret this query string as SMILES, we also receive the hits that are substituted in place of the H. If we interpret it as SMARTS, we do not receive those substituted hits, because in SMARTS [nH] requires exactly one hydrogen on the nitrogen.


If you supply the query and target as files in SMILES format, their content is interpreted as SMILES. Otherwise, in command-line search, ambiguous query strings are interpreted as SMARTS.


If you would like to interpret the string as smiles, you have to define it in the following way:


jcsearch -q '[nH]1c2ncccc2cc1 {smiles}' DB:<tablename>


or


jcsearch -q query.smiles DB:<tablename>


We plan to unify JChem's interpretation rules for ambiguous strings in the near future. Until then, you can add the {smiles} or {smarts} option to the query string where necessary.
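
As a minimal sketch of the format hint described above, the following Java helper appends {smiles} or {smarts} to a query string and builds a jcsearch argument list from it. The class name QueryFormatHint, both method names, and the table name DB:mytable are hypothetical and chosen only for illustration; the only thing taken from this thread is the 'QUERY {format}' form of the -q argument.

```java
import java.util.List;

public class QueryFormatHint {

    // Hypothetical helper: appends a format hint such as {smiles} or {smarts}
    // to a query string, producing the 'QUERY {format}' form used with -q.
    public static String withFormat(String query, String format) {
        if (!format.equals("smiles") && !format.equals("smarts")) {
            throw new IllegalArgumentException("unsupported format: " + format);
        }
        return query.trim() + " {" + format + "}";
    }

    // Hypothetical helper: builds the jcsearch argument list. Keeping the
    // arguments in a list (for example for ProcessBuilder) avoids shell
    // quoting problems with the brackets and braces in the query.
    public static List<String> jcsearchArgs(String query, String format, String target) {
        return List.of("jcsearch", "-q", withFormat(query, format), target);
    }

    public static void main(String[] args) {
        // DB:mytable is a placeholder for DB:<tablename>.
        List<String> cmd = jcsearchArgs("[nH]1c2ncccc2cc1", "smiles", "DB:mytable");
        System.out.println(String.join(" | ", cmd));
    }
}
```

Making the format explicit at the point where the query string is built means the result no longer depends on which default interpretation a given entry point applies.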


Best regards,


Krisztina

User dedc9c3574

26-02-2014 16:27:26

Thanks for your reply. It sounds like "jcsearch" is correctly perceiving my query structure as SMILES (because I supply it in a ".smi" file), but JChem Base perceives it as SMARTS (to recap, I am getting the desired substituted hits in jcsearch but not in JChem Base). What I would like to do, then, is force my ambiguous query string to be read as SMILES in JChem Base, so that I retrieve the same number of hits as with jcsearch. Is that possible?


Here is how I currently set the query structure in JChem Base:


JChemSearch searcher = new JChemSearch();
searcher.setQueryStructure("[nH]1c2ncccc2cc1");


Is there a function to "cast" the form of the query structure?


Thank you.

ChemAxon abe887c64e

27-02-2014 11:30:25

In Java code, we recommend using the unambiguous SMARTS form of the query structure: "c1cc2cccnc2n1"


searcher.setQueryStructure("c1cc2cccnc2n1");

If needed, you can use our command-line molconvert tool to convert SMILES to SMARTS.


molconvert smarts input.smiles -o output.smarts

Best regards,


Krisztina



ChemAxon 9c0afc9aaf

27-02-2014 12:14:25

Hi,


You can also set a Molecule object for the searcher:


http://www.chemaxon.com/jchem/doc/dev/java/api/chemaxon/jchem/db/JChemSearch.html#setQueryStructure(chemaxon.struc.Molecule)


You can import this Molecule from the source with MolImporter:


http://www.chemaxon.com/jchem/doc/dev/java/api/chemaxon/formats/MolImporter.html#importMol(java.lang.String, java.lang.String)


Even if no options are specified, it should give priority to the SMILES interpretation where possible, but you can also control the detected input format.


You can find more information on file formats and option strings here:


https://www.chemaxon.com/marvin/help/formats/formats.html


Best regards,


Szilard

User dedc9c3574

27-02-2014 17:18:36

Thanks for your help. Passing a Molecule object to the JChemSearch object instead of a String worked for me. The query is now perceived as SMILES (not SMARTS), and the substructure search retrieves the hits substituted at the explicit hydrogen position, which were missing when the query was perceived as SMARTS:


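import chemaxon.formats.MolImporter;
import chemaxon.jchem.db.JChemSearch;
import chemaxon.struc.Molecule;

// Import the query explicitly as SMILES so that it is not perceived as SMARTS.
Molecule m = MolImporter.importMol("[nH]1c2ncccc2cc1", "smiles");
JChemSearch searcher = new JChemSearch();
searcher.setQueryStructure(m);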