Using SMILES as the query for substructure search

User 8139ea8dbd

21-07-2006 20:58:01

Could you explain what happens when queryType is defined as smiles?


The aliphatic carbon matches aromatic carbon in this example.





select jc_compare('c1ccccc1', 'C', 't:s queryType:l') from dual


returns 1





Thanks.

ChemAxon 9c0afc9aaf

24-07-2006 10:04:55

Hi,





Due to holidays you may have to wait a few days for the complete answer.





Let me try to give a partial answer until then.





The same string can mean different things depending on whether it is interpreted as SMILES or SMARTS.


For example, [CH4] in SMILES can represent any carbon atom with normal valence, while the same string in SMARTS is a strictly aliphatic carbon with *exactly* four hydrogen neighbours.





In general, one should use SMARTS for queries and SMILES for structures. The only practical case where the query should be interpreted as SMILES is duplicate checking (PERFECT search mode).





By default the search mode determines the behaviour, as stated in the documentation:
Quote:
* d: the query is imported as SMILES, when the 't' option is set to 'p' ("t:p"), in all other cases the query is imported as SMARTS.





The default is 'd'.
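As a rough illustration of the quoted rule only, the default behaviour can be written as a one-line decision. The function name below is hypothetical and is not part of JChem; it models nothing beyond the documented 'd' case.

```python
def default_query_format(search_type: str) -> str:
    """Return how the query string is read under queryType 'd'.

    Models only the quoted documentation rule:
    't:p' (perfect search) -> SMILES, any other search type -> SMARTS.
    """
    return "SMILES" if search_type == "p" else "SMARTS"


# Substructure search ('s') versus perfect search ('p').
for t in ("s", "p"):
    print(f"t:{t} -> query read as {default_query_format(t)}")
```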



I recommend using the default mode "d" (which is equivalent to not specifying "queryType").


I cannot give you an example where another mode is necessary; such cases are probably very rare.





Best regards,





Szilard

User 8139ea8dbd

24-07-2006 16:34:24

I just was not sure why "[CH4] in SMILES can represent any carbon atom with normal valence", since smiles also distinguishes C and c. I am not suggesting that's wrong, it's probably a desirable behavior for chemists. So just want to know what is ignored in the smiles match, besides the aliphatic/aromatic behavior of the atom.





The reason I try to use smiles is I guess smiles query could be faster, since you can generate fingerprint from the query and carry out a prescreen, while fingerprint may not be easily obtained from smarts.

ChemAxon 9c0afc9aaf

25-07-2006 09:32:13

Hi,





Please note that SMILES was not originally designed to represent a substructure with query features.


We recommend using SMARTS or other file formats for this purpose.
Quote:
The reason I try to use smiles is I guess smiles query could be faster, since you can generate fingerprint from the query and carry out a prescreen, while fingerprint may not be easily obtained from smarts.

We also generate a fingerprint from SMARTS and use it to speed up the search process.


My colleague will provide a more detailed answer in a few days.





Best regards,





Szilard

ChemAxon a3d59b832c

25-07-2006 09:45:16

yzhou wrote:
So just want to know what is ignored in the smiles match, besides the aliphatic/aromatic behavior of the atom.
Every feature that is valid in both SMILES and SMARTS but means different things in each:
  • Aliphaticity/aromaticity information (as it turned out earlier)
  • H count (the [CH4] example): in SMILES these are implicit hydrogens, which are ignored during the search
  • When the given number of hydrogens does not add up to a valid valence, you end up with a radical on the atom. For example, the string [CH3] as a SMILES string describes a carbon radical (with one non-bonded electron), while as SMARTS it simply means a carbon atom with three hydrogens (e.g. like one in ethane).
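The first two points in the list above can be sketched with a toy atom comparison. This is only an illustration of the behaviour described in this thread, not JChem's matcher: the `Atom` class and `atom_matches` function are hypothetical, and the radical case is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Atom:
    element: str            # e.g. "C"
    aromatic: bool          # True for a lowercase 'c' in the input string
    h_count: Optional[int]  # hydrogens written in brackets; None if unspecified


def atom_matches(query: Atom, target: Atom, as_smarts: bool) -> bool:
    """Toy comparison of one query atom against one target atom."""
    if query.element != target.element:
        return False
    if not as_smarts:
        # SMILES reading, as described above: aromaticity and
        # hydrogen count are not compared during the search.
        return True
    if query.aromatic != target.aromatic:
        return False
    # SMARTS reading: an explicit H count must match exactly.
    return query.h_count is None or query.h_count == target.h_count


benzene_carbon = Atom("C", aromatic=True, h_count=1)
query_c = Atom("C", aromatic=False, h_count=None)  # the query 'C'
query_ch4 = Atom("C", aromatic=False, h_count=4)   # the query '[CH4]'

print(atom_matches(query_c, benzene_carbon, as_smarts=False))
print(atom_matches(query_c, benzene_carbon, as_smarts=True))
```

Under this toy model the query 'C' matches a benzene carbon only when it is read as SMILES, which is the behaviour reported in the original question.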


However, if your query contains a SMARTS-only feature, like [$(CC)]CCC, then the whole query will be treated as SMARTS.
yzhou wrote:
The reason I try to use smiles is I guess smiles query could be faster, since you can generate fingerprint from the query and carry out a prescreen, while fingerprint may not be easily obtained from smarts.
Actually, the performance would be the same, because we extract as much information as possible from the SMARTS query as well. If the SMARTS query contains explicit atom and bond types (even if only for a part of the query), then those can be used by the fingerprint generation. You can find some more information about fingerprint screening in our recent user group meeting presentation below (slides 20-27).





http://www.chemaxon.com/forum/viewpost6491.html#6491
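For readers unfamiliar with the prescreening idea discussed in this post, here is a minimal generic sketch of fingerprint screening. It is not ChemAxon's implementation: the feature strings, the 64-bit length and the function names are all assumptions chosen for illustration. The point is that only the explicit parts of a query need to contribute bits, so a partly generic SMARTS query can still be screened.

```python
import hashlib
from typing import Iterable

FP_BITS = 64  # toy fingerprint length, chosen only for this sketch


def fingerprint(features: Iterable[str]) -> int:
    """Hash each feature string to one bit of an integer bitset."""
    fp = 0
    for feature in features:
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        bit = int.from_bytes(digest[:4], "big") % FP_BITS
        fp |= 1 << bit
    return fp


def may_contain(query_fp: int, target_fp: int) -> bool:
    """True if every query bit is also set in the target.

    A False result rules the target out; a True result only means the
    target must still be checked by the full atom-by-atom search.
    """
    return (query_fp & target_fp) == query_fp


# Hypothetical feature lists standing in for real structural fragments.
targets = {
    "ethanol": ["C-C", "C-O"],
    "acetone": ["C-C", "C=O"],
}
# Only the explicit part of the query contributes features;
# wildcard or generic parts would simply add no bits.
query_fp = fingerprint(["C=O"])

candidates = [name for name, feats in targets.items()
              if may_contain(query_fp, fingerprint(feats))]
print(candidates)
```

Because different features can hash to the same bit, the screen can let through false positives but never rejects a true match, which is why a full search still follows it.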





Best Regards,


Szabolcs