How to compare several structures? - ChemAxon Forum Archive

User 4e4b708dbd

17-12-2013 14:09:44

Let's say I run a query with JChem Base for Structure A in a large database and get 25 hits. One may be an exact match, others may be isomers and/or salts.

Is there any other way to get this match type than by running several queries (one exact match, one exact match with stereo ignored, then both of those as exact fragment matches, etc.)?

ChemAxon abe887c64e

17-12-2013 14:53:17

Hi,

You will get more hits if you use less strict search options and/or draw the query structure in a less strict way (e.g., without stereo bonds).

I recommend the following settings:

The search type should be set to 'substructure' or 'full fragment'.

The tautomerSearch:y option can be set to also hit tautomers.

Checking of many structural features can be switched off by setting them to 'ignore': stereoSearchType:i, charge:i, isotope:i, radical:i, valence:i.

Our search options are collected in the documentation.

Furthermore, you can increase the number of hits if you apply more standardizer actions to the database table than the default ones (aromatization, removing explicit hydrogens).

Best regards,

Krisztina

User 4e4b708dbd

17-12-2013 15:13:58

Dear Krisztina,

I guess my question was not as clear as I thought. I understand how to get more hits. But let's say I run a query and get 7 hits, and let's call them Hit 1 - Hit 7.

I want to get the match type for each of them. Something like this:

Hit 1 Isomer

Hit 2 Isomer, salt

Hit 3 Isomer

Hit 4 Exact match

Hit 5 Exact match, salt

Hit 6 Exact match, salt

Hit 7 Isomer

How can this be done?

Regards,

Imants

ChemAxon abe887c64e

17-12-2013 16:01:06

Hi Imants,

I suppose a 'full fragment' search will return hit 4, hit 5 and hit 6, because the query matches at least one fragment of the target.

What kind of isomers would be hit 1, hit 2, hit 3, hit 7?

If they are tautomers, then tautomerSearch:y could help.

If they are structural isomers, then position variation bonds or R-groups can be applied in the query structure.

If they are stereoisomers, then our different stereo search options and/or stereo bonds and bond types can be used.

I hope this helps.

Best regards,

Krisztina

ChemAxon 9c0afc9aaf

17-12-2013 20:15:26

Hi Imants,

AFAIK we only store and return the list of hits (cd_id).

No extra information is preserved regarding why a record was hit, which atoms were matching, etc.

As you say, one solution is to run the searches repeatedly.

The other is to go with the most general type of search, and perform further tests on the hits.

One way to do this would be to use the MolSearch class for graph search directly, but it may be a little messy, as you need to attend to proper standardization of the query and target structures and similar issues.

The other way is to repeat the search with JChemSearch, but reduce the scope of the search to the previous hit list via setFilterIDList():

http://www.chemaxon.com/jchem/doc/dev/java/api/chemaxon/jchem/db/JChemSearch.html#setFilterIDList(int[])

In the case of a large database with a few hits, this could also reduce the execution time (in the screening part), and it is simpler to use in this context than MolSearch directly.
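The narrowing idea can be sketched without any JChem dependency. In the Python sketch below, TABLE, its record fields and the search() function are hypothetical stand-ins invented for illustration; only the filter_ids argument is meant to mirror what setFilterIDList() does, namely restricting a second, stricter search to the hits of the first one.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Toy stand-in for a structure table: cd_id -> invented record fields.
TABLE: Dict[int, Dict[str, str]] = {
    1: {"skeleton": "A", "stereo": "S"},
    2: {"skeleton": "A", "stereo": "R"},
    3: {"skeleton": "B", "stereo": "R"},
    4: {"skeleton": "A", "stereo": "R"},
    5: {"skeleton": "C", "stereo": "S"},
}


def search(
    predicate: Callable[[Dict[str, str]], bool],
    filter_ids: Optional[Iterable[int]] = None,
) -> Tuple[List[int], int]:
    """Hypothetical search: return (sorted hit cd_ids, records examined).

    When filter_ids is given, only those records are examined, which is
    the role setFilterIDList() plays in the real API.
    """
    candidates = list(TABLE) if filter_ids is None else list(filter_ids)
    hits: List[int] = []
    examined = 0
    for cd_id in candidates:
        record = TABLE.get(cd_id)
        if record is None:
            continue  # ignore ids that are not in the table
        examined += 1
        if predicate(record):
            hits.append(cd_id)
    return sorted(hits), examined


# Step 1: the most general search, over the whole table.
broad_hits, broad_examined = search(lambda r: r["skeleton"] == "A")

# Step 2: a stricter search, restricted to the hits of step 1.
strict_hits, strict_examined = search(
    lambda r: r["skeleton"] == "A" and r["stereo"] == "R",
    filter_ids=broad_hits,
)

print(broad_hits, broad_examined)    # [1, 2, 4] 5
print(strict_hits, strict_examined)  # [2, 4] 3
```

The second search looks at 3 records instead of 5, which is the effect described above for a large table with few hits.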

I hope this helps.

Best,

Szilard

User 4e4b708dbd

18-12-2013 09:18:21

Is a query with setFilterIDList() faster than the same query without it? Or does it still search full table before applying the filter?

What may be useful is a method similar to similarity calculation. Something like getMatchType(Structure A SMILES, Hit 6 SMILES) = Exact match, salt.

ChemAxon 4a2fc68cd1

19-12-2013 13:45:40

Hi,

Your first question was also answered in a separate thread. Using filters decreases search time: only the records present on the list are searched.

As far as I understand, you are looking for some kind of relevance scoring/ordering of the search hits. Unfortunately, JChem search does not provide such a feature yet. However, you can build a custom workflow based on the above suggestions. For example, you can perform the least restrictive search that is appropriate for you (e.g. substructure) and then run subsequent searches on its hits, either to obtain the results of stricter search criteria (e.g. full structure and full fragment searches) or to obtain similarity scores. By combining these data, you can produce results similar to your examples.
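A minimal sketch of the combining step, in Python and independent of JChem: it assumes four hit lists (cd_id values) have already been obtained from four searches of decreasing strictness. The function name label_hits, the parameter names and the meaning given to each label are assumptions made for illustration: "salt" is taken to mean that the query matches a full fragment but not the whole record, and "isomer" that the record matches only when stereo is ignored.

```python
from typing import Dict, Iterable


def label_hits(
    exact: Iterable[int],
    exact_no_stereo: Iterable[int],
    fragment: Iterable[int],
    fragment_no_stereo: Iterable[int],
) -> Dict[int, str]:
    """Assign each cd_id the label of the strictest search that returned it.

    Hypothetical helper: the four arguments are the hit lists of a full
    structure search, the same with stereo ignored, a full fragment
    search, and the same with stereo ignored.
    """
    exact_set = set(exact)
    exact_ns_set = set(exact_no_stereo)
    fragment_set = set(fragment)
    fragment_ns_set = set(fragment_no_stereo)

    labels: Dict[int, str] = {}
    for cd_id in sorted(exact_set | exact_ns_set | fragment_set | fragment_ns_set):
        if cd_id in exact_set:
            labels[cd_id] = "Exact match"
        elif cd_id in fragment_set:
            labels[cd_id] = "Exact match, salt"
        elif cd_id in exact_ns_set:
            labels[cd_id] = "Isomer"
        else:
            labels[cd_id] = "Isomer, salt"
    return labels


# Invented hit lists chosen to reproduce the Hit 1 - Hit 7 example.
labels = label_hits(
    exact=[4],
    exact_no_stereo=[1, 3, 4, 7],
    fragment=[4, 5, 6],
    fragment_no_stereo=[1, 2, 3, 4, 5, 6, 7],
)
for cd_id, label in labels.items():
    print(f"Hit {cd_id} {label}")
```

The order of the checks matters: a record returned by a stricter search is also returned by the looser ones, so the strictest matching search decides the label.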

Regards,
Peter