JChemSearch - ChemAxon Forum Archive

User b22f714996

26-05-2005 07:27:44

Hello...

Sorry to bother you again, but I have two questions concerning the search capabilities of JChemSearch (API).

I saw that I have the option to restrict the maximum number of hits in a search. But is there a way to randomize the result? What I mean is, if I submit a substructure search query and limit the maximum number of hits to 100, I will always get the first 100 hits. I would like to get 100 random hits (since there are many more in the database). So is there any other way to do this, except for searching the whole database and then doing an "ORDER BY RAND() LIMIT 100" on the result table?

My second question concerns the similarity search in JChemSearch. I saw in the examples of Evaluator this neat piece about scaffold hopping. The rules for this are defined as:

Code:

dissimilarity("ChemicalFingerprint", refmol) - dissimilarity("PharmacophoreFingerprint", refmol) > 0.6

To do this, I tried performing two searches on my database (the database consists of a structure table, a table for the pharmacophore fingerprint descriptor and one for the structural fingerprint): first on the structural fingerprint and afterwards on the pharmacophore fingerprint. This is quite slow, since it has to iterate through the whole table twice... Is there a way to combine this into just one search?

Thanks a lot for your help...

tobias

ChemAxon 9c0afc9aaf

26-05-2005 14:52:37

Hi Tobias,

1. There is no direct option in JChemSearch to randomize hits, but you can try the following:

Use setFilterQuery with an SQL query to influence the order of the search:

"SELECT cd_id FROM mytable ORDER BY RAND()"

2. You do indeed have to run the two searches separately.

Combining them wouldn't result in any speedup, since we have to read all the rows for both fingerprint tables anyway.
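
The randomization hint in point 1 can be illustrated without JChem. The following is a minimal Python sketch, not JChem code: it assumes that a hit-limited search visits candidate rows in the order supplied by the filter query and stops once the limit is reached. The helper first_hits, the table of 10,000 ids and the "every third id matches" rule are all hypothetical stand-ins.

```python
import random

def first_hits(scan_order, is_hit, max_hits):
    """Scan ids in the given order and stop after max_hits matches."""
    hits = []
    for cd_id in scan_order:
        if is_hit(cd_id):
            hits.append(cd_id)
            if len(hits) == max_hits:
                break
    return hits

# Hypothetical table of 10,000 ids; pretend every third id matches the query.
all_ids = list(range(1, 10001))

def is_hit(cd_id):
    return cd_id % 3 == 0

# Natural order: always the same first 100 matches.
in_order = first_hits(all_ids, is_hit, 100)

# Shuffled order (the role played by ORDER BY RAND() in the filter query):
# the first 100 matches are now drawn from across the whole table.
rng = random.Random(42)  # fixed seed only to make the sketch repeatable
shuffled = all_ids[:]
rng.shuffle(shuffled)
randomized = first_hits(shuffled, is_hit, 100)
```

With the natural order the result is always ids 3 to 300; with the shuffled order the 100 hits are spread over the full id range, and the scan still stops early instead of collecting every hit first.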

Best regards,

Szilard

User b22f714996

27-05-2005 06:54:58

Hi Szilard,

Thanks a lot for your help. The first hint is quite useful.

Regarding my second question:

Is it possible to run a search on a previous search result?

I mean, I could run the first search on the pharmacophore fingerprint (PF), since I can set the dissimilarity threshold to 0.4f. That would reduce the total set. Afterwards I could perform the second search, on the chemical fingerprint (CF), on this subset... So I could avoid processing the full database twice. Did I express this clearly?

Actually looking at the first hint you gave me, something like this should work:

1.) Run a similarity search on the pharmacophore fingerprint with a dissimilarity threshold of 0.4 and save the result in a result table.

2.) Set a filter: setFilterQuery("SELECT st.* FROM structuretable as st, resulttable as rt WHERE rt.cd_id = st.cd_id");

3.) Run the second similarity search, this time on the structural fingerprint, and put the result in a different result table.

4.) Join the two result tables using the rule for scaffold hopping.

What do you think? That should make the search faster...
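
The four steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not JChem code: the dictionaries cf and pf are hypothetical per-id dissimilarity values to the reference molecule, and a "hit" is assumed to mean dissimilarity at or below the threshold. The PF pre-filter at 0.4 loses nothing, because CF dissimilarity is at most 1, so CF - PF > 0.6 can only hold when PF < 0.4.

```python
def scaffold_hops(cf, pf, pf_max=0.4, margin=0.6):
    """Two-stage search followed by a join on the scaffold hopping rule."""
    # Step 1: PF search, keep ids whose PF dissimilarity is within the threshold.
    stage1 = {cd_id for cd_id, d in pf.items() if d <= pf_max}
    # Steps 2 and 3: CF values are looked up only for the ids that survived step 1.
    stage2 = {cd_id: cf[cd_id] for cd_id in stage1}
    # Step 4: join both results by cd_id and apply CF - PF > margin.
    return sorted(cd_id for cd_id, d_cf in stage2.items()
                  if d_cf - pf[cd_id] > margin)

# Hypothetical dissimilarity values keyed by cd_id.
cf = {1: 0.95, 2: 0.90, 3: 0.30, 4: 0.70, 5: 1.0}
pf = {1: 0.10, 2: 0.50, 3: 0.05, 4: 0.05, 5: 0.39}

staged = scaffold_hops(cf, pf)

# Reference result: apply the rule to every id without any pre-filter.
brute_force = sorted(i for i in cf if cf[i] - pf[i] > 0.6)
```

On this toy data the staged search and the full scan both return ids 1, 4 and 5, while id 2 is already discarded in the first stage.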

Thanks a lot,

tobias

ChemAxon 9c0afc9aaf

27-05-2005 07:42:14

Tobias,

The solution you suggest should work, but I recommend some modifications:

A: You should select only the cd_id values in setFilterQuery; st.* can be much slower than st.cd_id.

B: Did you define CF as a descriptor (created with generatemd), or do you use the standard similarity search?

These two approaches provide the same results.

If you use the columns in the structure table (no generatemd), then this dissimilarity search will be very quick if you are also using the structure cache (setStructureCaching(true)), since these fingerprint columns are stored in the cache.

You should start with non-descriptor CF (setSearchType(JChemSearch.SIMILARITY)), and then run the PF descriptor search with filterQuery.

For CF you should use dissimilarity threshold 0.6 and setReturnsNonHits(true) to get the rows with >0.6 dissimilarity.
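
Why the reversed order is still safe can be shown with a small Python sketch. It is not JChem code: split_by_threshold is a hypothetical helper that mimics a thresholded search returning hits (dissimilarity at or below the threshold) and non-hits (above it), and the dissimilarity values are made up. Since PF dissimilarity is never negative, CF - PF > 0.6 requires CF > 0.6, so the CF non-hits at threshold 0.6 contain every scaffold hopping candidate.

```python
def split_by_threshold(dissimilarity, threshold):
    """Partition ids into hits (<= threshold) and non-hits (> threshold)."""
    hits = {cd_id for cd_id, d in dissimilarity.items() if d <= threshold}
    non_hits = set(dissimilarity) - hits
    return hits, non_hits

# Hypothetical dissimilarity values keyed by cd_id.
cf = {10: 0.95, 11: 0.55, 12: 0.61, 13: 0.80}
pf = {10: 0.20, 11: 0.00, 12: 0.00, 13: 0.30}

# CF search with threshold 0.6, keeping the non-hits (CF dissimilarity > 0.6).
_, cf_candidates = split_by_threshold(cf, 0.6)

# PF values are then needed only for those candidates; apply the rule to them.
hops = sorted(i for i in cf_candidates if cf[i] - pf[i] > 0.6)

# Reference result: the rule applied to every row.
full_scan = sorted(i for i in cf if cf[i] - pf[i] > 0.6)
```

Here id 11 is dropped by the CF step, id 13 passes it but fails the final rule, and the remaining ids 10 and 12 match the full scan.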

Best regards,

Szilard