User 57295192cc
03-02-2014 15:34:46
Hi,
I have a question about the results returned by substructure search when the maximum number of results is limited, so that the search returns only a subset of all possible hits. In my testing, the same substructure search doesn't always return exactly the same hits, although all returned hits are correct and correctly ordered. Is this behaviour expected?
(Apologies if this is something trivial. I've checked the docs and couldn't find anything obvious, but I might have missed something.)
A simple Java example:
JChemSearch searcher = new JChemSearch();
// Query: any structure containing a nitrogen atom
searcher.setQueryStructure("N");
searcher.setConnectionHandler(getConnectionHandler());
searcher.setStructureTable(...);
JChemSearchOptions searchOptions = new JChemSearchOptions(SearchConstants.SUBSTRUCTURE);
// Stop after 20 hits; the returned subset is what varies between runs
searchOptions.setMaxResultCount(20);
// Only relevant when switching the search type to SIMILARITY
searchOptions.setDissimilarityThreshold(.35f);
searcher.setSearchOptions(searchOptions);
searcher.run();
int[] results = searcher.getResults();
System.out.println("Results: " + Arrays.toString(results));
Running this twice in a row, I get results like this:
Results: [1, 2, 4, 7, 8, 10, 14, 29, 30, 31, 33, 34, 35, 42, 46, 48, 53, 56, 58, 116]
Results: [1, 2, 4, 7, 8, 10, 14, 29, 30, 31, 32, 34, 42, 46, 48, 53, 56, 58, 59, 116]
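Assuming every listed ID is a genuine hit (as noted above), neither run is simply the 20 lowest matching IDs: run 1 omits 32, which run 2 finds, yet includes 116; run 2 includes 59 while omitting 33 and 35. A small helper to list the IDs unique to each run (`RunDiff` and `onlyIn` are just illustrative names, not JChem API):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class RunDiff {
    // Returns the IDs present in run a but not in run b, preserving a's order.
    static List<Integer> onlyIn(int[] a, int[] b) {
        Set<Integer> other = Arrays.stream(b).boxed().collect(Collectors.toSet());
        List<Integer> result = new ArrayList<>();
        for (int id : a) {
            if (!other.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The two result arrays printed above, copied verbatim.
        int[] run1 = {1, 2, 4, 7, 8, 10, 14, 29, 30, 31, 33, 34, 35, 42, 46, 48, 53, 56, 58, 116};
        int[] run2 = {1, 2, 4, 7, 8, 10, 14, 29, 30, 31, 32, 34, 42, 46, 48, 53, 56, 58, 59, 116};
        System.out.println("Only in run 1: " + onlyIn(run1, run2)); // [33, 35]
        System.out.println("Only in run 2: " + onlyIn(run2, run1)); // [32, 59]
    }
}
```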
I also tried it with SIMILARITY, and it seemed to be deterministic (although I haven't tested it exhaustively). I experimented with various search options, but they made no difference. I also tried the Instant JChem demo and found the same thing: the results vary slightly each time.
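To test repeatability more thoroughly, one option is to run the same search many times and count how many distinct hit lists come back. This is a hypothetical helper (`RepeatCheck` and `distinctOutcomes` are my own names, not JChem API). In practice the `Supplier` would wrap the search code above, e.g. `searcher.run()` followed by `searcher.getResults()`, with any checked exceptions wrapped as needed. Here a fixed stub stands in for the search so the sketch runs without a database.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;
import java.util.stream.Collectors;

public class RepeatCheck {
    // Runs the search `times` times and returns the number of distinct hit lists seen.
    // A result of 1 means every run returned exactly the same hits in the same order.
    static int distinctOutcomes(Supplier<int[]> search, int times) {
        Set<List<Integer>> seen = new HashSet<>();
        for (int i = 0; i < times; i++) {
            int[] hits = search.get();
            // List equality is element-wise, so identical runs collapse into one entry.
            seen.add(Arrays.stream(hits).boxed().collect(Collectors.toList()));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // Stand-in for the real search: a deterministic stub that always returns the same hits.
        Supplier<int[]> stub = () -> new int[] {1, 2, 4, 7};
        System.out.println("Distinct outcomes over 50 runs: " + distinctOutcomes(stub, 50));
    }
}
```

A count above 1 for SUBSTRUCTURE, and exactly 1 for SIMILARITY over many runs, would confirm the difference more convincingly than two manual runs.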
I tested this on JChem 6.2.0, but I have observed the same behaviour with older versions as well.
Many thanks,
Pal