SDF Search - Multiple Structures

User 2a533dbb3b

15-05-2009 14:29:35

Using the JChem API, is it possible to do an exact or perfect search against a list of structures in an SD File?  When I do this, it appears that JChemSearch is only searching the first structure.  A simplified example of the code:

  JChemSearch searcher = new JChemSearch();
  ...
  searcher.setQueryStructure(contentOfSDF);
  searcher.run();

The debug output produces:
  Fri May 15 10:17:15 EDT 2009
  Search mode: PERFECT
  Structure table: CATALOG.CHEMSTRUCTURE
  Query: [H]C1=C([H])C(=O)C(=C([H])C1=O)C([H])([H])[H]
  Screened: 1
  Hits: 1
  Total time: 1281 ms  Screening: 94 ms
  Processing threads: 4

As shown above, I only get one hit.  The SDF should get several hits.  The SMILES displayed in the output above is equivalent to the first Mol in the SDF.  If I split the SDF into separate Mol files and do one search for each mol I get more than one result.  I would like to avoid splitting the SDF up and doing multiple searches for performance reasons. Also, even a small SD file would quickly eat up our license limit.


 

ChemAxon 9c0afc9aaf

15-05-2009 15:38:04











Hi,


First of all, I recommend changing your "Username" in your Profile (link at top) to something other than your e-mail address. This forum is publicly readable, so e-mail addresses posted here can attract a lot of spam.


JChemSearch only searches for 1 structure at a time, so you have to execute the search for each structure separately.


You do not have to split the SDF yourself: MolImporter can read the individual molecules for you, and you can also pass a Molecule object to JChemSearch.


http://www.chemaxon.com/jchem/doc/api/chemaxon/formats/MolImporter.html


http://www.chemaxon.com/jchem/doc/api/chemaxon/jchem/db/JChemSearch.html#setQueryStructure(chemaxon.struc.Molecule)


 


You should reuse the same JChemSearch object and repeat only these steps in a loop:


- setQueryStructure


- run()


- fetching the results


If done this way, there is no performance overhead; all you have to do is merge the results.
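
The merge step might look roughly like the sketch below. It is deliberately independent of the JChem library so that it runs on its own: MultiQuerySearch, mergeHits and searchOne are hypothetical names used only for illustration, not part of the JChem API. The searchOne function stands in for the per-structure steps listed above, and the sketch assumes each search yields its hit IDs as an int array.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Function;

// Illustrative sketch only: runs one search per query structure and merges the hit IDs.
final class MultiQuerySearch {

    // searchOne is a stand-in (assumption) for the per-structure steps:
    // set the query structure on the shared searcher, run it, fetch the hit IDs.
    static <Q> Set<Integer> mergeHits(Iterable<Q> queries, Function<Q, int[]> searchOne) {
        // LinkedHashSet keeps first-seen order and drops IDs hit by more than one query.
        Set<Integer> merged = new LinkedHashSet<>();
        for (Q query : queries) {
            for (int id : searchOne.apply(query)) {
                merged.add(id);
            }
        }
        return merged;
    }
}
```

In real use, queries would be the molecules read one by one from the SDF with MolImporter, and searchOne would wrap the setQueryStructure / run() / result-fetching calls on the single reused JChemSearch object, including whatever exception handling those calls require.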


 


Best regards,


Szilard