How to 'append' associated info for an incoming dup struct?

User 59bc43367b

25-10-2007 20:22:50

I am scratching my head over how to handle this scenario:


1. I get multiple sdf files from labs/researchers


2. I compile a table of unique structures - uniques


3. I return sdf to labs/researchers with an additional field - cd_id from my uniques table.





When I import structures, I can choose to ignore duplicates. But what I really need is a way to add associated data for a duplicate to some field in the uniques table. That way, when structure_nnnn is a duplicate and is therefore not added to my uniques table, I can later query this field to find out which structure from input.sdf matched an existing structure.





The only way I can think of is to import input.sdf with the no-duplicates option, then read input.sdf and, for each structure, do an Exact search on the uniques table. If a hit is found, get the cd_id from uniques and use it to create input_enhanced.sdf. There has to be a shorter/easier way!

ChemAxon 990acf0dec

26-10-2007 17:16:04

It looks like you need a simple registration service. We have already started developing such a tool, but it will probably not be available until early next year. The first solution will be an Instant JChem based service, which can, besides the functionality you requested, standardize your structures and also run a quality check according to your business rules. If you need more information on the planned features of our registration system, I am happy to send it to you.

User 59bc43367b

26-10-2007 17:20:47

Please do send me the details. I am currently evaluating your software and this functionality is vital for our decision.





Can you confirm that I can indeed read structures from an SDF file and use them in a query, the way I am envisioning it with the current functionality?

ChemAxon a3d59b832c

27-10-2007 08:52:27

SubodhJoshi wrote:
Can you confirm that I can indeed read structures from an SDF file and use them in a query, the way I am envisioning it with the current functionality?
Yes, if you would like to implement it yourself, that is a possible approach. For the duplicate check we use the "Perfect" search type rather than "Exact"; it requires equality in all molecular features.


(See: http://www.chemaxon.com/jchem/doc/user/Query.html#otherSearchTypes


http://www.chemaxon.com/jchem/doc/user/QueryMatchExamples.html )





For such an implementation, you would need to use the following Java classes:





To read molecules from SD files, manipulate them, and write them back out as SDF:





http://www.chemaxon.com/jchem/doc/api/chemaxon/struc/Molecule.html


http://www.chemaxon.com/jchem/doc/api/chemaxon/formats/MolImporter.html


http://www.chemaxon.com/jchem/doc/api/chemaxon/formats/MolExporter.html





Examples:


http://www.chemaxon.com/marvin/examples/index.html


(The first one, SimpleConverter, will be the most useful for you.)
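As a rough illustration of the read/modify/write step, here is a minimal sketch that adds a cd_id field to each record of an SD file. It deliberately uses only the Java standard library and does not call the Marvin classes linked above; a real implementation would more likely go through MolImporter, Molecule and MolExporter. It relies only on the SD file layout: each record ends with a `$$$$` line, and a data field is a `>  <NAME>` header line, the value, and a blank line. The names `SdfFields`, `splitRecords`, `withField` and `join` are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any ChemAxon API.
final class SdfFields {

    /** Splits SDF text into records; each record excludes its "$$$$" terminator line. */
    static List<String> splitRecords(String sdf) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : sdf.split("\r?\n", -1)) {
            if (line.trim().equals("$$$$")) {
                records.add(current.toString());
                current.setLength(0);
            } else {
                current.append(line).append('\n');
            }
        }
        // Keep a trailing record that lacks its terminator, but ignore pure whitespace.
        if (!current.toString().trim().isEmpty()) {
            records.add(current.toString());
        }
        return records;
    }

    /** Returns the record with one extra data field appended at its end. */
    static String withField(String record, String name, String value) {
        String body = record.endsWith("\n") ? record : record + "\n";
        return body + ">  <" + name + ">\n" + value + "\n\n";
    }

    /** Re-assembles records into SDF text, terminating each one with "$$$$". */
    static String join(List<String> records) {
        StringBuilder out = new StringBuilder();
        for (String record : records) {
            out.append(record).append("$$$$\n");
        }
        return out.toString();
    }
}
```

To produce input_enhanced.sdf, you would split the text of input.sdf, call `withField(record, "cd_id", ...)` on each record with the id obtained from the uniques table, and write out the result of `join`.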





For database search:





http://www.chemaxon.com/jchem/doc/api/chemaxon/jchem/db/JChemSearch.html





Developer's guide, with lots of examples for DB search, import, export, etc.:


http://www.chemaxon.com/jchem/doc/guide/
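To make the overall bookkeeping concrete, here is a minimal sketch of the loop described in the question: look each incoming structure up in uniques, reuse the existing cd_id on a hit, register the structure on a miss, and in both cases remember which input record it came from. Note the assumptions: the in-memory map stands in for the uniques table, and plain string equality on a structure key stands in for a "Perfect" search with JChemSearch, so this is not JChem API. `UniquesRegistry`, `register` and `sourcesOf` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the uniques table; no database or JChem calls involved.
final class UniquesRegistry {
    // Assumption: equal keys mean "same structure" (a real system would run a Perfect search).
    private final Map<String, Integer> idByStructure = new HashMap<>();
    // For every cd_id, the input records that matched it, including the first one.
    private final Map<Integer, List<String>> sourcesById = new HashMap<>();
    private int nextId = 1;

    /** Returns the cd_id for the structure (existing or newly assigned) and records its source. */
    int register(String structureKey, String sourceLabel) {
        Integer id = idByStructure.get(structureKey);
        if (id == null) {
            id = nextId++;
            idByStructure.put(structureKey, id);
        }
        sourcesById.computeIfAbsent(id, k -> new ArrayList<>()).add(sourceLabel);
        return id;
    }

    /** Lists every input record that was registered under the given cd_id. */
    List<String> sourcesOf(int cdId) {
        return sourcesById.getOrDefault(cdId, Collections.emptyList());
    }
}
```

The value returned by `register` is what would be written back as the cd_id field of the returned SDF, while `sourcesOf` answers the later question of which incoming structures matched an existing one.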





Let us know if you have any questions about the details.





---


Another "easy" solution could be to use overlap analysis in Instant JChem:


http://www.chemaxon.com/instantjchem/ijc_latest/docs/user/help/htmlfiles/chemistry_functions/performing_overlap_analysis.html


However, it is currently accessible only through the user interface, and the result is not in quite the same format as you described.





Szabolcs