How can I compare two molecules?

User 0f28873a29

02-05-2008 20:49:48

I want to compare two molecules to find out whether they are equal.

How can I do that?

Thanks in advance.

ChemAxon a3d59b832c

07-05-2008 14:46:06

There are basically two methods:

1. Convert to unique SMILES and compare by string comparison.
2. Use the MolSearch class to compare Molecule objects.

We just covered this topic at our developer training day; here is a code example illustrating both methods.

I hope this helps.

Szabolcs

User 0f28873a29

14-05-2008 03:42:18

Thanks for your reply.

I am trying to import a file with 1,000,000 compounds into a database, and I want to compute the duplicate structures using the tautomer criterion, but it is very slow.

So I would like to split my file into four files, run this comparison between pairs of files on 2 computers, and then between the result files.

Is this possible?

I need to store the IDs of the duplicate structures.

Thanks for everything.

ChemAxon 9c0afc9aaf

14-05-2008 14:24:04

Hi,





We plan to speed up duplicate filtering with tautomers significantly from version 5.1.

Until then:

- Please make sure you run the import on a (powerful) server if possible. JChemManager can be invoked from the command line if there is no graphical environment.
- All available processors are utilized automatically, so using a multi-processor machine is recommended.

Quote:
So I would like to split my file into four files, run this comparison between pairs of files on 2 computers, and then between the result files.
Is this possible?
I need to store the IDs of the duplicate structures.

I'm not sure I understand the details here, but such an overlap analysis is possible in different ways.





The Instant JChem GUI application is the most convenient of these:





http://www.chemaxon.com/product/ijc.html





Please see this link for how to perform an overlap analysis:





http://www.chemaxon.com/instantjchem/ijc_latest/docs/user/help/htmlfiles/chemistry_functions/performing_overlap_analysis.html
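As a rough illustration of the split-and-merge idea from the question (not of how Instant JChem or JChem performs overlap analysis internally), the bookkeeping might be sketched as follows. It assumes each structure has already been reduced to a comparable key, for example a unique SMILES string; the class name DuplicateIndex, its methods, and the idea of using a string key are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: groups structure IDs by a precomputed comparison key. */
class DuplicateIndex {
    // key (e.g. a unique SMILES string; an assumption) -> IDs of all structures with that key
    private final Map<String, List<String>> idsByKey = new HashMap<>();

    /** Registers one structure; the key must be computed the same way for every file. */
    void add(String id, String key) {
        idsByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(id);
    }

    /** Folds in the index built from another file, possibly on another machine. */
    void merge(DuplicateIndex other) {
        other.idsByKey.forEach((key, ids) ->
                idsByKey.computeIfAbsent(key, k -> new ArrayList<>()).addAll(ids));
    }

    /** Returns only the keys shared by two or more IDs, i.e. the duplicate groups. */
    Map<String, List<String>> duplicates() {
        Map<String, List<String>> result = new TreeMap<>();
        idsByKey.forEach((key, ids) -> {
            if (ids.size() > 1) {
                result.put(key, new ArrayList<>(ids));
            }
        });
        return result;
    }
}
```

Each machine would build one index per file, and the partial indexes would then be merged. Because equal keys always land in the same group, the order of merging does not change which IDs are reported together. If the key is only a hash, each group would still need an exact comparison afterwards.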





Best regards,





Szilard

User 0f28873a29

14-05-2008 15:18:52

Hi:

I have a cluster machine with 16 processors. How can I run the search against the database? Using concurrency?

Thanks for everything.

User 0f28873a29

15-05-2008 04:54:39

Hi:


If two molecules have different hash codes, can they be the same?

What is the difference between the hash comparison and the stereochemistry and tautomer (duplicates) options in the jcman program?

Thanks in advance.

ChemAxon 9c0afc9aaf

15-05-2008 09:30:47

Regarding multi-processing:

We currently support using multiple processors in a single machine; the available processors are utilized automatically.

We plan to develop cluster computing support later (JChem 5.2).

I hope this answers your question.
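
For custom code running on a single multi-processor machine, one possible way to spread the per-structure work over the processors is a parallel stream. This is only a sketch and not how JChem distributes its own work: the class ParallelGrouping and the keyOf function (a stand-in for whatever expensive step turns a structure into a comparison key) are hypothetical. Parallel streams run on Java's common fork/join pool, whose default size follows the number of available processors.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical helper: computes comparison keys in parallel and groups IDs by key. */
class ParallelGrouping {
    /**
     * structureById maps a record ID to its structure string.
     * keyOf is a stand-in for the expensive per-structure step (an assumption).
     */
    static ConcurrentMap<String, List<String>> groupByKey(
            Map<String, String> structureById, Function<String, String> keyOf) {
        return structureById.entrySet()
                .parallelStream()
                .collect(Collectors.groupingByConcurrent(
                        e -> keyOf.apply(e.getValue()),
                        Collectors.mapping(Map.Entry::getKey, Collectors.toList())));
    }
}
```

Groups with more than one ID are the duplicate candidates. The order of IDs inside a group is not deterministic when the stream runs in parallel.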

Quote:
If two molecules have different hash codes, can they be the same?

No, different hash codes guarantee that they are different.

On the other hand, an identical hash code does not guarantee that they are the same, but it is a very efficient pre-filter before graph search.
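
The pre-filter logic can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not JChem code: the class HashPrefilter is hypothetical, hashOf stands in for the structure hash code, and exactMatch stands in for the graph search.

```java
import java.util.function.BiPredicate;
import java.util.function.ToIntFunction;

/** Hypothetical two-stage comparison: cheap hash check first, exact check only on a hash match. */
class HashPrefilter<T> {
    private final ToIntFunction<T> hashOf;       // stand-in for the structure hash code
    private final BiPredicate<T, T> exactMatch;  // stand-in for the expensive graph search
    private int exactChecks = 0;                 // counts how often the expensive step ran

    HashPrefilter(ToIntFunction<T> hashOf, BiPredicate<T, T> exactMatch) {
        this.hashOf = hashOf;
        this.exactMatch = exactMatch;
    }

    boolean same(T a, T b) {
        if (hashOf.applyAsInt(a) != hashOf.applyAsInt(b)) {
            return false; // different hash codes: treated as different, exact check skipped
        }
        exactChecks++;
        return exactMatch.test(a, b); // equal hash codes: still needs confirmation
    }

    int exactChecks() {
        return exactChecks;
    }
}
```

With plain Java strings as stand-ins (String::hashCode and String::equals), "Aa" and "BB" share the hash value 2112 yet are not equal, so it is the exact check that rejects them. The shortcut in the first branch is only valid if both inputs were prepared in the same way before hashing.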

Quote:
What is the difference between the hash comparison and the stereochemistry and tautomer (duplicates) options in the jcman program?

Stereochemistry is not coded in the hash code.

Otherwise I don't really understand this question; please rephrase.





Best regards,





Szilard

User 0f28873a29

15-05-2008 11:49:31

Hi:


Thanks for everything.

User 0f28873a29

26-06-2008 18:33:54

I have these structures (among others) to insert into our database, and I run a Java program to find duplicate structures. In my program I compare the structures by hash code, and if they have the same hash code I then compare the tautomeric forms. These structures have the same hash code, but the program tells me that they are not equal.

However, when I open them in the mview program, I see that these structures are the same; the only difference is the radical on the N group.

How can I do a comparison in this case?

I use the setRadicalMatching(SearchConstants.RADICAL_MATCHING_IGNORE) method of the MolSearch class, but the result still shows that these molecules are different.

ChemAxon 42004978e8

27-06-2008 12:42:40

Hi Yasset,





The two molecules do not match in a perfect search (this is what is executed during DB import) because, as you mentioned, the two nitrogens have different radicals in the two molecules.

If you disable radical checking, the nitrogens still remain different because in the second molecule the nitrogens have one more hydrogen. Hence the hydrogen counts are not the same and they cannot match.

Thus, on import the second structure is considered different from the first, and it is imported.

User 0f28873a29

27-06-2008 15:13:13

Thanks for everything.

ChemAxon 42004978e8

01-07-2008 06:42:58

I close this topic.

User 0f28873a29

13-08-2008 16:19:03

Hi:


I have two molecules:


CCCN1C(=O)N\C(=C/c2c(CCC)[nH]n(-c3cccc(Cl)c3)c2=O)C1=O


Oc1n(nc(c1/C=C/1C(N(C(N1)=O)CCC)=O)CCC)c1cc(ccc1)Cl


When I compute the hash codes with a script, the result is:


CCCN1C(=O)N\C(=C/c2c(CCC)[nH]n(-c3cccc(Cl)c3)c2=O)C1=O -2144019660


Oc1n(nc(c1/C=C/1C(N(C(N1)=O)CCC)=O)CCC)c1cc(ccc1)Cl -21434294682


These molecules are treated as the same when I try to insert them with the tautomer criterion in the GUI application, but they have different hash codes. Is this possible?

Thanks for everything.

ChemAxon 9c0afc9aaf

14-08-2008 06:05:20

Hi,





This is certainly possible, as the hash code does not deal with tautomerization, nor even with standardization; these have to be performed externally.





If you need tautomers to be considered you should bring your structures to the generic tautomer form before creating the hash code.








Code:
TautomerizationPlugin plugin = new TautomerizationPlugin();
plugin.setTakeGenericTautomer(true);
plugin.setMolecule(mol);
plugin.run();
genericTautomer = plugin.getStructure(0);
standardize(genericTautomer); // standardize as usual - aromatize at minimum
hash = hc.getHashCode(genericTautomer); // no FP for canonical tautomer hash code






To compare two such generic tautomer forms, you have to set a comparator for MolSearch. Please note that this is not public API, so it may be subject to change:





Code:
MolSearchOptions mso = ... ;
mso.addUserComparator(new chemaxon.sss.search.DataSgroupComparator(
        chemaxon.calculations.Tautomerization.SGROUP_PROPERTY_NAME));






If these structures are stored in a database anyway, it is probably much simpler to store them in a JChem Base table and use JChemSearch to find duplicates.





Best regards,





Szilard

ChemAxon a3d59b832c

14-08-2008 14:47:10

Hi Yasset,





This section of the Developer's Guide describes the recommended guidelines for handling tautomers:





http://www.chemaxon.com/jchem/doc/guide/dbconcepts/index.html#tautomers





Let us know if you have questions.





Regards,


Szabolcs

User 0f28873a29

23-09-2008 19:12:45

Hi again:


Well, your answers resolved many of my problems with comparing two structures, but we have a new problem.

I have two structures with the same hash (the tautomeric hash proposed earlier in this thread):





[H]N1c2nc(SC([H])([H])c3c([H])c([H])c([H])c([H])c3[H])nn2C([H])(c2c([H])c([H])c([H])c([H])c2[H])C(C(=O)OC([H])([H])C([H])([H])[H])=C1C([H])([H])[H] -2147374255





[H]N1N2C(N=C1SC([H])([H])c1c([H])c([H])c([H])c([H])c1[H])=NC(=C(C(=O)OC([H])([H])C([H])([H])[H])C2([H])c1c([H])c([H])c([H])c([H])c1[H])C([H])([H])[H] -2147374255





When I try to insert these molecules with the jcman program (with the non-duplicates option), the program tells me that these structures are the same (tautomeric forms).

But I wrote a program with this code to compare these structures:





molToCompare.dearomatize();
molToCompare.aromatize();

// First attempt: perfect search without tautomer handling.
MolSearch stms = new MolSearch();
stms.setSearchType(SearchConstants.PERFECT);
stms.setQuery(actualMol);
stms.setTarget(molToCompare);
stms.setStereoSearch(false);
if (stms.isMatching()) {
    System.out.println("They are the same");
} else {
    // Second attempt: perfect search with tautomer search enabled.
    stms = new MolSearch();
    stms.setQuery(actualMol);
    stms.setTarget(molToCompare);
    stms.setSearchType(SearchConstants.PERFECT);
    stms.setTautomerSearch(true);
    stms.setStereoSearch(false);
    if (stms.isMatching()) {
        System.out.println("They are the same");
    } else {
        System.out.println("They are different");
    }
}


This program tells me that the molecules are different. What is wrong with my code?

ChemAxon a3d59b832c

24-09-2008 11:57:53

Without having tried it, it seems that you forgot to aromatize one of the structures.





More information: see http://www.chemaxon.com/jchem/doc/guide/search/index.html#searchmem


http://www.chemaxon.com/jchem/doc/user/query_standard.html





The easiest would be to replace MolSearch with StandardizedMolSearch.





Best regards,


Szabolcs.

ChemAxon 42004978e8

24-09-2008 13:35:58

Hello,





It seems that there is a bug in tautomer generation/search for these two molecules.


We are investigating it.


Robert

User 0f28873a29

24-09-2008 14:10:42

Hi:


Thanks for your quick answers.

When I try to import the molecules with the jcman GUI, it shows that these molecules are the same (they are tautomers). So my first question is: what code does the jcman GUI program use to compare two molecules?

Second, I used the StandardizedMolSearch class, and the script tells me that these molecules are different.

Thanks for everything.

P.S.: Any idea how to solve this problem?

ChemAxon a9ded07333

25-09-2008 12:06:17

Hi,





I'm investigating the difference between jcman and StandardizedMolSearch behaviour, and will get back to you soon.





Best regards,


Tamás

ChemAxon a9ded07333

02-10-2008 11:46:09

Hi,





Database search (and import) uses generic tautomer generation for tautomer recognition, while MolSearch and StandardizedMolSearch enumerate the possible tautomers of the molecule. We are continuously improving our tautomer plugin, so there can be bugs in previous releases.

Since I could reproduce the bug only in our 5.0.* releases and not in the 5.1.* ones, I guess you are using a 5.0.* version. I advise you to upgrade.





Please let me know if you are using a 5.1.* version and the bug still occurs.





Best wishes,





Tamás