How are fingerprints generated in JChemBase during import?

User c226609425

27-05-2005 16:40:25

Hello,

I am a trainee in a bioinformatics laboratory, and I am currently building a database of more than one million molecules.

Searches on this database take between 30 and 60 seconds. However, there is a database of 4,000,000 compounds on the Web that uses JChemBase and responds much faster (only a few seconds). Apart from the fact that this 4-million-compound database runs on a dual-processor computer, I think the enormous difference in response time comes from my pharmacophoric fingerprints being poorly chosen (I kept the defaults) for my one-million-compound database, which runs on a PC with a 2.41 GHz processor and 1280 MB of RAM.

Could you tell me which configuration would be best suited to a database of this size?

Also, when molecules are imported with JChemBase, how exactly does it calculate the pharmacophoric fingerprints, and which descriptors does it use? Are there several types of pharmacophoric fingerprints in JChemBase? If so, which one is used by default?

Thank you in advance.

PS: if there is any detailed documentation on how the fingerprints are generated, please don't hesitate to send it. Thanks.

ChemAxon 9c0afc9aaf

30-05-2005 10:28:17

Hi,

Please see some typical Substructure Search benchmark results here:

http://www.jchem.com/FAQ.html#benchmark3

As you can see, the search time depends greatly on the number of hits.

These benchmarks were measured with the structure cache enabled. If you get a worse response time, you are probably not using the cache, or you have not allowed enough memory for the cache to hold the whole table (typical structures require around 100 MB of memory per 1 million compounds).
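As a rough sizing aid, the rule of thumb above can be turned into a quick calculation. This is a minimal sketch that assumes the 100 MB per million figure scales linearly with the number of compounds; the class and method names are hypothetical, not part of JChem. Real memory use depends on the structures and on everything else running in the same JVM, so the heap should be set comfortably above the estimate.

```java
/**
 * Rough structure-cache sizing from the rule of thumb quoted above:
 * about 100 MB per 1 million typical compounds.
 * Hypothetical helper for illustration only, not a JChem API.
 */
public class CacheMemoryEstimate {

    // Approximate figure; actual usage depends on structure size and complexity.
    static final double MB_PER_MILLION_COMPOUNDS = 100.0;

    /** Estimated cache size in MB, assuming memory grows linearly with compound count. */
    static double estimateCacheMb(long compounds) {
        return compounds / 1_000_000.0 * MB_PER_MILLION_COMPOUNDS;
    }

    public static void main(String[] args) {
        long compounds = 1_086_263L; // example table size
        System.out.printf("Estimated structure cache: %.1f MB%n", estimateCacheMb(compounds));
    }
}
```

For the roughly 1.09 million structures mentioned later in this thread, this gives about 109 MB for the cache alone, before any headroom for the application server and the searches themselves.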

Could you tell me how you run your searches (jcsearch command-line utility, JSP example, API)? I would also like to know which kinds of search are slow for you (substructure, similarity, etc.).

For substructure searches, JChem uses only Chemical Hashed Fingerprints. The default parameters work well with drug-like structures.
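To illustrate why a hashed fingerprint speeds up substructure search, here is a generic sketch of the screening idea: each structural pattern (for example a short path of atoms and bonds) is hashed to a few bit positions in a fixed-length bit string, and a target can only contain the query if every bit set in the query's fingerprint is also set in the target's. This is not JChem's implementation. The fingerprint length, the number of bits per pattern, the hash function and the string encoding of patterns are all hypothetical, and the patterns are listed by hand instead of being enumerated from real molecules.

```java
import java.util.BitSet;
import java.util.List;

/** Generic hashed-fingerprint screening sketch (hypothetical parameters, not JChem's). */
public class HashedFingerprintScreen {

    static final int LENGTH = 512;          // hypothetical fingerprint length in bits
    static final int BITS_PER_PATTERN = 2;  // hypothetical number of bits set per pattern

    /** Hashes each pattern string to BITS_PER_PATTERN positions in a LENGTH-bit string. */
    static BitSet fingerprint(List<String> patterns) {
        BitSet fp = new BitSet(LENGTH);
        for (String p : patterns) {
            for (int i = 0; i < BITS_PER_PATTERN; i++) {
                // Salt the pattern with the bit index to get a different position per bit.
                int h = (p + "#" + i).hashCode();
                fp.set(Math.floorMod(h, LENGTH));
            }
        }
        return fp;
    }

    /** True if every bit set in the query is also set in the target. */
    static boolean passesScreen(BitSet query, BitSet target) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target); // bits in the query but not in the target
        return missing.isEmpty();
    }

    public static void main(String[] args) {
        BitSet query = fingerprint(List.of("C", "O", "C-O"));
        BitSet ethanol = fingerprint(List.of("C", "O", "C-C", "C-O", "C-C-O"));
        BitSet ethane = fingerprint(List.of("C", "C-C"));
        System.out.println(passesScreen(query, ethanol)); // true
        System.out.println(passesScreen(query, ethane));  // false
    }
}
```

Targets that fail the screen are rejected without an atom-by-atom comparison. Targets that pass may still be false positives (because of hash collisions, or because the patterns occur in a different arrangement), so they still need a full substructure match.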


Please read more on chemical fingerprints here:





http://www.jchem.com/doc/admin/index.html#create


http://www.jchem.com/doc/user/fingerprint.html








Pharmacophoric and other Molecular Descriptor fingerprints are used only for Molecular Descriptor Similarity searches. They are never cached, and the search speed usually does not depend much on their parameters.

Please visit these links for more information on Molecular Descriptors:

http://www.jchem.com/doc/user/GenerateMD.html

http://www.jchem.com/doc/user/ScreenMD.html

Best regards,

Szilard

User c226609425

30-05-2005 14:58:45

Hello,

First, thank you for your answer and the information about the fingerprints.

You must be right about the slow response time: when I run a substructure search using the same code as the JSP example, an error message always appears saying: "Table 'my_jchem_table' could not fit into structure cache." (Loading the 1,086,263 structures takes around 43 seconds.)

So I tried to change the JVM settings as described in the FAQ, but neither of the two examples worked on my computer (Windows XP, Win32).

Finally, do you have any other ideas for solving this memory problem?

Thank you in advance.

Biolbo

ChemAxon 9c0afc9aaf

30-05-2005 18:29:54

Hi,

If you are using Tomcat, please see the following page for instructions on how to set the heap size for Tomcat's JVM:

http://www.jchem.com/doc/admin/tomcat.html

You must restart Tomcat for the changes to take effect.
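After the restart, it is worth checking that the new heap size was actually picked up. A minimal sketch using the standard `Runtime.maxMemory()` call, which reports roughly the maximum heap (the `-Xmx` value, if one was set); the class name is hypothetical. It reports the limit of the JVM it runs in, so to check Tomcat's own JVM the same expression has to be evaluated inside Tomcat, for example in a JSP page with `<%= Runtime.getRuntime().maxMemory() / (1024 * 1024) %>`.

```java
/** Prints the maximum heap size of the JVM this code runs in. */
public class HeapCheck {

    /** Maximum heap in MB; roughly the -Xmx setting when one is given. */
    static long maxHeapMb() {
        // maxMemory() returns Long.MAX_VALUE if the JVM has no inherent limit.
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Maximum heap: " + maxHeapMb() + " MB");
    }
}
```

For example, running it with `java -Xmx512m HeapCheck` should report a value close to 512 MB (often slightly below, depending on the JVM and garbage collector).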





Let me know if it helps.





Best regards,





Szilard

User c226609425

31-05-2005 08:13:20

Hello,

Once again, you are right.

I am using one of the latest versions of Tomcat, and increasing the memory allocated to the JVM directly in the Tomcat monitor has greatly improved the execution speed (0 to 2 seconds, compared with 40 seconds before).

Thank you very much.

Best regards,

Biolbo