name to structure API

User 8139ea8dbd

02-03-2009 20:26:35

Two questions:


1) Is it possible to provide a short sample code of doing name-to-structure conversion using jchem API?


2) Does the name to structure conversion support drug names (like what's shown in chemicalize.org)? Or what's done in chemicalize.org is a specially written routine, but the logic is not included in the name-to-structure API? E.g., can name-to-structure API translate "Imatinib" into a molecule object? If yes, can you outline the Java code for that?


Thanks.

ChemAxon e7b9408ca1

03-03-2009 14:47:03

Here is the code example requested. It does support drug and traditional names, using the same dictionary as http://chemicalize.org





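Code:

    // Imports needed at the top of the file:
    import chemaxon.formats.MolFormatException;
    import chemaxon.formats.MolImporter;
    import chemaxon.struc.Molecule;

    // Inside a method:
    String name = "Imatinib";
    try {
        // The "name" option tells the importer to treat the string as a chemical name
        Molecule m = MolImporter.importMol(name, "name");
        // Here m contains the molecule object, you can do anything with it, for instance:
        System.out.println(name + " -> " + m.toFormat("smiles"));
    } catch (MolFormatException e) {
        System.out.println("Name " + name + " could not be imported: " + e);
    }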


User 8139ea8dbd

03-03-2009 18:35:01

Awesome. So I assume there is a name-to-SMILES mapping table inside. Is there an API for accessing that mapping table, as well as for adding entries to it, so that it can recognize more names?

ChemAxon e7b9408ca1

04-03-2009 09:54:51

You can access this mapping exactly as in the code above: you pass a name and it returns the structure. If you have a different need, could you describe it more precisely?





Regarding adding additional entries, it will be supported in the soon-to-be-released version 5.2 in the form of a custom dictionary.

User 8139ea8dbd

04-03-2009 18:32:10

We are interested in comparing multiple dictionaries: the one JChem uses and those derived from WDI, DrugBank, PubChem, etc. If we have to guess which names are in the JChem dictionary, that comparison becomes hard.





Regarding custom dictionaries, what do you plan to do if a name appears in multiple dictionaries? E.g., if you search for Ibuprofen in PubChem, you find several structures (varying in stereochemistry information), and deciding which one counts as the match is tricky. Very likely the structure provided by JChem differs from the one provided by another dictionary. Also, there are so many drug synonyms that it's hard to see which alias is missing from the JChem dictionary if one has to guess one name at a time.





I would think it would be convenient if the JChem dictionary were covered by the name-to-structure license and became accessible via the API.
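The comparison described in this post can be sketched in plain Java once each dictionary has been loaded into a name-to-SMILES map. This is only an illustration, not part of the JChem API: the class `DictionaryConflicts`, the method `findConflicts` and the sample entries are made up for this example. It also compares SMILES strings textually, so two different spellings of the same structure would be reported as a conflict unless the SMILES are canonicalized first.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class DictionaryConflicts {

    /**
     * sources: dictionary label -> (name -> SMILES).
     * Returns lower-cased name -> distinct SMILES strings,
     * keeping only names that map to more than one SMILES string.
     */
    static Map<String, Set<String>> findConflicts(Map<String, Map<String, String>> sources) {
        Map<String, Set<String>> byName = new LinkedHashMap<>();
        for (Map<String, String> dictionary : sources.values()) {
            for (Map.Entry<String, String> entry : dictionary.entrySet()) {
                // Names are compared case-insensitively
                String key = entry.getKey().toLowerCase(Locale.ROOT);
                byName.computeIfAbsent(key, k -> new LinkedHashSet<>()).add(entry.getValue());
            }
        }
        // Drop names on which all dictionaries agree
        byName.values().removeIf(smiles -> smiles.size() < 2);
        return byName;
    }

    public static void main(String[] args) {
        // Illustrative entries only; real dictionaries would be read from files
        Map<String, String> first = new LinkedHashMap<>();
        first.put("Ibuprofen", "CC(C)Cc1ccc(cc1)C(C)C(O)=O");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("ibuprofen", "CC(C)Cc1ccc(cc1)[C@H](C)C(O)=O");

        Map<String, Map<String, String>> sources = new LinkedHashMap<>();
        sources.put("dictionaryA", first);
        sources.put("dictionaryB", second);
        System.out.println(findConflicts(sources));
    }
}
```

Names that survive the filter are the ones where a choice of "correct" structure has to be made by hand.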

ChemAxon e7b9408ca1

06-03-2009 14:08:44

What specific API are you thinking of? What I can imagine is either passing a name and getting all the synonyms, or passing a structure and getting all the names corresponding to that structure.


Besides that, the only thing I can imagine is providing the whole dictionary as a file. Is this what you are requesting?

User 8139ea8dbd

06-03-2009 19:52:49

> Besides that, the only thing I can imagine is providing the whole dictionary as a file. Is this what you are requesting?





Yes, that's what I am getting at. Ideally, users should be able to access a JChem dictionary file and modify it (append, remove, and replace entries), then use their own dictionary file as the default for name-to-structure.





This is because one can often get different structures for the same chemical name (mostly differing in stereochemistry), and a user may have a preference as to which structure is the "correct" one that name-to-structure should use. Also, when we find new names in commercial databases, we cannot send them to ChemAxon to be added to the JChem dictionary. We need to be able to append them to the JChem dictionary ourselves in every new release.

ChemAxon e7b9408ca1

20-03-2009 11:58:36

Quote:
> > Besides that, the only thing I can imagine is providing the whole dictionary as a file. Is this what you are requesting?
>
> Yes, that's what I am getting at. Ideally, users should be able to access a JChem dictionary file and modify it (append, remove, and replace entries), then use their own dictionary file as the default for name-to-structure.






You can make use of the custom dictionary to get these results. The final documentation for it will be available shortly; I have appended a temporary version at the bottom of this post.





More precisely, the custom dictionary allows you to append (add entries) and to replace (since it takes precedence over the core name-to-structure conversion). If removing entries is needed, we could implement a special feature to support that.








Quote:
> This is because one can often get different structures for the same chemical name (mostly differing in stereochemistry), and a user may have a preference as to which structure is the "correct" one that name-to-structure should use. Also, when we find new names in commercial databases, we cannot send them to ChemAxon to be added to the JChem dictionary. We need to be able to append them to the JChem dictionary ourselves in every new release.






Using your own custom dictionary will actually be better, since it ensures that your dictionary is still used after upgrading to a new release of Marvin/JChem, while you still benefit from the improved released dictionary. That would be very hard to achieve if you had edited the ChemAxon dictionary itself.

















Here is the documentation for the custom dictionary:








It is possible to extend the name to structure conversion by putting structures with their name in a custom dictionary file. Names present in the custom dictionary will be converted to the corresponding structure. The custom dictionary has precedence over the standard name to structure conversion.




















The dictionary has to be located at:








C:\Documents and Settings\[USERNAME]\chemaxon\custom_names.smi (on Windows)





or





$HOME/.chemaxon/custom_names.smi (on Linux/Mac OS X)














FORMAT











For performance reasons, the dictionary has to be in smiles format. To use a dictionary in another format, it can be converted to smiles using molconvert or mview (Save As). In the same way, several dictionaries should be merged into a single dictionary file in smiles format.











Two layouts of the smiles format are supported:


1) smiles and name fields, separated by a TAB character (written as <TAB> below; spaces are not accepted), e.g.

C\C=C\CCC(O)=O<TAB>gamma-hexenoic acid


2) if there are named properties in the file, the NAME field will be used, e.g.

#SMILES EXACT_MASS NAME
C\C=C\CCC(O)=O 114.1424 gamma-hexenoic acid
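Since the TAB separator is easy to lose when editing by hand, the two-column file can also be generated programmatically. The following is a minimal sketch using only the Java standard library; the class `CustomDictionaryWriter` and its methods are hypothetical helpers, not part of Marvin/JChem. It writes wherever it is told to, so the result still has to be placed at the custom_names.smi location given above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CustomDictionaryWriter {

    /** Formats one dictionary line as SMILES, a single TAB, then the name. */
    static String formatEntry(String smiles, String name) {
        if (smiles.contains("\t") || name.contains("\t")) {
            throw new IllegalArgumentException("fields must not contain TAB characters");
        }
        return smiles + "\t" + name;
    }

    /** Writes all entries (name -> SMILES) to the given file, one entry per line. */
    static void write(Path file, Map<String, String> smilesByName) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> entry : smilesByName.entrySet()) {
            lines.add(formatEntry(entry.getValue(), entry.getKey()));
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> entries = new LinkedHashMap<>();
        // Backslashes in SMILES must be doubled inside Java string literals
        entries.put("gamma-hexenoic acid", "C\\C=C\\CCC(O)=O");
        // Target path comes from the command line; defaults to the working directory
        Path target = Paths.get(args.length > 0 ? args[0] : "custom_names.smi");
        write(target, entries);
    }
}
```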

User 5143c8f14d

21-04-2009 04:52:57










dbonniot wrote:
Here is the code example requested. It does support drug and traditional names, using the same dictionary as http://chemicalize.org

















I downloaded the latest version of Marvin today (marvinbeans-5_2_0-windows.exe) and cannot get the above code example to work. My compiler claims that the importMol method is not valid for the parameters (String, String). The API claims that it is a valid method. As far as I can tell all the other importMol methods are available. I am using the jar files contained in the lib directory for the above install.


Has this method been removed from the latest version? If so what is the equivalent method to use now?

ChemAxon e7b9408ca1

21-04-2009 10:44:15










Richard Koks wrote:

I downloaded the latest version of Marvin today (marvinbeans-5_2_0-windows.exe) and cannot get the above code example to work. My compiler claims that the importMol method is not valid for the parameters (String, String). The API claims that it is a valid method. As far as I can tell all the other importMol methods are available. I am using the jar files contained in the lib directory for the above install.


Has this method been removed from the latest version? If so what is the equivalent method to use now?



Hi Richard,


This method has not been removed, and we could not reproduce your problem. Could you paste the exact java file and compilation error you are seeing?


The method was introduced in 5.1.0, so one possible explanation is that you have an older Marvin jar on your classpath by accident.


As a workaround, you can try calling the method importMol(String s, String opts, String enc) with an enc(oding) value of null.


Please let us know how we can further help.


Best regards,


Daniel
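One way to check the classpath explanation above is to ask the JVM where it loaded a class from. This is a generic Java sketch, not a ChemAxon utility; the class `ClassOrigin` and the method `locate` are made up for this example. If the printed location points at an old jar, that jar is shadowing the new one.

```java
import java.security.CodeSource;

class ClassOrigin {

    /** Returns the location a class was loaded from, or a short explanation if unavailable. */
    static String locate(String className) {
        try {
            // 'false' avoids running the class's static initializers
            Class<?> type = Class.forName(className, false, ClassOrigin.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            // JDK classes such as java.lang.String typically report no code source
            return source == null
                    ? "(no code source, likely a JDK class)"
                    : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException e) {
            return "(not on the classpath)";
        }
    }

    public static void main(String[] args) {
        String className = args.length > 0 ? args[0] : "chemaxon.formats.MolImporter";
        System.out.println(className + " loaded from: " + locate(className));
    }
}
```

Run it with the same classpath as the failing build so that it sees the same jars.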

User 5143c8f14d

21-04-2009 21:13:09

Oops, it was an old version in my classpath! I'm working from a large codebase and missed the reference to an old jchem jar. The example works fine now.


Thank you for the help.

User 5adfeb8d26

09-06-2009 14:26:16

Hi,


I've added a custom_names.smi file at the location specified above, but I'm getting a MolFormatException "XXX can't be recognized as a name" when I try to convert my test name.


The smi file has one line in it :


C\C=C\CCC(O)=O    MY_DRUG


 


Any ideas?


Cheers


Luke

ChemAxon e7b9408ca1

09-06-2009 20:20:52

Dear Luke,


There are two possible issues here. The first is that the current format for the custom dictionary is rather strict: there has to be a TAB character between the smiles and the name. Your post shows spaces (though the TAB may have been lost when pasting into the forum), so make sure your file has a TAB character. If it does not, an error message should be printed on the console, but you might not see it depending on how you start our software. If this format is inconvenient, we can consider allowing spaces in future versions.


The second issue is capitalization. Currently we expect the dictionary to contain a standardized version of each name, which uses lower-case characters except where upper case is really meaningful, as in vitamin C, TNT, ... Therefore, you should use my_drug in the dictionary. It will then recognize all forms (my_drug, MY_DRUG, My_drug, ...). I think this is impractical, so I have already made sure that the next version, 5.2.3, will allow any form in the dictionary and convert it automatically.


Please let me know if this solves your problem, and if you have any further comments or issues.


Best regards,


Daniel Bonniot
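Both issues described in this reply can be caught before starting the software with a small pre-check of each dictionary line. The sketch below uses plain Java; the class `DictionaryLineCheck` and the method `problems` are hypothetical and not part of Marvin/JChem. It assumes the two-column layout (smiles, TAB, name) and treats upper case in the name as a warning only, since upper case is legitimate in names like TNT.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class DictionaryLineCheck {

    /** Returns human-readable problems found in one dictionary line; empty if none. */
    static List<String> problems(String line) {
        List<String> found = new ArrayList<>();
        int tab = line.indexOf('\t');
        if (tab < 0) {
            // Spaces between the fields are not enough for the strict format
            found.add("no TAB between smiles and name");
            return found;
        }
        String name = line.substring(tab + 1);
        String lower = name.toLowerCase(Locale.ROOT);
        if (!name.equals(lower)) {
            found.add("name contains upper case, consider \"" + lower + "\"");
        }
        return found;
    }

    public static void main(String[] args) {
        // The line from the post above, with spaces instead of a TAB
        System.out.println(problems("C\\C=C\\CCC(O)=O    MY_DRUG"));
        // Same entry with a TAB but still upper case
        System.out.println(problems("C\\C=C\\CCC(O)=O\tMY_DRUG"));
        // The form that is expected to work
        System.out.println(problems("C\\C=C\\CCC(O)=O\tmy_drug"));
    }
}
```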


 

User 5adfeb8d26

10-06-2009 08:20:01

Hi Daniel,


 


Yes, I changed to my_drug and it worked just fine.


 


Thanks


Luke