Get CAS numbers from structure

User 96e1912746

29-05-2012 21:16:43

Hi!


I have SMILES strings for a batch of molecules and I am trying to get the CAS number and IUPAC name for each of them, using the SMILES as input. ChemAxon's Chemicalize does this job really well, but my file is huge and doing it manually is not possible. Is there a command-line tool that I can include in my script to get the CAS numbers from the SMILES strings?


I looked at Molconverter and cxcalc but could not figure out how to get the CAS numbers. I was able to get the InChI strings for my files using Molconverter and would like to do the same for CAS numbers.


 

ChemAxon d26931946c

30-05-2012 11:44:50

Hi,


at the moment we can't generate CAS numbers from structure. It is a planned future feature.


chemicalize.org uses a database to resolve some of them, but it's in experimental status. 


You may try http://cactus.nci.nih.gov/chemical/structure. You can generate the CAS number from an InChI there.


Example:


http://cactus.nci.nih.gov/chemical/structure/InChI=1S/CH2O/c1-2/h1H2/cas
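For a large file, the same lookup can be scripted. Here is a minimal Python sketch, assuming only the URL pattern shown in the example above (identifier, then /cas). The function names are made up for illustration, and treating a failed lookup as an HTTP error is an assumption about the service, not documented behaviour:

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://cactus.nci.nih.gov/chemical/structure"


def cactus_cas_url(identifier):
    """Build the CAS lookup URL for one identifier (e.g. an InChI)."""
    # Keep "/" and "=" literal so an InChI looks like the example above;
    # other characters that are unsafe in a URL path ("#", "(", ...) are escaped.
    return f"{BASE_URL}/{quote(identifier, safe='/=')}/cas"


def fetch_cas(identifier, timeout=30):
    """Return the CAS numbers reported for one identifier (may be empty)."""
    try:
        with urlopen(cactus_cas_url(identifier), timeout=timeout) as response:
            text = response.read().decode("utf-8")
    except HTTPError:
        # Assumption: an unresolved identifier is reported as an HTTP error.
        return []
    # The service may return several CAS numbers, one per line.
    return text.split()
```

Calling fetch_cas("InChI=1S/CH2O/c1-2/h1H2") requests the same URL as the example. A SMILES string that contains "/" stereo bonds would need extra care, because the sketch leaves "/" unescaped.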


 


Regards, 


Peter

ChemAxon e7b9408ca1

28-05-2013 14:11:53

This is now supported since version 6.0 for batch and API usage, for instance:


molconvert name:cas# -s 'NCCCCCCCC(=O)O'
1002-57-9
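For batch use from a script, the command above can be wrapped in a loop. This is a minimal Python sketch that assumes molconvert is on the PATH and that the input file holds one SMILES string per line. The function names are hypothetical, and the way a failed lookup is detected (non-zero exit code or empty output) is an assumption:

```python
import subprocess


def molconvert_cas_command(smiles):
    """Argument list equivalent to: molconvert name:cas# -s '<smiles>'."""
    return ["molconvert", "name:cas#", "-s", smiles]


def cas_from_smiles(smiles):
    """Run molconvert for one SMILES string; return its output or None."""
    result = subprocess.run(
        molconvert_cas_command(smiles),
        capture_output=True,
        text=True,
    )
    # Assumption: a failed lookup gives a non-zero exit code or empty output.
    output = result.stdout.strip()
    if result.returncode != 0 or not output:
        return None
    return output


def cas_for_file(path):
    """Yield (smiles, cas) pairs for a file with one SMILES per line."""
    with open(path) as handle:
        for line in handle:
            smiles = line.strip()
            if smiles:
                yield smiles, cas_from_smiles(smiles)
```

Passing the arguments as a list avoids shell quoting problems with characters such as "#" and "(" in the format name and the SMILES.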

User b1fdf71dd3

31-05-2013 12:54:02










dbonniot wrote:

This is now supported since version 6.0 for batch and API usage, for instance:


molconvert name:cas# -s 'NCCCCCCCC(=O)O'
1002-57-9


Hi, could you give me more information on this? I cannot seem to get it to work in Instant JChem 6.0. I have a database of molecules in Instant JChem and I'm trying to use the "New Chemical Terms" function with molConvert("cas#") to return the CAS number. Is there some other tool I should be using instead? Thanks!

ChemAxon d26931946c

31-05-2013 13:26:19

Try molConvert("name:cas#")  instead of molConvert("cas#"). 


BRs,


Peter

User b1fdf71dd3

31-05-2013 13:57:14










gezapeti wrote:

Try molConvert("name:cas#")  instead of molConvert("cas#"). 


BRs,


Peter



OK, that worked - thanks!  Is it possible to know what "input" value is used to query CACTUS for CAS number retrieval when using InstantJChem in this way?  For my example there are a lot of errors and "not found" entries, but when I search manually using either the traditional name or the SMILES, CACTUS returns the correct value.

ChemAxon e7b9408ca1

31-05-2013 15:07:42

The SMILES string is used as the input. Can you share an example that you think should work but does not?

User b1fdf71dd3

03-06-2013 10:52:47










dbonniot wrote:

The SMILES string is used as the input. Can you share an example that you think should work but does not?



Sure - here's one example from my particular database:


The compound was imported into the Instant JChem database under its common name, "albendazole". I also used the traditionalName() function to generate a new field, and this confirmed the name (it was the same). When I ran molConvert("name:cas#") over the database, this compound gave an error. If I manually browse to http://cactus.nci.nih.gov/chemical/structure and enter the name "albendazole", the server returns "54965-21-8", which is the correct CAS number.


I know the problem:  Instant JChem 6.0 generates the following SMILES for this compound: 


CCCSc1ccc2[nH]c(NC(=O)OC)nc2c1


If I ask CACTUS to generate SMILES from the name "albendazole", the output is:


C1=C(SCCC)C=CC2=C1[NH]C(=N2)NC(OC)=O


The first SMILES string is not recognized properly by CACTUS, but the second (obviously) is. So I guess the issue is that SMILES is not a unique representation: the same compound can be written as several different strings. Perhaps one solution would be to let the user specify which input is sent in the CACTUS query? In some cases it might be more useful to query by InChIKey or by name (traditional or IUPAC).
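The idea of choosing the query input could be sketched as a fallback chain: try several identifiers for the same compound until one resolves. This Python sketch is only an illustration. first_cas and stub_resolve are hypothetical names, and the stub stands in for a real lookup by mimicking the albendazole behaviour described above:

```python
def first_cas(identifiers, resolve):
    """Try each identifier in order; return (identifier, cas) for the first hit."""
    for identifier in identifiers:
        if not identifier:
            continue
        cas = resolve(identifier)
        if cas:
            return identifier, cas
    return None, None


def stub_resolve(identifier):
    """Stand-in for a real lookup: only the name resolves, as reported above."""
    known = {"albendazole": "54965-21-8"}
    return known.get(identifier)


candidates = ["CCCSc1ccc2[nH]c(NC(=O)OC)nc2c1", "albendazole"]
print(first_cas(candidates, stub_resolve))  # ('albendazole', '54965-21-8')
```

In real use, resolve would be a function that queries the service, and the candidate list could also include an InChIKey.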

ChemAxon e7b9408ca1

03-06-2013 12:08:51

Thanks a lot for your research! Accordingly, version 6.1 will use the InChIKey to perform the search, which will solve such cases.