New Academic user - Can I import my CDX files ?

User 717bd7b1a6

07-04-2011 18:41:05

Hi, I'm a brand new academic user of Instant JChem.


 


It seems like it's potentially very powerful, but it's a tad overwhelming with all its options and database-related features (though I suppose these are necessary, they're a bit daunting to the beginning organic chemist!)


 


I have a very simple request that may have an obvious answer, but I've not found it in the help files or the forums yet. I, and many other grad students, keep a CDX file for each structure we synthesize, along with Word files / spectra / characterization data for these compounds for our theses.


Can Instant JChem help make this data more accessible and searchable? In particular, is there a way to import the large number of individual CDX files I have into Instant JChem so I can search the structures? I've gathered there's a command-line tool called molconvert that might glom together all my CDX files into one SDF, which I could then import. What I'm really looking to do is tag each entry with the file name or, better yet, the file path it came from, so I can use JChem to do a "lookup" into my hundreds of molecule files to find the one I'm looking for, and know where in my file directory to find the spectra, etc. Is this at all possible, or is it a pie-in-the-sky dream?


 


Also, on a side note, and maybe I'm missing the obvious here, but is there a "getting started guide" or tutorial that shows you how to use Instant JChem to do "cool/useful stuff" to get someone like myself interested in using it more and learning more? (This is more to help convince colleagues to try it than myself, but I'd also like to be motivated to learn more!)

User 717bd7b1a6

07-04-2011 19:53:33

I've discovered that if I glom together all my CDX files with


./molconvert sdf /Users/aless/Desktop/AllCDX/*.cdx -o /Users/aless/Desktop/AllCDX/Super.sdf


, where the directory AllCDX contains a copy of each CDX file (used Spotlight, or could use command line tools to generate this folder easily), that I get an SDF file with all my CDX structures in it !  Yay !
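The flat AllCDX folder described above can also be built from the command line. The following is a minimal shell sketch rather than a tested recipe: the function name collect_cdx and the manifest.tsv file are hypothetical names chosen for illustration. Besides copying the files, it records where each one came from, so a file name can later be mapped back to its original directory.

```shell
#!/bin/sh
# Sketch: copy every .cdx file found below a root directory into one flat
# folder and write a tab-separated manifest (file name, original path).
# collect_cdx and manifest.tsv are illustrative names, not part of any tool.
# Caveats: files with the same name in different folders overwrite each
# other, and the destination should live outside the tree being searched.
collect_cdx() {
    src_root=$1   # directory tree to search
    dest_dir=$2   # flat folder to hand to molconvert afterwards
    mkdir -p "$dest_dir" || return 1
    : > "$dest_dir/manifest.tsv"
    find "$src_root" -type f -name '*.cdx' | while IFS= read -r path; do
        name=$(basename "$path")
        cp "$path" "$dest_dir/$name" || continue
        printf '%s\t%s\n' "$name" "$path" >> "$dest_dir/manifest.tsv"
    done
}
```

It would be called as collect_cdx SOURCE_TREE FLAT_FOLDER, after which the molconvert command above can be run on the flat folder.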


 


Now I have all my structures in there, but the file names... how do I get those to appear in the database? I figured out that a little script hack can take advantage of the fact that the converter stores the file name in the first space-delimited section of each structure name.


Therefore, I could use a new calculated field with the following script to extract the file name:


def S= mol.toString()
def a=S.split(" ")
a[0]


 


(I also figured out that you have to set the "test value" to something like "SMILES" or "MRV" so that the script will validate, else it complains about null object... this should probably be there by default to reduce confusion, as it's just a test case)


 


This still seems like a dirty hack. Is there a better way to do what I did?


 


Also, I noticed in the process that the molconvert program does not like CDX files with too many structures in them. Is there a way to get it to convert those? I got the following error:


java.lang.NullPointerException
    at chemaxon.marvin.io.formats.cdx.CDXObjectReader.preprocessGraphic(Unknown Source)
    at chemaxon.marvin.io.formats.cdx.CDXObjectReader.preprocessPage(Unknown Source)
    at chemaxon.marvin.io.formats.cdx.CDXObjectReader.preprocessCDX(Unknown Source)
    at chemaxon.marvin.io.formats.cdx.CDXObjectReader.readMol(Unknown Source)
    at chemaxon.marvin.io.formats.cdx.CDXImport.readMol(Unknown Source)
    at chemaxon.marvin.io.formats.cdx.CDXRecordReader.nextRecord(Unknown Source)
    at chemaxon.marvin.io.MRecordImporter.startReadingNext(Unknown Source)
    at chemaxon.marvin.io.MRecordImporter.readRecord(Unknown Source)
    at chemaxon.marvin.io.MRecordImporter.readDoc0(Unknown Source)
    at chemaxon.marvin.io.MRecordImporter.readDoc(Unknown Source)
    at chemaxon.formats.MolImporter.readDoc(Unknown Source)
    at chemaxon.formats.MolConverter.readDoc(Unknown Source)
    at chemaxon.formats.MolConverter.convert0(Unknown Source)
    at chemaxon.formats.MolConverter.convert(Unknown Source)
    at chemaxon.formats.MolConverter.main(Unknown Source)
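One way to find out which files trigger this error is to convert each CDX file on its own and note the ones that fail. The sketch below uses only the molconvert invocation already shown in this thread. The names convert_each, MOLCONVERT, failed.txt and errors.log are made up for illustration, and the assumption that a failed conversion yields a non-zero exit status or an empty output file has not been verified against molconvert.

```shell
#!/bin/sh
# Sketch: convert each .cdx file in a folder to its own .sdf file and list
# the inputs that could not be converted.
# Assumption (unverified): a failed conversion either returns a non-zero
# exit status or leaves a missing/empty output file.
convert_each() {
    in_dir=$1
    out_dir=$2
    converter=${MOLCONVERT:-molconvert}   # override if molconvert is not on PATH
    mkdir -p "$out_dir" || return 1
    : > "$out_dir/failed.txt"
    for f in "$in_dir"/*.cdx; do
        [ -e "$f" ] || continue           # the glob matched nothing
        out="$out_dir/$(basename "$f" .cdx).sdf"
        if ! "$converter" sdf "$f" -o "$out" 2>> "$out_dir/errors.log" \
            || [ ! -s "$out" ]; then
            printf '%s\n' "$f" >> "$out_dir/failed.txt"
        fi
    done
}
```

After a run, failed.txt holds the paths of the files that need a closer look, and the remaining .sdf files can be imported as usual.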

ChemAxon fa971619eb

08-04-2011 08:50:04

Hi,


Lots of questions there. I'll try to answer.


1. introductory tutorials


Try these:


Feature animations: http://www.chemaxon.com/products/instant-jchem/instant-jchem-animations/
User guide: http://www.chemaxon.com/instantjchem/ijc_latest/docs/user/help/htmlfiles/ijcTOC.html
Especially: http://www.chemaxon.com/instantjchem/ijc_latest/docs/user/help/htmlfiles/quick_start.html


2. importing multiple chemdraw files


Right now your best bet is probably to use MolImporter as you describe. But a better approach will be to write a script to handle this. It can trawl a directory tree and convert and import each file. We will be providing an example script for doing this with IJC 5.5, which should be released in the next few weeks.


3. linking to data files.


A URL field (http://www.chemaxon.com/instantjchem/ijc_latest/docs/user/help/htmlfiles/editing_database/fields_url.html) would potentially allow you to link to data files with spectra etc., assuming you had some consistent naming conventions. The file name of the CDX file could be imported along with the structure as part of the import process described in #2.
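As an illustration of what a consistent naming convention could look like, the shell sketch below derives a file URL for a spectra file from the path of a CDX file. Everything in it is an assumption made for the example: the helper name spectra_url, the convention that the spectra file sits next to the CDX file with the same base name and a .pdf extension, and the idea that a file:// URL is what gets stored in the URL field.

```shell
#!/bin/sh
# Sketch: map a CDX path to the URL of its (assumed) companion spectra file.
# Assumed convention: /some/dir/Name.cdx  ->  /some/dir/Name.pdf
# Only spaces are percent-encoded; other special characters are not handled.
spectra_url() {
    cdx_path=$1
    base=${cdx_path%.cdx}                 # strip the .cdx extension
    printf 'file://%s.pdf\n' "$base" | sed 's/ /%20/g'
}
```

Values produced this way could be written to a text file alongside the file names and loaded into the URL field during import.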


4. extracting the name.


The approach you describe may look a bit ugly, but it's a good example of 'ugly, but useful'.
This might be worth a try:


mol.native.name

I'm not 100% sure it will work, and I don't have converted CDX files to test. The result might be the same, but it will be less ugly and slightly faster.


Actually, this code would be slightly safer:


mol?.native?.name

(the question marks make it safe if mol or mol.native are null).


5. Errors with CDX files with too many structures


Not sure about this. Are you able to provide an example file? (send via email if data should be kept private).


Tim


 

User 717bd7b1a6

08-04-2011 17:11:23

Sure.. here's an example CDX file that fails.


 


I've stripped it down to protect confidential information, but I checked that it still produces the error.


My guess is that molconvert is choking on the ChemDraw graphics. I included a screenshot of what the file looks like in case you can't open it with ChemDraw.


 


Also, many ChemDraw files contain multiple molecules inside each file. I don't suppose there is a way to split each one into a unique SDF entry?


--Adam


The following is the terminal output I get when executing molconvert (see first line for execution params) on this file.


 


aless$ ./molconvert -vv sdf /Users/aless/Desktop/AllCDX/AllCDX_Local/TestCase.cdx -o moo.sdf
Reading file 1: /Users/aless/Desktop/AllCDX/AllCDX_Local/TestCase.cdx (cdx format) ...
/Users/aless/Desktop/AllCDX/AllCDX_Local/TestCase.cdx: error
java.lang.NullPointerException
    at chemaxon.marvin.io.formats.cdx.CDXObjectReader.preprocessGraphic(Unknown Source)
    at chemaxon.marvin.io.formats.cdx.CDXObjectReader.preprocessGroup(Unknown Source)
    at chemaxon.marvin.io.formats.cdx.CDXObjectReader.preprocessGroup(Unknown Source)
    at chemaxon.marvin.io.formats.cdx.CDXObjectReader.preprocessGroup(Unknown Source)
    at chemaxon.marvin.io.formats.cdx.CDXObjectReader.preprocessPage(Unknown Source)
    at chemaxon.marvin.io.formats.cdx.CDXObjectReader.preprocessCDX(Unknown Source)
    at chemaxon.marvin.io.formats.cdx.CDXObjectReader.readMol(Unknown Source)
    at chemaxon.marvin.io.formats.cdx.CDXImport.readMol(Unknown Source)
    at chemaxon.marvin.io.formats.cdx.CDXRecordReader.nextRecord(Unknown Source)
    at chemaxon.marvin.io.MRecordImporter.startReadingNext(Unknown Source)
    at chemaxon.marvin.io.MRecordImporter.readRecord(Unknown Source)
    at chemaxon.marvin.io.MRecordImporter.readDoc0(Unknown Source)
    at chemaxon.marvin.io.MRecordImporter.readDoc(Unknown Source)
    at chemaxon.formats.MolImporter.readDoc(Unknown Source)
    at chemaxon.formats.MolConverter.readDoc(Unknown Source)
    at chemaxon.formats.MolConverter.convert0(Unknown Source)
    at chemaxon.formats.MolConverter.convert(Unknown Source)
    at chemaxon.formats.MolConverter.main(Unknown Source)
java.lang.NullPointerException
    at chemaxon.marvin.io.formats.cdx.CDXObjectReader.preprocessGraphic(Unknown Source)
    at chemaxon.marvin.io.formats.cdx.CDXObjectReader.preprocessGroup(Unknown Source)
    at chemaxon.marvin.io.formats.cdx.CDXObjectReader.preprocessGroup(Unknown Source)
    at chemaxon.marvin.io.formats.cdx.CDXObjectReader.preprocessGroup(Unknown Source)
    at chemaxon.marvin.io.formats.cdx.CDXObjectReader.preprocessPage(Unknown Source)
    at chemaxon.marvin.io.formats.cdx.CDXObjectReader.preprocessCDX(Unknown Source)
    at chemaxon.marvin.io.formats.cdx.CDXObjectReader.readMol(Unknown Source)
    at chemaxon.marvin.io.formats.cdx.CDXImport.readMol(Unknown Source)
    at chemaxon.marvin.io.formats.cdx.CDXRecordReader.nextRecord(Unknown Source)
    at chemaxon.marvin.io.MRecordImporter.startReadingNext(Unknown Source)
    at chemaxon.marvin.io.MRecordImporter.readRecord(Unknown Source)
    at chemaxon.marvin.io.MRecordImporter.readDoc0(Unknown Source)
    at chemaxon.marvin.io.MRecordImporter.readDoc(Unknown Source)
    at chemaxon.formats.MolImporter.readDoc(Unknown Source)
    at chemaxon.formats.MolConverter.readDoc(Unknown Source)
    at chemaxon.formats.MolConverter.convert0(Unknown Source)
    at chemaxon.formats.MolConverter.convert(Unknown Source)
    at chemaxon.formats.MolConverter.main(Unknown Source)

ChemAxon b124dd5f17

11-04-2011 07:55:24

Hi,


I am Alex, responsible for the academic package. You mention confidential information; can I ask if this usage is licensed for commercial work? Our academic package is not for use in commercial work, so if this is commercially sensitive information you should buy a commercial license.


Cheers/Alex

User 717bd7b1a6

11-04-2011 14:58:51

Hi Alex,


 


The information is definitely not commercial "Confidential". It was only confidential in the sense that I didn't feel comfortable posting certain academic research work on the web (not yet published, hopefully will be soon!).


 


I appreciate the difference between commercial / academic work, and I particularly appreciate that ChemAxon lets grad students like myself learn with your nice tools when we can't quite afford to buy them.  I think it's a good strategy to get us to use them in industry when we transition there.


 


--Adam


 














 




ChemAxon d26931946c

11-04-2011 16:10:51

Hi Adam,


 


The bug in the import of CDX files containing complex graphical structures has already been fixed; the fix will be available in Marvin 5.5.


Unfortunately there is no way to create a multi-molecule SDF from a CDX file using molconvert, only from the API. I've attached a small code sample which does what I think you wanted to do. It also stores the file name of the original CDX as a property.


 


Best regards,


Peter

ChemAxon fa971619eb

13-04-2011 11:53:32

That sort of behaviour of splitting out the individual molecules and loading them individually could easily be incorporated into the sort of loader script that I mentioned earlier in this topic. There is of course a question of how you would want the data from each individual CDX file to be handled.


We will provide an example script soon, you can see whether this approach could be useful.


 


Tim

ChemAxon fa971619eb

13-04-2011 14:20:02

Here is an example script.


It's pretty basic, but may be a useful starting point. It finds CDX files in a directory and all its sub-directories, extracts the individual molecules from each file and loads them into the IJC database.


To run it:


1. create a data tree with a structure entity as its root entity.
2. add a text field named FileName
3. right click on the data tree and choose 'Run script...'
4. paste the contents of the script into the script editor
5. edit the variables towards the top of the script that define where the structures are found etc.
6. execute.


 


As I said, it's pretty basic, but could be extended into something more useful.


It will not handle your CDX files that cause errors (until IJC 5.5 is out).


The scripting support in IJC 5.4.1 is pretty basic. It is being improved in the 5.5 version.


Tim