SMILES file with attached data (or other text format)?

User 2ca05a7f4e

27-01-2013 09:41:56

Hi - I am still pretty new to this, so I might have missed something regarding Jchem4Excel / Knime usage for file conversion (and I might be overlapping with some other forum sections?):


I can do most of my conversions with Jchem for Excel, converting between, say, .sdf files and .csv (.txt) files, to have compounds with data attached.


The problem is that large files, from several thousand up to hundreds of thousands of compounds, make Excel crash sooner or later (I don't have any database handling system, unfortunately). The nice thing about Excel is that I find it simpler and faster to move/rename/delete/add/sort columns of data, especially when it serves as a sort of database replacement.


Thus I am attempting to use Knime for this task. The problem there, though, is that I can only read .sdf (or .mrv) files; I cannot read or write a .csv or .txt file that contains the structure (as SMILES) *and* data.


Any suggestions would be greatly appreciated.


Here is an example of the kind of text file I have in mind (the spaces here stand for a tab or comma delimiter, although comma-delimited files usually cause problems if a chemical name is included):


SMILES      property1    property2    ......
CCCCCC      123          456          ....
CCCCCN      124          789          .....


 


PS: In case it matters, my system is:


Win7 64bit, 6GB ram, i7, ATI 58xx, Office 2007 32bit, latest Jchem4Excel. Knime 2.7.1 with ChemAxon / Infocom Marvin nodes, as well as a trial version of the Infocom Jchem Nodes V2.6.3

ChemAxon 25dcd765a3

28-01-2013 08:50:57

If you change the data separator to a tab character, the importer will automatically import the data as well. You should also specify the input format.


Here is a simple command line example that works:


molconvert sdf 'test.txt{smiles}'

See also: http://www.chemaxon.com/marvin/help/formats/smiles-doc.html#smiles_with_info
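If the data currently sits in a comma-delimited file, a small script can rewrite it with tab separators before it goes to the importer. The following is only a minimal sketch in plain Python (standard library only, not a ChemAxon tool); the function name `csv_to_tab_smiles` and the sample rows are made up for illustration. It reads one row at a time, so file size should not matter much, and quoted chemical names that contain commas stay in one field. Whether the importer treats the first line as a header is not checked here; the linked format documentation is the place to confirm that.

```python
import csv
import io


def csv_to_tab_smiles(src, dst, delimiter=","):
    """Copy delimited rows from src to dst as tab-separated lines.

    Rows are streamed one at a time, so the whole file is never
    held in memory. Returns the number of rows written.
    """
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    count = 0
    for row in reader:
        if not row:  # skip completely blank lines
            continue
        writer.writerow(row)
        count += 1
    return count


# In-memory demo with hypothetical data; the quoted name contains a comma.
src = io.StringIO(
    "SMILES,name,property1\n"
    "CCCCCC,hexane,123\n"
    'CCCCCN,"pentan-1-amine, free base",124\n'
)
dst = io.StringIO()
n = csv_to_tab_smiles(src, dst)
```

For real files, the same function can be called with two file objects opened with `newline=""`, for example a hypothetical `compounds.csv` as input and `compounds.smi` as output.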

User 2ca05a7f4e

28-01-2013 09:17:55

Thanks. That assures me it is possible. Again, since I am new to CA software and doing this from home: which package do I need for this? I don't see where I have command line tools available (I only have Jchem for Excel installed atm), since I have mostly been (trying to) work with Knime.

ChemAxon 25dcd765a3

28-01-2013 09:37:38

For the command line tools you need the "Marvin for end users and Java developers" package, but everything I described should also be possible from JChemForExcel and Knime.


You should declare that your file is in SMILES format (or give the file name the ".smi" extension) and use tab characters instead of spaces to separate the fields in the SMILES file.
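Before importing a large file it can help to check that every row really is tab-separated. Below is a rough sketch in plain Python, again not part of any ChemAxon package; `find_bad_rows` and the sample lines are hypothetical. It assumes the first non-empty line sets the expected number of columns and that the structure column contains no spaces. That second assumption would need loosening for ChemAxon extended SMILES, which put extra features after a space.

```python
def find_bad_rows(lines):
    """Return (line_number, reason) pairs for rows that break the tab layout."""
    problems = []
    expected = None
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if expected is None:
            # Assumption: the first non-empty line defines the column count.
            expected = len(fields)
        elif len(fields) != expected:
            problems.append(
                (number, f"expected {expected} fields, found {len(fields)}")
            )
        # Assumption: plain SMILES contain no spaces, so a space here
        # usually means a space was used where a tab was intended.
        if " " in fields[0]:
            problems.append((number, "space inside the structure column"))
    return problems


# Hypothetical sample: the third line uses a space instead of a tab.
sample = [
    "SMILES\tproperty1\tproperty2\n",
    "CCCCCC\t123\t456\n",
    "CCCCCN 124\t789\n",
]
problems = find_bad_rows(sample)
```

Since the function only iterates over lines, an open file object can be passed directly, so a file with hundreds of thousands of rows is checked without loading it all at once.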

User 2ca05a7f4e

28-01-2013 10:05:14

Yes, it is indeed possible with Jchem4Excel, but only for a limited number of molecules before Excel crashes (memory, I assume).


And I haven't figured out Knime yet, otherwise I wouldn't have asked :-D


I have some things to follow up on for now, thanks.