molconvert bug when reading from stdin?

User 7c177bab3b

28-01-2011 16:41:03

Hi


molconvert smiles:a -T '*' my.sdf


works, printing all fields to the smiles file with the header.


molconvert smiles:a -T '*' < my.sdf


just prints the smiles.

ChemAxon d76e6e95eb

01-02-2011 16:34:02

We would like to support exporting fields to SMILES in general, but there is a problem to solve first: SDF allows different fields for each molecule, while our SMILES extension supports only the same set of fields for all molecules (like DB columns). Detecting all fields in the input stream would require reading the entire stream twice, which would be slow and sometimes not even possible.


What we can do, however, is detect the field names in the first record of the input stream and export only those fields of each molecule to the output SMILES.
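To make the proposal concrete, here is a minimal Python sketch of the "columns from the first record" idea. It is not molconvert's implementation; the function names, the regular expression, and the sample data are assumptions for illustration only, and multi-line field values are simply joined with spaces.

```python
import re

# Assumption: an SDF data header looks like "> <NAME>" (optionally with extras).
FIELD_HEADER = re.compile(r"^>.*<([^>]+)>")

def iter_sdf_fields(lines):
    """Yield one dict of data fields per SDF record, in a single pass."""
    fields, name = {}, None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("$$$$"):          # record separator
            yield fields
            fields, name = {}, None
            continue
        match = FIELD_HEADER.match(line)
        if match:
            name = match.group(1)
            fields[name] = ""
        elif name is not None:
            if line == "":                   # blank line ends the field value
                name = None
            else:
                fields[name] = (fields[name] + " " + line).strip()

def rows_with_first_record_columns(lines):
    """Yield a header row taken from the first record, then one row per record."""
    columns = None
    for fields in iter_sdf_fields(lines):
        if columns is None:
            columns = list(fields)           # fixed after the first record
            yield columns
        # Fields missing from a later record become empty; extra fields are dropped.
        yield [fields.get(column, "") for column in columns]

# Hypothetical two-record input; the second record has a different field set.
SAMPLE = """mol1

M  END
> <ID>
1

> <MW>
46.07

$$$$
mol2

M  END
> <ID>
2

> <EXTRA>
x

$$$$
"""

for row in rows_with_first_record_columns(SAMPLE.splitlines()):
    print("\t".join(row))
```

Because the column list is fixed after the first record, this works on a pipe without any lookahead, at the cost of silently dropping fields that first appear in later records.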


We can add a new output file type (i.e. smiles with fields) and then all of our tools will support the export of data fields to SMILES.


Do you think that this solution would work for you?

User 7c177bab3b

01-02-2011 17:18:06

I can see the issue with the lookahead so what you suggest would be okay.


I wonder if others would be interested in other smiles based output such as TDT? Perhaps there is a more up to date markup that would allow smiles and fields? The option to output the list of field names to a separate file could make subsequent parsing easier. You can't really get away from having to read the data twice to get from the SD to say a smiles csv.
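For what it's worth, the two-pass approach could look roughly like the Python sketch below. It is only an illustration under my own assumptions (the helper names and the header pattern are made up, and it scans every line for headers rather than properly skipping the molfile block). A pipe cannot be rewound, so the input is copied to a temporary file first.

```python
import io
import re
import shutil
import tempfile

# Assumption: an SDF data header looks like "> <NAME>".
FIELD_HEADER = re.compile(r"^>.*<([^>]+)>")

def rewindable(stream):
    """Return a stream that can be read twice, spooling to disk if needed."""
    if stream.seekable():
        return stream
    spool = tempfile.TemporaryFile(mode="w+")
    shutil.copyfileobj(stream, spool)    # the unavoidable extra copy for pipes
    spool.seek(0)
    return spool

def all_field_names(stream):
    """First pass: collect every field name, in order of first appearance."""
    names = []
    for line in stream:
        match = FIELD_HEADER.match(line)
        if match and match.group(1) not in names:
            names.append(match.group(1))
    return names

# Usage sketch: the first pass gathers the names, then rewind for the second pass.
source = rewindable(io.StringIO("> <ID>\n1\n\n$$$$\n> <ID>\n2\n\n> <EXTRA>\nx\n\n$$$$\n"))
names = all_field_names(source)
source.seek(0)                           # the second pass would start here
print(names)
```

In a real script `sys.stdin` would take the place of the `io.StringIO` object, and the list of names could be written to a separate file as suggested above.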


However, the main point of the post was that there seems to be a difference in behaviour when specifying the filename on the command-line compared to taking input from a redirect or pipe.

ChemAxon 25dcd765a3

01-02-2011 20:35:46

Hi,


We have looked into the I/O redirection problem and found that we do not support SDF field parsing from I/O-redirected streams. We found the code snippet that states that it is not supported, but this is not mentioned in the documentation, which needs to be corrected ASAP, so thank you for alerting us.


Regarding your other comments, right now we do not support the TDT format. So far it has not been requested by our users. On the other hand, the SMILES format allows additional data to be stored after the SMILES string itself (usually the name of the structure), which is why we selected this form. In this form the line is still valid SMILES with extra information, and any SMILES parser should be able to handle the SMILES string. That is not the case with the TDT format: you need a TDT importer, which is quite rare as far as I know (TDT is not so widespread, but please correct me if I'm wrong).
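As a small illustration of why trailing data is harmless, a reader can split each line at the first run of whitespace and treat the rest as extra data. The Python sketch below assumes whitespace-separated extra data and a made-up function name; it does not describe the exact layout molconvert writes.

```python
def split_smiles_line(line):
    """Split a line into the SMILES string and whatever follows it."""
    parts = line.strip().split(None, 1)   # split at the first whitespace run only
    smiles = parts[0] if parts else ""
    extra = parts[1] if len(parts) > 1 else ""
    return smiles, extra

smiles, extra = split_smiles_line("CCO ethanol 46.07")
print(smiles)   # the structure, usable by any SMILES parser
print(extra)    # the additional data, ignored by parsers that do not need it
```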


"However, the main point of the post was that there seems to be a difference in behaviour when specifying the filename on the command-line compared to taking input from a redirect or pipe."


I totally agree, and we will put this information into the documentation.


I would like to mention that you were very clever to find out that molconvert works well with I/O redirection. I have checked our documentation, and in most cases it gives the usage as "molconvert [options] outformat[:exportoptions] [files...]", which means that it works on files. I found one example in the molconvert help page ("molconvert -h") where standard input is mentioned.