Use of Importer.setInput methods produces different results

User dbcf39f8c0

30-04-2008 00:55:37

When loading a file containing one SMILES per line into the fingerprint DB using Importer, I get different strings in the 'cd_structure' field if I pass a FileInputStream instead of a File to Importer.setInput(). The difference is that the FileInputStream version (or any InputStream, for that matter) puts a carriage return at the end of each SMILES string; the File version does not.





The same fingerprints are generated with either input type; I just want the correct values in the cd_structure field for comparisons elsewhere.





I am using JChem 5.0.3. Here is a code snippet that I use to do the work.





Code:
    Importer importer = new Importer();
    importer.setConnectionHandler(...);
    importer.setTableName("fptable");
    importer.setStoreDuplicates(false);
    importer.setInfoStream(System.err);

    importer.setInput(new File("xxx"));
or
    importer.setInput(new FileInputStream(new File("xxx")));

    runImporter(importer);






Can anyone help?





I would really prefer not to have to use the setInput(File) method. I have a list of SMILES in an InputStream from another source and I don't want to save them to a file.

ChemAxon 9c0afc9aaf

30-04-2008 09:11:56

Hi,








Don't worry, this is really a cosmetic issue; it affects neither search operations nor structure display.





I do not know what you are using the cd_structure field for directly, but it is mainly recommended for displaying structures (e.g. it is no good for filtering duplicates).





We will investigate the difference and get back to this forum topic later.





Best regards,





Szilard

User dbcf39f8c0

01-05-2008 21:20:45

I use the SMILES string in the cd_structure field to map the fingerprint table's cd_id to ids I have elsewhere in a database. I have worked around this 'cosmetic' issue to do the job. It's just a bit of a nuisance that changing the way I load the SMILES into the fp table (from a file to an input stream) means I have to change other post-processing code.
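For readers facing the same mapping problem, here is a minimal sketch of one possible workaround. It assumes the only difference between the stored strings is trailing carriage-return or linefeed characters, and it strips them before comparing. The class and method names are hypothetical and are not part of JChem.

```java
// Hypothetical helper, not part of JChem: normalizes a cd_structure value
// so that strings differing only in trailing line terminators compare equal.
final class CdStructureNormalizer {
    private CdStructureNormalizer() {}

    /** Removes any trailing '\r' and '\n' characters; the rest is unchanged. */
    static String stripTrailingLineTerminators(String structure) {
        int end = structure.length();
        while (end > 0) {
            char c = structure.charAt(end - 1);
            if (c != '\r' && c != '\n') {
                break;
            }
            end--;
        }
        return structure.substring(0, end);
    }

    public static void main(String[] args) {
        // Assumed example values: the same SMILES with and without a terminator.
        String withoutTerminator = "CCO";
        String withTerminator = "CCO\n";
        boolean same = stripTrailingLineTerminators(withoutTerminator)
                .equals(stripTrailingLineTerminators(withTerminator));
        System.out.println(same);
    }
}
```

Applying the same normalization on both sides of the comparison means the post-processing code does not need to know which setInput variant loaded the table.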

ChemAxon 9c0afc9aaf

23-05-2008 11:47:11

Hi,





I have tested your example code with 5.0.3, and there seems to be no difference between the File and InputStream modes.


The record in the cd_structure field ends with a linefeed in both cases.


The simple input file is attached.





Is it possible that the records without a linefeed were added with a previous version?


I think only newer versions add the linefeed (which is theoretically more correct).
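Since this thread mentions both carriage returns and linefeeds, it may help to check exactly which characters a stored value ends with. The following is a small sketch that makes line terminators visible in a string read back from the table. The class and method names are hypothetical, and how the value is fetched from cd_structure is left out.

```java
// Hypothetical diagnostic helper, not part of JChem: renders '\r' and '\n'
// as the visible two-character sequences "\r" and "\n".
final class LineTerminatorInspector {
    private LineTerminatorInspector() {}

    static String showLineTerminators(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 4);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            if (c == '\r') {
                sb.append("\\r");   // carriage return
            } else if (c == '\n') {
                sb.append("\\n");   // linefeed
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed example value; a real check would use the string read from the table.
        System.out.println(showLineTerminators("CCO\n"));
    }
}
```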





Sorry again for the inconvenience; we did not anticipate that this change would cause problems.





Best regards,





Szilard