User 7a902f260f
31-10-2013 14:31:02
While upgrading from JChem 5.6 to 6.1, I ran into an issue parsing the following SDF file:
[#6:14]-[#7]-[#6](=O)-c1ccc(F)cc1 |$S1;;;;;;;;;;$|
Mrv0541 10291313282D
11 11 0 0 0 0 999 V2000
vO4WfW1W70
QL6W122W60
ns6WY+3W60
Zm7Wtc0W80
Ip8WvV4W60
fK9WQS6W60
Vv7WZt7W60
+y5WDM7W60
dR5WhP5W60
tQ8W4q9W90
Xt3W9a-V60
10201
B0101
20301
20402
30504
30904
50604
60704
70804
70A01
80904
A 11
S1
M END
Calling MolImporter.importMol() throws an exception stating that the input cannot be parsed. Explicitly passing the "sdf" option returns the correct Molecule. However, calling getName() on that Molecule returns the SDF header line "[#6:14]-[#7]-[#6](=O)-c1ccc(F)cc1 |$S1;;;;;;;;;;$|". Is this the correct behavior?
I just wanted to verify, because in JChem 5.6 importMol() detected the format automatically (i.e. there was no need to pass the "sdf" option), and getName() returned an empty string.
ChemAxon a202a732bf
05-11-2013 11:21:51
Dear Arthur,
I have checked your molecule: in 5.6, the first line of your mol format file was recognized as cxsmiles, and the molecule was imported from that first line as cxsmiles.
In 6.1 there is indeed a bug in format recognition: the format of your example cannot be detected. This has since been fixed, and the fix will be available in 6.2. In that version the molecule will also be imported from the first line, but the format is correctly recognized as cxsmarts. This is intentional: if the first line is in a valid chemical format, the format is determined from that line.
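To make the recognition point concrete: the first line of the posted file is a complete query string on its own, a SMARTS core followed by a CXSMILES-style extension in |...|, where the $...$ field lists one atom label (alias) per atom, separated by semicolons (here S1 on the first atom and ten empty entries, matching the 11 atoms of the core and the 11 atoms in the counts line). The Java sketch below only illustrates that structure; the class and method names (CxExtensionSketch, splitLine, atomAliases) are hypothetical, it handles only the $...$ field, and it is not how JChem's format recognition works internally.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative sketch only (not ChemAxon's recognizer): splits a CXSMILES/CXSMARTS
 * line into its SMILES/SMARTS core and the "|...|" extension, then reads the
 * "$...$" atom-label field, whose entries are separated by ';'.
 */
class CxExtensionSketch {

    /** Returns {core, extension}; extension is "" when there is no trailing |...| block. */
    static String[] splitLine(String line) {
        int bar = line.indexOf(" |");
        if (bar < 0 || !line.endsWith("|")) {
            return new String[] {line.trim(), ""};
        }
        String core = line.substring(0, bar).trim();
        // Drop the leading " |" and the trailing "|".
        String ext = line.substring(bar + 2, line.length() - 1);
        return new String[] {core, ext};
    }

    /**
     * Extracts per-atom labels from the first "$...$" field, keeping empty entries
     * (split with limit -1 preserves trailing empty strings). Assumes that field
     * is the only use of '$' in the extension.
     */
    static List<String> atomAliases(String extension) {
        int start = extension.indexOf('$');
        int end = extension.indexOf('$', start + 1);
        if (start < 0 || end < 0) {
            return List.of();
        }
        return Arrays.asList(extension.substring(start + 1, end).split(";", -1));
    }

    public static void main(String[] args) {
        String firstLine = "[#6:14]-[#7]-[#6](=O)-c1ccc(F)cc1 |$S1;;;;;;;;;;$|";
        String[] parts = splitLine(firstLine);
        List<String> aliases = atomAliases(parts[1]);
        System.out.println(parts[0]);        // the SMARTS core
        System.out.println(aliases.size());  // one label slot per atom (11 here)
        System.out.println(aliases.get(0));  // S1, the alias on the first atom
    }
}
```

Because the whole first line parses as a query with a valid extension, a recognizer that tries the first line first can reasonably classify the file as cxsmarts before ever looking at the molfile lines.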
If the format is explicitly specified as an MDL format type, the recognition system does not treat the input as cxsmarts, and it can be imported as an MDL file. Note, however, that the example you provided is missing one line between the second and third lines (the third header line of a molfile is the comment line, which may be empty), so it is not in the correct format. When the file is imported as MDL, the first line of the file is used as the name of the molecule.
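A V2000 molfile starts with three header lines (molecule name, program/timestamp line, comment line) followed by the counts line, which ends in V2000. The minimal sketch below checks only where the counts line falls, which is enough to show that the posted file is one header line short and that, once the empty comment line is restored, the first line is what becomes the molecule name. The names MolfileHeaderCheck, countsLineIndex, hasCompleteHeader and moleculeName are hypothetical; this is a standalone illustration, not the JChem importer.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch: checks whether a V2000 molfile has the full three-line header
 * (name, program/timestamp, comment) before the counts line.
 */
class MolfileHeaderCheck {

    /** A V2000 counts line ends with the version tag "V2000". */
    static boolean isCountsLine(String line) {
        return line.trim().endsWith("V2000");
    }

    /** Returns the 0-based index of the counts line among the first 5 lines, or -1. */
    static int countsLineIndex(List<String> lines) {
        int limit = Math.min(lines.size(), 5);
        for (int i = 0; i < limit; i++) {
            if (isCountsLine(lines.get(i))) {
                return i;
            }
        }
        return -1;
    }

    /** With a complete header, the counts line is the 4th line (index 3). */
    static boolean hasCompleteHeader(List<String> lines) {
        return countsLineIndex(lines) == 3;
    }

    /** The molecule name is the first header line. */
    static String moleculeName(List<String> lines) {
        return lines.isEmpty() ? "" : lines.get(0);
    }

    public static void main(String[] args) {
        List<String> asPosted = List.of(
            "[#6:14]-[#7]-[#6](=O)-c1ccc(F)cc1 |$S1;;;;;;;;;;$|",
            "Mrv0541 10291313282D",
            " 11 11  0  0  0  0            999 V2000");
        List<String> fixed = new ArrayList<>(asPosted);
        fixed.add(2, ""); // restore the missing (empty) comment line
        System.out.println(hasCompleteHeader(asPosted)); // counts line is at index 2
        System.out.println(hasCompleteHeader(fixed));    // counts line is at index 3
        System.out.println(moleculeName(fixed));         // first line = molecule name
    }
}
```

Running main prints false, then true, then the first line of the file, which matches the behavior described above: with the header repaired, an MDL import reports that first line as the molecule name.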
Hope I could help, best regards,
Zsuzsa