Tautomers and Conformers of Vitamin C

User 25d107bd42

07-05-2008 12:49:12

Hi,


while evaluating the MarvinSketch tools, I made the tautomer calculation for Vitamin C = L-(+)-ascorbic acid, see screenshot518. The result was very unexpected. The tautomer 3) we are always formulating got a distribution of 0 %. I didn't browse the literature, but is tautomer 1) really the most important tautomer ? Vitamin C is always formulated as tautomer 3), because of the stabiltiy of the conjugated carbonyl-enol structure.





To find more information I used the tool conformers and calculated the Dreiding energies for these tautomers. Each calculation took more than one hour ! Please see the next three posts. The calculation for tautomer 4) stopped after 1.5 hours and there was no result.

User 25d107bd42

07-05-2008 12:57:11

Conformers calculation for Vitamin C tautomer 1)

User 25d107bd42

07-05-2008 13:00:23

Conformers calculation for Vitamin C tautomer 2)

User 25d107bd42

07-05-2008 13:03:21

Conformers calculation for Vitamin C tautomer 3)

User 25d107bd42

07-05-2008 13:15:29

Screenshot526 shows the summary of these calculations.





In the tautomer calculation tautomer 1) has the greatest distribution.


In the conformer calculation tautomer 3) has the best energy (-27.51 kcal/mol) for the best conformer. This looks better.





But the tautomer calculation refer to water solutions and the Dreiding energy corresponds to vakuum. Is that the reason for the difference ?





Regards, Hans-Ulrich

User 851ac690a0

07-05-2008 14:10:03

Hi,








The stability difference of the tautomers of Vitamin C (in solution) is greatly related with the intermolecular hydrogen bonds. See attached figure.





This hydrogen bond effect is not included in the distribution calculation, therefore, the calculated distribution is not in agreement with the generally accepted formulation of Vitamin C.





Jozsi

User 25d107bd42

07-05-2008 15:07:56

Hi Jozsi,


this is a good explanation. The water solution is important.





And it is interesting the Dreiding Energy calculation gives the result, one is expecting by discussion of substituent effects.





Regards, Hans-Ulrich





PS: Nice to see which version of Marvin you are using ;-)

ChemAxon 8b644e6bf4

13-05-2008 15:17:02

Dear Hans-Ulrich,
HUWagner wrote:



To find more information I used the tool conformers and calculated the Dreiding energies for these tautomers. Each calculation took more than one hour ! Please see the next three posts. The calculation for tautomer 4) stopped after 1.5 hours and there was no result.
These calculation times even for a full conformational analysis (for these structures) are unacceptable. I will check the underlying cause and post a comment about it here.





regards,


Gabor

User 25d107bd42

20-05-2008 06:39:23

Hi, there are still problems with the conformer calculations.


I used the following script to get the conformer calculation of tautomer 4.
Code:
Two posts below there is the optimized code.





Applying the script to tautomer 4 "VitaminC-T4.mrv" I got after 2 hours the failure output "Out of memory...". In other cases the script works fine and puts the result in the *.sdf file.


My machine is not very fast, but it has 1 GB memory.





How to get more JAVA-Memory ?





Regards, Hans-Ulrich

ChemAxon 8b644e6bf4

20-05-2008 16:03:19

Dear Hans-Ulrich,
HUWagner wrote:
My machine is not very fast, but it has 1 GB memory.


How to get more JAVA-Memory ?





You can use the -Xmx option to specify Java maximum heap size:





Code:
cxcalc -Xmx640M conformers ...








will let the JVM use at most 640 MBytes heap size.





regards,


Gabor

User 25d107bd42

21-05-2008 18:09:22

Dear Gabor,


thank for your help. Now it works. I optimized the script, see the new code.





Code:
#!/bin/bash


#


# Calculating conformers


#


echo "-------------------------------"


date


echo "Calculating conformers for $1"


echo "input  $1.mrv"


echo "output $1confs.sdf"


time cxcalc -Xmx640M conformers \


--optimization 3 --maxconformers 5000 --diversity  0.1 --timelimit 90000 \


--prehydrogenize true --hyperfine true  --saveconfdesc false \


$1.mrv >$1confs.sdf


echo


ls -l $1*


date


echo "-------------------------------"





The results will be shown in the next post.





Regards, Hans-Ulrich

User 25d107bd42

21-05-2008 18:23:51

Dear Gabor,





using this script I recalculated the 4 tautomers of Vitamin C listed above.


The results using the command line procedure are slightly different from the results using the Marvin-GUI, see the list eight posts above ("Summary of itamin C tautomers") . The numbers of conformers are different and especially the energies of the "last" (most instable) conformers. Why ?





Regards, Hans-Ulrich

ChemAxon 8b644e6bf4

23-05-2008 12:09:56

Dear Hans-Ulrich,





My first guess would be that the implementation of the "hyperfine" contains bugs (as you noted in an other topic: http://www.chemaxon.com/forum/viewpost15492.html#15492). Since we are currently rewriting major parts of the Clean3D and the Dreiding force field implementation I think it would not be worth tracking down the cause in the current (released) implementation. As soon as our test shows that the development version is free of apparent bugs i will look for the possibility to provide it to you before its release.





regards,


Gabor

User 25d107bd42

23-05-2008 13:13:49

Dear Gabor, OK. I would like to contribute by testing. Regards, Hans-Ulrich

User 25d107bd42

03-06-2008 18:56:36

Hi Gabor,


I tested the options 0=loose, 1=normal, 2=strict, 3=very strict using the script above and only changing the value after "optimization". The results are shown in the screenshot attached.





1) The calculations with the options "loose" and "normal" didn't find an end. I must kill them using CTRL-C after 2 - 6 hours.





2) The calculations with the options "strict" and "very strict" do not differ essentially and the calculation times are not very different.





3) Interestingly the numbers of conformers are a little bit higher for the option "strict". But the question is, how different are the conformers really ?





So it's now the question, why to use the options 0 and 1 ?


Does ist make sense to have these options at all ?





(BTW: I have a new machine, which is faster than the machine in the tests above, as you can see comparing the results in this post and in the posts above. Other consequences given by the hardware and software changes will be discussed in a new topic).





Regards, Hans-Ulrich

ChemAxon 8b644e6bf4

04-06-2008 19:59:47

Dear Hans-Ulrich,





Thanks for the detailed measurements. I still have not tested why it took so long exactly. About the optimization limit: this determines the magnitude of the remaining gradient where the geometry optimization process should stop. Setting this is a trade of time vs quality of the generated structure in most cases.





In the last few days i have looked into the hyperfine and it seems to be free of hard bugs. However setting up correct parameters is crucial: when the above mentioned optimization criteria is loose and the diversity limit is low then - after the MD runs - the optimizer might stop (because of the loose limit) before getting close enough (within the specified diversity limit) to the starting conformers representing true local energy minimum. This can result in "endless" number of generated conformers (which are around true local energy minimums).





After a few tests i would suggest that the diversity limit should be raised to at least 0.15 or even higher (0.2-0.4) and only the "very strict" optimization limit should be used. In the next release probably we will use the very strict limit only in hyperfine. (MD temperature probably also will be cooled and the minimal diversity limit will be elevated.)





So, concluding the above, only the "very strict" optimization criteria is reasonable with setting up little higher diversity limit.





If you have any further questions, do not hesitate to ask them. In the near future i will do a detailed test and post the results.





Regards,


Gabor

User 25d107bd42

13-06-2008 21:45:20

Hi Gabor, I have done further tests.


With some hints of our SysAdmins now I use the following optimized script : (BTW: UNIX/Linux is the optimal operating system :-)


Code:
#!/bin/bash


#


# script file conformers


#


# starting with f.e. ./conformers ethanol >>ethanol.txt


#


# *appends* the calculation parameters to ethanol.txt


# *appends* the calculation output to $1confs.sdf


#


echo "-------------------------------"


date


echo "Calculating conformers for $1"


echo "input  $1.mrv"


echo "output $1confs.sdf"


(time cxcalc -Xmx640M conformers \


--format sdf --maxconformers 5000 --diversity  0.1 --saveconfdesc false \


--hyperfine true --prehydrogenize true --timelimit 90000 --optimization 3 \


$1.mrv >>$1confs.sdf ;) 2>&1


date


ls -l $1*


echo "-------------------------------"





I have done 2 times the same calculation 3 times ( = 6 conformer calculations) :


1) Three times the same: conformation optimization = 1 on ethanol. See the results in the screenshot on the left side.


2) Three times the same: conformation optimization = 3 on ethanol. See the results in the screenshot on the right side.


(I hope the screenshots would be readable, but I didn't find the option to enlarge the font size).





Now I have problems.





a) The results in the three optimization runs should be the numerically the same ! The results for equal conformers (f.e no. 1, 4 and 7, the "best" transoid-conformer of ethanol) must be numerically equal. But there are differences in the fourth digit for opt = 1 and the sixth digit for opt = 3. Doing the same thing gives numerically different results ? I don't understand it !





b) To use more than two digits accuracy does not make any sense. So the options "hyperfine=false" and optimization limits lower than 3 are inacceptable.





To get poorer results in dependance of the calculation time is also inacceptable.





The options "optimization limit" and "hyperfine" could be omitted.





Regards, Hans-Ulrich.

User 25d107bd42

18-06-2008 06:37:47

Hi, one question more:





Does your JAVA code use float or double variables ?





Regards, Hans-Ulrich

ChemAxon 8b644e6bf4

19-06-2008 02:19:26

Dear Hans-Ulrich,
Quote:
a) The results in the three optimization runs should be the numerically the same !
I would like to argue with the above since using "hyperfine" all of the generated conformers are passed through an MD run and a subsequent MM optimization: this - in case of proper operation - definitely should result in slightly different conformers. (In this case numerically equal energies could be indicate a bug!) To be more specific: MD runs are initialized using random starting velocities, so ideally after the runs our conformers are in the neighborhood (and not at in) local minima. The subsequent MM optimization starts from these structures and optimize them until a limit in the gradient reached. So the optimized structures will be in the close vicinity of the closest local energy minimum.
Quote:
b) To use more than two digits accuracy does not make any sense. So the options "hyperfine=false" and optimization limits lower than 3 are inacceptable.
Indeed, the conformer selection dialog in the GUI uses a reduced precision already. However, there can be use cases demanding full precision of the calculated energy (consider using energy as a geometry-dependent hash), so it should be stored in the ENERGY property with full resolution. Hyperfine=false and possibly looser optimization criteria is reasonable for sampling conformational space of flexible molecules required.
Quote:
The options "optimization limit" and "hyperfine" could be omitted.


I think the hyperfine - havng correct default parameters - should not be excluded since it is the tool for eliminating false energy minimum situations. The optimization limit is also a useful parameter in certain situations. However, hyperfine should be used only with the strictest (3) optimization limit as i wrote earlier. (In 5.1 hyperfine automatically uses this limit.) (MD parameters - temperature and step count - will also be tuneable in the future. The third important parameter, the diversity limit having 0.1 default value also seems to be too sharp, this probably also will be loosen)
Quote:
Does your JAVA code use float or double variables ?
We use double variables through the Clean3D and related codes.





regards,


Gabor

User 25d107bd42

19-06-2008 08:37:48

Hi Gabor,


thank you for your detailed answer. Now I have to think over it.


And I am eagerly awaiting Marvin 5.1 ;-)


Regards, Hans-Ulrich

User 25d107bd42

30-06-2008 09:49:50

Hi Gabor,


I made some more test calculcations and each calculation gives different results, see the screenshot bf0031. The non-planar structure sholuld not be a minimum and it should not have a lower energy than the planar molecule (The planar molecule has always 246.74 kcal/mol). I think we are at the limit of the accuracy of the calculations.





As already mentioned above, the results are not reproducable, obviously coming from the random starting velocities.





Regards, Hans-Ulrich

ChemAxon 8b644e6bf4

30-06-2008 22:01:53

Dear Hans-Ulrich,





It seems that the force field does not recognize the cyclopropene "triangle", so the assigned equilibrium bond angles around the SP2 carbons produce a non-planar energy minimum. (Fixing this is already scheduled.) The fragment builder however generates the correct, planar structure, which represents a false energy minimum (according to the force field) which can not be broken by a single optimization. Hyperfine supposed to handles these situations (in this case, revealing a bug in the FF) by "disturbing" the structure with short MD runs as discussed earlier.





In the 5.0 version it seems that the MD itself using the "very strict" optimization limit has enough initial temperature/step count to reveal this false energy minimum for most of the times. When the diversity limit is increased enough to make the two conformers equivalent the faulty one having lower energy will be kept.





I suppose that the the different energies on the screenshot are resulted by a looser optimization limit; a quick test gave a bit more consistent results for me.





Regards,


Gabor