Be aware that there are newer releases of ChEMBL (ftp://ftp.ebi.ac.uk/pub/databases/chembl/ChEMBLdb/releases/).
The ChEMBL DB has 1.2 million compounds, about 3,000 of which are larger than 2,000 Da MW. So unless you are a protein or peptide researcher, I would programmatically exclude them. Those 3,000 compounds are attached below as extremely-large-compounds.cxsmarts in a ZIP file. Try those 3,000 alone and they will take almost 90% of the total computation time, around 18 hours, to tautomerize.
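One way to do that exclusion programmatically, without a chemistry toolkit: most ChEMBL exports already carry a molecular-weight column, so a plain text filter is enough. This is a sketch, not the author's method; the column name `mw_freebase` and the tab-separated layout are assumptions you should adjust to your actual export.

```python
import csv

def split_by_weight(rows, mw_threshold=2000.0, mw_field="mw_freebase"):
    """Partition compound records into (small, large) by molecular weight.

    `rows` is any iterable of dicts, e.g. from csv.DictReader over a
    tab-separated export. The column name `mw_freebase` is an assumption;
    change it to whatever your export actually uses.
    """
    small, large = [], []
    for row in rows:
        try:
            mw = float(row[mw_field])
        except (KeyError, TypeError, ValueError):
            small.append(row)  # keep records with a missing MW rather than drop them
            continue
        (large if mw > mw_threshold else small).append(row)
    return small, large

def filter_file(in_path, out_path, threshold=2000.0):
    """Stream a TSV export and write only the <= threshold subset back out."""
    with open(in_path, newline="") as fin, open(out_path, "w", newline="") as fout:
        reader = csv.DictReader(fin, delimiter="\t")
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames, delimiter="\t")
        writer.writeheader()
        small, _ = split_by_weight(reader, mw_threshold=threshold)
        writer.writerows(small)
```

Feeding the tautomerizer only the small-molecule subset avoids spending 90% of the run on 0.25% of the compounds.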
With the new 6.0 version, your tautomerization speed depends on:
1) Processor speed (best >3 GHz; latest 2013 chip technology)
2) Processor/CPU core count (best 16-40 CPUs, or at least 24-32 threads)
3) Disk speed (best via RAM disk or SSD RAID array)
4) Memory: unless you use the compiled WIN.EXE, assign 40 to 80 GB (not MB) or more of heap space to avoid costly Java garbage collection.
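Point 4 in practice might look like the following. Only the standard JVM flags are real; the jar name, input/output options, and the RAM-disk mount point are placeholders for whatever your setup uses.

```shell
# -Xmx caps the Java heap (40-80 GB as suggested above); -Xms pre-allocates
# it so the heap does not grow incrementally mid-run. Pointing the temp dir
# at a RAM disk or fast SSD covers point 3.
# "tautomerize.jar" and its options are hypothetical placeholders.
java -Xms40g -Xmx80g -Djava.io.tmpdir=/mnt/ramdisk \
     -jar tautomerize.jar --in chembl_small.smiles --out tautomers.smiles
```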
One million structures from ChEMBL16 took around 40 minutes on my system (all <2,000 Da).
To put that into perspective, the first 500,000 compounds took around 7 minutes!
That's roughly 1,200 compounds per second for the first 500k; throughput of course drops with larger MW.
(Dual-CPU Xeon E5-2687W with 196 GB RAM, 40 GB assigned as a RAM disk, 40 GB Java heap size.)
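The arithmetic behind those rates, for the record, showing how much the heavier second half of the file drags the average down:

```python
# Rates implied by the timings quoted above.
first_half_rate = 500_000 / (7 * 60)      # compounds/s over the first 500k
overall_rate    = 1_000_000 / (40 * 60)   # compounds/s over the full million

print(f"first 500k: ~{first_half_rate:.0f}/s, overall: ~{overall_rate:.0f}/s")
```

So the overall rate is roughly a third of the initial rate, which is exactly why pre-filtering the large compounds pays off.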
I have also attached a PDF that explains how to sort by size, or how to use other tools as a filter to exclude those large compounds.