It's always good to start with a small random benchmark set, and also to provide some insight into the hardware, software and input molecules you are using. If they are small molecules (<2000 Da) and the runs still take long, your computer or setup is too slow; if they are large proteins, even a fast computer may lag. So in your case you would do a run for each descriptor for, let's say, 50 compounds, then 100, then 500, and extrapolate the time you need. It's also good to set time limits in case the program is not able to calculate a fast solution. It's better to filter such molecules out, for example by sorting your sets according to MW or complexity.
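The extrapolation step can be sketched like this (a minimal Python sketch with made-up timings; it assumes runtime scales roughly linearly with compound count, which holds for per-molecule descriptors but not necessarily for combinatorial ones like stereoisomer enumeration):

```python
def extrapolate_runtime(samples, target_n):
    """Estimate runtime for target_n compounds from small benchmark runs.

    samples: list of (n_compounds, seconds) pairs.
    Fits a least-squares line through the origin (seconds per compound)
    and scales it up to target_n.
    """
    num = sum(n * t for n, t in samples)
    den = sum(n * n for n, _ in samples)
    return target_n * num / den

# Hypothetical timings: 50 compounds in 6 s, 100 in 12 s, 500 in 60 s
# -> ~0.12 s/compound, so ~600 s for 5,000 compounds
print(extrapolate_runtime([(50, 6), (100, 12), (500, 60)], 5000))
```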
A small benchmark on 10,000 NCI compounds (2x your set size) needed 14 minutes (instead of 2 months) on a 16-core machine (3.3 GHz), with file output to a RAM disk, using cxcalc (v6.0). All processes were run in parallel, but half of the time was actually spent on the tetrahedralstereoisomers calculation. A further breakdown revealed that the time needed was:
all other descriptors < majortautomer < stereoisomers < tetrahedralstereoisomers
Some cxcalc calculations may be parallelized, some are not (such as tetrahedralstereoisomers). Under Linux you can use make, GNU parallel or PPSS; under Windows, nmake or the start command in a batch file:
start /B cxcalc doublebondstereoisomers -f sdf NCI-10000.smi > doublebondstereoisomers.txt
start /B cxcalc stereoisomers -f sdf NCI-10000.smi > stereoisomers.txt
start /B cxcalc tetrahedralstereoisomers -f sdf NCI-10000.smi > tetrahedralstereoisomers.txt
start /B cxcalc logd NCI-10000.smi > logd.txt
start /B cxcalc chargedistribution NCI-10000.smi > chargedistribution.txt
start /B cxcalc msacc NCI-10000.smi > msacc.txt
start /B cxcalc msdon NCI-10000.smi > msdon.txt
start /B cxcalc generictautomer -f sdf NCI-10000.smi > generictautomer.txt
start /B cxcalc majortautomer -f sdf NCI-10000.smi > majortautomer.txt
start /B cxcalc canonicalresonant -f sdf NCI-10000.smi > canonicalresonant.txt
start /B cxcalc enumerations -f sdf NCI-10000.smi > enumerations.txt
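If you prefer not to juggle batch files or makefiles, the same fan-out can be sketched in Python with a thread pool. This is only a sketch: the command strings are placeholders taken from the list above, and it assumes cxcalc is on your PATH.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run(cmd):
    """Run one shell command line, return (command, exit code)."""
    return cmd, subprocess.call(cmd, shell=True)

def run_all(commands, workers=4):
    """Launch the given shell commands on a thread pool and collect results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run, commands))

if __name__ == "__main__":
    jobs = [
        "cxcalc logd NCI-10000.smi > logd.txt",
        "cxcalc msacc NCI-10000.smi > msacc.txt",
    ]
    for cmd, rc in run_all(jobs):
        print(cmd, "->", rc)
```

Note that threads are fine here because the work happens in the child processes; Python is only waiting on them.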
I added the NCI-10000.smi below. To further improve the speed, one would also split the data sets for the stereoisomers into smaller chunks; that would trim the time further. For example, chargedistribution needs only 15 seconds and logD only 20 seconds for 10,000 molecules here. The cxcalc parallelism (if there is any) also breaks down on some molecules, simply due to heap space, possible I/O constraints, or because it is not fully threaded.
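Splitting a SMILES file into chunks only takes a few lines; here is a hypothetical helper (the chunk naming scheme is my own):

```python
def split_smiles(path, chunk_size=1000):
    """Split a SMILES file (one molecule per line) into numbered chunk files.

    Returns the list of chunk file names, e.g. input.smi.chunk000, ...
    """
    with open(path) as fh:
        lines = fh.readlines()
    names = []
    for i in range(0, len(lines), chunk_size):
        name = f"{path}.chunk{i // chunk_size:03d}"
        with open(name, "w") as out:
            out.writelines(lines[i:i + chunk_size])
        names.append(name)
    return names
```

Each chunk file can then be fed to its own cxcalc process.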
Regarding some of your "descriptors": they are actually molecular result files (such as the stereoisomers) and not the stereoisomer count. So here it may be recommended to make an estimation based on the formula 2^n (n = chiral centers + stereogenic double bonds). That is a rough upper bound, but assume you have a sugar with 15 chiral centers: you would create 32,768 stereoisomer sugars. So use the computationally cheap chiral center count for each of your molecules and then estimate the number of isomers.
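The 2^n estimate is trivial to compute once you have the counts (remember it is an upper bound; meso compounds and symmetry reduce the real number):

```python
def estimated_stereoisomers(n_chiral_centers, n_stereo_double_bonds=0):
    """Rough upper bound: 2^(chiral centers + stereogenic double bonds)."""
    return 2 ** (n_chiral_centers + n_stereo_double_bonds)

# the sugar example: 15 chiral centers -> 32768 possible stereoisomers
print(estimated_stereoisomers(15))  # 32768
```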
So for example, in the case of Hexa-N-acetylchitohexaose (FUHDMRPNDKDRFE-LPUYKFNUSA-N), a topology analysis will tell you that you have a chiral center count of 29 AND a double bond count of 6, which means 2^(29+6) = 34,359,738,368 stereoisomers. The question then becomes: could you actually handle an estimated 350 TByte of output (based on a 10 kB mol file)? If not, it is easier to just count or even estimate the stereoisomers instead of generating 34 billion of them (as an example). If you use the built-in cxcalc functions, make sure to override the default settings with the switch -m, --maxstereoisomers. Also, for such extremes even the generation is too slow, so just estimating the numbers would be fine.
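The storage estimate can be reproduced the same way (assuming ~10 kB per mol file; your real per-molecule file size will vary):

```python
def estimated_output_terabytes(n_isomers, bytes_per_molfile=10_000):
    """Output size if every stereoisomer were written as its own mol file."""
    return n_isomers * bytes_per_molfile / 1e12  # decimal terabytes

# 2**35 isomers at ~10 kB each -> roughly 340-350 TB
print(estimated_output_terabytes(2 ** 35))
```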
With the small benchmark sets of increasing compound numbers you would not wait 2 months; instead you would have a good estimate of how long things will take, and whether it is even feasible to go ahead.