I would really like to see my library as clustered by libmcs. The output is a little strange... I think I know how to parse it, but it's going to be a pain. Is there a better output format (like an a->b->c tree format)? Also, I clustered what I believe to be 17k compounds and only got about half of them back. Is it ignoring singletons?
Sorry, but at present only one output format is available. We will consider your suggestion and may decide to implement it in the next release of JChem.
Regarding the structure count: unless you specify a level of the hierarchy after the -o flag, libmcs is supposed to output all levels of the hierarchy, including the bottom level (the leaves of the tree), where your input structures are located. So it should output at least 17K lines in your case, even if the tree has only one level.
Singletons are not ignored.
I suggest experimenting with smaller input files, e.g. 100 structures; then it is easier to see what's going on. If you wish, I will upload 100 structures from the NCI database so that you and we can run the program on the same input and compare results.
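For anyone who wants to cut such a small test set out of their own file, here is a minimal sketch in plain Python. It assumes the input is a standard SD file in which every record ends with a `$$$$` line; the function name `first_sdf_records` and the sample data are made up for illustration and are not part of JChem.

```python
import io

def first_sdf_records(src, n):
    """Return the first n records of an SD file stream as a list of strings.

    Assumes each record is terminated by a line containing only '$$$$'.
    """
    records, current = [], []
    for line in src:
        current.append(line)
        if line.strip() == "$$$$":
            records.append("".join(current))
            current = []
            if len(records) == n:
                break
    return records

# Tiny made-up sample with three (structurally empty) records.
sample = (
    "mol1\n  x\n\nM  END\n$$$$\n"
    "mol2\n  x\n\nM  END\n$$$$\n"
    "mol3\n  x\n\nM  END\n$$$$\n"
)
subset = first_sdf_records(io.StringIO(sample), 2)
print(len(subset))
```

With a real file you would pass `open("library.sdf")` instead of the `StringIO` object (the file name is hypothetical) and write the returned records to a new file.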
Thanks. It appears that JChem was choking on an sdf->smiles conversion, so it was only giving me some of the molecules. I've reprocessed the structure file (sdf) using openbabel and will see if that fixes it. I can't tell exactly which molecule is causing the crash, so I'll try it once more...
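A quick way to confirm that structures are being lost in conversion is to compare the number of records in the SD file with the number of lines in the SMILES file. The sketch below is plain Python and makes two assumptions: every SDF record ends with a `$$$$` line, and the SMILES file holds one structure per non-blank line. The function names and sample data are hypothetical.

```python
import io

def count_sdf_records(src):
    """Count records in an SD file stream by counting '$$$$' terminators."""
    return sum(1 for line in src if line.strip() == "$$$$")

def count_nonblank_lines(src):
    """Count non-blank lines, assuming one SMILES string per line."""
    return sum(1 for line in src if line.strip())

# Made-up data: three SDF records, but only two SMILES lines came out.
sdf_text = (
    "mol1\n  x\n\nM  END\n$$$$\n"
    "mol2\n  x\n\nM  END\n$$$$\n"
    "mol3\n  x\n\nM  END\n$$$$\n"
)
smiles_text = "CCO mol1\nc1ccccc1 mol3\n"

n_in = count_sdf_records(io.StringIO(sdf_text))
n_out = count_nonblank_lines(io.StringIO(smiles_text))
lost = n_in - n_out
print(f"{n_in} records in, {n_out} SMILES out, {lost} lost")
```

If the counts differ, converting the file in smaller chunks (halving the chunk that fails each time) is one way to narrow down which record is the culprit.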
Art, it would help us to improve sdf->smiles conversion if we could get your SDfile. We will sign a Confidentiality Agreement if needed.
Shouldn't be a problem. I'll get a CDA from our lawyer.
I was wondering if you would be willing to share the parsing script that you mentioned here. I am stuck on the same problem. Can you please help? Thanks in advance.