more than three levels on Libmcs? - ChemAxon Forum Archive

User 78821debe8

09-02-2005 13:07:15

Hi,

I finally fixed the SMILES import issue by taking out the offending compounds... In the meantime, I've been playing around with libmcs (up to 3.0.8), and for some reason I can't get more than three levels, even if I specify the maximum number of levels to be 10. I've played around with the MCS values and similarities, but no luck. Any suggestions on how to relax the constraints so that I can build a deeper hierarchy?

ChemAxon efa1591b5a

09-02-2005 14:09:15

There are two independent sets of parameters that impose various termination conditions on clustering.

1. -c, -l, -e: the default values are 1, 6 and 10, respectively (as of version 3.0.8!). So it is parameter -e that has to be tweaked in your particular case. The smaller this value, the higher the hierarchy tree grows. Thus, if you want more levels, just specify -e 1.

2. -n: if you specify this, the above three are ignored. The smaller its value, the higher the tree. I suppose -n 0.01 produces reasonable results.

Hope this helps.

Regards,

Miklos

User 78821debe8

09-02-2005 21:40:43

Thanks, that worked fine (-e 1)... still working on the scripts to analyze the results :)

Art

ChemAxon efa1591b5a

10-02-2005 09:45:13

Cheers.

If you have some spare minutes please tell us the output format that would suit your needs the best. I remember you mentioned a tree-like representation previously.

Regards,

Miklos

User 78821debe8

11-02-2005 03:28:43

Well,

I'm trying to get the output in a more useful form, that is, not with -w, since I have to run the clustering every time to use the viewer, and I can't scroll sideways in the viewer... I realize it's alpha :)

It would be useful to see a text-only version of the output that is human-readable. BTW, is there a way to correlate the unique IDs to the input file? I'd like to find out where specific structures fall on the map, but the SMILES string in the output file is different from the one in the input (probably a language difference between programs). My original intent was to correlate molecule IDs by matching the SMILES, but I haven't been able to do that since they are somehow different (I haven't spent much time on that yet).

Anyway, I'm writing a script that will take the output from the -o option and generate a DOT language file with one edge (a -> b) for every relationship. My intention is to then use Graphviz to visualize the hierarchy. Unfortunately, although I would like to show the structures in the map, I don't believe Graphviz supports this. I have a few ideas of what to do with it, but I'm still looking at options.

I will probably release the script (not much to it yet) once it's ready.

Art

ChemAxon efa1591b5a

14-02-2005 16:22:11

Hi Art,

OK, now I understand your problem better.

A few thoughts:

- SMILES in the output are unique SMILES, so if your input SMILES were not unique, the input and output strings will look different even for the same structure.

- unique IDs: I suppose you are referring here to a MOL/SD file, aren't you? OK, so I will introduce a new command line flag/API method that allows the name of the ID field to be specified. If such a field name is specified, the output will contain the corresponding ID values instead of the unique SMILES. Is that OK?

- libmcs internal unique IDs: on the bottom level of the hierarchy these are SMILES input structure numbers (the row count in the SMILES file, starting from 0; sorry, I know it is mad... numbering will start from 1 in future releases). On higher levels of the hierarchy these are cluster IDs, though in the case of singletons this ID is the same as the SMILES ID of the corresponding structure.

- I understand that graphviz is popular in the scientific community, so an optional DOT language output is a reasonable enhancement of the libmcs program. I'll consider this direction of development. Thank you for drawing my attention to this tool.
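Since the bottom-level IDs are row numbers in the SMILES input file counted from 0, they can be mapped back to the input without comparing SMILES strings at all. A minimal sketch follows, assuming one structure per line with the SMILES as the first whitespace-separated token and no header line. How libmcs counts blank lines is not stated here, so treating every line as a row is also an assumption. The name `index_smiles_rows` and the sample data are made up for illustration.

```python
import io


def index_smiles_rows(lines):
    """Map 0-based row number -> first token (the SMILES) of each input line."""
    lookup = {}
    for row, line in enumerate(lines):
        fields = line.split()
        # Assumption: every line counts as one row, even an empty one.
        lookup[row] = fields[0] if fields else ""
    return lookup


# Made-up input standing in for a SMILES file opened with open(path).
sample = io.StringIO("CCO ethanol\nc1ccccc1 benzene\nCC(=O)O acetic_acid\n")
rows = index_smiles_rows(sample)
print(rows[1])  # input SMILES of the structure with bottom-level ID 1
```

Once numbering starts from 1 in a later release, as mentioned above, the same idea applies with `enumerate(lines, start=1)`.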

Regards,

Miklos