Some questions about the parameter -g (--generate-id)

User 7da159d46e

04-08-2008 09:00:43

Hi,

I'm trying to cluster a set of molecules based on chemical fingerprints, but I'm puzzled by the parameter -g (--generate-id). At first I thought -g was only used to generate an id for each compound and did not influence the clustering result. But I have found that this is not the case. For example: I generated a chemical fingerprint file using generatemd with the parameter -g, and then ran ward twice, once with -g and once without. The two ward runs gave totally different results.

I don't know if I have done something wrong.

This is urgent for me and I would appreciate your help.

Best regards,

Yolanda

ChemAxon efa1591b5a

04-08-2008 11:31:39

Hi Yolanda,

An id for each individual compound is required to identify the compounds: the output of ward refers to the clustered compounds by their ids.

If your original input file contains ids, you don't need to generate them. If, however, your compounds have no intrinsic id, then you need to generate them. This can be done at one of two stages:

- either in the descriptor generation phase, when you are running generatemd: the id is then saved in the output file and can be used by the clustering program (in which case ward must not add any further ids, because that corrupts the data)

- or, if id generation is omitted during the descriptor (fingerprint) generation, the ids can be added during the clustering procedure, by ward.

So do not specify the -g flag for both generatemd and ward; use it for only one of them (using it only for ward is perhaps the best practice).
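
The rule above can be sketched as a small decision helper. This is only an illustration: the function name plan_id_flags and its arguments are hypothetical and not part of generatemd or ward, and the only thing taken from the tools themselves is the -g flag discussed in this thread. It simply encodes "generate ids at most once, and not at all if the input already has them".

```python
def plan_id_flags(input_has_ids: bool, generate_in: str = "ward") -> dict:
    """Decide which tool, if any, should be run with -g.

    Hypothetical helper for illustration only.
    input_has_ids: True if the original input file already contains ids.
    generate_in:   the stage chosen to generate ids when they are missing,
                   either "generatemd" or "ward" (default: "ward").
    """
    if generate_in not in ("generatemd", "ward"):
        raise ValueError("generate_in must be 'generatemd' or 'ward'")

    # By default neither tool gets -g.
    use_g = {"generatemd": False, "ward": False}

    # Ids are generated only when the input has none, and only in one stage.
    if not input_has_ids:
        use_g[generate_in] = True

    return use_g


if __name__ == "__main__":
    # Input without ids, ids generated during clustering.
    print(plan_id_flags(input_has_ids=False))
    # Input without ids, ids generated during fingerprint generation.
    print(plan_id_flags(input_has_ids=False, generate_in="generatemd"))
    # Input that already has ids: -g is used nowhere.
    print(plan_id_flags(input_has_ids=True))
```

Whatever the inputs, at most one of the two entries is ever True, which is the point of the advice: -g should never be given to both tools for the same data.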

Does this help?

Regards,

Miklos

User 7da159d46e

04-08-2008 13:33:51

Dear mvargyas,

Got it!

Thank you very much! I really appreciate your quick answer!

Best regards,

Yolanda