Ward NullPointerException

User f822a95708

01-09-2005 12:43:03

Hi all,





I'm testing the Ward clustering procedures on our stock database. I was able to create an RNN file for the CF and BCUT descriptors, but the Kelley statistics were not created. The starting database contained over 137,000 compounds. Could this be the cause?





I recreated the RNN files without Kelley statistics to make sure the files were OK, but when I tried to use a file for clustering it immediately gave this error:





D:\>call ward -C -c 20000 -i 17mg_burden_rnn.txt -Z


1>>02_ward_kelly_17mg_burden.log


Unknown error


java.lang.NullPointerException


at chemaxon.clustering.Ward.joinCluster(Ward.java:576)


at chemaxon.clustering.Ward.run(Ward.java:438)


at chemaxon.clustering.Ward.main(Ward.java:1035)


d:\>





When I replace the RNN file with a smaller one, it does work. I will attach one of the RNN files. Can you please check whether they are OK?





I tried both Windows 2000 and Red Hat Linux.





java version "1.5.0_04"


Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_04-b05)


Java HotSpot(TM) Client VM (build 1.5.0_04-b05, mixed mode, sharing)





jchem 3.0.9





Thanks,





Peter Maas

ChemAxon efa1591b5a

01-09-2005 14:38:35

Dear Peter,





We are sorry to say that this is clearly a bug. We are tracing it right now and will let you know about any progress.





Regards,


Miklos

ChemAxon 25dcd765a3

04-09-2005 07:59:10

Hi Peter,





I have checked the 17mg_burden_rnn.txt file, and the problem is that some of the calculated distances are negative. Could you send me the file with the descriptors? I am afraid the problem lies in the RNN list generation.





Thank you


Andras
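A quick way to check a file for the negative distances mentioned above is to scan it for negative numeric tokens. The sketch below is only an illustration and is not part of JChem: it assumes a plain-text file with whitespace-separated fields (the actual RNN layout is not described in this thread), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class NegativeValueScan {

    /**
     * Returns the 1-based numbers of the lines that contain at least one
     * negative numeric token. Assumption: fields are separated by whitespace;
     * tokens that are not numbers are ignored.
     */
    public static List<Integer> linesWithNegatives(List<String> lines) {
        List<Integer> hits = new ArrayList<Integer>();
        for (int i = 0; i < lines.size(); i++) {
            for (String token : lines.get(i).trim().split("\\s+")) {
                try {
                    if (Double.parseDouble(token) < 0) {
                        hits.add(i + 1);
                        break; // one report per line is enough
                    }
                } catch (NumberFormatException ignored) {
                    // not a number (e.g. a header or an identifier): skip it
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up sample lines, not real ward output.
        List<String> sample = Arrays.asList("1 2 0.25", "3 4 -0.01", "5 6 0.40");
        System.out.println(linesWithNegatives(sample)); // [2]
    }
}
```

For a real file, the lines could be loaded with `java.nio.file.Files.readAllLines` and passed to the same method.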

User f822a95708

05-09-2005 08:18:50

Hi Andras,





I will attach the descriptor file. Exactly the same problem also occurred with the chemical fingerprints, but those files are somewhat larger. If you would like them as well, please let me know.





Thanks,





Peter

ChemAxon 25dcd765a3

09-09-2005 13:25:47

Hi Peter,





Thank you for the descriptor file. I could reproduce the problem and have fixed one bug, but another bug has shown up. I will try to fix that one soon as well.





Andras

User f822a95708

09-09-2005 13:56:02

Hi Andras,





Thanks so far. I understand you will implement the fix in a new release. Is there a workaround for the current release? For instance, is it caused by some problematic structure that I can throw out of my set?





Peter

ChemAxon 25dcd765a3

10-09-2005 07:42:33

Hi Peter,





I'm afraid this is somehow related to the number of compounds (which is why it is taking longer to fix :-( ), and surely not to one specific structure. I don't see any workaround in the current release. I will try to fix this ASAP, and if it is urgent we can let you download a fixed version before the new release.





All the best


Andras

User f822a95708

12-09-2005 08:18:04

Hi Andras,





Since we use clustering to reduce the numbers, we (the users) will always keep pushing the limits, I guess ;-). Anyway, can you give us the number, or an estimate, of compounds that can be clustered at the moment? That would let us reduce the initial set to an allowed size before clustering.





Thanks,





Peter

ChemAxon 25dcd765a3

12-09-2005 08:19:04

Hi Peter,


I have fixed the bugs. The problem was the precision of the float type.


If you need the fixed version asap, please let me know.





Andras
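To illustrate how float precision alone can turn a small positive quantity into a negative one, here is a minimal, generic sketch. It is not the Ward code and the thread does not say how the bug was fixed; the class name and the chosen numbers are assumptions made purely for demonstration. A float carries a 24-bit significand, so above 2^24 = 16,777,216 it cannot represent every integer, and a small term added to a large intermediate value can be lost before the large value is subtracted again.

```java
public class FloatCancellation {

    /** Evaluates (big + small) - big - offset entirely in float. */
    public static float inFloat(float big, float small, float offset) {
        return (big + small) - big - offset;
    }

    /** Same expression, evaluated in double. */
    public static double inDouble(double big, double small, double offset) {
        return (big + small) - big - offset;
    }

    /** Defensive clamp: rounding noise should not yield a negative distance. */
    public static double clampToZero(double distance) {
        return Math.max(0.0, distance);
    }

    public static void main(String[] args) {
        float big = 16777216f; // 2^24: above this, float cannot hold every integer
        System.out.println(inFloat(big, 1f, 0.5f));              // -0.5 (the +1 is lost)
        System.out.println(inDouble(big, 1.0, 0.5));             // 0.5 (exact)
        System.out.println(clampToZero(inFloat(big, 1f, 0.5f))); // 0.0
    }
}
```

Doing the intermediate arithmetic in double is one common remedy; clamping to zero only hides the symptom and is shown here as a last line of defence.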

User f822a95708

12-09-2005 08:31:04

Hi Andras,





If it's not too much of a problem, it would be great if you could send me the fix, but it's not really needed.





Thanks,





Peter

User f822a95708

06-10-2005 11:38:56

Hi Andras,





I thought you might be interested in some feedback. I tried the revised version and it seems to work correctly. There are no negative values in the RNN file anymore. The Kelley statistics also seem to be OK, except for the last row. It comes back with:


Optimal number of clusters: -1





Not a real problem, because you can establish the optimal number from the index, but I guess it's not how it's supposed to work.





Thanks,





Peter
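Reading the optimum off the index by hand, as described above, amounts to picking the level with the smallest Kelley penalty value. A minimal sketch follows, assuming the values have already been parsed into an array in which position i holds the penalty for i + 1 clusters. That layout is an assumption (the format of the ward Kelley output is not shown in this thread), and the class and method names are hypothetical.

```java
public class KelleyMinimum {

    /**
     * Returns the cluster count with the smallest penalty, or -1 if the
     * array is empty. Assumption: penalties[i] belongs to i + 1 clusters.
     * If several levels share the minimum, the first one is returned.
     */
    public static int optimalClusterCount(double[] penalties) {
        int best = -1;
        for (int i = 0; i < penalties.length; i++) {
            if (best < 0 || penalties[i] < penalties[best]) {
                best = i;
            }
        }
        return best < 0 ? -1 : best + 1;
    }

    public static void main(String[] args) {
        double[] madeUp = {9.0, 4.5, 3.2, 3.9, 6.1}; // illustrative values only
        System.out.println(optimalClusterCount(madeUp)); // 3
    }
}
```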

ChemAxon 25dcd765a3

06-10-2005 16:16:00

Hi Peter,





Thank you for the feedback.


I'm glad that it works well for you too. And thank you for reporting the problem with the optimal cluster calculation.


I'll check what is wrong with it.





(You are right, that is not how it is supposed to work.)





Andras

ChemAxon 25dcd765a3

23-11-2005 18:32:45

Hi Peter,





I was trying to reproduce the bug in the optimal cluster calculation, but I cannot.


I have tried the following:


unzip 17mg_burden.zip


jchemsite/bin/ward -f 0 -m 4 -i 17mg_burden.cfp -o 17mg_burden.rnn


(this takes a long time)


jchemsite/bin/ward -C -c 20 -i 17mg_burden.rnn -K kelley.txt > log.txt


(this is approx. 30 min)





Could you tell me how you produced the bug?





Thank you


Andras

User f822a95708

24-11-2005 10:20:48

Hi Andras,





That's quite some time ago; I really had to dig up some old files. First of all, I used a different file, but I don't think that's the issue here. After all, the first problem also occurred with both the chemical fingerprints and the BCUT descriptors. The BCUT file was smaller, so it was easier to send to you.





I guess the difference is that I did your two steps in one go, like this:





ward -f 512 -i 17mg_cf512.cfp -o 17mg_cf512_rnn.txt -K 17mg_cf512_kelly.txt > logfile.log





I hope it helps,





Peter

ChemAxon 25dcd765a3

01-12-2005 22:16:57

Hi Peter,


Thank you for your efforts.


I found the bug in the optimal cluster calculation and fixed it.





Andras