Structure clustering in command line on Linux

User 8c68bb23cf

11-03-2009 00:33:24

Hi,

we need to perform clustering of a large set of compounds (~200k). Since we have a Linux version of ChemAxon installed, we'd like to split the task into a number of SDF-formatted subsets and run Library MCS in parallel on several machines. But since these are rack-mounted cluster nodes, requiring a locally started GUI application would be fairly inconvenient.
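The splitting step described above could be sketched roughly as follows. This is only an illustration, not a ChemAxon tool: the function name split_sdf and the output naming scheme are assumptions, and the only fact it relies on is that each SDF record ends with a line starting with $$$$.

```shell
#!/bin/sh
# Hypothetical helper: split an SDF file into chunks of at most N records.
# Usage: split_sdf input.sdf records_per_chunk output_prefix
# Produces output_prefix_0001.sdf, output_prefix_0002.sdf, ...
split_sdf() {
    input=$1
    per_chunk=$2
    prefix=$3
    awk -v n="$per_chunk" -v prefix="$prefix" '
        BEGIN { chunk = 1; count = 0; out = sprintf("%s_%04d.sdf", prefix, chunk) }
        # Every line goes to the current chunk file.
        { print > out }
        # A line starting with $$$$ closes one SDF record.
        /^\$\$\$\$/ {
            count++
            if (count == n) {
                close(out)
                chunk++
                count = 0
                out = sprintf("%s_%04d.sdf", prefix, chunk)
            }
        }
    ' "$input"
}
```

For example, split_sdf compounds.sdf 20000 subset would cut a 200k-compound file into ten subsets, which can then be copied to the individual nodes.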

Hence, the question: is it possible to run compound clustering purely via the command line? Is there a built-in function, or would it require writing Java code using the plugins?

Thanks in advance,

Sasha

ChemAxon efa1591b5a

11-03-2009 08:18:28

Hi Sasha,

there is a batch version of LibraryMCS that runs without a GUI. Run libmcs -h to get a list of available options.

I'm sure you are aware that clustering the split subsets will give different results than clustering the full set...

Regards,

Miklos

User 8c68bb23cf

11-03-2009 16:16:40

Thank you Miklos,

I'll play around with that option and see what I get. And yes, I'm aware of the tradeoffs of clustering a number of subsets. I'm doing it mostly for performance reasons, but will probably experiment with sets of different sizes to get an idea of the maximum acceptable subset size.

Thanks again,

Sasha

User 8c68bb23cf

11-03-2009 21:06:38

Hi Miklos,

I tried the libmcs command, but it's not really a true command line version. It simply starts the GUI version of LibraryMCS. When invoked remotely (via ssh) on a compute node of a cluster, it complains about X11 not being set and a headless environment (which is fine for a GUI application, but not for something that is presumably written for command line use).

With that, I guess, there is no command line version.

A side note: at startup, LibraryMCS has a habit of ALWAYS loading a sample SDF set. While that's fine for illustration purposes for first-time users, it becomes rather annoying after a while. Perhaps you guys could add an option somewhere to disable this.

Also, when I run clustering locally and save the graph, it can't later be imported (even into the same instance of LibraryMCS!). I simply get a blank workspace after I use the "Open graph" option. No error message is generated either. Is there anything else that needs to go along with the graph to make it work? I use RHEL 5.0 on a 64-bit Intel system (it shouldn't really be relevant for Java code, so it's more for the sake of completeness).

Correction: when I start libmcs from the command line, I get a Java exception. Here's the stack trace:

java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:358)
        at java.io.ObjectInputStream$BlockDataInputStream.readInt(ObjectInputStream.java:2720)
        at java.io.ObjectInputStream.readInt(ObjectInputStream.java:930)
        at chemaxon.clustering.MBaseNode.loadNodeChildren(MBaseNode.java:888)
        at chemaxon.clustering.MBaseNode.loadNode(MBaseNode.java:937)
        at chemaxon.clustering.MBaseNode.loadNodeChildren(MBaseNode.java:921)
        at chemaxon.clustering.MBaseNode.loadNode(MBaseNode.java:937)
        at chemaxon.clustering.MBaseNode.loadNodeChildren(MBaseNode.java:921)
        at chemaxon.clustering.MBaseNode.loadNode(MBaseNode.java:937)
        at chemaxon.clustering.MBaseNode.loadNodeChildren(MBaseNode.java:921)
        at chemaxon.clustering.MBaseNode.loadNode(MBaseNode.java:937)
        at chemaxon.clustering.MBaseNode.loadNodeChildren(MBaseNode.java:921)
        at chemaxon.clustering.MBaseNode.loadNode(MBaseNode.java:937)
        at chemaxon.clustering.MBaseNode.loadNodeChildren(MBaseNode.java:921)
        at chemaxon.clustering.MBaseNode.loadNode(MBaseNode.java:937)
        at chemaxon.clustering.MGraph.loadGraph(MGraph.java:1905)
        at chemaxon.clustering.JKlustorImport.loadGraph(JKlustorImport.java:854)
        at chemaxon.clustering.JKlustorImport.importStructures(JKlustorImport.java:144)
        at chemaxon.clustering.gui.JKlustor.importStructures(JKlustor.java:98)
        at chemaxon.clustering.gui.action.ImportGraphAction.actionPerformed(ImportGraphAction.java:45)
        at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:1849)
        at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2169)
        at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:420)
        at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:258)
        at javax.swing.AbstractButton.doClick(AbstractButton.java:302)
        at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:1051)
        at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:1092)
        at java.awt.AWTEventMulticaster.mouseReleased(AWTEventMulticaster.java:231)
        at java.awt.Component.processMouseEvent(Component.java:5517)
        at javax.swing.JComponent.processMouseEvent(JComponent.java:3135)
        at java.awt.Component.processEvent(Component.java:5282)
        at java.awt.Container.processEvent(Container.java:1966)
        at java.awt.Component.dispatchEventImpl(Component.java:3984)
        at java.awt.Container.dispatchEventImpl(Container.java:2024)
        at java.awt.Component.dispatchEvent(Component.java:3819)
        at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4212)
        at java.awt.LightweightDispatcher.processMouseEvent(Container.java:3892)
        at java.awt.LightweightDispatcher.dispatchEvent(Container.java:3822)
        at java.awt.Container.dispatchEventImpl(Container.java:2010)
        at java.awt.Window.dispatchEventImpl(Window.java:1791)
        at java.awt.Component.dispatchEvent(Component.java:3819)
        at java.awt.EventQueue.dispatchEvent(EventQueue.java:463)
        at java.awt.EventDispatchThread.pumpOneEventForHierarchy(EventDispatchThread.java:242)
        at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:163)
        at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:157)
        at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:149)
        at java.awt.EventDispatchThread.run(EventDispatchThread.java:110)

I'm attaching the zipped SDF input file. Clustering was done with default parameters, just as a quick and dirty test.

Cheers

Sasha

ChemAxon efa1591b5a

11-03-2009 22:08:17

Hi Sasha,

just a quick comment now and more tomorrow: you need to provide parameters on the command line in order to stop the GUI from launching.

There's no explicit option flag to switch the GUI off. This sounds a bit silly, I know: the command without any parameters launches the GUI, otherwise it does not. It's not very nice, but that's how it works (for now...).

Miklos

User 8c68bb23cf

11-03-2009 22:22:14

Well, your initial reply was "libmcs -h", and I really have no idea what options I need to provide to get a help message instead of the GUI. I didn't see anything in the online manuals either.

So I guess, until I get those parameters from you (or someone else at ChemAxon), I'm pretty much stuck.

Sasha

ChemAxon efa1591b5a

17-03-2009 16:05:17

Hi Sasha,

you're not forgotten, I'm just super busy... I'll get back to this forum ASAP. Thanks for your patience.

Miklos

User 8c68bb23cf

17-03-2009 16:16:55

No worries, Miklos.

At this point, I'm running libmcs locally on the cluster nodes and leaving them logged in with the GUI app running. It would just be nice to do the whole thing remotely via the command line in the future.

So, whenever you get a chance.

Sasha

ChemAxon efa1591b5a

18-03-2009 10:36:38

Hi Sasha,

apologies for not being able to respond in time.

The libmcs command is actually a shell script, which launches either the GUI or the batch program. If libmcs is called without any command line parameters, the GUI version starts. If, however, any parameter is passed, the batch program runs.
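The dispatch rule just described can be pictured with a tiny sketch. This is not the actual libmcs script, whose contents are not shown in this thread; the function name choose_mode is made up purely to illustrate the "no arguments means GUI, anything else means batch" behaviour.

```shell
#!/bin/sh
# Hypothetical illustration of the dispatch rule; NOT the real libmcs script.
# Prints which program a wrapper following this rule would start.
choose_mode() {
    if [ "$#" -eq 0 ]; then
        # No command line parameters at all: the GUI would be launched.
        echo gui
    else
        # Any parameter (even just -h): the batch program would run.
        echo batch
    fi
}
```

Under this rule, choose_mode prints gui and choose_mode -h prints batch, which is why passing -h should produce a help message rather than a window.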

So, if you run

Code:
libmcs -h

then you should see brief help on the available options. It looks like this:

Code:
Library MCS - Maximum Common Substructure Clustering 0.7, (C) 2006-2008 ChemAxon Ltd.
Clusters input structure with respect to shared common substructures.

Usage: Library MCS [input file] [options]

Options:
  -h, --help                   this help message
  -v, --verbose                progres monitoring and other messages
  -e, --exact                  exact MCS recognition
  -f, --fast                   fast, yet fairly accurate MCS recognition
  -t, --turbo                  fastest and less reliable MCS recognition
  -n, --minMCS           integer value specifying the MCS size
                               where clustering terminates
  -m, --match (a|b|c|r) (+|-)  turns matching contraints on (+), off (-)
                               for atom types (a), bond types (b),
                               formak charges (c) and rings (r)
  -o, --output       SDF output file, terminal if -o omitted
  -o, --output CSV   CSV output file
  -r, --report                 generate report (cluster statistics)

If you don't get this for libmcs -h, then we should investigate what went wrong with your JChem installation.

For your original needs, the command line is simple:

Code:
libmcs inputfile.sdf -o outfile.sdf

Does this work?
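Applied to a directory of split subsets, the same call could be looped over every file, for instance in a script started via ssh on each node. The sketch below is only an illustration: the function name cluster_subsets, the LIBMCS override variable and the _clusters.sdf naming are assumptions, and the only libmcs syntax it uses is the "input file, then -o output file" form shown above.

```shell
#!/bin/sh
# Hypothetical batch driver: run the clusterer on every .sdf file in a directory.
# Usage: cluster_subsets input_dir output_dir
# LIBMCS defaults to the libmcs command; set it to another command for a dry run.
cluster_subsets() {
    in_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for f in "$in_dir"/*.sdf; do
        # If nothing matched, the pattern is left as-is; skip it.
        [ -e "$f" ] || continue
        name=$(basename "$f" .sdf)
        # Passing arguments keeps libmcs in batch mode, so no GUI is started.
        "${LIBMCS:-libmcs}" "$f" -o "$out_dir/${name}_clusters.sdf" || return 1
    done
}
```

Setting LIBMCS=echo first prints the commands that would be run without starting any clustering, which is a cheap way to check paths before submitting the real jobs.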

Regards


Miklos