User 8c68bb23cf
11-03-2009 00:33:24
Hi,
we need to cluster a large set of compounds (~200k). Since we have a Linux version of ChemAxon installed, we'd like to split the task into a number of SDF-formatted subsets and run Library MCS in parallel on several machines. But since these are rack-mounted cluster nodes, requiring a locally started GUI application would be fairly inconvenient.
Hence, the question: is it possible to run compound clustering purely via command line? Is there a built-in function or would it require writing Java code using the plugins?
Thanks in advance
Sasha
ChemAxon efa1591b5a
11-03-2009 08:18:28
Hi Sasha,
there is a batch version of LibraryMCS that runs without a GUI. Run libmcs -h to get a list of available options.
I'm sure you are aware that clustering the split subsets will give different results than clustering the full set...
Regards,
Miklos
User 8c68bb23cf
11-03-2009 16:16:40
Thank you Miklos,
I'll play around with that option and see what I get. And yes, I'm aware of the tradeoffs of clustering a number of subsets. I'm doing it mostly for performance reasons, but will probably experiment with sets of different sizes to get an idea of the maximum acceptable subset size.
Thanks again
Sasha
User 8c68bb23cf
11-03-2009 21:06:38
Hi Miklos,
I tried the libmcs command, but it's not really a true command line version. It simply starts the GUI version of LibraryMCS. When invoked remotely (via ssh) on a compute node of a cluster, it complains about X11 not being set and about a headless environment (which is ok for a GUI application, but not ok for something that is presumably written for command line use).
With that, I guess, there is no command line version.
A side note. At startup, LibraryMCS has a habit of ALWAYS loading a sample sdf set. While it's fine for illustration purposes for first time users, it becomes rather annoying after a while. Perhaps you guys could add an option somewhere to disable this action.
Also, when I run clustering locally and save the graph, it can't later be imported (even into the same instance of LibraryMCS!). I simply get a blank workspace in the application after I use the "Open graph" option. No error message is generated either. Is there anything else that needs to go along with the graph to make it work? I use RHEL 5.0 on a 64-bit Intel system (it shouldn't really be relevant for the Java code, so it's more for the sake of completeness).
Correction: when I start libmcs from the command line, I get a Java exception. Here's the stack trace:
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:358)
at java.io.ObjectInputStream$BlockDataInputStream.readInt(ObjectInputStream.java:2720)
at java.io.ObjectInputStream.readInt(ObjectInputStream.java:930)
at chemaxon.clustering.MBaseNode.loadNodeChildren(MBaseNode.java:888)
at chemaxon.clustering.MBaseNode.loadNode(MBaseNode.java:937)
at chemaxon.clustering.MBaseNode.loadNodeChildren(MBaseNode.java:921)
at chemaxon.clustering.MBaseNode.loadNode(MBaseNode.java:937)
at chemaxon.clustering.MBaseNode.loadNodeChildren(MBaseNode.java:921)
at chemaxon.clustering.MBaseNode.loadNode(MBaseNode.java:937)
at chemaxon.clustering.MBaseNode.loadNodeChildren(MBaseNode.java:921)
at chemaxon.clustering.MBaseNode.loadNode(MBaseNode.java:937)
at chemaxon.clustering.MBaseNode.loadNodeChildren(MBaseNode.java:921)
at chemaxon.clustering.MBaseNode.loadNode(MBaseNode.java:937)
at chemaxon.clustering.MBaseNode.loadNodeChildren(MBaseNode.java:921)
at chemaxon.clustering.MBaseNode.loadNode(MBaseNode.java:937)
at chemaxon.clustering.MGraph.loadGraph(MGraph.java:1905)
at chemaxon.clustering.JKlustorImport.loadGraph(JKlustorImport.java:854)
at chemaxon.clustering.JKlustorImport.importStructures(JKlustorImport.java:144)
at chemaxon.clustering.gui.JKlustor.importStructures(JKlustor.java:98)
at chemaxon.clustering.gui.action.ImportGraphAction.actionPerformed(ImportGraphAction.java:45)
at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:1849)
at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2169)
at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:420)
at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:258)
at javax.swing.AbstractButton.doClick(AbstractButton.java:302)
at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:1051)
at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:1092)
at java.awt.AWTEventMulticaster.mouseReleased(AWTEventMulticaster.java:231)
at java.awt.Component.processMouseEvent(Component.java:5517)
at javax.swing.JComponent.processMouseEvent(JComponent.java:3135)
at java.awt.Component.processEvent(Component.java:5282)
at java.awt.Container.processEvent(Container.java:1966)
at java.awt.Component.dispatchEventImpl(Component.java:3984)
at java.awt.Container.dispatchEventImpl(Container.java:2024)
at java.awt.Component.dispatchEvent(Component.java:3819)
at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4212)
at java.awt.LightweightDispatcher.processMouseEvent(Container.java:3892)
at java.awt.LightweightDispatcher.dispatchEvent(Container.java:3822)
at java.awt.Container.dispatchEventImpl(Container.java:2010)
at java.awt.Window.dispatchEventImpl(Window.java:1791)
at java.awt.Component.dispatchEvent(Component.java:3819)
at java.awt.EventQueue.dispatchEvent(EventQueue.java:463)
at java.awt.EventDispatchThread.pumpOneEventForHierarchy(EventDispatchThread.java:242)
at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:163)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:157)
at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:149)
at java.awt.EventDispatchThread.run(EventDispatchThread.java:110)
I'm attaching the zipped SDF input file. Clustering was done with default parameters, just as a quick and dirty test.
Cheers
Sasha
ChemAxon efa1591b5a
11-03-2009 22:08:17
Hi Sasha,
just a quick comment now and more tomorrow: you need to provide parameters in the command line in order to stop the GUI from launching.
There's no explicit option flag to switch the GUI off. This sounds a bit silly, I know: the command without any parameter launches the GUI, otherwise it does not. It's not very nice, but it works like that (for now...).
Miklos
User 8c68bb23cf
11-03-2009 22:22:14
Well, your initial reply was "libmcs -h", and I really have no idea what options I need to provide to get a help message instead of the GUI. I didn't see anything in the online manuals either.
So I guess, until I get those parameters from you (or someone else at ChemAxon), I'm pretty much stuck.
Sasha
ChemAxon efa1591b5a
17-03-2009 16:05:17
Hi Sasha,
you're not forgotten, I'm just super busy... I'll get back to this forum ASAP. Thanks for your patience.
Miklos
User 8c68bb23cf
17-03-2009 16:16:55
No worries, Miklos.
At this point, I'm running libmcs locally on cluster nodes and leaving them logged in with the GUI app running. It would just be nice to do the whole thing remotely via command line in the future.
So, whenever you get a chance
Sasha
ChemAxon efa1591b5a
18-03-2009 10:36:38
Hi Sasha,
apologies for not being able to respond in time.
The libmcs command is actually a shell script, which either launches the GUI or the batch program. If libmcs is called without any command line parameter, then the GUI version starts. If, however, any parameter is passed, then the batch program runs.
So, if you run libmcs -h
then you should see a brief help message listing the available options. It looks like this:
Code: |
Library MCS - Maximum Common Substructure Clustering 0.7, (C) 2006-2008 ChemAxon Ltd.
Clusters input structure with respect to shared common substructures.
Usage: Library MCS [input file] [options]
Options:
  -h, --help                     this help message
  -v, --verbose                  progres monitoring and other messages
  -e, --exact                    exact MCS recognition
  -f, --fast                     fast, yet fairly accurate MCS recognition
  -t, --turbo                    fastest and less reliable MCS recognition
  -n, --minMCS                   integer value specifying the MCS size
                                 where clustering terminates
  -m, --match (a|b|c|r) (+|-)    turns matching contraints on (+), off (-)
                                 for atom types (a), bond types (b),
                                 formak charges (c) and rings (r)
  -o, --output                   SDF output file, terminal if -o omitted
  -o, --output CSV               CSV output file
  -r, --report                   generate report (cluster statistics)
|
If you don't get this for libmcs -h then we should investigate what went wrong with your JChem installation.
For your original needs the command line is simple:
Code: |
libmcs inputfile.sdf -o outfile.sdf
|
Does this work?
Regards
Miklos
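P.S. On the original plan of splitting the set into SDF subsets and running one batch job per subset: below is a minimal Python sketch of how that could look. It is not part of JChem. The helper names (split_sdf_text, chunk, write_subsets, libmcs_command), the subset_NNNN.sdf naming scheme and the _clustered suffix are all hypothetical and only there for illustration. The one thing taken from this thread is the command form libmcs inputfile.sdf -o outfile.sdf. The sketch reads the whole file into memory, which may or may not be acceptable for ~200k structures.

```python
from pathlib import Path

SDF_DELIMITER = "$$$$"  # line that terminates each record in an SD file


def split_sdf_text(text):
    """Return the SDF records in `text`, each one ending with its $$$$ line."""
    records, current = [], []
    for line in text.splitlines():
        current.append(line)
        if line.strip() == SDF_DELIMITER:
            records.append("\n".join(current) + "\n")
            current = []
    # Keep a trailing record that lacks a terminator instead of dropping it.
    if any(line.strip() for line in current):
        records.append("\n".join(current) + "\n")
    return records


def chunk(records, size):
    """Group records into consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [records[i:i + size] for i in range(0, len(records), size)]


def write_subsets(sdf_path, out_dir, size):
    """Write subset files (hypothetical naming scheme) and return their paths."""
    records = split_sdf_text(Path(sdf_path).read_text())
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, group in enumerate(chunk(records, size), start=1):
        path = out_dir / f"subset_{n:04d}.sdf"
        path.write_text("".join(group))
        paths.append(path)
    return paths


def libmcs_command(subset_path):
    """Build the batch command for one subset, in the form shown in this thread."""
    subset_path = Path(subset_path)
    # The "_clustered" output name is an assumption made for this sketch.
    out_path = subset_path.with_name(subset_path.stem + "_clustered.sdf")
    return ["libmcs", str(subset_path), "-o", str(out_path)]
```

Each list returned by libmcs_command could then be handed to whatever job scheduler the cluster uses, or to subprocess.run on the node itself. Because an argument is always passed, the wrapper script should start the batch program rather than the GUI, as described above.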