Technical Support Forum

Missing molecule ID in jklustor
This topic is locked: you cannot edit posts or make replies.
Yingyao

Joined: 30 Sep 2004
Posts: 205

Posted: Mon May 13, 2013 4:15 am
Post subject: Missing molecule ID in jklustor

jklustor does not export the molecule IDs from its input, and fixing this has not been a high priority for quite some time now.  There are some workarounds, such as https://www.chemaxon.com/forum/ftopic8475.html.  However, someone who chooses the command-line jklustor over the API for this task probably prefers not to program.

The problem becomes more noticeable when I try to run "-c bm" using:
jklustor t.sdf -o wrmols:sdf:t_m.sdf -o wrclus:sdf:t_c.sdf -lfin -c bm
In this case the compound ID from the input t.sdf file is lost in the output t_m.sdf file, so even the workaround no longer works.  One now has to do pairwise exact structure matching to get meaningful results, which seems like a lot of work just to use jklustor.

If jklustor kept either the input molecule ID or simply the order number in the input (ID1, ID2, etc.), it would be much more user-friendly for people who choose the command-line version over the API.

 

Miklos
ChemAxon personnel
Joined: 16 Jan 2008
Posts: 447

Posted: Mon May 13, 2013 10:34 am

Thank you very much for your suggestions. These are very important features that we want to provide very soon.

We have been working on improving ID/structure and source handling in the jklustor command line. The improvements are expected in the 6.1 release. We are going to handle identification of the input structures and the original input SD file fields in the output.

 

Steven

Joined: 02 Nov 2001
Posts: 52

Posted: Mon Oct 20, 2014 11:52 pm
Post subject: Molecule ID

Hello.  Is it possible to recover the original compound ID from the input set?  The diverse compound selector is pretty much useless if one can't connect back to the original compound ID.  Thanks.

-&

Steven

Joined: 02 Nov 2001
Posts: 52

Posted: Thu Oct 30, 2014 5:45 pm
Post subject: jklustor Dropping All SD Properties and Molecule ID

I'm wondering if there has been any progress on fixing this.  It's really irritating that I have to merge back to the initial compound IDs after clustering.  It makes no sense to drop compound IDs and all SD properties post-clustering.  Please fix!

-&

Yingyao

Joined: 30 Sep 2004
Posts: 205

Posted: Mon Nov 03, 2014 4:37 am
Post subject: jcluster missing input ID

It has been long enough that I gave up using jklustor a few years ago.  Since other users echo the original request, I will try to explain again why this matters to users who want to use jklustor.

A user's data file does not contain just one SMILES column; it typically comprises many other columns: compound ID, assay activities, etc.  So when they run jklustor, the ultimate goal is to merge the clustering results back into their original data sheet so that they can carry out further post-clustering analyses.  If the jklustor output does not contain the original compound ID (or row ID), users have to do an all-by-all full structure match to work out the mapping.  As stated in the first post, the jklustor output sometimes does not even allow a reliable full structure match.  Then the only alternative, as we do now, is to use the API.
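
The merge step described above can be sketched as follows. This is only a minimal illustration of joining cluster labels back onto a data sheet by compound ID, assuming the clustering output preserved that ID. The class name ClusterMerge, the method merge, the column layout (ID in column 0) and the sample values are all hypothetical and not part of any ChemAxon tool.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: joins cluster labels back onto the original data rows by compound ID. */
final class ClusterMerge {

    /**
     * @param rows      original data rows; column 0 is assumed to hold the compound ID
     * @param clusterOf compound ID -> cluster label, as it would come from a clustering
     *                  output that preserved the ID (an assumption of this sketch)
     * @return each row as a tab-separated line with the cluster label appended,
     *         or "NA" when the ID has no cluster assignment
     */
    static List<String> merge(List<String[]> rows, Map<String, String> clusterOf) {
        List<String> merged = new ArrayList<>();
        for (String[] row : rows) {
            String cluster = clusterOf.getOrDefault(row[0], "NA");
            merged.add(String.join("\t", row) + "\t" + cluster);
        }
        return merged;
    }

    public static void main(String[] args) {
        // Made-up data sheet: compound ID, SMILES, assay activity.
        List<String[]> rows = Arrays.asList(
            new String[] {"CPD-1", "C1CCCCC1", "5.2"},
            new String[] {"CPD-2", "N#CCCC", "7.9"},
            new String[] {"CPD-3", "CCCC1CCCCC1CCC", "6.1"});

        // Made-up cluster assignments keyed by compound ID.
        Map<String, String> clusterOf = new LinkedHashMap<>();
        clusterOf.put("CPD-1", "1");
        clusterOf.put("CPD-3", "1");

        for (String line : merge(rows, clusterOf)) {
            System.out.println(line);
        }
    }
}
```

With an ID in the output this is a single hash lookup per row. Without it, the same join needs the all-by-all structure match described above.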

Steven

Joined: 02 Nov 2001
Posts: 52

Posted: Thu Nov 06, 2014 7:50 pm
Post subject: MMDS API

Since it appears that ChemAxon is not going to fix their command line utilities for clustering, I was wondering if someone could point me to the API for the MMDS algorithm (Maximum-Minimum Dissimilarity Selection)?  I can find all the old clustering algorithms in chemaxon.clustering.*, but I have no idea where the newer algorithms live.  Thanks.

-&

László
ChemAxon personnel
Joined: 14 Jan 2011
Posts: 78

Posted: Thu Nov 13, 2014 4:46 pm

Dear Steven,

Sorry for the very late answer.
Actually the MMDS algorithm is not part of the public API, but you can use the following code:

import chemaxon.clustering.calculations.SimpleDiverseSubsetSelection;
import chemaxon.clustering.calculations.impl.MMDS;
import chemaxon.formats.MolFormatException;
import chemaxon.formats.MolImporter;

// Wrapper class and main method added so the snippet compiles as a standalone program.
public class MmdsExample {
    public static void main( String[] args ) {
        try {
            SimpleDiverseSubsetSelection sel = new MMDS();
            // Add the input structures to the selector.
            sel.addMolecule( MolImporter.importMol( "C1CCCCC1" ) );
            sel.addMolecule( MolImporter.importMol( "CCCC1CCCCC1CCC" ) );
            sel.addMolecule( MolImporter.importMol( "C1CCC(CCCCCCC)CC1CCC" ) );
            sel.addMolecule( MolImporter.importMol( "N#CCCC" ) );
            sel.addMolecule( MolImporter.importMol( "N#CCCC(CCCCCCC)" ) );

            // Ask for a diverse subset of 2 structures; the result is an array of indices.
            int[] ids = sel.getDiverseSubsetIndices( 2 );
            for( int i = 0; i < ids.length; i++ ) {
                // Print each selected structure as SMILES, followed by its index.
                System.out.println( sel.get( ids[i] ).toFormat( "smiles" ) + "\t" + ids[i] );
            }
        } catch ( MolFormatException e ) {
            System.err.println( "Exception: " + e.getMessage() );
        }
    }
}

There is also a way to use it from the command line:

cat molecules.smiles | java -cp jchem.jar chemaxon.clustering.calculations.impl.MMDS 20
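
The indices returned by getDiverseSubsetIndices in the Java example can also serve as a link back to the original compound IDs. The sketch below keeps the IDs in a list parallel to the molecules and looks them up by index. It assumes, which is not confirmed in this thread, that the returned indices follow the order in which the molecules were added. The class name IndexToId, the method idsForIndices and the sample IDs are hypothetical, and the int array stands in for the real result so that the sketch runs without jchem.jar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: maps selected indices back to compound IDs kept in input order. */
final class IndexToId {

    /**
     * @param compoundIds IDs in the same order in which the molecules were added
     * @param picked      indices of the selected molecules (assumed to refer to that order)
     * @return the compound IDs of the selected molecules, in the order of {@code picked}
     */
    static List<String> idsForIndices(List<String> compoundIds, int[] picked) {
        List<String> out = new ArrayList<>();
        for (int idx : picked) {
            if (idx < 0 || idx >= compoundIds.size()) {
                throw new IllegalArgumentException("Index out of range: " + idx);
            }
            out.add(compoundIds.get(idx));
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up IDs, one per molecule, recorded while adding the molecules.
        List<String> compoundIds = Arrays.asList("CPD-1", "CPD-2", "CPD-3", "CPD-4", "CPD-5");

        // Stand-in for the int[] that sel.getDiverseSubsetIndices(2) would return.
        int[] picked = {0, 4};

        System.out.println(idsForIndices(compoundIds, picked));
    }
}
```

The same lookup would work for any other per-molecule data, such as SD fields, as long as it is stored in input order.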

We apologize for not having dealt with the compound ID issue so far; our priority list has not let us work on it. We will notify you when the fix is released.

Best regards,
Laszlo

Steven

Joined: 02 Nov 2001
Posts: 52

Posted: Fri Nov 14, 2014 10:27 pm
Post subject: re:

Thanks for posting the code.  It worked for me.  I can finally select compounds without losing SD tags.  Huzzah!!

-&
