Simple question

User e9249ba1fe

21-10-2007 06:37:43

This is a bit foolish, but I can't help asking:





1. I have an academic license for JChem and GenerateMD.
2. I have a Windows XP desktop PC.
3. I have a file containing about 50,000 molecules represented as SMILES.
4. I wish to compute as many descriptors as possible for QSPR/QSAR.





Please help.

Thanks a lot.

User 677b9c22ff

23-10-2007 08:14:08

Hi,


Do you know how to program in Java and use Eclipse? If so, you can take the Handbook of Molecular Descriptors by Todeschini and Consonni and program all the descriptors with the JChem API.





If you don't know Java, it is going to be harder. In principle, you can use cxcalc or GenerateMD to generate all the descriptors from an XML configuration file. Alternatively, you can use Instant JChem to generate most of the descriptors, but due to current restrictions an Instant JChem table can only have 256 or 512 columns, so you cannot include all possible descriptors. Furthermore, it is not possible there to apply a general XML sheet (write once, use often) to generate most of the descriptors.





You also have to be clear about which kinds of molecular descriptors you want to use: 0D, 1D, 2D, 3D or 4D? All of them? (See Gasteiger and Engel, Chemoinformatics: A Textbook.)





* 0D - atom counts, bond counts, molecular weight
* 1D - fragment counts, H-bond acceptors/donors, Crippen logP/MR, PSA, SMARTS-based counts
* 2D - topological descriptors (Balaban, Randić, Wiener, BCUT, kappa, chi)
* 3D - geometrical descriptors (3D-WHIM, 3D autocorrelation, 3D-MoRSE), surface properties, CoMFA
* 4D - 3D coordinates plus conformations (JChem conformers, CORINA, gold set, CrystalEye)
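To make the 0D/2D distinction concrete, here is a minimal Python sketch (not JChem code) that computes heavy-atom and bond counts plus the Wiener and Randić indices. It assumes the molecule has already been parsed into a connected, hydrogen-suppressed adjacency list (atom index to neighbour indices); bond orders and elements are ignored, as in the classical definitions of these two indices.

```python
from collections import deque
from math import sqrt

# Hydrogen-suppressed graph: atom index -> indices of bonded heavy atoms.
# Bond orders and element types are deliberately ignored here.

def atom_and_bond_counts(g: dict[int, list[int]]) -> tuple[int, int]:
    """0D-style counts: number of heavy atoms and of bonds between them."""
    n_bonds = sum(len(nbrs) for nbrs in g.values()) // 2  # each bond is listed twice
    return len(g), n_bonds

def distances_from(g: dict[int, list[int]], start: int) -> dict[int, int]:
    """Topological distances (number of bonds) from one atom, by breadth-first search."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        a = queue.popleft()
        for b in g[a]:
            if b not in dist:
                dist[b] = dist[a] + 1
                queue.append(b)
    return dist

def wiener_index(g: dict[int, list[int]]) -> int:
    """Wiener index: sum of topological distances over all unordered atom pairs."""
    return sum(sum(distances_from(g, a).values()) for a in g) // 2  # pairs counted twice

def randic_index(g: dict[int, list[int]]) -> float:
    """Randic connectivity index: sum over bonds i-j of 1/sqrt(deg(i) * deg(j))."""
    return sum(1.0 / sqrt(len(g[i]) * len(g[j])) for i in g for j in g[i] if i < j)

butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # n-butane, C-C-C-C
print(atom_and_bond_counts(butane))    # (4, 3)
print(wiener_index(butane))            # 10
print(round(randic_index(butane), 3))  # 1.914
```

Breadth-first search is enough here because every bond counts as distance 1; weighted variants of these indices would need a different shortest-path routine.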





The good thing about the JChem API is that, in principle, you can implement most of this very easily. The supported functions are listed at the bottom of this post. The 1D fragment counts can be implemented with a SMARTS matcher function.





Ready-made SMARTS-based pattern sets include the PubChem fingerprints and the public OpenBabel SMARTS patterns. You can also use maximum common substructures (MCS, e.g. with LibMCS) to derive such patterns specifically for your dataset or for any other dataset (such as PubChem).
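As a rough illustration of how SMARTS fragment counts become descriptors or fingerprint bits, the Python sketch below assumes you already have, per molecule, the number of matches for each named pattern (from JChem's or any other SMARTS matcher, not shown). The pattern names come from the OpenBabel list at the end of this post; the four-name subset and the counts are illustrative assumptions.

```python
# One descriptor column per SMARTS pattern name, in a fixed order.
PATTERN_NAMES = ["Primary_carbon", "Alcohol", "Ketone", "Amide"]

def count_vector(match_counts: dict[str, int], names: list[str]) -> list[int]:
    """Fragment-count descriptors in a fixed column order; unmatched patterns give 0."""
    return [match_counts.get(name, 0) for name in names]

def structural_keys(counts: list[int]) -> list[int]:
    """Binary key fingerprint: bit i is 1 if pattern i matched at least once."""
    return [1 if c > 0 else 0 for c in counts]

# Illustrative match counts for pentan-3-one, CH3CH2C(=O)CH2CH3;
# in practice these would come from a SMARTS matcher.
pentanone = {"Primary_carbon": 2, "Ketone": 1}
counts = count_vector(pentanone, PATTERN_NAMES)
print(counts)                   # [2, 0, 1, 0]
print(structural_keys(counts))  # [1, 0, 1, 0]
```

Count vectors keep more information for regression; binary keys are what fingerprint-style similarity searches usually work with.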





You can easily calculate 2000 descriptors with various software applications; see moleculardescriptors.eu.

For a small test set of around 150 molecules, you can use VCCLAB from Igor Tetko to test how useful some of the descriptors are (the ones you want to implement with the JChem API).

Alternatively, you can use JOELib or, better, the CDK Descriptor Calculator GUI from Rajarshi Guha.





Beware! Most of the descriptors you can calculate will have no impact on your model. You need feature selection to find the descriptors that are actually useful for regression or classification.
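Before any supervised feature selection, a common first step is to discard descriptors that are (nearly) constant or almost perfectly correlated with one already kept. The Python sketch below does only this unsupervised pre-filtering; the thresholds (variance 1e-8, |r| > 0.95) are arbitrary assumptions, and the actual selection of useful descriptors still has to be done against the response, ideally inside the cross-validation loop.

```python
from math import sqrt

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long, non-constant columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def prefilter(columns: dict[str, list[float]],
              min_var: float = 1e-8, max_abs_r: float = 0.95) -> list[str]:
    """Names of descriptors that vary and are not near-duplicates of an earlier kept one."""
    kept: list[str] = []
    for name, values in columns.items():
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        if var <= min_var:
            continue  # (nearly) constant: carries no information
        if any(abs(pearson(values, columns[k])) > max_abs_r for k in kept):
            continue  # redundant with a descriptor that is already kept
        kept.append(name)
    return kept

cols = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],  # perfectly correlated with "a"
    "c": [5.0, 5.0, 5.0, 5.0],  # constant
    "d": [1.0, 0.0, 1.0, 0.0],
}
print(prefilter(cols))  # ['a', 'd']
```

Which member of a correlated pair survives depends on column order here; with real data you may prefer to keep the more interpretable descriptor.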


To prevent overfitting, it also helps to divide your dataset into a 70% development set and a 30% test set, and to keep an independent external validation set at hand.

You can additionally use v-fold cross-validation or bootstrapping on your development set.
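A minimal Python sketch of the random 70/30 split and of v-fold cross-validation indices on the development set, using only the standard library. The fixed seed and v = 5 are assumptions; a purely random split ignores the distribution of activities and chemotypes, so stratified or diversity-based splits may be preferable for real data.

```python
import random

def split_indices(n: int, test_fraction: float = 0.3,
                  seed: int = 42) -> tuple[list[int], list[int]]:
    """Randomly split indices 0..n-1 into (development, test) sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # local RNG: reproducible, no global state
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]

def v_fold(indices: list[int], v: int = 5):
    """Yield (training, validation) index lists for v-fold cross-validation."""
    folds = [indices[i::v] for i in range(v)]  # assumes indices are already shuffled
    for i in range(v):
        training = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield training, folds[i]

dev_idx, test_idx = split_indices(50_000)
print(len(dev_idx), len(test_idx))  # 35000 15000
for training, validation in v_fold(dev_idx, v=5):
    pass  # fit on `training`, predict `validation`, collect the errors
```

The test indices are set aside once and never used while descriptors or model parameters are being chosen.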


All these methods have been known since the 1970s.

Do not fall for the R^2 = 0.999999999 linear-fit scam.

Judge your models by prediction errors, or by R^2 and Q^2 on independent datasets, or by other measures (do not fool yourself).
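For the external test set, the root-mean-square error and an external Q^2 can be computed as below. This Python sketch uses one common definition, Q^2_ext = 1 - PRESS / sum((y_i - mean_train)^2) over the test compounds, with the mean taken from the training data; other variants (e.g. using the test-set mean) exist and give somewhat different values.

```python
from math import sqrt

def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root-mean-square error of prediction."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def q2_external(y_true: list[float], y_pred: list[float], train_mean: float) -> float:
    """External Q^2 = 1 - PRESS / sum((y - mean_train)^2) over the test set."""
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss = sum((t - train_mean) ** 2 for t in y_true)
    return 1.0 - press / ss

y_test = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
print(round(rmse(y_test, y_hat), 4))              # 0.1581
print(round(q2_external(y_test, y_hat, 2.5), 3))  # 0.98
```

Unlike the training R^2, Q^2_ext can become negative, which signals a model that predicts worse than simply using the training mean.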





For the classification or regression statistics, it does not matter at all which method you use. The best approach is to test all methods, or to build ensemble or group-contribution methods, which may include:





* Generalized Linear Models (GLM)
  - General Discriminant Analysis
  - Binary logit (logistic) regression
  - Binary probit regression
* Nonlinear models
  - Multivariate adaptive regression splines (MARS)
* Tree models
  - Standard Classification Trees (CART)
  - Standard General Chi-square Automatic Interaction Detector (CHAID)
  - Exhaustive CHAID
  - Boosting classification trees
* Neural Networks
  - Multilayer Perceptron neural network (MLP)
  - Radial Basis Function neural network (RBF)
* Machine Learning
  - Support Vector Machines (SVM)
  - Naive Bayes classifier
  - k-Nearest Neighbors (KNN)
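As a small, self-contained example of one of the listed methods, here is k-nearest-neighbour regression in plain Python, plus the simplest kind of ensemble (an unweighted average of several models' predictions). This is a sketch, not the implementation in any of the packages named below; it assumes the descriptors are already scaled, since unscaled descriptors would dominate the Euclidean distance.

```python
from math import dist

def knn_predict(train_X: list[list[float]], train_y: list[float],
                x: list[float], k: int = 3) -> float:
    """k-NN regression: mean response of the k training compounds closest to x."""
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def ensemble_mean(predictions: list[float]) -> float:
    """Simplest ensemble: unweighted average of several models' predictions."""
    return sum(predictions) / len(predictions)

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]  # already scaled descriptors
y = [1.0, 2.0, 3.0, 10.0]
query = [0.1, 0.1]
p1 = knn_predict(X, y, query, k=1)  # 1.0
p3 = knn_predict(X, y, query, k=3)  # 2.0
print(ensemble_mean([p1, p3]))      # 1.5
```

The same averaging works for any mix of the models above, as long as each one was fitted only on the training part of a split.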





You can apply such methods with MEV, Statistica Data Miner, YALE or WEKA.





Tobias





JChem descriptors and calculator plugins supported by the API:


Code:



   <Descriptor Name="ChemicalFingerprint"/>


   <Descriptor Name="PharmacophoreFingerprint"/>


   <Descriptor Name="BCUT"/>


   <Descriptor Name="HDon"/>


   <Descriptor Name="HAcc"/>


   <Descriptor Name="Heavy"/>


   <Descriptor Name="LogD"/>


   <Descriptor Name="LogP"/>


   <Descriptor Name="Mass"/>


   <Descriptor Name="TPSA"/>


   <Plugin ID="majorMs" Class="chemaxon.marvin.calculations.MajorMicrospeciesPlugin" JAR="MajorMicrospeciesPlugin.jar"/>


   <Plugin ID="msCount" Class="chemaxon.marvin.calculations.MajorMicrospeciesPlugin" JAR="MajorMicrospeciesPlugin.jar">


   <Plugin ID="ms" Class="chemaxon.marvin.calculations.MajorMicrospeciesPlugin" JAR="MajorMicrospeciesPlugin.jar">


   <Plugin ID="msDistr" Class="chemaxon.marvin.calculations.MajorMicrospeciesPlugin" JAR="MajorMicrospeciesPlugin.jar">


   <Plugin ID="tautomer" Class="chemaxon.marvin.calculations.TautomerizationPlugin" JAR="MultiformPlugin.jar"/>


   <Plugin ID="canonicalTautomer" Class="chemaxon.marvin.calculations.TautomerizationPlugin" JAR="MultiformPlugin.jar">


   <Plugin ID="tautomers" Class="chemaxon.marvin.calculations.TautomerizationPlugin" JAR="MultiformPlugin.jar">


   <Plugin ID="tautomerCount" Class="chemaxon.marvin.calculations.TautomerizationPlugin" JAR="MultiformPlugin.jar">


   <Plugin ID="dominantTautomer" Class="chemaxon.marvin.calculations.TautomerizationPlugin" JAR="MultiformPlugin.jar">


   <Plugin ID="dominantTautomers" Class="chemaxon.marvin.calculations.TautomerizationPlugin" JAR="MultiformPlugin.jar">


   <Plugin ID="dominantTautomerCount" Class="chemaxon.marvin.calculations.TautomerizationPlugin" JAR="MultiformPlugin.jar">


   <Plugin ID="resonant" Class="chemaxon.marvin.calculations.ResonancePlugin" JAR="MultiformPlugin.jar"/>


   <Plugin ID="canonicalResonant" Class="chemaxon.marvin.calculations.ResonancePlugin" JAR="MultiformPlugin.jar">


   <Plugin ID="resonants" Class="chemaxon.marvin.calculations.ResonancePlugin" JAR="MultiformPlugin.jar">


   <Plugin ID="resonantCount" Class="chemaxon.marvin.calculations.ResonancePlugin" JAR="MultiformPlugin.jar">


   <Plugin ID="charge" Class="chemaxon.marvin.calculations.ChargePlugin" JAR="ChargePlugin.jar"/>


   <Plugin ID="ionCharge" Class="chemaxon.marvin.calculations.IonChargePlugin" JAR="IonChargePlugin.jar"/>


   <Plugin ID="sigmaOrbitalElectronegativity" Class="chemaxon.marvin.calculations.OrbitalElectronegativityPlugin" JAR="OrbitalElectronegativityPlugin.jar">


   <Plugin ID="sOEN" Class="chemaxon.marvin.calculations.OrbitalElectronegativityPlugin" JAR="OrbitalElectronegativityPlugin.jar">


   <Plugin ID="piOrbitalElectronegativity" Class="chemaxon.marvin.calculations.OrbitalElectronegativityPlugin" JAR="OrbitalElectronegativityPlugin.jar">


   <Plugin ID="pOEN" Class="chemaxon.marvin.calculations.OrbitalElectronegativityPlugin" JAR="OrbitalElectronegativityPlugin.jar">


   <Plugin ID="polarizability" Class="chemaxon.marvin.calculations.PolarizabilityPlugin" JAR="PolarizabilityPlugin.jar">


   <Plugin ID="pol" Class="chemaxon.marvin.calculations.PolarizabilityPlugin" JAR="PolarizabilityPlugin.jar">


   <Plugin ID="atomPol" Class="chemaxon.marvin.calculations.PolarizabilityPlugin" JAR="PolarizabilityPlugin.jar">


   <Plugin ID="molPol" Class="chemaxon.marvin.calculations.PolarizabilityPlugin" JAR="PolarizabilityPlugin.jar">


   <Plugin ID="avgPol" Class="chemaxon.marvin.calculations.PolarizabilityPlugin" JAR="PolarizabilityPlugin.jar">


   <Plugin ID="averagePol" Class="chemaxon.marvin.calculations.PolarizabilityPlugin" JAR="PolarizabilityPlugin.jar">


   <Plugin ID="axxPol" Class="chemaxon.marvin.calculations.PolarizabilityPlugin" JAR="PolarizabilityPlugin.jar">


   <Plugin ID="ayyPol" Class="chemaxon.marvin.calculations.PolarizabilityPlugin" JAR="PolarizabilityPlugin.jar">


   <Plugin ID="azzPol" Class="chemaxon.marvin.calculations.PolarizabilityPlugin" JAR="PolarizabilityPlugin.jar">


   <Plugin ID="pKa" Class="chemaxon.marvin.calculations.pKaPlugin" JAR="pKaPlugin.jar">


   <Plugin ID="acidicpKa" Class="chemaxon.marvin.calculations.pKaPlugin" JAR="pKaPlugin.jar">


   <Plugin ID="apKa" Class="chemaxon.marvin.calculations.pKaPlugin" JAR="pKaPlugin.jar">


   <Plugin ID="basicpKa" Class="chemaxon.marvin.calculations.pKaPlugin" JAR="pKaPlugin.jar">


   <Plugin ID="bpKa" Class="chemaxon.marvin.calculations.pKaPlugin" JAR="pKaPlugin.jar">


   <Plugin ID="acidicpKaLargeModel" Class="chemaxon.marvin.calculations.pKaPlugin" JAR="pKaPlugin.jar">


   <Plugin ID="basicpKaLargeModel" Class="chemaxon.marvin.calculations.pKaPlugin" JAR="pKaPlugin.jar">


   <Plugin ID="logD" Class="chemaxon.marvin.calculations.logDPlugin" JAR="logDPlugin.jar"/>


   <Plugin ID="logP" Class="chemaxon.marvin.calculations.logPPlugin" JAR="logPPlugin.jar">


   <Plugin ID="logPi" Class="chemaxon.marvin.calculations.logPPlugin" JAR="logPPlugin.jar">


   <Plugin ID="orderE" Class="chemaxon.marvin.calculations.HuckelAnalysisPlugin" JAR="HuckelAnalysisPlugin.jar">


   <Plugin ID="orderNu" Class="chemaxon.marvin.calculations.HuckelAnalysisPlugin" JAR="HuckelAnalysisPlugin.jar">


   <Plugin ID="energyE" Class="chemaxon.marvin.calculations.HuckelAnalysisPlugin" JAR="HuckelAnalysisPlugin.jar">


   <Plugin ID="energyNu" Class="chemaxon.marvin.calculations.HuckelAnalysisPlugin" JAR="HuckelAnalysisPlugin.jar">


   <Plugin ID="piEnergy" Class="chemaxon.marvin.calculations.HuckelAnalysisPlugin" JAR="HuckelAnalysisPlugin.jar">


   <Plugin ID="piChargeDensity" Class="chemaxon.marvin.calculations.HuckelAnalysisPlugin" JAR="HuckelAnalysisPlugin.jar">


   <Plugin ID="totalChargeDensity" Class="chemaxon.marvin.calculations.HuckelAnalysisPlugin" JAR="HuckelAnalysisPlugin.jar">


   <Plugin ID="PSA" Class="chemaxon.marvin.calculations.TPSAPlugin" JAR="TPSAPlugin.jar"/>


   <Plugin ID="vanDerWaalsSurfaceArea" Class="chemaxon.marvin.calculations.MSAPlugin" JAR="MSAPlugin.jar">


   <Plugin ID="solventAccessibleSurfaceArea" Class="chemaxon.marvin.calculations.MSAPlugin" JAR="MSAPlugin.jar">


   <Plugin ID="pI" Class="chemaxon.marvin.calculations.IsoelectricPointPlugin" JAR="IsoelectricPointPlugin.jar"/>


   <Plugin ID="elemanal" Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin" JAR="ElementalAnalyserPlugin.jar"/>


   <Plugin ID="mass" Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin" JAR="ElementalAnalyserPlugin.jar">


   <Plugin ID="exactMass" Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin" JAR="ElementalAnalyserPlugin.jar">


   <Plugin ID="atomCount" Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin" JAR="ElementalAnalyserPlugin.jar">


   <Plugin ID="formula" Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin" JAR="ElementalAnalyserPlugin.jar">


   <Plugin ID="isotopeFormula" Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin" JAR="ElementalAnalyserPlugin.jar">


   <Plugin ID="dotDisconnectedFormula" Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin" JAR="ElementalAnalyserPlugin.jar">


   <Plugin ID="composition" Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin" JAR="ElementalAnalyserPlugin.jar">


   <Plugin ID="isotopeComposition" Class="chemaxon.marvin.calculations.ElementalAnalyserPlugin" JAR="ElementalAnalyserPlugin.jar">


   <Plugin ID="topanal" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar"/>


   <Plugin ID="aliphaticAtomCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="aromaticAtomCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="bondCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="aliphaticBondCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="aromaticBondCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="rotatableBondCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="ringCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="aliphaticRingCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="aromaticRingCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="heteroRingCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="heteroaromaticRingCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="carboRingCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="carboaromaticRingCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="ringAtomCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="ringBondCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="chainAtomCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="chainBondCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="smallestRingSize" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="largestRingSize" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="fusedRingCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="fusedAliphaticRingCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="fusedAromaticRingCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="asymmetricAtomCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="chiralCenterCount" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="aromaticAtom" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="aliphaticAtom" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="chainAtom" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="ringAtom" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="asymmetricAtom" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="chiralCenter" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="cyclomaticNumber" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="plattIndex" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="randicIndex" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="balabanIndex" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="distanceDegree" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="eccentricity" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="hararyIndex" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="hyperWienerIndex" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="szegedIndex" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="wienerIndex" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="wienerPolarity" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="stericEffectIndex" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="smallestAtomRingSize" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="largestAtomRingSize" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="shortestPath" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="connected" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="connectedGraph" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="bondType" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="chainBond" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="ringBond" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="rotatableBond" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="ringCountOfAtom" Class="chemaxon.marvin.calculations.TopologyAnalyserPlugin" JAR="TopologyAnalyserPlugin.jar">


   <Plugin ID="HBDA" Class="chemaxon.marvin.calculations.HBDAPlugin" JAR="HBDAPlugin.jar"/>


   <Plugin ID="acc" Class="chemaxon.marvin.calculations.HBDAPlugin" JAR="HBDAPlugin.jar">


   <Plugin ID="don" Class="chemaxon.marvin.calculations.HBDAPlugin" JAR="HBDAPlugin.jar">


   <Plugin ID="accSiteCount" Class="chemaxon.marvin.calculations.HBDAPlugin" JAR="HBDAPlugin.jar">


   <Plugin ID="donSiteCount" Class="chemaxon.marvin.calculations.HBDAPlugin" JAR="HBDAPlugin.jar">


   <Plugin ID="acceptorCount" Class="chemaxon.marvin.calculations.HBDAPlugin" JAR="HBDAPlugin.jar">


   <Plugin ID="donorCount" Class="chemaxon.marvin.calculations.HBDAPlugin" JAR="HBDAPlugin.jar">


   <Plugin ID="refrac" Class="chemaxon.marvin.calculations.RefractivityPlugin" JAR="RefractivityPlugin.jar"/>


   <Plugin ID="refractivity" Class="chemaxon.marvin.calculations.RefractivityPlugin" JAR="RefractivityPlugin.jar"/>


   <Plugin ID="refraci" Class="chemaxon.marvin.calculations.RefractivityPlugin" JAR="RefractivityPlugin.jar">


   <Plugin ID="refractivityIncrements" Class="chemaxon.marvin.calculations.RefractivityPlugin" JAR="RefractivityPlugin.jar">


   <Plugin ID="conformer" Class="chemaxon.marvin.calculations.ConformerPlugin" JAR="ConformerPlugin.jar"/>


   <Plugin ID="conformers" Class="chemaxon.marvin.calculations.ConformerPlugin" JAR="ConformerPlugin.jar">


   <Plugin ID="conformerCount" Class="chemaxon.marvin.calculations.ConformerPlugin" JAR="ConformerPlugin.jar">


   <Plugin ID="leconformer" Class="chemaxon.marvin.calculations.ConformerPlugin" JAR="ConformerPlugin.jar">


   <Plugin ID="hasValidConformer" Class="chemaxon.marvin.calculations.ConformerPlugin" JAR="ConformerPlugin.jar">


   <Plugin ID="stereoisomer" Class="chemaxon.marvin.calculations.StereoisomerPlugin" JAR="StereoisomerPlugin.jar">


   <Plugin ID="stereoisomers" Class="chemaxon.marvin.calculations.StereoisomerPlugin" JAR="StereoisomerPlugin.jar">


   <Plugin ID="stereoisomerCount" Class="chemaxon.marvin.calculations.StereoisomerPlugin" JAR="StereoisomerPlugin.jar">


   <Plugin ID="doubleBondStereoisomer" Class="chemaxon.marvin.calculations.StereoisomerPlugin" JAR="StereoisomerPlugin.jar">


   <Plugin ID="doubleBondStereoisomers" Class="chemaxon.marvin.calculations.StereoisomerPlugin" JAR="StereoisomerPlugin.jar">


   <Plugin ID="doubleBondStereoisomerCount" Class="chemaxon.marvin.calculations.StereoisomerPlugin" JAR="StereoisomerPlugin.jar">


   <Plugin ID="tetrahedralStereoisomer" Class="chemaxon.marvin.calculations.StereoisomerPlugin" JAR="StereoisomerPlugin.jar">


   <Plugin ID="tetrahedralStereoisomers" Class="chemaxon.marvin.calculations.StereoisomerPlugin" JAR="StereoisomerPlugin.jar">


   <Plugin ID="tetrahedralStereoisomerCount" Class="chemaxon.marvin.calculations.StereoisomerPlugin" JAR="StereoisomerPlugin.jar">


   <Plugin ID="dreidingEnergy" Class="chemaxon.marvin.calculations.GeometryPlugin" JAR="GeometryPlugin.jar">


   <Plugin ID="distance" Class="chemaxon.marvin.calculations.GeometryPlugin" JAR="GeometryPlugin.jar">


   <Plugin ID="angle" Class="chemaxon.marvin.calculations.GeometryPlugin" JAR="GeometryPlugin.jar">


   <Plugin ID="dihedral" Class="chemaxon.marvin.calculations.GeometryPlugin" JAR="GeometryPlugin.jar">


   <Plugin ID="stericHindrance" Class="chemaxon.marvin.calculations.GeometryPlugin" JAR="GeometryPlugin.jar">


   <Plugin ID="name" Class="chemaxon.marvin.calculations.IUPACNamingPlugin" JAR="IUPACNamingPlugin.jar"/>


   <Plugin ID="traditionalName" Class="chemaxon.marvin.calculations.IUPACNamingPlugin" JAR="IUPACNamingPlugin.jar">
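If you build your own configuration from the listing above, it helps to know which JAR provides each calculation. The Python sketch below pulls ID, Class and JAR out of the Plugin lines with a regular expression and groups the IDs by JAR. A regex is used instead of an XML parser because the listing is an excerpt in which many Plugin tags are not closed; `plugins_by_jar` and `jars_needed` are hypothetical helpers, not part of JChem.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   <Plugin ID="logP" Class="chemaxon.marvin.calculations.logPPlugin" JAR="logPPlugin.jar">
PLUGIN_RE = re.compile(r'<Plugin\s+ID="([^"]+)"\s+Class="([^"]+)"\s+JAR="([^"]+)"')

def plugins_by_jar(listing: str) -> dict[str, list[str]]:
    """Group plugin IDs by the JAR file that provides them."""
    groups = defaultdict(list)
    for plugin_id, _cls, jar in PLUGIN_RE.findall(listing):
        groups[jar].append(plugin_id)
    return dict(groups)

def jars_needed(listing: str, wanted_ids: list[str]) -> set[str]:
    """JAR files required for a chosen set of plugin IDs (unknown IDs are ignored)."""
    jar_of = {pid: jar for pid, _cls, jar in PLUGIN_RE.findall(listing)}
    return {jar_of[pid] for pid in wanted_ids if pid in jar_of}

sample = '''
   <Plugin ID="logP" Class="chemaxon.marvin.calculations.logPPlugin" JAR="logPPlugin.jar">
   <Plugin ID="PSA" Class="chemaxon.marvin.calculations.TPSAPlugin" JAR="TPSAPlugin.jar"/>
   <Plugin ID="logPi" Class="chemaxon.marvin.calculations.logPPlugin" JAR="logPPlugin.jar">
'''
print(plugins_by_jar(sample))
# {'logPPlugin.jar': ['logP', 'logPi'], 'TPSAPlugin.jar': ['PSA']}
print(sorted(jars_needed(sample, ["PSA", "logP"])))
# ['TPSAPlugin.jar', 'logPPlugin.jar']
```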











Fragment counts using the OpenBabel SMARTS patterns and the JChem SMARTS matching function:





Code:



#              SMARTS Patterns for Functional Group Classification


#


#              written by Christian Laggner


#              Copyright 2005 Inte:Ligand Software-Entwicklungs und Consulting GmbH


#


#              Released under the Lesser General Public License (LGPL license)


#              see http://www.gnu.org/copyleft/lesser.html


#              Modified from Version 221105


#              Project homepage: http://sourceforge.net/projects/openbabel





Primary_carbon: [CX4H3][#6]


Secondary_carbon: [CX4H2]([#6])[#6]


Tertiary_carbon: [CX4H1]([#6])([#6])[#6]


Quaternary_carbon: [CX4]([#6])([#6])([#6])[#6]


Alkene: [CX3;$([H2]),$([H1][#6]),$(C([#6])[#6])]=[CX3;$([H2]),$([H1][#6]),$(C([#6])[#6])]


Alkyne: [CX2]#[CX2]


Allene: [CX3]=[CX2]=[CX3]


Alkylchloride: [ClX1][CX4]


Alkylfluoride: [FX1][CX4]


Alkylbromide: [BrX1][CX4]


Alkyliodide: [IX1][CX4]


Alcohol: [OX2H][CX4;!$(C([OX2H])[O,S,#7,#15])]


Primary_alcohol: [OX2H][CX4H2;!$(C([OX2H])[O,S,#7,#15])]


Secondary_alcohol: [OX2H][CX4H;!$(C([OX2H])[O,S,#7,#15])]


Tertiary_alcohol: [OX2H][CX4D4;!$(C([OX2H])[O,S,#7,#15])]


Dialkylether: [OX2]([CX4;!$(C([OX2])[O,S,#7,#15,F,Cl,Br,I])])[CX4;!$(C([OX2])[O,S,#7,#15])]


Dialkylthioether: [SX2]([CX4;!$(C([SX2])[O,S,#7,#15,F,Cl,Br,I])])[CX4;!$(C([SX2])[O,S,#7,#15])]


Alkylarylether: [OX2](c)[CX4;!$(C([OX2])[O,S,#7,#15,F,Cl,Br,I])]


Diarylether: [c][OX2][c]


Alkylarylthioether: [SX2](c)[CX4;!$(C([SX2])[O,S,#7,#15,F,Cl,Br,I])]


Diarylthioether: [c][SX2][c]


Oxonium: [O+;!$([O]~[!#6]);!$([S]*~[#7,#8,#15,#16])]


Amine: [NX3+0,NX4+;!$([N]~[!#6]);!$([N]*~[#7,#8,#15,#16])]


Primary_aliph_amine: [NX3H2+0,NX4H3+;!$([N][!C]);!$([N]*~[#7,#8,#15,#16])]


Secondary_aliph_amine: [NX3H1+0,NX4H2+;!$([N][!C]);!$([N]*~[#7,#8,#15,#16])]


Tertiary_aliph_amine: [NX3H0+0,NX4H1+;!$([N][!C]);!$([N]*~[#7,#8,#15,#16])]


Quaternary_aliph_ammonium: [NX4H0+;!$([N][!C]);!$([N]*~[#7,#8,#15,#16])]


Primary_arom_amine: [NX3H2+0,NX4H3+]c


Secondary_arom_amine: [NX3H1+0,NX4H2+;!$([N][!c]);!$([N]*~[#7,#8,#15,#16])]


Tertiary_arom_amine: [NX3H0+0,NX4H1+;!$([N][!c]);!$([N]*~[#7,#8,#15,#16])]


Quaternary_arom_ammonium: [NX4H0+;!$([N][!c]);!$([N]*~[#7,#8,#15,#16])]


Secondary_mixed_amine: [NX3H1+0,NX4H2+;$([N]([c])[C]);!$([N]*~[#7,#8,#15,#16])]


Tertiary_mixed_amine: [NX3H0+0,NX4H1+;$([N]([c])([C])[#6]);!$([N]*~[#7,#8,#15,#16])]


Quaternary_mixed_ammonium: [NX4H0+;$([N]([c])([C])[#6][#6]);!$([N]*~[#7,#8,#15,#16])]


Ammonium: [N+;!$([N]~[!#6]);!$(N=*);!$([N]*~[#7,#8,#15,#16])]


Alkylthiol: [SX2H][CX4;!$(C([SX2H])~[O,S,#7,#15])]


Dialkylthioether: [SX2]([CX4;!$(C([SX2])[O,S,#7,#15,F,Cl,Br,I])])[CX4;!$(C([SX2])[O,S,#7,#15])]


Alkylarylthioether: [SX2](c)[CX4;!$(C([SX2])[O,S,#7,#15])]


Disulfide: [SX2D2][SX2D2]


1,2-Aminoalcohol: [OX2H][CX4;!$(C([OX2H])[O,S,#7,#15,F,Cl,Br,I])][CX4;!$(C([N])[O,S,#7,#15])][NX3;!$(NC=[O,S,N])]


1,2-Diol: [OX2H][CX4;!$(C([OX2H])[O,S,#7,#15])][CX4;!$(C([OX2H])[O,S,#7,#15])][OX2H]


1,1-Diol: [OX2H][CX4;!$(C([OX2H])([OX2H])[O,S,#7,#15])][OX2H]


Hydroperoxide: [OX2H][OX2]


Peroxo: [OX2D2][OX2D2]


Organolithium_compounds: [LiX1][#6,#14]


Organomagnesium_compounds: [MgX2][#6,#14]


Organometallic_compounds: [!#1;!#5;!#6;!#7;!#8;!#9;!#14;!#15;!#16;!#17;!#33;!#34;!#35;!#52;!#53;!#85]~[#6;!-]


Aldehyde: [$([CX3H][#6]),$([CX3H2])]=[OX1]


Ketone: [#6][CX3](=[OX1])[#6]


Thioaldehyde: [$([CX3H][#6]),$([CX3H2])]=[SX1]


Thioketone: [#6][CX3](=[SX1])[#6]


Imine: [NX2;$([N][#6]),$([NH]);!$([N][CX3]=[#7,#8,#15,#16])]=[CX3;$([CH2]),$([CH][#6]),$([C]([#6])[#6])]


Immonium: [NX3+;!$([N][!#6]);!$([N][CX3]=[#7,#8,#15,#16])]


Oxime: [NX2](=[CX3;$([CH2]),$([CH][#6]),$([C]([#6])[#6])])[OX2H]


Oximether: [NX2](=[CX3;$([CH2]),$([CH][#6]),$([C]([#6])[#6])])[OX2][#6;!$(C=[#7,#8])]


Acetal: [OX2]([#6;!$(C=[O,S,N])])[CX4;!$(C(O)(O)[!#6])][OX2][#6;!$(C=[O,S,N])]


Hemiacetal: [OX2H][CX4;!$(C(O)(O)[!#6])][OX2][#6;!$(C=[O,S,N])]


Aminal: [NX3v3;!$(NC=[#7,#8,#15,#16])]([#6])[CX4;!$(C(N)(N)[!#6])][NX3v3;!$(NC=[#7,#8,#15,#16])][#6]


Hemiaminal: [NX3v3;!$(NC=[#7,#8,#15,#16])]([#6])[CX4;!$(C(N)(N)[!#6])][OX2H]


Thioacetal: [SX2]([#6;!$(C=[O,S,N])])[CX4;!$(C(S)(S)[!#6])][SX2][#6;!$(C=[O,S,N])]


Thiohemiacetal: [SX2]([#6;!$(C=[O,S,N])])[CX4;!$(C(S)(S)[!#6])][OX2H]


Halogen_acetal_like: [NX3v3,SX2,OX2;!$(*C=[#7,#8,#15,#16])][CX4;!$(C([N,S,O])([N,S,O])[!#6])][FX1,ClX1,BrX1,IX1]


Acetal_like: [NX3v3,SX2,OX2;!$(*C=[#7,#8,#15,#16])][CX4;!$(C([N,S,O])([N,S,O])[!#6])][FX1,ClX1,BrX1,IX1,NX3v3,SX2,OX2;!$(*C=[#7,#8,#15,#16])]


Halogenmethylen_ester_and_similar: [NX3v3,SX2,OX2;$(**=[#7,#8,#15,#16])][CX4;!$(C([N,S,O])([N,S,O])[!#6])][FX1,ClX1,BrX1,IX1]


NOS_methylen_ester_and_similar: [NX3v3,SX2,OX2;$(**=[#7,#8,#15,#16])][CX4;!$(C([N,S,O])([N,S,O])[!#6])][NX3v3,SX2,OX2;!$(*C=[#7,#8,#15,#16])]


Hetero_methylen_ester_and_similar: [NX3v3,SX2,OX2;$(**=[#7,#8,#15,#16])][CX4;!$(C([N,S,O])([N,S,O])[!#6])][FX1,ClX1,BrX1,IX1,NX3v3,SX2,OX2;!$(*C=[#7,#8,#15,#16])]


Cyanhydrine: [NX1]#[CX2][CX4;$([CH2]),$([CH]([CX2])[#6]),$(C([CX2])([#6])[#6])][OX2H]


Chloroalkene: [ClX1][CX3]=[CX3]


Fluoroalkene: [FX1][CX3]=[CX3]


Bromoalkene: [BrX1][CX3]=[CX3]


Iodoalkene: [IX1][CX3]=[CX3]


Enol: [OX2H][CX3;$([H1]),$(C[#6])]=[CX3]


Endiol: [OX2H][CX3;$([H1]),$(C[#6])]=[CX3;$([H1]),$(C[#6])][OX2H]


Enolether: [OX2]([#6;!$(C=[N,O,S])])[CX3;$([H0][#6]),$([H1])]=[CX3]


Enolester: [OX2]([CX3]=[OX1])[#6X3;$([#6][#6]),$([H1])]=[#6X3;!$(C[OX2H])]


Enamine: [NX3;$([NH2][CX3]),$([NH1]([CX3])[#6]),$([N]([CX3])([#6])[#6]);!$([N]*=[#7,#8,#15,#16])][CX3;$([CH]),$([C][#6])]=[CX3]


Thioenol: [SX2H][CX3;$([H1]),$(C[#6])]=[CX3]


Thioenolether: [SX2]([#6;!$(C=[N,O,S])])[CX3;$(C[#6]),$([CH])]=[CX3]


Acylchloride: [CX3;$([R0][#6]),$([H1R0])](=[OX1])[ClX1]


Acylfluoride: [CX3;$([R0][#6]),$([H1R0])](=[OX1])[FX1]


Acylbromide: [CX3;$([R0][#6]),$([H1R0])](=[OX1])[BrX1]


Acyliodide: [CX3;$([R0][#6]),$([H1R0])](=[OX1])[IX1]


Acylhalide: [CX3;$([R0][#6]),$([H1R0])](=[OX1])[FX1,ClX1,BrX1,IX1]


Carboxylic_acid: [CX3;$([R0][#6]),$([H1R0])](=[OX1])[$([OX2H]),$([OX1-])]


Carboxylic_ester:  [CX3;$([R0][#6]),$([H1R0])](=[OX1])[OX2][#6;!$(C=[O,N,S])]


Lactone: [#6][#6X3R](=[OX1])[#8X2][#6;!$(C=[O,N,S])]


Carboxylic_anhydride: [CX3;$([H0][#6]),$([H1])](=[OX1])[#8X2][CX3;$([H0][#6]),$([H1])](=[OX1])


Carboxylic_acid_derivative: [$([#6X3H0][#6]),$([#6X3H])](=[!#6])[!#6]


Carbothioic_acid: [CX3;!R;$([C][#6]),$([CH]);$([C](=[OX1])[$([SX2H]),$([SX1-])]),$([C](=[SX1])[$([OX2H]),$([OX1-])])]


Carbothioic_S_ester: [CX3;$([R0][#6]),$([H1R0])](=[OX1])[SX2][#6;!$(C=[O,N,S])]


Carbothioic_S_lactone: [#6][#6X3R](=[OX1])[#16X2][#6;!$(C=[O,N,S])]


Carbothioic_O_ester: [CX3;$([H0][#6]),$([H1])](=[SX1])[OX2][#6;!$(C=[O,N,S])]


Carbothioic_O_lactone: [#6][#6X3R](=[SX1])[#8X2][#6;!$(C=[O,N,S])]


Carbothioic_halide: [CX3;$([H0][#6]),$([H1])](=[SX1])[FX1,ClX1,BrX1,IX1]


Carbodithioic_acid: [CX3;!R;$([C][#6]),$([CH]);$([C](=[SX1])[SX2H])]


Carbodithioic_ester: [CX3;!R;$([C][#6]),$([CH]);$([C](=[SX1])[SX2][#6;!$(C=[O,N,S])])]


Carbodithiolactone: [#6][#6X3R](=[SX1])[#16X2][#6;!$(C=[O,N,S])]


Amide: [CX3;$([R0][#6]),$([H1R0])](=[OX1])[#7X3;$([H2]),$([H1][#6;!$(C=[O,N,S])]),$([#7]([#6;!$(C=[O,N,S])])[#6;!$(C=[O,N,S])])]


Primary_amide: [CX3;$([R0][#6]),$([H1R0])](=[OX1])[NX3H2]


Secondary_amide: [CX3;$([R0][#6]),$([H1R0])](=[OX1])[#7X3H1][#6;!$(C=[O,N,S])]


Tertiary_amide: [CX3;$([R0][#6]),$([H1R0])](=[OX1])[#7X3H0]([#6;!$(C=[O,N,S])])[#6;!$(C=[O,N,S])]


Lactam: [#6R][#6X3R](=[OX1])[#7X3;$([H1][#6;!$(C=[O,N,S])]),$([H0]([#6;!$(C=[O,N,S])])[#6;!$(C=[O,N,S])])]


Alkyl_imide: [#6X3;$([H0][#6]),$([H1])](=[OX1])[#7X3H0]([#6])[#6X3;$([H0][#6]),$([H1])](=[OX1])


N_hetero_imide: [#6X3;$([H0][#6]),$([H1])](=[OX1])[#7X3H0]([!#6])[#6X3;$([H0][#6]),$([H1])](=[OX1])


Imide_acidic: [#6X3;$([H0][#6]),$([H1])](=[OX1])[#7X3H1][#6X3;$([H0][#6]),$([H1])](=[OX1])


Thioamide: [$([CX3;!R][#6]),$([CX3H;!R])](=[SX1])[#7X3;$([H2]),$([H1][#6;!$(C=[O,N,S])]),$([#7]([#6;!$(C=[O,N,S])])[#6;!$(C=[O,N,S])])]


Thiolactam: [#6R][#6X3R](=[SX1])[#7X3;$([H1][#6;!$(C=[O,N,S])]),$([H0]([#6;!$(C=[O,N,S])])[#6;!$(C=[O,N,S])])]


Oximester: [#6X3;$([H0][#6]),$([H1])](=[OX1])[#8X2][#7X2]=,:[#6X3;$([H0]([#6])[#6]),$([H1][#6]),$([H2])]


Amidine: [NX3;!$(NC=[O,S])][CX3;$([CH]),$([C][#6])]=[NX2;!$(NC=[O,S])]


Hydroxamic_acid: [CX3;$([H0][#6]),$([H1])](=[OX1])[#7X3;$([H1]),$([H0][#6;!$(C=[O,N,S])])][$([OX2H]),$([OX1-])]


Hydroxamic_acid_ester: [CX3;$([H0][#6]),$([H1])](=[OX1])[#7X3;$([H1]),$([H0][#6;!$(C=[O,N,S])])][OX2][#6;!$(C=[O,N,S])]


Imidoacid: [CX3R0;$([H0][#6]),$([H1])](=[NX2;$([H1]),$([H0][#6;!$(C=[O,N,S])])])[$([OX2H]),$([OX1-])]


Imidoacid_cyclic: [#6R][#6X3R](=,:[#7X2;$([H1]),$([H0][#6;!$(C=[O,N,S])])])[$([OX2H]),$([OX1-])] 


Imidoester: [CX3R0;$([H0][#6]),$([H1])](=[NX2;$([H1]),$([H0][#6;!$(C=[O,N,S])])])[OX2][#6;!$(C=[O,N,S])]


Imidolactone: [#6R][#6X3R](=,:[#7X2;$([H1]),$([H0][#6;!$(C=[O,N,S])])])[OX2][#6;!$(C=[O,N,S])]


Imidothioacid: [CX3R0;$([H0][#6]),$([H1])](=[NX2;$([H1]),$([H0][#6;!$(C=[O,N,S])])])[$([SX2H]),$([SX1-])]


Imidothioacid_cyclic: [#6R][#6X3R](=,:[#7X2;$([H1]),$([H0][#6;!$(C=[O,N,S])])])[$([SX2H]),$([SX1-])] 


Imidothioester: [CX3R0;$([H0][#6]),$([H1])](=[NX2;$([H1]),$([H0][#6;!$(C=[O,N,S])])])[SX2][#6;!$(C=[O,N,S])]


Imidothiolactone: [#6R][#6X3R](=,:[#7X2;$([H1]),$([H0][#6;!$(C=[O,N,S])])])[SX2][#6;!$(C=[O,N,S])]


Amidine: [#7X3v3;!$(N([#6X3]=[#7X2])C=[O,S])][CX3R0;$([H1]),$([H0][#6])]=[NX2v3;!$(N(=[#6X3][#7X3])C=[O,S])]


Imidolactam: [#6][#6X3R;$([H0](=[NX2;!$(N(=[#6X3][#7X3])C=[O,S])])[#7X3;!$(N([#6X3]=[#7X2])C=[O,S])]),$([H0](-[NX3;!$(N([#6X3]=[#7X2])C=[O,S])])=,:[#7X2;!$(N(=[#6X3][#7X3])C=[O,S])])] 


Imidoylhalide: [CX3R0;$([H0][#6]),$([H1])](=[NX2;$([H1]),$([H0][#6;!$(C=[O,N,S])])])[FX1,ClX1,BrX1,IX1]


Imidoylhalide_cyclic: [#6R][#6X3R](=,:[#7X2;$([H1]),$([H0][#6;!$(C=[O,N,S])])])[FX1,ClX1,BrX1,IX1]


Amidrazone: [$([$([#6X3][#6]),$([#6X3H])](=[#7X2v3])[#7X3v3][#7X3v3]),$([$([#6X3][#6]),$([#6X3H])]([#7X3v3])=[#7X2v3][#7X3v3])]


Alpha_aminoacid: [NX3,NX4+;!$([N]~[!#6]);!$([N]*~[#7,#8,#15,#16])][C][CX3](=[OX1])[OX2H,OX1-]


Alpha_hydroxyacid: [OX2H][C][CX3](=[OX1])[OX2H,OX1-]


Peptide_middle: [NX3;$([N][CX3](=[OX1])[C][NX3,NX4+])][C][CX3](=[OX1])[NX3;$([N][C][CX3](=[OX1])[NX3,OX2,OX1-])]


Peptide_C_term: [NX3;$([N][CX3](=[OX1])[C][NX3,NX4+])][C][CX3](=[OX1])[OX2H,OX1-]


Peptide_N_term: [NX3,NX4+;!$([N]~[!#6]);!$([N]*~[#7,#8,#15,#16])][C][CX3](=[OX1])[NX3;$([N][C][CX3](=[OX1])[NX3,OX2,OX1-])]


Carboxylic_orthoester: [#6][OX2][CX4;$(C[#6]),$([CH])]([OX2][#6])[OX2][#6]


Ketene: [CX3]=[CX2]=[OX1]


Ketenacetal: [#7X2,#8X3,#16X2;$(*[#6,#14])][#6X3]([#7X2,#8X3,#16X2;$(*[#6,#14])])=[#6X3]


Nitrile: [NX1]#[CX2]


Isonitrile: [CX1-]#[NX2+]


Vinylogous_carbonyl_or_carboxyl_derivative: [#6X3](=[OX1])[#6X3]=,:[#6X3][#7,#8,#16,F,Cl,Br,I]


Vinylogous_acid: [#6X3](=[OX1])[#6X3]=,:[#6X3][$([OX2H]),$([OX1-])]


Vinylogous_ester: [#6X3](=[OX1])[#6X3]=,:[#6X3][#6;!$(C=[O,N,S])]


Vinylogous_amide: [#6X3](=[OX1])[#6X3]=,:[#6X3][#7X3;$([H2]),$([H1][#6;!$(C=[O,N,S])]),$([#7]([#6;!$(C=[O,N,S])])[#6;!$(C=[O,N,S])])]


Vinylogous_halide: [#6X3](=[OX1])[#6X3]=,:[#6X3][FX1,ClX1,BrX1,IX1]


Carbonic_acid_dieester: [#6;!$(C=[O,N,S])][#8X2][#6X3](=[OX1])[#8X2][#6;!$(C=[O,N,S])]


Carbonic_acid_esterhalide: [#6;!$(C=[O,N,S])][OX2;!R][CX3](=[OX1])[OX2][FX1,ClX1,BrX1,IX1]


Carbonic_acid_monoester: [#6;!$(C=[O,N,S])][OX2;!R][CX3](=[OX1])[$([OX2H]),$([OX1-])]


Carbonic_acid_derivatives: [!#6][#6X3](=[!#6])[!#6]


Thiocarbonic_acid_dieester: [#6;!$(C=[O,N,S])][#8X2][#6X3](=[SX1])[#8X2][#6;!$(C=[O,N,S])]


Thiocarbonic_acid_esterhalide: [#6;!$(C=[O,N,S])][OX2;!R][CX3](=[SX1])[OX2][FX1,ClX1,BrX1,IX1]


Thiocarbonic_acid_monoester: [#6;!$(C=[O,N,S])][OX2;!R][CX3](=[SX1])[$([OX2H]),$([OX1-])]


Thiourea: [#7X3;!$([#7][!#6])][#6X3](=[SX1])[#7X3;!$([#7][!#6])]


Isourea: [#7X2;!$([#7][!#6])]=,:[#6X3]([#8X2&!$([#8][!#6]),OX1-])[#7X3;!$([#7][!#6])]


Isothiourea: [#7X2;!$([#7][!#6])]=,:[#6X3]([#16X2&!$([#16][!#6]),SX1-])[#7X3;!$([#7][!#6])]


Guanidine: [N;v3X3,v4X4+][CX3](=[N;v3X2,v4X3+])[N;v3X3,v4X4+]


Carbaminic_acid: [NX3]C(=[OX1])[O;X2H,X1-]


Urethan: [#7X3][#6](=[OX1])[#8X2][#6]


Biuret: [#7X3][#6](=[OX1])[#7X3][#6](=[OX1])[#7X3]


Semicarbazide: [#7X3][#7X3][#6X3]([#7X3;!$([#7][#7])])=[OX1]


Carbazide: [#7X3][#7X3][#6X3]([#7X3][#7X3])=[OX1]


Semicarbazone: [#7X2](=[#6])[#7X3][#6X3]([#7X3;!$([#7][#7])])=[OX1]


Carbazone: [#7X2](=[#6])[#7X3][#6X3]([#7X3][#7X3])=[OX1]


Thiosemicarbazide: [#7X3][#7X3][#6X3]([#7X3;!$([#7][#7])])=[SX1]


Thiocarbazide: [#7X3][#7X3][#6X3]([#7X3][#7X3])=[SX1]


Thiosemicarbazone: [#7X2](=[#6])[#7X3][#6X3]([#7X3;!$([#7][#7])])=[SX1]


Thiocarbazone: [#7X2](=[#6])[#7X3][#6X3]([#7X3][#7X3])=[SX1]


Isocyanate: [NX2]=[CX2]=[OX1]


Cyanate: [OX2][CX2]#[NX1]


Isothiocyanate: [NX2]=[CX2]=[SX1]


Thiocyanate: [SX2][CX2]#[NX1]


Carbodiimide: [NX2]=[CX2]=[NX2]


Orthocarbonic_derivatives: [CX4H0]([O,S,#7])([O,S,#7])([O,S,#7])[O,S,#7,F,Cl,Br,I]


Phenol: [OX2H][c]


1,2-Diphenol: [OX2H][c][c][OX2H]


Arylchloride: [Cl][c]


Arylfluoride: [F][c]


Arylbromide: [Br][c]


Aryliodide: [I][c]


Arylthiol: [SX2H][c]


Iminoarene: [c]=[NX2;$([H1]),$([H0][#6;!$([C]=[N,S,O])])]


Oxoarene: [c]=[OX1]


Thioarene: [c]=[SX1]


Hetero_N_basic_H: [nX3H1+0]


Hetero_N_basic_no_H: [nX3H0+0]


Hetero_N_nonbasic: [nX2,nX3+]


Hetero_O: [o]


Hetero_S: [sX2]


Heteroaromatic: [a;!c]


Nitrite: [NX2](=[OX1])[O;$([X2]),$([X1-])]


Thionitrite: [SX2][NX2]=[OX1]


Nitrate: [$([NX3](=[OX1])(=[OX1])[O;$([X2]),$([X1-])]),$([NX3+]([OX1-])(=[OX1])[O;$([X2]),$([X1-])])]


Nitro: [$([NX3](=O)=O),$([NX3+](=O)[O-])][!#8]


Nitroso: [NX2](=[OX1])[!#7;!#8]


Azide: [NX1]~[NX2]~[NX2,NX1]


Acylazide: [CX3](=[OX1])[NX2]~[NX2]~[NX1]


Diazo: [$([#6]=[NX2+]=[NX1-]),$([#6-]-[NX2+]#[NX1])]


Diazonium: [#6][NX2+]#[NX1]


Nitrosamine: [#7;!$(N*=O)][NX2]=[OX1]


Nitrosamide: [NX2](=[OX1])N-*=O


N-Oxide: [$([#7+][OX1-]),$([#7v5]=[OX1]);!$([#7](~[O])~[O]);!$([#7]=[#7])]


Hydrazine: [NX3;$([H2]),$([H1][#6]),$([H0]([#6])[#6]);!$(NC=[O,N,S])][NX3;$([H2]),$([H1][#6]),$([H0]([#6])[#6]);!$(NC=[O,N,S])]


Hydrazone: [NX3;$([H2]),$([H1][#6]),$([H0]([#6])[#6]);!$(NC=[O,N,S])][NX2]=[#6]


Hydroxylamine: [NX3;$([H2]),$([H1][#6]),$([H0]([#6])[#6]);!$(NC=[O,N,S])][OX2;$([H1]),$(O[#6;!$(C=[N,O,S])])]


Sulfon: [$([SX4](=[OX1])(=[OX1])([#6])[#6]),$([SX4+2]([OX1-])([OX1-])([#6])[#6])]


Sulfoxide: [$([SX3](=[OX1])([#6])[#6]),$([SX3+]([OX1-])([#6])[#6])]


Sulfonium: [S+;!$([S]~[!#6]);!$([S]*~[#7,#8,#15,#16])]


Sulfuric_acid: [SX4](=[OX1])(=[OX1])([$([OX2H]),$([OX1-])])[$([OX2H]),$([OX1-])]


Sulfuric_monoester: [SX4](=[OX1])(=[OX1])([$([OX2H]),$([OX1-])])[OX2][#6;!$(C=[O,N,S])]


Sulfuric_diester: [SX4](=[OX1])(=[OX1])([OX2][#6;!$(C=[O,N,S])])[OX2][#6;!$(C=[O,N,S])]


Sulfuric_monoamide: [SX4](=[OX1])(=[OX1])([#7X3;$([H2]),$([H1][#6;!$(C=[O,N,S])]),$([#7]([#6;!$(C=

User e9249ba1fe

23-10-2007 11:50:08

Hey, thanks a lot.


I was hoping that there would be some kind of simple answer.


I know a lot about ML stuff,


but the biggest problem is getting the descriptors.


VCCLAB allows only 150 molecules to be processed at a time, so it seems I will have to go back to CDK etc.


Thanks for the help.

User 677b9c22ff

23-10-2007 20:12:09

akshayubhat wrote:
hey thanks a lot.


i was hoping that there would be some kind of simple answer.


A) Open the DOS command line and call cxcalc.


I am not quite sure what is simpler than that.


Output will be something like:


Code:



D:\temp>cxcalc plattIndex randicIndex balabanIndex hararyindex wienerindex fusedRingcount largestringsize c6h6.smi

1       12      3.00    2.00    10.00   27      0       6
2       20      2.97    1.64    10.33   27      0       3
3       20      2.97    1.74    10.67   25      2       4
4       28      2.98    2.21    11.50   22      3       5
5       36      3.00    1.29    12.00   21      4       4
6       36      3.00    1.29    12.00   21      4       4
7       8       2.91    2.34    8.70    35      0       0
8       14      2.93    1.88    9.50    31      0       3
9       14      2.93    2.01    9.75    29      0       4
10      22      2.93    1.65    10.25   28      2       3
11      12      3.00    2.00    10.00   27      0       6
12      14      2.93    1.88    9.50    31      0       3
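Because this is a plain whitespace-separated table, it is easy to read back into your own code, for example to feed a learner. Below is a minimal Java sketch; the class name `CxcalcTableReader` is made up for illustration, and it assumes (as in the sample above) that every data line starts with an integer molecule id followed by numeric values, and that a header line, if present, can simply be skipped.

```java
import java.util.ArrayList;
import java.util.List;

public class CxcalcTableReader {

    /**
     * Parses whitespace-separated rows of the form "id value value ..." into a
     * matrix, dropping the leading id column. Blank lines and lines whose first
     * token is not an integer (for example a header line) are skipped.
     */
    public static double[][] parse(String text) {
        List<double[]> rows = new ArrayList<>();
        for (String line : text.split("\\r?\\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            String[] tokens = trimmed.split("\\s+");
            if (!tokens[0].matches("\\d+")) {
                continue; // header or other non-data line
            }
            double[] row = new double[tokens.length - 1];
            for (int i = 1; i < tokens.length; i++) {
                row[i - 1] = Double.parseDouble(tokens[i]);
            }
            rows.add(row);
        }
        return rows.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        String sample = "1\t12\t3.00\t2.00\t10.00\t27\t0\t6\n"
                + "2\t20\t2.97\t1.64\t10.33\t27\t0\t3\n";
        double[][] m = parse(sample);
        System.out.println(m.length + " rows x " + m[0].length + " columns"); // 2 rows x 7 columns
    }
}
```

If some property produces a non-numeric value, `Double.parseDouble` will throw, so for real data you may want to catch that and store a missing-value marker instead.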








B) Going back to the CDK does not help you if you cannot program in Java. If you can program in Java,


it goes like this:





1) Use MolImporter


2) Load and loop through all molecules


3) Initialize the plugin (see table above)


4) Perform calculation


5) Output calculation





For each of the plugins from the large list above you can repeat this by simply calling them and adding more functions.


For the topological descriptors it looks like the code below, and to be honest I am not quite sure what is simpler than that (if you know Java).


The code is not pretty, but it works and it is quick to build.





Code:



package examples;

import chemaxon.formats.*;
import chemaxon.struc.*;
import chemaxon.marvin.calculations.*;
import chemaxon.marvin.plugin.*;
import java.io.*;

public class CalcDescSimple {

   /** Defines a MolImporter object to the structure file. */
   private static MolImporter createMolImporter(String filename) {
      MolImporter mi = null;
      try {
         File f = new File(filename);
         FileInputStream fis = new FileInputStream(f);
         mi = new MolImporter(fis);
      } catch (FileNotFoundException ex) {
         System.err.println(filename + ": not found");
         System.exit(1);
      } catch (MolFormatException ex) {
         System.err.println(filename + ": " + ex.getMessage());
         System.exit(1);
      } catch (Exception ex) {
         System.err.println("Error: " + filename + " is not a structure file.");
         System.exit(1);
      }
      return mi;
   }

   /** Counts the molecules in a structure file (reads the whole file once). */
   private static long countMolecules(String filename) throws PluginException, MolFormatException, IOException {
      MolImporter mi = createMolImporter(filename);
      long globalmolcounter = 0;
      while (mi.read() != null) {
         globalmolcounter++;
      }
      mi.close();
      return globalmolcounter;
   }

   public static void main(String[] args) throws PluginException, MolFormatException, IOException {
      String filename = "d:/temp/c6h6.smi";
      System.out.println("Number of molecules in " + filename + ": " + countMolecules(filename));
      MolImporter mi = createMolImporter(filename);
      TopologyAnalyserPlugin topologyplugin = new TopologyAnalyserPlugin();

      // conversion double to string with 12 decimals (loss of precision possible);
      // if you want to calculate with the doubles, use the tempXXX variables
      java.text.DecimalFormat df12 = new java.text.DecimalFormat("0.000000000000");

      // for each input molecule run the calculation and display the results
      Molecule target = null; long molcounter = 0; long totalerrors = 0;
      while ((target = mi.read()) != null) {
         molcounter++; // row number of the current molecule, printed as the first column
         // set the input molecule
         topologyplugin.setMolecule(target);
         try {
            // run the calculation
            topologyplugin.run();

            // maybe prettier to put them in array or LIST ?
            int count = target.getAtomCount();
            int aliphaticatomCount = topologyplugin.getAliphaticAtomCount();
            int aliphaticbondcount = topologyplugin.getAliphaticBondCount();
            int aliphaticringcount = topologyplugin.getAliphaticRingCount();
            int aromaticatomcount = topologyplugin.getAromaticAtomCount();
            int aromaticbondcount = topologyplugin.getAromaticBondCount();
            int aromaticringcount = topologyplugin.getAromaticRingCount();
            int asymmetricatomcount = topologyplugin.getAsymmetricAtomCount();
            double tempbalabanindex = topologyplugin.getBalabanIndex();
            String balabanindex = df12.format(tempbalabanindex);
            int bondcount = topologyplugin.getBondCount();
            int carboaromaticringcount = topologyplugin.getCarboaromaticRingCount();
            int carboringcount = topologyplugin.getCarboRingCount();
            int chainatomcount = topologyplugin.getChainAtomCount();
            int chainbondcount = topologyplugin.getChainBondCount();
            int chiralcentercount = topologyplugin.getChiralCenterCount();
            boolean tempconnectedGraph = topologyplugin.isConnectedGraph();
            int connectedGraph = tempconnectedGraph ? 1 : 0;
            int cyclomaticNumber = topologyplugin.getCyclomaticNumber();
            int fusedaliphaticringcount = topologyplugin.getFusedAliphaticRingCount();
            int fusedaromaticringcount = topologyplugin.getFusedAromaticRingCount();
            int fusedringcount = topologyplugin.getFusedRingCount();
            double temphararyIndex = topologyplugin.getHararyIndex();
            String hararyIndex = df12.format(temphararyIndex);
            int heteroaromaticringcount = topologyplugin.getHeteroaromaticRingCount();
            int heteroringcount = topologyplugin.getHeteroRingCount();
            int hyperWienerIndex = topologyplugin.getHyperWienerIndex();
            int largestringsize = topologyplugin.getLargestRingSize();
            int plattIndex = topologyplugin.getPlattIndex();
            double temprandicIndex = topologyplugin.getRandicIndex();
            String randicIndex = df12.format(temprandicIndex);
            int ringatomcount = topologyplugin.getRingAtomCount();
            int ringbondcount = topologyplugin.getRingBondCount();
            int ringcount = topologyplugin.getRingCount();
            int rotatablebondcount = topologyplugin.getRotatableBondCount();
            int smallestringsize = topologyplugin.getSmallestRingSize();
            int szegedIndex = topologyplugin.getSzegedIndex();
            int wienerIndex = topologyplugin.getWienerIndex();
            int wienerPolarity = topologyplugin.getWienerPolarity();

            //*******************************************************************

            // one tab-separated output row per molecule
            String TopologyResult = molcounter + "\t" + count + "\t" + aliphaticatomCount + "\t" + aliphaticbondcount + "\t" + aliphaticringcount + "\t";
            TopologyResult = TopologyResult + aromaticatomcount + "\t" + aromaticbondcount + "\t" + aromaticringcount + "\t";
            TopologyResult = TopologyResult + asymmetricatomcount + "\t" + balabanindex + "\t" + bondcount + "\t";
            TopologyResult = TopologyResult + carboaromaticringcount + "\t" + carboringcount + "\t" + chainatomcount + "\t" + chainbondcount + "\t";
            TopologyResult = TopologyResult + chiralcentercount + "\t" + connectedGraph + "\t" + cyclomaticNumber + "\t";
            TopologyResult = TopologyResult + fusedaliphaticringcount + "\t" + fusedaromaticringcount + "\t" + fusedringcount + "\t";
            TopologyResult = TopologyResult + hararyIndex + "\t" + heteroaromaticringcount + "\t" + heteroringcount + "\t" + hyperWienerIndex + "\t";
            TopologyResult = TopologyResult + largestringsize + "\t" + plattIndex + "\t" + randicIndex + "\t";
            TopologyResult = TopologyResult + ringatomcount + "\t" + ringbondcount + "\t" + ringcount + "\t";
            TopologyResult = TopologyResult + rotatablebondcount + "\t" + smallestringsize + "\t" + szegedIndex + "\t";
            TopologyResult = TopologyResult + wienerIndex + "\t" + wienerPolarity + "\t";

            System.out.println();
            System.out.print(TopologyResult);
         } // this is for plugin errors
         catch (Exception e) {
            System.out.println("Error - " + e);
            totalerrors++;
         }
      }
      System.out.println();
      System.out.println("Number of errors: " + totalerrors);
      mi.close();
   }
}











The output is something like:


Code:



Number of molecules in d:/temp/c6h6.smi: 217

0  6  0  0  0  6  6  1  0  2.000000000000  12  1  1  0  0  0  1  1  0  0  0  10.00000000000  0  0  42  6  12  3.000000000000  6  6  1  0  6  54  27  3
0  6  6  7  2  0  0  0  0  1.641897173182  13  0  2  0  1  0  1  2  0  0  0  10.33333333333  0  0  43  3  20  2.966326495189  6  6  2  1  3  27  27  4
0  6  6  7  2  0  0  0  0  1.738063991517  13  0  2  0  0  2  1  2  2  0  2  10.66666666666  0  0  37  4  20  2.966326495189  6  7  2  0  4  59  25  2
0  6  6  8  3  0  0  0  0  2.213093912396  14  0  3  0  0  4  1  3  3  0  3  11.50000000000  0  0  29  5  28  2.983163247594  6  8  3  0  3  33  22  0
0  6  6  9  4  0  0  0  0  1.285714285714  15  0  4  0  0  6  1  4  4  0  4  12.00000000000  0  0  27  4  36  3.000000000000  6  9  4  0  3  51  21  0
0  6  6  9  4  0  0  0  0  1.285714285714  15  0  4  0  0  6  1  4  4  0  4  12.00000000000  0  0  27  4  36  3.000000000000  6  9  4  0  4  81  21  0
0  6  6  5  0  0  0  0  0  2.339092314976  11  0  0  6  5  0  1  0  0  0  0  8.700000000000  0  0  70  0  8   2.914213562373  0  0  0  2  0  35  35  3
0  6  6  6  1  0  0  0  0  1.876285894838  12  0  1  3  3  0  1  1  0  0  0  9.500000000000  0  0  56  3  14  2.931851652578  3  3  1  2  3  31  31  3
0  6  6  6  1  0  0  0  1  2.014266206296  12  0  1  2  2  1  1  1  0  0  0  9.750000000000  0  0  49  4  14  2.931851652578  4  4  1  1  4  45  29  3
0  6  6  7  2  0  0  0  2  1.647800297284  13  0  2  2  2  2  1  2  2  0  2  10.25000000000  0  0  47  3  22  2.931851652578  4  5  2  1  3  34  28  3
0  6  6  6  1  0  0  0  0  2.000000000000  12  0  1  0  0  0  1  1  0  0  0  10.00000000000  0  0  42  6  12  3.000000000000  6  6  1  0  6  54  27  3
0  6  6  6  1  0  0  0  0  1.876285894838  12  0  1  3  3  0  1  1  0  0  0  9.500000000000  0  0  56  3  14  2.931851652578  3  3  1  1  3  31  31  3
0  6  6  6  1  0  0  0  0  2.014266206296  12  0  1  2  2  0  1  1  0  0  0  9.750000000000  0  0  49  4  14  2.931851652578  4  4  1  1  4  45  29  3
0  6  6  7  2  0  0  0  2  1.795593921009  13  0  2  0  0  2  1  2  2  0  2  10.83333333333  0  0  34  5  20  2.966326495189  6  7  2  0  3  34  24  1
0  6  6  6  1  0  0  0  0  2.184105569636  12  0  1  1  1  0  1  1  0  0  0  10.16666666666  0  0  39  5  14  2.893846850117  5  5  1  0  5  33  26  2
0  6  6  6  1  0  0  0  0  2.014266206296  12  0  1  2  2  0  1  1  0  0  0  9.750000000000  0  0  49  4  14  2.931851652578  4  4  1  1  4  45  29  3
0  6  6  6  1  0  0  0  0  1.876285894838  12  0  1  3  3  0  1  1  0  0  0  9.500000000000  0  0  56  3  14  2.931851652578  3  3  1  1  3  31  31  3
0  6  6  7  2  0  0  0  0  1.641897173182  13  0  2  0  1  0  1  2  0  0  0  10.33333333333  0  0  43  3  20  2.966326495189  6  6  2  1  3  27  27  4








...snip








I attached the three files; the example SMILES of all C6H6 isomers were generated using the CDK.





Given that only about five lines of code with the JChem API actually perform the calculation,


I think it is quite simple. It is actually a no-brainer, just adding


up routines. The only thing that would be nice to have is


a parser that loops through all the XML properties


and then automatically adds each new descriptor


and its calculation line to the Java code. But this would require


some serious programming and I am just too lazy for that,


or let's say that goes beyond my programming knowledge.
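A step in that direction, without any XML parsing, is the "array or LIST" idea from the comment in the code above: collect the values in an ordered map keyed by descriptor name and generate both the header and the tab-separated row from it, so adding a descriptor means adding one line. The sketch below only shows this bookkeeping in plain Java; `DescriptorRow` is a hypothetical helper, and the values in `main` are placeholders that in real code would come from the plugin getters such as `topologyplugin.getWienerIndex()`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class DescriptorRow {

    // LinkedHashMap keeps insertion order, so the header and the values line up
    private final Map<String, Object> values = new LinkedHashMap<>();

    /** Adds (or overwrites) one named descriptor value; returns this for chaining. */
    public DescriptorRow put(String name, Object value) {
        values.put(name, value);
        return this;
    }

    /** Tab-separated descriptor names, in the order they were added. */
    public String header() {
        return String.join("\t", values.keySet());
    }

    /** Tab-separated descriptor values, in the same order as header(). */
    public String row() {
        StringJoiner joiner = new StringJoiner("\t");
        for (Object value : values.values()) {
            joiner.add(String.valueOf(value));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Placeholder values standing in for plugin results
        DescriptorRow r = new DescriptorRow()
                .put("plattIndex", 12)
                .put("wienerIndex", 27)
                .put("balabanIndex", "2.000000000000");
        System.out.println(r.header());
        System.out.println(r.row());
    }
}
```

Printing `header()` once before the loop gives the output file column names, which the plain concatenation version does not produce.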





Kind regards


Tobias

ChemAxon efa1591b5a

24-10-2007 08:22:36

Hi Tobias, great man, thank you for all the useful suggestions, detailed explanations and for the source codes you provided.





Btw: have you received a ChemAxon User Forum t-shirt yet?





Best regards,


Miklos

User e9249ba1fe

24-10-2007 08:53:11

THANKS TOBIAS


FOR ALL THE CODE AND HELP.


I WILL TRY IT.


THANKS AGAIN

ChemAxon efa1591b5a

24-10-2007 09:26:57

Quote:



1. i have academic license for jchem and generatemd


2. i have a windows xp desktop pc


3. i have a file conitaing 50,000~ molecules represented as smiles


4. i wish to compute as many possible descriptors as i can for qspr/qsar


Hi,





as an academic user you are entitled to use all JChem tools without any size or other kind of limitations. Your ~50K compounds can be processed without any problems (i.e. practical memory or time limits will not be reached).





You can use generatemd to calculate complex descriptors such as fingerprints. You can even incorporate your own descriptors; in that case, however, you need to write some Java code.





As Tobias mentioned, cxcalc can also be quite useful and relevant for your project. That program can calculate a large number of physico-chemical properties as well as topological and geometrical descriptors, and it writes the results to standard text files that are easy to process further. For a detailed list of available properties you may wish to follow this link:
http://www.chemaxon.com/marvin/chemaxon/marvin/help/calculator-plugins.html





Regards,


Miklos

User e9249ba1fe

06-11-2007 10:17:18

Finally I could calculate the chemical fingerprints using


generatemd c aids.sdf -k CF -o descp.txt


However, descp.txt contains 34 integers.


What do these integers represent? Are these binary fingerprints?


If yes, how can I get 1/0 values?

ChemAxon efa1591b5a

06-11-2007 12:30:53

Hi,





great!


What you got in the output file is a binary fingerprint in decimal text representation. Each consecutive 32 bits of the binary fingerprint are represented as an integer value, and that value is printed in decimal format as readable text. This is a compact representation, much shorter than a 0,1 text. If you insist on using 0,1 text, then add the -2 flag to the generatemd command line (see the command line help, generatemd -x). The user's guide may also be useful: http://www.chemaxon.com/jchem/doc/user/GenerateMD.html.
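To see what the decimal integers correspond to, each one can be expanded back into its 32 bits. A minimal Java sketch, with two assumptions to check against your own files: the integers on a line are separated by whitespace, and bits are written least significant first within each integer (whether this matches the bit order of the `-2` output is not confirmed here). Values are read as `long` and narrowed to 32 bits, so a word printed either signed (e.g. -1) or unsigned (e.g. 4294967295) decodes the same way. `FingerprintBits` is a hypothetical helper.

```java
public class FingerprintBits {

    /**
     * Parses one line of whitespace-separated decimal integers. Values are read
     * as long and narrowed to 32 bits, so signed and unsigned spellings of the
     * same 32-bit word give the same result.
     */
    public static int[] parseWords(String line) {
        String[] tokens = line.trim().split("\\s+");
        int[] words = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            words[i] = (int) Long.parseLong(tokens[i]);
        }
        return words;
    }

    /** Expands each 32-bit word into 32 characters '0'/'1', least significant bit first. */
    public static String toBitString(int[] words) {
        StringBuilder sb = new StringBuilder(words.length * 32);
        for (int word : words) {
            for (int bit = 0; bit < 32; bit++) {
                sb.append(((word >>> bit) & 1) == 1 ? '1' : '0');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] words = parseWords("5 -1");
        System.out.println(toBitString(words)); // 64 characters: "101" + 29 zeros, then 32 ones
    }
}
```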





I still do not understand your real goal, but in most cases the binary text format is not needed and not very useful. For most calculations the integers are just fine; you can compare them directly, e.g. by Tanimoto.
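For example, the Tanimoto coefficient |A and B| / |A or B| can be computed straight from the packed integers by counting set bits word by word, without ever expanding them to 0/1 text. A minimal sketch in plain Java (not JChem code; the class name is hypothetical), using the standard `Integer.bitCount`:

```java
public class Tanimoto {

    /**
     * Tanimoto coefficient of two bit fingerprints stored as packed 32-bit words:
     * |A and B| / |A or B|, counted with Integer.bitCount.
     */
    public static double similarity(int[] fp1, int[] fp2) {
        if (fp1.length != fp2.length) {
            throw new IllegalArgumentException("fingerprints must have the same number of words");
        }
        int common = 0;
        int union = 0;
        for (int i = 0; i < fp1.length; i++) {
            common += Integer.bitCount(fp1[i] & fp2[i]);
            union += Integer.bitCount(fp1[i] | fp2[i]);
        }
        // convention chosen here: two all-zero fingerprints count as identical
        return union == 0 ? 1.0 : (double) common / union;
    }

    public static void main(String[] args) {
        int[] a = {0b1100};
        int[] b = {0b1010};
        System.out.println(similarity(a, b)); // 1 common bit out of 3 set bits
    }
}
```

The bit order inside each word does not matter here, because both fingerprints are compared word by word in the same layout.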





Does this help at all?


regards,


Miklos

User e9249ba1fe

06-11-2007 16:10:48

I want to use those fingerprints as descriptors.


I also want to load them into MATLAB for further calculations (Tanimoto etc.), hence I am using the text format so that I can load the delimited file into MATLAB.


However, I would like to know whether it is possible to calculate a similarity matrix: there are 4773 molecules (Bursi mutagenicity) and I want a Tanimoto similarity matrix of 4773 x 4773 similarity values. Is that possible with screenmd?


Thanks

User 677b9c22ff

07-11-2007 04:25:20

Hi,


besides using generateMD, generFP and all the other tools,


you can again use the Evaluator very easily.





Assume you have the SMARTS you want to use, e.g. from


Jeroen Kazius, Ross McGuire, and Roberta Bursi, "Derivation and Validation of Toxicophores for Mutagenicity Prediction",


J. Med. Chem. 2005, 48(1), pp. 312-320 (DOI: 10.1021/jm040835a),


or any other SMARTS, such as those from


Darko Butina, "Performance of Kier-Hall E-state Descriptors in Quantitative


Structure Activity Relationship (QSAR) Studies of


Multifunctional Molecules", Molecules 2004, 9, 1004-1009:





Code:



RowNo smarts-definitions estates-atom-types-Kier-Hall
1 [OH1][*] sOH
2 O=[*] dO
3 [OH0]([*])[*] ssO
4 [o] aaO
5 [NH2][*] sNH2
6 [NH1]=[*] dNH
7 [NH1]([*])[*] ssNH
8 [nH1] aaNH
9 N#[*] tN
10 [ND2](=[*])[*] dsN
11 [nH0] aaN
12 N([*])([*])[*] sssN
13 N(=[*])(=[*])[*] ddsN
14 [N;+]([*])([*])([*])[*] ssssN+
15 [SH1][*] sSH
16 S=[*] dS
17 [SX2]([*])[*] ssS
18 [s] aaS
19 S(=[*])(=[*])([*])[*] ddssS
20 [F][*] sF
21 [Cl][*] sCl
22 [Br][*] sBr
23 [I][*] sI
24 [CH3][*] sCH3
25 [CH2]([*])[*] ssCH2
26 [CH2]=[*] dCH2
27 [CH1]([*])([*])[*] sssCH1
28 [CH1](=[*])[*] dsCH1
29 [CH1]#[*] tCH
30 [cH] aaCH
31 [cH0] aasC
32 C(=[*])=[*] ddC
33 C(#[*])[*] tsC
34 C(=[*])([*])[*] dssC
35 C([*])([*])([*])[*] ssssC








What you do is create an Evaluator expression file:








Code:



array(
matchCount("[OH1][*]"),
matchCount("O=[*]"),
matchCount("[OH0]([*])[*]"),
matchCount("[o]"),
matchCount("[NH2][*]"),
matchCount("[NH1]=[*]"),
matchCount("[NH1]([*])[*]"),
matchCount("[nH1]"),
matchCount("N#[*]"),
matchCount("[ND2](=[*])[*]"),
matchCount("[nH0]"),
matchCount("N([*])([*])[*]"),
matchCount("N(=[*])(=[*])[*]"),
matchCount("[N;+]([*])([*])([*])[*]"),
matchCount("[SH1][*]"),
matchCount("S=[*]"),
matchCount("[SX2]([*])[*]"),
matchCount("[s]"),
matchCount("S(=[*])(=[*])([*])[*]"),
matchCount("[F][*]"),
matchCount("[Cl][*]"),
matchCount("[Br][*]"),
matchCount("[I][*]"),
matchCount("[CH3][*]"),
matchCount("[CH2]([*])[*]"),
matchCount("[CH2]=[*]"),
matchCount("[CH1]([*])([*])[*]"),
matchCount("[CH1](=[*])[*]"),
matchCount("[CH1]#[*]"),
matchCount("[cH]"),
matchCount("[cH0]"),
matchCount("C(=[*])=[*]"),
matchCount("C(#[*])[*]"),
matchCount("C(=[*])([*])[*]"),
matchCount("C([*])([*])([*])[*]"))
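For long SMARTS lists (such as the functional-group list earlier in this thread), typing one matchCount line per pattern by hand is tedious, and a few lines of Java can write the expression for you. This is a minimal sketch that only produces text in the array(matchCount("...")) form shown above; `ChemTermsArrayBuilder` is a hypothetical helper and does not check that the SMARTS are valid.

```java
import java.util.Arrays;
import java.util.List;

public class ChemTermsArrayBuilder {

    /** Builds array(matchCount("s1"), ..., matchCount("sN")) with one call per line. */
    public static String build(List<String> smarts) {
        if (smarts.isEmpty()) {
            throw new IllegalArgumentException("need at least one SMARTS");
        }
        StringBuilder sb = new StringBuilder("array(\n");
        for (int i = 0; i < smarts.size(); i++) {
            String s = smarts.get(i);
            if (s.indexOf('"') >= 0) {
                throw new IllegalArgumentException("unexpected quote in SMARTS: " + s);
            }
            sb.append("matchCount(\"").append(s).append("\")");
            sb.append(i < smarts.size() - 1 ? ",\n" : ")");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // first three rows of the Kier-Hall table above
        List<String> smarts = Arrays.asList("[OH1][*]", "O=[*]", "[OH0]([*])[*]");
        System.out.println(build(smarts));
    }
}
```

Writing the result to a file then gives you the expression file used in the command below.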








and you call it with evaluate like this (but beware: this does not work


for logP, because logP is a real number and the array function


is only defined for integers):





evaluate -f SMARTS-kier-hall-QSAR.txt NCI2000.smi >kier-hall-smarts-out.txt





The output is a nice matrix for any tool like Statistica or WEKA.





Code:



0;2;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;1;0;0;0;3;0;0;0;0;0;3;0
0;0;0;0;0;0;0;0;0;0;2;0;0;0;0;0;2;2;0;0;0;0;0;0;0;0;0;0;0;8;6;0;0;0;0
1;2;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;2;4;0;0;0;0
0;1;0;0;0;1;0;1;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;0;0;0;1;2;0;0;0;0
0;2;0;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;7;5;0;0;2;0
2;2;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;2;0;0;0;0;0;0;0;8;11;0;0;1;0
0;2;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;1;0;0;2;0;0;0;0;0;4;2;0;0;4;0
0;3;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;6;6;0;0;2;0
2;0;0;0;0;0;0;0;0;2;0;0;0;0;0;0;0;0;0;0;0;0;0;2;0;0;0;0;0;0;0;0;0;2;0
0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;15;3;0;0;0;0
2;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;6;0;0;0;0;0;2;4;0;0;0;2
0;1;0;0;0;0;0;0;0;1;0;1;0;0;0;0;0;0;0;0;0;0;0;1;1;0;0;0;0;5;1;0;0;2;0
0;0;0;0;1;0;0;0;0;0;1;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;5;4;0;0;0;0


...snip
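WEKA can read CSV, but its native input format is ARFF, which also carries attribute names. A minimal sketch that wraps semicolon-separated rows like the ones above into an ARFF document, assuming the columns follow the order of the array(...) expression so that the Kier-Hall atom-type names from the table can serve as attribute names. `ArffWriter` is a hypothetical helper; all attributes are declared numeric, and names are single-quoted so characters such as '+' are safe.

```java
import java.util.Arrays;
import java.util.List;

public class ArffWriter {

    /**
     * Converts semicolon-separated rows (one molecule per line) into a WEKA ARFF
     * document with one numeric attribute per column.
     */
    public static String toArff(String relation, List<String> names, List<String> rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation '").append(relation).append("'\n");
        for (String name : names) {
            sb.append("@attribute '").append(name).append("' numeric\n");
        }
        sb.append("@data\n");
        for (String row : rows) {
            String[] values = row.trim().split(";");
            if (values.length != names.size()) {
                throw new IllegalArgumentException(
                        "expected " + names.size() + " values but got " + values.length + ": " + row);
            }
            sb.append(String.join(",", values)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("sOH", "dO", "ssO");
        List<String> rows = Arrays.asList("0;2;0", "1;2;0");
        System.out.print(toArff("kier-hall-smarts", names, rows));
    }
}
```

For the real file you would pass all 35 atom-type names and every output line; a blank line or a row with a different column count is rejected rather than silently misaligned.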








Tobias

User e9249ba1fe

07-11-2007 09:50:16

thanks tobias





To calculate the similarity matrix I tried the following with screenmd.


csa.sdf and csa2.sdf are identical files containing the same molecules.


I used


screenmd csa.sdf csa2.sdf -g -k CF -M Tanimoto -o output.txt


It seems to work;


actually, the calculation is running while I am writing this post.

ChemAxon efa1591b5a

07-11-2007 10:07:26

Indeed, you can calculate the similarity matrix using screenmd; just make sure that the dissimilarity threshold is 1 for Tanimoto (or a very large number when using the Euclidean metric).
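Why a threshold of 1 keeps everything for Tanimoto: the Tanimoto dissimilarity 1 - T always lies between 0 and 1. The sketch below computes the full pairwise matrix in plain Java (computing only i < j and mirroring, since the matrix is symmetric) and counts the pairs that a threshold would keep. It assumes, as the advice above implies, that pairs are reported when their dissimilarity does not exceed the threshold; it illustrates the idea and is not how screenmd is implemented. The class name is hypothetical.

```java
public class DissimilarityMatrix {

    /** Tanimoto dissimilarity 1 - |A and B| / |A or B| for packed 32-bit fingerprints. */
    static double tanimotoDissimilarity(int[] a, int[] b) {
        int common = 0;
        int union = 0;
        for (int i = 0; i < a.length; i++) {
            common += Integer.bitCount(a[i] & b[i]);
            union += Integer.bitCount(a[i] | b[i]);
        }
        return union == 0 ? 0.0 : 1.0 - (double) common / union;
    }

    /** Full symmetric N x N matrix; only i < j is computed, then mirrored. Diagonal stays 0. */
    public static double[][] compute(int[][] fps) {
        int n = fps.length;
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                d[i][j] = tanimotoDissimilarity(fps[i], fps[j]);
                d[j][i] = d[i][j];
            }
        }
        return d;
    }

    /** Number of unordered pairs i < j whose dissimilarity does not exceed the threshold. */
    public static int pairsWithin(double[][] d, double threshold) {
        int count = 0;
        for (int i = 0; i < d.length; i++) {
            for (int j = i + 1; j < d.length; j++) {
                if (d[i][j] <= threshold) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[][] fps = {{0b1100}, {0b1010}, {0b1100}};
        double[][] d = compute(fps);
        System.out.println(pairsWithin(d, 1.0) + " of 3 pairs kept at threshold 1.0"); // 3
        System.out.println(pairsWithin(d, 0.5) + " of 3 pairs kept at threshold 0.5"); // 1
    }
}
```

For the 4773 molecules mentioned above, a full matrix of doubles has 4773 x 4773, about 22.8 million entries, i.e. roughly 180 MB; storing float instead halves that, and you may need to raise the JVM heap size either way.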





Regarding the use of the chemical fingerprint as a descriptor: you can use the decimal values for further analysis, e.g. in MATLAB; there is no need to use the binary 0,1 text format. However, if you would like to perform any kind of dimension reduction, then the binary form must be used.
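The binary form here means one column per bit. A minimal Java sketch that expands packed 32-bit fingerprints into a molecules x bits matrix of 0.0/1.0 values, plus a trivial first filter that drops bits which are constant over the whole data set (such bits carry no information). The filter is not itself a dimension reduction method like PCA; the expanded matrix is what you would hand to one. Bit order within each word (least significant first) is an assumption, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class BinaryFeatureMatrix {

    /** Expands packed 32-bit fingerprints into a molecules x bits matrix of 0.0/1.0, bit 0 of each word first. */
    public static double[][] expand(int[][] fps) {
        double[][] x = new double[fps.length][];
        for (int i = 0; i < fps.length; i++) {
            x[i] = new double[fps[i].length * 32];
            for (int w = 0; w < fps[i].length; w++) {
                for (int b = 0; b < 32; b++) {
                    x[i][w * 32 + b] = (fps[i][w] >>> b) & 1;
                }
            }
        }
        return x;
    }

    /** Indices of columns that are not constant over all rows. */
    public static List<Integer> informativeColumns(double[][] x) {
        List<Integer> keep = new ArrayList<>();
        if (x.length == 0) {
            return keep;
        }
        for (int c = 0; c < x[0].length; c++) {
            for (int r = 1; r < x.length; r++) {
                if (x[r][c] != x[0][c]) {
                    keep.add(c);
                    break;
                }
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        double[][] x = expand(new int[][] {{1}, {3}});
        System.out.println(informativeColumns(x)); // only bit 1 differs between the two molecules
    }
}
```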





Does this help?





Regards,


Miklos

User e9249ba1fe

07-11-2007 16:58:06

thanks Miklos


i got the similarity matrix.


Could you tell me how the dissimilarity threshold affects the whole procedure and how I can set it on the command line?


Also, I want similarity values between 0 and 1, where 1 indicates the most similar (or an identical) molecule.


thanks

User e9249ba1fe

08-12-2007 22:14:26

When I calculated the similarity matrix using the above procedure, I got all diagonal elements as 0, while they should have been 1!!!


please help


thanks

ChemAxon efa1591b5a

10-12-2007 09:17:01

This is because you calculate the dissimilarity. Dissimilarity is often preferred over similarity because many common metrics (e.g. Euclidean) are distance metrics rather than similarity measures, and thus they are not bounded from above.
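For Tanimoto, which is bounded, the conversion back to similarity is simple if, as the zero diagonal suggests, the reported value is the Tanimoto dissimilarity d = 1 - T: then T = 1 - d, the diagonal becomes 1, and all values lie between 0 and 1. A minimal Java sketch under that assumption (the class name is hypothetical); it does not apply to unbounded metrics such as Euclidean distance.

```java
public class DissimilarityToSimilarity {

    /** Turns a Tanimoto dissimilarity matrix (d = 1 - T) into a similarity matrix (T = 1 - d). */
    public static double[][] toSimilarity(double[][] d) {
        double[][] s = new double[d.length][];
        for (int i = 0; i < d.length; i++) {
            s[i] = new double[d[i].length];
            for (int j = 0; j < d[i].length; j++) {
                s[i][j] = 1.0 - d[i][j];
            }
        }
        return s;
    }

    public static void main(String[] args) {
        double[][] d = {{0.0, 0.25}, {0.25, 0.0}};
        System.out.println(java.util.Arrays.deepToString(toSimilarity(d))); // [[1.0, 0.75], [0.75, 1.0]]
    }
}
```

If the matrix is already loaded in MATLAB, the same conversion is simply 1 - D.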





Hope this helps.





Regards,


Miklos