Match and matchcount

User 538416f930

14-04-2006 21:58:52

Hello,





Does Evaluator support match and matchcount, e.g. to do a simple substructure filter on an SDF file? If so, can you give me the syntax? For example, I tried evaluate "CCCCC" -e match("C") and got an error message: unrecognized symbol "C".








On a related note, is there a way to use the config file's Molecule Constant Definitions (Mol IDs) to define the matching criteria for Reactivity in Reactor?

e.g.

<Reactivity>
<![CDATA[match(ratom(1),"phenol", 1)]]>
...

instead of

<Reactivity>
<![CDATA[match(ratom(1),"[OH:1]c1ccccc1", 1)]]>
...





Thanks,





S.

ChemAxon a3d59b832c

18-04-2006 07:44:14

sschurer1 wrote:
Does Evaluator support match and matchcount, e.g. to do a simple substructure filter on an SDF file? If so, can you give me the syntax? For example, I tried evaluate "CCCCC" -e match("C") and got an error message: unrecognized symbol "C".
I think the problem here is that quotation marks should be escaped from the shell. This works for me (cygwin, bash):
Quote:
$ evaluate "CCCCC" "O" "Cl" -e 'match("C")'
1
0
0


sschurer1 wrote:
On a related note, is there a way to use the config file's Molecule Constant Definitions (Mol IDs) to define the matching criteria for Reactivity in Reactor?

e.g.

<Reactivity>
<![CDATA[match(ratom(1),"phenol", 1)]]>
...

instead of

<Reactivity>
<![CDATA[match(ratom(1),"[OH:1]c1ccccc1", 1)]]>
...


Do you mean the name of the molecules? (First line of a molfile?)


I don't think it is working currently. But new features are coming to more easily identify patterns within chemical terms. My colleagues will soon share the details...





Best regards,


Szabolcs

ChemAxon e08c317633

18-04-2006 13:20:11

Szabolcs wrote:



I think the problem here is that quotation marks should be escaped from the shell. This works for me (cygwin, bash):
Quote:
$ evaluate "CCCCC" "O" "Cl" -e 'match("C")'
1
0
0


If you use the Windows command prompt, use the quotation marks this way:
Code:



$ evaluate "CCCCC" "O" "Cl" -e "match('C')"








Another option is to use escape characters, like this:
Code:



$ evaluate "CCCCC" "O" "Cl" -e "match(\"C\")"








Both solutions also work on Linux and Cygwin.
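
To see why the quoting matters, it can help to print the argument exactly as the program would receive it. The following is only a sketch for a POSIX shell (bash, Cygwin); show_arg is a hypothetical helper, not part of evaluate, and the Windows command prompt has its own quoting rules that this does not reproduce.

```shell
#!/bin/sh
# show_arg is a hypothetical helper: it prints its first argument
# exactly as the shell hands it over to a program.
show_arg() {
    printf '%s\n' "$1"
}

# The inner double quotes close and reopen the outer ones, so the shell
# removes them and the program sees a bare symbol C instead of a string.
show_arg "match("C")"      # prints: match(C)

# Three ways to keep the string literal intact:
show_arg 'match("C")'      # prints: match("C")
show_arg "match('C')"      # prints: match('C')
show_arg "match(\"C\")"    # prints: match("C")
```

The first call shows how the quotes can be lost before the expression is parsed, which is consistent with the unrecognized symbol "C" error reported above.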





Regards,


Zsolt

ChemAxon d76e6e95eb

18-04-2006 14:13:58

You can predefine molecule constants in Chemical Terms as described in the Evaluator documentation: http://www.chemaxon.com/jchem/doc/user/Evaluator.html





We are currently enhancing the system so that a predefined (but customizable) list of common functional groups is available in Chemical Terms without the need to specify them yourself. The new version will support set operations as well.

User 538416f930

18-04-2006 18:23:21

Thank you, that is great!





What I was asking before was whether these predefined molecule constants can be used in the matching expression of the Reactor configuration.

In the XML file below, the Mol ID "Epoxide" is used in the Reactivity expression (the match call). This gives an error message:


"Could not read structure: Epoxide"





If I simply replace the 'Epoxide' by its structure, i.e.





<Reactions>
  <Reaction ID="EpoxOpenGen3" Structure="EpoxopenGen_3.rxn">
    <Reactivity>
      <![CDATA[
      match(ratom(4),"[C:1]1[C:2](O1)[#6]", 1) && !match(ratom(3),"[C:1]1[C:2](O1)[!#6;!$(S=O);!$(N=O)]", 1,2) && match(ratom(9),"[NH2:1]C", 1) && !match(ratom(9),"[NH2:1]C=[O,N,S,C]", 1)
      ]]>
    </Reactivity>
    <Selectivity>
      <![CDATA[
      charge(ratom(4),"sigma") -stericEffectIndex(ratom(4)) -stericEffectIndex(ratom(9))
      ]]>
    </Selectivity>
  </Reaction>
</Reactions>





things work fine.





Here is the complete file that does not work (also attached).





<ReactorConfiguration>
  <Standardizer>
    <Actions>
      <Action ID="aromatize" Act="aromatize"/>
    </Actions>
  </Standardizer>
  <Evaluator>
    <Params Unique="true" MappingStyle="matching" Cached="true" Fragmentation="REACTIONS" Standardization="pre-post"/>
    <Matching ID = "match">
      <Search DoubleBondStereoMatchingMode="marked" OrderSensitiveSearch="true" SubgraphSearch="true" ExactAtomMatching = "false" ExactStereoMatching = "false"/>
    </Matching>
    <Mols>
      <Mol ID="Epoxide" Structure="[C:1]1[C:2](O1)[#6]"/>
    </Mols>
  </Evaluator>
  <Reactions>
    <Reaction ID="EpoxOpenGen3" Structure="EpoxopenGen_3.rxn">
      <Reactivity>
        <![CDATA[
        match(ratom(4),'Epoxide', 1) && !match(ratom(3),"[C:1]1[C:2](O1)[!#6;!$(S=O);!$(N=O)]", 1,2) && match(ratom(9),"[NH2:1]C", 1) && !match(ratom(9),"[NH2:1]C=[O,N,S,C]", 1)
        ]]>
      </Reactivity>
      <Selectivity>
        <![CDATA[
        charge(ratom(4),"sigma") -stericEffectIndex(ratom(4)) -stericEffectIndex(ratom(9))
        ]]>
      </Selectivity>
    </Reaction>
  </Reactions>
</ReactorConfiguration>





I also attach all the relevant files.





I used version 3.1.6 on Windows, e.g.





>react -c reactions/EpoxOpenGen_2b.xml -i EpoxOpenGen3 epoxides_1.sdf amines_1.sdf -o aminoalcohols2.smi

ChemAxon fb166edcbd

19-04-2006 10:51:21

You should not put Epoxide between quotes in your match function because it is an identifier, not a string.


I attach the modified XML.
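
The attached file is not reproduced in this thread. Assuming the only change is removing the single quotes around the Mol ID inside the match call, the edit could be sketched in a POSIX shell as below. unquote_mol_id is a hypothetical helper, not a ChemAxon tool; it only touches single-quoted occurrences, so the double-quoted ID attribute in the Mol definition stays as it is.

```shell
#!/bin/sh
# Hypothetical helper: turn 'MolID' (a string literal) into MolID
# (an identifier) on every line read from standard input.
# Usage: unquote_mol_id Epoxide < broken.xml > fixed.xml
unquote_mol_id() {
    sed "s/'$1'/$1/g"
}

# The failing match call from the configuration above:
printf '%s\n' "match(ratom(4),'Epoxide', 1)" | unquote_mol_id Epoxide
# prints: match(ratom(4),Epoxide, 1)

# The Mol definition uses double quotes and is left untouched:
printf '%s\n' '<Mol ID="Epoxide" Structure="[C:1]1[C:2](O1)[#6]"/>' | unquote_mol_id Epoxide
```

Editing the one expression by hand is just as good; the point is only that Epoxide must appear without quotes where it refers to the Mol ID.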


Test:


Code:



react -c EpoxOpenGen_2.xml -i EpoxOpenGen3 epoxides_1.sdf amines_1.sdf -o aminoalcohols2.smi








You can also use the RDF format to include the reaction rules in your reaction definition. Then your XML only contains the standardization and evaluation rules/definitions:


Code:



react -c EpoxOpenGen_3.xml -r EpoxopenGen_3.rdf epoxides_1.sdf amines_1.sdf -o aminoalcohols3.smi


ChemAxon fb166edcbd

19-04-2006 10:58:51

There is yet another possibility: you can include the molecule definition in the rule itself instead of the XML in the following way:





Code:



Epoxide="[C:1]1[C:2](O1)[#6]";match(ratom(4),Epoxide, 1) &&  !match(ratom(3),"[C:1]1[C:2](O1)[!#6;!$(S=O);!$(N=O)]", 1,2) && match(ratom(9),"[NH2:1]C", 1) && !match(ratom(9),"[NH2:1]C=[O,N,S,C]", 1)








I attach the relevant files.


Test:





Code:



react -c EpoxOpenGen_4.xml -r EpoxopenGen_4.rdf epoxides_1.sdf amines_1.sdf -o aminoalcohols4.smi


User 538416f930

19-04-2006 18:07:34

Thank you, this works great now.