searching on additional atom properties

User 870ab5b546

31-08-2006 10:32:07

Hi,

We want to compare two structures in which each atom may have "additional atom properties" (unshared electrons). We need to take these additional properties into account when finding a match.

Your JChem guide says that atom aliases can be taken into account when searching, so I tried setting, e.g., the alias of an O atom with four unshared electrons to O!D and that of an O atom with two unshared electrons to O!B, but the two structures still matched each other. Do I need to set a flag? Is there some other way of doing what I want, or is there no way at all? I don't want to have to resort to changing an O atom with four electrons to nobelium and an O atom with two electrons to hahnium.

-- Bob

ChemAxon a3d59b832c

31-08-2006 11:21:38

bobgr wrote:
Your JChem guide says that you can take into account atom aliases when you search,
We only consider atom aliases for pseudo atoms. For pseudo atoms, the name (alias) is treated as the atom type, whereas ordinary atoms with an alias keep their original atom type.





You can draw pseudo atoms in Marvin using the More window: type the name in the text box and press the Pseudo button. To create a pseudo atom from a program, use the following code snippet:

Code:
atom.setAtNo(MolAtom.PSEUDO);
atom.setAliasStr("New name");

The matching is case insensitive.
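To make the rule above concrete, here is a minimal sketch of the atom comparison it describes. It is not JChem code: SketchAtom, AliasAwareAtomMatcher and the PSEUDO marker value are hypothetical names chosen for illustration, and the assumption that a pseudo atom never matches an ordinary atom is an interpretation of "the alias is considered as the atom type".

```java
/** Hypothetical, simplified atom: just enough state to illustrate the matching rule. */
final class SketchAtom {
    static final int PSEUDO = -1; // stand-in marker for a pseudo atom (assumption, not JChem's value)

    final int atomicNumber;
    final String alias; // may be null

    SketchAtom(int atomicNumber, String alias) {
        this.atomicNumber = atomicNumber;
        this.alias = alias;
    }

    boolean isPseudo() {
        return atomicNumber == PSEUDO;
    }
}

/** Hypothetical matcher mirroring the behaviour described in the post. */
final class AliasAwareAtomMatcher {
    /**
     * Pseudo atoms: the alias acts as the atom type and is compared case-insensitively.
     * Ordinary atoms: only the element is compared; any alias is ignored.
     */
    static boolean atomsMatch(SketchAtom query, SketchAtom target) {
        if (query.isPseudo() || target.isPseudo()) {
            return query.isPseudo() && target.isPseudo()
                    && query.alias != null
                    && query.alias.equalsIgnoreCase(target.alias);
        }
        return query.atomicNumber == target.atomicNumber;
    }
}
```

Under this rule, ordinary O atoms aliased O!D and O!B still match each other, which is what Bob saw, while pseudo atoms named O4 and O2 are kept apart.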





I just noticed that this feature is badly under-documented, both in the query guide and in the API documentation of chemaxon.struc.MolAtom. I will update the documentation.

bobgr wrote:
I don't want to have to resort to changing an O atom with four electrons to nobelium and an O atom with two electrons to hahnium.

In the future we also plan to support custom "Comparators" in MolSearch. This will let you add your own code for extra checks during searching.
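For the original problem, such a hook would let you compare the number of unshared electrons directly instead of renaming atoms. The sketch below is hypothetical: PropAtom, ExtraAtomCheck and CheckedAtomMatcher are illustrative names only and are not the MolSearch or MolComparator API. It assumes the default element comparison runs first and every registered extra check must also accept the pair.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical atom carrying one extra property: its number of unshared electrons. */
final class PropAtom {
    final int atomicNumber;
    final int unsharedElectrons;

    PropAtom(int atomicNumber, int unsharedElectrons) {
        this.atomicNumber = atomicNumber;
        this.unsharedElectrons = unsharedElectrons;
    }
}

/** Hypothetical extra-check hook (NOT JChem's comparator interface). */
interface ExtraAtomCheck {
    boolean accept(PropAtom query, PropAtom target);

    /** Example check: both atoms must carry the same number of unshared electrons. */
    ExtraAtomCheck SAME_UNSHARED_ELECTRONS =
            (q, t) -> q.unsharedElectrons == t.unsharedElectrons;
}

/** Hypothetical matcher: element comparison first, then all extra checks must agree. */
final class CheckedAtomMatcher {
    private final List<ExtraAtomCheck> checks;

    CheckedAtomMatcher(List<ExtraAtomCheck> checks) {
        this.checks = new ArrayList<>(checks); // defensive copy
    }

    boolean atomsMatch(PropAtom query, PropAtom target) {
        if (query.atomicNumber != target.atomicNumber) {
            return false; // default element comparison
        }
        for (ExtraAtomCheck check : checks) {
            if (!check.accept(query, target)) {
                return false; // any failing extra check rejects the pair
            }
        }
        return true;
    }
}
```

With the electron check registered, an O atom with four unshared electrons no longer matches one with two, and no atom has to be disguised as nobelium or hahnium.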





Best regards,

Szabolcs

ChemAxon a3d59b832c

31-08-2006 11:25:57

Szabolcs wrote:
For pseudo atoms, the name (alias) is treated as the atom type, whereas ordinary atoms with an alias keep their original atom type.

As a consequence, pseudo atoms have no real chemical meaning: no implicit hydrogens, no valence checking, etc. This may be important in your case.
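A simplified, hypothetical sketch of why this matters: implicit hydrogen counts are typically derived from a default valence minus the bond orders already drawn, and a pseudo atom has no valence to start from. The valence table covers only neutral C, N and O, and the negative-number convention for pseudo atoms is an assumption; this is not JChem's valence model.

```java
import java.util.HashMap;
import java.util.Map;

final class ImplicitHydrogens {
    // Simplified default valences (assumption: neutral atoms, no radicals, lowest common valence).
    private static final Map<Integer, Integer> DEFAULT_VALENCE = new HashMap<>();

    static {
        DEFAULT_VALENCE.put(6, 4); // C
        DEFAULT_VALENCE.put(7, 3); // N
        DEFAULT_VALENCE.put(8, 2); // O
    }

    /**
     * @param atomicNumber element number, or a negative value for a pseudo atom (hypothetical convention)
     * @param bondOrderSum sum of the orders of the atom's drawn bonds, explicit H included
     */
    static int implicitHydrogenCount(int atomicNumber, int bondOrderSum) {
        if (atomicNumber < 0) {
            return 0; // pseudo atom: no chemical meaning, so no implicit H
        }
        Integer valence = DEFAULT_VALENCE.get(atomicNumber);
        if (valence == null) {
            return 0; // element not covered by this sketch
        }
        return Math.max(0, valence - bondOrderSum);
    }
}
```

So with pseudo atoms, any hydrogens you rely on have to be drawn explicitly, as the structures posted in this thread do.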

User 870ab5b546

31-08-2006 13:00:38

Thanks! After some fiddling, it worked!

FYI, the fiddling was:

(1) I needed to use:

if (ourSearch.isMatchCountInRelation(">=", 1))

rather than:

if (ourSearch.isMatching())

The latter method failed to return true, even when the former method did.

(2) The methods are setAtno() and setAliasstr(), not setAtNo() and setAliasStr().

(3) Finally, paste the following structure into MarvinSketch. Note the _{} characters in the pseudo tag: I did not type them; they were inserted by Marvin or JChem. Now choose Edit -> Source, then Import the structure. Note that the number of _{} marks triples.





Code:
<?xml version="1.0" ?>


<MDocument>


  <MChemicalStruct>


    <molecule molID="m1">


      <atomArray


          atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15 a16 a17 a18 a19 a20 a21 a22 a23 a24 a25 a26 a27 a28 a29 a30 a31 a32 a33 a34"


          elementType="C C C C C C C C C C C C C H H H H H H H H H H H H H H H H H H H H H"


          mrvAlias="C C C C C O4 C C C C N2 C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"


          mrvPseudo="0 0 0 0 0 O_{4} 0 0 0 0 N_{2} 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"


          x2="2.837667433614095 5.082396113620751 5.949987641805925 1.7294534726452961 4.185674658765347 6.442984322564868 8.279380461092774 8.747235257934529 9.965513460696231 8.396342008481582 4.16788913572499 6.36898174218417 7.88950194190723 0.8389493477509276 0.2869095505784458 1.6170394416331553 2.8404045507266136 6.748083938649988 9.095032753336731 9.877882676663056 10.306740727390187 11.060934124804737 11.476382023137832 11.320426598729906 7.057974944229767 9.994838485860855 9.738783188351615 6.799756348709166 7.382664744428382 6.36898174218417 8.980268933105354 9.293060594241481 8.201404183436564 6.291014071814478"


          y2="2.3506786388442644 2.4064251645216563 3.6885866478151557 3.9951954081363175 5.082244051558944 1.793213382070344 1.4866103599401939 2.6015380043925282 1.4587370971014977 0.8176520518114896 6.175986294796566 6.420154929625341 6.175986294796566 4.831387555106186 4.095530547069106 2.3785490325874545 1.0551615159713386 2.880264894588477 3.9289996366268585 3.2064437553545946 2.7966508442634 2.3506786388442644 1.7374639872974462 1.2746530603548611 0.2869095505784458 0.8538428225214547 0.34381519084017476 4.7541601113769865 3.6328401221377633 7.172738764461145 7.1200908619300005 6.029934988074609 5.221607496656919 5.66758257117156"


          />


      <bondArray>


        <bond atomRefs2="a1 a2" order="1" />


        <bond atomRefs2="a2 a3" order="1" />


        <bond atomRefs2="a1 a4" order="1" />


        <bond atomRefs2="a4 a5" order="1" />


        <bond atomRefs2="a5 a3" order="1" />


        <bond atomRefs2="a2 a6" order="1" />


        <bond atomRefs2="a6 a7" order="1" />


        <bond atomRefs2="a7 a8" order="1" />


        <bond atomRefs2="a7 a9" order="1" />


        <bond atomRefs2="a7 a10" order="1" />


        <bond atomRefs2="a5 a11" order="2" />


        <bond atomRefs2="a11 a12" order="1" />


        <bond atomRefs2="a12 a13" order="1" />


        <bond atomRefs2="a4 a14" order="1" />


        <bond atomRefs2="a4 a15" order="1" />


        <bond atomRefs2="a1 a16" order="1" />


        <bond atomRefs2="a1 a17" order="1" />


        <bond atomRefs2="a2 a18" order="1" />


        <bond atomRefs2="a8 a19" order="1" />


        <bond atomRefs2="a8 a20" order="1" />


        <bond atomRefs2="a8 a21" order="1" />


        <bond atomRefs2="a9 a22" order="1" />


        <bond atomRefs2="a9 a23" order="1" />


        <bond atomRefs2="a9 a24" order="1" />


        <bond atomRefs2="a10 a25" order="1" />


        <bond atomRefs2="a10 a26" order="1" />


        <bond atomRefs2="a10 a27" order="1" />


        <bond atomRefs2="a3 a28" order="1" />


        <bond atomRefs2="a3 a29" order="1" />


        <bond atomRefs2="a12 a30" order="1" />


        <bond atomRefs2="a13 a31" order="1" />


        <bond atomRefs2="a13 a32" order="1" />


        <bond atomRefs2="a13 a33" order="1" />


        <bond atomRefs2="a12 a34" order="1" />


      </bondArray>


    </molecule>


  </MChemicalStruct>


</MDocument>


ChemAxon a3d59b832c

31-08-2006 13:19:52

bobgr wrote:
(1) I needed to use:

if (ourSearch.isMatchCountInRelation(">=", 1))

rather than:

if (ourSearch.isMatching())

The latter method failed to return true, even when the former method did.

This seems to be a bug, and I would like to take a closer look. Can you send me the query and target molecules that did not work? (The difference is that isMatchCountInRelation() uses findFirst() and findNext() instead of isMatching().) Did you use the latest test version of JChem?
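Since isMatchCountInRelation() counts hits by iterating, with ">=" and 1 it should agree with isMatching(); a disagreement points to a bug in one path. The sketch below illustrates counting-based relation checks under stated assumptions: HitSource, HitCounting and countInRelation are hypothetical names, and it assumes (not confirmed here) that a null return marks the end of the hits. It is not the MolSearch implementation.

```java
/** Hypothetical stand-in for a searcher; assumption: null means "no more hits". */
interface HitSource {
    int[] findFirst();
    int[] findNext();
}

final class HitCounting {
    /** Counts hits via findFirst()/findNext(), stopping as soon as 'limit' hits are seen. */
    static int countHits(HitSource source, int limit) {
        int n = 0;
        int[] hit = source.findFirst();
        while (hit != null) {
            n++;
            if (n >= limit) {
                break; // enough hits to decide the relation
            }
            hit = source.findNext();
        }
        return n;
    }

    /** Evaluates "hit count <op> value", e.g. countInRelation(s, ">=", 1). */
    static boolean countInRelation(HitSource source, String op, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("value must be non-negative");
        }
        int count = countHits(source, value + 1); // value + 1 hits decide every relation below
        switch (op) {
            case "<":  return count < value;
            case "<=": return count <= value;
            case "=":  return count == value;
            case ">=": return count >= value;
            case ">":  return count > value;
            default:   throw new IllegalArgumentException("unknown relation: " + op);
        }
    }
}
```

Stopping after value + 1 hits keeps the check cheap: for ">=" and 1 it needs only the first hit, which is exactly the question isMatching() answers.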
bobgr wrote:
(2) The methods are setAtno() and setAliasstr(), not setAtNo() and setAliasStr().
My mistake! Sorry for the wrong capitalization.
bobgr wrote:
(3) Finally, paste the following structure in MarvinSketch; note the _{} characters in the pseudo tag that I did not type, but were inserted by Marvin or JChem. Now Edit -> Source, then Import the structure. Note the _{} marks triple in number.

This is a known issue in Marvin 4.1.0, and we will fix it.

Earlier versions are OK, or you can use the mol format. (I mean the simple V2000 molfile; extended mol has the same bug.)
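Until the fix is out, a quick way to see whether a round trip changed the pseudo labels is to count the _{ marks per atom before and after. This is a hypothetical helper using only the standard Java DOM API; the exact form the corrupted labels take is not known here, so it only reports counts for comparison. It also checks that mrvPseudo has one token per atom, which catches attribute values broken by line wrapping.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class PseudoMarkupCheck {
    private static Document parse(String mrv) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(mrv)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse MRV document", e);
        }
    }

    /** Number of "_{" marks in each atom's mrvPseudo token (first atomArray only). */
    static List<Integer> subscriptMarkCounts(String mrv) {
        NodeList arrays = parse(mrv).getElementsByTagName("atomArray");
        if (arrays.getLength() == 0) {
            throw new IllegalArgumentException("no atomArray element");
        }
        Element atomArray = (Element) arrays.item(0);
        String[] ids = atomArray.getAttribute("atomID").trim().split("\\s+");
        String pseudo = atomArray.getAttribute("mrvPseudo").trim(); // "" if absent
        List<Integer> counts = new ArrayList<>();
        if (pseudo.isEmpty()) {
            for (int i = 0; i < ids.length; i++) {
                counts.add(0); // no pseudo labels at all
            }
            return counts;
        }
        String[] tokens = pseudo.split("\\s+");
        if (tokens.length != ids.length) { // parallel arrays need one token per atom
            throw new IllegalArgumentException(
                    "mrvPseudo has " + tokens.length + " tokens for " + ids.length + " atoms");
        }
        for (String token : tokens) {
            int marks = 0;
            for (int at = token.indexOf("_{"); at >= 0; at = token.indexOf("_{", at + 2)) {
                marks++;
            }
            counts.add(marks);
        }
        return counts;
    }
}
```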

User 870ab5b546

31-08-2006 13:42:23

Yes, we are using the latest test version of JChem. The first molecule is the target and the second is the query, although we eventually swap them.

Code:
<?xml version="1.0" ?>


<MDocument>


  <MChemicalStruct>


    <molecule molID="m1">


      <atomArray


          atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15 a16 a17 a18 a19 a20 a21 a22 a23 a24 a25 a26 a27 a28 a29 a30 a31 a32 a33 a34"


          elementType="C C C C C C C C C C C C H H H H H H H H H H H H H H H H H H H H H C"


          mrvAlias="C C C C C N2 C C C C C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 O4"


          mrvPseudo="0 0 0 0 0 N_{2} 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 O_{4}"


          x2="2.387833383410143 4.27105610304014 1.5968785941805486 3.8381159018981177 5.6269782277358145 5.852968279384736 8.527145788244328 9.092115721595816 5.091855399837763 3.329444743225141 5.815301538852977 7.246554546726761 10.240882463851152 8.658972885391963 10.391541632321964 10.127890035912099 7.732133102007311 8.018677264633181 8.696637028038316 9.236963420408753 7.453704733396285 7.039396566401016 4.666534796597641 3.272948529255615 2.0676829751453436 2.5573220253187223 1.841698119267238 6.700419282583858 6.173116089764128 1.578043924971965 3.2176109721780923 0.2597885408080491 0.9000841615668316 5.377622794726617"


          y2="4.922982456770898 4.922982456770898 3.9617752473227488 2.740769105524918 3.857846841572489 5.962134022117687 5.676366627228833 4.481347133168031 1.5587312448482946 1.1041012984342087 0.5845242168181105 2.013361191262381 6.234911989966139 6.494713520201228 4.624230830612458 3.9228069662015415 3.9228069662015415 3.0655047815349796 2.312118013191637 1.5067735366866848 0.7533867683433424 0.2597885408080491 0.3377251030504638 0.38968281121207365 1.2210061417978308 1.9224352019795634 2.623864262161296 4.728151442706494 3.3512721764238336 5.663377200188431 5.949144595077285 4.156616652928785 3.455174603320013 2.7797373866461252"


          />


      <bondArray>


        <bond atomRefs2="a1 a2" order="1" />


        <bond atomRefs2="a1 a3" order="1" />


        <bond atomRefs2="a3 a4" order="1" />


        <bond atomRefs2="a4 a5" order="1" />


        <bond atomRefs2="a5 a2" order="1" />


        <bond atomRefs2="a2 a6" order="2" />


        <bond atomRefs2="a6 a7" order="1" />


        <bond atomRefs2="a7 a8" order="1" />


        <bond atomRefs2="a9 a10" order="1" />


        <bond atomRefs2="a9 a11" order="1" />


        <bond atomRefs2="a9 a12" order="1" />


        <bond atomRefs2="a7 a13" order="1" />


        <bond atomRefs2="a7 a14" order="1" />


        <bond atomRefs2="a8 a15" order="1" />


        <bond atomRefs2="a8 a16" order="1" />


        <bond atomRefs2="a8 a17" order="1" />


        <bond atomRefs2="a12 a18" order="1" />


        <bond atomRefs2="a12 a19" order="1" />


        <bond atomRefs2="a12 a20" order="1" />


        <bond atomRefs2="a11 a21" order="1" />


        <bond atomRefs2="a11 a22" order="1" />


        <bond atomRefs2="a11 a23" order="1" />


        <bond atomRefs2="a10 a24" order="1" />


        <bond atomRefs2="a10 a25" order="1" />


        <bond atomRefs2="a10 a26" order="1" />


        <bond atomRefs2="a4 a27" order="1" />


        <bond atomRefs2="a5 a28" order="1" />


        <bond atomRefs2="a5 a29" order="1" />


        <bond atomRefs2="a1 a30" order="1" />


        <bond atomRefs2="a1 a31" order="1" />


        <bond atomRefs2="a3 a32" order="1" />


        <bond atomRefs2="a3 a33" order="1" />


        <bond atomRefs2="a4 a34" order="1" />


        <bond atomRefs2="a9 a34" order="1" />


      </bondArray>


    </molecule>


  </MChemicalStruct>


</MDocument>











<?xml version="1.0" ?>


<MDocument>


  <MChemicalStruct>


    <molecule molID="m1">


      <atomArray


          atomID="a1 a2 a3 a4 a5 a6 a7 a8 a9 a10 a11 a12 a13 a14 a15 a16 a17 a18 a19 a20 a21 a22 a23 a24 a25 a26 a27 a28 a29 a30 a31 a32 a33 a34"


          elementType="C C C C C C C C C C C C C H H H H H H H H H H H H H H H H H H H H H"


          mrvAlias="C C C C C O4 C C C C N2 C C 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"


          mrvPseudo="0 0 0 0 0 O_{4} 0 0 0 0 N_{2} 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0"


          x2="2.837667433614095 5.082396113620751 5.949987641805925 1.7294534726452961 4.185674658765347 6.442984322564868 8.279380461092774 8.747235257934529 9.965513460696231 8.396342008481582 4.16788913572499 6.36898174218417 7.88950194190723 0.8389493477509276 0.2869095505784458 1.6170394416331553 2.8404045507266136 6.748083938649988 9.095032753336731 9.877882676663056 10.306740727390187 11.060934124804737 11.476382023137832 11.320426598729906 7.057974944229767 9.994838485860855 9.738783188351615 6.799756348709166 7.382664744428382 6.36898174218417 8.980268933105354 9.293060594241481 8.201404183436564 6.291014071814478"


          y2="2.3506786388442644 2.4064251645216563 3.6885866478151557 3.9951954081363175 5.082244051558944 1.793213382070344 1.4866103599401939 2.6015380043925282 1.4587370971014977 0.8176520518114896 6.175986294796566 6.420154929625341 6.175986294796566 4.831387555106186 4.095530547069106 2.3785490325874545 1.0551615159713386 2.880264894588477 3.9289996366268585 3.2064437553545946 2.7966508442634 2.3506786388442644 1.7374639872974462 1.2746530603548611 0.2869095505784458 0.8538428225214547 0.34381519084017476 4.7541601113769865 3.6328401221377633 7.172738764461145 7.1200908619300005 6.029934988074609 5.221607496656919 5.66758257117156"


          />


      <bondArray>


        <bond atomRefs2="a1 a2" order="1" />


        <bond atomRefs2="a2 a3" order="1" />


        <bond atomRefs2="a1 a4" order="1" />


        <bond atomRefs2="a4 a5" order="1" />


        <bond atomRefs2="a5 a3" order="1" />


        <bond atomRefs2="a2 a6" order="1" />


        <bond atomRefs2="a6 a7" order="1" />


        <bond atomRefs2="a7 a8" order="1" />


        <bond atomRefs2="a7 a9" order="1" />


        <bond atomRefs2="a7 a10" order="1" />


        <bond atomRefs2="a5 a11" order="2" />


        <bond atomRefs2="a11 a12" order="1" />


        <bond atomRefs2="a12 a13" order="1" />


        <bond atomRefs2="a4 a14" order="1" />


        <bond atomRefs2="a4 a15" order="1" />


        <bond atomRefs2="a1 a16" order="1" />


        <bond atomRefs2="a1 a17" order="1" />


        <bond atomRefs2="a2 a18" order="1" />


        <bond atomRefs2="a8 a19" order="1" />


        <bond atomRefs2="a8 a20" order="1" />


        <bond atomRefs2="a8 a21" order="1" />


        <bond atomRefs2="a9 a22" order="1" />


        <bond atomRefs2="a9 a23" order="1" />


        <bond atomRefs2="a9 a24" order="1" />


        <bond atomRefs2="a10 a25" order="1" />


        <bond atomRefs2="a10 a26" order="1" />


        <bond atomRefs2="a10 a27" order="1" />


        <bond atomRefs2="a3 a28" order="1" />


        <bond atomRefs2="a3 a29" order="1" />


        <bond atomRefs2="a12 a30" order="1" />


        <bond atomRefs2="a13 a31" order="1" />


        <bond atomRefs2="a13 a32" order="1" />


        <bond atomRefs2="a13 a33" order="1" />


        <bond atomRefs2="a12 a34" order="1" />


      </bondArray>


    </molecule>


  </MChemicalStruct>


</MDocument>


ChemAxon a3d59b832c

31-08-2006 14:42:32

I am in luck: a few days ago I fixed a bug, and that fix also solves this issue. With JChem TEST_2006_08_17, I get no match when the second file is the query, not even with findFirst() and findNext().

The current development version works correctly: it finds each file in the other. So the next test version or the final release, whichever comes first, will be OK.

When there are no hydrogens in the molecules, TEST_2006_08_17 also works correctly.

Szabolcs

ChemAxon a9ded07333

16-01-2008 14:12:06

Szabolcs wrote:
In the future we also plan to support custom "Comparators" in MolSearch. This will allow you to add your code for extra checks during searching.

From JChem 5.0 we support custom MolComparators. More information is available at

Tamás