New type of search?

User f698d0529d

12-10-2005 12:22:39

Hi


My users are asking for functionality in an application to do the following:





An exact structure match where the query Q is a single component, finding mixtures in which one of the components exactly matches the query: Q, Q.R, Q.R.S, etc.





This I cannot do with existing JChem functionality. To do such a search quickly, I would need a one-to-many relationship from the main table to a components table, so that one record in the main table has many components.
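
A rough sketch of the components-table idea, written in plain Java rather than SQL: each mixture is split into its components, and an index maps every component to the records that contain it, so a single-component query becomes one lookup. It assumes that each component is already stored as a canonical string, so plain string equality stands in for exact structure matching. `ComponentIndex` and its methods are hypothetical names for illustration, not JChem API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for a "components" table: component -> ids of the
// main-table records that contain it. Not JChem API.
final class ComponentIndex {
    private final Map<String, Set<Integer>> recordsByComponent = new HashMap<>();

    // Assumption: the structure is a dot-separated string whose components are
    // already canonical, so string equality approximates an exact match.
    public void add(int recordId, String structure) {
        for (String component : structure.split("\\.")) {
            recordsByComponent
                .computeIfAbsent(component, k -> new TreeSet<>())
                .add(recordId);
        }
    }

    // Ids of the records in which at least one component equals the query.
    public Set<Integer> find(String query) {
        return recordsByComponent.getOrDefault(query, Collections.emptySet());
    }

    public static void main(String[] args) {
        ComponentIndex index = new ComponentIndex();
        index.add(1, "Q");
        index.add(2, "Q.R");
        index.add(3, "Q.R.S");
        index.add(4, "R.S");
        System.out.println(index.find("Q")); // [1, 2, 3]
    }
}
```

The cost is the one noted above: the components have to be maintained separately from the main records.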





Is that the best solution, or is there something I don't know about?





Thanks


Mark

ChemAxon a3d59b832c

12-10-2005 12:39:44

Hi Mark,





I suggest running a substructure search with a modified form of the original query: add the s* query property to each atom. "s*" means that nonhydrogen substitutions are not allowed at that particular atom. See the Query Guide for a definition:


http://www.chemaxon.com/jchem/doc/user/Query.html#atprop





This code sample adds this s* to all atoms:


Code:

        for (int i = 0; i < mol.getAtomCount(); i++) {
            MolAtom ma = mol.getAtom(i);
            // Add s* query property to each atom:
            ma.setQProp("s", -2);
        }//end for(i)








We are planning to add this solution as a special search type, but for now the quickest option for you is simply to insert the above code into your own code.





Best regards,





Szabolcs

User f698d0529d

12-10-2005 13:35:30

Thanks


I should have mentioned that I am using JChem Cartridge 3.1.1 with regular Oracle tables, so this has to be done through regular SQL.





So presumably I am looking for a SMARTS expression. However, while just trying out different options, I tried this.





select jc_compare('C1CCCC1.c2cc[nH]c2.c3ccccc3', 'C1CCCC1', 't:e') from dual; --> 1





This appears to be exactly what I need, but I am surprised. If you use t:p, it returns 0. Also, if there is no exact match, t:e returns 0.





But I am also confused over the supposed purpose of t:e and t:p, according to the JChem documentation.





I had thought that t:e was an exact search and t:p a perfect search, and according to





http://www.chemaxon.com/jchem/doc/user/Query.html#otherSearchTypes





an exact search finds matches ignoring stereochemistry and isotopes, whereas a perfect search is stricter - everything must match exactly. But when I actually try this, I find that it is not true. t:e and t:p do not do that. Instead you have to use the options stereoSearch and exactIsotopeMatching, as explained in





http://www.chemaxon.com/jchem/doc/guide/cartridge/cartapi.html#jc_compare





However, what that page says about t:e and t:p is doubly confusing. It says that t:p is the same as jc_equals and that jc_equals is an exact search:





http://www.chemaxon.com/jchem/doc/guide/cartridge/cartapi.html#jc_equals





But how it then explains the difference between exact and perfect on that page I do not understand at all.





None of this is of any relevance if it turns out that t:e is what I need in this instance and that it is simply a matter of confusing documentation, but I suspect there is something wrong here...





Thanks


Mark

ChemAxon a3d59b832c

12-10-2005 15:23:10

Mark,





This is definitely a bug; we will look into it.





At the same time as fixing this bug, we will add a new search flag that searches the way you would like.





I am sorry if the documentation is unclear; we will review those pages as well. ("Exact search" means a substructure search where the nonhydrogen atom graphs of the query and the target are exactly the same. Probably "exact size substructure search" would have been a better name.)





Best regards,





Szabolcs

User f698d0529d

13-10-2005 13:35:13

Thanks





Two things.





First, I still don't understand what you mean. Can you provide an example of how perfect differs from exact? And can you clarify if t:e and t:p mean exact and perfect in the same sense as you are talking about here?





Second, and for your information, this was not the only requirement. There were others, but I had worked out ways around them. However, if you are adding this flag, I just thought I would make you aware of the others too.





1. An exact structure match where the query Q is a single component, finding mixtures in which one of the components exactly matches the query: Q, Q.R, Q.R.S, etc.





This is what we discussed above. There is no workaround. I would have to write my own PL/SQL function, which would run extremely slowly, or normalize out the components of the mixtures to a separate table, which is undesirable.





2. The same as 1 above, but for substructures, i.e. query Q finds M, M.N, M.N.O etc where at least M contains Q.





This is the normal behaviour for a substructure search - no additional flag is required here.





3. An exact structure match where the query has multiple components, finding single-component records that exactly match one of the components of the query, i.e. query Q.R finds Q and R.





I was just going to do something like





select smiles from temp_mcr where jc_equals(smiles, 'C1CCC(C1)C2CCCC2') = 1 or jc_equals(smiles, 'C1CCC(CC1)C2CCCCC2') = 1;





There is supposed to be a more efficient way to write this, which is something like





select smiles from temp_mcr where jc_compare(smiles, '*',
'sep=~ t:e ~ctFilter:(match(''C1CCC(C1)C2CCCC2'')||match(''C1CCC(CC1)C2CCCCC2''))') = 1;





Although this works for a substructure search (t:s), it does not work for t:e or t:p.





4. An exact structure match where the query has multiple components, finding mixtures in which one or more components, possibly all of them, exactly match, i.e. query Q.R finds Q, R, Q.R, Q.S, R.S, Q.R.S, etc.





This is just an extended version of 3 above.





5. The same as 3 and 4 above, but for substructures.





This will work like





select smiles from temp_mcr where jc_compare(smiles, '*',
'sep=~ t:s ~ctFilter:(match(''C1CCC(C1)C2CCCC2'')||match(''C1CCC(CC1)C2CCCCC2''))') = 1;

ChemAxon a3d59b832c

14-10-2005 12:59:55

Mark,
Custom24 wrote:



First, I still don't understand what you mean. Can you provide an example of how perfect differs from exact?
Please find diagrams of the intended behaviour of substructure, exact, exact fragment and perfect searches. (An arrow means matching.)


Here is the simplest explanation I can give of each:
  • Substructure search: query is subgraph of target, no size restrictions, query features (stereo, isotopes, aromaticity etc.) are evaluated.


  • Exact fragment search (available from the next release): special kind of substructure search where the query must cover one or more full fragments of the target, query features are evaluated.


  • Exact search: special kind of substructure search where the query must cover the whole target, query features are evaluated.


  • Perfect search: query must cover the whole target, query features are not evaluated, instead everything must be the same.
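
The last three definitions can be illustrated with a toy model in plain Java. This is only a sketch under stated assumptions, not JChem code: a structure is a dot-separated list of fragment labels, an optional `@tag` suffix stands in for a feature such as an isotope or stereo label, and "query features are evaluated" is read as "a query fragment without a tag matches the same skeleton with any tag". Substructure search is left out because real subgraph matching cannot be expressed with labels. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Toy model only: fragments are labels such as "Q" or "Q@13C". Not JChem API.
final class SearchTypeModel {

    // Assumed reading of "features are evaluated": an untagged query fragment
    // matches the same skeleton with any tag; a tagged one must be identical.
    static boolean featureMatch(String q, String t) {
        return q.contains("@") ? q.equals(t) : q.equals(t.split("@")[0]);
    }

    // Pairs every query fragment with a distinct target fragment. Returns the
    // target fragments left uncovered, or null if a query fragment has no partner.
    static List<String> uncovered(String query, String target, boolean evaluate) {
        List<String> remaining = new ArrayList<>(Arrays.asList(target.split("\\.")));
        List<String> q = new ArrayList<>(Arrays.asList(query.split("\\.")));
        // Tagged fragments first: they can only pair with identical targets.
        q.sort(Comparator.comparing((String f) -> !f.contains("@")));
        for (String qf : q) {
            int hit = -1;
            for (int i = 0; i < remaining.size() && hit < 0; i++) {
                String tf = remaining.get(i);
                if (evaluate ? featureMatch(qf, tf) : qf.equals(tf)) {
                    hit = i;
                }
            }
            if (hit < 0) {
                return null;
            }
            remaining.remove(hit);
        }
        return remaining;
    }

    // Query covers one or more full fragments of the target.
    public static boolean exactFragment(String q, String t) {
        return uncovered(q, t, true) != null;
    }

    // Query covers the whole target; features are evaluated.
    public static boolean exact(String q, String t) {
        List<String> rest = uncovered(q, t, true);
        return rest != null && rest.isEmpty();
    }

    // Query covers the whole target and everything must be the same.
    public static boolean perfect(String q, String t) {
        List<String> rest = uncovered(q, t, false);
        return rest != null && rest.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(exactFragment("Q", "Q.R.S")); // true
        System.out.println(exact("Q", "Q.R.S"));         // false
        System.out.println(exact("Q", "Q@13C"));         // true
        System.out.println(perfect("Q", "Q@13C"));       // false
    }
}
```

In this model, the result Mark reported for t:e in 3.1.1 (a single-component query matching a three-component target) corresponds to `exactFragment` rather than `exact`.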


Currently there is a bug in the cartridge which makes exact search behave as exact fragment search. This will be fixed in the next release.





Technically, each of the above search types is a collection of lower-level search options. This means that you can fine-tune the searches: starting from a substructure search, you get the same behaviour as a perfect search if you set every low-level option to its strictest value.
Custom24 wrote:
And can you clarify if t:e and t:p mean exact and perfect in the same sense as you are talking about here?
Yes, t:e means exact matching and t:p means perfect matching, but in JChem 3.1.1, exact matching is buggy in the cartridge.
Custom24 wrote:
Second, and for your information, this was not the only requirement. There were others, but I had worked out ways around them. However, if you are adding this flag, I just thought I would make you aware of the others too.
I am glad you found solutions for the rest of the requirements.





Best regards,





Szabolcs

ChemAxon a3d59b832c

05-11-2005 15:24:14

JChem 3.1.2 is out, and it contains the new search type called EXACT_FRAGMENT. EXACT matching has also been fixed.





Best regards,


Szabolcs

User 7b0ee04e66

29-06-2006 08:11:39

Hi





I have taken over one of Mark's projects and have implemented the Exact Fragment Search which works fine.





Do you have any plans to introduce the 'opposite' functionality, where we would start with several molecules and find the exact match for each component? I seem to remember it was mentioned at the UGM.





For example, searching on 'A + B' would retrieve 'A', 'B', 'A+C', 'A+D'.





Thanks


Catherine

ChemAxon aa7c50abf8

29-06-2006 15:21:51

Hi,





If you specify a concatenation of multiple query molecules to jc_compare, a search is performed for each query molecule, and the hits from all the searches are combined and returned.





This may almost be what you want -- except that you need a similar functionality for one single query molecule with multiple fragments. Do you think that combining the above feature with a yet-to-be-implemented function which takes apart your query molecule into its fragments and returns the fragments as a "concatenation" of multiple queries would yield what you are aiming at?





Something like:


Code:
select ... from structtable where jc_compare(struct, jc_return_fragments_as_a_molecule_concatenation(<your-fragmented-structure>), 't:ef') = 1
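
The intended behaviour can be mocked up in plain Java as a rough illustration. It is a sketch under assumptions, not the cartridge implementation: structures are dot-separated fragment labels, string equality stands in for exact fragment matching, and `fragments` and `search` are hypothetical names (the first plays the role of the yet-to-be-implemented splitting function).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the proposed behaviour; names are hypothetical, not JChem API.
final class FragmentUnionSearch {

    // Stand-in for the function that takes a structure apart into fragments.
    public static List<String> fragments(String structure) {
        return Arrays.asList(structure.split("\\."));
    }

    // A row is a hit if any query fragment equals one of its fragments, which
    // is the union of the hits of one search per query fragment.
    public static List<String> search(String query, List<String> table) {
        List<String> hits = new ArrayList<>();
        for (String row : table) {
            List<String> rowFragments = fragments(row);
            for (String q : fragments(query)) {
                if (rowFragments.contains(q)) {
                    hits.add(row);
                    break; // report each row once
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> table = Arrays.asList("A", "B", "A.C", "A.D", "C.D");
        System.out.println(search("A.B", table)); // [A, B, A.C, A.D]
    }
}
```

With the query 'A.B' this returns the rows from Catherine's example ('A', 'B', 'A+C', 'A+D') and skips a row such as 'C.D' that shares no fragment with the query.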






P.





PS:


"Concatenation" means here a stream of bytes or characters created by using JChem's MolExporter repeatedly for multiple molecules. But this is a detail which would be hidden from you by the cartridge. I added this information, in case you wanted to know what I call "concatenation".

User 7b0ee04e66

30-06-2006 14:49:46

Hi,





Yes, this is what we are looking for!
Quote:
function which takes apart your query molecule into its fragments
Catherine

ChemAxon aa7c50abf8

03-07-2006 14:00:40

Hi,





We will implement this function, most probably in the form of a Chemical Terms expression for better reusability.





Peter