User f698d0529d
12-10-2005 12:22:39
Hi
My users are asking for the following functionality in an application:
An exact structure match where the query Q is a single component, finding mixtures in which one of the components exactly matches the query: Q, Q.R, Q.R.S, etc.
This I cannot do with existing JChem functionality. To do such a search quickly, I would need to have a relationship to a components table, where one record in the main table had many components.
Is that the best solution, or is there something I don't know about?
Thanks
Mark
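To make the components-table idea concrete, here is a minimal sketch in plain Java (no JChem calls). It assumes each component is stored as a canonical SMILES string, so that string equality can stand in for an exact structure match, and that components are separated by "." without ring-closure digits shared across components. The class and method names are hypothetical; in the database this would be a child table keyed on the main record id.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ComponentIndex {
    // component SMILES -> ids of the mixture records that contain it
    private final Map<String, Set<Integer>> byComponent = new HashMap<>();

    // Register one main-table record; each dot-separated part becomes a component row.
    void add(int recordId, String mixtureSmiles) {
        for (String component : mixtureSmiles.split("\\.")) {
            byComponent.computeIfAbsent(component, k -> new TreeSet<>()).add(recordId);
        }
    }

    // Exact component lookup: string equality is only a stand-in for structure matching.
    Set<Integer> find(String querySmiles) {
        return byComponent.getOrDefault(querySmiles, Collections.emptySet());
    }

    public static void main(String[] args) {
        ComponentIndex index = new ComponentIndex();
        index.add(1, "C1CCCC1");
        index.add(2, "C1CCCC1.c1ccccc1");
        index.add(3, "c1ccccc1.CCO");
        System.out.println(index.find("C1CCCC1")); // prints [1, 2]
    }
}
```

The query Q finds both Q and Q.R, but not a mixture that lacks Q.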
ChemAxon a3d59b832c
12-10-2005 12:39:44
Hi Mark,
I suggest running a substructure search with a modified form of the original query: add the s* query property to each atom. "s*" means that non-hydrogen substitutions are not allowed at that particular atom. See the Query Guide for a definition:
http://www.chemaxon.com/jchem/doc/user/Query.html#atprop
This code sample adds this s* to all atoms:
Code: |
for (int i = 0; i < mol.getAtomCount(); i++) {
    MolAtom ma = mol.getAtom(i);
    // Add s* query property to each atom:
    ma.setQProp("s", -2);
}
|
We are planning to add this solution as a special search type, but for now the quickest option for you is to apply the above code to the query before searching.
Best regards,
Szabolcs
User f698d0529d
12-10-2005 13:35:30
Thanks
I should have mentioned that I am using JChem Cartridge 3.1.1 with regular Oracle tables, so this has to be done through regular SQL.
So presumably I am looking for a SMARTS expression. However, while just trying out different options, I tried this.
select jc_compare('C1CCCC1.c2cc[nH]c2.c3ccccc3', 'C1CCCC1', 't:e') from dual; --> 1
This appears to be exactly what I need, but I am surprised. If you use t:p, it returns 0. Also, if there is no exact match, t:e returns 0.
But I am also confused over the supposed purpose of t:e and t:p, according to the JChem documentation.
I had thought that t:e was an exact search and t:p a perfect search, and according to
http://www.chemaxon.com/jchem/doc/user/Query.html#otherSearchTypes
an exact search finds matches ignoring stereochemistry and isotopes, whereas a perfect search is stricter: everything must match exactly. But when I actually try this, I find that it is not true. t:e and t:p do not do that. Instead you have to use the options stereoSearch and exactIsotopeMatching, as explained in
http://www.chemaxon.com/jchem/doc/guide/cartridge/cartapi.html#jc_compare
However, what that page says about t:e and t:p is doubly confusing. It says that t:p is the same as jc_equals and that jc_equals is an exact search
http://www.chemaxon.com/jchem/doc/guide/cartridge/cartapi.html#jc_equals
But how it then explains the difference between exact and perfect on that page I do not understand at all.
None of this is of any relevance if it turns out that t:e is what I need in this instance and that it is simply a matter of confusing documentation, but I suspect there is something wrong here...
Thanks
Mark
ChemAxon a3d59b832c
12-10-2005 15:23:10
Mark,
This is definitely a bug, we will check this.
While fixing this bug, we will also add a new search flag that searches the way you would like.
I am sorry if the documentation is unclear; we will review those pages as well. ("Exact search" means a substructure search where the non-hydrogen atom graphs of the query and the target are exactly the same. "Exact size substructure search" would probably have been a better name.)
Best regards,
Szabolcs
User f698d0529d
13-10-2005 13:35:13
Thanks
Two things.
First, I still don't understand what you mean. Can you provide an example of how perfect differs from exact? And can you clarify whether t:e and t:p mean exact and perfect in the same sense as you are using here?
Second, and for your information, this was not the only requirement. There were others, but I had worked out ways around them. However, if you are adding this flag, I just thought I would make you aware of the others too.
1. An exact structure match where the query Q is a single component to find mixtures where one of the components exactly matches the query - Q, Q.R, Q.R.S, etc
This is what we discussed above. There is no workaround. I would have to write my own PL/SQL function, which would run extremely slowly, or normalize out the components of the mixtures to a separate table, which is undesirable.
2. The same as 1 above, but for substructures, i.e. query Q finds M, M.N, M.N.O etc where at least M contains Q.
This is the normal behaviour for a substructure search - no additional flag is required here.
3. An exact structure match where the query is multiple components to find single components where one of them exactly matches a component in the query. I.e. Query Q.R finds Q and R.
I was just going to do something like
select smiles from temp_mcr where jc_equals(smiles, 'C1CCC(C1)C2CCCC2') = 1 or jc_equals(smiles, 'C1CCC(CC1)C2CCCCC2') = 1;
There is supposed to be a more efficient way to write this, something like
select smiles from temp_mcr where jc_compare(smiles, '*',
'sep=~ t:e ~ctFilter:(match(''C1CCC(C1)C2CCCC2'')||match(''C1CCC(CC1)C2CCCCC2''))') = 1;
Although this works for a substructure search (t:s), it does not work for t:e or t:p.
4. An exact structure match where the query is multiple components to find mixtures where more than one component, or possibly them all, exactly matches. I.e. Query Q.R finds Q, R, Q.R, Q.S, R.S, Q.R.S, etc
This is just an extended version of 3 above.
5. The same as 3 and 4 above, but for substructures.
This will work like
select smiles from temp_mcr where jc_compare(smiles, '*',
'sep=~ t:s ~ctFilter:(match(''C1CCC(C1)C2CCCC2'')||match(''C1CCC(CC1)C2CCCCC2''))') = 1;
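For anyone generating these statements from application code, here is a minimal sketch in Java of building the options string for an arbitrary list of query structures. The option syntax is copied from the queries above and not otherwise verified; the class and method names are hypothetical, and the helper assumes the SMILES strings contain no single quotes. As noted above, I only see this working with t:s.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class CtFilterOptions {
    // Builds e.g. sep=~ t:s ~ctFilter:(match('A')||match('B'))
    static String build(String searchType, List<String> querySmiles) {
        String matches = querySmiles.stream()
                .map(s -> "match('" + s + "')")
                .collect(Collectors.joining("||"));
        return "sep=~ " + searchType + " ~ctFilter:(" + matches + ")";
    }

    // Wraps a value as a SQL string literal, doubling any embedded single quotes.
    static String asSqlLiteral(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        List<String> queries = Arrays.asList("C1CCC(C1)C2CCCC2", "C1CCC(CC1)C2CCCCC2");
        System.out.println(asSqlLiteral(build("t:s", queries)));
    }
}
```

Passing the undoubled string from build() as a bind variable would avoid the quote doubling altogether.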
ChemAxon a3d59b832c
05-11-2005 15:24:14
JChem 3.1.2 is out, and it contains the new search type called EXACT_FRAGMENT. EXACT matching has also been fixed.
Best regards,
Szabolcs
User 7b0ee04e66
29-06-2006 08:11:39
Hi
I have taken over one of Mark's projects and have implemented the Exact Fragment Search which works fine.
Do you have any plans to introduce the 'opposite' functionality, where we would start with several molecules and find the exact match for each component? I seem to remember it was mentioned at the UGM.
For example searching on 'A + B' would retrieve 'A', 'B', 'A+C', 'A+D'
Thanks
Catherine
ChemAxon aa7c50abf8
29-06-2006 15:21:51
Hi,
If you specify a concatenation of multiple query molecules to jc_compare, a search is performed for each query molecule, and the hits from all of the searches are combined and returned.
This may almost be what you want -- except that you need a similar functionality for one single query molecule with multiple fragments. Do you think that combining the above feature with a yet-to-be-implemented function which takes apart your query molecule into its fragments and returns the fragments as a "concatenation" of multiple queries would yield what you are aiming at?
Something like:
Code: |
select ... from structtable where jc_compare(struct, jc_return_fragments_as_a_molecule_concatenation(<your-fragmented-structure>), 't:ef') = 1
|
P.
PS:
"Concatenation" means here a stream of bytes or characters created by using JChem's MolExporter repeatedly for multiple molecules. But this is a detail which would be hidden from you by the cartridge. I added this information, in case you wanted to know what I call "concatenation".
User 7b0ee04e66
30-06-2006 14:49:46
Hi,
Yes, this is what we are looking for!
Quote: |
function which takes apart your query molecule into its fragments |
Catherine
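To spell out the behaviour we are after, here is a minimal sketch in plain Java (no cartridge calls): the query is taken apart into its fragments, an exact fragment search is run for each one, and the hits are combined. String equality on SMILES stands in for real structure matching, splitting on "." assumes no ring-closure digits are shared between fragments, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class FragmentUnionSearch {
    // Takes a dot-separated SMILES apart into its fragments.
    static List<String> fragments(String smiles) {
        return Arrays.asList(smiles.split("\\."));
    }

    // Stand-in for an exact fragment match: some fragment of the target equals the query.
    static boolean hasExactFragment(String target, String query) {
        return fragments(target).contains(query);
    }

    // Returns every target hit by at least one fragment of the query, in table order.
    static List<String> search(List<String> targets, String multiFragmentQuery) {
        List<String> hits = new ArrayList<>();
        for (String target : targets) {
            for (String query : fragments(multiFragmentQuery)) {
                if (hasExactFragment(target, query)) {
                    hits.add(target);
                    break; // already a hit; do not add it twice
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // A = CCO, B = c1ccccc1, C = CC(=O)O, D = CN
        List<String> table = Arrays.asList(
                "CCO", "c1ccccc1", "CCO.CC(=O)O", "CCO.CN", "CC(=O)O.CN");
        System.out.println(search(table, "CCO.c1ccccc1"));
        // prints [CCO, c1ccccc1, CCO.CC(=O)O, CCO.CN]
    }
}
```

So searching on 'A + B' retrieves 'A', 'B', 'A+C' and 'A+D', but not 'C+D'.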
ChemAxon aa7c50abf8
03-07-2006 14:00:40
Hi,
We will implement this function, most probably in the form of a Chemical Terms expression for better reusability.
Peter