User 8139ea8dbd
21-06-2006 16:58:15
1. I was expecting these two operations to lead to the same final structure: "aromatization" and "dearomatization followed by aromatization":
a) direct aromatization
select jc_molconvert('Nc1ccc2c(c1)[nH]c(=O)c2=Cc3ccc[nH]3', 'smiles:a_day') from dual;
gives:
Nc1ccc2c(c1)[nH]c(=O)c2=Cc3ccc[nH]3
b) dearomatization + aromatization
select jc_molconvert('Nc1ccc2c(c1)[nH]c(=O)c2=Cc3ccc[nH]3', 'smiles:-a') from dual;
gives NC1=CC=C2C(NC(=O)C2=CC3=CC=CN3)=C1
select jc_molconvert('NC1=CC=C2C(NC(=O)C2=CC3=CC=CN3)=C1', 'smiles:a') from dual;
gives:
Nc1ccc2c(NC(=O)C2=Cc3ccc[nH]3)c1
If you view the two resulting structures in Marvin, you can see that a 5-membered ring is treated differently by the two methods.
2. I was trying to do dearomatization + aromatization in the cartridge using a single SQL statement:
select jc_molconvert(jc_molconvert('Nc1ccc2c(c1)[nH]c(=O)c2=Cc3ccc[nH]3', 'smiles:-a'), 'smiles:a_day') from dual;
It crashes the session. Error message: ORA-03113: end-of-file on communication channel.
Thanks.
ChemAxon aa7c50abf8
21-06-2006 20:12:31
I cannot comment on your question 1; I'll leave it to someone more knowledgeable to sort out.
Regarding your question 2, could you please tell me which Oracle version you are using, and on which operating system?
Thanks
User 8139ea8dbd
21-06-2006 20:45:19
Oracle version is 9.2.0.7.0, running on Solaris 9. JChem version is 3.1.5.
ChemAxon aa7c50abf8
21-06-2006 21:49:29
It appears that this is a bug in Oracle 9i. I can reproduce the same problem on Windows Server 2003. (I expect that a memory dump is generated in the core dump (cdump) directory of your Oracle installation.)
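As a workaround, I suggest decomposing the nested calls, as in the following anonymous PL/SQL block (in SQL*Plus, serveroutput must be on for the dbms_output line to be displayed):
Code: |
set serveroutput on
declare
  smi1 varchar2(32000);
begin
  -- Step 1: dearomatize into a variable instead of nesting the calls
  smi1 := jcf_molconvert('Nc1ccc2c(c1)[nH]c(=O)c2=Cc3ccc[nH]3', 'smiles:-a');
  -- Step 2: re-aromatize (Daylight style) and print the result
  dbms_output.put_line('>>>>> ' || jcf_molconvert(smi1, 'smiles:a_day'));
end;
/ |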
ChemAxon aa7c50abf8
22-06-2006 04:46:20
Or simply use functions instead of operators:
Code: |
select jcf_molconvert(jcf_molconvert('Nc1ccc2c(c1)[nH]c(=O)c2=Cc3ccc[nH]3', 'smiles:-a'), 'smiles:a_day') from dual; |
For the conversion operators (jc_molconvert, jc_standardize, jc_evaluate_x), you lose no functionality by using the corresponding functions instead anyway.
ChemAxon a3d59b832c
22-06-2006 06:14:40
Regarding aromaticity:
I believe that the rings that are not aromatized back do not fulfil Hückel's rule of aromaticity. In the first example, the 4-membered ring is clearly antiaromatic, and for the second, the =O group withdraws an electron from the ring, so it ends up with 5 electrons.
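For reference, the electron counting behind this argument can be written out as follows. Hückel's rule applies to cyclic, planar, fully conjugated rings; pyrrole is added here only as a familiar aromatic comparison.

$$
N_\pi = 4n + 2 \;(n = 0, 1, 2, \dots) \;\Rightarrow\; \text{aromatic}, \qquad N_\pi = 4n \;\Rightarrow\; \text{antiaromatic}
$$

$$
\text{pyrrole: } N_\pi = 6 = 4 \cdot 1 + 2 \;(\text{aromatic}); \qquad N_\pi = 4 = 4 \cdot 1 \;(\text{antiaromatic}); \qquad N_\pi = 5 \;(\text{odd, neither } 4n + 2 \text{ nor } 4n)
$$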
(Dearomatization tries its best to give a sensible Kekule structure even for incorrectly formulated aromatic rings. I think it succeeded this time.)
I checked the molecules at Daylight's page, too:
http://www.daylight.com/daycgi_tutorials/depictmatch.cgi
It seems that even when the molecules are given in fully aromatic form (your starting molecules), the rings in question are depicted in Kekulé form and do not match the SMARTS [a].
Kind regards,
Szabolcs
User 8139ea8dbd
22-06-2006 16:40:40
Thanks, Peter. Your explanation of operators versus functions makes it very clear now.
User 8139ea8dbd
22-06-2006 16:47:01
Hi, Szabolcs
I understand why the 5-membered ring is not aromatic. But I am not sure why direct aromatization of the first SMILES does not fix the problem, whereas dearomatizing it first and then re-aromatizing gives the correct structure. Could you comment on that?
ChemAxon aa7c50abf8
22-06-2006 17:19:22
A small addition to the operator vs. function discussion for completeness and clarity.
Cartridge operators can be evaluated in index-scan mode or in function mode ("row-by-row"). (When part of a predicate in the WHERE clause, they will typically be evaluated in index-scan mode.) Oracle provides two different interfaces (and mechanisms) for the two evaluation modes, and a complete cartridge provides an implementation for both.
In JChem Cartridge, the functional equivalents of operators (jcf_...) and the operators (jc_...) evaluated in function mode ultimately share the same implementation core. (There is no requirement in the Oracle docs to do so, but it seems logical.) The only difference at the implementation level between executing, say, jc_molconvert() and jcf_molconvert() is that jcf_molconvert() explicitly sets to null those parameters of the implementation core that, in the function-mode execution of jc_molconvert(), Oracle fills in with information about the context in which the operator is executing.
There are cases where the choice between an operator and a function does make a difference. Take the following statement:
Code: |
select max(jc_molweight(smiles)) from strtable; |
For this statement, the optimizer will most probably select a plan whereby strtable will be accessed using full-table scan and jc_molweight will be called once for each row in the table.
Assume that strtable has a jc_idxtype index on the smiles structure column. One of the extra pieces of information provided to the function-mode implementation of jc_molweight for each processed row is the rowid of the current row of strtable. During indexing with jc_idxtype, molweight values for the structures in the smiles column were precomputed and stored in the index table for strtable.smiles (with the rowids of strtable as the primary key of the index table). Using the rowid of the current row, jc_molweight can simply retrieve and return the precomputed value from the index table instead of computing the molweight on the fly.
If strtable.smiles is not indexed, the rowid parameter passed to the callback interface by Oracle will be null, and the molweight will be computed on the fly from the value in the smiles column of the current row.
While the above SQL statement takes less than 3 seconds to execute on my machine (with strtable containing 1k SMILES), the statement
Code: |
select max(jcf_molweight(smiles)) from strtable; |
takes more than 14 seconds to execute. Apparently, retrieving the molweight by rowid from the index table is much faster than sending a job to JChemStreams in Tomcat to do it.
jc_molformula is another operator that is faster than its functional counterpart, because the formula is also precomputed during indexing.
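The comparison above assumes that strtable.smiles carries a jc_idxtype index. A minimal sketch of creating one follows; the index name is hypothetical, and the jchem schema prefix assumes the cartridge's index type is owned by a schema of that name, so adjust it to your installation.

```sql
-- Hypothetical index name; "jchem" is assumed to be the schema that owns
-- the cartridge's jc_idxtype index type.
create index strtable_smiles_jcx on strtable(smiles)
  indextype is jchem.jc_idxtype;
```

Once the index exists, jc_molweight(smiles) in the select list can look up the precomputed value by rowid, whereas jcf_molweight(smiles) still computes it on the fly.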
User 8139ea8dbd
22-06-2006 17:38:46
Hi, Peter
That's an excellent point.
User 8139ea8dbd
22-06-2006 21:09:26
I think an option for the aromatization routine to check and fix the structure would be very useful.
If one relies on aromatization as a way to standardize compounds before registration, one would expect jc_standardize(smiles, 'config:aromatize') to "standardize" the input. While I agree that there are good reasons for jc_molconvert to behave the way it does, it seems conceptually unexpected for jc_standardize to behave the same way.
I will need to fix our registration code. An option would be beneficial, even if it is not the default. Thanks.
ChemAxon a3d59b832c
23-06-2006 08:26:55
ChemAxon d76e6e95eb
23-06-2006 11:17:44
I think the solution is what Szabolcs proposed: just add a Dearomatize action right before Aromatize.
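A sketch of that approach applied to the test molecule from this thread, using the configuration string "dearomatize..aromatize:d" mentioned by Szabolcs in the same style as the 'config:aromatize' call above. That the ":d" suffix selects Daylight-style aromatization (mirroring the a_day molconvert option) is an assumption here, not something confirmed in the thread.

```sql
-- Dearomatize first, then aromatize, within one standardizer configuration.
-- Assumption: ":d" selects Daylight-style aromatization (cf. 'smiles:a_day').
select jc_standardize('Nc1ccc2c(c1)[nH]c(=O)c2=Cc3ccc[nH]3',
                      'config:dearomatize..aromatize:d')
  from dual;
```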
ChemAxon a3d59b832c
29-06-2006 14:00:06
Hi Yingyao,
Do you think the dearomatize solution ("dearomatize..aromatize:d") is enough, or is a new option needed?
The two would be functionally very similar.
Szabolcs
User 8139ea8dbd
29-06-2006 15:42:49
It's not a problem for us, since we know what to patch in our code.
This is mainly for people who are not aware of this potential problem. Perhaps you could add some instructions to the manual and update the examples you use for the related standardizer tools. That could be sufficient without changing the API.
Thanks.