An aromatization problem, nested jc_molconvert problem

User 8139ea8dbd

21-06-2006 16:58:15

1. I was expecting these two operations to lead to the same final structure: "aromatization" and "dearomatization followed by aromatization":


a) direct aromatization


select jc_molconvert('Nc1ccc2c(c1)[nH]c(=O)c2=Cc3ccc[nH]3', 'smiles:a_day') from dual


gives:


Nc1ccc2c(c1)[nH]c(=O)c2=Cc3ccc[nH]3


b) dearomatization + aromatization


select jc_molconvert('Nc1ccc2c(c1)[nH]c(=O)c2=Cc3ccc[nH]3', 'smiles:-a') from dual;


gives NC1=CC=C2C(NC(=O)C2=CC3=CC=CN3)=C1





Select jc_molconvert('NC1=CC=C2C(NC(=O)C2=CC3=CC=CN3)=C1', 'smiles:a') from dual;


Nc1ccc2c(NC(=O)C2=Cc3ccc[nH]3)c1





If you check the two resulting structures in Marvin, the 5-membered ring is treated differently by the two methods.





2. I was trying to do dearomatization + aromatization in the cartridge using a single SQL statement:





select jc_molconvert(jc_molconvert('Nc1ccc2c(c1)[nH]c(=O)c2=Cc3ccc[nH]3', 'smiles:-a'), 'smiles:a_day') from dual;





It crashes the session. Error message: ORA-03113: end-of-file on communication channel.





Thanks.

ChemAxon aa7c50abf8

21-06-2006 20:12:31

I cannot comment on your question 1.) -- I leave it to someone more knowledgeable to sort it out.





Regarding your question 2.), could you please tell me which Oracle version you are using, and on which operating system?





Thanks

User 8139ea8dbd

21-06-2006 20:45:19

Oracle version is 9.2.0.7.0, running on Solaris 9. JChem version is 3.1.5.

ChemAxon aa7c50abf8

21-06-2006 21:49:29

It appears that this is a bug in Oracle 9i. I can reproduce the same problem on Windows Server 2003. (I expect that a memory dump is generated in the core dump (cdump) directory of your Oracle installation.)





As a workaround, I suggest decomposing the nested calls, as in the following PL/SQL block:





Code:
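declare
  -- holds the intermediate (dearomatized) SMILES
  smi1 varchar2(32000);
begin
  -- step 1: dearomatize, using the function form (jcf_) rather than the operator
  smi1 := jcf_molconvert('Nc1ccc2c(c1)[nH]c(=O)c2=Cc3ccc[nH]3', 'smiles:-a');
  -- step 2: re-aromatize the intermediate and print the result
  dbms_output.put_line('>>>>> ' || jcf_molconvert(smi1, 'smiles:a_day'));
end;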

ChemAxon aa7c50abf8

22-06-2006 04:46:20

Or simply use functions instead of operators:


Code:
select jcf_molconvert(jcf_molconvert('Nc1ccc2c(c1)[nH]c(=O)c2=Cc3ccc[nH]3', 'smiles:-a'), 'smiles:a_day') from dual;



With the conversion functions/operators (jc_molconvert, jc_standardize, jc_evaluate_x), you will lose no functionality when using functions instead of operators anyway.

ChemAxon a3d59b832c

22-06-2006 06:14:40

Regarding aromaticity:





I believe that the rings that are not aromatized back do not fulfil the Hückel rule of aromaticity. In the first example, the 4-membered ring is clearly antiaromatic, and in the second, the =O group withdraws an electron from the ring, so it ends up with 5 electrons.





(Dearomatization tries its best to give a sensible Kekulé structure even for incorrectly formulated aromatic rings. I think it succeeded this time.)





I checked the molecules at Daylight's page, too:


http://www.daylight.com/daycgi_tutorials/depictmatch.cgi





It seems that even when the molecules are given in fully aromatic form (your starting molecules), the rings in question are depicted in Kekulé form and do not match the SMARTS pattern [a].





Kind regards,


Szabolcs

ChemAxon aa7c50abf8

22-06-2006 12:33:29

pkovacs wrote:
Or simply use functions instead of operators:


Code:
select jcf_molconvert(jcf_molconvert('Nc1ccc2c(c1)[nH]c(=O)c2=Cc3ccc[nH]3', 'smiles:-a'), 'smiles:a_day') from dual;



With the conversion functions/operators (jc_molconvert, jc_standardize, jc_evaluate_x), you will lose no functionality when using functions instead of operators anyway.
I think a brief explanation of the difference between cartridge operators and cartridge functions is in order.





While the Oracle documentation explains in detail the interfaces available to cartridge operators and to cartridge functions, it does not contain a formal comparison of the two. Nor does it contain any background information as to why they behave differently in situations where you would expect them to behave identically.





In my experience, when a cartridge operator is used in an assignment in PL/SQL, the PL/SQL runtime raises an error. This is why we provide an equivalent function for each operator.





The fact that Oracle crashes when you nest one cartridge operator inside another is clearly a bug in Oracle. If it were not, Oracle would emit some kind of error message or handle the situation "gracefully". Also, this problem seems to be limited to Oracle 9i.





The most salient difference between cartridge operators and cartridge functions is that evaluating an operator may trigger a domain index scan, whereas evaluating a function never does. When an operator is executed in domain index scan mode, a great deal of control is given to the operator's implementation. In





Code:
select count(*) from strtable where jc_contains(smiles, qrystr) = 1






jc_contains will be executed in domain index scan mode: Oracle passes a number of attributes of the execution context to our cartridge implementation (including table name, index name, the qrystr parameter of the operator) and expects a set of rowids of those rows which contain values in the "smiles" column meeting the jc_contains(...) = 1 condition. This gives our cartridge implementation a great deal of control and enables us to execute the search very efficiently on the entire table "strtable":


(a) we can execute the entire operation in Tomcat (outside the very slow Oracle JVM) with the overhead of only one single Oracle-Tomcat round-trip


(b) we can use the JChem structure cache for the search which gives us an optimized solution in terms of speed and memory consumption.
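
One way to check which evaluation mode Oracle picks for such a statement is to look at the execution plan. The following is only a sketch: it assumes that strtable has a jc_idxtype index on the smiles column, that a plan table has been set up, and it uses benzene ('c1ccccc1') as an arbitrary query structure. The exact plan output depends on the Oracle version.

Code:
explain plan for
  select count(*) from strtable where jc_contains(smiles, 'c1ccccc1') = 1;

select * from table(dbms_xplan.display);

If the operator is evaluated through the index, the plan should show a DOMAIN INDEX step rather than a full table scan.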





In contrast, cartridge functions will always be called "row-by-row". Cartridge functions will never be able to operate on an entire table in one single shot. Cartridge functions will never be (visually speaking) at the root of a query plan tree. Even though cartridge function implementations have, per the documentation, access to some information on the context in which they are being executed (information in addition to the actual parameters taken from the current row being processed, such as "This is the last row being processed in the current SQL statement"), their input will always be fed to them "row-by-row" from the result set returned by some other access method (e.g. a full table scan, a B-tree index scan, etc.). For each "row" processed by the function, a call to Tomcat has to be made.





Back to cartridge operators. Oracle's Data Cartridge Developer's Guide says this: "Operators appearing in the WHERE clause can be evaluated efficiently by performing an index scan using the scan methods provided as part of the implementation of an index type." It goes on to say: "An index scan-based evaluation of an operator is a possible candidate for predicate evaluation only if the operator occurs in a predicate which contains any of the following operators: <, <=, =, >=, > and the LIKE operator." This basically means that operators are superior to functions only if they appear in the WHERE clause as part of a predicate. In all other situations you can use operators and functions interchangeably. (Except in PL/SQL assignments, where [for some obscure reason] functions must be used.)
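
To make the distinction concrete, here is a sketch of the two forms side by side. It assumes that jc_contains has a functional counterpart named jcf_contains (following the jc_/jcf_ naming used in this thread), and it uses benzene as an arbitrary query structure.

Code:
-- operator in a WHERE predicate: eligible for a domain index scan
select count(*) from strtable where jc_contains(smiles, 'c1ccccc1') = 1;

-- function in the same position (jcf_contains is an assumed name): evaluated row-by-row
select count(*) from strtable where jcf_contains(smiles, 'c1ccccc1') = 1;

-- in a select list neither form triggers an index scan
select jc_molconvert(smiles, 'smiles:a') from strtable;
select jcf_molconvert(smiles, 'smiles:a') from strtable;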

User 8139ea8dbd

22-06-2006 16:40:40

Thanks, Peter. Your explanations of operator and function make it very clear now.

User 8139ea8dbd

22-06-2006 16:47:01

Hi, Szabolcs





I understand why the 5-membered ring is not aromatic. But I am not sure why direct aromatization of the first SMILES does not fix the problem, so that one has to dearomatize it first and then rearomatize it to get the correct structure. Could you comment on that?

ChemAxon aa7c50abf8

22-06-2006 17:19:22

A small addition to the operator vs. function discussion for completeness and clarity.





Cartridge operators can be evaluated/executed in index-scan mode and in function mode ("row-by-row"). (When part of a predicate in the WHERE clause, they will typically be evaluated in index-scan mode.) Oracle provides two different interfaces (and mechanisms) for the two evaluation modes, and a complete cartridge implementation provides an implementation for both.





In JChem Cartridge, the functional equivalents of operators (jcf_...) and the operators (jc_...) in function-mode evaluation ultimately share the same implementation core. (There is no requirement in the Oracle docs to do so, but it seems logical.) The only difference on the implementation level between the execution of, say, jc_molconvert() and jcf_molconvert() is that jcf_molconvert() explicitly sets to null those parameters of the implementation core that otherwise (in the case of function-mode execution of jc_molconvert()) are set by Oracle to some value and are meant to convey information on the context in which the operator is executing.





There are cases where the choice between an operator and a function does make a difference. Let's take the following statement:





Code:
select max(jc_molweight(smiles)) from strtable;






For this statement, the optimizer will most probably select a plan whereby strtable will be accessed using full-table scan and jc_molweight will be called once for each row in the table.





Assume that strtable has a jc_idxtype index on the smiles structure column. One of the extra pieces of information provided to the function-mode implementation of jc_molweight for each row processed is the rowid of the current row of strtable. During indexing with jc_idxtype, molweight values for the structures in the smiles column were precomputed and stored in the index table for strtable.smiles (with the rowids of strtable as the primary key of the index table). Using the rowid of the current row, jc_molweight can retrieve and simply return the precomputed value from the index table -- instead of computing the molweight on-the-fly.





If strtable.smiles is not indexed, the rowid parameter passed to the callback interface by Oracle will be null and the molweight will be computed on-the-fly, using the value in the smiles column of the current row.
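
For reference, a sketch of how such an index might be created. The index name is hypothetical, and depending on the installation the index type may need to be qualified with the schema that owns the cartridge.

Code:
-- jcx_strtable_smiles is a hypothetical index name
create index jcx_strtable_smiles on strtable(smiles) indextype is jc_idxtype;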





While the above SQL statement takes less than 3 seconds to execute on my machine (assuming that strtable contains 1k smiles), the statement





Code:
select max(jcf_molweight(smiles)) from strtable;






takes more than 14 seconds to execute. Apparently, retrieving the molweight by rowid from the index table is much faster than sending a job to JChemStreams in Tomcat to do it.
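
A comparison like this can be repeated in SQL*Plus with timing enabled. This is only a sketch: the absolute numbers depend on the machine, the table size, and whether strtable.smiles is indexed.

Code:
set timing on

-- operator: can return the precomputed value from the index table
select max(jc_molweight(smiles)) from strtable;

-- function: the molweight is computed for every row
select max(jcf_molweight(smiles)) from strtable;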





jc_molformula is another operator which is faster than its functional counterpart -- because the formula is also precomputed during indexing.

User 8139ea8dbd

22-06-2006 17:38:46

Hi, Peter





That's an excellent point.

ChemAxon a3d59b832c

22-06-2006 20:49:51

yzhou wrote:
I understand why the 5-membered ring is not aromatic. But I am not sure why direct aromatization of the first SMILES does not fix the problem, so that one has to dearomatize it first and then rearomatize it to get the correct structure. Could you comment on that?
In the past we were actually discussing this internally. There are several reasons:





1. (Conceptual reason:) It seems to be misleading if a function called aromatization removed some aromatic bonds, even if they were not correctly formulated.


2. (Practical reason:) It is more efficient to not call dearomatization inside aromatization.


3. (Flexibility of representation:) If someone wants to represent a ring in aromatic form which is not considered aromatic by either of our methods, then aromatization may ruin this information. For example, an antiaromatic transition state may be represented in aromatic notation for theoretical studies, although it is not a stable form.





We may introduce new aromatization options which would fix aromatization as you suggest. Do you think it would be useful for you?





Best regards,


Szabolcs

User 8139ea8dbd

22-06-2006 21:09:26

I think an option for the aromatization routine to check and fix the structure will be very useful.





If one relies on aromatization as a way to standardize compounds before registration, one would expect jc_standardize(smiles, 'config:aromatize') to "standardize" the input. I agree that you have some good points for keeping jc_molconvert the way it is, but it seems conceptually unexpected for jc_standardize to behave the same way.





I will need to fix our registration code. An option will be beneficial, even if it's not the default option. Thanks.

ChemAxon a3d59b832c

23-06-2006 08:26:55

OK, we will add an option to the "aromatize" standardizer action for this. In the meantime, you can use action "dearomatize" before "aromatize":





http://www.chemaxon.com/jchem/doc/user/StandardizerConfiguration.html#dearomatizesec





For example: "dearomatize..aromatize:d"
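
In the cartridge this could look like the following sketch. It assumes that the action string can be passed through the same 'config:' option that was used with jc_standardize earlier in this thread; that combination has not been verified here.

Code:
select jc_standardize('Nc1ccc2c(c1)[nH]c(=O)c2=Cc3ccc[nH]3', 'config:dearomatize..aromatize:d') from dual;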





Best regards,


Szabolcs

ChemAxon d76e6e95eb

23-06-2006 11:17:44

I think the solution is what Szabolcs proposed: just add a Dearomatize action right before Aromatize.

ChemAxon a3d59b832c

29-06-2006 14:00:06

Hi Yingyao,





Do you think the dearomatize solution is enough ("dearomatize..aromatize:d") or a new option is needed?





The two functionalities would be basically similar.





Szabolcs

User 8139ea8dbd

29-06-2006 15:42:49

It's not a problem for us, since we know what to patch in our code.





This is mainly for people who are not aware of this potential problem. Maybe you could add some instructions to the manual and patch the examples you use for the related standardizer tools. That could be sufficient without changing the API.





Thanks.

ChemAxon a3d59b832c

29-06-2006 15:48:37

OK, thank you!