Option list for jcf_Standardize PL/SQL function

User c0e481a82c

22-05-2007 15:02:20

Hi,





I'm trying to create a function which applies our internal business rules for registration using the JCF_Standardize function, to ensure that all structures are registered with functional groups, etc. in a standard format. I'm finding issues with using the outFormat option for the option list. Sometimes, depending upon how the outFormat option is used, it doesn't always get accepted, and causes the PL/SQL to fail.





Having played around with this I've noted the following:





If I use 'config:dearomatize outFormat:smiles:u', I get:





ORA-29532: Java call terminated by uncaught Java exception: java.lang.Exception: The following exception has been thrown by the servlet:


Exception: Could not read reaction: dearomatize outFormat:smiles:u





If I use 'config:aromatize outFormat:smiles:u', it works.





The code I'm using for my standardize function is:





Code:
CREATE OR REPLACE
FUNCTION F_Std_SMILES
            (pi_SMILES VARCHAR2
            )
    RETURN  VARCHAR2
IS
    vl_Reactions            VARCHAR2(4000);
    vl_StandardSMILES       VARCHAR2(4000);

BEGIN

    vl_Reactions := '[O:1][N:2]=O>>[O:1-][N+:2]=O..' ||
                    '[O:1]=[N:2]=O>>[O-:1][N+:2]=O..'||
                    '[C][N:1](=C)=[O:2]>>[C][N+:1]([O-:2])=C..' ||
                    '[C][N:1]([O:2])=C>>[C][N+:1]([O-:2])=C..' ||
                    'C[N+:1]([O:2])=C>>C[N+:1]([O-:2])=C..' ||
                    '[C][N:1](C)(C)[O:2]>>[C][N+:1]([C])([C])[O-:2]..' ||
                    '[C][N:1]([C])([C])=[O:2]>>[C][N+:1]([C])([C])[O-:2]..' ||
                    '[O:1]=[N:2]#[C:3]>>[O-:1][N+:2]#[C:3]..' ||
                    '[C:1][N:2][N+:3]#[N:4]>>[C:1][N:2]=[N+:3]=[N-:4]..' ||
                    '[O-:1][S+:2][O-:3]>>[O:1]=[S:2]=[O:3]..' ||
                    '[O:1][S:2][O:3]>>[O:1]=[S:2]=[O:3]..' ||
                    '[O-:1]-[S+:2]>>[O:1]=[S:2]..' ||
                    '[O;H1:1]-[S:2]>>[O:1]=[S:2]..' ||
                    '[N;H1,H2:1][C:2]=[C:3]>>[C:1][C:2]=[N:3]..' ||
                    '[O;H1:1][C:2]=[C:3]>>[C:1][C:2]=[O:3]';

    vl_StandardSMILES := jcf_Standardize(pi_SMILES, 'config:dearomatize..' || vl_Reactions || '..aromatize');
    vl_StandardSMILES := jcf_Standardize(vl_StandardSMILES, 'config:aromatize outFormat:smiles:u');

    RETURN vl_StandardSMILES;

END F_Std_SMILES;






Note that I've had to use two separate jcf_Standardize calls because I didn't seem to be able to combine them. When I combined them and standardized a SMILES string, then standardized the output SMILES string again, 58 of my 937 compounds had a "standardized" SMILES that differed from the "twice standardized" SMILES. Perhaps I'm doing something wrong? Any comments or help would be greatly appreciated.
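
A check of this kind can be expressed as a single query that compares one pass of the function with a second pass over its own output. The following is a minimal sketch, assuming a hypothetical table compounds(id, smiles) that is not part of the original post; it lists every row whose standardized SMILES changes when it is standardized again.

```sql
-- Assumption: compounds(id NUMBER, smiles VARCHAR2(4000)) is a hypothetical table.
-- F_Std_SMILES is the function shown in the post above.
SELECT id,
       smiles,
       once,
       F_Std_SMILES(once) AS twice
FROM  (SELECT id,
              smiles,
              F_Std_SMILES(smiles) AS once   -- first standardization pass
       FROM   compounds)
WHERE  once <> F_Std_SMILES(once);           -- second pass differs: not idempotent
```

An empty result means the function is idempotent on that data set, which is what a unique SMILES should guarantee. Rows where the function returns NULL are not reported, because the comparison is then unknown.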





Regards,





Phil.

User c0e481a82c

22-05-2007 15:15:26

I should add that we're using version 3.2.5 of the data cartridge.





Regards,





Phil.

ChemAxon aa7c50abf8

23-05-2007 12:24:15

Phil,
Quote:
ORA-29532: Java call terminated by uncaught Java exception: java.lang.Exception: The following exception has been thrown by the servlet:


Exception: Could not read reaction: dearomatize outFormat:smiles:u
I am afraid this is a bug in JChem Cartridge related to parsing the options string. It will be fixed in the next JChem version.





There is a slightly involved workaround which consists of specifying both the 'sep=' and the 'cleaningTemplate' options. The following option


Code:
'config:dearomatize outFormat:smiles:u'



could be reformulated with a no-operation cleaning template as:


Code:
'sep=! config:dearomatize!cleaningTemplate:select ''null'' from dual!outFormat:smiles:u'



This could be a temporary measure until the fix is out.


Quote:
If I use 'config:aromatize outFormat:smiles:u', it works.
I am afraid this works only by accident. I cannot say whether it behaves as expected, or whether it will always do so.





Thanks for reporting the problem and sorry for the inconvenience.





Thanks


Peter

User c0e481a82c

23-05-2007 12:41:13

Hi Peter,





I see. So is this why, when I combine the lines and it doesn't crash:





Code:



vl_StandardSMILES := jcf_Standardize(pi_SMILES, 'config:dearomatize..' || vl_Reactions || '..aromatize outFormat:smiles:u');








I get slightly different unique SMILES than if I leave them as two separate lines:





Code:



vl_StandardSMILES := jcf_Standardize(pi_SMILES, 'config:dearomatize..' || vl_Reactions || '..aromatize');


vl_StandardSMILES := jcf_Standardize(vl_StandardSMILES, 'config:aromatize outFormat:smiles:u');








It's just that, although the first example doesn't complain, it gives me different unique SMILES if I standardize the already standardized structures, than if I do the same task with the two separate lines version. Using the 2 lines, I get the same SMILES strings after two standardizations as after the first standardization, as would be expected. The only reason I noticed this was that the function I mentioned was used as a trigger upon insertion to a table of structures, and when I registered standardized structures, they were changed due to the trigger. I didn't think this should happen, as the SMILES was meant to be unique.





If I can keep using the 2 separate lines in my function, and that gives the truly unique SMILES, then I'm okay for now I think. Thanks for your help.





Regards,





Phil.

ChemAxon aa7c50abf8

23-05-2007 13:35:52

Phil,





I am not quite sure I understand why you need the second standardization. If your purpose is to obtain the unique SMILES of the transformed structure, I suggest that you either


(a) use the jcf_molconvert function:


Code:
jcf_molconvert(vl_StandardSMILES, 'smiles:u')
or


(b) modify the first jcf_standardize call to use the workaround I mentioned:


Code:
jcf_Standardize(pi_SMILES, 'sep=! config:dearomatize..' || vl_Reactions || '..aromatize!outFormat:smiles:u');
(This second option will be significantly faster.)





Three reasons:


- You don't have to aromatize the structure prior to obtaining its unique smiles representation;


- the 'config:aromatize outFormat:smiles:u' option will yield an undefined result;


- there may be a bug related to double aromatization.
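
Option (a) could be folded into a single registration function along the following lines. This is a sketch only: the function name F_Std_SMILES_Unique and the pi_Reactions parameter are hypothetical, and it assumes jcf_molconvert returns a value that fits in a VARCHAR2, as the call above suggests.

```sql
-- Hypothetical function name and parameter list, for illustration only.
CREATE OR REPLACE FUNCTION F_Std_SMILES_Unique
    (pi_SMILES    VARCHAR2,
     pi_Reactions VARCHAR2)   -- the '..'-separated reaction rules, built as in F_Std_SMILES
    RETURN VARCHAR2
IS
    vl_Standard VARCHAR2(4000);
BEGIN
    -- Single standardization pass, with 'config' as the only option.
    vl_Standard := jcf_Standardize(pi_SMILES,
                                   'config:dearomatize..' || pi_Reactions || '..aromatize');

    -- Unique SMILES comes from the conversion step, not from a second standardization.
    RETURN jcf_molconvert(vl_Standard, 'smiles:u');
END F_Std_SMILES_Unique;
/
```

Because 'config' is the only option passed to jcf_Standardize here, the call stays clear of the option-parsing bug described earlier in the thread.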





To rule out a possible double-aromatization problem, we would need a sample structure with which the problem can be reproduced. (Depending on the confidentiality level of the structure, you can post it here, send it to us by e-mail or, of course, not share it with us at all.)





Thanks


Peter

ChemAxon aa7c50abf8

23-05-2007 13:48:52

PS:
Quote:
I see. So is this why, when I combine the lines and it doesn't crash:





Code:
vl_StandardSMILES := jcf_Standardize(pi_SMILES, 'config:dearomatize..' || vl_Reactions || '..aromatize outFormat:smiles:u');
It doesn't crash because, due to the bug I mentioned, JChem Cartridge passes the entire option string 'dearomatize..' || vl_Reactions || '..aromatize outFormat:smiles:u' to the core JChem Standardizer as the standardization configuration. I suspect that the core standardizer evaluates the last portion of the string, 'aromatize outFormat:smiles:u', as one single rule, checks whether it starts with the 'aromatize' string, and ignores the 'outFormat...' part. Consequently, this statement will not necessarily yield a unique SMILES.

User c0e481a82c

23-05-2007 13:52:44

Hi Peter,





The point is that if I do:


Code:



jcf_Standardize(pi_SMILES, 'config:dearomatize..' || vl_Reactions || '..aromatize outFormat:smiles:u');








The structures I get back are changed/standardized. I'm storing these standard structures in a table. I then register these standardized structures against my main table, which has a trigger on it that performs the standardization again (hence, the procedure is being applied twice to these structures). The issue I have is that the structures which have had the above standardize function run twice on them change after the second time. If I apply the two jcf_Standardize calls separately, as in the code I sent you, this doesn't happen; the structures registered after a second standardization are the same as after the first standardization. I would suggest that there is a bug there.





I agree that I don't need the second jcf_standardize, and I can use jcf_molconvert instead. But why doesn't it work "all in one", as it were (as you suggested)? The scary thing is, I've a database of just under 2.2M structures which I need to apply this routine to, to ensure I've not got any duplicate structures in there (standardization appears to have changed a great deal since we implemented this database; originally it was version 3.1.5, and I note that I now get different unique SMILES with version 3.2.5 than I did with 3.1.5). As my unique SMILES are changing, I'm worried I'm going to have to deal with a number of compounds which have been registered as distinct but are in fact duplicates, and which weren't spotted because the unique SMILES weren't unique. But perhaps using jcf_molconvert in addition to jcf_Standardize will be the answer.
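
A duplicate check of the kind described here could be sketched as a grouping query over the standardized SMILES. The table compounds(id, smiles) is a hypothetical stand-in for the real structure table, and F_Std_SMILES stands for whichever standardization function is finally adopted.

```sql
-- Assumption: compounds(id NUMBER, smiles VARCHAR2(4000)) is a hypothetical table.
SELECT std_smiles,
       COUNT(*) AS n_registered,
       MIN(id)  AS first_id              -- one representative row per duplicate group
FROM  (SELECT id,
              F_Std_SMILES(smiles) AS std_smiles
       FROM   compounds)
GROUP  BY std_smiles
HAVING COUNT(*) > 1                      -- more than one row shares the same standardized SMILES
ORDER  BY n_registered DESC;
```

On a table of this size the function is evaluated once per row, so storing the standardized SMILES in a column first and grouping on that column would likely be more practical than calling the function inside the query.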





Thanks for your help.





Phil.

User c0e481a82c

23-05-2007 13:55:47

Hi Peter,





Just got your PS after I sent my reply! I see, so that explains that. So for now, I should use jcf_Standardize followed by jcf_molconvert, or specify a separator of "!" to avoid the bug. Got it.





Thanks for your help.





Regards,





Phil.

ChemAxon aa7c50abf8

23-05-2007 14:02:12

Quote:
followed by jcf_molconvert, or specify a separator of "!" to avoid the bug.
Yes, either add jcf_molconvert or use the separator plus the dummy cleaningTemplate option. (The separator plus the dummy cleaning template option is just a temporary workaround, but it will be faster than having an extra jcf_molconvert call.)
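
Put together, the single-call variant might look like the sketch below. It is untested and combines the two option strings given earlier in the thread; the function name F_Std_SMILES_OneCall and the pi_Reactions parameter are hypothetical.

```sql
-- Untested sketch; function name and parameter list are hypothetical.
CREATE OR REPLACE FUNCTION F_Std_SMILES_OneCall
    (pi_SMILES    VARCHAR2,
     pi_Reactions VARCHAR2)   -- the '..'-separated reaction rules, built as in F_Std_SMILES
    RETURN VARCHAR2
IS
BEGIN
    -- 'sep=!' declares '!' as the option separator, so the blank after it is
    -- the last space-separated piece of the option string.
    RETURN jcf_Standardize(
        pi_SMILES,
        'sep=! config:dearomatize..' || pi_Reactions || '..aromatize'
            || '!cleaningTemplate:select ''null'' from dual'   -- no-operation cleaning template
            || '!outFormat:smiles:u');                         -- request unique SMILES output
END F_Std_SMILES_OneCall;
/
```

Running the result through the same function a second time and confirming that nothing changes is a quick way to check that the options were parsed as intended.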





Thanks


Peter

ChemAxon aa7c50abf8

27-06-2007 14:51:40

JChem 3.2.7 has been released with the fix for the problem "jc(f)_standardize doesn't work when options are specified in addition to 'config'"