Search results

User b60e1d3756

25-06-2010 10:42:17

Hello,


I am searching the DrugBank database for identical compounds using JChemSearch.


The problem is that when I set STEREO_EXACT, these pairs of compounds are found to be similar:




DB01687,DB02061


DB01687,DB02743


DB01687,DB03323


DB01687,DB04465




When creating the structure table into which I imported the compounds from DrugBank, I used this standardizer configuration:



<Sgroups ID="Ungroup" Act="Ungroup"/>


<Tautomerize ID="Tautomerize"/>


<Aromatize ID="aromatize"/>


<RemoveExplicitH ID="RemoveExplicitH"/>


If any further information is needed please let me know.


With the best wishes,


Albina


ChemAxon a3d59b832c

28-06-2010 09:46:29

Hi Albina,


 


Could you attach the structures as well?


 


Thanks,


Szabolcs

User b60e1d3756

29-06-2010 07:58:13

Hello,


these are the examples

ChemAxon a3d59b832c

03-07-2010 14:32:01

I am sorry for the late answer.


 


I am a little bit confused by these test data.


The molecules contain no stereo information, so it is correct that they are recognized as duplicates.


 


However, some fields of the SD file suggest that those structures should have stereo information.


(For example, generic name, IUPAC name, isomeric SMILES.) Other fields, such as InChI and canonical SMILES, again contain no stereo information.


 


Best regards,


Szabolcs

User b60e1d3756

06-07-2010 11:38:51

Dear Szabolcs,


I am sorry, I am bad at explaining. The pairs are found to be similar only when I use STEREO_EXACT; when I set STEREO_IGNORE, they are not found to be similar.


The same happens with other search settings. What I actually do is try to merge the compounds (find identical ones and place them on one line of the output file) with different settings:

1. First I created a file of structural correspondences where I searched with stereochemistry on and the other parameters (isotope, radical and charge) ignored.
2. Then I created a file where all the parameters are ignored.

I then compare these two files. Logically, all the correspondences from the first file should also be in the second file, but this is not the case. Some of the missing correspondences are the ones I gave here, and I have more examples. Maybe I am doing something wrong?


With the kindest wishes,


Albina
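
The comparison described in this post (every correspondence found by the stricter search should also appear in the more lenient one) can be checked mechanically. The following is only a minimal sketch in plain Java, assuming each output file has been reduced to lines of the form `ID1,ID2`; the class and method names are hypothetical, and the lenient list in `main` is made up for illustration.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class CorrespondenceCheck {

    /** Order-independent key for a pair of compound IDs, e.g. "DB01687,DB02061". */
    static String pairKey(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "," + b : b + "," + a;
    }

    /** Parses lines of the form "ID1,ID2" into a set of order-independent pair keys. */
    static Set<String> parsePairs(List<String> lines) {
        Set<String> pairs = new LinkedHashSet<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] ids = trimmed.split(",");
            if (ids.length != 2) {
                throw new IllegalArgumentException("Expected ID1,ID2 but got: " + line);
            }
            pairs.add(pairKey(ids[0].trim(), ids[1].trim()));
        }
        return pairs;
    }

    /** Returns the pairs reported by the strict search but not by the lenient one. */
    static Set<String> missingFromLenient(Set<String> strict, Set<String> lenient) {
        Set<String> missing = new LinkedHashSet<>(strict);
        missing.removeAll(lenient);
        return missing;
    }

    public static void main(String[] args) {
        // IDs are taken from the thread; the content of the lenient list is illustrative only.
        Set<String> strict = parsePairs(List.of("DB01687,DB02061", "DB01687,DB02743"));
        Set<String> lenient = parsePairs(List.of("DB02061,DB01687"));
        System.out.println("Missing from lenient search: " + missingFromLenient(strict, lenient));
    }
}
```

Any pair printed here is one that the strict search reported and the lenient search did not, which is the situation that should not occur and is worth inspecting by hand.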

ChemAxon a3d59b832c

07-07-2010 11:31:24

Dear Albina,


 


I am sorry, but I am still confused.


Is it correct that you insert those structures into a JChem Base table, and then search the table one by one using duplicate search plus several other options?


The standardizer configuration is attached to the database table, correct?


(So it is not applied to the input directly, before inserting the structures.)


 


Is your input the structure in the SDF, or one of the fields?


 


Do you perhaps have some code excerpts or scripts showing what you are doing?


 


Thanks,


Szabolcs

User b60e1d3756

07-07-2010 13:20:03

I use the Java API to create a table and import the SDF. A standardizer is applied to each structure when importing a molecule (I do it one by one). My input structure is the SDF. I perform a FULL search with different parameters.


searcher.setConnectionHandler(ch);
searcher.setQueryStructure(mol);

// Cap the hit count to avoid too many matches for small structures with R-groups;
// can be changed for different databases.
searchOptions.setMaxResultCount(maxResCount);
searchOptions.setSearchType(searchType);
searchOptions.setStereoSearchType(stereo);
searchOptions.setChargeMatching(charge);
searchOptions.setIsotopeMatching(isotop);
searchOptions.setRadicalMatching(radical);
searchOptions.setVagueBondLevel(vaguebond);
// to merge R-groups only with single atoms
searchOptions.setUndefinedRAtom(SearchConstants.UNDEF_R_MATCHING_UNDEF_R);

searcher.setSearchOptions(searchOptions);
searcher.setStructureTable(tblName);
searcher.setRunMode(JChemSearch.RUN_MODE_SYNCH_COMPLETE);

//System.out.println("start search " + mol.getName());
searcher.run();
           


 

User b60e1d3756

07-07-2010 13:22:52

Sorry, I pressed post before I finished formatting.


In the code above, stereo, charge and isotop can each be ON or IGNORE.


If any other information is needed, please let me know. Sorry for taking so much of your time.


Albina

ChemAxon a3d59b832c

07-07-2010 16:01:19

"a standardizer is applied to each structure when importing a molecule (I do it one by one)."


OK, then I assume that the same standardization is performed on the query as well before searching, right?


 


"I perform FULL search with different parameters."


That should be OK, but it is probably better to run a duplicate search to decide molecule equality.


See: http://www.chemaxon.com/jchem/doc/dev/dbconcepts/index.html#searchtypesandoptions


Duplicate search also supports all those options.


 


"My input structure is SDF."


I assume that means the structure of the SDF, and not a field.


But the strange thing that I still don't understand is that your SDF input does not contain any stereochemistry. So no matter which value you pass to setStereoSearchType, those structures should be found, i.e. treated as equal.


(See the picture below. It is your file opened in Marvin View. I just renamed .txt to .sdf.)

User b60e1d3756

08-07-2010 10:06:30

Yes, everything is correct.


We had a discussion about using FULL or DUPLICATE search:


"Unfortunately, with the "duplicate search uses tautomers" table option, stereo = off search option is not available." (https://www.chemaxon.com/forum/ftopic6243.html)


This is exactly the problem. The compounds should not be affected in any way by the stereochemistry settings, but they are still not found.


Did you try to perform the search with JChemSearch? Did it work for you?


With the kindest wishes,


Albina

ChemAxon a3d59b832c

13-07-2010 13:43:18

Hi Albina,


I could not reproduce the problem. Please see the attached program and output.


Best regards,


Szabolcs

User b60e1d3756

15-07-2010 16:04:43

I have found my mistake. Sorry for taking your time. I think this is a classic on forums (((( Sorry again.

ChemAxon a3d59b832c

16-07-2010 06:58:35

No problem. I am happy that this is solved.


 


Best regards,


Szabolcs

User b60e1d3756

26-07-2010 18:12:18

Dear Szabolcs,


I now have another question. I hope it will not take much of your time.


When I search my database for the compound C02579 with the same program and stereo_ignore, it also returns the entries DB01982, DB04303 and C03033 (in the attachment). They are very different. I will later merge the hits into one entry of my output, and I want only identical compounds in each entry, so DB01982 and DB04303 should not end up together. I have already set searchOptions.setUndefinedRAtom(SearchConstants.UNDEF_R_MATCHING_UNDEF_R);


What should I do if I don't want to receive DB01982 and DB04303 when searching with C02579 or C03033? I don't want polymers to be matched against non-polymers, or R-groups to be matched against other atoms.


With the best regards,


Albina

ChemAxon 42004978e8

27-07-2010 05:53:11

Hello Albina,


I checked the structures you've sent.


Except for DB01982, the other 3 structures match only themselves. With full search, even DB01982 matches only itself. I tried the stereo-specific and stereo-ignore search options, and the full and substructure search types. The DB table was of type "any structures". What were your settings? The previously described options?


Did you search the other 3 structures? Could you verify the search options and the type? What table type did you use?


Bye,


Robert

User b60e1d3756

27-07-2010 09:04:10

Hello


Thank you very much for the reply, Robert!


I suppose the problem is with C02079. I attach what the structure looks like after standardization. I use 2 standardizers: one for the structure itself (aromatize, tautomerize, removeExplicitH) and another for eliminating solvents (taken from the ChemAxon documentation).


I perform a FULL search with:


JChemSearchOptions.STEREO_IGNORE;
JChemSearchOptions.CHARGE_MATCHING_IGNORE;
JChemSearchOptions.ISOTOPE_MATCHING_IGNORE;
JChemSearchOptions.RADICAL_MATCHING_IGNORE;


For C02079 I get DB01982, DB04303 and C03033. Searching with DB01982, DB04303 or C03033 as the query returns only the compound itself.


With the kindest wishes,


Albina
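
The asymmetry reported in this post (C02079 retrieves three other entries, while each of those retrieves only itself) can be flagged automatically before merging hits into one output entry. The sketch below is plain Java and makes no JChem calls; it assumes the search results have already been collected into a map from query ID to the set of hit IDs, and the class and method names are hypothetical. An asymmetric hit is only a hint that the query is being interpreted differently from the stored structure (for example through R-groups or lost polymer brackets), not proof of an error.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class AsymmetricHits {

    /**
     * hits maps each query ID to the IDs its search returned.
     * Returns sorted "query -> hit" strings where searching with hit did not return query.
     */
    static List<String> findAsymmetric(Map<String, Set<String>> hits) {
        List<String> asymmetric = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : hits.entrySet()) {
            String query = entry.getKey();
            for (String hit : entry.getValue()) {
                if (hit.equals(query)) {
                    continue; // a structure matching itself is expected
                }
                Set<String> reverse = hits.getOrDefault(hit, Set.of());
                if (!reverse.contains(query)) {
                    asymmetric.add(query + " -> " + hit);
                }
            }
        }
        Collections.sort(asymmetric);
        return asymmetric;
    }

    public static void main(String[] args) {
        // Results as reported in the thread for the four entries.
        Map<String, Set<String>> hits = new TreeMap<>();
        hits.put("C02079", Set.of("C02079", "DB01982", "DB04303", "C03033"));
        hits.put("DB01982", Set.of("DB01982"));
        hits.put("DB04303", Set.of("DB04303"));
        hits.put("C03033", Set.of("C03033"));

        for (String pair : findAsymmetric(hits)) {
            System.out.println("One-way match, inspect before merging: " + pair);
        }
    }
}
```

Merging only pairs that match in both directions would keep entries such as DB01982 and DB04303 apart even while the cause of the one-way match is still being investigated.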


 

ChemAxon 42004978e8

28-07-2010 08:04:54

Hello Albina,


I tried the same standardization steps that you described but couldn't get the same standardized result. It looks quite strange to me that the polymer brackets are lost. I got different results with the same actions. Could you send me the standardization configuration that you use (XML files)?


Please also tell me which JChem version you use.


Thanks,


Robert

User b60e1d3756

28-07-2010 08:26:14

Hello Robert,


I have attached the 2 standardizers. When importing the molecules I read them one by one:


mol = importer.read();
try {if (mol.getFragCount() > 1 && mix_standardizer != null) mix_standardizer.standardize(mol);} catch(Exception e) {Logger.getLogger("Database").info("Failed to mstandardize: " + compID);}
try {if (standardizer != null) standardizer.standardize(mol);} catch (Exception e) {Logger.getLogger("Database").info("Failed to standardize: " + compID);}


molString = mol.toFormat("mol");
uh.setStructure(molString);
try { uh.execute(); } catch(Exception e) {Logger.getLogger("Database").info("Error reading: " + compID);}


// here transport other fields from sdf file into other SQL tables


Maybe it can be optimized somehow, but that is for later. First I want to find out what is wrong. If I am not mistaken, the version is 5.3.4.


When searching with the standardized structure, did you receive all the other structures (DB01982, DB04303, C03033) as a result?


Thank you very much for helping me!


Albina

User b60e1d3756

28-07-2010 09:26:22

Hello Robert,


I checked this myself. For me the brackets are lost after the structure standardizer. I think the line:


    <Sgroups ID="Ungroup" Act="Ungroup"/> 


is responsible for the loss of brackets. Is it the same for you? Should it work like that?


Albina

ChemAxon 42004978e8

28-07-2010 13:48:14

Hi,


 


No, polymers are kept for me with version 5.3.4 after standardizing with your XML files.


I still have the same search results.


In full search, every structure retrieves only itself. In substructure search, the third retrieves all 4 structures and the rest find only themselves.


What table type are you using?


Robert

User b60e1d3756

29-07-2010 11:12:49

Hello,


I haven't changed the table type, so I suppose it is Molecules. Where can I check it?


I printed out the molecule in mol format after each step. The lines of the mol file that describe the brackets disappeared after the structure standardizer. When I removed the line with Ungroup from the standardizer, the lines remained and the imported molecule contained brackets.


I will be looking closely at the results over the next few days. Maybe I will find something.


Thank you for spending time on my problem.


With the kindest wishes,


Albina

ChemAxon 42004978e8

30-07-2010 07:11:34

Hi,


I'm checking when the ungroup behaviour was changed.


However, this alone won't solve the problem. I searched the DB with your standardization result, but it didn't retrieve the other 3 structures.


Bye,


Robert

ChemAxon 42004978e8

30-07-2010 12:32:03

Hi Albina,


The ungroup behaviour was changed in version 5.3.2; since then, ungroup doesn't delete polymer groups.


Please recheck your version, and the version with which you standardized the files (if not the same). I'll then check the matching behaviour with it.


Bye,


Robert

User b60e1d3756

04-08-2010 14:19:53

Hello,


Thank you for helping so much. So far it seems that structure search works properly.


I have a question on performance. As I wrote before, I import molecules one by one, first standardizing them:
try {if (mol.getFragCount() > 1 && mix_standardizer != null) mix_standardizer.standardize(mol);} catch(Exception e) {Logger.getLogger("Database").info("Failed to mstandardize: " + compID);}
try {if (standardizer != null) standardizer.standardize(mol);} catch (Exception e) {Logger.getLogger("Database").info("Failed to standardize: " + compID);}
then transforming them into mol format to import with UpdateHandler:
molString = mol.toFormat("mol");
uh.setStructure(molString);
uh.execute();
and then extracting information from some fields to insert it into other SQL tables:
String synonyms = mol.getProperty("synonyms");
fillSynonyms(synonyms);


Importing the SDF file this way takes a long time. Is there any possibility to speed it up?
I was thinking of using the importer, but the problem is the mix_standardizer (which eliminates solvents from mixtures). With the importer I cannot check the condition mol.getFragCount() > 1, which is important because some molecules can themselves be a solvent from the mix_standardizer list. Also, I am not sure that it will be faster or that I will be able to access the "synonyms" field.


With the kindest wishes,
Albina

ChemAxon 42004978e8

05-08-2010 08:59:10

Hi Albina,


 


First try to find out which part is the bottleneck: the standardization + import, or the property string processing. (Comment out the first, then the second, and measure the times.)


If the import is the slower part, then you could use the importer as you mentioned. Standardization is not a problem: you can achieve it by creating the relevant table with your configuration file (put aromatization, ungroup and solvent removal in one standardization configuration XML file; you can create this with the Standardizer GUI). In this case every imported structure will have this standardization executed. I guess this is acceptable and desirable for you, isn't it?


Bye,


Robert
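
As an alternative to commenting out each part in turn, the two stages can be timed separately in a single run. This is a minimal sketch in plain Java, not part of the JChem API: `StageTimer` and `busyWork` are hypothetical names, and the lambdas in `main` are stand-ins for the real per-molecule calls (standardize plus `uh.execute()`, and the `synonyms` processing).

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class StageTimer {
    private final Map<String, Long> totalNanos = new LinkedHashMap<>();

    /** Runs the work once and adds its elapsed time to the running total for that stage. */
    void time(String stage, Runnable work) {
        long start = System.nanoTime();
        try {
            work.run();
        } finally {
            totalNanos.merge(stage, System.nanoTime() - start, Long::sum);
        }
    }

    /** Accumulated nanoseconds per stage, in first-use order. */
    Map<String, Long> totals() {
        return totalNanos;
    }

    /** Name of the stage with the largest accumulated time, or null if nothing was timed. */
    String slowest() {
        return totalNanos.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse(null);
    }

    // Stand-in workload; the sink keeps the loop from being optimized away.
    static long sink;

    static void busyWork(int iterations) {
        long sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += i % 7;
        }
        sink += sum;
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer();
        for (int i = 0; i < 1000; i++) {
            // Replace these stand-ins with the real per-molecule steps.
            timer.time("standardize + import", () -> busyWork(20_000));
            timer.time("property processing", () -> busyWork(2_000));
        }
        timer.totals().forEach((stage, nanos) ->
                System.out.println(stage + ": " + nanos / 1_000_000 + " ms"));
        System.out.println("Slowest stage: " + timer.slowest());
    }
}
```

Accumulating totals over the whole file, rather than timing a single molecule, smooths out JIT warm-up and database latency spikes, so the comparison between the two stages is more trustworthy.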