Overlap analysis - ChemAxon Forum Archive

User b60e1d3756

17-03-2010 10:10:41

Hello,

Hope this will be useful.

I have the KEGG Drug database and want to see its overlap with itself. First, I get several exceptions on polymers:

Error: Unexpected error during search for structure 93
java.lang.IllegalArgumentException: Illegal range format: m
for D00101, D00304 and D00361, and it doesn't put zero as the number of hits for those compounds. After those exceptions it carries on searching until it reaches a compound with an R-group:

Undefined R-atoms around stereo centers are not allowed in query.
for example, entries D00804, D01063, D02268 and D04037.

After that it stops.

There are too many such compounds in the databases I use. I cannot eliminate those by hand.

So what I did next:

I eliminated the polymers from my database. Then it worked well, without complaining about R-groups.

Then I took only the polymers (I created a new table containing only the polymers eliminated in the first step) and searched the FULL database with them. It threw exceptions on R-groups, but fewer this time; I eliminated those and got the results.

For the results with polymers, I should have got the entry itself as one of the hits (an entry from the Polymer table should match itself in the Full table). For most compounds this was the case, but not for some: D05566, D05567, D05914 and D06248.
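Checking for missing self-matches by eye does not scale to a whole database. A minimal sketch, assuming the overlap results have been exported as a map from each polymer entry's KEGG ID to the KEGG IDs of its hits in the full table (the export format and the class name are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: 'hits' maps each query entry's ID (e.g. its KEGG ID)
// to the IDs of the target rows that overlap analysis reported for it.
class SelfMatchCheck {
    // Returns the query IDs whose own ID is absent from their hit list,
    // i.e. entries that failed to find themselves in the full table.
    static List<String> missingSelfMatches(Map<String, Set<String>> hits) {
        List<String> missing = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : hits.entrySet()) {
            if (!entry.getValue().contains(entry.getKey())) {
                missing.add(entry.getKey());
            }
        }
        return missing;
    }
}
```

Because the polymer rows were copied into a new table, their internal row IDs likely differ from those in the full table, so the check should compare a shared identifier such as the KEGG ID.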

I hope I explained everything clearly. If not, please let me know. I am using the Cartridge for my Master's thesis, and I am one of the first to benefit from its improvements.

ChemAxon fa971619eb

17-03-2010 14:02:31

Can you provide a link to the KEGG dataset involved so that we can investigate?

Tim

User b60e1d3756

18-03-2010 09:37:37

I think it's this one (attached).

ChemAxon fa971619eb

18-03-2010 12:06:41

Thanks for that data.

We have tracked down the problem, and Overlap Analysis will now survive these sorts of errors and continue. This fix will be available in the next release.
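The behaviour described here (record zero hits for a structure whose search fails, keep the error, and move on instead of stopping) can be sketched as below. This illustrates the pattern only and is not IJC's actual code; `search` is a hypothetical stand-in for a single structure search.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch only: 'search' is a hypothetical placeholder for one structure
// search, not an IJC or JChem API.
class TolerantOverlap {
    static Map<String, Integer> countHits(List<String> queryIds,
                                          Function<String, List<String>> search,
                                          List<String> failed) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String id : queryIds) {
            try {
                counts.put(id, search.apply(id).size());
            } catch (RuntimeException ex) {
                // Record zero hits and remember the failure instead of aborting the run.
                counts.put(id, 0);
                failed.add(id + ": " + ex.getMessage());
            }
        }
        return counts;
    }
}
```

Exceptions such as the `IllegalArgumentException` reported above are unchecked, so catching `RuntimeException` per structure is enough to keep the loop going.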

We will also try to track down the root cause of the failed searches.

Tim

User b60e1d3756

13-04-2010 11:42:04

Hello,

I would like to access overlap analysis through API. Is it possible?

Another question: is there a link on how to get started with the API in Java?

best regards,

Albina

ChemAxon fa971619eb

13-04-2010 16:16:29

Yes, it might be possible to run overlap analysis using the API. Are you wanting to do this because you have lots of different data sets (structure tables) that you want to compare?

If you could give us a brief idea of what you are wanting to achieve then we can look into how it might be done.

For an idea of how to get a connection to an IJC database using the API see this forum topic:

https://www.chemaxon.com/forum/ftopic6029.html

Tim

User e34a92cce5

16-04-2010 16:14:43

Hi,

I have a list of SMARTS patterns that are potentially considered reactive filters. I would like to run them against my main compound database table to see which compounds contain these reactive filters. I cannot do a substructure search using the filter table as my query and the compound table as my target, since that would essentially give me the list of compounds that hit each filter within the filter table. Now, if I choose my compound table as the query and the filter table as the target, will a superstructure search yield the desired result, where I get the list of filters that each compound carries?

My desired result is "Compound 12345 carries Filter 123", reported in the compound table, not the other way around (i.e. "Filter 123 is present in Compound 12345"). Am I making sense?

ChemAxon fa971619eb

16-04-2010 16:26:14

Yes, this is an interesting scenario, and one that overlap analysis is the ideal solution to.

I think you need to:

1. Import your SMARTS strings into a structure table. Make sure you select the 'Query structures' table type.

2. Start overlap analysis and choose your compound table as the query and the SMARTS table as the target table. Choose Superstructure as the search type.

The output in each row of the compounds table should be the IDs of all the rows in the SMARTS table that match the structure.
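In other words, a superstructure run with a compound as the query finds the filters that are substructures of that compound, so it should report the same pairs as a substructure run from the filter side, just grouped the other way round. Later posts in this thread report cases where the two runs disagree, so treat this as a sketch of the intended relationship rather than a guarantee. It regroups "filter → matching compounds" results into "compound → filters it contains" (class and method names are hypothetical):

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: turns the output of a substructure run (filters as
// queries) into the per-compound view a superstructure run is meant to give.
class InvertHits {
    static Map<String, TreeSet<String>> invert(Map<String, List<String>> filterToCompounds) {
        Map<String, TreeSet<String>> compoundToFilters = new TreeMap<>();
        filterToCompounds.forEach((filter, compounds) -> {
            for (String compound : compounds) {
                // Each (filter, compound) hit is filed under the compound instead.
                compoundToFilters.computeIfAbsent(compound, k -> new TreeSet<>()).add(filter);
            }
        });
        return compoundToFilters;
    }
}
```

Compounds that match no filter simply do not appear in the inverted map.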

Let us know how it goes.

Tim

User e34a92cce5

16-04-2010 16:44:55

Thanks, Tim. I did that, and wanted to check whether that is what superstructure search is meant for. I found some conflicts when I compared the results of this search with those of a jcsearch run on the command line. I used this command-line script:

(for /F "" %i in (filters.smarts) do jcsearch -q "%i" -t:s -f smarts:TCD_ID Compounds.sdf) > filters_actives_all.csv

For the most part, it looks like they match, but there are certainly a few that don't.

Also, while trying to connect to an Oracle DB, IJC gives me the following error.

Error creating bean with name 'dbSchemaPersistence' defined in class path resource [META-INF/spring/services-template.xml]: Cannot resolve reference to bean 'dbSchemaManager' while setting constructor argument; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'dbSchemaManager': Autowiring of fields failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private com.im.ijcs.api.ddl.JChemTableManager com.im.df.impl.db.DBSchemaManager.tableManager; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'tableManager' defined in URL [jar:file:/C:/Program%20Files/ChemAxon/InstantJChem/instantjchem/modules/com-im-df-server-local.jar!/META-INF/spring/server-duplicate-temporary-context.xml]: Instantiation of bean failed; nested exception is org.springframework.beans.factory.BeanDefinitionStoreException: Factory method [public static com.im.ijcs.impl.ddl.JChemTableManagerImpl com.im.ijcs.impl.ddl.JChemTableManagerImpl.create(com.im.commons.db.DatabasePlatformX)] threw exception; nested exception is java.lang.NullPointerException

Is this a license issue? I recall being able to connect with earlier releases. My license is an academic license expiring on 2011-05-06, and it says server use is not allowed.

ChemAxon fa971619eb

18-04-2010 07:23:34

For the differences in the search results, the options you are giving to jcsearch are for a substructure search (t:s). To specify a superstructure search you should use t:u.

This should give the same results. If not then please give us example structures.

On the problem of connecting to Oracle: yes, you will need an IJC Enterprise license to connect to an Oracle or MySQL database. You will either need to purchase one or obtain one under the academic program. But I don't think the error is related to this. It is more symptomatic of the IJC metadata tables being inconsistent. Errors like this happen when some of the IJC_* tables and sequences are present but others are missing. To correct this, try deleting all the tables and sequences named IJC_* directly in Oracle (don't forget the sequences), and then use the 'New schema wizard' to create a new IJC schema in Oracle; this will create all the IJC metadata tables from scratch. Of course, this means you will lose your existing IJC configurations in that database, but none of your data tables.
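If there are many IJC_* objects, the DROP statements can be generated rather than typed. A minimal sketch, assuming you have already listed the object names (for example from Oracle's USER_TABLES and USER_SEQUENCES views); the helper is hypothetical, and the generated statements should be reviewed before running, since they remove the IJC configuration stored in that schema:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: builds DROP statements for IJC metadata objects from
// name lists fetched beforehand. Review the output before executing it.
class IjcCleanup {
    static List<String> dropStatements(List<String> tables, List<String> sequences) {
        List<String> sql = new ArrayList<>();
        for (String t : tables) {
            // startsWith avoids the SQL LIKE pitfall where '_' matches any character.
            if (t.toUpperCase(Locale.ROOT).startsWith("IJC_")) {
                sql.add("DROP TABLE " + t + " CASCADE CONSTRAINTS");
            }
        }
        for (String s : sequences) {
            if (s.toUpperCase(Locale.ROOT).startsWith("IJC_")) {
                sql.add("DROP SEQUENCE " + s);
            }
        }
        return sql;
    }
}
```

The prefix test is done in Java on purpose: in a SQL pattern such as LIKE 'IJC_%' the underscore is a single-character wildcard, so that pattern would also match names that merely start with "IJC" followed by any character.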

If this doesn't help, then please report the full errors from the log file (View -> InstantJChem Log File) and information about your IJC version (Help -> About Instant JChem).

Tim

User e34a92cce5

19-04-2010 20:45:41

Hi Tim,

I wanted to compare the results of my filter-based substructure search with the IJC superstructure search (since the query-target option doesn't work for my substructure search), hence the t:s option in the command-line search. I'll send you examples of compounds that gave conflicting results.

On the IJC connection issue: I deleted all tables, sequences, indexes, views and triggers, and made sure that select * from all_tables where table_name like '%IJC%' returns no rows. Then I tried to connect using the new schema interface and got this error:

Caused: java.io.IOException: Error creating bean with name 'dbSchemaPersistence' defined in class path resource [META-INF/spring/services-template.xml]: Cannot resolve reference to bean 'dbSchemaManager' while setting constructor argument; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'dbSchemaManager': Autowiring of fields failed; nested exception is org.springframework.beans.factory.BeanCreationException: Could not autowire field: private com.im.ijcs.api.ddl.JChemTableManager com.im.df.impl.db.DBSchemaManager.tableManager; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'tableManager' defined in URL [jar:file:/C:/Program%20Files/ChemAxon/InstantJChem/instantjchem/modules/com-im-df-server-local.jar!/META-INF/spring/server-duplicate-temporary-context.xml]: Instantiation of bean failed; nested exception is org.springframework.beans.factory.BeanDefinitionStoreException: Factory method [public static com.im.ijcs.impl.ddl.JChemTableManagerImpl com.im.ijcs.impl.ddl.JChemTableManagerImpl.create(com.im.commons.db.DatabasePlatformX)] threw exception; nested exception is java.lang.NullPointerException
        at com.im.ijc.core.wizards.schema.NewSchemaWizardIterator.instantiate(NewSchemaWizardIterator.java:151)
        at org.openide.loaders.TemplateWizard.handleInstantiate(TemplateWizard.java:588)
        at org.openide.loaders.TemplateWizard.instantiateNewObjects(TemplateWizard.java:409)
        at org.openide.loaders.TemplateWizardIterImpl.instantiate(TemplateWizardIterImpl.java:248)
        at org.openide.loaders.TemplateWizardIteratorWrapper.instantiate(TemplateWizardIteratorWrapper.java:161)
        at org.openide.WizardDescriptor.callInstantiateOpen(WizardDescriptor.java:1527)
        at org.openide.WizardDescriptor.callInstantiate(WizardDescriptor.java:1481)
        at org.openide.WizardDescriptor.access$1700(WizardDescriptor.java:127)
[catch] at org.openide.WizardDescriptor$Listener$2$1.run(WizardDescriptor.java:2052)
        at org.openide.WizardDescriptor$Listener$2.run(WizardDescriptor.java:2101)
        at org.openide.WizardDescriptor$7.run(WizardDescriptor.java:1413)
        at org.openide.WizardDescriptor.lazyValidate(WizardDescriptor.java:1453)
        at org.openide.WizardDescriptor.access$1300(WizardDescriptor.java:127)
        at org.openide.WizardDescriptor$Listener.actionPerformed(WizardDescriptor.java:2108)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
        at java.lang.reflect.Method.invoke(Unknown Source)
        at org.openide.util.WeakListenerImpl$ProxyListener.invoke(WeakListenerImpl.java:451)
        at $Proxy23.actionPerformed(Unknown Source)
        at javax.swing.AbstractButton.fireActionPerformed(Unknown Source)
        at javax.swing.AbstractButton$Handler.actionPerformed(Unknown Source)
        at javax.swing.DefaultButtonModel.fireActionPerformed(Unknown Source)
        at javax.swing.DefaultButtonModel.setPressed(Unknown Source)
        at javax.swing.plaf.basic.BasicButtonListener.mouseReleased(Unknown Source)
        at java.awt.Component.processMouseEvent(Unknown Source)
        at javax.swing.JComponent.processMouseEvent(Unknown Source)
        at java.awt.Component.processEvent(Unknown Source)
        at java.awt.Container.processEvent(Unknown Source)
        at java.awt.Component.dispatchEventImpl(Unknown Source)
        at java.awt.Container.dispatchEventImpl(Unknown Source)
        at java.awt.Component.dispatchEvent(Unknown Source)
        at java.awt.LightweightDispatcher.retargetMouseEvent(Unknown Source)
        at java.awt.LightweightDispatcher.processMouseEvent(Unknown Source)
        at java.awt.LightweightDispatcher.dispatchEvent(Unknown Source)
        at java.awt.Container.dispatchEventImpl(Unknown Source)
        at java.awt.Window.dispatchEventImpl(Unknown Source)
        at java.awt.Component.dispatchEvent(Unknown Source)
        at java.awt.EventQueue.dispatchEvent(Unknown Source)
        at org.netbeans.core.TimableEventQueue.dispatchEvent(TimableEventQueue.java:104)
        at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
        at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
        at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
        at java.awt.Dialog$1.run(Unknown Source)
        at java.awt.Dialog$3.run(Unknown Source)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.awt.Dialog.show(Unknown Source)
        at org.netbeans.core.windows.services.NbPresenter.superShow(NbPresenter.java:985)
        at org.netbeans.core.windows.services.NbPresenter.doShow(NbPresenter.java:1019)
        at org.netbeans.core.windows.services.NbPresenter.run(NbPresenter.java:1007)
        at org.netbeans.core.windows.services.NbPresenter.run(NbPresenter.java:115)
        at org.openide.util.Mutex$1AWTWorker.run(Mutex.java:1370)
        at java.awt.event.InvocationEvent.dispatch(Unknown Source)
        at java.awt.EventQueue.dispatchEvent(Unknown Source)
        at org.netbeans.core.TimableEventQueue.dispatchEvent(TimableEventQueue.java:104)
        at java.awt.EventDispatchThread.pumpOneEventForFilters(Unknown Source)
        at java.awt.EventDispatchThread.pumpEventsForFilter(Unknown Source)
        at java.awt.EventDispatchThread.pumpEventsForHierarchy(Unknown Source)
        at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
        at java.awt.EventDispatchThread.pumpEvents(Unknown Source)
        at java.awt.EventDispatchThread.run(Unknown Source)

ChemAxon fa971619eb

20-04-2010 11:22:31

Relating to the error connecting to Oracle, please could you provide us with:

1. Version info about IJC: open Help -> About Instant JChem and copy and paste the contents.

2. The full log file from when you try to set up the connection. The log file can be seen using View -> Instant JChem Log File. Select the whole content, paste it into a file and attach it here.

Thanks

Tim

User b60e1d3756

22-04-2010 12:30:39

Hi Tim,

Thanks for the help.

What I want to do:

I have input databases of chemicals from different online resources, and I would like to see the links between structurally similar compounds in different databases. I would like to use these links to produce a file where each line contains the IDs of the compounds that have the same structure.
I have two ideas on how to do that:
1. Read whole databases in SDF format, carry out pairwise overlap analysis and then process those overlaps to get the final file.
2. Read each molecule from one SDF file and search for similar ones in another SDF file.
Which of them will be more efficient? Maybe you have some other ideas?

With the kindest wishes,
Albina

ChemAxon fa971619eb

23-04-2010 09:59:22

This is an interesting problem, and not trivial to solve.

You certainly could read each SDF file into a separate table and then do overlap analysis for each combination of tables, but whilst this would be OK for just a few tables it would lead to a combinatorial explosion as the number of files/tables increased, and so would not really be practical, even if you were to automate some of the analysis using the API.
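To put numbers on this: comparing every pair of n tables once needs n(n-1)/2 overlap runs (twice that if each pair is also run in the other direction), whereas loading everything into one table needs a single self-overlap run per search type. For example:

```latex
\binom{n}{2} = \frac{n(n-1)}{2}, \qquad n = 5 \Rightarrow 10, \qquad n = 10 \Rightarrow 45, \qquad n = 20 \Rightarrow 190
```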

So I think the only real way of approaching this is to read all the structures into the same table and perform overlap analysis of that table against itself. Doing this in IJC is possible, but would have some limitations. You would need to import all your SD files into the same table and include the supplier info (the source name and the source code) as common fields (this can be done when importing by mapping the source code to the correct field and providing a default value for the source name). Then you could perform overlap analysis for the whole dataset, both for duplicates and for similarity. A good enhancement would be to generate an extra field that combines the source name and code, and use that as the field to output when running the overlap analysis, but this would require creating this merged field in the database first.
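Once the merged table has been run against itself (duplicate search, outputting the merged source:code field), the hit lists still have to be folded into one line per structure. A minimal sketch, assuming the results are exported as row ID → matched row IDs, together with row ID → "source:code" labels; all names here are hypothetical. A union-find structure makes the grouping transitive, so if A matches B and B matches C, all three end up on one line:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical post-processing of a self-overlap run on the merged table.
// labels: row ID -> merged "source:code" value; hits: row ID -> matched row IDs.
class DuplicateGroups {
    // Union-find lookup with path compression.
    static int find(Map<Integer, Integer> parent, int x) {
        parent.putIfAbsent(x, x);
        int root = x;
        while (parent.get(root) != root) {
            root = parent.get(root);
        }
        while (x != root) {
            int next = parent.get(x);
            parent.put(x, root);
            x = next;
        }
        return root;
    }

    static List<String> groupLines(Map<Integer, String> labels, Map<Integer, List<Integer>> hits) {
        Map<Integer, Integer> parent = new HashMap<>();
        // Merge every row with each row it was reported to match.
        for (Map.Entry<Integer, List<Integer>> e : hits.entrySet()) {
            for (int other : e.getValue()) {
                int a = find(parent, e.getKey());
                int b = find(parent, other);
                parent.put(a, b);
            }
        }
        // Collect labels per group; the TreeMap keeps IDs in order within a line.
        Map<Integer, List<String>> groups = new HashMap<>();
        for (Map.Entry<Integer, String> e : new TreeMap<>(labels).entrySet()) {
            groups.computeIfAbsent(find(parent, e.getKey()), k -> new ArrayList<>()).add(e.getValue());
        }
        List<String> lines = new ArrayList<>();
        for (List<String> group : groups.values()) {
            lines.add(String.join("\t", group));
        }
        Collections.sort(lines); // deterministic output, easier to compare between runs
        return lines;
    }
}
```

Rows without any hit still get a line of their own, so every input record appears exactly once in the output.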

Other, more sophisticated approaches would also be possible, but these would certainly require doing things outside of IJC, e.g. creating custom load scripts that load your SD files into a relational data model (including things like duplicate filtering) and then performing some form of similarity analysis. This will be a lot more complex, but I'll be talking about some things like this at the European UGM next month, so do come along if you can!

https://www.chemaxon.com/events/2010-eugm/

Tim

User b60e1d3756

26-04-2010 10:43:57

Hi Tim,

Great idea, thank you!

What API function is responsible for overlap analysis?

Albina

ChemAxon fa971619eb

29-04-2010 09:56:48

Sorry for the delay in responding. We've been discussing how to provide an example for this, but it will be quite complex to do. Also, before we can prepare an example we need to know more about how you want to do this: do you want to create your own user interface within IJC (e.g. add your own menu item that opens a dialog that lets you specify what you want to do), or do you want to run this as an external program that accesses the IJC database without running the IJC application? These are two quite different cases, and the approach taken will be quite different.

Also, I should point out that if you are going to take the approach of loading all your different files into a single table and then performing overlap analysis on this one table then there should be no real need to automate this (the work involved would greatly outweigh the benefit).

Tim

User b60e1d3756

29-04-2010 10:55:04

Hello,

I would like to create my own program that performs overlap analysis on the database (SDF file) and then does further things with the overlap, like creating a file where all the identifiers for an identical structure are on one line, and comparing this file with other files created using different approaches. This procedure should be run several times with different overlap analysis parameters. Having it all in one workflow, where I just change the parameters, will make it easier (I hope).
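For the comparison step, the order of lines, and of IDs within a line, will generally differ between runs, so it helps to compare groups as sets. A minimal sketch, assuming each file holds one group per line with tab-separated IDs (a hypothetical format for the one-line-per-structure file described here):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper comparing two "one line per structure" result files
// produced with different overlap analysis settings.
class CompareGroupings {
    // Parses each non-blank line into an order-independent set of IDs.
    static Set<Set<String>> parse(List<String> lines) {
        Set<Set<String>> groups = new HashSet<>();
        for (String line : lines) {
            if (!line.isBlank()) {
                groups.add(new TreeSet<>(Arrays.asList(line.trim().split("\t"))));
            }
        }
        return groups;
    }

    // Groups present in run 'a' but not in run 'b'.
    static Set<Set<String>> onlyIn(List<String> a, List<String> b) {
        Set<Set<String>> result = parse(a);
        result.removeAll(parse(b));
        return result;
    }
}
```

Calling `onlyIn` in both directions shows which groupings each parameter setting produces that the other does not.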

ChemAxon fa971619eb

30-04-2010 13:09:00

Thanks for the info. We will try to prepare an example for this. It will take a few days to do this.

Tim

User c3397108ba

04-06-2010 09:30:15

Dear ChemAxon,

This topic has already been brought up here by renjntj, but my issues have not yet been answered in that discussion.

I have a database of queries (substructures) and a database of molecules. I want to flag the molecules that contain the queries with a text field, "filter", from the query database.

There are two ways to do it now using Instant JChem's overlap analysis:

1. Use the molecule database as the query and the queries database as the target in the overlap analysis, and run a SUPERSTRUCTURE overlap analysis.

The problem with this first option is that more queries from the query database match some of the molecules (Structure1, Query1 and Query2 in the attached SDF) than would have matched in a SUBSTRUCTURE search of the query against the molecule. Also, other queries do not get matched to the molecules (MoleculeA, QueryA and QueryB in the attachment) even though they would have matched in a SUBSTRUCTURE search.

2. Because of the issues with the first option, I normally first run overlap analysis with the query database as the query and the molecule database as the target, as a SUBSTRUCTURE overlap analysis. Then I export a table with the "filter" field and the list of CdIds of molecules that hit that filter (i.e. the “Overlap hits” field). After some Excel manipulations, the table can be imported into the molecule database, using CdId to match the "filter" values to the molecules in the database.

The problem with this second option is that if the molecule database is rather large (~100k) and there are enough matches for the queries (e.g. ~1000 structures contain hydrazine, which is one of the filters), the 4000-character text list field in the query database that stores the overlap hits quickly becomes too short, and Instant JChem returns an error. One workaround is to split the molecule database and do multiple overlap analysis runs, which is cumbersome.

Q (intermediate solution): can the text list field be longer? Can I first create a new text list field (where I have the option to make it longer) and point overlap analysis at it to store the results?

Q (real solution): would it be possible to run a SUBSTRUCTURE overlap analysis where the query database is the query and the molecule database is the target, but the matches (overlap) are written to the TARGET database?

Looking forward to hearing your thoughts on this issue!

Thanks in advance,

Anna

ChemAxon fa971619eb

04-06-2010 13:44:34

Hi Anna,

You describe an interesting problem.

The 4000-character limit is there because, due to Oracle's limit, it is the largest size that can easily be handled on all the database types we use. We can certainly increase this, but it will mean using a CLOB column for Oracle. We will look into doing this. Which database type do you use?
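To see how quickly the limit is reached: assuming, for illustration, that every CdId has six digits and hits are separated by single commas, a list of n hits needs 6n + (n - 1) = 7n - 1 characters, so a filter matching the ~1000 hydrazine-containing structures mentioned above cannot fit:

```latex
7n - 1 \le 4000 \;\Longleftrightarrow\; n \le 571, \qquad n = 1000 \;\Rightarrow\; 7 \cdot 1000 - 1 = 6999 > 4000
```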

As for writing the results to the target table instead of the query table, I guess you mean that if you found hits with IDs 100, 110 and 138, then the ID of the query would be appended to the output of those rows in the target table? Assuming so, this should be possible, but it might substantially slow down the process, as many more updates to the DB would be needed.

As an alternative, we were planning to provide another output mode for overlap analysis, where the query and target IDs were added to a new table that could be used as a join table for a many-to-many relationship between the query and target tables. Might this be a better option for you?
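Until such an output mode exists, a similar long-format table can be derived from the current list-field output. A minimal sketch, assuming the "Overlap hits" value is exported as a comma-separated list of target CdIds (the separator and the helper are assumptions); each hit becomes one CdId,filter row that can be matched back to the target table on CdId:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: expands one query row (its filter name plus the
// exported "Overlap hits" list of target CdIds) into join-table rows.
class ExpandHits {
    static List<String> toRows(String filter, String overlapHits) {
        List<String> rows = new ArrayList<>();
        for (String hit : overlapHits.split(",")) {
            String cdId = hit.trim();
            if (!cdId.isEmpty()) {
                // One "CdId,filter" CSV row per hit; filter names containing
                // commas would need quoting in a real export.
                rows.add(cdId + "," + filter);
            }
        }
        return rows;
    }
}
```

Run over every row of the query table, this yields the query/target pairs of a many-to-many join; grouping those pairs by CdId then gives the per-target list of filters.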

Tim

User c3397108ba

07-06-2010 07:50:44

Hi Tim,

Thanks for your quick reply.

We are using Oracle and I know that there is the limit of 4000 characters. Sometimes, I also run overlap analysis on a local table saved on my Windows machine and there I don't have that limit.

Your plan to provide an alternative output mode for overlap analysis where the query and target IDs were added to a new table that could be used as a join table for a many-to-many relationship between the query and target tables sounds like a very good option for me. I am assuming that the user in such a case will have the possibility to choose which of the fields out of each table should go to the 'join table'. Am I right?

If I wanted to achieve my 'put Overlap hits into TARGET database', I would import the 'join table' into my TARGET database, using the CdId of the target for matching and the filter field to be added. Is it possible to append filter names on such an import?

Most important question: what is your planning for 'join table'? When should it become available?

Thanks in advance for your help.

Kind regards,

Anna

ChemAxon fa971619eb

08-06-2010 13:18:56

Hi Anna,

> Your plan to provide alternative output mode for overlap analysis where the query and target IDs were added to a new table that could be used as a join table for a many-to-many relationship between the query and target tables sounds like a very good option for me.

Good. I think this will be useful.

> I am assuming that the user in such case will have the possibility to choose which of the fields out of each table should go to the 'join table'

Yes, you would be able to specify the fields to use.

> If I wanted to achieve my 'put Overlap hits into TARGET database' I would import the 'join table' into my TARGET database using the Cdid of the target for matching and the filter field to be added. Is it possible to append filter names on such import?

It's probably possible to do this using export and import, but I would think it's much easier to do it directly in the DB using SQL: add an extra column to the target table and fill it by concatenating the query values from the join table with a SQL statement.

> Most important question: what is your planning for 'join table'? When should it become available?

Unfortunately, it won't be soon :-( We have a very full task list. But your vote for this is noted.

Tim