How to delete similar structures after overlap analysis

User 955c6e7d6d

14-01-2013 11:50:43

Hi,


I just ran an Overlap Analysis between two databases and, based on the results, I would like to delete the structures that are similar in both databases. Can I somehow delete from my database all the structures that have a value of 1 or more in the Overlap Count column? (These structures are similar to those in the other database, which I have already virtually screened.) My database has 3,000,000+ compounds, so I don't want to go through them and delete all the similar ones manually!


Any help on how to do this in IJC would be highly appreciated.


Thanks!


Rcamara

ChemAxon 2bdd02d1e5

14-01-2013 18:56:13

Hi Rcamara,


I think this could be quite easy, even though it is a manual delete.


Search the Overlap Count column for values greater than or equal to 1, i.e. ">= 1". Then, in Browse mode, select all the rows found and delete them.


Filip

User 955c6e7d6d

15-01-2013 17:06:50

Hi Filip,


Thanks for the help. I tried deleting the rows, but first I got a warning saying that deleting more than 1000 rows cannot be undone, which is fine. However, when I clicked OK, I got an error saying it couldn't delete (see attached log message). I have also taken screenshots of the other errors, if you would like me to send them to you. In my case I wanted to delete more than 2,000,000 rows. Is that possible?


Rcamara

ChemAxon 2bdd02d1e5

15-01-2013 18:18:23

Hi Rcamara,


This could be a problem. Instant JChem has trouble deleting more than approx. 50,000 rows on a Derby (local) DB.


The only solution I can see is to write a script that deletes those duplicated rows. However, this requires some knowledge of the IJC architecture and the Groovy scripting language.


For inspiration, some script examples can be found at
http://www.chemaxon.com/instantjchem/ijc_latest/docs/developer/index.html 


but there is no "Delete row" example.
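

A script along these lines would probably want to delete in batches well below the roughly 50,000-row mark rather than all at once. Whether that actually avoids the Derby problem has not been confirmed here. Below is a minimal sketch of just the batching logic, written in Python for illustration rather than Groovy. `delete_batch` is a hypothetical placeholder for whatever call the script ends up using to delete rows (it is not an IJC API), and the batch size of 10,000 is an assumption.

```python
def batches(row_ids, size=10_000):
    """Yield consecutive slices of row_ids, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(row_ids), size):
        yield row_ids[start:start + size]


def delete_in_batches(row_ids, delete_batch, size=10_000):
    """Call delete_batch once per slice and return the number of rows handed over.

    delete_batch is a hypothetical callback standing in for the real
    row-deletion call; it is not part of any IJC API.
    """
    handed_over = 0
    for chunk in batches(row_ids, size):
        delete_batch(chunk)
        handed_over += len(chunk)
    return handed_over
```

The list of row IDs would come from the ">= 1" search on the Overlap Count column; the callback keeps the sketch independent of how rows are actually deleted.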


Filip

User 955c6e7d6d

16-01-2013 14:15:22

Hmm... that will be a bit tricky. I will have to learn how to write scripts, but for now I would like to know whether there is a way to save the structures that I actually need without having to delete the other 2,000,000 compounds (since I can't do that easily).


I selected the structures I needed (about 760,000+), but I couldn't find a way of saving them as a new project or database. I can't seem to copy them, nor do I have the option to click Save As to save them as, for example, an SD file.


I think IJC has done a brilliant job so far, but I have to screen the 760,000+ compounds, so I need a way of getting them out of IJC as an SD file or in another readable format for virtual screening.


If you have any suggestions on how to go about this, that would be really helpful.

ChemAxon 2bdd02d1e5

16-01-2013 14:45:17

Yes, I understand, it's not an easy task. It is probably easier to export the data as an SDF file and then import it into a new project. Data can be exported from IJC via File -> Export to file... (6th icon in the main toolbar).


The tricky thing is that it does not export the selected compounds but all compounds found by a query. So first, please define your query so that it retrieves all compounds except the duplicates (use the Overlap Count field). Then export to SDF ... etc.


Please let me know whether this procedure works as you expect.
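

If the overlap value ends up in the exported file, the same filtering can also be done outside IJC after the export. The following is a minimal sketch in plain Python, assuming the export writes the value as an SD data field named "Overlap Count" (the field name is an assumption; check the header lines in your own file). It keeps every record whose field is missing, empty, or below 1.

```python
import re

# Matches an SD data header line such as "> <Overlap Count>".
FIELD_HEADER = re.compile(r"^>.*<([^>]+)>")


def read_records(handle):
    """Yield each SD record as a list of lines, ending with its '$$$$' line."""
    record = []
    for line in handle:
        record.append(line)
        if line.strip() == "$$$$":
            yield record
            record = []


def field_value(record, name):
    """Return the first value line of the named data field, or None if absent."""
    for i, line in enumerate(record):
        match = FIELD_HEADER.match(line)
        if match and match.group(1) == name and i + 1 < len(record):
            return record[i + 1].strip()
    return None


def has_overlap(record, name="Overlap Count"):
    """True if the field holds a number >= 1; empty or non-numeric counts as no overlap."""
    value = field_value(record, name)
    if value is None:
        return False
    try:
        return float(value) >= 1
    except ValueError:
        return False


def filter_sdf(src, dst, name="Overlap Count"):
    """Copy records without overlap from src to dst; return how many were kept."""
    kept = 0
    for record in read_records(src):
        if not has_overlap(record, name):
            dst.writelines(record)
            kept += 1
    return kept
```

With real files, `src` and `dst` would be opened with something like `open("export.sdf")` and `open("filtered.sdf", "w")`; both file names are placeholders. Records are streamed one at a time, so the whole file never has to fit in memory.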


Cheers,
Filip