ChemAxon Technical Support Forum (archived)

how to filter duplicate items

li
Posted: Tue Nov 29, 2011 8:27 am
Post subject: how to filter duplicate items

Dear all,

How can I filter duplicate items?

For example, I want to filter duplicates by structure, CAS Registry Number, chemical name, or other fields. How can I do this?

Could you add a "filter duplicates" function to the right-click menu?

When I click "filter duplicates", all identical items would be shown to me, and I could then choose which ones to delete or modify manually.

Attachment: 11.jpg

Tim
ChemAxon personnel
Posted: Wed Nov 30, 2011 9:50 am

You can do duplicate filtering at the structure level. To do this, go to the schema editor, open the properties of your structure entity, and check the duplicate filtering option. Once this option is saved, no more duplicates can be added to the entity; trying to add a duplicate will fail. The option can also be set when you create the table/entity, which prevents any duplicates from the start. It works at the structure level, so if you try to enter the same structure in two different formats, it will still be considered a duplicate.

To prevent duplicates in text fields such as the CAS number, you can create a unique index on the column in the database. Again, this can be done in the schema editor: find the field, right-click on it, choose 'New index', and make sure that you check the 'Unique keys' option.
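
A minimal sketch of the same idea outside IJC, assuming SQLite through Python's sqlite3 module. The table, column, and index names are hypothetical, and the schema editor may issue different SQL for your database.

```python
import sqlite3

# Hypothetical table and column names, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compounds (id INTEGER PRIMARY KEY, cas_no TEXT)")

# The unique index is what makes the database refuse a repeated value.
conn.execute("CREATE UNIQUE INDEX idx_cas_no ON compounds (cas_no)")

conn.execute("INSERT INTO compounds (cas_no) VALUES ('50-00-0')")
try:
    # Second insert of the same CAS number violates the unique index.
    conn.execute("INSERT INTO compounds (cas_no) VALUES ('50-00-0')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM compounds").fetchone()[0]
print(duplicate_rejected, row_count)
```

The second insert fails and the table keeps a single row, which is the behaviour described for a field with 'Unique keys' checked.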

Finally, if your entity/table already contains duplicate structures, you can use the Overlap analysis function to identify them: http://www.chemaxon.com/instantjchem/ijc_latest/docs/user/help/htmlfiles/chemistry_functions/performing_overlap_analysis.html

Tim

li
Posted: Mon Dec 05, 2011 2:15 pm

Thanks.

jiye
Posted: Fri Dec 16, 2011 8:51 pm
Post subject: cannot find duplicate filtering option

Hi Tim,

I have a similar question about removing duplicate items. I cannot find the duplicate filtering option in the properties of the structure entity.

Attachment: 1.png

fzimandl
Posted: Mon Dec 19, 2011 8:55 am

You can find it in the Data Trees editor. See the attached image.

Filip

Attachment: DataTreesEditor.png

Tim
ChemAxon personnel
Posted: Mon Dec 19, 2011 9:58 am

Yes, it's a property of the entity (and is also shown in the data tree editor). It is not a property of the structure field.

Tim

li
Posted: Fri Mar 02, 2012 2:25 pm

Hi, I have two more questions.

How can I filter duplicate values in text fields that have already been entered into JChem?

How can I delete empty text fields?

Thanks.

fzimandl
Posted: Mon Mar 05, 2012 7:50 pm

There is no simple way to filter duplicate text items directly within IJC. However, there are various ways to find duplicate items; which one fits depends on what you want to do with the duplicates (automatically ignore all occurrences except the first, or just identify them?). If you are using a MySQL or Oracle database, the simplest way is to find them with an SQL command, e.g.
select FIELD, count(FIELD) from TABLE group by FIELD having count(FIELD) > 1
which lists each value that occurs more than once, together with its count. Alternatively, it can be implemented as a Groovy script run from IJC, which requires some experience with scripting.
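
The SQL approach above can be tried on a throwaway database first. The following is a sketch only, assuming SQLite through Python's sqlite3 module rather than MySQL or Oracle; the table name compounds and the column catalog_no are hypothetical stand-ins for TABLE and FIELD.

```python
import sqlite3

# Hypothetical table "compounds" with a text column "catalog_no".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compounds (id INTEGER PRIMARY KEY, catalog_no TEXT)")
conn.executemany(
    "INSERT INTO compounds (catalog_no) VALUES (?)",
    [("A-100",), ("A-200",), ("A-100",), ("A-300",),
     ("A-200",), ("A-100",), (None,), (None,)],
)

# Same shape as the query above: group by the field and keep only the
# groups that contain more than one row. COUNT(column) skips NULLs, so
# rows with a missing value are never reported as duplicates.
duplicates = conn.execute(
    """
    SELECT catalog_no, COUNT(catalog_no) AS n
    FROM compounds
    GROUP BY catalog_no
    HAVING COUNT(catalog_no) > 1
    ORDER BY catalog_no
    """
).fetchall()

for value, n in duplicates:
    print(value, n)
```

The query returns one line per duplicated value, not the individual rows; to see the rows themselves, the returned values can be used in a follow-up query on the same field.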

Empty text fields can be found by running an "is null" query on the given field. The matching rows can then be deleted by selecting multiple rows and deleting them. Is that what you want to do?
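
At the database level, the "is null" search corresponds roughly to the sketch below. It assumes SQLite through Python's sqlite3 module, and the table and column names are hypothetical. In SQLite an empty string is not the same as NULL, so the sketch checks for both; other databases may treat empty strings differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compounds (id INTEGER PRIMARY KEY, cas_no TEXT)")
conn.executemany(
    "INSERT INTO compounds (id, cas_no) VALUES (?, ?)",
    [(1, "50-00-0"), (2, None), (3, ""), (4, "64-17-5"), (5, None)],
)

# IS NULL matches only true NULLs.
null_ids = [row[0] for row in conn.execute(
    "SELECT id FROM compounds WHERE cas_no IS NULL ORDER BY id")]

# Also catch empty or whitespace-only strings, which IS NULL does not match.
blank_ids = [row[0] for row in conn.execute(
    "SELECT id FROM compounds "
    "WHERE cas_no IS NULL OR TRIM(cas_no) = '' ORDER BY id")]

# Delete the rows whose field is NULL, mirroring "select and delete".
conn.execute("DELETE FROM compounds WHERE cas_no IS NULL")
remaining_ids = [row[0] for row in conn.execute(
    "SELECT id FROM compounds ORDER BY id")]

print(null_ids, blank_ids, remaining_ids)
```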

Filip

li
Posted: Tue Mar 06, 2012 6:30 am

Dear Filip,

Thanks for your reply.

Yes, following your instructions I was able to filter the empty text fields easily.

However, I think it would be very useful for JChem to have functions for finding empty structures and duplicate text fields or structures. ChemFinder, for example, has functions for finding empty structures and for finding duplicates, either as unique structures only or as a clustered list.

In the current version of IJC, it is very difficult to find empty structures and duplicate structures or text fields among the thousands of items that already exist in a data tree.

For example, some compounds have a structure but a duplicate catalog number, and some compounds have no structure but do have other information. I want to find these and modify them.

Thanks very much.


fzimandl
Posted: Tue Mar 13, 2012 2:09 pm

Hello Lipan,

I'm sorry for the delayed response; I was not available last week. I have been thinking about other possibilities for you and have found solutions to some of the problems.

Empty structures can easily be found by searching with Chemical Terms. In query mode, right-click on the structure field and choose Chem Terms..., enter "(atomCount <1 )" as the expression, leave the structure field empty, and run the search. All rows with empty structures are retrieved.

Duplicate structures can be found with Overlap Analysis (Chemistry -> Overlap Analysis). Set the same table as both the Query table and the Target table. Duplicate search mode adds the fields Overlap count and Overlap hits to the table. Then simply search for Overlap count > 0. The IDs of the duplicates are shown in the Overlap hits field.
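
To picture what the two added fields contain, here is a rough Python sketch. It is not how IJC implements Overlap Analysis. It assumes each row already has a canonical structure string computed elsewhere (the strings below are hypothetical placeholders), it does no chemistry itself, and it assumes a row is not counted as a hit for itself.

```python
from collections import defaultdict


def self_overlap(structure_keys):
    """Compare a table with itself.

    structure_keys maps row ID -> canonical structure string (assumed to be
    precomputed). Returns {row_id: (overlap_count, overlap_hits)}, where
    overlap_hits lists the IDs of the other rows with the same string.
    """
    ids_by_key = defaultdict(list)
    for row_id, key in structure_keys.items():
        ids_by_key[key].append(row_id)

    result = {}
    for row_id, key in structure_keys.items():
        # Exclude the row itself so that unique rows get a count of 0.
        hits = sorted(other for other in ids_by_key[key] if other != row_id)
        result[row_id] = (len(hits), hits)
    return result


# Hypothetical table: row ID -> canonical string.
table = {1: "CCO", 2: "c1ccccc1", 3: "CCO", 4: "CC(=O)O", 5: "CCO"}
overlap = self_overlap(table)

# Equivalent of searching for "Overlap count > 0".
duplicate_ids = sorted(i for i, (count, _) in overlap.items() if count > 0)
print(duplicate_ids)
print(overlap[1])
```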

Searching for duplicate text items is only possible via direct access to the database or by writing a small script for it (would that be feasible for you?).

Thanks for your feedback.

Filip

 

li
Posted: Sat Mar 17, 2012 9:19 am

Hi Filip,

Thanks very much.

Following your instructions:

I can find empty structures by searching for mol weight <= 1 or formula is null. However, there is no right-click menu on the structure field.

I can use overlap analysis to find duplicate structures, but I still don't know how to find duplicate text.

Best regards,

Petr
IJC personnel
Posted: Fri Mar 23, 2012 6:08 pm

Hi,

The attached scripts can be used to find duplicates in text or numeric fields. Use them as follows:

  1. Create a new script under the data tree in which you want to filter duplicates (right-click on the data tree node and choose New script)
  2. Open one of the attached scripts and paste its text into the editor
  3. Save the script and execute it
  4. Open the grid view
  5. Go to the Lists and Queries panel
  6. There should be a new temporary list called, e.g., "Duplicates in Formula field (314)". Double-click this list and it will be applied to your current results
  7. Sort by the field in which you are looking for duplicates

Now you should see only the rows whose value in that field (e.g. Formula) occurs more than once in the table. Because the table is sorted by this field, the duplicated values appear grouped together.
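
The end result described above can be mimicked in a few lines of plain Python. This is only a sketch of the effect, not the content of the attached Groovy scripts, which are not reproduced in this thread. The row layout and field names are hypothetical, and skipping blank values is a choice made for this sketch.

```python
from collections import Counter


def rows_with_duplicate_values(rows, field):
    """Return the rows whose value in `field` occurs more than once,
    sorted by that value so that duplicates end up next to each other.
    Blank values (None or "") are ignored."""
    counts = Counter(
        row[field] for row in rows if row.get(field) not in (None, "")
    )
    duplicated = [row for row in rows if counts.get(row.get(field), 0) > 1]
    # sorted() is stable, so rows with equal values keep their original order.
    return sorted(duplicated, key=lambda row: row[field])


rows = [
    {"id": 1, "Formula": "C2H6O"},
    {"id": 2, "Formula": "C6H6"},
    {"id": 3, "Formula": "C2H6O"},
    {"id": 4, "Formula": None},
    {"id": 5, "Formula": "CH4"},
    {"id": 6, "Formula": "C6H6"},
    {"id": 7, "Formula": None},
]

result = rows_with_duplicate_values(rows, "Formula")
result_ids = [row["id"] for row in result]
print(result_ids)
```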

Two scripts are attached. The simple one requires you to specify the field name inside the script; in the example it is set to "Formula". The other one uses a chooser dialog, so it lets you select the field in which to look for duplicates.

Please let us know if it works for you.

Petr

Attachments:
  FindDuplicatesWithChooser.groovy.txt (1.91 KB): 2. Better version with a field chooser
  FindDuplicatesSimple.groovy.txt (855 Bytes): 1. Simple version where field name is hardcoded inside script

Igor
Posted: Mon Mar 26, 2012 6:09 pm

Thanks for a great script!
However, I have a problem: it finds duplicates in all fields except "structure", yet it does not report any error. What is wrong?

Thanks!

fzimandl
Posted: Mon Mar 26, 2012 6:45 pm

The script is not designed for finding duplicates among structures; Overlap Analysis is meant for that. See http://www.chemaxon.com/instantjchem/ijc_latest/docs/user/help/htmlfiles/chemistry_functions/performing_overlap_analysis.html

Which error do you mean? I got this when trying to search the structure field:
ERROR X0X67: Columns of type 'BLOB' may not be used in CREATE INDEX, ORDER BY, GROUP BY, UNION, INTERSECT, EXCEPT or DISTINCT statements because comparisons are not supported for that type.

Filip

Igor
Posted: Mon Mar 26, 2012 7:39 pm

I do not get an error message. When I choose the structure field, the script just returns 0 duplicates, but when I look for duplicates in other fields, for example formula or molar mass, it works fine.

I'm using the first script, the one that lets you choose the field.

Igor

li
Posted: Tue May 29, 2012 3:16 am

Hello,

I have another question.

I created a unique index (e.g. on CAS No.). When I import an SDF file, the compounds that have no CAS number cannot be imported.

fzimandl
Posted: Tue May 29, 2012 11:14 am

Only unique values are allowed in an indexed column when the index has a unique constraint. You cannot import the rows with blank CAS numbers because one row with no CAS number is already present (the first entry in the SD file that has no CAS number).

If you need to import such a file, don't tick the Unique keys checkbox when creating the index.

Best regards,

Filip

li
Posted: Tue May 29, 2012 2:37 pm

Thanks very much.

Is there any way to import such a file when a unique index has been created?

I ask because I want to filter out the duplicated CAS numbers but still import the compounds with blank CAS numbers.


fzimandl
Posted: Tue May 29, 2012 5:50 pm

In that case I would recommend going to the Database Tables view and deleting the index that has Unique keys set. You can recreate the index without the Unique keys constraint right afterwards. You should then be able to import rows with duplicate values, provided the column has no other restriction.

You can still find the duplicates by using the scripts above.
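
The goal stated earlier in the thread, rejecting repeated CAS numbers while letting blank ones through, could also be approximated outside the database, for example on records exported from the file before import. The sketch below is an illustration only: the record layout and the field name cas_no are hypothetical, values are assumed to be strings or None, and keeping the first occurrence is an arbitrary choice.

```python
def split_by_first_occurrence(records, field):
    """Split records into (kept, duplicates) by the value of `field`.

    The first record with a given non-blank value is kept; later records
    with the same value go to `duplicates`. Records whose value is blank
    (None, empty, or whitespace only) are always kept.
    """
    seen = set()
    kept, duplicates = [], []
    for record in records:
        value = (record.get(field) or "").strip()
        if not value:
            kept.append(record)        # blank values never count as duplicates
        elif value in seen:
            duplicates.append(record)  # repeated non-blank value
        else:
            seen.add(value)
            kept.append(record)
    return kept, duplicates


records = [
    {"id": 1, "cas_no": "50-00-0"},
    {"id": 2, "cas_no": None},
    {"id": 3, "cas_no": "64-17-5"},
    {"id": 4, "cas_no": "50-00-0"},
    {"id": 5, "cas_no": ""},
    {"id": 6, "cas_no": None},
]

kept, duplicates = split_by_first_occurrence(records, "cas_no")
kept_ids = [r["id"] for r in kept]
duplicate_ids = [r["id"] for r in duplicates]
print(kept_ids, duplicate_ids)
```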

Does it help in your case?

You are very welcome.
Filip 

li
Posted: Fri Jun 01, 2012 3:09 pm

It takes a little extra effort, but this method is very effective.

Thanks very much.
