Do you have a tool for comparing SD files?
I have two files, A.sdf and B.sdf, and want to produce A_AND_B.sdf and A_NOT_B.sdf. A_AND_B.sdf would contain the molecules that are in both A and B, and A_NOT_B.sdf those that are in A but not in B.
ChemAxon a3d59b832c
03-02-2005 10:31:04
Yes, you can do that using jcsearch:
http://www.jchem.com/doc/user/Jcsearch.html
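Use these parameters. Note that the second command, as written, collects the molecules of B.sdf that are not in A.sdf; to get A_NOT_B.sdf, swapping the two input files should work: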
Code:
jcsearch --or -t:p -q A.sdf B.sdf -f sdf -o A_AND_B.sdf
jcsearch --and -n -t:p -q A.sdf B.sdf -f sdf -o B_NOT_A.sdf
I would use this approach for small SD files with up to several thousand structures. See the time measurements below for two files containing 1000 and 2533 structures. (They had no structures in common, so they represent the worst case: 1000 x 2533 = 2,533,000 structural searches per run.)
Code:
$ time jcsearch --or -t:p -q nci1000.smiles med2533.sdf -f sdf -o nci1000_AND_med2533.sdf
real 2m37.685s
user 0m0.937s
sys 0m0.888s
$ time jcsearch --and -n -t:p -q nci1000.smiles med2533.sdf -f sdf -o med2533_NOT_nci1000.sdf
real 2m36.397s
user 0m0.197s
sys 0m0.166s
The machine was a P4 2.66 GHz with 768 MB RAM, and I used JChem 3.0.7.
For larger SD files I would write a small script or program using the database parts of JChemBase. I expect that would be orders of magnitude faster.
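To illustrate why an indexed approach can beat the pairwise search, here is a minimal Python sketch. It is not JChemBase code and does not use any JChem API. It splits each SD file into records at the `$$$$` delimiter, computes a lookup key per record, and puts the keys of B into a set, so each molecule of A needs one lookup instead of one search per molecule of B. The function names and `placeholder_key` are hypothetical. The key simply compares the connection table text, so it only detects records that are written identically (same atom order and coordinates); a real solution would use a canonical structure key from a cheminformatics toolkit instead.

```python
from typing import Callable, Iterable, List, Tuple


def split_sd_records(text: str) -> List[str]:
    """Split SD file content into records; each record ends with a '$$$$' line.

    Any trailing lines after the last '$$$$' are ignored.
    """
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        current.append(line)
        if line.strip() == "$$$$":
            records.append("\n".join(current) + "\n")
            current = []
    return records


def placeholder_key(record: str) -> str:
    """HYPOTHETICAL key: the molfile text after the 3 header lines, up to 'M  END'.

    This is a plain text comparison, NOT a chemical perfect match.
    """
    block: List[str] = []
    for line in record.splitlines()[3:]:
        block.append(line.rstrip())
        if line.startswith("M  END"):
            break
    return "\n".join(block)


def partition(
    a_records: Iterable[str],
    b_records: Iterable[str],
    key: Callable[[str], str] = placeholder_key,
) -> Tuple[List[str], List[str]]:
    """Return (A_AND_B, A_NOT_B) as lists of records taken from A."""
    b_keys = {key(r) for r in b_records}  # built once, then O(1) lookups
    a_and_b: List[str] = []
    a_not_b: List[str] = []
    for record in a_records:
        (a_and_b if key(record) in b_keys else a_not_b).append(record)
    return a_and_b, a_not_b


def compare_files(a_path: str, b_path: str, and_path: str, not_path: str) -> None:
    """Read two SD files and write the intersection and the difference."""
    with open(a_path) as fa, open(b_path) as fb:
        a_and_b, a_not_b = partition(
            split_sd_records(fa.read()), split_sd_records(fb.read())
        )
    with open(and_path, "w") as out:
        out.writelines(a_and_b)
    with open(not_path, "w") as out:
        out.writelines(a_not_b)
```

Under these assumptions, `compare_files("A.sdf", "B.sdf", "A_AND_B.sdf", "A_NOT_B.sdf")` would write the two result files, and passing a different `key` function to `partition` is where a proper canonical structure key would plug in.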
ChemAxon a3d59b832c
02-07-2007 07:03:31
Discussion about the union set of sdf-s in another forum topic:
union set