sdf intersection and subtraction

03-02-2005 10:19:18

Do you have a tool for comparing SDFiles?

I have two files, A.sdf and B.sdf, and want to get A_AND_B.sdf and A_NOT_B.sdf. A_AND_B.sdf would contain the molecules that are in both A and B, and A_NOT_B.sdf those that are in A but not in B.

ChemAxon a3d59b832c

03-02-2005 10:31:04

Yes, you can do that using jcsearch:

http://www.jchem.com/doc/user/Jcsearch.html

Use these parameters:

Code:
jcsearch --or -t:p -q A.sdf B.sdf -f sdf -o A_AND_B.sdf
jcsearch --and -n -t:p -q A.sdf B.sdf -f sdf -o B_NOT_A.sdf






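Note that the second command writes B_NOT_A.sdf, the structures of B that are not in A; swapping the two input files should give A_NOT_B.sdf.

I would use this approach only for small SDFiles of up to several thousand structures. See the time measurements below for two files containing 1000 and 2533 structures. (They had no common structures, so they represent the worst case: 1000 x 2533 structural searches per run.)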





Code:
$ time jcsearch --or -t:p -q nci1000.smiles med2533.sdf -f sdf -o nci1000_AND_med2533.sdf

real    2m37.685s
user    0m0.937s
sys     0m0.888s

$ time jcsearch --and -n -t:p -q nci1000.smiles med2533.sdf -f sdf -o med2533_NOT_nci1000.sdf

real    2m36.397s
user    0m0.197s
sys     0m0.166s

The machine was a P4 2.66GHz with 768MB RAM, and I used JChem 3.0.7.
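
To put the measurement in perspective, here is a rough back-of-the-envelope sketch in Python. It takes the wall-clock time of the first run and divides it by the 1000 x 2533 worst-case searches. The function `estimate_seconds` and the assumption that the run time grows linearly with the number of query/target pairs are mine, for illustration only; real timings will depend on the structures and the hardware.

```python
# Rough cost estimate from the measured worst-case run.
# ASSUMPTION: run time scales linearly with (queries x targets).

def to_seconds(minutes: int, seconds: float) -> float:
    """Convert the 'XmY.YYYs' figures printed by `time` into seconds."""
    return minutes * 60 + seconds

pairs = 1000 * 2533                  # worst case: every query against every target
or_run = to_seconds(2, 37.685)       # 'real' time of the --or run
per_pair = or_run / pairs            # average wall-clock seconds per pair

def estimate_seconds(n_queries: int, n_targets: int) -> float:
    """Hypothetical extrapolation to other file sizes (illustration only)."""
    return n_queries * n_targets * per_pair

if __name__ == "__main__":
    print(f"pairs searched: {pairs}")
    print(f"seconds per pair: {per_pair:.2e}")
    hours = estimate_seconds(100_000, 100_000) / 3600
    print(f"estimated hours for 100000 x 100000: {hours:.0f}")
```

Under that assumption the cost grows with the product of the two file sizes, which is why this pairwise approach only suits small files.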





For larger SDFiles I would prepare a small script or program using the database parts of JChemBase. I expect that would be orders of magnitude faster.
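
The reason an indexed approach can be so much faster is that each structure is looked up once instead of being compared against every structure in the other file. The sketch below illustrates the idea in plain Python with a set lookup; it does not use JChemBase. The function names and `naive_key` are hypothetical. In particular, `naive_key` only compares record text, which is not a real structural comparison: in practice you would pass a key function that returns a canonical structure identifier from a chemistry toolkit.

```python
from typing import Callable, List, Tuple

def split_sd_records(text: str) -> List[str]:
    """Split SDFile text into records; each record ends with a '$$$$' line."""
    records: List[str] = []
    current: List[str] = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    # Keep a trailing record that lacks the final delimiter, if it has content.
    if any(l.strip() for l in current):
        records.append("\n".join(current))
    return records

def naive_key(record: str) -> str:
    # ASSUMPTION: placeholder key based on the record text only.
    # Replace with a canonical structure identifier for real use.
    return "\n".join(line.rstrip() for line in record.splitlines()).strip()

def intersect_and_subtract(
    a_records: List[str],
    b_records: List[str],
    key: Callable[[str], str] = naive_key,
) -> Tuple[List[str], List[str]]:
    """Return (A_AND_B, A_NOT_B) as lists of records taken from A."""
    b_keys = {key(r) for r in b_records}      # one pass over B
    a_and_b = [r for r in a_records if key(r) in b_keys]
    a_not_b = [r for r in a_records if key(r) not in b_keys]
    return a_and_b, a_not_b
```

With this layout the work grows with the sum of the two file sizes rather than their product, at the cost of needing a key that is identical for identical structures.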

ChemAxon a3d59b832c

02-07-2007 07:03:31

Discussion about the union set of SDFiles in another forum topic:

union set