Large performance diff in search for cyclopropene

User 677b9c22ff

02-10-2008 18:28:54

Hi,

I was searching in Instant JChem (JChem 5.0.3) for cyclopropene with two SMARTS queries:

A) *1-*=*1, which is very fast, like every other search

B) [*,#1]1[*,#1]=[*,#1]1, which also matches H atoms but is very slow

In fact, query B is not feasible on a database larger than 200k structures: it takes minutes, whereas query A takes only seconds on a 200k database.

Wouldn't it be better to run query A first and then filter its hits with query B? I know I can do this manually, but I would not have expected such a huge difference.

Bye

Tobias
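Tobias's suggestion is the usual screen-then-verify pattern. Below is a minimal Python sketch of it. `fast_match` and `slow_match` are hypothetical predicates standing in for queries A and B; they are placeholders, not JChem API calls. The pattern only gives correct results if every hit of the slow query is also a hit of the fast one.

```python
from typing import Callable, Iterable, List


def two_stage_search(
    structures: Iterable[str],
    fast_match: Callable[[str], bool],
    slow_match: Callable[[str], bool],
) -> List[str]:
    """Return the structures accepted by slow_match, screening with fast_match first.

    Assumption: fast_match is a necessary condition for slow_match, i.e. every
    structure that satisfies slow_match also satisfies fast_match.
    """
    hits = []
    for s in structures:
        # Cheap screen: most structures are rejected here.
        if not fast_match(s):
            continue
        # Expensive check runs only on the structures that passed the screen.
        if slow_match(s):
            hits.append(s)
    return hits
```

The slow predicate is called only for screen survivors, so the cost of query B scales with the number of query A hits rather than with the size of the database. This thread does not say whether JChem applies such a screen internally.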

ChemAxon a3d59b832c

03-10-2008 08:32:05

Yes, the second query takes much longer because it needs to initialize all implicit H atoms, so the search space becomes much larger. (That said, I don't think you will get many hits involving H atoms in realistic molecules.)

I checked it on our online example:

http://www.chemaxon.com/jchem/examples/db_search/index.jsp

and it took about 2 minutes on the 250K NCI dataset (JChem 5.1.2).

We will check whether it can be improved, but because of the larger search space alone, I reckon it will always be much slower than the first query.

Regards,

Szabolcs
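As a back-of-envelope illustration of why initializing the H atoms enlarges the search space, consider the number of ordered ways to place a 3-atom query on n target atoms with no pruning at all. This number grows roughly as n³. The molecule size used below (20 heavy atoms, 20 H atoms) is a hypothetical example. This is not JChem's matching algorithm: real matchers prune candidates using atom types and bonds, so the actual slowdown will differ.

```python
import math


def naive_mappings(n_atoms: int, k_query: int) -> int:
    """Ordered, injective assignments of k query atoms to n target atoms.

    This is the unpruned worst case, n * (n - 1) * ... * (n - k + 1).
    """
    return math.perm(n_atoms, k_query)


heavy = 20            # hypothetical heavy-atom count
with_h = heavy + 20   # same molecule with 20 hypothetical H atoms made explicit

print(naive_mappings(heavy, 3))   # 6840
print(naive_mappings(with_h, 3))  # 59280
```

In this example, doubling the atom count increases the naive count by a factor of about 8.7.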

ChemAxon a9ded07333

03-10-2008 10:58:20

Hi Tobias,

May I ask what the aim of the second query is? Are you looking for incorrectly structured molecules in your database (substituting any [*,#1] with an H would result in a valence error)?

Regards,

Tamás

User 677b9c22ff

06-10-2008 18:55:16

The first query (*1-*=*1) is the one I created with the MarvinSketch query editor. I also got the second one from MarvinSketch, but this time with the "any atom including H" generic atom [AH] ([*,#1]1[*,#1]=[*,#1]1).

Regarding valence errors, that was not something I cared about in the first place. I am only looking at CHNSOP, and N, S and P can have valences higher than 4.

The aim is to find a set of structures that are not found in common databases like PubChem, similar to this one:

http://www.ch.ic.ac.uk/ectoc/echet96/papers/014/index.htm

Bye

Tobias

ChemAxon a9ded07333

08-10-2008 08:27:54

Hi Tobias,

N, S and P atoms can have higher valences, but I'm wondering whether you really need AH atoms instead of A atoms (see also Generic query atoms). None of the AH atoms in the second example would match an H atom, because the valence of that H would be at least two (at least three for the doubly bonded atoms), while H can only have a valence of one!

Are you looking for such malformed structures? Or do you want to make sure that the atoms can have higher valences?

In the latter case, you could run a series of queries, including query molecules like [*,#1]-*1=*(-[*,#1])-*1 (this query finds C1C=N1C, where the valence of N is at least 4).

Best regards,

Tamás
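Tamás's valence argument can be checked by summing the bond orders at each query atom, which gives a lower bound on the valence of any atom matched there. Below is a minimal Python sketch with both queries hand-translated into bond lists. The atom indices and the `min_valences` helper are illustrative only and are not a JChem API. Unspecified SMARTS bonds (single or aromatic) are counted as order 1, which gives the smallest possible sum.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Bond = Tuple[int, int, int]  # (atom index, atom index, bond order)

H_MAX_VALENCE = 1


def min_valences(bonds: List[Bond]) -> Dict[int, int]:
    """Sum of bond orders per query atom: a lower bound on its valence."""
    total: Dict[int, int] = defaultdict(int)
    for a, b, order in bonds:
        total[a] += order
        total[b] += order
    return dict(total)


# Query B, [*,#1]1[*,#1]=[*,#1]1: atoms 0, 1, 2; the ring closure 2-0 is single.
query_b = [(0, 1, 1), (1, 2, 2), (2, 0, 1)]
vb = min_valences(query_b)
print(vb)  # {0: 2, 1: 3, 2: 3}
# Positions that could still be H: none, since every sum exceeds 1.
print([atom for atom, v in vb.items() if v <= H_MAX_VALENCE])  # []

# [*,#1]-*1=*(-[*,#1])-*1: atoms 0..4, three-membered ring 1-2-4.
series_query = [(0, 1, 1), (1, 2, 2), (2, 3, 1), (2, 4, 1), (4, 1, 1)]
print(min_valences(series_query))  # {0: 1, 1: 4, 2: 4, 3: 1, 4: 2}
```

The two doubly bonded ring atoms (1 and 2) of the second query need a valence of at least 4. In C1C=N1C, one of them is matched by the N, which is why that hit flags a higher-valent N.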