Most effective way to compare with literature from table

User 2b68687bb8

30-10-2009 17:57:49

Hi,

In a publication I found data on a molecule: a scaffold with several substituent positions, R1, R2, ..., R8. Each row of the table lists the values of R1, ..., R8 together with experimental data such as log P. In total there are experimental values for 56 different substituent combinations. I took this scaffold and generated all possible combinations with Marvin (each R can only be H or Cl), which gave 256 structures. For each one I calculated a theoretical log P with some method.

Now I want to find an automated way to compare the theoretical and experimental values for each compound in the experimental table. Is this possible with ChemAxon programs?

The only practical approach I could think of is to generate, for each row of the table, a string such as H Cl Cl H H H H Cl. It would then be nice if Marvin could also attach this string to each of the 256 structures it generates, so that in the end I only need to write a script that compares both sets of molecules. Is this last step possible in Marvin?
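The string-key idea could be sketched in Python roughly as follows. This is only an illustration under assumptions: the eight positions and the H/Cl alphabet come from the post, while the function names, the dictionaries and the log P numbers are hypothetical placeholders, and nothing here calls Marvin.

```python
from itertools import product

SUBSTITUENTS = ("H", "Cl")   # each R position is either H or Cl (from the post)
N_POSITIONS = 8              # R1 ... R8


def make_key(substituents):
    """Join the substituents in R1..R8 order into a key such as 'H Cl Cl H H H H Cl'."""
    return " ".join(substituents)


def all_keys():
    """Enumerate the keys of every H/Cl combination (2**8 = 256)."""
    return [make_key(combo) for combo in product(SUBSTITUENTS, repeat=N_POSITIONS)]


def pair_values(experimental, theoretical):
    """Return {key: (experimental, theoretical)} for keys present in both dicts."""
    return {
        key: (exp_value, theoretical[key])
        for key, exp_value in experimental.items()
        if key in theoretical
    }


# Placeholder numbers, NOT real data: they only show the shape of the inputs.
theoretical = {key: 0.0 for key in all_keys()}
experimental = {"H Cl Cl H H H H Cl": 1.0}

paired = pair_values(experimental, theoretical)
```

With both data sets keyed by the same string, the comparison reduces to a dictionary lookup, independent of the order in which the structures were generated.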

Thanks in advance

ChemAxon 8b644e6bf4

02-11-2009 04:49:54

Hi,

The 256 structures can be generated using the Markush enumeration plugin. A possible approach is to save them as SMILES and use the Chemical Terms "atno()" function to extract the atomic numbers at the substituted positions.

If you have any further questions, do not hesitate to ask.

Regards,

Gabor

ChemAxon a3d59b832c

02-11-2009 14:54:26

Hi Horacio,

There is more than one possible method. You could use the Markush code feature of Markush enumeration, which gives you a string representation of the expansions used.

Another method would be to use different R-group definitions for each row. (One Markush structure belongs to each row, with exactly one enumerated structure.)

Let us know if anything is not clear or if you have any more questions.

Best regards,

Szabolcs

ChemAxon a3d59b832c

02-11-2009 14:55:52

(I moved this topic over to the Markush search & enumeration forum.)

ChemAxon a3d59b832c

04-11-2009 06:09:35

A colleague suggested that you could also consider R-group decomposition:

http://www.chemaxon.com/jchem/doc/user/RGroupDecomposition.html

BR,

Szabolcs

User 2b68687bb8

04-11-2009 15:06:11

Hi

I am rather new to ChemAxon. Could you please tell me how I can use the Chemical Terms "atno()" function to extract the atomic numbers for the substituted positions?

Thanks a lot

User 2b68687bb8

04-11-2009 15:16:42

Hi, thanks a lot. I now understand how to get different kinds of information or nomenclature for the generated files, such as SMILES. This is good.

My problem now is how to compare the generated information (SMILES, etc.) with the information about the compounds in the table, which is in this form:

compd	R1	R2	R3	R4	R6
26	H	Cl	H	H	H
27	H	H	Cl	H	H
28	H	H	H	Cl	H
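One way to bring such a table into a comparable form is to parse it into one key per compound. The sketch below assumes the table is available as tab-separated text with the header shown above; the name `rows_to_keys` is hypothetical.

```python
import csv
import io

# The three rows pasted above, as tab-separated text.
TABLE_TEXT = (
    "compd\tR1\tR2\tR3\tR4\tR6\n"
    "26\tH\tCl\tH\tH\tH\n"
    "27\tH\tH\tCl\tH\tH\n"
    "28\tH\tH\tH\tCl\tH\n"
)


def rows_to_keys(table_text):
    """Map each compound id to a key built from its R columns, in header order."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    r_columns = [name for name in reader.fieldnames if name != "compd"]
    return {row["compd"]: " ".join(row[col] for col in r_columns) for row in reader}


keys_by_compound = rows_to_keys(TABLE_TEXT)
```

The resulting keys can then be matched against whatever identifier is attached to the generated structures, provided both list the positions in the same order.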

User 2b68687bb8

04-11-2009 15:17:44

Sorry, I had a problem pasting the table.

Apart from that, I will continue with this and report back later.

If somebody has some hints, they will be welcome.

User 2b68687bb8

05-11-2009 11:38:25

OK, I found the following workaround:

1- convert each molecule of the table (24, H, Cl, H, ...) to Markush code (Python script)

2- generate all the molecules needed for the simulation in SDF format (or another format that stores the Markush code)

3- convert the molecules from 2) to MOL2 format (necessary for the simulations) and get the results

4- now compare the molecules from 1) with the molecules from 3). The problem is that the MOL2 molecules lack the Markush code (I do not understand why, but when I clean in 3D the SDF molecules generated in 2), they lose the Markush code). But since I have the same list of molecules in SDF and MOL2 format, I use a script that reassigns names to the MOL2 files by comparing them with the SDF files (they appear in exactly the same order).
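The renaming in step 4 could look roughly like the sketch below. It relies on assumptions that are not confirmed in this thread: that the identifier to carry over sits in the title line (first line) of each SDF record, and that both files list the molecules in the same order. If the Markush code is stored in an SDF data field instead, `sdf_titles` would have to be replaced. All function names and the sample strings are hypothetical.

```python
MOL2_HEADER = "@<TRIPOS>MOLECULE"  # the line after this tag holds the molecule name


def sdf_titles(sdf_text):
    """Return the first line of every record in a multi-record SDF string."""
    titles = []
    at_record_start = True
    for line in sdf_text.rstrip().splitlines():
        if at_record_start:
            titles.append(line.strip())
            at_record_start = False
        elif line.strip() == "$$$$":   # "$$$$" terminates an SDF record
            at_record_start = True
    return titles


def rename_mol2(mol2_text, names):
    """Overwrite the name line of the i-th MOL2 molecule with names[i]."""
    lines = mol2_text.splitlines()
    header_rows = [i for i, line in enumerate(lines) if line.strip() == MOL2_HEADER]
    if len(header_rows) != len(names):
        raise ValueError("SDF and MOL2 inputs hold a different number of molecules")
    for row, name in zip(header_rows, names):
        lines[row + 1] = name
    return "\n".join(lines) + "\n"


# Tiny stand-in inputs (atom blocks omitted), only to show the call pattern.
sdf_text = "code_A\n  (atom block omitted)\n$$$$\ncode_B\n  (atom block omitted)\n$$$$\n"
mol2_text = (
    "@<TRIPOS>MOLECULE\nold_1\n@<TRIPOS>ATOM\n"
    "@<TRIPOS>MOLECULE\nold_2\n@<TRIPOS>ATOM\n"
)

renamed = rename_mol2(mol2_text, sdf_titles(sdf_text))
```

The count check is there because matching by position silently mislabels everything after the first molecule that is missing from one of the files.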

If somebody knows of a better approach, please tell me.

Thanks

ChemAxon a3d59b832c

05-11-2009 14:24:50

Hi Horacio,

I am glad that you found a solution. Step 1 could probably have been done more easily with R-group decomposition instead of a script (see the link in my earlier post). That would need the scaffold and the enumerated molecule as input.

Otherwise, I don't think it can be done much more simply with the current methods.

In the far future we plan to introduce conditions for Markush structures. With that feature, you will be able to describe each row as a condition on R-group definitions. However, neither the specification of the conditions nor the development timeline has been decided yet.

Best regards,

Szabolcs