Synthesis code

ChemAxon 60ee1f1328

10-01-2006 14:10:39

Next questions:





Using react at the command line, is it possible to retain the <cd_id> field from each input SDF file and write it (maybe combined) directly before the product SMILES code? Basically, I want to retain an ID from each reactant in my inputs and have these IDs associated with the relevant output entry.





Does react only output SMILES codes, or can an SDF file be written instead? Maybe this moves into the realm of one of the bespoke Java wrapper classes, but if it is possible to control this from the command line, I would like to know.





Using react at the command line, can I eliminate repetitive by-product data by issuing a particular command? For instance, I would like to remove (H)Cl from my amide output SMILES. Is it possible to specify this requirement from the command line?





Cheers,


Daniel.

ChemAxon d76e6e95eb

10-01-2006 14:52:13

Quote:
Using react at the command line, is it possible to retain the <cd_id> field from each input SDF file and write it (maybe combined) directly before the product SMILES code? Basically, I want to retain an ID from each reactant in my inputs and have these IDs associated with the relevant output entry.
I am afraid that the SDF fields of the original compounds are not currently retained, but we will examine the issue.


We planned to generate a so-called synthesis code for each product by combining the IDs of the reactants with the ID of the reaction. Would you like to see something like this?
Quote:
Does react only output SMILES codes, or can an SDF file be written instead? Maybe this moves into the realm of one of the bespoke Java wrapper classes, but if it is possible to control this from the command line, I would like to know.
You can choose from lots of output formats, even from the command line.


Here is an example that sends the products to an SDF file using the react command line tool:





Code:
react a.mol b.mol -t reaction.rxn -f sdf -o out.sdf



Quote:
Using react at the command line, can I eliminate repetitive by-product data by issuing a particular command? For instance, I would like to remove (H)Cl from my amide output SMILES. Is it possible to specify this requirement from the command line?
Yes, you can define which products should appear in the output. This is especially useful for eliminating HCl or H2O from the output. The example below stores only the first product and omits the second one, i.e. HCl (-x 1).








Code:
react amines.sdf acidchlorides.smiles -t acylation.mrv -x 1 -o products.smiles

ChemAxon 60ee1f1328

10-01-2006 15:47:12

Quote:
I am afraid that the SDF fields of the original compounds are not currently retained, but we will examine the issue.


We planned to generate a so-called synthesis code for each product by combining the IDs of the reactants with the ID of the reaction. Would you like to see something like this?
Yes please, ideally with direct user control over which fields of the relevant input SDF file contribute to the synthesis ID.


I guess API access/control to this "ProductID" would be useful as well.


How would you implement a ReactionID data item, presumably including a date stamp as well as a TLA?





Thanks for the other comments which are most helpful.





Cheers,


Daniel.

ChemAxon d76e6e95eb

10-01-2006 16:05:55

I am interested in your proposal. Something I have in mind is the following (via the example of a dummy bimolecular reaction):





reaction ID: R1


first reactant ID: A1


second reactant ID: B1





ID of the first product: R1(A1, B1:1)


ID of the second product: R1(A1, B1:2)





React the first product with C8 by reaction R2:


ID of the first product: R2(R1(A1, B1:1), C8:1)





The ID of the reactant could be a user-defined field; the ID of the reaction could come from the reaction file or could be specified at runtime.
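The nesting scheme above can be illustrated with a few lines of Python. This is only a sketch of the proposed notation, not Reactor's implementation; the function name `synthesis_code` and its parameters are hypothetical.

```python
def synthesis_code(reaction_id, reactant_ids, product_index):
    """Compose a code such as R1(A1, B1:1) from its parts (hypothetical helper)."""
    return f"{reaction_id}({', '.join(reactant_ids)}:{product_index})"


# Dummy bimolecular reaction R1 with reactants A1 and B1.
first = synthesis_code("R1", ["A1", "B1"], 1)
second = synthesis_code("R1", ["A1", "B1"], 2)

# React the first product with C8 by reaction R2: the product's code
# simply takes the place of a reactant ID.
chained = synthesis_code("R2", [first, "C8"], 1)

print(first)    # R1(A1, B1:1)
print(second)   # R1(A1, B1:2)
print(chained)  # R2(R1(A1, B1:1), C8:1)
```

Because a product code is used as a plain reactant ID in the next step, the full synthetic history stays readable from the code itself.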








Why do you need a date stamp? It should be a separate field, shouldn't it? What is TLA?

ChemAxon 60ee1f1328

11-01-2006 09:58:16

TLA - three-letter acronym, e.g. AMD for amide?





Thought the date stamp may be useful to include if the reaction is run several times with updated inputs...





Cheers,


Daniel.

ChemAxon d76e6e95eb

11-01-2006 10:24:30

Quote:
TLA - three-letter acronym, e.g. AMD for amide
You could include these acronyms in the IDs of the compounds. I mean your amines could have IDs like AMN1, AMN2 and so on. Or you can use this notation in the reaction ID, like AMN+ACD. Then the TLAs will appear in the synthesis code.
Quote:
Thought the date stamp may be useful to include if the reaction is run several times with updated inputs
I still think that a date stamp should not be part of the synthesis code; it would probably make the code hard to read. The date could be stored as a separate field or, alternatively, in the file name. The date stamp is also an attribute of the file, and in the case of databases you can generate it with a trigger.





In case of updated inputs or reactions, just update their IDs as well.

ChemAxon 60ee1f1328

27-01-2006 11:23:54

I would like to code up an imidazole formation reaction and of course run it successfully using the react tool - please see attached .rxn file for my initial attempt.





The problems I think I have are as follows:





1. I think I should need to map the N atoms as part of the reaction, and I suspect this will require a split of the reaction into an N atom addition to each reactant (2 reactions) and then combination of the subsequent outputs in a third separate defined reaction? Please confirm this approach is expected.





2. How should I handle the reagent NH4OAc, which provides the N atoms? How should I provide the runtime input data for this reagent so that each of my instance combinations picks up a reagent? I can of course include an N atom in the 2 reactions mentioned above, but I am not sure whether I need a list of N atoms to represent my reagent at runtime.





3. Orientation: If the reactant A dicarbonyl is not symmetrical, then (I think) two different products are possible per instance; in my attached example, atoms 1 and 4 would swap mappings. Again, presumably these 2 possibilities would need to be coded as separate reactions?





So in total it appears that 4 separate defined virtual reactions are required in order to implement this "real" reaction?





Many thanks for your help,





Daniel.

ChemAxon d76e6e95eb

27-01-2006 19:14:38

1. Orphan atoms might be helpful in your case. They are atoms appearing on only one side of the reaction arrow. By default, you should add atom numbers to them (see the Reactor FAQ at http://www.chemaxon.com/jchem/doc/user/ReactorFAQ.html for more details). What about the reaction I attached? Does it solve your problem?





2. You can avoid adding NH4OAc at runtime if it does not appear on the left of the reaction scheme. However, if you would like to highlight that ammonium acetate is needed for the reaction, just draw it above or below the arrow. It is then an agent, and Reactor will simply ignore it.





3. Reactor will generate all possible isomers for you. You can influence this behaviour by modifying the scheme or setting a SELECTIVITY rule.

ChemAxon 60ee1f1328

01-02-2006 14:37:02

Thanks - in the first instance we will likely generate all the possible products which leads to a further question regarding the unique ID generation. Introducing the SELECTIVITY rule will surely come soon after.





If the dicarbonyl reactant A is unsymmetrical, how will Reactor construct the new ProductID for each of the products generated? Should it append a further label, i.e. :1, :2, etc., so as to give (in this example) something like:


IMIDAZ-001-002:1:1?





Regards,


Daniel.

ChemAxon d76e6e95eb

01-02-2006 18:58:43

If:





reaction code: R_IMIDAZOL


dicarbonyl code: DIOX-13


aldehyde code: ALD-5





Then Reactor will generate these codes for the two isomeric products:





R_IMIDAZOL(DIOX-13, ALD-5):1


R_IMIDAZOL(DIOX-13, ALD-5):2

ChemAxon d76e6e95eb

10-04-2006 16:34:21

The synthesis code generation is implemented, but there is still an open question. Take the bromination of toluene as an example, and see the three products with the generated synthesis codes below:





Cc1ccccc1Br Bromination(Toluene):1


Cc1cccc(Br)c1 Bromination(Toluene):2


Cc1ccc(Br)cc1 Bromination(Toluene):3





That's OK, but what should the synthesis code be if the reaction scheme contains the HBr side product as well? In this case, 6 products are generated at the moment (3 pairs):





Cc1ccccc1Br Bromination(Toluene)/1:1


Br Bromination(Toluene)/2:1


Cc1cccc(Br)c1 Bromination(Toluene)/1:2


Br Bromination(Toluene)/2:2


Cc1ccc(Br)cc1 Bromination(Toluene)/1:3


Br Bromination(Toluene)/2:3








What should the synthesis codes look like for reactions producing more than one product and/or more than one isomer?

ChemAxon 60ee1f1328

11-04-2006 09:28:18

Hello,





I think the following numbering would be logical?


(I have added the additional "rxn1-r1-r2-" as this is how I expect the final derived synthesis ID to look, the bit in question is in [])





Ortho rxn1-r1-r2-[1:1] HBr [1:2]





Meta rxn1-r1-r2-[2:1] HBr [2:2]





Para rxn1-r1-r2-[3:1] HBr [3:2]





I would want to exclude 1:2, 2:2 and 3:2 from each pair, so the -x option would need to be applied to x:2, i.e. to the second member of each product set?


Should the relevant plug-in be used/available, then the order 1,2,3 could be derived from the order of the largest % product predicted.


What do you think?





Daniel.

ChemAxon d76e6e95eb

11-04-2006 10:09:08

Thanks for the idea! We discussed it and agreed that the generated synthesis codes will look like this:





Ortho


rxn1(r1,r2):1/1


HBr


rxn1(r1,r2):1/2





Meta


rxn1(r1,r2):2/1


HBr


rxn1(r1,r2):2/2





Para


rxn1(r1,r2):3/1


HBr


rxn1(r1,r2):3/2








You will be able to exclude the HBr with the -x extract filter.
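To make the agreed `reaction(reactants):isomer/product` layout concrete, here is a small Python sketch that enumerates the codes for three isomers with two products each and then keeps only the first product of every pair. The filtering step only mimics the effect described for -x as a post-processing step; it is not how Reactor implements it, and the helper name `synthesis_codes` is hypothetical.

```python
def synthesis_codes(reaction_id, reactant_ids, n_isomers, n_products):
    """Yield (isomer, product, code) tuples in the rxn1(r1,r2):isomer/product form."""
    prefix = f"{reaction_id}({','.join(reactant_ids)})"
    for isomer in range(1, n_isomers + 1):
        for product in range(1, n_products + 1):
            yield isomer, product, f"{prefix}:{isomer}/{product}"


# Ortho, meta and para isomers, each paired with an HBr side product.
all_codes = list(synthesis_codes("rxn1", ["r1", "r2"], n_isomers=3, n_products=2))

# Keep only the first product of each pair, which drops the HBr entries.
main_only = [code for _, product, code in all_codes if product == 1]

for code in main_only:
    print(code)
```

The isomer index and the product index stay independent, so removing the side product does not renumber the isomers.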





Regarding the first part of the synthesis code, probably each synthetic chemist/lab/company has its own format for coding the products; at least we are not aware of a common standard (please inform me if there is one). However, you will be able to reformat the currently implemented synthesis codes above easily if you prefer different separator characters.
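As an illustration of such reformatting, the following Python sketch converts a code like rxn1(r1,r2):1/2 into the dash-separated style suggested earlier in this thread (rxn1-r1-r2-[1:2]). It is a sketch under stated assumptions: the regular expression and the function name `reformat` are hypothetical, a missing product index is treated as 1, and nested codes (a product used as a reactant) are not handled because the reactant list is split on plain commas.

```python
import re

# reaction ID, reactant list in parentheses, isomer index, optional product index
CODE = re.compile(
    r"^(?P<rxn>[^()]+)\((?P<reactants>.*)\):(?P<isomer>\d+)(?:/(?P<product>\d+))?$"
)


def reformat(code):
    """Rewrite rxn1(r1,r2):1/2 as rxn1-r1-r2-[1:2] (non-nested codes only)."""
    match = CODE.match(code)
    if match is None:
        raise ValueError(f"unrecognised synthesis code: {code!r}")
    reactants = [r.strip() for r in match.group("reactants").split(",")]
    product = match.group("product") or "1"  # assumption: no /n means one product
    head = "-".join([match.group("rxn"), *reactants])
    return f"{head}-[{match.group('isomer')}:{product}]"


print(reformat("rxn1(r1,r2):1/2"))  # rxn1-r1-r2-[1:2]
print(reformat("rxn1(r1,r2):3"))    # rxn1-r1-r2-[3:1]
```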





We probably should leave out the trailing /1 if there is only one product in a reaction, shouldn't we? (-x will not influence this)

ChemAxon 60ee1f1328

11-04-2006 10:53:52

No common standard on this one I think?





The only order that would really be useful is by product viability, with an associated index of, say, 0 to 100, which could be used to order or eliminate minor products (although this would have to work in association with a plug-in, I guess).





I think you could leave it out for single products, that way if a reaction generates more than one product "like %/%" will be able to identify them.
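A quick way to try the "like %/%" idea is an in-memory SQLite table, sketched below in Python. The table and column names are hypothetical, and the third row is a made-up single-product code without the trailing /1. Note the assumption that reactant and reaction IDs never contain a "/" themselves, otherwise the pattern would also match them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (smiles TEXT, synthesis_code TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [
        ("Cc1ccccc1Br", "rxn1(r1,r2):1/1"),  # main product of a two-product reaction
        ("Br", "rxn1(r1,r2):1/2"),           # HBr side product
        ("Cc1ccccc1Br", "rxn2(r1):1"),       # single-product reaction, no /n suffix
    ],
)

# Codes containing "/" come from reactions that generate more than one product.
rows = conn.execute(
    "SELECT synthesis_code FROM products "
    "WHERE synthesis_code LIKE '%/%' ORDER BY synthesis_code"
).fetchall()
multi = [row[0] for row in rows]
print(multi)
conn.close()
```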

ChemAxon d76e6e95eb

11-04-2006 11:02:05

Reactor already supports the product ordering feature! If a reaction contains selectivity and tolerance rules, then only the main products are generated, and the products are ordered according to the selectivity rule. (If there is no selectivity rule, the isomer order is random.) I suppose that predicting the main products and their order is the best we can expect from fast virtual synthesis at the moment; I would not dare to predict yields for generic reactions.