prevent aromatization in usmiles

User 568550d85a

16-10-2008 08:06:40

Hello





When I try to convert a molecule to usmiles, it automatically gets aromatized. How do I prevent this?


Or, if I subsequently dearomatize, how can I be sure that it goes back to the same resonance structure?





Lorenz





Code:



echo "C1=CC(=CC(=C1)O)C=O" | molconvert smiles:u


Oc1cccc(C=O)c1


ChemAxon 25dcd765a3

16-10-2008 09:50:29

Hi,





If you use the 'u' option, your molecule is aromatized; it is not possible to prevent this. If you want non-aromatic smiles, just use smiles export without the 'u' option. In general, this generates the same resonance structure.





Andras

User 568550d85a

03-12-2008 16:08:37

volfi wrote:
Hi,





If you use the 'u' option, your molecule is aromatized; it is not possible to prevent this. If you want non-aromatic smiles, just use smiles export without the 'u' option. In general, this generates the same resonance structure.





Andras
OK, I am slowly getting stuck on this...


We use a different aromatization scheme than your aromatizer does, so I need a way to generate unique smiles that are not aromatized.


Or, if I aromatize my structures beforehand, can I be sure that the ChemAxon aromatizer does not modify the structure during canonicalization?





btw: Which aromatization scheme is applied? arom_basic?





Regards,


Lorenz

ChemAxon 25dcd765a3

04-12-2008 11:08:51

Hi,





What you should do is apply a general standardization to your molecule (aromatize it as you wish, and probably convert explicit H to implicit), then export it to smiles with the 'q' option set.





Code:



molconvert smiles:q molecule.sdf





In this case your molecule will not be modified at all.


That's all.





The default aromatization scheme is arom_general.





All the best


Andras

User 568550d85a

04-12-2008 17:50:23

volfi wrote:
Hi,





What you should do is apply a general standardization to your molecule (aromatize it as you wish, and probably convert explicit H to implicit), then export it to smiles with the 'q' option set.


Sounds good.


Just to be very sure:


I have several files with lots (millions) of molecules. I want to merge them and be sure that there are as few duplicates as possible.


To achieve that, I take my database (I have neither stereo information nor explicit hydrogens) and aromatize it as I wish. I then store the molecules with


Code:



mol.toFormat("smiles:q")





Then I can merge several files and be sure that duplicates have the same string representation and can easily be filtered out.


Code:



cat m1.smi m2.smi | sort -u > unique.smi





Is this correct, or do I not even need :q when there is no stereo information?


How large is the "error rate", i.e. how many molecules do not get the same smiles although they have the same structure?





many thanks


lorenz

ChemAxon 25dcd765a3

04-12-2008 22:25:17

Hi,





That is what you should do to filter out duplicates.
Quote:
Is this correct, or do I not even need :q when there is no stereo information?


Even if you are certain there is no stereo information (no cis/trans isomers, no chiral centers), you still need the 'q' option.
Quote:
How large is the "error rate", i.e. how many molecules do not get the same smiles although they have the same structure?
We have found that our unique smiles generation is not unique enough for some special symmetrical chiral molecules. However, we have not found any non-stereo molecule for which our unique smiles generation fails.
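
The merge-and-filter step discussed above can be sketched as follows. This is only an illustration: the two input files are created here with made-up sample content, whereas in practice they would come from the 'smiles:q' export. It also assumes each line holds nothing but the SMILES string; if a name or ID column follows the SMILES, whole-line comparison would treat the same structure with different names as distinct.

```shell
#!/bin/sh
# Hypothetical sample inputs standing in for exported smiles:q files.
printf 'Oc1cccc(C=O)c1\nCCO\n' > m1.smi
printf 'CCO\nc1ccccc1\n' > m2.smi

# LC_ALL=C forces byte-wise comparison, so the result does not depend
# on the locale of the machine doing the merge.
LC_ALL=C sort -u m1.smi m2.smi > unique.smi

# CCO occurs in both files, so three distinct strings remain.
wc -l < unique.smi
```

Passing the files directly to sort avoids the extra cat process, which can matter with millions of lines.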





Andras

User 568550d85a

05-12-2008 13:15:36

volfi wrote:



We have found that our unique smiles generation is not unique enough for some special symmetrical chiral molecules. However, we have not found any non-stereo molecule for which our unique smiles generation fails.





Andras
OK, good enough for me.





Now, just out of interest: what exactly is the difference between 'smiles:q', 'smiles:u' and normal 'smiles' export?


I saw in the SMILES help that a distinction is made between these types:


generic smiles (basically everything)


isomeric smiles (smiles with stereo info)


unique smiles (unambiguous smiles, no stereo info)


absolute smiles (unambiguous smiles with stereo info)





Does 'smiles' create generic, ':q' unique and ':u' absolute smiles?





lorenz

ChemAxon 25dcd765a3

05-12-2008 14:32:03

Hi,
Quote:
What exactly is the difference between 'smiles:q', 'smiles:u' and normal 'smiles' export?
- normal smiles output generates a smiles string using a certain algorithm that does not consider, for example, graph invariants


- smiles:q generates a normal smiles string that also takes graph invariant information into account


- smiles:u generates unique smiles: it aromatizes the molecule and also considers graph invariant info.





All of them consider stereo information.
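
To see the three variants side by side, a small loop such as the one below may help. It is a sketch only: it reuses the input string and the piping style from the first post, assumes molconvert is on the PATH (it prints a placeholder otherwise), and writes the comparison to formats.txt, a file name chosen here purely for illustration. No output is shown because it depends on the installed version.

```shell
#!/bin/sh
# Kekule-form input taken from the first post in this thread.
input='C1=CC(=CC(=C1)O)C=O'

# Export the same molecule with each SMILES variant discussed above.
for fmt in smiles smiles:q smiles:u; do
    if command -v molconvert >/dev/null 2>&1; then
        result=$(printf '%s\n' "$input" | molconvert "$fmt")
    else
        result='(molconvert not found on PATH)'
    fi
    # One line per format: the format name, then the exported string.
    printf '%-10s %s\n' "$fmt" "$result"
done | tee formats.txt
```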





Andras