User 779e37e0e6
06-11-2015 16:20:53
Hello everyone,
I have an issue that I hope you can help me solve. Given the SMILES string of a molecule/substance X, which may or may not be a mixture, how do I find all the different components of X? For example, C1CCOC1.C1CCNC1 has 2 components.
I have been using the dot (.) as a delimiter. I realize that it does not work all the time. Here are some examples where it does not:
C1.C1 (ethane): C1 is not a component.
CCOP123OC4=CC=CC=C4O1.FC(F)(F)C(S2)=C(S3)C(F)(F)F: The substrings separated by the dot are not valid SMILES and do not represent valid molecules.
As explained on this page (http://www.opensmiles.org/opensmiles.html), "A dot-bond '.' means that the atoms to which it is adjacent in the SMILES string are not bonded to each other."
So could you tell me what the best alternative is here?
Thank you for your consideration.
Best,
MrYan
ChemAxon 044c6721bc
08-11-2015 18:19:35
Hi,
Every fragment has to be a valid SMILES. If a fragment contains a ring-closure number, that number must have a pair within the same fragment (because it marks a ring); otherwise it should be removed.
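The pairing rule above can be checked mechanically. The following is only a minimal sketch, not ChemAxon code: the class name RingClosureCheck and its methods are hypothetical, and the scanner is simplified (it skips bracket atoms, reads single digits and %nn labels, and ignores everything else). It reports, for each dot-separated piece, the ring-closure labels that have no partner inside that piece.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, for illustration only.
class RingClosureCheck {

    // Returns the ring-closure labels that are opened but never closed
    // inside one dot-separated piece of a SMILES string.
    static Set<Integer> unpairedClosures(String piece) {
        Set<Integer> open = new TreeSet<>();
        int i = 0;
        while (i < piece.length()) {
            char c = piece.charAt(i);
            if (c == '[') {
                // Skip bracket atoms: digits inside are isotopes, H counts or charges.
                int close = piece.indexOf(']', i);
                i = (close < 0) ? piece.length() : close + 1;
            } else if (c == '%' && i + 2 < piece.length()
                    && isAsciiDigit(piece.charAt(i + 1))
                    && isAsciiDigit(piece.charAt(i + 2))) {
                // Two-digit ring-closure label written as %nn.
                toggle(open, Integer.parseInt(piece.substring(i + 1, i + 3)));
                i += 3;
            } else if (isAsciiDigit(c)) {
                // Single-digit ring-closure label.
                toggle(open, c - '0');
                i++;
            } else {
                i++; // atoms, bonds and branches do not matter for this check
            }
        }
        return open;
    }

    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // First occurrence opens a label, the second closes it (labels may be reused).
    static void toggle(Set<Integer> open, int label) {
        if (!open.remove(label)) {
            open.add(label);
        }
    }

    public static void main(String[] args) {
        String[] samples = {
            "C1CCOC1.C1CCNC1",
            "C1.C1",
            "CCOP123OC4=CC=CC=C4O1.FC(F)(F)C(S2)=C(S3)C(F)(F)F"
        };
        for (String smiles : samples) {
            for (String piece : smiles.split("\\.")) {
                Set<Integer> unpaired = unpairedClosures(piece);
                System.out.println(piece + " -> "
                        + (unpaired.isEmpty() ? "self-contained" : "unpaired ring closures " + unpaired));
            }
        }
    }
}
```

A piece with unpaired labels is bonded to some other piece across the dot, so splitting on the dot alone is not safe for that input.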
Your examples can be written as (if I understand your aim):
C.C
CCOP1OC4=CC=CC=C4O1.FC(F)(F)C(S)=C(S)C(F)(F)F
Is this what you would like to see?
Janos
User 779e37e0e6
08-11-2015 23:39:56
Hi Janos,
Thanks for replying. Actually, I looked at the second structure in PubChem. Here is the link: https://pubchem.ncbi.nlm.nih.gov/compound/313904#section=IUPAC-Name
The smiles is CC(=CC(=O)O)[As]12(OC(O1)(C)C3CCC4C3CCC4)OC(O2)(C)C56CCCC5CCC6.
As you can see, it does not contain a dot, and the resulting structure has only one component. Is there a way to go from the original SMILES CC(=CC(O)=O)[As]123OC(C)(O1)C1CCC4CCCC14.CC(O2)(O3)C12CCCC1CCC2 to this one (without the dot)? I have tried generating canonical SMILES, etc., from the original SMILES, but I still get the same one.
Is there a way around it?
Also, what function can I use to count the number of moieties/components and get their SMILES back?
Thank you.
Regards,
MrYan
ChemAxon 044c6721bc
09-11-2015 11:03:09
Hi,
This is a strange result of our SMILES export, but it is still valid. I think the handling of the 5-bonded As is not perfect in this case. Unfortunately there is no way to get the SMILES without the dot.
For getting the number of components and their smiles try this code:
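import chemaxon.formats.MolExporter;
import chemaxon.formats.MolImporter;
import chemaxon.struc.Molecule;
import chemaxon.struc.MoleculeGraph;

// Import the SMILES and split it into its connected fragments.
Molecule molecule = MolImporter.importMol("C1CCCC1.C1CCCCC1.C1=CC=CC=C1");
Molecule[] fragments = molecule.findFrags(Molecule.class, MoleculeGraph.FRAG_KEEPING_MULTICENTERS);
System.out.println(fragments.length); // number of components
for (Molecule frag : fragments) {
    System.out.println(MolExporter.exportToFormat(frag, "smiles")); // SMILES of one component
}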
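As a library-free cross-check of the component count, the idea of "atoms joined across a dot by a shared ring-closure label belong to the same component" can be sketched with a union-find over the atoms. This is only an illustrative sketch and makes no claim about how findFrags works internally: the class SmilesComponentCounter is hypothetical, and its tokenizer is simplified (bracket atoms, the organic subset, '*', branches, ring closures as a digit or %nn, and dot-bonds; no validation beyond that).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
class SmilesComponentCounter {

    // Counts connected components by joining bonded atoms in a union-find forest.
    static int countComponents(String smiles) {
        List<Integer> parent = new ArrayList<>();           // one entry per atom
        Deque<Integer> branchStack = new ArrayDeque<>();    // atom to return to at ')'
        Map<Integer, Integer> openRings = new HashMap<>();  // ring label -> atom that opened it
        int prev = -1;                                      // atom the next atom bonds to; -1 after '.'
        int i = 0;
        while (i < smiles.length()) {
            char c = smiles.charAt(i);
            int atomLen = 0;
            if (c == '[') {
                int close = smiles.indexOf(']', i);
                if (close < 0) throw new IllegalArgumentException("unclosed bracket atom");
                atomLen = close - i + 1;
            } else if (smiles.startsWith("Cl", i) || smiles.startsWith("Br", i)) {
                atomLen = 2;
            } else if ("BCNOPSFIbcnops*".indexOf(c) >= 0) {
                atomLen = 1;
            }
            if (atomLen > 0) {
                int atom = parent.size();
                parent.add(atom);
                if (prev >= 0) union(parent, prev, atom);
                prev = atom;
                i += atomLen;
            } else if (c == '.') {
                prev = -1; // dot-bond: the next atom is not bonded to the previous one
                i++;
            } else if (c == '(') {
                branchStack.push(prev);
                i++;
            } else if (c == ')') {
                prev = branchStack.pop();
                i++;
            } else if (c == '%' || (c >= '0' && c <= '9')) {
                if (prev < 0) throw new IllegalArgumentException("ring closure without an atom");
                int label;
                if (c == '%') {
                    label = Integer.parseInt(smiles.substring(i + 1, i + 3));
                    i += 3;
                } else {
                    label = c - '0';
                    i++;
                }
                Integer other = openRings.remove(label);
                if (other == null) {
                    openRings.put(label, prev);   // first occurrence opens the ring bond
                } else {
                    union(parent, other, prev);   // second occurrence closes it, even across a dot
                }
            } else {
                i++; // bond symbols (- = # $ : / \) do not change connectivity
            }
        }
        int roots = 0;
        for (int a = 0; a < parent.size(); a++) {
            if (find(parent, a) == a) roots++;
        }
        return roots;
    }

    static int find(List<Integer> parent, int x) {
        while (parent.get(x) != x) {
            parent.set(x, parent.get(parent.get(x))); // path halving
            x = parent.get(x);
        }
        return x;
    }

    static void union(List<Integer> parent, int a, int b) {
        parent.set(find(parent, a), find(parent, b));
    }

    public static void main(String[] args) {
        System.out.println(countComponents("C1CCOC1.C1CCNC1")); // 2
        System.out.println(countComponents("C1.C1"));           // 1
        System.out.println(countComponents("CCOP123OC4=CC=CC=C4O1.FC(F)(F)C(S2)=C(S3)C(F)(F)F")); // 1
    }
}
```

The sketch only counts components; to get the SMILES of each component back, a toolkit call such as the findFrags snippet above is still needed.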
I hope it helps.
Janos
User 779e37e0e6
09-11-2015 22:03:20
Thank you Janos,
This method is working fine so far.
Best,
MrYan