Getting atom names from substructure search

User 221559a7ff

10-07-2013 18:49:07

Hi,

I'm trying to get the names of the atoms that are matched in a substructure search. How would I do this?

Thanks,

Ulysse

ChemAxon abe887c64e

11-07-2013 06:20:52

Hi Ulysse,

What do you mean by the names of the atoms matched in a substructure search? For example, if the query is a benzene molecule, the matched atoms are six carbons.

Best regards,

Krisztina

User 221559a7ff

11-07-2013 15:43:23

Hi Krisztina,

I'm afraid I don't know the term for what I want; I'd like atom names like "CA1" or "OE1", not simply "carbon" or "oxygen". These names could be used to uniquely identify an atom in a given residue.

Thanks,

Ulysse

ChemAxon abe887c64e

12-07-2013 08:28:04

Hi Ulysse,

I'm afraid we don't apply atom identifiers like the ones in your examples. Each atom in a chemical structure, even if the structure consists of several residues, has a unique atom index, and we use these atom indexes to identify the matching atoms in a substructure search.

Best regards,

Krisztina

User 221559a7ff

22-07-2013 16:13:05

Is there any way I could use these atom indices to find an atom's corresponding entry in a .CIF file or some other type of file with information about atom names?

Thanks.

ChemAxon abe887c64e

24-07-2013 12:42:29

Could you please give us a more detailed description of your use case, e.g. with an example of "query / target / expected result"?

Thank you,

Krisztina

User 221559a7ff

25-07-2013 19:05:03

If I query for a structure that is simply a nitrogen atom bonded to a carbon atom, I'd like a way of knowing the names of the matched atoms in each of the structures returned from a substructure search; in this case the names might be things like N1, N2, N3, CA1, CA2, CE, etc. The names of these atoms will undoubtedly change depending on which target structure I'm looking at (if I search for some substructure that's found in both ASP and SER, I'd expect a particular carbon atom to have a different name in ASP than in SER).

I've investigated further on my own and it appears that I could find an atom's name by using Molecule#getAtom(int n). It seems as if the atom at index n in a Molecule corresponds to the n-th entry in the corresponding mmCIF file. However, I don't know if this always holds, and it doesn't help me much in cases where I might have multiple substructure matches on a single structure.

I apologize if I'm not being clear with this; I'm not very knowledgeable about this sort of thing.

ChemAxon 2cbec8f2c5

26-07-2013 12:57:51

Ulysse,

Let me see if I understand you correctly:

You have a CIF file (maybe even mmCIF, I presume) in which atoms are labeled with names identifying their position in the amino acid residue. You then (somehow) import this molecule. Finally you draw a chemical query structure and do a substructure search against the imported molecule. You would like to retrieve the hit and see the associated labels of the hit structure.

As we unfortunately have no direct import option for CIF/mmCIF, can I assume you've worked out a solution yourself? Do you convert to PDB before import, or do you use another tool?

I would try to associate the atom name with each atom upon import of the molecule by using the MolAtom.putProperty API method:
http://www.chemaxon.com/marvin/help/developer/beans/api/chemaxon/struc/MolAtom.html#putProperty%28java.lang.String,%20java.lang.Object%29
These labels can even be added after import, provided the molecule hasn't been modified, standardized, etc.; I think the atom indexing won't change in that case. For substructure search, however, we do optimize the structure, so the atom indexes will change there, which is why adding the labels may be the solution you are looking for.

Would that work out for you?

User 221559a7ff

29-07-2013 16:19:03

Sorry, I wasn't very clear. What's happening is that the user draws some portion of a molecule in the applet, and I run a substructure search for that query against a large set of ligands in the PDB. I can then get back each ligand as a Molecule. From there, the things I need but do not know how to do are:

Getting which atoms are part of the substructure match, and

Getting the names (things like CA1, OE, N3, etc.) of those atoms

The "substructure matches" here would be ligands in the PDB, not the query the user draws.

If it helps, this search tool would be an extension based off of the PDB's ligand substructure-search applet (http://pdb.org/pdb/ligand/chemAdvSearch.do, under the 'Structure' tab).

ChemAxon 2cbec8f2c5

01-08-2013 08:06:27

Thank you for your clarification, it helped indeed.

1. Getting the atoms of the matching structure:
Have a look at the MolSearch class of the JChem API; you'll find an example of how to use it there:
http://www.chemaxon.com/jchem/doc/dev/java/api/chemaxon/sss/search/Search.html

A MolSearch object inherits methods (findFirst, findNext) from its parent Search class, which can be used to find the atom indexes of the target structure atoms that match the query structure atoms. See the documentation here:
http://www.chemaxon.com/jchem/doc/dev/java/api/chemaxon/sss/search/Search.html#findNext%28%29

Alternatively you can use the command line tool jcsearch to do the same:
https://www.chemaxon.com/forum/ftopic554.html

2. Getting the names of those atoms
Your substructure search in 1 will likely require some sort of standardization to perform well, which may scramble the atom indexes assigned directly after import of a (target or query) molecule structure. Hence I do not see an elegant way of assigning or mapping the atom names after the search.
What is not clear to me right now is how and where your target structures (the ligand database) are currently stored. Do the ligands already contain the appropriate atom labels, or are the labels at least stored as some metadata? I feel you need to incorporate them into the target molecules themselves before performing the substructure search, using MolAtom.putProperty as described in my previous answer. That way, all you'd need to do is look up the labels at the atom indices you retrieved in 1.
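
The lookup step itself can be sketched in plain Java, without any JChem dependency. This is only an illustration under assumptions: a hit is taken to be an int array holding one target atom index per query atom (the shape findFirst/findNext are documented to return), and the atom names are assumed to be available per target atom in the same index order as the searched structure. The class and method names (HitAtomNames, namesForHit) are hypothetical, and "?" is just a placeholder chosen here for indices that have no stored name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HitAtomNames {

    /**
     * Returns the stored name of each matched target atom, in query-atom order.
     * Indices with no stored name (negative or past the end of the list)
     * are reported as "?" rather than causing an exception.
     */
    public static List<String> namesForHit(int[] hit, List<String> targetAtomNames) {
        List<String> names = new ArrayList<>();
        for (int targetIndex : hit) {
            boolean known = targetIndex >= 0 && targetIndex < targetAtomNames.size();
            names.add(known ? targetAtomNames.get(targetIndex) : "?");
        }
        return names;
    }

    public static void main(String[] args) {
        // Heavy-atom names of glycine, assumed to be in target atom index order.
        List<String> glycineNames = Arrays.asList("N", "CA", "C", "O", "OXT");
        // Hypothetical hit for an N-C query: query atom 0 -> target atom 0, query atom 1 -> target atom 1.
        int[] hit = {0, 1};
        System.out.println(namesForHit(hit, glycineNames)); // prints [N, CA]
    }
}
```

If a target has several hits, the same method can be called once per hit array. With the names attached to the atoms via putProperty, the list lookup would be replaced by reading the property from the atom at each hit index.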

User 221559a7ff

01-08-2013 16:51:27

Thanks for your continued help!

I think I understand how to find individual matches.

Regarding finding the names:

I think that the target structures are stored in a table called `jchem_named_ligands`; in the Java code that executes the query there is the line:

JChemSearch searcher = new JChemSearch();

[...]

searcher.setStructureTable("jchem_named_ligands");

However, although I don't really understand how that table works, it doesn't seem to contain any atom name data:

> SELECT * FROM `jchem_named_ligands` LIMIT 123, 1

******************** 1. row *********************
              cd_id: 124
       cd_structure: ...
          cd_smiles: OC(=O)CCc1ccc(cc1)-c1cc(cs1)-c1ccccc1
         cd_formula: C19H16O2S
cd_sortable_formula: C00019H00016O00002S00001
       cd_molweight: 308.394
            cd_hash: -295048518
           cd_flags: 
       cd_timestamp: 2013-07-10 04:47:58
  cd_pre_calculated: 0
             cd_fp1: 273428484
             cd_fp2: 1125130241
             cd_fp3: 558850
             cd_fp4: -1036992208
             cd_fp5: 11571336
             cd_fp6: 1073743888
             cd_fp7: 269500416
             cd_fp8: -2146746016
             cd_fp9: 100664481
            cd_fp10: -1071161280
            cd_fp11: 1905797124
            cd_fp12: 336450
            cd_fp13: 693777
            cd_fp14: -1975156475
            cd_fp15: -2147417064
            cd_fp16: -2012741616
              hetId: 03J

If I don't have any data about atom names in that table, is getting atom names from matched target ligands impossible to do?

ChemAxon 2cbec8f2c5

02-08-2013 14:21:14

The table you are referring to is a JChem table. If the atom labels were included for each ligand when this table was populated, then you would find them in the cd_structure field, which stores the untouched imported chemical structures.

Since I presume they were not: are there any related tables in the database schema that store the atom names per ligand, or perhaps a reference to the PDB ID of the entry from which the ligand structure was extracted?

The crucial point is to have a reliable mapping between your atom names and the atom indices of the structure stored in the JChem table. If that hasn't been created previously, it may require some effort to add this data to the dataset. My feeling is that, since you state that the atom names are generally known per ligand, the ligand structures must have been imported from a dataset that already contains this information. If you extend the import method used to generate the JChem table so that it also extracts the atom names and assigns them as atom labels to the imported structures, and then repopulate the table, that could be all you need.

To answer your question of whether getting atom names from matched target ligands is impossible:
It is quite possible, but you need to ensure that the dataset you are querying actually contains that information, or that it can be mapped from an external source.
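
As a rough illustration of pulling atom names from an external source, the sketch below reads the atom_id column of a `_chem_comp_atom` loop from mmCIF-style text, in file order. It is plain Java and not a full CIF parser: it assumes one whitespace-separated data row per line and no spaces inside quoted values. The class and method names (ChemCompAtomNames, atomIds) are hypothetical, and the input shown is a reduced fragment rather than a complete ligand definition. Whether the file order matches the atom indices of the structure stored in the JChem table is exactly the mapping that has to be verified.

```java
import java.util.ArrayList;
import java.util.List;

public class ChemCompAtomNames {

    /** Returns the atom_id values of the first _chem_comp_atom loop, in file order. */
    public static List<String> atomIds(String cifText) {
        List<String> columns = new ArrayList<>();
        List<String> atomIds = new ArrayList<>();
        boolean inLoop = false;
        for (String rawLine : cifText.split("\\R")) {
            String line = rawLine.trim();
            if (line.equals("loop_")) {
                if (!atomIds.isEmpty()) break;   // the atom loop is finished
                columns.clear();
                inLoop = true;
            } else if (inLoop && line.startsWith("_")) {
                columns.add(line);               // column header of the current loop
            } else if (line.isEmpty() || line.startsWith("#")) {
                if (!atomIds.isEmpty()) break;   // the atom loop is finished
                inLoop = false;
                columns.clear();
            } else if (inLoop) {
                int idColumn = columns.indexOf("_chem_comp_atom.atom_id");
                if (idColumn < 0) continue;      // data row of some other loop
                String[] fields = line.split("\\s+");
                if (idColumn < fields.length) {
                    atomIds.add(stripQuotes(fields[idColumn]));
                }
            }
        }
        return atomIds;
    }

    /** Removes one pair of matching surrounding quotes, e.g. "O5'" in double quotes. */
    private static String stripQuotes(String value) {
        if (value.length() >= 2) {
            char first = value.charAt(0);
            char last = value.charAt(value.length() - 1);
            if (first == last && (first == '"' || first == '\'')) {
                return value.substring(1, value.length() - 1);
            }
        }
        return value;
    }

    public static void main(String[] args) {
        // Reduced, illustrative fragment; real files carry more columns per atom.
        String cif = String.join("\n",
                "data_GLY",
                "#",
                "loop_",
                "_chem_comp_atom.comp_id",
                "_chem_comp_atom.atom_id",
                "_chem_comp_atom.type_symbol",
                "GLY N   N",
                "GLY CA  C",
                "GLY C   C",
                "GLY O   O",
                "GLY OXT O",
                "#");
        System.out.println(atomIds(cif)); // prints [N, CA, C, O, OXT]
    }
}
```

The resulting list could then be attached to the imported structures as atom labels, or kept in a separate table keyed by ligand ID.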