User 7910dcb734
16-01-2013 16:44:02
Hi,
I insert molecules into a JCHEM structure table via the following method (trimmed for reading):
private IntArray insertStructures(List<Molecule> molecules) {
    final Molecule[] molArray = molecules.toArray(new Molecule[molecules.size()]);
    ConnectionCallback<IntArray> connectionCallback = new ConnectionCallback<IntArray>() {
        @Override
        public IntArray doInConnection(Connection connection) {
            try {
                ConnectionHandler connectionHandler = new ConnectionHandler(connection, propertyTableName);
                byte[] molBytes;
                String molString;
                molString = (String) MolExporter.exportToObject(molArray, "mol", new MolExport());
                if (molString == null) {
                    return new IntArray(0);
                }
                molBytes = molString.getBytes();
                Importer importer = new Importer();
                importer.setConnectionHandler(connectionHandler);
                importer.setInput(new ByteArrayInputStream(molBytes));
                importer.setTableName(structureTableName);
                importer.setDuplicateImportAllowed(UpdateHandler.DUPLICATE_FILTERING_TABLE_OPTION);
                importer.setStoreImportedIDs(true);
                importer.setStoreDuplicates(false);
                importer.setEmptyStructuresAllowed(false);
                importer.run();
                return importer.getImportedIDs();
            }
            ...
Afterwards, I need the cd_id of each imported molecule. I run a search on each molecule via its structure:
private void assignStructureId(final Molecule molecule) {
    ConnectionCallback<Integer> connectionCallback = new ConnectionCallback<Integer>() {
        @Override
        public Integer doInConnection(Connection connection) {
            try {
                ConnectionHandler connectionHandler = new ConnectionHandler(connection, propertyTableName);
                JChemSearch searcher = new JChemSearch(); // Create searcher object
                searcher.setQueryStructure(molecule);
                searcher.setConnectionHandler(connectionHandler);
                searcher.setStructureTable(structureTableName);
                searcher.setRunMode(JChemSearch.RUN_MODE_SYNCH_COMPLETE);
                JChemSearchOptions searchOptions = new JChemSearchOptions(SearchConstants.DUPLICATE);
                searcher.setSearchOptions(searchOptions);
                searcher.run();
                int[] cd_ids = searcher.getResults();
                if (cd_ids.length > 1) {
                    String message = "Multiple entries returned for same structure: " + cd_ids[0] + ", " + cd_ids[1];
                    throw new CompoundRepositoryException(message);
                } else if (cd_ids.length == 0) {
                    String message = "No match returned for entered structure.";
                    throw new CompoundRepositoryException(message);
                } else {
                    return cd_ids[0];
                }
                ...
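The hit-count check at the end of this method can be exercised without a database if it is pulled out into a plain helper. The following is only a minimal sketch: the class name SingleHit and the method requireSingleHit are hypothetical, IllegalStateException stands in for the CompoundRepositoryException used above, and the null check is an added assumption that is not part of the original code.

```java
final class SingleHit {

    /**
     * Returns the only cd_id in the hit list.
     * Throws if the list is null, empty, or contains more than one id.
     * (IllegalStateException is a stand-in for CompoundRepositoryException.)
     */
    static int requireSingleHit(int[] cdIds) {
        if (cdIds == null || cdIds.length == 0) {
            throw new IllegalStateException("No match returned for entered structure.");
        }
        if (cdIds.length > 1) {
            throw new IllegalStateException(
                    "Multiple entries returned for same structure: " + cdIds[0] + ", " + cdIds[1]);
        }
        return cdIds[0];
    }

    public static void main(String[] args) {
        // A single hit is returned unchanged.
        System.out.println(requireSingleHit(new int[] {17}));
    }
}
```

Keeping this logic separate from the search call makes it easier to see whether a failure comes from the search returning no hits or from the handling of the result array.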
However, for one particular molecule (attached as .sdf) I do not find any matching structures - despite having inserted the very same molecule into the structure table (and I have checked that this is indeed inserted - there is just no match found with the searcher).
Any thoughts? It seems like this may be a bug, as it happens only with this molecule out of tens of thousands. However I have not managed to track down a cause, so perhaps I am doing something wrong.
Brendan
ChemAxon 9c0afc9aaf
16-01-2013 18:01:58
Hi,
Could you please let us know:
1) The exact version of the JChemBase API used
(chemaxon.jchem.version.VersionInfo.JCHEM_VERSION )
2) The table settings printed by:
jcman t <table_name>
Best Regards,
Szilard
User 7910dcb734
17-01-2013 10:09:36
Hi Szilard,
1) The exact version is 5.11.5
2)
Table type: Molecules
Table version: 5110000
Uses tautomers for duplicate search: No
Filters out the duplicate structures: Yes
Fingerprint settings:
Length (bits): 512
Pattern length: 6
Bits per pattern: 2
Table uses default standardization.
     Column name      Type name
  1  CD_ID            INT
  2  CD_STRUCTURE     MEDIUMBLOB
  3  CD_SMILES        VARCHAR
  4  CD_FORMULA       VARCHAR
  5  CD_SORTABLE_FOR  VARCHAR
  6  CD_MOLWEIGHT     DOUBLE
  7  CD_HASH          INT
  8  CD_FLAGS         VARCHAR
  9  CD_TIMESTAMP     DATETIME
 10  CD_PRE_CALCULAT  TINYINT
 11  CD_FP1           INT
 12  CD_FP2           INT
 13  CD_FP3           INT
 14  CD_FP4           INT
 15  CD_FP5           INT
 16  CD_FP6           INT
 17  CD_FP7           INT
 18  CD_FP8           INT
 19  CD_FP9           INT
 20  CD_FP10          INT
 21  CD_FP11          INT
 22  CD_FP12          INT
 23  CD_FP13          INT
 24  CD_FP14          INT
 25  CD_FP15          INT
 26  CD_FP16          INT
ChemAxon 9c0afc9aaf
17-01-2013 14:49:55
Hi,
Strangely, using the command-line tools "jcman" and "jcsearch", the structure is found without problems by a duplicate search (these tools essentially use the same API).
I have tested with the same version and the same settings.
I assume you are using MySQL, right? (I tested with that.)
There is one potential source of discrepancy in your approach: you convert the molecule to "mol" before inserting. If some features of the Molecule cannot be represented in "mol" format, then obviously there will not be a match.
- Is the attached SDF the original source/format of the Molecule object?
- How was the Molecule created? Apart from the import, were there any manipulations on it?
- Could you please attach or paste the Molecule converted to MRV format right before the insert?
MolExporter.exportToFormat(mol, "mrv")
Best regards,
Szilard
User 7910dcb734
17-01-2013 15:37:15
Hi Szilard,
Yes, I found the same thing with the JchemManager software, which I found strange.
I am using MySQL, yes.
Is there a better alternative to converting to mol before inserting? I could find no way to insert directly from a Molecule object; have I missed this?
The attached sdf is the original source of the Molecule object.
The Molecule was created using the MolImporter class to read the sdFile. It did have some manipulations: the structure checker (with default fixers for each error found) and the standardizer. I have attached the configuration xmls for these as well as the exported molecule in .mrv format immediately prior to insertion.
Many thanks for the help,
Brendan
ChemAxon 9c0afc9aaf
17-01-2013 23:51:42
Hi Brendan,
We could reproduce the issue with the MRV, thank you.
We will investigate this further and get back to you here.
Regarding the other questions:
- In general, conversion to "mrv" format is the best choice (it supports all possible features). It also seems to be a workaround for this problem.
molString = (String) MolExporter.exportToObject(molArray, "mrv", new MrvExport());
- You need to specify some String, as this String will be stored in the cd_structure column, and possibly accessed for display directly.
BTW it seems that UpdateHandler could be quite handy in this case instead of Importer - have you taken a look at that class yet ?
Best regards,
Szilard
User 7910dcb734
21-01-2013 10:15:31
Thanks Szilard. I will start using "mrv" (particularly as it seems to be a workaround for this problem).
I will also look at UpdateHandler.
Cheers,
Brendan
User 7910dcb734
11-03-2013 13:25:09
Hi Szilard,
Has there been any progress with this issue? I have a number of molecules that have the same problem, and using "mrv" does not work as a workaround.
Regards,
Brendan
Edited to remove spurious attachment.
ChemAxon 9c0afc9aaf
11-03-2013 13:33:26
I think the original problem was fixed in 5.12.
We will also check the recently posted structures and get back to you.
Szilard
User 7910dcb734
11-03-2013 13:35:57
Hi Szilard,
I've not updated to 5.12 yet (I missed that release). I'll do so now and get back to you if the problem persists.
Regards,
Brendan
User 7910dcb734
11-03-2013 16:32:51
Hi Szilard,
I think the original problem has been fixed (certainly the molecule I originally posted).
Unfortunately I have a new molecule I am still getting the same issue with. I have attached it in mrv form, immediately prior to insertion.
To be clear, as before, after inserting into the database I then immediately query for it. It is not found.
ChemAxon a3d59b832c
11-03-2013 17:36:21
Hi Brendan,
It is not clear what role the attached standardizer and structure checker configurations play.
From the jcman output, it seems that the standardization is not applied to the table, so it must have been run external to JChem Base.
Do you apply the standardization and structure checking to both the inserted and the query structures?
Thanks,
Szabolcs
User 7910dcb734
11-03-2013 18:37:37
Hi Szabolcs,
The standardiser and checker are applied to the molecule object (loaded into memory from an sdfile format) before attempting the insert. You can see my code above for usage. The molecule object is turned into an mrv string for insertion, while the same object is used in the following search. I will use jcman again tomorrow for the latest table properties; it should use the same standardiser config (though this shouldn't matter).
Brendan
User 7910dcb734
12-03-2013 09:23:09
The jcman output for the table:
Table type: Molecules
Table version: 5120000
Uses tautomers for duplicate search: No
Filters out the duplicate structures: Yes
Fingerprint settings:
Length (bits): 512
Pattern length: 6
Bits per pattern: 2
Custom standardization configuration:
----------------------------------------
<?xml version="1.0" encoding="UTF-8"?>
<!-- Standardizer configuration file -->
<!-- Sample example from ChemAxon documentation -->
<StandardizerConfiguration Version ="0.1">
<Actions>
<Action ID="aromatize" Act="aromatize"/>
<Transformation ID="PlusMinus" Structure="[*+:1][*-:2]>>[*:1]=[*:2]"/>
<!-- File missing for test <Transformation ID="PlusMinusDouble" Structure="molfiles/PlusMinusDouble.mol"/> -->
<Transformation ID="Enamine" Structure="[H]N[C:1]=[C:2]>>[H][C:2][C:1]=N"/>
<Transformation ID="Enol" Structure="[H:4][O:3][C:1]=[C:2]>>[H:4][C:2][C:1]=[O:3]"/>
<Transformation ID="ClMinus" Structure="[Cl-]>>" Exact="true" Groups="target,g1"/>
<RemoveExplicitH ID="removeH" Charged="true" Radical="true" Mapped="true"/>
<Removal ID="keepOne" Method="keepLargest" Measure="molMass"/>
<RemoveRGroupDefinitions ID="removeRGroupDefinitions"/>
<RemoveAttachedData ID="removeAttachedData"/>
<RemoveAtomValues ID="removeAtomValues"/>
<Aromatize ID="chemaxonaromatize" Type="basic"/>
<AddExplicitH ID="addH"/>
<AliasToGroup ID="aliastogroup"/>
<AliasToAtom ID="aliastoatom"/>
<Sgroups ID="expand" Act="Expand" Exclude="Ph,Ac"/>
<ClearStereo ID="clearstereo" Type="Chirality"/>
<AbsoluteStereo ID="setstereo" Act="Set"/>
<Expand ID="stoichiometry" Data="COEFF"/>
<Dearomatize ID="dearomatize"/>
<Neutralize ID="neutralize"/>
<ClearIsotopes ID="clearisotopes"/>
<!-- File missing for test <Clean Type="TemplateBased" TemplateFile="templates.mrv" ID="clean"/> -->
<Tautomerize ID="tautomer"/>
<Mesomerize ID="mesomer"/>
<Removal ID="RemoveFragment" Method="keepLargest" Measure="atomCount"/>
</Actions>
</StandardizerConfiguration>
----------------------------------------
     Column name      Type name
  1  CD_ID            INT
  2  CD_STRUCTURE     MEDIUMBLOB
  3  CD_SMILES        VARCHAR
  4  CD_FORMULA       VARCHAR
  5  CD_SORTABLE_FOR  VARCHAR
  6  CD_MOLWEIGHT     DOUBLE
  7  CD_HASH          INT
  8  CD_FLAGS         VARCHAR
  9  CD_TIMESTAMP     DATETIME
 10  CD_PRE_CALCULAT  TINYINT
 11  CD_FP1           INT
 12  CD_FP2           INT
 13  CD_FP3           INT
 14  CD_FP4           INT
 15  CD_FP5           INT
 16  CD_FP6           INT
 17  CD_FP7           INT
 18  CD_FP8           INT
 19  CD_FP9           INT
 20  CD_FP10          INT
 21  CD_FP11          INT
 22  CD_FP12          INT
 23  CD_FP13          INT
 24  CD_FP14          INT
 25  CD_FP15          INT
 26  CD_FP16          INT
ChemAxon abe887c64e
12-03-2013 16:58:58
Hi Brendan,
Having reviewed the standardizer configuration file you sent, we should mention that the 'tautomerize' and 'mesomerize' actions transform the structure into Kekule form. If you would like to get an aromatic molecule, you should execute 'aromatize' (and only one 'aromatize' action) only after these two actions.
In addition, the 'tautomerize' and 'mesomerize' actions are canonical transformations and do not retain substructure parts exactly; therefore, a substructure search may not return hits if these standardizer actions are applied.
Could you run a test with an accordingly modified standardizer configuration?
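The ordering rule described above (a single aromatize step, placed after 'tautomerize' and 'mesomerize') can be checked mechanically before a configuration is applied to a table. The following is a minimal sketch using only the standard Java XML parser, not any ChemAxon API. The class name AromatizeOrderCheck, the method aromatizeIsLast, and both embedded configurations are hypothetical: POSTED is a trimmed excerpt of the configuration shown earlier in the thread, and REVISED is one possible reading of the advice, not a tested or recommended file. The check does not look at other actions (for example Dearomatize) that may also affect aromaticity.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class AromatizeOrderCheck {

    // Trimmed excerpt of the posted configuration: two aromatize steps, both before Tautomerize/Mesomerize.
    static final String POSTED =
            "<StandardizerConfiguration Version=\"0.1\"><Actions>"
            + "<Action ID=\"aromatize\" Act=\"aromatize\"/>"
            + "<Aromatize ID=\"chemaxonaromatize\" Type=\"basic\"/>"
            + "<Dearomatize ID=\"dearomatize\"/>"
            + "<Tautomerize ID=\"tautomer\"/>"
            + "<Mesomerize ID=\"mesomer\"/>"
            + "</Actions></StandardizerConfiguration>";

    // Hypothetical revision: a single aromatize step after the two canonical transformations.
    static final String REVISED =
            "<StandardizerConfiguration Version=\"0.1\"><Actions>"
            + "<Tautomerize ID=\"tautomer\"/>"
            + "<Mesomerize ID=\"mesomer\"/>"
            + "<Aromatize ID=\"chemaxonaromatize\" Type=\"basic\"/>"
            + "</Actions></StandardizerConfiguration>";

    /** True if there is exactly one aromatize step and it comes after every Tautomerize/Mesomerize step. */
    static boolean aromatizeIsLast(String configXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Configuration is not well-formed XML", e);
        }
        NodeList actionsList = doc.getElementsByTagName("Actions");
        if (actionsList.getLength() == 0) {
            return false;
        }
        NodeList children = actionsList.item(0).getChildNodes();
        int position = 0;        // index among element children only
        int aromatizeCount = 0;
        int aromatizeAt = -1;
        int lastCanonicalAt = -1;
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text and comments
            }
            Element action = (Element) node;
            String name = action.getTagName();
            if (name.equals("Aromatize") || "aromatize".equalsIgnoreCase(action.getAttribute("Act"))) {
                aromatizeCount++;
                aromatizeAt = position;
            } else if (name.equals("Tautomerize") || name.equals("Mesomerize")) {
                lastCanonicalAt = position;
            }
            position++;
        }
        return aromatizeCount == 1 && aromatizeAt > lastCanonicalAt;
    }

    public static void main(String[] args) {
        System.out.println("posted:  " + aromatizeIsLast(POSTED));
        System.out.println("revised: " + aromatizeIsLast(REVISED));
    }
}
```

With these two inputs, the trimmed posted configuration fails the check (it contains two aromatize steps, both placed before the canonical transformations) and the revised one passes.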
Best regards,
Krisztina
User 7910dcb734
13-03-2013 09:25:29
Hi kvajda,
Thanks, I think that has fixed it.
Cheers,
Brendan