User 343b63fb52
16-12-2015 09:06:46
When using your PeriodicSystem to get the atom number for elements, I think there's a major bug in your program.
When querying with the symbol "CO" (two uppercase characters), the method findAtomicNumber return 27 for cobalt. This is wrong. Cobalt's symbol is Co with one uppercase 'C' and one lowercase 'o'.
CO is stemming from a SMILES string with a carbon and oxygen not cobalt!
ChemAxon 044c6721bc
16-12-2015 09:52:07
Hi,
PeriodicSystem.findAtomicNumber works only for one element, not for a smiles. It doesn't care about upper or lowercase. For the string "CO" the other option would be an exception, because it is not one element. Would this solution better for you?
Janos
User 343b63fb52
16-12-2015 10:13:11
I do get the idea that PeriodicSystem works on single elements only, however I made an small adaptation that reads element symbols from SMILES, that how I ended up with CO.
BUT: No method in chemistry should say CO is a valid symbol. There's no way you should justify that with uppercase and lowercase programming. CO is an invalid element!
You should reflect correct element writing in findAtomicNumber, otherwise you're not doing chemistry! There's really no justification to deviate from this convention.
CO is not an element symbol, so the method should produce an error, return -1 or something else but not return 27. Atno number 27 should only be produced when I ask the method to check for "Co" which is the one and only correct symbol of combalt.
User 343b63fb52
16-12-2015 10:26:48
/**
* Method checks the symbol of elements.
* @author rvandeursen
*/
public class ElementSymbolCheck
implements alphabet
{
/**
* Desired symbol.
*/
final String symbol;
/**
* Constructor of the class ElementCheck to check symbols.
* @param symbol
*/
public ElementSymbolCheck(String symbol) {
this.symbol = symbol;
}
/**
* Method returns the atom number for the symbol.
* @return Atomic number for the symbol or -1 if invalid.
*/
public int getAtno() {
if (this.isValidElementSymbol(symbol)) {
try {
return PeriodicSystem.findAtomicNumber(symbol);
} catch (Exception e) {
return -1;
}
}
return -1;
}
/**
* Method checks if a character is an upper case alphabetical character.
*
* @param symbol Character
* @return True is uppercase A-Z / false if not.
*/
private boolean isUpperCaseCharacter(String symbol) {
return this.isDesiredCaseCharacter(symbol, this.A.upper);
}
/**
* Method checks if a character is a lower case alphabetical character.
*
* @param symbol Character
* @return True is lowercase a-z / false if not.
*/
private boolean isLowerCaseCharacter(String symbol) {
return this.isDesiredCaseCharacter(symbol, this.A.lower);
}
/**
* Method checks if a characters is in the desired array.
*
* @param symbol Character.
* @param array Array to check for the character.
* @return
*/
private boolean isDesiredCaseCharacter(String symbol, String[] array) {
for (String a : array) {
if (symbol.equals(a)) {
return true;
}
}
return false;
}
/**
* Method checks whether a symbol is a valid element symbol composed of a
* starting uppercase character followed by lower case characters.
*
* @param symbol
* @return
*/
private boolean isValidElementSymbol(String symbol) {
int L = symbol.length();
for (int i = 0; i < L; i++) {
String s = symbol.substring(i, i + 1);
switch (i) {
case 0:
/* Check for an uppercase character */
if (!this.isUpperCaseCharacter(s)) {
return false;
}
break;
default:
/* Check for a lowercase character */
if (!this.isLowerCaseCharacter(s)) {
return false;
}
break;
}
}
return true;
}
}
Note:
this.A.lower is a String[] from a-z.
this.A.upper is a String[] from A-z.
So it implicitly checks for other characters as well.
ChemAxon 25dcd765a3
16-12-2015 13:00:28
Hi
You are right, PeriodicSystem works on a single element only and as you have mentioned you wanted to use it for multiple elements, which is not valid. We use findAtomicNumber method in quite a lot of cases and we will not modify it. If you need an other method which takes upper-case and lower-case characters into consideration you can easily write it based on or API. We don't want to extend our API with a method which is quite obvious to create.
ChemAxon 044c6721bc
16-12-2015 13:05:05
It is shorter if you use Character.isUpperCase(char) method, or the "[A-Z][a-z]?" regular expression.
Janos
ChemAxon 25dcd765a3
16-12-2015 13:10:00
r.vandeursen wrote: |
CO is not an element symbol, so the method should produce an error, return -1 or something else but not return 27. Atno number 27 should only be produced when I ask the method to check for "Co" which is the one and only correct symbol of combalt.
|
The method may be misleading, but in the description we correctly describe the method scope. It handles symbols regardless of its capitalization.
So we agree that CO is not an element symbol, neither BR (as in the example in the method description). but this method finds the corresponding element. If you have a better name suggestion for this method name, please do not hesitate to share with us.
User 343b63fb52
16-12-2015 13:14:28
volfi wrote: |
Hi
You are right, PeriodicSystem works on a single element only and as you have mentioned you wanted to use it for multiple elements, which is not valid. We use findAtomicNumber method in quite a lot of cases and we will not modify it. If you need an other method which takes upper-case and lower-case characters into consideration you can easily write it based on or API. We don't want to extend our API with a method which is quite obvious to create.
|
Well, I hope your *** <this part of the message was removed by a moderator> *** ignorance of chemical conventions in the past (that you still keep justifying) will not affect other users. It was plain wrong, it is plain wrong and it will ever be plain wrong as long as we deal with chemistry. These basics should be accounted from the very start. It's the first lesson in chemistry on high school and you decide to ignore it.
If it's obvious: what about you add a method findAtomicNumberWithStrictSymbolDefintion() so people know when they use the function they can be assured that the symbol is checked for integrity with chemical conventions.
If you do not want that: I do require you to at least update the comments for the method findAtomicNumber(), so people know that this ignorant and wrong function neglects uppercase and lowercase.
ChemAxon 25dcd765a3
16-12-2015 13:31:20
r.vandeursen wrote: |
Well, I hope your stupidity and naughty ignorance of chemical conventions in the past (that you still keep justifying) will not affect other users. It was plain wrong, it is plain wrong and it will ever be plain wrong as long as we deal with chemistry. These basics should be accounted from the very start. It's the first lesson in chemistry on high school and you decide to ignore it.
If it's obvious: what about you add a method findAtomicNumberWithStrictSymbolDefintion() so people know when they use the function they can be assured that the symbol is checked for integrity with chemical conventions.
If you do not want that: I do require you to at least update the comments for the method findAtomicNumber(), so people know that this ignorant and wrong function neglects uppercase and lowercase.
|
Thank you for your kind words and thoughts.
We do not want to extend our public API with findAtomicNumberWithStrictSymbolDefintion() method, however you can easily do it for yourself using the functionality provided by our API, if you require that.
Regarding your comment about the findAtomicNumber() function, it exactly states that it neglects uppercase and lowercase, the only missing description is that "this is an ignorant and wrong function" which we would like to avoid.
User 343b63fb52
16-12-2015 13:37:18
volfi wrote: |
Hi
You are right, PeriodicSystem works on a single element only and as you have mentioned you wanted to use it for multiple elements, which is not valid. We use findAtomicNumber method in quite a lot of cases and we will not modify it. If you need an other method which takes upper-case and lower-case characters into consideration you can easily write it based on or API. We don't want to extend our API with a method which is quite obvious to create.
|
Stop! No findAtomSymbol() doesn't work on a single element. It works on a String what ever this might be! In this particular example, I create them as substring in a SMILES.
Your intention is to have it work on a single element. However, I can legitimately call this function with any String expecting that the function can distinguish invalid element symbols from valid symbol. And CO is an invalid element symbol.
You did however decide to ignore chemical conventions and introduce a case-insentive element symbol check, what ever the reason might be.
I just don't get, how you could ever decide to ignore correct element symbols. This is probably something that was introduced at the very beginning and you should have known better.
And please stop justifying it with code integrity. That's a consequence of your action.
ChemAxon 25dcd765a3
16-12-2015 13:53:12
findAtomSymbol() works on any string it does not work on an element.
You can legitimately call this function with any
String. You cannot expect that the function can distinguish invalid element
symbols from valid symbol as the valid chemical symbol is always starts with capital letter.
We did decide to introduce a case-insentive string matching for chemical elements which we call findAtomSymbol().
I just don't get what is the problem with that?
I think you have a problem with the name of the function.
User 343b63fb52
16-12-2015 14:08:03
volfi wrote: |
findAtomSymbol() works on any string it does not work on an element.
You can legitimately call this function with any
String. You cannot expect that the function can distinguish invalid element
symbols from valid symbol as the valid chemical symbol is always starts with capital letter.
We did decide to introduce a case-insentive string matching for chemical elements which we call findAtomSymbol().
I just don't get what is the problem with that?
I think you have a problem with the name of the function.
|
The problem is and you admit that here:
you willfully decided - what ever the reason might be - to introduce a case-insentive method to lookup element symbols. That's either plain wrong because the person in the past didn't know the basics* or plain ignorant because you ignore long-standing chemical conventions.
*If you fail to correctly implement chemical basics and conventions, one might seriously doubt the integrity of conventions and functions for the remaining part of the API that are slightly more sophisticated.
User 343b63fb52
16-12-2015 14:17:19
volfi wrote: |
findAtomSymbol() works on any string it does not work on an element.
You can legitimately call this function with any
String. You cannot expect that the function can distinguish invalid element
symbols from valid symbol as the valid chemical symbol is always starts with capital letter.
We did decide to introduce a case-insentive string matching for chemical elements which we call findAtomSymbol().
I just don't get what is the problem with that?
I think you have a problem with the name of the function.
|
To simply state the problem:
It's about expectation. I expect a program to return me 27 when I ask for the atom number "Co". Or 17 when asking for Cl. Depending on character sequence in a SMILES string it can return wrong elements, wrong HAC and more. It does matter when one has to recompute databases of billions of compounds!
In a SMILES string SC stands for a thiol, but ignoring case-sensitivity have this set of characters identified as scandium. And there might be actually numerous combination of characters that can have a different meaning by not holding in to long-standing chemical conventions.
User 343b63fb52
16-12-2015 14:19:51
volfi wrote: |
findAtomSymbol() works on any string it does not work on an element.
You can legitimately call this function with any
String. You cannot expect that the function can distinguish invalid element
symbols from valid symbol as the valid chemical symbol is always starts with capital letter.
We did decide to introduce a case-insentive string matching for chemical elements which we call findAtomSymbol().
I just don't get what is the problem with that?
I think you have a problem with the name of the function.
|
Stop turning the argumentation, please!
ChemAxon 25dcd765a3
17-12-2015 07:36:51
Thank you for your comments.
It is not in our short term scope to modify findAtomSymbol() method.
You mention quite often SMILES so I would like to call your attention to our SMILES importer which helps you to convert SMILES to molecule.
ChemAxon 996dedebe0
17-12-2015 10:43:51
r.vandeursen's comment was removed because its content violated Chemaxon's Technical Support Forum's Terms of Service available in FAQ.
"Moderators are individuals (or groups of
individuals) whose job it is to look after the running of the forums
from day to day. They have the power to edit or delete posts and lock,
unlock, move, delete and split topics in the forum they moderate.
Generally moderators are there to prevent people going off-topic or posting abusive or offensive material."
ChemAxon 74610a0807
17-12-2015 15:07:16
Just to be clear, the findAtomSymbol() method is meant to process any string and parse the ENTIRE string as a single atomic symbol. It is only meant to process a single atom symbol at a time. It is not meant to parse an entire SMILES string. In fact, if you were to give it "CCO" or any other string that it does not recognize as an atomic symbol, it would throw an exception.
Chemical convention does have that atomic symbols start with a capitalized letter, but this is not always the case in cheminformatics. Even in SMILES, a carbon can be represented by a lower case letter. In SMARTS, you could have a substring "Ac" which could mean "Any aliphatic atom attached to an aromatic carbon". If you tried to pass that to findAtomSymbol() it would say it had atomic number 89, whereas if you tried to pass the correct SMILES/SMARTS for Actinium ([Ac]) to findAtomSymbol() it would throw an exception, because that is not the string format it is looking for.
There may be additional cases where converting the string representation of your atomic element in your program to byte code, all uppercase, or all lower case makes it easier to do searches or string comparisons. My point is that that any string parsing and/or validation must occur before using findAtomSymbol(). If you want the atomic symbol of each element in a
SMILES string, you would want to use a SMILES importer or otherwise
parse the SMILES string into the individual atoms before calling
findAtomSymbol().
User 343b63fb52
17-12-2015 15:46:22
mbraden wrote: |
Just to be clear, the findAtomSymbol() method is meant to process any string and parse the ENTIRE string as a single atomic symbol. It is only meant to process a single atom symbol at a time. It is not meant to parse an entire SMILES string. In fact, if you were to give it "CCO" or any other string that it does not recognize as an atomic symbol, it would throw an exception.
Chemical convention does have that atomic symbols start with a capitalized letter, but this is not always the case in cheminformatics. Even in SMILES, a carbon can be represented by a lower case letter. In SMARTS, you could have a substring "Ac" which could mean "Any aliphatic atom attached to an aromatic carbon". If you tried to pass that to findAtomSymbol() it would say it had atomic number 89, whereas if you tried to pass the correct SMILES/SMARTS for Actinium ([Ac]) to findAtomSymbol() it would throw an exception, because that is not the string format it is looking for.
There may be additional cases where converting the string representation of your atomic element in your program to byte code, all uppercase, or all lower case makes it easier to do searches or string comparisons. My point is that that any string parsing and/or validation must occur before using findAtomSymbol(). If you want the atomic symbol of each element in a
SMILES string, you would want to use a SMILES importer or otherwise
parse the SMILES string into the individual atoms before calling
findAtomSymbol().
|
You guys simply don't get the point. But as you wish and have supported here with evidence, I decided to step down from using bogus methods.
User 343b63fb52
19-12-2015 11:43:28
mbraden wrote: |
Just to be clear, the findAtomSymbol() method is meant to process any string and parse the ENTIRE string as a single atomic symbol. It is only meant to process a single atom symbol at a time. It is not meant to parse an entire SMILES string. In fact, if you were to give it "CCO" or any other string that it does not recognize as an atomic symbol, it would throw an exception.
Chemical convention does have that atomic symbols start with a capitalized letter, but this is not always the case in cheminformatics. Even in SMILES, a carbon can be represented by a lower case letter. In SMARTS, you could have a substring "Ac" which could mean "Any aliphatic atom attached to an aromatic carbon". If you tried to pass that to findAtomSymbol() it would say it had atomic number 89, whereas if you tried to pass the correct SMILES/SMARTS for Actinium ([Ac]) to findAtomSymbol() it would throw an exception, because that is not the string format it is looking for.
There may be additional cases where converting the string representation of your atomic element in your program to byte code, all uppercase, or all lower case makes it easier to do searches or string comparisons. My point is that that any string parsing and/or validation must occur before using findAtomSymbol(). If you want the atomic symbol of each element in a
SMILES string, you would want to use a SMILES importer or otherwise
parse the SMILES string into the individual atoms before calling
findAtomSymbol().
|
Well, just top keep it simple with some disciplines and subdisciplines.
(1) Discipline is chemistry with it's main conventions - independent on subdiscipline.
(2) Subdiscipline is chemoinformatics with additional conventions used by chemoinformatics.
So, as you cannot guarantee that your software will be exclusively used for chemoinformatics - nor has it been designed as such - it means that functions in a class that's obviously meant to represent chemical conventions (PeriodicSystem is at the very top of chemistry basics and goes beyond the scope of chemoinformatics), should follow the chemical conventions.
Subdisciplines like cheminformatics that have been built and building onto standard conventions by introducing own conventions, e.g. the mentioned small 'c' for aromatic carbons in a SMILES string, do require you to make a patch that explicitly deals with these conventions and should have been restricted for use in chemoinformatic applications.
That's the only and really only correct heritage within the mentioned fields. By turning around the heritage, you have been introducing additional features into the wider superdiscipline chemistry, that should have been strictly bound to the subdiscipline chemoinformatics, messing up the conventions of the superdiscipline chemistry.
That's wrong.
User 343b63fb52
19-12-2015 11:48:24
volfi wrote: |
Thank you for your comments.
It is not in our short term scope to modify findAtomSymbol() method.
You mention quite often SMILES so I would like to call your attention to our SMILES importer which helps you to convert SMILES to molecule.
|
Lack of language knowledge from your side, had the previous entries moderated.
So, the SMILES importer is very very very slow and cannot deal with billions of molecules.
Pay attention: this is a hint, you should do something about it.
It's upsetting, because your feature of uppercase and lowercase writing has the effect that many many calculations were wrong and have to be repeated.
User 870ab5b546
21-12-2015 20:50:39
Seems to me the documentation is pretty clear: findAtomicNumber() acts on a String that contains the symbol for a single element, so if you give it a SMILES string for a polyatomic molecule, you're asking for trouble.
I agree that the behavior is counter to normal conventions, but the documentation makes it clear that it is, so it's fair.
If you want to ensure that the string CO returns the atomic number of C and not that of Co, it's easy enough to write some code.
public static int myFindAtomicNumber(String smilesStr) {
final String oneAtomSymbol =
(smilesStr.length() <= 1 ? smilesStr
: smilesStr.substring(0, Character.isLowerCase(smilesStr.charAt(1)) ? 2, 1));
return PeriodicSystem.findAtomicNumber(oneAtomSymbol);
}
I had to deal with this issue because I interpret input from students, who often do not follow the capitalization conventions. I had to write methods both to accept and reject improper capitalization, because there were some circumstances where we might have wanted to accept it, and others where we might not. It wasn't that hard.
User 343b63fb52
22-12-2015 08:34:57
bobgr wrote: |
Seems to me the documentation is pretty clear: findAtomicNumber() acts on a String that contains the symbol for a single element, so if you give it a SMILES string for a polyatomic molecule, you're asking for trouble.
I agree that the behavior is counter to normal conventions, but the documentation makes it clear that it is, so it's fair.
If you want to ensure that the string CO returns the atomic number of C and not that of Co, it's easy enough to write some code.
public static int myFindAtomicNumber(String smilesStr) {
final String oneAtomSymbol =
(smilesStr.length() <= 1 ? smilesStr
: smilesStr.substring(0, Character.isLowerCase(smilesStr.charAt(1)) ? 2, 1));
return PeriodicSystem.findAtomicNumber(oneAtomSymbol);
}
I had to deal with this issue because I interpret input from students, who often do not follow the capitalization conventions. I had to write methods both to accept and reject improper capitalization, because there were some circumstances where we might have wanted to accept it, and others where we might not. It wasn't that hard.
|
Long solved with an own patch that follows conventions.
I do understand that CO is a poly atomic entry when obtained from a SMILES string. However, a general method that looks for atomic symbols and implies that it can be used beyond cheminformatics, should understand the difference. Cheminformatics should have the patches for rule extensions, not vice verse. In other words, cheminformatics should create the proper atom symbol when looking for the corresponding number.
However, I still insist on the fact that neither cheminformatics nor any other subdiscipline should extrapolate it's conventions onto the wider field of chemistry.