Need for reading CIF data files

User 25d107bd42

24-03-2008 09:19:41

Hi,


there is a real need to be able to read CIF (Crystallographic Information File) data files in the MARVIN programs.


On the FIZ-Karlsruhe web site, under "Depositing crystal structures" (http://www.fiz-karlsruhe.de/depositing_crystal_structures.html), you can read: "Over the last 10 years, CIF has become the standard data exchange format for crystallographic information. See http://www.iucr.org/iucr-top/cif/index.html for detailed information."





Of course it is possible to transform CIF data to other formats using "babel", but not all "babel"-versions do this. And Jmol is able to read CIF data without any problems, see the screenshot.





Regards, Hans-Ulrich

ChemAxon efa1591b5a

26-03-2008 15:08:25

Hi Hans-Ulrich,





from time to time we receive this kind of request from our users. However, it's not so frequent as our main target group is more involved in cheminformatics than in molecular modelling or computational chemistry.


As a consequence, we have no clear understanding of trends and needs in the crystallographic area. The statement that you quoted in your post is rather important for us to know, but the inevitable question is whether this implies that PDB is becoming less important for this community?





Supporting PDB is already a disaster, but mmCIF looks even worse :-). The language is well defined, but it is rather complex to process. Before making any commitment we need to learn about it and explore what problems may arise during development.





So, let me ask some questions:


In your opinion and according to your experience, are mmCIF files less problematic to deal with than PDB files? I mean, for instance, that 3rd party programs rarely if ever export PDB that meets the basic standard. Mandatory record types are missing, mandatory field separators are missing, hydrogen atoms are not labelled properly, secondary structure and atom information are contradictory, etc., even in PDB files deposited in the RCSB PDB archive.





In most implementations both PDB export and import are partial (just like in ours, see documentation: http://www.chemaxon.com/marvin/help/formats/pdb-doc.html). Most users can live with this without any problem. But what about mmCIF? Is there a subset of the language that is already meaningful and useful? I don't mean in terms of syntax elements, that's not a big issue, but in terms of internal representation. Our Molecule and MacroMolecule classes may not be sufficient to represent all features that can be described in mmCIF, and developing these internal core classes is substantial work.





To summarise: I don't know the answer yet. It really depends on the amount of work involved, which I cannot estimate at the moment.


I know it's not much, but we are open to discuss this issue with you and other members in the community.





Regards,


Miklos

ChemAxon efa1591b5a

26-03-2008 15:35:29

And one more thought: the UGM (http://www.chemaxon.com/UGM/08/index.html) is the best place to discuss such matters! There you can easily gather some supporters to push us... :-)





Miklos

User 25d107bd42

26-03-2008 17:38:44

Hi Miklos,





thank you for the invitation to the UGM, but as I already answered Alex, it is too far away for me from Munich.


So we must discuss things in this very well organized forum :-)





Regards, Hans-Ulrich

User 25d107bd42

26-03-2008 19:40:08

Hi Miklos,





I got the CIF files when I fetched organic molecule structures from the Cambridge Crystallographic Data Centre; see the screenshot for the URL. I got, for example, the structures 241170 - 241176. After separating the structures into seven files, the molecules displayed fine in Jmol.





And there is another way to get structures from Cambridge using the software Conquest installed on our server at the chemistry campus. There one can get both CIF and PDB data for the same molecule.





My intention is to import the structures into Marvin, to calculate properties and to export them to other formats, especially xyz.
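Small-molecule CIF files normally store atom positions as fractional coordinates of the unit cell, while xyz expects Cartesian coordinates in angstrom, so an export to xyz needs a conversion step. The following is a minimal Python sketch of that step done outside Marvin; it is not something Marvin provides. The function names `frac_to_cart` and `to_xyz` are hypothetical, and the sketch assumes the common convention of placing cell axis a along x and b in the xy plane.

```python
import math

def frac_to_cart(frac, a, b, c, alpha, beta, gamma):
    """Convert fractional coordinates to Cartesian coordinates (angstrom).

    a, b, c are cell lengths in angstrom; alpha, beta, gamma are cell
    angles in degrees. Assumed convention: a along x, b in the xy plane.
    """
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    sg = math.sin(math.radians(gamma))
    # Volume of the cell with unit edge lengths
    vol = math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)
    u, v, w = frac
    x = a * u + b * cg * v + c * cb * w
    y = b * sg * v + c * (ca - cb * cg) / sg * w
    z = c * vol / sg * w
    return (x, y, z)

def to_xyz(atoms, comment=""):
    """Format (symbol, (x, y, z)) pairs as xyz text: count, comment, atoms."""
    lines = [str(len(atoms)), comment]
    for symbol, (x, y, z) in atoms:
        lines.append(f"{symbol} {x:.4f} {y:.4f} {z:.4f}")
    return "\n".join(lines) + "\n"
```

The cell lengths and angles would come from the `_cell_length_*` and `_cell_angle_*` items of the CIF file. Symmetry expansion is deliberately left out; the sketch only converts the atoms that are listed explicitly.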





I don't know the difference between CIF files and mmCIF files. "mm" means "macromolecular", but the format also seems to be used for "small" molecules. Looking at the files in a text editor, the format does not seem very difficult, e.g. one line for each atom with the coordinates. But I didn't find the exact definition of CIF data. Do you know where to find the exact definition?
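To illustrate the "one line for each atom" layout: in a small-molecule CIF the atoms usually sit in a `loop_` block whose column headers are `_atom_site_*` data names. The following is a minimal Python sketch of reading that block, under simplifying assumptions: values contain no quoted whitespace, there are no multi-line text fields inside the loop, and coordinates are given as `_atom_site_fract_x/y/z`. The function names `strip_su` and `parse_atom_sites` are hypothetical, and this is not a full CIF parser.

```python
import re

def strip_su(value):
    """Drop a trailing standard uncertainty, e.g. '0.1234(5)' -> 0.1234."""
    return float(re.sub(r"\(\d+\)$", "", value))

def parse_atom_sites(cif_text):
    """Return one dict per row of the first loop with fractional coordinates."""
    lines = [ln.strip() for ln in cif_text.splitlines()]
    i = 0
    while i < len(lines):
        if lines[i].lower() != "loop_":
            i += 1
            continue
        i += 1
        # Collect the column headers (data names) of this loop
        tags = []
        while i < len(lines) and lines[i].startswith("_"):
            tags.append(lines[i].split()[0].lower())
            i += 1
        if "_atom_site_fract_x" not in tags:
            continue  # some other loop, keep scanning
        atoms = []
        while i < len(lines):
            row = lines[i]
            if (not row or row.startswith(("_", "#"))
                    or row.lower().startswith(("loop_", "data_"))):
                break  # end of the loop body
            fields = row.split()
            if len(fields) != len(tags):
                break  # row layout outside the assumptions of this sketch
            rec = dict(zip(tags, fields))
            atoms.append({
                "label": rec.get("_atom_site_label", "?"),
                "fract": tuple(strip_su(rec["_atom_site_fract_" + ax])
                               for ax in "xyz"),
            })
            i += 1
        return atoms
    return []
```

Each returned entry holds the atom label and its fractional coordinates, with any standard uncertainty in parentheses removed.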





Regards, Hans-Ulrich

ChemAxon efa1591b5a

27-03-2008 11:07:48

Right, so CIF can be regarded as a subset of mmCIF, supposedly. Implementing CIF first, and mmCIF later, could be a feasible start. I will register this task in our request tracking system, though at present I cannot foresee when it will be scheduled for development.





Regarding the CIF specification, I found a detailed definition of the language syntax and semantics among the pages you referred to in your first post: http://www.iucr.org/iucr-top/cif/spec/version1.1/cifsyntax.html.





Regards,


Miklos

User 870ab5b546

22-05-2008 20:08:47

I would like to chime in in support of Hans-Ulrich's request. Our crystallographer gives us results in two formats, cif and res. res is the SHELX output format. I would like to use MarvinSpace to look directly at the molecular structure.

ChemAxon efa1591b5a

27-05-2008 22:25:34

Hi,





we decided to support CIF import in the short term (this year, though no specific deadline or release number is known at the moment), and mmCIF in the longer term only.


CIF export is not planned to be implemented.





Regards,


Miklos

User 02d47ec30e

20-06-2010 12:10:50










Your post was from 2008; is there a working "cif-importer" in the meantime?


regards kris

ChemAxon efa1591b5a

21-06-2010 14:34:50

Hi Kris,


I admit that we could not meet our promise regarding CIF/mmCIF import due to other priorities. The related task has not yet been scheduled for implementation.


Please accept my apologies for the inconvenience this missing feature might cause.


Regards


Miklos

User 02d47ec30e

21-06-2010 20:34:30










Hi Miklos, then two other questions:


1) Is there any working example of any import filter for JChem available (for example plain xyz, so that I could try to write this filter myself)? Is it at all possible to extend JChem in such a way? Sorry for this trivial question, but I'm quite new to Java and JSP. Until now I have built my web sites only with PHP/MySQL and written other programs in F, C and, a long time ago, Pascal, so it is quite a new world for me.


2) Do you need any help with the crystallographic part (I believe that parsing the file should not be a problem, and I could help with CIF), or is it just a time/priority problem?


 


regards kris

ChemAxon efa1591b5a

28-06-2010 09:16:40

Hi Kris,


Thank you for your kind response and for the help you offered. We may need an expert for the crystallographic part as soon as we have the capacity to start work on CIF format import. It's not only the implementation but everything else that is needed (documentation, testing, support, feature requests etc.) that requires a continuous allocation of resources; this is why it's hard to start such a new development.


Regarding your first question about a working example: the current implementation of file import/export is not open and not of the plug-and-play type, and thus not extensible in a straightforward way. As for sample code, only samples that show the usage of the API are available; there's nothing about possible extensions to support external formats, I'm afraid.


Regards


Miklos