Please remove truncation from fields [...truncated...]

User 677b9c22ff

24-10-2007 03:06:08

Hi,

I imported some spectral data (x/y values) into Instant-JChem and

was happy because I could merge DBs very easily (something I had

to use a parser before). However after exporting the datafiles

I had around 1900 truncated datasets from a small 10k Library.

[...truncated...]

The horror! It actually destroyed my spectral dataset.

I know its not important but would it be possible to remove this limitation?

After all its a database so it should not care about space used.

Also a warning would be nice if truncation occurs during import or export.

Thank you

Tobias

ChemAxon fa971619eb

24-10-2007 08:34:15

Tuncation occurs when the length of text data is longer than the length of the column you created. Without truncation inserting the data would either result in an error, or the data being silently truncated (e.g user not knowing about it), depending on the type of databas used. Hence why we add the [...truncated...] message so that the user is aware that this has happened.

The solution is simple - create a column whose length is adequate for the data it needs to store. By default you will probably find that the length of text columns is 1000, which is sufficient for most normal data. But where you need more you can increase this, and/or change the column type.

Please note that all databases differ slightly here, and the column types you can use and their length limitations are all different. Most notably Oracle's VARCHAR2 column (the default used by IJC) is limited to 4000 characters, and if you need to go over this you should select the CLOB column type. MySQL and Derby are more tolerant here, but for very long values you might need to use a different column type.

You can of course also decrease the column length if you are sure that 1000 is much too long for your particular column. This may result in a marginal improvement of efficiency.

Tim

User 677b9c22ff

24-10-2007 21:05:37

Hi,

Thank you.

Its all about automation, I would have never looked again at the exported

data because if I import without problems and export without problems

I usually assume everything is fine. Even ACCESS (is it a database or

a list handler?) gives me a truncation warning.

Still a truncation warning would be nice during import.

The suggestions you made are good and I tried, but I have 600 x/y

values and spaces or semicolon between, how do I convert them

to VAR? I have to screen manually through thousand of values,

export them with x/y and then calculate how many bytes it is.

Again this could be done automatically by Instant-JCHEM.

The Oracle and Derby arguments are fine, still just its because

almighty Oracle doesn't mean its right or a good standard.

I used the CLOB as you suggested and maybe it would be good

to parse some more entries at the beginning and auto-suggest

to use CLOB if truncation occurs.

I know its hard to implement everything but, there it comes again,

how much should I know about database design? The average user

wouldn't know. Such data could be much better handled with

plugins for MS or NMR data. But I am not quite sure if there is a demand for that.

Much more severe in terms of database issues is the column count

restriction, do you think thats expandable? I mean MS ACCESS 2003

(8-bit) still can only read 256 columns, and Derby is the same I guess.

Tobias

ChemAxon fa971619eb

27-10-2007 10:33:16

We will look into making the import more tolerant of long text fields. But I think what we can do here will be somewhat limited, as the restriction is the underying database, not IJC.

We do plan to add support for spectral data in the future. See

http://www.chemaxon.com/forum/ftopic2646.html

The limit for the number of fields is caused by the number of columns that are allowed in the underlying database table. In Derby this is 1024, in Oracle it is 1000 and in MySQL it is 4096. We cannot change this. In future we plan to add pivotted table types which will provide an alternative approach to handling data where there are large number of fields, but this development is relatively long term.

Tim