ORA-600, [peshmgel: Table size], [8388608]

User 7f33ec9a5c

07-10-2016 16:19:26

Hi,

I'm bulk-loading chemical catalogs into a JChem indexed table, committing every 100 inserts, and every 10,000 to 50,000 structures, the cartridge blows up with the following error:

System.Data.OleDb.OleDbException (0x80040E14): ORA-00600: internal error code, arguments: [peshmgel: Table size], [0x1F5FF19E8], [8388608], [], [], [], [], [], [], [], [], []
ORA-06512: at "JCHEM.JCHEM_CORE_PKG", line 326
ORA-06512: at "JCHEM.JCF", line 784
ORA-06512: at "JCHEM.JCF", line 759

Our loader is pretty resilient, so when the cartridge blows up, the loader automagically shuts down the database connection, backs up to the last commit, and starts the load again. When the load resumes, the insert which caused the error works just fine. This cycle of error and successful recovery has happened 10 times so far during the load.

I am concerned about this because running the exact same load over again does not produce the same crash in the same place. It's almost as if JChem is building some temp structures in the background, and they fill up and crash. And terminating/restarting the session fixes the issue? Very strange.

ChemAxon abe887c64e

10-10-2016 14:22:20

Hi Michael,

As ORA-600 is an internal Oracle error, our first guess is that there might be a hardware-related issue in your database. Could you check it?

We'll investigate the lines of the JChem code referenced in the error message and will get back to you with our results.

Best regards,

Krisztina

User 7f33ec9a5c

10-10-2016 19:32:21

kvajda wrote:

Hi Michael,

As ORA-600 is an internal Oracle error, our first guess is that there might be a hardware-related issue in your database. Could you check it?

Krisztina,

ORA-600 is thrown for any unhandled Oracle error. Nothing in that error suggests a hardware failure, and our database shows no sign of any hardware failure.

The Oracle Docs for ORA-600 specifically emphasize looking at the ARGUMENTS associated with ORA-600 to learn more about the cause of the error: http://www.oracle.com/technetwork/issue-archive/2011/11-sep/o51support-453463.html

Looking at the thrown error:

ORA-00600: internal error code, arguments: [peshmgel: Table size], [0x1F5FF1BF8], [8388608], [], [], [], [], [], [], [], [], []

and googling (ORA-00600: [peshmgel: Table size]) the first relevant reference was to the ChemAxon forum where Francis reported an identical error:

https://www.chemaxon.com/forum/viewpost66995.html

Two identical errors on two separate installations of your cartridge, with no other mention of that error on the internet suggests strongly that your code has a serious bug that needs repair.

However, there is no way to be sure about this, because of the complete lack of error handling and stack-trace in your code. See my post: https://www.chemaxon.com/forum/ftopic15533.html for a description of that problem.

ChemAxon abe887c64e

12-10-2016 08:38:36

Hi Michael,

To the best of our knowledge, ORA-600 can be caused by the following:

time-outs,

file corruption,

failed data checks in memory, hardware, memory, or I/O messages,

incorrectly restored files

However, we will certainly check our code if you describe the steps to follow to reproduce this error.

Furthermore, if you send us the Oracle trace file, we will gladly inspect it for information about the events around the ORA-600, so that we can identify any possible JChem Cartridge-specific source of the error and reproduce the issue.

Best regards,

Krisztina

ChemAxon 25dcd765a3

17-10-2016 08:00:38

Hi,

Could you please send us step-by-step instructions on how to reproduce the issue?

That would greatly help us fix the bug.

best

User 7f33ec9a5c

18-10-2016 00:56:59

Hi,

I'm still trying to get you a useful reproduction. This is happening during a big data load, so I'm trying to cut it down to a more concise version which still crashes JCart.

In the meantime, here is an excerpt from the alert.log, with all the junk cut out (logfile changes etc.).

The data-loader runs the load on a thread, and when the thread crashes, it waits a while, backs up to the previous commit point and starts to load from the last commit point forward, so that is why the repeated crashes in the log file. We get 10K to 50K structures loaded between crashes.

The last two trace files associated with the error at Mon Oct 17 17:19:30 2016 are attached as a zip file.

Mon Oct 17 11:09:24 2016

Errors in file /12c/diag/rdbms/discover/discover/trace/discover_ora_24460.trc (incident=93975):

ORA-00600: internal error code, arguments: [peshmgel: Table size], [0x1F5FF1A40], [8388608], [], [], [], [], [], [], [], [], []

Incident details in: /12c/diag/rdbms/discover/discover/incident/incdir_93975/discover_ora_24460_i93975.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Mon Oct 17 11:09:40 2016

Dumping diagnostic data in directory=[cdmp_20161017110940], requested by (instance=1, osid=24460), summary=[incident=93975].

Mon Oct 17 11:09:41 2016

Sweep [inc][93975]: completed

Sweep [inc2][93975]: completed

Mon Oct 17 12:37:19 2016

Errors in file /12c/diag/rdbms/discover/discover/trace/discover_ora_5251.trc (incident=93408):

ORA-00600: internal error code, arguments: [peshmgel: Table size], [0x1F5FF1BF8], [8388608], [], [], [], [], [], [], [], [], []

Incident details in: /12c/diag/rdbms/discover/discover/incident/incdir_93408/discover_ora_5251_i93408.trc

Mon Oct 17 12:40:58 2016

Dumping diagnostic data in directory=[cdmp_20161017124058], requested by (instance=1, osid=5251), summary=[incident=93408].

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Mon Oct 17 12:41:00 2016

Sweep [inc][93408]: completed

Sweep [inc2][93408]: completed

Mon Oct 17 14:12:53 2016

Errors in file /12c/diag/rdbms/discover/discover/trace/discover_ora_14576.trc (incident=93482):

ORA-00600: internal error code, arguments: [peshmgel: Table size], [0x1E5DD7A20], [8388608], [], [], [], [], [], [], [], [], []

Incident details in: /12c/diag/rdbms/discover/discover/incident/incdir_93482/discover_ora_14576_i93482.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Mon Oct 17 14:13:00 2016

Dumping diagnostic data in directory=[cdmp_20161017141300], requested by (instance=1, osid=14576), summary=[incident=93482].

Mon Oct 17 14:13:02 2016

Sweep [inc][93482]: completed

Sweep [inc2][93482]: completed

Mon Oct 17 14:36:36 2016

WARNING: inbound connection timed out (ORA-3136)

Mon Oct 17 15:43:05 2016

Errors in file /12c/diag/rdbms/discover/discover/trace/discover_ora_23524.trc (incident=93650):

ORA-00600: internal error code, arguments: [peshmgel: Table size], [0x1C27B7D40], [8388608], [], [], [], [], [], [], [], [], []

Incident details in: /12c/diag/rdbms/discover/discover/incident/incdir_93650/discover_ora_23524_i93650.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Mon Oct 17 15:43:11 2016

Sweep [inc][93650]: completed

Sweep [inc2][93650]: completed

Mon Oct 17 15:43:11 2016

Dumping diagnostic data in directory=[cdmp_20161017154311], requested by (instance=1, osid=23524), summary=[incident=93650].

Mon Oct 17 15:53:00 2016

opiodr aborting process unknown ospid (26577) as a result of ORA-28

Mon Oct 17 17:00:23 2016

WARNING: Heavy swapping observed on system in last 5 mins.

pct of memory swapped in [1.20%] pct of memory swapped out [2.85%].

Please make sure there is no memory pressure and the SGA and PGA

are configured correctly. Look at DBRM trace file for more details.

Errors in file /12c/diag/rdbms/discover/discover/trace/discover_dbrm_6592.trc (incident=92480):

ORA-00700: soft internal error, arguments: [kskvmstatact: excessive swapping observed], [], [], [], [], [], [], [], [], [], [], []

Incident details in: /12c/diag/rdbms/discover/discover/incident/incdir_92480/discover_dbrm_6592_i92480.trc

Mon Oct 17 17:00:26 2016

Dumping diagnostic data in directory=[cdmp_20161017170026], requested by (instance=1, osid=6592 (DBRM)), summary=[incident=92480].

Mon Oct 17 17:00:27 2016

Sweep [inc][92480]: completed

Sweep [inc2][92480]: completed

Mon Oct 17 17:19:30 2016

Errors in file /12c/diag/rdbms/discover/discover/trace/discover_ora_32401.trc (incident=93626):

ORA-00600: internal error code, arguments: [peshmgel: Table size], [0x1CD2B92E8], [8388608], [], [], [], [], [], [], [], [], []

Incident details in: /12c/diag/rdbms/discover/discover/incident/incdir_93626/discover_ora_32401_i93626.trc

Use ADRCI or Support Workbench to package the incident.

See Note 411.1 at My Oracle Support for error and packaging details.

Mon Oct 17 17:19:35 2016

Dumping diagnostic data in directory=[cdmp_20161017171935], requested by (instance=1, osid=32401), summary=[incident=93626].

Mon Oct 17 17:19:36 2016

Sweep [inc][93626]: completed

Sweep [inc2][93626]: completed

Mon Oct 17 17:29:05 2016
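For anyone wanting to pull the same incident lines without trimming alert.log by hand, the query below is a minimal sketch. It assumes the account has SELECT access to V$DIAG_ALERT_EXT (usually a DBA-level privilege) and that a one-day window is wide enough; both are assumptions, not something stated in this thread.

```sql
-- Sketch: list ORA-00600 entries written to the alert log in the last day.
-- Assumption: the current user can read V$DIAG_ALERT_EXT.
SELECT originating_timestamp,
       message_text
FROM   v$diag_alert_ext
WHERE  message_text LIKE '%ORA-00600%'
AND    originating_timestamp > SYSTIMESTAMP - INTERVAL '1' DAY
ORDER  BY originating_timestamp;
```

The second and third bracketed arguments in each message (here an address and 8388608) are the values to compare across incidents.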

User 7f33ec9a5c

18-10-2016 01:24:29

Hi,

Our loader always crashes (with the ORA-600 error) on the PL/SQL line that calls jcf.molconvertv().
The PL/SQL calls many other JCart functions and also inserts into a JCart-indexed table, but that particular function is where it crashes (see my earlier post with the trace file in Archive.zip).

So, to try to get a reproducible error, I wrote a PL/SQL procedure that only calls molconvert (equivalent to the following), and it is filling up the PGA.

declare
  sMOL    CLOB;
  sSMILES varchar2(4000);
begin
  for i in (select SMILES
            from table_with_14526137_unique_smiles) loop
    begin
      sMOL := jcf.molconvertc(i.SMILES, 'mol');
      sSMILES := jcf.molconvertv(sMOL, 'smiles');
    exception
      when others then
        LogErrMessageToTable();
    end;
  end loop;
end;

================= here is the section of the error log produced by the crash =======================

Mon Oct 17 17:00:23 2016

WARNING: Heavy swapping observed on system in last 5 mins.

pct of memory swapped in [1.20%] pct of memory swapped out [2.85%].

Please make sure there is no memory pressure and the SGA and PGA

are configured correctly. Look at DBRM trace file for more details.

Errors in file /12c/diag/rdbms/discover/discover/trace/discover_dbrm_6592.trc (incident=92480):

ORA-00700: soft internal error, arguments: [kskvmstatact: excessive swapping observed], [], [], [], [], [], [], [], [], [], [], []

Incident details in: /12c/diag/rdbms/discover/discover/incident/incdir_92480/discover_dbrm_6592_i92480.trc

I attached zipped copies of the trace files; discover_dbrm_6592_i92480.trc seems to give a better description of the PGA that is being used.

ChemAxon abe887c64e

18-10-2016 12:18:29

Hi Michael,

Thank you for the information and for the trace files. We will try to reproduce the issue in our test environment.

Would you also send us the trace<n>.log files found in the <jchem_home_directory>/cartridge/logs/ folder?

These files log the Oracle-side JChem processes.

Thank you,

Krisztina

ChemAxon abe887c64e

11-11-2016 15:15:26

Hi Michael,

Sorry for the long silence. We tried but could not reproduce the ORA-600 error with the script you provided. Unfortunately, there wasn't any useful information found in the trace files. We are still waiting for the trace<n>.log files from the cartridge/logs folder of the jchem server.

Do you still experience this error?

Best regards,

Krisztina

User 7f33ec9a5c

11-11-2016 16:53:26

Yes, we still see intermittent errors coming from the session generated by JCart.

I have not been able to generate a simple, contained, testable repro for this error, as it only occurs during long bulk-loading sessions that run for many hours (like 10-20 hour loads).

During these big loads, we use many JCart functions to standardize the structure, then use a regular table with a JCart index to register only unique structures, so each insert to the table indexed with JCart is preceded by a search of the index on that table.

To me, the error looks like a very slow memory leak that takes many operations before it becomes visible, and then crashes the JCart session on whatever operation finally manages to overfill the PGA. For the code I sent you, it took ~600,000 iterations before it filled up the PGA on our server. This was very reproducible, and independent of the structures being used in the test or of other activity on the database. Any time I ran that test code, it ran for between 600,000 and 610,000 structures and then crashed when it had consumed all available space in the PGA.
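A slow leak of this kind can be watched from a second session while the test loop runs. The query below is a minimal sketch using the standard V$SESSION and V$PROCESS views; it assumes the monitoring account can read them, and the megabyte rounding is only for readability.

```sql
-- Sketch: per-session PGA usage, largest allocation first.
-- Run repeatedly from a separate session while the load is in progress;
-- a steadily climbing pga_alloc_mb for the loader's SID suggests a leak.
SELECT s.sid,
       s.username,
       s.program,
       ROUND(p.pga_used_mem  / 1024 / 1024) AS pga_used_mb,
       ROUND(p.pga_alloc_mem / 1024 / 1024) AS pga_alloc_mb,
       ROUND(p.pga_max_mem   / 1024 / 1024) AS pga_max_mb
FROM   v$session s
       JOIN v$process p ON p.addr = s.paddr
WHERE  s.username IS NOT NULL
ORDER  BY p.pga_alloc_mem DESC;
```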

ChemAxon abe887c64e

24-11-2016 09:53:29

Hi Michael,

We tried again to run your script on a data set of 1 million structures, in different environments, but could not reproduce the error. So we still suggest checking the hardware components, as well as any database and operating system settings that differ from their default values.

One additional comment: the API documentation of the JChem Oracle Cartridge includes a recommendation for the function jcf.molconvertc, namely to pass a CLOB as the third parameter:

jcf.molconvertc ( query_structure IN VARCHAR2/BLOB/CLOB, options_outputformat IN VARCHAR2, temp_clob CLOB) = CLOB;

Best regards,

Krisztina

User 7f33ec9a5c

28-11-2016 19:17:09

Krisztina,

Your link to Known Issues solved this! Thank You.

Following your link, I got to:

https://docs.chemaxon.com/display/docs/JChem+Cartridge+for+Oracle#JChemCartridgeforOracle-issues

Since we are seeing out-of-control growth of the PGA, and the issue states "dbms_lob.freetemporary does not free temporary BLOBs returned by JChem Cartridge functions", I looked at V$TEMPORARY_LOBS while my test script was running. Sure enough, one entry in nocache_lobs was being created for every call to jcf.molconvertc().
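The check described above can be sketched as the query below. V$TEMPORARY_LOBS and its SID, CACHE_LOBS, NOCACHE_LOBS and ABSTRACT_LOBS columns are standard Oracle; the join to V$SESSION is only an assumed convenience for identifying the loader's session.

```sql
-- Sketch: count of temporary LOBs currently held by each session.
-- Re-run while the test script is looping; a nocache_lobs value that
-- keeps growing means temporary LOBs are being created but never freed.
SELECT l.sid,
       s.username,
       l.cache_lobs,
       l.nocache_lobs,
       l.abstract_lobs
FROM   v$temporary_lobs l
       JOIN v$session s ON s.sid = l.sid
ORDER  BY l.nocache_lobs DESC;
```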

Using the comments in Known Issues, I modified my code to use the third parameter of jcf.molconvertc as follows, and this fixed the issue.

dbms_lob.createtemporary(sMOL, TRUE);
sMOL := jcf.molconvertc(i.SMILES, 'mol', sMOL);
sSMILES := jcf.molconvertv(sMOL, 'smiles');
dbms_lob.freetemporary(sMOL);
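Put back into the earlier test loop, the fix might look like the sketch below. The table name, the jcf calls and LogErrMessageToTable come from the posts above (the last is the poster's own helper, not an Oracle or JChem routine). The cleanup in the exception handler is an added assumption, intended to keep a failed conversion from leaving its temporary CLOB behind.

```sql
DECLARE
  sMOL    CLOB;
  sSMILES VARCHAR2(4000);
BEGIN
  FOR i IN (SELECT SMILES
            FROM   table_with_14526137_unique_smiles) LOOP
    BEGIN
      -- Caller-owned temporary CLOB, passed as the third parameter so
      -- the cartridge writes into it instead of allocating its own.
      DBMS_LOB.CREATETEMPORARY(sMOL, TRUE);
      sMOL    := jcf.molconvertc(i.SMILES, 'mol', sMOL);
      sSMILES := jcf.molconvertv(sMOL, 'smiles');
      DBMS_LOB.FREETEMPORARY(sMOL);
    EXCEPTION
      WHEN OTHERS THEN
        LogErrMessageToTable();  -- poster's own logging procedure
        -- Assumption: free the CLOB on failure too, so errors do not leak.
        IF DBMS_LOB.ISTEMPORARY(sMOL) = 1 THEN
          DBMS_LOB.FREETEMPORARY(sMOL);
        END IF;
    END;
  END LOOP;
END;
/
```

If the fix is working, nocache_lobs in V$TEMPORARY_LOBS should stay flat for this session instead of climbing with each iteration.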

ChemAxon abe887c64e

29-11-2016 12:21:51

Hi Michael,

We are happy that your issue has been solved. Thank you for the feedback.

Krisztina

User aef72c4777

14-02-2017 05:59:53

ORA-600 is an internal error generated by the generic kernel code of the Oracle RDBMS software. It is different from other Oracle errors in many ways. The following is a list of these differences:

1. An ORA-600 error may or may not be displayed on the screen. Therefore, screen output should not be relied on for capturing information on this error. Information on ORA-600 errors is found in the database alert and trace files. We recommend that you check these files frequently for database errors. (See the Alert and Trace Files section for more information.)

2. Each ORA-600 error comes with a list of arguments. They are usually enclosed in square brackets and follow the error on the same line, for example:

Possible causes include:

§ time-outs,

§ file corruption,

§ failed data checks in memory, hardware, memory, or I/O messages,

§ incorrectly restored files

§ a SELECT FROM DUAL statement in PL/SQL within Oracle Forms (you have to use SELECT FROM SYS.DUAL instead!)

How to Fix it

§ events that led up to the error

§ the operations that were attempted that led to the error

§ the conditions of the operating system and database at the time of the error

§ any unusual circumstances that occurred prior to receiving the ORA-00600 message.

§ contents of any trace files generated by the error

§ the relevant portions of the Alert file

§ in Oracle Forms PL/SQL, use SELECT FROM SYS.DUAL to access the system "dual" table

§ Sometimes gathering statistics on the involved tables resolves the problem.

§ Deleting statistics for a table involved would also solve the problem

§ Try to reduce the number of CTE blocks in the stored procedure. This reportedly works in some scenarios, on the claim that Oracle doesn't support more than 20 CTEs in a single stored procedure.
