[Solved] 2229604 – Compileserver crashes and remains offline after OOM event

fail: process hdbcompileserver hdb compileserver not running

The compileserver wrote an OOM (out of memory) dump, followed by a signal 6 (SIGABRT) crash dump:

[CRASH_EXTINFO]  Extended exception info: (0000-00-00 00:00:00 Local)
----> Dump of siginfo contents <----
signal:      6(SIGABRT)
code:        -6(SI_TKILL: signal send by tkill)

[CRASH_STACK]  Stacktrace of crash: (0000-00-00 00:00:00 Local)
----> Pending exceptions (possible root cause) <----
exception  1: no.2100002  (Basis/Diagnose/impl/FaultProtectionImpl.cpp:769)
Illegal call to exit(), _exit() or _Exit() detected
exception throw location:
1: 0x00007fa5c5269c8d in Diagnose::exitHandler(int)+0x59 at FaultProtectionImpl.cpp:769 (libhdbbasis.so)
2: 0x00007fa5c55b0cea in exit+0x16 at IsInMain.cpp:260 (libhdbbasis.so)
3: 0x00007fa581b2b560 in rml::internal::doInitialization()+0x190 (libiomp5.so)

After the crash the daemon disables automatic restart of the service:

[91521]{-1}[-1/-1] 0000-00-00 00:00:00.121961 i Daemon           TrexDaemon.cpp(13321) : process hdbcompileserver with pid 12301 exited because it caught signal 6
[91521]{-1}[-1/-1] 0000-00-00 00:00:00.121976 i Daemon           TrexDaemon.cpp(13345) : child <hdbcompileserver> terminates during startup -> disabled
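To spot this condition without reading the daemon trace by hand, the two messages above can be matched with regular expressions. The following is a minimal sketch, not an SAP-provided tool: the function name `find_disabled_services` is hypothetical, and the patterns are modelled only on the two lines quoted above, so other revisions may word these messages differently.

```python
import re

# Patterns modelled on the two daemon trace lines quoted in this note.
# Assumption: the message wording is stable; other revisions may differ.
EXIT_RE = re.compile(
    r"process (?P<name>\S+) with pid (?P<pid>\d+) "
    r"exited because it caught signal (?P<signal>\d+)"
)
DISABLED_RE = re.compile(
    r"child <(?P<name>[^>]+)> terminates during startup -> disabled"
)


def find_disabled_services(lines):
    """Map each service the daemon disabled to the last signal seen for it.

    The value is None when no matching "caught signal" line preceded the
    "disabled" line for that service.
    """
    last_signal = {}
    disabled = {}
    for line in lines:
        match = EXIT_RE.search(line)
        if match:
            last_signal[match.group("name")] = int(match.group("signal"))
            continue
        match = DISABLED_RE.search(line)
        if match:
            name = match.group("name")
            disabled[name] = last_signal.get(name)
    return disabled


sample = [
    "[91521]{-1}[-1/-1] 0000-00-00 00:00:00.121961 i Daemon           "
    "TrexDaemon.cpp(13321) : process hdbcompileserver with pid 12301 "
    "exited because it caught signal 6",
    "[91521]{-1}[-1/-1] 0000-00-00 00:00:00.121976 i Daemon           "
    "TrexDaemon.cpp(13345) : child <hdbcompileserver> terminates during "
    "startup -> disabled",
]
print(find_disabled_services(sample))
```

For the two sample lines, the result maps `hdbcompileserver` to signal 6. In practice the lines would come from the daemon trace file of the affected host; its location is not given in this note.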


Other Terms

hdbcompileserver, oom


Reason and Prerequisites

Prior to HANA SPS10, when the operating system does not grant the compileserver additional memory on request (in other words, an OOM event), the compileserver service restarts itself to recover from the error. During the restart, the service releases all of its allocated memory back into the free memory pool. However, another service such as the indexserver may claim the freed memory during the restart and prevent the compileserver from allocating enough memory to complete the restart.

As a result, the daemon stops trying to restart the compileserver service and keeps it offline until it is started manually. This can lead to subsequent errors, such as stored procedures failing to execute and system replication errors.



This behaviour is a limitation that was corrected in HANA Revision 100 (SPS10).

The workaround is one of the following: restart the entire database, restart the individual HANA instance that owns the compileserver, or manually start the compileserver by running the following commands as user <sid>adm on the host that owns the compileserver:

> cdpy
> python servicecontrol.py addToDaemon <hostname>:3<instanceNumber>10 type=compileserver
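The `<hostname>:3<instanceNumber>10` argument is easy to mistype, so the sketch below shows how it could be assembled. It only builds the argument list and does not run anything. The helper names `compileserver_address` and `add_to_daemon_command` are hypothetical, the host name `hanahost01` is a placeholder, and the code assumes the instance number is written as two zero-padded digits.

```python
def compileserver_address(hostname, instance_number):
    """Build the <hostname>:3<instanceNumber>10 argument.

    Assumption: the instance number is two digits, zero-padded
    (for example 0 becomes "00").
    """
    if not 0 <= instance_number <= 99:
        raise ValueError("instance number must be between 0 and 99")
    return f"{hostname}:3{instance_number:02d}10"


def add_to_daemon_command(hostname, instance_number):
    """Return the servicecontrol.py call from this note as an argument list."""
    return [
        "python",
        "servicecontrol.py",
        "addToDaemon",
        compileserver_address(hostname, instance_number),
        "type=compileserver",
    ]


# Placeholder host name and instance number, for illustration only.
print(" ".join(add_to_daemon_command("hanahost01", 0)))
```

The printed line corresponds to the second command above and would still need to be run as <sid>adm from the directory that `cdpy` switches to.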
