Connection pool leak in Out-of-Band Management (OOBM) background task causes Management Server crash (HikariPool maxActive exhaustion) #13382

@mortenstevens

problem

We are experiencing a critical connection leak in Apache CloudStack 4.22.1.0 (running on Ubuntu 24.04). The background task for Out-of-Band Management (OOBM) leaks database connections every time it executes.

Setting wait_timeout on the MySQL server or injecting pool properties does not help: the leaked connections stay marked as active inside the Java application until the pool reaches the db.cloud.maxActive limit (default 250). At that point the management server stops responding and throws SQLTransientConnectionException.

versions

CloudStack Version: 4.22.1.0
OS: Ubuntu 24.04.4 LTS
DB: MySQL 8.0.45
Java Version: 17.0.19

The steps to reproduce the bug

  1. Configure Out-of-Band Management (OOBM) for multiple physical hosts
  2. Set outofbandmanagement.background.task.execution.interval to a lower value for testing purposes (e.g., 60 or 300) to accelerate the leak.
  3. Compare the output of SHOW FULL PROCESSLIST; in MySQL with the CloudStack Management Server metrics over time.
    ...

What to do about it?

Every time the OOBM task runs, it opens one connection per configured host (three connections in total in our setup). These connections are never returned to the HikariCP pool, which suggests a missing .close() or an unhandled exception path in the OOBM plugin execution layer.
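To illustrate the suspected pattern, here is a minimal sketch of how one unclosed handle per host drives up a pool's active count, and how try-with-resources avoids it. ToyPool, PooledHandle, leakyRun and safeRun are hypothetical names made up for this illustration. This is not HikariCP or CloudStack code, and the actual leak site in the OOBM plugin has not been identified.

```java
import java.util.concurrent.Semaphore;

// Toy model only: ToyPool and PooledHandle are hypothetical stand-ins,
// not HikariCP or CloudStack classes.
class PoolLeakSketch {

    /** Fixed-size pool that counts borrowed handles, like a pool's "active" metric. */
    static final class ToyPool {
        private final int max;
        private final Semaphore permits;

        ToyPool(int max) {
            this.max = max;
            this.permits = new Semaphore(max);
        }

        /** Borrows a handle or fails immediately when the pool is exhausted. */
        PooledHandle borrow() {
            if (!permits.tryAcquire()) {
                throw new IllegalStateException(
                        "Connection is not available (total=" + max + ", active=" + active() + ")");
            }
            return new PooledHandle(permits);
        }

        int active() {
            return max - permits.availablePermits();
        }
    }

    /** Handle whose close() returns its slot to the pool exactly once. */
    static final class PooledHandle implements AutoCloseable {
        private final Semaphore permits;
        private boolean closed;

        PooledHandle(Semaphore permits) {
            this.permits = permits;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                permits.release();
            }
        }
    }

    /** Leaky variant: one handle per host is borrowed and never closed. */
    static void leakyRun(ToyPool pool, int hosts) {
        for (int i = 0; i < hosts; i++) {
            PooledHandle handle = pool.borrow();
            // ... per-host work would happen here; close() is never called
        }
    }

    /** Safe variant: try-with-resources closes the handle even if the work throws. */
    static void safeRun(ToyPool pool, int hosts) {
        for (int i = 0; i < hosts; i++) {
            try (PooledHandle handle = pool.borrow()) {
                // ... per-host work would happen here
            }
        }
    }

    public static void main(String[] args) {
        ToyPool pool = new ToyPool(250);
        safeRun(pool, 3);
        System.out.println("after safe run: active=" + pool.active());
        leakyRun(pool, 3);
        System.out.println("after leaky run: active=" + pool.active());
    }
}
```

In this toy model each leaky run permanently raises the active count by the number of hosts, so the count only grows from one task execution to the next, while the safe variant leaves it at zero.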

The Mismatch between DB and Java Pool:

MySQL Side: Shows the connections as Sleep. If MySQL kills them via wait_timeout, the sockets are closed at the network layer.

CloudStack/HikariCP Side: Because the OOBM thread still holds the leaked connections as active (in use), HikariCP never runs a health check on them and does not retire them via maxLifetime, which only takes effect once a connection has been returned to the pool. The internal counter therefore keeps growing until it reaches active=250.

Once the counter hits 250, the management server crashes.

Logs & Error Stacktrace:

2026-06-08 21:14:28,070 ERROR [c.c.s.S.ManagementServerCollector] (StatsCollector-1:[ctx-fd90ba07]) (logid:182faa68) Error trying to retrieve management server host statistics com.cloud.utils.exception.CloudRuntimeException: Unable to find on DB, due to: cloud - Connection is not available, request timed out after 30000ms (total=250, active=250, idle=0, waiting=22)
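For monitoring, the pool counters in a message like the one above can be extracted from the log and checked for exhaustion. The following is a minimal sketch; PoolStatsSketch, PoolStats and the exhaustion rule (no idle connections, all connections active, at least one waiter) are assumptions made for this illustration and are not part of CloudStack or HikariCP.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for log monitoring; not a CloudStack or HikariCP API.
class PoolStatsSketch {

    /** Counters as they appear in the "(total=..., active=..., idle=..., waiting=...)" suffix. */
    static final class PoolStats {
        final int total;
        final int active;
        final int idle;
        final int waiting;

        PoolStats(int total, int active, int idle, int waiting) {
            this.total = total;
            this.active = active;
            this.idle = idle;
            this.waiting = waiting;
        }

        /** Assumed rule: every connection is in use and threads are queuing for one. */
        boolean exhausted() {
            return idle == 0 && active >= total && waiting > 0;
        }
    }

    private static final Pattern STATS = Pattern.compile(
            "\\(total=(\\d+), active=(\\d+), idle=(\\d+), waiting=(\\d+)\\)");

    /** Returns the counters found in the log line, or empty if the line has none. */
    static Optional<PoolStats> parse(String logLine) {
        Matcher m = STATS.matcher(logLine);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new PoolStats(
                Integer.parseInt(m.group(1)),
                Integer.parseInt(m.group(2)),
                Integer.parseInt(m.group(3)),
                Integer.parseInt(m.group(4))));
    }

    public static void main(String[] args) {
        String line = "cloud - Connection is not available, request timed out after 30000ms "
                + "(total=250, active=250, idle=0, waiting=22)";
        parse(line).ifPresent(s ->
                System.out.println("active=" + s.active + " exhausted=" + s.exhausted()));
    }
}
```

Tracking the active value across several task intervals, rather than only at the point of failure, would show whether it rises by the number of configured hosts on each run.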

Is there any workaround available? Maybe switching to dbcp?
