Container health checks

Open Klant is deployed as a collection of containers. A container runtime or orchestrator (such as Docker or Kubernetes) can check whether each container is running as expected and act when it is not, for example by restarting the container or removing it from the pool that serves traffic.

Health checks are responsible for detecting anomalies and reporting that a container is not running as expected. They can take different forms, for example:

  • running a script and checking the exit code of the process

  • making an HTTP request to an endpoint which responds with a success or error status code

  • opening a TCP connection to a particular port

This section of the documentation describes the recommended health checks: those provided by Open Klant itself, and those to implement in containers of third party software typically used in an Open Klant deployment. You can incorporate them in your infrastructure code (such as Helm charts).

You can find code examples of these health checks in our docker-compose.yml on GitHub.

Open Klant containers

HTTP service

The Open Klant web service listens on port 8000 inside the container and accepts HTTP traffic. Three endpoints are exposed for health checks.

http://localhost:8000/_healthz/livez/

The liveness endpoint checks that HTTP requests can be handled. Suitable for liveness (and readiness) probes. This is the check with the lowest overhead.

http://localhost:8000/_healthz/

Endpoint that checks the connections with the database and the caches, the database migration state…

Suitable for the startup probe. The most expensive check to run, as it checks all dependencies of the application.

http://localhost:8000/_healthz/readyz/

The readiness endpoint checks that requests can be handled and tests that the default cache (used for sessions) and the database connection function. It is slightly more expensive than the liveness check, but a good candidate for the readiness probe.

Tip

Ensure the ALLOWED_HOSTS environment variable contains localhost. See Environment configuration reference for more details.

Tip

The maykin-common executable is available in the container and can be used to perform the health checks, as an alternative to HTTP probes.

maykin-common health-check \
    --endpoint=http://localhost:8000/_healthz/livez/ \
    --timeout=3
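The mapping between probe types and the three endpoints above can be captured in a small wrapper script. The following is a minimal sketch and not part of Open Klant: the function names probe_endpoint and run_probe are hypothetical, and it assumes the web service listens on port 8000 as described above. It only uses the --endpoint and --timeout options shown in the command above.

```shell
#!/bin/sh
# Hypothetical helper: map a probe type to the endpoint that this
# section recommends for it.
probe_endpoint() {
    base="http://localhost:8000/_healthz"
    case "$1" in
        liveness)  echo "${base}/livez/" ;;
        readiness) echo "${base}/readyz/" ;;
        startup)   echo "${base}/" ;;
        *)         echo "unknown probe type: $1" >&2; return 2 ;;
    esac
}

# Hypothetical helper: run the check for the given probe type.
# The exit code of maykin-common becomes the result of the probe.
run_probe() {
    endpoint="$(probe_endpoint "$1")" || return 2
    maykin-common health-check --endpoint="${endpoint}" --timeout=3
}
```

With these functions loaded, run_probe readiness could serve as the command of an exec-style probe. Keeping the mapping in one place makes it harder to accidentally point a frequently running liveness probe at the expensive /_healthz/ endpoint.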

Celery workers

The Celery Worker service is responsible for picking up and executing background tasks scheduled by the web service or Celery beat.

The worker creates and updates an event loop liveness file at /app/tmp/celery_worker_event_loop.live, which is touched every minute. Additionally, when the worker is ready to accept tasks, it creates the /app/tmp/celery_worker.ready file and removes it when the worker shuts down.

The worker liveness can be checked with the maykin-common CLI:

maykin-common worker-health-check \
    --broker redis://redis:6379/0 \
    --liveness-file /app/tmp/celery_worker_event_loop.live \
    --worker-name celery@docker

Caution

Adapt the --broker and --worker-name options to your environment.

  • --broker must match the value of the CELERY_BROKER setting.

  • --worker-name should not be necessary, as it is taken from the CELERY_WORKER_NAME environment variable if set, and otherwise falls back to celery@<hostname>, using the hostname of the container.

    If pings are failing, you may need to provide the worker name(s) explicitly.

Tip

You can also use the health checks for readiness in rolling deployments on Kubernetes, so that old pods are only stopped when the new versions are confirmed to be ready.

maykin-common worker-health-check \
    --skip-ping \
    --skip-event-loop-liveness \
    --no-skip-readiness \
    --readiness-file /app/tmp/celery_worker.ready
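To make the file-based signals described in this section more concrete, the sketch below inspects the two files directly. It is purely illustrative and does not replace the maykin-common command, which can also ping the worker through the broker. The function names file_is_fresh and worker_is_ready are hypothetical, and the sketch assumes a find implementation that supports the -mmin test (not part of POSIX, but widely available).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the file exists and was modified less
# than max_age_minutes ago. Relies on the non-POSIX -mmin test of find.
file_is_fresh() {
    file="$1"
    max_age_minutes="$2"
    [ -f "${file}" ] && [ -n "$(find "${file}" -mmin "-${max_age_minutes}")" ]
}

# Hypothetical helper: the worker counts as ready while the readiness
# file exists. The default path is the one documented in this section.
worker_is_ready() {
    [ -f "${1:-/app/tmp/celery_worker.ready}" ]
}
```

For example, file_is_fresh /app/tmp/celery_worker_event_loop.live 3 succeeds while the event loop keeps touching the liveness file. The threshold of 3 minutes is an assumption that leaves some slack on top of the one-minute update interval.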

Celery flower

Celery Flower is a web application that binds to port 5555 by default. You can use the generic HTTP health check utility from maykin-common, or set up an equivalent HTTP probe:

maykin-common health-check \
    --endpoint=http://localhost:5555/ \
    --timeout=3

Third party containers

Redis

The Redis container images include a command line utility, redis-cli, which has a ping command to test connectivity to the server:

redis-cli ping

The command exits with exit code 0 on success and exit code 1 on failure.
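If you prefer to wrap the command in a script, the sketch below additionally compares the reply with the expected PONG as an extra safeguard. Whether that is needed on top of the exit code depends on your setup, so treat it as an assumption rather than a requirement. The function name redis_is_healthy and the REDIS_CLI override variable are hypothetical.

```shell
#!/bin/sh
# REDIS_CLI can be overridden, for example to add connection options.
# It defaults to the redis-cli binary that ships with the image.
REDIS_CLI="${REDIS_CLI:-redis-cli}"

# Hypothetical helper: succeed only if the command exits with code 0
# and the server answered the ping with PONG.
redis_is_healthy() {
    reply="$(${REDIS_CLI} ping 2>/dev/null)" || return 1
    [ "${reply}" = "PONG" ]
}
```

Calling redis_is_healthy then returns exit code 0 for a healthy server and a non-zero exit code otherwise, which is the contract that exec-style health checks expect.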

PostgreSQL

Warning

Running the database as a container can bring certain scaling and disaster recovery challenges. We only provide this check for completeness' sake.

PostgreSQL container images typically include the pg_isready binary, which tests the database connection (accepting traffic on the specified host and port). It has a non-zero exit code when the database is not ready.
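A minimal sketch of such a check is shown below. The host, port and timeout values are assumptions to adapt to your environment, and the function names pg_isready_status and postgres_is_ready are hypothetical. The exit codes are the ones documented for pg_isready.

```shell
#!/bin/sh
# Hypothetical helper: translate a pg_isready exit code into a message.
pg_isready_status() {
    case "$1" in
        0) echo "accepting connections" ;;
        1) echo "rejecting connections" ;;
        2) echo "no response" ;;
        3) echo "no attempt made" ;;
        *) echo "unexpected exit code" ;;
    esac
}

# Hypothetical helper: run the check, log the outcome to stderr and
# pass the exit code of pg_isready on to the caller.
# Host, port and timeout are assumptions, adapt them as needed.
postgres_is_ready() {
    pg_isready --host=localhost --port=5432 --timeout=3 --quiet
    code=$?
    echo "PostgreSQL: $(pg_isready_status "${code}")" >&2
    return "${code}"
}
```

Logging the interpreted status can help to tell a database that is still starting up (exit code 1) apart from one that is unreachable (exit code 2) when a probe fails.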

nginx

nginx proxies HTTP traffic from the browser/client to the backend service. It also serves static assets directly. The nginx config needs to be extended with location handlers for the health checks. These handlers only allow requests from localhost, which ensures that the health endpoints are not accessible from outside.

Example nginx configuration snippet:

location = /_healthz/ {
    access_log off;
    # default_type sets the Content-Type of the response generated by
    # return; add_header would add a second Content-Type header instead
    default_type text/plain;
    # block outside traffic
    allow 127.0.0.1;
    allow ::1;
    deny all;
    return 200 "ok\n";
}

location = /_healthz/livez/ {
    access_log off;
    default_type text/plain;
    # block outside traffic
    allow 127.0.0.1;
    allow ::1;
    deny all;
    return 200 "ok\n";
}

location = /_healthz/readyz/ {
    access_log off;
    default_type text/plain;
    # block outside traffic
    allow 127.0.0.1;
    allow ::1;
    deny all;
    return 200 "ok\n";
}

We recommend this cheap check for both the liveness and readiness checks.

You can then wire up an HTTP probe or curl script to make a GET call to http://localhost:8080/_healthz/livez/. Note the port number: the nginx unprivileged image, which is often used, binds to 8080 by default, but check your specific environment to confirm.
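A curl-based check could look like the sketch below. It assumes that curl is available in the nginx image you use, and the function names health_url and check_nginx are hypothetical. The default port and path are the ones mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: build the health check URL for a port and path.
# Defaults: port 8080 and the liveness path.
health_url() {
    echo "http://localhost:${1:-8080}${2:-/_healthz/livez/}"
}

# Hypothetical helper: send a GET request to the health check URL.
# --fail makes curl exit non-zero on HTTP status codes of 400 and above,
# --max-time bounds the duration of the whole request in seconds.
check_nginx() {
    curl --fail --silent --show-error --output /dev/null \
        --max-time 3 "$(health_url "$@")"
}
```

Calling check_nginx without arguments checks the liveness path on port 8080, while check_nginx 80 /_healthz/readyz/ targets the readiness path of an image that binds to port 80.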

Smart readiness probe

You may want to consider proxying to the backend-service for the readiness check.

Warning

This can lead to cascading failures where first your backend-service becomes unavailable, which leads to nginx becoming unavailable and possibly other dependent services.

Tip

Even if the backend is not available, nginx may still be performing useful work by serving static files.

Example nginx configuration snippet:

location = /_healthz/readyz/ {
    access_log off;
    # block outside traffic
    allow 127.0.0.1;
    allow ::1;
    deny all;

    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header X-Forwarded-Host $server_name;
    proxy_set_header X-Scheme $scheme;
    proxy_pass http://web:8000/_healthz/readyz/;
}