How to easily create a healthcheck endpoint for your Phoenix app, the Elixir way
Monitor the health of your services in parallel using Elixir and Phoenix.
How often have you found yourself wondering why your GitHub-powered CI was broken, only to find out on GitHub Status that everything was screaming red?
While it’s quite frustrating to wait before your juicy new commits can be deployed, at least the status page gives us a clue about what is going on.
Today we will be implementing a similar thing for our Phoenix application, leveraging the beautiful tools that OTP gives us.
Our end result will be an endpoint that returns the status of each of our internal services.
Requirements
Since we are lucky enough to be using a language that allows us to easily squeeze performance out of our CPU, let’s make use of it!
We will monitor the status of our services concurrently, and we want to be able to set a healthcheck time interval per service. In this way we can customize the behaviour of our monitor at a fine granularity and crank up the polling rate for the services we care about the most!
Architecture
Everything we just said just screams GenServers, doesn’t it?
We can spawn a GenServer for each service we want to monitor and supervise all of them with a supervisor that will take care of our workers and make sure they are always up and running.
Fairly easy, isn’t it? Enough talking, let’s get our hands dirty now!
Implementation
The first thing we will do is initialize a new Phoenix project and create our database.
mix phx.new phx_healthcheck --no-webpack --no-html --no-gettext --no-dashboard
cd phx_healthcheck
mix ecto.create
We then want to focus on our workers. Ideally, we want to encapsulate all the logic related to healthcheck polling and status retrieval in our worker (the GenServer) and isolate each healthcheck implementation in a separate module. In this way we will be able to use the same GenServer to monitor all our services, and we will only have to write the healthcheck logic for each of them.
To achieve this, we will define a behaviour.
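A minimal sketch of what this behaviour could look like, assuming a single check_status/0 callback that returns :ok or :error (the return type is an assumption; the repository linked at the end may use a different one):

```elixir
defmodule PhxHealthcheck.Healthcheck do
  @moduledoc """
  Behaviour implemented by every service healthcheck module.
  """

  # Runs the actual healthcheck for one service.
  # The :ok | :error return type is an assumption made for this sketch.
  @callback check_status() :: :ok | :error
end
```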
For each service we want to monitor, we will just need to create a module that implements this behaviour by providing a function that contains the healthcheck logic.
Right now we don’t really have many services to monitor, but for the sake of this article we will create a module that checks the status of the database connection, which is enough to demonstrate the potential of this architecture.
The check_status function tries to execute the simplest possible query on the database and, if it succeeds, we can assume that the database connection is working properly.
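Here is a sketch of the database check, assuming the Ecto repo generated by mix phx.new is named PhxHealthcheck.Repo and that a plain SELECT 1 is an acceptable probe. It calls Ecto.Adapters.SQL.query/3 and maps the result onto the :ok | :error values assumed for the behaviour:

```elixir
defmodule PhxHealthcheck.Healthcheck.Database do
  @moduledoc """
  Healthcheck for the database connection (sketch).
  """
  @behaviour PhxHealthcheck.Healthcheck

  # Repo module generated by `mix phx.new phx_healthcheck`.
  @repo PhxHealthcheck.Repo

  @impl true
  def check_status do
    # Run the cheapest possible query: any successful result means
    # the connection pool can reach the database.
    case Ecto.Adapters.SQL.query(@repo, "SELECT 1", []) do
      {:ok, _result} -> :ok
      {:error, _reason} -> :error
    end
  rescue
    # query/3 raises instead of returning {:error, _} in some cases,
    # e.g. when the repo is not started; treat that as unhealthy too.
    _exception -> :error
  end
end
```

The rescue clause is a choice of this sketch: most database failures come back as {:error, reason}, but we also want the check to report an unhealthy database, rather than crash its worker, when the call raises.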
Let’s move on to the worker now. As we said before, we can use a GenServer to cache the service health status and refresh it on a given interval.
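A sketch of such a worker follows. A few parts of it are hypothetical: the module name PhxHealthcheck.Healthcheck.Worker, the description of a service as a keyword list with :name, :module and :interval (milliseconds) keys, the 5-second default interval, and the choice to register each worker under its check module's name so that it can be reached without a Registry.

```elixir
defmodule PhxHealthcheck.Healthcheck.Worker do
  # Hypothetical worker: caches the status returned by a healthcheck module
  # and refreshes it every `:interval` milliseconds.
  use GenServer

  # `service` is assumed to look like
  # [name: "Database", module: PhxHealthcheck.Healthcheck.Database, interval: 5_000]
  def start_link(service) do
    module = Keyword.fetch!(service, :module)
    # Register the worker under its check module so callers can find it.
    GenServer.start_link(__MODULE__, service, name: module)
  end

  @impl true
  def init(service) do
    state = %{
      module: Keyword.fetch!(service, :module),
      interval: Keyword.get(service, :interval, 5_000),
      status: :unknown
    }

    # Return right away; the first (possibly slow) check runs in handle_continue/2.
    {:ok, state, {:continue, :check}}
  end

  @impl true
  def handle_continue(:check, state), do: {:noreply, check(state)}

  # Periodic refresh triggered by Process.send_after/3 below.
  @impl true
  def handle_info(:check, state), do: {:noreply, check(state)}

  # Replies with the cached status without running the check again.
  @impl true
  def handle_call(:get_status, _from, state), do: {:reply, state.status, state}

  # Runs the healthcheck, stores the result and schedules the next run.
  defp check(%{module: module, interval: interval} = state) do
    status = module.check_status()
    Process.send_after(self(), :check, interval)
    %{state | status: status}
  end
end
```

A status request never triggers a new check: it is answered from the cached state, although it may have to wait if the worker is in the middle of running a check.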
The worker implementation is quite straightforward. The only thing worth noticing is the use of handle_continue. If you have never heard of it, handle_continue is a relatively new OTP feature (it was introduced in OTP 21), and it is very useful when we need to perform potentially long-running operations during a GenServer initialization. When init returns {:ok, state, {:continue, term}}, the GenServer calls handle_continue right after init completes and before it processes any message from its mailbox, so the continuation is guaranteed to run first. Performing long-running operations directly in the init function is otherwise risky: start_link does not return until init does, so the supervisor starting our workers (and with it our whole application startup) stays blocked until every slow operation has completed.
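The ordering guarantee is easy to observe in isolation. The following self-contained experiment uses a hypothetical SlowInit server: start_link/1 returns before the simulated slow work finishes, while the first call is only handled once handle_continue/2 has completed.

```elixir
defmodule SlowInit do
  # Hypothetical GenServer used only to illustrate handle_continue/2.
  use GenServer

  def start_link(delay_ms), do: GenServer.start_link(__MODULE__, delay_ms)

  @impl true
  def init(delay_ms) do
    # init/1 returns immediately, so start_link/1 does not wait for the slow work.
    {:ok, %{delay_ms: delay_ms, ready?: false}, {:continue, :warm_up}}
  end

  @impl true
  def handle_continue(:warm_up, state) do
    # Simulates a slow first healthcheck.
    Process.sleep(state.delay_ms)
    {:noreply, %{state | ready?: true}}
  end

  @impl true
  def handle_call(:ready?, _from, state), do: {:reply, state.ready?, state}
end

{micros, {:ok, pid}} = :timer.tc(fn -> SlowInit.start_link(500) end)
IO.puts("start_link/1 returned after #{div(micros, 1000)} ms")

# The call waits in the mailbox until handle_continue/2 is done, so this prints true.
IO.inspect(GenServer.call(pid, :ready?))
```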
The next thing we will need is a supervisor to handle our beautiful workers and a couple of configuration lines where we enumerate all the services we want to monitor.
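A possible configuration entry, assuming the services are stored as a list of keyword lists under a :services key of the :phx_healthcheck application (both the key and the shape are assumptions of this sketch, matching the hypothetical worker options):

```elixir
# In config/config.exs (fragment).
# Services to monitor; :interval is the polling interval in milliseconds.
config :phx_healthcheck,
  services: [
    [name: "Database", module: PhxHealthcheck.Healthcheck.Database, interval: 5_000]
  ]
```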
The name field in the Database service configuration will then be used as the service name returned by our final endpoint.
In the supervisor init function we fetch all the services to monitor from the application environment and create a child specification for each of them. In this way our supervisor will be able to spawn a GenServer for every service we want to monitor at application startup. One thing to notice here is that we had to explicitly set the id of each child, because otherwise it would default to the worker module name. Since the worker module is the same for all the children, this would break our application as soon as we try to monitor more than one service, because child IDs must be unique within a supervisor.
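The supervisor described above could be sketched like this. The module name is hypothetical, and the child specs are written out as maps, which is equivalent to Supervisor.child_spec({PhxHealthcheck.Healthcheck.Worker, service}, id: ...) but keeps the explicit :id in plain sight:

```elixir
defmodule PhxHealthcheck.Healthcheck.Supervisor do
  # Hypothetical supervisor: one worker per service listed in the config.
  use Supervisor

  def start_link(_arg), do: Supervisor.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok) do
    :phx_healthcheck
    |> Application.get_env(:services, [])
    |> child_specs()
    |> Supervisor.init(strategy: :one_for_one)
  end

  # Builds one child spec per service. Each child gets its own :id;
  # otherwise they would all default to the worker module and clash.
  def child_specs(services) do
    for service <- services do
      module = Keyword.fetch!(service, :module)

      %{
        id: {PhxHealthcheck.Healthcheck.Worker, module},
        start: {PhxHealthcheck.Healthcheck.Worker, :start_link, [service]}
      }
    end
  end
end
```

To start it with the application, we could add PhxHealthcheck.Healthcheck.Supervisor to the children list in lib/phx_healthcheck/application.ex, after PhxHealthcheck.Repo, so that the repo is already running when the first database check executes.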
We are almost there! For ease of use, we can add a function to our PhxHealthcheck.Healthcheck module that retrieves the status of a service.
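A sketch of that function, assuming workers are registered under their check module and services are described by the keyword lists from the configuration. Catching the exit is a choice of this sketch: if a worker is restarting or does not reply within GenServer.call's default 5-second timeout, the service is reported as :error instead of crashing the request.

```elixir
defmodule PhxHealthcheck.Healthcheck do
  # The check_status/0 callback of the behaviour stays in this module.
  @callback check_status() :: :ok | :error

  @doc """
  Returns `{name, status}` for a configured service by asking its worker,
  which replies with the cached status.
  """
  def get_service_status(service) do
    name = Keyword.fetch!(service, :name)
    module = Keyword.fetch!(service, :module)

    status =
      try do
        # Workers are assumed to be registered under their check module.
        GenServer.call(module, :get_status)
      catch
        # The worker is not running (e.g. restarting) or did not reply in time.
        :exit, _reason -> :error
      end

    {name, status}
  end
end
```

A controller action could then build the whole response with something like Map.new(services, &PhxHealthcheck.Healthcheck.get_service_status/1) and render it with Phoenix's json/2.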
The last missing pieces are our router and the controller that handles the requests.
The status function of our controller reads all the services from the application environment and calls get_service_status for each of them. This function, in turn, makes a call to the GenServer responsible for that service, and the worker replies with the cached health status of the monitored service.
That’s it! We have just implemented our own status endpoint to check the health of our services. Adding a new service to monitor is just a matter of creating a module that implements the healthcheck behaviour we defined earlier and adding a new entry to the configuration. Thanks to Elixir, we can monitor hundreds of services with very little performance hit: every worker is a lightweight process that runs its checks concurrently with the others, and the endpoint only reads the cached statuses instead of running the checks on each request.
GitHub repo: https://github.com/lpeppe/phoenix-healthcheck