Thoughts on using Kafka with Node.js (node-rdkafka)
Let me start by saying that node-rdkafka is a godsend. When we first started using it, it was the only library fully compatible with the latest version of Kafka and with the SSL and SASL features. I owe webmakersteve and the other contributors a six-pack of beer each for making this possible (thank you!!!!).
With that said, I guarantee you will run into problems at some point using node-rdkafka. I hope some of the lessons we learned at Peachjar will help you work through these issues.
1. Understand that node-rdkafka is just Node.js bindings to librdkafka¹.
If you need to troubleshoot your node-rdkafka library, you are going to be using and referencing librdkafka. In fact, the node-rdkafka configuration is simply a passthrough to the other library. You can find the definitive configuration reference for both libraries here: https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md
¹ The library does offer a little more (streams and an admin API), but at its core, it is primarily leveraging librdkafka.
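To make the passthrough concrete, here is a minimal sketch of constructing a producer (the broker address and credentials are placeholders). Every key is a librdkafka configuration property, passed through untouched, so you can look each one up in the CONFIGURATION.md reference above:

const Kafka = require('node-rdkafka');

// All of these keys are librdkafka properties, not node-rdkafka inventions.
const producer = new Kafka.Producer({
  'metadata.broker.list': 'localhost:9091',
  'security.protocol': 'sasl_ssl',
  'sasl.mechanisms': 'PLAIN',
  'sasl.username': 'client',
  'sasl.password': 'client-secret',
});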
2. Start by getting your configuration working with kafkacat.
kafkacat is a utility for interacting with Kafka clusters (including producing and consuming messages). The utility is built on top of librdkafka. This means that the configuration options you use with kafkacat will be identical to those used with node-rdkafka.
When developing Peachjar's Service Framework (PSF), I included a fully-configured kafkacat instance in our repo. This helped verify that our Kafka settings were correct, and it eased access to Kafka for our developers.
I created this Bash script for configuring kafkacat against multiple environments (it assumes relative locations for certain things, like SSL certs):
#!/usr/bin/env bash

# Any arguments to this script are passed through to kafkacat.
OPTS="$@"

# Default to a local broker unless KAFKA_HOST is set.
HOST="${KAFKA_HOST:-localhost:9091}"
CMD="kafkacat -b $HOST"

# Enable verbose librdkafka debug output when KAFKA_DEBUG is set.
if [ -n "$KAFKA_DEBUG" ]; then
  CMD="$CMD -v -X debug=generic,broker,security"
fi

# SSL material defaults to paths relative to the repo root.
SSL_KEY="${KAFKA_SSL_KEY_PATH:-./etc/security/client.key}"
SSL_CERT="${KAFKA_SSL_CERT_PATH:-./etc/security/client.certificate.pem}"
SSL_CA="${KAFKA_SSL_CA_PATH:-./etc/security/ca.pem}"

# Note: Aiven uses security protocol SSL, while the local instance uses SASL_SSL.
SEC_PROTO="${KAFKA_SECURITY_PROTOCOL:-SASL_SSL}"

CMD="$CMD -X security.protocol=$SEC_PROTO -X ssl.key.location=$SSL_KEY -X ssl.certificate.location=$SSL_CERT"
CMD="$CMD -X ssl.ca.location=$SSL_CA"

if [ -n "$KAFKA_SSL_KEY_PASSWORD" ]; then
  CMD="$CMD -X ssl.key.password=$KAFKA_SSL_KEY_PASSWORD"
fi

if [ -n "$KAFKA_USERNAME" ]; then
  CMD="$CMD -X sasl.mechanisms=PLAIN -X sasl.username=$KAFKA_USERNAME -X sasl.password=$KAFKA_PASSWORD"
fi

CMD="$CMD $OPTS"

echo "------------------------------------------------"
echo "EXECUTING: $CMD"
echo "------------------------------------------------"

eval "${CMD}"
Then we added script tasks in package.json to make it easy to use:
{
  "name": "@peachjar/service-framework",
  "scripts": {
    "kafka:metadata": "npm run kafkacat -- -L",
    "kafkacat": "KAFKA_SSL_KEY_PASSWORD=peachjar KAFKA_USERNAME=client KAFKA_PASSWORD=client-secret ./bin/kafkacat.sh"
  }
}
Pay special attention to KAFKA_DEBUG (i.e. -X debug=generic,broker,security). If you have a sophisticated Kafka setup (SSL, SASL), getting detailed log output is essential to diagnosing problems.
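The same debug contexts work in node-rdkafka, since the property is passed straight to librdkafka. A minimal sketch (the group id is illustrative) that surfaces librdkafka's log lines:

const Kafka = require('node-rdkafka');

const consumer = new Kafka.KafkaConsumer({
  'metadata.broker.list': 'localhost:9091',
  'group.id': 'debug-example',
  'debug': 'generic,broker,security', // same contexts as the kafkacat flag
}, {});

// librdkafka log lines are emitted as 'event.log' events.
consumer.on('event.log', (log) => {
  console.log(`[${log.severity}] ${log.fac}: ${log.message}`);
});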
3. Node.js version compatibility can cause problems with node-rdkafka.
We were stuck for the longest time using the Node 8 Alpine Docker image with node-rdkafka (https://github.com/Blizzard/node-rdkafka/issues/673). Part of this issue was caused by the version of libssl compiled with Node.js on different platforms. As the GitHub thread indicates, some developers were able to work through this issue by compiling librdkafka from source. We have since been able to do this with Node 10 on Alpine, but could not get Node 12 (LTS) to work.
These kinds of version compatibility issues will be a continuing source of frustration for your developers and DevOps folks. Working through these issues takes a lot of effort and you will likely wonder why you are using Kafka in the first place!
4. Use the OS installation of librdkafka or build from source.
When you include node-rdkafka in your project, the default behavior will cause npm to build librdkafka from source when generating the node-gyp bindings. This can be a real pain in the ass if you have a bunch of services on the same machine using the same library.
There's a little-known trick to avoid this: you can force node-rdkafka to use the system's version of librdkafka by specifying the BUILD_LIBRDKAFKA=0 environment variable (e.g. BUILD_LIBRDKAFKA=0 npm install).
Note: Unfortunately, the node-gyp bindings tend to always rebuild on npm i, but this will finish in under 20 seconds, compared to a minute or two if you have to build librdkafka from source.
You can install librdkafka easily on most platforms. On macOS, there is a Homebrew package for it:
brew install librdkafka
On Linux, you can add the APT or Yum repositories provided by Confluent.io.
Installing from source is also pretty easy:
git clone https://github.com/edenhill/librdkafka.git && \
cd librdkafka && \
./configure --install-deps && \
make && \
make install
5. Node devs are going to hate using Kafka.
It pains me to say this, especially as someone who so often advocates using Kafka, but the developer experience really sucks. Kafka and node-rdkafka were by far our developers' least favorite part of development.
People were constantly forgetting to create topics in Kafka before starting services (causing events to drop or clients to crash), and missing topics became a very common cause of failures. We added guards to ensure topics existed before a server would start (and crash it immediately if they didn't). Initial attempts to auto-create topics were clunky, and by the time node-rdkafka introduced the admin API, our developers had finally figured out how to triage the problem (so there was less of a concern to address it).
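A guard like that can be built on node-rdkafka's getMetadata() call. Here is a minimal sketch (not our actual framework code; the helper name and timeout are illustrative):

// Fails fast if any required topic is missing, instead of silently
// dropping events later. `client` is a connected node-rdkafka client.
function assertTopicsExist(client, requiredTopics, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    client.getMetadata({ timeout: timeoutMs }, (err, metadata) => {
      if (err) return reject(err);
      const existing = new Set(metadata.topics.map((t) => t.name));
      const missing = requiredTopics.filter((t) => !existing.has(t));
      if (missing.length > 0) {
        return reject(new Error(`Missing Kafka topics: ${missing.join(', ')}`));
      }
      resolve();
    });
  });
}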
The Kafka infrastructure is also pretty heavyweight for local development. At a minimum, you are going to need a ZooKeeper and a Kafka broker instance (two processes). These are Java processes, too, which tend to gobble up all the remaining RAM on your machine. If you want to hear your MacBook Pro turn into a wind tunnel, add the Kafka REST server, Schema Registry, and Connect Platform.
Another common problem we experienced is a race condition between starting the Kafka broker and the applications that rely on it. If you have a bunch of historical data on your broker, it can take a good 1-2 minutes (on a laptop) to come online while it rebalances. In the meantime, your services are in a crash loop waiting for the broker to come online. This isn't a deal-breaker, but it's annoying to have to constantly explain to devs.
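One crude way to soften the crash loop is to wait for the broker port to accept connections before wiring up node-rdkafka. A sketch (not our framework code; a TCP probe doesn't prove the broker has finished rebalancing, it just avoids the most obvious restarts):

const net = require('net');

// Retries a TCP connection to the broker until it succeeds or we give up.
function waitForPort(host, port, { retries = 30, delayMs = 2000 } = {}) {
  return new Promise((resolve, reject) => {
    const attempt = (remaining) => {
      const socket = net.createConnection({ host, port }, () => {
        socket.end();
        resolve();
      });
      socket.on('error', () => {
        socket.destroy();
        if (remaining <= 0) {
          return reject(new Error(`Broker ${host}:${port} is not reachable`));
        }
        setTimeout(() => attempt(remaining - 1), delayMs);
      });
    };
    attempt(retries);
  });
}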
Our biggest problem with node-rdkafka was the time spent compiling librdkafka, which would happen at unpredictable times. We also had a microservices environment with numerous services using node-rdkafka. Until we figured out the BUILD_LIBRDKAFKA=0 trick, we would often waste an hour or two rebuilding projects.
If you can, I highly recommend abstracting Kafka out of local development (unless you are building analytic pipelines) and using something lighter-weight and easier to maintain. This could be an HTTP endpoint that triggers your event handler, or a NATS broker if you have an EDA (event-driven architecture). If you do this, you will need to invest in integration testing to make sure your code works in staging/production.
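As an illustration of what that abstraction might look like (a hypothetical interface, not a real framework): application code depends on a tiny publish contract, and only the production wiring touches node-rdkafka:

// In-memory implementation for local development and unit tests.
class InMemoryPublisher {
  constructor() {
    this.messages = [];
  }
  async publish(topic, message) {
    this.messages.push({ topic, message });
  }
}

// Production implementation wrapping a connected node-rdkafka Producer.
class KafkaPublisher {
  constructor(producer) {
    this.producer = producer;
  }
  async publish(topic, message) {
    this.producer.produce(topic, null, Buffer.from(JSON.stringify(message)));
  }
}

Because handlers only see publish(topic, message), developers can run services locally without any Kafka infrastructure at all.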
Conclusion
Kafka is a fantastic technology for moving data around but can be a real pain to develop against (especially in Node.js). If you use node-rdkafka, you are bound to encounter compatibility issues as you upgrade the library or versions of Node.js. I recommend you use a system installation of librdkafka and the BUILD_LIBRDKAFKA=0 flag to prevent recompilation of the library on npm install. Configuring Kafka can be complicated, especially when you use SSL and SASL; kafkacat is an excellent tool for testing configuration options and debugging problems.
Finally, do what you can to abstract the broker from your code. Developers should have easy ways to simulate producing and consuming messages without needing all of the infrastructure running on their machines. However, if you do abstract Kafka/node-rdkafka, make sure you have good integration tests to verify that services use the frameworks correctly.
Bonus Content
This is a Docker build for Node 10 (Alpine) that preinstalls librdkafka. It is a fragment of a couple of images we build (I have not tested this exact file), but it should work. We create a base image similar to this and have our service images extend it. It's likely you won't need all of these libraries in your build (I haven't had time to prune and test them).
FROM node:10-alpine

# Build tools and native libraries needed by node-rdkafka (and a few other
# native modules used by our services).
RUN apk add --no-cache --update bash ca-certificates curl gnupg \
    g++ make lz4-dev musl-dev cyrus-sasl-dev openssl-dev \
    python unzip wget jpeg-dev pango-dev cairo-dev pixman-dev \
    git

RUN apk add --no-cache --virtual .build-deps gcc zlib-dev libc-dev \
    bsd-compat-headers py-setuptools bash

# Build and install librdkafka from source.
RUN git clone https://github.com/edenhill/librdkafka.git && \
    cd librdkafka && \
    ./configure --install-deps && \
    make && \
    make install

# Tell node-rdkafka to link against the system librdkafka instead of
# rebuilding it, and make sure the shared library can be found at runtime.
ENV BUILD_LIBRDKAFKA=0
ENV LD_LIBRARY_PATH=/usr/local/lib