Let me start by saying: node-rdkafka is a godsend. When we first started using it, the library was the only one fully compatible with the latest version of Kafka and its SSL and SASL features. I owe webmakersteve and the other contributors a six-pack of beer each for making this possible (thank you!!!!).
With that said, I guarantee you will run into problems at some point using
node-rdkafka. I hope some of the lessons we learned at Peachjar will help you work through these issues.
1. Understand that node-rdkafka is just Node.js bindings to librdkafka¹.
If you need to troubleshoot your
node-rdkafka library, you are going to be using/referencing
librdkafka. In fact, the
node-rdkafka configuration is simply a passthrough to the other library. You can find the definitive configuration reference for both libraries here: https://github.com/edenhill/librdkafka/blob/master/CONFIGURATION.md
¹. The library does have a little more (streams and an admin API). But at its core, it is primarily leveraging librdkafka.
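To make the passthrough concrete, here is a sketch of a node-rdkafka producer configuration. Every key is a raw librdkafka property name, exactly as documented in the CONFIGURATION.md reference above; the broker address, credentials, and file paths here are illustrative, not from our actual setup:

```javascript
// Every key below is a raw librdkafka property name; node-rdkafka
// passes them straight through to the underlying C library.
// (Broker address, credentials, and paths are illustrative.)
const config = {
  'metadata.broker.list': 'localhost:9091',
  'security.protocol': 'SASL_SSL',
  'sasl.mechanisms': 'PLAIN',
  'sasl.username': 'client',
  'sasl.password': 'client-secret',
  'ssl.ca.location': './etc/security/ca.pem',
  'ssl.certificate.location': './etc/security/client.certificate.pem',
  'ssl.key.location': './etc/security/client.key',
};

// In a real service this object goes straight to the library:
//   const Kafka = require('node-rdkafka');
//   const producer = new Kafka.Producer(config);
```

Because the keys are identical, you can debug a setting against librdkafka's documentation (or kafkacat, below) and then paste it into your Node.js configuration unchanged.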
2. Start by getting your configuration working with kafkacat.
kafkacat is a utility for interacting with Kafka clusters (including producing and consuming messages). The utility is built on top of
librdkafka. This means that the configuration options you use with
kafkacat will be identical to those used with node-rdkafka.
When developing Peachjar's Service Framework (PSF), I included a fully-configured
kafkacat instance in our repo. This helped verify that our Kafka settings were correct and gave our developers easy access to Kafka.
I created this BASH script for configuring
kafkacat against multiple environments (it assumes relative locations to certain things like SSL certs):
#!/usr/bin/env bash

HOST=$([ -n "$KAFKA_HOST" ] && echo "$KAFKA_HOST" || echo "localhost:9091")

CMD="kafkacat -b $HOST"

if [ -n "$KAFKA_DEBUG" ]; then
  CMD="$CMD -v -X debug=generic,broker,security"
fi

SSL_KEY=$([ -n "$KAFKA_SSL_KEY_PATH" ] && echo "$KAFKA_SSL_KEY_PATH" || echo "./etc/security/client.key")
SSL_CERT=$([ -n "$KAFKA_SSL_CERT_PATH" ] && echo "$KAFKA_SSL_CERT_PATH" || echo "./etc/security/client.certificate.pem")
SSL_CA=$([ -n "$KAFKA_SSL_CA_PATH" ] && echo "$KAFKA_SSL_CA_PATH" || echo "./etc/security/ca.pem")

# Note: Aiven uses security protocol SSL, while the local instance uses SASL_SSL.
SEC_PROTO=$([ -n "$KAFKA_SECURITY_PROTOCOL" ] && echo "$KAFKA_SECURITY_PROTOCOL" || echo "SASL_SSL")

CMD="$CMD -X security.protocol=$SEC_PROTO -X ssl.key.location=$SSL_KEY -X ssl.certificate.location=$SSL_CERT"
CMD="$CMD -X ssl.ca.location=$SSL_CA"

if [ -n "$KAFKA_SSL_KEY_PASSWORD" ]; then
  CMD="$CMD -X ssl.key.password=$KAFKA_SSL_KEY_PASSWORD"
fi

if [ -n "$KAFKA_USERNAME" ]; then
  CMD="$CMD -X sasl.mechanisms=PLAIN -X sasl.username=$KAFKA_USERNAME -X sasl.password=$KAFKA_PASSWORD"
fi

echo "EXECUTING: $CMD"

# Run kafkacat, forwarding any extra arguments (e.g. -L for metadata).
eval "$CMD $@"
Then we added script tasks in
package.json to make it easy to use:
"kafka:metadata": "npm run kafkacat -- -L",
"kafkacat": "KAFKA_SSL_KEY_PASSWORD=peachjar KAFKA_USERNAME=client KAFKA_PASSWORD=client-secret ./bin/kafkacat.sh",
Pay special attention to
-X debug=generic,broker,security. If you have a sophisticated Kafka setup (SSL, SASL), getting detailed log output is essential to diagnosing problems.
3. Node.js version compatibility can cause problems with node-rdkafka.
We were stuck for the longest time using the Node 8 Alpine Docker image with
node-rdkafka (https://github.com/Blizzard/node-rdkafka/issues/673). Part of this issue was caused by the version of
libssl compiled with Node.js on different platforms. As the GitHub thread indicates, some developers were able to work through this issue by compiling
librdkafka from source. We have since been able to do this with Node 10 on Alpine, but could not get Node 12 (LTS) to work.
These kinds of version compatibility issues will be a continuing source of frustration for your developers and DevOps folks. Working through these issues takes a lot of effort and you will likely wonder why you are using Kafka in the first place!
4. Use the OS installation of
librdkafka or build from source.
When you include
node-rdkafka into your project, the default behavior will cause NPM to build
librdkafka from source when generating the
node-gyp bindings. This can be a real pain in the ass if you have a bunch of services on the same machine using the same library.
There's a little-known trick to avoid this. You can force
node-rdkafka to use the system's version of
librdkafka by specifying the
BUILD_LIBRDKAFKA=0 environment variable.
Note: Unfortunately, the
node-gyp bindings tend to always rebuild on
npm i, but this will finish in under 20s compared to a minute or two if you have to build
librdkafka from source.
You can install
librdkafka easily on most platforms. On macOS, there is a Homebrew package for it:

brew install librdkafka
On Linux, you can add the Apt or Yum repositories provided by Confluent.
Installing from source is also pretty easy:
git clone https://github.com/edenhill/librdkafka.git && \
cd librdkafka && \
./configure --install-deps && \
make && \
sudo make install
5. Node devs are going to hate using Kafka.
It pains me to say this, especially as someone who so often advocates using Kafka, but the developer experience really sucks. Kafka and
node-rdkafka were by far our developers' least favorite part of development.
People were constantly forgetting to create topics in Kafka before starting services (causing events to drop or clients to crash). We added guards to ensure topics existed before a server would start (or crash immediately if they didn't), and missing topics remained a very common cause of failures. Initial attempts to auto-create topics were clunky. By the time
node-rdkafka introduced the admin API, I think our developers had finally figured out how to triage the problem (so there was less pressure to address it).
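A startup guard like ours can be sketched as a small helper. Note that `getMetadata` here is a hypothetical async wrapper (for example, a promisified version of node-rdkafka's callback-style `client.getMetadata()`), not the library's actual API:

```javascript
// Fail fast at startup if required topics are missing, rather than
// dropping events or crashing later.
// `getMetadata` is any async function resolving to { topics: [{ name }] },
// e.g. a promisified wrapper around node-rdkafka's client.getMetadata().
async function assertTopicsExist(getMetadata, requiredTopics) {
  const metadata = await getMetadata();
  const existing = new Set(metadata.topics.map((t) => t.name));
  const missing = requiredTopics.filter((t) => !existing.has(t));
  if (missing.length > 0) {
    throw new Error(`Missing Kafka topics: ${missing.join(', ')}`);
  }
}

// Usage with a stubbed metadata response:
const fakeMetadata = async () => ({ topics: [{ name: 'orders' }] });
assertTopicsExist(fakeMetadata, ['orders', 'refunds'])
  .catch((err) => console.error(err.message)); // Missing Kafka topics: refunds
```

Crashing at boot with an explicit "missing topics" message turned out to be far easier to triage than events silently going nowhere.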
The Kafka infrastructure is also pretty heavyweight for local development. At a minimum, you are going to need a ZooKeeper and a Kafka broker instance (2 processes). These are Java processes, too, which tend to gobble up all the remaining RAM on your machine. If you want to hear your MacBook Pro turn into a wind tunnel, add the Kafka REST server, Schema Registry, and Connect Platform.
Another common problem we experienced was a race condition between starting the Kafka broker and the applications that rely on it. If you have a bunch of historical data on your broker, it can take a good 1-2 minutes (on a laptop) to come online while it rebalances. In the meantime, your services are in a crash loop waiting for the broker. This isn't a deal-breaker, but it's annoying to have to constantly explain to devs.
Our biggest problem with
node-rdkafka was the time spent compiling
librdkafka, which would happen at unpredictable times. We also had a microservices environment with numerous services using
node-rdkafka. Until we figured out the
BUILD_LIBRDKAFKA=0 trick, we would often waste an hour or two rebuilding projects.
If you can, I highly recommend abstracting Kafka out of local development (unless you are building analytic pipelines) and using something lighter-weight and easier to maintain. This could be an HTTP endpoint that triggers your event handler, or a NATS broker if you have an event-driven architecture (EDA). If you do this, you will need to invest in integration testing to make sure your code works in staging/production.
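As a sketch of what that abstraction might look like (illustrative, not our actual framework code): services depend on a minimal publish/subscribe interface, with an in-memory implementation for local development and a node-rdkafka-backed implementation swapped in for staging/production.

```javascript
// Minimal in-memory stand-in for a Kafka producer/consumer pair.
// Services code against subscribe/publish only, so the broker can be
// swapped for a real Kafka-backed implementation in other environments.
class InMemoryBroker {
  constructor() {
    this.handlers = new Map(); // topic -> [handler, ...]
  }
  subscribe(topic, handler) {
    const list = this.handlers.get(topic) || [];
    list.push(handler);
    this.handlers.set(topic, list);
  }
  publish(topic, message) {
    (this.handlers.get(topic) || []).forEach((h) => h(message));
  }
}

// Usage:
const broker = new InMemoryBroker();
broker.subscribe('orders', (msg) => console.log('received', msg.id));
broker.publish('orders', { id: 42 }); // prints "received 42"
```

The trade-off, as noted above, is that an in-memory broker hides real Kafka behavior (partitioning, ordering, rebalances), which is exactly why the integration tests against a real cluster matter.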
Kafka is a fantastic technology for moving data around but can be a real pain to develop against (especially in Node.js). If you use
node-rdkafka, you are bound to encounter compatibility issues as you upgrade the library or versions of Node.js. I recommend you use a system installation of
librdkafka and the
BUILD_LIBRDKAFKA=0 flag to prevent the recompilation of the library on
npm install. Configuring Kafka can be complicated, especially when you use SSL and SASL.
kafkacat is an excellent tool for testing configuration options and debugging problems. And development with Kafka can be a real pain.
Finally, do what you can to abstract the broker from your code. Developers should have easy ways to simulate producing and consuming messages without needing all of the infrastructure running on their machine. However, if you do abstract Kafka/
node-rdkafka, make sure you have good integration tests to verify services use the frameworks correctly.
This is a Docker build for Node 10 (Alpine) that preinstalls
librdkafka. This is a fragment of a couple of images we build (I have not tested this specifically), but it should work. We create a base image similar to this and have our service images extend it.
It's likely you won't need all of these libraries in your build (I don't have time to prune and test them).
FROM node:10-alpine

# Note: git added to the package list since the librdkafka clone below needs it.
RUN apk add --no-cache --update bash ca-certificates curl git gnupg \
    g++ make lz4-dev musl-dev cyrus-sasl-dev openssl-dev \
    python unzip wget jpeg-dev pango-dev cairo-dev pixman-dev

RUN apk add --no-cache --virtual .build-deps gcc zlib-dev libc-dev \
    bsd-compat-headers py-setuptools bash

RUN git clone https://github.com/edenhill/librdkafka.git && \
    cd librdkafka && \
    ./configure --install-deps && \
    make && \
    make install