Hey friends, welcome back to WMMH! It's been about a month since my last post -- my sincere apologies; work and vacation have sapped my free time. Fortunately, the extra time has given me more things to be happy about!
Over the last couple of weeks, I have been working on various projects up and down the stack. This work included building a support tool with React and Node.js, troubleshooting deployments in Kubernetes, tuning metrics and service checks in DataDog, and refactoring CI pipelines in CircleCI.
Personally, I found the breadth of the work quite invigorating for the start of the year, which is why I'm happy about the following things:
React w/out Redux
I started the month with the goal of building a UI for a support tool. At this point, it's important to note that I've become somewhat of a "pear-shaped" Fullstack, with most of my knowledge on the backend. Evaluating the right UI framework for the job sent me into temporary paralysis. In my defense, I've used many tools in the past, including React+Redux, Vue.js, Angular 1, etc.
Why is it that in 2019, choosing a frontend stack can still feel this overwhelming?
I asked our Frontenders at work for advice about which technology to use and how to get started. I was quickly pointed to Create React App. More importantly, the consensus amongst the team was to minimize the use of global state (e.g. Redux) and prefer component-localized data retrieval and mutation via Apollo.
This was a pleasant surprise to me. I learned React+Redux at a time when the community was advocating for 100% of the state to live in Redux, and having built complex applications using that approach, I absolutely hated it[1]. Removing Redux from the equation allowed me to rapidly build and ship the tool.
Let me caveat this by saying that Redux absolutely has its place in the modern Frontend stack, but probably not on small projects that can afford to reload state on every page. I was told one viable alternative to Redux was to imperatively update the Apollo store and allow Apollo to control the global state. I don't think this is an elegant solution; I'd consider it a code smell indicating it's time to introduce a state management system like Redux.
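To make the idea concrete, here's a tiny framework-free sketch of component-localized data retrieval backed by a per-query cache -- the pattern Apollo gives you out of the box. To be clear, this is not the real Apollo API; createQueryClient, the fetcher, and the "ticket:42" key are hypothetical names for illustration only.

```javascript
// A minimal sketch of per-query caching: the idea behind letting a data
// layer like Apollo own retrieval instead of a hand-rolled global store.
// NOT the real Apollo API -- createQueryClient and fetcher are made up.
function createQueryClient(fetcher) {
  const cache = new Map(); // query key -> resolved data

  return {
    async query(key) {
      if (cache.has(key)) {
        // Repeated queries are served locally; no global state to wire up.
        return { data: cache.get(key), fromCache: true };
      }
      const data = await fetcher(key); // stand-in for a network request
      cache.set(key, data);
      return { data, fromCache: false };
    },
  };
}

// Each "component" asks for exactly the data it needs, when it needs it.
async function demo() {
  const client = createQueryClient(async (key) => ({ id: key }));
  const first = await client.query('ticket:42');  // goes to the fetcher
  const second = await client.query('ticket:42'); // served from cache
  return [first.fromCache, second.fromCache];
}
```

The point is that each component can own its own fetch-and-cache lifecycle, and global state only becomes necessary when data must be shared and mutated across unrelated parts of the tree.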
Kubernetes Node Draining
Kube and I have been in a protracted romance (bromance?) for the last year and I can honestly say I'm more deeply in love as time goes on. I've used many deployment targets in the past (bare metal, VMs, ElasticBeanstalk, ECS, Mesos) and Kubernetes is by far the easiest and most powerful. One feature I was incredibly appreciative of this month was node draining.
We had an EC2 instance misbehaving and we were struggling to diagnose the problem. The instance was part of an autoscaling group with the exact same configuration as the other nodes (all launched around the same time), and none of the other machines were experiencing the problem. Perhaps this wasn't the best idea (maybe we should have spent more time diagnosing), but our time is in short supply, so we decided to kill the node and allow a new one to respawn.
Now think about doing this in ECS. To do it safely, you would probably need multiple instances of the same container/service running on different EC2 instances in the cluster so that when you kill the machine, you won't have downtime[2]. Kubernetes, however, makes it safe and easy to move the workload off the node without disrupting applications:
kubectl drain <node name>
Once you've drained the instance, you can perform maintenance on the machine or simply kill it. If you performed maintenance, you can make the node eligible for pod scheduling again by executing:
kubectl uncordon <node name>
For more information about this feature, refer to the documentation here: https://kubernetes.io/docs/tasks/administer-cluster/safely-drain-node/
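Putting it together, the full maintenance cycle looks something like the session below. The node name here is hypothetical, and the --ignore-daemonsets flag is typically required because DaemonSet-managed pods can't be evicted to another node:

```shell
# Drain evicts the node's pods onto other nodes and also cordons it,
# so no new pods will be scheduled there while you work on it.
kubectl drain ip-10-0-1-23.ec2.internal --ignore-daemonsets

# ...perform maintenance here, or terminate the EC2 instance...

# If the node survives, mark it schedulable again.
kubectl uncordon ip-10-0-1-23.ec2.internal

# Verify the node's scheduling status.
kubectl get nodes
```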
Finally, the last thing I'm happy about this month is customizing DataDog.
Rolling your own logging and metrics infrastructure is not worth the effort. One mistake I've made in the past is thinking I've saved the company money by standing up my own instances of ElasticSearch, Logstash, Grafana, Kibana, InfluxDB, Telegraf, Jaeger, etc. We developers tend to underestimate the difficulty of maintaining this infrastructure. I'm reminded of the many times I had to stop development work because ElasticSearch ran out of disk space or we hit the tag limit on Influx.
Many organizations overlook the resource costs of building and maintaining monitoring infrastructure. Is the money you save insourcing logs and metrics more than the potential revenue generated by new features? I doubt it. This was a lesson that took way too long for me to learn, and I think it's a trap many developers fall into. I used to justify the "roll your own" argument by assuming services like DataDog cost too much. However, having recently seen our monthly bill, I've completely changed my view. While I can't divulge the actual cost, for a 69 host / 100m+ logs per week integration, we are paying approximately 1/6th the cost of a full-time DevOps engineer -- and that doesn't include hosting fees!
Note: I am not sponsored by DataDog. But if they are inclined to send me a shirt, I would happily wear it.
We've been using DataDog for a long time, but we never really customized it for our infrastructure and use cases. We were collecting a lot of data, but we weren't organizing it to our advantage. Over the last couple of weeks, we dedicated some effort towards creating log pipelines and alerts, integrating third-party tools like Github and CircleCI, and building dashboards for our services. As we take more ownership of the product, we are really starting to see the benefits. We now have a clearer view of our system and have already mitigated issues that would otherwise have led to downtime.
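To give a flavor of the kind of alerting we set up, a DataDog metric monitor is defined by a query of roughly this shape (the tag and threshold below are invented for illustration; yours will depend on your metrics):

```
avg(last_5m):avg:system.cpu.user{env:production} by {host} > 90
```

A handful of monitors like this, scoped by the tags your services already emit, goes a long way towards turning raw data collection into something actionable.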
[1] To be fair, Dan Abramov said this a long time ago (https://medium.com/@dan_abramov/you-might-not-need-redux-be46360cf367). Unfortunately, development can sometimes have a mob mentality pushing you in the wrong direction.
[2] Correct me if I'm wrong here. Is there a better way to move load off of an EC2 instance in AWS?