Writing Better Apps: Implementing Configuration

Richard Clayton

Stumbling my way through the great wastelands of enterprise software development.

"Configuration" is the set of values your application uses to change its behavior at runtime. It can range from simple values that adjust request timeouts to more complicated settings that swap out database or cloud service providers.

The process of collecting and validating configuration is critical to a stable application. The most common cause of deployment failure is misconfiguration. A robust configuration system exhibits the following attributes:

Simple
Strict
Fails Fast
Layered

Simple

Configuration is essential to your application starting. You should prefer methods that maximize the chance of this phase working. This means preferring synchronous sources, standard formats, and sensible defaults.

Environment variables are preferred. This is the most reliable mechanism for sourcing configuration. Environment variables work the same on most platforms (as a dictionary/map of key-value pairs available to a process). They are generally pretty easy to use:

const dbHost = process.env.DB_HOST

All popular deployment frameworks have some way of seeding a deployment's environment with these variables. The only real drawback to environment variables is that they are accessible to operators (so they might not be appropriate for high-security environments).
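One practical wrinkle: every environment variable arrives as a string (or is absent), so numeric and boolean settings need explicit parsing. Here is a minimal sketch of that step. The helper names (`readString`, `readInt`, `readBool`) and the `DB_TLS` variable are hypothetical, and the environment is passed in as a plain map so the helpers are easy to exercise:

```typescript
// Hypothetical helpers (not from any library): environment values are always
// strings or undefined, so anything else must be parsed explicitly.
type Env = Record<string, string | undefined>

const readString = (env: Env, name: string): string | undefined => {
  const raw = env[name]
  // Treat an empty or whitespace-only value the same as an unset variable.
  return raw === undefined || raw.trim() === '' ? undefined : raw.trim()
}

const readInt = (env: Env, name: string): number | undefined => {
  const raw = readString(env, name)
  if (raw === undefined) return undefined
  // Reject values such as "10s" that parseInt would silently truncate to 10.
  if (!/^-?\d+$/.test(raw)) throw new Error(`${name} must be an integer, got "${raw}"`)
  return Number(raw)
}

const readBool = (env: Env, name: string): boolean | undefined => {
  const raw = readString(env, name)
  if (raw === undefined) return undefined
  if (raw === 'true' || raw === '1') return true
  if (raw === 'false' || raw === '0') return false
  throw new Error(`${name} must be true/false or 1/0, got "${raw}"`)
}

// In an application you would pass process.env instead of this literal.
const sampleEnv: Env = { DB_HOST: 'db.internal', DB_PORT: '5432', DB_TLS: 'true' }
const dbSettings = {
  host: readString(sampleEnv, 'DB_HOST'),
  port: readInt(sampleEnv, 'DB_PORT'),
  tls: readBool(sampleEnv, 'DB_TLS'),
}
```

Returning `undefined` for unset variables keeps the decision about defaults and required values in one place, further up the call chain.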

File sources are also useful, but they add complexity because of their potential failure modes (missing file, insufficient permissions, unparsable content). If you choose to use files, use standardized formats. Your language or platform will typically have an idiomatic solution (JSON for Node.js, YAML for Python). Do not create custom formats! Custom formats increase the complexity of your configuration system with virtually no reward.

// JavaScript configuration is impossibly easy:
const config = require('/etc/my-app/config.json')

const dbHost = config.db.host

// Alternatively, you could source something like XML (why, IDK!?).
import { readFileSync } from 'fs'
import libxml from 'libxmljs'

const xmlStr = readFileSync('/etc/my-app/config.xml', 'utf8')
const xmlDoc = libxml.parseXml(xmlStr)

const dbHost = xmlDoc.get('//db/host').text()
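The snippets above assume the file exists, is readable, and parses. A rough sketch of how the three failure modes mentioned earlier (missing, insufficient permissions, unparsable) could each be turned into a distinct error message follows. The `loadJsonConfig` and `tryLoad` helpers are hypothetical, and the demo writes to a temporary directory rather than `/etc/my-app`:

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from 'fs'
import { tmpdir } from 'os'
import { join } from 'path'

// Hypothetical loader: maps each file failure mode to a readable error.
const loadJsonConfig = (path: string): unknown => {
  let raw: string
  try {
    raw = readFileSync(path, 'utf8')
  } catch (err) {
    const code = (err as { code?: string }).code
    if (code === 'ENOENT') throw new Error(`Config file not found: ${path}`)
    if (code === 'EACCES') throw new Error(`Config file is not readable (permissions): ${path}`)
    throw err
  }
  try {
    return JSON.parse(raw)
  } catch (err) {
    throw new Error(`Config file is not valid JSON: ${path} (${(err as Error).message})`)
  }
}

// Demo helper: returns 'ok' or the error message instead of throwing.
const tryLoad = (path: string): string => {
  try {
    loadJsonConfig(path)
    return 'ok'
  } catch (err) {
    return (err as Error).message
  }
}

// Exercise the loader against a temporary directory.
const demoDir = mkdtempSync(join(tmpdir(), 'config-demo-'))
const goodPath = join(demoDir, 'config.json')
const brokenPath = join(demoDir, 'broken.json')
const missingPath = join(demoDir, 'missing.json')
writeFileSync(goodPath, JSON.stringify({ db: { host: 'db.internal' } }))
writeFileSync(brokenPath, '{ not json')

console.log(tryLoad(goodPath))    // ok
console.log(tryLoad(missingPath)) // starts with "Config file not found:"
console.log(tryLoad(brokenPath))  // starts with "Config file is not valid JSON:"
```

In a real application the loader would be called once at startup and any thrown error would end the process, rather than being converted to a string as the demo helper does.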

Another valid but less recommended strategy is to use networked sources of configuration. Networked sources give you centralized control of configuration, but they also increase both the likelihood of failure and the time it takes to report one.

import Axios from 'axios'

const environment = process.env.NODE_ENV
const configToken = process.env.CONFIG_TOKEN

const url = `https://config.mycompany.com/service/my-service/env/${environment}?token=${configToken}`

const res = await Axios.get(url, {
  timeout: 10000, // 10s
})

const dbHost = res.data.db.host

Configuration should also not be a pain in the butt to maintain. It may seem like a great idea to be able to change every setting in your application, but in practice, "You Ain't Gonna Need It" (YAGNI). Instead of specifying every configurable item in the environment or file, rely on sensible defaults. Try to use as few configuration items as possible. If you use defaults, those defaults should be production values. We will talk more about this in "Layered."
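As a rough sketch of what "few items, production defaults" can look like in practice: the setting names, values, and the `resolveSettings` helper below are invented for illustration. Only one item is required, one is optional, and one is deliberately not configurable at all.

```typescript
type Env = Record<string, string | undefined>

const logLevels = ['debug', 'info', 'warn', 'error'] as const
type LogLevel = typeof logLevels[number]

type Settings = { dbHost: string; logLevel: LogLevel; requestTimeoutMs: number }

// Defaults hold the values production should run with, so an environment
// that sets nothing optional still yields a production-ready configuration.
const productionDefaults = { logLevel: 'info' as LogLevel, requestTimeoutMs: 5000 }

const resolveSettings = (env: Env): Settings => {
  // DB_HOST has no sensible default, so it is the one required item.
  const dbHost = env['DB_HOST']
  if (!dbHost) throw new Error('DB_HOST is required')

  // LOG_LEVEL is optional; when unset, the production default applies.
  const requested = env['LOG_LEVEL']
  const logLevel = logLevels.find((level) => level === requested)
  if (requested !== undefined && logLevel === undefined) {
    throw new Error(`LOG_LEVEL must be one of ${logLevels.join(', ')}`)
  }

  // requestTimeoutMs is intentionally not exposed as a setting (YAGNI).
  return { ...productionDefaults, dbHost, logLevel: logLevel ?? productionDefaults.logLevel }
}

// Production only needs to supply the required item.
const prodSettings = resolveSettings({ DB_HOST: 'db.internal' })
// A developer machine overrides what differs from production.
const devSettings = resolveSettings({ DB_HOST: 'localhost', LOG_LEVEL: 'debug' })
```

The design choice here is that the development environment carries the extra configuration, not production: the fewer values production has to get right, the fewer ways a deployment can be misconfigured.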

Strict

Configuration from dynamic sources needs to be validated. Applications die all the time because an expected variable is missing or in the wrong format. Imagine passing null or a number as a hostname to a database connection. Most likely, the connection will fail, and the app will crash (hopefully). A more challenging problem to diagnose is a variable that falls outside its valid range of values. For example, a request timeout can be set too low or too high. Such misconfigurations may give operators/admins the perception that the service is running correctly when it is actually misbehaving.

Validating configuration can be as straightforward as making assertions:

// Log the problem and stop the process with a non-zero exit code.
const fail = (message: string): never => {
   console.error(message)
   process.exit(1)
}

// Returns the variable's value, or exits if it is unset or empty.
const checkExists = (envVar: string): string => {
   const value = process.env[envVar]
   if (!value) {
     return fail(`${envVar} is missing from the environment and is required.`)
   }
   return value
}

// Returns the variable as an integer within [lowBound, highBound] (inclusive).
const checkBetween = (envVar: string, lowBound: number, highBound: number): number => {
   // Parse the variable's value, not its name.
   const actual = parseInt(checkExists(envVar), 10)
   if (Number.isNaN(actual)) {
      fail(`${envVar} is not a number`)
   }
   if (actual < lowBound) {
      fail(`${envVar} is lower than accepted low bound of ${lowBound}`)
   }
   if (actual > highBound) {
      fail(`${envVar} is higher than accepted high bound of ${highBound}`)
   }
   return actual
}

const dbHost = checkExists('DB_HOST')
const dbPort = checkBetween('DB_PORT', 1024, 65535)

But this kind of validation logic is tedious. A better approach is to use an input validation framework (the kind you use for forms) against your environment:

// Joi is my favorite.
import Joi from 'joi'

type Environment = {
  DB_HOST: string,
  DB_PORT: number,
}

const EnvironmentSchema = Joi.object().keys({
  DB_HOST: Joi.string().required(),
  DB_PORT: Joi.number().integer().min(1024).max(65535).required()
})
.unknown() // Allow variables not specified in the keys

const { error, value } = EnvironmentSchema.validate(process.env)

if (error) {
  console.error('An error occurred validating the environment:', error.message)
  process.exit(1)
}

// Joi will automatically parse strings to numbers, booleans, arrays, etc.
const { DB_HOST: dbHost, DB_PORT: dbPort } = value as Environment

Fails Fast

Terminate the application immediately if the configuration is invalid or unresolvable.

If the process cannot find or read configuration from its source, terminate the application immediately. If the configuration state is invalid, quit the application immediately. Don't try to recover applications from hopeless situations. If an application cannot find a file when it first starts, it's unlikely to find that file on the Nth attempt; the same is true for network sources. And when the problem is an invalid configuration setting, no number of retries will fix it.
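One way to sketch this rule in code is a startup wrapper that loads configuration exactly once and stops on any failure. The `startOrExit` helper is hypothetical, and its `exit` parameter exists only so the behavior can be demonstrated without actually killing the process:

```typescript
// Hypothetical startup helper: load configuration exactly once and stop the
// process on any failure instead of retrying.
const startOrExit = <T>(
  load: () => T,
  start: (config: T) => void,
  exit: (code: number) => void = (code) => process.exit(code), // injectable for demos/tests
): void => {
  let config: T
  try {
    config = load()
  } catch (err) {
    // Say what went wrong, then leave with a non-zero exit code so the
    // deployment tool sees the failure.
    console.error('Invalid configuration:', err instanceof Error ? err.message : err)
    exit(1)
    return
  }
  start(config)
}

// Demo with stand-in functions; a real app would pass its own loader and
// rely on the default exit behavior.
const startupEvents: string[] = []
startOrExit<{ dbHost: string }>(
  () => { throw new Error('DB_HOST is missing') },
  () => startupEvents.push('started'),
  (code) => startupEvents.push(`exit ${code}`),
)
// startupEvents now contains only 'exit 1': the app never started.
```

Note that there is no retry loop anywhere in the wrapper; a failed load goes straight to a non-zero exit.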

Retries are typically an answer to race conditions in the infrastructure. Most commonly, this is where the system requesting the config is deployed at the same time as its configuration source. The app will start faster than the configuration provider, and then block until that provider comes online. I see this mostly in development environments that begin with a single command (like docker-compose up). The app might need the database or a Consul agent to be available, but both (particularly with a lot of data) might need a couple of seconds to come online. Developers will engineer retry mechanisms to solve a development problem, but that code will also exist in production and have the same behavior.

The answer to development environment race conditions is to start your essential infrastructure and wait for it to come online. Once online, you can initialize dependent services. A better general answer to "retries" is to allow the deployment environment (ECS, ElasticBeanstalk, Kubernetes, Heroku, Docker, PM2, etc.) to restart your application on failure. These tools are better at reporting the problem and may transparently solve the issue by moving your application to a working node.

Layered

Configuration is a hierarchy where sources should "layer" (override) each other.

Most apps will support at least two configuration sources: code defaults and environment variables. Your application should have a strategy for determining the precedence of configuration (meaning which sources override others' values).

The rule of thumb for precedence is that the "most dynamic sources override the least dynamic."

Code is the least dynamic; therefore, it should have the lowest precedence. Command-line arguments, environment variables, and networked sources are the most dynamic, so they should override everything else.

If you use a combination of dynamic sources, choose a precedence, enforce it, and be consistent. For example, you might choose a strategy that uses this precedence:

Code < Config File < Environment < Network Sources < Command Line Arguments

For example:

import { merge } from 'lodash'
// Code config
import Defaults from './defaults'

import loadConfigFromEnv from './loadConfigFromEnv'
import loadConfigFromFile from './loadConfigFromFile'
import loadConfigFromDB from './loadConfigFromDB'
import loadConfigFromArgs from './loadConfigFromArgs'

import app from './app'

async function main() {
  const [envConfig, fileConfig, dbConfig, argsConfig] = await Promise.all([
    loadConfigFromEnv(),
    loadConfigFromFile(),
    loadConfigFromDB(),
    loadConfigFromArgs(),
  ])

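  // Later arguments override earlier ones, so the order follows the precedence above:
  // Code < Config File < Environment < Network Sources < Command Line Arguments
  //
  // Shallow merge (this is probably the best strategy because you don't want unintended
  // merges of deeply nested objects)
  let config = Object.assign({}, Defaults, fileConfig, envConfig, dbConfig, argsConfig)

  // Deep merge if you want to support that (this replaces the shallow result above).
  config = merge({}, Defaults, fileConfig, envConfig, dbConfig, argsConfig)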

  await app(config)
}

main()
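The difference between the two merge strategies is easy to miss, so here is a small worked example. It uses a hand-written `deepMerge` as a simplified stand-in for lodash's `merge` (it only handles plain objects), and the layer contents are invented for illustration:

```typescript
type Plain = { [key: string]: unknown }

const isPlain = (value: unknown): value is Plain =>
  typeof value === 'object' && value !== null && !Array.isArray(value)

// Simplified stand-in for a deep merge: later layers win, and nested plain
// objects are merged key by key instead of being replaced wholesale.
const deepMerge = (...layers: Plain[]): Plain => {
  const result: Plain = {}
  for (const layer of layers) {
    for (const key of Object.keys(layer)) {
      const existing = result[key]
      const value = layer[key]
      result[key] = isPlain(existing) && isPlain(value) ? deepMerge(existing, value) : value
    }
  }
  return result
}

// Two layers in precedence order: code defaults, then the environment.
const codeDefaults = { db: { host: 'db.internal', port: 5432 }, logLevel: 'info' }
const envLayer = { db: { host: 'localhost' } }

const shallowResult = Object.assign({}, codeDefaults, envLayer)
const deepResult = deepMerge(codeDefaults, envLayer)

console.log(JSON.stringify(shallowResult)) // {"db":{"host":"localhost"},"logLevel":"info"}
console.log(JSON.stringify(deepResult))    // {"db":{"host":"localhost","port":5432},"logLevel":"info"}
```

With the shallow merge, the environment layer replaces the whole `db` object, so the default `port` disappears; with the deep merge, it survives. Whichever you choose, each layer should be written with that behavior in mind.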

Conclusion

The configuration process is perhaps the most critical piece of code in an application. Configuration allows applications to change their behavior, from something as small as increasing the log level to something as big as choosing a different cloud provider. While I wrote the examples in TypeScript, the principles apply to all languages. Configuration should be as simple as possible, using the least dynamic sources, standard file formats, and sensible defaults. Configuration should be strict and minimize the chance of an application misbehaving. Configuration should fail fast in the presence of errors. Finally, layer configuration, overlapping production defaults, configuration files, environment settings, etc. (least dynamic to most dynamic) in a well-defined order of precedence.

My Thoughts

My philosophy for configuration took years to evolve and was hard-earned. When I was a younger developer, I was intrigued by the Java ecosystem, specifically the magnificent "App Server." JBoss, Glassfish, and others had sophisticated configuration systems that would allow you to remote into a server and modify settings on the fly. This was the neatest thing.

Over the years, I fell into the trap of thinking I needed the complexity of changing configuration on the fly. I had this dream of a framework that could watch a configuration source like Consul and restart/reconfigure the application without needing to redeploy the app (or restart the process). A couple of years ago, I worked on a framework called Evergreen that attempted to do just that. While Evergreen was not terrible, it didn't provide value greater than the cost of maintaining the framework. It also encouraged a lot of bad practices and was easy to abuse.

I've realized that deployments, when using the right framework, are cheap and fast. I baked a lot of complexity into a configuration framework to avoid having to redeploy an app. However, as environments like Kubernetes have matured, the cost of updating static configuration and redeploying an app has become negligible (often less than a second for config change when the node already has the Docker container). The simpler (less sexy) solution was a lot more reliable and practical.

I have since sought minimalism in my configuration practices. I tend to use only production defaults and environment variables. Sometimes I use files, but generally, that's only to inject things like SSL certs into a container. My apps spectacularly fail if the configuration is missing or invalid. Typically this happens during deployment, and I get a notification through Slack that the pipeline has failed. I don't dread these failures. A modern deployment system will keep the old version of the app alive until I've fixed the problem.


September 30, 2020


Posted in: craftsmanship nodejs typescript