Why can’t “configuration” be made simple? A (rather long) view on the configuration-compliance-coherence trust problem in IT

Three central C’s. No matter how many new technologies we invent for configuring infrastructure and data in IT systems, they all eventually fall from favour. Sometimes it’s because infrastructure moves up a level (as with cloud) and sometimes it’s because the older methods overwhelm new users by expecting knowledge they don’t have. The rapid rate of change in infrastructure leaves people confused. More spuriously, technology has become a fashion accessory rather than an engineering decision for many in the 21st century. Engineers tend to choose products based on brand rather than technical analysis. In the 21st century, engineers want to be part of a tribe. Some technologies come from industry, some from academia. Some disappear more quickly than others. Most of them are forced to endure critical abuse from their replacements. Weary engineers argue for a return to the ways of olde: let’s just use shell scripts!

Although all these issues originated in the configuration of basic infrastructure (hardware and software), today the lines between infrastructure and software are blurred by virtualization and service architectures. As IT penetrates every digit of society, it becomes a clear and present safety issue. We are increasingly insisting on accurate and timely services that can “promise a stable and consistent experience”; this is how trust is built. Thus, regulation is coming from the highest levels, to ensure this is not left to the whim of companies with other agendas. Resistance is futile.

Random picture of me being tediously explanatory!

The hard problem of configuration?

For many, configuration means software settings and preferences–something you hope to decide once and for all when you start using some software, and can then forget about. So what’s the big deal? Infrastructure engineers know it as something broader than just personal choices: configuration is the arrangement and fine tuning of all aspects of service delivery. It concerns everything from the design layout to the security of a working system. It includes how the parts are positioned within a particular realm of interest (say the local network, a single computer, or a hosting platform) and how those various services talk to each other. It’s a circuit diagram of every detail.

The IT community can't even agree on what configuration means. Pages about it on Wikipedia studiously refuse to acknowledge one another's work, leading to a mess of conflicting terminology and distorted historical record.

As time goes by, most of us realise that the techniques for managing personal settings turn directly into this more general problem, particularly when we deal with scale. Positioning and plugging together resources creates patterns, as in all programming, and we reuse these patterns to create just as much uniformity and consistency of experience as we need. Hopefully, we don’t simply impose it on those who don’t want it. Some standards are regulated, some are policy, some are convenience.

Engineers don’t always make the right choices for users. Recently, I moved into a new apartment. It has a nice kitchen but the tap over the sink is mounted too high so that water splashes everywhere when you turn it on. If you put a cup or a bowl in the sink and turn on the tap, it never goes into the cup or the bowl because the position is aligned with the drain rather than the possible contents of the sink. It’s a similar story in the bathroom. This too is a configuration problem–a rather simple one, in fact–and yet the solution still wasn’t fit for purpose. Half the problem (at least from my perspective) was neglected. Yes, the tap delivers water and the water runs to the drain, and there is space for things in between. But what about the relationship between water flow and the things to be washed or filled? The designers only solved a static problem instead of the dynamic one it really is. The flaw is easily solved by making the tap more flexible. Bendable nozzles and extensions are easily fitted but are not standard. Too often, with infrastructure, we imagine a fixture rather than a part of an adaptive process. The same is true in IT.

Configuration becomes both a risk and safety issue as well as a usability advantage. A functioning system must have both correct information and correct behaviour, but one does not imply (or guarantee) the other. If data are incorrect, our intentions will not be captured correctly and everything we do will be misconceived and irrelevant. If behaviours are incorrect then every time we alter or move data they will become corrupted–either for some or for all.

Perhaps the most common example of this concerns databases, where a major concern is the “consistency” of distributed data across replicas and sites. Consistency means that data must be true to a common standard when passed around. It sounds easy enough, but it’s a difficult and contested topic. Somewhat bizarrely, we use loose and fluffy ideas like quorum (borrowed from human management) to decide what counts as correct for critical outcomes (i.e. a majority vote by possibly inconsistent opinion holders). One might think we could do better than asking for a committee to vote on correctness where matters of human safety could be at stake, but (for better or worse) this remains the industry norm.
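To make the worry about quorum concrete, here is a minimal sketch of a majority-vote read. It is not how any particular database implements quorum; the function name `quorum_read` and the replica values are hypothetical, chosen only to show that a majority can agree on a value that is not the intended one.

```python
from collections import Counter

def quorum_read(replica_values):
    """Return the value held by a strict majority of replicas, or None."""
    if not replica_values:
        return None
    value, votes = Counter(replica_values).most_common(1)[0]
    # A strict majority means more than half of all replicas agree.
    if votes > len(replica_values) // 2:
        return value
    return None

# Hypothetical scenario: the intended value is "v2", but two of the
# three replicas missed the latest write and still hold "v1".
intended = "v2"
replicas = ["v1", "v1", "v2"]

agreed = quorum_read(replicas)
print("quorum says:", agreed)
print("matches intent:", agreed == intended)
```

The vote settles which value the committee agrees on. It says nothing about whether that value is the one anybody intended.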

As engineers, we don’t want too much fuss. We start out hoping for maximum simplicity: an economy of effort over the lifecycle of software. This includes ongoing maintenance, which not everyone plans for with much care. Thus product managers and developers tend to strip away as much sophistication and adaptability from methods as they can, until that minimalism backfires and results in an incident report. After rounds of tweaking and patching to try to refloat a simple model, it’s time to start again. It’s not that we couldn’t solve the problems elegantly at the outset; I suspect it’s rather that many in IT find these details of management to be boring or trivial next to what they would really like to be doing. A common solution is to impose an artificial simplicity onto software and users as a condition of use, in order to get on with more interesting aspects of the code.

From prior research, we know that decision selection is a formally hard problem, yet technologists increasingly reject the science and pursue their own priorities.

Change management as code?

In IT, most approaches to configuration deal only with static patterns: data fixed once, for all, and forever at the beginning of an installation. Once set, we don’t expect to touch these values again. But this ignores drift, erosion, and other maintenance issues like garbage collection that unintentionally change the conditions of the system. Change doesn’t require a human hand, yet we nearly always assume that all changes are intentional. In the cloud era, where programs are not expected to last very long, our modern narrative is to view change as the evolution of intentional “software releases” and we neglect the hidden resources that the software implicitly depends on.
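The difference between a setting that was written once and a setting that is still true can be sketched in a few lines. This is only an illustration under assumed names: the keys, values, and the helpers `find_drift` and `repair` are hypothetical and not taken from any real tool.

```python
def find_drift(desired, observed):
    """Return {key: (desired, observed)} for every setting that differs."""
    return {
        key: (want, observed.get(key))
        for key, want in desired.items()
        if observed.get(key) != want
    }

def repair(desired, observed):
    """Return a copy of the observed state with desired settings restored."""
    fixed = dict(observed)
    fixed.update(desired)
    return fixed

# Hypothetical desired state, decided at installation time.
desired = {"ntp_server": "time.example.org", "log_retention_days": 30}

# Hypothetical observed state some months later: nobody released anything,
# yet one value has changed and clutter has accumulated.
observed = {
    "ntp_server": "time.example.org",
    "log_retention_days": 90,
    "tmp_files": 1200,
}

drift = find_drift(desired, observed)
print("drifted settings:", drift)
print("after repair:", find_drift(desired, repair(desired, observed)))
```

Note that the check only covers what was declared. The accumulated `tmp_files` are invisible to it, which is exactly the kind of hidden resource the release narrative neglects.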

Terms like “infrastructure as code”, DevOps, GitOps, etc. have been coined to try to persuade engineers to bring to infrastructure a level of discipline similar to that applied to software. After all, the campaigns for better management of software itself (based on versioning) were only recently championed by Continuous Delivery methods. These issues were taken for granted by previous generations, but because housekeeping methodology was never taught in colleges, later generations needed to be taught these things through community campaigns.

The problem of managing real time change in “complete computer systems” is still woefully neglected, even after 30 years of work. The result is that, every year, some new emergency configuration language or system gets invented to “solve” the immediate problems that occur due to whatever happens to be the fashionable negligence of the day. Modern digital regulatory laws will soon render such negligence illegal. Most noticeably, the trend has been to refactor automation and abstraction into cloud platforms, and go back to occasional hands-on intervention by humans to manage the rest.

Perhaps the most important lesson we learn as we get older is that ideology is rarely useful in the face of the hard facts of what can be achieved in practice.

After 30 years of configuration research, it’s time to review these trends in the light of a deeper understanding about how other fields deal with configurations.

Other fields…

I started life as a physicist. As we learn in Newtonian physics, a configuration is a snapshot of a “system of interest” described by the coordinate positions of all bodies at a certain time. We can calculate past and future trajectories of bodies, to some extent, because the “software” of physics is fixed and known in the form of Newton’s laws. Everyone knows that we can only predict configurations accurately for two bodies. The Three Body Problem in physics is already too complex to be fully reliable. So we don’t expect even initially simple configurations to remain simple over time, as we do in IT.

We call the (x,t) realm of coordinates, in which bodies are arranged, configuration space to distinguish it from other parameters that describe changes to their interior states–though to be fair, one could equally call the realm of internal quantum numbers a configuration too (in the IT sense). The distinction is more a point of view than a reality.

We summarise behavioural “policy” by equations of motion (or their solutions) in physics. Together with the initial settings or boundary conditions for externally decided fixtures, these allow us to predict the state of bodies over time, not just at the moment of process initiation.


In physics, what we would call configurable data in IT are the parameters that are used to adapt the general equations to specific cases. We separate initial (or boundary) conditions from dynamic evolution in this way. Why is there such a separation? It’s really a reflection of a hierarchy of scales. Nothing is truly fixed, but some variables change quickly and some change only very slowly. The slow ones are what we choose to abstract as constants, invariants, or fixed parameters.

In IT, most people think that configuration only refers to the initial conditions of a trajectory. Unlike the idealised equations of physics, we can’t completely isolate trajectories from external forces, because there isn’t such a convenient separation of scales into fast and slow change. This is why flight trajectories need constant corrections (I tried to change this perception in the 1990s). When Windows began to dominate computing, a course correction meant a complete reboot of a system: killing off the pilot and passengers and trying the journey again in a new (improved) vessel. In the cloud era, we are basically back to that method of restarting with disposable cloud models.
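A toy simulation can illustrate why setting initial conditions is not enough. The sketch below assumes a one-dimensional “state” that is nudged by a random disturbance at every step; the function `run`, its parameters, and the disturbance model are all hypothetical simplifications, not a model of any real system.

```python
import random

def run(steps, correct_every, seed=0):
    """Return the worst deviation from the target seen during the run.

    correct_every=0 means the state is set once and never corrected.
    """
    rng = random.Random(seed)
    target = 0.0
    state = target                        # initial configuration
    worst = 0.0
    for step in range(1, steps + 1):
        state += rng.uniform(-1.0, 1.0)   # external disturbance
        worst = max(worst, abs(state - target))
        if correct_every and step % correct_every == 0:
            state = target                # course correction
    return worst

print("set once, never corrected:", run(1000, correct_every=0))
print("corrected at every step:  ", run(1000, correct_every=1))
```

With a correction at every step, the deviation can never exceed the size of a single disturbance. Without corrections, the disturbances accumulate and the state wanders away from the target, even though the initial configuration was perfect.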

In this sense, sticking to the name “configuration” is misleading. Although physics has learned to solve many issues, what it doesn’t describe is semantic purpose, or the intent of a physical mechanism, so it’s not a complete analogy to IT. In physics, structures don’t have a purpose: they simply are. In IT, they serve the aims of users. To add the dimension of intent to a system, we need to think beyond pure data and expand our idea of what space and time represent for IT.

Semantic Spacetime is where your configurations actually live

We can go much further in formulating a solid abstraction than Newtonian mechanics did for physics, by building on fundamental truths about pattern and causality, and moving them into a more realistic representation of space than Euclidean coordinates. Locations in IT are discrete, networked process locations; we can just call them “agents”. All these ideas have been brought together in a general description of intent and state that I call Semantic Spacetime, which began with Promise Theory. In doing this, it’s possible to predict the challenges that IT is going to face over the coming years. Indeed, many have already been predicted and even solved.

What we call an agent in IT depends on the scale of what we’re doing and the isolation of process that implies. The focus has shifted over the decades, due to the economics of shared computing (see figure below).

The evolution of shared resources followed the economics of varying issues.

When planning services, we tend to think from the top-down (from the outside-in). However, the core behaviours are actually governed from the bottom-up (from the inside-out) of the devices involved.

A conductor in an orchestra can’t play every instrument in parallel from the top down, e.g. by remote command, but he or she can shape and coordinate the performance as long as every player has detailed instructions and listens for signals. Coordination is a top-down problem, because it requires a central standard for calibration (like a conductor). Playing is bottom-up, because that’s where the levers of change are. This points us to a basic model of agents as atoms, which shows how physics and IT become a single point of view. That model became Promise Theory.

Atomic specifications and desired states

The key ingredients for engineering a desired outcome are: