Stop your (business rules) engines!

One of the many surprising artifacts of the initial failed HealthCare.gov launch was a “business rules engine.” This was a new concept for most of the team that was assembled to help rescue the site.

A business rules engine (BRE) is a piece of software that encapsulates a set of business rules, or bits of logic that relate to an organization’s activities. A BRE is used alongside another main piece of software, such as a website. At key points in the functioning of the main software process, it transfers control to the BRE, which — this is the “engine” part — applies the rules to the current state of the main process, triggering actions and transitioning to new states. For example, a business rule might be, “apply a 10% tax to the order subtotal if the customer is from Illinois”, or “reject enrollment from this health plan if household member #1 is over 30 years of age.”

The value proposition of a BRE is two-fold. First, an organization can write down all of its business rules in one place, and policy experts can review them and make sure they make sense. Second, because the BRE is a self-contained system, separate from the main software application, people in the organization don’t have to be programmers to write and update business rules or to apply them to the functioning of the main application.

I can’t tell you if the government started asking for BREs or vendors started pitching them first. But that value proposition for the government seems reasonable at face value:

  • Non-programmers can seemingly update software without lengthy change management processes.
  • The tools usually let you audit the rules for consistency and cohesion.
  • They provide some organizational transparency.

These are all solutions to real problems that government technology leaders face. The trouble with BREs is, like many things from the world of enterprise software, they’re a complicated solution to a simpler problem. And yet they ultimately fail to solve the problem.

The problem any organization faces when it comes to software is how to represent processes and logic that are important to their core functions in their mission-critical digital services. For example, when a user of a retail site checks out with their shopping cart, the software running on the backend needs to faithfully execute the business rules to ensure the organization’s policies are applied so that all the charges to the customer and fulfillment instructions to the warehouse are performed accurately.

The solution BREs offer is to encode that logic, or business rules, in a language- and platform-neutral way (usually in some sort of document in an XML dialect), and have a separate tool with a user interface designed for updating rules.

Unfortunately, the notion that business rules and their creation and lifecycle should be kept separate from application code is fundamentally flawed. Experience gives us little reason to believe that arbitrary changes to the logic and actions of a system can be made successfully without coordinating with the rest of the application or its runtime environment. A system like this has complex interactions, and the ramifications of such changes are generally hard to predict and poorly understood.

We already have an effective means for encoding conditional logic and consequent action in a form that humans can produce and machines can process: programming languages. And we have the means for assembling a coherent set of business rules into functioning services and coordinating across stakeholders and business units: modern software engineering.
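
To make that concrete, here is a minimal sketch of the two example rules from earlier written as ordinary Python. It is an illustration only, not code from HealthCare.gov or any real system: the names (Order, order_tax, may_enroll), the "IL" state code convention, and the constants are all hypothetical, and the 10% and age-30 figures come from the illustrative rules in this post rather than from actual tax or health policy.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Sequence

# Figures taken from the illustrative rules in this post, not from real policy.
ILLINOIS_TAX_RATE = Decimal("0.10")
MAX_PRIMARY_MEMBER_AGE = 30


@dataclass(frozen=True)
class Order:
    subtotal: Decimal
    customer_state: str  # two-letter state code, e.g. "IL" (assumed convention)


def order_tax(order: Order) -> Decimal:
    """Apply a 10% tax to the order subtotal if the customer is from Illinois."""
    if order.customer_state == "IL":
        return (order.subtotal * ILLINOIS_TAX_RATE).quantize(Decimal("0.01"))
    return Decimal("0.00")


def may_enroll(household_ages: Sequence[int]) -> bool:
    """Reject enrollment if household member #1 is over 30 years of age."""
    if not household_ages:
        raise ValueError("a household needs at least one member")
    return household_ages[0] <= MAX_PRIMARY_MEMBER_AGE


# Unit tests double as a reviewable statement of the policy.
def test_illinois_orders_are_taxed():
    assert order_tax(Order(Decimal("200.00"), "IL")) == Decimal("20.00")
    assert order_tax(Order(Decimal("200.00"), "WI")) == Decimal("0.00")


def test_primary_member_age_limit():
    assert may_enroll([30, 45]) is True
    assert may_enroll([31, 25]) is False
```

Each rule lives in one small, named function, so it can be found, reviewed, and changed through the same process as the rest of the application, and the tests give a policy expert something specific to read and challenge.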

The fact that large governmental organizations look for, and are guided to, BRE-based solutions without seeing their enormous cost, not just in dollars but also in design and operational complexity, highlights the continued importance of education about the patterns and practices of modern private-sector software development and how they can help deliver effective public-sector services.

A key way in which BREs fail to deliver is on the promise of isolating software and policy. Software, when executed, causes side effects, which are typically useful and desired. The execution of a business rule may trigger arbitrary side effects: a database is queried; data is sent over or received from a network connection; a file is written to a storage medium. Inevitably, program logic and application logic find their way into the clean world of business rules. What winds up happening is that the BRE evolves into yet another dependency the software engineering team must manage and integrate, not the idealized lightweight, decoupled service.

Further, it becomes an operational challenge, because, in order to fully test the system end-to-end, the BRE must be in the testing path for any change to the main application. Otherwise, you can’t ensure that a change to either component is correct. Coordinating changes in the interest of testing can be difficult, especially for organizations where it is risky or difficult to deploy changes in the first place. And let’s say an organization committed itself to testing with the fully-integrated system. Teams that could be logically encapsulated from each other will now have to mix with others, because developers are now dependent on policy experts, and their team’s processes, and vice versa. Even if the testing is done properly, the benefit is offset by the needless organizational complexity.

Business rules engines are little Conway’s Law devices: a manifestation of the distrust between stakeholders, customer, and contractor. We require BREs so that separate business units need not talk to each other to solve problems. They are communication and organizational dysfunction made silicon.

There is value in having your organization’s business rules be auditable, discoverable, and testable. There is even value in having components of a system specified in domain-specific languages, for documentation, code and interface generation, and soliciting external contributions. I would argue, however, that modern software engineering practice already knows how to solve all the problems BREs purport to address, through the use of unit and integration tests, static analysis, APIs, and DevOps. The stuff of software engineering is the solving of business rule problems. The right approach is not to wall off teams and contractors from each other, and then use a BRE to bridge the divide. The right approach is to harmonize the software engineering effort, improve communication between customer and vendor, and reduce operational impediments to making changes to production software.

During the HealthCare.gov rescue, updates to the rules in the BRE necessitated downtime in order to be performed safely (in part, ironically, to prevent disruption to active sessions, and also in part to prevent unforeseeable corruption of the user’s data). This was a fragile process, requiring a massive (read: multi-gigabyte) XML blob to be marshaled into place. You could blame this on the particular implementation of the HealthCare.gov system, but there’s no denying the use of a BRE, adding a large complicating factor, made operations much more difficult.

How do we know plain old software engineering works better? When we (Ad Hoc) built a new version of Plan Compare, the core health plan shopping experience, for HealthCare.gov, we implemented a complex set of business rules and requirements from CMS, concerning the correct interplay between QHPs (qualified health plans, or plans sold on the marketplace, in the parlance) and households enrolling for coverage. We elected not to use a BRE. The logic for the application was encoded directly in the if/else and other control flow constructs in the programming languages used by the application and validated in the form of unit tests and integration tests. We verified correctness by comparing our new system’s results against a large corpus of plans and households, curated by policy experts, run through the legacy system being replaced. Further, as we worked across the developer/customer divide to flesh out and better understand the rules, we uncovered subtle inconsistencies, conditions, and actions that needed to be elaborated. In other words, the business rules improved as we worked through the standard software engineering process.
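
The corpus comparison described above can be sketched roughly as follows. This is not the harness we actually used; it is a small Python illustration under assumed names. CorpusCase, find_mismatches, new_eligibility, the case IDs, and the age-based rule are all hypothetical, and a real corpus would carry far more detail about plans and households.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class CorpusCase:
    case_id: str
    household_ages: Tuple[int, ...]
    legacy_eligible: bool  # answer recorded from the legacy system


Mismatch = Tuple[str, bool, bool]  # (case_id, legacy answer, new answer)


def find_mismatches(
    cases: Iterable[CorpusCase],
    new_rule: Callable[[CorpusCase], bool],
) -> List[Mismatch]:
    """Run every case through the new rule and collect disagreements with legacy."""
    mismatches = []
    for case in cases:
        actual = new_rule(case)
        if actual != case.legacy_eligible:
            mismatches.append((case.case_id, case.legacy_eligible, actual))
    return mismatches


def new_eligibility(case: CorpusCase) -> bool:
    """Hypothetical reimplementation: member #1 must be 30 or younger."""
    return case.household_ages[0] <= 30


# Made-up cases standing in for a corpus curated by policy experts.
corpus = [
    CorpusCase("h-001", (29, 27), True),
    CorpusCase("h-002", (31,), False),
    CorpusCase("h-003", (30, 4), False),  # legacy disagrees at the age boundary
]

mismatches = find_mismatches(corpus, new_eligibility)
print(mismatches)  # [('h-003', False, True)]
```

A mismatch is not automatically a bug in the new code. In this made-up example the two systems disagree about a household member who is exactly 30, which is the kind of boundary question that has to go back to the policy experts to decide what the rule really is.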

So, please, designers of government IT services, if you’re reading this, don’t require BREs in your RFPs. They’re expensive, they add complication to your architecture and operations, and they’re utterly foreign to the kind of high-quality private-sector software engineering teams you desperately need to attract to work on public sector problems. Most importantly of all, they’re simply not needed; we already have tools that sufficiently address the problem of encoding and verifying business rules in our software.