Coupling

Coupling is a term that arises regularly in programming discussions, often accompanied by copious hand-waving and vague assertions. So let's focus our attention on a concrete example.

Say we have a SQL database.

Is it coupled to anything? Or, asked another way, must we consider anything else when we think about our SQL database?

The answer is, of course we must! We must ask, "Does it have enough disk space?" "Does it have enough RAM?" "Is its network card fast enough?" Though we might not ask these questions every time we interact with the database, we're not allowed to forget to ask them either.

Okay, is it coupled to anything besides hardware?

The answer here is, it depends.

Do we store identifiers to another data store in our SQL database? Do we store, for example, S3 keys or Elasticsearch ids? If so, our SQL database is coupled to those other data stores. The data in our database is tied to those other data stores. We cannot fully consider one without the other.

Notice, this sort of coupling is probably good! SQL databases are not the best blob stores, and they're certainly not the best full-text search engines, so we get a lot of leverage from storing our data in multiple locations. However, it does present some amount of mental overhead. Any time we add an item to one store, we must add it to all stores. The same is true for modifications and deletions.

We could meet this need in any number of ways. We could, for example, try to remember to write to every store every time we make changes to our documents:

// one code path
sqlDb.create(doc);
s3.putObject(doc.id, doc.text);
es.index(doc.id, doc.text);

// another code path
sqlDb.update(doc);
s3.putObject(doc.id, doc.text);
es.index(doc.id, doc.text);

// yet another code path
sqlDb.delete(doc);
s3.deleteObject(doc.id);
es.delete(doc.id);

If we're lucky, there will only be one code path which creates documents, one code path which updates documents, and one code path which deletes documents, and this will work just fine.

Unfortunately, I've never met a lucky programmer.

The problem with this approach is that it encodes the same logic all over our codebase. This means that, when making code modifications, we're relying heavily on our own diligence. We have created a situation where there are many code paths that must be edited in unison.

This is what we call a relatively tight coupling.

The word "relatively" implies there are other options, and indeed there are! An alternative implementation might look something like this:

public class DocumentManager {
  private final SqlConnection sqlDb = new SqlConnection();
  private final S3Client s3 = new S3Client();
  private final ESClient es = new ESClient();

  public void add(Document doc) {
    sqlDb.create(doc);
    s3.putObject(doc.id, doc.text);
    es.index(doc.id, doc.text);
  }

  public void update(Document doc) {
    sqlDb.update(doc);
    s3.putObject(doc.id, doc.text);
    es.index(doc.id, doc.text);
  }

  public void delete(Document doc) {
    sqlDb.delete(doc);
    s3.deleteObject(doc.id);
    es.delete(doc.id);
  }
}

With this solution, we take on the burden of ensuring that every code path is using our DocumentManager class. If we are successful, we are afforded two rather significant advantages:

  1. Individual code paths no longer need to re-encode all of the steps required to add, update, and delete documents. This means that programmers are allowed to "forget" these details.
  2. All code paths now change in unison by definition rather than convention.

This sort of decoupling is often referred to as a seam. It is an interface which allows you to make drastic implementation changes without wandering the code base for affected paths.
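To make the second advantage concrete, here is a minimal sketch of a seam, assuming in-memory stand-ins for the three stores. The names Store, InMemoryStore, and SeamedDocumentManager are hypothetical and exist only for illustration; they are not part of any real client library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for any one store (SQL, S3, or Elasticsearch).
interface Store {
  void put(String id, String text);
  void remove(String id);
}

// In-memory store used only for illustration.
class InMemoryStore implements Store {
  final Map<String, String> data = new HashMap<>();

  public void put(String id, String text) {
    data.put(id, text);
  }

  public void remove(String id) {
    data.remove(id);
  }
}

// The seam: callers see save and delete, and nothing about which stores exist.
class SeamedDocumentManager {
  private final List<Store> stores;

  SeamedDocumentManager(Store... stores) {
    this.stores = new ArrayList<>(Arrays.asList(stores));
  }

  // Writes the document to every configured store.
  void save(String id, String text) {
    for (Store store : stores) {
      store.put(id, text);
    }
  }

  // Removes the document from every configured store.
  void delete(String id) {
    for (Store store : stores) {
      store.remove(id);
    }
  }
}
```

In this sketch, adding a fourth store means changing the line that constructs the manager. Every call to save or delete picks up the new store without being edited.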

Compared to our first solution, this solution is relatively loosely coupled. It's important to note how careful we've been to frame the "tightness" or "looseness" of coupling in relative terms. This is because, though coupling does exist in absolute terms (for example, remember that our databases are coupled), it can only be evaluated in relative terms. That is, we can only say that some code, data, or system is tightly coupled as compared to some alternative configuration.

Returning to our example, though we found a more loosely coupled solution, it is far from the only solution. For example, we might introduce a queue for processing document changes. Alternatively, we might add a background process which polls our SQL database for changes and propagates updates to S3 and Elasticsearch. Each of these solutions is more loosely coupled than the last, and each comes with its own set of difficulties and potential error modes. This means that, though relative coupling is an important factor in decision making, it is not necessarily the most important factor.
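As a rough sketch of the queue idea, assume the write path touches only the SQL database and records a change event, while a separate propagator applies those events to the other stores. Everything here is hypothetical: the maps stand in for real stores, and the in-memory ArrayDeque stands in for what would more likely be a durable queue.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

// Hypothetical change event; a null text means the document was deleted.
final class DocumentChange {
  final String id;
  final String text;

  DocumentChange(String id, String text) {
    this.id = id;
    this.text = text;
  }
}

// Write path: updates the SQL stand-in and enqueues a change event.
class QueuedDocumentManager {
  final Map<String, String> sqlDb = new HashMap<>();
  final Queue<DocumentChange> changes = new ArrayDeque<>();

  void save(String id, String text) {
    sqlDb.put(id, text);
    changes.add(new DocumentChange(id, text));
  }

  void delete(String id) {
    sqlDb.remove(id);
    changes.add(new DocumentChange(id, null));
  }
}

// Read side of the queue: applies pending changes to the S3 and Elasticsearch stand-ins.
class ChangePropagator {
  final Map<String, String> s3 = new HashMap<>();
  final Map<String, String> es = new HashMap<>();

  // Applies every pending change in order and returns how many were applied.
  int drain(Queue<DocumentChange> changes) {
    int applied = 0;
    DocumentChange change;
    while ((change = changes.poll()) != null) {
      if (change.text == null) {
        s3.remove(change.id);
        es.remove(change.id);
      } else {
        s3.put(change.id, change.text);
        es.put(change.id, change.text);
      }
      applied++;
    }
    return applied;
  }
}
```

The write path no longer knows that S3 or Elasticsearch exist, which is the looser coupling. The new error mode is also visible: between save and drain, the SQL stand-in holds a document that the other two stores do not.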

That said, it should be noted that, in this example, taking the very first step toward decoupling and introducing a seam makes all subsequent solutions significantly easier to implement later on. And the cost of this solution is a bit of forethought.

At this point, one might be tempted to proclaim, "All database operations should have a special method – a seam!" Indeed, many developers create methods such as:

public class UserRepository {
  public User findByFirstname(String name) {
    // impl...
  }
  public User findByFirstnameOrLastname(String name) {
    // impl...
  }
  public User findByLastnameOrderByUsernameAsc(String name) {
    // impl...
  }
}

This popular pattern fails to recognize two important details:

  1. Though a seam can decouple code, it does introduce a form of coupling: semantic coupling. Implementers of an interface must be able to reasonably uphold the semantics of the method.
  2. Different databases support drastically different read patterns. This makes it difficult to uphold the semantics of an interface and change from, say, SQL to DynamoDB. DynamoDB must perform a scan for many queries that SQL supports with indexed access.
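The two points above can be sketched with one interface and two implementations. This is an illustration under stated assumptions, not a model of any real database: UserLookup, IndexedUserStore, ScanOnlyUserStore, and the rowsExamined counter are all hypothetical, and both stores keep their data in memory.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical user record, reduced to the two fields the example needs.
final class DirectoryUser {
  final String username;
  final String firstname;

  DirectoryUser(String username, String firstname) {
    this.username = username;
    this.firstname = firstname;
  }
}

// The seam. Callers implicitly expect this lookup to be cheap.
interface UserLookup {
  List<DirectoryUser> findByFirstname(String firstname);

  // Number of rows touched so far, tracked only for illustration.
  int rowsExamined();
}

// Stand-in for a store with an index on firstname.
class IndexedUserStore implements UserLookup {
  private final Map<String, List<DirectoryUser>> byFirstname = new HashMap<>();
  private int examined = 0;

  IndexedUserStore(DirectoryUser... users) {
    for (DirectoryUser user : users) {
      byFirstname.computeIfAbsent(user.firstname, k -> new ArrayList<>()).add(user);
    }
  }

  public List<DirectoryUser> findByFirstname(String firstname) {
    List<DirectoryUser> hits =
        byFirstname.getOrDefault(firstname, Collections.emptyList());
    examined += hits.size(); // only matching rows are touched
    return hits;
  }

  public int rowsExamined() {
    return examined;
  }
}

// Stand-in for a store that can only answer this query by scanning.
class ScanOnlyUserStore implements UserLookup {
  private final List<DirectoryUser> users = new ArrayList<>();
  private int examined = 0;

  ScanOnlyUserStore(DirectoryUser... users) {
    Collections.addAll(this.users, users);
  }

  public List<DirectoryUser> findByFirstname(String firstname) {
    List<DirectoryUser> hits = new ArrayList<>();
    for (DirectoryUser user : users) {
      examined++; // every row is touched, matching or not
      if (user.firstname.equals(firstname)) {
        hits.add(user);
      }
    }
    return hits;
  }

  public int rowsExamined() {
    return examined;
  }
}
```

Both classes satisfy the method signature and return the same users, yet the work they do differs by the size of the table. The interface cannot express that difference, so callers that assumed a cheap lookup remain coupled to the store that provided one.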

This means there is an imbalance between writing to and reading from a database. Most databases offer similar semantics for writing single documents (when writing multiple documents, some offer transactions and some do not), but they offer very different semantics for querying documents. This translates to a tighter coupling on reads. In other words, every code path which reads from a database is more tightly coupled to the database than a code path which writes to the database.

Our intent is not to suggest that there is no value in creating a seam for database reads. Rather, we want to be honest and say that the utility of such a seam is significantly diminished by semantic coupling.

Questions