ActiveRecord
A comprehensive look at and around the pattern.
The Goal
The goal of this post is to contain as much information related to the ActiveRecord pattern as possible.
I'm motivated by how difficult I've found it to engage developers in an analysis of this pattern, largely due to political factors.
Do you want me to update something? Post in the comments and I'll work it in. Attribution will be listed at the bottom of the article.
To-do
- Make more of a distinction between current ActiveRecord implementations and the essence of the pattern itself.
- Ensure that the language is as neutral as possible so as not to give pattern-extremists fuel one way or another. Focus on the trade-offs.
- Explain cohesion in more thorough detail with examples.
Understanding the Nature of a Pattern
The ActiveRecord pattern is neither good nor bad. It is only in the analysis of the aggregation of its trade-offs that it can be understood. The determination of whether or not it's an appropriate tool for a specific context should be left up to the developer, whose job is to make this contextual judgement.
Important Prerequisite Information
The following is an excerpt from my Domain Modeling post. It's included because it's necessary to follow the rest. Feel free to skip down to the 'Serializing to and Deserializing from Data Storage' section if you are familiar with it.
What is a Domain?
Software modules operate against a context. That context may be the rendering of windows, the construction of HTTP responses, database persistence, or the definition of business requirements.
The context of the problem in which a module is designed as a solution is called the domain. All applicable contexts of a module are the module's domain.
For example, the domain of the FastRoute library is routing: dispatching a web request to a specific piece of code based on the contents of the request. Included in the domain are HTTP concepts like GET, POST, query strings, etc. In contrast, the domain of an e-commerce system includes concepts like products, categories, discounts, payment methods, etc.
What is Modeling?
Bounded by limited cognitive capabilities, decision-makers resort to using mental models (reduced versions of real world dynamics) for decision-making and interventions in complex tasks. Such mental models are constantly updated with new experience and knowledge acquired, facilitating a learning process. Through this learning process, mental models can be refined to better represent real world dynamics.
Systems theory suggests that updates of mental models happen in continuous cycles involving conceptualization, experimentation, and reflection (C-E-R), which closely resembles a dynamic decision-making process (DDM). source
In the context of software development, modeling is the design and implementation of algorithms as solutions to defined problems. The algorithms are encoded cohesively according to the structure of a mental model.
Cohesion
In computer science, cohesion is "the degree to which the elements of a module belong together." - Yourdon & Constantine 1979
There are a number of measurements that one could use to determine how the elements of a module belong together.
Cohesion by Model
When you implement a domain model, a core decision-making component is how closely related the concepts are in the model.
The primary benefit of cohesion by model is that we can avoid unintended complexity in designing the software. So long as the software model is compatible with the conceptual model, changes in the domain can be incorporated equivalently into the code.
Domain Modeling
In practice, concepts in a model such as 'Members', 'Posts', 'Payments' and 'Invoices' must be persisted. These are elements of the domain model. These concepts are encoded into a programming language in a way that matches their conceptual model as closely and reasonably as possible.
In the object-oriented paradigm, objects directly represent concepts in the model and encapsulate the necessary data and behavior accordingly. The data and behavior related to a member is likely stored in an instance of the Member class, etc.
There are many ways to structure applications. Domain modeling is only one. Domain modeling is about designing software modules around the cohesive structure of the domain.
To contrast with domain modeling, another approach might be to bypass modeling the domain entirely and to instead create procedures that map input directly to output.
When code is not cohesively grouped as a representation of a mental model, changes in the collective mind of the business will not map directly to changes in the code. The link between the structure of the software and the structure of the domain model is lost. Without this link, the size of a change to the model and the size of the resulting change to the software are free to become disproportionate. In this case, a small model change is more likely to result in a large change to the code.
Bypassing domain modeling is particularly effective when a small amount of code is being written that will not need to be quickly, easily, or frequently changed.
Serializing to and Deserializing from Data Storage
These objects are stored in memory using object references. Object references are essentially addresses that allow the object to be found in memory. These references provide our only capability to interact with the objects.
Frequently, these objects are going to need to be persisted so that the same objects can be referenced despite the fact that the object references to these objects will eventually be lost and cleared from memory.
If you want an object to be accessible despite losing a reference to it, it becomes necessary to store it externally and to be able to reproduce it when needed. So, we serialize (format for storage) the necessary information and place it into an external persistence system. Then, we retrieve the stored data and recreate the objects.
Ideally, the persistence system behaves as if we've had a reference to them the whole time.
Is the object that was serialized and persisted the same exact object as the one that was recreated later? For most intents and purposes, yes.
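As a rough illustration of that question, here is a minimal sketch of a round trip through storage. The Member class and the serializeMember() and deserializeMember() functions are hypothetical names made up for this example, JSON stands in for whatever storage format is actually used, and the constructor syntax assumes PHP 8.0 or later.

```php
<?php
// Hypothetical Member class, used only for illustration.
final class Member
{
    public function __construct(
        public int $id,
        public string $name,
        public string $email
    ) {}
}

// Serialize: reduce the object to plain data that can be stored externally.
function serializeMember(Member $member): string
{
    return json_encode([
        'id' => $member->id,
        'name' => $member->name,
        'email' => $member->email,
    ]);
}

// Deserialize: rebuild an object from the stored data.
function deserializeMember(string $stored): Member
{
    $data = json_decode($stored, true);
    return new Member($data['id'], $data['name'], $data['email']);
}

$original = new Member(1, 'shawn', 'myemail@whatever.com');
$stored = serializeMember($original);
$rebuilt = deserializeMember($stored);

var_dump($rebuilt == $original);  // bool(true): same class, same field values
var_dump($rebuilt === $original); // bool(false): a different instance in memory
```

The rebuilt object is equal in value but it is a new instance, which is why "the same object" here means the same as far as the domain is concerned rather than the same object reference.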
Relational Databases and ORMs
The most common type of data-store for this purpose is a relational database.
A relational database is composed of a series of relations (database tables), each composed of tuples (rows) that are in themselves composed of attributes (fields). Each field is of a primitive type such as integers, strings, etc.
It's easy to imagine that a Member object that is composed of an identity, name, and email address of a member might be serialized to a relation composed of attributes for storing each field.
(Int id, String name, String email)
(1, shawn, myemail@whatever.com)
(2, simon, simonsemail@whatever.com)
When we need the object for the Member with an id of 2, we simply ask some subsystem to retrieve the necessary data from the external store and to deserialize it (rebuild the object).
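The retrieval step described above could be sketched as follows. This is only an illustration: the $members array stands in for the external store, and findMember() is a hypothetical name for "some subsystem". The constructor syntax assumes PHP 8.0 or later.

```php
<?php
// Hypothetical stand-in for the external store: the "members" relation
// as a list of tuples in (id, name, email) order.
$members = [
    [1, 'shawn', 'myemail@whatever.com'],
    [2, 'simon', 'simonsemail@whatever.com'],
];

final class Member
{
    public function __construct(
        public int $id,
        public string $name,
        public string $email
    ) {}
}

// Hypothetical subsystem: find the tuple by id and rebuild the object.
function findMember(array $relation, int $id): ?Member
{
    foreach ($relation as [$rowId, $name, $email]) {
        if ($rowId === $id) {
            return new Member($rowId, $name, $email);
        }
    }
    return null; // no tuple with that id
}

$simon = findMember($members, 2);
echo $simon->name, PHP_EOL; // simon
```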
What is a Tuple?
In our context, a tuple is a sequence of individual types of data. You can think of it as a single-dimensional array in which each element has its own type and domain context. But, maybe it's easier to think of a tuple as a single record in a database table.
For example, I can use a tuple as a data structure that describes the information that I have about a person.
(Int id, String name, String email)
(1, shawn, myemail@whatever.com)
Look familiar? In this example, the tuple is (1, shawn, myemail@whatever.com) and the definition for each column is:
Element 0 - an integer that represents the domain concept identity
Element 1 - a string that represents the domain concept of a person's name
Element 2 - a string that represents the domain concept of a person's email address
Each field (or element) in the tuple has its own type and represents its own domain concept.
Relational Means Tabular
When we talk about ORMs (Object-Relational Mappers) we are talking about mapping data in a relational database to objects.
A relational database is one that is built upon the relational model.
When we think about the word 'relational' our first thoughts might be of different types of relationships (for example: one-to-one, one-to-many, many-to-many). Those are something completely different. In this case a "relation" is nothing more than a synonym for a table.
A MySQL database table IS a relation. A relation is a series of tuples which make up a set.
In the example of a set of Person tuples, we may name the relation "people". If we query the set of people, we can pull out individual tuples, each of which represents a person.
ActiveRecord Models the Relation
The ActiveRecord pattern is fundamentally a relational mapping pattern. ActiveRecord is tied to relational database systems. There is some confusion about elements of ActiveRecord that tend to be applied to non-relational systems.
A query tool that allows you to retrieve documents from a document-store may have a similar API to an ActiveRecord. However, it is NOT an ActiveRecord. It is not even an ORM. It'd be an ODM (Object-Document Mapper).
<?php
$person = People::find(1);
When it comes to the question of what an ActiveRecord object models (what it represents), part of the answer is that it models a database row (a tuple that represents a single instance from a set). In the example above, the $person object represents a single instance of "person" from the set "people".
Each of the database fields is made available within the record without translation. The database data is mapped one-to-one to fields on the object.
But, ActiveRecord models more than just a database record.
<?php
// create a record
Member::create([
'email' => 'cindy@email.com',
'name' => 'Cindy',
]);
// query a record
$cindy = Member::where('email', '=', 'cindy@email.com')->find();
echo $cindy->email;
// update a record
$cindy->email = $newEmail;
$cindy->save();
// delete a record
$cindy->delete();
In this case, you can see that the object also models database interactions.
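To make that dual role concrete, here is a minimal sketch of how such an object might be put together. It is not the implementation of any particular library: the Record base class is hypothetical, a static array stands in for the database, and the `?static` return type assumes PHP 8.0 or later.

```php
<?php
// Hypothetical in-memory stand-in for a database; a real implementation
// would issue SQL instead of touching this array.
abstract class Record
{
    /** Rows per model class, keyed by id. */
    private static array $tables = [];

    /** The row's fields, exposed without translation. */
    protected array $attributes = [];

    public function __construct(array $attributes = [])
    {
        $this->attributes = $attributes;
    }

    public function __get(string $field)
    {
        return $this->attributes[$field] ?? null;
    }

    public function __set(string $field, $value): void
    {
        $this->attributes[$field] = $value;
    }

    // Insert or update the row that this object represents.
    public function save(): void
    {
        $rows = self::$tables[static::class] ?? [];
        if (!isset($this->attributes['id'])) {
            $this->attributes['id'] = $rows === [] ? 1 : max(array_keys($rows)) + 1;
        }
        self::$tables[static::class][$this->attributes['id']] = $this->attributes;
    }

    // Remove the row that this object represents, if it was ever saved.
    public function delete(): void
    {
        if (isset($this->attributes['id'])) {
            unset(self::$tables[static::class][$this->attributes['id']]);
        }
    }

    // Rebuild an object from the stored row, or return null if it is absent.
    public static function find(int $id): ?static
    {
        $row = self::$tables[static::class][$id] ?? null;
        return $row === null ? null : new static($row);
    }
}

final class Member extends Record {}

$cindy = new Member(['name' => 'Cindy', 'email' => 'cindy@email.com']);
$cindy->save();                                // the object writes its own row

$cindy->email = 'cindy@example.com';
$cindy->save();                                // the same object updates that row

echo Member::find($cindy->id)->email, PHP_EOL; // cindy@example.com

$cindy->delete();
var_dump(Member::find(1));                     // NULL
```

The row data ($attributes) and the persistence operations (save(), delete(), find()) live on the same object, which is the point the examples above are making.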
The O in ORM Stands for Object
In the object-oriented paradigm, data and behavior are combined cohesively into a unit called an object.
Our ActiveRecord object DOES contain a representation of a person's data from the people set. But, it also contains behavior in the form of methods.
The behavior can be of a technical implementation nature or of a domain nature. For example, it's conceivable that we can place a method on the Person object that filters certain characters away. There may be a bit of a backlash against the idea. We may start talking about separation of responsibilities. But, that assumes that we understand the context that the object exists in. What if the ActiveRecord object is used solely as a view model?
It's important to understand that ActiveRecord as a pattern can be used in many contexts to achieve many goals.
ActiveRecord Can Model a Domain
Perhaps we have a model that represents a subscription. A subscription is a domain concept because it's a concept that is directly important to our business, not our tech stack.
Perhaps our Subscription object has domain behavior. For example, a subscription can be canceled by a client. If the domain behavior for a subscription is encapsulated within the Subscription object in a way that reflects the mental model of our business then we're modeling the domain.
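A minimal sketch of such domain behavior might look like the following. Persistence is deliberately left out so that only the behavior is visible. The cancel() and isActive() method names, the canceledAt field, and the rule that a subscription cannot be canceled twice are all assumptions made for the example.

```php
<?php
// Hypothetical Subscription class; persistence is left out on purpose so
// that only the domain behavior is visible.
final class Subscription
{
    private bool $isActive = true;
    private ?DateTimeImmutable $canceledAt = null;

    // Domain behavior: a client cancels the subscription.
    public function cancel(DateTimeImmutable $now): void
    {
        if (!$this->isActive) {
            // Assumed business rule, for illustration only.
            throw new DomainException('This subscription is already canceled.');
        }
        $this->isActive = false;
        $this->canceledAt = $now;
    }

    public function isActive(): bool
    {
        return $this->isActive;
    }
}

$subscription = new Subscription();
$subscription->cancel(new DateTimeImmutable('2024-01-01'));
var_dump($subscription->isActive()); // bool(false)
```

Callers say what the business says ("cancel the subscription") and never touch the flags that implement it.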
Potential for Leaking Behavior
Since an ActiveRecord model represents a database row, it's completely possible to perform the following operation:
<?php
class ActivateTrialSubscription {
    public function activateTrial($member, $plan) {
        $subscription = new Subscription([
            'planId' => $plan->id,
            'memberId' => $member->id,
            'isTrial' => 1,
            'isActive' => 1,
        ]);
        $subscription->save();
        return $subscription;
    }
}
The preceding code shows how the implementation details of configuring a new subscription have leaked out of the Subscription object into the surrounding scope. The surrounding scope takes the form of a service object that exists to represent the use case of activating a trial.
In this example, the Subscription object is nothing more than a data structure. It has no behavior. The flags for trial and activation status are defined in the database. Whenever we need to check if a subscription is active, we'll test the database field directly through the Subscription object with direct field access: if ($subscription->isActive) {}
In this example, our technical implementation and our domain concepts are both intertwined and exposed. There's no encapsulation. This is not an object.
The following code changes the data structure to an object and removes the service.
<?php
class Subscription extends Model {
    public static function activateTrial(Plan $plan, Member $member) {
        $subscription = new static([
            'planId' => $plan->id,
            'memberId' => $member->id,
            'isTrial' => 1,
            'isActive' => 1,
        ]);
        $subscription->save();
        return $subscription;
    }
}

$subscription = Subscription::activateTrial($plan, $member);
Now, the structure of the database and the nature of database access (the Model superclass) are encapsulated within the Subscription object.
Immediate Side-Effects and Testing
When we call Subscription::activateTrial(...) we are instantly persisting state to the database.
Sure, we can remove the call to save() from the Subscription object. However, in doing so we relegate it to being done in a different scope.
If we test the 'ActivateTrialSubscription' class, we're essentially verifying that both the ActivateTrialSubscription class and the Subscription class can be instantiated and that there is a database table that has the correct fields. Perhaps, we're testing if the database table has the correct foreign key constraints. This test is going to be extremely slow. It's probably better to use an end-to-end test to find issues because almost all of the application stack is used anyway.
When testing the ActivateTrialSubscription service, the model (data structure) itself doesn't need to be tested because it has no behavior.
The service itself, on the other hand, cannot be effectively tested without building a clean database. In this case, you'll probably have a members table, a plans table, and a subscriptions table. A member and plan will have to be created, then we can verify that the subscription information is correct.
In the version of the code that has the named constructor public static function activateTrial(...) we have much the same problem. We require a database because the object is saved just as soon as it's created.
If we remove the call to save() from the activateTrial() method, then the test will be very fast, we'll know that the fields are set correctly, and we'll know that we can successfully instantiate the class (no syntax errors).
It's a trade-off. Either the nature of the pattern needs to be exposed (calling save() outside of the Subscription model), or the object must have immediate side-effects.
Removing save() from the named constructor allows us to unit test the activateTrial(...) functionality. However, it's just pushing the problem out a layer. SOMETHING won't be testable.
This can be resolved if you inject an object responsible for persistence. For example:
<?php
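class UndeterminedEncapsulatingScope {
    private $subscriptions;

    public function __construct(SubscriptionRepository $subscriptions) {
        $this->subscriptions = $subscriptions;
    }

    public function activateTrial(Member $member, Plan $plan) {
        // Subscription::activateTrial() takes the plan first, then the member.
        // This assumes the named constructor no longer calls save() itself.
        $subscription = Subscription::activateTrial($plan, $member);
        $this->subscriptions->add($subscription);
    }
}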
In this example, a test-double can be injected for the repository. You can test the behavior of the activateTrial(...) method and avoid having to configure a purely clean environment for each and every test.
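Here is a minimal sketch of what such a test could look like, assuming PHP 8.0 or later. Everything in it is a stand-in: SubscriptionRepository is assumed to be an interface with only an add() method, InMemorySubscriptionRepository and ActivateTrial are hypothetical names, and Subscription is written as a plain class without a Model superclass so that the example runs without a database.

```php
<?php
final class Member { public function __construct(public int $id) {} }
final class Plan   { public function __construct(public int $id) {} }

// Plain stand-in for the Subscription model, without a Model superclass,
// so that the example runs without a database.
final class Subscription
{
    private function __construct(
        public int $planId,
        public int $memberId,
        public bool $isTrial,
        public bool $isActive
    ) {}

    public static function activateTrial(Plan $plan, Member $member): self
    {
        return new self($plan->id, $member->id, true, true); // no save() here
    }
}

// Assumed shape of the repository: the text above only shows add().
interface SubscriptionRepository
{
    public function add(Subscription $subscription): void;
}

// Test double: records what was added instead of writing to a database.
final class InMemorySubscriptionRepository implements SubscriptionRepository
{
    /** @var Subscription[] */
    public array $added = [];

    public function add(Subscription $subscription): void
    {
        $this->added[] = $subscription;
    }
}

final class ActivateTrial
{
    public function __construct(private SubscriptionRepository $subscriptions) {}

    public function activateTrial(Member $member, Plan $plan): void
    {
        $this->subscriptions->add(Subscription::activateTrial($plan, $member));
    }
}

// The "test": no database and no clean environment to prepare.
$repository = new InMemorySubscriptionRepository();
(new ActivateTrial($repository))->activateTrial(new Member(7), new Plan(3));

assert(count($repository->added) === 1);
assert($repository->added[0]->memberId === 7);
assert($repository->added[0]->isTrial === true);
echo "ok", PHP_EOL;
```

The real repository, the one that talks to the database, still needs its own slower test. The difference is that the activation behavior no longer depends on it.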
Maintaining Consistency
A consistent model is one that is in a valid state. Our skill in maintaining consistency has a direct effect on the amount of cognitive overhead that is required to make changes to existing code without introducing bugs.
Let's look at a code sample:
<?php
$invoice = new Invoice;
$invoice->company = '...';
$invoice->amount = 123.12;
Yes, this is a somewhat silly example in which we're using primitives for money. But, imagine that an invoice needs a company and an amount in order to be valid to our business.
Assuming that we REQUIRE a company and an amount for the invoice to be valid, then we can say that in the code above, immediately after the company is set, the object is invalid (inconsistent) because it doesn't yet have an amount.
If we forget to set the amount then the invoice is simply broken.
Alternatively, we can create an invoice as such:
<?php
$invoice = new Invoice($company, $amount);
In this case, the object cannot be constructed without a company or amount. Any validations are handled by the Money object, the Company object, or inside the constructor of the invoice.
The Invoice has the opportunity to throw an exception to prevent an inconsistent (invalid) Invoice from being instantiated. Consequently, it's impossible to instantiate an invalid Invoice.
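A minimal sketch of that idea follows, assuming PHP 8.0 or later. The Company and Money classes are hypothetical, and the specific rules (a non-empty name, a non-negative amount held in cents) are assumptions chosen only to give each object something to validate.

```php
<?php
// Hypothetical value objects; each one guards its own validity.
final class Company
{
    public function __construct(private string $name)
    {
        if (trim($name) === '') {
            throw new InvalidArgumentException('A company needs a name.');
        }
    }

    public function name(): string { return $this->name; }
}

final class Money
{
    // The amount is held in cents to avoid floating point rounding.
    public function __construct(private int $cents)
    {
        if ($cents < 0) {
            throw new InvalidArgumentException('An amount cannot be negative.');
        }
    }

    public function cents(): int { return $this->cents; }
}

final class Invoice
{
    // Both arguments are required and typed, so a partially built
    // Invoice cannot exist.
    public function __construct(
        private Company $company,
        private Money $amount
    ) {}

    public function company(): Company { return $this->company; }
    public function amount(): Money { return $this->amount; }
}

$invoice = new Invoice(new Company('Acme'), new Money(12312));
echo $invoice->amount()->cents(), PHP_EOL; // 12312

try {
    new Invoice(new Company(''), new Money(12312));
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), PHP_EOL; // A company needs a name.
}
```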
Direct Field Access
Every ActiveRecord implementation that I've ever seen makes the direct database data available publicly. It also uses associative arrays for assigning values.
Observe the following:
<?php
$invoice = new Invoice;
$invoice->company = '...';
$invoice->amount = 123.12;
In this example, you're able to directly access fields from outside of the object.
<?php
$invoice = new Invoice([
'company' => $company,
'amount' => $amount
]);
And this is how you'll use a constructor for most (if not all) ActiveRecord models.
There are a few consequences to this approach to a constructor:
- You may be able to set unintended fields.
- It prevents type validation.
- It prevents language-idiomatic constructor usage.
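The first two consequences can be seen in a small sketch. ArrayInvoice is a hypothetical class that imitates a freeform-array constructor. It is not taken from any specific ActiveRecord implementation, and real implementations may offer guards that this sketch leaves out.

```php
<?php
// Hypothetical model with the freeform-array constructor described above.
class ArrayInvoice
{
    private array $attributes;

    public function __construct(array $attributes = [])
    {
        $this->attributes = $attributes; // every key is accepted as-is
    }

    public function __get(string $field)
    {
        return $this->attributes[$field] ?? null;
    }
}

// A misspelled key and a wrongly typed value are both accepted silently.
$invoice = new ArrayInvoice([
    'compnay' => 'Acme',  // typo: sets an unintended field
    'amount'  => 'a lot', // no type validation
]);

var_dump($invoice->company); // NULL
var_dump($invoice->amount);  // string(5) "a lot"
```

Neither mistake is reported at construction time. With a typed constructor signature, PHP itself would reject the second one.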
Language Idiom
Idiom is communication in a native capacity. Native language speakers are able to use the language in a more expressive way than non-native fluent speakers.
In the context of programming languages, idiom is the expression of a conceptual idea through a feature of the language that naturally fits it. In game design, this is equivalent to ludonarrative harmony.
For example, if we want to express the concept that certain dependencies are required for an object to be consistent, then we can use PHP's constructor.
Because ActiveRecord implementations have their own constructors that take a freeform array, we're unable to use the natural idiom of the language.
Instead, each AR implementation brings its own set of conventions. They're not tied to the language. So, knowing PHP isn't enough. You actually have to get in and learn how to write objects according to the ActiveRecord's design.
This makes ActiveRecord more than just a mapper. It doesn't only map a database record to an object. Instead, it determines the actual structure of your object. Consequently, it determines the way that you write your software.
---
This article is unfinished. I'm publishing it because I'm tired of writing it without getting feedback. I'll keep working on it, so please post feedback.