I'm building out a Shedquarters in my backyard! Check out the ever-evolving post + pictures here →

Bitmasking in Laravel and MySQL

November 21, 2021

Bitmasks are a great way to compress multiple boolean flags into a single variable. They can reduce memory consumption and storage and provide a less cumbersome interface to interact with. Instead of passing arrays of options around, you can store all the information you need in a single integer.

Any time you're storing or passing multiple flags, options, states, or boolean values in your application, a bitmask might be a good tool to reach for.

Before diving in, let's quickly do a refresher on the binary number system.

Binary Numbers

Binary numbers are made up of solely 1s and 0s. For example, the decimal number 9 is represented as 1001 in binary. The first bit is turned on for a value of 1, and the fourth bit is turned on for a value of 8. When combined, the value comes to 9.

1001
│ └─ 1
└──── 8
───
9
Code highlighting powered by torchlight.dev (A service I created!)

Taking a look at 8 bits, you can see what each bit represents more clearly: increasing powers of 2.

11111111
│││││││└─ 1 = 2^0
││││││└── 2 = 2^1
│││││└─── 4 = 2^2
││││└──── 8 = 2^3
│││└───── 16 = 2^4
││└────── 32 = 2^5
│└─────── 64 = 2^6
└──────── 128 = 2^7
─────
255

Add them up and you'll see that the binary number 11111111 is equal to the decimal number 255.

With that refresher out of the way, let's see how we can use this knowledge to store and retrieve information.

Assigning Values to Bits

In one of the applications I built, we have a huge table of property data that we get from a third party provider. That provider does not give us the ID that the county uses to uniquely identify a piece of property. Because of that, we have to figure out those IDs on our own.

It's not an exact science, so we have about 15 different methods we use to decipher the proper ID. We try each method, one at a time, stopping when we figure out the unique ID.

Now we could keep track of those attempts in individual columns, like this:

$property = Property::find(1);
 
// Store each individual method as its own column.
$property->attempted_address = true;
$property->attempted_parcel = true;
$property->attempted_geocode = true;
$property->attempted_description = true;
$property->attempted_alternate = false;
$property->attempted_street = false;
$property->attempted_single_scraper = false;
$property->attempted_search_scraper = false;

The downside to this is it takes up 8 columns to store the 8 pieces of boolean info. Storage is cheap so that's not the end of the world, but you always want your data to be stored in the most compact, reasonable form that it can be stored in.

In our case we have 15 methods and adding 15 columns felt like a bit much.

Another downside to the multiple column strategy is that when you want to add or remove a method you'll have to add a new column, which means writing and running a new migration.

Enter: the bitmask. We can assign each one of these attempts its own bit, and store them all together as a single integer, in a single column.

Taking our methods from earlier, let's assign each one to a specific bit.

00000000
│││││││└─ Bit 1: ADDRESS
││││││└── Bit 2: PARCEL
│││││└─── Bit 3: GEOCODE
││││└──── Bit 4: DESCRIPTION
│││└───── Bit 5: ALTERNATE
││└────── Bit 6: STREET
│└─────── Bit 7: SINGLE_SCRAPER
└──────── Bit 8: SEARCH_SCRAPER

ADDRESS occupies the first bit, PARCEL occupies the second, and so on.

Now if we can toggle certain bits on and off, we can represent the individual boolean values:

┌───── Bit 5: ALTERNATE ┐
│ ├─ These are set to 1, therefore true
│ ┌─ Bit 1: ADDRESS ┘
00010001
│││ ││└── Bit 2: PARCEL ┐
│││ │└─── Bit 3: GEOCODE │
│││ └──── Bit 4: DESCRIPTION ├─ These are set to 0, therefore false
││└────── Bit 6: STREET │
│└─────── Bit 7: SINGLE_SCRAPER │
└──────── Bit 8: SEARCH_SCRAPER ┘

We have now compressed 8 individual bits of information into the single value 17.

00010001
│ └─ ADDRESS = 1
└───── ALTERNATE = 16
────
17

Let's move over to PHP and see how this would play out.

In our class, we'll set up some constants for each bit like we illustrated above.

class FindIds extends Command
{
const MASK_ADDRESS = 1;
const MASK_PARCEL = 2;
const MASK_GEOCODE = 4;
const MASK_DESCRIPTION = 8;
const MASK_ALTERNATE = 16;
const MASK_STREET = 32;
const MASK_SINGLE_SCRAPER = 64;
const MASK_SEARCH_SCRAPER = 128;
}

If you don't want to remember the specific bit values, you can use the bit shift operator, which will shift bits a certain number of steps to the left.

class FindIds extends Command
{
const MASK_ADDRESS = 1 << 0; // 1
const MASK_PARCEL = 1 << 1; // 2
const MASK_GEOCODE = 1 << 2; // 4
const MASK_DESCRIPTION = 1 << 3; // 8
const MASK_ALTERNATE = 1 << 4; // 16
const MASK_STREET = 1 << 5; // 32
const MASK_SINGLE_SCRAPER = 1 << 6; // 64
const MASK_SEARCH_SCRAPER = 1 << 7; // 128
}
Code highlighting powered by torchlight.dev (A service I created!)

As pointed out by /u/__radmen, you can also use the binary notation. This makes it even easier to see the different bits.

class FindIds extends Command
{
const MASK_ADDRESS = 0b00000001;
const MASK_PARCEL = 0b00000010;
const MASK_GEOCODE = 0b00000100;
const MASK_DESCRIPTION = 0b00001000;
const MASK_ALTERNATE = 0b00010000;
const MASK_STREET = 0b00100000;
const MASK_SINGLE_SCRAPER = 0b01000000;
const MASK_SEARCH_SCRAPER = 0b10000000;
}

With our masks set up, let's start using them in our application.

Using a Bitmask

To use a bitmask, you'll need to familiarize yourself with a few bitwise operators.

The first is the bitwise and operator. The bitwise and operator will return bits that are set in both values.

Let's take a look at the expression 24 & 16 and see how that would evaluate:

00011000 -- 24
& 00010000 -- AND 16
──────────
00010000 -- 16
└─────── This is the only bit that is set in both

How about 24 & 2?

00011000 -- 24
& 00000010 -- AND 2
──────────
00000000 -- 0
There aren't any matching bits!

The inclusive or operator will give you bits that are set in either number.

Let's try 24 | 2 now:

00011000 -- 24
| 00000010 -- OR 2
──────────
00011010 -- 26
││ └──── Bit from 2
└┴────── Bits from 24

A nice benefit of the or operator is that it will only ever set the value once, so you can use it as many times as you want without fear of corrupting your value. If you were to evaluate 24 | 2 | 2 | 2 | 2, you'd still get 26.

00011000 -- 24
| 00000010 -- OR 2
| 00000010 -- OR 2
| 00000010 -- OR 2
| 00000010 -- OR 2
| 00000010 -- OR 2
──────────
00011010 -- 26
││ └──── Bit from all the 2s
└┴────── Bits from the one 24

Let's hop back to PHP and see how we might implement this in Laravel.

The first thing we're going to do is add a single, unsigned integer column to our table.

Schema::table('properties', function (Blueprint $table) {
// You could use a different size integer, depending
// on how much data you need to store.
// BIGINT : 64 bits
// INT : 32 bits
// SMALLINT : 16 bits
$table->integer('attempts')->unsigned();
});

In this example we're going to use an integer, which is 32 bits. That will give us room for 32 unique pieces of information. This may be too big for your app. As always, you should use the smallest data type that you can get away with!

Also make sure to make it unsigned, as we won't be using negative values. This will preserve all the room on the positive side.

Looking at the model now, let's set up a couple of helper functions on our model:

<?php
 
class Property extends Model
{
protected $casts = [
// Cast our mask to an integer
'attempts' => 'integer',
];
 
public function hasAttempted($bit)
{
// Bitwise `and` comparison.
return $this->attempts & $bit === $bit;
}
 
public function hasNotAttempted($bit)
{
return !$this->hasAttempted($bit);
}
}

Back in our command, we can now use our bitmask to see if we've already attempted a certain method.

class FindIds extends Command
{
const MASK_ADDRESS = 1 << 0; // 1
const MASK_PARCEL = 1 << 1; // 2
const MASK_GEOCODE = 1 << 2; // 4
const MASK_DESCRIPTION = 1 << 3; // 8
const MASK_ALTERNATE = 1 << 4; // 16
const MASK_STREET = 1 << 5; // 32
const MASK_SINGLE_SCRAPER = 1 << 6; // 64
const MASK_SEARCH_SCRAPER = 1 << 7; // 128
 
public function findId(Property $property)
{
// Compare our MASK_ADDRESS bit to the `attempts` column.
if ($property->hasNotAttempted(static::MASK_ADDRESS)) {
// Attempt to find via address since we
// have not tried that method yet.
}
}
}

We can another helper to easily record when a new method has been attempted, using the bitwise or operator:

<?php
 
class Property extends Model
{
public function hasAttempted($bit)
{
// Bitwise `and` comparison.
return $this->attempts & $bit === $bit;
}
 
public function hasNotAttempted($bit)
{
return !$this->hasAttempted($bit);
}
 
public function recordAttempt($bit)
{
$this->attempts |= $bit;
}
}

Using the or operator, it will combine the bits from our value (attempts) with the new bit that we pass in:

00011000 -- 24 current value of `attempts`
| 00000010 -- OR 2 new bit that we're recording
──────────
00011010 -- 26 new value of `attempts`

Now we're all set up to use the bitmasks and modify our value:

<?php
 
class FindIds extends Command
{
public function findId(Property $property)
{
if ($property->hasNotAttempted(static::MASK_ADDRESS)) {
// Attempt to find via address since we
// have not tried that method yet.
 
// Record the attempt.
$property->recordAttempt(static::MASK_ADDRESS);
}
}
}

Resetting Bits

For completeness, let's look at how we would reset a single bit. Provided we have 24 again and we want to turn off the 5th bit (16), we'll need a new bitwise operator: the not (~) operator.

The not operator flips all the bits, so if we were to evaluate ~16, we'd have the following:

00010000 -- 16
 
11101111 -- ~16 (Completely opposite)

That gets us halfway there, the next step is to and the two values together to complete the reset.

If we're trying to reset the 5th bit (16), we can use and and not to get us all the way there: 24 & ~16

00011000 -- 24
& 11101111 -- AND ~16
──────────
00001000 -- 8
└─────────── This is the only bit that is set in both

We have reset the 5th bit by inverting it and using the bitwise and operator.

We can wrap this up in a tidy helper method for convenience:

<?php
 
class Property extends Model
{
public function hasAttempted($bit)
{
// Bitwise `and` comparison.
return $this->attempts & $bit === $bit;
}
 
public function hasNotAttempted($bit)
{
return !$this->hasAttempted($bit);
}
 
public function recordAttempt($bit)
{
$this->attempts |= $bit;
}
 
public function resetAttempt($bit)
{
// Invert the bit we're resetting and then
// `and` it with our bitmask.
$this->attempts &= ~$bit;
}
}

Usage in Our Application

Here's a look at a portion of the real function we're using in our application.

In it, we assign a method to each bit and loop through the bits, attempting the methods that have not yet been tried.

class FindIds extends Command
{
const MASK_ADDRESS = 1 << 0;
const MASK_PARCEL = 1 << 1;
const MASK_GEOCODE = 1 << 2;
const MASK_DESCRIPTION = 1 << 3;
const MASK_ALTERNATE = 1 << 4;
const MASK_STREET = 1 << 5;
const MASK_SINGLE_SCRAPER = 1 << 6;
const MASK_SEARCH_SCRAPER = 1 << 7;
 
protected $signature = 'properties:pids';
 
public function handle()
{
// We are using a bitmask to record all attempts in a single column.
// Here we match a bit with a method. This is also the order they
// will be attempted, so put the least expensive ones first.
$methods = [
static::MASK_PARCEL => 'attemptParcel',
static::MASK_ADDRESS => 'attemptAddress',
static::MASK_DESCRIPTION => 'attemptDescription',
static::MASK_ALTERNATE => 'attemptAlternate',
static::MASK_STREET => 'attemptStreet',
static::MASK_SINGLE_SCRAPER => 'attemptSingleScraper',
static::MASK_GEOCODE => 'attemptGeocode',
static::MASK_SEARCH_SCRAPER => 'attemptSearchScraper',
];
 
Property::whereNull('pid')->each(function ($property) use ($methods) {
// Loop through the methods.
foreach ($methods as $bit => $name) {
// Skip ones we've already tried.
if ($property->hasAttempted($bit)) {
continue;
}
 
// Delegate to the appropriate method.
$result = $this->{$name}($property);
 
// Methods can return `false` on failure, or if they
// are currently disabled for any reason. We don't
// record an attempt, as we'll try those again.
if ($result !== false) {
$property->recordAttempt($bit);
}
 
// Stop processing once we find the PID.
if (!is_null($property->pid)) {
break;
}
}
 
$property->save();
});
 
}
}

In the following random sample of records from our database, you can see the various values in the attempts column. Some properties take many different methods to find the unique ID, and therefore have many bits set. Some just have one or two bits set!

mysql> select id, attempts from properties order by rand() limit 35;
 
+---------+----------+
| id | attempts | -- binary
+---------+----------+
| 206283 | 131 | -- 00000000 00000000 10000011
| 1252953 | 2050 | -- 00000000 00001000 00000010
| 1754896 | 6386 | -- 00000000 00011000 11110010
| 1212965 | 80379 | -- 01010101 00111001 11111011
| 1588159 | 2050 | -- 00000000 00001000 00000010
| 145920 | 2 | -- 00000000 00000000 00000010
| 244418 | 11 | -- 00000000 00000000 00001011
| 664929 | 14834 | -- 00000000 00111001 11110010
| 645256 | 2290 | -- 00000000 00001000 11110010
| 592645 | 2 | -- 00000000 00000000 00000010
| 435104 | 4096 | -- 00000000 00010000 00000000
| 738488 | 14834 | -- 00000000 00111001 11110010
| 577804 | 2 | -- 00000000 00000000 00000010
| 547907 | 114 | -- 00000000 00000000 01110010
| 1914334 | 2290 | -- 00000000 00001000 11110010
| 1036378 | 2162 | -- 00000000 00001000 01110010
| 1272753 | 14834 | -- 00000000 00111001 11110010
| 1236993 | 2162 | -- 00000000 00001000 01110010
| 756991 | 6386 | -- 00000000 00011000 11110010
| 370417 | 3 | -- 00000000 00000000 00000011
| 1559556 | 2 | -- 00000000 00000000 00000010
| 1764737 | 2050 | -- 00000000 00001000 00000010
| 656436 | 6386 | -- 00000000 00011000 11110010
| 238380 | 11 | -- 00000000 00000000 00001011
| 452781 | 81915 | -- 01010101 00111111 11111011
| 1517090 | 2290 | -- 00000000 00001000 11110010
| 1237790 | 14834 | -- 00000000 00111001 11110010
| 679580 | 14834 | -- 00000000 00111001 11110010
| 477233 | 4338 | -- 00000000 00010000 11110010
| 1501125 | 2290 | -- 00000000 00001000 11110010
| 1526051 | 2 | -- 00000000 00000000 00000010
| 475546 | 4096 | -- 00000000 00010000 00000000
| 1356272 | 6386 | -- 00000000 00011000 11110010
| 341 | 9 | -- 00000000 00000000 00001001
| 598380 | 6386 | -- 00000000 00011000 11110010
| 177387 | 2 | -- 00000000 00000000 00000010
+---------+----------+

Bitmasks in MySQL

Looking at these records in the database leads us to the next question: how do we query against bitmask columns? Fortunately MySQL has bitwise operators that make this extremely easy. (Postgres has them too.)

To look for bits that are on, you can use the bitwise and operator.

SELECT
id,
attempts,
FROM
properties
WHERE
-- Using the bitwise `and` operator to look for
-- records where the third bit (4) is ON.
attempts & 4 = 4

To look for bits that are off you can use the bitwise and operator again, but check the result using the != inequality operator instead.

SELECT
id,
attempts,
FROM
properties
WHERE
-- Using the bitwise `and` operator to look for
-- records where the third bit (4) is OFF.
attempts & 4 != 4

If you want to check multiple bits at once, you can add them all together. To look for records where bits 1, 3, and 5 are all on, we could use the and operator and compare against the sum of our bits.

00010101
│ │ └─ Bit 1 = 1
│ └─── Bit 3 = 4
└───── Bit 5 = 16
────
21

The sum of bits 1, 3, and 5 is 21.

SELECT
id,
attempts,
FROM
properties
WHERE
-- Using the bitwise `and` operator to look for
-- records where bits 1, 3, and 5 are all ON.
attempts & 21 = 21

We can run a few simple SQL statements against constant numbers to check our logic:

mysql> select 21 & 21;
+---------+
| 21 & 21 |
+---------+
| 21 | <-- Bits 1, 3, and 5 are on in both.
+---------+
 
mysql> select 85 & 21;
+---------+
| 85 & 21 |
+---------+
| 21 | <-- Bits 1, 3, and 5 are on in both.
+---------+
 
mysql> select 1 & 21;
+--------+
| 1 & 21 |
+--------+
| 1 | <-- Only the first bit is on in both.
+--------+

This logic is a great candidate to wrap up into some Laravel query scopes.

<?php
 
class Property extends Model
{
public function scopeAttempted($query, $bits)
{
$this->queryAttemptsBitmask($query, $bits, $attempted = true);
}
 
public function scopeNotAttempted($query, $bits)
{
$this->queryAttemptsBitmask($query, $bits, $attempted = false);
}
 
protected function queryAttemptsBitmask($query, $bits, $attempted)
{
// Sum up all the bits that were given to us.
$bits = array_sum(Arr::wrap($bits));
 
// Set our operator based on the whether we're looking
// for the presence or absence of bits.
$operator = $attempted ? '=' : '!=';
 
// Add the raw SQL.
$query->whereRaw("attempts & $bits $operator $bits");
}
}

We can now use our scopes to query for records that meet certain criteria:

$query = Property::query()
// These two methods have been attempted.
->attempted([
static::MASK_ADDRESS,
static::MASK_PARCEL,
])
 
// This one has not.
->notAttempted([
static::MASK_GEOCODE,
]);
 
dump($query->toSql());
// select * from `properties` where attempted & 3 = 3 and attempted & 4 != 4

Drawbacks to Bitmasks

Bitmasks are extremely helpful, but they aren't without their tradeoffs.

On the database side, it's important to know that MySQL (or any other database) can't apply any helpful indexes to bitmasked columns, because you're querying against a part of the column, not the value in the column itself.

You could set up indexes on individual, "hot" bits as a part of a functional key part, but that won't get you terribly far. Because bitmasks can have a huge number of combinations, it makes it extremely difficult to have generic indexes that cover them.

In our case the index is on pid and that is selective enough for us.

If you're querying solely against a bitmask column, make sure you check the performance on a dataset that is comparable to your production dataset.

The other drawback is that they are completely inscrutable to humans. When looking through your records, would you have any idea what the value 81915 is supposed to represent? Is it good, is it bad? Who knows.

It takes a lot of mental effort to turn 81915 into 01010101 00111111 11111011, and then when you finally do get it converted to binary, you have to then go figure out what each individual bit is! It's a painstaking process that's a huge waste of mental energy.

If you find that you need to be able to "read through" the mask, but still don't want to have 15 boolean columns, you could opt for a JSON column and just shove all your info in there. It will be quite a bit larger, but that may not matter for your application.

It's important to fully understand the drawbacks before you go all in on bitmasks. If you're ok with them (we certainly are), then bitmasks may work for you!

Bitmasks in the Wild

Probably the most common use of bitmasks is to represent file permissions. There is a great article on bitmasks + unix file system permissions. From the article:

For example, on unix-like operating systems, file system permissions are managed in three distinct scopes: the user owning the file, the group owning the file, and others.

For each of these scopes, there are three different permissions: read, write, and execute.

These flags are stored internally using the individual bits of an integer according to the following scheme:

rwx r-x r-x
111 101 101

111101101 binary is equal to 493 decimal or 755 octal.

I highly recommend reading through that article (it's a short one.)

In PHP, we use bitmasks all the time, though you might not realize it. There are several builtin functions that accept flags as a parameter; json_encode is one of them.

// Force this array to be an object.
// JSON_FORCE_OBJECT === 16
json_encode([1, 2, 3], JSON_FORCE_OBJECT);
 
// {"0":1,"1":2,"2":3}
 
// Force it to be an object and pretty print it, using the
// bitwise `or` operator to turn on the bits for both.
// JSON_PRETTY_PRINT === 128
json_encode([1, 2, 3], JSON_FORCE_OBJECT | JSON_PRETTY_PRINT);
 
// {
// "0": 1,
// "1": 2,
// "2": 3
// }

You can see that PHP has given nice constant names for ease of use, but you could use the raw integers if you wanted to! (I don't recommend this.)

// Forget the constants, we'll do it ourselves!
// JSON_FORCE_OBJECT | JSON_PRETTY_PRINT === 144
json_encode([1, 2, 3], 144);
 
// {
// "0": 1,
// "1": 2,
// "2": 3
// }

That's probably more than you ever wanted to know about bitmasks, but I enjoy bitmasks to an unreasonable degree.

Whenever you find yourself in the situation where a bitmask is the perfect solution, it sure is fun to be able to proficiently use them!

11110111
││││ ││└─ thanks
││││ │└── for
││││ └─── reading
││││
│││└───── follow
││└────── me
│└─────── at
└──────── https://twitter.com/aarondfrancis

In the next installment, I'll write about how I use bitmasks to draw ASCII lines. Stay tuned!

Thanks for reading! My name is Aaron and I'm currently working at small property tax firm in Texas, where I serve in dual roles as COO & CTO.

My main project focus is Hammerstone, where we build components for your Laravel and Rails applications. I currently do a podcast.

If you ever have any questions or want to chat, I'm always on Twitter
Copyright 2013 - 2021, Aaron Francis.