Bitmasking in Laravel and MySQL
November 21, 2021
Bitmasks are a great way to compress multiple boolean flags into a single variable. They can reduce memory consumption and storage and provide a less cumbersome interface to interact with. Instead of passing arrays of options around, you can store all the information you need in a single integer.
Any time you're storing or passing multiple flags, options, states, or boolean values in your application, a bitmask might be a good tool to reach for.
Before diving in, let's quickly do a refresher on the binary number system.
Binary Numbers
Binary numbers are made up of solely 1
s and 0
s. For example, the decimal number 9
is represented as 1001
in
binary. The first bit is turned on for a value of 1, and the fourth bit is turned on for a value of 8. When combined,
the value comes to 9.
1001│ └─ 1└──── 8 ─── 9
Taking a look at 8 bits, you can see what each bit represents more clearly: increasing powers of 2.
11111111│││││││└─ 1 = 2^0││││││└── 2 = 2^1│││││└─── 4 = 2^2││││└──── 8 = 2^3│││└───── 16 = 2^4││└────── 32 = 2^5│└─────── 64 = 2^6└──────── 128 = 2^7 ───── 255
Add them up and you'll see that the binary number 11111111
is equal to the decimal number 255
.
With that refresher out of the way, let's see how we can use this knowledge to store and retrieve information.
Assigning Values to Bits
In one of the applications I built, we have a huge table of property data that we get from a third party provider. That provider does not give us the ID that the county uses to uniquely identify a piece of property. Because of that, we have to figure out those IDs on our own.
It's not an exact science, so we have about 15 different methods we use to decipher the proper ID. We try each method, one at a time, stopping when we figure out the unique ID.
Now we could keep track of those attempts in individual columns, like this:
$property = Property::find(1); // Store each individual method as its own column.$property->attempted_address = true;$property->attempted_parcel = true;$property->attempted_geocode = true;$property->attempted_description = true;$property->attempted_alternate = false;$property->attempted_street = false;$property->attempted_single_scraper = false;$property->attempted_search_scraper = false;
The downside to this is it takes up 8 columns to store the 8 pieces of boolean info. Storage is cheap so that's not the end of the world, but you always want your data to be stored in the most compact, reasonable form that it can be stored in.
In our case we have 15 methods and adding 15 columns felt like a bit much.
Another downside to the multiple column strategy is that when you want to add or remove a method you'll have to add a new column, which means writing and running a new migration.
Enter: the bitmask. We can assign each one of these attempts its own bit, and store them all together as a single integer, in a single column.
Taking our methods from earlier, let's assign each one to a specific bit.
00000000│││││││└─ Bit 1: ADDRESS││││││└── Bit 2: PARCEL│││││└─── Bit 3: GEOCODE││││└──── Bit 4: DESCRIPTION│││└───── Bit 5: ALTERNATE││└────── Bit 6: STREET│└─────── Bit 7: SINGLE_SCRAPER└──────── Bit 8: SEARCH_SCRAPER
ADDRESS
occupies the first bit, PARCEL
occupies the second, and so on.
Now if we can toggle certain bits on and off, we can represent the individual boolean values:
┌───── Bit 5: ALTERNATE ┐ │ ├─ These are set to 1, therefore true │ ┌─ Bit 1: ADDRESS ┘00010001│││ ││└── Bit 2: PARCEL ┐│││ │└─── Bit 3: GEOCODE ││││ └──── Bit 4: DESCRIPTION ├─ These are set to 0, therefore false││└────── Bit 6: STREET ││└─────── Bit 7: SINGLE_SCRAPER │└──────── Bit 8: SEARCH_SCRAPER ┘
We have now compressed 8 individual bits of information into the single value 17
.
00010001 │ └─ ADDRESS = 1 └───── ALTERNATE = 16 ──── 17
Let's move over to PHP and see how this would play out.
In our class, we'll set up some constants for each bit like we illustrated above.
class FindIds extends Command{ const MASK_ADDRESS = 1; const MASK_PARCEL = 2; const MASK_GEOCODE = 4; const MASK_DESCRIPTION = 8; const MASK_ALTERNATE = 16; const MASK_STREET = 32; const MASK_SINGLE_SCRAPER = 64; const MASK_SEARCH_SCRAPER = 128;}
If you don't want to remember the specific bit values, you can use the bit shift operator, which will shift bits a certain number of steps to the left.
class FindIds extends Command{ const MASK_ADDRESS = 1 << 0; // 1 const MASK_PARCEL = 1 << 1; // 2 const MASK_GEOCODE = 1 << 2; // 4 const MASK_DESCRIPTION = 1 << 3; // 8 const MASK_ALTERNATE = 1 << 4; // 16 const MASK_STREET = 1 << 5; // 32 const MASK_SINGLE_SCRAPER = 1 << 6; // 64 const MASK_SEARCH_SCRAPER = 1 << 7; // 128}
As pointed out by /u/__radmen, you can also use the binary notation. This makes it even easier to see the different bits.
class FindIds extends Command{ const MASK_ADDRESS = 0b00000001; const MASK_PARCEL = 0b00000010; const MASK_GEOCODE = 0b00000100; const MASK_DESCRIPTION = 0b00001000; const MASK_ALTERNATE = 0b00010000; const MASK_STREET = 0b00100000; const MASK_SINGLE_SCRAPER = 0b01000000; const MASK_SEARCH_SCRAPER = 0b10000000;}
With our masks set up, let's start using them in our application.
Using a Bitmask
To use a bitmask, you'll need to familiarize yourself with a few bitwise operators.
The first is the bitwise and
operator. The bitwise and
operator will return bits that are set in both values.
Let's take a look at the expression 24 & 16
and see how that would evaluate:
00011000 -- 24& 00010000 -- AND 16 ────────── 00010000 -- 16 └─────── This is the only bit that is set in both
How about 24 & 2
?
00011000 -- 24& 00000010 -- AND 2 ────────── 00000000 -- 0 There aren't any matching bits!
The inclusive or
operator will give you bits that are set in either number.
Let's try 24 | 2
now:
00011000 -- 24| 00000010 -- OR 2 ────────── 00011010 -- 26 ││ └──── Bit from 2 └┴────── Bits from 24
A nice benefit of the or
operator is that it will only ever set the value once, so you can use it as many times as you
want without fear of corrupting your value. If you were to evaluate 24 | 2 | 2 | 2 | 2
, you'd still get 26
.
00011000 -- 24| 00000010 -- OR 2| 00000010 -- OR 2| 00000010 -- OR 2| 00000010 -- OR 2| 00000010 -- OR 2 ────────── 00011010 -- 26 ││ └──── Bit from all the 2s └┴────── Bits from the one 24
Let's hop back to PHP and see how we might implement this in Laravel.
The first thing we're going to do is add a single, unsigned integer column to our table.
Schema::table('properties', function (Blueprint $table) { // You could use a different size integer, depending // on how much data you need to store. // BIGINT : 64 bits // INT : 32 bits // SMALLINT : 16 bits $table->integer('attempts')->unsigned();});
In this example we're going to use an integer
, which is 32 bits. That will give us room for 32 unique pieces of
information. This may be too big for your app. As always, you should use the smallest data type that you can get away
with!
Also make sure to make it unsigned, as we won't be using negative values. This will preserve all the room on the positive side.
Looking at the model now, let's set up a couple of helper functions on our model:
<?php class Property extends Model{ protected $casts = [ // Cast our mask to an integer 'attempts' => 'integer', ]; public function hasAttempted($bit) { // Bitwise `and` comparison. return ($this->attempts & $bit) === $bit; } public function hasNotAttempted($bit) { return !$this->hasAttempted($bit); }}
Back in our command, we can now use our bitmask to see if we've already attempted a certain method.
class FindIds extends Command{ const MASK_ADDRESS = 1 << 0; // 1 const MASK_PARCEL = 1 << 1; // 2 const MASK_GEOCODE = 1 << 2; // 4 const MASK_DESCRIPTION = 1 << 3; // 8 const MASK_ALTERNATE = 1 << 4; // 16 const MASK_STREET = 1 << 5; // 32 const MASK_SINGLE_SCRAPER = 1 << 6; // 64 const MASK_SEARCH_SCRAPER = 1 << 7; // 128 public function findId(Property $property) { // Compare our MASK_ADDRESS bit to the `attempts` column. if ($property->hasNotAttempted(static::MASK_ADDRESS)) { // Attempt to find via address since we // have not tried that method yet. } }}
We can add another helper to easily record when a new method has been attempted, using the bitwise or
operator:
<?php class Property extends Model{ public function hasAttempted($bit) { // Bitwise `and` comparison. return ($this->attempts & $bit) === $bit; } public function hasNotAttempted($bit) { return !$this->hasAttempted($bit); } public function recordAttempt($bit) { $this->attempts |= $bit; }}
Using the or
operator, it will combine the bits from our value (attempts
) with the new bit that we pass in:
00011000 -- 24 current value of `attempts`| 00000010 -- OR 2 new bit that we're recording ────────── 00011010 -- 26 new value of `attempts`
Now we're all set up to use the bitmasks and modify our value:
<?php class FindIds extends Command{ public function findId(Property $property) { if ($property->hasNotAttempted(static::MASK_ADDRESS)) { // Attempt to find via address since we // have not tried that method yet. // Record the attempt. $property->recordAttempt(static::MASK_ADDRESS); } }}
Resetting Bits
For completeness, let's look at how we would reset a single bit. Provided we have 24
again and we want to turn off
the 5th bit (16), we'll need a new bitwise operator: the not (~)
operator.
The not
operator flips all the bits, so if we were to evaluate ~16
, we'd have the following:
00010000 -- 16 11101111 -- ~16 (Completely opposite)
That gets us halfway there, the next step is to and
the two values together to complete the reset.
If we're trying to reset the 5th bit (16), we can use and
and not
to get us all the way there: 24 & ~16
00011000 -- 24& 11101111 -- AND ~16 ────────── 00001000 -- 8 └─────────── This is the only bit that is set in both
We have reset the 5th bit by inverting it and using the bitwise and
operator.
We can wrap this up in a tidy helper method for convenience:
<?php class Property extends Model{ public function hasAttempted($bit) { // Bitwise `and` comparison. return ($this->attempts & $bit) === $bit; } public function hasNotAttempted($bit) { return !$this->hasAttempted($bit); } public function recordAttempt($bit) { $this->attempts |= $bit; } public function resetAttempt($bit) { // Invert the bit we're resetting and then // `and` it with our bitmask. $this->attempts &= ~$bit; }}
Usage in Our Application
Here's a look at a portion of the real function we're using in our application.
In it, we assign a method to each bit and loop through the bits, attempting the methods that have not yet been tried.
class FindIds extends Command{ const MASK_ADDRESS = 1 << 0; const MASK_PARCEL = 1 << 1; const MASK_GEOCODE = 1 << 2; const MASK_DESCRIPTION = 1 << 3; const MASK_ALTERNATE = 1 << 4; const MASK_STREET = 1 << 5; const MASK_SINGLE_SCRAPER = 1 << 6; const MASK_SEARCH_SCRAPER = 1 << 7; protected $signature = 'properties:pids'; public function handle() { // We are using a bitmask to record all attempts in a single column. // Here we match a bit with a method. This is also the order they // will be attempted, so put the least expensive ones first. $methods = [ static::MASK_PARCEL => 'attemptParcel', static::MASK_ADDRESS => 'attemptAddress', static::MASK_DESCRIPTION => 'attemptDescription', static::MASK_ALTERNATE => 'attemptAlternate', static::MASK_STREET => 'attemptStreet', static::MASK_SINGLE_SCRAPER => 'attemptSingleScraper', static::MASK_GEOCODE => 'attemptGeocode', static::MASK_SEARCH_SCRAPER => 'attemptSearchScraper', ]; Property::whereNull('pid')->each(function ($property) use ($methods) { // Loop through the methods. foreach ($methods as $bit => $name) { // Skip ones we've already tried. if ($property->hasAttempted($bit)) { continue; } // Delegate to the appropriate method. $result = $this->{$name}($property); // Methods can return `false` on failure, or if they // are currently disabled for any reason. We don't // record an attempt, as we'll try those again. if ($result !== false) { $property->recordAttempt($bit); } // Stop processing once we find the PID. if (!is_null($property->pid)) { break; } } $property->save(); }); }}
In the following random sample of records from our database, you can see the various values in the attempts
column.
Some properties take many different methods to find the unique ID, and therefore have many bits set. Some just have
one or two bits set!
mysql> select id, attempts from properties order by rand() limit 35; +---------+----------+| id | attempts | -- binary+---------+----------+| 206283 | 131 | -- 00000000 00000000 10000011| 1252953 | 2050 | -- 00000000 00001000 00000010| 1754896 | 6386 | -- 00000000 00011000 11110010| 1212965 | 80379 | -- 01010101 00111001 11111011| 1588159 | 2050 | -- 00000000 00001000 00000010| 145920 | 2 | -- 00000000 00000000 00000010| 244418 | 11 | -- 00000000 00000000 00001011| 664929 | 14834 | -- 00000000 00111001 11110010| 645256 | 2290 | -- 00000000 00001000 11110010| 592645 | 2 | -- 00000000 00000000 00000010| 435104 | 4096 | -- 00000000 00010000 00000000| 738488 | 14834 | -- 00000000 00111001 11110010| 577804 | 2 | -- 00000000 00000000 00000010| 547907 | 114 | -- 00000000 00000000 01110010| 1914334 | 2290 | -- 00000000 00001000 11110010| 1036378 | 2162 | -- 00000000 00001000 01110010| 1272753 | 14834 | -- 00000000 00111001 11110010| 1236993 | 2162 | -- 00000000 00001000 01110010| 756991 | 6386 | -- 00000000 00011000 11110010| 370417 | 3 | -- 00000000 00000000 00000011| 1559556 | 2 | -- 00000000 00000000 00000010| 1764737 | 2050 | -- 00000000 00001000 00000010| 656436 | 6386 | -- 00000000 00011000 11110010| 238380 | 11 | -- 00000000 00000000 00001011| 452781 | 81915 | -- 01010101 00111111 11111011| 1517090 | 2290 | -- 00000000 00001000 11110010| 1237790 | 14834 | -- 00000000 00111001 11110010| 679580 | 14834 | -- 00000000 00111001 11110010| 477233 | 4338 | -- 00000000 00010000 11110010| 1501125 | 2290 | -- 00000000 00001000 11110010| 1526051 | 2 | -- 00000000 00000000 00000010| 475546 | 4096 | -- 00000000 00010000 00000000| 1356272 | 6386 | -- 00000000 00011000 11110010| 341 | 9 | -- 00000000 00000000 00001001| 598380 | 6386 | -- 00000000 00011000 11110010| 177387 | 2 | -- 00000000 00000000 00000010+---------+----------+
Bitmasks in MySQL
Looking at these records in the database leads us to the next question: how do we query against bitmask columns? Fortunately MySQL has bitwise operators that make this extremely easy. (Postgres has them too.)
To look for bits that are on, you can use the bitwise and
operator.
SELECT id, attempts,FROM propertiesWHERE -- Using the bitwise `and` operator to look for -- records where the third bit (4) is ON. attempts & 4 = 4
To look for bits that are off you can use the bitwise and
operator again, but check the result using the !=
inequality operator instead.
SELECT id, attempts,FROM propertiesWHERE -- Using the bitwise `and` operator to look for -- records where the third bit (4) is OFF. attempts & 4 != 4
If you want to check multiple bits at once, you can add them all together. To look for records where bits 1, 3, and 5
are all on, we could use the and
operator and compare against the sum of our bits.
00010101 │ │ └─ Bit 1 = 1 │ └─── Bit 3 = 4 └───── Bit 5 = 16 ──── 21
The sum of bits 1, 3, and 5 is 21
.
SELECT id, attempts,FROM propertiesWHERE -- Using the bitwise `and` operator to look for -- records where bits 1, 3, and 5 are all ON. attempts & 21 = 21
We can run a few simple SQL statements against constant numbers to check our logic:
mysql> select 21 & 21;+---------+| 21 & 21 |+---------+| 21 | <-- Bits 1, 3, and 5 are on in both.+---------+ mysql> select 85 & 21;+---------+| 85 & 21 |+---------+| 21 | <-- Bits 1, 3, and 5 are on in both.+---------+ mysql> select 1 & 21;+--------+| 1 & 21 |+--------+| 1 | <-- Only the first bit is on in both.+--------+
This logic is a great candidate to wrap up into some Laravel query scopes.
<?php class Property extends Model{ public function scopeAttempted($query, $bits) { $this->queryAttemptsBitmask($query, $bits, $attempted = true); } public function scopeNotAttempted($query, $bits) { $this->queryAttemptsBitmask($query, $bits, $attempted = false); } protected function queryAttemptsBitmask($query, $bits, $attempted) { // Sum up all the bits that were given to us. $bits = array_sum(Arr::wrap($bits)); // Set our operator based on the whether we're looking // for the presence or absence of bits. $operator = $attempted ? '=' : '!='; // Add the raw SQL. $query->whereRaw("attempts & $bits $operator $bits"); }}
We can now use our scopes to query for records that meet certain criteria:
$query = Property::query() // These two methods have been attempted. ->attempted([ static::MASK_ADDRESS, static::MASK_PARCEL, ]) // This one has not. ->notAttempted([ static::MASK_GEOCODE, ]); dump($query->toSql());// select * from `properties` where attempted & 3 = 3 and attempted & 4 != 4
Drawbacks to Bitmasks
Bitmasks are extremely helpful, but they aren't without their tradeoffs.
On the database side, it's important to know that MySQL (or any other database) can't apply any helpful indexes to bitmasked columns, because you're querying against a part of the column, not the value in the column itself.
You could set up indexes on individual, "hot" bits as a part of a functional key part, but that won't get you terribly far. Because bitmasks can have a huge number of combinations, it makes it extremely difficult to have generic indexes that cover them.
In our case the index is on pid
and that is selective enough for us.
If you're querying solely against a bitmask column, make sure you check the performance on a dataset that is comparable to your production dataset.
The other drawback is that they are completely inscrutable to humans. When looking through your records, would you
have any idea what the value 81915
is supposed to represent? Is it good, is it bad? Who knows.
It takes a lot of mental effort to turn 81915
into 01010101 00111111 11111011
, and then when you finally do get it converted to binary, you have to then go figure out
what each individual bit is! It's a painstaking process that's a huge waste of mental energy.
If you find that you need to be able to "read through" the mask, but still don't want to have 15 boolean columns, you could opt for a JSON column and just shove all your info in there. It will be quite a bit larger, but that may not matter for your application.
It's important to fully understand the drawbacks before you go all in on bitmasks. If you're ok with them (we certainly are), then bitmasks may work for you!
Bitmasks in the Wild
Probably the most common use of bitmasks is to represent file permissions. There is a great article on bitmasks + unix file system permissions. From the article:
For example, on unix-like operating systems, file system permissions are managed in three distinct scopes: the user owning the file, the group owning the file, and others.
For each of these scopes, there are three different permissions: read, write, and execute.
These flags are stored internally using the individual bits of an integer according to the following scheme:
rwx r-x r-x111 101 101
111101101
binary is equal to 493 decimal or 755 octal.
I highly recommend reading through that article (it's a short one.)
In PHP, we use bitmasks all the time, though you might not realize it. There are several builtin functions that accept
flags as a parameter; json_encode
is one of them.
// Force this array to be an object.// JSON_FORCE_OBJECT === 16json_encode([1, 2, 3], JSON_FORCE_OBJECT); // {"0":1,"1":2,"2":3} // Force it to be an object and pretty print it, using the// bitwise `or` operator to turn on the bits for both.// JSON_PRETTY_PRINT === 128json_encode([1, 2, 3], JSON_FORCE_OBJECT | JSON_PRETTY_PRINT); // {// "0": 1,// "1": 2,// "2": 3// }
You can see that PHP has given nice constant names for ease of use, but you could use the raw integers if you wanted to! (I don't recommend this.)
// Forget the constants, we'll do it ourselves!// JSON_FORCE_OBJECT | JSON_PRETTY_PRINT === 144json_encode([1, 2, 3], 144); // {// "0": 1,// "1": 2,// "2": 3// }
That's probably more than you ever wanted to know about bitmasks, but I enjoy bitmasks to an unreasonable degree.
Whenever you find yourself in the situation where a bitmask is the perfect solution, it sure is fun to be able to proficiently use them!
11110111││││ ││└─ thanks││││ │└── for││││ └─── reading│││││││└───── follow││└────── me│└─────── at└──────── https://twitter.com/aarondfrancis