These days I follow a hard heuristic: Always use synthetic keys for database tables.
And the way to follow this rule is fairly simple, but it has a few twists.
For internal use, the best and most common key (in a relational database) is an auto-generated incremental sequence. But it it ok to use it externally? – across databases, across types of data storage, across APIs / services etc.
It’s tempting to refer to the sequence number in API calls, after all they are going to that particular database and are only going to be used with it, right? Well not necessarily; the database and the code powering the API are different systems, who says there won’t be other apps accessing the database for example.
The current OpSec school of thought is that sequence keys are an internal database mechanism and sequence numbers should only be used for internal consistency, never used as external references (even for the “local” API).
Sequence keys also don’t offer any way to deal with creating duplicate data entries. If you’ve been around for a while you’ve seen this, the client sends the same “create” request twice for whatever reason (UI lets user multiple-click a button, client assumes timeout when in fact it had gone through etc.) Some programmers attempt to run heuristics on the data and ignore successive create attempts that look “too similar” but it can backfire in many ways.
An UUID
is pretty much universally supported nowadays, its designed to be unique across a vast amount of systems, doesn’t give anything away about your internal mechanisms, and if you ask the client to generate the UUID for create requests you can neatly solve the duplicate issue.
Do keep in mind that this doesn’t solve the problem of bijection across many years and many systems and many databases. An entity may still acquire multiple UUID’s, even if they’re each individually perfectly fine.
There can also be circumstances where you have to offer people a natural-looking key for general consumption. You can’t put UUID’s on car plates for example.
I love UUID keys. Only drawback is they can be a bit difficult to share, but they solve so many other problems. Being able to generate them client-side makes batch inserting with dependencies easier, being globally unique means you can copy data between environments, never needing to “fix” a sequence again… So many up-sides.
Two of the three big booking systems - Amadeus and Travelport - assign booking codes sequentially, making brute-force computer guesswork easier. Of the three, Amadeus, through its web portal CheckMyTrip, is especially vulnerable, Nohl said.
The PNRs (flight booking code) have many more security issues, but at least nowadays, their sequential aspect should no longer be exposed.
So that’s one more reason to be careful when exposing DB id in APIs, even if converted to a natural looking key or at least something easier to remember.
There can also be circumstances where you have to offer people a natural-looking key for general consumption. You can’t put UUID’s on car plates for example.
Often times, the first section of the UUID is unique enough. With certain UI design choices, one can encourage users to normally work with that, while having the full UUID available in a detail view or from a copy-button.
Another strategy I quite like, is to have the UUID as the definitely-always-unique identifier, and then have a separate name, which either the users can enter or we generate something like random adjective+animal.
But yes, neither of those strategies would work for car plates.
Speaking of car plates, the Wikipedia pages for “Vehicle license plates of [insert country here]” are a rabbit hole.
I was just reading the page for Romania the other day, speaking of uniqueness, and they had this issue apparently where the combinations overall were enough for the whole country but not enough for their capital city, so they had to hack an extra digit into the plates for the capital.
60M total but divided among 40 counties makes 1.5M variations per county and the capital city (which is its own county, like Berlin) went over that.
I looked it up and Bucharest actually has only a 1.7M population so… I think it’s understandable that nobody expected an almost 1:1 person-to-car ratio. Exactly why and how they reached that crazy ratio I have no idea. 😆
And the way to follow this rule is fairly simple, but it has a few twists.
For internal use, the best and most common key (in a relational database) is an auto-generated incremental sequence. But it it ok to use it externally? – across databases, across types of data storage, across APIs / services etc.
It’s tempting to refer to the sequence number in API calls, after all they are going to that particular database and are only going to be used with it, right? Well not necessarily; the database and the code powering the API are different systems, who says there won’t be other apps accessing the database for example.
The current OpSec school of thought is that sequence keys are an internal database mechanism and sequence numbers should only be used for internal consistency, never used as external references (even for the “local” API).
Sequence keys also don’t offer any way to deal with creating duplicate data entries. If you’ve been around for a while you’ve seen this, the client sends the same “create” request twice for whatever reason (UI lets user multiple-click a button, client assumes timeout when in fact it had gone through etc.) Some programmers attempt to run heuristics on the data and ignore successive create attempts that look “too similar” but it can backfire in many ways.
An UUID is pretty much universally supported nowadays, its designed to be unique across a vast amount of systems, doesn’t give anything away about your internal mechanisms, and if you ask the client to generate the UUID for create requests you can neatly solve the duplicate issue.
Do keep in mind that this doesn’t solve the problem of bijection across many years and many systems and many databases. An entity may still acquire multiple UUID’s, even if they’re each individually perfectly fine.
There can also be circumstances where you have to offer people a natural-looking key for general consumption. You can’t put UUID’s on car plates for example.
I love UUID keys. Only drawback is they can be a bit difficult to share, but they solve so many other problems. Being able to generate them client-side makes batch inserting with dependencies easier, being globally unique means you can copy data between environments, never needing to “fix” a sequence again… So many up-sides.
On the topic of exposing sequence number in APIs, this has been a security issue in the past. Here is one I remember: https://www.reuters.com/article/us-cyber-travel-idUSKBN14G1I6/
From the article:
The PNRs (flight booking code) have many more security issues, but at least nowadays, their sequential aspect should no longer be exposed.
So that’s one more reason to be careful when exposing DB id in APIs, even if converted to a natural looking key or at least something easier to remember.
Often times, the first section of the UUID is unique enough. With certain UI design choices, one can encourage users to normally work with that, while having the full UUID available in a detail view or from a copy-button.
Another strategy I quite like, is to have the UUID as the definitely-always-unique identifier, and then have a separate name, which either the users can enter or we generate something like random adjective+animal.
But yes, neither of those strategies would work for car plates.
Speaking of car plates, the Wikipedia pages for “Vehicle license plates of [insert country here]” are a rabbit hole.
I was just reading the page for Romania the other day, speaking of uniqueness, and they had this issue apparently where the combinations overall were enough for the whole country but not enough for their capital city, so they had to hack an extra digit into the plates for the capital.
Seeing as just five alphanumeric symbols gets you over 60 million variations, exactly how the hell did they fuck up that hard
60M total but divided among 40 counties makes 1.5M variations per county and the capital city (which is its own county, like Berlin) went over that.
I looked it up and Bucharest actually has only a 1.7M population so… I think it’s understandable that nobody expected an almost 1:1 person-to-car ratio. Exactly why and how they reached that crazy ratio I have no idea. 😆
Told you it’s a crazy rabbit hole.
Ah Ok, so the mistake was allotting per county.