Database source control
When source-controlling databases, there are two main tasks that you might want to do: creating a database, and upgrading an existing database. There are (roughly) two schools of thoughts on how to handle these tasks, but we'll first introduce a couple of common notions:
- Upgrade scripts, also known by other names such as migrations. Each upgrade script describes how to transform a database from one state to another. Examples include creating tables on a blank database, adding tables to an existing database, or modifying existing tables.
- Completed upgrades table: a table in the database that contains the names of upgrade scripts that have already been run.
There are other ways of accomplishing the same job without some or all of these notions, but they're sufficient to illustrate the two main schools of how to handle database source control, specifically:
- A database is described by all upgrade scripts. The way you create a database is by running all upgrades, starting from a blank database. In this scenario, you always have to keep all upgrade scripts, since they're also your creation scripts. When you want to bring a database up-to-date, you run all upgrade scripts that haven't been run in the past, which is any script whose name isn't in the completed upgrades table.
- A database is described by a set of creation scripts, with separate upgrade scripts. Rather than running upgrade scripts to create a database, you run the creation scripts. Upgrading a database is the same as before: you run all upgrade scripts that haven't been run in the past. You can delete upgrade scripts once you're sure that there are no existing databases that are missing that upgrade.
The advantage of #1 is consistency: freshly created databases and databases that have been upgraded many times should have had the same set of operations applied to them. The biggest downside is that there's no easy way to inspect the schema of the database: you have to apply all the upgrades to work out what the current schema is. Additionally, if your schema has changed a lot, you're doing a lot of unnecessary work to create a database,
The advantage of #2 is that the creation scripts provide an easy way to tell what the current schema of the database is, as well as avoiding unnecessary work when creating a database. If you're releasing regularly, then upgrade scripts can be deleted fairly rapidly, helping to keep your source code repository tidy and understandable. If you're worried about separate creation and upgrade scripts causing inconsistency, then you can use some form of SQL diff to check.
There is something of a middle ground between these two approaches. We start with approach #1: a database is described by all upgrade scripts. Periodically, we combine all upgrade scripts into a single upgrade script that avoids any unnecessary work, such as creating tables that later get deleted.
Although it's possible to make this approach work, I'd suggest that you may as well as use the second approach. By rolling the upgrade scripts into one, you're essentially creating a creation script, but you're potentially doing so long after the original upgrade script was written. Over time, you might forget some of the details or nuance that you had in mind when the upgrade script was originally written.
Many ORMs require you to describe the database schema using the ORM, such as the names of tables and the details of each column (name, type, etc.). This information is often sufficient to allow the ORM to automatically generate a creation script. In other words, if you're already using such an ORM, approach #2 becomes more attractive since you get the creation scripts for little extra effort.