<a class="breadcrumbLink" href="/2001-1-0.html">Home : Architect : Database : The great primary-key debate |
Correcting errors Let’s further complicate the example by supposing that you enter the name incorrectly. Initially, this doesn’t seem like such a big problem; you simply correct the value. But remember, you’re not supposed to change the primary-key value. Doing so often violates referential integrity if there are related records. With referential integrity features enabled, you can usually update a primary-key value, and the data engine will update related values automatically. But just because you can update a primary-key value doesn’t mean you should, and you definitely shouldn’t allow the uneducated user to do so. A primary-key value shouldn’t be subject to data entry errors, because changing the value violates a rule. In our example, correcting a misspelled name probably won’t have too many repercussions. But suppose your primary key is based on a purchase-order number. When you try to correct an incorrectly entered value, you might learn that the correct purchase-order number already exists. Now you’ve got a problem that your application’s referential integrity feature isn’t equipped to resolve. Any time a primary key depends on external data, you risk typographical errors. Other problems can crop up with a key based on a purchase order or similar information. A change in business could change the value’s format by adding or deleting characters or even completely changing the nature of the value. This type of change is dictated by business and is beyond your control. Imagine your boss’ surprise when he or she learns your order and inventory application can’t integrate a new company policy on order numbers without a complete overhaul. For this reason, we recommend you select a value that you can control and maintain. You will never be able to control natural data. Surrogate keys comply with the rules Clearly, using a natural key as your primary key poses more than a few problems. Now, consider an incrementing value field as the employee table’s primary key and see how it measures up. (You can use an expression to create the incrementing value or depend on the database program if it has an auto-incrementing data type.) Most importantly, the value will always be unique. Since the system generates the value, you’ll avoid data entry (and other human) errors. In addition, the value will always exist at the time the record is entered, so the primary-key value will never be null. A surrogate key is immune to changes in business. In addition, the key depends on only one field, so it’s compact. The auto-incrementing field provides a unique, stable, and compact primary key. Other arguments debunked Many database development systems apply a unique index to a primary key, which eliminates duplicate records—the system simply won’t accept a duplicate. However, you can apply a unique index manually; you don’t need a primary key for indexing. A manually applied index will consume a little more overhead than the primary key’s index, but it is well worth the small drain on your resources. Some developers think a primary key should identify the record by association. In other words, the user should readily recognize that the primary-key value “Jane Smith” relates to the record for the employee Jane Smith. If the primary-key value for Jane Smith is a meaningless value, such as an auto-incremented value, there’s no way to associate that value to Jane Smith’s information. The truth is, no rule requires any association between the primary-key value and the record. In a well-designed database, users never need to see a primary-key value. In fact, a user need never know the primary key even exists. Used correctly (to establish relationships), primary-key values are useless to the user, since your application maintains the relationships behind the scenes. In fact, surrogate keys work well precisely because there isno association between the value and the record. No matter what happens to the business or the entity, the surrogate key remains neutral. Keeping score When comparing the two types of keys side by side, natural keys lose, as Table A demonstrates. Data is just that—data. Data shouldn’t be used as a system pointer, because these items are subject to input error and are beyond the control of the developer. Programmatically or system-generated keys are stable, they’re not subject to input errors, and they’re never null. They provide the perfect pointer to related data. Table A
The single-most important issue facing the database developer is good design. If the foundation is weak, so is the building. To avoid future problems and subsequent (and perhaps convoluted) repairs, we recommend that you use surrogate keys. This simple design choice is one of the easiest ways to provide your application with a strong, stable, yet flexible foundation. |