Sane Database Schema Conventions
2017-03-23 12:11
Rationale
These are sane conventions for constructing and naming a database schema. They aren't new, and there's sure to be something someone doesn't like. They are biased toward the .Net EntityFramework, which itself was influenced by the Ruby on Rails ActiveRecord conventions by David Heinemeier Hansson.
Using these conventions makes it easier to translate the tables into classes. While this isn't always desirable (or correct), it often is.
Sample
This sample schema exemplifies the conventions, and shows most relationships you'll encounter. It's semi-realistic. PropertyRecords is contrived to show a pseudo one-to-one relationship. (Note: MS SQL doesn't allow a true one-to-one structure, as different tables' rows can't be simultaneously created.)
- A Customer has an Initial contact Employee, Support Employee and a Salesperson. An Employee can service multiple Customers.
- An Employee can be a Salesperson.
- A Customer has one or more Addresses, an Address belongs to one Customer.
- An Address has one Property Record, a Property Record is for one Address.
- A Customer has zero or more Orders, an Order has one Customer.
- An Order can have many Vendors, a Vendor can fulfill many Orders.
- An Order can have many Promotions, a Promotion is for zero or more Orders.
- An Order Promotion has zero or more customer notifications.
Table Column Layout
I like my table columns ordered this way.
- Primary key
- Foreign keys
- Regular columns
- Audit columns
General Names
Pluralize Most Table Names
Pluralizing table names reduces the chances of keyword conflicts, and matches how the table will (typically) be treated in an ORM tool.
Yes |No
---------------|-----------------
Customers |Customer
Salespeople |Salesperson
Don't pluralize many-to-many join tables. By convention, keep the table name parts alphabetical.
Note
Using Code First, Entity Framework might pluralize this to OrderVendors. Personally, I'd use the FluentAPI to force the table name to the OrderVendor.
Yes |No
---------------|-----------------
OrderVendor |OrdersVendors, OrderVendors, VendorOrder
Use PascalCase
Tables and columns should be in PascalCase.
Yes |No
---------------|-----------------
Customers |customer
OrderNbr |orderNbr
No Dashes or Underscores
It's tempting to separate words in either table or column names, but don't. Keeping them PascalCased makes the transition to classes easier and clearer. Also, some databases don't play nicely with underscores, or dashes, depending on how they're used.
Yes |No
---------------|-----------------
OrderVendor |Order_Vendor
OrderNbr |order-number
CustomerId |Customer_ID
Keys
My personal preference is to end key names with "Id", rather than "ID". It reads just as well, and is consistent with PascalCasing and .Net naming conventions.
Primary Key
Some people prefer a primary key of just "Id", but if you need to run SQL queries (and you will), it's easier to have the table name in the primary key for creating joins and reading the results.
SELECT c.CustomerId, c.Name, a.AddressId, a.Address1
FROM Customers c
JOIN Addresses a on c.CustomerId = a.CustomerId
--Result:
CustomerId Name AddressId Address1
---------- ---- --------- ----------
23 Ron 402 12 Main St
47 Eve 11 3 Polo Ave
Simple Tables
TableName + "Id"
Yes |No
---------------|-----------------
CustomerId |CustomerID, Customer_id, Customer_ID
Many-to-Many Tables
A regular join table doesn't need its own primary key. Just use the other tables' primary keys to form a composite key.
OrderVendor
===========
OrderId (PK, FK)
VendorId (PK, FK)
The resulting classes should have these properties
class Order
{
IEnumerable<Vendor> Vendors
}
class Vendor
{
IEnumerable<Order> Orders
}
A join table with payload--one that has its own columns and/or will be joined to another table, should have its own primary key of Table1+Table2+Id
OrderPromotion
==============
OrderPromotionId (PK)
OrderId (FK)
PromotionId (FK)
The resulting classes
class Order
{
IEnumerable<OrderPromotion> OrderPromotions
//You can manually add this method to get Promotions
IEnumerable<Promotion> Promotions
{get {return OrderPromotions.Select(op => op.Promotion);}}
}
class Promotion
{
IEnumerable<OrderPromotion> Order Promotions
}
Foreign Key
When possible, use the referenced primary key name. If there are multiple foreign keys to the same table, end with the foreign primary key name.
Yes No
------------- -----------------
Employees
=========
EmployeeId (PK)
Customers Customers
========= =========
CustomerId (PK) Customer_Id (PK)
SupportEmployeeId (FK) SupportPerson (FK)
InitialEmployeeId (FK) InitialEmp_ID (FK)
Addresses Addresses
========= =========
AddressId (PK) AddressId (PK)
CustomerId (FK) CustID (FK)
Date Columns
Use a verb, and end the date or datetime column names with "On".
Following this convention often leads to clearer column meanings, and consistency. For example, DateToPlace or PlaceDateTime becomes PlaceOrderOn.
Yes |No
---------------|-----------------
OrderedOn |OrderDateTime
ShouldShipOn |DateShipExpect, ExpectedShip_DT, AnticipatedDate
Don't Abbreviate
Abbreviations are, by their nature, ambiguous and culture-centric. Avoid them unless they are very common, consistent, and/or well-known in the organization or industry.
A good example of ambiguity is how to abbreviate "number". Even my schema example, "InvoiceNbr", is potentially ambiguous. But InvNo is worse. Is there another column "InvYes"?
Yes |No
---------------|-----------------
Customers |Custs
CustomerId |CustId
FirstName |Fname
InvoiceNbr |InvNo
Use Consistent Names (and Abbreviations if You Must)
- If you must abbreviate, be consistent.
- If it's spelled "InvoiceNbr" in one table, it's that way in all tables.
- If everyone knows what DestinationBOL means, that might be OK. But maybe it's better to expand it to DestinationBillOfLading.
When in doubt, refer to Flatt's Law #6: Clarity is more important than brevity.
Yes |No
---------------|-----------------
InvoiceNbr |InvoiceNum, InvNumber, Inv_Nmbr
Auditing
To be honest, I've often found audit columns to be more trouble than they're worth. I think if auditing is needed, it's better to have a separate audit history table where you can record many kinds of changes, including deletions.
But, if I am using them, and am tracking who took an action, I don't link to another table (such as Users or Employees), but instead record the physical name. This significantly reduces linking, and makes it easy to indicate that a process (rather than a person) performed an action. In other words, UpdatedBy is a string column and contains a value like "cflatt" or "Nightly Batch Process".
- CreatedOn
- CreatedBy
- UpdatedOn
- UpdatedBy
Inheritance
Inheritance can be modeled in the database a couple of different ways. I prefer Table Per Type, as shown by the Employee and Salesperson tables. Note that the Salesperson table has an EmployeeId primary key. This is what implies the inheritance. In the application's class model, these would become:
public class Employee
{
int EmployeeId
string Name
}
public class Salesperson: Employee
{
double CommissionPercent
}
public class Customer
{
int CustomerId
string Name
Employee SupportEmployee
Employee InitialEmployee
Salesperson Salesperson
}