Pages

Thursday, March 23, 2017

Sane Database Schema Conventions

Rationale

These are sane conventions for constructing and naming a database schema. They aren't new, and there's sure to be something someone doesn't like. They are biased toward the .Net EntityFramework, which itself was influenced by the Ruby on Rails ActiveRecord conventions by David Heinemeier Hansson.

Using these conventions makes it easier to translate the tables into classes. While this isn't always desirable (or correct), it often is.

Sample

This sample schema exemplifies the conventions, and shows most relationships you'll encounter. It's semi-realistic. PropertyRecords is contrived to show a pseudo one-to-one relationship. (Note: MS SQL doesn't allow a true one-to-one structure, as different tables' rows can't be simultaneously created.)

 

sample-schema

  • A Customer has an Initial contact Employee, Support Employee and a Salesperson. An Employee can service multiple Customers.
  • An Employee can be a Salesperson.
  • A Customer has one or more Addresses, an Address belongs to one Customer.
  • An Address has one Property Record, a Property Record is for one Address.
  • A Customer has zero or more Orders, an Order has one Customer.
  • An Order can have many Vendors, a Vendor can fulfill many Orders.
  • An Order can have many Promotions, a Promotion is for zero or more Orders.
  • An Order Promotion has zero or more customer notifications.

Table Column Layout

I like my table columns ordered this way.

  1. Primary key
  2. Foreign keys
  3. Regular columns
  4. Audit columns

General Names

Pluralize Most Table Names

Pluralizing table names reduces the chances of keyword conflicts, and matches how the table will (typically) be treated in an ORM tool.

Yes            |No               
---------------|-----------------
Customers      |Customer
Salespeople    |Salesperson

Don't pluralize many-to-many join tables. By convention, keep the table name parts alphabetical.

Note
Using Code First, Entity Framework might pluralize this to OrderVendors. Personally, I'd use the FluentAPI to force the table name to the OrderVendor.

Yes            |No               
---------------|-----------------
OrderVendor    |OrdersVendors, OrderVendors, VendorOrder              

Use PascalCase

Tables and columns should be in PascalCase.

Yes            |No               
---------------|-----------------
Customers      |customer
OrderNbr       |orderNbr

No Dashes or Underscores

It's tempting to separate words in either table or column names, but don't. Keeping them PascalCased makes the transition to classes easier and clearer. Also, some databases don't play nicely with underscores, or dashes, depending on how they're used.

Yes            |No               
---------------|-----------------
OrderVendor    |Order_Vendor
OrderNbr       |order-number
CustomerId     |Customer_ID

Keys

My personal preference is to end key names with "Id", rather than "ID". It reads just as well, and is consistent with PascalCasing and .Net naming conventions.

Primary Key

Some people prefer a primary key of just "Id", but if you need to run SQL queries (and you will), it's easier to have the table name in the primary key for creating joins and reading the results.

SELECT  c.CustomerId, c.Name, a.AddressId, a.Address1
FROM    Customers c
        JOIN Addresses a on c.CustomerId = a.CustomerId

--Result:

CustomerId  Name  AddressId Address1
----------  ----  --------- ----------
        23  Ron         402 12 Main St
        47  Eve          11 3 Polo Ave        

Simple Tables
TableName + "Id"

Yes            |No               
---------------|-----------------
CustomerId     |CustomerID, Customer_id, Customer_ID

Many-to-Many Tables
A regular join table doesn't need its own primary key. Just use the other tables' primary keys to form a composite key.

OrderVendor
===========
OrderId  (PK, FK)
VendorId (PK, FK)

The resulting classes should have these properties

class Order 
{
  IEnumerable<Vendor> Vendors
}

class Vendor
{
  IEnumerable<Order> Orders
}

A join table with payload--one that has its own columns and/or will be joined to another table, should have its own primary key of Table1+Table2+Id

OrderPromotion
==============
OrderPromotionId (PK)
OrderId          (FK)
PromotionId      (FK)

The resulting classes

class Order
{
  IEnumerable<OrderPromotion> OrderPromotions
  //You can manually add this method to get Promotions
  IEnumerable<Promotion> Promotions
  {get {return OrderPromotions.Select(op => op.Promotion);}}
}

class Promotion
{
  IEnumerable<OrderPromotion> Order Promotions
}

Foreign Key

When possible, use the referenced primary key name. If there are multiple foreign keys to the same table, end with the foreign primary key name.

Yes                       No               
-------------             -----------------
Employees
=========
EmployeeId (PK)

Customers                 Customers
=========                 =========
CustomerId (PK)           Customer_Id (PK)
SupportEmployeeId (FK)    SupportPerson (FK)
InitialEmployeeId (FK)    InitialEmp_ID (FK)

Addresses                 Addresses
=========                 =========
AddressId  (PK)           AddressId (PK)
CustomerId (FK)           CustID    (FK)

Date Columns

Use a verb, and end the date or datetime column names with "On".

Following this convention often leads to clearer column meanings, and consistency. For example, DateToPlace or PlaceDateTime becomes PlaceOrderOn.

Yes            |No               
---------------|-----------------
OrderedOn      |OrderDateTime
ShouldShipOn   |DateShipExpect, ExpectedShip_DT, AnticipatedDate

Don't Abbreviate

Abbreviations are, by their nature, ambiguous and culture-centric. Avoid them unless they are very common, consistent, and/or well-known in the organization or industry.

A good example of ambiguity is how to abbreviate "number". Even my schema example, "InvoiceNbr", is potentially ambiguous. But InvNo is worse. Is there another column "InvYes"?

Yes            |No               
---------------|-----------------
Customers      |Custs
CustomerId     |CustId
FirstName      |Fname
InvoiceNbr     |InvNo

Use Consistent Names (and Abbreviations if You Must)

  • If you must abbreviate, be consistent.
  • If it's spelled "InvoiceNbr" in one table, it's that way in all tables.
  • If everyone knows what DestinationBOL means, that might be OK. But maybe it's better to expand it to DestinationBillOfLading.

When in doubt, refer to Flatt's Law #6: Clarity is more important than brevity.

Yes            |No               
---------------|-----------------
InvoiceNbr     |InvoiceNum, InvNumber, Inv_Nmbr

Auditing

To be honest, I've often found audit columns to be more trouble than they're worth. I think if auditing is needed, it's better to have a separate audit history table where you can record many kinds of changes, including deletions.

But, if I am using them, and am tracking who took an action, I don't link to another table (such as Users or Employees), but instead record the physical name. This significantly reduces linking, and makes it easy to indicate that a process (rather than a person) performed an action. In other words, UpdatedBy is a string column and contains a value like "cflatt" or "Nightly Batch Process".

  • CreatedOn
  • CreatedBy
  • UpdatedOn
  • UpdatedBy

Inheritance

Inheritance can be modeled in the database a couple of different ways. I prefer Table Per Type, as shown by the Employee and Salesperson tables. Note that the Salesperson table has an EmployeeId primary key. This is what implies the inheritance. In the application's class model, these would become:

public class Employee
{
  int EmployeeId
  string Name 
}

public class Salesperson: Employee 
{
  double CommissionPercent 
}

public class Customer
{
  int CustomerId
  string Name
  Employee SupportEmployee
  Employee InitialEmployee
  Salesperson Salesperson
}

No comments:

Post a Comment