2008-06-14

Why CHECK Constraints Matter

Well-written applications validate their data. No one disputes that. The debate has always been: Where should the validation logic go?

David Heinemeier Hansson, the inventor of Ruby on Rails, strongly believes that validation and other business logic belong in an ORM layer. I don’t think Hansson would fault anyone for putting check constraints in the database. He’d just think you were wasting your time, and violating one of the first principles of Rails development: Don't Repeat Yourself. In order to take advantage of Rails’ built-in validation machinery, you’ll have to repeat your constraints in the ORM anyway. So why bother putting them in the database at all?

In the case of most Rails applications, I can’t argue with him. The software behind this blog follows exactly that approach and doesn’t suffer. However, this blog is typical of a class of applications in which the database and the application are very tightly bound. They form a single, inseparable unit and the database is not shared.

This arrangement is not typical in the corporate world. There, databases are shared resources accessed by multiple applications, often with a mixture of data access technologies. The mixture arises because databases tend to outlast the technologies that are used to access them. SQL Servers in a Microsoft shop (for instance) may get upgraded, but ODBC, RDO, ADO, and ADO.NET have come (and in some cases gone), not to mention the vast array of DALs and ORMs built atop them.

The database is where the buck stops, and in a shared environment it’s the one constant you can count on. ORMs and DALs will come and go, but if you throw the database out, only then have you truly hit the reset button. This is why at a minimum your validation logic belongs in the database in the form of CHECK constraints.

You will almost certainly have to repeat yourself in your DAL or ORM. I see no responsible way around this. But to bet the farm on a chunk of middle-tier code that may not be used everywhere at all times is not a good idea.

I’m not naive enough to believe that CHECK constraints are the holy grail of data integrity. Along with great testing, talented development, and strong domain knowledge, they form one of the pillars of robust application development. (And they’ll help you build better ORMs, too!)

For the record, I’m not an opponent of ORMs. I think they can serve a very useful purpose. Many ORMs have powerful facilities to interact with the user interface and present the opportunity to fix a problem before it gets written to the database. However, no matter how sophisticated the business rules are in the ORM, the database should be the final arbiter of its own data integrity.

Note: This article was originally hosted on my own server in blogging software I'd written myself in RoR.

7 comments:

  1. I've unified the ORM and CHECK constraints layer with SQL Server's CLR integration feature.
    Let me explain.
    All my data classes (e.g. Username,Password,EmailAddress,Filename,Text) extend from my C# classes RegexConstrainedString and ConstrainedNumber<T>, which are constructed with explicit cast syntax (e.g. string unchecked_password = "abc"; Password password = (Password)unchecked_password; or Password password = Password.TryCreateInstance<Password>(unchecked_password);).
    The data in those classes is guaranteed to be valid (enforced by the constructor throwing an error, or TryCreateInstance returning null), so their properties such as EmailAddress.LocalPart and EmailAddress.DomainPart are always valid.
    In other words, its impossible to construct an invalid instance of any of my data types, which pushes all data validation out to it's entry point into the system, no matter where that entry point is (as you will see).
    Now, any of those data classes that correspond to fields in my database, such as the Password:RegexConstrainedString class are marked with a custom attribute I've created called SqlFunctionCheck (e.g. [SqlFunctionCheck(Table=DB.Table_Administrator,Field="Password",AllowNull=False)]) indicating which tables/fields in the database will have check constraints that validate data using these classes. A (very important) utility I built uses reflection to scan these attributes for all classes within a namespace of my choosing (e.g. MyApp.Database.Values) and uses the attribute information to generate a separate assembly to be deployed to the database. This dynamically generated assembly contains one function for each data class marked with at least one SqlFunctionCheck attribute. The utility generates boolean check functions named after each class such as "CheckPassword", "CheckEmailAddress", etc. It then actually builds and deploys the check constraints for each table by aggregating all the checked fields for a table {e.g. "alter table TableName with [no]check add constraint CK_TableName check ((CheckPassword(FieldName) == 1) && (CheckUsername(FieldName) == 1))"} The field and table names are pulled from the attributes, and of course the contraint uses a naming convention of prepending "CK_" to the table name. The utility automates the whole process, and provides buttons for creating/updating/dropping the assembly, creating/droping the check functions, and creating/dropping/enabling(with or without check)/disabling the check constraints for each table, plus it keeps track of all their states with queries I hijacked from running sql server profiler while opening the list of scalar functions in SS management studio. Impressive, eh?
    When a field with a check constraint is modified (through a query or in SSMS, etc.), the very same code that checks the value in my application runs in the database... so I have the best of both worlds. A single, rich data constraint class in C#, that is enforced everywhere, even when people try to enter values in SSMS. This would not be feasible without the utility I built, and the intersection of multiple technologies including C# Attributes, .NET Reflection, ILMerge assembly merger, and SQL CLR Integration. Also, I think i posted the code for the constraint classes on StackOverflow.

    ReplyDelete
  2. Triynko, that's a great idea! As a sign that great minds think alike, it's something I'd thought about doing myself, but how is the performance?

    ReplyDelete