månadsarkiv: maj 2010

Data Versioning

Data Versioning, Temporal Data, Transaction Time Validity. Different names for maintaining snapshots of data that are consistent across multiple tables at any point in time. Just like Time Machine, but for data. 

This is a description of how it can be implemented with MySQL. 

The basic technique with the use of views is described and illustrated here (under ”The better plan”), we’ve just improved it a bit.

First of all a data table to store information about people. Every time a record is updated, a new record is created. 

The table will hold all historic versions of each people record, including the most current one.

CREATE TABLE people_data (

id_version BIGINT UNSIGNED,

Firstname VARCHAR(255),

Lastname VARCHAR(255),

moddate TIMESTAMP NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,

KEY id_version(id_version)

);

Then we have a table to keep track of versions and hold some meta data about each version. This table is common for all data tables that we want to enable versioning for. This is a variant of the Audit table on http://www.codeproject.com/KB/architecture/LisRecordVersioning.aspx but we use timestamp_start and timestamp_end instead of IsActive, to be able to get the historic state across all data tables of any point in time (this is picked up from http://www.paragon-cs.com/mag/issue7.pdf ).

CREATE TABLE version (

id BIGINT UNSIGNED auto_increment,

guid VARCHAR(20),

changedby_start VARCHAR(100),

changedby_end VARCHAR(100),

dt_start DATETIME,

dt_end DATETIME DEFAULT ’3000-01-01 00:00:00’,

PRIMARY KEY (id),

KEY guid(guid),

KEY dt_start (dt_start),

KEY dt_end (dt_end));

Läs mer