This is the next post in a series of discussions about data warehousing best practices based on this blog entry from February 11. Today, we look at capturing history in a data warehouse. "The data warehouse is able to present a view of the business at a
particular point in time and track Key Performance Indicators (KPI's)
over time." There's not too much to argue here.
One of the primary functions of a data warehouse is to capture history and perform point-in-time reporting and analysis. There are a couple of ways to do this depending on the architecture and requirements. The most common method in dimensional modeling is the Type 2 Slowly Changing Dimension (SCD). Type 2 SCD's track tie a transactional or fact record to it's dimension attributes, as it was at the time it occurred. Variations on the type 2 SCD will allow us to query the transaction as it was at the time it occurred or as it appears now. If we don't care how it was in the past and only care how it looks now, we'll use a Type 1 SCD.
Let's look at an example using a hotel transaction. Suppose we are looking at a transaction that occurred on 12/20/08. The transaction includes the room rate plus taxes that the guest paid, the type of payment he used, the guest's name and rewards number, and the hotel brand name and location.
Let's say on 1/1/09, the hotel company changed the brand name of that property to Fairview Inn from Village Suites. Hotel companies, such as Hilton and Marriott, own a number of brands, and there are occassions that they may convert one brand into another.
On 2/1/09, we run a report showing sales by brand for each month in 2008. Do we include that hotel's 2008 transactions under Fairview or Village? That probably depends on the use. If we are tracking revenue in order to calculate bonuses for the Village Suites VP, then he would certainly want to us to include those transactions in his bucket (as-was). However, if we are using last year sales to project Fairview sales for 2009, then we want to include the changed properties sales with Fairview (as-is).
In this case, we need to be able to query it both "as-is" and "as-was". A flexibile architecture will allow us to do either one. Much of the time, the "as-was" serves no purpose. For example, if we have status codes, and we change the description of a particular code. Most of the time, we'll want to see all records associated with the new description.
Determing which method we use to track history is one of the most important decisions we make in the data warehouse design. It affects the flexibility of the warehouse, as well as performance. We have to balance each. However, it is also one way to derive the greatest value from the solution, since this historical information is rarely available in any consummable form anywhere else.
Recent Comments