In my previous post, Data Warehouse Basics, I reprinted something that I had written a while back about what makes a data warehouse. I want to explore each point in further detail and see if the meaning still holds up. In this post, we take on "Data is comprehensive".
The description says that data must be consolidated from multiple systems. True, but does that in itself make a data warehouse comprehensive?
Not all data warehouses should include all subjects. It is only realistic that some data warehouses are directed towards finance, while others are focused on marketing.
Perhaps, it would be more accurate to say that the data is comprehensive around a particular subject. Our data warehouse should reflect all known details about a particular entity so that analysis produced from it is accurate.
For example, we might be performing customer valuation. However, if we have consolidated data from only 3 of our 4 customer-facing systems, how can we be sure that our results are correct?
This happens often in the hospitality industry, where a parent company operates several brands and many offer reward programs in which the customer earns points for stays at any of the company's brands. To evaluate the true value of a customer, we must be able to consolidate information from each of those brands. Often, these brands operate on entirely different point-of-sale or property management systems. The data warehouse must capture customer transactions from every system in order to place a true valuation on customers.
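To make the "3 of our 4 systems" problem concrete, here is a minimal sketch of a consolidation step that also reports which source systems contributed nothing. Everything in it is an assumption made for illustration: the source names, the record fields, and especially the idea that all systems already share a common customer_id (in practice, matching customers across brands is a project of its own).

```python
from collections import defaultdict

# Hypothetical source systems; these names are placeholders, not real products.
EXPECTED_SOURCES = {"brand_a_pms", "brand_b_pms", "brand_c_pos", "brand_d_pos"}


def consolidate(transactions, expected_sources=EXPECTED_SOURCES):
    """Sum spend per customer and list expected sources that supplied no rows."""
    totals = defaultdict(float)
    seen_sources = set()
    for row in transactions:
        # Assumes every system uses the same customer_id for the same person.
        totals[row["customer_id"]] += row["amount"]
        seen_sources.add(row["source"])
    missing = set(expected_sources) - seen_sources
    return dict(totals), missing


# Sample rows from only three of the four systems (made-up values).
transactions = [
    {"source": "brand_a_pms", "customer_id": "C1", "amount": 200.0},
    {"source": "brand_b_pms", "customer_id": "C1", "amount": 150.0},
    {"source": "brand_c_pos", "customer_id": "C2", "amount": 80.0},
]

totals, missing = consolidate(transactions)
print(totals)   # per-customer spend across the systems that were loaded
print(missing)  # sources we expected but never saw
```

With this sample input, the fourth system shows up in the missing set, which is the signal that any customer valuation built on these totals is incomplete.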
So in that respect, data must be comprehensive. However, there is another aspect of comprehensiveness that we should consider. Yes, we should capture every transaction for a customer from every system, but should we store every attribute of the transaction if we are not required to do so?
Experienced data warehouse designers know this can be a fine line to walk. On the one hand, we want to maintain the usability of the warehouse by not overcomplicating it with clutter that has no business relevance or use. On the other hand, when we omit certain data elements, it usually isn't long before someone comes along and requests that the data be added to the warehouse.
The rule of thumb is to include all business information available during the initial design. The cost of additional storage usually pales in comparison to the cost to modify, load, and test additions at a later time.
So to wrap up: yes, a data warehouse should be comprehensive around a subject. However, that does not mean that it must include every subject relevant to the company.