For reporting, the data from Raw and Business Vault – the Core Warehouse – is transferred into a dimensional model. While the focus in the Core Warehouse is on fast loading and building complete histories, the focus in the Data Mart (or Raw or Information Mart) shifts to the fast evaluation of the relevant / desired data. The more metrics there are, the more obvious parallels between these metrics become. How do you avoid calculating the same key figure several times? What is the optimal process for creating key figures in the data mart?
Typical target environments
There are many reporting tools on the market, all with major differences. There are the older OLAP-based products such as Cognos, MS Analysis Services or Microstrategy, which map a multi-dimensional cube. In parallel, tools such as qlik, tableau and PowerBI have become popular in recent years, each of which provides the contents of the cube in a large file.
They all have one thing in common: Numbers are filtered and aggregated on the basis of descriptive attributes, and further calculations such as ratios, averages, etc. are then made on the basis of this result. All these tools are very efficient in aggregating and filtering values. And this is exactly what the dimensional model supports. One or more numbers (ratios) are linked to descriptive attributes (dimensions) via a fact table. From this construct, it is then easy to create all desired forms of input for the respective reporting tool.
Decompose key figures
The dimensional model offers the possibility of defining each dimension and key figure exactly once and then distributing them over several evaluations. In the requirements from the business unit for key figures, for example, ‘sales with persons under 25‘, ‘% of initially successfully produced parts‘ or ‘excess stock in the warehouse‘ are mentioned. These key figures are made up of several parts:
|Turnover with persons under 25
|Number of products sold, price per product or amount paid, discounts, age of buyer.
|% of initial successfully produced parts
|Number of successfully produced parts, number of defective parts, number of successfully corrected parts
|Warehouse stock, current orders without delivery, forecast orders or comparative values from another period
So ultimately, a ratio is a formula of ratios and dimensions. This is a recursive definition. This is where a further differentiation of ratios is needed. Is there a beginning? A key figure with which everything begins?
Set the basic key figure as the origin exactly once
If you divide the key figures according to their origin, then at the beginning of the implementation there are those key figures that can be derived directly from a satellite in the Core Warehouse, such as the number of products in the order item. This is the basis that we select from the Core Warehouse. These basic key figures are not always delivered ready for use. The actual amount paid as a basis for sales can depend on many factors that need to be taken into account beforehand, such as complaints, cancellations, etc.
If the basic key figure must first be calculated, then this basic key figure must be determined and historicized in the Business Vault at the individual case level. This is the only way to ensure that this key figure is used in the same way everywhere. Why this effort? Why not simply create the key figure as part of the fact table and save the effort of storing and historicizing it?
The problem can best be explained with an example: The insurance company ‘sosecure‘ has an intensive reporting. Two important fact tables are on the one hand the contracts on the basic key figure of the individual contract (number of contracts). The other fact table has the individual claim as the basic key figure. It is sometimes the case that the contract was not valid at the time of a claim, for example, due to breach of contract or non-payment. Several evaluations are then possible on both fact tables. You can evaluate ‘Number of contracts with more than one claim’ in the contract facts and ‘More than one claim per contract’ in the claim data.
Here, the same result must be displayed in both cases, otherwise no one will trust the numbers in the data warehouse. The easiest way to achieve a common result is to determine this basic key figure in the Business Vault. This ensures that both have the same basis. This eliminates a major source of error and there is no redundant code for determining the key figure. This is because the code would then have to be changed in all the places where changes occur.
Some calculations are better made in reporting
However, splitting off the basic key figure is not yet sufficient. The calculation of a key figure can also contain divisions. These should always be calculated in the reporting tool. Let’s take the example ‘% initial successfully produced parts’, this key figure is evaluated according to dimensions such as material, machine, product line, plant, etc.. Depending on the selection of these dimensions, the numerator and denominator of this percentage calculation will adjust. This number range cannot be efficiently precalculated. It must be done at evaluation time. These calculated key figures form their own subcategory of key figures.
Key figures by type of origin
What is left after these two spin-offs? All other calculations on the ratios: Addition, subtraction and multiplication as well as fixed precalculated filters like ‘people under 25‘. Basically, everything that can’t be selected by the user at runtime. Or better: everything that can be aggregated and filtered based on dimensions afterwards.
Here is an overview of the key figures according to the way they were generated:
- Basic key figures – are selected from the Core Warehouse for the fact tables. These are key figures at the lowest level, such as the number of an article sold or a successfully produced part (make-to-order production) or the number of successfully produced parts (batch production).
- Derived key figures – are always calculated on basic key figures, other derived key figures and dimensions.
- Calculated ratios – are best calculated in the frontend reporting tool. Either because they are ratios or because they can only be calculated if the context (the expression of the respective dimensions) is clear.
For derived key figures there is always the possibility of calculation in the reporting tool. This is a matter of discretion. Adding an age group like ‘under 25’ is done quickly in a reporting tool. If the calculation becomes more complicated, then it is better to do it only once in the data mart for maintenance reasons. Likewise, if the derived metric is used in many different reports, a central calculation is better. Ultimately, the exact decision on where to implement is a matter of effort and maintainability.
The classification according to the type of origin enables the optimisation of the calculation
The classification of the key figures according to their origin allows an implementation with only little redundant code. Changes to the logic are thus only to be maintained in one place.
The metrics can thus be reused and ensure consistent figures in all reports, which increases trustworthiness.
The criteria for splitting according to type of origin are simple and comprehensible. This leads to clear rules in the architecture and in the implementation guidelines, which are also quickly adapted by new employees.