Five ways of combining several fact tables in a PowerBI report

Power BI reports often combine several fact tables, typically from several sources. Videos on multiple fact tables often claim one method as ‘best practice’ and deprecate alternative approaches. Different approaches have different strengths and weaknesses, and in this blog I discuss five ways separate approaches to combining several fact tables in a single report. The advantages and limitations of each are highlighted.

The examples shown use the following two fact tables:

Sales

SalesDateOutletManufacturerProductNameColour#SalesSalesAmount
       

Returns

ReturnDateReturnReasonManufacturerProductNameColour#ReturnsReturnAmount
       

The tables are similar, but not identical. The Manufacturer, ProductName and Colour fields are common to both tables.

Five possible ways of modeling these two fact tables are presented – Classic Kimball, Junk and degenerate dimensions, Many-to-many junk and degenerate dimensions, virtual dimensions, and consolidated fact models.

1. Classic Kimball

In classic Kimball, tables can be fact tables or dimension tables, but not both. In classic Kimball, fact tables consist only of measure fields or key fields. In the model below dimension tables have been created for all the fields in the fact table that are not measures. This makes all these fields foreign key fields. (Setting up separate dimension tables for every dimension is not just being purist. There can be practical reasons for it.) Since the fact tables in this example contain different data values, the dimensions need to be  consolidated, that is, they need to contain all the values in both fact tables. In this example the Manufacturer, Product and Colour dimensions are common to both the Sales and Reseller fact tables, while the other dimensions only relate to one fact table.

The classic Kimball model results in a fast, small model that scales up well.  The disadvantage is that the dimensions are independent of each other, and so can’t support cascading slicers. A common approach to report design is to have several slicers on a page, as shown below

In classic Kimball these slicers are all independent. Some uses may find this exasperating since selecting a value such as ‘Samsung’ displays all the ProduceNames, including those that are inappropriate for ‘Samsung’. A better user experience is obtained if selecting a slicer cross-filters the other slicers, so only choices appropriate for the selected slicer are displayed. This is particularly important if the list of possible options is long. While classic Kimball doesn’t support this, other approaches do.

2 . Junk and Degenerate Dimensions

Grouping the dimensions common to the fact tables together in one consolidated dimension table provides an alternative to having many independent dimension tables. A dimension table that contains logically unrelated fields is called a ‘junk’ dimension. Relationships in Power BI can only be created on a single field, so it is necessary to add a concatenated primary key field to the dimension table and to add it as a foreign key to the two fact tables.

The resulting model is

The advantage of this is that slicers for the common fields can cascade – users only get options that are appropriate to earlier selections. Selected ‘iPhone’ below limits the Manufacturer and Colour slicers to appropriate values.

There is only a need for one composite key, and the relationships between the dimension table and the fact tables are one-to-many. Fields specific to only one fact table could be placed in separate dimension tables that relates to only one fact table, but this is not always done. Dimension fields in a fact table are called degenerate dimensions. While this violates the requirement that fact tables only contain keys or measures, degenerate dimensions are often used.

Composite keys can require a huge amount of space. This is especially true if the fact tables have a large number of fields, and these occur in many different combinations. A composite key is not a Cartesian product of the underlying fields because it is not necessary to store all possible combinations of fields. It only necessary to store the combinations that actually exist in the data. This can still be large, and in many cases models that use composite keys don’t scale, and can’t be used in production because the model gets too large to upload to the Power BI service.

3. Many-to-many Junk and Degenerate Dimensions

In some situations, it may not be necessary to have a consolidated dimension table for the composite key at all. This is the case if the values in the degenerate dimensions in one fact table is a sub-set of the values in the degenerate dimensions in the other fact table. In the example used here, if the data set is complete and all returns have to have been sold initially, the Returns table will be a sub-set of the Sales table. In this case it is possible to implement a many-to-many relationship between the two fact tables. Using a many-to-many relationship that filters from the larger fact table to the smaller one means that the dimension table is not required and can be deleted to save space.  Implementing the many-to-many relationship, with the cross-filtering direction from the larger fact table to the smaller one, is shown below.

Don’t be frightened off by this. The warning directs the user to the following link: https://learn.microsoft.com/en-us/power-bi/transform-model/desktop-many-to-many-relationships .  While many-to-many relationships do have limitations, in cases where the data integrity is reliable and the limitations are understood, many-to-many relationships can still be used. (This clip from Marco Russo is a good overview.)

4. Virtual Dimensions

Composite keys may use up so much space that the data model becomes too big for PowerBI. One solution to this is to delete the composite keys (and therefore remove the relationships) and re-implement the relationships entirely in DAX.

If the measure [Sales Amount] in a model with composite key relationship between the dim table and the FactSales table is:

The equivalent DAX in model without a composite keys relationship would be

This saves on space when loading the model, but adds to the computational load when refreshing the PowerBI visuals. This approach would work well if the DAX computations are relatively straightforward. If the DAX is complex, refreshing visuals may be slow. It is also possible that some visuals may error and fail to refresh because PowerBI has run out of memory.

5. A Consolidated Fact Model

The idea behind the Consolidated Model is to union all the data into a single Fact table, and then code measures to behave as if they were from separate fact tables.

This initially sounds like a great idea. M-Query makes creating a single UNION table from several independent tables very simple. Having one Fact table gives PowerBI the best chance of optimizing data storage compression to create a model that is as small and fast as possible. Once all the data is in a single table, it is not difficult to write measures that to remove filters from the columns that are not appropriate for that measure. While this sounds straightforward, there is a gotcha and the obvious approach does not work. Fortunately, there is a work-round.

To UNION all the fact table, open the M-Query Editor, and select ‘Apend Queries -> Append Query as New’

Then select all the required fact tables – there can be more than two.

Unlike DAX or SQL, the tables do not need to be UNION-compatible. Since M-Query matches columns on column name, not position, it is important that the column names are correct but it doesn’t matter if they are not in the correct order or if the tables do not have the same number of columns. After completing the action to create the new appended table, clear the ‘Enable Load’ flag from the original fact tables. This will prevent the tables from being loaded twice and so save time and space. It will also give a more understandable model.

The data in the UNION (‘consolidated) Fact table in our example is:

Let’s investigate how this behaves by creating the following, incorrect, measure

When these fields are added to slicers, we get this:

The blank entries for the Outlet and ReturnReason slicers can be filtered out, and the result is:

The DAX code in the Measure [Returns Wrong] explicitly removes the filters from ‘FactUnion’[Outlet], but when the Outlet slicer is set, the result is:

The return value for iPhone is correct, but the other two rows have disappeared. This clearly is not the result we need and this approach isn’t working. Why it isn’t working isn’t that obvious – an explanation can be found here https://www.sqlbi.com/articles/understanding-dax-auto-exist/

The solution is to set up dimension tables for all the non-common dimension fields. This follows Kimball, and is a situation where adopting a purist approach is important.

The measures should be written to remove the filters from the dimension tables that are not appropriate for that measure. The foreign key fields in the fact table should not be used, and can be hidden to avoid confusion.

This gives the required result. Changing the values in the ‘Outlet’ slicer has no effect on the ‘Returns’ measure. Whether the relationship between the dimension table and fact table has a unidirectional or bidirectional filter will determine if the field will cascade other slicers or not.

Recommendation

In this blog five approaches to combining fact tables have been presented. Each as advantages and disadvantages, but despite what you might find on the internet, there is no clear ‘best practice’.

 Supports cascading filtersSpace efficientMemory efficientDAX required
Classic KimballNoGoodGoodNo
Junk and DegenerateYes for common fieldsVery BadGoodNo
Many-to-many Junk and DegenerateYes for common fieldsBadGoodNo
Virtual relationshipsYes for fields in main fact tableGoodBadYes
Consolidated Fact tableYes for all fields GoodGoodYes

Discover more from MaxPowerBI.PRO

Subscribe to get the latest posts sent to your email.