Possible SWITCH Optimization in DAX – #powerpivot #dax #tabular

In one of the Advanced DAX Workshop I taught this year, I had an interesting discussion about how to optimize a SWITCH statement (which could be frequently used checking a slicer, like in the Parameter Table pattern).

Let’s start with the problem. What happen when you have such a statement?

Sales :=
    SWITCH (
        VALUES ( Period[Period] ),
        “Current”, [Internet Total Sales],
        “MTD”, [MTD Sales],
        “QTD”, [QTD Sales],
        “YTD”, [YTD Sales],
         BLANK ()
    )

The SWITCH statement is in reality just syntax sugar for a nested IF statement. When you place such a measure in a pivot table, for every cell of the pivot table the IF options are evaluated. In order to optimize performance, the DAX engine usually does not compute cell-by-cell, but tries to compute the values in bulk-mode. However, if a measure contains an IF statement, every cell might have a different execution path, so the current implementation might evaluate all the possible IF branches in bulk-mode, so that for every cell the result from one of the branches will be already available in a pre-calculated dataset.

The price for that could be high. If you consider the previous Sales measure, the YTD Sales measure could be evaluated for all the cells where it’s not required, and also when YTD is not selected at all in a Pivot Table. The actual optimization made by the DAX engine could be different in every build, and I expect newer builds of Tabular and Power Pivot to be better than older ones. However, we still don’t live in an ideal world, so it could be better trying to help the engine finding a better execution plan.

One student (Niek de Wit) proposed this approach:

Selection :=
IF (
    HASONEVALUE ( Period[Period] ),
    VALUES ( Period[Period] )
)

Sales :=
CALCULATE (
    [Internet Total Sales],
    FILTER (
        VALUES ( ‘Internet Sales'[Order Quantity] ),
        ‘Internet Sales'[Order Quantity]
            = IF (
                [Selection] = “Current”,
                ‘Internet Sales'[Order Quantity],
                -1
            )
    )
)
    + CALCULATE (
        [MTD Sales],
        FILTER (
            VALUES ( ‘Internet Sales'[Order Quantity] ),
            ‘Internet Sales'[Order Quantity]
                = IF (
                    [Selection] = “MTD”,
                    ‘Internet Sales'[Order Quantity],
                    -1
                )
        )
    )
    + CALCULATE (
        [QTD Sales],
        FILTER (
            VALUES ( ‘Internet Sales'[Order Quantity] ),
            ‘Internet Sales'[Order Quantity]
                = IF (
                    [Selection] = “QTD”,
                    ‘Internet Sales'[Order Quantity],
                    -1
                )
        )
    )
    + CALCULATE (
        [YTD Sales],
        FILTER (
            VALUES ( ‘Internet Sales'[Order Quantity] ),
            ‘Internet Sales'[Order Quantity]
                = IF (
                    [Selection] = “YTD”,
                    ‘Internet Sales'[Order Quantity],
                    -1
                )
        )
    )

At first sight, you might think it’s impossible that this approach could be faster. However, if you examine with the profiler what happens, there is a different story. Every original IF’s execution branch is now a separate CALCULATE statement, which applies a filter that does not execute the required measure calculation if the result of the FILTER is empty. I used the ‘Internet Sales’[Order Quantity] column in this example just because in Adventure Works it has only one value (every row has 1): in the real world, you should use a column that has a very low number of distinct values, or use a column that has always the same value for every row (so it will be compressed very well!). Because the value –1 is never used in this column, the IF comparison in the filter discharge all the values iterated in the filter if the selection does not match with the desired value.

I hope to have time in the future to write a longer article about this optimization technique, but in the meantime I’ve seen this optimization has been useful in many other implementations. Please write your feedback if you find scenarios (in both Power Pivot and Tabular) where you obtain performance improvements using this technique!