PowerPivot FILTER condition optimizations

In the comments of a recent post from Alberto Ferrari there was an interesting note about different performance related to the order of conditions in a FILTER call. I investigated about that and Jeffrey Wang has been so nice to give me some info about actual implementation that I can share on a blog post.

First of all, an important disclaimer: PowerPivot is intended to make life easier, not requiring the user to think how to write the order of elements in a formula just to get better performance. To achieve this goal, following releases of PowerPivot could have a smarter engine that will not require the author to worry too much about how he writes a DAX expression!

Suppose you have a table Sales with three columns: Product, Price and Quantity.

These two FILTER conditions should be equivalent:

FILTER( Product, Product[Price] > 10 && Product[Quantity] > 5 )

FILTER( Product, Product[Quantity] > 5 && Product[Price] > 10 )

In fact, from the point of view of the final result, they are equivalent. However, in the current version of PowerPivot there is a short-circuit evaluation (in reality something like that), which might favor performance if the most selective condition is put upfront in case of an AND condition.

The reason for that relies in the columnar behavior of the Vertipaq engine. This condition:

FILTER( T, p1 && p2 )

is internally rewritten (if possible) as

FILTER( FILTER( T, p1 ), p2 )

You might wonder whether the difference can be measured, because Vertipaq is so fast. Well, as I said, only certain conditions can be optimized in this way and are those that can be solved by the Vertipaq engine without evaluating a more complex DAX expression. Moreover, a sort of “bulk evaluation mode” is used, without going to evaluate the predicate conditions row by row.

Now, consider a more complex statement:

COUNTROWS( FILTER( Products, Products[Name] = “Name”
                             && Products[Stock] > CALCULATE( AVERAGEX( Fact, Fact[Quantity] / 5 ) ) ) )

In this case, reverting the statement to:

COUNTROWS( FILTER( Products, Products[Stock] > CALCULATE( AVERAGEX( Fact, Fact[Quantity] / 5 ) ) 
                             && Products[Name] = “Name” ) )

produces a slower query, which requires much more memory to be computed and also more CPU time.

The final consideration is that in the current version of PowerPivot it is better to put the most selective condition upfront in a filter predicate. Future versions of PowerPivot might be able to automatically reorder these conditions.