From SQL to DAX: Filtering Data

The WHERE condition of a SQL statement has two alternatives in DAX: FILTER and CALCULATETABLE. In this article we explore the differences between them, providing a few best practices in their use.

Consider the following SQL syntax:

SELECT * 
FROM DimProduct
WHERE Color = 'Red'

It corresponds to this DAX query using FILTER:

EVALUATE
FILTER ( Product, Product[Color] = "Red" )

You can also use a DAX query using CALCULATETABLE:

EVALUATE
CALCULATETABLE ( Product, Product[Color] = "Red" )

In case of a simple SQL query like the initial one, there are no semantic differences between the two corresponding DAX solutions. However, there is a different behavior when there are other expressions or a more complex filter condition.

Consider a double filter like this one:

SELECT * 
FROM DimProduct
WHERE Color = 'Red' AND ListPrice > 1000

You have two options using FILTER. The simplest one is applying the same logical expression to a single FILTER:

EVALUATE
FILTER (
    Product,
    AND ( Product[Color] = "Red", Product[ListPrice] > 1000 )
)

With this approach, the logical conditions (color red and list price greater than 1000) are applied to every row in Product, even if some optimizations might happen internally to reduce the number of comparisons made.

This is the corresponding solution using CALCULATETABLE:

EVALUATE
CALCULATETABLE ( 
    Product,
    Product[Color] = "Red",
    Product[List Price] > 1000
)

The filter arguments in CALCULATETABLE are always put in a logical AND condition. Even if the results produced by FILTER and CALCULATETABLE are identical, the performance might be different. The CALCULATETABLE could be faster if the number of values returned by each logical condition on single columns is relatively small (more details in Understanding DAX Query Plans white paper). In fact, every filter argument in CALCULATETABLE corresponds to an internal FILTER on a single column. The previous CALCULATETABLE syntax is internally rewritten as:

EVALUATE
CALCULATETABLE (
    Product,
    FILTER ( 
        ALL ( Product[Color] ), 
        Product[Color] = "Red" 
    ),
    FILTER (
        ALL ( Product[List Price] ),
        Product[List Price] > 1000
    )
)

Do not assume that CALCULATETABLE is good and FILTER is bad. Every CALCULATETABLE has internally some FILTER, and the performance are generally determined by the granularity of each FILTER (even if the DAX engine might do other optimizations).

If you have an OR condition, you have to write the entire logical condition as a single one even in CALCULATETABLE, including a FILTER condition inside. Consider the following SQL statement:

SELECT * 
FROM DimProduct
WHERE Color = 'Red' OR Weight > 100

The corresponding DAX syntax using FILTER is:

EVALUATE
FILTER (
    Product,
    OR ( Product[Color] = "Red", Product[Weight] > 1000 )
)

The corresponding CALCULATETABLE version requires an explicit FILTER argument in order to specify a logical condition that includes two columns of the same table (in this case, no automatic FILTER rewriting is possible).

EVALUATE
CALCULATETABLE (
    Product,
    FILTER (
        Product,
        OR ( Product[Color] = "Red", Product[Weight] > 1000 )
    )
)

At this point, you might think that FILTER is simpler to use. However, you should always consider the CALCULATETABLE option, because of the different behavior when you have nested calculations.

For example, consider the following SQL query that returns red products having in calendar year 2006 a total sales amount greater than 100,000 and an average sales amount greater than 3,000.

SELECT *
FROM DimProduct p
WHERE
    Color = 'Red'
    AND ( SELECT SUM([SalesAmount])
          FROM [FactInternetSales] s
          INNER JOIN DimDate d
            ON s.OrderDateKey = d.DateKey
          WHERE s.ProductKey = p.ProductKey
            AND d.CalendarYear = 2006
        ) > 100000
    AND ( SELECT AVG([SalesAmount])
          FROM [FactInternetSales] s
          INNER JOIN DimDate d
            ON s.OrderDateKey = d.DateKey
          WHERE s.ProductKey = p.ProductKey
            AND d.CalendarYear = 2006
        ) > 3000

The equivalent DAX expression using only FILTER is shorter than SQL. However, you still specify the Calendar Year filter in two calculations (sum and average):

EVALUATE
FILTER (
    FILTER ( Product, Product[Color] = "Red" ),
    AND (
        CALCULATE (
            SUM ( 'Internet Sales'[Sales Amount] ),
            'Date'[Calendar Year] = 2006
        ) > 100000,
        CALCULATE (
            AVERAGE ( 'Internet Sales'[Sales Amount] ),
            'Date'[Calendar Year] = 2006
        ) > 3000
    )
)

Using CALCULATETABLE, the filter arguments (color and calendar year) are applied to the entire expression specified in the first argument. For this reason, the two CALCULATE expressions in the FILTER of the following DAX query do not have to include the filter on calendar year, because it is “inherited” from the outer CALCULATETABLE filters.

EVALUATE
CALCULATETABLE (
    FILTER (
        Product,
        AND ( 
            CALCULATE ( SUM ( 'Internet Sales'[Sales Amount] ) ) > 100000,
            CALCULATE ( AVERAGE ( 'Internet Sales'[Sales Amount] ) ) > 3000
        )
    ),
    Product[Color] = "Red",
    'Date'[Calendar Year] = 2006
)

Using CALCULATETABLE you propagate a filter to all the expressions embedded in the first argument. This results in a shorter syntax that simplify the query, even if it requires a particular attention because outer CALCULATETABLE and CALCULATE statements manipulate the filter context and affects any inner expression.