Converting MDX to DAX – First Steps

I will use the Adventure Works DW Tabular sample for my example. The following MDX query will return the Internet Total Sales measure from the BISM Tabular model divided by Product Category.

SELECT [Measures].[Internet Total Sales] ON COLUMNS,
       NON EMPTY [Product Category].[Product Category Name].[Product Category Name] ON ROWS
FROM [Internet Sales]

The result of such a query is the following one:

	Internet Total Sales
Accessories	$700,759.96
Bikes	$28,318,144.65
Clothing	$339,772.61

Because of the conceptual similarities between DAX and SQL, let’s start considering the correspondent SQL query:

SELECT
    pc.EnglishProductCategoryName,
    SUM(SalesAmount) AS [Internet Total Sales]
FROM
    dbo.FactInternetSales s
LEFT JOIN dbo.DimProduct p
    ON s.ProductKey = p.ProductKey
LEFT JOIN DimProductSubcategory ps
    ON p.ProductSubcategoryKey = ps.ProductSubcategoryKey
LEFT JOIN dbo.DimProductCategory pc
    ON ps.ProductCategoryKey = pc.ProductCategoryKey
GROUP BY
    pc.EnglishProductCategoryName
ORDER BY EnglishProductCategoryName ASC

The presence of a GROUP BY operation might suggest you to use the SUMMARIZE statement in DAX. And it seems to work! In fact, the following DAX query returns the same result if our initial MDX query.

EVALUATE
SUMMARIZE( 'Internet Sales',
           'Product Category'[Product Category Name],
           "Internet Total Sales", 'Internet Sales'[Internet Total Sales]
           )
ORDER BY 'Product Category'[Product Category Name]

There are a few differences in syntax based on the way tables (between single quotes) and columns (between square brackets) are identified. However, the semantic of the DAX query is very similar to the SQL one: the rows in the Internet Sales table are grouped by Product Category Name by following the relationships between these tables, which traverse the Product and Product Subcategory tables.

The main difference between SQL and DAX is that in DAX you don’t have to explicitly mention the relationships to use like you do in SQL. This information is inferred in DAX from the underlying BISM Tabular model. And this is not so much different than MDX. However, even if we obtained the same result, what we did in DAX is semantically different than MDX. In fact, if we try to remove the NON EMPTY in MDX:

SELECT
    [Measures].[Internet Total Sales] ON COLUMNS,
    [Product Category].[Product Category Name].[Product Category Name] ON ROWS
FROM [Internet Sales]

we obtain a different result, including a category (Components) that doesn’t have corresponding sales:

	Internet Total Sales
Accessories	$700,759.96
Bikes	$28,318,144.65
Clothing	$339,772.61
Components	(null)
	(null)

Moreover, there is an empty row at the end, which identifies the “empty” category used in case there were sales without a valid product.
Trying to obtain this result in DAX requires some change. One first attempt is to run the SUMMARIZE on Product Category instead that on Internet Sales:

EVALUATE
SUMMARIZE( 'Product Category',
           'Product Category'[Product Category Name],
           "Internet Total Sales", 'Internet Sales'[Internet Total Sales]
           )
ORDER BY 'Product Category'[Product Category Name]

However, the result is only partially similar to what we looked for. In fact, there is a missing row (the one with an empty name) compared to the MDX result after we removed the NON EMPTY clause:

	Internet Total Sales
Accessories	$700,759.96
Bikes	$28,318,144.65
Clothing	$339,772.61
Components

As I mentioned, the MDX query is displaying an empty row for “empty” category. Because there are products without related categories, we can try to summarize by Product instead of Product Category:

EVALUATE
SUMMARIZE( 'Product',
           'Product Category'[Product Category Name],
           "Internet Total Sales", 'Internet Sales'[Internet Total Sales]
           )
ORDER BY 'Product Category'[Product Category Name]

The result is similar to the one we’ve obtained in DAX. There is only a difference in the order because the empty row is at the beginning instead of at the end.

	Internet Total Sales

Accessories	$700,759.96
Bikes	$28,318,144.65
Clothing	$339,772.61
Components

Thus, the SUMMARIZE statement can return different results by changing the table that is used as first parameter, which represents the main table that is iterated in order to group data, considering the relationships starting from that table, either in one-to-many and many-to-one directions.
Now, consider the MDX query that filter data by year 2002 and define a calculated member that returns sales in year 2003.

WITH
    MEMBER Measures.[Sales 2003]
        AS ([Measures].[Internet Total Sales], [Date].[Calendar Year].&[2003] )
SELECT
    {Measures.[Internet Total Sales], Measures.[Sales 2003]} ON COLUMNS,
    NON EMPTY [Product Category].[Product Category Name].[Product Category Name] ON ROWS
FROM [Internet Sales]
WHERE ([Date].[Calendar Year].&[2002])

The result contains categories that have sales in at least one year, regardless it is 2002 or 2003.

	Internet Total Sales	Sales 2003
Accessories	(null)	$293,709.71
Bikes	$6,530,343.53	$9,359,102.62
Clothing	(null)	$138,247.97

We try to define a similar statement in DAX: the WHERE condition in MDX is translated into a filter in a CALCULATETABLE, whereas the MEMBER definition in MDX becomes a MEASURE definition in DAX, containing a CALCULATE statement that resembles the calculation made by the tuple in MDX.

DEFINE
      MEASURE 'Internet Sales'[Sales 2003] = CALCULATE(
             SUM( 'Internet Sales'[Sales Amount] ),
             'Date'[Calendar Year] = 2003 )
EVALUATE
SUMMARIZE( CALCULATETABLE( 'Internet Sales',
                           'Date'[Calendar Year] = 2002 ),
           'Product Category'[Product Category Name],
           "Internet Total Sales", 'Internet Sales'[Internet Total Sales],
           "Sales 2003", 'Internet Sales'[Sales 2003]
           )
ORDER BY 'Product Category'[Product Category Name]

However, the result of this query seems wrong:

	Internet Total Sales	Sales 2003
Bikes	$6,530,343.53

There are no categories that don’t have sales in 2002, and the value for Sales 2003 is blank. Why this happens?

The reason is that the effect of the CALCULATETABLE is to filter the table that will be considered in the following SUMMARIZE statement. Once this filter has been applied, the CALCULATE in the Sales 2003 measure tries to replace the filter over the Calendar Year column, but the current filter context now contains a sort of table filter on the Internet Sales table that cannot be replaced in this way. At this point we might try to rebuild the desired filter context in the Sales 2003 measure definition, but in order to do that we will probably do too many assumptions. And we cannot simply use the Product table instead of the Internet Sales table in the CALCULATETABLE statement, because the filter on Calendar Year would be ineffective (there is no relationship between Product and Date tables).

We reach a conclusion at this point: the SUMMARIZE statement is not the right way to transform a simple MDX SELECT statement into a DAX query. Yes, we can obtain similar results, but in order to mimic the semantic of an MDX SELECT statement we have to change our approach.

DEFINE
    MEASURE 'Internet Sales'[Sales 2003] = CALCULATE(
              SUM( 'Internet Sales'[Sales Amount] ),
              'Date'[Calendar Year] = 2003 )
EVALUATE
CALCULATETABLE(
      ADDCOLUMNS(
            ALL( 'Product Category'[Product Category Name] ),
            "Internet Total Sales", 'Internet Sales'[Internet Total Sales],
            "Sales 2003", 'Internet Sales'[Sales 2003] ),
      'Date'[Calendar Year] = 2002
)
ORDER BY 'Product Category'[Product Category Name]

The ADDCOLUMNS function iterates over all the Product Category Name values and evaluates the following DAX expressions, one for each column added, in the filter context defined by the surrounding CALCULATETABLE, which applies the filter over the Calendar Year column. In this case, because the Sales 2003 measure defines another filter on the same Calendar Year column, the innermost filter replaces the outer one and the Sales 2003 measure calculation is calculated in the right way. The only difference with the MDX statement is that the result includes also category with no sales in both 2002 and 2003.

	Internet Total Sales	Sales 2003

Accessories		$293,709.71
Bikes	$6,530,343.53	$9,359,102.62
Clothing		$138,247.97
Components

The ADDCOLUMNS is the right thing to do whenever you want to mimic the behavior of the MDX SELECT statement. In order to obtain the same result of the NON EMPTY clause, we just have to surround the result with a FILTER condition that removes the rows that have both measures equal to 0. This is the final DAX statement:

DEFINE
    MEASURE 'Internet Sales'[Sales 2003] = CALCULATE(
        SUM( 'Internet Sales'[Sales Amount] ),
        'Date'[Calendar Year] = 2003 )
EVALUATE
FILTER(
    CALCULATETABLE(
        ADDCOLUMNS(
            ALL( 'Product Category'[Product Category Name] ),
            "Internet Total Sales", 'Internet Sales'[Internet Total Sales],
            "Sales 2003", 'Internet Sales'[Sales 2003] ),
        'Date'[Calendar Year] = 2002
    ),
    [Internet Total Sales] <> 0 || [Sales 2003] <> 0
)
ORDER BY 'Product Category'[Product Category Name]

Finally, whenever you have a CROSSJOIN in MDX:

WITH
    MEMBER Measures.[Sales 2003]
        AS ([Measures].[Internet Total Sales], [Date].[Calendar Year].&[2003] )
SELECT
    {Measures.[Internet Total Sales], Measures.[Sales 2003]} ON COLUMNS,
    NON EMPTY [Product Category].[Product Category Name].[Product Category Name]
      * [Product Sub-Category].[Product Subcategory Name].[Product Subcategory Name] ON ROWS
FROM [Internet Sales]
WHERE ([Date].[Calendar Year].&[2002])

You can use the CROSSJOIN function in DAX in order to obtain the same result:

DEFINE
    MEASURE 'Internet Sales'[Sales 2003] = CALCULATE(
        SUM( 'Internet Sales'[Sales Amount] ),
        'Date'[Calendar Year] = 2003 )
EVALUATE
FILTER(
    CALCULATETABLE(
        ADDCOLUMNS(
                  CROSSJOIN( ALL( 'Product Category'[Product Category Name] ),
                             ALL( 'Product Sub-Category'[Product Subcategory Name] ) ),
            "Internet Total Sales", 'Internet Sales'[Internet Total Sales],
            "Sales 2003", 'Internet Sales'[Sales 2003] ),
        'Date'[Calendar Year] = 2002
    ),
    [Internet Total Sales] <> 0 || [Sales 2003] <> 0
)
ORDER BY 'Product Category'[Product Category Name],
         'Product Sub-Category'[Product Subcategory Name]

However, in this case you can still use SUMMARIZE to obtain the same result: because you want the list of existing categories and subcategories of products, what you really want is the list of subcategories with the related categories. If you thinks in terms of MDX, this is somewhat similar to what you obtain with the autoexists behavior in MDX.

DEFINE
    MEASURE 'Internet Sales'[Sales 2003] = CALCULATE(
        SUM( 'Internet Sales'[Sales Amount] ),
        'Date'[Calendar Year] = 2003 )
EVALUATE
FILTER(
    CALCULATETABLE(
        ADDCOLUMNS(
                  SUMMARIZE( ALL( 'Product Sub-Category' ),
                             'Product Category'[Product Category Name],
                             'Product Sub-Category'[Product Subcategory Name] ),
            "Internet Total Sales", 'Internet Sales'[Internet Total Sales],
            "Sales 2003", 'Internet Sales'[Sales 2003] ),
        'Date'[Calendar Year] = 2002
    ),
    [Internet Total Sales] <> 0 || [Sales 2003] <> 0
)
ORDER BY 'Product Category'[Product Category Name],
         'Product Sub-Category'[Product Subcategory Name]

We have seen that in DAX there is an important difference between ADDCOLUMNS and SUMMARIZE. With ADDCOLUMNS, you start from an “empty” filter context made by the initial table you define. When you want to get only some columns from an existing table, you can use the CROSSJOIN function in order to get all the combinations between them. In this way, you don’t consider any other column and/or relationship existing between these values. With the approach we have seen, the filter context defined by a surrounding CALCULATETABLE affects only the measures in the ADDCOLUMNS function, because the column passed to generate the initial table (and the cardinality of the result) are isolated by such a filter context by calling the ALL function.

The SUMMARIZE function starts from a table and consider an existing filter context applied to the table before grouping data and calculating the values for the columns. Sometime this is exactly what you want to do, and coming from a SQL background this could be the more intuitive way to query a tabular model. However, if you have an MDX background and/or want to make calculation in a more flexible way, applying different filter context to different columns of the result, consider the ADDCOLUMN syntax as a more “intuitive” approach in order to mimic the behavior or an MDX query.