From SQL to DAX: Joining Tables

In SQL there are different types of JOIN, available for different goals. This article shows what are the equivalent syntaxes supported in DAX.

The SQL language has the following types of JOIN:

  • INNER JOIN
  • OUTER JOIN
  • CROSS JOIN

The result of a join does not depends on the presence of a relationship in the data model. You can use any column of a table in a join condition.
In DAX, you can obtain a JOIN behavior in two different ways. First, you can leverage existing relationships in the data model in order to query data included in different tables, just as you wrote the corresponding JOIN conditions in the DAX query. Second, you can write DAX expressions producing a result equivalent to certain types of JOIN. In any case, not all of the JOIN operations available in SQL are support in DAX.

You can test the examples shown in this article by downloading the sample files (see buttons at the end of the article) and using DAX Studio to run the DAX queries.

Using Relationships in Data Model

The common way to obtain a JOIN behavior in DAX is implicitly using the existing relationships. For example, consider a simple model with the tables Sales, Product, and Date. There is a relationship between Sales and the each one of the other three tables. If you want to see the Quantity of sales divided by Year and Product Color, you can write:

EVALUATE
ADDCOLUMNS (
    SUMMARIZE (
        Sales,
        'Date'[Year],
        Product[Color]
    ),
    "Total Quantity", CALCULATE ( SUM ( Sales[Quantity] ) )
)

The three tables are automatically joined together using a LEFT JOIN between the Sales table (used in the expression for the Total Quantity column) and the other two tables, Date and Product.

SELECT
    d.Year, p.Color, SUM ( s.Quantity ) AS [Total Quantity]
FROM
    Sales s
    LEFT JOIN Date d ON d.DateKey = s.DateKey
    LEFT JOIN Product p ON p.ProductKey = s.ProductKey
GROUP BY
    d.Year, p.Color

Please, note that the direction of the left join is between Sales and Date, so all the rows included in the Sales table that do not have a corresponding row in Date or in Product are grouped in a BLANK value (which corresponds to the concept of NULL in SQL).

If you do not want to aggregate rows, you can simply use RELATED in order to access the columns on lookup tables (which are the ones on the “one side” of the relationship). For example, consider the following syntax in SQL:

SELECT
    s.*, d.Year, p.Color
FROM
    Sales s
    LEFT JOIN Date d ON d.DateKey = s.DateKey
    LEFT JOIN Product p ON p.ProductKey = s.ProductKey

You obtain the same behavior by using the following DAX query:

EVALUATE
ADDCOLUMNS (
    Sales,
    "Year", RELATED ( 'Date'[Year] ),
    "Color", RELATED ( Product[Color] )
)

You might obtain a behavior similar to an INNER JOIN by applying a filter to the result of the ADDCOLUMNS you have seen so far, removing the rows that have a blank value in the lookup table, assuming that the blank is not a value you might have in the data of that column.

You cannot obtain a CROSS JOIN behavior in DAX by just leveraging on relationships in the data model.

Joining tables without relationships in DAX

The current version of DAX (Power Pivot for Excel 2010/2013 and Analysis Services Tabular 2012/2014) only has a specific function for the CROSS JOIN. Consider this the following syntax in SQL:

SELECT *
FROM a
CROSS JOIN b

You can write an equivalent syntax in DAX by using the CROSSJOIN function:

EVALUATE
CROSSJOIN ( a, b )

You can obtain the equivalent of an INNER by embedding the CROSSJOIN expression in a filter, even if this is not suggested in case you have to aggregate the result (as will we see later). Consider the following INNER JOIN in SQL:

SELECT *
FROM a
INNER JOIN b ON a.key = b.key

You write an equivalent syntax in DAX using the following expression:

EVALUATE
FILTER (
    CROSSJOIN ( a, b ),
    a[key] = b[key]
)

You do not have a simple way to obtain a syntax in DAX (in versions available at the end of 2014) corresponding to a LEFT JOIN in SQL. Nevertheless, you have an alternative if you can assume that you have a many-to-one relationship between the table on the left side and the one on the right side. This was the case of LEFT JOIN using relationships in DAX, and you have seen the solution in DAX by using RELATED. If the relationship does not exists, you can use the LOOKUPVALUE function instead.

For example, consider the same SQL query we have seen previously.

SELECT
    s.*, d.Year, p.Color
FROM
    Sales s
    LEFT JOIN Date d ON d.DateKey = s.DateKey
    LEFT JOIN Product p ON p.ProductKey = s.ProductKey

You can write it in DAX in this way:

EVALUATE
ADDCOLUMNS (
    Sales,
    "Year", LOOKUPVALUE (
        'Date'[Year],
        'Date'[DateKey], Sales[DateKey]
    ),
    "Color", LOOKUPVALUE (
        Product[Color],
        Product[ProductKey], Sales[ProductKey]
    )
)

The version using RELATED is more efficient, but this could be a good alternative if the relationship does not exists.
Finally, consider the query that aggregates the result of a LEFT JOIN in SQL, like the one we have seen previously (we added just the ORDER BY clause):

SELECT
    d.Year, p.Color, SUM ( s.Quantity ) AS [Total Quantity]
FROM
    Sales s
    LEFT JOIN Date d ON d.DateKey = s.DateKey
    LEFT JOIN Product p ON p.ProductKey = s.ProductKey
GROUP BY
    d.Year, p.Color
ORDER BY
    d.Year, p.Color

You can use two approaches here. The first is to leverage the LOOKUPVALUE syntax, aggregating the result like in the following DAX syntax:

EVALUATE
SUMMARIZE (
    ADDCOLUMNS (
        Sales,
        "Sales[Year]", LOOKUPVALUE (
            'Date'[Year],
            'Date'[DateKey], Sales[DateKey]
        ),
        "Sales[Color]", LOOKUPVALUE (
            Product[Color],
            Product[ProductKey], Sales[ProductKey]
        )
    ),
    Sales[Year],
    Sales[Color],
    "Total Quantity", CALCULATE ( SUM ( Sales[Quantity] ) )
)
ORDER BY Sales[Year], Sales[Color]

However, if the number of combinations of the aggregated columns is small and the number of rows in the aggregated table is large, then you might consider this approach (verbose, but faster in certain conditions):

DEFINE
    MEASURE Sales[Total Quantity] =
        CALCULATE (
            SUM ( Sales[Quantity] ),
            FILTER (
                ALL ( Sales[ProductKey] ),
                CONTAINS (
                    VALUES ( Product[ProductKey] ),
                    Product[ProductKey], Sales[ProductKey]
                )
            ),
            FILTER (
                ALL ( Sales[DateKey] ),
                CONTAINS (
                    VALUES ( 'Date'[DateKey] ),
                    'Date'[DateKey], Sales[DateKey]
                )
            )
        )
EVALUATE
FILTER (
    ADDCOLUMNS (
        CROSSJOIN ( ALL ( 'Date'[Year] ), ALL ( Product[Color] ) ),
        "Total Quantity", [Total Quantity]
    ),
    NOT ISBLANK ( [Total Quantity] )
)
ORDER BY 'Date'[Year], Product[Color]

Future Enhancements

DAX will introduce new features in future versions of Excel, Power BI and Analysis Services. We will include updates and/or links when these features will be available.

Download

Download Demo (ZIP)