How to relate tables in DAX without using relationships

UPDATE 2017-07-12: please note this article was written in 2010, there are now better ways to obtain the same result. Please read these articles: Physical and Virtual Relationships in DAX and Propagate filters using TREATAS in DAX

PowerPivot supports only one type of relationship between two tables, which is the one-to-many relationship. You can define that a column in a table (the “many” side) corresponds to a lookup table through a column which is an identity column there (the “one” side). DAX provides support to follow this relationship through functions such as RELATED and RELATEDTABLE.

As I already wrote in this blog, many-to-many relationships are not directly supported by DAX and we can work-around that by writing more or less complex DAX expressions. However, in that case the many-to-many relationship was still partially supported by two existing one-to-many relationships. Sometimes there cases where the relationship would require the creation of new tables that we do not have and it might be useful to implement the same concept of relationship by using only DAX and nothing else. The scenario that I will show will be the cost simulation by using several couriers. But before that, let’s learn how to use DAX to get the same RELATED function behavior without having an underlying relationship in the model.

IMPORTANT NOTE: in a following post we will see how we can optimize performance of the shipment cost simulation sample with a model based on relationships between tables. However, the sample in this post can be easier to implement for an end user. With medium data sets (below million of rows) performance are still good.

Simulate a table relationship in DAX

We have an Orders table and for each Order there is a Country where the order will be shipped. The Weight column is the parameter necessary to calculate the shipment cost (it could be the weight in kg, for example).

 

OrderNumber

Country

Weight

SO43697

Canada

35.50

SO43701

Australia

52.00

 

Then we have a PriceList table with shipment cost for each Country.

 

Country

Cost

Australia

3.50

Canada

0.50

France

2.50

Germany

2.60

Italy

2.70

USA

0.40

 

If we had the relationship between these two tables, the calculated column for each order would be:

‘Orders'[ShipmentCost] = ‘Orders'[Weight] * RELATED(‘PriceList'[Cost] )

If instead than a calculated column we would like defining a calculated measure, it will be:

‘Orders'[ShipmentCost] = SUMX( ‘Orders’, ‘Orders’ [Weight] * RELATED( ‘PriceList'[Cost] ) )

But what if we don’t have a relationship? Imagine we don’t have a RELATED function. Well, the idea is that we have to filter the row containing the same destination country of the order we are calculating. So, we’d like to replace the expression

RELATED( ‘PriceList'[Cost] )

with something like

CALCULATE(
        VALUES( ‘PriceList'[Cost] ),
        ‘PriceList'[Country] = ‘Orders'[Country] )

We wrote VALUES( ‘PriceList'[Cost] ) because we expect a single value will be returned and VALUES avoid the syntax error that we would have got writing only ‘PriceList'[Cost] in the first parameter of CALCULATE. However, the whole expression is not going to work. This is the error:

The expression contains multiple columns, but only a single column can be used in a Boolean expression that is used as a table filter expression.

I have to say that this error is not very clear. The problem is that ‘PriceList'[Country] = ‘Orders'[Country] does not give an indication of what is the table that we want to filter (remember that a filter expression in CALCULATE always has its effects on a table, see this previous post). What we want to do is to filter rows of the PriceList table, so that our VALUES function will return only the row of the country we have in our Order. We can use a FILTER function instead, returning only the rows from PriceList table which satisfy the condition:

CALCULATE(
        VALUES( ‘PriceList'[Cost] ),
        FILTER( ‘PriceList’, 
                ‘PriceList'[Country] = ‘Orders'[Country] 
        )
)

Replacing the original RELATED call with the CALCULATE expression above, we can define the calculated measure ShipmentCost in this way (in a packed form):

‘Orders'[ShipmentCost] = SUMX( ‘Orders’, ‘Orders’ [Weight] * 
        CALCULATE( VALUES( ‘PriceList'[Cost] ),
                   FILTER( ‘PriceList’, 
                           ‘PriceList'[Country] = ‘Orders'[Country] ) ) )

The resulting PivotTable will show this calculation.

 

Row Labels

Sum of Weight

ShipmentCost

SO43697

35.5

17.75

Canada

35.5

17.75

SO43701

52

182

Australia

52

182

Grand Total

87.5

199.75

 

Table relationship through multiple columns in DAX

Now that we know how to simulate the relationship in DAX, we can also use multiple columns for defining a relationship. For example, imagine we have Country and Zone columns in the Orders table.

 

OrderNumber

Country

Zone

Weight

SO43697

Canada

1

35.50

SO43701

Australia

2

52.00

 

Our PriceList now contains different prices for each Country and Zone.

 

Country

Zone

Cost

Australia

1

2.90

Australia

2

3.20

Australia

3

3.50

Canada

1

0.50

Canada

2

0.55

Canada

3

0.60

 

We can add another condition to the FILTER on PriceList table, using two boolean expression in an AND condition: we want to have the same Country and the same Zone. The following formula makes the thing working:

‘Orders'[ShipmentCost] = SUMX( ‘Orders’, ‘Orders’ [Weight] *
        CALCULATE( VALUES( ‘PriceList'[Cost] ),
                   FILTER( ‘PriceList’, 
                           ‘PriceList'[Country] = ‘Orders'[Country]
                           &&
                           ‘PriceList'[Zone] = ‘Orders'[Zone] ) ) )

The resulting PivotTable will show the calculation of ShipmentCost for each order considering the price based on Country and Zone.

 

Row Labels

Sum of Weight

ShipmentCost

SO43697

35.5

17.75

Canada

35.5

17.75

1

35.5

17.75

SO43701

52

166.4

Australia

52

166.4

2

52

166.4

Grand Total

87.5

184.15

 

Shipment cost simulation for different Couriers

At this point we want to simulate the cost using different Couriers. Using the same Orders table we have seen before, we add a Courier column to the PriceList table.

 

Courier

Country

Zone

Cost

Blu Express

Australia

1

2.90

Blu Express

Australia

2

3.20

Blu Express

Australia

3

3.50

Blu Express

Canada

1

0.50

Blu Express

Canada

2

0.55

Blu Express

Canada

3

0.60

SpeedyMail

Australia

1

3.00

SpeedyMail

Australia

2

3.15

SpeedyMail

Australia

3

3.45

SpeedyMail

Canada

1

0.45

SpeedyMail

Canada

2

0.50

SpeedyMail

Canada

3

0.55

 

Now have to use the Courier attribute in the resulting PivotTable, otherwise there will be an error in the ShipmentCost calculation because there are more prices for each Country/Zone combination.

 

Row Labels

Sum of Weight

ShipmentCost

Blu Express

87.5

184.15

SO43697

35.5

17.75

SO43701

52

166.4

SpeedyMail

87.5

179.775

SO43697

35.5

15.975

SO43701

52

163.8

Grand Total

87.5

#VALUE!

 

Having an error for the Grand Total row is probably right. We want to compare different Couriers, simulating a complete calculation for each order using different price lists. Looking at aggregated data, without the order granularity, we will find the most convenient Courier. Data can also diced by Country if we can make local agreements. However, we might want to have a reference value for the Grand Total row and, in general, a valid value for ShipmentCost even when a Courier is not selected. In these cases, we can replace the VALUES( ‘PriceList'[Cost] ) expression in the CALCULATE statement with AVERAGE( ‘PriceList'[Cost] ), which will use the average price of the selected Couriers (we can always use slicers) in the final total row.

 

Row Labels

Sum of Weight

ShipmentCost

Blu Express

87.5

184.15

SO43697

35.5

17.75

SO43701

52

166.4

SpeedyMail

87.5

179.775

SO43697

35.5

15.975

SO43701

52

163.8

Grand Total

87.5

181.9625

 

Final considerations

The scenarios we have described are useful only when the use of relationships is not possible. This can be the case of a particular complex condition to relate two tables, or simply it could be hard or time wasting to generate the necessary intermediate tables to define the relationships between tables containing real data.

From a performance point of view, it is always better to define a model that solve the problem leveraging relationships, using DAX just to make calculation but not to filter data. Using relationships, filtering is implicit in DAX formulas and this helps the engine to produce results faster. However, having a complete toolset to solve real business problems helps you to choose the fastest path to create a working PowerPivot model.

UPDATE: this post seems to qualify for T-SQL Tuesday post!