Using VALUES in iterators

In a previous article, Choosing between DISTINCT and VALUES in DAX, we explained how to choose the proper function to iterate the unique values visible in a column in the current filter context. We suggest reading that article before starting this one, because it describes the same underlying problem. Here, we discuss whether to use VALUES in an iterator. This choice depends on the answer to this question: Do you want to include or exclude the blank row generated by an invalid relationship when iterating over the rows of a table reference?

The mentioned article analyzes DISTINCT and VALUES using a column reference as an argument. However, DISTINCT and VALUES can have a table as an argument. Nevertheless, there is a significant difference that is reflected in their semantics:

DISTINCT accepts any table expression and removes duplicate rows.
VALUES only accepts a table reference and not any table expression.

The only purpose of VALUES with a table reference as an argument is to return an additional blank row if that blank row exists because of an invalid relationship. Moreover, VALUES does not remove duplicate rows if you provide a reference to a table that does not have unique column constraints. Therefore, we can ignore DISTINCT for the blank row problem. When we use a table reference, we have two cases:

The table reference does not include the blank for an invalid relationship.
VALUES with a table reference includes the blank row for an invalid relationship.

In the case of a valid relationship, or when the table does not have unique column constraints – like the one-side of a one-to-many relationship – VALUES ( table ) and table return the same result. Therefore, we can assume that the same issue may exist when iterating over a table reference that could be affected by the blank row.

For example, consider the following measure that adjusts the revenues by 1% for countries in Europe:

Measure in Sales table

Net Cashback (incorrect) = 
SUMX (
    Customer,
    [Sales Amount] * (1 - Customer[% Cashback])
)

What we see is the same issue we described in a previous article about DISTINCT and VALUES, where we iterated with SUMX the result of DISTINCT with a column reference in the Sales Adjusted (incorrect) measure. Here we have the same problem, because iterating a table reference corresponds to ignoring the additional blank row in case of an invalid relationship.

In this case too, the solution is to use VALUES; but this time, we use it around the table reference instead of as a DISTINCT replacement, because the original (incorrect) formula did not have any DISTINCT, just the Customer table reference:

Measure in Sales table

Net Cashback = 
SUMX (
    VALUES ( Customer ),
    [Sales Amount] * (1 - Customer[% Cashback])
)

The case where you must use a table reference and not use VALUES is similar to the one we described for DISTINCT, using iterators such as AVERAGEX, MINX, and MAXX. For example, a measure returning the maximum amount of sales per customer should not use VALUES, as we do in the “incorrect” version:

Measure in Sales table

Max Customer = 
MAXX (
    Customer,
    [Sales Amount]
)

Measure in Sales table

Max Customer (incorrect) = 
MAXX (
    VALUES ( Customer ),
    [Sales Amount]
)

Indeed, the version with VALUES produces an incorrect result, where the total and the amount of an empty country correspond to the sum of all the “unknown” customers. When you do not have any customer attributes in the visual, the situation is worse, as shown in the matrix by brand. The values are inflated but not immediately recognizable as inaccurate: it can happen that a brand (like Northwind Traders) shows the same value in both measures, but most of the time the incorrect measure shows a larger amount than the correct measure does.

Thus, you should not blindly use VALUES in every iterator; the choice between a table reference and VALUES with the same table reference depends on the calculation requirements. However, while VALUES is advisable as a default for the reasons described when iterating the unique values of a column, we must recognize that most existing code does not use VALUES even when it should, according to best practices. Yet, there are no negative consequences for that, because the result is the same when at least one of these conditions is true:

The expression only retrieves values using column references, and it has no measure references. Examples:
- No differences: SUMX ( table, table[column1] * table[column2] )
- Different results: SUMX ( table, table[column1] * [measure3] )
The expression does not include functions performing context transition, like CALCULATE and RELATEDTABLE. Examples:
- No differences: SUMX ( table, IF ( LEFT ( table[column1], 1 ) = “A”, table[column2], table[column3] )
- Different results: SUMX ( table1, table1[column1] * CALCULATE ( SUM ( table2[column2] ) ) )
The iterated table does not have any column on the one-side of a regular relationship. Examples:
- No differences: Sales does not have any column on the one-side of a many-to-one relationship.
- Different results: Customer has CustomerKey, which is on the one-side of the Customer–Sales one-to-many relationship.

It is not necessary to refactor existing code if you spot the absence of VALUES in a SUMX iterator, as we often have one of the cases that make VALUES unnecessary. That said, there are no additional costs associated with using VALUES when it is not needed, as long as it still provides the correct semantics for the calculation.