Articles   \  

From SQL to DAX: String Comparison



In DAX string comparison requires you more attention than in SQL, for several reasons: DAX doesn’t offer the same set of features you have in SQL, a few text comparison functions in DAX are only case-sensitive and others only case-insensitive, and performance might have a relevant impact in your query according to the comparison technique you use.

If you compare two strings by using common operators (=, <>, <, >, <=, >=) the string comparison can be case-insensitive, according to the collation setting of your Analysis Services instance (for PowerPivot it depends on the workbook). However, functions like FIND and SUBSTITUTE are always case-sensitive, whereas SEARCH is always case-insensitive. FIND can be 10% faster than SEARCH just because it is case-sensitive.

One issue you might have if you come from SQL is the different operator you have to perform somewhat similar to a regular expression. In fact, the LIKE operators in SQL has a similar correspondent function in DAX: the SEARCH function has a slightly different semantic because it returns the position found instead of a True/False value. Moreover, it has different wildcard characters, as you will see later in this article.

For example, consider the following syntax in SQL:

Name LIKE '%SQLBI%'

In Tabular and PowerPivot v2 (for SQL Server 2012) you can write the following equivalent syntax:

SEARCH( "SQLBI", Table[Name], 1, 0 ) > 0

The condition above returns 0 if SQLBI is not found in the Name column of Table. The search starts at first character (the third parameter) and in case of no match you get 0 as a result (defined by the fourth parameter). If you are using PowerPivot for SQL Server 2008 R2 (or PowerPivot v1), you have to wrap SEARCH into an IFERROR because the fourth parameter have been introduced only in SQL Server 2012 release and an error is thrown in PowerPivot version 1.

IFERROR( SEARCH( "*SQLBI*", Table[Name], 1 ), 0 ) > 0

Unfortunately, the presence of IFERROR has a big impact on performance and you can observe a 4x slower execution times with this syntax. Thus, avoid IFERROR if possible.

In case you need to define a more complex filter, you have to adapt to the different wildcard characters. Instead of using % and _ you would use in the SQL LIKE operator, you have to use * and ? in the DAX SEARCH function.

For example, consider the following condition in SQL:

Name LIKE '%SQLBI%Methodology%at%work%'

The correspondent syntax in DAX is:

SEARCH( "SQLBI*Methodology*at*work", Table[Name], 1, 0 ) > 0

However, if you change the SQL condition to:

Name LIKE '%SQLBI%Methodology%at%work'

you do not have an equivalent syntax in DAX, because you cannot check that the string ends with “work” based only on the return value of the SEARCH call. Moreover, if you want to improve performance, you should avoid SEARCH whenever possible. For example, using LEFT instead of SEARCH in order to check whether a string begins with a particular text might improve performance of a 5x-10x factor. Here are a few hints in order to translate LIKE in the best pattern.


SQL DAX
Name LIKE 'SQLBI'
Table[Name] = "SQLBI"
Name LIKE 'SQLBI%'
LEFT( Table[Name], 5 ) = "SQLBI"
Name LIKE '%SQLBI'
RIGHT( Table[Name], 5 ) = "SQLBI"
Name LIKE '%SQLBI%'
SEARCH( "SQLBI", Table[Name], 1, 0 ) > 0

for PowerPivot v1:

IFERROR( SEARCH( "SQLBI", Table[Name], 1 ), 0 ) > 0
Name LIKE 'SQLBI%Methodology'
LEFT( Table[Name], 5 ) = "SQLBI"
&& RIGHT( Table[Name], 11 ) = "Methodology"

Name LIKE 'SQLBI%Methodology%'
LEFT( Table[Name], 5 ) = "SQLBI"
&& SEARCH( "Methodology", Table[Name], 1, 0 ) > 0

Name LIKE '%SQLBI%Methodology%'
SEARCH( "SQLBI*Methodology", Table[Name], 1, 0 ) > 0

In conclusion, in DAX it is better to avoid the use of SEARCH unless you really need to search a pattern using wildcards, and it is highly suggested to avoid using IFERROR for performance reasons (you pay the penalty for every error raised – using IFERROR when the errors are really rare is fine).







 
Want to read more?