String Comparison in #DAX

In DAX you don’t have the LIKE operator and you have to use SEARCH instead. However, performance are not very good and it is better to use LEFT and RIGHT if you just need to compare the initial (or ending) match of a string.

I just wrote an article about string comparison in DAX. During my exploration of LIKE replacement functions in DAX, I have found that documentation of SEARCH for PowerPivot v1 is wrong – in DAX the SEARCH function is always case-insensitive, whereas FIND is always case-sensitive. Moreover, in PowerPivot v1 you had to use IFERROR to catch the string not found condition. This was a big issue in performance and in PowerPivot v2 (and in BISM Tabular) the new SEARCH and FIND functions have a fourth parameter that specify the value that should be returned when a match is not found. Using FIND seems to be 10% faster than SEARCH, just because it is case-insensitive.

I’d like to thank Marius Dumitru for the feedback he provided me in this analysis. Here are a few best practices he also suggest:

  • Always avoid IFERROR and ISERROR (everywhere, not just for string functions).
  • Use LEFT instead of FIND/SEARCH for expressing “starts with” conditional expressions on strings
  • Use FIND instead of SEARCH if your comparison is (or can be) case-sensitive.
  • When FIND or SEARCH are needed, use their new flavors taking a 4th parameter (added in SQL11)

If you are curious to see a few performance numbers, at the end of the post I includes the timing for a few query over a Tabular model based on an IISLog table. I included a best-practice table conversion in my article. If you think that a regular expression function would be useful in DAX, you can vote this suggestion on Connect.

— Each DAX query is preceded by a comment with total execution time

— The original condition over IISLog[cIp] column is this one in SQL:

   WHERE cIp NOT LIKE ‘217.57.131.11%’

     AND cIp NOT LIKE ‘11.22.33.3%’

     AND cIp NOT LIKE ‘127.0.0.1’

     AND cIp NOT LIKE ‘192.168.%’

— Other conditions tested in a few queries are:

     AND csUriStem LIKE ‘%SQLBI%Methodology%at%work%pdf’

     AND csUserAgent NOT LIKE ‘%bot%’

 

— 28 seconds

EVALUATE

CALCULATETABLE(

    ROW( “Rows”, DISTINCTCOUNT( IISLog[cIp] ) ),

    FILTER( ALL(IISLog[csUserAgent]),

            SEARCH( “bot”, IISLog[csUserAgent], 1, 0 ) = 0

    ),

    FILTER( ALL(IISLog[csUriStem]),

            SEARCH( “SQLBI*Methodology*at*work*pdf”, IISLog[csUriStem], 1, 0 ) <> 0

    ),

    FILTER( ALL(IISLog[cIp]),

      [cIp] <> “127.0.0.1”

      && LEFT( [cIp], 12 ) <> “11.22.33.3”

      && LEFT( [cIp], 8 ) <> “192.168.”

      && LEFT( [cIp], 13 ) <> “217.57.131.11”

    )

)

 

 

— 15 seconds

EVALUATE

CALCULATETABLE(

    ROW( “Rows”, DISTINCTCOUNT( IISLog[cIp] ) ),

    FILTER( ALL(IISLog[csUserAgent]),

            SEARCH( “bot”, IISLog[csUserAgent], 1, 0 ) = 0

    )

)

 

— 11 seconds

EVALUATE

CALCULATETABLE(

    ROW( “Rows”, DISTINCTCOUNT( IISLog[cIp] ) ),

    FILTER( ALL(IISLog[csUriStem]),

            SEARCH( “SQLBI*Methodology*at*work*pdf”, IISLog[csUriStem], 1, 0 ) <> 0

    )

)

 

 

— 9 seconds

EVALUATE

CALCULATETABLE(

    ROW( “Rows”, DISTINCTCOUNT( IISLog[cIp] ) ),

    FILTER( ALL(IISLog[csUriStem]),

            SEARCH( “SQLBI”, IISLog[csUriStem], 1, 0 ) <> 0

    )

)

 

 

— 54 seconds

EVALUATE

CALCULATETABLE(

    ROW( “Rows”, DISTINCTCOUNT( IISLog[cIp] ) ),

    FILTER( ALL(IISLog[cIp]),

            IFERROR( SEARCH( “217.57.131.11”, IISLog[cIp] ), -1 ) < 0

            && IFERROR( SEARCH( “11.22.33.3”, IISLog[cIp] ), -1 ) < 0

            && IFERROR( SEARCH( “127.0.0.1”, IISLog[cIp] ), -1 ) < 0

            && IFERROR( SEARCH( “192.168.”, IISLog[cIp] ), -1 ) < 0

    )

)

 

— 16 seconds

EVALUATE

CALCULATETABLE(

    ROW( “Rows”, DISTINCTCOUNT( IISLog[cIp] ) ),

    FILTER( ALL(IISLog[cIp]),

            SEARCH( “217.57.131.11”, IISLog[cIp], 1, 0 ) = 0

            && SEARCH( “11.22.33.3”, IISLog[cIp], 1, 0 ) = 0

            && SEARCH( “127.0.0.1”, IISLog[cIp], 1, 0 ) = 0

            && SEARCH( “192.168.”, IISLog[cIp], 1, 0 ) = 0

    )

)

 

 

— 14 seconds

EVALUATE

CALCULATETABLE(

    ROW( “Rows”, DISTINCTCOUNT( IISLog[cIp] ) ),

    FILTER( ALL(IISLog[cIp]),

            FIND( “217.57.131.11”, IISLog[cIp], 1, 0 ) = 0

            && FIND( “11.22.33.3”, IISLog[cIp], 1, 0 ) = 0

            && FIND( “127.0.0.1”, IISLog[cIp], 1, 0 ) = 0

            && FIND( “192.168.”, IISLog[cIp], 1, 0 ) = 0

    )

)

 

 

— 12 seconds

EVALUATE ROW( “result”,

  COUNTROWS(

    FILTER( DISTINCT(IISLog[cIp]),

            [cIp] <> “127.0.0.1”

      && SEARCH( “11.22.33.3”, [cIp], 1, 0 ) <> 1

      && SEARCH( “192.168.”, [cIp], 1, 0 ) <> 1

      && SEARCH( “217.57.131.11”, [cIp], 1, 0 ) <> 1

    )

  )

)

 

 

— 12 seconds

EVALUATE

CALCULATETABLE(

    ROW( “Rows”, DISTINCTCOUNT( IISLog[cIp] ) ),

    FILTER( ALL(IISLog[cIp]),

            [cIp] <> “127.0.0.1”

      && SEARCH( “11.22.33.3”, [cIp], 1, 0 ) <> 1

      && SEARCH( “192.168.”, [cIp], 1, 0 ) <> 1

      && SEARCH( “217.57.131.11”, [cIp], 1, 0 ) <> 1

    )

)

 

— 2 seconds

EVALUATE

CALCULATETABLE(

    ROW( “Rows”, DISTINCTCOUNT( IISLog[cIp] ) ),

    FILTER( ALL(IISLog[cIp]),

      [cIp] <> “127.0.0.1”

      && LEFT( [cIp], 12 ) <> “11.22.33.3”

      && LEFT( [cIp], 8 ) <> “192.168.”

      && LEFT( [cIp], 13 ) <> “217.57.131.11”

    )

)