Using and optimizing DirectQuery in #powerbi and #ssas #tabular

More than one year ago, I and Alberto Ferrari started to work on DirectQuery, exploring the new implementation appeared in Analysis Services 2016 for the Tabular models, which uses the same engine of Power BI. The previous version of DirectQuery appeared in SSAS Tabular 2012 was very slow and suffered of performance issues also using small models. We have never seen it adopted in a production environment, it was very hard to optimize, and there were too many restrictions to the data model (for example, no MDX queries = no PivotTable in Excel as a client, and no time intelligence = complex DAX code to write even for trivial calculations).

For these reasons, when we worked on Analysis Services 2016 and Power BI, we wanted to push the new version of DirectQuery, which solved many of the limitations of the first implementation, discovering its limits and where it could be used in a real production environment. The results of our efforts is now available in a white paper published by Microsoft: DirectQuery in Analysis Services 2016. The document contains a detailed step-by-step guide to implement DirectQuery in your model, and then investigate on how DirectQuery transforms a DAX or MDX query into one or more query in SQL. If you want a quick answer to the question “should we use DirectQuery?”, then the answer is…. it depends!

You have to set your expectations to a correct level. You can use DirectQuery successfully in certain conditions (database size, frequency of refresh, capacity and performance of your RDBMS server). You certainly have to test the particular workload you want to apply to DirectQuery. The best suggestion is: try and measure it. The whitepaper will provide you many hints about what you should try and what you should expect. Specific measures we made could be different today, because there are often new release of this technology, and we can expect more improvements going forward. But you can certainly start to spend time testing it, and if you understand how it work, you can figure out what are the scenarios where you might want to adopt it.

The white paper can be used also for Power BI: the engine is the same, we will probably see many improvements very soon, and maybe that the version of the engine you are using is already much better than the ones we used writing the whitepaper. However, the basic concepts are the same, and you can see and measure the improvements by repeating the same tests we described in the document.

Out of the records, and in a very informal way, I would like to highlight that you should try DirectQuery with a correct attitude. Don’t expect the magic box, and don’t assume it will be bad, it could surprise you in both cases! Most important, think about why you might want to use this technology.

The right reasons for using DirectQuery are:

  • You need real-time queries. Processing window creates latency that you cannot accept. Good reason. Just ask to yourself if your customers are ready for real-time queries in a dashboard. You might discover they are not.
  • You have a small database to query, that changes often.
  • You have a large amount of data, too big to fit in memory, and you *don’t* need queries returning a result in less than 2 seconds.

There are also two very wrong reasons for choosing DirectQuery:

  • Saving money. I think that the in-memory Tabular model has a lower cost considering hardware+software+maintenance in order to provide the same level of performance.
  • Improving query performance. On the same hardware, it’s very hard. On a different hardware, maybe… but see the previous point about the cost. And also consider network latency (you run the RDBMS on another server, right?).

I don’t want to scare you. I just want to set the right level of expectations. With that in mind, you can implement successful projects using DirectQuery today.