Generative AI guidelines at SQLBI

These are pivotal times: the wide adoption of new technologies like generative AI systems will certainly impact how we work, along with a lot of other things. If you look at how Alberto and I answer questions about adopting AI in Power BI, you will see that we have always been skeptical about using AI to generate DAX code. Many might confuse this position with a negative attitude toward AI in general. This is not the case – quite the contrary, as a matter of fact.

I want to clarify two simple concepts, which are our guidelines for adopting AI at SQLBI:

We look forward to using AI to improve productivity: our productivity and the productivity of our readers.
Whenever we publish content generated by AI engines, we will always make that clear to our readers.

If you don’t have time to read further, that is most of what I had to say today. When we say that something doesn’t work (such as generating DAX code with AI), we mean that it doesn’t work the way it is currently implemented, that it doesn’t work with the current generation of the technology, and that it fails for specific reasons we can explain. However, sometimes we have neither the space nor the time to explain all of that. It does not mean that we believe it cannot be done in a different way, and it does not mean that we are opposed to AI.

If you would like to read more, I can share a few more considerations about our guidelines.

I want to start with our current adoption in production. We produce video courses about DAX and data modeling for the international market. While English is the language we use to write and speak, it is not our mother tongue. Some students may need help understanding our pronunciation, or may not be fully comfortable with the English language. With the current technology, hand-crafted subtitles in English enable AI-powered services to produce translated subtitles, which brings us to a total of 14 languages including English. This approach is not ideal because an automatic translation is imperfect. Still, the alternative would be to offer fewer languages in exchange for a gain in quality that would not make a massive difference in overall consumption. We started using this approach a few years ago; we recently improved our ability to fix errors in original and translated subtitles with minimal overhead, and we will continue to look for incremental improvements in this area.

We do not use AI in DAX analysis and DAX coding. The unsolved problem here is that the DAX language is almost meaningless without a data model, and the data model is often incomplete in terms of semantics because many pieces of information cannot be inferred from table and column names alone. We understand the excitement about large language models (LLMs) such as the ones running ChatGPT services, but the lack of context is an unresolved problem. I recorded an unplugged video, Writing DAX with ChatGPT-4, where I made this limitation pretty clear. As I said in the video, what this technology can do is impressive, but it does not work well in our context. With other languages, you do not have a strong dependency on an external context (like shared data structures for a programming language), which explains why ChatGPT is more successful in generating code for Python and JavaScript – just to mention two popular programming languages.

We are evaluating the adoption of AI in other stages of development of semantic models. Generating unit tests, monitoring data quality, analyzing processing logs, and evaluating the complexity of a measure or a model: these are just a few of our many ideas. While this is not a primary activity in our schedule, looking for AI services and algorithms that could help is on our radar. I expect to see a quicker return on investment (read: actual adoption) in these areas rather than in direct DAX code generation. Using AI to interact with the DAX Template engine could be a much more productive investment. Still, nobody seems interested in taking the longer but more effective route to get something that works reliably. For now.

We will use AI to improve content consumption on our websites. We are already at the proof-of-concept stage for a few features that will make it simpler to navigate the available content. Better search, tables of contents, and summarization of the existing content are a few of the ideas we are working on. The guiding principle is to “improve productivity”: with the same amount of resources (mainly time), we (and you) can do more.

We will always be transparent when using AI-generated content. We recently published an article to celebrate April Fools’ Day: Navigating the Data Ecosystem: A Revolutionary Analytics Architecture. As described the day after in Behind the scenes of an April Fools’ Day, we published an article entirely generated by ChatGPT-4. We wanted to make fun of certain casual uses of the word “modern”: doing it using the latest cutting-edge technology seemed like an irresistible opportunity. The quality of the English is so good that we had to ensure there were enough clues that it was a joke – including the signature ChatGPT-4! While the signature was also part of the joke in this case, we are serious about always making the reader aware of the type of content displayed. We will always disclose which parts of the content are generated by AI systems. If not specified otherwise, readers can assume that humans created the content. We do not want to enter an ideological debate here. It is just what we think is the right thing to do to be respectful of our readers and customers.

I hope it was not too boring. I may have written this blog post just to have a link to share when I receive another question about ChatGPT and DAX. In that case, I apologize – but I tried to warn you in the third paragraph! If you have other questions about how SQLBI uses (and will use) AI technologies, please post in the comments.

Thanks for reading!