Why data modeling is important in Power BI

I have long experience working on reporting systems based on semantic models. Today we call them “self-service BI”, but the principles are the same as those we had in DSS (decision support systems), OLAP (online analytical processing), and the many other names we have seen over the years.

The basic idea has always been the same: enable users to navigate data without writing a new query every time. To do that, the data model is described with metadata that enrich the simple collection of tables and relationships, so that a generic client tool can offer a customized experience, as if it were designed for a specific business need.

Creating a good data model specific to business analysis is important because its goal differs from that of a database running a transactional application. If you want to record sales, you need a database that is optimized to collect data in a safe and accurate way. If you want to analyze sales, you need a data model that is optimized to query and aggregate large volumes of data. These two requirements differ in ways that go well beyond simple performance.

A normalized model designed for a transactional application can be challenging to query. How should you interpret missing data or a null value? A data model optimized for analysis is simpler by nature, because the data have been transformed so that they are unambiguous.

Power BI shares the same data modeling experience as Power Pivot and Analysis Services Tabular. The easiest way to create a data model is to get the data “as is” from a data source, define relationships, and then start designing reports. However, this approach raises several issues when you have too many tables, or tables at different granularities representing the same entity from different data sources. Power Query and the M language exist to solve these problems. If you are lucky enough to read data from a well-designed data mart that follows star schema principles, you don’t need any transformation. In all other cases, you should probably massage your tables before creating your reports.

The question is: what is the right data model for business analysis?
The first simple answer is: the model should be designed to answer business questions, not simply to represent the data as they come from the data source.
Admittedly, this is not a real answer, because it provides no practical guidance.
A better hint is: one entity, one table. This is a “user-friendly” way to describe a star schema, where each set of attributes describing a business entity is collected in a table called a “dimension”, and each set of numeric metrics describing events at the same granularity is stored in a table called a “fact table”. But “one entity, one table” is simpler to understand.

A product is a table. Category, color, price, manufacturer: these are attributes. There is no added value in creating many relationships just to describe attributes of the same entity. Simply collapse all these attributes into a single Product table.
A customer is a table. Country, city, address, and number of employees are just attributes of the customer. There is no point in creating a separate table of countries.
That is, unless the country is a business entity in your analysis. If you are doing demographic research, chances are that the country is a business entity and not just an attribute of a customer.
An invoice is a table. Invoice date, shipping cost, products sold, quantities… all these attributes should be in a single table with a uniform granularity. Only when this is not possible should you start considering several tables at different granularities.
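To make “one entity, one table” concrete, here is a minimal Python sketch of the kind of transformation you would typically perform in Power Query before loading the model. It assumes a hypothetical snowflaked source where Product points to Subcategory, which points to Category, and an invoice stored as a header plus lines; all table names, column names, and values are invented for illustration. The sketch collapses the product chain into a single Product table and copies the invoice header attributes onto each line, so that the resulting fact table has a single granularity.

```python
# Hypothetical snowflaked source: Product -> Subcategory -> Category.
categories = {1: {"Category": "Bikes"}, 2: {"Category": "Accessories"}}
subcategories = {
    10: {"Subcategory": "Road Bikes", "CategoryKey": 1},
    20: {"Subcategory": "Helmets", "CategoryKey": 2},
}
products = [
    {"ProductKey": 100, "Product": "Road-150", "Color": "Red", "SubcategoryKey": 10},
    {"ProductKey": 200, "Product": "Sport-100", "Color": "Black", "SubcategoryKey": 20},
]

# Hypothetical invoice stored as header + lines (two different granularities).
invoice_headers = {
    5001: {"InvoiceDate": "2017-10-02", "CustomerKey": 7},
}
invoice_lines = [
    {"InvoiceKey": 5001, "ProductKey": 100, "Quantity": 1, "Amount": 3578.27},
    {"InvoiceKey": 5001, "ProductKey": 200, "Quantity": 2, "Amount": 69.98},
]


def flatten_product(products, subcategories, categories):
    """Collapse the Product -> Subcategory -> Category chain into one table."""
    result = []
    for p in products:
        sub = subcategories[p["SubcategoryKey"]]
        cat = categories[sub["CategoryKey"]]
        # Keep only the business attributes; the snowflake keys disappear.
        result.append({
            "ProductKey": p["ProductKey"],
            "Product": p["Product"],
            "Color": p["Color"],
            "Subcategory": sub["Subcategory"],
            "Category": cat["Category"],
        })
    return result


def build_sales_fact(headers, lines):
    """Copy header attributes onto each line so the fact table has one granularity."""
    fact = []
    for line in lines:
        header = headers[line["InvoiceKey"]]
        fact.append({**line, **header})
    return fact


product_table = flatten_product(products, subcategories, categories)
sales_table = build_sales_fact(invoice_headers, invoice_lines)

for row in product_table:
    print(row)
for row in sales_table:
    print(row)
```

After this step, the model would contain one Product table related to one Sales table, and every product attribute is available for slicing without traversing extra relationships. Header-level amounts such as shipping cost cannot simply be copied onto each line, because that would multiply them; they need an allocation rule, which is a business decision this sketch does not cover.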

I have worked with DAX since its very first version in Power Pivot, and I have watched the adoption of this language grow. I know that many problems people try to solve in DAX would find a better, simpler, and faster solution with a proper data model. Data modeling is important, indeed.

You can start your journey in data modeling by reading the book Analyzing Data with Power BI and Power Pivot for Excel. It is a book about data modeling. The reason these two words are not in the title is that most users who need data modeling skills are not aware of it. You may call it marketing; we call it “helping users”.

You can also spend October 30, 2017, in Seattle at the Data Modeling with Power BI preconference day. Yes, the title reflects the content: there is no need to “simplify” the message. If you have read this far, you know what you are looking for!