Is a BI methodology truly independent from the underlying technology?

This post is part of a Methodology discussion – other posts will follows. I will be happy to get your feedback!

Building a BI solution (like any kind of software project), it is normal to look for existing methodologies and best practices, just to avoid pitfalls and get better results. Usually, a methodology is not related to a particular technology and/or software product. However, considering some details and/or features that will be available in your solution might affect important decisions that will be made. If we apply these concepts to a BI solution, we might discover that features of a particular product might have a wide impact in the overall architecture.

In this post, we want to consider the impact that the adoption of the SQL Server BI stack (and in particular of Analysis Services) can have on a methodology.

The first part of a modern BI solution is typically a data warehouse. Even if you adopt Analysis Services, usually you will have a relational star schema as a data source. This star schema will be part of a Data Mart extracted from the Data Warehouse, or sometime will be part of the Data Warehouse itself. If we talk about relational star schemas, and more in general if we talk about Data Warehouse modeling, there are well-known best practices to follow. Nevertheless, these papers and books are generally not tied to a particular platform; the only requirement is the use of a relational database. The model you will create will be not affected by the product of a particular vendor you will use as a DBMS. If the user is going to write its own queries against the Data Warehouse, everything is ok.

Now, when Analysis Services comes into the game, our previous choices might affect our final result. Analysis Services has several modeling capabilities that might change the relational Data Mart design we would have been thought otherwise. Just to name a few of these features:

  • Many-to-Many Dimension Relationships
  • Reference Dimensions
  • Parent-Child Dimensions
  • Join to dimension at different level of granularity
  • Data Source View (which allows the definition of views outside from the database)
  • MDX Scripts

At this point, several issues arise. Are the reference dimension a signal that snowflake schema might be better than star schema? Are the parent-child dimensions a way to avoid the construction of a fixed number of hierarchy levels corresponding to the maximum deep of the existing hierarchies? Should we prefer named queries in Data Source View instead of creating regular (and centralized) views into the relational database? Can the power of MDX Scripts substitute some part of the ETL processing?

Answering yes or not to the each of the previous questions might affect our architecture, and in particular the relational design (but also ETL). We didn’t mention the many-to-many relationships. When used in a simple way, they reflect the existing many-to-many dimension relationships existing in a regular Kimball design. However, as shown in the “The Many-to-Many Revolution” paper, we might create very atypical relational schema, just to satisfy our multidimensional modeling needs.

Thus, we need to consider how these changes affect our methodology of choice. If we want to leverage on features of a particular product, we will need to make several exceptions to a standard methodology. Not having a guide for these exceptions often brings to inconsistent results, where several people of the team (or, sometime, the same people over time) use different techniques to implement something that is not well described in the original methodology.

Moreover, if you extend these considerations to the client used to navigate OLAP cubes, there are more substantial differences. These differences impact both the user experience in terms of speed and ease of use and the format of the query sent to the server, forcing you to adapt the cube structure to the specific client.

For these reasons, in the last years we defined a set of rules, patterns and best practices that forms a specific methodology to implement a BI solution with the Microsoft SQL Server BI stack of services. We named it “SQLBI Methodology” and we will publish within September 2008 on the SQLBI web site.