Recently I discovered that SSAS2005 can load a very large dimension more efficiently when it is designed as a snowflake schema rather than as a single table (star schema). I have to say that I’m a strong supporter of the star schema, but these are the facts.

For a dimension, SSAS2005 sends a SELECT DISTINCT query to the relational data source for each dimension attribute. If you have a product dimension with 2 million rows and a lot of attributes (maybe 30), this takes time and consumes SQL Server resources (CPU and RAM). But when many of these attributes are defined at the category level (imagine a category-product natural hierarchy), then in a snowflake design many of the SELECT DISTINCT queries are sent to the ProductCategories table only, without a join to the much more populated Products table.
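To make the difference concrete, here is a minimal sketch in Python with an in-memory SQLite database. It is not the SQL that SSAS2005 actually generates: the table names `DimProductStar`, `Products` and `ProductCategories`, the column names, and the 3,000-row sample (a stand-in for the 2 million products) are all assumptions made for illustration. It only shows that a category-level attribute returns the same distinct values from either design, while the snowflake query reads a far smaller table.

```python
import sqlite3

# Hypothetical schemas: all table and column names are invented for this sketch.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Star design: one wide table, category attributes repeated on every product row.
    CREATE TABLE DimProductStar (
        ProductKey   INTEGER PRIMARY KEY,
        ProductName  TEXT,
        CategoryName TEXT,
        Department   TEXT
    );
    -- Snowflake design: category attributes live in their own small table.
    CREATE TABLE ProductCategories (
        CategoryKey  INTEGER PRIMARY KEY,
        CategoryName TEXT,
        Department   TEXT
    );
    CREATE TABLE Products (
        ProductKey   INTEGER PRIMARY KEY,
        ProductName  TEXT,
        CategoryKey  INTEGER REFERENCES ProductCategories(CategoryKey)
    );
""")

categories = [(1, "Bikes", "Sports"), (2, "Helmets", "Sports"), (3, "Lamps", "Home")]
conn.executemany("INSERT INTO ProductCategories VALUES (?, ?, ?)", categories)

# 3,000 sample products spread evenly over the 3 categories.
for key in range(1, 3001):
    cat_key, cat_name, dept = categories[key % 3]
    name = f"Product {key}"
    conn.execute("INSERT INTO Products VALUES (?, ?, ?)", (key, name, cat_key))
    conn.execute("INSERT INTO DimProductStar VALUES (?, ?, ?, ?)",
                 (key, name, cat_name, dept))


def distinct_values(table, column):
    """Run one SELECT DISTINCT per attribute, as dimension processing does."""
    rows = conn.execute(f"SELECT DISTINCT {column} FROM {table}").fetchall()
    return sorted(value for (value,) in rows)


def row_count(table):
    """Number of rows the SELECT DISTINCT has to read without a helpful index."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


star_result = distinct_values("DimProductStar", "CategoryName")
snow_result = distinct_values("ProductCategories", "CategoryName")
star_rows = row_count("DimProductStar")
snow_rows = row_count("ProductCategories")

print("same values:", star_result == snow_result)
print("rows read, star vs snowflake:", star_rows, "vs", snow_rows)
```

With 30 attributes, every category-level attribute that can be resolved against the small table is one fewer scan of the large one, which is where the saving described above would come from.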

If you only consider the performance of a full cube process, the difference may not be that significant after all. But what if you update the cube incrementally and need to incrementally update the Product dimension as well, many times each day? In that case the difference could matter a lot more.

I’d like to share experiences with anyone who has done similar tests and considerations: comments are welcome!
