Optimize Table and Column Size in #PowerPivot and #BISM #Tabular #SSAS #VertiPaq

When you create a PowerPivot workbook or a BISM Tabular model (new in Analysis Services 2012), you store data by using the VertiPaq engine, which is an in-memory columnar database. Instead of considering the row of a table as the main unit of storage, it considers every column as a separate entity and stores data for every column in a separate way. This makes it very fast to query data for a single column, but requires a higher computational effort in order to retrieve data for several columns of a single row.

VertiPaq is very fast and stores data in a highly compressed way. This is possible also thanks to its particular storage architecture. Every table you process is divided in columns, and every column has its own storage. For every column, a dictionary of all the distinct values is created, so that the real storage for that column will be a bitmap index that references the dictionary. Both dictionary and bitmap index are then highly compressed and are stored in a compressed way on both RAM and disk. The disk is used only as a sort of backup of data and all the queries are made by loading all of the column data in memory.

Exactly two years ago I wrote a blog post about how to optimize memory usage in PowerPivot and those suggestions where still valid today in any VertiPaq implementation:

Reduce the number of columns
Avoid high-cardinality columns
Reduce precision of some numbers (the goal is reducing the number of distinct values)

Considering the possible ways to optimize a table in VertiPaq, the best approach is locating the largest column and trying to optimize it if you cannot remove it. It’s interesting to discover that possible optimizations require a completely counterintuitive approach, that would increase table size in a SQL Server table but can reduce space up to 90% in VertiPaq. The basic idea is that you can save this space by splitting the column in two or more columns with a smaller number of distinct values, and I described that in the new article Optimizing High Cardinality Columns in VertiPaq. I’d like to get feedback using this technique, especially on possible unexpected side effects!

In the meantime, I’d like to highlight the next upcoming events in which I’ll talk about PowerPivot and BISM Tabular (hint: even if you don’t understand Italian language, take a look at the conference web site and let me know if you don’t like it!):