Cost of Process Defrag in Analysis Services Tabular #ssas #tabular

I recently received a question about the memory required to run a Process Defrag on a Tabular model in Analysis Services. The Process Defrag is useful when you run incremental processing of a table frequently, or when some of the values in the dictionary are no longer used in the table, for example if you process the same partition multiple times in a table and/or remove partitions from a table. Cathy Dumas wrote an interesting blog post about the savings you can obtain by running process defrag.

I made some investigation and I’ve been assured that data is not completely uncompressed in this process, even if some coding/encoding happens. In a rough estimate, you need a free space equivalent to the size of the table you are going to defrag (which is already in memory) plus buffer for transient data structure, but data are managed in a compressed form, without requiring larger memory buffers for uncompressed raw data. So, if all partitions of your table and the column dictionaries require 100MB in RAM, then you need another 100MB of free RAM in order to execute a Process Defrag.

Here is a more detailed description I received from Akshai Mirchandani:

In addition to the master copies, it requires enough memory for a new dictionary, and the final compressed data + some small temporary buffer space for the transient data structures (no big buffers like normal processing).

It is essentially going to read each column value, insert it into a new dictionary, get a new DataID back from the dictionary insert, and append that DataID to the current segment. It doesn’t need to do VertiPaq (compression) again, and it doesn’t keep the uncompressed data in buffers like the data processing algorithm does.

The important point here is that the analysis of the segment data to come up with the best compression strategy no longer needs to be performed – and that’s typically the most expensive step of the compression (the VERTIPAQ_STATE in DISCOVER_STORAGE_TABLE_COLUMN_SEGMENTS shows whether this was done).

This is a good news if you are concerned with memory required to perform this operation.