We recently discussed an issue related to the Dictionary size reported by Bravo for Power BI that, in reality, references the behavior of VertiPaq Analyzer, which relies on data management views (DMV) provided by the underlying Analysis Services engine. Because the numbers reported are correct, even if they seem wrong, I wanted to write a short blog post describing why this happens and why it is not an error.
The issue reported is the following: the dictionary size for a calculated column is always at least 1MB, also for columns that have only one or two values in the dictionary.
While it is true that the dictionary does not need 1MB of storage, 1MB of RAM has indeed been allocated to store the dictionary. Let’s go into more detail:
- When you refresh a model, the engine allocates 1MB of RAM for each column to create the dictionary.
- Before reading data, the engine does not know how big the dictionary will be, so it allocates a larger amount of RAM to reduce the number of following allocations.
- Allocating memory is computationally expensive and reducing the allocations improves the speed of the refresh operation.
- When the refresh completes, not all of the allocated memory has been consumed for the dictionary.
- However, the memory allocated in excess is not released to the operating system.
- The DMV reports the allocated memory, which corresponds to the memory currently consumed by the dictionary.
- Note: Just after a refresh, the dictionary size reported by DMV/VertiPaq Analyzer/Bravo corresponds to the memory allocated, not to the actual size of the dictionary.
- When Analysis Services restore a database from a backup, the engine only allocates the minimum amount of memory required for the dictionary.
- Indeed, the dictionary is read-only at that point and will not be modified until the next refresh operation.
- In this case, the memory allocated corresponds to the exact size of the dictionary.
- Whenever you save/close/open a Power BI file, you are actually doing a backup/detach/restore operation of a database in Analysis Services.
- Note: The memory reported for a column dictionary immediately after opening a PBIX file is usually “correct”, meaning that the reported memory corresponds to the size of the dictionary – unless something refreshes a column…
- When a calculated column depends on something that could rely on the execution environment, the engine must recalculate the calculated column as soon as it restores the database.
- For example, if a PBIX file contains a calculated column that depends on the Date table, and the Date table is a calculated table that depends on the TODAY(), the calculated table and the calculated column(s) are inevitably recomputed as soon as you open the PBIX file.
- When the calculated column is recalculated, its reported dictionary size memory is at least 1MB.
- Note: The memory reported for a calculated column dictionary might always be greater than 1MB even immediately after opening a PBIX file.
Conclusion: the dictionary size reported could be bigger than the actual size of the dictionary. However, that number reflects the actual memory allocated for the dictionary.
Returns the current date in datetime format.
TODAY ( )