
Incremental Processing in Tabular Using Process Add



In Analysis Services 2012 you can process a table in a Tabular model in several ways: you can process the whole table, split the table into several partitions and process a single partition, merge partitions, or incrementally process a single partition by using ProcessAdd, which is the topic of this article.

The ProcessAdd command lets you add rows to an existing partition. To do that, you specify the query that reads the new rows from the data source, typically a SQL query with the necessary WHERE condition. The ProcessAdd option available in the Process Partition(s) dialog box in SQL Server Management Studio (SSMS) does not let you specify a custom query for the process operation, so you cannot use it to filter only the new rows that have to be added to the partition. However, this is not a big issue: if you need ProcessAdd, you probably need to automate it in a batch process, which calls for a programmatic approach anyway. You need an XMLA script command, and you will see how to obtain it programmatically by using AMO and PowerShell.

Describing how to define the SQL command that returns only the new rows to be added to the table in a Tabular model is out of the scope of this article. Remember that avoiding duplicate rows in the destination table is your responsibility. There is no automatic detection of duplicates: if the table does not have unique columns, you get duplicate rows in the table; if it does, the process operation stops with an error as soon as the new data violates the unique condition of a column.

Technically, when ProcessAdd runs, Analysis Services internally creates a new partition, processes it in full, and then merges it into the target partition (the one on which the ProcessAdd command has been executed). You can obtain the same result by running these operations separately, but ProcessAdd can be better optimized for this specific activity.

In the following sections you will see how to execute and automate ProcessAdd by using different tools.

ProcessAdd with XMLA Script

In order to process a table partition in Tabular, you have to issue a process command to a partition of a measure group in the corresponding Multidimensional model, which Analysis Services publishes to make the Tabular model queryable by any existing OLAP client tool. The Batch element contains a Process command, which specifies the target partition / measure group / cube / database, followed by a Bindings element that replaces the existing query binding on the partition with a different query, which is used just for this process command.

<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
    <Process>
        <Object>
            <DatabaseID>AdventureWorks Tabular Model SQL 2012</DatabaseID>
            <CubeID>Model</CubeID>
            <MeasureGroupID>Internet Sales_78de3956-70d9-429f-9857-c407f7902f1e</MeasureGroupID>
            <PartitionID>Internet Sales_797b5664-d3d8-441f-ab29-b3cc76cdc1ff</PartitionID>
        </Object>
        <Type>ProcessAdd</Type>
        <WriteBackTableCreation>UseExisting</WriteBackTableCreation>
    </Process>
    <Bindings>
        <Binding>
            <DatabaseID>AdventureWorks Tabular Model SQL 2012</DatabaseID>
            <CubeID>Model</CubeID>
            <MeasureGroupID>Internet Sales_78de3956-70d9-429f-9857-c407f7902f1e</MeasureGroupID>
            <PartitionID>Internet Sales_797b5664-d3d8-441f-ab29-b3cc76cdc1ff</PartitionID>
            <Source xsi:type="QueryBinding" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >
                <DataSourceID>d58c09be-3cc9-4688-8d08-026c407f5ad7</DataSourceID>
                <QueryDefinition>
                    SELECT * FROM FactInternetSales WHERE OrderDateKey &gt;= 20120215
                </QueryDefinition>
            </Source>
        </Binding>
    </Bindings>
</Batch>

The values in the MeasureGroupID, PartitionID and DataSourceID elements contain a GUID that the editor generated automatically when you created the data model. To find the right values, you can inspect the object properties in SQL Server Management Studio, or you can use the AMO or PowerShell approaches described later in this article, which find the right ID starting from the display name of the object.

The QueryDefinition element contains the query that is used by the ProcessAdd operation. You can use any valid SQL statement that produces the same columns required by the partition you are going to update incrementally. In the example, a SELECT statement filters all the orders with an order date greater than or equal to February 15th 2012. It is up to you to define a safe filter condition and to avoid loading the same rows multiple times. For example, if a load with the filter above includes an order dated February 16th, and the day after you filter orders greater than or equal to February 16th, that order row is loaded twice.
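One way to keep the filter safe is to remember the highest key already loaded and to request only the rows strictly above it. The following PowerShell lines are a minimal sketch of that idea, not a complete solution: the function name New-IncrementalQuery and the way the last loaded key is stored are hypothetical, and the sketch assumes that no rows arrive for an OrderDateKey after that key has been loaded.

```powershell
# Hypothetical helper: builds the ProcessAdd query from a stored high-water mark.
# Assumption: rows for a given OrderDateKey never arrive after that key has been loaded.
function New-IncrementalQuery {
    param(
        [Parameter(Mandatory = $true)][int] $LastLoadedKey
    )
    # Strictly greater than the watermark, so rows already loaded are not read again.
    "SELECT * FROM FactInternetSales WHERE OrderDateKey > $LastLoadedKey"
}

# Hypothetical value: in a real batch you would read it from your own control table or log.
$lastLoadedKey = 20120215
$query = New-IncrementalQuery -LastLoadedKey $lastLoadedKey
$query
```

The resulting string is what you would place in the QueryDefinition element. After a successful ProcessAdd you would store the highest key just loaded, so that the next run starts from there.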

ProcessAdd with Integration Services

The XMLA script described in the previous section can be generated by using the Analysis Services Processing Task component in Integration Services. Cathy Dumas described how to use this component in a step-by-step article on her blog. This is probably the simplest user interface for obtaining the XMLA script that corresponds to a ProcessAdd. After you have correctly defined the task in Integration Services, you can capture the XMLA script command by running the task while a SQL Profiler trace is active on the Analysis Services server. The TextData property of the Command Begin event class contains the XMLA batch command that you need. Getting the XMLA in this way might be useful if you just need to replace a parameter in the SQL query that is specified in the QueryDefinition element of the XMLA script.
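Replacing the query in a captured batch can be sketched in PowerShell as follows. This is only an illustration under stated assumptions: the function name Set-XmlaQueryDefinition is hypothetical, the $capturedXmla text is an abbreviated stand-in for the batch captured from the trace (the real one also contains the Process element and the object IDs), and the new filter value is made up for the example.

```powershell
# Hypothetical helper: returns a copy of a captured XMLA batch with a different SQL query.
function Set-XmlaQueryDefinition {
    param(
        [Parameter(Mandatory = $true)][string] $Xmla,
        [Parameter(Mandatory = $true)][string] $Query
    )
    $doc = New-Object System.Xml.XmlDocument
    $doc.PreserveWhitespace = $true
    $doc.LoadXml($Xmla)

    # The XMLA elements belong to the Analysis Services engine namespace.
    $ns = New-Object System.Xml.XmlNamespaceManager($doc.NameTable)
    $ns.AddNamespace("as", "http://schemas.microsoft.com/analysisservices/2003/engine")

    $node = $doc.SelectSingleNode("//as:Bindings/as:Binding/as:Source/as:QueryDefinition", $ns)
    if ($null -eq $node) { throw "QueryDefinition element not found in the XMLA batch." }

    # Setting InnerText lets the XML writer escape special characters in the query.
    $node.InnerText = $Query
    $doc.OuterXml
}

# Abbreviated stand-in for the batch captured with SQL Profiler.
$capturedXmla = @'
<Batch xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Bindings>
    <Binding>
      <Source xsi:type="QueryBinding" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <QueryDefinition>SELECT * FROM FactInternetSales WHERE OrderDateKey &gt;= 20120215</QueryDefinition>
      </Source>
    </Binding>
  </Bindings>
</Batch>
'@

$newXmla = Set-XmlaQueryDefinition -Xmla $capturedXmla `
    -Query "SELECT * FROM FactInternetSales WHERE OrderDateKey >= 20120301"
$newXmla
```

Working on the parsed document instead of doing a plain string replacement avoids having to escape the SQL text by hand. If the Analysis Services PowerShell cmdlets are installed (an assumption about your environment), the resulting text can be sent to the server with Invoke-ASCmd and its -Server and -Query parameters.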

ProcessAdd with AMO

Using Analysis Management Objects (AMO) you can generate the same ProcessAdd command you have seen in the XMLA script without worrying about internal GUIDs. You can read a description of the usage of AMO commands in a blog post written by Cathy Dumas. The XMLA script you have seen earlier in this article can be obtained and executed by using the following C# code (you can capture the XMLA script by setting the CaptureXml property of the Server object and then reading its CaptureLog property).

using Microsoft.AnalysisServices;

namespace AmoAutomation {
    class Program {
        static void Main(string[] args) {
            Server server = new Server();
            // Named instance: host and instance name are separated by a backslash
            server.Connect(@"localhost\TABULAR");
            // Objects are retrieved by display name, so no GUID is required
            Database db = server.Databases["AdventureWorks Tabular Model SQL 2012"];
            DataSourceView dsv = db.DataSourceViews.GetByName("Sandbox");
            Cube cube = db.Cubes.GetByName("Model");
            MeasureGroup measureGroup = cube.MeasureGroups.GetByName("Internet Sales");
            Partition partition = measureGroup.Partitions.GetByName("Internet Sales");
            // The query binding is used only for this ProcessAdd operation
            partition.Process(
                ProcessType.ProcessAdd,
                new QueryBinding(
                        dsv.DataSourceID,
                        "SELECT * FROM FactInternetSales WHERE OrderDateKey >= 20120215"));
            server.Disconnect();
        }
    }
}

ProcessAdd with PowerShell

Once you know how to create the desired process command with AMO, you can easily translate that code into a PowerShell script. For example, the AMO code you have seen in the previous section can be translated into the following PowerShell script:

# Load the AMO assembly (the returned assembly information is discarded)
[Reflection.Assembly]::LoadWithPartialName("Microsoft.AnalysisServices") | Out-Null
$server = New-Object Microsoft.AnalysisServices.Server
# Named instance: host and instance name are separated by a backslash
$server.Connect("localhost\K12")
# Objects are retrieved by display name, so no GUID is required
$db = $server.Databases.Item("AdventureWorks Tabular Model SQL 2012")
$dsv = $db.DataSourceViews.GetByName("Sandbox")
$cube = $db.Cubes.GetByName("Model")
$measureGroup = $cube.MeasureGroups.GetByName("Internet Sales")
$partition = $measureGroup.Partitions.GetByName("Internet Sales")
# The query binding is used only for this ProcessAdd operation
$queryBinding = New-Object Microsoft.AnalysisServices.QueryBinding( $dsv.DataSourceID, "SELECT * FROM FactInternetSales WHERE OrderDateKey >= 20120215" )
$partition.Process( "ProcessAdd", $queryBinding )
$server.Disconnect()
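If you want the XMLA text instead of an immediate execution (for example, to review it or to run it later), AMO can record the commands instead of sending them to the server. The following is a minimal sketch, assuming the same object names used in the script above. Get-ProcessAddXmla is a hypothetical function name, the server name is left as a parameter, and the exact shape of the captured text depends on the AMO version, so treat the output as something to inspect and not as a guaranteed format.

```powershell
# Hypothetical helper: records the ProcessAdd command as XMLA instead of executing it.
function Get-ProcessAddXmla {
    param(
        [Parameter(Mandatory = $true)][string] $ServerName,
        [Parameter(Mandatory = $true)][string] $Query,
        [string] $DatabaseName = "AdventureWorks Tabular Model SQL 2012"
    )
    [Reflection.Assembly]::LoadWithPartialName("Microsoft.AnalysisServices") | Out-Null
    $server = New-Object Microsoft.AnalysisServices.Server
    $server.Connect($ServerName)
    try {
        $db = $server.Databases.GetByName($DatabaseName)
        $dsv = $db.DataSourceViews.GetByName("Sandbox")
        $cube = $db.Cubes.GetByName("Model")
        $measureGroup = $cube.MeasureGroups.GetByName("Internet Sales")
        $partition = $measureGroup.Partitions.GetByName("Internet Sales")
        $queryBinding = New-Object Microsoft.AnalysisServices.QueryBinding($dsv.DataSourceID, $Query)

        # While CaptureXml is true, AMO stores the commands in CaptureLog and does not run them.
        $server.CaptureXml = $true
        $partition.Process("ProcessAdd", $queryBinding)
        $server.CaptureXml = $false

        # Copy the captured strings before the connection is closed.
        $captured = @($server.CaptureLog)
    }
    finally {
        $server.Disconnect()
    }
    $captured
}
```

You would call it with your own instance name and query, for example Get-ProcessAddXmla -ServerName $myInstance -Query $query, where both variables are yours to define. Nothing is processed on the server while the capture is active, which makes this a safe way to look at the generated command.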

Conclusion

Incremental processing of tables in a Tabular model is possible, but it requires using commands on Multidimensional entities, because an API over the real Tabular model is not currently available. This article showed several ways to execute an incremental ProcessAdd command on a table in a Tabular model, providing a temporary query binding that identifies only the rows to load in the ProcessAdd batch, without changing the underlying Tabular structure or the views in SQL Server. You can use the same pattern, choosing the technique that best fits your periodic batch process.











 