Monday, June 27, 2011

Under the Covers: MDX IF Statement

It is a well-known best practice in the MDX community to avoid run-time checks by choosing SCOPE over IIF function, and for the same reason, the IF statement. But what is the actual performance impact when you have no choice but to use IIF function or IF statement? I have described in details the block mode algorithm for IIF function in one of my previous blog posts. The other day, Teo Lachev asked whether he needed to worry about performance if he used IF statement in his cube script. In particular, Teo wanted to know if there was any performance impact on other calculations which kick in when the condition of IF statement is false. In my post about IIF function, I mentioned that IF statement is internally rewritten as IIF function calls. Today I am going to add a bit more details on how the rewriting is done and what limitations can prevent the rewriting from happening. What is described here applies equally to the CONDITION clause in a CREATE CELL CALCULATION statement since there is no internal difference between the two MDX features.  

Rewrite to IIF function calls
Let’s consider a simple case where two calculations apply to the same subspace S0, as shown in Figure 1. The calculation, Calc1, wrapped in IF statement has higher priority over the other calculation, Calc2. For example, Calc1 may be at a higher calculation pass than Calc2. When comparing two scope specifications, MDX calculation engine does not take into account conditions of IF statements although they appear to be part of the scope definitions. Instead, conditions in IF statements are evaluated at run-time.

In case you wonder why Calc1 and Calc2 have their own subspaces S1 and S2 even though the evaluation node already has a subspace S0, that’s because the scope of a calculation can be at lower granularity than the subspace of an evaluation node. For example, a query may ask for results at the year level, but there is a calculation at month level. In this case, the calculation is needed to answer the query but the subspace for the evaluation node, which is the same as the subspace of the query, and the subspace of the calculation will be at different granularities.

Figure 2 shows how the IF statement is translated to IIF function calls which are evaluated at run-time. DisjointTest is an internal function that takes as input a given cell and returns true if the cell is not covered by one of the higher priority calculations. In our example, when S1 and S2 are at the same granularity, DisjointTest degenerates into NOT Condition, which returns true when the condition of IF statement returns false. When S2 has lower granularity than S1, DisjointTest first finds a cell in S1 that covers the given cell in S2 and then evaluates NOT Condition in the context of the covering cell.

So unlike IIF(Condition, Calc1, Calc2), where the Condition is evaluated in one subspace, the condition of IF statement is replicated and evaluated in all subspaces of lower priority calculations which apply to the given evaluation node. Consequently, the condition of IF statement will be evaluated in many more cells than the original subspace, especially when the subspaces of lower priority calculations are at lower granularities.

The internal rewriting to IIF does not happen in S2 when Calc2 is a semi-additive measure, a unary operator, or a storage engine query. In those cases, Calc2 is evaluated in a larger subspace not constrained by the opposite condition of the IF statement. This is typically not a problem when Calc2 is to simply fetch data from the storage engine except for really large cubes when fetching the extra data requires a lot of disk IOs. In the other two cases, S2 becomes inexact since now Calc2 is evaluated in more cells than it should. An inexact subspace increases the chance of the MDX calculation engine choosing cell-by-cell mode when it builds the calculation subtree starting from S2.