Introduction to some FE concepts
Before I delve into IIF function, I first need to introduce some basic concepts used by MDX FE. I admit that these concepts can be very complex and are worth several dedicated blog posts. The cursory description here can only give you some general ideas to help you understand execution plans for IIF. You can skip this section directly to the discussion about IIF plans and come back here for references when you need to.
- Rectangular subspace vs. arbitrary-shaped subspace
- Cell-by-cell mode vs. block mode
- Logical plan
- Varying attributes
- Sparse expression vs. dense expression
- Default value
Execution plans for IIF function
Now we are ready to explore the execution plans of IIF. Implementation of IIF underwent a major overhaul in SQL Server 2008. In SQL Server 2005, there is limited support of block mode for most MDX functions, therefore IIF branches are more often evaluated in cell-by-cell mode. When SQL Server 2005 is able to evaluate IIF in block mode, it evaluates both branches in the original subspace with some special flag to indicate that the subspace may be too large, see eager mode below.
In this post I focus on the algorithm adopted by SQL Server 2008 and above. As a first step, FE transforms a single IIF function call into nested IIF function calls by eliminating AND operators and OR operators in the predicate expression. Notice that one branch expression is duplicated as a result of this rewrite.
IIF(a AND b, x, y) => IIF(a, IIF(b, x, y), y)
IIF(a OR b, x, y) => IIF(a, x, IIF(b, x, y))
Next, FE builds a logical plan for the simplified predicate expression. Note that FE may execute some subtrees while building a logical plan tree, this is a big topic which I have to discuss in a seperate blog post. After FE has built a logical plan tree for the predicate expression, FE performs constant folding by checking whether the simplified predicate is constant in the current subspace, and if so, reducing IIF function to one of its branch expressions.
The next step is to build logical plans for the then branch expression and the else branch expression. FE can evaluate each branch expression in one of two modes: eager mode and strict mode. As mentioned before, the IIF expression is evaluated in a subspace, let's denote it as S0. FE can either evaluate the then branch expression in S0 or FE can build a new subspace, S1, by restricting S0 to those cells that the predicate is true. Similarly, FE can either evaluate the else branch expression in S0 or FE can build another new subspace, S2, by restricting S0 to those cells that the predicate is false. If FE evaluates a branch expression in the original subspace S0, we call it in eager mode. If FE evaluates a branch expression in the restricted subspace S1 or S2, we call it in strict mode.
Strict mode has the obvious benefit of reducing the number of cells an expression is evaluated, so why doesn't FE choose it all the time? As it turns out, constructing a new subspace by evaluating the predicate will produce an arbitrary-shaped subspace if the varying attributes of the predicate expression contain more than one attribute. In general, arbitrary-shaped subspace can lead to bad performance due to various reasons such as cache misses. Moreover, if a branch expression is simple enough, like when it is a constant value, it doesn't make sense to build a new subspace at all.
MDX gives user direct control over whether a branch is executed in eager mode or strict mode through branch hints. For example, IIF(p, x HINT EAGER, y HINT STRICT) tells FE to build logical plan in eager mode for the then branch and in strict mode for the else branch.
When there is no direct hint from user, FE takes several steps to come up with a final logical plan for each branch expression. It first tries to build a logical plan for each branch in eager mode if it is cheap. It is always cheap to build a plan for a constant value expression. When branch expressions are more complex, FE looks at the varying attributes of the predicate logical plan. If the varying attributes contains more than one attribute, hence will produce an arbitrary-shaped subspace which makes an eager plan more appealing, FE first tries to build an eager logical plan tree for a branch expression if it can be cheaply done, in other words, if it can build a plan tree without executing any subtree. In the cheap mode, if at any point in the process of building a logical plan tree for a branch expression, FE finds out that it needs to execute a subtree before it can proceed, FE will abandon the process and roll back all the way to the root of the current tree. I have seen bad performance caused by this trial-and-error process when a deeply nested IIF tree recursively tried to build a cheap plan, aborted the effort, built a real plan, and then repeated this process at the next nesting level. FE may abandon a cheaply built plan when the predicate expression is sparse and a branch with a cheaply built eager plan is on the minority side. For example, if the predicate is mostly true, the else branch is on the minority side. If a branch is on the minority side and if it is dense or if its default value doesn't match the default value of the IIF expression, its cheaply built eager plan will be abandoned.
At this stage, if FE still doesn't have a logical plan for a branch expression, FE will build a logical plan for that branch in strict mode. It first produces a new subspace for the branch by actually executing the predicate expression, producing a materialized list of tuples of the varying attributes of the predicate that satisfy the condition for that branch, and then applying the list of tuples to the original subspace as a new constraint. FE then generates a logical plan of the branch expression in the new subspace.
As you can see, MDX FE employes a rule-based approach when building execution plans for IIF function. FE makes decisions based on sparsity and complexity of sub-expressons. Since sparisity of MDX expression is based on heuristics rather than statistics, FE may make the wrong decision which is why we need user hints.
IIF functions are often deeply nested, although this may not be obvious at first glance. As mentioned earlier in the post, AND and OR operators in IIF predicate are rewritten into nested IIF calls. Searched CASE statement is internally rewritten as nested IIF calls as well. Moreover, MDX script IF statement is also internally mapped to an IIF function call. All these internal tree rewrites make deeply nested IIF trees a common phenomenon in many cube.
Although MDX FE employes a complex algorithm to build a final plan for the IIF function, we can still learn some basic lessons that may help us understand performance issues associated with the function.
- Strict mode helps when a branch expression is very expensive. It can be a lot of extra data has to be read from disk for those extra cells in eager mode. It can also be a very expensive MDX scoped calculation has to be included for one unwanted cell in eager mode. Try plan HINT STRICT if you think AS fetches too much data from disk as a result of evaluating a branch in eager mode.
- All the trial-and-errors when building the eager plan can be a source of bad performance when the same IIF function is evaluated many times, typically since the parent function chooses cell-by-cell mode, or when deeply nested IIF calls recursively try eager mode and then fall back to strict mode. Adding plan hints in this case will force FE to make up its mind right away.
- Order of predicates in nested IIF calls can be important. A wrong condition at the top level may produce an expensive arbitrary-shaped subspace which makes it expensive to evaluate a child subtree. Or a right condition at the top may save a lot of time by eliminating the need to evaluate a more expensive condition at lower level. Keep in mind that AND and OR operators are rewritten as nested IIF so the order in AND and OR operators is important for the same reason.
- Sometimes cell-by-cell mode can be fast especially when a top level function is in cell-by-cell mode. I have seen cases where MDX FE spends a lot of time building plans for a branch expression but never execute the plans as the IIF condition was never met. When the IIF function itself is called many times since its parent is in cell-by-cell mode, all the time FE spent on building unused plans is wasted. Here you may try an undocumented HINT LAZY which applies to all MDX expressions but only takes effect when a server config flag LazyEnabled is set to 1. Using this hint will force FE to delay evaluating the MDX expression until execution time. No logical plan is built by traversing the expression tree. WARNING: You will not get support from Microsoft if you use this hint on your own. You must get approval from Microsoft Customer Service and Support if you want to use this hint in your production system.
There are two types of CASE statements: simple case statement and search case statement.
Search case statement is internally rewritten as nested IIF function calls in the following way:
WHEN p1 THEN e1
WHEN p2 THEN e2
WHEN pN THEN eN
IIF (p1, e1, IIF(p2, e2, ... IIF(pN, eN, eL)...))
After the rewrite, FE executes all IIF functions according to the algorithm outlined in the previous section.
SQL Server 2008 SP2 CTP1 introduced a hint for search case statement. The syntax of hint for CASE is:
CASE HINT <EAGER|STRICT>
WHEN ... THEN ...
Internally, the provided hint is propagated to all branches of all IIF functions generated during the rewrite. Afterwards, FE will use the hint in the same way as hints provided directly to an explicit IIF function.
The simple case statement has an execution plan similar to that of IIF function. The main difference is that IIF function only has two branches while simple case statement has N branches. As of this writing, users cannot specify eager or strict hints to simple case statements.
In summary, since MDX IIF function and CASE statement are frequently a source of performance problems, using plan hints may help you solve your problem. Note that plan hints for IIF function are only available in SQL Server 2008 and later and plan hints for search case statement are introduced after SQL Server 2008 SP2 CTP1. I have to admit trying out plan hints is like shooting in the dark due to the lack of end user tools to display MDX execution plan trees.