Tuesday, May 31, 2011

Performance Considerations for Recursive Calculations in MDX (Part 2)

Last time I discussed MDX engine limitations in dealing with recursive calculations. Today I want to describe two more situations where users may run into bad performance when writing recursive calculations in MDX.

Pseudo Infinite Recursion

Last time I recommended that users avoid subspace overlap on the changing attribute of the recursion, which in most cases is an attribute in the Date/Time dimension. That is because the MDX calculation engine may choose a cell-by-cell execution plan when it detects sideways recursion. But performance can get a lot worse than simply executing in cell-by-cell mode if the MDX calculation engine falsely reports an infinite recursion when the recursion is actually only sideways.

When a calculation applies to two subspaces on the callstack, the MDX calculation engine checks for potential infinite recursion. When the two subspaces have overlapping regions, the engine checks whether any cell in one subspace is mapped to itself in the other subspace. This requires the engine to keep track of how each MDX expression on the callstack transforms one subspace into another. While there is extensive logic inside the MDX formula engine to analyze the most common MDX expressions, some MDX expressions are still treated as a black box by the formula engine. Moreover, if one of the subspaces on the callstack is a single-cell subspace, the engine switches to lazy execution mode for that expression and treats the calculation in that subspace as opaque as well. During infinite recursion detection, an opaque calculation on the callstack forces the engine to assume the worst and to raise an infinite recursion error even when it is not true.
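To make the conservative check concrete, here is a toy Python model of the detection logic described above. Everything in it is an assumption for illustration, not the engine's actual implementation: a subspace is modeled as the set of members it covers on the changing attribute, and each frame on the callstack carries either a known member-to-member transform or None when the expression is opaque.

```python
def detect_infinite_recursion(stack, new_subspace):
    """Toy model of conservative infinite-recursion detection.

    stack: list of (subspace, transform) frames for the same calculation.
    transform: dict mapping a member to the member it recurses into, or
    None when the engine cannot analyze the expression (opaque).
    """
    for i, (subspace, _) in enumerate(stack):
        overlap = subspace & new_subspace
        if not overlap:
            continue  # no overlapping region, nothing to check
        # The transforms along the loop from frame i down to the new frame.
        transforms = [t for _, t in stack[i:]]
        if any(t is None for t in transforms):
            # An opaque expression on the loop: assume the worst and report
            # infinite recursion, even if it is only sideways recursion.
            return True
        for member in overlap:
            m = member
            for t in transforms:
                m = t.get(m)
                if m is None:
                    break
            if m == member:
                return True  # a cell genuinely maps back to itself
    return False
```

With a fully analyzable running-sum-style transform (March depends on February, February on January), the check correctly reports no infinite recursion despite the overlap; replace the transform with an opaque expression and the same shape of callstack produces a false positive.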



Error handling is one of the most expensive operations in MDX calculation unless the error simply aborts the entire query. This is especially true in block mode, where the MDX calculation engine cannot tell whether the error happened in just one of the cells in the subspace or applies to the entire subspace. As a result, the parent evaluation node that intercepts the error abandons all intermediate results and starts over by falling back to cell-by-cell mode and recalculating cell values one by one. For the same reason, if an infinite recursion error is raised due to an overlapping region and an opaque MDX expression anywhere on the callstack, the error handling logic at the upper level forces a recomputation of cell values one at a time. This is helpful in one respect: if the infinite recursion is not real, the error will not be raised again once the changing attribute is reduced to a single member. However, the unwinding process happens one level at a time. If the recursion loop is long, the recomputations of cell values at all intermediate levels are wasted effort; see Figure 2.



The false infinite recursion error is raised repeatedly until the beginning of the recursion loop is reached. Only when the MDX calculation engine splits the subspace at the beginning of the recursion into individual cells does the false infinite recursion error stop being raised, because by then the beginning subspace has a single member on the changing attribute and the ending subspace has a different member on that attribute. To make matters worse, the MDX calculation engine has a preference for block mode. Even after a parent level is forced into cell-by-cell mode to avoid a false infinite recursion, a child level may still make the same mistake if the subspace is enlarged by some calculation to include more members on the changing attribute. If you see a steady increase in the performance counter MDX\Total Recomputes along with the creation of large numbers of evaluation nodes of various kinds, your query may have entered such a vicious cycle.
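The cost of this level-by-level unwinding can be simulated with a small Python model of a running-total calculation over months 1..n. The model is hypothetical and deliberately pessimistic: every multi-member block immediately hits the false error, abandons its intermediate results, and falls back to evaluating its cells one at a time, each of which opens another multi-member block one level down.

```python
def evaluate(months, counters):
    """Toy running total over months {1..m} under repeated false errors.

    A multi-member block raises the false infinite-recursion error; the
    parent abandons intermediate results and recomputes cell by cell.
    Nothing is cached across the error, so work at every intermediate
    level of the recursion loop is repeated.
    """
    if len(months) > 1:
        counters["errors"] += 1  # false infinite recursion raised here
        # Cell-by-cell fallback at this level only.
        return {m: evaluate([m], counters)[m] for m in sorted(months)}
    (m,) = months
    counters["recomputes"] += 1  # cf. the MDX\Total Recomputes counter
    prior = evaluate(list(range(1, m)), counters) if m > 1 else {}
    return {m: m + prior.get(m - 1, 0)}
```

For just five months, this toy model recomputes 31 cell values and raises the false error 8 times to produce 5 final cells, which is why a long recursion loop caught in this cycle can be dramatically slower than plain cell-by-cell evaluation.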



Impact on Calculation Caches

Aggressive caching of intermediate results is one of the key reasons the MDX execution engine delivers great query performance. The engine maintains a complex system of caches of various types for different purposes. When an MDX query requires heavy calculation, the caches for evaluation nodes tend to play a decisive role in query performance. An evaluation node is a subspace along with query plans built for all applicable calculations plus, optionally, data caches holding the calculation results. Unlike cached results of storage engine queries, which correspond to leaf nodes in an MDX evaluation tree, formula engine evaluation nodes can sit at a much higher level, or even be the root node of an evaluation tree. For calculation-intensive queries, hitting or missing a high-level evaluation node in the cache makes a dramatic difference in performance.
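In code terms, you can think of an evaluation node roughly as the following record. The field names are my own invention for illustration, not the engine's:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class EvaluationNode:
    """Hypothetical sketch of what an evaluation node bundles together."""
    subspace: frozenset                # the region of cube space it covers
    query_plans: tuple                 # plans for all applicable calculations
    data_cache: Optional[dict] = None  # optionally, the computed cell values
```

A cache hit on a node near the root of the evaluation tree hands back the subspace, the plans, and possibly the finished cell values all at once, which is what makes high-level hits so valuable.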

The MDX calculation engine keeps separate caches for cell-by-cell evaluation nodes and for bulk-mode evaluation nodes. In this section I am going to discuss the impact of recursive calculations on the bulk-mode evaluation node caches.

The evaluation node cache mentioned above builds hash tables and indexes to facilitate insertion and lookup of cache entries. The hash function calculates hash values based on the group-by attributes of a subspace, plus some other information that is unimportant for this discussion. The hash table uses linked lists to implement separate chaining for collision resolution. Since every evaluation node ever created is inserted into the cache, the cache becomes crowded very fast. When a linked list becomes too long, evaluation nodes are evicted from the cache, as reflected by the performance counter MDX\Number of evictions of evaluation nodes.
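Here is a toy Python sketch of such a cache: separate chaining with a per-chain eviction threshold. In the real engine the threshold is computed dynamically (between CalculationLRUMinSize and CalculationLRUMaxSize, as discussed below); this sketch simply fixes it at a small constant.

```python
class EvalNodeCache:
    """Toy cache with separate chaining and per-chain eviction.

    MAX_CHAIN stands in for the dynamically computed eviction threshold;
    all names here are illustrative, not the engine's.
    """
    MAX_CHAIN = 4

    def __init__(self, buckets=8):
        self.buckets = [[] for _ in range(buckets)]
        self.evictions = 0  # cf. MDX\Number of evictions of evaluation nodes

    def _chain(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, node):
        chain = self._chain(key)
        chain.insert(0, (key, node))     # every node created is inserted
        if len(chain) > self.MAX_CHAIN:  # chain too long: evict the oldest
            chain.pop()
            self.evictions += 1

    def lookup(self, key):
        for k, node in self._chain(key):
            if k == key:
                return node
        return None
```

Once a chain overflows, a previously cached node disappears even though nothing is wrong with it, so a later query that could have reused it misses the cache instead.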

Recursion can easily generate a large number of evaluation nodes. Starting from the original query subspace, the evaluation tree can grow quite big as the recursion goes deeper and deeper, which in turn puts heavy pressure on the evaluation node caches. To exacerbate the situation, the hash function mentioned previously skips group-by attributes in parent-child dimensions. As a result, subspaces involving many parent-child dimensions have a much higher chance of hash collision. Meanwhile, recursive calculations tend to show up in financial cubes, which usually contain a lot of parent-child dimensions. So recursion in a cube with many parent-child dimensions has a high chance of evicting cached evaluation nodes and a low chance of hitting a previously built one.
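The collision effect of skipping parent-child attributes can be illustrated with a small sketch. The real hash function is not public; this only models the consequence described above, and the attribute names are made up:

```python
def subspace_hash(group_by_attrs, parent_child_attrs):
    # Assumed behavior: attributes belonging to parent-child dimensions are
    # left out of the hash key, so subspaces that differ only on those
    # attributes land in the same bucket chain.
    regular = tuple(sorted(a for a in group_by_attrs
                           if a not in parent_child_attrs))
    return hash(regular)
```

With this behavior, distinct subspaces that differ only on parent-child attributes all hash to the same bucket, so a deep recursion in a cube full of parent-child dimensions piles its evaluation nodes onto a few chains and triggers exactly the evictions described above.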



You can increase the maximum length of the linked lists by raising the values of the private server configuration properties CalculationLRUMinSize and CalculationLRUMaxSize; the eviction threshold is calculated dynamically but always falls in the interval [CalculationLRUMinSize, CalculationLRUMaxSize]. Increasing the chain length only allows more evaluation nodes to stay in the cache. You also have to increase the value of the private server configuration property CalculationCacheRegistryMaxIterations so that those extra entries are actually examined during cache lookup.

Summary

In this blog post, we explored two more scenarios in which recursive MDX calculations can hurt query performance. The MDX calculation engine may raise false infinite recursion errors when the changing attribute has more than one member in the subspace and there is a mix of block-mode and cell-by-cell-mode evaluation nodes along the recursion path. Although not a definitive diagnosis, a steady increase in the performance counter MDX\Total Recomputes can indicate that unwarranted errors are causing the query slowdown. As recommended in my previous post, keeping a single member on the changing attribute will prevent this issue.

We often find recursive calculations in financial cubes, which also tend to have many parent-child dimensions. The combination of the two is cache unfriendly for bulk-mode evaluation nodes. Here is one way to identify this problem. Assume there is a recursive calculation based on the [Month] attribute. First issue a query to calculate the value for January. After it finishes, issue a query to calculate the value for February, then one for March, and so on. If each query in this sequence comes back fast, but a query that calculates the value for December directly from a cold cache is slow, this is an indication that your recursive calculation benefits from hitting previously cached results but the large number of evaluation nodes generated by a deep recursion is evicting those good cached results. In that case, increasing the maximum size of the hash collision chains of the evaluation node cache may help.

Most of these problems arise because block-mode evaluation still has room for improvement in infinite recursion detection and sideways recursion handling. While that may happen in a future release of Analysis Services, a potential workaround is to force pure cell-by-cell mode for all calculations. SQL Server 2000 performed calculations in cell-by-cell mode only, and many cubes designed in the SQL Server 2000 days worked well enough in that mode with acceptable and predictable query performance. If none of my other recommendations work for you and you suspect that pure cell-by-cell mode may be what you need, you can contact Microsoft Customer Support and Services to explore that possibility.