Wednesday, April 27, 2011

Performance Considerations for Recursive Calculations in MDX

The other day, while investigating a customer performance problem, Chris Webb came up with a sequence of very simple and targeted MDX queries against SQL Server 2008 Adventure Works database that clearly illustrate some of the pitfalls people may encounter when they write recursive MDX formulas. Today we explore how MDX formula engine generates execution plans for those queries and discuss some engine limitations in this area that may have big impact on query performance.

Chris Webb’s Queries

Chris defined a test measure, [rtest], that recursively counts the number of days from the current date back to 07/01/2001. His first query calculates [rtest] against all customers with a slice on date 07/31/2001. The query finished in about one second to return 118,484 cells.

Query #1

WITH MEMBER MEASURES.rtest AS
  IIF (
    [Date].[Date].CURRENTMEMBER IS [Date].[Date].&[20010701]
    , 1
    , 1 + (MEASURES.rtest, [Date].[Date].CURRENTMEMBER.PREVMEMBER)
  )
SELECT
  MEASURES.rtest ON 0,
  [Customer].[Customer].[Customer].MEMBERS ON 1
FROM [Adventure Works]
WHERE ([Date].[Calendar].[Date].&[20010731])

Next Chris trimmed down the number of customers to be slightly less than 50% of all customers and ran the query again. The query finished a little bit faster than the first one as expected.

Query #2

WITH MEMBER MEASURES.rtest AS
  IIF (
    [Date].[Date].CURRENTMEMBER IS [Date].[Date].&[20010701]
    , 1
    , 1 + (MEASURES.rtest, [Date].[Date].CURRENTMEMBER.PREVMEMBER)
  )
SELECT
  MEASURES.rtest ON 0,
  HEAD ([Customer].[Customer].[Customer].MEMBERS, 9241) ON 1
FROM [Adventure Works]
WHERE ([Date].[Calendar].[Date].&[20010731])

But adding one more customer to the second query suddenly increased query execution time to 20 seconds. The obvious question is what is so significant about querying 50% or more of total customers.

Query #3

WITH MEMBER MEASURES.rtest AS
  IIF (
    [Date].[Date].CURRENTMEMBER IS [Date].[Date].&[20010701]
    , 1
    , 1 + (MEASURES.rtest, [Date].[Date].CURRENTMEMBER.PREVMEMBER)
  )
SELECT
  MEASURES.rtest ON 0,
  HEAD ([Customer].[Customer].[Customer].MEMBERS, 9242) ON 1
FROM [Adventure Works]
WHERE ([Date].[Calendar].[Date].&[20010731])

Chris then tried to remedy the situation by cleverly adding a helper measure, [rtest2], that pre-calculates [rtest] in the range of desired dates in the hope that calculating values of later dates can hit caches of values of earlier dates. I omitted several intermediate queries that led Chris to this idea. When he tried out his idea against the problem query, execution time was cut down to about 12 seconds, an improvement from 20 seconds.

Query #4

WITH MEMBER MEASURES.rtest AS
  IIF (
    [Date].[Date].CURRENTMEMBER IS [Date].[Date].&[20010701]
    , 1
    , 1 + (MEASURES.rtest, [Date].[Date].CURRENTMEMBER.PREVMEMBER)
  )
MEMBER MEASURES.rtest2 AS
  IIF (
    ISEMPTY(
      SUM(
        [Date].[Date].&[20010701] : [Date].[Date].CURRENTMEMBER,
        MEASURES.rtest
      )
    )
    , null
    , MEASURES.rtest
  )
SELECT
  MEASURES.rtest2 ON 0,
  HEAD ([Customer].[Customer].[Customer].MEMBERS, 9242) ON 1
FROM [Adventure Works]
WHERE ([Date].[Calendar].[Date].&[20010731])

But the same trick didn’t help the original query, which now takes about 38 seconds to finish.

Query #5
WITH MEMBER MEASURES.rtest AS
  IIF (
    [Date].[Date].CURRENTMEMBER IS [Date].[Date].&[20010701]
    , 1
    , 1 + (MEASURES.rtest, [Date].[Date].CURRENTMEMBER.PREVMEMBER)
  )
MEMBER MEASURES.rtest2 AS
  IIF (
    ISEMPTY(
      SUM(
        [Date].[Date].&[20010701] : [Date].[Date].CURRENTMEMBER,
        MEASURES.rtest
      )
    )
    , null
    , MEASURES.rtest
  )
SELECT
  MEASURES.rtest2 ON 0,
  [Customer].[Customer].[Customer].MEMBERS ON 1
FROM [Adventure Works]
WHERE ([Date].[Calendar].[Date].&[20010731])

Block Mode vs. Cell-by-Cell Mode

As it turned out, queries #1 and #2 run fast because they execute in block mode. On the other hand, queries #3, #4, and #5 all execute in cell-by-cell mode. But how do I know that? Well, I admit that Analysis Services product team owes the MDX community good diagnostic features, such as MDX query plans, to easily identify such problems. Before such features become available, we have to make do with what we have today, namely PerfMon counters and SQL Profiler trace events. In this case, PerfMon counters can be very revealing. If you start PerfMon and add the counter MSAS 2008:MDX/Total flat cache inserts, rerun queries #2 and #3 with cache cleared, you will see that the counter stays at 0 during query #2 but jumps to 286,502 at the end of query #3. Flat cache is one of the many data caches maintained by MDX formula engine and is used to store single-cell calculation results. The large number of inserts into this cache during query #3 indicates that the query runs in cell-by-cell mode. I'd like to add that cell-by-cell mode can be reflected in other PerfMon counters under a different circumstance. Just because Total flat cache inserts is zero, does not necessarily mean query is in block mode.

Sideways Recursion and Inexact Subspace

Now we know adding one more customer to query #2 tips MDX formula engine over the edge into cell-by-cell mode, but what is so special about the threshold of 50% of members in an attribute set? When MDX formula engine constructs a query subspace, it does not always construct the exact subspace as specified by the MDXMDX formula engine has an easy way to indicate that all customers are included in the subspace. Later operations like determining overlapping regions between two subspaces or detecting whether a cell belongs to a subspace become fast and efficient. But the real benefit of this expansion is to fetch more fact data into detail data cache, so that following queries have a much higher chance of hitting a cache entry. An obvious downside of this expansion is the possibility of fetching too much data. If there are four years of data in the cube and a query selects two years, AS engine ends up retrieving all four years of data. A much more serious problem with this expansion is when there are calculations applicable to the subspace. A single extra member may introduce an unwanted calculation that kills performance. The 50% threshold is arbitrary and is controlled by private server configuration flag QueryOptimizerRatio. Now change the ratio from -1 to 0.7 and rerun query #3, the query finishes instantly. You should know that Microsoft does not support customers who temper with private server configuration settings without authorization from Microsoft Customer Service and Support.

So MDX formula engine expanded the subspace for query #3 to include all customers, the expanded subspace ended up being the same as the one in query #1, but why was query #3 so much slower than query #1? In addition to expanding the subspace in query #3, MDX formula engine also marks the query subspace as inexact for the obvious reason that the subspace contains more customers than requested. Inexact subspaces have several repercussions on building query plans. Loosely speaking, whenever MDX formula engine runs into a potentially expensive operation and when one of the parent subspaces is inexact, it is likely to fall back to cell-by-cell plan as the expensive operation may be unnecessarily introduced by the extra members added.

The next concept to explain is sideways recursion. A recursion happens when the same calculation shows up twice on the stack of current calculations. When the two subspaces to which the calculation is applied have the same granularity and there is no slice change from a regular member to a calculated member or vice visa, the recursion is called sideways recursion. So if one subspace is at All level on an attribute, but the other subspace is at leaf level, the recursion is not a sideways recursion.

When MDX formula engine detects a sideways recursion in the context of an inexact subspace, it chooses a cell-by-cell execution plan. There are a couple of reasons for making such a choice.  In addition to query execution, MDX formula engine is also responsible for detecting infinite recursions. Detecting infinite recursion in block mode is more complicated than in cell-by-cell mode. Even when a calculation shows up twice on the stack of current calculations and the two subspaces to which the calculation is applied have overlapping regions, it does not necessarily entail an infinite recursion. There have been cases where it takes a long time to detect an infinite recursion in block mode and it turns out that time is wasted as the subspace is enlarged through expansion. So currently MDX formula engine resorts to cell-by-cell mode when it encounters sideways recursion and when one of the parent subspaces is inexact. If you want to keep all subspaces exact throughout the execution process to overcome this constraint, you can change another private server configuration property, SpaceDecomposition, from default value of 8 to 9. You can try the new value and rerun query #3 again to see instant result. Why isn’t this default setting? Wouldn’t precise subspaces be a good thing all the time? The Analysis Services product team tried to make the switch during SQL 2008 development but found that precise subspaces hurt performance much more often than helping. Precise subspaces tend to carry a lot of large slices and often arbitrary-shaped slices. Arbitrary-shaped slices are a well-known reason of MDX query performance degradation. More importantly, precise subspaces make it hard to hit caches. Extensive caching of intermediate results is one of the secret recipes of good MDX query and calculation performance.

Recursion and Subspaces Overlapping on Shifted Attributes

Although sound promising in concept, queries #4 and #5 do not perform as well as queries #1 and #2 since they also run in cell-by-cell mode. But the subspace for query #5 is precise, why does it still enter cell-by-cell mode? The execution plan for query #5 enters cell-by-cell mode for a different reason.

Chris introduced the following helper expression

  SUM(
    [Date].[Date].&[20010701] : [Date].[Date].CURRENTMEMBER
    ,MEASURES.rtest
  )

in the hope that MDX calculation engine will calculate [rtest] on 07/01/2001 first, and then each following day will hit the cached result of the previous day. But MDX formula engine always tries to execute in block mode first. So when constructing the subspace for the evaluation of the scalar argument of SUM, the engine adds all dates from the set argument to the subspace. The diagram below shows the important subspaces constructed when executing query #5.


  
The steps below illustrate the import stages during query #5 evaluation.

Construct subspace 1 from MDX query.
Build query plan for calculation rtest2 in subspace 1.
        Build query plan for sub-expression Sum(07/01/2001 : 07/31/2001, rtest)
        Construct subspace 2 by adding MDX set 07/01/2001 : 07/31/2001 to subspace 1.
        Build query plan for calculation rtest in subspace 2.
                Build query plan for sub-expression (rtest, Date.PrevMember).
                Construct subspace 3 by shifting all dates in subspace 2 to the previous date.
                Build query plan for calculation rtest in subspace 3.
                Detect recursion and overlapping dates. Choose cell-by-cell plan.

When MDX formula engine detects recursion and the two subspaces have overlapping regions, it is potentially an infinite recursion. But the overlapping dates have been shifted, so whether or not there is infinite recursion depends on the MDX operation that performed the shifting. It is obvious in this case there is no infinite recursion. But MDX formula engine has to handle the general case when there are a series of shifting operations between the two subspaces. To play it safe, the engine chooses cell-by-cell execution plan again.

Why is choosing cell-by-cell mode bad for performance in this case? Doesn’t the recursive formula force the engine to evaluate [rtest] one day at a time anyway? That’s true. But there are also 118,484 customers in the subspace. Right now cell-by-cell is a global decision. While going cell-by-cell over days is not a bad thing in this case, going over customers one by one is. Ideally, we want the formula engine to go block on [Customer] attribute, but cell-by-cell on [Date] attribute.

Another downside of choosing cell-by-cell mode in subspace 3 is that the decision will propagate back to all parent calculations as well. MDX formula engine cannot take advantage of sparsity of underlying data when calculation is in cell-by-cell mode. Since building execution plan for [rtest] in subspace 3 is a part of building the overall execution plan for [rtest2] in subspace 1, choosing cell-by-cell plan in subspace 3 ends up forcing a bad plan for [rtest2] in subspace 1 as well.

We don’t have this problem in queries #1 and #2 since there is only a single date in the subspace. When constructing a new subspace, [Date].PrevMember shifts the current date to the previous date. The newly constructed subspace has a different date from all previously constructed subspaces, therefore, there is no risk of infinite recursion. As a result, the execution plan stays in block mode.

Deep Recursion and CLR Assembly

The next topic is unrelated to Chris Webb’s queries but still important for recursions in MDX. Each additional recursive step consumes more space on the stack. A very deep recursion will eventually use up all space on the stack. When that happens, MDX formula engine creates a fiber and switches execution on to the new fiber. But it is not always safe to call managed code from within a fiber since you may get stack overrun problem. In SQL Server 2008 R2, MDX formula engine raises an error when this happens. But the error does not abort the current query to return to user immediately. Most errors cause MDX formula engine to switch to cell-by-cell mode in case some cells contain errors but others don’t. So instead of seeing a query fail quickly, you may end up waiting a long time for it to fail eventually. Note that VBA library is registered as a CLR assembly hosted by Analysis Services, so calling VBA functions in MDX recursion may trigger the fiber exception and cause performance problems. Also note that some VBA functions have been implemented directly in Analysis Services, therefore are safe to use inside recursion. Setting private server configuration flag AllowCLRStoredProcedureCallsInFiberMode to 1 would prevent the exception from being raised.

Conclusions

So far, we have discussed three important performance limitations imposed by MDX formula engine when it encounters recursions. We have learned that to achieve better performance when writing recursive MDX formulas, you should:

  1. Don’t write recursive formulas unless really needed. Non-recursive MDX formulas are not subject to the limitations discussed in this post.
  2. Keep the subspaces precise. Try to stay with queries that produce nice, rectangular-shaped query spaces which don’t trigger the 50% rule. Playing with SpaceDecomposition is an option but you need approval from Microsoft Customer Service and Support to use it in production environment. Note that always enforcing precise subspaces is likely to have an overall negative impact on query performance.
  3. Avoid subspace overlap over shifted attributes. Most recursions happen along the time line. Keeping a single date in the subspace is one way to guarantee that subspaces don’t overlap.
  4. Avoid calling into CLR assembly when recursion is very deep.
As always, performance advices apply to particular versions of a product. The recommendations given in this post apply to all currently supported versions of Analysis Services with the latest one being 2008 R2. The Analysis Services product team will continue to evolve the product and some of the limitations are likely to be lifted in future releases. When you test your recursive calculation, you should start at the beginning and gradually increase recursion depth. Observe whether query time increases at worst linearly as recursion depth increases. Make sure you test the case when the recursion is the deepest as needed by the user.