To illustrate the problem, I am going to run a series of MDX
queries against the AdventureWorks sample database
for Microsoft SQL Server 2008R2. In between queries, I send ClearCache commands
to eliminate inter-query interference. For each MDX query, we are going to
observe the sequence of SE queries through the Query Subcube Verbose profiler trace events. Instead of showing the
complete output of Query Subcube Verbose
events, I am going to extract only those attributes in the [Date] dimension
which are relevant to our discussion. As well known in the MDX community, Analysis
Services engine internally uses subcubes to represent queries, therefore I will
use the terms query and subcube interchangeably throughout the post.
Now, let’s run our first MDX query to retrieve the values of
measure [Internet Sales Amount] for all days in January 2007.
Query 1:
// January
2007
select [Internet Sales Amount] on 0,
head(descendants([Date].[Calendar].[Calendar Year].&[2007],
[Date].[Calendar].[Date]), 31) on 1
from [Adventure Works]
Since measure [Internet Sales Amount] is defined as a
measure expression, there are three Query
Subcube Verbose events, one for the [Exchange Rates] measure group, two for
the [Internet Sales] measure group, although only one of them hits the
partitions, the other is just an artifact of measure expression evaluation.
Here are the attributes corresponding to levels in the
[Calendar] hierarchy extracted from the trace events:
Dimension 7 [Date] (0 * 8 0 6 0 0 0 0 0
0 20 4 0 0 0 0 0 0 0 0)
[Date]:*
[Calendar Quarter]:[Q1
CY 2007]
[Calendar Semester]:[H1
CY 2007]
[Month Name]:[January
2007]
[Calendar Year]:[CY
2007]
The SE query asks for all days in January 2007, which match
exactly the days listed on Axis 1 of the MDX query, with the added benefit of
not listing each day individually. There is no prefetch query in this case
which makes sense as the original query is asking for all days in January 2007
that matches exactly the SE query.
Now let’s add one more day to the query.
Query 2:
// January
2007 + February 1 2007
select [Internet Sales Amount] on 0,
head(descendants([Date].[Calendar].[Calendar Year].&[2007],
[Date].[Calendar].[Date]), 32) on 1
from [Adventure Works]
From the
captured profiler trace below
You can
extract the following information form the three queries sent to the [Internet
Sales] measure group:
Non-cache data
Dimension 7 [Date] (0 * 8 0 6 0 0 0 0 0
0 20 4 0 0 0 0 0 0 0 0)
[Date]:*
[Calendar Quarter]:[Q1
CY 2007]
[Calendar Semester]:[H1
CY 2007]
[Month Name]:[January
2007]
[Calendar Year]:[CY
2007]
Non-cache data
Dimension 7 [Date] (0 * 8 0 6 0 0 0 0 0
0 21 4 0 0 0 0 0 0 0 0)
[Date]:*
[Calendar Quarter]:[Q1
CY 2007]
[Calendar Semester]:[H1
CY 2007]
[Month Name]:[February
2007]
[Calendar Year]:[CY
2007]
Cache data
Dimension 7 [Date] (0 582 8 0 6 0 0 0 0
0 0 21 4 0 0 0 0 0 0 0 0)
[Date]:[February
1, 2007]
[Calendar Quarter]:[Q1
CY 2007]
[Calendar Semester]:[H1
CY 2007]
[Month Name]:[February
2007]
[Calendar Year]:[CY
2007]
Again there is no prefetch query for January 2007 but FE
does prefetch all days in February 2007 first before getting the value of
February 1, 2007 from cached data.
Now let’s add one more day again to the query.
Query 3:
// January
2007 + February 1 2007 + February 2 2007
select [Internet Sales Amount] on 0,
head(descendants([Date].[Calendar].[Calendar Year].&[2007],
[Date].[Calendar].[Date]), 33) on 1
from [Adventure Works]
Below is the profiler trace events and relevant information
extracted from them:
Non-cache data:
Dimension 7 [Date] (0 * * 0 * 0 0 0 0 0
0 * * 0 0 0 0 0 0 0 0)
[Date]:*
[Calendar Quarter]:*
[Calendar Semester]:*
[Month Name]:*
[Calendar Year]:*
Cache data:
Dimension 7 [Date] (0 * 8 0 6 0 0 0 0 0
0 20 4 0 0 0 0 0 0 0 0)
[Date]:*
[Calendar Quarter]:[Q1
CY 2007]
[Calendar Semester]:[H1
CY 2007]
[Month Name]:[January
2007]
[Calendar Year]:[CY
2007]
Cache data:
Dimension 7 [Date] (0 + * 0 * 0 0 0 0 0
0 * * 0 0 0 0 0 0 0 0)
[Date]:+
[Calendar Quarter]:*
[Calendar Semester]:*
[Month Name]:*
[Calendar Year]:*
One EventSubclass = Non-Cache data query prefetches fact data into SE cache which is
used to answer the subsequent two SE queries shown as EventSubclass =
Cache data, one for all days in January 2007, the other for the first two
days of February 2007. But why did the prefetch query ask for data for all
dates? Why did we lose all slices in the [Date] dimension? This is really bad
as it would have caused all partitions to be scanned. The answer lies in the
imprecise nature of the algorithm used to construct the prefetch subcube.
In order to
build the prefetch SE subcube, FE keeps track of all relevant MDX sets in a stack-like
data structure called Sonar registry. Later on each MDX set in the Sonar
registry is used to build a collection of Sonar subcubes. Perfmon
counter MDX\Total Sonar subcubes
displays the total number of Sonar subcubes generated inside FE.
By default,
the conversion from an MDX set to its collection of Sonar subcubes is not a
precise process. The resulting Sonar subcubes will cover the original MDX set
but may include additional members on some attributes. The Cache Ratio connection string property controls how a Sonar subcube
attribute is expanded to include more members than what the original set
contains. The actual algorithm is very complex. I am only going to show here a
greatly simplified version that illustrates the main idea of the algorithm.
Algorithm 1: Build Sonar subcubes from an
MDX set
Input: an
MDX set S, for simplicity, assume S is single-grained
For each hierarchy H in S
Since S is single-grained, let Ls be the
level of H where members of S reside
For each level L in H starting from the
topmost level down to Ls
Identify all members on L with good coverage of S
A member M has good coverage of S
if the ratio of the number of its descendants in S to the number of all its descendants on level Ls is greater
than or equal to the Cache Ratio
Add M to the list of slicers on level L
For each level with a nonempty list of
slicers
Create a Sonar subcube with granularities
on attributes corresponding to the levels of H from the topmost level down to Ls
Apply the list of slicers to the
subcube
Output: the collection of generated Sonar subcubes
First, apply
Algorithm 1 to the [Calendar] set in
Query 1
head(descendants([Date].[Calendar].[Calendar
Year].&[2007], [Date].[Calendar].[Date]), 31)
you get the following sequence of slices encoded in internal
data IDs and eventually FE finds a single good cover slice on the month level.
Sonar [Calendar] [1 * * * * *] ratio:
31/1188
Sonar [Calendar] [1 4 * * * *] ratio: 31/365
Sonar [Calendar] [1 4 6 * * *] ratio: 31/181
Sonar [Calendar] [1 4 6 8 * *] ratio: 31/90
Sonar [Calendar] [1 4 6 8 20 *] ratio: 31/31 ß good cover
The
resulting Sonar subcube is
Sonar
Subcube 1
Calendar
Year
|
Calendar
Semester
|
Calendar
Quarter
|
Month Name
|
Date
|
2007
|
H1 CY 2007
|
Q1 CY 2007
|
January 2007
|
*
|
Here and in
the following I only show the important attributes in the Sonar subcube which
correspond to levels of the [Calendar] hierarchy. Next, apply Algorithm 1 to the [Calendar] set in Query 2
head(descendants([Date].[Calendar].[Calendar
Year].&[2007], [Date].[Calendar].[Date]), 32)
FE finds two
good cover slices, one on the month level and the other on the date level.
Sonar [Calendar] [1 * * * * *] ratio:
32/1188
Sonar [Calendar] [1 4 * * * *] ratio: 32/365
Sonar [Calendar] [1 4 6 * * *] ratio: 32/181
Sonar [Calendar] [1 4 6 8 * *] ratio: 32/90
Sonar [Calendar] [1 4 6 8 20 *] ratio: 31/31 ß good cover
Sonar [Calendar] [1 4 6 8 21 *] ratio: 1/28
Sonar [Calendar] [1 4 6 8 21 582] ratio: 1/1 ß good cover
The two
Sonar subcubes created are:
Sonar
Subcube 2.1
Calendar
Year
|
Calendar
Semester
|
Calendar
Quarter
|
Month Name
|
Date
|
2007
|
H1 CY 2007
|
Q1 CY 2007
|
January 2007
|
*
|
Sonar
Subcube 2.2
Calendar
Year
|
Calendar
Semester
|
Calendar
Quarter
|
Month Name
|
Date
|
2007
|
H1 CY 2007
|
Q1 CY 2007
|
February
2007
|
February 1,
2007
|
Finally,
apply Algorithm 1 to the [Calendar]
set in Query 3
head(descendants([Date].[Calendar].[Calendar
Year].&[2007], [Date].[Calendar].[Date]), 33)
FE finds
three good cover slices, one on the month level and two on the date level.
Sonar [Calendar] [1 * * * * *] ratio:
33/1188
Sonar [Calendar] [1 4 * * * *] ratio: 33/365
Sonar [Calendar] [1 4 6 * * *] ratio: 33/181
Sonar [Calendar] [1 4 6 8 * *] ratio: 33/90
Sonar [Calendar] [1 4 6 8 20 *] ratio: 31/31 ß good cover
Sonar [Calendar] [1 4 6 8 21 *] ratio: 2/28
Sonar [Calendar] [1 4 6 8 21 582] ratio: 1/1 ß good cover
Sonar [Calendar] [1 4 6 8 21 583] ratio: 1/1 ß good cover
Again, FE
created two Sonar subcubes.
Sonar
Subcube 3.1
Calendar
Year
|
Calendar
Semester
|
Calendar
Quarter
|
Month Name
|
Date
|
2007
|
H1 CY 2007
|
Q1 CY 2007
|
January 2007
|
*
|
Sonar
Subcube 3.2
Calendar
Year
|
Calendar Semester
|
Calendar
Quarter
|
Month Name
|
Date
|
*
|
*
|
*
|
*
|
+
|
Note that Sonar Subcube 2.2 retains slices on all
levels of the [Calendar] hierarchy while Sonar
Subcube 3.2 only has a slice on the lowest level of the [Calendar]
hierarchy. This is an artifact of the current implementation. When multiple
covering members of a hierarchy are added to a Sonar subcube, only a single
slice is created at the lowest level of the hierarchy.
FE uses
Sonar subcubes in several different places, including constructing a larger
evaluation node when FE is executing in cell-by-cell mode. In the simple case
where an MDX query can be answered directly from SE, Sonar subcubes are used to
construct SE queries.
Algorithm 2: Evaluate cell values
Input: a
cell iterator and the current Sonar registry
For each
cell C
Find the Sonar subcube S that contains C
Build an evaluation node E for S
Execute E. In the simplest case, create SE subcube SSE from S and
send SSE to SE.
Lookup value of C from cached result in E
Output: cell
values
Another
usage of Sonar subcubes is to build a prefetch subcube from a given SE query
subcube. The full algorithm is again very complex with exceptions and special
handling of arbitrary shapes or unary operators, etc. Here I include a
simplified version to illustrate the essence of the algorithm:
Algorithm 3: Build a prefetch subcube for a
query subcube
Input: a SE query
subcube Squery and the current
Sonar registry
Initialize
the prefetch subcube Sprefetch as
a clone of Squery,
For each Sonar subcube Ssonar in the
Sonar registry
For each attribute Aprefetch
in Sprefetch
Find the corresponding attribute Asonar in Ssonar
Skip Asonar if it is
not a granularity attribute
If Aprefetch is a
granularity attribute
If the slice of Asonar is a
superset of the slice of Aprefetch
Copy the slice of Asonar to Aprefetch
Else
If the slice of Asonar includes all members
Make Aprefetch a granularity attribute
Output: Sprefetch if it is different from Squery
In Query 1, Sonar Subcube 1 is equivalent to the SE query subcube therefore no
prefetch subcube is needed. In Query 2,
the first SE query subcube matches Sonar
Subcube 2.1 so again no prefetch subcube is needed. But the second SE query
would copy the * from [Date] in Sonar
Subcube 2.1 therefore produce a prefetch subcube that fetches all days in
February. The * in Sonar Subcube 2.1 really
means all days in January 2007 but Algorithm
3 treats each attribute independently and is oblivious to the fact that
[Date]:* is cross-filtered by slices on other attributes. But the end result of
prefetching all days in February 2007 isn’t too bad.
Prefetch
Subcube 1
Calendar
Year
|
Calendar
Semester
|
Calendar
Quarter
|
Month Name
|
Date
|
2007
|
H1 CY 2007
|
Q1 CY 2007
|
February
2007
|
*
|
But when we
come to Query 3, there is a * on
each attribute from either Sonar Subcube
3.1 or Sonar Subcube 3.2, we end
up with a prefetch subcube that has * on all its attributes.
Prefetch
Subcube 2
Calendar
Year
|
Calendar
Semester
|
Calendar
Quarter
|
Month Name
|
Date
|
*
|
*
|
*
|
*
|
*
|
Nowadays we
are seeing more and more implementation of systems with multiple terabytes of
data in Analysis Services. The above example of overly aggressive prefetching
of fact data is unacceptable in such environments. You can disable prefetching
altogether by setting connection string property Disable Prefetch Facts=true or you can play with the Cache Ratio connection string property to
influence the Sonar subcubes generated by FE. But either approach is likely to
have negative performance impact that would require a lot of testing to confirm
their overall benefit against a certain query set. If you can control the types
of queries in your deployment, it is better to replace queries that fall into
this pitfall with ones that produce good Sonar subcubes in the default settings.
Using our example, since we know that there are 90 days in the first quarter
and the default Cache Ratio is 0.5, we
can simply ask for 45 days instead of 33 days to produce a single good cover at
quarter level.
Query 4:
// January
2007 + February 1-14 2007
select [Internet Sales Amount] on 0,
head(descendants([Date].[Calendar].[Calendar Year].&[2007],
[Date].[Calendar].[Date]), 45) on 1
from [Adventure Works]
The captured
profiler trace confirmed our conjecture as it shows a single SE query to the
[Internet Sales] measure group to retrieve all days in the first quarter of
2007.
Dimension 7 [Date] (0 * 8 0 6 0 0 0 0 0
0 * 4 0 0 0 0 0 0 0 0)
[Date]:*
[Calendar Quarter]:[Q1
CY 2007]
[Calendar Semester]:[H1
CY 2007]
[Month Name]:*
[Calendar Year]:[CY
2007]