Sunday, March 4, 2012

DAX Query Plan, Part 3, Vertipaq Operators

The Vertipaq operators are an important subset of leaf-level operators which are responsible for preparing and sending queries to the Vertipaq Engine for execution and receiving query results. Table 1 is a list of all Vertipaq operators. You can see them in DAX Query Plan trace events by running the sample queries in column 3 against the tabular AdventureWorks database.

Table 1. Vertipaq Operators

Logical Operators
This RelLogOp is the foundation of all other Vertipaq logical operators. It represents a basic Vertipaq query that joins a root table with related tables following many-to-one relationships, determines which rows are retrieved by testing against Vertiscan predicates, and groups the resultset by output columns.
evaluate ‘Product’
This RelLogOp renames columns and adds rollup columns to a Vertipaq query.
evaluate summarize(‘Product’, rollup(‘Product Category’[Product Category Name], ‘Product’[Product Name]))
This RelLogOp adds a Verticalc predicate to a Vertipaq query.
evaluate filter('Product', right([Product Name], 4) = "Tire")
This ScaLogOp adds a SUM aggregation to a Vertipaq query.
evaluate row("x", sum('Internet Sales'[Sales Amount]))
This ScaLogOp adds a MIN aggregation to a Vertipaq query.
evaluate row("x", min('Internet Sales'[Sales Amount]))
This ScaLogOp adds a MAX aggregation to a Vertipaq query.
evaluate row("x", max('Internet Sales'[Sales Amount]))
This ScaLogOp adds a COUNT aggregation to a Vertipaq query.
evaluate row("x", countrows('Internet Sales'))
This ScaLogOp adds a DISTINCTCOUNT aggregation to a Vertipaq query.
evaluate row("x", distinctcount('Internet Sales'[Due Date]))
This ScaLogOp adds an AVERAGE aggregation to a Vertipaq query.
evaluate row("x", average('Internet Sales'[Sales Amount]))
This ScaLogOp adds a STDEV.S aggregation to a Vertipaq query.
evaluate row("x", stdev.s('Internet Sales'[Sales Amount]))
This ScaLogOp adds a STDEV.P aggregation to a Vertipaq query.
evaluate row("x", stdev.p('Internet Sales'[Sales Amount]))
This ScaLogOp adds a VAR.S aggregation to a Vertipaq query.
evaluate row("x", var.s('Internet Sales'[Sales Amount]))
This ScaLogOp adds a VAR.P aggregation to a Vertipaq query.
evaluate row("x", var.p('Internet Sales'[Sales Amount]))
Physical Operators

This IterPhyOp iterates over the resultset returned by a Vertipaq query.
You can find this operator in the physical plan tree after running any of the above queries.

The Vertipaq Engine runs the fastest when it executes a query in pure Vertiscan mode, which is when the query only contains simple aggregations and simple predicates. A simple aggregation aggregates a single column or counts rows of the table. A simple predicate, called Vertiscan predicate in Table 1, typically looks like [Column] = 5 or [Column] IN { 5, 6, 7, … } or ([Column1], [Column2]) IN { (5, 6), (7, 8), … } where the numbers are data IDs. Note that all column values are encoded as integer data IDs inside the Vertipaq Engine. The Vertipaq Engine extends its basic Vertiscan capabilities through its own calculation engine, called Verticalc. When aggregation expressions or predicate expressions become more complex, the Vertipaq Engine builds calculation trees to evaluate those expressions and can even call back to the DAX Formula Engine for help on those operators it doesn’t support natively. For example, when the aggregation expression is maxx('Internet Sales', year([Due Date])), the Vertipaq Engine calls back to the DAX Formula Engine to evaluate the Year function. Similarly, complex predicate expressions are also evaluated using the Verticalc technology therefore are called Verticalc predicates in Table 1. The Vertipaq Engine typically runs slower when it has to invoke Verticalc evaluations during query execution therefore the DAX Formula Engine converts filter predicates into Vertiscan slices whenever possible.

Ever since the MDX days, users have been complaining that the Formula Engine is single-threaded when it executes a single query. The problem is partially solved when the DAX Formula Engine pushes calculations down to the Vertipaq Engine. Not only is the Vertipaq Engine multi-threaded, it is also able to greatly reduce the number of calculations by taking advantage of large blocks of continuous rows with identical values for those columns referenced by the calculation. Therefore, the DAX Formula Engine tries very hard to push operations down to the Vertipaq Engine whenever possible and it is highly desirable to see mostly Vertipaq operators in DAX Query Plan events.

Caveats of Understanding Vertipaq Operators in DAX Query Plan Trees

As a brand new feature in Denali, the DAX Query Plan has plenty of room for future improvements. Here I am going to highlight three aspects of the current implementation which might confuse users who are trying to read DAX Query Plans for the first time.

1.      The display of Vertipaq operators is too closely tied to the underlying implementation, as a result, users might see multiple nested Vertipaq operators in a plan tree but only a single Vertipaq query is issued. For example, if you run Query 1, you will see four Vertipaq operators, Sum_Vertipaq, Filter_Vertipaq, Filter_Vertipaq, Scan_Vertipaq, in the DAX Query Plan/DAX Vertipaq Logical Plan event, as shown in Figure 1, but only a single Vertipaq SE Query Begin/Vertipaq Scan event. As you can see in Figure 1, a Scan_Vertipaq operator serves as the foundation for all other Vertipaq operators, each aggregation or Verticalc filter has its own operator, and the cascading operators are displayed in a chain of parent-child relationships even though aggregations and Verticalc filters are eventually folded into a single Vertipaq query.

// Query 1
evaluate row("x",
                                    'Internet Sales',
                                    Related(Product[List Price]) < 10
                                    && Related(Customer[Yearly Income]) < 50000
                        [Sales Amount]

2.      As mentioned in the second installment of the DAX Query Plan blog series, spools are not first-class citizens in Denali DAX Query Plans, but users can still detect the presence of spools in Denali DAX Query Plan trees indirectly through the presence of physical operators which sit directly on top of spools such as the Spool LookupPhyOp or the Spool_IterOnly, Spool_LookupOnly, Spool_SliceIndex IterPhyOps. Let’s call them spool operators in this post to make it easier to refer to them.  If you run Query 2 and examine the physical plan tree shown in Figure 2, you can see a Spool_IterOnly operator with a child VertipaqResult operator. Since the real child operator of Spool_IterOnly is a spool, what’s with the VertipaqResult IterPhyOp? As it turns out, some iterators supply the rows needed to fill a spool when it is materialized and DAX Query Plan shows iterator subtrees which are used to populate the spool as child operators of the spool operator. In Denali, a spool is always constructed to receive the resultset of a Vertipaq query, hence VertipaqResult operator is always a child of a spool operator. An important property of a spool operator is #Records, highlighted in Figure 2, which tells you how many rows of data are in the underlying spool and is currently the most important property to help identify performance problems of a query. When VertipaqResult is the sourcing iterator, this property tells you how many records are returned by the Vertipaq Engine.

// Query 2
evaluate values(Product[List Price])

3.      As stated earlier, Verticalc evaluations may call back to the DAX Formula Engine for unsupported logical operators which in turn would construct the corresponding physical operator trees just as it would when the calculation happens entirely in the Formula Engine. Sometimes the callback functions can be very expensive themselves but those physical operators are not shown in the plan tree. If you run Query 3 you will see in the Vertipaq SE Query Begin event, shown in Figure 3, that the Vertipaq Engine calls back to the DAX Formula Engine for its help with the Year function, but the corresponding physical operator does not show up in the DAX Query Plan/DAX Vertipaq Physical Plan event, shown in Figure 4.

// Query 3
evaluate row("x", maxx('Internet Sales', year([Due Date])))

Special Properties of Scan_Vertipaq RelLogOp

As you have seen in the previous examples, the Scan_Vertipaq operator is at the core of every Vertipaq query and has several special properties worth mentioning here. To show you all the properties of Scan_Vertipaq, run Query 4 and then look at its logical plan.

// Query 4
define measure 'Internet Sales'[Total Sales Amount] = Sum([Sales Amount])
                                    crossjoin(values('Date'[Month]), distinct('Product Category'[Product Category Name])),
                                    calculate([Total Sales Amount],
                        'Date'[Calendar Year] = 2003

Below is the longest line I extracted from the logical plan with operator specific properties in bold face.

Scan_Vertipaq: RelLogOp DependOnCols(1, 2)('Date'[Month], 'Product Category'[Product Category Name]) 4-141 RequiredCols(1, 2, 133)('Date'[Month], 'Product Category'[Product Category Name], 'Internet Sales'[Sales Amount]) Table='Internet Sales_78de3956-70d9-429f-9857-c407f7902f1e' -BlankRow JoinCols(2)('Product Category'[Product Category Name]) SemijoinCols(3)('Date'[Month])

As you can see, in addition to the common properties DependOnCols, range of column numbers, and RequiredCols, Scan_Vertipaq also has a couple of extra properties:

·        Table

Displays the internal ID of the root table in Denali. I think this should change to user friendly table name in the future.

·        Whether the blank row is requested

DAX introduced an optional blank row to an IMBI table in order to deal with any incoming referential integrity violations. When users want to include this blank row in the resultset of a Vertipaq query, query plan shows +BlankRow; when users don’t want to include the blank row in the resultset, query plan displays –BlankRow. In Query 4 I deliberately used both the Values function and the Distinct function inside the CrossJoin function to demonstrate the difference, see Figure 5 which was an excerpt from the logical plan.

·        JoinCols

The columns from this table which are needed for natural join with other tables. JoinCols are a subset of DependOnCols, the latter is actually a union of the former and the DependOnCols of all semi-join filter operators.

·        SemijoinCols

The columns from this table which are needed for natural semi-join with filter tables. The DAX Formula Engine simplifies the logical operator tree by converting most explicit semi-join filters to Vertiscan slices or Verticalc slices so that the filtering operations happen inside the Vertipaq Engine, and since DAX Query Plan/DAX Vertipaq Logical Plan events are fired after the simplification stage, users typically don’t see the SemijoinCols property. But when SemijoinCols do show up in the plan, the DAX Formula Engine may have to fetch more columns back, join with the filter tables, and then remove unwanted SemijoinCols by grouping on the desired output columns. When this happens, the Vertipaq operator can be quite expensive as a lot of post-processing happens in the Formula Engine before the final resultset can be returned.

Understand Performance Differences between Equivalent Queries

To conclude today’s post, let’s run two queries which are written in different ways but return the same results and use DAX Query Plans to figure out why they perform differently. Both Query 5 and Query 6 use the AddColumns function to simulate adding a calculated column to the ‘Date’ table. The calculated column calculates the sum of [Sales Amount] after filters the ‘Internet Sales’ table based on a predicate that depends on a column value from the current row. While it is natural to write the calculation as sum of filter as done in Query 5, to get much better performance, you should split the sum and the filter into separate functions and then calculate the sum by adding the filter to the filter context, see Query 6. Compare their logical plans, shown in Figure 6 and Figure 7 respectively, you can see that Query 5 executes both the SumX function and the Filter function in the Formula Engine but Query 6 only executes the Filter function in the Formula Engine but pushes the SumX down to the Vertipaq Engine through the Sum_Vertipaq operator. Since today we have learned that it is always good if calculations can be pushed down to the Vertipaq Engine, Query 6 runs a lot faster than Query 5. So write your DAX expressions to take advantage of Vertipaq operators.

// Query 5
                                    filter('Internet Sales', [Order Date] <= [Date]),
                                    [Sales Amount]

// Query 6
                                    sum('Internet Sales'[Sales Amount]),
                                    'Internet Sales'[Order Date] <= earlier([Date]),


  1. Jeffrey, in the last example using SUMX( 'Internet Sales', 'Internet Sales'[Quantity] * 'Internet Sales'[Unit Price] ) instead of SUM( 'Internet Sales'[Sales Amount] ) you obtain the same query plan - this can be deduced by looking at the query plan. I think that highlighting this makes it clear that is the different usage of SUMX (the second is inside a CALCULATE) that determines its performance and it is not a simple consequence of using SUM instead of SUMX.

  2. Thanks for pointing this out. I have been doing this for so long that I automatically thought of Sum as just a syntax sugar for SumX, but you are right some users may think otherwise.

  3. Hi Jeffrey,
    with the introduction of DAXMD it also seems that some new Operators have been added to the engine like MDXDimensionQuery and MDXMeasure[MyMeasure]
    are you also going to blog about the internals of those new operators in an upcoming post?
    would be great!


  4. Lets say I create a pivot table with a simple equals page filter and another column on rows, would that result in one Scan_Vertipaq operation+a sum_vertipaq after another column is added to the values area?

  5. During the process of adding fields to a pivot table, several MDX queries will be sent from Excel to Analysis Services engine, each MDX query will result in a couple of vertipaq queries, some are basic table scans, others include aggregations like sum. The best way to is watch those events in SQL Server Profiler to see exactly which MDX queries generate which vertipaq queries.

  6. Very nice post here thanks for it .I always like and such a super contents of these post.Excellent and very cool
    idea and great content of different kinds of the valuable information's.
    seo company in chennai

  7. This comment has been removed by the author.

  8. I feel satisfied to read your blog, you have been delivering a useful & unique information to our vision even you have explained the concept as deep clean without having any uncertainty, keep blogging. Power BI Online

  9. myTectra Placement Portal is a Web based portal brings Potentials Employers and myTectra Candidates on a common platform for placement assistance

  10. Nice post and thanks for sharing with us...keep blogging......
    click here

  11. it’s really nice and meanful. it’s really cool blog. Linking is very useful have really helped lots of people who visit blog and provide them usefull information.
    MS Power BI Online Training

  12. Thanks for such a great article here. I was searching for something like this for quite a long time and at last I’ve found it on your blog. It was definitely interesting for me to read about their market situation nowadays. Well written article of mdx Thank You for Sharing with Us pmp training fee | pmp certification course in chennai | best pmp training institute in chennai| | pmp training class in chennai \ pmp training fee

  13. Thanks for sharing your information. Great efforts put it to find it which is really amazing. It is very useful to know, Definitely will share the same to other forums.
    openstack training in chennai omr | openstack training in chennai velachery | openstack certification training in Chennai | openstack training in chennai

  14. Thanks for such a great article here. I was searching for something like this for quite a long time and at last, I’ve found it on your blog. It was definitely interesting for me to read about their market situation nowadays.angularjs best training center in chennai | angularjs training in velachery | angularjs training in chennai | best angularjs training institute in chennai

  15. I like your post very much. It is very much useful for my research. I hope you to share more info about this. Keep posting!!Best Devops Training Institute

  16. Thanks For Sharing The Information The Information Shared Is Very Valuable Please Keep Updating Us Time Just Went On Reading The article Python Online Course Hadoop Online Course Aws Online Course Data Science Online Course

  17. Thanks For Sharing information
    Yaaron Studios is one of the rapidly growing editing studios in Hyderabad. We are the best Video Editing services in Hyderabad. We provides best graphic works like logo reveals, corporate presentation Etc. And also we gives the best Outdoor/Indoor shoots and Ad Making services.
    Best video editing services in Hyderabad,ameerpet
    Best Graphic Designing services in Hyderabad,ameerpet­
    Best Ad Making services in Hyderabad,ameerpet­

  18. I Got Job in my dream company with decent 12 Lacks Per Annum salary, I have learned this world most demanding course out there in the current IT Market from the data science training institute in btm experts who helped me a lot to achieve my dreams comes true. Really worth trying

  19. This comment has been removed by the author.

  20. Thank you for sharing such valuable information.Good job.keep it up.Keep writing.
    machine learning institute in btm layout

  21. Vihaan Kumar’s character helps his elder brother Bala ( Ayushmann Khurrana) for applying all his unique treatment ideas like Eggs etc. Vihaan Kumar is continuously commanded by his mother to help his elder brother ( Ayushmann Khurrana) with unique, and ludicrous ‘hair-growing ‘ methods. And because he is the younger sibling, he never gets to say no – until finally, one nuskha (involving cow dung and bull semen) thrusts him to reject and fume.

  22. As Kumar Vihaan, there is nothing to be concerned about scaling because Ethereum Scale itself to be able to handle the increased transaction and reduce the load on the main chain by moving the bulk of transactions to a second layer.