Azure SQL Data Warehouse: Explaining the Architecture Through System Views

The architecture of Azure SQL Data Warehouse isn't easy to explain briefly, but if you have some useful queries that access the management and catalog views, and diagrams that show how they relate together, you can very quickly get a feel for what is going on under the hood. By using and extending these queries that use these views, you can check on a variety waits, blocking, status, table distribution and data movement in ASDW.

Contents

Introduction

The aim of this article is to describe the most important catalog views and dynamic management views that come with Azure SQL Data Warehouse (ADSW) in order to explain and illustrate its architecture. These views are designed to help understand, manage, monitor and correct the ASDW system’s behavior.

The diagrams and SQL code examples in this article are intended to give you the means of exploring and understanding what is going on in ASDW. I hope that these examples are easily adapted, and that they lead you to compose your own system queries to understand any specific issues that you might face while you are migrating, testing or working with the system in production.

This is not an exhaustive list of all the views available to the system, although they are sufficient to get you started. They are the ones I’ve found the most useful, most interesting, or that are not identical to their SQL Server equivalent.

You will notice that in the name of most of them is prefixed by the term ‘pdw’: This refers to Parallel Data Warehouse, the name of the Massively Parallel Processing SQL Server version that is inside the on premise Analytics Platform System and also ASDW in the cloud.

For a general introduction to the ASDW service, please refer to Robert Sheldon’s ASDW articles.

The information on the system views and how they relate to each other is accurate at the date that this article was published. Things are always moving in the cloud, if you run into any errors or unexpected results, be sure to check the online documentation for ASDW/PDW related views on this link.

We will start by showing you the views and queries that allow you to explore how data is distributed, what is currently happening in the system, the Data Movement Services and any operations moving data around.

How is a table being distributed?

ASDW is a Massively Parallel Processing (MPP) system, and as such, it takes advantage of multiple storage buckets called distributions and multiple machines called nodes. When a table is created, you have to specify how it will be distributed among 60 different buckets called distributions. Currently, ASDW supports distributing a table either based on a column as a HASH key or through a simple Round-Robin distribution of all the rows.

The system’s approach is to ‘divide-and-conquer’ by splitting very large tables into distributions and attaching those distributions to multiple nodes. Many nodes can then work in parallel to resolve queries over large datasets. If different tables are distributed with the same hash key then the distributions for a particular value will end up on the same node and so data movement is avoided if that key is used in a join. Proper distribution is a critical piece to getting the best performance of ASDW.

Here are the views that can help us to understand how a given table has been distributed around the system.

Notice that sys.schemas, sys.tables, sys.indexes and sys.columns are the same as SQL Server and I’ve left them in the diagram since they are important into tying the other ASDW specific views together and writing your own queries. As far as the ASDW specific views we have:

  • sys.pdw_table_distribution_properties: this view will tell you how the table was distributed. In the case of ASDW, as of the date of publication the options will be HASH or ROUND_ROBIN.
  • sys.pdw_table_mappings: because tables are split into 60 distributions, ASDW actually creates 60 different internal tables. This view will let you map the ‘logical’ table record from sys.tables with the actual internal table names used by ASDW.
  • sys.pdw_nodes_tables: once we have those internal table names, we can use this view to see which compute node and distribution corresponds to each internal table. The view also exposes table level properties similar to regular SQL Server tables.
  • sys.pdw_distributions: this is a simple view that ties the nodes and the distributions that are attached to them together.
  • sys.pdw_nodes_partitions: this view has the partition information matching those internal tables that we get from sys.pdw_nodes_tables. Remember that by default, all tables have 1 partition.
  • sys.dm_pdw_nodes_db_partition_stats: this view can tie back to those internal table names, the node and the distribution to give you crucial information like the amount of rows, the space used and the space reserved for a particular table, index or partition.
  • sys.pdw_column_distribution_properties: this view can tie to the sys.tables and sys.columns view to let us know the specific column that was used to distribute a table if HASH distribution was selected. If the column distribution_ordinal is equal to 1 then this is the hashing key.

In the following examples of how these views are used, we will try to understand the distribution of an example table named ‘FactProductInventory’.

This set of views enable us to answer questions and queries such as:

What type of distribution is a table using?

If my table is hash distributed, which column is the hash key?

See the amount of rows and space taken by the distributions of a table ranked by the space taken

If you’re also using partitioning, then you’ll have to include sys.pdw_nodes_partitions into the queries to get to that level of granularity.

For example, this query shows the same information as the one above but breaks it down even further by space per partition per distribution:

From the diagram and the examples you can adjust and customize these queries to provide other information related to table distribution.

Let’s move on to understanding what’s running in the system at a given point in time.

What is currently happening in the system?

Since ASDW is a close relative to SQL Server, the views in this section will be familiar to most administrators. ASDW follows the same concepts of sessions, connections and requests as SQL Server. However, the MPP nature of ASDW means that a given request can be queued as well as split into multiple operations. But before we start looking into the views we need to cover a feature called Query Labels.

Query Labels for tracking individual queries in the requests system view

Query Labels is a feature native to ASDW that is unfamiliar to SQL Server administrators, though its usefulness will make you want to have it on the other SQL versions! The concept is very simple: for a given query, specify a friendly text label as a hint, in order to use it to quickly identify the query in the requests system view. Here’s an example:

By using the ‘AverageBalance’ label, we can now easily track this query on the requests system view:

Simple, right?

As we’ll see later on, the requests system view is core to understanding how the system handles any given query or data load and using a label is the easiest way to tie together a query with its assigned request identifier.

System Status Views

Here’s the diagram for system status information:

The connections, sessions and requests views all work in a similar way as their SQL Server counterparts. The sys.dm_pdw_exec_requests one in particular is crucial to get the status of any given query as it exposes the ‘label’ column and as we’ll see later on, it allows to get a lot of detail on how the request is executed in the system.

From this set of views, the only ones exclusive to ASDW are these two:

  • sys.dm_pdw_errors: this view allows you tie together errors with either the session or the request that raised it. It’s very handy and surprising that there’s no equivalent on SQL Server.
  • sys.dm_pdw_sys_info: this view gives an ‘at-a-glance’ snapshot of system operation. It returns the amount of requests or loads either running or queued and idle or active sessions in the system. Another handy one that I wish was included in SQL Server!

Using the System Status Views

With this set of views we can answer questions and queries such as:

How many queries are currently running and how many queued?

Which application has executed the most queries?

Is there a host generating more errors than all others in the last hour?

Are there clients connected with a non-encrypted connection and how are they authenticating?

Using the query example we had with a label of ‘AverageBalance’, we can also answer questions and queries such as:

What’s the status, command and resource class of this query?

Were there any errors when the request ran and which user ran it?

What’s the runtime of the last 10 executions of this query?

Unlike the sys.dm_exec_requests view in SQL Server, the sys.dm_pdw_exec_requests view actually keeps up to 10000 records with the information of a request even after it has executed. This capability is very useful as you can track specific query executions as long as their records are still among the 10000 kept by the view. As time passes the oldest records are phased out in favor of more recent ones.

Let’s move on now to exploring and understanding the Data Movement Services and data movement operations that happen inside ASDW.

Tracking Data Movement Services

Because of the distributed nature of ASDW, the system has a component called the Data Movement Services (DMS) that takes care of moving data around the different compute nodes as part of a query’s distributed execution plan.

The DMS also has worker threads that take care of reading data from Blob Storage and distributing it to the 60 distributions of ASDW. The number of readers depends on the amount of Data Warehousing Units provisioned to the service and the amount of writers is always 60. This parallel reading from a Blob Storage file to an ASDW table is accomplished through the loading method known as PolyBase.

Data Movement Services Views

From the diagram, we can see the familiar requests view as well as three ASDW exclusive views:

  • sys.dm_pdw_dms_cores: this view shows the status of the DMS service on each compute node in the system. Since ASDW is a managed service on Azure, this should always display as ‘Ready’.
  • sys.dm_pdw_dms_external_work: this view shows each step that a DMS worked has taken to read a file from an external location like Blob Storage. The steps record not only the file name but also timestamps and the amount of data read.
  • sys.dm_pdw_dms_workers: this view shows each DMS worker thread and information on the work they are doing or have done. This includes the node and distribution that the worker is working on, CPU, time and other measurements.

Both the external work and the DMS worker views also accumulate history at a maximum of 10000 records the same as the requests view.

Using the Data Movement Services views

With this set of views we can answer questions and queries such as:

Which files are currently being loaded into the system and by which user?

How much data has been written by the DMS workers for each file that was loaded in the last hour?

If we perform a data load with a label of ‘Load FactServiceCalls Table’, we can also answer questions and queries such as:

How many rows have been distributed by the DMS workers for this data load?

What is the average Megabytes-per-sec read rate for this load?

As you can see, there’s enough instrumentation on these views to track load progress, monitor the status of a specific load and detect where bottlenecks might be popping up.

Drilling Down on Request Details

Once a query has been submitted, tracking it through the request identifier from sys.dm_exec_requests allows joining with other views that provide different perspectives on what the query is doing under the hood.

We already saw one of those views when we explored the Data Movement Services. Here are the rest of them:

Request Details Views

We can see here three new views:

  • sys.dm_pdw_hadoop_operations: this view will populate with records that correspond to steps in map-reduce jobs when running a query to an external Hadoop. This makes perfect sense for the SQL Server 2016 and APS implementation of PolyBase that can push down to Hadoop some predicate computations so the amount of data sent from the Hadoop cluster to the SQL engine is minimized. However, at the time of this article’s publication, ASDW doesn’t have the capability of doing external Hadoop tables yet. The fact that the view is present and accessible seems to point to this feature coming to ASDW down the line.
  • sys.dm_pdw_request_steps: this view allows us to see the breakdown of a request into the different steps that lead to its resolution. Because ASDW is a distributed system, any given request can require different operations such as moving data around the nodes, creating intermediate datasets on temporary tables and running different SQL on any of these structures. Using the step_index column to sort, we can see the exact order in which these steps ran. Other columns such as operation_type, distribution_type and location_type offer more details on the execution internals of ASDW.
  • sys.dm_pdw_sql_requests: this view ties closely to the request_steps view. It ties back to specific steps in a request and expands this information to give us stats for a specific step inside one or more distributions. This allows much more granular instrumentation since it goes all the way down to the individual distribution level.

Using Request Details Views

For example, we could ask:

Which requests have steps that required work from the Data Movement Service?

If you see many operations that use the DMS and are taking the majority of your execution time then this is likely a tuning opportunity in the system. For example, tables that are frequently joined by a specific column can avoid data movement if they both use a hash key distribution on that column.

Now let’s assume we have this query:

Using the query label, with these views we can answer questions such as:

What were the different steps executed by the system to resolve this query and how much time did each step take?

Once we can split a specific request into its individual steps, we can quickly isolate what is causing the bulk of our execution time and focus our tuning efforts on that step.

For each step of the query, what was the maximum, minimum, average and standard deviation of the execution time per distribution?

In this last query, I’m using a LEFT JOIN to join the request_steps view to the sql_requests view because some query steps will not break down further into operations at the distribution scope. Using a query like this you can detect if you have data distribution issues and one distribution is doing much more work than others.

In this type of skewed distribution, you have to analyze what type of distribution each table involved is using and whether there is a better way of distributing your data. The key concept to keep in mind is that in general, good performance in ASDW comes from using all the system resources in parallel: distributions and compute nodes. If the data is evenly balanced on the different distributions then more work can be done in parallel.

Checking on the ASDW nodes

Now that we’ve taken a look at the views and information we get at the detailed request level, let’s move up and check the views that operate at a node level. These views can be used as a complement on the request level analysis or individually to trend, monitor or troubleshoot issues at the node level.

Node-related Views

We already covered sys.dm_pdw_errors, and now we have three new views:

  • sys.dm_pdw_nodes: this view will return a row for every node in your ASDW. As you increase and decrease the number of Data Warehousing Units in your system, the number of compute nodes will change accordingly. Regardless of the amount of DWUs, there is always just one Control node.
  • sys.dm_pdw_os_threads: this view returns a row for every thread running on each node of the system.
  • sys.pdw_nodes_pdw_physical_databases: this view gives us a list of the different internal databases that compose the entire Data Warehouse and which node it is attached to. You will notice that each distribution will be a separate internal database and each node has its own set of system databases.
  • sys.dm_pdw_nodes_database_encryption_keys: this view lets you keep track of encryption operations on a node when you turn ON or OFF the Transparent Database Encryption (TDE) feature on ASDW. If an operation is in progress you can track it with the percent_complete column.

Using the Node-Related Views

With this set of views we can answer questions such as:

How many of each type of node do I have on my system?

The results will always be one Control node plus a number of Compute Nodes. The exact amount of Compute Nodes will depend on the amount of Data Warehousing Units that you have provisioned and will change as you scale those DWUs up and down.

How many threads are running on each node?

Since ASDW is a fully managed service by Microsoft, there’s not a lot of trending or diagnostics that you can distill from the OS threads info. That said, I’m sure there are people that are just as curious as I am and enjoy looking into these internal details of the system.

What’s the status of TDE on each internal database on the system?

On a large warehouse, the encryption and decryption of the data can take a long time and this query will allow you to monitor the progress of the operation.

At this point we’ve looked at both the detailed request level and high level node information. Now, let’s look at the wait related information and views in the system.

Queuing, Blocking and other performance issues.

Similar to SQL Server, ASDW also keeps track of the different wait statistics that pop up during the operation of the system. Some of them are unavoidable and harmless, others can be signs of real bottlenecks on your Data Warehouse.

Keeping track of the top waits can help tune specific queries, monitor the system and spot trends in performance. ASDW offers wait information at the node or the request level.

Performance-Related Views

We can see four wait-related views:

  • sys.dm_pdw_lock_waits: this view shows the current locks granted and being waited on for the active requests in the system.
  • sys.dm_pdw_resource_waits: this view shows waits that are related to the concurrency management limits established by ASDW. As you scale your Data Warehousing Units up you can service a larger concurrent workload.
  • sys.dm_pdw_waits: this view is a union of the previous two views in case you are interested in both the lock and the concurrency waits.
  • sys.dm_pdw_wait_stats: this view aggregates the amount of waits and waiting time for each wait type on each node of the system.

Answering performance questions with the Performance-Related Views

The way to handle and analyze blocking and wait bottlenecks is the same as SQL Server, the only difference is in where to find the information. Also keep in mind ASDW is a distributed system and thus the wait stats are provided at the individual node level. To provide a view of the entire system, we aggregate the values over all the different nodes by grouping by wait type and ignoring internal system waits and wait types that have not occurred at all.

With this set of views we can answer questions and queries such as:

Which queries are being queued by the system due to concurrency?

In the result set for this query we include the user Resource Class. A full exploration of Resource Classes is out of the scope of this article but we will say that they are part of ASDW’s workload management system and govern how much resources a query can use. Higher resource class queries get more resources but the trade-off is that they decrease the amount of concurrent queries that will be allowed to run in the system.

Which queries are currently blocked and which process is the blocker?

Unlike the sys.dm_exec_requests view on the on-premises SQL Server, the sys.dm_pdw_exec_requests view does not have the blocking_session_id column. This means that monitoring and analyzing blocking requires working with the relationships and information on these views as we can see in the query above.

ASDW does offer a more direct equivalent of sys.dm_exec_requests called sys.pdw_nodes_exec_requests and this one does include the ‘blocking_session_id’ column. However, this view is not that useful as it is usually used in conjunction with functions such as sys.dm_exec_sql_text to see the command that is running, however this function is not available in ASDW.

For this reason, I left the sys.pdw_nodes_exec_requests view out of the diagram intentionally. New users of ASDW should familiarize themselves and work with sys.dm_pdw_exec_requests instead.

To analyze system wide waits, we can use the views for questions like:

What are the aggregated values for the different performance impacting waits in the system?

To create the list of wait types to ignore, I based it on the wait stats diagnostic query from Glen Berry found here and added some other wait types that I’ve seen on ASDW that don’t seem to impact performance and can be safely ignored.

Wrapping Up

Through the diagrams and example queries we have covered many different facets of ASDW as exposed by the system views. Take these examples as the launch point for your own analysis, monitoring and learning how the system is architected. Once you’re comfortable, roll up your sleeves and customize these or write your own from scratch as you dig deeper into the power of Azure SQL Data Warehouse.

  • 12046 views

  • Rate
    [Total: 5    Average: 5/5]
  • Luis Simões

    Impressive Article. Very well written and that makes it very easy to understand. Congrats Warner Chaves for the nice work.

    • Warner Chaves

      Thanks Luis!