Embarcadero DB Optimizer
Datadog DBM Oracle OEM
Amazon Performance Insights
Average Active Sessions, or AAS, is the core metric for monitoring and analyzing database performance.
At its core, AAS represents the average number of concurrent SQL queries running on the database. Each 'session' refers to a user process executing a specific task or SQL query. The 'average' part of AAS signifies that it's not just a point-in-time measurement, but an average calculated over a specified collection interval. For example, if each bar on the AAS graph represents a 5-second interval, the value at that point is the average number of concurrently running SQL queries during those 5 seconds.
By tracking these active sessions, AAS provides invaluable insights into what the database is doing, where resources are being allocated, and where potential bottlenecks may reside. AAS, a part of the Active Session History (ASH), serves as a time-series metric capable of capturing granular information about database performance.
The uniqueness of AAS lies in its ability to offer a comprehensive view of database load, spanning across its components, from CPU usage, to Locks, to buffers, to latches to IO waits. Every single place that a SQL query spends time is covered by AAS. We see everything in one metric. This makes it an invaluable tool in any database administrator's toolkit.
AAS is defined as the average number of concurrent active sessions in an database. To picture this, imagine each 'session' as an individual user process performing a specific task or executing a SQL query. The word 'active' is key here as it specifically refers to sessions that are doing something, not just connected to the database. When we say AAS, we are referring to the average number of these active sessions in a given time window.
Now, why is this important? The beauty of AAS lies in its ability to capture the ongoing activity in the database. It gives a view into what the database is doing, which tasks are keeping it busy, where resources are being spent, and where potential bottlenecks might be. It's almost like having a live CCTV footage of your database. I've gone to some of the worlds largest banking institutions who had the live graph of AAS on wall size monitor so everyone could immediately see visually any issues on the database.
AAS not only measures database activity, but can also indicate whether the database is idle or not. An AAS near 0 suggests that the database is nearly idle — not because no queries are running, but likely because the number and speed of the queries are so minimal that they barely register on the AAS metric.
Crucially, AAS provides an easy-to-understand, real-time picture of database health. If the AAS value is less than 1, it implies that all sessions are advancing, albeit at different rates, and there are no blocked sessions. When the AAS value is less than the number of virtual CPUs (vCPUs), it indicates that the system has available capacity and operations should be running smoothly, although a session could be blocked even in this case.
However, when AAS exceeds the number of vCPUs, it is a signal that the system is experiencing some level of saturation, with some sessions likely waiting. The further AAS exceeds the vCPU count, the more pronounced the bottleneck. In extensive analysis of a six-figure number of database instances at Amazon, we found that when AAS exceeded five times the vCPU count, the system was almost always considered to be experiencing a significant bottleneck. Thus, we set an AAS value of 5x vCPU as our default threshold for alerting customers to potential performance issues. This metric provides a much more accurate analysis of a system's performance than any other, as it doesn't require knowledge about the application or expected query latency, which can vary widely between systems.
The simplicity of AAS may lead some to underestimate its value, especially when compared to other metrics that require deep knowledge about the application, like transaction workload or expected query latency. However, these metrics are often variable and application-specific. In contrast, AAS provides a universal, easy-to-understand measure of database performance that works across all systems and applications.
Explanation of the AAS graph and its Components
An AAS graph is a visual representation of this 'CCTV footage.' It depicts the variation of active sessions over time. Each bar on the graph can be seen as a snapshot of the database's activity during a specific interval. The height of the bar represents the AAS value, that is, the average number of concurrent active sessions in that time interval. For example, if each bar represents a 5-second interval, the height of the bar tells us the average number of SQL queries that were running concurrently in the database during those 5 seconds.
The graph also color-codes these sessions based on their 'wait class' or 'state.' This coding provides additional insights into what these sessions are doing - are they executing CPU operations? Or are they waiting for an I/O operation? Or are they waiting for some other session to complete? The color-coding allows us to quickly identify patterns and potential issues, like if too many sessions are spending time in a particular wait state.
Overall, an AAS graph provides a powerful visualization tool to analyze the database's performance and identify issues for further investigation.
The Role of AAS in Database Performance Monitoring
Comparison of AAS with other traditional metrics (CPU Utilization, IOPS, etc.).
Explanation of how AAS provides a more holistic view of system performance.
It's common in the realm of database management to rely heavily on traditional metrics such as CPU Utilization and Input/Output Operations Per Second (IOPS). While these metrics offer valuable insights, they can sometimes provide a rather narrow view of the system's overall performance. This is where AAS stands apart.
Comparison with CPU Utilization
CPU Utilization, expressed as a percentage, indicates the amount of processor capacity currently in use. However, it has limitations. When CPU utilization reaches 100%, it simply signifies that all available CPU resources are in use; it doesn't convey how much additional CPU resource demand exists. Moreover, it can't pinpoint which user or SQL query is responsible for the high CPU usage.
AAS, on the other hand, provides a more comprehensive picture. Consider two scenarios: In the first, there are 16 runnable queries on a 16 vCPU system, resulting in 100% CPU utilization. In the second scenario, there are 32 runnable queries on the same 16 vCPU system - again, the CPU utilization is 100%. However, in the second case, the demand for CPU is twice as high as the available capacity, and at any point in time, half of the runnable queries are idle, waiting for CPU time. Traditional CPU Utilization metrics don't capture this detail. AAS does. It reflects the demand, revealing that we have an AAS of 32 in the second scenario — double the first scenario.
Comparison with IOPS
Like CPU Utilization, IOPS can also be misleading. While it shows how much IO is being performed, it doesn't illustrate who is performing it. For instance, if we hit the maximum IOPS, we don't know which users or SQL queries are causing the high I/O waits.
AAS comes to the rescue here as well. With AAS, we can group the data by the top users or top SQL queries to see which ones are responsible for the most I/O waits. AAS provides a unique level of granularity that allows us to drill down into the root cause of performance issues. It is more than just a capacity metric; it's a diagnostic tool for assessing and enhancing database performance.
In sum, while traditional metrics like CPU Utilization and IOPS offer useful but partial insights, AAS provides a comprehensive, detailed, and nuanced picture of your database performance. It's the missing puzzle piece that connects other metrics and delivers a holistic view of system performance.
Decomposing Performance Data with AAS
Description of how AAS derives from sampled data.
Explanation of the 'slice and dice' capability of AAS.
Detailed scenarios demonstrating how AAS can help identify issues which CPU utilization and IOPS can't.
One of the most notable strengths of AAS is that it's derived from sampled data. This data includes who was running, what query they were running, what wait state they were in, the application server they were coming from, and more. This richness of data allows AAS to offer unique "slice and dice" capability, enabling us to analyze performance problems in a more granular and detailed manner. Let's explore this further.
Derivation from Sampled Data
AAS data comes from a sampling of the database's active sessions. Each sample reflects the state of the database at a specific point in time, capturing details such as the number of active sessions, the queries being executed, the users connected, and more. This high-resolution view allows us to construct an accurate picture of the system's performance.
“Slice and Dice" Capability of AAS
Thanks to its origins in sampled data, AAS is highly flexible and can be dissected in multiple ways to analyze performance. You can group the AAS data by different dimensions — top users, SQL queries, wait states, and more. Each of these groupings can offer new insights. This capability to segment and investigate different aspects of database activity makes AAS a uniquely powerful tool for identifying and diagnosing performance issues.
Identifying Issues through Detailed Scenarios
Let's consider a couple of scenarios to illustrate this:
Scenario 1 - High CPU Utilization: Your system shows a CPU utilization at 100%. Traditional metrics would simply indicate that you're out of CPU resources. But who is consuming those resources? AAS can help answer this. By grouping the AAS data by the top users or SQL queries, you can identify which users and which SQL queries are causing the high CPU usage.
Scenario 2 - Hitting Maximum IOPS: Similarly, if your system is hitting its maximum IOPS, traditional metrics won't tell you who is responsible. But with AAS, you can group the data by the top users and/or top SQL queries to see who's causing the most I/O waits.
In both these scenarios, AAS goes beyond indicating that a problem exists. It helps identify the nature and source of the problem, enabling a more effective response.
In sum, the ability to decompose the time series AAS data into these components is what sets AAS apart from other performance metrics. It's not just about measuring capacity or activity; it's about understanding the why and the who behind the performance data.
AAS and Performance Bottleneck Identification
Explanation of the Max CPU line in the AAS graph.
Discussion of how AAS values exceeding the Max CPU line signify potential bottlenecks.
Illustration of how AAS aids in faster diagnosis and resolution of performance issues.
Understanding how to read and interpret the AAS graph is key to identifying potential performance bottlenecks. One critical line to look out for on this graph is the Max CPU line. This line represents the maximum CPU capacity available to your system. Let's delve into what it means when AAS values exceed this line and how AAS can guide you towards faster issue diagnosis and resolution.
The Max CPU Line in the AAS Graph
In an AAS graph, the Max CPU line is a horizontal line indicating the maximum amount of CPU capacity available to your system. This is the theoretical limit of your system's processing power and is directly tied to the number of vCPUs (virtual CPUs) in your system. For example, if your system has 16 vCPUs, the Max CPU line will sit at the value of 16 on your AAS graph.
Significance of AAS Values Exceeding the Max CPU Line
When your AAS values for CPU component exceed the Max CPU line, it signifies that the demand for processing power exceeds what the system can provide — a potential bottleneck. While your CPU utilization metric would only show you've hit 100%, the AAS graph tells a deeper story.
Consider the scenario where your AAS value is at 32 while the Max CPU line is at 16. This means that, on average, you have twice as many active sessions (or SQL queries) demanding CPU resources as your system can handle. Half of the sessions are idle, waiting for a chance to get on a CPU. Simply looking at the CPU utilization metric wouldn't have revealed this; you'd see 100% utilization in both cases where demand matches supply and where it far exceeds it.
Faster Diagnosis and Resolution with AAS
Understanding the relationship between AAS and your Max CPU line not only identifies potential bottlenecks but also accelerates their diagnosis and resolution. By knowing that your performance issue is related to CPU demand exceeding capacity, you can focus your efforts on specific mitigation strategies. These might involve optimizing your SQL queries, balancing your workload, or considering scaling your CPU resources.
AAS, therefore, doesn't just indicate potential problems; it provides the context needed to understand them. It informs your response, helping you address performance issues more efficiently and effectively.
AAS as a Time-Based Metric
Addressing misconceptions about AAS's connection to time-based performance analysis.
Explaining how AAS is intrinsically a measure of SQL query elapsed time per second.
Demonstrating the calculation of average elapsed time of SQL queries using AAS.
AAS, in essence, is a time-based metric. It represents the average amount of time that the database spends processing SQL queries every second. This characterization counters misconceptions that AAS is unrelated to time-based performance analysis. In fact, the relationship between AAS and time is at the heart of its utility as a performance metric. Let's clarify these concepts:
Misconceptions About AAS and Time-Based Performance Analysis
There's a prevailing misconception that AAS is not a time-based performance metric. This misunderstanding may stem from the fact that AAS is not expressed directly in units of time. However, it's crucial to understand that AAS is fundamentally a measure of SQL query elapsed time per second, which is a measure of time. Dismissing the connection between AAS and time-based performance analysis overlooks the core function of the metric.
AAS as a Measure of SQL Query Elapsed Time Per Second
AAS represents the average number of SQL queries concurrently running on the database, measured over the width of each bar in an AAS graph. If each bar represents 5 seconds, the AAS value would show the average number of active sessions over that 5-second interval. This means AAS essentially measures the amount of SQL query elapsed time per second.
Calculating Average Elapsed Time of SQL Queries Using AAS
AAS can be used to calculate the average elapsed time of SQL queries. If we know the AAS and the average execution rate of queries (the number of queries executed per second), we can easily find the average elapsed time per query. This is simply the AAS divided by the average execution rate.
For example, suppose our AAS is 2, and we're executing 2 queries per second. Dividing the AAS by the execution rate (2 / 2), we find that the average elapsed time per query is 1 second. This demonstrates how AAS, beyond being a metric of concurrent load, can provide insight into the duration of SQL queries.
In conclusion, understanding AAS as a time-based metric illuminates its value in performance analysis. It offers more nuanced insight into database activity and performance than many traditional metrics can provide.
Case Studies
Detailed case studies showing the application of AAS in real-world scenarios.
How AAS has helped in addressing performance issues.
Case studies provide compelling evidence of the power and practicality of AAS in real-world scenarios. They illustrate how AAS facilitates better understanding of performance issues and aids in identifying and addressing bottlenecks more effectively. Let's dive into a couple of examples:
Case Study 1: Identifying and Resolving High CPU Usage
Consider an organization that consistently experienced peak CPU utilization during specific hours, causing noticeable slowdowns in their application performance. Traditional metrics showed that CPU utilization was consistently at 100% during these periods, but provided no additional insights about what was causing the high usage.
Using AAS, they observed that during periods of high CPU usage, the AAS values were also significantly high, often exceeding the line representing maximum CPU capacity. Further, they sliced the AAS data by top SQL and found a couple of complex SQL queries that were active during these peak periods.
Through this analysis, the organization pinpointed the specific SQL queries that were causing high CPU usage. They were able to rewrite these queries for better optimization, leading to a significant drop in AAS values and, consequently, CPU usage during peak periods.
Case Study 2: Diagnosing I/O Wait Issues
In another situation, a company was struggling with performance degradation due to high I/O waits. However, they were unable to identify the specific users or SQL statements causing the high I/O waits from their traditional metrics.
By analyzing their AAS data and grouping by I/O waits, they were able to discern that the high I/O waits were predominantly caused by a small group of users running specific SQL queries that involved large data transfers. The identification of these queries and users allowed the company to implement specific optimizations, such as re-indexing certain tables and modifying the SQL queries, thereby significantly reducing the I/O waits and improving overall performance.
These case studies underscore the versatility and granularity of the AAS metric. By providing a more detailed and nuanced view of database activity, AAS can expose performance issues that other metrics may miss, enabling quicker diagnosis and more targeted optimizations.
Common Misconceptions about AAS
Addressing common misconceptions about AAS (primarily focusing on those circulating on platforms like Twitter).
Providing evidence-backed refutations of these misconceptions.
The dynamic world of database performance metrics often leads to misconceptions, misunderstandings, and spirited debates among professionals. In recent times, some misconceptions about AAS have gained traction, particularly on social media platforms like Twitter. This section aims to clear the air by addressing and debunking these misconceptions.
Misconception 1: AAS is merely a capacity metric
One of the primary misconceptions floating around is that AAS is simply a capacity metric, providing information similar to CPU utilization or I/O operations per second (IOPS). However, as explained previously in this article, AAS is fundamentally a performance metric. It not only provides data about resource utilization, but also offers detailed insights into the concurrency and contention in the database. By leveraging the fine-grained data AAS provides, organizations can uncover the root causes of performance issues that generic capacity metrics might miss.
Misconception 2: AAS is not related to time-based performance analysis
Another mistaken belief is that AAS doesn't contribute to time-based performance analysis. This couldn't be further from the truth. As we've explained earlier, AAS is intrinsically tied to time; it represents SQL query elapsed time per second. By calculating AAS and understanding its implications, database administrators can derive the average elapsed time of SQL queries, a vital aspect of performance analysis.
Misconception 3: High AAS is always bad
It is important to understand that a high AAS is not always indicative of a performance problem. AAS measures active sessions, and these sessions could be running efficiently, doing valuable work. If your system is properly tuned and capable, a high AAS could simply mean that your database is busy and efficiently utilizing resources.
Each of these misconceptions stems from a partial understanding of what AAS is and does. By comprehending AAS in its entirety, as a multifaceted, granular, time-based performance metric, we can utilize it to its fullest potential and derive accurate, actionable insights to guide our performance optimization strategies.
The Future of AAS in Database Performance Management
Discussion on the ongoing evolution and potential future developments of AAS.
Importance of AAS in the context of emerging technologies and trends in database management.
As we move forward into an era of increasingly complex database systems, the Average Active Sessions (AAS) metric continues to prove its relevance and importance. Its granular detail and ability to quickly isolate issues make it a critical tool for managing database performance.
Here are some areas where we can expect AAS to play a significant role:
1. Managing Cloud-based Databases
With the rise of cloud-based databases and services, understanding the performance of your system is critical. As businesses continue to migrate databases to the cloud, the AAS metric will play a crucial role in identifying potential bottlenecks and optimizing performance in these new environments.
2. Integrating with AI and Machine Learning
Artificial Intelligence (AI) and Machine Learning (ML) are making significant strides in database management, particularly in performance tuning and predictive analysis. AAS, with its rich and detailed data, can provide invaluable input to these AI/ML models. The ability of AAS to accurately represent concurrent activity and demand on the system can help AI/ML algorithms predict and manage potential performance issues proactively.
3. Incorporation in Real-time Analytics
Real-time analytics require rapid identification of performance issues to maintain optimal service levels. AAS, with its ability to provide an immediate snapshot of concurrent activity, is ideally suited for integration into real-time monitoring and analytics tools.
4. Role in Autonomous Databases
As we move towards more autonomous databases, AAS could play a crucial role in managing performance. These self-managing systems rely on sophisticated metrics to operate effectively, and AAS is well positioned to be an essential part of that toolkit.
In summary, while the AAS metric is already an indispensable tool in database performance management, its future looks even brighter. As databases continue to evolve, embracing new technologies and architectures, AAS will continue to play a pivotal role in helping us understand and optimize database performance. Its granularity, detail, and versatility make it a vital part of any database performance analyst's toolkit. As such, it's crucial that we continue to study, understand, and effectively utilize this powerful metric.
Conclusion
Recap of key points.
Closing remarks on the importance of AAS in effective database performance management.
Over the course of this comprehensive guide to the Average Active Sessions (AAS) metric, we've explored its various facets, from its definition and interpretation to its crucial role in performance management and problem resolution within databases. It's abundantly clear that AAS is not just a capacity metric, but rather a powerful performance tool that delivers detailed insights which other traditional metrics cannot.
Through a number of case studies and scenarios, we've seen how AAS, with its unique ability to monitor concurrent SQL queries and average them over specified intervals, can dissect performance data to identify potential bottlenecks. Whether it's a CPU running at full capacity or an I/O operation causing unnecessary waits, AAS offers the granularity needed to diagnose and resolve these issues.
However, the impetus for this exploration was a controversial tweet which served as a stark reminder of the misconceptions and misunderstandings that persist around AAS. In an age where information can be disseminated rapidly, it's distressing when inaccuracies are spread, especially when they concern such vital tools for database performance management.
Reflecting on this, I realize the role I played in this knowledge gap. For the past 20 years since I originally designed AAS with John Bersniewicz, my focus has been on teaching, designing, and coding, but not enough on writing about AAS. I've put together countless presentations and delivered lectures to thousands of DBAs around the world. I've developed live demonstrations to illustrate database bottlenecks and the steps to resolve them.
Additionally, I've been deeply involved in designing AAS monitoring interfaces for tech giants like Oracle, Amazon, Datadog, and Embarcadero. On the coding side, I've dedicated hours to create a graphical version of the AAS tool in TCL/TK, and even written C code to directly access the data from the database's shared memory. I also developed a continuous active connection sampling program, a project later extended by Marcin Przepiorowski with the "S-ASH" or Simulated Active Session History tool.
But, despite these efforts, the lack of written material on AAS has undoubtedly contributed to the misunderstandings we see today. As the recent Twitter controversy underscores, there's a pressing need for clear, concise, and accurate written resources about AAS.
Looking ahead, AAS is set to remain an integral part of database performance management. With the continuous evolution of database technologies and management strategies, the demand for insightful metrics like AAS will continue to rise.
In closing, I urge all database professionals to strive for a comprehensive understanding of the AAS metric. It's not just a tool, but a necessity for those seeking to excel in database performance management. With renewed vigor, I now take on the task of filling the knowledge gap about AAS, and I hope this article serves as a stepping stone for you on your journey towards mastering AAS.
Discussion
Jeremy Schneider @jer_s samples of current state of active sess/conn.
active on lock reflects workload more than capacity, and
samples showing CPU/IO/network state don't reveal much about actual capacity either. the word "perf" may be vague/ill-defined but does the word capacity really apply better?
John Beresniewicz @JBeresniewicz AAS is truly a performance metric, but it is not a measure of performance like avg latency or transaction throughput. Because it measures the “velocity” of DB Time, which IS the key performance metric. We often refer to it as database “load”, but that can be misleading. /1
If you think that “performance” is about service time of requests as well as thruoughput of the system then AAS is absolutely a performance metric , since it is the product of these 2 core performance measures:
AAS = txn/sec x secs/txn
aka
AAS = avg latency x throughput
If txns is controversial, then equivalently:
AAS = user calls/sec x secs/call
AAS increases when either load increases (calls/sec) OR latency increases (secs/call) So it is the product of two inverse perf measures, and thus unitless… …that’s why we had to make up a name
The colors in it gives an idea about how it can scale. can scale with more compute / disks are locks and latches and can't scale
Comments