top of page
Writer's picturekyle Hailey

Most important performance metric on a database

Updated: Jun 29, 2023


This applies to Operating Systems and Application Servers, it is just that no one has synthesized the data as simple as I have on databases.


Why is this metric, AAS - average active sessions a.k.a. # of concurrently running SQL queries so important? see Setting the Record Straight: A Comprehensive Guide to Understanding the AAS Metric in Databases


The most important database performance metic is Average Active Sessions aka AAS aka database load a.k.a # of concurrenlty running SQL queries. . Really AAS is just the number of concurrent queries (CCQ) running on database. The more queries running , the bigger the load. Like if you have a pickup truck, then the more bricks you put in the cargo bed the bigger the load you pickup is moving.


AAS could be the core performance metric any system process widgets where widgets are just units of work done, like in Goldratt's The Goal.


One easy way to see AAS/CCQ as performance metric is when talking about CPU. If your host has 16 vCPUs, then you can only have 16 queries running on CPU. All other queries are waiting. Thus if you have 32 queries running concurrently then you know that half of them are waiting for something. The might be waiting for CPU or the could be waiting for a lock or an I/O or some other resource, but you know that 16 of the queries are waiting.


Now I use Datadog at work. I love Datadog monitoring and customizable dashboards.

And though Datadog has the best most flexible and easy to use customizations, it does have some sharp edges and some of the sharp edges go unnotices because for the most part the graphs look reasonable. I plan to do some blog posts on those sharp edges, but my first post on customizations is the following example of the database load :




The colors are the wait types and the height is the number of concurrent queries.

So the example of CPU saturation is easy to see here. The black dotted line is the # of vCPUs on the host. The green bars are the queries runnable on CPU. As we can see the green bars are about half the height of the vCPU line, meaning we have plenty of CPU resources.

On the other hand we have about twice as many concurrent queries running as the green bar, i.e all those queries not in green state, not runnable on CPU, are waiting for something else.

What are they waiting on? Well that depends on the colors.


I color coded the main waits I see on my system:

Locks (transactionid, relation, tuple) are red because IMO they are worst general wait

All commit stuff is yellow/orang

ClientRead is gray, its a wait that is outside the database. It how long the client takes to consume the fetched data from the database

All non commit I/O and I/O resource activity (like allocating a free buffer) are in blue





Everything else is a shade of purple

You have to create two template variables or change the variables to hard coded values:


$host_prod - name of the database you want to monitor 
$wait_event - you can take this out, but I put it in sometimes to narrow down the output to a single wait_event



Here is the JSON code which you can drop into you custom graph to the above graph.







{
    "viz": "timeseries",
    "requests": [
        {
            "style": {
                "palette": "dog_classic",
                "type": "solid",
                "width": "normal"
            },
            "type": "bars",
            "formulas": [
                {
                    "style": {
                        "palette": "cool",
                        "palette_index": 0
                    },
                    "alias": "1 ExecGather PQO",
                    "formula": "query2 / query15"
                },
                {
                    "style": {
                        "palette": "warm",
                        "palette_index": 0
                    },
                    "alias": "1 WalWrite",
                    "formula": "query3 / query15"
                },
                {
                    "style": {
                        "palette": "warm",
                        "palette_index": 1
                    },
                    "alias": "1 WalInsert",
                    "formula": "query4 / query15"
                },
                {
                    "style": {
                        "palette": "warm",
                        "palette_index": 2
                    },
                    "alias": "1 WalSync",
                    "formula": "query6 / query15"
                },
                {
                    "style": {
                        "palette": "gray",
                        "palette_index": 0
                    },
                    "alias": "1 ClientRead",
                    "formula": "query7 / query15"
                },
                {
                    "style": {
                        "palette": "cool",
                        "palette_index": 2
                    },
                    "alias": "1 BufferFileRead",
                    "formula": "query9 / query15"
                },
                {
                    "style": {
                        "palette": "cool",
                        "palette_index": 3
                    },
                    "alias": "1 BufferIO",
                    "formula": "query10 / query15"
                },
                {
                    "style": {
                        "palette": "cool",
                        "palette_index": 4
                    },
                    "alias": "1 BufferFileWrite",
                    "formula": "query11 / query15"
                },
                {
                    "style": {
                        "palette": "cool",
                        "palette_index": 5
                    },
                    "alias": "1 DataFileRead",
                    "formula": "query12 / query15"
                },
                {
                    "style": {
                        "palette": "purple"
                    },
                    "formula": "query1 / query15"
                },
                {
                    "style": {
                        "palette": "warm",
                        "palette_index": 5
                    },
                    "alias": "1 transactionid",
                    "formula": "query13 / query15"
                },
                {
                    "alias": "1 relation",
                    "style": {
                        "palette": "warm",
                        "palette_index": 6
                    },
                    "formula": "query16 / query15"
                },
                {
                    "style": {
                        "palette": "green",
                        "palette_index": 5
                    },
                    "alias": "1 CPU",
                    "formula": "query14 / query15"
                }
            ],
            "queries": [
                {
                    "name": "query2",
                    "data_source": "metrics",
                    "query": "avg:postgresql.activity.waits{wait_event:ExecuteGather,$wait_event,$host_prod}.rollup(sum).fill(zero)"
                },
                {
                    "name": "query15",
                    "data_source": "metrics",
                    "query": "sum:postgresql.activity.waits{wait_event:CPU,$host_prod}.fill(zero).rollup(count)"
                },
                {
                    "name": "query3",
                    "data_source": "metrics",
                    "query": "avg:postgresql.activity.waits{$wait_event AND wait_event:WALWrite OR wait_event:walwrite* AND $host_prod}.rollup(sum).fill(zero)"
                },
                {
                    "name": "query4",
                    "data_source": "metrics",
                    "query": "avg:postgresql.activity.waits{wait_event:WALInsert*,$wait_event,$host_prod}.rollup(sum).fill(zero)"
                },
                {
                    "name": "query6",
                    "data_source": "metrics",
                    "query": "avg:postgresql.activity.waits{wait_event:WALSync,$wait_event,$host_prod}.rollup(sum).fill(zero)"
                },
                {
                    "name": "query7",
                    "data_source": "metrics",
                    "query": "avg:postgresql.activity.waits{wait_event:ClientRead,$wait_event,$host_prod}.rollup(sum).fill(zero)"
                },
                {
                    "name": "query9",
                    "data_source": "metrics",
                    "query": "avg:postgresql.activity.waits{wait_event:BufFileRead,$wait_event,$host_prod}.rollup(sum).fill(zero)"
                },
                {
                    "name": "query10",
                    "data_source": "metrics",
                    "query": "avg:postgresql.activity.waits{wait_event:BufferIO,$wait_event,$host_prod}.rollup(sum).fill(zero)"
                },
                {
                    "name": "query11",
                    "data_source": "metrics",
                    "query": "avg:postgresql.activity.waits{wait_event:BufFileWrite,$wait_event,$host_prod}.rollup(sum).fill(zero)"
                },
                {
                    "name": "query12",
                    "data_source": "metrics",
                    "query": "avg:postgresql.activity.waits{wait_event:DataFileRead,$wait_event,$host_prod}.rollup(sum).fill(zero)"
                },
                {
                    "name": "query1",
                    "data_source": "metrics",
                    "query": "avg:postgresql.activity.waits{!wait_event:CPU,!wait_event:WALInsert,!wait_event:WALWrite,!wait_event:DataFileRead,!wait_event:ClientRead,!wait_event:WALSync,!wait_event:transactionid,!wait_event:BufferIO,!wait_event:BufFileRead,!wait_event:ExecuteGather,!wait_event:BufFileWrite,!wait_event:relation,$wait_event,$host_prod} by {wait_event}.rollup(sum).fill(zero)"
                },
                {
                    "name": "query13",
                    "data_source": "metrics",
                    "query": "avg:postgresql.activity.waits{wait_event:transactionid,$wait_event,$host_prod}.rollup(sum).fill(zero)"
                },
                {
                    "name": "query16",
                    "data_source": "metrics",
                    "query": "avg:postgresql.activity.waits{wait_event:relation,$wait_event,$host_prod}.rollup(sum).fill(zero)"
                },
                {
                    "name": "query14",
                    "data_source": "metrics",
                    "query": "avg:postgresql.activity.waits{wait_event:CPU,$wait_event,$host_prod}.rollup(sum).fill(zero)"
                }
            ],
            "response_format": "timeseries"
        },
        {
            "style": {
                "palette": "dog_classic",
                "type": "solid",
                "width": "normal"
            },
            "type": "line",
            "formulas": [
                {
                    "formula": "0 * query0 + 96 * 1"
                }
            ],
            "queries": [
                {
                    "data_source": "metrics",
                    "name": "query0",
                    "query": "avg:postgresql.activity.waits{wait_event:CPU,$host}"
                }
            ],
            "response_format": "timeseries"
        },
        {
            "style": {
                "palette": "dog_classic",
                "type": "solid",
                "width": "thick"
            },
            "type": "line",
            "formulas": [
                {
                    "style": {
                        "palette": "warm",
                        "palette_index": 5
                    },
                    "formula": "3 * query0 / query1"
                }
            ],
            "queries": [
                {
                    "data_source": "metrics",
                    "name": "query0",
                    "query": "avg:postgresql.activity.waits{wait_event:LockManager,$wait_event,$host_prod,$host}.rollup(sum).fill(zero)"
                },
                {
                    "data_source": "metrics",
                    "name": "query1",
                    "query": "sum:postgresql.activity.waits{wait_event:CPU,$host_prod}.fill(zero).rollup(count)"
                }
            ],
            "response_format": "timeseries"
        }
    ],
    "yaxis": {
        "include_zero": false
    }
}

863 views0 comments

Recent Posts

See All

Comments


bottom of page