Greetings from Woodler,
Today, we have prepared another release of SMT 1.11.0 for you. It is rather significant, and it should be since we haven’t had a big one since last summer. So take your time, sit down, and let’s go through the news in SMT.
Significant changes and features:
There are several important changes which hopefully you will enjoy. We focused on specific areas and went as deep as possible (which is, by the way, a signature SMT approach) to provide detailed and valuable data.
Session Thread Wait Statistics is something I have always wished to have in SMT, and finally, we have it. And to be honest, in the most advanced way I have ever seen in monitoring tools. I am not talking only about reporting, but also about collection. From what I saw in other tools, there could be several fundamental flaws in collecting such a volatile statistic. So, we took more time to do it right. We do not attribute statistics accumulated since the session start to the currently running query, and we do not collect only the current wait of the running session. Instead, we identify which of the six possible situations the query is in and, based on that situation, perform the correct collection operation.
As a result, you will see more detailed wait statistics for sessions than ever before, and I am looking forward to your feedback!
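To illustrate why cumulative session counters cannot simply be assigned to the running query, here is a minimal sketch of differencing two snapshots. It is not SMT's collection logic and does not model the six situations mentioned above; the function name `wait_delta`, the snapshot layout, and the sample values are assumptions made only for illustration.

```python
from typing import Dict


def wait_delta(previous: Dict[str, int], current: Dict[str, int]) -> Dict[str, int]:
    """Return wait time (ms) per wait type accumulated between two snapshots.

    Both arguments map a wait type to the cumulative wait time of one session
    (hypothetical layout, not SMT's schema).
    """
    delta = {}
    for wait_type, total_ms in current.items():
        diff = total_ms - previous.get(wait_type, 0)
        if diff < 0:
            # The counter went backwards, so the session counters were reset
            # (for example, the session id was reused). Treat the current
            # value as entirely new in that case.
            diff = total_ms
        if diff > 0:
            delta[wait_type] = diff
    return delta


# Hypothetical cumulative values taken before and after a query ran.
before = {"CXPACKET": 1200, "PAGEIOLATCH_SH": 300}
after = {"CXPACKET": 1750, "PAGEIOLATCH_SH": 300, "WRITELOG": 40}

print(wait_delta(before, after))  # {'CXPACKET': 550, 'WRITELOG': 40}
```

Only the 550 ms of CXPACKET and 40 ms of WRITELOG belong to the interval between the snapshots; the 1200 ms that existed before would otherwise be wrongly credited to the running query.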
Advanced collection has rules, and you can influence them with new check parameters for the Session Waits check in SMT. Go ahead and learn how to use them.
For reporting, you have a completely new report in the Waits group called The Session Waits. It allows you to see data in table fashion with cumulative values and an aggregated timeline column graph (as you know from Waits Summary, for example).
This report supports aggregation on:
- Wait Type
- Session
- Database
- Login
- Resource Governor Workload Group
- Query Hash
- Plan Hash
NOTE: All those attributes can also be used as filters, so now you can answer very specific questions, such as how much wait time this database generated on my server over time, or which query from this workload group generated the most CXPACKET waits. The options are wide.
Waits in Current Activity are another place where we see added value. If you need to watch the current server activity for specific sessions, you will now have, on top of the standard session performance metrics, session wait statistics both over time and as cumulative values. And that’s not all: we have developed several workarounds for sys.dm_exec_plan_attributes bugs, which allows us to make the “current index processing” table more reliable in terms of estimated statistics. Plus, we make the current estimated plan of the running session available for download, so you may quickly check the plan in Plan Explorer if needed.
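The kind of question described in the note can be sketched as a filter followed by a group-by. This is only an illustration of the idea; the row layout, the `aggregate_waits` helper, and the sample data are hypothetical and do not reflect how SMT stores or queries its data.

```python
from collections import defaultdict

# Hypothetical session wait samples (one row per collected interval).
samples = [
    {"wait_type": "CXPACKET", "database": "Sales", "workload_group": "reporting", "query_hash": "0xA1", "wait_ms": 500},
    {"wait_type": "CXPACKET", "database": "Sales", "workload_group": "reporting", "query_hash": "0xB2", "wait_ms": 900},
    {"wait_type": "CXPACKET", "database": "Sales", "workload_group": "reporting", "query_hash": "0xA1", "wait_ms": 300},
    {"wait_type": "PAGEIOLATCH_SH", "database": "Sales", "workload_group": "reporting", "query_hash": "0xA1", "wait_ms": 700},
    {"wait_type": "CXPACKET", "database": "HR", "workload_group": "default", "query_hash": "0xC3", "wait_ms": 2000},
]


def aggregate_waits(rows, group_by, **filters):
    """Sum wait_ms per value of `group_by`, keeping only rows matching all filters."""
    totals = defaultdict(int)
    for row in rows:
        if all(row.get(key) == value for key, value in filters.items()):
            totals[row[group_by]] += row["wait_ms"]
    # Largest contributors first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Which query from the "reporting" workload group generated the most CXPACKET waits?
top_queries = aggregate_waits(
    samples, "query_hash", workload_group="reporting", wait_type="CXPACKET"
)
print(top_queries)  # [('0xB2', 900), ('0xA1', 800)]

# How much wait time did each database generate?
print(aggregate_waits(samples, "database"))  # [('Sales', 2400), ('HR', 2000)]
```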
To reflect all those changes, we renamed the Index Processing tab, which is now the Index Processing & Thread Waits Information tab.
Performance monitor counters have long been a bit of a pain to manage. As we added more and more counters, especially those collected per database, the perfmon table started to grow more than we wanted. So, in this release, we have come up with some fine-tuning:
- The majority of counters are now collected at 60-sec intervals instead of 5-sec intervals.
- We rewrote the DWH process to be faster and lighter.
- Collection to DWH will happen more often, so we have data available in respective levels quicker.
- The previous step allows us to reduce retention in the Perfmon table (which collects in 5-sec samples) from 48 hours to 4 hours, as we will already have data in DWH in 15-minute samples. So, if you want to see higher granularity, use the “NOW” and “4 Hours” intervals in the time chooser.
- To compensate for less detailed data at the lowest granularity, we have added around 60 new counters into SMT. We will introduce them in the reports in upcoming releases. Some have already started to show in this release (check the new features section).
To allow this tuning to happen on your servers we dropped (if applicable) your custom settings for perfmon collection and retentions and aligned them back to the BASE config. You may still change them if you wish to have more granularity of data at the expense of storage requirements.
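As a rough back-of-the-envelope check of what the retention and interval changes mean for table size, the sketch below counts stored samples per single counter instance. The helper name is an assumption, and real row counts depend on how many counters and databases you collect.

```python
def samples_per_counter(retention_hours: int, interval_seconds: int) -> int:
    """Number of samples kept for one counter instance over the retention window."""
    return retention_hours * 3600 // interval_seconds


old_5s = samples_per_counter(48, 5)   # previous retention, 5-sec samples
new_5s = samples_per_counter(4, 5)    # new retention, 5-sec samples
new_60s = samples_per_counter(4, 60)  # new retention, counters moved to 60-sec

print(old_5s, new_5s, new_60s)  # 34560 2880 240
print(old_5s // new_5s)         # 12 (retention change alone)
print(old_5s // new_60s)        # 144 (retention plus 60-sec interval)
```

So a counter that stays on 5-second collection keeps 12 times fewer rows, and a counter moved to 60-second collection keeps 144 times fewer rows than before.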
Another major update has been made in the Queries > Execution Plan report. Here, we expanded the information available in several tables. There are so many of them that it will be better for you to look at this report. Table and graph descriptions have also been updated to provide you with more insight.
But in general:
- The Plan Performance Attributes table now shows the compilation and memory statistics of the plan.
- The ANSI Standards show how the standards were configured during plan creation.
- The Used Statistics tab will show you which column statistics were used during compilation and how old or inefficient they were at the time of plan compilation.
- The Missing Indexes tab will give you a quick peek into the plan and its recommendations.
- The Plan Operators tab will show you all operators together with not only estimates, but also actual executions and rows (requires SQL Server 2022 and LAST_PLAN enabled on the database). But that’s not all: we also report Plan Operator runtime statistics, which are useful for diagnosing various issues such as uneven load distribution during parallel operations. I am very proud of this report, and I believe we are the first to have it in a monitoring tool.
- Another tab, the Plan Waits, will provide you with Query wait statistics over time (Requires SQL Server 2017 or above and Query Store enabled).
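To make "uneven load distribution during parallel operations" more concrete, here is a small sketch of one possible skew measure computed from per-thread actual row counts of a single operator. The `thread_skew` function and the sample numbers are assumptions for illustration; this is not necessarily the metric SMT reports.

```python
from typing import Sequence


def thread_skew(rows_per_thread: Sequence[int]) -> float:
    """Ratio of the busiest thread's rows to the average rows per thread.

    1.0 means a perfectly even distribution; a value close to the number of
    threads means one thread did almost all of the work.
    """
    total = sum(rows_per_thread)
    if not rows_per_thread or total == 0:
        return 0.0
    mean = total / len(rows_per_thread)
    return max(rows_per_thread) / mean


even = [250_000, 250_000, 250_000, 250_000]
skewed = [990_000, 5_000, 3_000, 2_000]

print(thread_skew(even))    # 1.0
print(thread_skew(skewed))  # close to 4: one of four threads processed nearly all rows
```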
To give you even more control over the observer overhead of SMT, we have come up with new and redesigned reports to provide insight into which checks are most costly and how SMT collection is performing.
SMT Settings > Collector Status has redesigned tabs and a new filter on check name for the Last 100 checks log table.
SMT Settings > Runtime Stats now provides very detailed performance and error metrics for the last day or the last month. On top of that, we have also added a second tab to show check performance over time for the last 12 hours. Here, you can see how often and for how long our checks are running on the SQL Servers, together with check errors over the last 12 hours.
We believe in transparency. It would be a shame not to provide the best performance metric for complicated monitoring tools such as SMT, so we also keep our standards here.
Other New Features
Smaller features that could be useful for you:
The Queries > Execution Plan > Plan Statistics tab got a Stale Lag [min] column to quickly identify obsolete statistics during plan compilation.
The Workload KPI tab has been extended and moved as a stand-alone report into Queries > Workload KPI. We believe it will help with its accessibility since it could provide an interesting baseline as to how your complete workload is doing over time and show you any anomalies which could arise.
The Queries > Query Summary tab received a new column called Significance in the table. Its purpose is to give you one complete metric on which you may evaluate the queries across more than one chosen attribute (e.g., CPU, Log. Reads, Phy. Reads, etc.). It uses an algorithm that weights the various metrics and produces a single value, usable in comparison with other queries. This algorithm currently puts the highest emphasis on CPU, followed by physical reads, logical reads, and writes. By scaling down read and write metrics, the score provides a balanced view of resource utilization, enabling precise identification of queries that significantly impact server performance. This algorithm can be changed by editing a check parameter called Q_SIGNIFICANCE.
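The idea of a single weighted score can be sketched as follows. The weights, metric names, and the `significance` function are purely hypothetical: they only respect the ordering described above (CPU first, then physical reads, logical reads, and writes) and are not the actual values or format of the Q_SIGNIFICANCE parameter.

```python
# Hypothetical weights: CPU has the highest emphasis, reads and writes are scaled down.
WEIGHTS = {"cpu_ms": 100, "physical_reads": 10, "logical_reads": 2, "writes": 1}


def significance(metrics: dict, weights: dict = WEIGHTS) -> int:
    """Combine several resource metrics of one query into a single score."""
    return sum(weight * metrics.get(name, 0) for name, weight in weights.items())


# Two made-up queries: one CPU-heavy, one dominated by logical reads.
queries = {
    "cpu_heavy": {"cpu_ms": 1000, "physical_reads": 200, "logical_reads": 5000, "writes": 100},
    "read_heavy": {"cpu_ms": 200, "physical_reads": 50, "logical_reads": 20000, "writes": 0},
}

scores = {name: significance(metrics) for name, metrics in queries.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

print(scores)  # {'cpu_heavy': 112100, 'read_heavy': 60500}
print(ranked)  # ['cpu_heavy', 'read_heavy']
```

With these example weights the CPU-heavy query ranks first even though the other query performs four times as many logical reads, which is the effect of scaling down the read and write metrics.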
Added logging for SMT configuration level events (Checks, Parameters), so now you may visit the Administration > Logs > Events report to see who changed which SMT configuration setting.
To reflect recent changes from SMT 1.10.1 in query collection, we added an Object Name column to the Queries > Hash List report, so you may differentiate from which object your query has been called. Also, we have replaced batch text with Object Name in Queries > Query Detail. In case your query is not called from the module, the reporting remains the same as before, and you will see a complete batch.
Check retention is now limited to a minimum of 48 hours due to DWH requirements.
We updated the CPU & Tasks > Overview report to show the process ID instead of #number at the end of the process name. This will help you keep track of specific processes when several instances of the same process are running on the OS. We also added a Kernel Time graph and slightly re-sorted the Average CPU Process Consumption graph so that the SQL Server process is shown first (in green). This chart is now a stacked chart for better visibility of total CPU consumption. The SQL Server Process graph, which shows variation in CPU usage by SQL Server, is now a separate graph instead of being shared with Average CPU Process Consumption.
The Tempdb > Version Store report now provides a graph with average and maximum Tempdb Version Store Usage by Database. This will give you a clear view of which database contributes most to the version store size.
Queries > A/B Testing received a new front-end message that appears when cmd_shell is turned off, warning that some data will be missing. We have removed the Server Statistics tab and replaced it with the Thread Waits and Server Throughput tabs. Those contain the graphs previously located in the removed tab.
Memory > Internal Memory has new graphs with important metrics like Memory Grants, Checkpoints, and Free list metrics. Some are interesting for memory troubleshooting. For example, the usage of Indirect Checkpoints can be identified in those new graphs. Special attention has been given to Free List Statistics, which will now show you by color how hot your free list is, what type of workload is running on the server from the memory perspective, and how many requests are blocked from execution due to unavailability of free memory pages. Very interesting stuff, in my opinion.
Other Fixes
Simplification of the Query Summary reporting procedure allows for better performance and a dynamic significance column algorithm.
Performance tuning for the Memory > Internal Memory reporting (10x improvement in graph load).
Queries > Execution Plan > Plan Detail Waits now has error handling in case you don’t have Query Store enabled.
Fix for Indexes > High Fragmentations not showing data in the timeline due to filter input issue.
Fix for Queries > A/B Testing not showing any values in the Database filter. Reorganized Server Statistics into the Thread Waits tab and Server Throughput. There are minor visual fixes there.
Updated Current Activity > Ex. Plan Processing to show all plan operators. Also fixed another bug where the current operator was not showing. Added a message to the output table instead of reporting an error if the session is not running. Also fixed current index processing not showing time estimates properly.
Fixed a bug where Current Activity > All Thread Waits For The Session was showing data in ms instead of sec.
Small fix for SMT Settings > Collector Status, which had the wrong ordering of STAMP data.
Fixed Deadlocks/Blocks sparklines on the Dashboard not showing values.
Performance tuning for Always On > Current Status.
Waits > Waiting Tasks fix for duplicate Queries and Plans.
If you are reading these lines, I want to especially thank you for taking the time to read these release notes. It is important to understand why and how things in SMT have evolved. We, at Woodler, hope this release will give you as much enjoyment as it has given us. In case of any issues, feel free to use the “Organize meeting with Woodler” button in the top right corner of SMT. Until next time, happy tuning!
Regards,
Michal