Collation conflict
by Michal Tinthofer on 07/03/2012I think there is more simple and harmless way. First we need to understand the process of tempdb collation setting.
Read moreThe Ghost in the TempDB: How One App Held an Availability Group Hostage
In the world of SQL Server, TempDB is often called the "public dumping ground." We expect it to be busy, but we also expect it to clean up after itself. Recently, we encountered a critical incident where a 16GB database caused a TempDB explosion that threatened an entire High Availability cluster.
The culprit? Not a massive JOIN, but a combination of Always On Availability Groups (AG) and a "zombie" application.
The Crisis: Disk Space Exhaustion
It started with a standard alert:
DB error: OLE DB or ODBC error: Could not allocate a new page for database 'TempDB' because of insufficient disk space in filegroup 'DEFAULT'.
Initial checks showed that the space wasn't being eaten by user-created temporary tables or massive internal worktables (sorts/hashes). Instead, it was the Version Store that was growing uncontrollably.
The Architecture: Why Always On AG Changes the Game
To understand this, we have to look at how secondary replicas work. When you use Readable Secondaries, SQL Server automatically maps all queries to Snapshot Isolation.
This is done to ensure that:
To achieve this, SQL Server stores old versions of rows in the Version Store (within TempDB). This works perfectly... until a transaction stays open too long.
The Investigation: Identifying the "Zombie"
By querying sys.dm_tran_active_snapshot_database_transactions, we found the root cause. We identified sessions from the ANeT TimeComp application (specifically via login Anet_xxx from host) that were holding transactions open for an incredible amount of time.
Key Evidence:
The "Garbage Collection" Trap
The Version Store in TempDB operates on a simple rule: You cannot delete a version of a row if there is a transaction older than that version still active.
In our case, because the ANeT TimeComp application failed to commit or roll back its transactions, the SQL Server Garbage Collector was paralyzed. It couldn't clean up any newer row versions because that 11-hour-old transaction was still "active" in the eyes of the engine. Consequently, TempDB just kept growing until the physical disk was exhausted.
The Tactical Fix: Kill Jobs and Async Stats
While waiting for a vendor fix, we implemented a two-pronged defense:
ALTER DATABASE [YOUR_DB] SET AUTO_UPDATE_STATISTICS_ASYNC ON;
This allowed the engine to update stats in the background without blocking or being blocked by the application's synchronization issues.
Final Resolution and Lessons Learned
The issue was finally classified as an application-level bug where network instability caused the application to lose track of its state without closing its SQL connections.
The vendor later provided a hotfix, which was implemented as part of the plan to migrate this application to a different environment.
Master Takeaways for DBAs:
Log vs. Data Ratio: If your Log is 4x larger than your Data, you don't have a "big database" you have a "big problem" with transaction management.
Michal is a technically proficient SQL Server Specialist with a proven track record in resolving incidents and implementing changes within large-scale database infrastructures, ensuring maximum availability of services. Concurrently, as a Digital Content and Marketing Specialist, his priority is building strong online brand identities through strategic communication and creative storytelling. He consistently seeks new ways to enhance digital interaction, believing quality digital communication is key to success in today's connected world.
I think there is more simple and harmless way. First we need to understand the process of tempdb collation setting.
Read moreThere are usually multiple ways how to provide a view of a set of data as a report, involving decisions like; keep it easy to read, interpret the relationships between multiple figures in the correct way and to show user as much information as possible at
Read moreWe would like to inform you in this form of an important event (about which you may already know yourself). It also applies to the security of your SQL Servers. We think we should have this initiative from our side.
Read more