Tool to measure Index Selectivity
by Michal Tinthofer on 05/06/2012Sometimes when you plan to change your index design is good to know the columns of your tables. But not just a data type and max size.
Read moreImagine opening SQL Server Management Studio (SSMS) on a Monday morning and finding… nothing. Your databases are gone from the Object Explorer. You try a different machine, you try a different version of SSMS, but the result is the same. Empty.

This was the reality for one of our customers. The databases had been "invisible" for more than 24 hours. While the instance was running, the management layer was paralyzed.
Phase 1: The Ghost in the Machine
The first instinct was to look for locks. The customer checked running processes, but found no obvious blockers. Initially, we suspected a lock timeout issue, but the Activity Monitor showed no long-term schema locks.
Then we spotted it: A Backup process that was essentially a zombie. It had been running for 38 hours with 0 CPU activity, 0 reads, and 0 writes. It was stuck in a perpetual wait state: HADR_SYNC_COMMIT.
Phase 2: The Master Database Misdirection
The mystery deepened when we realized that this backup process was holding locks in the master database, specifically blocking access to sys.databases. This is why SSMS couldn't list anything.
However, a HADR_SYNC_COMMIT wait technically shouldn't happen on the master database. This was a crucial realization: the locked tables in master were just a symptom, not the cause. We were looking at a ripple effect from a problem located elsewhere.
Phase 3: 600,000 Events and the "Smoking Gun"
To find the root cause, we deployed Extended Events (XEvents). Within seconds, we captured over 600,000 events. After heavy filtering and analysis, we narrowed it down to 6 important records.

The events described a specific failure:
“Occurs when a committed LSN in primary has not been notified for the hardening from all synchronous-commit secondary replicas for more than 2 seconds.”
The Diagnosis: During a minor network glitch, a specific packet from the backup operation was lost. The primary node was waiting for the secondary to confirm that the data block had been written to disk (Harden). Because that confirmation never arrived, the session entered an infinite loop, retrying every 2 seconds and blocking system resources in the process.
Phase 4: The Non-Disruptive Cure
We needed to break the loop without crashing the production environment. We executed a surgical four-step plan:
The Lesson: When the system looks broken at the surface (SSMS/master), the real culprit often lies in the "plumbing" of the high-availability layer. A true SQL Master doesn't just fix the symptom; they trace the wait state to its origin.
Michal is a technically proficient SQL Server Specialist with a proven track record in resolving incidents and implementing changes within large-scale database infrastructures, ensuring maximum availability of services. Concurrently, as a Digital Content and Marketing Specialist, his priority is building strong online brand identities through strategic communication and creative storytelling. He consistently seeks new ways to enhance digital interaction, believing quality digital communication is key to success in today's connected world.
Sometimes when you plan to change your index design is good to know the columns of your tables. But not just a data type and max size.
Read moreRecently I had some issues with SQL server 2008 R2 service pack 1 installation. From first perspective it looks like an issue with corrupted installation package, but true issue was somewhere else. It was a windows installer.
Read moreAnother changes to SMT are done and ready for the release
Read more