Thursday, October 12, 2017

The Curious Case of Compression - An Availability Group Story

After we started using SQL 2014 availability groups in the corporate environment we ran into a few isolated situations where the business asked us to turn off synchronous replication between the HA nodes because the performance was unacceptable. The DBA team began to come up with some tests to compare workloads in both asynchronous and synchronous mode so we could try and measure the difference between the two and what penalties are incurred. Some workloads performed within an acceptable margin of difference but with some types of workloads we noticed a penalty of over 40%. When you use synchronous replication to ensure high availability you understand there is a performance hit to ensure maximum up-time but almost doubling run times is unacceptable.
When SQL 2016 was released the marketing tag line that we heard was: "It just runs faster". Microsoft turbo charged availability groups with an improved log transport process and some compression enhancements. Their goal was to have 95% transaction log throughput when comparing a server with one synchronous secondary to a stand alone workload. They had very good results as can be seen in the below chart. The blue line is a standalone system, yellow is SQL 2014 and orange is SQL 2016.

Based on this, we installed SQL 2016 onto our test servers and reran the tests we had designed for SQL 2014 and were surprised that performance didn't improve, in fact in our tests, the performance seemed to degrade in 2016. Lets look at the test setup and then what changed between SQL Server 2104 and SQL Server 2016

The Test:

We wanted to design a test that:
  1. Would have reproducible results
  2. Had multiple workload types so we could see if there was a change in asynchronous vs synchronous performance based on workload.
Based on these criteria we decided not to use a 3rd party tool like Hammer DB (the tool MS used for the above graph) but instead design our own scripts. While this method usually leads to very contrived workloads that aren't representative of a production load, we didn't see that as an issue. We were just trying to test performance and not benchmark anything. After some initial testing where we noticed that performance changed based on the number of inserts and rows inserted (1 insert of 10000 rows versus 10000 inserts of 1 row) we decided on a testing methodology using only inserts.  I setup a server with two databases and two availability groups, one asynchronous and one synchronous. I then created three tables in each database: a source table with over one million rows, an empty destination table and a tracer table to record the results. Then I created a job to insert a specific number of inserts a specific number of times.  See the chart below for how the workload changes in each run.
Num of InsertsRows Per Insert
I ran the test through 100 loops for each number-of-inserts/rows-per-insert combination. For each loop I would record the start time and end time and log it to a table. Then once I had finished all the tests, I could pull data out to compare the results.

The Results:

Results in 2014:

InsertsRows Per InsertAsync msSync msPenalty msPen ms/insPenalty %Async Ins/sSyn ins/sPenalty ins/s
You can see that you end up with a larger penalty per insert if you do a high number of singleton inserts as opposed to doing one insert of a high number of rows.
We ran the same tests using the same setup in SQL 2016.
Results in 2016:
InsertsRows Per InsertAsync msSync msPenalty msPen ms/insPenalty %Async Ins/sSyn ins/sPenalty ins/s
Notice the Sync MS column in the second chart. The Synchronous times were much higher than in 2014 and in some cases the run times more than doubled when switching between the asynchronous and synchronous availability groups. I was very surprised that the new version of SQL that "just runs faster" was seemingly running slower under synchronous commit. I decided to open a case with Microsoft and see if they could explain why there was apparent performance degradation in SQL2016.

The Answer

After working with one Microsoft support tech for many months and not getting any answers, I was escalated to another engineer who was very helpful and gave me the answer on our first call. The difference comes down to what was changed between SQL Server 2014 and SQL server 2016. One of the things that was changed was how log stream compression was handled by default for synchronous AG setups. In SQL 2014, log stream compression was turned on by default for both asynchronous and synchronous setups. In SQL 2016, log stream compression was left on for asychronous availability groups but turned off for synchronous ones. You can enable log stream compression for synchronous availability groups using trace flag 9592. See these two links for information from microsoft:
Once I enabled the trace flag in my 2016 instance and reran my tests the results in 2016 began to look like I had originally expected them to

Results in 2016 with log stream compression:
Rows Per Insert
Async ms
Sync ms
Penalty ms
Pen ms/ins
Penalty %
Async Ins/s
Syn ins/s
Penalty ins/s
As you can see the synchronous vs asynchronus times have less of a difference and the penalties per insert per second are better now overall than they were in SQL 2014.

The Conclusion

Know your workload and know what changed. Do I recommend turning on trace flag 9592 on every server? Of course not. 
Almost every time you ask a performance question to a SQL server professional you will get the same answer: "It depends". I may seem like a cop-out but it is very true. There are a lot of factors that go into performance and one of the key ones is workload. Is your workload true OLTP with a mix of reads and inserts/updates/deletes or is it a pure read workload like a AG read only replica or is it a datawarehouse where you get heavy inserts during the night and mostly reads during business hours? Knowing what type of workload you have helps you answer the question of should you use a trace flag or not. In my test case, I was doing pure inserts of char data which benefitted greatly from having compression turned on. Your workload may be such that you see a performance degradation by using the trace flag.
When you upgrade or migrate a database from one version to the next, you need to understand what changes were made under the hood that could impact you. From a production DBA side, you should understand what changes are being made to the features that you use in your environment (like availability groups). If you are on the application DBA side of things you should understand what changes are being made to the query engine. You should know about the changes that have been made to the query optimizer in the last few releases, how those changes can affect your code and how to turn them off or on. In SQL 2016, Microsoft took some features that were accessible through trace flags (like -T1117 and -T1118) and made them default. By understanding what the changes are and how they will affect you, you will be better equipped to resolve an issue when it pops up and hopefully you won't spend months looking at something that could have been fixed with a simple start up trace flag.

Friday, July 28, 2017

Dynamic Quorum and Dynamic Witness in Windows 2012 R2

The Setup

The DBAs were having a discussion this week about how we should setup Quorum on clusters that have different sets of nodes. We looked at our current setups and noticed that there are some inconsistencies. Some have a File Share Witness and some do not, some times the nodes in the BDC (DR) have votes and some times they do not. We would like to have a standard configuration for all clusters. It makes it easier to setup and easier to validate if something is not correct.

In our environment we have three basic types of SQL clusters.
  • Two node cluster - One node in the PDC and one node in the BDC
  • Three node cluster - Two nodes in the PDC and one node in the BDC
  • Four node cluster - Two nodes in the PDC and two nodes in the BDC
We had some concerns over what happens when you use a file share witness in a two node cluster configuration so I created a cluster in our lab and tested out some scenarios. In Windows Server 2012 R2, Microsoft introduced the concepts of Dynamic Quroum and Dynamic Witness. I am not going to rehash what can be found using google but here are a couple of articles I used when running this test:



The Testing

Two Node Cluster with File Share Witness

Initial Setup

I setup a cluster named ASTCL-DUO with two members, Batman and Robin. I added a File Share Witness and gave both nodes a vote

Then I checked the cluster configuration using PowerShell

You can see the quorum dynamically configured the witness to have 1 vote (WitnessDynamicWeight) so that we have an odd number of votes.

Remove One Node

Next I removed one of the nodes from the cluster to see what would happen to the assigned votes and the Witness. I did this by shutting down the cluster Service on Robin.

I was very surprised what I saw when I checked the cluster quorum settings

Not only did the File Share Witness not get removed but the Node Weight for the cluster node that was shut down stayed at 1. It still has a vote assigned. Dynamic Quorum did nothing. I thought about this and it makes sense. You had three votes, you only need to have two for a quorum so there is no need to change anything. My only concern with this is what will happen if the File Share Witness goes down. I will test that later and add the results below

Remove One Node and File Share Witness

Now we are going to leave the Robin node down and shut down the File Share Witness as well. Since I am using my DC as the file share witness I am just going to turn off sharing for the folder that is being used as the file share witness

Our node status doesn't change but now we can see that the File Share Witness has failed

Using our PowerShell Script we can see that nothing has changed with the Quorum

The kicker is that since there are three voters in the quorum but we only have one available the cluster shuts down

I would have thought that dynamic quorum would have removed the votes from secondary node and the witness when the secondary node went down to prevent this from happening.

Two Node Cluster with File Share Witness and One Voting Node

Initial Setup

For this series of tests I started the cluster service back up on Robin and removed it's vote to see what would happen to the cluster when we started removing things

I ran the cluster quorum configuration script

Since we started out with an even number of votes (1), the cluster decided it did not need the File Share Witness so it set the WitnessDynamicWeight to 1

Remove One Node

I shut down the cluster service on Robin again.

The Quorum configuration will be the same since the node didn't have a vote

Remove One Node and File Share Witness

Leaving the Robin node down, I next turned off sharing for the File Share Witness folder.

The Cluster Quorum configuration will be the same

The interesting thing here is that the cluster does not go down like it did in the first setup. I let it sit in this state for around 5 minutes to be sure.


If you read through any cluster documentation found on the web regarding using a File Share Witness (see my two links above), it seems to be best practice to use a witness even if you have an even number of nodes. If you try to configure your cluster Quorum without one you even get an error message about it during setup.

Since we currently are not running an active/active setup and DR failover is a manual event, we also do not need to have votes on the nodes in the BDC.

Going forward (and possibly backward, we might need to fix some current clusters). It seems like the best setup is to

1. Always have a File Share Witness

2. Always remove the votes from DR (and AG read servers)

This will prevent the cluster from coming down as long as the Primary node is available, it can survive the loss of both the secondary node and the File Share Witness.
This might change if we move to an Active/Active setup or the FSW goes to a third party site but I think we should build it into our current architecture.

Code used in this Blog

So you don't have to retype from my images

# Get node weight

get-cluster |ft name,dynamicquorum, witnessdynamicweight -Autosize

get-clusternode -Name * |format-table NodeName, DynamicWeight, NodeWeight -Autosize