Thursday, July 16, 2015

Replication - To init or not to init that is the question


We use replication quite heavily at work. We are just getting into availability groups now so previously replication was the only way to get readable data onto a secondary server. Last week I was primary on-call and there were two changes to be implemented on Saturday dealing with replication. One change was adding a table to replication and the other was altering an existing table and an existing view to add a column to both. I was curious if any or all of the changes would require a re-initialization or at the very least that I would have to run the snapshot agent. Like anything I do, I wanted to be sure before I had to do the actual work so I did some testing. I have two test servers with the AdventureWorks database setup to be replicated between them.

First Test: Adding an article to a publication

  1. I created a new table in the publisher database 
  2. I added the table as an article in the publication 
  3. I checked replication status on the Subscriber and it says "The initial snapshot for article 'JohnReplTest' is not yet available." 
  4. Ran the Replication Snapshot Agent for the publication on the publisher 
  5. Check the subscriber and the table was there and synced. 

Second Test: Adding a view to an existing publication

  1. Created a view in the published database 
  2. Added the view as an article in the publication 
  3. Checked the subscriber but the view was not there and the status said: The initial snapshot for article 'vw_JohnsReplView' is not yet available. (I did not realize you needed a snapshot for a view) 
  4. Ran the Replication snapshot agent job for the publication 
  5. Checked the Subscriber and the view is now there. 

Third Test: Adding a Column to a Table and to a View

  1. Altered the table to add a column 
  2. Altered the view to add the same column 
  3. Checked the subscriber and the information was there. 
I did not need to do anything additional to get the data across.

Fourth Test: Adding a column with a default value

  1. Altered the table to add a column with a default value 
  2. Altered the view to add the same column 
  3. Checked the subscriber and the column was there with the data. 
I didn't need to do anything additional to get the data across.

Sum Up

I find Replication a difficult technology to work with, there are so many facets that change based on what you are doing. It's easy to go the route of a full re-initialization when something goes wrong but if you have an oversees subscriber you have to consider the time it will take to push all the data out.
Out of my tests I was most surprised that adding a view required running the snapshot agent. I did not think that creating an object that had no data would require a snapshot. The saving grace is the fact that if you have not re-initialized any of the subscribers, the snapshot agent will only publish the new articles not the whole publication. Try these tests for yourself. The best way to learn about something is not just to read but to actually experiment.

Hasta Luego,

Wednesday, July 1, 2015

TempDB - Round Robin and Proportional Fill

Proportional Fill

My post from last week sparked quite a bit of discussion at work about how the data files for TempDB fill up when they are not of equal size. Pretty much all of the documentation and reccomendations out there say to keep the data files the same size so that the round-robin data flow works properly. What we were having trouble deciding is what happens when they are not the same size, for example if you have one file that is 5GB and three files that were 500 MB.

The discussions ended up in two different camps.

  1. TempDB would allocate pages in the first larger file until the free space was equal to the three smaller files then the round robin would continue.
  2. The allocations would still be handled in a round robin fashion with the larger file just getting more so that they would fill up evenly (from a percentage standpoint). 
Like any good discussion we decided to take this to our test boxes and see if we could figure it out The answer I found comes down to a little guy named proportional fill.

The Test

I took my test server and created four TempDB data files. Look at the image below to see how they looked when I started out. Notice they all have a very similar free space percentage
(The script I used to view this info came from Dave Turpin)

Then I wrote a simple script that would create a temp table and add data to it
DECLARE @x int
SET @x = 1

(  id BIGINT, SomeValue char(5000))

WHILE @x < 362846
INSERT INTO #MyTempTable ( id, SomeValue )
VALUES  ( 99999999999, 'ddddddddddddddddd')
SET @x = @x +1

I picked the number 362846 because I knew that would get the files near 50% full and I wanted to add something to the test half way through. After this ran I checked TempDB again using my script and got the below results

As you can see it looks like the second camp was correct. Proportional fill was used to allocate pages to each data file according to its free space percentage and not it's size. The result is that there were a lot more allocations to the larger file but at the end they all ended up around the same free space percentage

Add to the Test

Now I wanted to see what would happen if you added a new TempDB data file, so I added another 500 MB file to TempDB. My thought was that if I now had four files at 55% free and one file at almost 100% free would it fill up the new file until it was at 55% or continue to allocate pages in all the files
After the file was added I ran my test again and checked the file usage twice
Once about halfway

And when the script finished

As you can see, the allocations were made to all of the files, not just the new one. It is just that more allocations were made to the new 500 MB file than to the old 500MB files during this second test (361MB as opposed to about 192 MB). Proportional Fill strikes again! If you look at how the numbers are trending you can see that if I hadn't set a limit on my while loop all of the files would have been almost equally full when the first one completely filled up.

Sum Up

This was a neat little experiment for me. I enjoy being able to test things out rather than just taking as fact what I read some where. In fact, quite a bit of the documentation I found was misleading on this subject. If you would like a good (and more thorough) look at our friend Fill, check out this post by Jonathan Kehayias:

He uses extended events to show this so you can actually see the number of file writes per data file. His test is not specifically on TempDB but the allocation should work the same way.
Also check out this article by Paul Randall for more info about proportional fill:

I'll be back in two weeks. Hopefully I can squeeze one more blog post out of all my TempDB testing.

Hasta Luego,