Rutherford's SQL Blog: 2011

Monday, October 3, 2011

Back To Basics - OVER and PARTITION BY

I attended SQL Saturday #97 here in Austin this weekend. I had a blast and came out of it with a bunch of new knowledge. The last session that I attended was hosted by Jennifer McCown (blog|twitter) and named "T-SQL Brush-up: The Best Things You Forgot You Knew". It was very lively and informative and introduced me to some functions I knew about but hadn't really seen the potential of before: Partition and Over. Below are a couple of articles I found about the functions. Enjoy!

SQL Awesomesauce: Tip OVER and PARTITION BY
and
Eliminating Duplicate Rows using THE PARTITION BY Clause

Now, here is my usual rant about keeping to some sort of schedule, but you have probably figured out that it's not going to happen right now. The company I work for recently got acquired by a bigger company (which I am assuming the way it normally occurs) and we are beginning the integration process. The upside is I still have a job, the downside is that I am going to be very busy for the next three months (Well, its not that big of a downside, I would rather be busy than bored any day). I will try to do a few Back-To-Basic Mondays this Month and hope to get out at least one regular Blog post but we will see. Sorry.

RutherfordTX

Wednesday, September 7, 2011

Beginning SQL #4 - Data Recovery

Last time I said I was going to talk about Creating and Modifying databases. Well, I lied. I re-read the chapter of my book and just wasn't feeling it. Instead I decided to skip ahead to data recovery. Today we are going to talk about the recovery model in SQL Server and your options for backing up your database.

"What I'm trying to say is when you buy a box marked guranteed, all your getting is a guranteed piece of shit. Hey if you want me to take a dump in a box and mark it guaranteed, I will, I have spare time. But for your daughter's sake, for your company's sake, you might wanna buy a quality product from me." - Tommy Boy

The reason warranties were invented is because its a fact of life that stuff breaks. The important thing for you, as a DBA, is to have a plan to make sure that when something breaks, like you lose a RAID array, you can make your company's data accessible as quickly as possible while returning it to a state as consistent as possible to just prior to the failure. You can achieve this by having your database set to the appropriate recovery model and making sure, not only, that you are preforming routine backups but that those backups are good backups.

Recovery Model
There are three available recovery models in SQL Server: Full, Simple, and Bulk-Logged. The Model you use will determine how the transaction log is used and what options you have for backup and restore.

Full Recovery Model - Using the full recovery model, SQL server logs all database transactions to the transaction log. This is the model you want to choose if your data recovery is a high priority for you because it offers you the most options for backup and restore. Since every transaction is logged in the transaction log, not only can you restore to a full backup but you can restore to any point in time since a full backup.

Simple Recovery Model - Like it's name this is the simplest option. It only gives you a few backup options and the transaction log is truncated whenever a checkpoint is issued. This model is useful if you have a read-only database or the database is a backup or dev database.

Bulk-Logged Recovery Model - This model is very similar to the full recovery model except that bulk operations aren't logged to the transaction log and you can't do a point-in-time restore.

Backing up Your Database
Now that you have your recovery model set you need to know about the different backup operations that you can perform with SQL Server. If you want to backup a database you need to be a member of one of three fixed roles: sysadmin, db_owner, or db_backupoperator. Remember, it is always best practice to give each user the minimum amount of permissions to do their job. If you have someone whose job it is to only backup your database you don't want to give them owner or sysadmin rights. There are three different backup methods that we are going to talk about: Full, Differential and Transaction Log.

Full Backup - A full backup does just what you think it would, it backs up everything you would need to make a full restore of the backup to the point in time that you took the backup. This includes: the database files, objects, data, and a portion of the transaction log. The backup causes a database checkpoint to occur so that the committed transactions in the transaction log get written to the database file. It also makes a note of the current log sequence number(LSN) so that it can grab any activity that was committed after the backup started.

Differential Backup - If you have ever done file system backups you are familiar with the term differential. it means all the changes since the last full backup. Differential backups only backup the changes that have been made since the last full backup. Since there is less data to backup the process runs much faster. This would be useful if you wanted to make a backup in the middle of the work day with minimal impact on the system. You must have the prior full backup to make use of the differential backup.

Transaction Log Backup - The transaction log backup will backup up the transaction log activity that has occurred since the last backup of any of the three kinds (Full, Differential or Transaction). When it backs up the transaction log it truncates the inactive portion of the log reclaiming the disk space that was taken up. If you are familiar with file system backups these are like incremental backups in that they are useless unless you have the last full backup AND all the transaction log backups since the last full backup.

You should become familiar with all three backup methods so that you can figure out the best way to provide your environment with maximum data reliability while keeping performance as high as possible and minimizing your restore time. A good backup plan could involve all three methods being used together. For example, you could do a nightly full backup and then do transaction log backups every 15 minutes and differential backups every four hours. Then if you had a failure at 3:45 you would only have to restore a maximum of nine backup files (Nightly full, 12pm incremental and the 7 transaction log backups since noon) and a tail log backup to get the server back up and running.

I final thought, I once overheard someone say "It's not a good backup until you have restored it". So not only should you be backing up your databases you should be testing those backups on a regular basis to make sure that you can recover if you have a failure. Backups are just one half of a good disaster recovery plan.

Hopefully I have given you a good overview of SQL Server backup. Next time I am going to write a post about server and database security, it's a subject that I am a bit week in and I think doing the research for the article will really help me. Once again I am going to make the claim that I will stick to a two week schedule but we all know how that is going so far.

Hasta Luego,
RutherfordTX

Monday, August 15, 2011

Back to Basics - ACID Properties

I was reading the Editorial from my DatabaseWeekly.com newsletter by Steve Jones (Blog|Twitter). The gist of the article was how knowledge is use-it-or-lose-it, the longer you go without using a skill the less of it you remember. One of his links was to an excellent article by Michael Swart (Blog|Twitter) about the Acid properties of a database. I am not going to try to rewrite his article, instead I'll give you a link.

ACID Properties by Example

I hope that every(other) Monday I can find an article somewhere on the web that goes into one of the database fundamentals that either you haven't taken the time to learn or maybe just need a quick refresher on. These posts will be in addition to and not instead of my "regular" blog posts.

Hasta Luego,
RutherfordTX

Friday, August 5, 2011

Beginning SQL #3 - Files and Filegroups

Ok, so I lied last time about having a fortnightly schedule. But this time I mean it, I will post every other week. This week I am going to talk about Database Files and Filegroups. On your server your data is stored in tables and those tables are stored in a database, but where does the actual database live? The answer is that there are physical files on your server where the databases are stored. There are at least two files for each database, the primary data file and the transaction log, but you can split your database up into many files for various reasons.

The Data Files

Primary Data File - This is the default file that is created when you create your database. This is where all your database objects will reside unless you add additional data files. The primary data file usually has the file extension .mdf (SQL server doesn't force you to use the extension but its standard practice). There is one primary data file for every database you create.

Transaction log - The transaction log does just what it says, it logs transactions. All data modifications are written to the transaction log before being written to the primary (or secondary) data file. This is what is commonly referred to as "write-ahead caching". I'll go more into how data is written to disk a little further down. Remember, you must have at least one Transaction Log file for each database. The transaction log file usually has the file extension .ldf

Secondary Data Files - Secondary data files can be created to supplement the data storage space of the primary data file. These data files are optional and usually have the file extension .ndf. You can use secondary data files for various reasons. For example, you could use them to stripe you data across multiple disks to speed up data access or to physically separate different database schemas into separate files.

Writing Data to Disk

We talked about the transaction log and how it uses write-ahead caching, well here is the process for writing data to disk. First the data pages are loaded into memory (unless they are already there from some previous activity). Then a record is created in the log cache. Once the transaction has been committed it is written to the transaction log from the log cache. Transactions are then written to primary or secondary data file when a checkpoint occurs. The Checkpoint process occurs for a variety of reasons.

If you issue the CHECKPOINT statement
If you stop the SQL Server Process
If you backup the Database or do a Transaction Log backup
If you have the database in log-truncate mode and the log becomes 70% full
If you perform a minimally logged operation
Based on the "recovery interval" server configuration option.

Filegroups

You can group your database files together into filegroups to aid with administration or allocation. There are two types of file groups: Primary and User-defined. The Primary file group contains the primary data file (.mdf) and any other files not assigned to a user-defined file group. It also contains the system database and tables. User-defined filegroups are just what they sound like, a filegroup created by the user. Log files are not associated with a filegroup. You have to add files to a filegroup when they are created. You can't move files around after they have been added to the database. Filegroups are handy because they can be backed up or restored separately and even be taken offline and brought back online separately. An example of when you could use them is if you had some tables that were read from more often (like archive data) you could put those tables into a secondary data file that you put into a user-defined filegroup and then move that filegroup to a separate disk to increase performance.

Hopefully I have given you a good overview of Database Files and filegroups. There is plenty more to learn so don't be afraid to use google to dig a bit deeper into the subject. Next time I will be discussing Creating and Modifying Databases.

Hasta Luego,

RutherfordTX

Thursday, July 14, 2011

Beginning SQL #2 - The System Databases

There are five main system databases that are created automatically when you install SQL server. They are master, model, tempdb, msdb, and resource. There is a sixth system database, distribution, that is only created when you configure replication on the server.

master - The master database contains all the information the system needs. I contains things login information, linked servers, and system settings. It also contains information on all the other databases on your server and the location of those databases. If the master database is deleted SQL Server cannot start.

model - The model database is a template for the user databases that you create. If you modify the model db it won't affect any existing databases but it will affect any newly created databases. For example, if you have a stored procedure that you like to have in all databases, you can create it in the model database and it will automatically be in any database that you create

tempdb - As I heard SQLSoldier say in a recent webinar. "tempdb is the junk drawer of SQL Server". Tempdb is a shared database used to store temporary tables. Everytime you restart SQL server tempdb is recreated. The proper or improper use of this database can have a huge impact on your overall server performance. Here is a link to the Slide deck and scripts SQLSoldier used in his presentation

msdb - This is the database that is used by SQL Server Agent to jobs and alerts. If the SQL Server Agent service is not running none of the jobs stored in the db will run. The msdb database also stores the history information for the jobs.

resource - This Database is where all the system objects physically live. They show up logically in every database but actually live here. These are all your system tables and system Catalog Views.

That's a quick rundown on the five main system databases. Tune in next time for Databases and Filegroups.
I hope to get a more regular schedule from here on out. I plan on posting fortnightly on Thursdays. Let's hope I can stick to that schedule.
Hasta luego,
RutherfordTX

Wednesday, June 15, 2011

Beginning SQL #1 - SQL Server

I took a SQL class last month to prepare for exam 70-432, which I hope to take later in the summer. One of the best ways for me to remember what I have learned is to write it down so I have decided to use my blog as the medium for that for the next few months. There were six classes that covered nine chapters and I should be able to get a couple of posts from each chapter.

SQL 2008 Server Services
SQL Server 2008 is made of seven main services. Not all of these services are required and some aren't even installed unless you do a full install. Below is a list of these services and a brief description of each one.

SQL Server Database Engine Service - This is the primary service for SQL Server and it is responsible for hosting the databases, managing data storage and access and generating indexes. The default instance of SQL server is always named MSSQLSERVER. Named Instances are called by what name you chose when you installed them.

SQL Server Agent Service - The Agent Service is responsible for job handling. If it is not running your scheduled jobs will not run. If you nightly maintenance plans and backup jobs are not running it is a good idea to make sure this service is running as by default it is disabled. The Agent Service is also responsible for managing replication, log shipping and database mirroring.

SQL Server Browser - The SQL Server Browser is the service that allows you to have more than one instance of SQL installed on a machine. The Browser service listens for incoming requests for SQL Server and provides information on the instances of SQL Server installed on the server. The Browser service is what enables you to install different versions of SQL server on the same machine. The Browser Service usually starts automatically. It won't start automatically if you only install the default instance during a new install.

SQL Server Analysis Service - The Analysis Service (SSAS) adds data mining and OLAP functionality to SQL Server. It allows you to use MDX (Mutlidimensional Expression) queries to analyze cube data and DMX (Data Mining expression) queries for data mining.

SQL Server Integration Service - The Integration Service (SSIS) is used to combine data from disparate sources. You can use Business Intelligence Design Studio (BIDS) to build ETL (Extract, Transform and Load) packages that can access a variety of formats such as: Excel, XML, Oracle, OLE-DB data sources and of course SQL Server.

SQL Server Reporting Service - The Reporting Service (SSRS) enables reporting functionality with SQL Server. You can create reports using Visual Studio, BIDS or the Report Builder that is included with the installation. The reports are administered via a web interface but can accessed using the Microsoft Office Suite or Sharepoint. You can configure SSRS 2008 R2 to work with Sharepoint 2010 during install by configuring it for Sharepoint integrated mode.

Full Text Search Service - This is a indexing service for searching for text in certain SQL columns. It is similar to using the like operator in a query but has performance advanatages. The service works with columms of the following data types: char, varchar, nchar, nvarchar, text, ntext, image, xml, varbinary or varbinary(max). This service is not enabled by default and for it to work a full text index must be created on each table that you want to use it with.

That's it for this week. Next week I will take a look at the six System Databases in SQL Server.
Hasta Luego,
RutherfordTX

Thursday, April 28, 2011

Finding The Top One Value Per Group

I ran into an issue the other day where I had a table like this:

ID	name	location	comment	comment_time
1	John	home	hello	1/1/2011 01:16:23
2	John	work	comment2	1/12/2011 13:23:10
3	John	home	blah	1/13/2011 02:02:02
4	Bill	work	smurf	1/15/2011 23:59:00
5	Bill	home	jello	1/17/2011 13:30:00
6	Bill	home	snafu	1/18/2011 12:00:19
7	Frank	work	batman	1/19/2011 14:15:16

Somebody came up to me and said "Hey John, I would like to see the last comment made by each person in the database and where they were when they made the comment". The logical answer is to use a Group By query. The problem is that you can't include a non-aggregate field in a Group By query which means I can't get at both the location and the comment in single query. I need to get a set of data that I can query against to get the result set i need. So each row in the return set has to be unique. I would love to use this subquery:

SELECT TOP(1) id, name

FROM table1

GROUP BY name

ORDER BY comment_time desc

But you get the following error:

Msg 8120, Level 16, State 1, Line 63

Column 'dbo.table1.id' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

So I have to use an aggregate function (SUM, AVG, MIN, MAX or COUNT). At first I thought about using the MAX function on the ID to give me the highest ID but that assumes that the comment_time field is generated on INSERT and isn't updateable (for example, it wouldn't work if the comment_times for ids 2 and 3 were switched). Since I don't know and can't control that I decided to use MAX on the comment_time field, like this:

SELECT MAX(comment_time) as comment_time, name

FROM table1

GROUP BY name

Which gives me:

comment_time	name
1/13/2011 02:02:02	John
1/19/2011 14:15:16	Frank
1/18/2011 12:00:19	Bill

You have to alias the MAX(comment_time) or else it will return the column with the header "(No Column Name)" which wold make it difficult to use it in the next step.

Now I can use that set to get the result set i want like this:

SELECT name, location, comment, date

FROM table1

JOIN

(SELECT MAX(comment_time) as comment_time, name

FROM table1

GROUP BY name) as sub

ON table1.name = sub.name

AND table1.comment_time = sub.comment_time

ORDER BY

table1.comment_time

Which gives us:

name	location	comment	comment_time
John	Home	blah	1/13/2011 02:02:02
Bill	home	snafu	1/18/2011 12:00:19
Frank	work	batman	1/19/2011 14:15:16

The only problem might be if you have two comments by the same person in the same day at exactly the same time. If the completion time is generated by the server on insert, you shouldn't have this problem but if it is generated by the application you might (especially if the field is editable). My answer is that if two comments were made at the same time, they are both the last comment and should both be returned so my solution works fine (for my needs). The other problem is that my solution doesn't work if you need to return more than one row per group.

Let me know in the comments if you have a different solution.

Hasta Luego,

RutherfordTX

Wednesday, April 20, 2011

Professional Development Planning

I went to a Cactuss meeting last night (the local Austin SQL users group) and heard a talk by A.J Mendo (twitter|website) about Professional Development planning and it put a name to what I am trying to do in my career right now so I thought I would share what I took away from it.

1. Own Your Career

Nobody is going to walk up to you and give you advancement opportunities (unless your last name is the same as the CEO's). You need to make yourself valuable and decide exactly what you want your career to look like. Do you want to be a DBA? A Manager? A VP? Think about what you want to gain out of your career and work towards it. That brings us to the next point

2. Set Goals

Take the time to sit down and map out some goals for yourself. Think about what you can do today. Think about what you can do in five years. Think about what you can do in 10 years. Goals can be as simple as joining Linkedin or Twitter or more complex like getting certified. Don't beat yourself up if you don't meet your goals. If you wanted to get certified by April but weren't ready, Don't Sweat It, reevaluate your goals and give yourself a new deadline. The important part is to keep evaluating where you are and where you want to be.

WRITE IT ALL DOWN. Get a journal and make a plan in it. Then go back and re-read it and revise it as needed.

3. Join the community

The best way to learn is to find people who do what you do only better and talk to them. There is a User Group in almost every major city for almost anything. You have your SUGs and your LUGs and your MUGs and your Pugs. Join one, go to the meetings and get involved. Raise your hand and ask questions. Answer questions. Introduce yourself. Stay late and meet people. Exchange Business Cards. Network. You never know what can come out of a chance encounter. Just by going to the CACTUSS meeting last night I came away with two job opportunities for a friend of mine who is currently unemployed.

Join online groups. There are tons of forums on the internet. Find one and join it and spend some time there. Many of us in the IT industry, especially those of us at small to mid-size businesses, don't have people to talk to about what we do and what we are doing wrong and how we can do it better. Forums are a great place to find that needed camaraderie.

4. Learn and Make Time to Learn

This one should be a no-brainer for anyone in the tech industry. We need to learn constantly just to keep up with our jobs. I buy books and subscribe to magazines and read blogs and try to listen when I can. It goes beyond just learning how to do what you do. Learn about different or better ways to do it. Learn about things you didn't even know you could do. I wish I had researched SSRS a year ago, I have been doing things with third party apps like Cognos and Crystal I could have been doing straight from SQL.

Make time to learn. Find a time of day you can stop working and read something. I use the netvibes website to group rss feeds of the blogs I read into one page so when i get to work in the morning I can get all the new posts from one place. Here is the public page to my SQL tab in netvibes: http://www.netvibes.com/rutherford#SQL .

Don't be afraid to learn at work. I have never met a boss who would get angry at an employee trying to do his job better (as long as productivity doesn't suffer).

5. Build a brand.

I read an excellent article by Andy Warren (website|twitter) which can be found here about building your brand. That article actually inspired me to join twitter, start this blog and join my local user group. I won't try to repeat what he says, I'll just ask you to read it and hopefully it will change your career life like it did mine.

If you are a clock watcher who thinks of his/her job as a way to pay the bills, this post probably isn't for you. But if you love your career life (not necessarily your job) as much as you love your home life, I hope you give this some thought.

Hasta Luego,

RutherfordTX

Friday, April 8, 2011

Updating a Table Using a Select

I recently had a situation at work where someone had exported data from our SQL Database into excel so they could update a column. Then they asked me to figure out how to update the data "my way", as they called it, so they didn't have to manually update all the changes. Coming from more of a programming background my first thought is to do things procedurally. I have done quite a bit of Perl and PHP coding so I would usually reach for a nice For Each loop and do it that way. Since T-SQL is set based I am having to retrain my brain to think in sets rather than procedures.

So I had a table like this:

pk_num	new_val
1	NULL
2	NULL
3	NULL

And I was given back a table like this:

pk_num	new_val
1	Frank
2	Joe
3	Sam

After scratching my head and looking through my big stack of books I decided the best way to do this would be a subquery. Lets call the original table table1, I imported the new table into SQL and called it table2. Here is the code I ended up using:

UPDATE table1

SET

new_val = table2.new_val

FROM

(SELECT pk_num, new_val

FROM table2) AS table2

WHERE

table1.pk_num = table2.pk_num

---------
EDIT: 8/16/2011
Since I am not using a where clause in the subquery, it is not actually needed. The query could be rewritten like it is below. I'm leaving it as an example and because it is the basis for the next part
UPDATE table1

SET

new_val = table2.new_val

FROM table2

WHERE

table1.pk_num = table2.pk_num

---------

It was a bit more complicated than that in that I didn't want to overwrite existing values in the new_val field in the original table if there was no value in the field in the update table so the end result actually looked like this:

UPDATE table1

SET

new_val = table2.new_val

FROM

(SELECT pk_num, new_val

FROM table2

WHERE new_val IS NOT null) AS table2

WHERE

table1.pk_num = table2.pk_num

Hope this helps someone. I know this is pretty basic stuff, but when you are just starting with T-SQL or, like me, are use to working with procedural or object oriented programming languages its's hard to train your brain to think in concepts like subqueries.

Hasta Luego,

RutherfordTX

Wednesday, April 6, 2011

First Post!

Howdy. I have been in IT for over 15 years but have just recently developed an interest with MS SQL server. I am in the process of studying to get certified and have been reading SQL books like crazy to help me learn. I want to use this blog to help me in my learning process. I remember from school that I always learned a subject better if I had to write a paper at the end of it. I also will post some of the real world problems I encounter as I try to manage, write code and create reports for the two production SQL Databases that we run at my office.

I thought for this first post (besides introducing myself) I would list some of the resources that I have used in the last year as I have begun to familiarize myself with MS SQL server and T-SQL.

Books:

Websites:

Twitter - I joined twitter and started to follow some SQL pros that I had heard of:

Training:

This list is tiny compared to the resources available out there for the budding SQL star. Hit Google and see what you can find. Let me know of you favorite sites.

Hasta Luego,

RutherfordTX