Feb 25

Extracting Procedure Name from Cached Plans

The other day, a friend of mine shot me an email asking how he could extract the procedure name from cached plans. He was using the query I referenced from Jonathan Kehayias to get subtree cost, but he really wanted to see which procedures aligned with which cost. I assume this was to look at the ones with the highest cost to optimize them. I told him I didn’t have anything, but I liked the idea so I went to work.

Why reinvent the wheel? I started with a query from Jason Strate:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

;WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT cp.usecounts
,cp.size_in_bytes
,cp.cacheobjtype
,cp.objtype
,pa.attribute
,pa.value
,pa.is_cache_key
,qp.dbid
,qp.objectid
,qp.number
,cp.plan_handle
,qp.query_plan
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) pa
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
WHERE attribute = 'dbid'
AND value = DB_ID()
ORDER BY cp.usecounts DESC;

I use this query a lot to find cached plans for a specific database. In fact, that’s pretty much the title of his post… I needed to take this and modify it such that I could get the object name of any procedure. So, step 1 was to make sure that I was only looking at stored procedures. To do that, I had to filter the results on cp.objtype and limit it to ‘Proc’. The next step was to filter pa.attribute to ‘objectid’. This is because the objectid attribute of a ‘Proc’ objtype is the id of the procedure itself. Since this query is already bringing back the id of the database for each object, I could use the OBJECT_NAME() function to retrieve the name of the procedure. Finally, I removed system objects by making sure the objectid was greater than 0.

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

;WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT cp.usecounts
,cp.size_in_bytes
,cp.cacheobjtype
,cp.objtype
,pa.attribute
,pa.value
,OBJECT_NAME(CAST(pa.value AS INT),qp.dbid) AS proc_name
,pa.is_cache_key
,qp.dbid
,qp.objectid
,qp.number
,cp.plan_handle
,qp.query_plan
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) pa
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
WHERE attribute = 'objectid'
AND cp.objtype = 'Proc'
AND pa.value  > 0;

Voila! Now you can get the names of all of the stored procedures that are currently in the plan cache!
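
And if you want to line those names up with the subtree costs my friend was originally after, the two queries can be combined. What follows is my own rough combination of Jason's and Jonathan's scripts rather than anything from their posts, so treat it as a sketch:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

--One row per statement in each procedure's cached plan; the costliest statements bubble to the top
;WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT OBJECT_NAME(CAST(pa.value AS INT), qp.dbid) AS proc_name
,n.value('(@StatementSubTreeCost)[1]', 'FLOAT') AS statement_subtree_cost
,cp.usecounts
,qp.query_plan
FROM sys.dm_exec_cached_plans cp
CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) pa
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
CROSS APPLY qp.query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n)
WHERE pa.attribute = 'objectid'
AND cp.objtype = 'Proc'
AND pa.value > 0
ORDER BY statement_subtree_cost DESC;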

Jan 07

So you want to learn SQL Server? – Part 2

In Part 1 of this series, we learned how to do a very basic install of the free version of SQL Server (Express Edition) in order to start learning the basics. In Part 2, we will be downloading a sample database called AdventureWorks that we can restore to our SQL Server instance and begin using right away.

The first thing you need to do is download the sample database here. If that link doesn’t start your download for you, go to http://msftdbprodsamples.codeplex.com/releases/view/125550. You will want the .zip file found in the “Recommended Download” section. The file is named “Adventure Works 2014 Full Database Backup.zip”. Go ahead and save that file to a known location, like Downloads, so that you can find it when the download is complete.

Once you have that file, navigate to where the file is and right-click on it to bring up the options, then select “Extract All…”. This will bring up another window that will allow you to extract or unzip the file:

ExtractAll1

Either choose a new path or just go ahead and click on Extract. It will open up the folder with the extracted file for you. It will look like this:

ExtractAll2

In SQL Server, we give backup files the extension .bak. This is a backup file that can be restored using the SQL Server Management Studio (SSMS) we installed earlier, or by other options like scripting. Note the file path and name (or just keep the folder open so you can copy and paste).
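
If you would rather script the restore than click through the GUI (we will do it with the GUI below), a minimal sketch looks like the following. The backup path, the target folders, and the logical file names here are assumptions for my machine, so check yours with RESTORE FILELISTONLY and adjust accordingly.

--List the logical file names contained in the backup (these drive the MOVE clauses below)
RESTORE FILELISTONLY
FROM DISK = N'C:\Downloads\AdventureWorks2014.bak';

--Restore the database, moving the data and log files to folders that exist on your machine
RESTORE DATABASE AdventureWorks2014
FROM DISK = N'C:\Downloads\AdventureWorks2014.bak'
WITH MOVE N'AdventureWorks2014_Data' TO N'C:\SQLData\AdventureWorks2014_Data.mdf'
,MOVE N'AdventureWorks2014_Log' TO N'C:\SQLData\AdventureWorks2014_Log.ldf'
,RECOVERY;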

Now we can go and open SSMS and connect to our local instance of SQL Server. If you are using Windows 8.1 or later, use the shortcut Windows+s and search for “SQL Server Management Studio”. If you are on an older version or prefer to navigate, just go to “All Programs” or “All Apps”, find the folder for “Microsoft SQL Server 2014”, and expand it. Then go to “SQL Server Management Studio 2014”. Open that up and a window will pop up that looks like below, but you will have to type in “localhost” for the server name.

TestInstall1

Now, you can click “Connect” giving you the screen below:

TestInstall2

If you click on the + to the left of the Databases folder to expand it out, you will only see another folder for “System Databases”, which we will ignore for now. What you are really seeing is that you have no “User” databases to query. To restore our AdventureWorks2014 database, right-click on the Databases folder and select the “Restore Database…” option, which will open up a window:

RestoreDB1

To restore from a file, we’ll change the Source to “Device” and either use the path of the .bak file that we extracted earlier or just use the ellipsis to navigate to the path and click on the file name:

RestoreDB2

Once you click OK, it will take a couple seconds to a minute to restore, depending on your CPU and disk speed. Now you will see the AdventureWorks2014 database in the window on the left.

RestoreDB3

Now, you can expand that out and look at some of the objects contained within a database. They are displayed in folders, just like a file structure, when you look at them using SQL Server Management Studio. In Part 3, we will write our first SQL query against this database. However, if you can’t wait for me to write Part 3, I suggest checking out SQLServerCentral.com’s Stairway to T-SQL DML. I HIGHLY recommend SQLServerCentral.com for their stairways, forums, and articles. If you don’t already have an account there, go sign up now! If you have any questions or this doesn’t work, email me at jkarney@sqlknowitall.com or send a tweet with the hashtag #SQLHelp.

Nov 12

So you want to learn SQL Server? – Part 1

After attending the PASS Summit last week, I really got pumped to start blogging and speaking. I wanted to talk about spinlocks and latches, parsing XML, and PowerShell. However, I realized that there are so many people out there who want to get started in SQL but don’t know where to start. Inspired by my wife and my best bud, Brett, I decided I should start a series on getting started with SQL Server.

In this series, I will go through the real basics. We’ll start with installing a local instance of SQL Server and the AdventureWorks database. Then we’ll see how this evolves. My goal is to work with the people around me who know nothing about SQL to make this the best tutorial that exists for learning SQL.

Step 1 – Installing SQL Server for Learning

There are several editions of SQL Server. For now, I want you to stick with the free version, Express. You can download the latest version here. If you don’t have a Microsoft account already, you will need one.

Fill in the required information and then download the “SQL Server 2014 Express with Advanced Services 64 bit” (32 bit if you have a 32 bit machine, but these are rare in today’s world…). Make sure you save it to a known location, like “Downloads” or “Documents”. Now, navigate to that location and double-click on the file. You may (depending on your Windows version and settings) get a prompt asking “Do you want to allow this app to make changes to your PC?”. Just click Yes, which will give you one more prompt asking where you would like to extract the files. You can just click “Ok” and use the default.

After the self-extracting installer is complete you will get a window that looks like this:
SQLInstall1
If it doesn’t show these options, make sure that “Installation” is selected on the left (see how it is BOLD in the image above?). Then go ahead and click on “New SQL Server stand-alone installation or add features to an existing installation.” This will cause another window to open up which will be your main window to complete the installation. Now, I’m not going to go into best practices for installing SQL Server here. This series is strictly for learning the scripting language and how relational databases work. Therefore, we will be accepting most of the defaults.

The next screen will look like this:

SQLInstall2

You can just go ahead and click “Next” to get to the next screen. This is the screen where we will select what to install. The main things you will need in order to learn to script T-SQL are Database Engine Services and Management Tools. I also recommend installing Documentation Components to make it easier to navigate “Help” and Books Online (BOL). However, it won’t hurt to install other components, especially if you plan on exploring more aspects of SQL Server in the future. My configuration for this install is in the image below.

SQLInstall3

Click next to move on to the “Instance Configuration” section. On this screen, I change the radio button from “Named Instance” to “Default Instance”. This should change the instance ID to MSSQLSERVER. This is important because anything other than the default has to be referenced by name. This allows several “instances” of SQL to be installed on the same machine. Make sure your screen looks like mine below and click “Next”.

SQLInstall4

The next screen is where you can set specific service accounts for the services to run under. We’ll leave this as default for this install.

SQLInstall5

Click “Next” again to take you to the “Database Engine Configuration” section. Select the “Mixed Mode” option and enter a password you will remember. This becomes the admin (sa) password for your SQL Server instance. Also, click on the “Add Current User” button to add yourself as an administrator.

SQLInstall6

Again, click “Next” to move on to the “Reporting Services Configuration” screen. If you didn’t select the option to install Reporting Services, you can skip this. Otherwise, select the “Install Only” option.

SQLInstall7

Click “Next” to begin the install. When it completes, you will get a final confirmation screen.

SQLInstall8

That’s it! You’ve completed your install of SQL Server Express! Let’s check to make sure you can connect. Hopefully you are all on Windows 8.1 or later. If you are, use the shortcut Windows+s and search for “SQL Server Management Studio”. If you are on an older version or prefer to navigate, just go to “All Programs” or “All Apps”, find the folder for “Microsoft SQL Server 2014”, and expand it. Then go to “SQL Server Management Studio 2014”. Open that up and a window will pop up that looks like below, but you will have to type in “localhost” for the server name.

TestInstall1

If you did everything as I said above, you can click “Connect” and everything should work fine, giving you the screen below:

TestInstall2
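
If you want one more sanity check that the engine is really answering, open a New Query window and run something simple like this:

--Returns the version banner, edition, and product version of the instance you connected to
SELECT @@VERSION AS version_banner
,SERVERPROPERTY('Edition') AS edition
,SERVERPROPERTY('ProductVersion') AS product_version;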

WOOHOO!!! You now have the ability to create, restore, and attach databases. In the next post, I will describe how to download the AdventureWorks database and get it onto your system. If you have any questions or this doesn’t work, email me at jkarney@sqlknowitall.com or send a tweet with the hashtag #SQLHelp.

Jared

Mar 26

How to move data or a table from one data file to another in the same filegroup! – Part 2 of 2

This post is a continuation of the previous post, How to move data or a table from one data file to another in the same filegroup! – Part 1 of 2, and will show you how SQL handles data when it has more than 1 file. I will show you what happens when someone adds a second data file because they ran out of space on the first. I have seen this done when a disk is out of space, so someone adds a second file to the file group on a different disk. In this demo, we will use the same disk, but limit the size and growth of the first data file.

To start, here is a section from BOL on Database Files and Filegroups:

For example, three files, Data1.ndf, Data2.ndf, and Data3.ndf, can be created on three disk drives, respectively, and assigned to the filegroup fgroup1. A table can then be created specifically on the filegroup fgroup1. Queries for data from the table will be spread across the three disks; this will improve performance. The same performance improvement can be accomplished by using a single file created on a RAID (redundant array of independent disks) stripe set. However, files and filegroups let you easily add new files to new disks.

Now I will show you how you do not get to decide which file within the group the data will go to.

Demo 1 – Adding a file to a full database in order to give some free space

First we will create a database with one data file in the PRIMARY filegroup and check the space of the table:

USE [master];
GO

--Check for existence of database and drop if exists
IF EXISTS (
 SELECT 1
 FROM sys.databases
 WHERE NAME = 'TestDataFileAllocation'
 )
 DROP DATABASE TestDataFileAllocation;
GO

--Create a database with 1 data file in the PRIMARY filegroup
CREATE DATABASE TestDataFileAllocation ON PRIMARY (
 NAME = N'TestDataFileAllocation'
 ,FILENAME = N'F:\DATA\TestDataFileAllocation.mdf'
 ,SIZE = 10 MB
 ,MAXSIZE = 100 MB
 ,FILEGROWTH = 0
 ) LOG ON (
 NAME = N'TestDataFileAllocation_log'
 ,FILENAME = N'F:\Log\TestDataFileAllocation_log.ldf'
 ,SIZE = 1024 MB
 ,MAXSIZE = UNLIMITED
 ,FILEGROWTH = 1
 );
GO

--Switch database context to new database
USE TestDataFileAllocation;
GO

--Create a table for some data
CREATE TABLE SomeData (
 idCol INT IDENTITY(1, 1)
 ,NAME VARCHAR(MAX)
 ,number INT
 );
GO

--Add a clustered primary key to our new table
ALTER TABLE SomeData ADD CONSTRAINT PK_idCol PRIMARY KEY CLUSTERED (idCol);
GO

--Look at the file allocation using system tables and DMVs
SELECT OBJECT_NAME(p.object_id) AS tableName
 ,df.file_id
 ,df.name AS file_logical_name
 ,df.physical_name AS file_physical_name
 ,CAST(df.size AS BIGINT)*8/1024 AS file_size_MB
 ,CAST(df.max_size AS BIGINT)*8/1024 AS file_max_size_MB
 ,df.growth AS file_autogrow_ind
 ,fsu.total_page_count AS file_total_page_count
 ,fsu.allocated_extent_page_count AS file_allocated_extent_page_count
 ,fsu.unallocated_extent_page_count AS file_unallocated_extent_page_count
 ,au.total_pages AS table_allocated_pages
 ,au.used_pages AS table_used_pages
 ,au.data_pages AS table_data_pages
 ,p.rows AS table_rows
FROM sys.allocation_units au
INNER JOIN sys.database_files df ON au.data_space_id = df.data_space_id
INNER JOIN sys.partitions p ON au.container_id = p.hobt_id
INNER JOIN sys.dm_db_file_space_usage fsu ON df.file_id = fsu.file_id
WHERE au.type IN (
 1
 ,3
 )
 AND OBJECT_NAME(p.object_id) = 'SomeData';
GO

Notice the results of the allocation on the file:

NoData

So far, 328 pages are allocated while 952 are unallocated for the file itself. This particular table is currently empty and has 0 pages. Now let’s add some data to the table and see what happens.

--Insert data into the SomeData table
INSERT INTO SomeData (
NAME
,number
)
VALUES (
REPLICATE('sqlknowitall', 1000)
,10000
);
GO

And using the same query to look at the data as we used earlier:

1Row

Now we have 1 row in the table, and we can see that nothing has changed on the file itself, but the table has added 1 data page. So, not much has changed… However, what happens if we add so much data that the file runs out of space?

--Insert data into the SomeData table and max out the file
INSERT INTO SomeData (
 NAME
 ,number
 )
VALUES (
 REPLICATE('sqlknowitall', 1000)
 ,10000
 );
GO 10000

Run the query to see what happened:

MaxData

The file is now full. It has no more unallocated pages, and the table itself has 953 allocated and used pages. To allow more data into this table, we need to either grow the file or add a new one. For the sake of this article and the point I am trying to make, we will add a second file to the filegroup.

 --Add a second file to the filegroup
ALTER DATABASE TestDataFileAllocation ADD FILE (
NAME = TestDataFileAllocation_2
,FILENAME = N'F:\DATA\TestDataFileAllocation_2.ndf'
,SIZE = 1024 MB
,MAXSIZE = UNLIMITED
,FILEGROWTH = 512 MB
) TO FILEGROUP [PRIMARY];
GO

Look at the allocation:

SecondFile

We now see the new file with 8 pages. The only reason we are still showing table data with this new file is because the DMVs don’t tell us which file the table resides on, only which FILEGROUP. However, since we only have 1 table with no changes, we know that it still all resides on the first file.
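
As an aside, if you are on SQL Server 2012 or later and really want to see which file a table’s pages live on, the undocumented sys.dm_db_database_page_allocations function can count allocated pages per file. Treat this as a diagnostic sketch rather than something to bake into production code:

--Count this table's allocated pages per data file (undocumented DMF, SQL Server 2012+)
SELECT allocated_page_file_id AS file_id
,COUNT(*) AS allocated_page_count
FROM sys.dm_db_database_page_allocations(
DB_ID('TestDataFileAllocation')
,OBJECT_ID('TestDataFileAllocation.dbo.SomeData')
,NULL, NULL, 'LIMITED')
WHERE is_allocated = 1
GROUP BY allocated_page_file_id;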

The suggestion was to drop and recreate the clustered index to move the table. Well, let’s see what that does:

--Drop the clustered index
ALTER TABLE SomeData
DROP CONSTRAINT PK_idCol;
GO
--Recreate the same clustered index
ALTER TABLE SomeData ADD CONSTRAINT PK_idCol PRIMARY KEY CLUSTERED (idCol);
GO

Results:
SecondFileRebuildCI

Hmm… Well, some of the data moved to the new file. But how much? There are several more technical ways to get at this, but to keep it simple I will just compare the starting allocations. Before we had any data in the table, the first file showed 328 allocated pages. Before we had any data in the second file, it had 8 allocated pages. After the drop and recreate of the clustered index, the first file now has 336 pages allocated while the second has 968. We can see that not all of the data has moved off of the first file, but a majority did. According to Paul Randal:

SQL Server’s allocation system uses a round-robin mechanism where allocations are spread over all the data files in a filegroup – an allocation from file 1, then an allocation from file 2, and so on, then back to file 1 again. Combined with that, it also uses an algorithm called proportional fill that aims to allocate data from the files according to how much free space they have relative to the other files in the filegroup.
The basic premise of proportional fill is that each file has a weighting, where files with less free space will have a higher weighting, and files with lots of free space will have a low weighting. The files with the lower weightings will be allocated from more frequently – i.e. those files with more free space will be allocated from the most.
This means that if you add a new data file to a filegroup that has mostly full data files, the proportional fill weightings will be such that the new file will be the one where allocations come from until it fills up to the same level as the older files. The new file in essence becomes an allocation hot spot.

In closing, you cannot decide how the data moves between files within the same filegroup. SQL Server uses its own algorithm for doing this and it is out of our control. However, what if you decided that having 2 files was hurting the performance or was too hard to manage? Can you somehow get rid of 1 of them? Yes, you can… but ONLY if the file you are trying to remove is not the first file created when the database was created.

--Empty file 1 and remove it
CHECKPOINT
GO

DBCC SHRINKFILE (TestDataFileAllocation, EMPTYFILE);
GO

ALTER DATABASE TestDataFileAllocation REMOVE FILE TestDataFileAllocation;
GO 

This will result in an error:

EmptyFileError

The only thing we can do in this case is move our first file to a larger drive (if that was the issue to begin with, who knows) and move everything from file 2 to the original after growing the file to accommodate the data.

--Grow our first file and move everything there to get rid of second file
USE master;
GO
ALTER DATABASE TestDataFileAllocation
MODIFY FILE
(NAME = TestDataFileAllocation,
SIZE = 1024MB);
GO

USE TestDataFileAllocation;
GO

CHECKPOINT;
GO

DBCC SHRINKFILE (TestDataFileAllocation_2, EMPTYFILE);
GO

ALTER DATABASE TestDataFileAllocation REMOVE FILE TestDataFileAllocation_2;
GO

Just double-checking with our allocation query:

FinalFile

Tada!

Mar 20

How to move data or a table from one data file to another in the same filegroup! – Part 1 of 2

Sorry! This isn’t something you can do. I wanted to title this post in a way that would draw people to it for my rant. I am ranting because I had a friend who went to an interview where he was asked how you move data from one file to another. He didn’t know, so he asked me…

Friend: Do you know how to move data from one file to another?

Me: You mean between filegroups. Yes.

Friend: No, between files in the same filegroup.

Me: You don’t. Why?

Friend: That’s what I thought too, but I learned at an interview today.

Me (laughing): Really? Can you share this knowledge with the rest of the SQL Server community? Well… start with me.

Friend: You rebuild the clustered index on the other file or you limit the growth of the file you want to move off of and then rebuild the clustered index so that it moves to the other file.

Me: They told you this? Have you ever tried to create an object on a file?

Friend: No.

Me: Did you test it out? Because it can’t be done…

Friend: No, but…

Let me tell you, friends… as far as I know, this cannot be done. Whoever asked this question doesn’t understand the file structure in SQL Server or how data gets allocated. This conversation made me SOOOO mad, because it means that there are people out there asking bad interview questions, or at the very least explaining their questions incorrectly. If you are interviewing someone, PLEASE make sure you know the correct answer to your own questions and make sure you are right. In the next post, I will post the DDL and DML to show this.

Mar 02

My Experience on SQL Cruise Caribbean 2015


This year I had the amazing opportunity to attend SQL Cruise Caribbean 2015. I found out about it from a blog post by David Klee (@kleegeek) (who posted his recap of the cruise here) and immediately talked to my boss about it. I was lucky because he had seen references to this in the past and had asked his boss, at the time, for permission to attend. However, he was shot down on the spot and laughed at. I imagine it went something like this…

DBA Manager: Good morning Director. I came across this amazing opportunity to learn SQL and network with some of the greatest minds in the field. The presenters are SQL Server MVPs, founders of PASS, and real contributors to the community!

Director: What’s PASS? Sounds good, how much does it cost and where is it?

DBA Manager: Well, it is really inexpensive and takes place on a cruise sh–

Director (cutting off DBA Manager, laughing): ARE YOU JOKING? I’m not paying for you to go on a cruise!

I’ll admit, when I saw the words “SQL Cruise Caribbean” I was thinking about sipping on a mojito while enjoying the sun on the pool deck. I was not thinking about all of the things I could learn and the people I could learn from.

I did a quick write-up to my boss that focused on the cost and the value of this experience. I told him about all of the knowledge I could put to use when I returned from Jes Borland’s sessions on restore methods and automation, the skills I would learn from David Klee to improve our infrastructure and virtualization configuration, and the questions I would ask Grant Fritchey and Kevin Kline about execution plans and performance tuning. I really sold this thing, but I really didn’t understand the true value until I arrived at the Bon Voyage event in Ft. Lauderdale the night before embarkation.

I was a little shy at first because I really didn’t know anybody (except David Klee) and was even a bit nervous about meeting some of the greats. I mean, I read these people’s books and blog posts on a daily basis. They are like celebrities to a SQL geek like myself! Tim and Amy Ford introduced themselves right away and made my wife and me feel very comfortable. We sat down and started talking to people immediately. I could see that some of the people had met before this and had been on the previous cruises. They seemed like a family: catching up on the events of the year, talking about future endeavors, and (of course) having plenty of SQL talk.

Training started immediately aboard the ship on Saturday, even before we left the port. Jeff Lehmann from Slalom Consulting kicked off the trip by teaching us about the offerings of the Amazon Cloud and how it could benefit SQL Server implementations. Almost immediately, I realized that this training was going to be different from any other. The room was small and people were asking lots of questions. Nobody, even the “experts”, held back. This was a group that would share knowledge, test theories, question standards and best practices, and do so all with the selfless intent of making the SQL community stronger. Not only in their knowledge of technology, but also in terms of professional advancement.

I spent 40 hours of scheduled time with this community throughout the week. I spent a total of over 50 hours combined with this new family of mine. I gained months of knowledge in a week. I gained years of networking within a week. I made friends. My wife made friends. I was inspired: to write, to present, to volunteer, to teach, and to inspire others!

During the trip, I made my deposit with the cruise line for the future. I returned from the cruise on February 14th and signed up for SQL Cruise Caribbean 2016 on February 28th because I know that this yearly training/networking is not only beneficial to my company, but also for my personal career.

So, what will your company get out of this? Why should they pay for it? The first thing is the training itself: it is hands-on and by far the best training I have ever received. It is a small group, which allows for very pointed and specific questions to be asked. Even more specific questions can be asked during office hours and normal conversation outside of the formal sessions. The second is the cost of this training. It is significantly less than any training I have seen, especially when you look at the quality of the training and the ability to tailor a lot of it to your own specific needs. Lastly, the networking is invaluable to you and your company. When you bring in a consultant, you typically have to get an MSA signed, an SOW put together, and a minimum number of hours contracted for. However, after this training you will have a family that you can pose issues and questions to. These people LOVE a puzzle and also love to help you, a friend, solve it and learn from it. Ultimately, this does not only help you, but also your company.

In closing, this was the most valuable experience for my career, and I intend to attend every year regardless of whether my company pays for it. Thanks to all those who made it a memorable experience. Also, thanks to all of the sponsors who make this a great value and give us the coveted swag: Amazon Web Services, redgate, SQLSentry, and SQLAgentman.

Apr 26

Determining a setting for “Cost Threshold for Parallelism”

I was asked to share this after a discussion at SQLSaturday Chicago by a presenter, David Klee. So, thank you David for getting me to finally post something!

A while ago, I was tasked with researching the Max Degree of Parallelism and Cost Threshold for Parallelism settings for my company’s SQL Servers. While trying to find recommendations, I found that choosing a Cost Threshold for Parallelism was mostly done by picking a number and testing. I could not find any objective way to even come up with a starting number. I couldn’t just settle for “start at 25 or 30 and play with it to see how your server responds.” It was just too random for me, and I knew there had to be a better way.

Some time before this, my boss had given me a script from Jonathan Kehayias that returned a list of parallel query plans and their subtree costs, and I thought it could be used to gather some data for me.

;WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT
	query_plan AS CompleteQueryPlan
	, n.value('(@StatementText)[1]', 'VARCHAR(4000)') AS StatementText
	, n.value('(@StatementOptmLevel)[1]', 'VARCHAR(25)') AS StatementOptimizationLevel
	, n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') AS StatementSubTreeCost
	, n.query('.') AS ParallelSubTreeXML
	, cp.usecounts
	, cp.size_in_bytes
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n)
WHERE n.query('.').exist('//RelOp[@PhysicalOp="Parallelism"]') = 1;

I then took this script and modified it to simplify it for what I needed. All I really cared about were the costs, so I removed all other columns. I also thought it would be much easier to work with these numbers in a temp table, so I did that. The script below is now what I use to help me decide what to set my Cost Threshold for Parallelism to. It gathers all of the costs, inserts them into a temp table, retrieves the data (not always needed and sometimes too large to do anything with, so comment it out if not needed), and calculates 3 important statistics: mean, median, and mode.

CREATE TABLE #SubtreeCost(StatementSubtreeCost DECIMAL(18,2));

;WITH XMLNAMESPACES
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
INSERT INTO #SubtreeCost
SELECT
	CAST(n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') AS DECIMAL(18,2))
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS qp
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n)
WHERE n.query('.').exist('//RelOp[@PhysicalOp="Parallelism"]') = 1;

SELECT StatementSubtreeCost
FROM #SubtreeCost
ORDER BY 1;

SELECT AVG(StatementSubtreeCost) AS AverageSubtreeCost
FROM #SubtreeCost;

SELECT
	((SELECT TOP 1 StatementSubtreeCost
	FROM
		(
		SELECT TOP 50 PERCENT StatementSubtreeCost
		FROM #SubtreeCost
		ORDER BY StatementSubtreeCost ASC
		) AS A
	ORDER BY StatementSubtreeCost DESC
	)
	+
	(SELECT TOP 1 StatementSubtreeCost
	FROM
		(
		SELECT TOP 50 PERCENT StatementSubtreeCost
		FROM #SubtreeCost
		ORDER BY StatementSubtreeCost DESC
		) AS A
	ORDER BY StatementSubtreeCost ASC))
	/2 AS MEDIAN;

SELECT TOP 1 StatementSubtreeCost AS MODE
FROM   #SubtreeCost
GROUP  BY StatementSubtreeCost
ORDER  BY COUNT(1) DESC;

DROP TABLE #SubtreeCost;

So what do I do with these numbers? In my case, I am trying to get “about” 50% of the queries below the threshold and 50% above, splitting the number of queries eligible for parallelism. This is not going to guarantee the best performance, but I figured it was the most objective way to come up with a good starting point.

If all of my statistics are very close, I just set the Cost Threshold for Parallelism to roughly that number; averaging the 3 and rounding works fine. In many of my cases, this was between 25 and 30. If the numbers are different, i.e. a few very large costs skew the average up but the median and mode are close, then I will use something between the median and mode.
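
Once I have settled on a number, applying it is just an sp_configure call. The value 30 below is only an example; plug in whatever your own statistics point to.

--cost threshold for parallelism is an advanced option, so expose it first
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

--Apply the value suggested by the statistics above (30 here is just an example)
EXEC sp_configure 'cost threshold for parallelism', 30;
RECONFIGURE;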

This method does not spit out 1 number that will magically solve all of your problems, nor is it completely objective, but it is the most objective way I have found to come up with a starting point. If you have any thoughts on this or would like to share your own experiences, please feel free to send me an email and I will respond as soon as I can.

Mar 01

Existing Instance Not Available for Upgrade/Repair/Remove

Yesterday we ran into an issue when upgrading one of our SQL Server 2008 R2 instances to SQL Server 2012.  We ran the install as normal with the upgrade option.  When it came time to choose the instance and features to upgrade, the only options were the shared features.  What happened to our Database Engine instance?

Long story short… we found that a registry key containing the instance name was duplicated in 2 locations:

  • HKLM\Software\Microsoft\Microsoft SQL Server\Instance Names\SQL
  • HKLM\Software\Microsoft\Microsoft SQL Server\Instance Names\RS

The solution? Since we did not have Reporting Services installed on this server anymore, I took a backup of the registry and then deleted the RS key. Restarted the install and… VOILA!

Apr 04

Window Functions! RANK(), DENSE_RANK(), ROW_NUMBER(), NTILE(n)

Microsoft introduced ranking functions (also known as window functions) in SQL Server 2005. What are window functions exactly? First, they are awesome. Second, they are functions that can be applied to a group of rows as specified by the window descriptor (I. Ben-Gan, 2011). Itzik Ben-Gan has a great series on how to use these and the new functions in SQL Server 2012. He also wrote an entire book on these! That’s how valuable they are…

So, in layman’s terms… we can apply a function to a group of rows as if they were the only results returned. This is done using, as mentioned above, a window descriptor. A sample from MSDN using the AdventureWorks2008R2 database is below:

USE AdventureWorks2008R2;
GO 
SELECT p.FirstName, p.LastName
  ,ROW_NUMBER() OVER (ORDER BY a.PostalCode) AS 'Row Number'
  ,RANK() OVER (ORDER BY a.PostalCode) AS 'Rank'
  ,DENSE_RANK() OVER (ORDER BY a.PostalCode) AS 'Dense Rank'
  ,NTILE(4) OVER (ORDER BY a.PostalCode) AS 'Quartile'
  ,s.SalesYTD, a.PostalCode
FROM Sales.SalesPerson s
INNER JOIN Person.Person p
  ON s.BusinessEntityID = p.BusinessEntityID
INNER JOIN Person.Address a
  ON a.AddressID = p.BusinessEntityID
WHERE s.TerritoryID IS NOT NULL
  AND s.SalesYTD <> 0;

Ok, so what does this do? Let’s talk about the clause common to each function above: OVER (ORDER BY a.PostalCode). This is the window descriptor; it describes the set of rows to apply the function to and how to apply it. In this example it applies the function to all rows, in the order of the PostalCode column. In addition to the ORDER BY, a PARTITION BY can precede it to tell the function which group of rows to apply the function to. In the case of sales, you may want to use the functions to rank ALL of the sales reps within each territory. To do this, the above code can be modified to include a PARTITION BY clause within each window descriptor:

USE AdventureWorks2008R2;
GO 
SELECT s.TerritoryID --added to show partition by TerritoryID
 , p.FirstName, p.LastName
 ,ROW_NUMBER() OVER (PARTITION BY s.TerritoryID ORDER BY a.PostalCode) AS 'Row Number'
 ,RANK() OVER (PARTITION BY s.TerritoryID ORDER BY a.PostalCode) AS 'Rank'
 ,DENSE_RANK() OVER (PARTITION BY s.TerritoryID ORDER BY a.PostalCode) AS 'Dense Rank'
 ,NTILE(4) OVER (PARTITION BY s.TerritoryID ORDER BY a.PostalCode) AS 'Quartile'
 ,s.SalesYTD, a.PostalCode
FROM Sales.SalesPerson s
INNER JOIN Person.Person p
 ON s.BusinessEntityID = p.BusinessEntityID
INNER JOIN Person.Address a
 ON a.AddressID = p.BusinessEntityID
WHERE s.TerritoryID IS NOT NULL
 AND s.SalesYTD <> 0;

So what is the difference between each of the ranking functions?

RANK()

According to BOL: RANK() “returns the rank of each row within the partition of a result set. The rank of a row is one plus the number of ranks that come before the row in question.” This means that rank numbers will be assigned to each row in the window descriptor based on the ORDER BY value. Equal values will receive the same rank number, but the underlying counter still increases for each row in the set, so gaps appear after ties. This is why you will see only 1 and 6 in the first example.

DENSE_RANK()

According to BOL: DENSE_RANK() “returns the rank of rows within the partition of a result set, without any gaps in the ranking. The rank of a row is one plus the number of distinct ranks that come before the row in question.” This means that rank numbers will be assigned to each row in the window descriptor based on the ORDER BY value (just as in RANK). Equal values will receive the same rank number (just as in RANK), but the rank number will NOT skip ahead after ties. This is why you will see only 1 and 2 in the first example.

ROW_NUMBER()

According to BOL: ROW_NUMBER() “returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.” So, this is simple numbering from 1 to n of the rows defined in the window descriptor. If there is a partition clause, it will start the ordering of the next partition at 1 and resume incrementing until the end of the partition in the order defined.
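
To see how these three functions treat ties without needing AdventureWorks at all, here is a tiny self-contained illustration with made-up data:

--Three tied values, two tied values, and one unique value
SELECT v.val
,ROW_NUMBER() OVER (ORDER BY v.val) AS [Row Number] --1,2,3,4,5,6
,RANK() OVER (ORDER BY v.val) AS [Rank]             --1,1,1,4,4,6 (gaps after ties)
,DENSE_RANK() OVER (ORDER BY v.val) AS [Dense Rank] --1,1,1,2,2,3 (no gaps)
FROM (VALUES (10), (10), (10), (20), (20), (30)) AS v(val);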

NTILE(n)

According to BOL: NTILE(n) “Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs.” In the above example, we told the function to split the partition into 4 groups. The function will then assign a number 1-4 to the members of the group based on the ordering in the window descriptor. If there are fewer rows than n, the group number increases by one per row until the rows run out, so the higher group numbers simply never appear in the result set.
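
Here is a quick illustration of that shortfall case, again with made-up data; with only 3 rows, NTILE(4) never hands out group 4:

--Only 3 rows split into 4 requested groups: groups 1, 2, and 3 appear, group 4 never does
SELECT v.val
,NTILE(4) OVER (ORDER BY v.val) AS [Quartile]
FROM (VALUES (10), (20), (30)) AS v(val);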

A Great Utility of ROW_NUMBER()

Deleting true duplicates is something we have all experienced at one point or another.  The problem is that we have no easy way to tell SQL Server which of 1 or more duplicate rows to delete. Typical solutions have involved a cursor or many nested cursors to achieve this. Personally, if I can find a way to avoid a cursor I will use it.  Below is the script that I use to identify and delete duplicate rows:

WITH RowNumbers AS
(
SELECT columnA, columnB, columnC, columnD
  ,ROW_NUMBER()
    OVER (PARTITION BY  columnA, columnB, columnC, columnD ORDER BY columnA, columnB, columnC, columnD) AS rn
FROM tableA
)
DELETE FROM RowNumbers WHERE rn > 1

This is setting up my window descriptor to partition the groups based on all columns in this table so that true duplicates are in their own grouping. It then uses the ROW_NUMBER() function to assign a row number to each row in the group. If there are duplicates, there will be rows with a row number greater than 1.  If there are no duplicates of a row, it will be assigned a row number of 1.  Finally, delete all rows with a row number greater than 1! This script also uses a CTE or Common Table Expression to produce the result set that we will use for the delete.  CTEs will be discussed in a future post.

Jan 18

Generating Consecutive Numbers Within a Given Range

A couple of months ago I was working on a project where we were revamping an existing system. One piece of the system was a call to a stored procedure that generated a number in sequence and assigned it to a user that was accessing the application. It was very similar to taking a number and waiting in line. It was also similar in that we were required to use a fixed range of numbers. In our case, the numbers were to start at 100000 and roll over back to 100000 once we got to 899999.

When I began this project, an existing stored procedure was accomplishing this in a peculiar way. We only noticed a problem because of newly required load testing: when pushed to a high number of calls, the stored procedure started failing. We found that the existing script was using the SERIALIZABLE isolation level. It first grabbed the MAX value from a 1 row/1 column table, checked whether it was less than 899999, added 1 to it if so, updated the table with the new value, and then passed the value back to the application. If the value was equal to 899999, it updated the table back to 100000. This isolation level created deadlocks when the stored procedure was called many times in quick succession, which caused it to fail, and the application then failed as well. (I will cover isolation levels in more detail in a future post.) This method also did not keep a history of the numbers, which made it very difficult to trace a number back to a particular user, vendor, or time.
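
For illustration, the old pattern looked roughly like the sketch below. This is a reconstruction for the sake of the story, not the actual production code, and the table and column names (numberState, currentNumber) are made up:

--Roughly the old approach: read MAX under SERIALIZABLE, bump or reset it, write it back.
--The range locks taken by the SELECT plus the later UPDATE are what produced the deadlocks under load.
CREATE PROC generateNumber_old
AS
BEGIN
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN TRAN;
DECLARE @current INT;
SELECT @current = MAX(currentNumber) FROM numberState;
IF @current < 899999
SET @current = @current + 1;
ELSE
SET @current = 100000;
UPDATE numberState SET currentNumber = @current;
COMMIT;
SELECT @current;
END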

The Solution!

I wanted a solution that would not cause deadlocks, but that would also guarantee a unique number each time the stored procedure was called. I also wanted to store a history of when these numbers were generated, yet somehow be able to identify numbers after rollovers in the range. I knew that a range this large would not roll over for many years, so as long as the numbers could be generated sequentially I would not have any problems with rollover.

We decided to use a log table with an identity column, and a date column to start.

CREATE TABLE numberLog(
gNumber int IDENTITY(1,1) NOT NULL,
gTime datetime NOT NULL,
CONSTRAINT [PK_numberLog] PRIMARY KEY CLUSTERED
(
gNumber ASC
) ON [PRIMARY]
) ON [PRIMARY]

This let SQL Server generate the number sequentially without any chance of repeating the same number, and skipping numbers was not an issue. So how did we limit the range?

CREATE PROC generateNumber
AS
BEGIN
INSERT INTO numberLog(gTime)
SELECT GETDATE()
SELECT scope_identity() % 800000 + 100000
END

This stored procedure will maintain the identity in the table while using the newly created id to return the proper number to the application.  Since the application itself never needs to return to this table, it was the perfect solution!

How does it work?

The table is set to auto-increment the first column upon insertion. The SQL Server Database Engine ensures that this value is unique and incremented sequentially upon insertion. So, when the stored procedure is called to generate a number, it first inserts a row into our table. Upon insertion, the identity is generated and the current datetime is inserted with it by using the GETDATE() function. Finally, the identity value is selected using the SCOPE_IDENTITY() function, which returns the generated identity only from the scope of this session/stored procedure. That identity is then converted using the % and + operators to give us the number required by the application. The % operator gives us the integer remainder of the identity number divided by 800000. Below is a sample of what happens with different numbers using the % operator.

SELECT 0 % 800000 --0
SELECT 1 % 800000 --1
SELECT 799999 % 800000 --799999
SELECT 800000 % 800000 --0
SELECT 800001 % 800000 --1

The great thing about this method is that we now have a log of generated numbers that never rolls over, and by using the generated number in conjunction with the formula we can work back to the candidate identity numbers; knowing the approximate date that the number was generated then gives us the specific identity. Again, it was not a requirement to store the generated numbers or the dates, but I thought it was safer to do so, and it may make my life easier if they decide to do a lookup.
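
A hedged sketch of that kind of lookup, with an illustrative number and date range plugged in, might look like this:

--Find the identity (or identities) that produced a given application number,
--then let the date range narrow it down to the one we care about
DECLARE @g INT = 100123; --the number the application reported
SELECT gNumber, gTime
FROM numberLog
WHERE gNumber % 800000 + 100000 = @g
AND gTime >= '20150101' AND gTime < '20150201';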
