Mar 01

Existing Instance Not Available for Upgrade/Repair/Remove

Yesterday we ran into an issue when upgrading one of our SQL Server 2008 R2 instances to SQL Server 2012.  We ran the install as normal with the upgrade option.  When it came time to choose the instance and features to upgrade, the only options were the shared features.  What happened to our Database Engine instance?

Long story short… We found a registry key that contains the instance name was duplicated in 2 locations:

  • HKLM\Software\Microsoft\Microsoft SQL Server\Instance Names\SQL
  • HKLM\Software\Microsoft\Microsoft SQL Server\Instance Names\RS

The solution? Since we did not have Reporting services installed on this server anymore, I took a backup of the registry and then deleted the RS folder. Restarted the install and… VOILA!

Apr 04

Window Functions! RANK(), DENSE_RANK(), ROW_NUMBER(), NTILE(n)

Microsoft introduced ranking functions (also known as window functions) in SQL Server 2005. What are window functions exactly? First, they are awesome. Second, they are functions that can be applied to a group of rows as specified by the window descriptor (I. Ben-Gan, 2011). Itzik Ben-Gan has a great series on how to use these and the the new functions in SQL Server 2012. He also wrote an entire book on these! That’s how valuable they are…

So, in layman’s terms… We can apply a function to a group of rows as if they were the only results returned. This is done using, as mentioned above, a window descriptor. A sample from MSDN using AdventureWorks2008R2 database is below:

USE AdventureWorks2008R2;
GO SELECT p.FirstName, p.LastName
  ,ROW_NUMBER() OVER (ORDER BY a.PostalCode) AS 'Row Number'
  ,RANK() OVER (ORDER BY a.PostalCode) AS 'Rank'
  ,DENSE_RANK() OVER (ORDER BY a.PostalCode) AS 'Dense Rank'
  ,NTILE(4) OVER (ORDER BY a.PostalCode) AS 'Quartile'
  ,s.SalesYTD, a.PostalCode
FROM Sales.SalesPerson s
INNER JOIN Person.Person p
  ON s.BusinessEntityID = p.BusinessEntityID
INNER JOIN Person.Address a
  ON a.AddressID = p.BusinessEntityID
WHERE s.TerritoryID IS NOT NULL
  AND s.SalesYTD <> 0;

Ok, so what does this do? Let’s talk about the common clause of each function above: OVER(ORDER BY a.PostalCode)  This is the window descriptor, it is describing the set of rows to apply the function to and how to apply it. In this example it is applying the function to all rows, but in the order of the PostalCode column. In addition to the ORDER BY, a PARTITION BY can precede it to tell the function which group of rows to apply the function to. In the case of sales, you may want to use the functions to rank ALL of the sales reps within each territory.  To do this, the above code can be modified to include a PARTITION BY clause within each window descriptor:

USE AdventureWorks2008R2;
GO SELECT s.TerritoryID --added to show partition by TerritoryID
 , p.FirstName, p.LastName
 ,ROW_NUMBER() OVER (PARTITION BY s.TerritoryID ORDER BY a.PostalCode) AS 'Row Number'
 ,RANK() OVER (PARTITION BY s.TerritoryID ORDER BY a.PostalCode) AS 'Rank'
 ,DENSE_RANK() OVER (PARTITION BY s.TerritoryID ORDER BY a.PostalCode) AS 'Dense Rank'
 ,NTILE(4) OVER (PARTITION BY s.TerritoryID ORDER BY a.PostalCode) AS 'Quartile'
 ,s.SalesYTD, a.PostalCode
FROM Sales.SalesPerson s
INNER JOIN Person.Person p
 ON s.BusinessEntityID = p.BusinessEntityID
INNER JOIN Person.Address a
 ON a.AddressID = p.BusinessEntityID
WHERE s.TerritoryID IS NOT NULL
 AND s.SalesYTD <> 0;

So what is the difference between each of the ranking functions?

RANK()

According to BOL: RANK() “returns the rank of each row within the partition of a result set. The rank of a row is one plus the number of ranks that come before the row in question.”  This means that rank numbers will be assigned to each row in the window descriptor based on the value of the ORDER BY value.  Equal values will receive the same rank number, but rank number will still increase for each row in the set.  This is why you will see only 1 and 6 in the first example.

DENSE_RANK()

According to BOL: DENSE_RANK() “returns the rank of rows within the partition of a result set, without any gaps in the ranking. The rank of a row is one plus the number of distinct ranks that come before the row in question.” This means that rank numbers will be assigned to each row in the window descriptor based on the value of the ORDER BY value (just as in RANK).  Equal values will receive the same rank number (just as in RANK), but rank number will NOT increase for each row in the set.  This is why you will see only 1 and 2 in the first example.

ROW_NUMBER()

According to BOL: ROW_NUMBER() “returns the sequential number of a row within a partition of a result set, starting at 1 for the first row in each partition.” So, this is simple numbering from 1 to n of the rows defined in the window descriptor. If there is a partition clause, it will start the ordering of the next partition at 1 and resume incrementing until the end of the partition in the order defined.

NTILE(n)

According to BOL:NTILE(n) “Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs.” In the above example, we told the function to split the partition into 4 groups.  The function will then assign a number 1-4 to the members of the group based on the ordering in the window descriptor. If there are not enough groups to meet n, it will increase until there are no more rows leaving n-totalGroups not included in the values of the result set.

A Great Utility of ROW_NUMBER()

Deleting true duplicates is something we have all experienced at one point or another.  The problem is that we have no easy way to tell SQL Server which of 1 or more duplicate rows to delete. Typical solutions have involved a cursor or many nested cursors to achieve this. Personally, if I can find a way to avoid a cursor I will use it.  Below is the script that I use to identify and delete duplicate rows:

WITH RowNumbers AS
(
SELECT columnA, columnB, columnC, columnD
  ,ROW_NUMBER()
    OVER (PARTITION BY  columnA, columnB, columnC, columnD ORDER BY columnA, columnB, columnC, columnD) AS rn
FROM tableA
)
DELETE FROM RowNumbers WHERE rn > 1

This is setting up my window descriptor to partition the groups based on all columns in this table so that true duplicates are in their own grouping. It then uses the ROW_NUMBER() function to assign a row number to each row in the group. If there are duplicates, there will be rows with a row number greater than 1.  If there are no duplicates of a row, it will be assigned a row number of 1.  Finally, delete all rows with a row number greater than 1! This script also uses a CTE or Common Table Expression to produce the result set that we will use for the delete.  CTEs will be discussed in a future post.

Jan 18

Generating Consecutive Numbers Within a Given Range

A couple of months ago I was working on a project where we were revamping an existing system.  One piece of the system was a call to a stored procedure that generated a number in sequence and assigned it to a user that was accessing the application. It was very similar to taking a number and waiting in line. In that respect, it was also similar because we were required to have a range of numbers used for this.  In our case it was to start at 100000 and then rollover back to 100000 once we got to 899999.

When I began this project, an existing stored procedure was accomplishing this in a peculiar way.  We only noticed a problem because of newly required load testing. When pushed to a high amount of calls to this stored procedure, they started failing. We found it was because the existing script was using the SERIALIZABLE isolation level as it first grabbed the MAX value from a 1 row/1 column table, checked if it was less than 899999, if it was it added 1 to it, updated it the table with the new value, and then passed back the value to the application. If it was equal to 899999, it updated the value in the table to 100000.  This isolation level created deadlocks when the stored procedure was called many times in quick succession which caused it to fail.  Subsequently, the application then failed as well. (I will cover isolation levels with more detail in a future post.) This method also did not keep a history of the numbers and thereby made it very difficult to trace back to a particular user, vendor, or time.

The Solution!

I wanted a solution that would not cause deadlocks, but that would also guarantee a unique number each time the stored procedure was called. I also wanted to store a history of when these numbers were generated, yet somehow be able to identify numbers after rollovers in the range.  I knew that generating sequential numbers in a range this large would not rollover for many years, so as long as the numbers could be generated sequentially I would not have any problems rolling over.

We decided to use a log table with an identity column, and a date column to start.

CREATE TABLE numberLog(
gNumber int IDENTITY(1,1) NOT NULL,
gTime datetime NOT NULL,
CONSTRAINT [PK_numberLog] PRIMARY KEY CLUSTERED
(
gNumber ASC
) ON [PRIMARY]
) ON [PRIMARY]

This allowed us to allow SQL to generate the number sequentially without having any chance of repeating the same number. Skipping numbers was also not an issue. So how did we limit the range?

CREATE PROC generateNumber
AS
BEGIN
INSERT INTO numberLog(gTime)
SELECT GETDATE()
SELECT scope_identity() % 800000 + 100000
END

This stored procedure will maintain the identity in the table while using the newly created id to return the proper number to the application.  Since the application itself never needs to return to this table, it was the perfect solution!

How does it work?

The table is set to auto-increment the first column upon insertion.  The SQL Server Database Engine ensures that this value is unique and incremented sequentially upon insertion.  So, when the stored procedure is called to generate a number, it first inserts a row into our table. Upon insertion, the identity is generated and the current datetime is inserted with it by using the GETDATE() function. Finally, the identity value is selected by  using the SCOPE_IDENTITY() function which returns the generated identity only from the scope of this session/stored procedure.  That identity is then converted using the % and + operator to give us the number required by the application. The % operator gives us the integer remainder of the identity number divided by 800000. Below is a sample of what happens with different numbers using the % operator.

SELECT 0 % 800000 --0
SELECT 1 % 800000 --1
SELECT 799999 % 800000 --799999
SELECT 800000 % 800000 --0
SELECT 800001 % 800000 --1

The great thing about this method is that we now have a log of generated numbers that never rolls over and by using the generated number in conjunction with the formula we can solve for multiple identity numbers.  Knowing the approximate date that the number was generated will give us the specific identity.  Again, it was not a requirement to store the generated numbers or the dates, but I thought it was safer to do so and may make my life easier if they decide to do a lookup.

Jan 10

Capturing Data Modifications – The OUTPUT Clause

There are several reasons you may want to capture data that has been inserted, updated, deleted, or merged (2008). The OUTPUT clause that SQL Server provides is extremely helpful for these situations. I have been tasked on several occasions to “merge” data from a staging table to a production table. Whether or not it is needed, I like to build a table that logs all of the changes made with this operation. This is especially helpful for debugging, but also allows you to run reports or rollback any changes to the data.  There are many different ways to capture data changes and based on your needs this may not be the proper choice (this does not capture all changes to data in the table as a trigger would). However, if you are just trying to capture changes done by a stored procedure or a specific block(s) of script this may be the best route.  I will not discuss the MERGE statement in detail here, but I will show you how to incorporate the OUTPUT clause into it.

Some Test Data

The following script will create 3 tables for demonstrating this. The first is a list of current employees and some of their information. The second is a staging table. The third table is for tracking the changes.

--
--
USE tempdb --safe place for testing, but you could always create a new database

--Create application table
SELECT 'AAA' AS empId, 'Albert' AS firstName, 'Anderson' AS lastName,
    '2008-12-01' AS hireDate
INTO currEmployees
UNION ALL
SELECT 'BBB', 'Beth', 'Beethoven', '2009-05-31'

--Create and populate staging table
SELECT 'AAA' AS empId, 'Albert' AS firstName, 'Andersen' AS lastName,
    '2008-12-01' AS hireDate
INTO empStaging
UNION ALL
SELECT 'CCC', 'Chet', 'Carlsberg', '2012-01-10'

--Create change log table
CREATE TABLE empChangeLog (id INT IDENTITY(1,1), changeDate DATETIME,
    changeType VARCHAR(10), empId CHAR(3), firstName VARCHAR(50),
    lastName VARCHAR(50), hireDate DATETIME, newEmpId CHAR(3),
    newFirstName VARCHAR(50), newLastName VARCHAR(50), newHireDate DATETIME)

--
--

Merging the Data

There are 2 ways to merge this data.  The most common way before SQL Server 2008 was to use 3 separate statements. This would be an insert, update, and delete statement:

--
--
DELETE currEmployees
FROM currEmployees e
LEFT JOIN empStaging s
 ON e.empId = s.empId
WHERE s.empId IS NULL

INSERT INTO currEmployees(empId, firstName, lastName, hireDate)
SELECT s.empId, s.firstName, s.lastName, s.hireDate
FROM empStaging s
LEFT JOIN currEmployees e
 ON s.empId = e.empId
WHERE e.empId IS NULL

UPDATE currEmployees
SET firstName = s.firstName, lastName = s.lastName, hireDate = s.hireDate
FROM empStaging s
INNER JOIN currEmployees e
 ON s.empId = e.empId
WHERE s.firstName <> e.firstName
 OR s.lastName <> e.lastName
 OR s.hireDate <> e.hireDate

--
--

You see, the problem here is that once you delete, insert, or update the rows you no longer have the old values.  All you have is the new values. So how do you retain the old and new values to insert into a change table? You could load them all into a temporary table before changing them so that you still have them, but that is a waste of resources. This is where you can add in the OUTPUT clause:

--
--
DECLARE @runtime DATETIME --declare a constant runtime for this whole block
SET @runtime = GETDATE() --initialize runtime of block

DELETE currEmployees
OUTPUT @runtime, 'DELETE', DELETED.empId, DELETED.firstName, DELETED.lastName,
 DELETED.hireDate, NULL, NULL, NULL, NULL
INTO empChangeLog
FROM currEmployees e
LEFT JOIN empStaging s
 ON e.empId = s.empId
WHERE s.empId IS NULL

INSERT INTO currEmployees(empId, firstName, lastName, hireDate)
OUTPUT @runtime, 'INSERT', NULL, NULL, NULL, NULL,
 INSERTED.empId, INSERTED.firstName, INSERTED.lastName, INSERTED.hireDate
INTO empChangeLog
SELECT s.empId, s.firstName, s.lastName, s.hireDate
FROM empStaging s
LEFT JOIN currEmployees e
 ON s.empId = e.empId
WHERE e.empId IS NULL

UPDATE currEmployees
SET firstName = s.firstName, lastName = s.lastName, hireDate = s.hireDate
OUTPUT @runtime, 'UPDATE', DELETED.empId, DELETED.firstName, DELETED.lastName, DELETED.hireDate,
 INSERTED.empId, INSERTED.firstName, INSERTED.lastName, INSERTED.hireDate
INTO empChangeLog
FROM empStaging s
INNER JOIN currEmployees e
 ON s.empId = e.empId
WHERE s.firstName <> e.firstName
 OR s.lastName <> e.lastName
 OR s.hireDate <> e.hireDate

--
--

The delete and insert are pretty straightforward, but what you will notice with the update is that it actually does a delete and an insert. Since this is how SQL Server actually does an update, it is also how the data is referenced in the OUTPUT clause. One final way to do this is with the MERGE statement introduced in 2008:

--
--
DECLARE @runtime DATETIME

SET @runtime = GETDATE()

MERGE currEmployees AS e --target
USING empStaging AS s --source
    ON e.empId = s.EmpId
WHEN NOT MATCHED BY TARGET
    THEN INSERT(empId, firstName, lastName, hireDate) VALUES(s.empId, s.firstName, s.lastName, s.hireDate)
WHEN MATCHED AND (e.firstName <> s.firstName OR e.lastName <> s.lastName OR e.hireDate <> s.hireDate)
    THEN UPDATE SET e.firstName = s.firstName, e.lastName = s.lastName, e.hireDate = s.hireDate
WHEN NOT MATCHED BY SOURCE
    THEN DELETE
OUTPUT @runtime, $action, INSERTED.empId, INSERTED.firstName, INSERTED.lastName, INSERTED.hireDate,
    DELETED.empId, DELETED.firstName, DELETED.lastName, DELETED.hireDate
INTO empChangeLog

--
--

And that is how you use the OUTPUT clause to capture data changes!

NOTE: Be sure to drop the 3 tables from tempdb that we created for testing:

USE tempdb

DROP TABLE currEmployees
DROP TABLE empStaging
DROP TABLE empChangeLog

--
--

Dec 21

Assuming order of data without ORDER BY

Why are my results out of order?

Many times I have seen posts in forums asking about why their data is not being returned in the order of the clustered index. Similar questions have been brought up questioning the order of data from a view when the view includes an ORDER BY clause (which it does not really allow except in a specific case). The fact is that you are NEVER… Let me repeat that… NEVER guaranteed the order of your result set without the ORDER BY clause in the main query. I am not going to get into the details of “why” in this post as it can be quite detailed and confusing when you get down to how rows of data are stored on an individual page and how SQL Server retrieves the data. What I WILL say is that when SQL Server returns results, it does so in the way that it determines most efficient at run time. This is many times in the order of a clustered index or the order of a view or subquery, but not always. When your applications or BI depend on the order of this data, you can not afford to rely on luck.

I have had people tell me that “This view has returned in the proper order every time since we scripted it.  Why is is all of the sudden returning a different order? You must have done something to the database or the view.” Let me ask you this, if I show you a quarter and display both the heads and tails side to you, then flip it 100 times and it comes up heads 100 out of 100 times, do you now assume the quarter will come up heads 100% of the time? I hope not…

Below are 3 situations that demonstrate the wrong way of assuming order of results. Now, these may return the results in the same order each time you test them. However, SQL Server may someday decide that it is more efficient to return those in another way and you will get a different order.

No ORDER BY in query

Many people assume that the result set will always be returned in the order of the clustered index when an ORDER BY is not present in the query. This is not true. You can try to prove me wrong by running a query 100 times and showing me the result set, but then I refer you to my quarter. In order to guarantee your results you will have to run the query below:

ORDER BY in subquery

This first one is a simple subquery using an ORDER BY, but not the main query. This will not guarantee the order of results will be in the same order as the subquery:

To fix this you would change the query to:

ORDER BY on a view

First of all, SQL Server will give you an error if you try to use an ORDER BY without a TOP clause:

Msg 1033, Level 15, State 1, Procedure viewName, Line 5
The ORDER BY clause is invalid in views, inline functions, derived tables, subqueries, and common table expressions, unless TOP or FOR XML is also specified.

I can only assume this is because SQL Server is trying to tell you that an ORDER BY in a view does not guarantee you order of results. I have seen developers ignore the purpose of this error and do the following:

This will compile and create the view, but when you SELECT from this view without an ORDER BY, you will not be guaranteed the order of results. Why does SQL Server let you compile the view in this case? Lets say you have 100 students and would like the view to only return the top 10 in regards to their grades. Using a TOP 10 with the ORDER BY will sort the results of the query, and return the top 10.  However, the top 10 will not be guaranteed to be in any particular order.

Summary

When you need to guarantee the order of your result set, always make sure that the ORDER BY is present and is part of the main query. Never assume the order of your result set without this.

Dec 19

Windows Requires Restart: Keeps failing on install of SQL Server 2008 even after restart

For my first content related blog post I wanted to start with something simple and something that is encountered upon installation of SQL Server. I encountered this problem a number of times on desktop machines where a number of peripherals may already be installed.

The issue is that when SQL Server is going through its prerequisite checks, it says that Windows requires a restart.  So logically, you restart Windows and begin the process again only to find that the checks require a restart of Windows again!  If you are like me, yous restart Windows 1 more time and experience the same result (I believe that it was Einstein who said that insanity is doing the same thing over and over again and expecting different results). So, how do we resolve this issue? It is simple really, but requires editing the registry using regedit. If you are new to editing registry entries, please backup the registry and make sure you know how to restore it before using the steps outlined below.

  1. Start regedit.exe by typing regedit into your Windows search, or navigate to c:\Windows and double-click on regedit.exe.
  2. Navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager and locate the PendingFileRenameOperations entry.
  3. Right-click on the PendingFileRenameOperations entry and click on Modify.
  4. Delete the data in the value box and click ok.
  5. Close regedit and restart the machine.
  6. Start installation of SQL Server again.

This should resolve your issue, but I have heard of instances where the value appears again after restart (this has not happened to me).  In this case, you may try deleting the data in the registry key and then starting the install process without restarting the machine or looking at the same FileRenameOperations entry in ControlSet001 or ControlSet002 instead of CurrentControlSet. NOTE: These last 2 methods are untested by me and I have not encountered them.

You can also install SQL Server from the command prompt with parameters, one of which will skip the prerequisite check.  I don’t advise this as there may be other prerequisites that are vital to the proper installation.

-SQL Know-It-All

Dec 16

Welome to SQL Know-It-All!

Welcome to SQL Know-It-All! The goal of my blog is to provide information to both beginners and advanced users of SQL Server. Although it will mostly cover information regarding SQL Server and its applications (Integration Services, Reporting Services, Analysis Services, and most importantly Database Engine), it will also periodically contain information regarding other applications or platforms that you may need to have interact with SQL Server; i.e. PowerShell, .NET framework, and Visual Studio. I hope for this to be informative to all those who use SQL Server.

Enjoy!

Jared Karney – Enterprise DBA