«

»

Jan 10

Capturing Data Modifications – The OUTPUT Clause

There are several reasons you may want to capture data that has been inserted, updated, deleted, or merged (2008). The OUTPUT clause that SQL Server provides is extremely helpful for these situations. I have been tasked on several occasions to “merge” data from a staging table to a production table. Whether or not it is needed, I like to build a table that logs all of the changes made with this operation. This is especially helpful for debugging, but also allows you to run reports or rollback any changes to the data.  There are many different ways to capture data changes and based on your needs this may not be the proper choice (this does not capture all changes to data in the table as a trigger would). However, if you are just trying to capture changes done by a stored procedure or a specific block(s) of script this may be the best route.  I will not discuss the MERGE statement in detail here, but I will show you how to incorporate the OUTPUT clause into it.

Some Test Data

The following script will create 3 tables for demonstrating this. The first is a list of current employees and some of their information. The second is a staging table. The third table is for tracking the changes.

--
--
USE tempdb --safe place for testing, but you could always create a new database

--Create application table
SELECT 'AAA' AS empId, 'Albert' AS firstName, 'Anderson' AS lastName,
    '2008-12-01' AS hireDate
INTO currEmployees
UNION ALL
SELECT 'BBB', 'Beth', 'Beethoven', '2009-05-31'

--Create and populate staging table
SELECT 'AAA' AS empId, 'Albert' AS firstName, 'Andersen' AS lastName,
    '2008-12-01' AS hireDate
INTO empStaging
UNION ALL
SELECT 'CCC', 'Chet', 'Carlsberg', '2012-01-10'

--Create change log table
CREATE TABLE empChangeLog (id INT IDENTITY(1,1), changeDate DATETIME,
    changeType VARCHAR(10), empId CHAR(3), firstName VARCHAR(50),
    lastName VARCHAR(50), hireDate DATETIME, newEmpId CHAR(3),
    newFirstName VARCHAR(50), newLastName VARCHAR(50), newHireDate DATETIME)

--
--

Merging the Data

There are 2 ways to merge this data.  The most common way before SQL Server 2008 was to use 3 separate statements. This would be an insert, update, and delete statement:

--
--
DELETE currEmployees
FROM currEmployees e
LEFT JOIN empStaging s
 ON e.empId = s.empId
WHERE s.empId IS NULL

INSERT INTO currEmployees(empId, firstName, lastName, hireDate)
SELECT s.empId, s.firstName, s.lastName, s.hireDate
FROM empStaging s
LEFT JOIN currEmployees e
 ON s.empId = e.empId
WHERE e.empId IS NULL

UPDATE currEmployees
SET firstName = s.firstName, lastName = s.lastName, hireDate = s.hireDate
FROM empStaging s
INNER JOIN currEmployees e
 ON s.empId = e.empId
WHERE s.firstName <> e.firstName
 OR s.lastName <> e.lastName
 OR s.hireDate <> e.hireDate

--
--

You see, the problem here is that once you delete, insert, or update the rows you no longer have the old values.  All you have is the new values. So how do you retain the old and new values to insert into a change table? You could load them all into a temporary table before changing them so that you still have them, but that is a waste of resources. This is where you can add in the OUTPUT clause:

--
--
DECLARE @runtime DATETIME --declare a constant runtime for this whole block
SET @runtime = GETDATE() --initialize runtime of block

DELETE currEmployees
OUTPUT @runtime, 'DELETE', DELETED.empId, DELETED.firstName, DELETED.lastName,
 DELETED.hireDate, NULL, NULL, NULL, NULL
INTO empChangeLog
FROM currEmployees e
LEFT JOIN empStaging s
 ON e.empId = s.empId
WHERE s.empId IS NULL

INSERT INTO currEmployees(empId, firstName, lastName, hireDate)
OUTPUT @runtime, 'INSERT', NULL, NULL, NULL, NULL,
 INSERTED.empId, INSERTED.firstName, INSERTED.lastName, INSERTED.hireDate
INTO empChangeLog
SELECT s.empId, s.firstName, s.lastName, s.hireDate
FROM empStaging s
LEFT JOIN currEmployees e
 ON s.empId = e.empId
WHERE e.empId IS NULL

UPDATE currEmployees
SET firstName = s.firstName, lastName = s.lastName, hireDate = s.hireDate
OUTPUT @runtime, 'UPDATE', DELETED.empId, DELETED.firstName, DELETED.lastName, DELETED.hireDate,
 INSERTED.empId, INSERTED.firstName, INSERTED.lastName, INSERTED.hireDate
INTO empChangeLog
FROM empStaging s
INNER JOIN currEmployees e
 ON s.empId = e.empId
WHERE s.firstName <> e.firstName
 OR s.lastName <> e.lastName
 OR s.hireDate <> e.hireDate

--
--

The delete and insert are pretty straightforward, but what you will notice with the update is that it actually does a delete and an insert. Since this is how SQL Server actually does an update, it is also how the data is referenced in the OUTPUT clause. One final way to do this is with the MERGE statement introduced in 2008:

--
--
DECLARE @runtime DATETIME

SET @runtime = GETDATE()

MERGE currEmployees AS e --target
USING empStaging AS s --source
    ON e.empId = s.EmpId
WHEN NOT MATCHED BY TARGET
    THEN INSERT(empId, firstName, lastName, hireDate) VALUES(s.empId, s.firstName, s.lastName, s.hireDate)
WHEN MATCHED AND (e.firstName <> s.firstName OR e.lastName <> s.lastName OR e.hireDate <> s.hireDate)
    THEN UPDATE SET e.firstName = s.firstName, e.lastName = s.lastName, e.hireDate = s.hireDate
WHEN NOT MATCHED BY SOURCE
    THEN DELETE
OUTPUT @runtime, $action, INSERTED.empId, INSERTED.firstName, INSERTED.lastName, INSERTED.hireDate,
    DELETED.empId, DELETED.firstName, DELETED.lastName, DELETED.hireDate
INTO empChangeLog

--
--

And that is how you use the OUTPUT clause to capture data changes!

NOTE: Be sure to drop the 3 tables from tempdb that we created for testing:

USE tempdb

DROP TABLE currEmployees
DROP TABLE empStaging
DROP TABLE empChangeLog

--
--

Leave a Reply