I was just reading through (for the first time in history) a pretty good Microsoft training book (Querying Microsoft SQL Server 2012) and on page 43 I read something new to me. "Consider nonsequential keys, such as random ones generated with NEWID..."
Huh? Keys are always sequential, right? Shouldn't they be? We've all been taught that since day one.
But here's a benchmark:
Shocking -- I know!
Anyway, it has to do with the way SQL Server handles indexes (and of course, you'll almost always have an index on your primary key, which is almost always a surrogate key). If you're inserting new rows into a sequential index (which is most likely the case), you're going to add a new page presumably to the end of the index ... over ... and over ... and over ... and over. Why? Because the last page is going to fill up every few inserts or so. That's cool, except it's not because that page is always going to be latched ... so presumably there's no paralellism going on there? And if you're running an I/O subsystem with more than one disk (which most enterprise applications are in one form or another), then you're not going to benefit from those multiple disks.
On the other hand, if you're inserting rows with a random identifier as the key, the insertion point is also random and therefore 1) multiple processes/threads can write to (and read from) the table it at a time because there is no lock on the page they are trying to access 2) page locks/latches won't occur on the hot spot and interfere with each other and 3) and I/O subsystem is not constantly blocked waiting for the last page to be unlatched.
So I learned something today and it completely shocked my way of thinking about SQL Server. The question now becomes this: how do bulk inserts (say from SSIS) deal with this phenomenon? Will a bulk insert into a table with a random key perform better than one with a sequential key? Next time ...
As a professional, I am a data scientist with an interest in the languages, linguistics, and cultures of ancient civilizations. I primarily work with Microsoft SQL Server, Access, Excel, and PostgreSQL but I have 15 years of experience with databases. I am interested in data analysis, visualization, big data, and business intelligence. I have a master's degree in Classical Studies from UNM where I taught Latin for four years. I run an online dictionary called "Numen - The Latin Lexicon."
Friday, September 6, 2013
Random Surrogate Keys vs Sequential Surrogate Keys
Subscribe to: Post Comments (Atom)
This comment has been removed by the author.ReplyDelete
Thank you so much for writing up such an exclusive and detailed blog post focusing on Postgrsql.ReplyDelete
SSIS Postgresql Write