The ProblemSelectivity in a database is a truly important concept -- fundamental, to some degree. But it's also very abstract, weird, hard to understand, and hard to grasp. Of course, once you grasp it, you've got it forever. Database selectivity is like learning to ride a bicycle for the first time or learning to tie shoelaces or learning to walk even. So why is it that so many tutorials and blogs and websites completely skip over this or -- even worse -- give some arcane technically jargon-filled definition? This article is to help you understand database selectivity from many points of view and with many examples. This article is about one concept and one concept only.
|Look! It's a kitty cat! Meow!|
- The degree to which one value can be differentiated within a wider group of similar values.
This is fairly esoteric. I think this definition makes my head hurt a bit. First of all, what sort of degrees are we talking about? It's a polysemous word. Polysemous is just an adjective that means "a word with many meanings, some of which might be unrelated." Differentiated? Wider group? Without a frame of reference, this one is very confusing.
- In SQL, the term selectivity is used when discussing database indexes. The selectivity of a database index is a numeric value that is calculated using a specific formula. That formula actually uses the cardinality value to calculate the selectivity. This means that the selectivity is calculated using the cardinality – so the terms selectivity and cardinality are very much related to each other. Here is the formula used to calculated selectivity: Selectivity of index = cardinality/(number of rows) * 100%
I actually like this definition quite a bit because it's a mathematical formula. Unfortunately, it requires an understanding of cardinality prior to understanding selectivity. That makes it difficult if you don't have a grasp on cardinality, which is another fundamental concept in databases.
- Index selectivity is a number the determines the effectiveness of index. Best possible selectivity is when all table records have a different value for the columns in index (this is typical for primary keys and unique constraints).
This is another highly technical definition. You need to know all kinds of concepts before you can get anything out of this one. Not only that, but it's badly phrased and written in a non-standard dialect of English.
So here's my definition of database selectivity: In any given set of data, selectivity refers to the percentage of how many unique values exist compared to how many pieces of data exist.
Selectivity = number of unique (or distinct) values / total number of values
Attributes of Selectivity
- Is the child's name column highly or lowly selective and why?
- Is the pet's type column highly or lowly selective and why?
- Is the town's name column highly or lowly selective and why?
- Highly selective -- because the percentage is closer to 100% (percentage in these cases referring to the number of distinct values divided by the total numbers of values in a column).
- Somewhat selective -- because the percentage is closer to the middle or 50%.
- Lowly or not selective -- because the percentage is closer to 0%.
Example Using Circles
Example using a Database Query
SELECT COUNT( DISTINCT ColName ) AS DistinctValues, COUNT( ColName ) AS TotalNumberOfValues, COUNT( DISTINCT ColName ) / COUNT( ColName ) * 100.0 AS Selectivity FROM TableName;
Looking at the results, you can see that one has a very high selectivity (100%) and the other has very low selectivity (0%). Can you explain that? If you can, then you are probably getting the hang of selectivity.
Here's another question: which one of the results represents the primary key of the table and why?
Why Does it Matter?
The fact of the matter is that the higher the selectivity on an index (or in other words, the more unique values it has), the better that index will work for filtering data and joining tables. The reason that's true is the subject of another article altogether, so I will leave that for another day. In the meantime, try to understand selectivity and return to this article and the many more advanced articles on the interwebs to try to deeply ingrain this concept in your databasing soul!