Search Wiki:

How to Create and Use A Numbers Table


Jonathan Kehayias, April 15, 2008

A Numbers table is simply a table that holds a single integer column with each
positive integer number in the range from the number 1 to the maximum value
of an integer 2,147,483,647. (Note: When creating the table off the exact
code provided in this example below, your Numbers table will be
roughly 8GB in size. This is overkill in most cases and you should
size the number of rows being create according to your own needs.)
Creating a Numbers table is really simple to do and it
can be used to solve a broad number of problems.

Some Examples of how to use a Numbers Table are:
  1. Identify Missing Dates
  2. Get All Dates Between Two Dates
  3. Identify Missing Numeric Values
  4. Get All Numbers Between Two Values

These solutions apply to all versions of SQL Server.


Create Number Table
-- Suppress data loading messages
SET NOCOUNT ON
 
-- Create Sample Data using a Table Varable
SELECT TOP 2147483647 IDENTITY(INT,0,1) AS N
INTO Numbers
FROM sysobjects a, sysobjects b, sysobjects c, sysobjects d, sysobjects e
 
-- Create a Primary Key and Clustered Index to control data order
ALTER TABLE dbo.Numbers ADD CONSTRAINT
PK_Numbers PRIMARY KEY CLUSTERED (N) 
 
Return to Top


Identify Missing Dates In a Table
-- Suppress data loading messages
SET NOCOUNT ON
 
-- Create a table variable for the test data
DECLARE @TestTable TABLE
(RowID int identity primary key,
 DateField datetime)
 
-- Add some dates to the table
INSERT INTO @TestTable VALUES ('01/01/2007')
INSERT INTO @TestTable VALUES ('01/05/2007')
INSERT INTO @TestTable VALUES ('01/07/2007')
INSERT INTO @TestTable VALUES ('01/08/2007')
INSERT INTO @TestTable VALUES ('01/09/2007')
INSERT INTO @TestTable VALUES ('01/10/2007')
INSERT INTO @TestTable VALUES ('01/12/2007')
INSERT INTO @TestTable VALUES ('01/14/2007')
 
-- Find the dates not in the table an in the first 14 days of January
SELECT DATEADD(dd, N, '01/01/2007') as MissingDates
FROM Numbers
WHERE n < 14
  AND DATEADD(dd, N, '01/01/2007') NOT IN (SELECT DateField FROM @TestTable)
 
 
/* Results
MissingDates
-----------------------
2007-01-02 00:00:00.000
2007-01-03 00:00:00.000
2007-01-04 00:00:00.000
2007-01-06 00:00:00.000
2007-01-11 00:00:00.000
2007-01-13 00:00:00.000
*/
Return to Top



Get All Dates Between Two Dates
-- Suppress data loading messages
SET NOCOUNT ON
 
-- Set Date Range Values
DECLARE @StartDate datetime
DECLARE @EndDate datetime
 
SET @StartDate = '01/01/2007'
SET @EndDate = '01/07/2007'
 
-- Find the dates between these two dates
SELECT DATEADD(dd, N, @StartDate) as DatesBetween
FROM Numbers
WHERE n < (DATEDIFF(dd, @StartDate, @EndDate)+1)
 
 
/* Results
DatesBetween
-----------------------
2007-01-01 00:00:00.000
2007-01-02 00:00:00.000
2007-01-03 00:00:00.000
2007-01-04 00:00:00.000
2007-01-05 00:00:00.000
2007-01-06 00:00:00.000
2007-01-07 00:00:00.000
*/
Return to Top



Identify Missing Numeric Values
-- Suppress data loading messages
SET NOCOUNT ON
 
-- Create a table variable for the test data
DECLARE @TestTable TABLE
(RowID int identity primary key,
 DateField datetime)
 
-- Add some dates to the table
INSERT INTO @TestTable VALUES ('01/01/2007')
INSERT INTO @TestTable VALUES ('01/05/2007')
INSERT INTO @TestTable VALUES ('01/07/2007')
INSERT INTO @TestTable VALUES ('01/08/2007')
INSERT INTO @TestTable VALUES ('01/09/2007')
INSERT INTO @TestTable VALUES ('01/10/2007')
INSERT INTO @TestTable VALUES ('01/12/2007')
INSERT INTO @TestTable VALUES ('01/14/2007')
 
-- Delete Odd Numbered Rows From Table
DELETE @TestTable WHERE RowID IN (1,3,5,7)
 
-- Find the missing row Numbers
SELECT N as MissingRows
FROM Numbers
WHERE N > 0 -- RowID's are greater than 0
  AND N <= (SELECT MAX(RowID) FROM @TestTable) -- Constrain to valid RowID's
  AND N NOT IN (SELECT RowID FROM @TestTable) -- RowID not in the table
 
 
/* Results
MissingRows
-----------
1
3
5
7
*/
 
Return to Top



Get All Numbers Between Two Values
-- Suppress data loading messages
SET NOCOUNT ON
 
-- Set Number Range Values
DECLARE @Start int
DECLARE @End int
 
SET @Start = 14
SET @End = 22
 
-- Find the numbers between these two numbers
SELECT N as NumbersBetween
FROM Numbers
WHERE N BETWEEN @Start AND @End
 
 
/* Results
NumbersBetween
--------------
14
15
16
17
18
19
20
21
22
*/
Return to Top

___________________________________________________________________________________________________________________
Page Created By: Jonathan Kehayias, Apr 15, 2008
Last edited Jul 2 2008 at 9:18 PM  by JonathanKehayias, version 9
Comments
Erik wrote  Apr 15 2008 at 7:41 PM  
The following is convenient, and easy to understand, but is not best practice and can be very costly.

AND DATEADD(dd, N, '01/01/2007') NOT IN (SELECT DateField FROM @TestTable)

IN is logically expanded to an OR clause. The query engine is probably smart enough to convert this to a join. But the trick is that you can't rely on it to do that. The more complicated your query gets, the less likely the optimizer will make the expected conversion. If you have a table with MANY rows, and the optimizier leaves it in as an OR statement, you suddenly have Bad Performance. Don't use shortcuts based on what you *think* the optimizer will do.

Best practice is to use a join.

JonathanKehayias wrote  Apr 24 2008 at 10:28 PM  
Erik,

You are absolutely correct that a JOIN is a better option overall for this and I will be updating this sample to reflect this. The most optimal code for this is as follows:

SELECT DATEADD(dd, N, '01/01/2007') as MissingDates
FROM Numbers
JOIN @TestTable ON DATEADD(dd, N, '01/01/2007') = convert(datetime, convert(varchar(10), DateField, 101))
WHERE n < 14

In testing this against a table with just under 2.5 million rows, and I had no perceivable speed gains by changing this query out to using a join. However with Statisitics IO and Statistics Time turned on I did encounter about 1/2 of the logical reads, and about 1 second less cpu and execution time.

USING NOT IN:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Numbers'. Scan count 3, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Contacts'. Scan count 5, logical reads 86821, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 2329 ms, elapsed time = 1874 ms.


USING A JOIN:
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 1 ms.
Table 'Numbers'. Scan count 3, logical reads 7, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Contacts'. Scan count 3, logical reads 40656, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 1749 ms, elapsed time = 1066 ms.


Thank you for pointing this out, though for simplicity of code sake, I would still probably use the NOT IN for this minute of a difference on just under 2.5 million rows of data.

pbnec wrote  Dec 20 2008 at 4:36 AM  
I think creating a numbers table is inefficient for finding dates... I would rather create a dates table for finding dates...
See also the article "Simplify SQL Server 2005 queries with a Dates table" in~
http://blogs.techrepublic.com.com/datacenter/?p=326

Jeff_Moden wrote  Dec 29 2008 at 3:19 AM  
I'm betting no one has actually used this to create a billion rows... if you did, you'd find a greatly engorged log file even if it's in the SIMPLE recovery mode. It's quite safe to use for up to 121 million rows if the cross join is on Master.dbo.SysColumns (which works in 2k and 2k5 equally as well) because you only need 1 self join, but above that, you really need to watch the log file of whatever database you're creating the table in.

There is a way to create larger files using Itzik's numbers generator code and it doesn't cause the same log file bloat.

unglesb wrote  Dec 26 2012 at 2:31 PM  
With the JOIN, you will get the records that do exist, not those that don't exist. If you really want to use a join, you could do a LEFT OUTER JOIN and select the records where the Dates table is null.
SELECT DATEADD(dd, N, @min_date) [missing_dates]
FROM NUMBERS_TABLE
LEFT OUTER JOIN #DATE_TABLE DT ON
DATEADD(dd, N, @min_date) = dates
WHERE N < @diff
AND dates IS NULL

Updating...
Page view tracker