SQL Server: Partition

Partitioning is the database process where very large tables are divided into multiple smaller parts. By splitting a large table into smaller, individual tables, queries that access only a fraction of the data can run faster because there is less data to scan.

Vertical Partitioning on SQL Server tables

Vertical table partitioning is mostly used to increase SQL Server performance especially in cases where a query retrieves all columns from a table that contains a number of very wide text or BLOB columns. In this case to reduce access times the BLOB columns can be split to its own table. Another example is to restrict access to sensitive data e.g. passwords, salary information etc. Vertical partitioning splits a table into two or more tables containing different columns:

Example:

As indicated, every page is read from the data cache, whether or not it was necessary to bring that page from disk into the cache for any given read. To reduce the cost of the query we will change the SQL Server database schema and split the Dim_Product table vertically.

Horizontal Partitioning on SQL Server tables

Horizontal partitioning divides a table into multiple tables that contain the same number of columns, but fewer rows. For example, if a table contains a large number of rows that represent monthly reports it could be partitioned horizontally into tables by years, with each table representing all monthly reports for a specific year.

Partitioning column is usually a datetime column but all data types that are valid for use as index columns can be used as a partitioning column, except a timestamp column. The ntext, text, image, xml, varchar(max), nvarchar(max), or varbinary(max), Microsoft .NET Framework common language runtime (CLR) user-defined type, and alias data type columns cannot be specified.

Filegroups: A filegroup is a logical storage unit. Every database has a primary filegroup that contains the primary data file (.mdf). An additional, user-defined, filegroups can be created to contain secondary files (.ndf). We will create 12 filegroups for every month:

Step 1

ALTER DATABASE youtube
ADD FILEGROUP January
GO
ALTER DATABASE youtube
ADD FILEGROUP February
GO
ALTER DATABASE youtube
ADD FILEGROUP March
GO
ALTER DATABASE youtube
ADD FILEGROUP April
GO
ALTER DATABASE youtube
ADD FILEGROUP May
GO
ALTER DATABASE youtube
ADD FILEGROUP June
GO
ALTER DATABASE youtube
ADD FILEGROUP July
GO
ALTER DATABASE youtube
ADD FILEGROUP Avgust
GO
ALTER DATABASE youtube
ADD FILEGROUP September
GO
ALTER DATABASE youtube
ADD FILEGROUP October
GO
ALTER DATABASE youtube
ADD FILEGROUP November
GO
ALTER DATABASE youtube
ADD FILEGROUP December

step 2

SELECT name AS AvailableFilegroups

FROM sys.filegroups

WHERE type = 'FG'

step 3

add .ndf file to every filegroup:
ALTER DATABASE [youtube]
ADD FILE
(
NAME = [PartJan],
FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL16.MSSQLSERVER\MSSQL\DATA\youtube.ndf',
SIZE = 3072 KB,
MAXSIZE = UNLIMITED,
FILEGROWTH = 1024 KB
) TO FILEGROUP [January]

ALTER DATABASE [youtube]
ADD FILE
(
NAME = [PartFeb],
FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL16.MSSQLSERVER\MSSQL\DATA\youtube1.ndf',
SIZE = 3072 KB,
MAXSIZE = UNLIMITED,
FILEGROWTH = 1024 KB
) TO FILEGROUP [February]

Repeat same for all group

SELECT
name as [FileName],
physical_name as [FilePath]
FROM sys.database_files
where type_desc = 'ROWS'

Partition function: A partition function is a function that maps the rows of a partitioned table into partitions based on the values of a partitioning column.

CREATE PARTITION FUNCTION [PartitioningByMonth] (datetime)
AS RANGE RIGHT FOR VALUES ('20140201', '20140301', '20140401',
'20140501', '20140601', '20140701', '20140801',

'20140901', '20141001', '20141101', '20141201');

Create Partition Schema:

CREATE PARTITION SCHEME PartitionBymonth
AS PARTITION PartitioningBymonth
TO (January, February, March,
April, May, June, July,
Avgust, September, October,

November, December);

Lets Create Table using PartitionbyMonth Schema:

CREATE TABLE Reports
(ReportDate datetime PRIMARY KEY,
MonthlyReport varchar(max))

ON PartitionBymonth (ReportDate);

INSERT INTO Reports (ReportDate,MonthlyReport)
SELECT '20140105', 'ReportJanuary' UNION ALL

SELECT '20140205', 'ReportFebryary'

Lets verify the row in different partition:

SELECT 

p.partition_number AS PartitionNumber,

f.name AS PartitionFilegroup, 

p.rows AS NumberOfRows 

FROM sys.partitions p

JOIN sys.destination_data_spaces dds ON p.partition_number = dds.destination_id

JOIN sys.filegroups f ON dds.data_space_id = f.data_space_id

WHERE OBJECT_NAME(OBJECT_ID) = 'Reports'