SQL Server stores its data in its files, there are two types of these files.
- Data File -- Where the actual data is stored. (.mdf, .ndf file extension)
- Log File -- Where the transaction log is maintained. (.ldf file extension)
SQL Server gives you the flexibility to configure the storage location of these files, which is pretty obvious and its behavior which is not very clear. To understand this first let me explain you what a File Group is
A file group is a collection of data files that serves as a logical storage unit for database, so when you define the database objects that are physically stored such as Tables & Indexes you specify the File group you are using for the object, There can be multiple files groups associated with a database instance and each file group can have multiple files in it. So the big question is what do we achieve by doing so?
In my previous posts I have already explained that the storage is finite, small files perform better than large files, distributed information is faster to access than information consolidated in one single file. so making all this presumptions basis of our argument its fair to say that if a database has multiple data files, located on different physical drives it will perform better and will have a better capacity planning than a single data file database.
One thing to note here is the Log file does not belong the a file group and normally there is one log file per database. Log file as the name implies, stores the log of all the changes (inserts, updates, deletes) you made in your database, its used by the DBAs to track the changes and rollback if required. Log files are also utilized in certain backups.
How File Groups & Files works
Now as we have understood what File groups and files are, let us focus on how they work. When you define a File group there are two important properties you configure.
- Read-Only -- This specifies the data in objects built on this file group cannot change. Normally this is set later on and not at the time of database creation. It is used in the cases of Lookup tables or historical data table where you want to ensure that there is no modifications for such tables.
- Default -- There can be only one default File group. At the time of object creation if you do not specify any file group the default one will be utilized. The default file group also contains some meta data about the database which the other file groups does not have.
In the example above I have created two File groups, PRIMARY & SECONDARY. Now once the File groups are defined we will now create files under it.
As you can see there are four files here, 3 of them are data file with file type "Rows Data", while one is a log file. I have associated two files to the PRIMARY file group while the other to the SECONDARY file group. The four important configurations for files are
- File Group -- Which file group this file belongs to
- Initial Size -- When the database is created what should be the file size by default.
- Auto Growth -- How the file grows when the data in the file reaches the file size.
- Path -- Where the file will be stored.
Now once this is defined we are ready to create our tables and other objects. As you can see below that when we are creating an object we can select which file group this object will be created on.
SQL Server handles how the data will be distributed across the files in a file groups and it follows an intelligent round robin method by taking into account the file size and auto growth parameters.
Now everything is good here, but what if one of your table has or it could have massive amount of data and this results in performance issues or storage capacity problems. What we have learned that the objects are created on file groups now how can we divide the storage of an object across multiple file groups to resolve the performance issues. This is where Partitions are used. Partition divides a Table across multiple file groups using a systematic scheme.
Partitioning is achieved by three steps.
- Partition Function -- This is a function that defines how we will distribute the data into partitions.
- Partition Scheme -- Here we tell which file group will be used for each partition created by the partition funciton
- Table Partition -- Once we define the above two we create the table on the partition scheme and the rest is taken care by the SQL Server itself.
First we will define our Partition function.
Now based on this function we will create the partition scheme
And than assign this partition scheme to our table.
and finally tell the SQL Server that use the ProductId column for partitioning through Partition Column List.