A database simply cannot exist without storage. In fact, a database is a layer between the end user (or application) and the physically stored data, so how you set up your storage largely determines how good your database design can be. When designing database storage, or any storage for that matter, the following points must be considered:
- Performance -- How fast is my storage?
- Reliability -- Will I lose data if my storage fails?
- Scalability -- What if I need more storage space?
- Ease of Management -- Can I manage my storage?
- And everyone's favorite, "Cost" -- Will the cost of my storage fit my wallet?
Before going into enterprise storage solutions, let us first look at the difference between a single storage device and RAID.
Single Device vs RAID
What we use in our daily lives is a single device, such as a hard disk. It can be internal or external, but it is still one device. You can attach another hard disk to your computer, but that is just another single device: the two disks work independently of each other, neither knows the other exists, and the operating system shows them as separate devices. (You can of course partition a disk, but that is a different story.)
So what is wrong with single devices, and why do we need anything else? The answer lies in the criteria listed at the beginning of this article: performance, reliability, scalability, ease of management and cost. We can rule out ease of management and cost here, since single devices do best on both, so let's focus on the other three.
Performance - A single device sits behind a single controller and has a single set of physical resources, so its throughput is capped at what that one device can deliver. Write operations cannot be spread across several devices in parallel; requests queue up behind one another. As a result, a single device is not a high-performance option.
Reliability - A single device is a single point of failure. If it crashes, burns or is otherwise destroyed, you lose all your data with it, which is quite frankly very scary.
Scalability - A single device does not scale, yet your data keeps growing. When you need more space, you either replace the disk with a bigger one and go through the pain of transferring all your data, or attach another disk, which will exist as a separate device. Since only a few devices can be connected to one machine, sooner or later you will end up replacing a disk.
This is where RAID (Redundant Array of Independent Disks) comes to the rescue. RAID is a collection of two or more disks that are presented to the machine as one logical disk. RAID has different schemes (also called levels), which offer performance, reliability, or both. We will cover the most common schemes here; for the rest you can refer to our favorite site, Wikipedia. RAID also helps with scalability: when you need more space, many RAID implementations let you add another disk to the array. In fact, many RAID devices are hot-swappable, which makes it possible to add or remove a disk without disconnecting or switching off the array.
RAID 0: RAID 0 provides data striping, which means the data is split into chunks that are distributed across the disks in the array. This scheme improves performance because read and write operations are performed on all the disks simultaneously. The drawback is that it provides no redundancy: a single disk failure results in the failure of the whole array.
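To make striping a little more concrete, here is a minimal Python sketch. It models each disk as a plain list of chunks; the function names `stripe` and `reassemble` and the tiny stripe size are my own choices for illustration, not anything a real RAID controller exposes.

```python
def stripe(data: bytes, num_disks: int, stripe_size: int) -> list[list[bytes]]:
    """Split data into fixed-size chunks and deal them round-robin onto the disks."""
    disks: list[list[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), stripe_size):
        chunk_index = offset // stripe_size
        disks[chunk_index % num_disks].append(data[offset:offset + stripe_size])
    return disks


def reassemble(disks: list[list[bytes]]) -> bytes:
    """Read the chunks back in the same round-robin order they were written."""
    total_chunks = sum(len(disk) for disk in disks)
    return b"".join(
        disks[i % len(disks)][i // len(disks)] for i in range(total_chunks)
    )


disks = stripe(b"ABCDEFGHIJ", num_disks=3, stripe_size=2)
print(disks)              # [[b'AB', b'GH'], [b'CD', b'IJ'], [b'EF']]
print(reassemble(disks))  # b'ABCDEFGHIJ'
```

Notice that every disk holds a different part of the data and none of it is duplicated. That is why the chunks can be read in parallel, and also why losing any one of the lists above makes the original data unrecoverable.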
RAID 1: This scheme provides data mirroring, which means the same data is written to every disk in the array, providing redundancy. It offers no write performance gain (although reads can sometimes be served from either copy), but it provides fault tolerance: if one disk fails, the other disk still holds the same data and nothing is lost.
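Mirroring can be sketched in a few lines as well. The `Mirror` class below is a toy model that I made up for this article: each disk is a dictionary of block number to data, and a failed disk is represented by `None`.

```python
from typing import Optional


class Mirror:
    """Toy RAID 1 set: every disk holds a full copy of every block."""

    def __init__(self, num_disks: int = 2) -> None:
        # One dict per disk (block number -> data); None marks a failed disk.
        self.disks: list[Optional[dict[int, bytes]]] = [{} for _ in range(num_disks)]

    def write(self, block: int, data: bytes) -> None:
        # The same data is written to every disk that is still alive.
        for disk in self.disks:
            if disk is not None:
                disk[block] = data

    def fail(self, disk_index: int) -> None:
        # Simulate a disk crash.
        self.disks[disk_index] = None

    def read(self, block: int) -> bytes:
        # Any surviving disk can answer the read.
        for disk in self.disks:
            if disk is not None:
                return disk[block]
        raise IOError("all disks in the mirror have failed")


mirror = Mirror(num_disks=2)
mirror.write(0, b"customer table")
mirror.fail(0)            # the first disk dies
print(mirror.read(0))     # b'customer table'
```

The write loop is where the cost shows up: every write is done once per disk, and the usable capacity is that of a single disk no matter how many mirrors you add.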
RAID 1+0 and RAID 1E: These schemes combine the advantages of both models by providing data striping and mirroring. Data is distributed across multiple disks to gain performance, while a copy of each block is kept on another disk to achieve redundancy, so if a disk fails the remaining disks can still provide all the data. RAID 1+0 stripes across mirrored pairs and needs at least four disks, whereas RAID 1E works with three or more disks, including an odd number of them.
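The idea of "stripe the data, but keep a copy of each block on another disk" can be sketched as below. This is a deliberately simplified placement rule (the copy goes on the next disk), assumed for illustration only; the actual on-disk layouts of RAID 1E and RAID 1+0 differ in detail, and the function names are hypothetical.

```python
def place_blocks(num_blocks: int, num_disks: int) -> list[set[int]]:
    """Return, for each disk, the set of block numbers it holds (original or copy)."""
    if num_disks < 3:
        raise ValueError("this sketch assumes at least 3 disks")
    disks: list[set[int]] = [set() for _ in range(num_disks)]
    for block in range(num_blocks):
        primary = block % num_disks           # striping: blocks rotate over the disks
        mirror = (primary + 1) % num_disks    # mirroring: the copy lands on the next disk
        disks[primary].add(block)
        disks[mirror].add(block)
    return disks


def survives(disks: list[set[int]], failed: set[int], num_blocks: int) -> bool:
    """True if every block is still readable from some disk that has not failed."""
    available: set[int] = set()
    for index, blocks in enumerate(disks):
        if index not in failed:
            available |= blocks
    return available == set(range(num_blocks))


layout = place_blocks(num_blocks=6, num_disks=3)
print(layout)                           # [{0, 2, 3, 5}, {0, 1, 3, 4}, {1, 2, 4, 5}]
print(survives(layout, {1}, 6))         # True: one failed disk is tolerated
print(survives(layout, {0, 1}, 6))      # False: blocks 0 and 3 lived only on disks 0 and 1
```

Every block appears on exactly two disks, so any single disk can fail without data loss, but half of the raw capacity goes to copies.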
RAID 5: RAID 1E looks like a perfect solution, but there is always room for improvement. The problem with RAID 1E and RAID 1 is that a large portion of the disk space (half of it, when two copies of each block are kept) is dedicated to redundant data that we might never use, and disks are not that cheap. It would be much better if we could keep fault tolerance while reducing the space the redundant data takes up. This is where RAID 5 is the savior. RAID 5 gains performance by striping data across multiple disks, but instead of storing a redundant copy of each block it stores parity information. If a disk fails, the missing data is reconstructed by combining the parity information with the data on the remaining disks. RAID 3 and RAID 4 also use this parity model, but they keep the parity on a dedicated disk, whereas RAID 5 distributes the parity across all the disks.
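Parity is easier to believe once you see it work. The sketch below uses byte-wise XOR, which is the usual way RAID 5 parity is described; the helper names and the particular parity rotation in `parity_disk` are assumptions made for illustration, since real implementations offer several rotation layouts.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_disk(stripe_index: int, num_disks: int) -> int:
    """Pick the disk that holds parity for a stripe (one possible rotation)."""
    return (num_disks - 1 - stripe_index) % num_disks


def usable_fraction_raid5(num_disks: int) -> float:
    """One disk's worth of space per array goes to parity."""
    return (num_disks - 1) / num_disks


# One stripe on a 4-disk array: three data blocks plus one parity block.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# The disk holding b"BBBB" fails; rebuild it from the survivors and the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)                                    # b'BBBB'

print([parity_disk(s, 4) for s in range(5)])      # [3, 2, 1, 0, 3]
print(usable_fraction_raid5(4))                   # 0.75
```

With four disks, three quarters of the raw space holds data, compared with half for mirroring. The price is that a rebuild has to read every surviving disk, and only one disk failure can be tolerated at a time.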
Now that we understand the difference between single storage devices and RAID, let us look at how these storage devices are used in an enterprise.
DAS (Direct Attached Storage)
Direct Attached Storage is a storage device (single device) or a collection of storage devices (RAID) that is connected directly to the server. It can be internal or external, and is connected to the server using an interface such as SATA, SCSI, USB or FireWire. These storage devices do not work independently; they are managed by the machine they are connected to. They can be shared over the network, but every request passes through the server they are attached to.
NAS (Network Attached Storage)
NAS is a file server. It is similar to DAS, except that instead of being connected to a server machine it is connected directly to the network. It has its own IP address and host name, and the NAS device runs its own OS, which provides only file-system-level operations. Like DAS, NAS can be built on a single device or on RAID. NAS is used for centralized storage, such as backups.
SAN (Storage Area Network)
While NAS works fine in a network, it has its shortcomings. First, it communicates over Ethernet, which has traditionally offered lower transfer speeds than a dedicated storage network. In most cases this Ethernet is not dedicated to storage either, so other network traffic competes with it and reduces performance further. Second, NAS does not divide its storage among servers: it is one device with a lot of storage space, and every machine accessing it sees the same storage, unless you restrict access with permissions.
Now here is the beauty of SAN. A SAN is a network of storage devices, so every storage device you add to the SAN adds to its total storage space. The SAN has its own network, which runs over Fibre Channel instead of Ethernet and offers much faster data transfer. Only the servers that need this storage are connected to that network, using a special adapter called a Host Bus Adapter (HBA). SAN storage shows up on the servers as physical drives rather than network shares.
But the most prominent feature of a SAN is its storage allocation. Let us take an example. Say you have three servers connected to the SAN, and the total storage capacity of the SAN is 5 terabytes. If you want to give Server 1 and Server 2 around 1 terabyte each and the third server the rest, you configure this in the SAN manager, and each server sees drives of the required size. This division can be changed at any time without affecting the servers, which gives companies much more flexibility to use their storage intelligently.
SAN is almost the perfect technology, but because of its advanced technology the cost of implementing and maintaining a SAN is much higher than that of the other schemes.
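The allocation example above (three servers sharing 5 terabytes) can be modeled with a small sketch. `SanPool` is a hypothetical class written for this article; it does not correspond to the API of any real SAN manager, and it only tracks capacity numbers.

```python
class SanPool:
    """Hypothetical model of a SAN manager handing out capacity to servers."""

    def __init__(self, total_tb: float) -> None:
        self.total_tb = total_tb
        self.allocations: dict[str, float] = {}

    def free_tb(self) -> float:
        """Capacity that has not been handed out to any server yet."""
        return self.total_tb - sum(self.allocations.values())

    def allocate(self, server: str, size_tb: float) -> None:
        """Set (or change) the capacity presented to a server."""
        current = self.allocations.get(server, 0)
        if size_tb - current > self.free_tb():
            raise ValueError(f"not enough free capacity for {server}")
        self.allocations[server] = size_tb


pool = SanPool(total_tb=5)
pool.allocate("server1", 1)
pool.allocate("server2", 1)
pool.allocate("server3", pool.free_tb())   # the rest of the pool
print(pool.allocations)                    # {'server1': 1, 'server2': 1, 'server3': 3}

# Later the division changes: shrink server3, grow server1.
pool.allocate("server3", 2)
pool.allocate("server1", 2)
print(pool.allocations)                    # {'server1': 2, 'server2': 1, 'server3': 2}
```

The point of the sketch is that capacity belongs to the pool rather than to a particular server, so re-dividing it is a configuration change and not a matter of moving disks around.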