So far, the concepts I have discussed deal with improving the reliability of the data. I have mentioned striping in the earlier sections but have not fully explained what it is. Striping improves the performance of the array by distributing the data across all the drives. The main principle behind striping is parallelism. Imagine you have a large file on a single hard drive. To read the file, you have to wait for that one drive to read it from beginning to end. Now, if you break the file into multiple pieces and distribute them across multiple hard drives, all of those drives can read their part of the file at the same time. Because the drives work in parallel, you only have to wait about as long as it takes to read a single piece. The same is true when writing a large file. Transfer performance is greatly increased, and in the ideal case it grows with the number of drives: the more drives the data is spread across, the more pieces can be transferred at once. The number of drives is also known as the stripe width, that is, the number of pieces of data that can be transferred simultaneously. How does this actually work, though?
Every piece of data that comes into the RAID controller is divided into smaller pieces. There are two levels of striping that use different techniques to divide the data: byte-level and block-level striping. Byte-level striping breaks the data up into bytes and stores them in turn across the hard drives. For example, if the data consists of 16 bytes and there are 4 hard drives, the first byte is stored on the first drive, the second byte on the second drive, and so on. The fifth byte goes back to the first drive, and the cycle continues. Sometimes byte-level striping is done in units of 512 bytes at a time. Block-level striping breaks the data up into blocks of a given size. These blocks are then distributed across the array in the same round-robin way as in byte-level striping. The size of these blocks is called the stripe size. A variety of stripe sizes is usually available, depending on the RAID implementation used.
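To make the round-robin placement concrete, here is a minimal sketch that models each drive as a byte string. The function names `stripe` and `locate` are hypothetical, chosen for illustration; a real controller works on disk sectors, not Python objects. Setting `unit=1` gives byte-level striping as in the 16-byte, 4-drive example above, and a larger `unit` gives block-level striping, with `unit` playing the role of the stripe size.

```python
def stripe(data: bytes, num_drives: int, unit: int = 1) -> list[bytes]:
    """Distribute data round-robin across num_drives in pieces of `unit` bytes.

    unit=1 models byte-level striping; a larger unit models block-level
    striping, where `unit` acts as the stripe size.
    """
    drives = [bytearray() for _ in range(num_drives)]
    for start in range(0, len(data), unit):
        piece_index = start // unit
        # Piece 0 goes to drive 0, piece 1 to drive 1, ... then wrap around.
        drives[piece_index % num_drives].extend(data[start:start + unit])
    return [bytes(d) for d in drives]


def locate(offset: int, num_drives: int, unit: int = 1) -> tuple[int, int]:
    """Return (drive number, byte offset on that drive) for a logical offset."""
    piece_index = offset // unit
    drive = piece_index % num_drives
    # Number of complete passes over all drives before this piece.
    row = piece_index // num_drives
    return drive, row * unit + offset % unit


if __name__ == "__main__":
    data = b"ABCDEFGHIJKLMNOP"        # 16 bytes spread over 4 drives
    print(stripe(data, 4))            # byte level: [b'AEIM', b'BFJN', b'CGKO', b'DHLP']
    print(stripe(data, 4, unit=2))    # 2-byte blocks: [b'ABIJ', b'CDKL', b'EFMN', b'GHOP']
```

With byte-level striping the fifth byte (`E`) lands back on the first drive, exactly as described. `locate` is the inverse view: it tells you where a given byte of the original data ended up, which is the calculation a controller has to make on every read.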
The stripe size is a much-debated topic. There is no ideal stripe size, but certain sizes work best with certain applications. The general effects of increasing or decreasing the stripe size are well understood, though. A small stripe size breaks files into more pieces and spreads them across more of the drives, and transfer performance increases because of the greater parallelism. However, it also leaves the pieces of each file more scattered, so the drives have to reposition more often to collect them. As you have probably guessed, a large stripe size has the opposite effect: the data is less spread out, so transfer performance decreases, but the pieces are also less scattered. The best way to find the right stripe size for your particular application is to experiment. Start with a medium stripe size, then try decreasing or increasing it and record the difference in overall performance.
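The trade-off can be seen by counting how a single request is carved up. The sketch below, using a hypothetical helper `request_layout`, reports how many stripe-size pieces a request is split into and how many distinct drives must serve it. It assumes round-robin block placement and a request length greater than zero, and it only counts pieces; it does not model seek or rotational delays.

```python
def request_layout(offset: int, length: int, stripe_size: int,
                   num_drives: int) -> tuple[int, int]:
    """Return (pieces, drives) for a request of `length` bytes at `offset`.

    pieces: how many stripe-size blocks the request is split into
    drives: how many distinct drives must take part in serving it
    Assumes length > 0 and round-robin placement of blocks.
    """
    first = offset // stripe_size
    last = (offset + length - 1) // stripe_size
    pieces = last - first + 1
    drives = len({block % num_drives for block in range(first, last + 1)})
    return pieces, drives


if __name__ == "__main__":
    KiB = 1024
    # A 64 KiB read from the start of the array on 4 drives.
    for size in (4, 16, 64, 128):
        pieces, drives = request_layout(0, 64 * KiB, size * KiB, num_drives=4)
        print(f"stripe size {size:>3} KiB: {pieces:>2} piece(s) on {drives} drive(s)")
```

With a 4 KiB stripe size the 64 KiB read is split into 16 pieces spread over all 4 drives (more parallelism, more fragments), while a 64 KiB or larger stripe size keeps it in one piece on one drive. A request that straddles a block boundary, such as 8 KiB starting at 60 KiB with a 64 KiB stripe size, still touches two drives, which is one reason experimenting with real workloads beats reasoning from averages.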
The above diagram is another simplified model of how striping works. The incoming data file is broken up into 6 blocks (A, B, C, D, E, F) and distributed across the two drives. If there were more hard drives, the blocks would be distributed across those as well. Now, when you move or transfer the file somewhere, the controller accesses both drives simultaneously, which is where the performance gain kicks in: ideally, the transfer takes only half the time. In general, with N drives the file can ideally be transferred in 1/N of the time it takes from a single hard drive. The next section covers the various levels of RAID, where you will see how these concepts fit together.
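The 1/N figure can be checked with a little arithmetic. In this idealized sketch (the function name `striped_transfer_time` is hypothetical), every drive moves one block in `block_time`, all drives work fully in parallel, and controller and bus overhead are ignored, so the transfer finishes when the busiest drive is done.

```python
import math


def striped_transfer_time(num_blocks: int, num_drives: int,
                          block_time: float) -> float:
    """Idealized time to transfer a file striped round-robin across drives.

    Assumes each drive transfers one block in block_time, all drives run
    fully in parallel, and controller/bus overhead is ignored. The transfer
    is finished when the drive holding the most blocks is finished.
    """
    blocks_on_busiest_drive = math.ceil(num_blocks / num_drives)
    return blocks_on_busiest_drive * block_time


if __name__ == "__main__":
    # The 6-block file (A-F) from the diagram, 1 time unit per block.
    for n in (1, 2, 3, 4, 6):
        print(n, striped_transfer_time(6, n, 1.0))
```

This prints 6.0 for one drive and 3.0 for two, the halving shown in the diagram. With 4 drives it prints 2.0 rather than 1.5, because two of the drives each hold two of the six blocks; the 1/N speed-up is exact only when the blocks divide evenly among the drives, and it is approached closely for large files made of many blocks.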
>> Levels Of RAID