What Is A Hard Disk Computer Science Essay


This essay has been submitted by a student. This is not an example of the work written by our professional essay writers.

Today storage devices come in many different forms, types and sizes. For many decades, however, storage media have been dominated by magnetic technology. Magnetic storage devices have themselves evolved over the years: from the first magnetic tapes to today's hard drives with 6 Gbit/s SATA interfaces.

What is a hard disk?

Basically, a hard disk is a sealed container housing a stack of double-sided aluminium disks called platters. Each platter has one electromagnetic read/write head for its top surface and another for its bottom surface. As the platters spin, the heads move in towards the centre and out towards the edge, so together they can cover the entire surface of each platter.

Figure1: Parts of a Hard Disk

Source: http://www.ntfs.com/hard-disk-basics.htm

How is data stored on a hard disk?

In any magnetic storage device, data is stored as tiny regions of a magnetic surface magnetised in one direction or the other, representing the binary values 1 and 0. A hard disk needs a further level of organisation to allow random access, so each platter is divided into tracks and sectors. Tracks are thin concentric rings on the surface of the platter, and each track is divided into multiple sectors. "A sector is the smallest physical storage unit on a disk, and is almost always 512 bytes (0.5kB) in size."[1] Newer "Advanced Format" drives use 4096-byte physical sectors.

Figure2: Sectors and Clusters

Source: http://www.ntfs.com/hard-disk-basics.htm

Modern hard disks use a "translation factor to make their actual hardware layout appear continuous, as this is the way that operating systems from Windows 95 onward like to work."[1]

When a disk is formatted, the tracks are numbered starting from 0 (usually at the outer edge of the platter), with the numbers increasing towards the centre. Under the older cylinder-head-sector addressing used by the PC BIOS, the highest track number was 1023, giving at most 1024 cylinders. "To the operating system the tracks are logical rather than physical in structure, and are established when the disk is low-level formatted."[1]

Hard Disk Layouts

Figure3: Disk Layout Methods

Source: Slides of Computer Organization and Architecture by William Stallings

While data is being read or written, the platters rotate at a constant speed. Common desktop drives spin at 7,200 rpm, and some high-performance drives spin faster still. Because the rotational speed is constant, a point near the centre of the platter passes under the head more slowly than a point near the outer edge. One solution is to vary the data density from track to track so that every track holds the same number of bits, allowing the heads to read and write at the same rate everywhere; this is known as constant angular velocity (CAV). The inner tracks then have a higher data density than the outer tracks, as shown in Figure 3(a). This, however, wastes storage, since the longer outer tracks store no more data than the inner ones. To increase capacity, modern hard disks use a technique called multiple zone recording: the surface is divided into a number of concentric zones (16 is typical). Within a zone the number of bits per track is constant, but zones farther from the centre contain more bits (more sectors) than zones closer to it. At the expense of slightly more complex circuitry, a greater storage capacity is achieved, which is what matters most given today's storage demands. In Figure 3(b), each zone is only a single track wide. [2]
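The capacity argument above can be made concrete with a small calculation. This is a minimal sketch under hypothetical numbers (track radii, track count, and a maximum linear density expressed in sectors per millimetre of track are all assumptions, not figures from a real drive): under CAV every track holds only what fits on the innermost track, while under multiple zone recording each zone is limited only by its own innermost track.

```python
import math

def track_radii(r_inner_mm, r_outer_mm, n_tracks):
    """Radii of n_tracks equally spaced tracks, innermost track first."""
    step = (r_outer_mm - r_inner_mm) / (n_tracks - 1)
    return [r_inner_mm + i * step for i in range(n_tracks)]

def sectors_on_track(radius_mm, sectors_per_mm):
    """Whole sectors that fit on a track of this radius at the maximum linear density."""
    return math.floor(2 * math.pi * radius_mm * sectors_per_mm)

def cav_capacity(radii, sectors_per_mm):
    """CAV: every track stores only as many sectors as fit on the innermost track."""
    return sectors_on_track(radii[0], sectors_per_mm) * len(radii)

def zoned_capacity(radii, sectors_per_mm, n_zones):
    """Multiple zone recording: within each zone, every track stores as many
    sectors as fit on that zone's innermost track."""
    zone_size = math.ceil(len(radii) / n_zones)
    total = 0
    for start in range(0, len(radii), zone_size):
        zone = radii[start:start + zone_size]
        total += sectors_on_track(zone[0], sectors_per_mm) * len(zone)
    return total

# Hypothetical geometry: 1000 tracks between 20 mm and 45 mm, 2 sectors per mm.
radii = track_radii(20.0, 45.0, 1000)
print("CAV:", cav_capacity(radii, 2.0))
print("16 zones:", zoned_capacity(radii, 2.0, 16))
print("one zone per track:", zoned_capacity(radii, 2.0, len(radii)))
```

Setting `n_zones` to 1 reproduces the CAV case, and giving every track its own zone corresponds to the single-track zones of Figure 3(b).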

To store more data, a disk must either be made larger or fit more tracks and sectors into the same area, that is, achieve a higher data density. A higher density requires heads that can read and write these narrower tracks: the narrower the head, the narrower the tracks can be. Unfortunately, the narrower the head, the closer it must be to the platter surface, and the closer it is, the greater the risk of errors caused by impurities or imperfections. This is where the Winchester disk design came in. Its head is an aerodynamic foil that rests lightly on the surface of the platter, inside a sealed drive assembly, when the disk is at rest. The air pressure generated by the spinning disk is enough to lift the foil just above the surface, still close enough to read and write data.

Some space must also be reserved for hardware track-positioning information, which is not available to the operating system. For example, a drive with 2 platters might have only 3 of its 4 surfaces available for data. This track-positioning data is written during factory assembly. [1]

There must also be a way to locate each sector within a track, that is, to identify where each sector starts and ends. This is handled by control data recorded on the disk itself, written to the area immediately before the contents of each sector. Again this data is not accessible to the user, but unlike the track-positioning data it is created during formatting. [2], [1]

Figure4: Tracks and Cylinders

Source: Slides of Computer Organization and Architecture by William Stallings

Disks are read in different ways depending on their design. In fixed-head disks (now very rare), there is one read/write head per track, and all the heads are mounted on a rigid arm that spans all the tracks. In movable-head disks, there is one head per surface, mounted on an arm that can be extended or retracted so that the head can be positioned over any track. [2] Most of today's hard disks use movable heads and have 2 or more double-sided platters, with a read/write head for each surface. All the arms are attached to a single actuator and move together, so at any moment every head is over the track at the same distance from the centre on its own surface. The set of all tracks in the same relative position on the different surfaces is called a cylinder. [2]
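Since a cylinder contains one track per head, a drive described by its cylinder/head/sector geometry has a simple capacity formula. The sketch below assumes every track holds the same number of sectors (it ignores zoned recording), and the 1024 × 255 × 63 geometry in the example is the classic PC BIOS addressing limit, used here only for illustration.

```python
def chs_capacity_bytes(cylinders, heads, sectors_per_track, bytes_per_sector=512):
    """Capacity of a drive described by cylinder/head/sector geometry.

    Each cylinder contains one track per head (i.e. per surface), and in this
    simplified model every track holds the same number of sectors.
    """
    tracks = cylinders * heads
    return tracks * sectors_per_track * bytes_per_sector

# Classic PC BIOS limits: 1024 cylinders, 255 heads, 63 sectors per track.
limit = chs_capacity_bytes(1024, 255, 63)
print(limit, "bytes, about", round(limit / 10**9, 1), "GB")
```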

When a file is stored on a hard disk, the best arrangement is to store it contiguously, one part after the other. It is the operating system's job to allocate enough space for the file. Space is allocated in clusters: a cluster is a group of one or more consecutive sectors, and it is the smallest unit of space a file can occupy. Once a cluster has been allocated to a file it is marked as used, which protects its data from being overwritten. "A cluster could consist of 1 sector (2^0), or, more frequently, 8 sectors (2^3)". [1] The number of sectors per cluster must be a power of 2, so a cluster can have 8 or 16 sectors but not 10. A problem arises when not enough contiguous clusters are available: the remaining clusters must then be written to some other free part of the disk, preferably on the same cylinder, otherwise on a different one. A file stored this way is said to be fragmented. Fragmentation slows down retrieval, because the head must travel to several different locations to read the whole file. Larger clusters reduce this problem, since a file might then need 1 cluster instead of 2. However, a file that fills only half a cluster still occupies the whole cluster, so larger clusters waste more space.
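The trade-off between cluster size, wasted space and fragmentation can be shown with a few lines of arithmetic. This is a minimal sketch assuming 512-byte sectors; the file size and cluster numbers in the example are hypothetical.

```python
def clusters_needed(file_size, sectors_per_cluster, bytes_per_sector=512):
    """Number of whole clusters a file occupies; a partly filled cluster still counts."""
    if sectors_per_cluster < 1 or sectors_per_cluster & (sectors_per_cluster - 1):
        raise ValueError("sectors per cluster must be a power of 2")
    cluster_bytes = sectors_per_cluster * bytes_per_sector
    return -(-file_size // cluster_bytes)  # ceiling division on integers

def slack_bytes(file_size, sectors_per_cluster, bytes_per_sector=512):
    """Bytes allocated to the file but never used (wasted space in its last cluster)."""
    cluster_bytes = sectors_per_cluster * bytes_per_sector
    allocated = clusters_needed(file_size, sectors_per_cluster, bytes_per_sector) * cluster_bytes
    return allocated - file_size

def fragment_count(cluster_numbers):
    """Number of separate runs of consecutive clusters in a file's allocation list."""
    if not cluster_numbers:
        return 0
    runs = 1
    for previous, current in zip(cluster_numbers, cluster_numbers[1:]):
        if current != previous + 1:
            runs += 1
    return runs

print(clusters_needed(6000, 8), slack_bytes(6000, 8))  # 4 KiB clusters
print(clusters_needed(6000, 1), slack_bytes(6000, 1))  # 512-byte clusters
print(fragment_count([100, 101, 102, 350, 351]))       # a contiguous run, then a jump
```

A 6,000-byte file fits in 2 clusters of 8 sectors but wastes more than 2 KB of the second one; with 1-sector clusters it needs 12 clusters but wastes only 144 bytes, at the cost of more clusters that may end up scattered.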

Logical Hard Disk Organisation - Partitioning and RAID

Partitions and Master Boot Record

Partitioning divides a single hard disk drive into multiple logical storage units called partitions, so that one drive behaves like several. This has several benefits. First, it is very helpful to keep the operating system on a partition of its own, which makes backups and clones easy. Multiple partitions also allow multi-boot setups, in which more than one operating system is installed. Finally, on NTFS a single large volume needs a larger Master File Table, which may take longer to traverse than the smaller MFTs of several smaller partitions, each with its own. [3]

Probably the most important data structure on the disk is the Master Boot Record (MBR), which is created when the first partition is created. The MBR is always the first sector on the disk: track (cylinder) 0, side (head) 0, sector 1. "It contains the Partition Table for the disk and a small amount of executable code." [4] This code examines the Partition Table, identifies the system partition, finds its starting location on the disk and loads a copy of its Partition Boot Sector into memory. Execution is then transferred to the executable code in the Partition Boot Sector. [4]
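The MBR's position (cylinder 0, head 0, sector 1) corresponds to logical block 0 under the conventional cylinder-head-sector to logical-block-address conversion, which is the kind of translation modern drives perform behind the scenes. This is a sketch assuming an example geometry of 255 heads and 63 sectors per track; real drives report whatever geometry they choose.

```python
def chs_to_lba(cylinder, head, sector, heads_per_cylinder, sectors_per_track):
    """Convert a cylinder/head/sector address to a logical block address.

    Cylinders and heads are numbered from 0, sectors from 1.
    """
    if not 0 <= head < heads_per_cylinder or not 1 <= sector <= sectors_per_track:
        raise ValueError("address outside the given geometry")
    return (cylinder * heads_per_cylinder + head) * sectors_per_track + (sector - 1)

def lba_to_chs(lba, heads_per_cylinder, sectors_per_track):
    """Inverse of chs_to_lba for the same geometry."""
    cylinder, remainder = divmod(lba, heads_per_cylinder * sectors_per_track)
    head, sector_index = divmod(remainder, sectors_per_track)
    return cylinder, head, sector_index + 1

HEADS, SECTORS = 255, 63                    # assumed example geometry
print(chs_to_lba(0, 0, 1, HEADS, SECTORS))  # location of the MBR
print(lba_to_chs(16065, HEADS, SECTORS))    # first sector of cylinder 1
```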

A weakness of the MBR is that its code runs before anything else, including the operating system. If the MBR is infected by a virus, the infection is already active before the operating system loads, so the operating system can neither detect it nor recover from a corrupted MBR by itself. Fortunately, some utilities allow the user to save a copy of the MBR and restore it if necessary. [5]

The Partition Table mentioned above holds information about the primary partitions and, optionally, an extended partition. Its layout is independent of the operating system and always the same: it has 4 entries, each 16 bytes long. [6]
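Reading the four 16-byte entries is a matter of fixed offsets. The sketch below relies on the classic MBR layout as it is commonly documented (table at byte offset 446, boot signature 0x55 0xAA in the last two bytes, and within each entry a status byte, a partition type byte, and little-endian starting LBA and sector count; the CHS fields are skipped). These offsets are not given in the essay's sources, and the example MBR is synthetic.

```python
import struct

PARTITION_TABLE_OFFSET = 446   # 0x1BE: four 16-byte entries follow
ENTRY_SIZE = 16
ENTRY_FORMAT = "<B3xB3xII"     # status, (skip CHS start), type, (skip CHS end), first LBA, sectors
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of the sector

def parse_partition_table(mbr):
    """Return the used entries of a classic MBR partition table."""
    if len(mbr) != 512:
        raise ValueError("an MBR is exactly one 512-byte sector")
    if bytes(mbr[510:512]) != BOOT_SIGNATURE:
        raise ValueError("missing 0x55AA boot signature")
    entries = []
    for index in range(4):
        start = PARTITION_TABLE_OFFSET + index * ENTRY_SIZE
        status, ptype, first_lba, sectors = struct.unpack(
            ENTRY_FORMAT, mbr[start:start + ENTRY_SIZE])
        if ptype == 0:  # type 0 marks an unused entry
            continue
        entries.append({"index": index, "bootable": status == 0x80,
                        "type": ptype, "first_lba": first_lba, "sectors": sectors})
    return entries

# Synthetic MBR with one bootable partition of type 0x07 (commonly NTFS).
sector = bytearray(512)
sector[510:512] = BOOT_SIGNATURE
struct.pack_into(ENTRY_FORMAT, sector, PARTITION_TABLE_OFFSET, 0x80, 0x07, 2048, 1_000_000)
print(parse_partition_table(bytes(sector)))
```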


RAID stands for Redundant Array of Independent Disks. In a sense it is the opposite of partitioning: in RAID, multiple disks are treated as one, and the different RAID techniques give different results and design architectures. There are 7 standard RAID levels, numbered 0 to 6. These levels share some common characteristics:

The array is viewed by the operating system as a single logical drive.

Data is distributed across the physical drives of the array.

Redundant disk capacity is used to store parity information, which allows data to be recovered if a disk fails. [2]

Although these characteristics are common to RAID schemes, the details of the second and third differ between levels, and RAID 0 and RAID 1 do not support the third at all. The key contribution of the RAID proposal was to address the need for redundancy. Using multiple read/write heads and actuators at the same time gives higher I/O and transfer rates, but using more devices also increases the probability of a failure. To compensate, RAID stores parity information that allows data lost through a disk failure to be recovered. [2] By far the most popular RAID configurations are RAID 0 and RAID 1. [7] RAID 0 is popular in gaming PCs, where speed matters most, while RAID 1 is more common on servers, where data redundancy is a must for storage and backups.


RAID 0 is not usually considered a true member of the RAID family, since it provides no redundancy to protect data in case of failure. Instead it is optimised for speed: data is striped across all the disks, so a file is spread over several disks rather than stored on one. This means more read/write heads working at once and more accesses carried out simultaneously, which increases read and write speeds.

The RAID controller (a RAID card or the array management software) divides the logical disk into strips, which may be physical blocks, sectors or some other unit, and these strips are:

"mapped round robin to consecutive array members. A set of logically consecutive strips that maps exactly one strip to each array member is referred to as a Stripe" [2].

This mapping is shown in Figure5. As stated before, the advantage is that if a single I/O request covers multiple logical strips, those strips can be handled in parallel, reducing the I/O transfer time. [2]
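The round-robin mapping quoted above can be sketched in a few lines. This minimal sketch assumes strips are numbered from 0 and disks from 0; it shows which disk and which stripe each logical strip lands on, and which disks a multi-strip request keeps busy at the same time.

```python
def raid0_locate(logical_strip, n_disks):
    """Map a logical strip to (disk, stripe): strips are dealt round robin across disks."""
    return logical_strip % n_disks, logical_strip // n_disks

def disks_for_request(first_strip, n_strips, n_disks):
    """Disks that take part in a request covering consecutive logical strips."""
    strips = range(first_strip, first_strip + n_strips)
    return sorted({raid0_locate(s, n_disks)[0] for s in strips})

for strip in range(8):
    disk, stripe = raid0_locate(strip, 4)
    print(f"strip {strip} -> disk {disk}, stripe {stripe}")
print(disks_for_request(2, 4, 4))  # a 4-strip request uses all four disks in parallel
```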

However, to achieve high data transfer rates, two requirements must still be met. First, there must be a high transfer capacity along the entire path between host memory and the individual disk drives. Second, the application must make I/O requests that drive the disk array efficiently. [2]


RAID 1 also differs from the remaining RAID levels in how it achieves redundancy. RAID 2 to 6 all use some form of parity calculation, whereas RAID 1 simply duplicates all the data onto another disk, like a backup. Like RAID 0 it uses data striping, but each logical strip is mapped to 2 separate physical disks. [2]

The advantages of using RAID 1 are simple:

A read request can be served by either of the 2 disks, usually the one with the lower seek time plus rotational latency.

A write request updates both disks, but the two writes happen in parallel, so the write time is that of the slower disk. There is also no write penalty, unlike RAID 2 to 6, where the parity must be updated as well.

Recovery from a failure is simple: if one drive fails, the data is still available on the other.

The major disadvantage is cost: RAID 1 needs double the disk space, which usually means roughly double the cost. RAID 1 is therefore mostly used for system software and for backups of critical files. Because it keeps a real-time copy of all data, if one of the disks fails the data is still immediately available from the other. [2]

Another important feature is that, where read performance matters, reads can be spread across both disks, so an array serving mostly read requests can perform somewhat like a RAID 0 array. [2]
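The RAID 1 behaviour described above (write both copies, read from whichever disk can respond sooner, keep working after one failure) can be modelled as a toy class. This is a hypothetical sketch, not how any real controller is written: head position is measured in blocks, seek distance stands in for seek time, and rotational latency is ignored.

```python
class MirroredPair:
    """Toy RAID 1: two copies of every block; reads go to the copy whose head is closer."""

    def __init__(self, n_blocks):
        self.disks = [[None] * n_blocks, [None] * n_blocks]
        self.head = [0, 0]            # current head position, measured in blocks
        self.failed = [False, False]

    def write(self, block, data):
        # Both copies are written (in a real array, in parallel); no parity to update.
        for d in (0, 1):
            if not self.failed[d]:
                self.disks[d][block] = data
                self.head[d] = block

    def read(self, block):
        working = [d for d in (0, 1) if not self.failed[d]]
        if not working:
            raise OSError("both disks have failed")
        # Seek distance stands in for seek time; rotational latency is ignored.
        d = min(working, key=lambda disk: abs(self.head[disk] - block))
        self.head[d] = block
        return self.disks[d][block], d

pair = MirroredPair(100)
pair.write(10, b"report")
pair.write(80, b"photo")
pair.head = [0, 90]        # pretend the heads have drifted apart
print(pair.read(80))       # served by disk 1, whose head is nearer block 80
pair.failed[1] = True
print(pair.read(10))       # disk 0 still holds a full copy
```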


RAID 2 also uses data striping, but with very small strips, often a single byte or word. An error-correcting code, typically a Hamming code, is calculated across the corresponding bits on each data disk, and the bits of the code are stored in the corresponding bit positions on several redundant disks. The Hamming code used can correct single-bit errors and detect double-bit errors. [2]

The disadvantage is that, although RAID 2 needs fewer disks than RAID 1, it is still rather costly: the number of redundant disks is proportional to the logarithm of the number of data disks. On a read, all the disks are accessed simultaneously, and the requested data is delivered together with its error-correcting code. If there is a single-bit error, the controller can recognise and correct it immediately, so read access is not slowed down. On a write, all the data disks and parity disks must be accessed. [2]
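To show how a Hamming code locates and fixes a single flipped bit, here is plain Hamming(7,4): 4 data bits plus 3 check bits, for example one bit per disk across 4 data disks and 3 redundant disks. This is a sketch of the basic single-error-correcting code only; the single-error-correcting, double-error-detecting variant mentioned above adds one extra overall parity bit, which is not shown.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits as a 7-bit codeword (positions 1..7, check bits at 1, 2, 4)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def hamming74_decode(codeword):
    """Return (data_bits, error_position); error_position 0 means no error was found."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # the syndrome spells out the faulty bit's position
    if position:
        c[position - 1] ^= 1         # flip the single faulty bit back
    return [c[2], c[4], c[5], c[6]], position

word = hamming74_encode([1, 0, 1, 1])
word[5] ^= 1                         # corrupt position 6, e.g. a bad bit from one disk
print(hamming74_decode(word))
```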

RAID 2 would only be an effective choice in an environment where many disk errors occur, since it provides a great deal of redundancy and error control. Because modern hard disks are highly reliable, RAID 2 is overkill and is rarely, if ever, implemented. [2]


RAID 3 is very similar to RAID 2. Its difference and advantage over RAID 2 is that it needs only 1 redundant disk, no matter how large the array is. Data striping is again used, with small strips. Instead of an error-correcting code, a simple parity bit is computed for each set of bits in the same position on all the data disks. [2]

If 1 drive fails, recovery is simple: the parity drive is accessed and the missing data is reconstructed from the parity and the remaining data drives. The same principle is used for recovery in the remaining RAID levels. [2]
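Bit-by-bit parity is an exclusive-OR (XOR), and XOR-ing the parity with the surviving disks gives back the missing one. This minimal sketch models each disk's strip as a short byte string; the contents are arbitrary example values.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks (the parity function used by RAID 3 to 5)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data_disks = [b"\x0f\xf0\x01", b"\x33\x33\x02", b"\xaa\x55\x04"]
parity_disk = xor_blocks(data_disks)

# Disk 1 fails: XOR of the parity and the surviving data regenerates its contents.
recovered = xor_blocks([data_disks[0], data_disks[2], parity_disk])
print(recovered == data_disks[1])
```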

Another advantage of RAID 3 is that if just 1 drive fails, the array keeps working in a reduced mode: on reads, the missing data is regenerated on the fly. On writes, the parity must still be kept consistent so that the data can be regenerated later. To return to full operation, the failed drive must be replaced and its entire contents regenerated onto the new drive. [2]

Because its strips are so small, every RAID 3 request transfers data from all the data disks in parallel, giving very high data transfer rates. Unfortunately, only one I/O request can be executed at a time. [2]


RAID levels 4 to 6 all use an independent access technique, in which each disk operates independently, so separate I/O requests can be served in parallel. They are therefore best suited to applications that need high I/O request rates, and less suited to those that need high data transfer rates. Data striping is again used, but the strips are relatively large. [2]

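In RAID 4, a bit-by-bit parity strip is calculated across the corresponding strips on the data disks and stored in the corresponding strip of the parity disk. Its drawback is a write penalty for small writes: each one requires 2 reads and 2 writes, because the old data strip and old parity strip must be read before the new data and new parity are written. Moreover, since every write involves the single parity disk, that disk can become a bottleneck. [2]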
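The "2 reads and 2 writes" follow from how the parity can be updated without touching the other data disks: new parity = old parity XOR old data XOR new data. This is a minimal sketch with arbitrary example strips; it checks that the shortcut gives the same parity as recomputing it from every data strip.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equally sized strips."""
    if len(a) != len(b):
        raise ValueError("strips must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))

def small_write_parity(old_data, new_data, old_parity):
    """New parity after overwriting one strip.

    Needs two reads (old data strip, old parity strip); the caller then performs
    two writes (new data strip, new parity strip). Other data disks are untouched.
    """
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)

strips = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_bytes(xor_bytes(strips[0], strips[1]), strips[2])

new_strip = b"\x7e\x7f"
parity = small_write_parity(strips[1], new_strip, parity)
strips[1] = new_strip

# Same result as recomputing the parity from every data strip.
print(parity == xor_bytes(xor_bytes(strips[0], strips[1]), strips[2]))
```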


RAID 5 is organised like RAID 4, but instead of using a dedicated parity disk it distributes the parity strips across all the disks, typically in a round-robin scheme. This avoids the potential I/O bottleneck of RAID 4's single parity disk. [2]
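One possible round-robin placement of the parity strips is sketched below. The particular rotation (parity starts on the last disk and moves one disk to the left on each stripe) is an assumption for illustration; real controllers use several different layouts.

```python
def raid5_parity_disk(stripe, n_disks):
    """Disk holding the parity strip of a given stripe (rotating right to left)."""
    return (n_disks - 1 - stripe) % n_disks

def raid5_layout(n_stripes, n_disks):
    """One row per stripe: 'P' marks the parity strip, 'D' a data strip."""
    return [["P" if disk == raid5_parity_disk(s, n_disks) else "D" for disk in range(n_disks)]
            for s in range(n_stripes)]

for row in raid5_layout(5, 4):
    print(" ".join(row))
```

Over any n consecutive stripes, each of the n disks holds exactly one parity strip, so parity updates are spread evenly instead of all landing on one disk.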


RAID 6 carries out two different parity calculations and stores the results in separate blocks on different disks, so an array holding N disks' worth of user data needs N + 2 disks. The advantage is that the array keeps working even if 2 disks fail. Unfortunately it also has a write penalty, since each write affects both parity blocks. [2]
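The levels discussed above trade usable capacity for fault tolerance. This sketch summarises that trade-off for an array of identical disks; RAID 2 is left out because its overhead depends on the code used, the minimum disk counts are this sketch's assumptions, and for RAID 1 "survives 1" is the guaranteed minimum (one failure per mirrored pair).

```python
def raid_capacity(level, n_disks, disk_size):
    """Return (usable capacity, disk failures always survived) for a simple array."""
    if level == 0:
        return n_disks * disk_size, 0
    if level == 1:
        if n_disks < 2 or n_disks % 2:
            raise ValueError("mirroring uses disks in pairs")
        return n_disks // 2 * disk_size, 1   # one failure per pair is survivable
    if level in (3, 4, 5):
        if n_disks < 3:
            raise ValueError("need at least 3 disks")
        return (n_disks - 1) * disk_size, 1  # one disk's worth of parity
    if level == 6:
        if n_disks < 4:
            raise ValueError("need at least 4 disks")
        return (n_disks - 2) * disk_size, 2  # two disks' worth of parity
    raise ValueError("level not modelled in this sketch")

for level in (0, 1, 5, 6):
    usable, survives = raid_capacity(level, 4, 2)  # four 2 TB disks
    print(f"RAID {level}: {usable} TB usable, survives {survives} failure(s)")
```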

Error Control Techniques

Errors can and do occur in digital storage, so many error detection and correction techniques were developed to deal with them.

One of the first and by far the simplest error detection schemes is the parity bit. A parity bit is an extra bit added to a group of bits so that the total number of 1 bits is even (even parity) or odd (odd parity). If a single bit changes during storage or transfer, the parity no longer matches and the group of bits is known to be incorrect. The scheme is not flawless, however: if two bits (or any even number of bits) change, the parity still matches even though the data is wrong. With even parity, the parity bit is 1 when the number of 1 bits in the data is odd; with odd parity, the converse is true. [8]
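The parity bit and its blind spot can be demonstrated directly. This minimal sketch represents a group of bits as a list of 0s and 1s; the data word is an arbitrary example.

```python
def parity_bit(bits, even=True):
    """Parity bit that makes the total number of 1s even (or odd)."""
    ones = sum(bits) % 2
    return ones if even else 1 - ones

def parity_ok(bits_with_parity, even=True):
    """True if the stored bits, parity included, have the expected parity."""
    total = sum(bits_with_parity) % 2
    return total == (0 if even else 1)

data = [1, 0, 1, 1]                 # three 1s
stored = data + [parity_bit(data)]  # even parity bit is 1
print(parity_ok(stored))

one_flip = stored.copy()
one_flip[0] ^= 1
two_flips = stored.copy()
two_flips[0] ^= 1
two_flips[1] ^= 1
print(parity_ok(one_flip), parity_ok(two_flips))  # the double error slips through
```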

A cryptographic hash function provides stronger assurance of data integrity, whether changes to the data are accidental or deliberate. Such a function takes a block of data and returns a fixed-size bit string, the hash value. Any change to the data will, with overwhelming probability, change the hash value. [9]
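A stored hash value works as a fingerprint for a block of data. This sketch uses SHA-256 from Python's standard `hashlib` module; the choice of SHA-256 and the example data are assumptions, since the text does not name a particular hash function.

```python
import hashlib

def fingerprint(data):
    """SHA-256 digest of a block of data, as a hex string."""
    return hashlib.sha256(data).hexdigest()

block = bytearray(b"important sector contents")
stored_hash = fingerprint(bytes(block))

block[0] ^= 0x01                                  # a single flipped bit
print(fingerprint(bytes(block)) == stored_hash)   # the change is detected
```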

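Error-correcting codes can also be used purely for error detection. Their power depends on the minimum Hamming distance between codewords, that is, the smallest number of bit positions in which two valid codewords differ: a code with minimum distance d can detect up to d - 1 bit errors and correct up to (d - 1)/2 of them, rounded down. Using a minimum-distance code for detection is useful when a strict guarantee is needed on the minimum number of errors that will be detected. [8]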
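The distance rule can be checked on two tiny codes that also appear in this section: a 3-fold repetition code and a parity-check code (2 data bits plus an even parity bit). This is a sketch with codewords written as bit strings.

```python
from itertools import combinations

def hamming_distance(a, b):
    """Number of positions at which two equal-length codewords differ."""
    if len(a) != len(b):
        raise ValueError("codewords must have the same length")
    return sum(x != y for x, y in zip(a, b))

def minimum_distance(codewords):
    """Smallest distance between any two distinct codewords."""
    return min(hamming_distance(a, b) for a, b in combinations(codewords, 2))

def capability(d):
    """(errors always detected, errors always corrected) for minimum distance d."""
    return d - 1, (d - 1) // 2

repetition = ["000", "111"]                 # 3-fold repetition code
even_parity = ["000", "011", "101", "110"]  # 2 data bits + even parity bit
for name, code in [("repetition", repetition), ("parity", even_parity)]:
    d = minimum_distance(code)
    print(name, d, capability(d))
```

The repetition code (d = 3) can correct any single error, while the parity code (d = 2) can only detect one, which matches the behaviour of the parity bit described earlier.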

A CRC (Cyclic Redundancy Check) is a non-secure hash function designed to detect accidental changes to digital data. CRCs are based on cyclic codes and are especially good at detecting burst errors (several corrupted bits close together). They are widely used in computer networks, and also in hard disks, because they are very easy to implement efficiently in hardware. [8]
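The shift-and-XOR loop that makes CRCs cheap in hardware can be written out bit by bit. This sketch uses the common CRC-32 (IEEE 802.3) polynomial as an assumed example, since the text does not say which CRC a given disk uses; it is checked against Python's built-in `zlib.crc32`.

```python
import zlib

CRC32_POLY_REFLECTED = 0xEDB88320  # CRC-32 (IEEE 802.3) generator, bit-reversed

def crc32_bitwise(data):
    """Bit-at-a-time CRC-32: the shift-and-XOR loop that hardware implements cheaply."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ CRC32_POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF

message = b"123456789"
print(hex(crc32_bitwise(message)), hex(zlib.crc32(message)))

# A burst error: all 16 bits of bytes 3 and 4 flipped.
damaged = bytearray(message)
damaged[3] ^= 0xFF
damaged[4] ^= 0xFF
print(crc32_bitwise(bytes(damaged)) != crc32_bitwise(message))
```

A 32-bit CRC of this kind is guaranteed to detect any burst of errors no longer than 32 bits, which is why a burst like the one above is always caught.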

Error correcting codes can be divided into 2 groups:

Convolutional codes: these codes are processed on a bit-by-bit basis and are particularly suited to hardware implementation. They are commonly decoded using the Viterbi algorithm (a Viterbi decoder).

Block Codes: these codes work on a block-by-block basis. Examples of these are Hamming codes, repetition codes and parity-check codes. [8]

As mentioned above, RAID levels also have built-in redundancy for detecting and correcting errors, so they too can be considered a form of error control.

Section B - Optical Memory

Optical memory was, and still is, one of the most popular means of storage, especially for movies and audio. From the first CD to the latest dual-layer Blu-ray discs, the medium has undergone major changes, but the theory behind it has remained much the same.


Figure5: CD Operation

Source: Slides of Computer Organization and Architecture by William Stallings

CDs are made from a resin such as polycarbonate. Digital data is stored as a series of microscopic pits on the surface of the polycarbonate. When many copies are needed, a finely focused, high-intensity laser is first used to create a master disk. The master is then used to make a die, which stamps out the copies onto polycarbonate. The pitted surface is next coated with a highly reflective layer, usually aluminium or gold. This shiny surface is protected against dust and scratches by a top coat of clear acrylic, and finally a label is added. [2]

To retrieve the data, a low-powered laser is shone through the clear polycarbonate while a motor spins the disk. If the light falls on a pit, which has a somewhat rough surface, it is scattered and only a low intensity is reflected back to the receiver. If it falls on a land, which is much more reflective, the light is reflected back at a higher intensity. The changes in the received light are converted to digital form: the beginning or end of a pit represents a 1, while no change in elevation between intervals represents a 0. [2]
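The "edge means 1, no change means 0" rule can be sketched as a tiny decoder. This is an idealised model under stated assumptions: one sample per bit interval, 'L' for land and 'P' for pit, and a starting level of land. Real CDs also apply a channel code (eight-to-fourteen modulation) that keeps pit edges suitably spaced, which is not shown.

```python
def decode_transitions(levels, start_level="L"):
    """Turn a sequence of sampled surface levels into bits.

    levels: one character per bit interval, 'L' for land, 'P' for pit.
    A change of level between intervals (the start or end of a pit) gives a 1;
    no change gives a 0.
    """
    bits = []
    previous = start_level
    for level in levels:
        bits.append(1 if level != previous else 0)
        previous = level
    return bits

print(decode_transitions("LLPPPLL"))
```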

Unlike magnetic disks, optical disks use a single spiral track, starting near the centre and spiralling outwards. All sectors along the track have the same length, so the data density is the same everywhere. Therefore, instead of being read at CAV (Constant Angular Velocity), the data is read at CLV (Constant Linear Velocity). This means that the disk must spin more slowly when the head is near the outer edge, which increases the rotational delay there. CLV also makes random access more difficult: to locate a specific address, the head is moved to the general area, the rotation speed is adjusted and the address is read, and then minor adjustments are made to find the exact sector. [2]
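The effect of CLV on rotational speed follows from rpm = linear speed / circumference × 60. This sketch assumes a scanning speed of 1.2 m/s and data radii of about 25 mm and 58 mm, typical values for an audio CD used here only for illustration.

```python
import math

def clv_rpm(linear_speed_m_s, radius_mm):
    """Rotational speed that moves the track past the head at a constant linear speed."""
    circumference_m = 2 * math.pi * radius_mm / 1000
    return linear_speed_m_s / circumference_m * 60

def average_rotational_delay_ms(rpm):
    """On average the wanted data is half a revolution away."""
    return 0.5 * 60_000 / rpm

SPEED = 1.2              # m/s, assumed scanning speed
for radius in (25, 58):  # mm, assumed innermost and outermost data radii
    rpm = clv_rpm(SPEED, radius)
    delay = average_rotational_delay_ms(rpm)
    print(f"r = {radius} mm: {rpm:.0f} rpm, average delay {delay:.0f} ms")
```

The disk turns more than twice as fast near the centre as near the edge, and the average rotational delay grows accordingly towards the outside.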

Data on the disk is stored as a sequence of blocks. Each block has:

Sync: marks the start of a block. It consists of a byte of all 0s, 10 bytes of all 1s and a byte of all 0s. [2]

Header: contains the block address and the mode byte. There are 3 modes: mode 0 indicates a blank data field, mode 1 indicates 2048 bytes of data protected by an error-correcting code, and mode 2 indicates 2336 bytes of data with no error-correcting code. [2]

Data: the data itself. [2]

Auxiliary: additional user data in mode 2, or a 288-byte error-correcting code in mode 1. [2]
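Putting the fields together shows why both modes produce blocks of the same size. This sketch assumes the layout usually given for CD-ROM (a 4-byte header made of a 3-byte minute/second/sector address stored in binary-coded decimal plus the mode byte); the auxiliary field is filled with placeholder zeros rather than a real error-correcting code.

```python
SYNC = bytes([0x00]) + bytes([0xFF] * 10) + bytes([0x00])  # 12 bytes
DATA_BYTES = {1: 2048, 2: 2336}

def bcd(value):
    """Binary-coded decimal: 34 -> 0x34."""
    return (value // 10) << 4 | value % 10

def build_block(minute, second, sector, mode, data):
    """Assemble sync + header + data (+ auxiliary field for mode 1)."""
    if len(data) != DATA_BYTES[mode]:
        raise ValueError(f"mode {mode} carries {DATA_BYTES[mode]} data bytes")
    header = bytes([bcd(minute), bcd(second), bcd(sector), mode])  # address + mode byte
    auxiliary = bytes(288) if mode == 1 else b""  # placeholder zeros, not a real ECC
    return SYNC + header + data + auxiliary

block = build_block(2, 34, 56, 1, bytes(2048))
print(len(block), block[12:16].hex())
```

In mode 1 the 288 auxiliary bytes hold the error-correcting code; in mode 2 those bytes carry user data instead, so both modes add up to 2352 bytes per block.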

The advantages of optical disks are that they are easy to mass-produce: once the master has been made, stamping out thousands of copies is fairly cheap. The disk is also removable and can easily be replaced with another disk in the same drive, as we do today with different movies: one drive is bought and different disks are inserted. The disadvantages are that a CD-ROM is read-only and cannot be updated, and that, as noted above, optical disks have a much longer access time than hard disks. [2]

More types of Optical Disks

After the first CD-ROMs were created, they proved very useful. Unfortunately, producing a master is expensive, so CD-ROMs were not economical when only a small number of copies was needed. This led to the CD-R (CD-Recordable), which can be written once using a laser of modest intensity and then read any number of times. A new drive was needed to write such disks, because writing requires a more powerful laser than the drives used only for reading. Instead of stamped pits and lands, a CD-R has a dye layer whose reflectivity is changed by the high-intensity writing laser. The resulting disk can still be read in a normal CD-ROM drive. [2]

Another breakthrough came with the CD-RW, which can be written many times, although not an unlimited number of times. It uses an approach called phase change. The recording layer is made of a material that the laser can switch between two phases: an amorphous state, which reflects light poorly, and a crystalline state, which reflects light well. The drawback is that after many successive writes and erases, the material permanently loses these properties. [2]

DVDs were very important, mostly for the film industry. After many years of VHS video tapes, the industry had a medium that could not get tangled inside the player and was more resistant to wear and damage. DVDs also stored video in digital, random-access form, and their larger capacity allowed higher-quality video and therefore a better picture for home viewing. The designers of the Digital Video Disk achieved this increase in capacity by packing the bits more densely: a laser with a shorter wavelength (higher frequency) allows a smaller spacing between the loops of the spiral and a smaller distance between pits. Some DVDs also have two layers, which raises the capacity to about 8.5 GB. A second layer of pits and lands is placed on top of the first, with a semi-reflective layer above the reflective one, and by adjusting its focus the laser can read each layer separately. There are also double-sided DVDs, with data on both sides. [2]


As time passes, new forms of storage are invented and existing ones improved. Flash memory has improved to the point that SSDs (Solid State Drives) are becoming as popular as magnetic hard disks, even though hard disks are still cheaper. USB pen drives, another form of flash memory, have also become cheaper and more popular. Nevertheless, optical discs will remain the media of choice for distributing software and movies, because of their relative permanence compared with other secondary storage devices.


[1] http://www.ntfs.com/hard-disk-basics.htm

[2] William Stallings - Computer Organization and Architecture, 6th Edition

[3] http://en.wikipedia.org/wiki/Disk_partitioning

[4] http://www.ntfs.com/mbr.htm

[5] http://www.ntfs.com/mbr-virus.htm

[6] http://www.ntfs.com/partition-table.htm

[7] http://www.ghacks.net/2010/09/17/reclaime-free-raid-data-recovery/

[8] http://en.wikipedia.org/wiki/Error_detection_and_correction

[9] http://en.wikipedia.org/wiki/Cryptographic_hash_function