A Background To Server Mirroring Information Technology Essay
1.2 Server Mirroring Background.
Server mirroring uses a backup server that duplicates all the transactions and processes of the source server on a target server. The main goal is to keep an exact duplicate of files or folders between the servers in real time for purposes such as high availability, disaster recovery, or load balancing. If the source server goes down at any time, the target server takes its place immediately, with little or no downtime.
Server mirroring involves two servers attached to the same network: one performs the work while the other receives duplicate data from it. If one server goes down, the other is ready to take over immediately. A further benefit is that the two servers can be placed in separate rooms or buildings for disaster protection. They can even be located far apart, provided that a sufficiently fast network connection is available.
However, server mirroring has some drawbacks. The cost is high because a second server must be kept in standby mode, so maintenance and upgrade costs are high as well. Another drawback is performance degradation. Apart from handling all the file transfer work for network users, the source server has to process additional I/O as it passes information along to the mirror server. This can cause heavy resource usage.
Server mirroring technology can help with several tasks. One is automated data backup, which detects file changes independently and replicates those changes using Rsync, maintaining a real-time copy of the data without copying every changed file in full. Another is disaster recovery: real-time disk-to-disk synchronization keeps a standby copy of the server so that downtime is avoided. In addition, server mirroring increases the overall redundancy of each server and enables more efficient use of server resources.
Figure 1.1: Overview of server mirroring.
Real-time transmission of file changes is called data replication. The replication process operates at the file system level and identifies file changes independently of the application that owns the file. Replicating only the changes is a better way of maintaining a copy of the data than copying each changed file in full, because it saves network resources and time. The process completes quickly and uses few server resources, even when the files being changed are very large. Data replication keeps the data up to date and so provides disaster recovery and high availability for the server.
Figure 1.2 : Data replication process.
Failover/failure monitoring is a process in which the target server takes over the role of the source server when the source server fails. As a result, user and application requests made to the failed source server are redirected to the target server. Failover can be used without data replication to ensure high availability for servers that provide only processing services.
Figure 1.3: Failover/Failure monitoring process.
Restoration is the process of copying replicated data from the target server back to the source server. It requires selecting the target, the source, and the appropriate replication set. Restoration is used when data is lost because of a disk failure, or when the most up-to-date data exists only on the target server following a failover. At the moment the source server fails, the target server holds the same data as the source server. While the target server handles requests, its data changes, so when the source server comes back online the two servers contain different data. Restoration resolves this by copying the incremental data from the target server to the source server.
Figure 1.4: Restoration process.
Database mirroring uses SQL Server technology to transfer transaction log records directly from one server to another. Client applications can be coded to redirect their connection information automatically, so that if a failover happens they connect directly to the standby server and database. Database mirroring does not require proprietary hardware and is easy to set up and manage.
1.3 Problem Statement
The problem most organizations face is keeping the servers that host their downloads, such as software and updates, available to users. Fedora, for example, cannot cope on its own with the high demand from users who download from its server, and so cannot provide a fast and reliable download for them.
As a solution, mirror servers across the world are needed to meet the needs of the Fedora user community. With mirror servers, users can download their files from the nearest available mirror, which makes the download far more reliable and stable.
One problem that has to be overcome is that, when the main Fedora server updates its files, the mirror servers have to update their files to match. For example, when Fedora releases a new version, say Fedora 14, all the mirror servers have to be alerted at the time of the release. Each mirror server then has to download Fedora 14 from the Fedora server and make it available to users in its region. In other words, the files on a mirror server have to be exactly the same as those on the Fedora server, and any update must be propagated to the mirror server as well.
1.4 Project Objectives
Following this brief introduction to server mirroring and its uses, the objectives of this project are as follows:
To set up a server running Fedora 13, the latest release of Fedora, the open-source community counterpart of Red Hat Enterprise Linux.
To set up installation sources for Fedora on the server.
To set up update sources for Fedora on the server.
To secure Fedora server using the default security tools which is iptables and
To open up remote access so that Fedora can be updated or installed from the mirror.
To set up a server that functions as an update server.
To set up a private mirror that gives Fedora users in UTAR a faster and more reliable way to download the latest Fedora updates and versions.
To set up a public mirror that gives the whole community of Fedora users in Malaysia a faster and more reliable way to download the latest Fedora updates and versions.
1.5 Project Requirement
This project requires a PC with at least a Pentium 3 processor, at least 300 GB of hard disk space, and at least 2 GB of random access memory (RAM).
The operating system will be the latest release of the Fedora distribution, which at the time of writing (July 10, 2010) is version 13.
Fedora 13 will be installed on the PC, and the mirror server will be set up according to the objectives listed earlier, with security protection in place.
Chapter 2: Literature Review
In this chapter, a few server mirroring tools are studied and analyzed to find the most appropriate one for this project. Several software applications for Unix systems synchronize files and directories from one location to another. The tools compared are Rsync, Unison, and Zsync. At the end of the chapter, one of the three is chosen for use in the project development.
2.1 Rsync
Rsync is a tool used to copy files and directories between a local host and a remote host. It uses an algorithm that quickly brings the remote and local files into sync, that is, makes them identical at both sites. Rsync mirrors quickly because it sends or receives only the differences between files over the network instead of the entire files, which would waste a great deal of bandwidth. Its advantages include the use of SSH as a secure channel, transfer of only the files that have changed since the last replication, and removal of files that were deleted from the source host so that both hosts stay in sync.
For example, suppose Fedora releases a new version, Fedora 14, on its official site. Rsync will download the Fedora 14 files but not the other files on the site, assuming those already exist on the local system. In other words, any change on the remote Rsync server is reflected on the local system.
Because Rsync sends only the files that differ between source and destination, it saves bandwidth and time. Rsync is also widely used around the world, and its user base is still growing, which means more support is available from the community. To use Rsync, it must be installed on both the remote system and the local system.
One disadvantage of Rsync is that it cannot copy directly from one remote system to another; one end of every transfer must be the local system. The only way to copy between two remote systems is to copy the files from the first remote system to a local system and then from the local system to the second remote system. This slows updates, because the files have to pass through an intermediate system, and it wastes bandwidth and time.
The advantages of using Rsync compared with other available mirroring tools are:
The source and destination partitions do not have to be identical in size.
It can use any transparent remote shell, including ssh or rsh.
Rsync traffic carried over ssh is encrypted and also benefits from any trust relationship already established with ssh client keys.
It supports anonymous or authenticated Rsync servers, which are ideal for mirroring.
It does not require root privileges.
It minimizes latency costs by pipelining file transfers.
Rsync is useful in a number of ways. A few examples are:
Copying local files to a remote machine using a remote shell program such as ssh.
Listing files on a remote machine.
Copying files from a remote server to the local machine.
Keeping files in sync between a remote server and the local machine.
Mirroring a remote system from a local system with low bandwidth.
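As a rough illustration of the uses listed above, the following Python sketch builds Rsync command lines without running them. The host names, module path, and directories are placeholders chosen for this example and are not real Fedora mirror addresses. The options shown (-a for archive mode, -v for verbose output, -z for compression, --delete, and -e ssh) are standard Rsync options, but the combination is only one plausible choice.

```python
def build_rsync_command(source, destination, delete=False, use_ssh=False):
    """Return an rsync argument list; nothing is executed here."""
    cmd = ["rsync", "-avz"]        # archive mode, verbose, compress in transit
    if delete:
        cmd.append("--delete")     # drop destination files that vanished from the source
    if use_ssh:
        cmd += ["-e", "ssh"]       # use ssh as the remote shell
    cmd += [source, destination]
    return cmd


# Hypothetical endpoints, for illustration only.
pull = build_rsync_command(
    "rsync://mirror.example.org/fedora/", "/srv/mirror/fedora/", delete=True
)
push = build_rsync_command(
    "/srv/reports/", "user@backup.example.org:/srv/reports/", use_ssh=True
)

print(" ".join(pull))
print(" ".join(push))
```

The first command pulls from a remote Rsync server to the local machine and removes local files that no longer exist upstream, which is the behaviour a mirror needs. The second pushes local files to a remote machine over ssh.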
Rsync is generally used to copy files from host A to host B in such a way that data already present on both hosts does not need to be sent over the network again. Assume that the files on host A and host B are quite similar. This similarity can be exploited to speed up mirroring by sending only the differences between A and B down the link. The difficulty is that conventional methods for computing the differences between two files rely on being able to read both files on the same machine. When the two files sit at opposite ends of a link, those methods cannot be used, which is the problem the Rsync algorithm addresses.
The Rsync algorithm efficiently identifies the parts of the source that match the existing destination. Data that already exists at the destination does not need to be sent over the link. Only the data that does not match is sent, so that identical files exist at both the source and the destination. The algorithm works quickly, so the data that has to be copied over the link can be separated from the data that is already present in a short time.
2.1.1 Rsync algorithm
The Rsync algorithm can be explained with two computers, A and B. Computer A holds file A and computer B holds file B, and the two files are similar. The algorithm proceeds as follows:
Computer B splits file B into a series of non-overlapping, fixed-size blocks of S bytes each.
Computer B calculates two checksums for each block: a strong 128-bit MD4 checksum and a weak 32-bit "rolling" checksum.
Computer B sends these checksums to computer A.
Computer A searches file A for all blocks of length S bytes, at any offset, whose weak and strong checksums match those of a block of file B.
Computer A sends computer B a sequence of instructions for constructing a copy of file A. Each instruction is either a reference to a block of file B or literal data. Literal data is sent only for those sections of file A that did not match any block of file B.
When the algorithm completes, computer B holds a copy of file A, having received only the data it did not already have. Because the algorithm requires only one round trip, the effect of link latency is minimized.
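The steps above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Rsync's actual implementation: zlib.adler32 stands in for the weak checksum, MD5 stands in for MD4 (which many Python builds no longer provide), and the weak checksum is recomputed at every offset instead of being rolled. The function names are invented for this example.

```python
import hashlib
import zlib


def signatures(old, block_size):
    """Computer B: weak and strong checksums for each fixed-size block of its file."""
    sigs = {}
    for start in range(0, len(old), block_size):
        block = old[start:start + block_size]
        weak = zlib.adler32(block)               # stand-in for the rolling checksum
        strong = hashlib.md5(block).digest()     # stand-in for MD4
        sigs.setdefault(weak, []).append((strong, start // block_size))
    return sigs


def delta(new, sigs, block_size):
    """Computer A: emit block references where B already has the data, literals elsewhere."""
    ops, literal, i = [], bytearray(), 0
    while i < len(new):
        window = new[i:i + block_size]
        match = None
        for strong, block_no in sigs.get(zlib.adler32(window), []):
            if hashlib.md5(window).digest() == strong:
                match = block_no
                break
        if match is None:
            literal.append(new[i])               # no match: this byte travels as literal data
            i += 1
        else:
            if literal:
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("block", match))         # match: send only the block number
            i += len(window)
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def rebuild(old, ops, block_size):
    """Computer B: reconstruct A's file from its own blocks plus the literal data."""
    out = bytearray()
    for kind, value in ops:
        if kind == "block":
            out += old[value * block_size:(value + 1) * block_size]
        else:
            out += value
    return bytes(out)


file_b = b"the quick brown fox jumps over the lazy dog"
file_a = b"the quick red fox jumps over the lazy dog!"
ops = delta(file_a, signatures(file_b, 4), 4)
assert rebuild(file_b, ops, 4) == file_a
```

Only the "literal" entries in ops carry file contents across the link; the "block" entries are small references to data computer B already holds.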
2.1.2 Rsync Rolling Checksum
The rolling checksum is computed for every chunk of size S, including overlapping chunks. Its key property is that, if the rolling checksum of bytes n through n + S - 1 is R, then the rolling checksum of bytes n + 1 through n + S can be computed from R, byte n, and byte n + S, without rereading the whole chunk. Any part of the sender's file that does not match a block held by the recipient is sent from sender to receiver, so that the receiver ends up with a file identical to the sender's copy. The utility needs to transfer only a little data if the recipient's and sender's files have many sections in common.
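The rolling property can be demonstrated with a small Python sketch. It assumes the two-part form of the weak checksum described in the Rsync technical report, in which two 16-bit sums a and b are kept modulo 2^16 and combined into one 32-bit value; the function names are invented here, and the constants should be treated as an assumption of this example.

```python
M = 1 << 16  # both partial sums are kept modulo 2^16


def weak_checksum(block):
    """Compute the two 16-bit sums (a, b) for a block from scratch."""
    a = sum(block) % M
    # The first byte is weighted by the block length, the last byte by 1.
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, size):
    """Slide the window one byte: drop out_byte at the front, add in_byte at the back."""
    a = (a - out_byte + in_byte) % M
    b = (b - size * out_byte + a) % M
    return a, b


def combine(a, b):
    """Pack the two 16-bit sums into a single 32-bit checksum."""
    return a + (b << 16)


data = b"server mirroring with rsync"
S = 8
a, b = weak_checksum(data[:S])
for n in range(len(data) - S):
    a, b = roll(a, b, data[n], data[n + S], S)
    # The rolled value equals a full recomputation over bytes n + 1 .. n + S.
    assert (a, b) == weak_checksum(data[n + 1:n + 1 + S])
```

Each call to roll costs a constant amount of work regardless of S, which is what makes it practical to compute a checksum at every byte offset of a file.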
2.1.3 Checksum searching
Assume again that computer A holds file A and computer B holds file B. Once the list of checksums of the blocks of file B has been sent to computer A, computer A has to search file A for any blocks, at any offset, that match the checksum of a block of file B. To do so, it computes the 32-bit rolling checksum for a block of length S starting at each byte of file A, and for each checksum it searches the list for a match. The search uses a three-level scheme.
First, the list of checksums from the blocks of file B is sorted according to a 16-bit hash of the 32-bit rolling checksum, and a hash table is built in which each entry points to the first element of the list with that hash value, or contains a null value if no element has that hash value. At each offset in file A, the 32-bit rolling checksum and its 16-bit hash are calculated. If the hash table entry for that hash value is not null, the second level is invoked: the sorted checksum list is scanned, starting with the entry pointed to by the hash table entry, to find an entry whose 32-bit rolling checksum matches the current value. If one is found, the third and final level is invoked: the strong checksum of the current block is calculated and compared with the strong checksum in the current list entry. If the two strong checksums match, a block of file A that is identical to a block of file B has been found.
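The three search levels can be sketched as follows. This is an illustrative outline only: the 16-bit folding function, the use of zlib.adler32 and MD5 as stand-ins for Rsync's weak checksum and MD4, and all function names are assumptions made for this example. A Python dictionary plays the role of the hash table, with a missing key standing for a null entry.

```python
import hashlib
import zlib


def hash16(weak):
    """Fold a 32-bit weak checksum into a 16-bit value (assumed folding scheme)."""
    return ((weak >> 16) ^ (weak & 0xFFFF)) & 0xFFFF


def build_table(blocks):
    """Sort the checksum list by 16-bit hash and index the first entry for each hash."""
    entries = sorted(
        (hash16(zlib.adler32(b)), zlib.adler32(b), hashlib.md5(b).digest(), n)
        for n, b in enumerate(blocks)
    )
    table = {}
    for position, entry in enumerate(entries):
        table.setdefault(entry[0], position)     # keep only the first position per hash
    return entries, table


def find_block(window, entries, table):
    """Return the number of a block of file B identical to window, or None."""
    weak = zlib.adler32(window)
    h = hash16(weak)
    position = table.get(h)                      # level 1: 16-bit hash lookup
    if position is None:
        return None
    strong = None
    while position < len(entries) and entries[position][0] == h:
        _, entry_weak, entry_strong, block_no = entries[position]
        if entry_weak == weak:                   # level 2: 32-bit weak checksum
            if strong is None:
                strong = hashlib.md5(window).digest()
            if strong == entry_strong:           # level 3: strong checksum
                return block_no
        position += 1
    return None


blocks_of_b = [b"aaaa", b"bbbb", b"cccc"]
entries, table = build_table(blocks_of_b)
print(find_block(b"bbbb", entries, table))
print(find_block(b"zzzz", entries, table))
```

The expensive strong checksum is calculated only after both cheaper tests have passed, so most offsets in file A are rejected at the first level.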
2.2 Unison
Unison is a file synchronization program. It is free and open source. It is used to synchronize files between two directories, for example between one computer and another, or between a computer and a storage device such as a removable disk. Unison runs on Unix-like operating systems such as Linux and also on Windows.
Unison also works across platforms, so it can, for example, synchronize a Windows PC with a Linux server. What sets Unison apart from other tools is that it can deal with updates to both replicas of a distributed directory structure. It detects conflicts, where a file was modified on both sides, and displays them to the user.
Unlike a distributed file system, Unison is a user-level program, so it works without any modification to the kernel. There is also no need for superuser privileges on either host. Any pair of computers connected to the internet can use Unison, which can communicate either over a direct socket link or by tunneling over an encrypted ssh connection.
Another advantage of Unison is that it saves network bandwidth, which makes it usable over slow connections. To save bandwidth, Unison transfers data using a compression protocol similar to the one Rsync uses.
Unison is also resilient to failure. It is careful to leave the replicas and its own private structures in a sensible state at all times, even in the event of communication failures or abnormal termination.
Unison lets users have all their files with them at all times. For example, a user in the office who wants to work on a proposal simply starts working on the office Linux machine. When the user decides to stop and continue later at home, the user synchronizes the file with the Unison server.
At home, the user runs Unison to synchronize the home computer. Once synchronization is done, the user has exactly the file that was synchronized at the office and can continue the proposal without the hassle of carrying the files home on removable media. When the proposal is finished, identical copies exist on both the office and the home computer. If either computer fails, the user still has an identical copy on the other.
The advantages of Unison are not limited to those discussed above. Unison also offers live backups via file replication, fast and non-traumatic recovery from hardware failures, and seamless control and verification of backups. Consider live backups first. Personal data such as documents is very important, yet ordinary users usually back up their data only once in a while. Unison helps to back up all the files and keep them live, meaning that the files are accessible at any time.
Unison also gives users control over, and verification of, their backups. One problem that often arises with backups is "backup trauma", which happens when users believe their files are being backed up properly when in fact they are not. In an organization, employees may think that the files on their systems are being backed up by the organization when the backup is not working as expected. In the end, important files may be lost, causing damage to the organization. With Unison, employees can back up their own data, and see that the synchronization has been performed, without relying entirely on the organization.
Furthermore, Unison provides fast and non-traumatic recovery from hardware failures. A hard drive crash would normally mean losing all the data on the computer. With Unison, however, a backup is made roughly as often as the user uses the computer, so in the worst case only the work of the last few hours is lost.
However, at the time of writing Unison is no longer under active development. Developers who still use Unison continue to release bug fixes, patches, and small improvements, and support is now provided by third parties such as operating system developers.
2.3 Zsync
Zsync is a file transfer program for downloading a file from a remote server. The download is based on an older version of the file that already exists on the computer, so Zsync downloads only the new or updated parts of the file and avoids any unnecessary transfer. Zsync resembles Rsync in that it uses the Rsync algorithm, but the algorithm is implemented on the client side.
Zsync is designed for file distribution, where, for example, one file is distributed to thousands of downloaders. It is also a free, open-source tool. It requires no special server software: a web server to host the files is enough. This means no extra load on the server, which makes Zsync ideal for large-scale distribution.
The main advantage of Zsync is that it is built for large-scale file distribution. Three aspects distinguish its approach to file distribution: client-side Rsync, Rsync over HTTP, and handling of compressed files. In addition, Zsync is able to remember synchronization and track changes in every folder individually.
Zsync addresses one problem with Rsync, namely that Rsync makes the server work hard, which can lead to server overload. The Rsync algorithm requires one side to calculate the block checksums and send them to the other end. The receiver of the checksums then has to compute the rolling checksum over its own file to identify the blocks in common. Once the common blocks are identified, it knows which blocks are not in common and have to be transmitted. Zsync is designed to minimize the share of this work that falls on the server.
Because the block checksums calculated on the server are not specific to any given client, they can be cached. The cached data is saved in a metafile. The Zsync client downloads this metafile, computes the rolling checksum over its own file, and compares the results with the downloaded checksum list. After the comparison, it knows which blocks it already has. The client can then fetch the remaining data from the server with simple HTTP Range requests.
Apart from the checksums, the block size must also be recorded, so that the client can calculate checksums over blocks of the same size. The file length must also be transmitted so that the total number of blocks is known. File permissions can also be included in the control file.
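To illustrate how the block size and file length are used on the client side, the sketch below converts a list of missing block numbers into the value of an HTTP Range header. It is an assumption-laden example and not Zsync's own code: the function names are invented, and a real client might issue several requests instead of one multi-range request.

```python
def block_count(file_length, block_size):
    """Total number of blocks, counting a final partial block."""
    return -(-file_length // block_size)         # ceiling division


def range_header(missing_blocks, block_size, file_length):
    """Turn missing block numbers into one HTTP Range header value."""
    ranges = []
    for block_no in sorted(set(missing_blocks)):
        start = block_no * block_size
        end = min(start + block_size, file_length) - 1   # inclusive; clamp the last block
        if ranges and ranges[-1][1] + 1 == start:
            ranges[-1][1] = end                  # merge with the adjacent previous range
        else:
            ranges.append([start, end])
    return "bytes=" + ",".join(f"{s}-{e}" for s, e in ranges)


print(block_count(7500, 1024))
print(range_header([2, 3, 7], block_size=1024, file_length=7500))
```

With a 7500-byte file and 1024-byte blocks there are 8 blocks. Blocks 2 and 3 are adjacent and merge into one range, and block 7 is the short final block, so the header value is bytes=2048-4095,7168-7499.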
The disadvantage of Zsync is that, although moving the workload from server to client relieves the server, the client now carries a heavy load because it has to compute the rolling checksum over the old file. As a result, when Zsync works on large files, the calculation takes more time than it does with Rsync.
2.4 Tools Comparison.
Comparison between Rsync and Zsync:
The obvious differences between Rsync and Zsync are that Rsync requires more work on the server and handles aspects such as file permissions and tunneling over ssh. Zsync differs in that it needs no active server support. Zsync works with HTTP headers, whereas Rsync wraps the data in its own protocol. Rsync also carries the overhead of the transport, such as ssh or rsh, that it uses to communicate. However, this overhead is rather small compared with Zsync's HTTP overhead.
Zsync also differs in the way metadata is downloaded. In Rsync, the weak checksums are transferred first, and the strong checksums are sent only in response to a request. Zsync cannot negotiate in this way because it has no active server process; instead, the Zsync client can download the weak checksums separately and then retrieve only the strong checksums it needs from the server.
Comparison between Rsync and Unison:
Unison is designed for synchronization rather than mirroring, which is what Rsync does. Rsync is also more widely used, and its support is much better than what Unison can offer. Although both tools are open source, Unison's development has stopped; it now receives only patches, bug fixes, and other small improvements from third parties.
In short, Unison handles synchronization only, whereas Rsync handles both synchronization and server mirroring. Rsync also uses a faster algorithm than Unison.
Rsync | Zsync
Implemented on the server side. | Implemented on the client side.
Synchronises data from one computer to another. | Designed for file distribution: one file on a server is distributed to thousands of downloaders.
Places extra load on the server, since the implementation is on the server side. | Places no extra load on the server, since the implementation is on the client side.
Requires special server software. | Needs no special server software; a web server hosting the file is enough.
Table 2-1: Comparison between Rsync and Zsync.
2.4.2 Comparison conclusion.
After comparing Rsync, Zsync, and Unison, the advantages and disadvantages of each tool can be seen clearly. For this project, which is to develop a mirror server for Fedora source downloads, the most appropriate tool is Rsync. Rsync is not only reliable but also secure and fast.
Moreover, Rsync is widely used by many organizations. With frequent updates and improvements to Rsync itself, the mirror server may gain further functions in future. The Fedora remote server also currently uses Rsync. As mentioned earlier, Rsync must be installed on both the remote and the local host in order to work. Since Fedora already runs Rsync, we are confident that Rsync is the right choice for setting up a mirror server for Fedora downloads in Malaysia to serve the Fedora user community.
Chapter 3: Methodology and tools.
After several months of research, we decided to use Rsync as the development tool for this project. Rsync is also named on the official Fedora website as the most appropriate tool for setting up a mirror server. The project will be considered a success if the mirror server is set up successfully in the next semester.
The main challenge we faced in the project was the time constraint. Time had to be used carefully to avoid last-minute work. Another challenge was finding sources for the project, and choosing among the tools was also a hard task.
Writing a proper report was not easy either. With all the information gathered, the hardest part was putting it together in a proper sequence. Every cited source also had to be referenced in the proper format, which was difficult because some references lacked details such as the author's name.
In this project, we use a quantitative method as the means of measurement. A quantitative method is a research method that relies on data collection and statistical analysis. We will record data parameters from the server and analyze them to determine whether the project is a success or a failure. Data will be collected when the mirror server is put up for testing. If the collected data are as expected, the server mirroring is working as intended, and we can conclude that the project is a success because it fulfills the objectives stated earlier.
In conclusion, the goal of this project is to set up a working mirror server that serves Malaysia's Fedora user community. The aim is to make a reliable, fast, and efficient download available to the community. The project will use Rsync to develop the mirror server and a quantitative method to verify that the mirror server is working properly.
3.1 Requirement Specifications
The software and hardware requirements for this project are as follows.
A PC with at least a Pentium 3 processor, at least 2 GB of RAM, and a 300 GB hard disk.
This is the minimum specification the PC needs in order to run the mirror server.
The Fedora 13 operating system installed on the PC.
Fedora 13 is needed to set up the mirror server, and it is also the operating system on which the mirror server will run.
Rsync, the tool that will be used to set up the mirror server in this project.
Other network-enabled PCs.
These are needed to test the mirror server and identify any problems.
3.2 Timeline planning for Project 2.
For Project 2, we had to draw up a timeline so that the time available to complete the project is used carefully. Failing to make full use of the given time would very likely lead to the failure of the project. The workload is broken into several phases: planning, designing, development, testing, and finally real-world testing. By following the timetable strictly, the risk of delays is greatly reduced. A proper timetable is therefore needed for Project 2 to avoid obstacles, especially a shortage of time.
First, the most important step in starting the project is to plan properly what to do over the semester. Then we have to design the mirror server, including how the server is going to be set up. Once the design is done, we start developing the system according to the design.
Once the system is developed, it has to be tested to ensure that it works properly. We will start by testing it in a small local area network. This is important because errors are easier to detect in a small network than in the network of a large organization. If an error or problem is detected during the test, we will need to find out what caused it and fix it before moving on to the final testing. Once that testing is successful, the complete system will be put into real-world testing before we conclude that the project is successful. The project can be called a success if the quantitative data collected once the system is complete show that the server mirroring is working.
The planned timeline for Project 2 is given below. It guides the project so that time is used as efficiently as possible and the risk of running out of time at the end of the project is avoided.
Table 3.1: Gantt chart of the Project 2 timeline (tasks by week, including the full final report).