Dataset Migration to AWS Cloud Using Open-Source Tools
Paper Type: Free Essay | Subject: Computer Science | Wordcount: 8526 words | Published: 18th May 2020
Contents
1.3 Research Question and Solution
2.1 Open Source Tools for data Migration
2.1.4 AWS Command Line Interface
6.1 Scheduling/Time management
6.4 Hardware and Software Failure
Table of Figures
Figure 1 : Migration of data to the Cloud (Marinela MIRCEA, 2011)
Figure 4: Rivermeadow (Rivermeadow, n.d.)
Figure 6 : Current method of Migration
Figure 9 : WhatsApp (WhatsApp, n.d.)
Figure 10: Google Docs (Google Docs, n.d.)
Figure 11: Oracle VM VirtualBox (Oracle VM Virtualbox, n.d.)
Figure 12: AWS CLI (AWS CLI, n.d.)
Figure 15: Configuring AWS CLI
Figure 19: Creating trust-policy.json
Figure 20: Giving access to Data & VM Import/Export
Figure 31: Windows 7 running in cloud along with datasets
Figure 32: Copy of AMI in Singapore Region
List of Tables
Table 1: Open Source Tools comparison
Table 2 : Features of Opensource Migration Tools
1.1 Data Migration Overview
Companies move their data from one place to another for a variety of reasons. When a company decides to change its infrastructure, it may migrate data to the cloud for benefits such as security and availability. Companies also transfer data routinely, or may need their legacy data exported to a more reliable resource. Depending on the requirements, the data export can take various forms, but all share one practical goal: to transfer the data within a limited time while preserving reliability and integrity (Alley, 2018).
We consider three primary types of data migration tools.
- On-premise tools: These are designed to migrate data within small and large enterprise networks. With on-premise tools, we can move a company's data between two or more servers or databases. On-premise data migration tools do not provide cloud migration. Many companies are reluctant to migrate their data to the cloud for security reasons, so they prefer on-premise tools (Alley, 2018).
- Open source tools: These are free, community-developed and community-supported tools for data migration. Because the source is publicly accessible, we can use, share and modify open source tools. Even though they are free, some coding skill may be needed to work with them (Alley, 2018).
- Cloud-based tools: These are designed to migrate data between various sources and destinations, including on-premise and cloud-based data stores and services. This option is optimal for companies planning a move to the cloud, as it is cost efficient and comes with stronger security policies. Cloud-based migration tools are flexible and can handle all data at once (Alley, 2018).
1.2 Scope of Research
The scope of our research is to migrate a dataset using open-source migration tools. Almost all business enterprises use different types of datasets to store their business-related information. Migrating to the cloud makes it easy to scale computing resources up and down according to need, and thereby reduces cost.
1.3 Research Question and Solution
How can we securely migrate datasets from on-premise to cloud platform?
We migrate datasets into the AWS cloud with the help of open-source migration tools.
1.4 Aim and Objectives
Our aim is to securely migrate datasets from on-premise infrastructure to the AWS cloud using open-source cloud migration tools.
Our objectives are:
- Increase computing resource scalability
- Reduce the complexity of IT infrastructure
- Ensure business continuity
- Speed up application or service deployment
- Reduce cost
1.5 MOV and Added Value(s)
The Measurable Organizational Value (MOV) of our project applies to business organizations that deal with different datasets for their business requirements.
The following are our added values:
- Migrating to cloud platform will help them to reduce cost while taking advantage of scalability and flexibility.
- With cloud migration business organizations get the chance to expand their business easily and quickly without investing in servers, firewalls and technical staff.
- It will simplify disaster recovery and business continuity for business organizations.
2. Literature Review
The migration of business elements to the cloud can be accomplished through well-organised models and strategies. In line with the organization's policies, the data migration model should involve specific objectives. The things to keep in mind while performing data migration are data accuracy, migration speed, downtime and minimum cost (Marinela MIRCEA, 2011).
Using a cloud business intelligence solution requires changes to process handling, such as the processing of data, development, storage, archiving of data, receipt of information and saving of data. While migrating important applications and infrastructure to the cloud, business activities should be supported by human resources managing the sensitive data and applications. Finally, after the implementation, every user must be trained to operate the new systems (Marinela MIRCEA, 2011).
Figure 1 : Migration of data to the Cloud (Marinela MIRCEA, 2011)
2.1 Open Source Tools for data Migration
2.1.1 Bluethroat
Bluethroat is an open source cloud migration tool built by a seasoned developer whose main aim was to make customers and businesses happier. The tool performs lift-and-shift migration: it can migrate a fleet of thousands of servers to the cloud in just a few clicks. It is still at an alpha stage and has not been tested for mass-scale migration. With Bluethroat, migration can be done from DC to DC, DC to cloud, cloud to DC or cloud to cloud, and the tool is accessed from a web portal (Vishnu, 2019).
Its functionalities are limited to:
- Migration of data to AWS.
- Discovery of the current environment.
- Automatic creation and deployment of servers.
- Windows VM migration.
All services and payloads are written in Python, Ansible is used to prepare the servers for migration, and MongoDB is used to collect information about the servers and the status of the whole process. The Bluethroat tool supports DC-to-AWS migration (Vishnu, 2019).
2.1.2 Pentaho
Pentaho is a migration tool that can securely migrate data with minimal effort. Pentaho Data Integration (PDI), also known as Kettle, creates ETL (extraction, transformation and loading) processes effortlessly and securely. One of the purposes of PDI is migrating data between databases or applications, whether combining numerous solutions into one or moving to a newer IT solution. Kettle makes the transformation process easier and safer: first it extracts the data from the source system, then it transforms the data into the new format, and finally it loads the data into the required destination (Pentaho, 2018).
This tool not only offers safety and minimal effort; there are other reasons that make data migration with Pentaho easy in terms of data, source and destination. It provides graphical support for the data pipeline and offers many sources, such as big data stores and relational sources, for extracting and blending data. It also provides automated orchestration for the transformation and visualization of data. Pentaho additionally offers an advanced big data integration edition, which can eliminate the need to write scripts by hand (Pentaho, 2018).
2.1.3 Rivermeadow
Figure 4: Rivermeadow (Rivermeadow, n.d.)
Rivermeadow cloud migration software provides a solution to the problems that occur when migrating large and complex workloads between hypervisors and cloud platforms. It uses Software as a Service (SaaS) to provide fast, secure and automated migration. Using APIs, Rivermeadow moves existing servers into public, private or hybrid clouds without disturbing ongoing work. Through automation, Rivermeadow enables migration of multiple servers, so that large workloads can be migrated at the same time, at lower cost and with less migration time (Rivermeadow, 2018).
Many enterprises use Rivermeadow's Software as a Service (SaaS) for complex migrations from different sources into a low-cost cloud platform. With the Secure Data Migration (SDM) feature, APIs create a connection from source to destination, ensuring efficient, secure and high-quality migrations without affecting business continuity (Rivermeadow, 2018).
Benefits of using Rivermeadow:
- Easy to use; testing and migration can be done through the GUI.
- Reduced manual configuration because of automation.
- No vendor lock-in or PaaS dependency.
- Users can migrate to the cloud whenever they need.
- The source server is not impacted, so there are no reboots or agents.
2.1.4 AWS Command Line Interface
The AWS CLI is a tool that allows developers to manage and control Amazon public cloud services through command scripts. If the user knows the right command, a task such as searching for a file takes just a few seconds. With the AWS CLI, a user can manage and control all AWS services from the command line. Moreover, the user can automate the entire process of managing services by writing scripts in a preferred programming language. The AWS CLI makes AWS services easier to use.
The AWS CLI gives the user three options:
1. Linux shell.
2. Windows command line.
3. Remote access.
Benefits of the AWS CLI:
- Easy installation.
- Supports all AWS services.
- Saves a lot of time.
- Automation by scripting.
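The "automation by scripting" benefit can be illustrated with a short sketch. The snippet below composes an AWS CLI command from Python and could hand it to subprocess; the bucket name and file path are hypothetical placeholders, not values from this project.

```python
import subprocess

def build_s3_upload_command(local_path: str, bucket: str, key: str) -> list:
    """Compose an 'aws s3 cp' command as an argument list for subprocess."""
    return ["aws", "s3", "cp", local_path, f"s3://{bucket}/{key}"]

cmd = build_s3_upload_command("dataset.csv", "my-import-bucket", "datasets/dataset.csv")
print(" ".join(cmd))
# To actually run it (requires the AWS CLI installed and configured):
# subprocess.run(cmd, check=True)
```

Building commands as argument lists rather than concatenated strings avoids shell-quoting problems when file names contain spaces.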
2.1.5 Comparison Table
| Open Source Migration Tool | BlueThroat | Pentaho | Rivermeadow | AWS CLI |
| --- | --- | --- | --- | --- |
| Functionalities | Migration of data to AWS; environment discovery; Linux server migration | ETL (extraction, transformation, loading) process; migrating data between databases or applications | Migration of large and complex workloads into a cloud environment; fast, secure, automated cloud solution | Manage AWS services; control multiple AWS services; automate services through scripts |
| Features | Accessed and used from a web portal | Access, manage and blend any type of data from any source; graphical support for the data pipeline; big data integration | Secure Data Migration (SDM) method | Simple file commands to and from Amazon S3; dynamic in-line documentation; export of executed commands to a text editor |
| Benefits | Automated network creation; server deployment; agentless; suitable for infrastructure migration | Safety of data; minimal effort; pre-built components for extraction and blending of data; workflow and job execution management | Fast onboarding; low cost; low risk | Easy installation; supports all AWS services; saves a lot of time; automation by scripting |
| Resources | Python; Ansible; MongoDB | Kettle | SaaS | N/A |
Table 1: Open Source Tools comparison
2.1.6 Features Comparison
| Features | BlueThroat | Pentaho | AWS CLI | Rivermeadow |
| --- | --- | --- | --- | --- |
| Supported cloud | AWS, Google Cloud | AWS, Microsoft Azure, Google Cloud, IBM Cloud | AWS | Microsoft Azure, AWS, VMware |
| Data migration | Yes | Yes | Yes | Yes |
| SDM (Secure Data Migration) | No | No | No | Yes |
| OLAP (Online Analytical Processing) | No | Yes | No | No |
| Server migration | Yes | Yes | Yes | Yes |
| Dashboard | Yes | Yes | No | Yes |
| Services available | Migration of infrastructure to AWS, Linux server migration, automatic network creation | Migrating data between application and database, big data integration, analytic database, 24/7 technical support | Control of many AWS services | Testing and migration process |
| Pricing | Most migration services are free | Most migration services are free | Migration services are free | Most migration services are free |
Table 2 : Features of Opensource Migration Tools
3. Research Methodology
We researched and discussed four different open-source cloud migration tools, compared them, and selected the one most suitable for us: the AWS Command Line Interface (CLI). Even though Bluethroat, Pentaho and Rivermeadow are open-source migration tools, they charge for most of their services, which is our main reason for choosing the AWS CLI. The AWS CLI is free of cost; after installing it on our machine, we can run scripts from the command prompt to manage and configure multiple AWS services (AWS CLI, n.d.).
4. Design and Analysis
Figure 6 : Current Method of Migration
- Identify the tool for data migration.
- Download the tool for migration.
- Install and configure the tool in windows.
- Download the dataset from Kaggle.com.
- Configure a Windows virtual machine.
- Insert the datasets into the windows virtual machine.
- Migrate the Windows virtual machine to AWS Cloud.
- Users can access this machine from any region.
We installed and configured the AWS CLI on our Windows machine. To confirm the installation, we executed the aws --version command in the command prompt. We then ran scripts in the command prompt to connect our PC to AWS. Next, a Windows 7 operating system was configured in Oracle VM VirtualBox. From Kaggle.com, we downloaded two datasets of 2.26 MB (Brazilian Cities) and 547 MB (Zomato Bangalore Restaurants). The next step was to place the two datasets inside the Windows 7 virtual machine. Finally, we successfully migrated the Windows 7 virtual machine to the AWS cloud using the AWS CLI tool. Any user can sign in to AWS in the Oregon or Singapore regions and access this Windows 7 machine along with the datasets.
5. Resources Used
In this section, we explain the resources used and needed for the project.
5.1 AWS
AWS is a global cloud platform that allows users to host and manage services on the internet. As a hosting provider it offers many services; commonly used ones are EC2 (Elastic Compute Cloud), VPC (Virtual Private Cloud), S3 (Simple Storage Service), RDS (Relational Database Service), Route 53, ELB (Elastic Load Balancing) and Auto Scaling.
5.2 Office 365
For presentations, documentation, graphs and the Gantt chart we use software such as Excel, Word and PowerPoint. All of this Microsoft software is crucial to our project, as we rely on it heavily.
5.3 WhatsApp
WhatsApp is how we communicate with each other and share files.
Figure 9 : WhatsApp (WhatsApp, n.d.)
5.4 Google Docs
Google Docs is a word processor service provided by Google. Through Google Docs, we edit our files and see the changes instantaneously.
Figure 10: Google Docs (Google Docs, n.d.)
5.5 Oracle VM VirtualBox
Oracle VM VirtualBox is a free application for creating and managing virtual machines. It allows the user to create more than one virtual machine on a single physical machine, and each virtual machine can have its own operating system. VirtualBox runs on Windows and Linux and supports many guest operating systems. In this project, we are using Oracle VM VirtualBox 6.0.
Figure 11: Oracle VM VirtualBox (Oracle VM Virtualbox, n.d.)
5.6 AWS CLI
The AWS Command Line Interface is a tool for managing AWS services. With the AWS CLI, a developer can manage and control multiple AWS services by typing commands on the command line and automate them using scripts.
Figure 12: AWS CLI (AWS CLI, n.d.)
6. Project Risks
| Risk | Risk description | Type | Impact | Likelihood | Mitigation strategy |
| --- | --- | --- | --- | --- | --- |
| Scheduling/time management | Time management problems lead to the project not being finished on time | External | High | Low | Arrange regular meetings; create a Gantt chart |
| Communication risk | Improper communication can create confusion, which indirectly affects the project scope | External | High | Low | Direct face-to-face meetings, phones and emails |
| Scope risk | If the scope is not clear it will lead to failure of the project | External | Medium | Medium | Discussed the project scope in depth in group meetings and planned accordingly |
| Hardware and software failure | All work is done on computers and with software; a fault in either may result in project failure | Internal | High | High | Keep a backup after completing each stage |
| Extended downtime risk | Occurs when the data migration takes more time than expected | Internal | Low | Medium | Used a faster internet connection |
6.1 Scheduling/Time management
Our plan is to complete the project within eight weeks, but schedule risk means activities may take more time than expected. Completing the project in eight weeks is a challenge, as some tasks are time-consuming. To manage all the tasks, we hold regular meetings to discuss the project and how to complete every task successfully within eight weeks. We also created a Gantt chart detailing the plan and assigning each task to a team member, and we work according to that chart so that the project is completed within the given deadline.
6.2 Communication Risk
Lack of communication causes a lack of clarity and confusion, which can lead to unnecessary delays; it also affects the scope indirectly. We faced this issue initially because we are doing our industrial internships alongside this project and therefore have different schedules and availability. To eliminate this risk, our team leader organised face-to-face meetings, and we contact each other through email and phone.
6.3 Scope Risk
The scope is a vital part of the project, so it is very important to understand it. Without clarity regarding the scope, misunderstandings between group members can result in failure of the project. In group meetings, we discuss the scope thoroughly and work according to it.
6.4 Hardware and Software Failure
This is an internal risk. The overall plan can be affected by any hardware or software failure: all the work in this project involves computers, and a fault in any of them can lead to project failure. To mitigate this, we always back up our work after each stage; with backups, we can eliminate this risk to some extent.
6.5 Extended Downtime
This risk occurs when the data migration process takes more time than expected, which may affect the project indirectly.
7. Planning
7.1 Gantt Chart
8. Implementation & Testing
Prerequisite: Download, install and configure AWS Command Line Interface tool.
- Export the VM from its current environment as an OVA file (or VMDK, VHD, or RAW).
- Create an S3 bucket and upload the VM image along with the dataset, using the S3 console upload/drag-and-drop or the AWS CLI.
- Import the VM and dataset using the ec2 import-image command.
- Use the ec2 describe-import-image-tasks command to monitor the import progress.
- Once the import is completed, launch an EC2 instance from the AMI created, or copy the AMI to other regions.
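The steps above can be sketched as an ordered list of CLI commands. This is an illustrative outline scripted from Python; the bucket name, OVA file, task ID and AMI ID are hypothetical placeholders.

```python
# Sketch of the migration workflow as AWS CLI command strings.
# All names (bucket, files, IDs) are illustrative placeholders.
bucket = "my-import-bucket"
ova_key = "vms/my-windows-vm.ova"

workflow = [
    f"aws s3 mb s3://{bucket}",                                 # 1. create the bucket
    f"aws s3 cp my-windows-vm.ova s3://{bucket}/{ova_key}",     # 2. upload the OVA + dataset
    "aws ec2 import-image --disk-containers file://containers.json",               # 3. start import
    "aws ec2 describe-import-image-tasks --import-task-ids import-ami-abcd1234",   # 4. monitor
    "aws ec2 run-instances --image-id ami-12345678 --instance-type t2.micro",      # 5. launch
]

for step, cmd in enumerate(workflow, 1):
    print(f"Step {step}: {cmd}")
```

In practice each command would be run (and its output checked) before moving to the next step.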
VM Export and Import
- VM Import/Export enables you to import virtual machine images from an existing virtualization environment to Amazon EC2, and to export them back.
- It enables you to migrate applications and workloads to Amazon EC2.
- You can copy a VM image catalogue to Amazon EC2, or create a repository of VM images for backup and disaster recovery.
AWS Dataset and VM import export Commands on CLI tool:
Step 1: Download and configure CLI tool with AWS Access keys
Figure 15: Configuring AWS CLI
Step 2: Exported VM OVA file and Create Scripting file for migration:
Step 3: On premise VM with Dataset
Step 4: Create the service role
1. Create a file named trust-policy.json with the following policy (AWS, 2019):
Figure 19: Creating trust-policy.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "vmie.amazonaws.com" },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:Externalid": "vmimport"
        }
      }
    }
  ]
}
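Hand-editing this file in a word processor risks "smart quotes" that break JSON parsing, so the trust policy can also be generated programmatically. A minimal sketch:

```python
import json

# The trust policy required by VM Import/Export, as a Python dict.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "vmie.amazonaws.com"},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:Externalid": "vmimport"}},
        }
    ],
}

# json.dump always emits plain ASCII double quotes, so the file is valid JSON.
with open("trust-policy.json", "w") as f:
    json.dump(trust_policy, f, indent=2)
```

The resulting file can be passed directly to aws iam create-role with file://trust-policy.json.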
2. The vmimport role gives access to Data and VM Import/Export:
aws iam create-role --role-name vmimport --assume-role-policy-document file://trust-policy.json
Figure 20: Giving access to Data & VM Import/Export
3. Create a file named role-policy.json for the bucket where the disk images are stored (AWS, 2019):
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::disk-image-file-bucket",
        "arn:aws:s3:::disk-image-file-bucket/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "ec2:ModifySnapshotAttribute",
        "ec2:CopySnapshot",
        "ec2:RegisterImage",
        "ec2:Describe*"
      ],
      "Resource": "*"
    }
  ]
}
Attach the policy with the put-role-policy command:
aws iam put-role-policy --role-name vmimport --policy-name vmimport --policy-document file://role-policy.json
4. The following AWS CLI command imports the image and creates the import task (AWS, 2019).
Import an OVA and dataset
Example:
aws ec2 import-image --description "Windows 2008 OVA" --license-type <value> --disk-containers file://containers.json
Creating the containers.json file:
For a single OVA image:
[
  {
    "Description": "Windows 2008 OVA",
    "Format": "ova",
    "UserBucket": {
      "S3Bucket": "my-import-bucket",
      "S3Key": "vms/my-windows-2008-vm.ova"
    }
  }
]
For multiple disk images (for example, two VMDK disks):
[
  {
    "Description": "First disk",
    "Format": "vmdk",
    "UserBucket": {
      "S3Bucket": "my-import-bucket",
      "S3Key": "disks/my-windows-2008-vm-disk1.vmdk"
    }
  },
  {
    "Description": "Second disk",
    "Format": "vmdk",
    "UserBucket": {
      "S3Bucket": "my-import-bucket",
      "S3Key": "disks/my-windows-2008-vm-disk2.vmdk"
    }
  }
]
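The containers.json entries all share one shape, so a small helper can build them and keep the quoting valid. This is an illustrative sketch; the helper name and its format check are our own, not part of the AWS CLI.

```python
import json

def disk_container(description: str, fmt: str, bucket: str, key: str) -> dict:
    """Build one containers.json entry for ec2 import-image."""
    # Formats listed in this section; the check itself is our assumption.
    assert fmt in {"ova", "vmdk", "vhd", "raw"}, "unsupported image format"
    return {
        "Description": description,
        "Format": fmt,
        "UserBucket": {"S3Bucket": bucket, "S3Key": key},
    }

containers = [
    disk_container("Windows 2008 OVA", "ova",
                   "my-import-bucket", "vms/my-windows-2008-vm.ova")
]
print(json.dumps(containers, indent=2))
```

Dumping the list with json.dumps produces exactly the file expected by --disk-containers file://containers.json.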
Check the status of the import task:
aws ec2 describe-import-image-tasks --import-task-ids import-ami-abcd1234
Choosing a region:
aws ec2 describe-conversion-tasks --region <region>
Step 5: Migration Status
- Converting
- Updating
- Booting
- Preparing AMI
- Completed
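Because describe-import-image-tasks returns JSON, the status above can be extracted in a script rather than read by eye. The sample response below is illustrative, shaped like the CLI output but not captured from our migration.

```python
import json

# Illustrative response shaped like 'aws ec2 describe-import-image-tasks' output.
sample = json.loads("""
{
  "ImportImageTasks": [
    {
      "ImportTaskId": "import-ami-abcd1234",
      "Status": "active",
      "StatusMessage": "preparing ami",
      "Progress": "70"
    }
  ]
}
""")

def import_status(response: dict) -> str:
    """Return 'Status (StatusMessage)' for the first import task."""
    task = response["ImportImageTasks"][0]
    msg = task.get("StatusMessage", "")
    return f"{task['Status']} ({msg})" if msg else task["Status"]

print(import_status(sample))  # → active (preparing ami)
```

In a real script the JSON would come from running the CLI command and parsing its stdout; polling this value until it reads "completed" automates Step 5.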
Step 6: AMI created and Snapshots
Step 7: EC2 instance in Oregon, and copies of the AMI in Singapore and Sydney
Figure 31: Windows 7 running in cloud along with datasets
Figure 32: Copy of AMI in Singapore Region
9. Future Work
The design above shows our future plan for migrating datasets to AWS using the AWS Storage Gateway. Storage Gateway is a tool that creates a storage system connecting an on-premises software environment with cloud-based storage, making it easy to securely back up enterprise data to the cloud. A user can back up snapshots of on-premises application data to Amazon S3 and, for replication, mirror data from an on-premises location to applications running on Amazon EC2. Storing data in the cloud means no more expensive hardware to buy and no more complex configuration and management. Without requiring any changes to applications, it backs application data up to Amazon S3 and provides low latency for frequently used data. It offers identity management, storage services and AWS encryption, which make the existing enterprise more manageable, durable, scalable and secure. Typical use cases are disaster recovery, backup and archiving, and transferring data to S3. Although migration through AWS Storage Gateway is easy, fast and secure, we did not choose it for the dataset migration because it is not cost effective for us: data written to AWS storage through the user's gateway is charged at around $0.01 per GB, and volume storage as well as snapshot storage in EBS costs about $0.023 per GB.
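The cost figures above can be turned into a rough estimate. The sketch below uses the per-GB prices quoted in this section ($0.01/GB written through the gateway, $0.023/GB for EBS volume and snapshot storage); actual AWS pricing varies by region and over time.

```python
def gateway_monthly_cost(gb_written: float, gb_stored: float,
                         write_rate: float = 0.01,
                         storage_rate: float = 0.023) -> float:
    """Rough monthly cost: data written through the gateway plus EBS storage."""
    return gb_written * write_rate + gb_stored * storage_rate

# e.g. writing 100 GB through the gateway and keeping 100 GB stored in EBS:
print(f"${gateway_monthly_cost(100, 100):.2f}")  # → $3.30
```

Even this rough model shows the per-GB charges that made Storage Gateway less cost effective for us than the free AWS CLI route.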
Working of AWS Storage Gateway:
1. The application reads data from the gateway using the iSCSI block protocol.
2. The gateway returns the requested data from local storage.
3. Data not in local storage is requested from the backend.
4. The backend fetches the compressed data from Amazon EBS.
5. EBS returns the requested data to the gateway.
6. The gateway returns the requested data to the VM.
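The read path above amounts to a cache-aside lookup: the gateway serves from local storage and falls back to the backend (EBS) on a miss. The sketch below is an illustrative model of that flow, not the actual gateway implementation.

```python
# Illustrative model of the gateway read path (cache-aside).
backend = {"block-1": b"data-1", "block-2": b"data-2"}  # stands in for Amazon EBS
local_cache = {"block-1": b"data-1"}                    # gateway local storage

def gateway_read(block_id: str) -> bytes:
    """Serve from local storage; on a miss, fetch from the backend and cache it."""
    if block_id in local_cache:
        return local_cache[block_id]      # steps 1-2: hit in local storage
    data = backend[block_id]              # steps 3-5: fetch via backend from EBS
    local_cache[block_id] = data          # keep it local for subsequent reads
    return data                           # step 6: return to the VM

print(gateway_read("block-2"))  # miss: fetched from the backend, then cached
```

After the first miss, repeated reads of the same block are served locally, which is where the low latency for frequently used data comes from.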
10. Conclusion
We have successfully migrated the datasets, along with the operating system, to the AWS cloud using the AWS Command Line Interface tool. As future work, we plan to improve the migration using AWS Storage Gateway, a hybrid storage service that can be used for archiving and backup, disaster recovery, cloud data processing, migration and storage tiering.
11. References
- Alley, G. (2018, October). Data Migration Tools. Retrieved from Alooma: https://www.alooma.com/blog/data-migration-tools
- AWS. (n.d.). Retrieved from AWS-logo: https://aws.amazon.com/partners/logo-guidelines/
- AWS CLI. (n.d.). Retrieved from AWS: https://aws.amazon.com/cli/
- AWS CLI. (n.d.). Retrieved from https://manjaro.site/install-aws-cli-ubuntu-16-04/
- Google Docs. (n.d.). Retrieved from https://www.google.com/search?q=google+docs+logo&tbm=isch&source=iu&ictx=1&fir=h0tVimnA0CxqLM%253A%252C_C5qxEOFyasdxM%252C_&vet=1&usg=AI4_-kSZJoloY4DRhwJ6v9NuMUkBCdY7vA&sa=X&ved=2ahUKEwi_qcjbyobjAhUFdCsKHTETDVQQ9QEwAXoECAQQBg#imgrc=h0tVimnA0CxqLM:
- Marinela MIRCEA, P. B.–M. (2011). COMBINING BUSINESS INTELLIGENCE WITH CLOUD COMPUTING TO DELIVERY AGILITY IN ACTUAL ECONOMY.
- Oracle VM Virtualbox. (n.d.). Retrieved from https://blogs.oracle.com/virtualization/oracle-vm-virtualbox-60-now-available
- Pentaho. (2018, February). Retrieved from Alooma: https://www.alooma.com/answers/what-is-pentaho-data-integration
- Rivermeadow. (n.d.). Retrieved from https://www.rivermeadow.com/migrating-workloads-to-aws
- Rivermeadow. (2018). Retrieved from Rivermeadow: https://www.rivermeadow.com/about
- Vishnu, K. (2019, March 16). Codementor. Retrieved from Codementor: https://www.codementor.io/vishnu_ks/how-and-why-i-built-bluethroat-an-open-source-cloud-migration-tool-t2z8vpzl4
- WhatsApp. (n.d.). Retrieved from https://whatsappbrand.com/