Technical Proposal of Hosted Document Management System


This essay has been submitted by a student.

Newgen will provide a hosted document management system (DMS) as a SaaS offering by hosting OmniDocs DMS on the Amazon Web Services (AWS) platform. The hosted solution will be a scalable, multitenant DMS that can serve the enterprise content management needs of multiple customers. Other key features of the hosted document management service will be cost effectiveness, fast implementation and a global presence in the US, Europe and APAC.

Newgen's Value Proposition

Robust, scalable, enterprise-scale deployment suited to complex, mission-critical operations

Hosting on the Amazon cloud for highly secure, scalable and reliable operation

Multitenant support through OmniDocs logical cabinets, which keep each customer's data separate in its own database and its own Image File Store location on the file system

Production scanning capabilities for very large image volumes, with image enhancement, image indexing and annotation support

WebScan support for low-volume scanning with minimal client desktop installation

Comprehensive product suite for meeting end to end business requirements

Strong integration support for third-party applications through Web Services, XML-based interfaces and Java APIs

A Newgen product suite built on an open, flexible architecture that supports open source applications and databases

A highly cost-effective combined offering of Newgen products hosted on the Amazon cloud

About Newgen

Newgen Software Technologies Limited, incorporated in India in 1992, is an SEI CMM Level 3 certified and ISO 9001 compliant organization. It is also the first Indian software house to have its document management and workflow solutions certified by SAP. Today Newgen is a reputed global software development and consulting company operating out of nine offices in India, with a wholly owned subsidiary in the USA and offices in the UK. Newgen has over a decade of proven expertise in:

Enterprise Document Management Solutions

Business Process Management


IT consulting in the areas of Business Process Reengineering, Cross Functional Analysis etc.

Newgen Software Technologies Limited has established itself as a focused, global software product firm in the Enterprise Content Management and Business Process Management space. Frost & Sullivan has rated Newgen as holding the largest share (39.5%) of the ECM and workflow market in India. Newgen is also among the few companies in the world to feature in the Gartner Magic Quadrant for both ECM and BPM.

The company has four software development centers: three in New Delhi and one in Chennai. The development centers are seamlessly linked to its facilities in the US and other parts of the world. In the past 16 years the company has invested more than Rs. 700 million in developing business process and document management technologies. Newgen has over 700 software installations worldwide to its credit.

Newgen has a large pool of Internet technology consultants, digital strategists, domain experts, enterprise architects and creative and cognitive experts, based in India and other parts of the world, who are equipped to provide its clients with cost-effective, high-quality solutions.

About Amazon Web Services

Amazon Web Services (AWS) is a scalable, reliable Infrastructure as a Service (IaaS) offering that is SAS 70 compliant. AWS delivers a set of web-based computing and storage services that together form the Amazon cloud. The Amazon SLA guarantees 99.95% availability of the Amazon cloud service. AWS has a global presence and is currently available in four regions: US East (Northern Virginia), US West (Northern California), EU (Ireland) and Asia Pacific (Singapore).

Key Functional Requirements

The Customer will primarily upload document images through a scanning and indexing application such as OmniScan or WebScan. Documents can also be captured through Mail Capture or Fax Capture. The documents are then uploaded into the OmniDocs or OmniFlow repository. Newgen's upload applications will generate an acknowledgment for each document upload operation, indicating success or error along with the exception status.

Integration touch points with any third-party workflow in use will be identified, and Newgen upload components will be configured or customized accordingly. The document upload framework described above will be studied in more detail during implementation to design the exact upload methodology the Customer requires.
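The per-document acknowledgment described above could take many forms; the proposal only says elsewhere that Mail Capture emits its acknowledgment as XML. As a minimal sketch, assuming a hypothetical XML layout (the `UploadAcknowledgment`, `Document` and `Exception` element names are illustrative, not Newgen's actual schema), one acknowledgment per upload batch might be built like this:

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass
class UploadResult:
    """Outcome of one document upload (hypothetical structure)."""
    document_id: str
    success: bool
    exception: Optional[str] = None  # exception status when the upload failed


def build_acknowledgment(batch_id: str, results: list[UploadResult]) -> str:
    """Return an XML acknowledgment with one <Document> entry per upload."""
    root = ET.Element("UploadAcknowledgment", batchId=batch_id)
    for r in results:
        doc = ET.SubElement(root, "Document", id=r.document_id,
                            status="SUCCESS" if r.success else "ERROR")
        if r.exception:
            # Attach the exception status only for failed uploads.
            ET.SubElement(doc, "Exception").text = r.exception
    return ET.tostring(root, encoding="unicode")


ack = build_acknowledgment("B001", [
    UploadResult("DOC-1", True),
    UploadResult("DOC-2", False, "Index field 'InvoiceNo' missing"),
])
print(ack)
```

Keeping one entry per document, rather than a single batch status, lets a partially failed batch be retried document by document.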

AWS Terminologies

Amazon Region:

Amazon lets customers deploy and access server instances in multiple geographic locations called Regions. Regions are geographically dispersed, each in a separate geographic area or country, and each consists of multiple Availability Zones. The Amazon Service Level Agreement commits to 99.95% availability for each Amazon Region. AWS is currently available in four Regions: US East (Northern Virginia), US West (Northern California), EU (Ireland) and Asia Pacific (Singapore).

Amazon Availability Zone:

Amazon Availability Zones are distinct data centers at different locations within the same Region. They are engineered to be insulated from failures in other Availability Zones and provide inexpensive, low-latency network connectivity to the other Availability Zones in the same Region. By launching instances in separate Availability Zones, applications can be protected from the failure of a single location.
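To illustrate how spreading instances across Availability Zones protects against the failure of a single location, the sketch below assigns instances to zones round-robin and lists the instances that survive if one zone is lost. The instance names are hypothetical and the zone names only follow the AWS naming style; this is a planning aid, not a call to the EC2 API.

```python
from collections import Counter
from itertools import cycle


def spread_across_zones(instances: list[str], zones: list[str]) -> dict[str, str]:
    """Assign instances to Availability Zones in round-robin order."""
    if not zones:
        raise ValueError("at least one Availability Zone is required")
    return dict(zip(instances, cycle(zones)))


def survivors_after_zone_failure(placement: dict[str, str], failed_zone: str) -> list[str]:
    """Instances still running if one whole Availability Zone is lost."""
    return [name for name, zone in placement.items() if zone != failed_zone]


placement = spread_across_zones(["web-1", "web-2", "web-3", "web-4"],
                                ["us-east-1a", "us-east-1b"])
print(Counter(placement.values()))
print(survivors_after_zone_failure(placement, "us-east-1a"))
```

With two zones, losing either one still leaves half of the instances running, which is the property the Availability Zone design is meant to provide.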

Amazon EC2:

Amazon EC2 (Elastic Compute Cloud) is the standard cloud server unit in AWS; each instance provides CPU, RAM and some non-persistent storage space.

Amazon EBS Volumes:

Amazon EBS (Elastic Block Store) volumes will hold the database files and will also serve as boot drives for EC2 instances. EBS volumes will thus provide the disk storage for the OS, other installed applications, database hosting and SMS (Storage Management Server) hosting. The Image Store will also be created on EBS volumes, on which a file system can be installed.

Amazon S3:

Amazon S3 (Simple Storage Service) provides scalable, key-based object storage organized into buckets. It will be used to store EBS snapshots, which can then be used to restore EBS volumes.

Amazon Elastic Load Balancer:

Amazon Elastic Load Balancer can automatically distribute requests across multiple instances in multiple Availability Zones within a Region, providing load balancing and failover, including failover to a DR site located in another Availability Zone.

Amazon Elastic/Public IPs:

Amazon Elastic IP addresses are static IP addresses designed for dynamic cloud computing. They allow instance or Availability Zone failures to be masked by quickly remapping the public IP address to a replacement instance.

Key Non-Functional Requirements

Basic Cloud Characteristics:

The AWS cloud is available across three continental regions: the US, Europe and APAC. The first deployment will be in the US region, followed by deployments on the other continents. AWS is SAS 70 compliant. Amazon EC2 cloud servers and storage volumes are highly scalable, so both storage capacity and server processing power can be increased as and when required. AWS cloud hosting is reliable, with built-in redundancy for storage volumes provided by the hosting provider.

AWS provides the required security features, as described in the AWS Security Whitepaper. Cloud server instances can easily be launched and relaunched, and Amazon CloudWatch provides monitoring support for them. The AWS monitoring framework gives a continuous heartbeat view of the hosted servers to watch for latency or performance issues. AWS offers usage-based pricing billed monthly, which suits a SaaS subscription model. A technical support model with defined SLAs is provided for maintaining the hosted servers.

Network Connectivity Capabilities:

Customers can access the hosted cloud servers over HTTPS and VPN. HTTP with SSL (HTTPS) will be used whenever the application is accessed over the internet. VPN-Cubed, a product recommended by AWS, will be used to provide an overlay network so that the VPN device on the cloud server side is accessible to multiple VPN devices at the various customers' endpoints. An MPLS connection to AWS is currently available only if the Customer can terminate its existing MPLS infrastructure connection at Equinix (Virginia).

Shared & Dedicated Hosting:

Customers can be offered DMS hosting in a shared environment, in which one hosted environment is shared among a set of customers. Alternatively, a dedicated hosted environment exclusive to each customer can be provided. Large customers with heavy loads may have dedicated server instances in a separate hosted environment. These dedicated server instances will also be accessible to the customers through MPLS/VPN.

Query to Customer: Will there be a separate complete replica of the hosting servers in the dedicated scenario? Will separate Oracle server instances be set up for the dedicated hosting environment? Will there be a separate DR environment for each dedicated hosting setup? Do we need to provide an Active-Active cluster for the hosted Web/App servers and the Oracle/SMS servers in the dedicated hosting scenario?

Active-Active & Active-Passive Cluster:

Newgen's enterprise architecture supports clustering at the application server, database server and storage management server layers. Amazon Elastic Load Balancer will be used to register the various sets of servers and balance incoming requests among them. The load balancer will give access to a set of Web/App servers in Active-Active mode. The DB/SMS servers will be deployed in Active-Passive mode, which provides failover only: requests are sent to the second DB/SMS server only when the primary server is not working.
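The difference between the two cluster modes can be sketched as two selection policies: Active-Active rotates requests over every healthy member, while Active-Passive always prefers the primary and falls back to the standby. This is a simplified model assuming a hypothetical `is_healthy` health-check callback; in the proposed design the selection is performed by the load balancer and by Oracle Data Guard, not by code like this.

```python
from itertools import cycle
from typing import Callable


class ActiveActivePool:
    """Round-robin over all healthy members (Web/App tier)."""

    def __init__(self, members: list[str], is_healthy: Callable[[str], bool]):
        self.members = members
        self.is_healthy = is_healthy
        self._ring = cycle(members)

    def pick(self) -> str:
        # Try each member at most once per request, skipping unhealthy ones.
        for _ in range(len(self.members)):
            candidate = next(self._ring)
            if self.is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy member available")


class ActivePassivePair:
    """Primary handles everything; the standby is used only when the primary is down (DB/SMS tier)."""

    def __init__(self, primary: str, standby: str, is_healthy: Callable[[str], bool]):
        self.primary, self.standby, self.is_healthy = primary, standby, is_healthy

    def pick(self) -> str:
        if self.is_healthy(self.primary):
            return self.primary
        if self.is_healthy(self.standby):
            return self.standby
        raise RuntimeError("both primary and standby are down")


down: set[str] = set()


def healthy(name: str) -> bool:
    return name not in down


web = ActiveActivePool(["app-1", "app-2"], healthy)
db = ActivePassivePair("db-primary", "db-standby", healthy)
print([web.pick() for _ in range(4)], db.pick())
down.update({"app-1", "db-primary"})
print([web.pick() for _ in range(2)], db.pick())
```

In the second call, all Web/App traffic moves to the surviving member, and database traffic moves to the standby only because the primary is marked down.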

Disaster Recovery Site:

The DR site can be created in another AWS Region or in another Availability Zone. A DR site in another Region offers more protection against a disaster affecting a wide area, but adds latency when transferring data between the primary and DR sites.

A DR site in another AWS Region or Availability Zone will be provided for each hosting instance in each of the continental regions mentioned above. Configuration files, database files and image files will be synchronized between the primary site and the DR site in real time, so that whenever failover happens the DR site can take over and handle requests immediately. If the DR site is in another Availability Zone of the same Region, Amazon Elastic Load Balancer can perform the failover automatically by switching from the primary servers to the DR servers; because a load balancer operates within a single Region, a DR site in another Region requires DNS-based failover instead.

Query to Customer: What is the expected time for the DR site to become operational and handle the load (recovery time objective)? Should the DR site be created in another Amazon Region or in another Availability Zone? A DR site in another Region provides greater geographic separation but increases the time taken to synchronize the primary and DR sites.

Incremental Snapshot Backups to Cloud Backup Storage:

Our document management software stores application configuration files, metadata in SQL databases such as Oracle, and images in a proprietary format on the file system. We will therefore schedule regular backups of the configuration files, database files and stored image files. A scheduler framework will take regular incremental, snapshot-based backups of all EBS storage in the primary site to S3 storage in the primary site.
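Incremental snapshot backups store only the blocks that changed since the previous snapshot, and a restore replays the chain of snapshots. The following is a simplified in-memory model of that idea; the `Snapshot` structure is hypothetical, and real EBS snapshots are managed by AWS, which also allows older snapshots to be deleted without breaking newer ones.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Snapshot:
    parent: Optional["Snapshot"]                               # previous snapshot in the chain
    changed: dict[int, bytes] = field(default_factory=dict)    # block index -> block data


def restore(snapshot: Optional[Snapshot], size: int) -> list:
    """Rebuild a volume by applying every snapshot from oldest to newest."""
    chain = []
    while snapshot is not None:
        chain.append(snapshot)
        snapshot = snapshot.parent
    volume = [None] * size
    for snap in reversed(chain):
        for index, data in snap.changed.items():
            volume[index] = data
    return volume


def take_snapshot(volume: list[bytes], previous: Optional[Snapshot]) -> Snapshot:
    """Store only the blocks that differ from the state captured by `previous`."""
    base = restore(previous, len(volume))
    changed = {i: block for i, block in enumerate(volume) if block != base[i]}
    return Snapshot(parent=previous, changed=changed)


vol = [b"A", b"B", b"C", b"D"]
first = take_snapshot(vol, None)       # first snapshot holds every block
vol[2] = b"C2"
second = take_snapshot(vol, first)     # incremental: only block 2 is stored
print(len(first.changed), len(second.changed))
```

Only the first snapshot costs a full copy; each scheduled backup after that is proportional to the amount of data changed.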

Import Existing Bulk Data to the Cloud:

The AWS Import/Export service will be used: data provided on storage media will be uploaded to S3 and then moved to EBS. Images, named according to a convention that identifies them, and metadata in CSV format will be extracted from the old system and provided for upload to Newgen. Object data such as annotations will be provided in an XML format, which will be analyzed and then imported into Newgen's repository format. Further details of the conversion to Newgen format, including the activities and effort involved, are given in the Data Migration Activities section.
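Pairing the extracted images with their CSV metadata before import is a natural first check. As a minimal sketch, assuming the CSV names each image in a hypothetical `ImageFile` column (the other column names are also illustrative), rows can be matched against the delivered image files and unmatched rows reported:

```python
import csv
import io
from pathlib import PurePosixPath


def match_metadata(csv_text: str, image_paths: list[str]) -> tuple[list[dict], list[str]]:
    """Pair each metadata row with a delivered image file.

    Returns the matched rows and the image names referenced in the CSV but not delivered.
    """
    delivered = {PurePosixPath(p).name for p in image_paths}   # compare file names only
    matched, missing = [], []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["ImageFile"] in delivered:
            matched.append(row)
        else:
            missing.append(row["ImageFile"])
    return matched, missing


metadata = """ImageFile,DocType,AccountNo
INV_0001.tif,Invoice,1001
INV_0002.tif,Invoice,1002
"""
matched, missing = match_metadata(metadata, ["/import/batch1/INV_0001.tif"])
print(len(matched), missing)
```

Reporting missing images before the upload starts is cheaper than discovering gaps after documents have been indexed in the repository.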

Integrations with Third Party Applications in customer environment:

Integration with existing applications installed in the customer's LAN, e.g. Active Directory, SiteMinder, SAP and Oracle ERP, is required. The cloud-hosted servers will therefore support interaction with existing application servers in the customer's LAN through any of the following networking options: HTTP, HTTPS, VPN and MPLS. Authentication against one or more Active Directory servers, or one or more SiteMinder servers, installed at multiple customers' premises will be supported.
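Because several customers may each run their own Active Directory or SiteMinder servers, authentication has to be routed to the servers of the user's own tenant. A minimal sketch, assuming a hypothetical `try_bind` callback that performs the actual LDAP bind and returns True or False, or raises `ConnectionError` when a server is unreachable over the VPN/MPLS link; the tenant names and server URLs are illustrative:

```python
from typing import Callable

# Hypothetical signature: (server_url, user, password) -> authenticated?
Authenticator = Callable[[str, str, str], bool]


def authenticate(tenant: str, user: str, password: str,
                 directories: dict[str, list[str]], try_bind: Authenticator) -> bool:
    """Authenticate a user only against the directory servers of the user's own tenant."""
    servers = directories.get(tenant)
    if not servers:
        raise KeyError(f"no directory configured for tenant {tenant!r}")
    for server in servers:          # try that customer's servers in order
        try:
            return try_bind(server, user, password)
        except ConnectionError:
            continue                # unreachable: fall through to the next server
    raise ConnectionError(f"no directory server reachable for tenant {tenant!r}")


directories = {
    "acme": ["ldap://dc1.acme.example", "ldap://dc2.acme.example"],
    "globex": ["ldap://ad.globex.example"],
}


def fake_bind(server: str, user: str, password: str) -> bool:
    if server == "ldap://dc1.acme.example":
        raise ConnectionError("timeout")    # simulate an unreachable domain controller
    return password == "secret"


print(authenticate("acme", "alice", "secret", directories, fake_bind))
```

A wrong password returns False immediately instead of trying the next server, so only connectivity failures trigger fallback.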

Query to Customer: We would like more detail about these ERP, LDAP and SSO integrations to determine the scope of work, the effort required and the cost of carrying them out.

Scope of Work

The Newgen scope of work covers hosting the Newgen DMS on the Amazon IaaS cloud and managing, monitoring and maintaining it as a SaaS document management service. The Customer may already have a backlog of data that must be migrated to Newgen DMS and hosted online in the cloud.

The main activities required to deploy Newgen DMS services on the Amazon cloud for the Customer are:

Allocating Amazon server instances, EBS storage volumes, S3 backup storage, a load balancer and Elastic IPs on the Amazon cloud.

Setting up image storage in the cloud by attaching large EBS volumes to a server instance and sharing them over NFS.

Allocating resources in another Amazon Region or Availability Zone to set up the DR site.

Deploying Oracle and Oracle Data Guard on the database cloud servers, setting up the primary database and standby databases in the primary and DR sites.

Installing Newgen OmniDocs Web and App Server instances and Newgen Storage Management Servers on Amazon cloud servers in both the primary and DR sites.

Setting up the load balancer, with Newgen App/Web servers in an Active-Active cluster and DB/SMS servers in an Active-Passive cluster.

Creating a backup framework that backs up the App/Web servers, DB/SMS servers and image storage to S3 backup storage in the cloud.

Creating replication methods to synchronize image storage between the primary and DR Regions. Database synchronization between the primary and DR sites will use Oracle Data Guard.

Setting up multi-VPN connectivity on all Newgen Web Server instances.

Migrating the customer's existing data by converting it to Newgen format and loading the converted data into the Newgen servers.

Setting up Newgen product components to capture documents through the production-grade thick client OmniScan, the lightweight WebScan, Mail Capture, Fax Capture etc.

Configuring Newgen products for the Customer's two-stage upload requirement, in which images and indexes are first uploaded to an FTP site, from where they are uploaded into OmniDocs and any third-party workflow.

Identifying and implementing integration touch points with the third-party workflow in the document upload framework.

Studying the required integrations with third-party directory servers, SSO servers, SAP and other ERPs in detail, and configuring and, if needed, customizing Newgen integration components accordingly.

For each customer, studying and creating the folder structure, indexing fields and scanning templates in that customer's cabinet.

Configuring OmniDocs Records Management cut-off and retention policies, if required.

Allocating each customer a branded URL that serves the hosted Newgen DMS service with the customer's logo.

Whenever a new customer is migrated to Newgen DMS services, it will first be decided whether that customer's repository is to be deployed on the multitenant shared Newgen cloud servers or on a dedicated server setup hosted in the cloud.

For each customer migrated to Newgen DMS services, whether on shared or dedicated instances, standard installation, deployment and configuration of OmniDocs products will be carried out, along with additional work specific to that customer's requirements.

Standard deployment activities for each customer:

OmniDocs User Licenses: Allocating OmniDocs user licenses on OmniDocs server as per that customer's requirement.

Production Scanning Services Setup: Allocating and installing the OmniScan thick client according to the number of desktop scanning stations. WebScan will be enabled, configured and allocated as required.

Scanning & Indexing Templates: Document Scanning and Indexing templates will be configured as per the business document types to be uploaded for that customer.

Creation of Cabinet and Storage Space Allocation: A logically isolated OmniDocs cabinet repository will be created for the customer. This means creating a database instance on the Oracle database server and allocating cloud storage space for that customer, e.g. 500 GB.

Data Migration: Backlog data, or data from other repositories, will be migrated, including conversion of data received from old repositories in a standard format to Newgen format.

Users Import & Authentication: In most cases users will be imported from a directory server such as Active Directory. LDAP authentication will be enabled wherever a directory server is used.

Folder Structure creation: Folder structure will be created in OmniDocs as per the document types required for that customer.

Setting up additional VPN connectivity for that customer on the cloud server.

Allocating the customer a branded DMS URL that provides DMS services branded for that customer, including the customer logo.

Setting up records management retention policies for that customer's archived documents, if required.
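The cabinet creation and storage allocation step amounts to keeping a per-customer record of an isolated database, an isolated Image File Store location and a storage quota. A minimal sketch, assuming hypothetical naming conventions (`odcab_<customer>` for the database and `/imagestore/<customer>` for the image store) that are not taken from OmniDocs:

```python
import re
from dataclasses import dataclass

GB = 1024 ** 3


@dataclass
class Cabinet:
    customer: str
    database: str        # separate database instance for this customer only
    image_store: str     # separate Image File Store location for this customer only
    quota_bytes: int
    used_bytes: int = 0

    def reserve(self, size: int) -> None:
        """Record an upload of `size` bytes, refusing it if the quota would be exceeded."""
        if self.used_bytes + size > self.quota_bytes:
            raise ValueError(f"{self.customer}: storage quota exceeded")
        self.used_bytes += size


class TenantRegistry:
    """Keeps one isolated cabinet per customer (hypothetical naming scheme)."""

    def __init__(self, image_root: str = "/imagestore"):
        self.image_root = image_root
        self.cabinets: dict[str, Cabinet] = {}

    def create_cabinet(self, customer: str, quota_gb: int) -> Cabinet:
        key = re.sub(r"\W+", "_", customer.strip().lower())
        if key in self.cabinets:
            raise ValueError(f"a cabinet for {customer!r} already exists")
        cabinet = Cabinet(customer, database=f"odcab_{key}",
                          image_store=f"{self.image_root}/{key}",
                          quota_bytes=quota_gb * GB)
        self.cabinets[key] = cabinet
        return cabinet


registry = TenantRegistry()
acme = registry.create_cabinet("Acme", quota_gb=500)   # e.g. the 500 GB allocation above
acme.reserve(2 * GB)
print(acme.database, acme.image_store, acme.used_bytes // GB)
```

Deriving every per-customer location from a single key keeps the database and image store of one tenant from ever being shared with another.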

Custom deployment activities for each customer:

Configuring additional capture channels for a customer, e.g. Mail Capture or Fax Capture.

Enabling single sign-on as required by the customer, e.g. Active Directory NTLM, Kerberos, SiteMinder or another third-party SSO configuration.

Integration required with third party Directory Servers, SSO servers, SAP, other ERPs to be studied in detail and accordingly Newgen Integrating components to be configured and customized if needed.

Newgen Cloud Deployment Architecture on Amazon

Newgen will use Amazon Elastic Compute Cloud (EC2) instances as the computing servers and attached Elastic Block Store (EBS) volumes as online storage for the DMS product. Online storage on EBS volumes will be backed up to Amazon S3 for backup and recovery. Database files and configuration files will also be backed up to Amazon S3.

The cloud-hosted DMS deployment architecture is described below. In this design, the Application/Web servers are clustered Active-Active behind the load balancer, while the database servers and Newgen's Storage Management Servers are deployed in the primary site as an Active-Passive cluster.

The planned architecture layers for deploying Newgen ECM in the cloud, including a disaster recovery site, are explained below.

Web DNS Layer:

An authoritative DNS server will be used, which by default points to the public Elastic IP of the cloud server hosted in the primary Region. Whenever the primary Region is down, the DNS record will automatically be pointed to the IP of the cloud server hosted in the DR Region.
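Automatic DNS failover is usually driven by repeated health checks, so that a single failed probe does not move traffic to the DR Region. The sketch below assumes a hypothetical threshold of three consecutive failures and uses documentation IP addresses; it models only the decision, not the update of the authoritative DNS record. Once traffic has moved to DR it stays there until an explicit failback, since DR changes must first be synchronized back to the main site.

```python
class DnsFailover:
    """Decide which IP the DNS record should point to (hypothetical model)."""

    def __init__(self, primary_ip: str, dr_ip: str, threshold: int = 3):
        self.primary_ip = primary_ip
        self.dr_ip = dr_ip
        self.threshold = threshold      # consecutive failed checks before failing over
        self.current = primary_ip
        self.failures = 0

    def record_check(self, primary_ok: bool) -> str:
        """Feed one health-check result and return the IP the record should use."""
        if primary_ok:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.threshold:
                self.current = self.dr_ip
        return self.current

    def fail_back(self) -> None:
        """Manual failback once DR changes have been synchronized to the main site."""
        self.current = self.primary_ip
        self.failures = 0


dns = DnsFailover("192.0.2.10", "198.51.100.10")
checks = [True, False, False, True, False, False, False, True]
print([dns.record_check(ok) for ok in checks])
```

Two isolated failures separated by a healthy check do not trigger failover; only the run of three consecutive failures does.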

Web Server Layer:

The two Web & App Server instances will be registered with an HTTP/HTTPS listener on Amazon Elastic Load Balancer. The public endpoint in the primary Region will point to the Elastic Load Balancer, which will distribute requests between the two Web & App Server instances.

Both HTTP and HTTPS requests will be handled. Each Web & App Server will run on a cloud server instance with at least 4 GHz of CPU and 8 GB of RAM. Applications will be installed on storage volumes attached to these instances. Snapshots of the Web & App Server will be backed up regularly and used to reload a Web & App Server instance whenever required.

Application servers will use high-availability multiple data source support to connect to the standby database instance whenever the primary database instance is unavailable.

Database Layer:

Oracle Database Enterprise Edition will be used as the database server, installed on an Amazon EC2 cloud server. An Active-Passive database cluster will be created to provide failover. Oracle Data Guard will maintain an actively synchronized standby database server in the same primary Region. An Oracle Data Guard observer must be installed on a separate machine instance to monitor the primary and standby databases and initiate automatic failover. Whenever the primary database becomes unavailable, the standby database will take over the primary role, and the application servers will switch their database connections to it. Snapshots of the storage volumes containing the primary database files will be backed up to Amazon S3 regularly.

An Oracle standby database server will also be maintained in another Amazon Region or Availability Zone as a remote disaster recovery site. Oracle Data Guard will also be used to synchronize the primary database in the main Region with the Oracle instance in the DR Region.

Storage Management Server Layer:

The two SMS servers will also be registered as instances of a TCP/IP listener on Amazon Elastic Load Balancer. A separate Image File Store will be created by attaching large EBS volumes to an Amazon EC2 server instance and sharing them over NFS, so that both Newgen SMS servers can access the store in shared mode. The SMS servers will be configured Active-Passive: the second SMS server provides failover support for the primary SMS server and takes the active role only when the primary server is down.

Disaster Recovery Layer:

A disaster recovery layer will be set up, to which the DNS server will forward requests whenever the primary site is not active. Snapshots taken in the primary Region will be used to restore instances of the Web & App Server and the SMS & DB servers in the DR Region. The Image File Store on SAN/NAS in the primary Region will be synchronized with a parallel storage volume in the DR Region. Database files will be synchronized from the primary database in the main Region to the standby database in the DR Region using Oracle Data Guard.

Database backups to Amazon S3 will be scheduled only in the main Region. Similarly, incremental snapshot backups of the EBS image storage volumes will be taken to Amazon S3 only in the main Region.

The public IP in the DR Region will be assigned to the Web & App Server, which will in turn forward requests to the SMS & DB servers in the DR Region. While the DR site is active, database changes made in the standby database will be synchronized back to the main database using Oracle Data Guard. Image storage changes made at the DR site will also be replicated back to the main site using Amazon snapshots. Scripts will automate taking EBS snapshots to S3 and restoring them to the EBS volume at the other site, at whatever frequency is appropriate for keeping image data synchronized between the primary and DR sites.

Cloud Hosting Infrastructure Components

Hosting Assumptions:

The minimum possible cloud server configuration has been assumed, sufficient to launch an initial shared instance for customers. Usage of the cloud-hosted shared servers will grow as new customers are added and load increases.

The VPN software endpoint on the hosted cloud server instance will be able to provide VPN connectivity to multiple customers.

Infrastructure Specifications:

The third-party components required for hosting Newgen DMS in the cloud are listed below.

Sr. No. | Component | Quantity / Remarks
1 | Amazon EC2 cloud servers (4 GHz, 8 GB RAM) | 5 in primary region and 3 in DR region
2 | HDD Elastic Block Store (EBS), 1 TB volumes | 5 TB in primary and 5 TB in DR
3 | Backup S3 storage, in 1 TB units | 5 TB in primary and 5 TB in DR
4 | Amazon Elastic Load Balancer |
5 | Amazon CloudWatch monitoring | 1 for each EC2 cloud server
6 | Amazon VPN infrastructure | VPN-Cubed Cloud Enterprise (4 managers, 100 overlay network devices); third-party multi-VPN support
7 | Amazon network bandwidth (data transfer) |
8 | Amazon EBS to S3 snapshots for backup |
9 | AWS Import for importing the existing 5 TB to the cloud | One-time activity
10 | Oracle Enterprise Server license | One-time
11 | Oracle Active Data Guard license | One-time
12 | Oracle support |
13 | Amazon cloud hosting support |

Proposed Newgen Product Solution

The Newgen product suite will comprehensively meet the Customer's functional requirements for document management services. The Newgen products and product extensions that will be used to deliver these services are:

OmniDocs: OmniDocs is an Enterprise Document Management (EDM) platform for creating, capturing, managing, delivering and archiving large volumes of documents and content. It is scalable, multi-tiered and platform-independent, and is built on Java and J2EE technologies.

OmniDocs Records Manager: Manages the complete document lifecycle, including record retention, storage and retrieval policies.

OmniScan: Provides a comprehensive system for high-speed, production-grade bulk scanning, document separation and indexing, with separation based on barcodes, forms and blank pages.

OmniScan Web: A lightweight scanning solution that lets users scan documents and sort them into the appropriate records from anywhere in the world over the web. It needs minimal deployment on the client desktop.

OmniDocs Authentication Manager: OmniDocs Authentication Manager provides the capability of integrating with all LDAP compliant directory servers to provide user import, authentication and single sign on support.

OmniDocs SAP Connector: Newgen is SAP certified, and the OmniDocs SAP Connector provides SAP image enabling through the SAP ArchiveLink HTTP interface.

OmniDocs ERP Integration: OmniDocs is typically integrated with ERPs such as Oracle ERP, JD Edwards and PeopleSoft using OmniDocs RESTful web services. OmniDocs can image-enable an ERP as well as carry out any required data exchange.

OmniDocs Fax Capture: Captures images from the fax dump folder and uploads them into the OmniDocs repository, generating upload acknowledgements and reports.

OmniDocs Mail Capture: Captures emails, extracts the email headers, body and attachments, and uploads the extracted data into OmniDocs as documents. Mail Capture generates an upload acknowledgement in XML form, as well as capture, parsing and upload reports.

OmniDocs FTP Capture: Images, along with their indexes in CSV/XML files, are captured from the FTP source folders and uploaded into the configured OmniDocs destination folders. Each document is indexed using the metadata for that image obtained from the FTP source folder.
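The header, body and attachment extraction performed by Mail Capture can be illustrated with Python's standard `email` package. This is a minimal sketch of the parsing step only, not Newgen's implementation; the choice of headers and the returned dictionary layout are assumptions, and the addresses are illustrative.

```python
from email import policy
from email.message import EmailMessage
from email.parser import BytesParser


def extract_mail(raw: bytes) -> dict:
    """Split a raw email into the pieces a mail capture step would upload."""
    msg = BytesParser(policy=policy.default).parsebytes(raw)
    body_part = msg.get_body(preferencelist=("plain", "html"))
    return {
        "headers": {k: str(msg[k]) for k in ("From", "To", "Subject", "Date")
                    if msg[k] is not None},
        "body": body_part.get_content() if body_part is not None else "",
        # (file name, MIME type, decoded content) for each attachment
        "attachments": [(part.get_filename(), part.get_content_type(), part.get_content())
                        for part in msg.iter_attachments()],
    }


# Build a sample message with one PDF attachment and parse it back.
sample = EmailMessage()
sample["From"] = "ap@customer.example"
sample["To"] = "capture@dms.example"
sample["Subject"] = "Invoice 1001"
sample.set_content("Please file the attached invoice.")
sample.add_attachment(b"%PDF-1.4 sample", maintype="application", subtype="pdf",
                      filename="INV_1001.pdf")
data = extract_mail(sample.as_bytes())
print(data["headers"]["Subject"], [a[0] for a in data["attachments"]])
```

Each attachment could then become its own document, with the extracted headers used as index values.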

Data Migration Activities

Data from other repositories will be provided in CSV or XML format with a well-defined structure. The format is expected to be almost uniform across all repositories. The proposed migration requires the Customer to provide network connectivity and other facilities for migrating the data at the Customer data centre; the migrated data will then be moved to the cloud on storage media.

Migrating data from the old repositories to OmniDocs repositories will involve the following activities:

Importing indexes, images and folder hierarchy.

Annotation format conversion from the source repository to the Newgen repository.

Access permissions migration.

User Group migration from directory service or any other system.

Migration utility testing on sample data from each repository.

Migration exercise on one repository.

Verification of migration by Customer.

Migration of other repositories in a similar methodology.
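For the verification step, a reconciliation report comparing what was exported with what arrived in OmniDocs gives the Customer something concrete to sign off. A minimal sketch, assuming the export produces a hypothetical manifest of SHA-256 checksums per document ID and that migrated images can be read back as bytes:

```python
import hashlib


def reconcile(source_manifest: dict[str, str],
              migrated: dict[str, bytes]) -> dict[str, list[str]]:
    """Compare the export manifest with the documents found after migration.

    source_manifest: document ID -> SHA-256 hex digest recorded at export time.
    migrated: document ID -> image bytes read back from the target repository.
    """
    missing = sorted(set(source_manifest) - set(migrated))
    unexpected = sorted(set(migrated) - set(source_manifest))
    corrupted = sorted(
        doc_id for doc_id in set(source_manifest) & set(migrated)
        if hashlib.sha256(migrated[doc_id]).hexdigest() != source_manifest[doc_id]
    )
    return {"missing": missing, "unexpected": unexpected, "corrupted": corrupted}


manifest = {
    "DOC-1": hashlib.sha256(b"page-1").hexdigest(),
    "DOC-2": hashlib.sha256(b"page-2").hexdigest(),
    "DOC-3": hashlib.sha256(b"page-3").hexdigest(),
}
migrated = {"DOC-1": b"page-1", "DOC-2": b"page-2 (truncated)", "DOC-9": b"stray"}
print(reconcile(manifest, migrated))
```

Running the same report on the sample data and on the first repository gives a repeatable acceptance check before the remaining repositories are migrated.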

Assumptions and Dependencies

MPLS connectivity can currently be provided only if the MPLS connection termination endpoint is available through Equinix, Virginia.