The Need For Gridgain Technology In Business Computer Science Essay


GridGain is currently the most popular Java-based distributed computing middleware, and it is used by hundreds and thousands of business organizations every day. It is capable of working on any infrastructure, from an Android device to 1,000 nodes in a cloud. It is free, open-source software for which enterprise-level support is available.

GridGain 3.0 is a JVM-based, post-functional distributed middleware that supports both functional programming and object-oriented programming. The upgrade from GridGain 2.0 to GridGain 3.0 has produced a very powerful tool whose APIs are easy to use. The two main new features added in GridGain 3.0 are:

1) Support for Scala.

2) A Java-based functional programming framework.

Because of these two features, GridGain becomes the first and only middleware that supports both the Java and Scala languages.

This paper discusses the GridGain shell and its fully integrated middleware in detail. It focuses on exploring this latest technology, which defines itself as:

GridGain = Java * Scala + Compute grid + Data grid + Cloud auto-scaling


The latest trend is to eliminate third-party involvement in deploying the technology. In the early stages, when this technology emerged, deployment was very difficult, so many companies stepped in to help the technology's inventors. GridGain not only provides zero deployment but also provides cloud elasticity, which eliminates the need for third-party deployment providers.

From the software engineer's perspective, these third-party deployment providers were the greatest hurdle for cloud computing innovators: even deploying a simple web site, which looks easy and smooth, could require third-party involvement.


Data Partitioning without Gridgain

Figure 1[1]

Figure 1 illustrates a grid without GridGain, which results in additional network traffic. Here the data server is searched for the data, and the result is then delivered to the user node.

Data Partitioning with Gridgain

Figure 2 [1]

GridGain eliminates serialization, resulting in minimal network traffic. Moreover, data can be accessed from both node 2 and node 3. GridGain computes on the data separately on each of the nodes across which it is distributed and then aggregates the results on the master node.
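The idea of computing where the data lives and sending only small partial results to the master can be sketched in plain Java. This is an illustration under assumptions, not GridGain code: the class `PartitionedSum`, its methods and the hash-based placement rule are all hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from GridGain.
class PartitionedSum {

    // Assigns every key to exactly one node, so work can run where the data lives.
    static int nodeFor(String key, int nodeCount) {
        return Math.floorMod(key.hashCode(), nodeCount);
    }

    // Splits the data set into one in-memory partition per node.
    static List<Map<String, Integer>> partition(Map<String, Integer> data, int nodeCount) {
        List<Map<String, Integer>> nodes = new ArrayList<>();
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
        for (Map.Entry<String, Integer> e : data.entrySet()) {
            nodes.get(nodeFor(e.getKey(), nodeCount)).put(e.getKey(), e.getValue());
        }
        return nodes;
    }

    // Each node sums only its own partition; the master adds the partial results.
    static int total(List<Map<String, Integer>> nodes) {
        int total = 0;
        for (Map<String, Integer> node : nodes) {
            int partial = 0;
            for (int value : node.values()) {
                partial += value;
            }
            total += partial; // only this single number would cross the network
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> data = new HashMap<>();
        data.put("a", 10);
        data.put("b", 20);
        data.put("c", 30);
        System.out.println(total(partition(data, 3))); // 60
    }
}
```

In this sketch the raw entries never leave their partition; only one integer per node reaches the aggregation step, which is the source of the reduced traffic described above.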


From the feedback received on GridGain 2.0, a new set of goals was developed for the next version. These goals, which went on to redefine GridGain, can be put into five general categories:

2.1 Core Functionality Improvements

GridGain needed to improve its core functionality in areas such as REST interfaces, discovery and communication capabilities, while retaining maximum backward compatibility.

2.2 Even Simpler Usage Model

From the beginning, GridGain had simple, powerful APIs that set it apart from older technologies. Its research into functional programming and domain-specific languages showed that an even simpler model could be developed, one that would introduce distributed operations right at the language level and enable a fundamental shift in how cloud-enabled applications can be developed.

2.3 Easier Cloud Scalability

GridGain was the first middleware that worked on both private and public cloud infrastructure. Further enhancement was still required to ensure that GridGain supports hybrid infrastructure alongside private and public clouds.

For GridGain to be powerful, it should bring the cloud elements of the topology into the API while still providing a simple usage model. This allows users to operate directly on cloud resources from inside their applications.

2.4 Native Data Grid

From its first version, GridGain provided integration with a number of available in-memory data grid products such as JBoss Cache, Oracle Coherence and GigaSpaces. However, such integration gave end users an incoherent and less than ideal experience, further compounded by the lack of a number of fundamental features such as on-demand peer-to-peer class loading, advanced transactions supporting eventually consistent modes, comprehensive distributed query capabilities, and non-trivial affinity-based distribution.

The next version of GridGain had to provide a fully integrated, cloud-enabled in-memory data grid with an industry-leading feature set.

2.5 Pay Per Usage Pricing Model

The most needed change in GridGain was in its pricing model. Many clients pointed to a clear mismatch between GridGain's fixed-price support subscription and the pay-per-usage pricing of cloud providers.

The next version was expected to have a clear pay-per-usage pricing model that would eliminate the inherent pricing mismatch between GridGain software and cloud providers.


GridGain 3.0 is the outcome of more than 20 months of research performed by GridGain Systems.

GridGain 3.0 is said to be highly backward compatible with GridGain 2.0. Although its API is extensively improved, in a lot of cases migration to the new version requires only a recompilation. This durability is a testament to GridGain's core design.

The following are the key features of GridGain 3.0:

3.1 Functional Programming In Java And Scala

GridGain 3.0 is the first distributed middleware on the market that combines the functional and object-oriented programming approaches. The by-product of this combination is a set of APIs that are very powerful, flexible and easy to read.

Two new features were introduced in GridGain 3.0 to support functional programming:

Java-based functional programming framework built from the ground up in GridGain 3.0 providing its users with the most comprehensive functional programming capabilities for Java programming language.

Scalar - Scala-based internal Domain Specific Language (DSL) built on top of GridGain 3.0 Java-based functional core allowing Scala developers native access to GridGain functionality.

3.2 100% Integrated Platform

GridGain = Compute + Data + Cloud

GridGain 3.0 provides integrated, consistent and user-friendly distributed middleware that combines compute grids, data grids and auto-scaling on any infrastructure.

GridGain 3.0 is the first of its kind to provide such a fully functional platform, which can be used to develop and scale applications on any infrastructure. The advantages of this kind of middleware range from a noticeably shorter learning curve and development cycle to exceptional capabilities that cannot be found in other products, such as the advanced data grid, zero deployment and GridGain Visor.

3.3 Advanced Data Grid

The data grid in GridGain 3.0 is built on top of existing GridGain functionality such as support for functional programming, communication and peer-to-peer on-demand class loading. Some of its main features include:

Expiration policies (LIRS, LFU, LRU, FIFO)

Named caches

Read-through and write-through logic with pluggable cache store

Synchronous and asynchronous cache operations

Pluggable data overflow storage via new swap space SPI

Pluggable memory model including off-heap allocation

Pessimistic, optimistic and eventually consistent transactions

JTA/JCA integration

Data replication and data invalidation in synchronous and asynchronous modes

Partitioned cache with active replicas

Advanced distributed query capability including SQL based, Lucene based, H2 text based and predicate based scanning with support for pagination, local and remote filtering, transformation and reduction

Full integration for compute grid for non-trivial affinity based routing

Functional and object-oriented APIs [2]
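To make one item of this list concrete, the LRU policy can be sketched with the JDK's own `LinkedHashMap`. This is a minimal, single-JVM sketch and says nothing about how GridGain implements the policy internally; the class name `LruCache` is an assumption made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical single-JVM sketch of an LRU policy; not GridGain's implementation.
class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        super(16, 0.75f, true); // true = keep entries in access order
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after every insertion; returning true drops the
    // least recently used entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // "a" is now the most recently used entry
        cache.put("c", 3); // evicts "b", the least recently used entry
        System.out.println(cache.keySet()); // [a, c]
    }
}
```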

GridGain 3.0 is the first data grid with zero-deployment capability, which allows users simply to bring GridGain nodes online. Those nodes immediately become an integrated part of the topology and can store user objects without anything being deployed explicitly.

3.4 Advanced Compute Grid

The MapReduce pattern is a key feature of this compute grid technology. It describes the process of dividing the original task into subtasks and computing the subtasks in parallel. The results from all the subtasks are then aggregated to obtain the final result.
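The three steps (split, compute in parallel, aggregate) can be sketched on a single machine with a thread pool standing in for the grid nodes. The class `SplitAggregate` and the method `parallelSum` are hypothetical names for this illustration and are not part of GridGain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: threads play the role of grid nodes.
class SplitAggregate {

    // chunks must be at least 1.
    static long parallelSum(int[] values, int chunks) {
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        try {
            // Split: cut the index range into roughly equal chunks.
            int chunkSize = (values.length + chunks - 1) / chunks; // ceiling division
            List<Future<Long>> partials = new ArrayList<>();
            for (int c = 0; c < chunks; c++) {
                final int from = Math.min(values.length, c * chunkSize);
                final int to = Math.min(values.length, from + chunkSize);
                // Compute: every chunk is summed as an independent subtask.
                partials.add(pool.submit(() -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += values[i];
                    }
                    return sum;
                }));
            }
            // Aggregate: combine the partial results into the final result.
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] values = new int[100];
        for (int i = 0; i < values.length; i++) {
            values[i] = i + 1;
        }
        System.out.println(parallelSum(values, 4)); // 5050
    }
}
```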

GridGain's MapReduce implementation comes with powerful and advanced features such as:

Direct API support for split and aggregation

Pluggable failover and topology resolution

Distributed task session

Distributed continuations

Distributed recursion

Node-local cache

AOP, OOP-based, FP-based, synch/asynch execution models

Cron-based scheduling

Redundant mapping

Zero deployment with peer-to-peer class loading

Partial asynchronous reduction

Support for weighted and adaptive split

Checkpoints for long running tasks

Early and late load balancing

Affinity routing with data grids [2]

3.5 Seamless Cloud Enabling

GridGain is one of the first and only platforms that can grid-enable existing code without modifying it. This is made possible by using Java annotations and by handling the cross-cutting concerns with aspect-oriented programming (AOP), which together form a grid-enabled DSL. In JBoss, for example, the code does not even need to be annotated, because the annotations and the aspect-oriented programming style can be applied to the code through an external file.
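The general mechanism, an annotation that marks a method and an interceptor that reacts to it without the method body changing, can be sketched with a JDK dynamic proxy. Everything here is an assumption made for illustration: the annotation `@RunOnGrid`, the `Pricing` interface and the `gridEnable` helper are hypothetical, and the interceptor only records the call instead of shipping it to a remote node.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of annotation-driven interception; not GridGain's API.
class AnnotationInterception {

    // Hypothetical marker annotation, visible at runtime so the interceptor can see it.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface RunOnGrid {}

    interface Pricing {
        @RunOnGrid
        int price(int quantity);

        int localOnly(int quantity);
    }

    // Existing business code: it contains no grid-related logic at all.
    static class SimplePricing implements Pricing {
        public int price(int quantity) { return quantity * 3; }
        public int localOnly(int quantity) { return quantity; }
    }

    // Records which calls the interceptor would have sent to the grid.
    static final List<String> routed = new ArrayList<>();

    // Wraps the target so that annotated methods pass through the interceptor.
    static Pricing gridEnable(Pricing target) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.isAnnotationPresent(RunOnGrid.class)) {
                routed.add(method.getName()); // a real interceptor would dispatch remotely
            }
            return method.invoke(target, args);
        };
        return (Pricing) Proxy.newProxyInstance(
            Pricing.class.getClassLoader(), new Class<?>[] {Pricing.class}, handler);
    }

    public static void main(String[] args) {
        Pricing pricing = gridEnable(new SimplePricing());
        System.out.println(pricing.price(2));     // 6
        System.out.println(pricing.localOnly(2)); // 2
        System.out.println(routed);               // [price]
    }
}
```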

3.6 GridGain Visor

The Enterprise version of GridGain 3.0 brings in a new pluggable management tool called GridGain Visor. This is a command-line tool used for management. Some of its main features are:

Allows scripting of various operations on a GridGain deployment

Interactive and command modes

Fully extensible via user defined pluggable commands

Seamless connectivity to the running GridGain deployment

Some of the available out-of-the-box commands:

Review and monitor topology

Execute grid tasks

Monitor status

Get statistics

Query data grid

Query distributed events [2]

3.7 Pay-Per-Usage Pricing Model

GridGain 3.0's Enterprise Edition introduces a novel pricing model in which the user pays per usage. It is the first distributed middleware that can work on any infrastructure, ranging from a single computer to hundreds of nodes in the cloud, while offering a single pay-per-usage model.

GridGain 3.0 also includes idle-detection technology that temporarily stops charging for nodes that have been idle for more than one hour. A node is considered idle when no user operation has been performed on it, or no data has been stored on it, for more than an hour.
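The idle rule can be sketched as a simple predicate. This is only a sketch under assumptions: the class and method names are hypothetical, and the definition is read here as requiring both conditions (no user operation for more than an hour and no stored data) before a node counts as idle. Under the other reading, where either condition suffices, the `&&` would become `||`.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of the idle rule; not GridGain's billing code.
class IdleDetection {
    static final Duration IDLE_THRESHOLD = Duration.ofHours(1);

    // Assumption: a node is idle only if it holds no data AND has seen no user
    // operation for longer than the threshold.
    static boolean isIdle(Instant lastUserOperation, int storedEntries, Instant now) {
        boolean quiet = Duration.between(lastUserOperation, now).compareTo(IDLE_THRESHOLD) > 0;
        return quiet && storedEntries == 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2020-01-01T12:00:00Z");
        System.out.println(isIdle(now.minus(Duration.ofMinutes(90)), 0, now)); // true
        System.out.println(isIdle(now.minus(Duration.ofMinutes(30)), 0, now)); // false
        System.out.println(isIdle(now.minus(Duration.ofMinutes(90)), 5, now)); // false
    }
}
```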

This pay-per-usage model thus allows GridGain users to use the technology without being penalized for common over-provisioning scenarios such as failure recovery sites, planned builds, etc.

3.8 Hybrid Cloud Support

The Enterprise version of GridGain 3.0 supports these cloud enhancements through new Service Provider Interface (SPI) implementations for discovery and communication. These new features are included so that GridGain can work more effectively in large hybrid cloud settings, using unidirectional connectivity and cloud routing capabilities.

These new implementations support the following hybrid deployments:

Multiple private/public clouds with no direct connectivity

Multiple private/public clouds with out-connectivity or in-connectivity only

Geographically distributed hybrid cloud

Single cloud with WAN/LAN/VPN connectivity between nodes [2]

3.9 API-Level Cloud Control

GridGain 3.0 also brings in advanced API-level control of cloud operations. It completely removes the need for third-party management solutions by using a novel cloud SPI with three major implementations, for Rackspace, EC2 and an in-memory cloud, enabling users to keep track of any infrastructure from the code itself. Operations that required third-party involvement before this control was implemented include:

Starting, stopping and managing virtual instances

Querying cloud resources such as images, storage devices, network quotas, etc.

Changing virtual instance profile (where supported) [2]

GridGain 3.0 provides a clear topology and a single view, through one API, of a whole cloud that may span many cloud providers. This unified view simplifies the auto-scaling capabilities of GridGain applications.

3.10 Zero Deployment

GridGain 3.0 is the first fully integrated cloud platform that has a zero-deployment capability. This means that all essential classes and resources are loaded only on request. It also provides four different modes of peer-to-peer deployment, which support the most complex deployment environments such as custom class loaders, WAR/EAR files, etc.

With zero deployment, a user can simply bring a default GridGain node online, and it becomes part of the data and compute grid topology. The node can then store data and perform any client tasks without explicit deployment of the client's resources.

3.11 Advanced Load Balancing

GridGain 3.0 supports both early and late load balancing. Together they enable complete customization of the whole load-balancing process, and they allow grid task execution to adapt to the non-deterministic nature of execution on the grid.

A grid environment is often heterogeneous and non-static. This means that the complexity profile of a task can change dynamically during execution, and at any point it can be affected by external resources.

3.12 Pluggable Fault Tolerance

Failover management and the resulting fault tolerance are key properties of any grid computing infrastructure. In keeping with its SPI-based design, GridGain provides fully pluggable failover logic with several popular implementations available out of the box. Unlike other grid computing frameworks, it also allows failover of logic and not only of data.

The atomic unit of execution on the grid is the grid task, and the fully customizable failover logic lets the developer choose a specific policy, much as one would choose a concurrency policy for RDBMS transactions.

This allows tuning of how a grid task reacts to failure. For example:

fail entire task immediately upon failure of any of its jobs (fail-fast approach)

failover any failed job to other nodes until all nodes are exhausted for this job (fail-slow approach)
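The two policies above can be sketched in a few lines of Java. This is an illustration only: the `FailoverPolicies` class, its `Policy` enum and the modelling of a node as a `Function` that either returns a result or throws are assumptions made for this example, not GridGain's failover SPI.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of fail-fast versus fail-slow; not GridGain's failover SPI.
class FailoverPolicies {
    enum Policy { FAIL_FAST, FAIL_SLOW }

    // Runs one job against the candidate nodes in order.
    static <R> R runJob(List<Function<String, R>> nodes, String input, Policy policy) {
        RuntimeException last = null;
        for (Function<String, R> node : nodes) {
            try {
                return node.apply(input);
            } catch (RuntimeException e) {
                last = e;
                if (policy == Policy.FAIL_FAST) {
                    break; // give up on the first failure
                }
                // FAIL_SLOW: continue with the next node until none are left
            }
        }
        throw new IllegalStateException("job failed on all attempted nodes", last);
    }

    public static void main(String[] args) {
        Function<String, Integer> broken = s -> { throw new RuntimeException("node down"); };
        Function<String, Integer> healthy = String::length;
        List<Function<String, Integer>> nodes = Arrays.asList(broken, healthy);

        System.out.println(runJob(nodes, "grid", Policy.FAIL_SLOW)); // 4
        try {
            runJob(nodes, "grid", Policy.FAIL_FAST);
        } catch (IllegalStateException e) {
            System.out.println("fail-fast: " + e.getMessage());
        }
    }
}
```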

3.13 REST APIs for GridGain Access

GridGain 3.0 introduces a new SPI that allows access to GridGain from outside a GridGain deployment. By default, the implementation uses a built-in Jetty container and a REST-style API that supports XML and JSON data formats.

This new feature was designed so that non-Java environments such as Flex applications can access GridGain functionality.

3.14 Management and Monitoring

GridGain comes with a wide collection of JMX MBeans, which are used to expose all the main monitoring and statistical information about all nodes in the grid. Any JMX-compliant Web or standalone GUI viewer can be used to access this information.

There are three major types of clouds: private, public and hybrid. Private and public clouds are provided by private and public cloud providers. Hybrid clouds consist of computing resources from more than one cloud provider (and can be fully private, fully public or truly hybrid consisting of a mix of private and public resources).

Hybrid clouds are a topic that is gaining more and more traction as software supporting complex hybrid topologies emerges. GridGain has supported hybrid clouds since version 2.0.

Hybrid clouds bring plenty of additional challenges for applications that are developed to work on them. One obvious complication is that your application should scale transparently over two or more different cloud providers, each providing its services in a potentially very different way: different operating systems, pricing models, management APIs, SLAs, types of computing resources available, and even different concepts (e.g. Rackspace flavors vs. Amazon AMIs). Moreover, having computing resources from several different providers introduces networking challenges, as different security, firewall or other network configuration settings may be required for each provider.

All of that clearly requires substantial support from a middleware framework like GridGain for you to be at least minimally productive when developing an application for such infrastructure.


4.1 Broadcasting

The following is a simple application that broadcasts the execution of a closure to all participating nodes. This simple scenario just prints out a string: "F.println()" returns a simple closure that prints out the argument passed to it. All the nodes available to GridGain then execute this closure.



    F.println("Broadcasting This Message To All Nodes")  


4.2 Splitting Execution

The following is a simple application that splits a given phrase into words, creates closures that print the individual words, and then executes them on different nodes.

In this case, the standard JDK split method is used to split the initial string into words. The method "F.yield()" takes each word, passes it to the "F.println()" closure, and returns a set of "F.println()" closures that each carry a predefined argument. This set of closures is then executed by sending the closures to different nodes in round-robin fashion:



    F.yield("Splitting This Message Into Words".split(" "), F.println())  
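What the yield step and the round-robin distribution do can be approximated in plain Java. The sketch below is an analogy under assumptions, not the GridGain implementation: `YieldRoundRobin`, `bindEach` and `dealRoundRobin` are hypothetical names, and the "nodes" are simply lists of tasks inside one JVM.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical single-JVM analogy of yielding closures and dealing them out.
class YieldRoundRobin {

    // Binds each argument to the closure, producing one no-argument task per argument.
    static <T> List<Runnable> bindEach(T[] args, Consumer<T> closure) {
        List<Runnable> tasks = new ArrayList<>();
        for (T arg : args) {
            tasks.add(() -> closure.accept(arg));
        }
        return tasks;
    }

    // Deals the tasks out in round-robin order: task i goes to node i % nodeCount.
    static List<List<Runnable>> dealRoundRobin(List<Runnable> tasks, int nodeCount) {
        List<List<Runnable>> perNode = new ArrayList<>();
        for (int n = 0; n < nodeCount; n++) {
            perNode.add(new ArrayList<>());
        }
        for (int i = 0; i < tasks.size(); i++) {
            perNode.get(i % nodeCount).add(tasks.get(i));
        }
        return perNode;
    }

    public static void main(String[] args) {
        String[] words = "Splitting This Message Into Words".split(" ");
        List<List<Runnable>> perNode = dealRoundRobin(bindEach(words, System.out::println), 2);
        for (int n = 0; n < perNode.size(); n++) {
            for (Runnable task : perNode.get(n)) {
                System.out.print("node " + n + ": ");
                task.run(); // prints the word that was bound to this task
            }
        }
    }
}
```

With two nodes, the first, third and fifth words end up on node 0 and the second and fourth on node 1.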


4.3 The World's Shortest MapReduce App

The following application splits a phrase into several words, has individual grid nodes count the letters in each word, and then, in the reduce step, adds up all the counts. The application looks as follows:

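int letterCnt = G.grid().forkjoin(
    F.yield("Counting Letters In This Phrase".split(" "),
        new C1<String, Integer>() {
            // Map step: each closure returns the letter count of its word.
            @Override public Integer apply(String word) {
                return word.length();
            }
        }
    ),
    // Reduce step: add up the per-word counts.
    F.sumIntReducer()
);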

C1 is a convenience alias for the class GridClosure; in this case, it takes a string and returns the count of letters in that string. A collection of closures, each with a different word as its argument, is distributed to individual grid nodes, where each closure returns the number of characters in the word it received. "F.sumIntReducer()" is used on the local node to add up all the character counts that are returned to it.

To top it all, no deployment is required for any of the examples above. You just start a couple of bare-bones GridGain nodes, write your code and hit the Run button, and the code is executed on the grid. This works because it uses the GridGain peer class-loading mechanism. There are no Maven scripts to run; all that is needed is to write your code and run it. If the code needs to change, make the changes and run it again.

GridGain 3.0 has two main concepts in its APIs. They are as follows:

GridProjection and

GridCacheProjection

GridProjection is used to define a monadic set of operations on any arbitrary set of grid nodes. In fact, Grid, GridRichNode and GridRichCloud all implement the GridProjection interface and are monadic projections. Grid defines a projection on all the nodes in the global topology, GridRichNode defines a projection with a single node, and GridRichCloud defines a projection of all the nodes in a cloud. Similarly, GridCacheProjection is used to define a set of monadic operations on any arbitrary set of cache entries.

These two concepts give great elegance to the APIs without making them complex. A few more examples are listed here to make the concepts easier to understand.

In GridGain 3.0 it is possible to create a dynamic grid projection, meaning that the projection contains any number of nodes that have a specific attribute. Once such a projection is obtained, it is possible to execute tasks on it, listen for and send messages, or perform a number of other operations. Because this is a dynamic projection, any new node joining the grid with the given attribute set is automatically added to the projection, and any such node that leaves is removed. This gives a well-designed way of isolating yourself from changes in the cloud topology. Here is a small example in Java:


GridProjection p = G.grid().projection(new PN() {
    // Keep only the nodes that carry the "foobar" attribute.
    public boolean apply(GridRichNode node) {
        return node.getAttribute("foobar") != null;
    }
});

In the above example, if the projection becomes empty (no nodes in it), the execute(…) operation will throw an exception.
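The behaviour of a dynamic projection, a predicate that is re-evaluated against the live topology, can be modelled in plain Java. This is a sketch under assumptions rather than GridGain's implementation: `DynamicProjection`, its nested `Node` class and the use of `IllegalStateException` for the empty case are all hypothetical choices for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of a dynamic projection; not GridGain's GridProjection.
class DynamicProjection {

    // Stand-in for a grid node: just a bag of attributes.
    static class Node {
        final Map<String, Object> attributes = new HashMap<>();

        Node with(String name, Object value) {
            attributes.put(name, value);
            return this;
        }
    }

    private final List<Node> topology; // shared, live view of all nodes
    private final Predicate<Node> filter;

    DynamicProjection(List<Node> topology, Predicate<Node> filter) {
        this.topology = topology;
        this.filter = filter;
    }

    // Re-evaluated on every call, so nodes that join or leave are picked up automatically.
    List<Node> nodes() {
        List<Node> matching = new ArrayList<>();
        for (Node node : topology) {
            if (filter.test(node)) {
                matching.add(node);
            }
        }
        return matching;
    }

    // Runs the task once per matching node; an empty projection is treated as an error.
    void execute(Runnable task) {
        List<Node> targets = nodes();
        if (targets.isEmpty()) {
            throw new IllegalStateException("projection is empty");
        }
        for (int i = 0; i < targets.size(); i++) {
            task.run();
        }
    }

    public static void main(String[] args) {
        List<Node> topology = new ArrayList<>();
        DynamicProjection p =
            new DynamicProjection(topology, n -> n.attributes.get("foobar") != null);

        topology.add(new Node().with("foobar", "x")); // joins after the projection was created
        topology.add(new Node());                     // lacks the attribute, so it is excluded
        System.out.println(p.nodes().size()); // 1
        p.execute(() -> System.out.println("running on a matching node"));
    }
}
```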

There are many operations for managing projections: you can merge or cross projections, examine their size, check whether they are empty or dynamic, and get a projection's predicate. The majority of operations also accept further filtering predicates on the projection.

Below is one more example with cache projection:

Let us create a cache projection that filters out any values that are not greater than 10. To make this possible we create a dynamic cache projection with a predicate, and since a cache projection defines over 95% of all the operations available on a cache, we can use it directly, like a named cache instance. Below is an example using Scalar:


scalar {
    val c = cache[String, Int] ~/ ((e: GridCacheEntry[String, Int]) => e.peek() > 10)

    c += "1" -> 1 // That won't be added since the value is not greater than 10.
    c += "1" -> 11 // That will work ok!
}
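For readers more familiar with Java, the same filtering idea can be mimicked with an ordinary map and a predicate. This is a rough analogy only: `FilteredCacheView` is a hypothetical class written for this essay and is not GridGain's GridCacheProjection.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical analogy of a predicate-filtered cache view; not GridGain's API.
class FilteredCacheView<K, V> {
    private final Map<K, V> cache;
    private final Predicate<V> accepts;

    FilteredCacheView(Map<K, V> cache, Predicate<V> accepts) {
        this.cache = cache;
        this.accepts = accepts;
    }

    // Stores the value only if it passes the predicate; reports whether it was stored.
    boolean put(K key, V value) {
        if (!accepts.test(value)) {
            return false;
        }
        cache.put(key, value);
        return true;
    }

    // Entries of the underlying cache that fail the predicate are invisible through the view.
    V get(K key) {
        V value = cache.get(key);
        return value != null && accepts.test(value) ? value : null;
    }

    public static void main(String[] args) {
        FilteredCacheView<String, Integer> view =
            new FilteredCacheView<>(new HashMap<>(), v -> v > 10);
        System.out.println(view.put("1", 1));  // false
        System.out.println(view.put("1", 11)); // true
        System.out.println(view.get("1"));     // 11
    }
}
```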



As the predicate can have any logic in it, this gives a good way to constrain your cache for your precise purposes, and at the same time it does not require any extra APIs on the cache. The example below shows how to create a strongly typed projection by specifying the types of keys and values:


scalar {
    val c = cache ~| (classOf[String], classOf[String])

    c += ("skey" -> "sval") // That compiles ok.
    c += ("s" -> 2) // That WON'T compile since 2 is not a String.
}



All the listed examples show trivial uses of monadic operations in GridGain. In a real application, however, their use will minimize your code, resulting in a concise and powerful usage pattern that also gives the code clarity and readability.