Big Data Frameworks for Weather Prediction Analysis

2585 words (10 pages) Essay in Computer Science

18/05/20 Computer Science

Disclaimer: This work has been submitted by a student.

Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of UK Essays.


Abstract Natural calamities such as storms and cyclones affect everyone adversely. In the modern era, where the weather is changing continuously due to many factors such as global warming, accurate weather prediction is very important. Weather forecasting helps many important sectors, such as agriculture, tourism, water resources, and air traffic. To support accurate prediction, many IoT devices produce petabytes of data every day. With the fast growth of information from these different sources, big data processing requires a methodical structure. Evaluating these weather observations and monitoring weather changes is a further challenge. Because weather data is large in volume, complex, and generated in real time, current data handling methods rely on high-performance computing systems and distributed storage to process high-dimensional data.


Keywords—Distributed Computing, Big Data, Weather Data, Prediction.



Weather change plays a significant part in determining product quality. All over the globe, tropical cyclones (TCs) cause huge financial losses and fatalities [5][6]. Accurate forecasting of cyclone formation is required to reduce these casualties. Decision-making also needs a precise rainfall forecast well in advance. In addition, rainfall forecasts help to prevent disasters and to manage water resources. Early rainfall information can help farmers to reduce damage [7]. Every society around the globe will be susceptible to shifts in food production, quantity, and price caused by climate change, along with the resulting socio-economic pressures.


So, we require a highly accurate weather prediction system: a high-performance system that predicts casualties or disasters caused by nature. This will help humankind to live safely. Weather information is kept in datasets, which combine measurements such as humidity, temperature, rainfall, radiation, snow density, vapor pressure, wind speed, air pressure, and sunlight strength. With the help of historical data, these systems can improve the weather forecast [8]. This large amount of data requires advanced tools and techniques. Big data analytics plays a significant role in handling very large quantities of data and extracting value and knowledge from it. The distinct difficulties of big data are scalability, complexity, and velocity [3]. Big data includes massive amounts of structured, semi-structured, and unstructured data. Big data rests mainly on the following five pillars:

  • Variety refers to structured, semi-structured, and unstructured information such as recordings, logs, sound, text, and images. Since weather information is complex and high-dimensional, it needs effective data models and tools for better decision-making in weather change prediction [1].
  • Veracity refers to biases, noise, and irregularities in the data.
  • Volume represents the size of the datasets, which requires distributed storage and computing.
  • Velocity refers to the time and speed required for data streaming.
  • Value relates to the meaningful value hidden in big data. The task here is to define, extract, transform, and analyze this data in order to discover that hidden value [2].

The Hadoop MapReduce model and Spark are helpful for analyzing big datasets. Big data analytics makes it possible to extract patterns and discover trends in the data. Big data in weather prediction assessment focuses primarily on two aspects: 1) large weather information resources and 2) techniques of big data analysis. The evolution of weather data analysis basically proceeds in four directions: observing information, tracking, understanding, and lastly predicting and optimizing weather changes. In earlier years, different tools were used to analyze weather data. One example is AIRS, described as the first hyper-spectral infrared sounder for operational weather forecasting, which was developed and evaluated by Lockheed Sanders Infrared and Imaging Systems (LIRE), Lexington, MA, under an agreement with the Jet Propulsion Laboratory, Pasadena, CA [9].

In the remainder of the paper, Section 2 describes the big data tools, Section 3 focuses on the different methodologies used to process weather information, and Section 4 concludes the study.

2. Big Data Analytics Framework

A framework helps the system to compute and process large volumes of data. These frameworks are sorted into batch, stream-only, and hybrid frameworks.

A. Apache Hadoop:

Apache Hadoop is an open-source framework for processing large amounts of data across clusters of computers using high-level languages. The Hadoop framework is based on the MapReduce programming model, in which map and reduce tasks are performed in a distributed way, and it offers an efficient solution that is scalable, flexible, fault-tolerant, and cost-effective. It consists of three modules:

a) HDFS: a master-slave architecture (name node and data nodes). A file is split into HDFS blocks that are distributed across the data nodes.

b) MapReduce: input data is passed to the Map function as key-value pairs, and the Reduce function combines the values associated with each key into a result.

c) YARN: a resource manager responsible for scheduling and managing tasks.
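The map, shuffle, and reduce phases can be illustrated with a minimal single-machine sketch in plain Python. It does not use the Hadoop API; the record layout (`station_id,date,temperature`), the sample values, and the function names are assumptions made only for this illustration. The job finds the maximum temperature per station.

```python
from collections import defaultdict

# Hypothetical input records: "station_id,date,temperature_celsius"
RECORDS = [
    "ST01,2019-07-01,31.5",
    "ST02,2019-07-01,27.0",
    "ST01,2019-07-02,33.1",
    "ST02,2019-07-02,26.4",
]


def map_record(line):
    """Map step: emit one (station_id, temperature) key-value pair."""
    station_id, _date, temperature = line.split(",")
    return station_id, float(temperature)


def shuffle(pairs):
    """Shuffle step: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_max(key, values):
    """Reduce step: collapse the values of one key into a single result."""
    return key, max(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence on one machine."""
    pairs = [map_record(line) for line in lines]
    grouped = shuffle(pairs)
    return dict(reduce_max(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(RECORDS))
```

In a real Hadoop job the map and reduce calls would run on different nodes and the shuffle would move data over the network; here everything runs in one process so that the data flow is easy to follow.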

B. Apache Spark:

Apache Spark is a fast cluster computing technology designed for quick computation. It builds on the Hadoop MapReduce model and extends it to support more kinds of computations efficiently, including interactive queries and stream processing. A Spark cluster consists of three components:

  • Driver program: holds the SparkContext object, which is used to manage and monitor applications.
  • Cluster manager: responsible for monitoring all resources in the cluster and reporting their state to the driver program.
  • Worker nodes: host the executors that run the tasks of the Spark program.
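The division of labor between the driver and the workers can be pictured with a small analogy in plain Python. This is not Spark code: a thread pool stands in for the executors on the worker nodes, and the function names and the wind-speed readings are assumptions chosen for illustration. The driver splits the data into partitions, the workers compute partial results, and the driver merges them.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, num_partitions):
    """Driver side: split the dataset into roughly equal partitions."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def task(chunk):
    """Worker side: compute a partial (sum, count) for one partition."""
    return sum(chunk), len(chunk)


def mean_wind_speed(readings, num_workers=3):
    """Driver side: schedule one task per partition and merge the results."""
    chunks = partition(readings, num_workers)
    # The thread pool is only a stand-in for executors on worker nodes.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(task, chunks))
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count


if __name__ == "__main__":
    print(mean_wind_speed([2.0, 4.0, 6.0, 8.0]))
```

Returning a (sum, count) pair from each task, rather than a local mean, keeps the merged result correct even when the partitions have different sizes.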

3. Literature Review on Weather Data Analysis

Different scientists [24] have suggested distinct techniques and designs for forecasting weather conditions in advance. Weather forecasting methods operate on four distinct time scales:

  • long-scale (annual)
  • medium scale (monthly)
  • short-scale (quarterly) and
  • very brief (daily).

Namitha et al. [13] proposed an approach for weather prediction on large volumes of data using Hadoop. The approach runs an artificial neural network (ANN) on Hadoop systems for short-term and long-term rainfall prediction. Rainfall can be predicted a day ahead by analyzing the immediately preceding temperature and rainfall data. Executing this idea on Hadoop made the approach faster and keeps it useful even as the data size grows to terabytes or petabytes. In recent years, this technique has been used widely in India.

According to the reported results, the MapReduce framework is effective in decreasing runtime. Regression performance improved with batch training, but classification performance decreased. For intensity prediction using classification, the performance difference was slightly lower than for rain/no-rain prediction [13].
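To make the idea of predicting rainfall from the immediately preceding observations concrete, the following sketch shows one way the training pairs for such a model could be assembled. It is an assumption about the data preparation, not the method published in [13]; the function names, the one-day lag, and the zero-millimetre threshold are hypothetical.

```python
def build_training_pairs(temperature, rainfall, lag=1):
    """Pair each day's preceding observations with that day's rainfall.

    Returns a list of (features, target) tuples, where features holds the
    temperature and rainfall of the `lag` days before the target day.
    """
    if len(temperature) != len(rainfall):
        raise ValueError("temperature and rainfall must have equal length")
    pairs = []
    for day in range(lag, len(rainfall)):
        features = temperature[day - lag:day] + rainfall[day - lag:day]
        pairs.append((features, rainfall[day]))
    return pairs


def rain_label(amount_mm, threshold_mm=0.0):
    """Turn a rainfall amount into a rain (1) / no-rain (0) class label."""
    return 1 if amount_mm > threshold_mm else 0


if __name__ == "__main__":
    temps = [30.0, 31.0, 29.0]   # hypothetical daily temperatures
    rain = [0.0, 5.0, 2.0]       # hypothetical daily rainfall in mm
    for features, target in build_training_pairs(temps, rain):
        print(features, target, rain_label(target))
```

The numeric target would serve a regression model, while `rain_label` gives the class used for rain/no-rain classification; either could then be fed to a neural network.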

Shabariram et al. [14] discuss a new alternative for managing information using a MapReduce framework centered on spatio-temporal features. The workload is categorized using a Support Vector Machine (SVM). The approach uses a dataset selection and reduction algorithm. The data analysis is carried out in the MapReduce function on the Hadoop framework to enhance cluster scalability.

The analysis stated that nearly “30% of information on the flood stage could be derived from the upstream flood stage and 10% to 20% from the rainfall” [14].

The proposed system enables easy analysis and classification of large data on the available cluster. The authors used the National Weather Service (NWS) dataset available from the National Climatic Data Center [15]. The results showed that the suggested idea improved execution in terms of efficiency and accuracy. However, the approach could predict rainfall for only one day, and sequential processing leads to speed and accuracy issues.

Sunitha et al. [16] used three distinct RA procedures for rainfall forecasting. The input dataset was collected from the Australian Bureau of Meteorology [17]. According to their experimental results, the suggested approach, which uses MapReduce for parallel processing, forecasts rainfall more accurately than the existing techniques [22].

They also demonstrate that their proposed model can reduce the error rate to 0.08%, compared to the existing model, whose error rate is 85%.

With the advancements in IoT technology, Onal et al. [18] used big data IoT tools for weather prediction. Data is collected from low-powered sensor devices and stored in Resource Description Framework (RDF) format, which is then transformed into CSV through ETL phases.

NoSQL is used for data storage, and the k-means clustering algorithm is used to analyze different scenarios such as relative humidity, temperature, and wind speed.
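For readers unfamiliar with k-means, a compact sketch of the standard Lloyd iteration is given here in plain Python. It is a generic textbook version, not the implementation used in [18]; the sample points (temperature, relative humidity), the fixed iteration count, and the seed are assumptions for illustration.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster points (tuples of floats) into k groups with Lloyd's algorithm."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct starting points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return centroids, assignments


if __name__ == "__main__":
    # Hypothetical sensor readings: (temperature in C, relative humidity in %)
    readings = [(20.0, 80.0), (21.0, 82.0), (35.0, 30.0), (36.0, 28.0)]
    centers, labels = kmeans(readings, k=2)
    print(centers, labels)
```

Readings that end up far from every centroid are natural candidates for closer inspection, which is one reason clustering is attractive for sensor data.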

Jayanthi et al. [19] presented an approach using Spark and proposed a weather analytics model focused on improved processing time compared to Hadoop.

The input weather dataset was obtained from NOAA [21] to find the ten weather stations with the highest average precipitation or temperature. The dataset was loaded into an IPython Notebook [23], and the Spark API was then used to analyze the data.
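The "top ten stations by average" query can be sketched in plain Python as a stand-in for the Spark job. This is not the code from [19]; the function name, the (station_id, value) input layout, and the sample values are assumptions made for illustration.

```python
from collections import defaultdict


def top_stations_by_average(observations, n=10):
    """Return the n stations with the highest average value, best first.

    `observations` is an iterable of (station_id, value) pairs, where the
    value could be a temperature or a precipitation reading.
    """
    totals = defaultdict(lambda: [0.0, 0])  # station -> [sum, count]
    for station_id, value in observations:
        totals[station_id][0] += value
        totals[station_id][1] += 1
    averages = {
        station: total / count for station, (total, count) in totals.items()
    }
    ranked = sorted(averages.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    # Hypothetical observations: (station_id, temperature in C)
    sample = [("A", 10.0), ("A", 20.0), ("B", 30.0), ("C", 5.0)]
    print(top_stations_by_average(sample, n=2))
```

The same aggregate-then-rank pattern maps directly onto a distributed engine, where the per-station sums and counts are computed in parallel before the final ranking.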

However, their results are based on a small weather dataset, so the effectiveness of the model on larger volumes of data and many iterations is yet to be explored.

Hu et al. [20] demonstrated how large distributed data can be used for predicting weather using ClimateSpark, which addresses the big data management and analytics issues related to weather.

They also compared the performance of ClimateSpark, SciSpark, and pure Spark. They concluded that ClimateSpark is efficient for multi-dimensional queries on array-based weather data with high data availability.
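A multi-dimensional query on array-based weather data can be pictured as selecting a sub-block along the time, latitude, and longitude axes and aggregating it. The sketch below uses nested Python lists and is not the ClimateSpark API; the function name, the `grid[time][lat][lon]` layout, and the sample values are assumptions for illustration.

```python
def subset_mean(grid, time_range, lat_range, lon_range):
    """Average the cells of grid[time][lat][lon] inside half-open index ranges."""
    t0, t1 = time_range
    y0, y1 = lat_range
    x0, x1 = lon_range
    values = [
        cell
        for time_slice in grid[t0:t1]
        for row in time_slice[y0:y1]
        for cell in row[x0:x1]
    ]
    if not values:
        raise ValueError("query selects no cells")
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical 2 x 2 x 2 temperature grid: two time steps, 2 x 2 cells
    grid = [
        [[1.0, 2.0], [3.0, 4.0]],
        [[5.0, 6.0], [7.0, 8.0]],
    ]
    # Mean over both time steps, first latitude row, all longitudes
    print(subset_mean(grid, (0, 2), (0, 1), (0, 2)))
```

A distributed framework would store chunks of such an array on different nodes and read only the chunks that overlap the requested ranges, which is what makes this kind of query efficient at scale.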

4. Conclusion

Thus, we can state that weather data analytics requires high precision, so high-performance computing is required to handle the large volume of data. This paper presented a summary of existing big data frameworks for analyzing weather data using Hadoop, Spark, and IoT frameworks. In future, IoT devices can play a crucial role in weather forecasting.


[1] Han Hu, Yonggang Wen, Tat-Seng Chua and Xuelong Li, “Toward Scalable Systems for Big Data Analytics: A Technology Tutorial”, IEEE Access, vol. 2, pp. 652-687, 2014. Available: 10.1109/access.2014.2332453 [Accessed 12 July 2019]

[2] S. Yin and O. Kaynak, “Big Data for Modern Industry: Challenges and Trends [Point of View]”, Proceedings of the IEEE, vol. 103, no. 2, pp. 143-146, 2015. Available: 10.1109/jproc.2015.2388958 [Accessed 12 July 2019].

[3] D. Jayanthi and G. Sumathi, “Weather data analysis using spark — An in-memory computing framework”, 2017 Innovations in Power and Advanced Computing Technologies (i-PACT), 2017. Available: 10.1109/ipact.2017.8245142 [Accessed 11 July 2019].

[4] R. Kune, P. Konugurthi, A. Agarwal, R. Chillarige and R. Buyya, “The anatomy of big data computing”, Software: Practice and Experience, vol. 46, no. 1, pp. 79-105, 2015. Available: 10.1002/spe.2374 [Accessed 11 July 2019].

[5] S. Singh, N. Jaiswal, C. Kishtawal, R. Singh and P. Pal, “Early Detection of Cyclogenesis Signature Using Global Model Products”, IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 8, pp. 5116-5121, 2014. Available: 10.1109/tgrs.2013.2286900 [Accessed 11 July 2019].

[6] S. Kotal, P. Kundu and S. Roy Bhowmik, “Analysis of cyclogenesis parameter for developing and nondeveloping low-pressure systems over the Indian Sea”, Natural Hazards, vol. 50, no. 2, pp. 389-402, 2009. Available: 10.1007/s11069-009-9348-5 [Accessed 11 July 2019].

[7] R. A. Betts, “Integrated approaches to climate–crop modelling: needs and challenges”, Philosophical Transactions of the Royal Society B: Biological Sciences, vol. 360, no. 1463, pp. 2049-2065, 2005. Available: 10.1098/rstb.2005.1739 [Accessed 11 July 2019].

[8] M. Bendre, R. Thool and V. Thool, “Big data in precision agriculture: Weather forecasting for future farming”, 2015 1st International Conference on Next Generation Computing Technologies (NGCT), 2015. Available: 10.1109/ngct.2015.7375220 [Accessed 11 July 2019].

[9] H. Aumann and L. Strow, “AIRS, the first hyper-spectral infrared sounder for operational weather forecasting”, 2001 IEEE Aerospace Conference Proceedings (Cat. No.01TH8542). Available: 10.1109/aero.2001.931472 [Accessed 12 July 2019].

[10] “Apache Spark Introduction”, 2019. [Online]. [Accessed 12 July 2019].

[11] T. Sunitha Manepalli and D. Chamakuzhi Subramanian, “Map reduce technique for parallel-automata analysis of large scale rainfall data”, International Journal of Engineering & Technology, vol. 7, no. 4, pp. 2752-2759, 2018. Available: 10.14419/ijet.v7i4.18370 [Accessed 13 July 2019].


[13] K. Namitha, A. Jayapriya and G. Kumar, “Rainfall Prediction using Artificial Neural Network on Map-Reduce Framework”, Proceedings of the Third International Symposium on Women in Computing and Informatics – WCI ’15, 2015. Available: 10.1145/2791405.2791468 [Accessed 13 July 2019].

[14] C. Shabariram, K. Kannammal and T. Manojpraphakar, “Rainfall analysis and rainstorm prediction using MapReduce Framework”, 2016 International Conference on Computer Communication and Informatics (ICCCI), 2016. Available: 10.1109/iccci.2016.7479954 [Accessed 13 July 2019].

[15] “National Centers for Environmental Information (NCEI) formerly known as National Climatic Data Center (NCDC) | NCEI offers access to the most significant archives of oceanic, atmospheric, geophysical and coastal data.”, 2019. [Online]. [Accessed 13 July 2019].

[16] T. Sunitha Manepalli and D. Chamakuzhi Subramanian, “Map reduce technique for parallel-automata analysis of large scale rainfall data”, International Journal of Engineering & Technology, vol. 7, no. 4, pp. 2752-2759, 2018. Available: 10.14419/ijet.v7i4.18370 [Accessed 13 July 2019].

[17] “Climate Data Online”, 2019. [Online]. [Accessed 13 July 2019].

[18] A. Onal, O. Berat Sezer, M. Ozbayoglu and E. Dogdu, “Weather data analysis and sensor fault detection using an extended IoT framework with semantics, big data, and machine learning”, 2017 IEEE International Conference on Big Data (Big Data), 2017. Available: 10.1109/bigdata.2017.8258150 [Accessed 14 July 2019].

[19] D. Jayanthi and G. Sumathi, “Weather data analysis using spark — An in-memory computing framework”, 2017 Innovations in Power and Advanced Computing Technologies (i-PACT), 2017. Available: 10.1109/ipact.2017.8245142 [Accessed 14 July 2019].

[20] F. Hu et al., “ClimateSpark: An in-memory distributed computing framework for big climate data analytics”, Computers & Geosciences, vol. 115, pp. 154-166, 2018. Available: 10.1016/j.cageo.2018.03.011 [Accessed 14 July 2019].

[21] “Quick Links | National Centers for Environmental Information (NCEI) formerly known as National Climatic Data Center (NCDC)”, 2019. [Online]. [Accessed 14 July 2019].

[22] S. Mehrmolaei and M. Keyvanpour, “Time series forecasting using improved ARIMA”, 2016 Artificial Intelligence and Robotics (IRANOPEN), 2016. Available: 10.1109/rios.2016.7529496 [Accessed 14 July 2019].

[23] S. Karpovich, A. Smirnov, N. Teslya and A. Grigorev, “Topic model visualization with IPython”, 2017 20th Conference of Open Innovations Association (FRUCT), 2017. Available: 10.23919/fruct.2017.8071303 [Accessed 14 July 2019].

[24] J. Sillmann et al., “Understanding, modeling and predicting weather and climate extremes: Challenges and opportunities”, Weather and Climate Extremes, vol. 18, pp. 65-74, 2017. Available: 10.1016/j.wace.2017.10.003 [Accessed 14 July 2019].
