Weather Prediction through Sentiment Analysis on Twitter and Multi-Dimensional Data

8589 words (34 pages) Essay in Computer Science

18/05/20

Disclaimer: This work has been submitted by a student. Any opinions, findings, conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of UK Essays.

ABSTRACT

While it is widely believed in psychology that weather has some influence on a human being’s mood, the discussion regarding their interrelation has been going on for a long time. This project aims to study this long-lasting question through sentiment analysis of data compared across two sources: Twitter and regular weather forecasts from forecast links. The analysis is performed on Twitter data obtained via the Twitter API, collected with respect to the attributes of the multi-dimensional data, and tries to reveal the correlations between tweets and the multi-dimensional weather data. The project also aims to predict the weather based on neural combinational associations.

  1. INTRODUCTION

A human being’s physical, psychological, and economic well-being is supported by their mood and emotional state. Biological factors such as cortisol levels and cardiovascular functioning are associated with positive emotions. These factors influence social involvement and support, and may amplify economic success. Social platforms reflect several emotional states and individual emotions in an elaborate manner. Limitations of small sample sizes and poor generalization have been noted in studies of the relation between mood and weather [40].

The term weather is used to describe day-to-day variations in our atmosphere. This includes temperature, humidity, wind speed, wind direction and atmospheric pressure, among other variables. The climate of a locality is characterized by examining weather statistics to assess the daily, monthly and annual means, medians and variability of the weather data. Climate is, therefore, a long-term average of weather [15].


Weather data is collected and stored in datasets. These datasets contain information about combinations of humidity, temperature, rainfall, radiation, snow depth, vapor pressure, wind speed, air pressure, sunlight intensity, etc. To improve prediction, historical datasets are needed, which means huge amounts of data collected from different sources (big data); to process this data, new hardware and software tools and techniques are required [1].

Various methods, such as the Radial Basis Function Network, BPA (Back Propagation Algorithm), SVM (Support Vector Machine) and SOM (Self-Organizing Map), are reviewed in [12], which states that many researchers used BPA for weather prediction. In [5] the authors reviewed various rain forecasting models based on NNs (Neural Networks) such as FFNN, RNN and TDNN. The survey shows these are competitive with established weather forecasting techniques such as numerical and statistical models. Neural networks give better results for yearly data, but poor performance for daily and monthly data. In paper [14], Shoba G. et al. investigated methods such as ANFIS (Adaptive Neuro-Fuzzy Inference System) and the SLIQ decision tree for rainfall forecasting. Balamurugan et al. [15] compared data mining algorithms such as Decision Trees, KNN (K-Nearest Neighbor), Neural Networks and Fuzzy Logic for rainfall prediction, concluding that neural networks give better results. Anshal Savla et al. [7] discussed different classification techniques of data mining such as SVM (Support Vector Machine), RF (Random Forest), NN, Bagging and REP Tree, and concluded that the bagging classification method is the best for rainfall forecasting.

1.1  Objective

-          Attribute-based sentiment analysis on Twitter data and on multi-dimensional data obtained from weather forecast links.

-          Predicting the weather forecast using weather labels, based on the sentiment analysis result.

1.2  Problem Statement

Sentiment analysis prediction on Twitter data as well as on the multi-dimensional data, with respect to the attributes present in the weather data, is quite complicated, as there might not be any relevant attribute-related mentions on Twitter.

Thus, there is a need for weather data from official, open-access weather forecast sites with gateways. The formats of these two data sources (Twitter and multi-dimensional) do not match, so preprocessing is required before analysis.

During this process, the multi-dimensional data is converted to XLSX, as the data processing algorithm requires raw data in the format of the native API. Similarly, Twitter data filtration is required, as the data needs to be in a numerical rather than textual format, with the filtration keyed on the attributes.

  2. Related Work

2.1  Methodologies used for weather forecast based on different types of time period

For weather forecasting, different researchers have proposed various methods and models for early prediction of weather conditions. Forecasting approaches fall into four scales based on period of time:

-          long scale is yearly,

-          medium scale is monthly,

-          short scale is weekly and

-          very short is daily.

Models for forecasting long-term data have been developed in papers [8] and [18]. The work in papers [4], [10] and [28] develops models for forecasting medium-term data, while papers [3], [16] and [27] describe models for forecasting short-term data. Models for forecasting daily (very short-term) data have been developed in papers [1], [2], [5], [7], [19], [20] and [21].

Map Reduction Algorithm:

Bendre and Thool [1] proposed a map-reduce algorithm to predict daily weather conditions using ICT services in an agricultural big data environment, collecting huge amounts of data. They generated data from the KVR (Krishi Vidyapeeth Rahuri) weather station and analyzed it on a daily basis. They concluded that this approach increases the accuracy of the weather forecasting system by using various weather parameters for future precision farming.

SVM and FCM:

Sanjeev Kumar Singh et al. [2] proposed early recognition of tropical cyclones (TCs) using global model products. They inspected 14 TCs that developed in the NIO (North Indian Ocean) between 2008 and 2011, attaining forecast fields at 6-hour intervals up to 120 hours ahead of cyclone formation over the NIO domain. As a continuation, they plan to apply these methods to non-developing systems as well, for complete validation and further enhancement. Kulwarun Warunsin and Orachat Chitsobhuk [10] demonstrated an early cyclone detection system based on wind speed and wind direction. They applied SVM classification and FCM clustering to identify early cyclones and concluded that FCM offers the highest accuracy, 93%, while SVM produces poorer results due to outliers.

Predictions based on Atmospheric Computer Models:

Takemasa, Keiichi and Koji [3] proposed numerical weather prediction based on computer models of the atmosphere. In this method, synchronizing the computer simulation with the real world is essential to accurately determine the atmosphere’s current state within a six-hour interval cycle. The method is not suitable for larger numbers of samples and observations, or for higher resolutions.

CGMS:

Huang Qing et al. [8] proposed China-CGMS. They analyzed daily data in China using regression and scenario analysis and made early crop predictions for administrative sectors. They concluded that constructing more yield-type calendars and input parameters, using detailed soil analysis and weather datasets, would increase the adaptability of China-CGMS in the near future.

Wavelet ANN and Wavelet postfix- GP model:

V. Dabhi et al. [29] proposed a daily weather prediction system using a Wavelet ANN and a Wavelet Postfix-GP model [12][13]. On a daily basis, however, accurate values are not predicted.

2.2  Issues involved in weather forecasting

Due to the large volumes of weather datasets, conventional models do not give accurate results. To increase the accuracy of the system, a direct storm-formation prediction system is required. NWP techniques cannot predict local weather conditions because these are unstable [25]. Statistical models also cannot produce great results because they are built on assumptions [6]. There are four different types of weather forecast methods [5]:

-          Very short-scale forecast: 1–5 hours

-          Short-scale forecast: 6 hours to a few days (weekly basis)

-          Medium-scale forecast: 1–10 months

-          Long-scale forecast: yearly basis

The challenges of long-, medium-, short- and very short-term data are as follows. In long-term weather datasets, there is no simple process for determining the weather input parameters, and too many or too few parameters can affect the model over a long time period (years). It is difficult to use the same prediction model over a short time because input parameters change on a daily or weekly basis; a changed or newly added parameter does not fit into a model which has already been developed. Long-term forecasting also depends on the sampling period of the input data: for long-term data, a huge training dataset gives better results. Distortion and noise associated with random variations of input parameters are possible in very short-term or short-term weather datasets, so daily or weekly data may not provide accurate results. In comparison, monthly data provides better results than weekly data, and results produced by yearly data are far better than monthly and weekly results [16][27].

Fig 1: Types of Weather Forecast Methods

2.3  Sentiment Analysis

In recent years, sentiment analysis has been among the most utilized evaluation techniques. The reason for this desirability is its use of NLP, biometrics and text analysis to evaluate an individual’s emotional state. Sentiment analysis aids in acquiring various information, including emotion and opinion [33].

Since social media platforms have dominated recent years, there is certainly a requirement for an automated system for analysis. With social media platforms as data sources, users get access to large amounts of data to analyze and make decisions from [33]. However, preprocessing this data manually is burdensome. Hence sentiment analysis plays a vital role in providing users with automated systems to analyze the available data sources.

The results of sentiment analysis are based on the attributes. The tools used for this process assist in matching the attributes with the opinions and emotions of humans. The process also involves the collection of information based on desired keywords and emotions from data sources (Twitter, as used in this project) [34]. In addition to the extraction of data, these analysis tools also aid in prediction and can be exploited in different fields. Several investigations conclude that sentiment analysis is widely used, has been published in many research papers, and continues to gain ground on the web.

Sentiment analysis encompasses several techniques which can be utilized in business and in general analysis. Several approaches and applications, comprising scaling systems, Bales’ Interaction Process and subjectivity/objectivity identification, have been discussed in paper [37]. The author categorizes these analyses as machine learning, NLP, text mining and hybrid approaches.

Approaches of Sentiment Analysis:

Generally, the lexicon and machine learning approaches are the two eminent methods of sentiment analysis. Their popularity is due to the type of results they produce irrespective of the field in which they are used.

      Lexicon Approach:

Over the years, the lexicon approach has been implemented in various studies to perform sentiment analysis. It is based on a list of words, each with a score classifying it as positive, negative or objective in nature. According to [38], the lexicon approach uses textual opinions to calculate sentiment polarity. The novel machine learning approach, the ensemble approach and corpus learning are considered widely used lexicon approaches for sentiment analysis.

Lexicon-based approaches have their own share of advantages and disadvantages. The major advantage is obtaining the data without any preparation. The functionality of this approach is based on extracting the positive and negative words in a sentence, so the extraction or collection of data is accomplished through a pre-defined list of words.

Though they are considered easy to implement, there are several disadvantages to the approach. Lexicon-based approaches find it hard to understand the slang used on social media sites [39]. Another disadvantage is the need to create a predefined list of words (also called a lexicon-based dictionary) wherever the approach is applied. This explains why they are not suitable for modern language sets.
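To make the word-list idea concrete, here is a minimal sketch of a lexicon scorer; the words and scores are invented for illustration and are not taken from any published lexicon:

```python
# Minimal lexicon-based sentiment sketch: each word carries a polarity
# score, and a sentence's polarity is the sum over its words.
LEXICON = {
    "good": 1.0, "great": 2.0, "sunny": 1.0,       # positive entries
    "bad": -1.0, "horrible": -2.0, "storm": -1.0,  # negative entries
}

def lexicon_polarity(text):
    """Sum the scores of known words; unknown words (including slang)
    score 0, which is exactly the weakness noted above."""
    return sum(LEXICON.get(w, 0.0) for w in text.lower().split())

print(lexicon_polarity("horrible storm today"))  # -3.0
print(lexicon_polarity("gr8 weather"))           # 0.0 -- slang is missed
```

The second call illustrates the slang problem: “gr8” is absent from the predefined dictionary, so it contributes nothing to the polarity.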

      Machine Learning Approach:

The intention of this approach is to extract the sentiment polarity from data sets. The machine learning approach stays ahead because it can adapt with the aid of linguistic features and ML algorithms, which work with both supervised and unsupervised methods. Some of the most widely used methods based on this approach for sentiment analysis are Support Vector Machines, K-Nearest Neighbors, Naïve Bayes, Neural Networks, etc.

Comparing the strengths of both approaches, the machine-learning-based approach edges out the lexicon-based approach because it can adapt to the context of study. Due to this adaptability, the machine-based approach does not require a specific set of keywords or a dictionary [38]. Its ability to handle multiple languages while providing high accuracy makes the machine-based approach advantageous.

The downside of the machine-based approach is the need for a set of labelled data for new data, which reduces its applicability in this context. Models trained on text from a specific field will not be attuned to another field. Nonetheless, ML-based approaches have a knack for classification and provide better sentiment analysis compared to other approaches [38].
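As a sketch of how one such supervised method works, the following is a minimal hand-rolled Naïve Bayes classifier with add-one smoothing; the tiny training set is invented for illustration and is not the project’s data:

```python
import math
from collections import Counter

# Invented training examples (text, label) -- purely illustrative.
TRAIN = [
    ("lovely warm sunny day", "pos"),
    ("beautiful clear sky", "pos"),
    ("horrible cold rain", "neg"),
    ("terrible storm and wind", "neg"),
]

def fit(data):
    """Count word frequencies per class and build the vocabulary."""
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in data:
        counts[label].update(text.split())
    vocab = set(w for c in counts.values() for w in c)
    return counts, vocab

def predict(text, counts, vocab):
    """Pick the class with the highest smoothed log-likelihood;
    class priors are uniform here, so only word likelihoods matter."""
    scores = {}
    for label, c in counts.items():
        total = sum(c.values())
        scores[label] = sum(
            math.log((c[w] + 1) / (total + len(vocab)))
            for w in text.split() if w in vocab
        )
    return max(scores, key=scores.get)

counts, vocab = fit(TRAIN)
print(predict("sunny and warm", counts, vocab))   # pos
print(predict("cold rain again", counts, vocab))  # neg
```

This also illustrates the labelled-data requirement noted above: every training sentence must already carry a sentiment label before the model can be fitted.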

2.4  Sentiment Analysis on Weather

Weather forecasts are made by collecting vast data about attributes that include temperature, wind speed, wind direction, air pressure and humidity. The availability of vast data can be chaotic and would lead to less accurate predictions for the future. Hence this project performs data acclimatization, utilizing it in conjunction with threshold- and keyword-based filtration to predict the weather forecast [35].

  3. Methodology

3.1  High Level Architecture

The following figure shows the high-level architecture of the project:

Fig 2: High-Level Architecture

The architecture shows the two kinds of data involved in the analysis: Twitter data and multi-dimensional data. Both datasets are put through various stages before they arrive at a prediction. Initially, they are preprocessed and then segregated month-wise (May, June and July). At this stage there are 6 separate data files: 3 for Twitter and 3 for multi-dimensional data, one each for May, June and July. A sentiment analysis is performed based on the attributes, i.e. weather variables, and the most dominant attribute (highest frequency) is determined. Thus, for each month, a pair of attributes is obtained. This pair of attributes is compared against the standard weather labels with all neural combinations of attributes from the Twitter and multi-dimensional data. Based on the combinations, the result is a quantitative output predicted from the neural combinational associations.

3.2  Data Collection:

This project involves two kinds of data:

-          Weather related data collected from Social Media Website such as Twitter

-          Weather data collected from weather forecast websites

Both datasets are collected for the location: Edmonton, Canada.

3.2.1        Multi-Dimensional Data Collection:

The entire month-wise data is collected from web weather gateways, i.e., open-access weather crawling sites. From these sites, data is gathered with input parameters, and depending upon the site access and attributes, the data is populated month-wise in CSV format. This multi-dimensional population method is not a single pass but a multiple-access mechanism, as the data is month-wise with an attribute-based filtration model. The gateway code is written in Python, and through filtration, month-wise data is populated as CSV. Some missing data was filled manually with data from the official Canadian government weather website [36]. The CSV is then converted into XLSX format, as the data is passed to Java, where the application works on XLSX through the POI (Poor Obfuscation Implementation) API.

Fig 3: Python script for data collection from open weather API
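The actual gateway script appears in Fig 3 and is not reproduced here; the following is a simplified, offline sketch of the month-wise CSV population step, with invented sample records standing in for real API responses:

```python
import csv
from collections import defaultdict

# Invented sample records standing in for open weather API responses:
# (date, temperature, humidity, wind_direction, wind_speed, air_pressure).
records = [
    ("2019-05-01", -1.2, 55, "NW", 14.0, 101.3),
    ("2019-06-10", 18.4, 40, "SE", 9.5, 100.9),
    ("2019-07-21", 24.1, 38, "S", 7.2, 101.1),
]

HEADER = ["date", "temperature", "humidity",
          "wind_direction", "wind_speed", "air_pressure"]

# Group records month-wise, mirroring the May/June/July split used later.
by_month = defaultdict(list)
for rec in records:
    month = rec[0][5:7]              # "05", "06", "07"
    by_month[month].append(rec)

# Write one CSV file per month with the five-attribute schema.
for month, rows in by_month.items():
    with open(f"weather_{month}.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(HEADER)
        writer.writerows(rows)
```

The CSV-to-XLSX conversion that follows in the pipeline is a separate step performed before the data reaches the Java/POI side.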

3.2.2        Twitter Data Collection:

The social media corpus consists of 3.5 billion posts in total, with 2.4 billion from Twitter. Twitter data is more likely to consist of text expressions revealing the user’s underlying emotional state; it also allows additional investigation into the mechanisms underlying changes in expressed sentiment and comparison of the effect sizes with other events.

Twitter gives developers access to a range of streaming APIs which offer low-latency access to flows of Twitter data. For the data collection implementation, the public streams API was used; this was found to be the most suitable method of gathering information for data mining purposes, as it allowed access to a global stream of Twitter data that could be filtered as required. To take advantage of this stream, a Java interface library had to be installed; this library was necessary for Java to interface with Twitter’s API v1.1. A number of libraries were available for this task; Java Twitter Tools v1.14.3 was chosen, as it provided the basic filtering and streaming functionality required for this project.

Twitter imposes numerous regulations and rate limits on the API. For this reason, all users must register an account and provide authentication details when they query the API. Registration requires an email address and telephone number for verification; once the account is verified, the user is issued the authentication details which allow access to the API. A Java program was then created which provided the API with the authentication details and initialized a streaming process where data could be pulled from Twitter’s RESTful web service to a local machine. A filter function allowed the program to request Twitter content based on specific keywords related to this study. All the downloaded data was transmitted in JSON format, which was found to be less verbose than the alternative format offered, XML.

Each JSON-formatted package contained a large amount of information, but for this project only the tweet text and the time the tweet was written were required. To remove the unwanted content, each package was parsed by a Java routine which located the useful content and stored it in RAM until main storage became available. An additional check was performed to ensure all downloaded tweets were written in English; this involved parsing the JSON content for a ‘lang’ tag and performing an equality check on its content.

Once the required content was extracted from the JSON package and stored in RAM, it could be written to main storage. There were several options for storing the information, such as a comma-separated values (CSV) file, a text file or a dataset. It was decided that the optimum approach was to use a text file. A dataset was created with a simple table structure whose fields were the priority and the attributes. The priority attribute was generated automatically by simply incrementing a counter each time the dataset was written to.
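The collector described above was written in Java; the parsing and language-filtering step can be sketched in Python as follows, with made-up JSON packages standing in for real streaming payloads:

```python
import json

# Made-up JSON packages mimicking the streaming payloads described above.
packages = [
    '{"text": "Horrible weather with 33 degrees", '
    '"created_at": "Mon May 06 20:01:29 +0000 2019", "lang": "en"}',
    '{"text": "Quel temps affreux", '
    '"created_at": "Mon May 06 20:02:10 +0000 2019", "lang": "fr"}',
]

kept = []
priority = 0                         # incrementing counter, as described
for raw in packages:
    pkg = json.loads(raw)
    if pkg.get("lang") != "en":      # keep English tweets only
        continue
    priority += 1
    # keep only the tweet text and its timestamp
    kept.append((priority, pkg["created_at"], pkg["text"]))

for row in kept:
    print(row)
```

Only the English tweet survives the filter; the French one is discarded by the equality check on the `lang` tag.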

3.3  Data Preprocessing:

Data preprocessing is done to eliminate incomplete, noisy and inconsistent data. Data must be preprocessed before any data mining functionality can be performed.

3.3.1        Multi-Dimensional Data Preprocessing:

The open weather API provided data only for some days of the month; due to an API exception error, some data was missing. The missing data was filled in manually from the official Canadian Government weather website.

The final obtained dataset consists of 5 weather related attributes:

      Temperature

      Humidity

      Wind direction

      Wind Speed

      Air Pressure

3.3.2        Twitter Data Preprocessing:

Twitter is a real-time information network that connects an individual to the latest climatic conditions and news about whatever they find interesting. This can be done by simply searching for the accounts found most compelling and following their conversations and tweets. At the heart of Twitter are small postings of information called tweets. Each tweet is at most 140 characters long. Emojis, photos, videos and conversations are directly visible in tweets, which provides the whole story at a glance, all in one place.

Using Java IO streams, all tweets are loaded into memory. The collected tweets contain a mix of sentiment text, attribute mentions and numerical values. The data is preprocessed and cleaned using the following methods.

Fig 4: Preprocessing of Twitter Data

      Collection and filtration of tweets according to weather, with respect to keywords.

      Using the threshold and the words immediately before and after the keywords, the data is filtered; the remainder is treated as noisy data.

      The above process is done by tokenizing each tweet into fully qualified words with cleaning process.

      Removing URLs: some tweets contain URL tokens starting with http://, https:// or www. These are removed through the use of regular expressions.

      Question words such as what, which, how, etc, do not contribute to polarity. Hence, in order to reduce complexity, such words are removed.

      Special characters like . , [ ] { } ( ) / ’ should be removed to avoid discrepancies during the assignment of polarity. For example, in “it’s good:”, if the special characters are not removed they may concatenate with the words and make those words unavailable in the dictionary. Through numeric data discovery, the numerical data is wrapped, e.g. as [90.89], for further calculation purposes.

      Retweeting is the process of copying another user’s tweet and posting it from another account, usually when a user likes another user’s tweet. Retweets are commonly abbreviated as “RT.” For example, consider the tweet “Horrible weather with 33 degrees temperature :)”. This tweet will be treated as temperature with [33], where [33] is the numerical value associated with the attribute temperature.

      After populating the data, the tweets are extracted along with the numerical data for all 5 attributes (temperature, humidity, wind speed, wind direction and air pressure), as in the multi-dimensional data.
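Putting the steps above together, a minimal sketch of the cleaning and attribute/number extraction might look like the following; the regular expressions and the sample tweet are illustrative, not the project’s exact code:

```python
import re

# The five attributes the tweets are filtered against.
ATTRIBUTES = ["temperature", "humidity", "wind speed",
              "wind direction", "air pressure"]

def clean(tweet):
    """Strip URLs, question words and special characters, as above."""
    tweet = re.sub(r"(https?://|www\.)\S+", "", tweet)
    tweet = re.sub(r"\b(what|which|how|when|why)\b", "", tweet,
                   flags=re.I)
    tweet = re.sub(r"[\[\]{}()/’'.,:;]", " ", tweet)
    return re.sub(r"\s+", " ", tweet).strip().lower()

def extract(tweet):
    """Return (attribute, [numeric value]) if the tweet mentions one of
    the five attributes together with a number, else None."""
    nums = re.findall(r"-?\d+(?:\.\d+)?", tweet)  # read numbers first
    text = clean(tweet)
    for attr in ATTRIBUTES:
        if attr in text and nums:
            return attr, [float(nums[0])]
    return None

print(extract("RT Horrible weather with 33 degrees temperature :)"))
```

For the retweet example above, the sketch yields the pair `('temperature', [33.0])`, matching the [33] bracketing convention described in the list.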

3.4  Data Loading:

3.4.1        Multi-dimensional data loading:

Apache provides the POI (Poor Obfuscation Implementation) API. Apache POI, a project run by the Apache Software Foundation and previously a sub-project of the Jakarta Project, provides pure Java libraries for reading and writing files in Microsoft Office formats, such as Word, PowerPoint and Excel.

The POI API is used to load the Excel sheet into Java memory, where an XSSFWorkbook is created and the loaded Microsoft Excel sheet is recreated.

3.4.2        Twitter data loading:

The Twitter data is loaded into Java using a FileInputStream.

3.5  Implementation

This project is implemented in Java with the Swing API. The following figures show the flow of implementation; the figure below shows the exact flow of the work. Initially, data is collected from two sources: Twitter and a gateway source (websites with gateway access through a language API). The collected data is cleaned by removing unwanted data, as the data needs to be relevant to the weather model, carry numeric data, and map onto proper lists specific to the attributes:

-          Temperature

-          Humidity

-          Wind speed

-          Wind direction

-          Air pressure.

Fig 5: Implementation

Using the back-propagation technique and the mean weighted average vector, the threshold is created, as the threshold depends on the locality. Upon obtaining the output attributes per month, the attributes affecting that month’s weather are predicted with relevant labels, so that in the future, when the weather is needed from the same information, the label serves as the prediction.

3.5.1        Flow Chart

Fig 6: Flow Chart

3.5.2        Packages

Many Java class files and related metadata and resources are bundled into one file for distribution, in the form of a package file format, the Java Archive (JAR). The following figure shows the JAR files used in this project.

Fig 7: JAR files

3.5.3        Attribute definitions

allTweets() : An array list used to accommodate all tweets in one single variable

allAttributes() : An array list of all attribute names as labels with comparable keywords.

at1(), at2(), … at10() : Array list of 10 variables to store individual attribute values from the multi-dimension data (excel sheet).

availableAttributes() : Array list containing the attribute names with respect to multi-dimensional data.

allAttsVals() : Array list to store the  individual attribute’s threshold value calculation result.

3.5.4        Calculation for Mean Weighted Vector

The following figure shows the code used to calculate the mean weighted average vector of all the attributes per month to determine the maximum valued attribute for the month.

Fig 8: MWV calculation

The maximum attributed value with attribute is determined by:

    Fig 9: Maximum attributed value calculation

The accuracy is calculated by:

       Fig 10: Accuracy calculation for July

The accuracy is calculated by dividing the maximum attributed value by total number of tweets which are posted on that attribute.

For example, if humidity is the maximum attributed value for that month, the accuracy would be calculated as: 78/123

Where,

Maximum MWA of the humidity: 78 and

Total number of tweets from twitter with numerical data: 123
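The project’s MWV code appears in Figs 8–10 and is not reproduced here; a minimal sketch of the per-month calculation, using the worked humidity example above (78 out of 123), could look like:

```python
# Sketch of the per-month step: given mean weighted average (MWA)
# totals per attribute, find the dominant attribute and its accuracy.
# The values below are the illustrative figures from the example above.
attr_values = {
    "temperature": 51.0,      # invented for illustration
    "humidity": 78.0,         # maximum MWA of humidity (from the example)
    "wind speed": 12.0,       # invented
    "wind direction": 9.0,    # invented
    "air pressure": 20.0,     # invented
}
total_tweets = 123            # tweets carrying numeric data (from example)

dominant = max(attr_values, key=attr_values.get)   # highest-valued attribute
accuracy = attr_values[dominant] / total_tweets    # 78 / 123

print(dominant)
print(round(accuracy, 3))
```

Only the humidity MWA (78) and the tweet total (123) come from the worked example; the other attribute totals are placeholders to make the maximum step concrete.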

3.5.5        Neural Associations

An attribute-based association model is applied to all the tweets to fetch the relevant attribute, with numeric data extracted in the form of recursive loops, as the association is per attribute. For each neural recursion, attribute association data is populated, belonging to the filtration model. This filtration model provides each individual attribute’s maximum value per month, as the above model (neural association) is based on month-wise individual calculation. Based on the output, the maximum numeric data values are populated.

3.5.6        Functionality

initComponents():

This function instantiates all the widget components placed on the Swing UI (JFrame), including the JFrame itself.

jButton1ActionPerformed():

This function triggers the code when a button click event occurs. Two event categories are handled: a Twitter event and a collected (multi-dimensional) event.

If the category selected is Twitter: for each month, the input text file is stored as a string and the Java StringTokenizer() class is used. A loop iterates over the length of the text file. Checks determine whether the tweets contain any of the 5 attributes mentioned (temperature, humidity, wind direction, wind speed and air pressure). If a tweet contains an attribute, getValueFromTweet() is called and the size is incremented. The consolidated value for each attribute is calculated, the maximum consolidated attribute value is determined, and that attribute is reported as the one which most affected that month’s weather. The sentiment frequency is displayed as a bar graph. The accuracy is calculated by dividing the maximum attributed value by the total number of tweets posted about that attribute.

If the category selected is collected: for each month, the POI API loads the Excel input file into Java memory, where an XSSFWorkbook is created and the loaded Microsoft Excel sheet is recreated. A rowIterator() and cellIterator() iterate through each row and cell of the recreated workbook, with a loop over the length of the workbook. The mean weight vector for each attribute is obtained, the maximum consolidated attribute mean weight is determined, and that attribute is reported as the one which most affected that month’s weather. The sentiment frequency is displayed as a bar graph. The accuracy is calculated by dividing the maximum mean-weighted attribute value by the size.

The maximum attributes form a neural association. The prediction() class is called to predict the result of this neural association.

getValueFromTweet():

A public class which takes as input the tweet and the attribute. It returns the attribute count and the numerical value associated with that attribute.

getPredictionResult():

This function takes as input the neural association, i.e. the highest sentiment frequency attributes, and returns the predictions based on the neural combinations of this association, by using weather labels.
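The label lookup inside getPredictionResult() is not listed; one way it could be sketched is with a hypothetical label table keyed on the (Twitter, multi-dimensional) attribute pair. Both the table and its labels are invented here, not taken from the project:

```python
# Hypothetical weather-label table keyed on the neural association,
# i.e. the pair (dominant Twitter attribute, dominant multi-dim attribute).
WEATHER_LABELS = {
    ("temperature", "temperature"): "hot and dry",
    ("humidity", "temperature"): "humid with warm spells",
    ("humidity", "air pressure"): "muggy, chance of rain",
    ("wind speed", "wind direction"): "windy",
}

def get_prediction_result(twitter_attr, multi_dim_attr):
    """Map the month's dominant attribute pair to a weather label."""
    return WEATHER_LABELS.get((twitter_attr, multi_dim_attr),
                              "no stable label for this combination")

print(get_prediction_result("humidity", "air pressure"))
```

The point of the sketch is the mechanism: once the two dominant attributes per month are known, the prediction reduces to a lookup over the neural combinations of that association.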

  4. EXPERIMENTAL SETUP

4.1  Approaches

4.1.1        KNN (K- Nearest Neighbors)

There are several considerations important for the interpretation of the results.

While the data obtained is month-wise data on individuals’ expressed sentiment as reflected by their social media posts (tweets), optimal data would also include those individuals’ daily self-reported emotional states. While sentiment expressions on social media can reflect underlying emotions [30], the linguistic measures employed here represent an imperfect and noisy proxy of emotional factors. Further studies are needed to improve the accuracy and validity of attribute-based sentiment metrics.


KNN takes inputs after classification of per-tuple entries; thus single-entry tweets appear all the time. It was found that the prediction accuracy was very low and sometimes not comparable with the other trialed approaches. The approach worked somewhat better with the multi-dimensional data using Euclidean distances, but the accuracy remained weak, as the generated k value is random and fluctuating.
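A plain k-NN classifier with Euclidean distance, of the kind trialed here, can be sketched as follows; the five-feature points and their labels are invented examples, not the project’s data:

```python
import math
from collections import Counter

# Invented training points over the five attributes:
# (temperature, humidity, wind speed, wind direction in degrees, pressure).
train = [
    ((24.0, 35, 7.0, 180, 101.2), "clear"),
    ((18.0, 60, 12.0, 270, 100.8), "cloudy"),
    ((10.0, 85, 20.0, 300, 99.9), "rain"),
    ((22.0, 40, 8.0, 200, 101.0), "clear"),
]

def euclidean(a, b):
    """Straight-line distance between two attribute vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(point, k=3):
    """Majority vote among the k nearest training points."""
    nearest = sorted(train, key=lambda t: euclidean(t[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_predict((23.0, 38, 7.5, 190, 101.1)))  # clear
```

Note that the features are unscaled, so wide-range features such as wind direction in degrees dominate the distance; together with an arbitrary choice of k, this illustrates why plain k-NN can give the weak, fluctuating accuracy reported above.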

4.1.2        LIWC (Linguistic inquiry word count)

The chosen LIWC sentiment metrics may imperfectly measure the sentiment of expressions on social media. The robustness of the findings is examined against other sentiment classification tools on the Twitter data in SI: Alternative measures of expressed sentiment.

In these analyses, both the keyword-strength and tweet-priority algorithms were employed, and the results were found to be quite robust across all three of the employed sentiment metrics. However, because all three metrics likely carry idiosyncratic errors, our measurement of the sentiment of expressions remains imperfect. To determine whether a social media post uses words that express positive or negative sentiment, the Linguistic Inquiry Word Count (LIWC) sentiment analysis tool is relied upon [22], [40].

LIWC is a highly validated, dictionary-based, sentiment classification tool that is commonly used to assess sentiment in social media posts [5],[6],[23],[24] (Note: the results obtained are similar under the use of alternative sentiment classifiers, SI: Alternative measures of expressed sentiment). In this analysis, positive and negative sentiment are treated as separate constructs [31].
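LIWC's dictionary itself is proprietary, but the dictionary-based idea it embodies can be illustrated with a tiny hand-picked word list (the word lists below are assumptions, not the LIWC lexicon), treating positive and negative sentiment as separate counts as in the analysis above:

```java
import java.util.Set;

// Sketch of a dictionary-based sentiment classifier: count positive and
// negative words in a tweet and keep the two scores separate.
public class DictionarySentimentSketch {

    static final Set<String> POSITIVE = Set.of("sunny", "nice", "pleasant", "lovely");
    static final Set<String> NEGATIVE = Set.of("storm", "awful", "gloomy", "freezing");

    // Returns {positiveCount, negativeCount} for a tweet.
    static int[] score(String tweet) {
        int pos = 0, neg = 0;
        for (String w : tweet.toLowerCase().split("\\W+")) {
            if (POSITIVE.contains(w)) pos++;
            if (NEGATIVE.contains(w)) neg++;
        }
        return new int[]{pos, neg};
    }

    public static void main(String[] args) {
        int[] s = score("Lovely sunny morning after an awful storm");
        System.out.println(s[0] + " " + s[1]); // prints 2 2
    }
}
```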

4.1.3        Threshold and key-word based filtration

This was the final approach, and the one that gave a valid outcome. It depends entirely on the keywords of the attributes from the multi-dimensional data observed in the cleaned Twitter data.

The keywords are:

      Temperature

      Humidity

      Wind Speed

      Wind Direction

      Air Pressure

Based on the above attributes, the tweets are populated, as the data gathered from the open-source weather forecast gateway links (the multi-dimensional data) is based on the same attributes.

Thus, by extracting the numeric data from all the tweets and comparing the result (per month and per attribute) with the attribute's threshold, the maximum (highest frequency) among all attributes is found. This attribute is regarded as the one which most affected the corresponding month's weather with respect to the Twitter data.
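The threshold filtration step can be sketched as below; the threshold values and class name are assumed example choices, not the project's actual settings:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a tweet's numeric reading counts towards an attribute when it
// exceeds that attribute's threshold; the attribute with the highest count
// (frequency) is taken as the month's dominant one.
public class ThresholdFilterSketch {

    static String dominantByThreshold(Map<String, double[]> readings,
                                      Map<String, Double> thresholds) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, double[]> e : readings.entrySet()) {
            int count = 0;
            double t = thresholds.get(e.getKey());
            for (double v : e.getValue()) if (v > t) count++;
            if (count > bestCount) { bestCount = count; best = e.getKey(); }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, double[]> may = new LinkedHashMap<>();
        may.put("Temperature", new double[]{31, 29, 33, 28});
        may.put("Humidity",    new double[]{82, 78, 85, 90});
        Map<String, Double> thresholds = Map.of("Temperature", 30.0, "Humidity", 75.0);
        System.out.println(dominantByThreshold(may, thresholds)); // prints Humidity
    }
}
```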

With respect to the multi-dimensional data, the means of all the above-mentioned attributes are obtained and the MWA (mean weighted average) is calculated, checking for the maximum (highest frequency) among all the attribute means against the individual thresholds. This attribute is regarded as the one which most affected the corresponding month's weather.

Thus, for each month, a pair of attributes is obtained which most affected that month's weather. This pair is compared against the standard weather labels using all neural combinations of attributes from the Twitter and multi-dimensional data. Based on the combinations, the result is a quantitative predictive output derived from the neural combinational associations.

The combinations are as below:

Source Attribute | Target Attribute | Combinational Result
---------------- | ---------------- | --------------------
Temperature      | Temperature      | Hot
Temperature      | Humidity         | Hot and Humid
Temperature      | Wind Direction   | Warm
Temperature      | Wind Speed       | Warm and Windy
Temperature      | Air Pressure     | Warm and Clear Skies
Humidity         | Temperature      | Hot and Humid
Humidity         | Humidity         | Rainy
Humidity         | Wind Direction   | Rain and Moving Clouds
Humidity         | Wind Speed       | Rain and Storm
Humidity         | Air Pressure     | Rain and Clear Skies
Wind Direction   | Temperature      | Warm
Wind Direction   | Humidity         | Moving Clouds and Rain
Wind Direction   | Wind Direction   | Windy
Wind Direction   | Wind Speed       | Cool and Windy
Wind Direction   | Air Pressure     | Windy and Clear Skies
Wind Speed       | Temperature      | Warm and Windy
Wind Speed       | Humidity         | Moving Clouds and Rainy
Wind Speed       | Wind Direction   | Windy
Wind Speed       | Wind Speed       | Windy
Wind Speed       | Air Pressure     | Windy and Clear Skies
Air Pressure     | Temperature      | Sunny and Clear Skies
Air Pressure     | Humidity         | Rainy and Clear Skies
Air Pressure     | Wind Direction   | Windy and Cool with Clear Skies
Air Pressure     | Wind Speed       | Windy and Clear Skies
Air Pressure     | Air Pressure     | Cool and Clear Skies
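The combinations above can be expressed directly as a lookup map; the labels are copied from the table, while the class and method names are illustrative:

```java
import java.util.Map;

// Lookup over the combination table: the Twitter-dominant attribute is the
// source, the multi-dimensional dominant attribute is the target, and each
// pair maps to a weather label.
public class CombinationLookupSketch {

    static final Map<String, String> COMBOS = Map.ofEntries(
        Map.entry("Temperature|Temperature", "Hot"),
        Map.entry("Temperature|Humidity", "Hot and Humid"),
        Map.entry("Temperature|Wind Direction", "Warm"),
        Map.entry("Temperature|Wind Speed", "Warm and Windy"),
        Map.entry("Temperature|Air Pressure", "Warm and Clear Skies"),
        Map.entry("Humidity|Temperature", "Hot and Humid"),
        Map.entry("Humidity|Humidity", "Rainy"),
        Map.entry("Humidity|Wind Direction", "Rain and Moving Clouds"),
        Map.entry("Humidity|Wind Speed", "Rain and Storm"),
        Map.entry("Humidity|Air Pressure", "Rain and Clear Skies"),
        Map.entry("Wind Direction|Temperature", "Warm"),
        Map.entry("Wind Direction|Humidity", "Moving Clouds and Rain"),
        Map.entry("Wind Direction|Wind Direction", "Windy"),
        Map.entry("Wind Direction|Wind Speed", "Cool and Windy"),
        Map.entry("Wind Direction|Air Pressure", "Windy and Clear Skies"),
        Map.entry("Wind Speed|Temperature", "Warm and Windy"),
        Map.entry("Wind Speed|Humidity", "Moving Clouds and Rainy"),
        Map.entry("Wind Speed|Wind Direction", "Windy"),
        Map.entry("Wind Speed|Wind Speed", "Windy"),
        Map.entry("Wind Speed|Air Pressure", "Windy and Clear Skies"),
        Map.entry("Air Pressure|Temperature", "Sunny and Clear Skies"),
        Map.entry("Air Pressure|Humidity", "Rainy and Clear Skies"),
        Map.entry("Air Pressure|Wind Direction", "Windy and Cool with Clear Skies"),
        Map.entry("Air Pressure|Wind Speed", "Windy and Clear Skies"),
        Map.entry("Air Pressure|Air Pressure", "Cool and Clear Skies"));

    static String predict(String source, String target) {
        return COMBOS.getOrDefault(source + "|" + target, "Unknown");
    }

    public static void main(String[] args) {
        System.out.println(predict("Humidity", "Air Pressure")); // prints Rain and Clear Skies
    }
}
```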

5. RESULTS

    Fig 11: Sentiment Analysis Dashboard

The above figure shows the main sentiment dashboard, designed using Java Swing. Both options for populating data into memory are handled here: from the Twitter and the multi-dimensional datasets. The Java file-stream APIs with ArrayLists are used to accumulate data in Java memory, and all data is tokenized and compared against all attributes.

The total and individual consolidated attribute mean weighted averages are then calculated, and the highest-valued attribute is selected based on each attribute's threshold value. This is done for three individual months, as the data is segregated month-wise (May, June and July).

5.1  Selection of Data: Twitter

The following figures show the results for May, June and July.

                   Fig 12: Consolidated Weather report for May (Twitter)

The above figure displays the results for May. Evaluation is based on calculating May's consolidated and aggregated mean weighted average for each attribute individually, using the individual thresholds. From the collection classes, the maximum-valued attribute is obtained as the sentimentally dominant attribute for that month's data. Thus, May is dominated by high humidity.

Fig 13: Consolidated Weather report for June (Twitter)

The same process is applied for June. It is observed that June is dominated by high temperature.

Fig 14: Consolidated Weather report for July (Twitter)

Similarly, the same process is applied for July. It is observed that July is dominated by high humidity.

5.2  Selection of Data: Multi-Dimensional

Fig 15: Consolidated Weather report for May (Multi-Dimensional)

The above figure shows the result for May. The results show that May has high air pressure and humidity, based on the evaluation that the mean weighted average of air pressure is highest in the attribute calculation. This calculation is done from the POI-populated values, which are accumulated in ArrayLists in Java memory. These individual ArrayLists are input to collection classes to obtain the maximum-valued attribute. Thus, in May, air pressure is high under this process.

   Fig 16: Consolidated Weather report for June (Multi-Dimensional)

The above figure shows the results for June. The same process is applied as above; it is observed that June has high air pressure and humidity.

The figure below shows the results for July. The results show that July has high air pressure and humidity, based on the evaluation that the mean weighted averages of air pressure and humidity are highest in the attribute calculation. This calculation is again done from the POI-populated values accumulated in ArrayLists, which are input to collection classes to obtain the maximum-valued attribute. Thus, in July, air pressure and humidity are high under this process.

    Fig 17: Consolidated Weather report for July (Multi-Dimensional)

5.3  Sentimental Analysis Results

Fig 18: Maximum values per month

The above figure shows the maximum values per month with respect to each attribute. Observing the tweets for May, June and July gave fluctuating, dissimilar results for the Twitter data, while the multi-dimensional data gave similar results across all three months.

Fig 19: Accuracy

   Fig 20: Final Accuracies

The above two figures show the accuracies. The formulas and calculations are as follows:

For Twitter:

Accuracy = (number of tweets matching the maximum attribute / total number of tweets) × 100.

For multi-dimensional:

Accuracy = (total mean weighted average of the maximum attribute (e.g. air pressure, if air pressure is maximum) / total gathered count for that attribute) × 100.
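The two accuracy calculations can be written as a small helper; the variable and class names are assumed for illustration:

```java
// Sketch of the two accuracy formulas used for Twitter and for the
// multi-dimensional data.
public class AccuracySketch {

    // Twitter: tweets matching the maximum attribute over all tweets, x100.
    static double twitterAccuracy(int tweetsWithMaxAttribute, int totalTweets) {
        return 100.0 * tweetsWithMaxAttribute / totalTweets;
    }

    // Multi-dimensional: total mean weighted average of the maximum
    // attribute over the gathered count for that attribute, x100.
    static double multiDimensionalAccuracy(double totalMwa, int gatheredCount) {
        return 100.0 * totalMwa / gatheredCount;
    }

    public static void main(String[] args) {
        System.out.println(twitterAccuracy(421, 500)); // prints 84.2
    }
}
```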

The following figure shows the final representation of all the combinations, month-wise and attribute-wise (for the maximum value).

Fig 21: Pie Chart of Accuracies

5.4  Prediction

Fig 22: Prediction results

The above figure shows the prediction results as combinational neural model attributes, with all neural associations from the Twitter and multi-dimensional predictions.

6. CONCLUSION

This review surveys various prediction methods used by different researchers for weather forecasting from social media (Twitter) data and from data gathered through accessible government weather forecast web gateways. It also addresses the limitations and issues that need attention when applying different weather forecasting methods. The review shows that threshold and keyword-based filtration with the mean weighted average vector approach performs better than other prediction techniques such as KNN and LIWC and produces accurate results. KNN performs well on a large-scale (monthly) basis, but for medium-scale and daily bases LIWC produces lower accuracy. Threshold and keyword-based filtration produces good results on a monthly basis and is the better classification technique for predicting the weather sentimentally relative to all the approaches tried above. Threshold and keyword-based filtration with mean weighted average offers the highest accuracy, 87.24%, although outliers still lead to a reduction in detection performance.

7. REFERENCES
