Abstract: In the data-driven era, understanding customer feedback plays a vital role in improving the performance and efficiency of a product or system. Sentiment analysis is central to this task, especially when the feedback amounts to Big Data. It allows a machine to interpret what a person has conveyed in a text and to identify the sentiment it expresses. Because feedback data are voluminous, machine learning techniques must be applied to determine a customer's sentiment towards a particular product or service. This paper focuses on fine-graining the classifiers and on using multiple classes to classify feedback, so that the sentiment of each item is captured more accurately.
As we move further into the era of Web 2.0, reviews and feedback are a pivotal factor in understanding the performance and reliability of a product. Understanding the sentiment of this feedback therefore plays an important role, but because feedback is subjective, extracting the relevant information becomes a tedious process, especially when dealing with Big Data. The main purpose of sentiment analysis is to extract the attitude of the user towards a particular product or service by using various algorithms and text mining techniques. Sentiment analysis basically involves breaking the text down into clusters based on particular sentiment words and then aggregating a sentiment score based on the polarity of the opinion. For example, words like “great” and “bad” express positive and negative opinions respectively. There can also be various levels of sentiment within a particular cluster: “great” and “excellent” both express a positive opinion, but “excellent” indicates a higher level of positivity than “great”. In addition, words like “very” placed before a sentiment word change the intensity of that word. When the classes are subdivided into subclasses, however, accuracy tends to degrade significantly.
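The lexicon-style scoring described above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in this paper: the word lists, the numeric scores, and the function names `sentiment_score` and `polarity` are all assumptions chosen for the example.

```python
# Toy lexicon: words and scores are illustrative assumptions,
# not values taken from SentiWordNet or any other published resource.
LEXICON = {"great": 1.0, "excellent": 2.0, "good": 1.0, "bad": -1.0, "terrible": -2.0}

# Intensifiers scale the sentiment word that follows them (assumed multipliers).
INTENSIFIERS = {"very": 1.5, "extremely": 2.0}


def sentiment_score(text: str) -> float:
    """Sum the scores of sentiment words, scaled by any preceding intensifier."""
    tokens = [t.strip(".,!?;:\"'").lower() for t in text.split()]
    score = 0.0
    boost = 1.0
    for tok in tokens:
        if tok in INTENSIFIERS:
            boost *= INTENSIFIERS[tok]  # remember the boost for the next word
            continue
        if tok in LEXICON:
            score += boost * LEXICON[tok]
        boost = 1.0  # an intensifier only affects the word right after it
    return score


def polarity(text: str) -> str:
    """Map the aggregated score to a coarse polarity label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

With these assumed scores, “excellent” rates higher than “great”, and “very bad” rates lower than “bad”, which mirrors the graded levels of sentiment discussed above.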
In this paper, we discuss fine-graining the classes and improving the sentiment analysis of text. We discuss the various machine learning algorithms used in sentiment analysis, as well as the hybrid approach, which fuses the conventional lexicon approach with machine learning. An SVM is taken as the baseline model and compared with a bidirectional LSTM (Bi-LSTM) recurrent neural network and a bidirectional GRU (Bi-GRU) model. Word embedding models are used to convert words to vectors in order to improve the efficiency of the system.
- Literature Review
Sentiment analysis is currently a popular area of research, and scholars around the globe have carried out many studies on it. Tetsuya Nasukawa and Jeonghee Yi discussed how classifying a whole document as positive or negative can be a poor approach, because it assumes that the subject of the opinion and the subject of the document are the same. They proposed an alternative approach that determines the sentiment towards a given subject by identifying the text clusters that express sentiment about that subject, rather than analyzing the polarity of the whole document. Janyce Wiebe criticized subjective sentiment classification at the sentence or document level. Identifying only opinionated sentences is not sufficient, since two or more similar or contrasting opinions can be presented in a single sentence. Wiebe therefore proposed natural language systems that extract the text relevant to a particular sentiment, thus distinguishing between credible and non-credible information. Flame detection systems identify intense tirades and emotional rants but ignore milder opinions. Wiebe proposed a fine-grained annotation scheme in which annotations are made at the word or phrase level rather than at the document or sentence level.
Pang, Lee, and Vaithyanathan applied Naïve Bayes, maximum entropy, and support vector machines (SVM) to sentiment analysis and found that the SVM performed better than the other methods. They used the IMDB database to study the sentiment of movie reviews, distinguishing three classes: positive, negative, and neutral. Unigrams were found to improve the efficiency of the model. Gamon used a linear SVM with a large initial feature set and then reduced the features according to their significance. He showed that analyzing noisy data results in poor predictions, and therefore proposed identifying the important features in the noisy data together with their sentiment polarity, then applying feature reduction to optimize the performance of the model. Ruchika Aggarwal and Latika Gupta proposed a hybrid approach in which the lexicon-based approach and machine learning are fused. This mitigates the difficulties of hand-produced rules in the lexicon approach and makes possible soft decisions that would otherwise be extremely difficult, error-prone, and time-consuming if made by humans.
Anwar Alnawas and Nursal Arici, in their paper on sentiment analysis, discussed the various approaches applied to it. Broadly, sentiment analysis can be done by either a linguistic approach or a machine learning approach. The linguistic approach is further divided into lexicon-based and corpus-based methods. It requires the creation of a dictionary using SentiWordNet. This dictionary contains opinion words that are matched against the words present in the review, and a polarity score is assigned based on the opinion. Other authors have illustrated the various algorithms used in sentiment analysis. The machine learning approach is based on analyzing the data and recognizing patterns in it. However, this method is time-consuming and requires a large quantity of data. The hybrid approach combines both methods and achieves the best results and good accuracy, since it has the supervised learning of the former and the stability of the lexicon method. Haowei Zhang et al. proposed a machine learning approach to classify Twitter data, using an LSTM model within a recurrent neural network. The performance of this system was exceptional, and the model can be further improved by using different word embedding models and other neural networks.
- Research Plan
- Problem Statement
Given a corpus of reviews, we aim to assign each one to one of the following classes: “positive comment”, “negative comment”, “bug”, “complaint”, “severe complaint”, “undefined” and “mixed”. The corpus is divided into training and test sets, and the training set is used to train the model.
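The division into training and test sets can be sketched as follows. This is a minimal illustration under stated assumptions: the paper does not say how the split is made, so the stratified sampling, the 20% test fraction, and the function name `stratified_split` are hypothetical choices for the example. Only the seven class names come from the problem statement above.

```python
import random
from collections import defaultdict

# The seven target classes named in the problem statement.
LABELS = (
    "positive comment", "negative comment", "bug", "complaint",
    "severe complaint", "undefined", "mixed",
)


def stratified_split(examples, test_fraction=0.2, seed=0):
    """Split (text, label) pairs into train and test lists, class by class.

    Stratifying keeps rare classes such as "severe complaint" represented
    in both sets. The fraction and seed are illustrative defaults.
    """
    by_label = defaultdict(list)
    for text, label in examples:
        if label not in LABELS:
            raise ValueError(f"unknown label: {label!r}")
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, test = [], []
    for label in LABELS:
        items = by_label.get(label, [])
        rng.shuffle(items)
        n_test = int(round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test
```

A stratified split is one reasonable choice for fine-grained classes because a purely random split can leave a small class with no test examples at all.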
The data were collected from the IJCNLP shared task. The dataset consists of a training set, a development set, and a test set. More feedback data will be taken from other sources to train the model.
We use a machine learning approach to analyze the sentiment. An SVM model is taken as the baseline. The first model is a multi-channel convolutional neural network (CNN) combined with an LSTM. Multiple channels are used so that filters of various lengths can be applied, enabling the network to extract n-gram features at different scales; an LSTM cell is then applied sequentially. By fusing a CNN and an LSTM we can capture long-distance dependencies better. The model consists of an initial embedding layer, which creates the token matrix, and its output is fed to a CNN layer. The convolution filters perform the convolution operation over the vectors, and the ReLU function is chosen to speed up computation. A max-over-time pooling layer takes the maximum value produced by each filter as its feature. The LSTM layer is then applied to the extracted features to model the order of the tokens. A hidden layer multiplies the LSTM output by a weight matrix and adds a bias vector. The output layer classifies the review based on its sentiment.
A bidirectional Gated Recurrent Unit (GRU) can also be used instead of an LSTM to predict the sentiment. The initial stage of the GRU network is the same as for the LSTM network: an embedding layer converts words to vectors, for which word2vec or GloVe can be used. In the next stage the GRU uses its two gates (update and reset) to combine the previous state with the current input. It therefore has one gate fewer than the LSTM, which results in less computation time. The encoded matrix is reduced to a single vector by an attention layer, and the prediction is finally obtained as the classification label. Activation functions are used at each layer of the process. By exploring the two models with different word embedding techniques, we aim to model a more efficient system.
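The convolution and pooling stage of the multi-channel CNN can be sketched in plain Python. This is a minimal illustration only: it covers the filters of different widths, the ReLU activation, and max-over-time pooling, and it omits the embedding lookup, the LSTM or GRU layer, and training. The function names and the example filter values are assumptions made for the sketch.

```python
def relu(x: float) -> float:
    """Rectified linear unit: negative activations are clipped to zero."""
    return x if x > 0.0 else 0.0


def conv_max_pool(embeddings, filt, bias=0.0):
    """Apply one convolution filter and max-over-time pooling.

    embeddings: list of token vectors, shape (n_tokens, dim).
    filt: filter weights, shape (width, dim); width is the n-gram size.
    Returns the largest ReLU activation over all windows of `width` tokens.
    """
    width = len(filt)
    if len(embeddings) < width:
        raise ValueError("sentence is shorter than the filter width")
    best = 0.0  # ReLU outputs are never negative, so 0.0 is a safe start
    for start in range(len(embeddings) - width + 1):
        total = bias
        for k in range(width):
            for w, x in zip(filt[k], embeddings[start + k]):
                total += w * x
        best = max(best, relu(total))
    return best


def multi_channel_features(embeddings, channels):
    """Concatenate pooled features from every filter in every channel.

    channels: dict mapping filter width -> list of filters of that width.
    Each channel captures n-grams of one size (e.g. bigrams, trigrams).
    """
    features = []
    for width in sorted(channels):
        for filt in channels[width]:
            features.append(conv_max_pool(embeddings, filt))
    return features


# Usage: three tokens with 2-dimensional embeddings (made-up values),
# one bigram channel and one trigram channel.
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
channels = {
    2: [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]],
    3: [[[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]],
}
features = multi_channel_features(sentence, channels)
```

Each filter contributes a single number regardless of sentence length, so the feature vector has one entry per filter. In a real system a deep learning framework would learn the filter weights; here they are fixed by hand for readability.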
This paper proposes an alternative, efficient approach to sentiment analysis with fine-grained classification. The goal of the research is to build a better customer feedback analysis model and to apply it to multilingual corpora without reducing the accuracy and score of the model. The research also provides insight into various word embedding models and shows how efficiency changes with each model.
Tetsuya Nasukawa and Jeonghee Yi, “Sentiment Analysis: Capturing Favorability Using Natural Language Processing,” in Proc. 2nd International Conference on Knowledge Capture (K-CAP ’03), 2003.
J. Wiebe, T. Wilson, and C. Cardie, Language Resources and Evaluation, vol. 39, p. 165, 2005. https://doi.org/10.1007/s10579-005-7880-9
B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? Sentiment classification using machine learning techniques,” in Proc. ACL-02 Conference on Empirical Methods in Natural Language Processing, pp. 79–86, 2002.
J. M. Soler, F. Cuartero, and M. Roblizo, “Twitter as a tool for predicting elections results,” in Proc. IEEE/ACM ASONAM, pp. 1194–1200, Aug. 2012.
B. Sriram, D. Fuhry, E. Demir, H. Ferhatosmanoglu, and M. Demirbas, “Short text classification in twitter to improve information filtering,” in Proc. 33rd Int. ACM SIGIR Conf. Research and Development in Information Retrieval, pp. 841–842, July 2010.
Ruchika Aggarwal and Latika Gupta, “A Hybrid Approach for Sentiment Analysis using Classification Algorithm,” International Journal of Computer Science and Mobile Computing (IJCSMC), vol. 6, no. 6, pp. 149–157, June 2017.
Anwar Alnawas and Nursal Arici, “The Corpus Based Approach to Sentiment Analysis in Modern Standard Arabic and Arabic Dialects: A Literature Review,” Journal of Polytechnic, vol. 21, no. 2, pp. 461–470, Sept. 2018.
Brian Keithl, Exequiel Fuentes and P Claudio Meneses, “Analyzing internet slang for sentiment mining,” in Proc. 2nd Vaagdevi Int. Conf. Inform. Technology for Real World Problems, pp. 9–11, Dec. 2010.
Haowei Zhang, Jin Wang, Jixian Zhang, and Xuejie Zhang, “YNU-HPCC at SemEval 2017 Task 4: Using A Multi-Channel CNN-LSTM Model for Sentiment Classification,” in Proc. 11th International Workshop on Semantic Evaluations (SemEval-2017), pp. 796–801, Aug. 2017.