Ontology-based Semantic Analysis On Social Media

Introduction 

In recent years, a huge number of people have been drawn to social-networking platforms such as Facebook, Twitter, and Instagram. Most use these sites to express their feelings, beliefs, or opinions about things, places, or personalities. The main reason for choosing Twitter profile data is that it yields subjective information and that Twitter carries verified accounts of the disaster, which is not the case for Facebook or other social sites. Retrieving information accurately remains a major challenge today, and incorporating newer technology is an even greater concern. Methods for sentiment analysis, semantic analysis, and ontology extraction can be classified predominantly as machine-learning methods. There is scope for challenging research across wide areas through the computational analysis of sentiment and semantics. Accordingly, a steady practice has developed of extracting knowledge from the data available on social networks in order to predict disasters and to support disaster management.

The accuracy of analysis and prediction can be improved by ontology extraction based on social networks. A keyword-based tweet collection, focused on the names of disasters, was built to test the popularity of the disasters of 2017 and 2018. This dataset was tested with both supervised and unsupervised machine-learning algorithms. The supervised classifiers, applied to unigram and bigram data, were Random Forest (RF), Naïve Bayes, Support Vector Machines (SVM), Neural Networks (NN), Decision Trees (DT), and Logistic Regression (LR). Our analysis is based on a comparison of various sentiment and semantic analyzers and validates the results with the different classifiers. The experiment on Twitter data will show which technique has the better capacity to estimate sentiment and semantic prediction accuracy.

Xinxin Gao, Wencheng Yu, Yilong Rong, and Songmao Zhang (2017) explore techniques that can help urban planning institutions improve their social sensing and social perception capabilities under emerging information and technology conditions. They present a system that integrates a comprehensive set of text mining algorithms to perform topic modeling, text clustering, event evolution detection, sentiment analysis, opinion mining, and information extraction on user-generated content in Chinese social media. A domain ontology of Beijing urban planning is constructed to support the text mining processes. Evaluations on two large, real datasets composed of microblogs and WeChat articles about residential communities and school education in Beijing demonstrate the effectiveness of the system.

The study illustrates the power of combining machine learning with knowledge-based, semantic approaches in analyzing social media for the domain of interest.

Pratik Thakora and Dr. Sreela Sasi (2015) present a novel Ontology-based Sentiment Analysis Process for Social Media content (OSAPS) with negative sentiments. The social media content is automatically extracted from Twitter messages. An ontology-based process is designed to retrieve and analyze customers’ tweets with negative sentiments. The idea is demonstrated by identifying customer dissatisfaction with the delivery service of the United States Postal Service, Royal Mail of the United Kingdom, and Canada Post. The tweets related to the delivery service concern delays in delivery, lost packages, or improper customer service at the office in person or at the call center. A combination of technologies for Twitter extraction, data cleaning, subjective analysis, ontology model building, and sentiment analysis is used. The results of this analysis could be used by the company to take corrective measures for the problems and to generate automated online replies to the issues. A rule-based classifier could be used to generate the automated online replies. The authors give detailed information about the components of the ontology model and explain how to identify the key building blocks of an ontology model in a text by applying OOP concepts.

Vanni Zavarella, Hristo Tanev, Ralf Steinberger, and Erik Van der Goot (2014) observe that a massive number of messages on social media platforms such as Twitter, Facebook, and Ushahidi are generated during and immediately after large disasters by people in the affected areas, in order to exchange real-time information about how the situation is developing.
They instead propose a general architecture for augmenting unstructured user contributions with structured data that are automatically extracted from the same user-generated content. The general method is top-down. A semi-supervised method is then applied for the lexicalization of the target ontology classes and properties from text. The method learns a mapping from classes of linguistic constructions, such as noun phrases and verb phrases, to semantic classes and event patterns, respectively. Then, with relatively limited human intervention, such constructions can be linearly combined into finite-state grammars for the detection of event reports. Finally, the output grammar is run on crowdsourced content, and the target ontology is populated with structured information from that content by deploying an instance of the event detection engine tuned to social media streams. Because the ontology lexicalization method is language-independent, the proposed architecture is highly portable across languages, including the ill-formed language used on social media platforms.

Thushari Silva and Vilas Wuwongse (2013) note that in disaster management, the mitigation and preparedness phases play vital roles in reducing the damage caused by a particular disaster. Developing information systems that simulate disaster preparedness and mitigation activities requires integrating a large amount of data from multiple, diversified data sources. The goal of their work is to provide an effective way, by means of Linked Open Data (LOD) technology, to bring such different, distributed data sources into a standardized and exchangeable common data format and so enable their interoperability and integration. They describe how to publish data sources of different formats in the Resource Description Framework (RDF), the basic data structure of LOD, and how to interconnect them with other sources in the LOD cloud. The Sahana disaster management system has been enhanced with this interconnected RDF data, leading to a system called Sahana Asia. Sahana Asia can simulate disaster mitigation and preparedness activities, benefiting vulnerable people, emergency management teams, and other organizations involved in disaster management. It has been evaluated by participants with several backgrounds in disaster management and found to be effective in terms of system quality, information quality, and overall system performance. A few future extensions of the system have also been presented. These extensions can be readily implemented because they all represent their data and information uniformly in RDF.

Research Gap

This section gives an overview of existing tools and techniques for analyzing social media data, followed by the tools and techniques proposed here that could replace them. Sentiment analysis over Twitter offers organizations a fast and effective way to monitor public feeling about topics such as education, disasters, and politics. A wide range of features and methods for training sentiment classifiers on Twitter datasets have been researched in recent years, with varying results. In this thesis, we introduce a novel approach that adds semantics as additional features to the training set for sentiment analysis and uses an ontology for data extraction. For each entity extracted from the tweets, we add its semantic concept as an additional feature and measure the correlation of the representative concept with negative or positive sentiment.
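The idea of adding a semantic concept for each extracted entity can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the `ENTITY_CONCEPTS` lookup, the `CONCEPT=` feature prefix, and the sample tweets are hypothetical, and the share of negative tweets per concept is used only as a simple stand-in for the correlation measure, which the text does not specify.

```python
from collections import Counter, defaultdict

# Hypothetical entity-to-concept lookup. A real system would obtain these
# concepts from an entity extractor or a domain ontology.
ENTITY_CONCEPTS = {
    "harvey": "Hurricane",
    "irma": "Hurricane",
    "fema": "ReliefAgency",
    "houston": "City",
}

def add_semantic_features(tokens):
    """Return the tokens plus one CONCEPT=<name> feature per known entity."""
    concepts = [f"CONCEPT={ENTITY_CONCEPTS[t]}" for t in tokens if t in ENTITY_CONCEPTS]
    return tokens + concepts

def concept_sentiment_counts(labelled_tweets):
    """Count how often each concept co-occurs with each sentiment label."""
    counts = defaultdict(Counter)
    for text, label in labelled_tweets:
        features = add_semantic_features(text.lower().split())
        for feat in set(features):
            if feat.startswith("CONCEPT="):
                counts[feat][label] += 1
    return counts

def negative_share(counts, concept):
    """Fraction of tweets mentioning the concept that are labelled negative."""
    total = sum(counts[concept].values())
    return counts[concept]["negative"] / total if total else 0.0

# Made-up sample data, for illustration only.
tweets = [
    ("harvey destroyed our street", "negative"),
    ("irma left thousands without power", "negative"),
    ("fema volunteers were wonderful", "positive"),
    ("houston is flooding again", "negative"),
    ("fema response was too slow", "negative"),
]
counts = concept_sentiment_counts(tweets)
print(add_semantic_features("harvey hits houston".split()))
print(negative_share(counts, "CONCEPT=Hurricane"))
print(negative_share(counts, "CONCEPT=ReliefAgency"))
```

Because the concept features are appended to the ordinary tokens, any classifier that accepts a bag of features can use them without modification.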

Objectives

The primary objectives of this work are to find which of the machine learning algorithms gives the best accuracy and to analyze the performance of ontology extraction.

Methodology

Here we present a comparative study of various classification algorithms in machine learning used to classify disaster & non-disaster tweets with some newly devised features like passive-aggressiveness and emoji sentiment flip for better accuracy. 
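As a rough illustration of classifying disaster and non-disaster tweets on unigram and bigram features, the sketch below implements a small multinomial Naive Bayes classifier in plain Python. It is not the thesis implementation: the training tweets are invented, the special features (such as emoji sentiment flip) are left out, and add-one smoothing is an assumption.

```python
import math
from collections import Counter

def ngram_features(text):
    """Lower-case, split on whitespace, and return unigrams plus bigrams."""
    tokens = text.lower().split()
    bigrams = [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]
    return tokens + bigrams

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = {c: Counter() for c in self.class_counts}
        for text, label in zip(texts, labels):
            self.feature_counts[label].update(ngram_features(text))
        self.vocab = set()
        for counter in self.feature_counts.values():
            self.vocab.update(counter)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(doc_count / total_docs)  # log prior
            for feat in ngram_features(text):
                if feat in self.vocab:  # ignore features never seen in training
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented training data, for illustration only.
train_texts = [
    "flood waters rising evacuate now",
    "earthquake damage reported downtown",
    "wildfire spreading evacuate homes",
    "great coffee this morning",
    "watching a movie tonight",
    "my exam was a disaster lol",
]
train_labels = ["disaster", "disaster", "disaster",
                "non-disaster", "non-disaster", "non-disaster"]

model = NaiveBayes().fit(train_texts, train_labels)
print(model.predict("evacuate now flood waters rising"))
print(model.predict("coffee and a movie tonight"))
```

The same `ngram_features` output could be fed to any of the other classifiers compared in this study. Naive Bayes is used here only because it is short enough to write out in full.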

Classification algorithms used in this thesis

  • Naive Bayes: Naive Bayes classifiers are based on applying Bayes’ theorem with strong independence assumptions between the features.
  • Logistic Regression: In regression analysis, logistic regression estimates the parameters of a logistic model; it is a form of binomial regression.
  • Support Vector Machines: Support vector machines are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis.
  • Random Forest: Random forests, or random decision forests, are an ensemble learning method for classification, regression, and other tasks. They operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes of the individual trees.
  • Neural Networks: Each input to a neuron is modified by a weight, and the weighted inputs are summed; this step is referred to as a linear combination. A positive weight reflects an excitatory connection, while a negative weight reflects an inhibitory one. Finally, an activation function controls the amplitude of the output. For example, an acceptable range of output is usually between 0 and 1, or between −1 and 1.
  • Decision Trees: Decision tree learning uses a decision tree to go from observations about an item to conclusions about the item’s target value.

Data collection and preprocessing was a major step in the project, in which the following techniques were used:
  • We remove tweets that start with ‘@User’, as they are re-tweets and do not provide information about the original tweet, which could potentially be a disaster tweet.
  • We limit our study to English tweets, as more resources are available for processing text in English.
  • In our thesis, we include emojis as a feature to detect emotion or at least a change in polarity.
  • User mentions and URLs are removed from the tweet, as they are not indicative of the original nature of the tweet.
  • We remove duplicates that may result from re-tweets.

Ontology Extraction: In this thesis, we use 21 special features along with the usual unigrams and bigrams for classification. These 21 features were divided into 4 categories:
  • Text expression-based features: Features based on the text expressions extracted from tweets.
  • Emotion-based features: Emotion recognition is the process of identifying human emotion, most typically from facial expressions as well as from verbal expressions.
  • Semantic-based features: Semantic features represent the basic conceptual components of meaning for any lexical item. An individual semantic feature constitutes one component of a word’s intension, which is the inherent sense or concept evoked. The linguistic meaning of a word is proposed to arise from contrasts and significant differences with other words. Semantic features enable linguistics to explain how words that share certain features may be members of the same semantic domain.
  • Ontology-based features: An ontology encompasses a representation, formal naming, and definition of the categories, properties, and relations between the concepts, data, and entities that substantiate one, many, or all domains.

Ontology feature extraction is done by text-to-ontology extraction, from which we obtain the ontology. This extraction has to contend with several challenges:

  • Unstructured texts
  • Ambiguity in English text
  • Multiple senses of a word
  • Multiple parts of speech: for example, “like” can occur as more than one part of speech
  • Lack of a closed domain of text categories
  • Noisy texts
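The preprocessing techniques listed earlier in this section (dropping tweets that start with a user mention, removing mentions and URLs, removing duplicates, and keeping emojis as a feature) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the thesis code: the regular expressions and the `EMOJI_POLARITY` lexicon are hypothetical, and the English-only filter is omitted because it would need a language-detection tool.

```python
import re

MENTION_RE = re.compile(r"@\w+")
URL_RE = re.compile(r"https?://\S+")

# Hypothetical emoji polarity lexicon, for illustration only.
EMOJI_POLARITY = {"😀": 1, "😍": 1, "😢": -1, "😡": -1}

def clean_tweet(text):
    """Strip URLs and user mentions, then collapse whitespace."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    return " ".join(text.split())

def emoji_score(text):
    """Sum the polarity of the known emojis found in the text."""
    return sum(EMOJI_POLARITY.get(ch, 0) for ch in text)

def preprocess(tweets):
    """Apply the filtering, cleaning, and de-duplication steps in order."""
    seen = set()
    result = []
    for raw in tweets:
        if raw.lstrip().startswith("@"):  # drop tweets that begin with a user mention
            continue
        cleaned = clean_tweet(raw)
        key = cleaned.lower()
        if not cleaned or key in seen:  # drop empty results and duplicates
            continue
        seen.add(key)
        result.append({"text": cleaned, "emoji_score": emoji_score(cleaned)})
    return result

# Invented sample tweets, for illustration only.
sample = [
    "@User thanks for sharing",
    "Flood in the city centre 😢 http://example.com/a",
    "Flood in the city centre 😢 http://example.com/b",
    "Stay safe everyone @rescue_team",
]
for row in preprocess(sample):
    print(row)
```

Duplicates are detected after cleaning, so two copies of a tweet that differ only in a shortened URL or a mention are treated as the same tweet.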

Result and discussion

This section compares the classification algorithms to find which gives the better accuracy. Figures 1 to 6 compare the text, emotion, semantic, and ontology features along the X-axis, with accuracy ranging from 0 to 70. Each figure shows the accuracy of one algorithm on the data from Twitter. Figure 7 shows a comparative study of the accuracy of all six algorithms, and Figure 8 shows accuracy by algorithm and feature.

Figure 1: Decision Tree
Figure 2: Naive Bayes
Figure 3: Neural Network
Figure 4: Logistic Regression
Figure 5: Random Forest
Figure 6: SVM
Figure 7: Multi Classifiers

Conclusion

The aim of this work was to develop a concept for ontology-based semantic analysis that can process real-time streaming data from Twitter and extract meaningful inferences. To address the problem formulation, a research framework was specified. The pre-processing approach plays a crucial role in obtaining quality data and helps with data normalization. Kafka is used as an effective technique for collecting disaster data and extracting emotions from Twitter, with the tweets retrieved through the tweepy library. The proposed framework focuses on methods for analyzing and visualizing users’ opinions that do not depend on the assumption of normality or on historical data. The dataset was classified using machine learning techniques, which provide better results than the approaches proposed by other authors. A surveillance plot measuring the percentage has been presented, which clearly indicates that the proposed predictive mapping improves prediction performance. We tested the results with supervised machine-learning classifiers applied to the text, namely Naive Bayes, Logistic Regression, Support Vector Machine (SVM), Random Forest, Neural Networks, and Decision Trees.

In conclusion, from the comparison, which adopted a hybrid approach to sentiment analysis, we have learned that TextBlob and WordNet use word sense disambiguation with greater accuracy and can be used further in prediction. The work carried out in this thesis is heavily exploratory in nature. The framework presented in the thesis helps in gaining an understanding of the early stages of events.

Scope of future work

In the future, this work can be developed as follows:

  • We can take input data from multiple social networks for extraction.
  • We can use other machine learning algorithms to compare their accuracy.
  • The data set may also include images, emoticons, etc.

 
