This article demonstrates the ease of text mining using Repustate’s sentiment analysis platform and highlights the technology behind it. For this instance, I decided to choose Repustate’s Turkish sentiment analyzer capability and do an analysis of a well-known online flower shop in Turkey.
Saying It With Flowers
It took a pandemic for us to realize the things we take for granted. A simple coffee in a coffee shop, a movie with pals, a haircut, a dentist appointment – all of this became extraordinary in the strange times that Covid19 brought. Given all this, there was a New York Times article I saw sometime back, which said that people had started using flower services more than ever during Covid19, even as the industry was struggling because of the restrictions on movement, shortage of resources, and canceled events. This was in contrast to an article I’d read back in 2017, about how an ad campaign run by Ogilvy & Mather showed that the flower industry was struggling, albeit because of their own complacency and unwillingness to move ahead with the times.
I was intrigued with this development and thought why not do sentiment analysis on flower shops through social media listening and see what consumers are actually feeling and thinking about the whole thing. To put a twist, because Repustate IQ analyzes 23 languages natively, I decided to do it in Turkish. After all, Turkey’s a lovely country, diverse and heterogeneous, and with a people known for their hospitality and generosity.
How Is Sentiment Mining Of Turkish Text Done Using Repustate IQ?
Here’s what I did. I chose a very popular online flower shop in Turkey and gathered 10,000 of their reviews. I then uploaded the reviews onto Repustate IQ to begin analyzing sentiment in Turkish text.
Using Turkish natural language processing to analyze sentiment in Turkish text, Repustate IQ analyzed all the data for sentiment and semantic analysis in real-time and uploaded the information on its dashboard. You can immediately see that the overall sentiment of the flower shop (shown in green) is very positive.
A deep dive into aspect-based sentiment shows us which aspects of the service people liked the most and which were most likely the driving force behind the company’s success. Delivery, appearance, and even an emotion like gratitude, were found to be the most commonly mentioned aspects in these reviews.
Repustate IQ’s sentiment analysis dashboard also shows you the emotions people are expressing and what aspects occur with which emotion. All of this analysis is done natively in Turkish, without translations. The video also shows the named identities mentioned, all of which are in Turkish.
Major Problems In Turkish Text Sentiment Mining
Back to text mining in Turkish. Well, Turkish is very different from English. Turkish lexicon does not have grammatical gender and no order in the sentence structure regarding the subject, object, or verb. Another very interesting aspect of the Turkish language is that it’s heavily influenced by vast and varied languages including Romance languages (languages derived from Latin between the 3rd and 8th centuries such as French and Italian), as well as German, Persian, Arabic, and most recently, English. This makes it difficult for AI and natural language processing models to analyze sentiment in Turkish text because it’s a living language that is constantly evolving. A Turkish sentiment analyzer platform needs to be constantly up to date to be relevant and give accurate results, or else it will become redundant.
Added to this, Turkish has numerous productive derivational morphemes, which means that it has an infinite word lexicon because of which fixed size tag/feature encoding schemes do not work in NLP tasks. There are many more challenges that data scientists face including derivative and syntactic relations in Turkish word structure and the different inflectional groups of a word that can be involved in different syntactic relations. But all of that for a technical blog, perhaps later.
How Do We Do Turkish Sentiment Analysis Without Any Translations?
This is a 6-step process in which machine learning and Turkish natural language processing tasks are combined to derive sentiment from Turkish text.
- Step 1: Collect a massive corpus of manually tagged data
Collect a massive corpus of manually tagged dataFirst we collect a massive and diverse Turkish corpus and clean it up.
- Step 2: Create a Turkish part-of-speech tagger
Create a Turkish part-of-speech taggerNow we create a part-of-speech tagger for the language. Each Turkish word is classified so it can be identified for conjunctions, subordinate clauses, prepositional, and noun phrases. This is what separates Repustate IQ from other platforms in the market that claim to be multilingual but in reality, use English translations instead of native language speech taggers under the hood to analyze a non-English language.
- Step 3: Build lemmatization
Build lemmatizationWe apply the rules of conjugating nouns and verbs based on gender, tense, etc. in the model through lemmatization. This helps the model determine the root of a word while mining in Turkish text.
- Step 4: Build prior polarity
Build prior polarityIn this step, we determine the positive and negative context of the word, where Excellent could be +1, good +0.5, and poor as -0.5. Ultimately, this is what helps give each review a sentiment score. Eventually, the sentiment analysis API will give the aggregate score of the brand we are analyzing in the range of 0 to 100, 100 being the highest positive score.
- Step 5: Determine grammatical constructs
Determine grammatical constructsGrammatical constructs are determined so that the Turkish sentiment analyzer can read the Turkish text natively. Different languages have different rules when it comes to negations and amplifiers, so this step is key.
Also Read: News Monitoring Services