Much of the data that is critical to making informed business decisions — news, documents, reviews, and online conversations — is locked away in unstructured text. Recent advancements in text analytics mean that organizations can now access this intelligence to gain a competitive advantage.
Thematic sentiment analysis is the process of identifying and categorizing feelings, opinions, and attitudes found within text documents. AI advancements have led to major improvements in the quality of sentiment data and the ways in which this data can be applied. Organizations are using thematic sentiment to evaluate and assess everything from SEC filings and earnings call transcripts to corporate documents and internal communications to social media, news articles, and online forums.
Sentiment analysis, also known as "opinion mining," gauges the predominant opinion toward a subject of interest (commonly referred to as an "entity"), such as a person, place, organization, or thing. Opinions may be classified as positive, negative, or neutral.
Thematic analysis provides a structured, systematic approach to understanding sentiment. Viewing sentiment data grouped by thematic events — for example, commentary on a product launch or a company’s growth forecast — makes it possible to spot patterns, trends, and outliers, and track these findings over time.
Organizations today find themselves having to access seemingly endless streams of text data: internal and external reports, online posts, transcripts, reviews, emails, blogs, news articles, and so forth.
While there are valuable nuggets of information waiting to be revealed within these data sources, the process of sifting, sorting and analyzing all of this information is no small feat. Such an endeavor requires a significant outlay of time and money, not to mention the knowledge of a domain expert to interpret the data correctly.
So how do organizations negotiate these obstacles? By using next-gen AI solutions that can extract usable insights from any text data source at scale, while also displaying a sophisticated understanding of human language.
The first step in thematic sentiment analysis is to assess and apply a "sentiment score" to a negative or positive opinion. This score is created by one or more of the following approaches:
Human language is rich, compelling, and dynamic. These same qualities make language difficult to interpret through computational methods.
Consider homonyms, which have the same spelling and pronunciation but multiple meanings. The word "break" can mean to separate into pieces (break a vase), to psychologically destroy (break a person), to exceed (break a record), to make an opening shot in a game of pool, a gap or opening (break in the ice), an interruption (break in the weather), an act of emerging (break of day), a respite (coffee break), a favorable treatment (tax break), or an expression of irritation (give me a break), among other things.
Context can also change the meanings of words. Most people would view "liability" as being something negative. But in the language of finance, liability can refer to a company's debt and so be interpreted as neutral.
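The "liability" example can be sketched in a few lines of Python. This is only an illustration of domain-dependent polarity: the lexicons, the `score` function, and the sample sentence are hypothetical and not taken from any real tool.

```python
import re

# Hypothetical toy lexicon: word -> polarity (-1 negative, 0 neutral, +1 positive).
GENERAL_LEXICON = {"liability": -1, "growth": 1, "loss": -1}
# Assumed finance-specific override: "liability" is a neutral accounting term.
FINANCE_OVERRIDES = {"liability": 0}


def score(text, domain_overrides=None):
    """Sum word polarities, letting domain overrides take precedence."""
    lexicon = {**GENERAL_LEXICON, **(domain_overrides or {})}
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


sentence = "The company reported a long-term liability on its balance sheet."
general_score = score(sentence)                     # -1 with the general lexicon
finance_score = score(sentence, FINANCE_OVERRIDES)  # 0 with the finance override
```

The same sentence scores as negative under the general lexicon and as neutral once the finance override is applied, which is the kind of context sensitivity the paragraph describes.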
The complexity of human language is why sentiment analysis is hard to do right. Here are four challenges that sentiment analysis tools frequently struggle with.
Bag-of-Words (BOW) operates by first viewing all texts as unordered collections of words and then by analyzing their frequencies and distributions. This is the most commonly used approach to sentiment analysis.
But because of the simplicity of this model, BOW has trouble matching the right sentiment to the right target entity. With the sentence below, BOW would incorrectly assign a positive sentiment to Company A and a negative one to Company B.
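A minimal sketch of why word order matters, assuming a tiny hypothetical lexicon and two invented sentences (neither comes from the original article): swapping the two company names reverses the meaning, yet the bag of words, and therefore the score, is identical.

```python
from collections import Counter
import re

# Hypothetical toy lexicon; real lexicons are far larger.
POLARITY = {"strong": 1, "gains": 1, "heavy": -1, "losses": -1}


def bag_of_words(text):
    """Reduce text to an unordered multiset of lowercase tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def document_score(text):
    """Sum polarity * frequency over the bag; word order plays no role."""
    bag = bag_of_words(text)
    return sum(POLARITY.get(word, 0) * count for word, count in bag.items())


first = "Company A posted strong gains while Company B reported heavy losses."
second = "Company B posted strong gains while Company A reported heavy losses."

same_bag = bag_of_words(first) == bag_of_words(second)
scores = (document_score(first), document_score(second))
```

Because both sentences collapse to the same bag, a pure BOW scorer has no information left with which to decide which company the positive or negative words refer to.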
Words and phrases called "sentiment shifters" (also "opinion shifters" or "valence shifters") can change the meaning of another word, phrase, or sentence. Words that negate, such as "not" and "cannot," are the most common form of shifters. Modal verbs such as "would," "could," and "should" can also alter the orientation of an opinion.
Both of these statements would be scored as positive if you ignored the sentiment shifter “could.”
Ex. “the brakes are improved” vs. “the brakes could be improved”
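The brakes example can be reproduced with a small sketch. The lexicon, the shifter lists, and the three-token window are assumptions made for illustration, and flipping polarity outright is a deliberate simplification of how shifters behave.

```python
import re

# Hypothetical toy lexicon and shifter lists, for illustration only.
POLARITY = {"improved": 1, "great": 1, "faulty": -1}
NEGATORS = {"not", "cannot", "never"}
MODALS = {"would", "could", "should"}
WINDOW = 3  # assumed: a shifter affects opinion words within the next 3 tokens


def naive_score(text):
    """Lexicon lookup that ignores sentiment shifters."""
    return sum(POLARITY.get(tok, 0) for tok in re.findall(r"[a-z]+", text.lower()))


def shifted_score(text):
    """Flip the polarity of an opinion word preceded by a negator or modal."""
    tokens = re.findall(r"[a-z]+", text.lower())
    total = 0
    for i, tok in enumerate(tokens):
        polarity = POLARITY.get(tok, 0)
        if polarity == 0:
            continue
        preceding = tokens[max(0, i - WINDOW):i]
        if any(p in NEGATORS or p in MODALS for p in preceding):
            polarity = -polarity
        total += polarity
    return total


plain = "the brakes are improved"
hedged = "the brakes could be improved"
naive = (naive_score(plain), naive_score(hedged))        # both positive
shifted = (shifted_score(plain), shifted_score(hedged))  # second one flips
```

The naive scorer rates both statements as positive, while the shifter-aware version rates the "could be improved" statement as negative.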
In the following sentence, the Bag-of-Words model would view Company D as having an overall positive sentiment based strictly on the number of positive phrases. It fails to take into account how "recall" shifts the overall sentiment toward Company D to negative.
The Bag-of-Words model also struggles to accurately analyze sarcasm, often viewing sarcastically negative texts as positive.
Ex. What a great car. It did not start the first day.
Most people can recognize when a speaker’s body language or verbal signals don’t align with his or her message. Financial analysts take these cues into account when assessing the in-person communications from corporate leaders.
This task becomes more difficult when it’s a machine trying to interpret words on a page. An effective sentiment analysis tool would have to be sophisticated enough to pick up on speaking patterns that are evasive, hostile, deliberately unclear, uncertain or otherwise misleading.
Noise filtration is another core challenge for thematic sentiment analysis (and for text analysis in general). Only a portion of the data being gathered and analyzed is truly actionable or relevant. In order to parse mountains of unstructured data for these elements, a tool that employs thematic sentiment analysis should be able to minimize noise and false positives, while flagging text that truly represents a shift in perspective or rhetoric.
Most commercial tools that perform sentiment or thematic sentiment analysis are greatly limited in terms of accuracy, flexibility, and overall analysis.
Accuracy: The majority of sentiment analysis tools offer roughly 70 percent accuracy or less.
Flexibility: Sentiment analysis is not just about providing an overall document score. Many tools can’t target different sentiments associated with different entities within the same document.
Overall Analysis: When it comes to text analysis, context is key. How was the sentiment score calculated? Most tools do not provide you with the story behind the score—the relevant words, phrases and other elements that lead to real insight. If your text analysis is limited to just a score, you’re not getting the complete picture.
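To make the flexibility and "story behind the score" points concrete, here is a hedged sketch that scores each entity separately and keeps the words and sentences that produced the score. The lexicon, the entity list, and the same-sentence attribution rule are all simplifying assumptions, not a description of any commercial engine.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon and entity list, for illustration only.
POLARITY = {"record": 1, "strong": 1, "recall": -1, "lawsuit": -1}
ENTITIES = ["Company A", "Company B"]


def entity_sentiment(document):
    """Return {entity: {"score": int, "evidence": [(word, sentence), ...]}}.

    Simplifying assumption: an opinion word applies to every entity
    named in the same sentence.
    """
    results = defaultdict(lambda: {"score": 0, "evidence": []})
    for sentence in re.split(r"(?<=[.!?])\s+", document.strip()):
        mentioned = [e for e in ENTITIES if e in sentence]
        for word in re.findall(r"[a-z]+", sentence.lower()):
            polarity = POLARITY.get(word, 0)
            if polarity == 0:
                continue
            for entity in mentioned:
                results[entity]["score"] += polarity
                results[entity]["evidence"].append((word, sentence))
    return dict(results)


report = entity_sentiment(
    "Company A posted record sales. Company B announced a recall and faces a lawsuit."
)
```

Here `report` holds a separate score for each company along with the words and sentences behind it, rather than one blended document score.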
Amenity Analytics platforms use an integrated thematic sentiment engine that produces structured data sets with a higher degree of accuracy and relevancy. This is due to several factors.
First, it performs a full linguistic parsing on every analyzed document. Every word is reviewed and assigned information that relates to its base canonical form.
Amenity’s software can then identify the underlying meanings of linguistic patterns, including those associated with deception.
The proprietary NLP (natural language processing) analysis software adds context and metadata to each extraction, which increases recall and precision. This allows the engine to better filter noise, thereby surfacing the most relevant and actionable insights. Amenity’s software basically performs the role of a domain expert—but with greater speed and accuracy.
Now that we've covered the basics, let's look at how bank research analysts, asset managers, hedge funds, and other clients are using Amenity tools for thematic sentiment analysis on text data from external and internal documents.
A leading investment bank needed a solution for the monitoring of news and conversation involving trade wars, with a goal of flagging any tone changes exhibited by policy makers.
Amenity Analytics conducted a dynamic analysis of trade and tariff-focused news articles, statements, and social media conversations involving major players in politics and economics.
The challenge was significant, given the fragmented nature of social media and the signal-to-noise ratio. Because of its ability to filter noise, understand context and reduce false positives, our platform was able to identify and score critical text data associated with currencies, credit, commodities and equity in key world regions.
Amenity Analytics' NLP model can be used to identify deceptive language in earnings statements. Embedded within the product are language analysis techniques commonly used by CIA interrogators. These techniques can identify uncertainty, doubt, dishonesty, hostility, and evasion.
The solution is refined enough to flag clichéd language and rhetorical tricks such as detour statements. Sudden increases in such language during earnings calls are often an early indicator that problems are brewing. Amenity's solution can compare text patterns from prior quarters, identify deviations, and even highlight questions that trigger questionable responses.
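The idea of comparing a call against prior quarters can be illustrated with a simple frequency baseline. This is not Amenity's method: the hedge-term list, the function names, and the two-standard-deviation threshold are assumptions chosen only to show the shape of such a check.

```python
import re
from statistics import mean, stdev

# Hypothetical list of hedging or evasive terms; a real list would be expert-curated.
HEDGE_TERMS = {"maybe", "possibly", "uncertain", "hopefully", "frankly", "honestly"}


def hedge_rate(transcript):
    """Share of tokens in the transcript that are hedge terms."""
    tokens = re.findall(r"[a-z]+", transcript.lower())
    if not tokens:
        return 0.0
    return sum(tok in HEDGE_TERMS for tok in tokens) / len(tokens)


def is_deviation(prior_rates, current_rate, threshold=2.0):
    """Flag the current quarter if it sits more than `threshold` sample standard
    deviations above the mean of prior quarters (needs at least 2 priors)."""
    baseline, spread = mean(prior_rates), stdev(prior_rates)
    if spread == 0:
        return current_rate > baseline
    return (current_rate - baseline) / spread > threshold


# Invented rates for four prior quarters, then two candidate current quarters.
prior = [0.010, 0.012, 0.011, 0.009]
spike_flagged = is_deviation(prior, 0.030)
normal_flagged = is_deviation(prior, 0.011)
```

With these invented numbers, a jump to a 3 percent hedge rate is flagged while a quarter in line with the baseline is not.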
Amenity Analytics can generate key insights from SEC filings (registration statements; 10-K, 10-Q, and 8-K reports; Schedule 13D; proxy statements; Forms 3, 4, and 5; and so on).
Wading through these filings is labor intensive. Amenity automates this process. Our platform can identify new risk factors, new disclosures, and any other red flags.
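As a rough sketch of what flagging new risk factors could look like, the snippet below compares two lists of risk-factor statements after light normalization. The statements and function names are invented, and exact matching is an assumption; a production system would need fuzzier comparison to catch reworded disclosures.

```python
import re


def normalize(text):
    """Lowercase and drop punctuation/extra spaces so trivial edits don't register as new."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def new_risk_factors(previous, current):
    """Return items in `current` whose normalized text is absent from `previous`."""
    seen = {normalize(item) for item in previous}
    return [item for item in current if normalize(item) not in seen]


# Hypothetical risk-factor statements, invented for illustration.
last_year = ["Competition may reduce our margins.", "We depend on key suppliers."]
this_year = [
    "Competition may reduce our margins.",
    "We depend on key  suppliers.",  # same statement, stray whitespace
    "Tariffs could increase our costs.",
]
added = new_risk_factors(last_year, this_year)
```

Only the tariff statement is reported as new; the supplier statement is recognized as unchanged despite the whitespace difference.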
A company’s performance is often impacted by environmental, social and governance (ESG) issues. Given this, ESG investing continues to gain traction within the financial world. Amenity has embedded processing models that can replicate the perspective of an ESG expert.
If an investor needs access to ESG data within a filing, a news article, or a call transcript, Amenity can create data sets with texts it has highlighted from an ESG perspective.
Business application of thematic sentiment analysis continues to gain traction, especially within the financial sector. It helps to understand what is really behind a sentiment score and whether that information will move the needle when it comes to strategy and decision making.