NLP Text Mining In Finance: Quantifying the Significance

A Practical Approach to Advanced Text Mining in Finance

A recent study from Quantitative Management Associates applied Amenity Analytics’ NLP platform to quantify the impact of unstructured data analysis in finance. To conduct their research, the authors applied our text analytics methodology, which scores earnings call transcripts based on sentiment extracted from financial events identified within the text.

The research detailed how the application of our NLP model to earnings call transcripts yielded three statistically significant results:

  1. It produced a signal that is incrementally additive to earnings surprises and the short-term returns around the earnings announcement
  2. The change in sentiment has a relatively low correlation with both earnings surprises and the short-term returns; in other words, the signal is a sufficiently different potential source of information
  3. Using Amenity Analytics added 26 bps of alpha per quarter

These findings indicate how text analytics adds a meaningful layer of insight for investors that can’t be obtained from structured data.

An Excerpt From This White Paper


For this analysis, we obtained conference call transcripts from Thomson Reuters for the period 2002–2016. We restricted our sample to earnings conference calls of US companies that had preliminary earnings information in the Compustat Point-in-Time database and returns in the Center for Research in Security Prices (CRSP) database. For each conference call, we first calculated the earnings surprise (SUE) as the earnings per share (EPS) reported in the earnings release minus the EPS reported in the same quarter of the prior year, and minus the average same-quarter EPS differences in the prior eight quarters.* We scale this earnings surprise by the standard deviation of the same-quarter EPS differences during the prior eight quarters. We then rank all the earnings surprises during a calendar quarter into quintiles (0 through 4), divide by 4, and subtract 0.5. We use this transformed variable as an independent variable in quarterly regressions of the abnormal future return on various signals. Its coefficient is equivalent to the return on a hedge portfolio that has a long position in the top quintile (4, the largest positive earnings surprises) and a short position in the bottom quintile (0, the most negative earnings surprises).
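The two steps described above, the standardized surprise and the quintile transform, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `sue` and `quintile_score` are hypothetical, the sample standard deviation is assumed because the excerpt does not say which estimator was used, and ties are broken by sort order.

```python
from statistics import mean, stdev


def sue(eps):
    """Standardized earnings surprise for the latest quarter.

    eps: quarterly EPS values, oldest first, with the current quarter last.
    At least 13 values are needed: the current same-quarter difference plus
    eight prior ones, each looking back four quarters.
    """
    if len(eps) < 13:
        raise ValueError("need at least 13 quarters of EPS")
    # Same-quarter differences (EPS minus EPS four quarters earlier) for the
    # eight prior quarters followed by the current quarter.
    diffs = [eps[i] - eps[i - 4] for i in range(len(eps) - 9, len(eps))]
    *prior, current = diffs
    # Assumption: sample standard deviation. Raises ZeroDivisionError when
    # the eight prior differences have zero variance.
    return (current - mean(prior)) / stdev(prior)


def quintile_score(values):
    """Rank values into quintiles 0..4, divide by 4, and subtract 0.5.

    Returns one score per input value, in the original order, so the bottom
    quintile maps to -0.5 and the top quintile to +0.5.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scores = [0.0] * n
    for rank, i in enumerate(order):
        quintile = rank * 5 // n  # 0 for the lowest fifth, 4 for the highest
        scores[i] = quintile / 4 - 0.5
    return scores
```

Because the transformed variable spans exactly one unit from the bottom quintile (−0.5) to the top quintile (+0.5), a regression coefficient on it reads directly as the top-minus-bottom return spread.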

We use two abnormal return windows in this study. The first is a short window around the earnings release date [−1, +1], where day 0 is the earnings release date (XRET_PRELIM). The second runs from day +2 through one day after the earnings announcement date of the subsequent quarter (XRET_DRIFT). We use XRET_PRELIM to complement the earnings surprise in case additional information is released in the preliminary earnings announcement. As we did for SUE, we rank XRET_PRELIM within a calendar quarter into quintiles, divide the rank by 4, and subtract 0.5. The longer return window is a standard definition of the drift return. We calculate abnormal return as the buy-and-hold return on the stock minus the value-weighted buy-and-hold return on all stocks of the same size (three groups), book/market ratio (B/M; three groups), and 11-month momentum (three groups).
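As a rough illustration of the abnormal return calculation, the sketch below compounds daily returns over a window and subtracts a value-weighted benchmark. The function names are hypothetical, and two points are assumptions rather than details from the excerpt: peers are taken to be stocks already matched on size, B/M, and momentum group, and each peer is weighted by its market capitalization at the start of the window.

```python
from math import prod


def buy_and_hold(returns):
    """Compound a sequence of simple daily returns into one window return."""
    return prod(1.0 + r for r in returns) - 1.0


def value_weighted_bhr(peers):
    """Value-weighted average of the peers' buy-and-hold returns.

    peers: list of (market_cap, daily_returns) pairs for stocks in the same
    size, B/M, and momentum group (assumed to be matched beforehand).
    """
    total_cap = sum(cap for cap, _ in peers)
    return sum(cap / total_cap * buy_and_hold(rets) for cap, rets in peers)


def abnormal_return(stock_returns, peers):
    """Stock buy-and-hold return minus the matched benchmark return."""
    return buy_and_hold(stock_returns) - value_weighted_bhr(peers)
```

For the short window, `stock_returns` would hold the three daily returns for days −1, 0, and +1; for the drift window, it would hold the returns from day +2 through one day after the next quarter's announcement.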

The initial analysis we performed on the conference call transcripts involved counting the number of positive words (POS) and negative words (NEG) according to Loughran and McDonald (2011). For each transcript, we calculated the word-count tone as (POS − NEG)/(POS + NEG). We then calculated the word-count tone change variable as the transcript tone minus the average tone of all available transcripts for this company in the prior 370 days (TONE_CH_L&M). Thus, the tone change was a number in the range of [−2, +2]. In the following, we provide evidence about the incremental contribution of TONE_CH_L&M to the drift return beyond the earnings surprise and the short-window return around the earnings announcement.
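A word-count tone and its change over a 370-day lookback can be sketched as follows. The two word sets are tiny placeholders, not the Loughran and McDonald (2011) lists, and the function names, the simple tokenizer, and the handling of transcripts with no matched words (tone set to 0) are all assumptions made for illustration.

```python
import re
from datetime import date, timedelta
from statistics import mean

# Placeholder word lists; a real analysis would load the full dictionary.
POSITIVE_WORDS = {"growth", "improved", "strong"}
NEGATIVE_WORDS = {"decline", "loss", "weak"}


def tone(pos, neg):
    """(POS - NEG) / (POS + NEG), which always lies in [-1, +1]."""
    total = pos + neg
    return (pos - neg) / total if total else 0.0  # assumption: 0 if no hits


def count_tone(text):
    """Count positive and negative words in a transcript and return its tone."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    return tone(pos, neg)


def tone_change(call_date, call_tone, history, lookback_days=370):
    """Tone minus the average tone of the company's calls in the prior window.

    history: list of (date, tone) pairs for the same company.
    Returns None when no earlier transcript falls inside the window.
    """
    start = call_date - timedelta(days=lookback_days)
    prior = [t for d, t in history if start <= d < call_date]
    if not prior:
        return None
    return call_tone - mean(prior)
```

Since each tone lies in [−1, +1], the difference between a tone and an average of tones is bounded by [−2, +2], which matches the range stated above.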

To assess the contribution of using Amenity’s software plus our rule writing, we began by writing additional rules on six areas of interest to us. One of these areas included operational issues discussed by management or analysts. For example, we identified any problems in distributing products, sourcing raw materials, labor strikes, and so on and created specific rules to identify such events under the heading of operational problems. We added approximately 500 rules to the roughly 3,600 rules that Amenity had already written to capture events. Using our own weights for these rules, we obtained a new tone score for each transcript based on a weighted combination of sentiment scores and event scores. In addition, we compiled a list of euphemisms that management or analysts used on the conference call, such as headwinds, speed bumps, and hiccups (Suslava 2016), and created specific rules to identify those. We added the euphemisms score to the combined sentiment and events score and calculated a total tone score as (POS − NEG)/(POS + NEG). As before, we focused on the tone change variable by subtracting the average tone of all available earnings transcripts in the prior 370 days (TONE_CH_AM).
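The excerpt does not describe how Amenity's rules or the authors' weights are implemented, so the following is only a hypothetical sketch of the general idea: each rule is a pattern with a polarity and a weight, weighted hits accumulate into POS and NEG totals, and the total tone is (POS − NEG)/(POS + NEG). The `Rule` class, the regular expressions, and the weights are all invented for illustration; only the euphemism examples (headwinds, speed bumps, hiccups) and the labor strike example come from the text.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: str     # regular expression standing in for a real rule
    polarity: int    # +1 adds to POS, -1 adds to NEG
    weight: float    # hypothetical weight assigned by the analyst


# Hypothetical rules and weights, for illustration only.
RULES = [
    Rule("euphemism", r"\b(?:headwinds?|speed bumps?|hiccups?)\b", -1, 1.0),
    Rule("operational_problem", r"\blabor strikes?\b", -1, 2.0),
    Rule("positive_sentiment", r"\b(?:strong|improved|record)\b", +1, 1.0),
]


def weighted_tone(text, rules=RULES):
    """Total tone from weighted rule hits: (POS - NEG) / (POS + NEG)."""
    pos = neg = 0.0
    for rule in rules:
        hits = len(re.findall(rule.pattern, text, flags=re.IGNORECASE))
        if rule.polarity > 0:
            pos += rule.weight * hits
        else:
            neg += rule.weight * hits
    total = pos + neg
    return (pos - neg) / total if total else 0.0  # assumption: 0 if no hits
```

The tone change variable would then be built by subtracting the trailing 370-day average of this score from the current transcript's score.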

*The subtraction of the average differences adjusts for cases in which earnings grow (or decline) by a constant amount each time period.

Access The Full White Paper

To obtain a copy of this research from the Journal of Financial Data Science for the Volume 1, Issue 1, Winter 2019 publication: A Practical Approach to Advanced Text Mining in Finance, email:

This communication does not represent investment advice. Transcript text provided by S&P Global Market Intelligence. Analysis provided in collaboration with Quantitative Management Associates.

Copyright ©2019 Amenity Analytics. 
