NLP Text Mining In Finance: Quantifying the Significance

A Practical Approach to Advanced Text Mining in Finance

A recent study by Quantitative Management Associates applied Amenity Analytics’ NLP platform to quantify the impact of unstructured data analysis in finance. To conduct their research, the authors applied our text analytics methodology, which scores earnings call transcripts based on sentiment extracted from financial events identified within the text.

The research showed that applying our NLP model to earnings call transcripts yielded three statistically significant results:

  1. It produced a signal that is incrementally additive to earnings surprises and the short-term returns around the earnings announcement
  2. The change in sentiment has a relatively low correlation with both earnings surprises and the short-term returns; in other words, the signal is a sufficiently different potential source of information
  3. Using Amenity Analytics added 26 bps of alpha per quarter

These findings indicate that text analytics adds a meaningful layer of insight for investors that cannot be obtained from structured data.

An Excerpt From This White Paper

Analysis

For this analysis, we obtained conference call transcripts from Thomson Reuters for the period 2002–2016. We restricted our sample to earnings conference calls of US companies that had preliminary earnings information in the Compustat Point-in-Time database and returns in the Center for Research in Security Prices (CRSP) database. For each conference call, we first calculated the earnings surprise (SUE) as the earnings per share (EPS) reported in the earnings release, minus the EPS reported in the same quarter of the prior year, minus the average same-quarter EPS difference over the prior eight quarters.* We scaled this earnings surprise by the standard deviation of the same-quarter EPS differences during the prior eight quarters. We then ranked all the earnings surprises during a calendar quarter into quintiles (0 through 4), divided the rank by 4, and subtracted 0.5, so the top and bottom quintiles sit exactly one unit apart (+0.5 versus −0.5). We used this transformed variable as an independent variable in quarterly regressions of the abnormal future return on various signals. Its coefficient is therefore equivalent to the return on a hedge portfolio that has a long position in the top quintile (4, the largest positive earnings surprises) and a short position in the bottom quintile (0, the most negative earnings surprises).
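To make the SUE definition and the quintile transformation concrete, here is a minimal sketch. It assumes EPS arrives as a plain list of quarterly values (oldest first) and uses the sample standard deviation; the excerpt does not say which standard deviation the authors used or how ties in the ranking were broken, so both choices are assumptions, and the function names are illustrative.

```python
import statistics

def sue(eps):
    """Standardized unexpected earnings for the latest quarter.

    eps: quarterly EPS values, oldest first, ending with the current quarter.
    Needs at least 13 quarters so that eight prior same-quarter differences
    (each EPS minus the EPS four quarters earlier) are available.
    """
    if len(eps) < 13:
        raise ValueError("need the current quarter plus 12 prior quarters")
    # Same-quarter (seasonal) differences: EPS_t - EPS_{t-4}
    diffs = [eps[i] - eps[i - 4] for i in range(4, len(eps))]
    current = diffs[-1]
    prior = diffs[-9:-1]  # the eight differences before the current one
    # Assumption: sample standard deviation; the excerpt does not specify
    scale = statistics.stdev(prior)
    return (current - statistics.mean(prior)) / scale

def scaled_quintile_ranks(values):
    """Map one calendar quarter's values to quintile / 4 - 0.5.

    Quintile 0 (lowest) becomes -0.5 and quintile 4 (highest) becomes +0.5.
    Ties are broken by input order (an assumption).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    scaled = [0.0] * n
    for position, idx in enumerate(order):
        quintile = position * 5 // n  # 0..4
        scaled[idx] = quintile / 4 - 0.5
    return scaled

# Usage with toy numbers (not data from the study)
eps = [10, 10, 10, 10, 11, 9, 11, 9, 12, 8, 12, 8, 14]
print(round(sue(eps), 3))  # 1.871
print(scaled_quintile_ranks([3.2, -1.0, 0.4, 2.5, -0.7]))  # [0.5, -0.5, 0.0, 0.25, -0.25]
```

Because the scaled ranks run from −0.5 to +0.5, a regression slope on them measures the return spread between the top and bottom quintiles, which is why the coefficient reads as a hedge-portfolio return.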

We use two abnormal return windows in this study. The first is a short window around the earnings release date, days [−1, +1], where day 0 is the earnings release date (XRET_PRELIM). The second runs from day +2 through one day after the earnings announcement date of the subsequent quarter (XRET_DRIFT). We use XRET_PRELIM to complement the earnings surprise in case additional information is released in the preliminary earnings announcement. As we did for SUE, we rank XRET_PRELIM within a calendar quarter into quintiles, divide the rank by 4, and subtract 0.5. The longer return window is a standard definition of the drift return. We calculate abnormal return as the buy-and-hold return on the stock minus the value-weighted buy-and-hold return on all stocks in the same size (three groups), book/market ratio (B/M; three groups), and 11-month momentum (three groups) category, that is, one of 3 × 3 × 3 = 27 benchmark portfolios.
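The abnormal return calculation can be sketched as follows. This is a minimal illustration, assuming daily simple returns are keyed by trading-day offset from the release date and that benchmark stocks are weighted by market capitalization at the start of the window; the excerpt does not state the weighting date, and the helper names and the `next_announcement_offset` mentioned below are hypothetical.

```python
from math import prod

def buy_and_hold(daily_returns):
    """Compound simple daily returns over a window: prod(1 + r) - 1."""
    return prod(1.0 + r for r in daily_returns) - 1.0

def window(returns_by_offset, start, end):
    """Returns for trading-day offsets start..end inclusive (day 0 = release date)."""
    return [returns_by_offset[d] for d in range(start, end + 1)]

def value_weighted_bhr(members):
    """Value-weighted buy-and-hold return of a benchmark portfolio.

    members: (market_cap_at_window_start, daily_returns) pairs for every stock
    in the same size x B/M x momentum cell (one of 27 cells). Weighting by
    start-of-window market cap is an assumption.
    """
    total_cap = sum(cap for cap, _ in members)
    return sum(cap / total_cap * buy_and_hold(rets) for cap, rets in members)

def abnormal_bhr(stock_returns, benchmark_members):
    """Stock buy-and-hold return minus its matched benchmark's return."""
    return buy_and_hold(stock_returns) - value_weighted_bhr(benchmark_members)

# Usage with toy numbers: an XRET_PRELIM-style [-1, +1] window
stock = {-1: 0.01, 0: 0.03, 1: -0.01}
bench = [(300.0, [0.0, 0.01, 0.0]), (100.0, [0.02, 0.0, 0.0])]
print(round(abnormal_bhr(window(stock, -1, 1), bench), 4))  # 0.0174
```

For an XRET_DRIFT-style window, the same functions would be called with `window(stock, 2, next_announcement_offset + 1)`, where `next_announcement_offset` is a hypothetical offset of the subsequent quarter's announcement date.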

The initial analysis we performed on the conference call transcripts involved counting the number of positive words (POS) and negative words (NEG) according to Loughran and McDonald (2011). For each transcript, we calculated the word-count tone as (POS − NEG)/(POS + NEG). We then calculated the word-count tone change variable as the transcript tone minus the average tone of all available transcripts for the same company in the prior 370 days (TONE_CH_L&M). Because each tone lies in [−1, +1], the tone change is a number in the range [−2, +2]. Below, we provide evidence about the incremental contribution of TONE_CH_L&M to the drift return beyond the earnings surprise and the short-window return around the earnings announcement.
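A minimal sketch of the word-count tone and the 370-day tone change follows. The two word sets are tiny placeholders chosen for illustration, not the Loughran and McDonald (2011) lists, which contain thousands of entries. Treating a transcript with no sentiment words as neutral (tone 0) and counting calls dated within 370 days before, but not on, the current call date are also assumptions.

```python
import re
from datetime import date, timedelta

# Placeholder word sets for illustration only; NOT the Loughran-McDonald lists
POSITIVE = {"strong", "improved", "growth", "record"}
NEGATIVE = {"decline", "weak", "loss", "adverse"}

def word_count_tone(text):
    """(POS - NEG) / (POS + NEG) over lowercase alphabetic tokens."""
    words = re.findall(r"[a-z]+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0  # assumption: neutral tone when no sentiment words appear
    return (pos - neg) / (pos + neg)

def tone_change(call_date, tone, history, lookback_days=370):
    """Current tone minus the mean tone of the company's prior calls.

    history: (call_date, tone) pairs for the same company.
    Returns None when no prior call falls inside the lookback window.
    """
    start = call_date - timedelta(days=lookback_days)
    prior = [t for d, t in history if start <= d < call_date]
    if not prior:
        return None
    return tone - sum(prior) / len(prior)

# Usage with toy inputs (not data from the study)
print(word_count_tone("Strong growth offset a weak quarter."))
history = [(date(2015, 7, 1), 0.0), (date(2016, 1, 1), 0.2), (date(2014, 1, 1), 0.9)]
print(tone_change(date(2016, 4, 1), 0.5, history))  # the 2014 call falls outside the window
```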

To assess the contribution of Amenity’s software plus our own rule writing, we began by writing additional rules in six areas of interest to us. One of these areas was operational issues discussed by management or analysts. For example, we identified problems such as difficulties distributing products or sourcing raw materials, labor strikes, and so on, and created specific rules to identify such events under the heading of operational problems. We added approximately 500 rules to the roughly 3,600 rules that Amenity had already written to capture events. Using our own weights for these rules, we obtained a new tone score for each transcript based on a weighted combination of sentiment scores and event scores. In addition, we compiled a list of euphemisms that management or analysts used on conference calls, such as headwinds, speed bumps, and hiccups (Suslava 2016), and created specific rules to identify them. We added the euphemism score to the combined sentiment and event score and calculated a total tone score as (POS − NEG)/(POS + NEG). As before, we focused on the tone change variable, obtained by subtracting the average tone of all available earnings transcripts in the prior 370 days (TONE_CH_AM).
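The excerpt does not disclose the rules or the weights used, so the following is only a hypothetical sketch of how weighted rule hits could be folded into the same (POS − NEG)/(POS + NEG) form. It assumes each rule hit (sentiment, event, or euphemism) carries a polarity and a nonnegative weight, and that euphemisms count as negative hits; the `RuleHit` type and its fields are illustrative inventions, not Amenity's API.

```python
from dataclasses import dataclass

@dataclass
class RuleHit:
    category: str   # illustrative labels: "sentiment", "event", "euphemism"
    polarity: int   # +1 for a positive hit, -1 for a negative hit
    weight: float   # hypothetical nonnegative user-chosen weight

def combined_tone(hits):
    """Weighted (POS - NEG) / (POS + NEG) over all rule hits in a transcript."""
    pos = sum(h.weight for h in hits if h.polarity > 0)
    neg = sum(h.weight for h in hits if h.polarity < 0)
    if pos + neg == 0:
        return 0.0  # assumption: neutral tone when no rule fires
    return (pos - neg) / (pos + neg)

# Usage with made-up hits: one positive sentiment hit, one euphemism
# ("headwinds") assumed to count as negative
hits = [RuleHit("sentiment", 1, 3.0), RuleHit("euphemism", -1, 1.0)]
print(combined_tone(hits))  # 0.5
```

The tone change step would then mirror the word-count version: subtract the company's average combined tone over the prior 370 days.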

*The subtraction of the average differences adjusts for cases in which earnings grow (or decline) by a constant amount each period.

Access The Full White Paper

To obtain a copy of this research, A Practical Approach to Advanced Text Mining in Finance, published in the Journal of Financial Data Science (Volume 1, Issue 1, Winter 2019), email: david@amenityanalytics.com

Please note: This in no way represents investment advice. All transcript text provided by S&P Global Market Intelligence. Analysis provided in collaboration with Quantitative Management Associates.

Copyright ©2019 Amenity Analytics. 
