A recent study used our NLP platform to quantify the value of text mining in finance, and the results were statistically significant.

Quantitative Management Associates
February 7, 2019

NLP Text Mining In Finance: Quantifying the Significance

White Paper

A Practical Approach to Advanced Text Mining in Finance

A recent study from Quantitative Management Associates applied Amenity Analytics' NLP platform to quantify the impact of using unstructured data analysis in finance. To conduct their research, the authors applied our text analytics methodology, which scores earnings call transcripts based on sentiment extracted from financial events identified in the text.

The research detailed how the application of our NLP model to earnings call transcripts yielded three statistically significant results:

  1. It produced a signal that is incrementally additive to earnings surprises and the short-term returns around the earnings announcement
  2. The change in sentiment has a relatively low correlation with both earnings surprises and the short-term returns; in other words, the signal is a sufficiently different potential source of information
  3. Using Amenity Analytics added 26 bps of alpha per quarter

These findings demonstrate that text analytics adds a meaningful layer of insight for investors, one that cannot be obtained from structured data alone.

An Excerpt From This White Paper


For this analysis, we obtained conference call transcripts from Thomson Reuters for the period 2002–2016. We restricted our sample to earnings conference calls of US companies that had preliminary earnings information in the Compustat Point-in-Time database and returns in the Center for Research in Security Prices (CRSP) database. For each conference call, we first calculated the earnings surprise (SUE) as the earnings per share (EPS) reported in the earnings release minus the EPS reported in the same quarter of the prior year, and minus the average same-quarter EPS differences in the prior eight quarters.* We scale this earnings surprise by the standard deviation of the same-quarter EPS differences during the prior eight quarters. We then rank all the earnings surprises during a calendar quarter into quintiles (0 through 4), divide by 4, and subtract 0.5. We use this transformed variable as an independent variable in quarterly regressions of the abnormal future return on various signals. Its coefficient is equivalent to the return on a hedge portfolio that has a long position in the top quintile (4, the largest positive earnings surprises) and a short position in the bottom quintile (0, the most negative earnings surprises).
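The SUE calculation and quintile transformation described above can be sketched in code. This is an illustrative reconstruction, not the authors' actual implementation; the function names and the use of pandas are assumptions.

```python
import pandas as pd

def sue_signal(eps: pd.Series) -> float:
    """Standardized unexpected earnings (SUE) for the latest quarter.

    `eps` holds quarterly EPS in chronological order, ending with the
    current quarter. Requires enough history for eight prior
    same-quarter differences (12+ prior quarters of EPS).
    """
    # Year-over-year EPS differences (current quarter minus same quarter
    # of the prior year).
    diffs = eps.diff(4).dropna()
    current_diff = diffs.iloc[-1]
    prior = diffs.iloc[-9:-1]  # the eight prior same-quarter differences
    # Subtract the average prior difference (adjusts for constant
    # earnings growth), then scale by the standard deviation of those
    # prior differences.
    return float((current_diff - prior.mean()) / prior.std())

def quintile_transform(values: pd.Series) -> pd.Series:
    """Rank into quintiles 0-4, divide by 4, subtract 0.5.

    The result lies in [-0.5, +0.5], so a regression coefficient on it
    reads as a top-minus-bottom-quintile hedge portfolio return.
    """
    q = pd.qcut(values.rank(method="first"), 5, labels=False)
    return q / 4 - 0.5
```

In a quarterly cross-sectional regression, `quintile_transform` would be applied to all SUE values within each calendar quarter.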

We use two abnormal return windows in this study. The first is a short window around the earnings release date [−1, +1], where day 0 is the earnings release date (XRET_PRELIM). The second begins on day +2 through one day after the earnings announcement date of the subsequent quarter (XRET_DRIFT). We use XRET_PRELIM to complement the earnings surprise in case additional information is released in the preliminary earnings announcement. As we did for SUE, we rank XRET_PRELIM within a calendar quarter into quintiles, divide the rank by 4, and subtract 0.5. The longer return window is a standard definition of the drift return. We calculate abnormal return as the buy-and-hold return on the stock minus the value-weighted buy-and-hold return on all stocks of the same size (three groups), book/market ratio (B/M; three groups), and 11-month momentum (three groups).
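The abnormal return definition above amounts to compounding daily returns over the window for both the stock and its matched characteristic portfolio, then taking the difference. A minimal sketch, assuming daily simple returns as inputs (the function names are illustrative, not from the paper):

```python
import numpy as np

def buy_and_hold_return(daily_returns) -> float:
    """Compound daily simple returns into a buy-and-hold window return."""
    return float(np.prod(1.0 + np.asarray(daily_returns)) - 1.0)

def abnormal_return(stock_daily, benchmark_daily) -> float:
    """Buy-and-hold return on the stock minus the value-weighted
    buy-and-hold return on the matched size / B/M / momentum benchmark
    portfolio over the same window."""
    return buy_and_hold_return(stock_daily) - buy_and_hold_return(benchmark_daily)
```

For XRET_PRELIM the window is trading days −1 through +1 around the release; for XRET_DRIFT it runs from day +2 through one day after the next quarter's announcement.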

The initial analysis we performed on the conference call transcripts involved counting the number of positive words (POS) and negative words (NEG) according to Loughran and McDonald (2011). For each transcript, we calculated the word count tone as (POS − NEG)/(POS + NEG). We then calculated the word-count tone change variable as the transcript tone minus the average tone of all available transcripts for this company in the prior 370 days (TONE_CH_L&M). Thus, the tone change was a number in the range of [−2, +2]. In the following, we provide evidence about the incremental contribution of TONE_CH_L&M to the drift return beyond the earnings surprise and the short-window return around the earnings announcement.
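The tone and tone-change definitions above are simple arithmetic and can be made concrete. This sketch assumes the positive/negative word counts have already been produced from the Loughran-McDonald word lists; the function names are hypothetical.

```python
def word_count_tone(pos: int, neg: int) -> float:
    """Tone = (POS - NEG) / (POS + NEG), bounded in [-1, +1]."""
    total = pos + neg
    # Guard against transcripts with no sentiment-bearing words.
    return (pos - neg) / total if total else 0.0

def tone_change(current_tone: float, prior_tones: list) -> float:
    """Current transcript tone minus the average tone of the company's
    prior transcripts (370-day window). Since each tone lies in
    [-1, +1], the change lies in [-2, +2]."""
    return current_tone - sum(prior_tones) / len(prior_tones)
```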

To assess the contribution of using Amenity's software plus our rule writing, we began by writing additional rules on six areas of interest to us. One of these areas included operational issues discussed by management or analysts. For example, we identified any problems in distributing products, sourcing raw materials, labor strikes, and so on and created specific rules to identify such events under the heading of operational problems. We added approximately 500 rules to the roughly 3,600 rules that Amenity had already written to capture events. Using our own weights for these rules, we obtained a new tone score for each transcript based on a weighted combination of sentiment scores and event scores. In addition, we compiled a list of euphemisms that management or analysts used on the conference call, such as headwinds, speed bumps, and hiccups (Suslava 2016), and created specific rules to identify those. We added the euphemisms score to the combined sentiment and events score and calculated a total tone score as (POS − NEG)/(POS + NEG). As before, we focused on the tone change variable by subtracting the average tone of all available earnings transcripts in the prior 370 days (TONE_CH_AM).

*The subtraction of the average differences adjusts for cases in which earnings grow (or decline) by a constant amount each time period.

Access The Full White Paper

Published in the Journal of Financial Data Science, Volume 1, Issue 1 (Winter 2019): A Practical Approach to Advanced Text Mining in Finance

This communication does not represent investment advice. Transcript text provided by S&P Global Market Intelligence. Analysis provided in collaboration with Quantitative Management Associates.

Copyright ©2019 Amenity Analytics.