A recent study used our NLP platform to quantify the significance of NLP text mining in finance, and it yielded statistically significant results.

March 1, 2022

NLP Text Mining In Finance: Quantifying the Significance

White Paper


Quantifying the Significance of NLP in Finance

Learn how Amenity Analytics applies natural language processing in various industries and use cases. Our custom solutions showcase the flexibility and power of our data signals. Read our preview below and request the full whitepaper.


A recent study from QMA, "A Practical Approach to Advanced Text Mining in Finance," was published in the Journal of Financial Data Science (Volume 1, Issue 1, Winter 2019). The authors tested Amenity Analytics' NLP platform and text analytics methodology, which scores earnings call transcripts based on sentiment extracted from financial events identified within the text.

The research detailed how the application of Amenity Analytics' NLP model to earnings call transcripts yielded three statistically significant results:

  • It produced a signal that is incrementally additive to earnings surprises and the short-term returns around the earnings announcement
  • The change in sentiment has a relatively low correlation with both earnings surprises and the short-term returns; in other words, the signal is a sufficiently different potential source of information
  • Using Amenity Analytics added 26 bps of alpha per quarter

These findings indicate how text analytics adds a meaningful layer of insight for investors that can't be obtained from structured data.

An Excerpt From This White Paper


For this analysis, we obtained conference call transcripts from Thomson Reuters for the period 2002–2016. We restricted our sample to earnings conference calls of US companies that had preliminary earnings information in the Compustat Point-in-Time database and returns in the Center for Research in Security Prices (CRSP) database. For each conference call, we first calculated the earnings surprise (SUE) as the earnings per share (EPS) reported in the earnings release minus the EPS reported in the same quarter of the prior year, and minus the average same-quarter EPS differences in the prior eight quarters.* We scale this earnings surprise by the standard deviation of the same-quarter EPS differences during the prior eight quarters. We then rank all the earnings surprises during a calendar quarter into quintiles (0 through 4), divide by 4, and subtract 0.5. We use this transformed variable as an independent variable in quarterly regressions of the abnormal future return on various signals. Its coefficient is equivalent to the return on a hedge portfolio that has a long position in the top quintile (4, the largest positive earnings surprises) and a short position in the bottom quintile (0, the most negative earnings surprises).
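The earnings surprise and its rank transformation can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual code: the function names `seasonal_sue` and `quintile_rank_transform` are hypothetical, the sample standard deviation is assumed (the excerpt does not say which estimator was used), and quintiles are assigned from sort order with no special handling of ties.

```python
import statistics


def seasonal_sue(eps):
    """Standardized earnings surprise for the latest quarter.

    eps: chronological list of quarterly EPS values, latest quarter last.
    At least 13 quarters are needed: 4 for the year-ago lag, 8 prior
    same-quarter differences, and the current difference.
    """
    if len(eps) < 13:
        return None
    # Same-quarter EPS differences: EPS minus EPS four quarters earlier.
    diffs = [eps[i] - eps[i - 4] for i in range(4, len(eps))]
    current, prior = diffs[-1], diffs[-9:-1]
    # Assumption: sample standard deviation of the prior eight differences.
    scale = statistics.stdev(prior)
    if scale == 0:
        return None  # surprise is undefined when prior differences are constant
    return (current - statistics.mean(prior)) / scale


def quintile_rank_transform(values):
    """Rank values into quintiles 0..4, divide by 4, and subtract 0.5."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        quintile = rank * 5 // n  # integer in 0..4
        out[i] = quintile / 4 - 0.5
    return out
```

Because the transformed variable runs from -0.5 (bottom quintile) to +0.5 (top quintile), a one-unit change in it spans the distance between the extreme quintiles, which is why its regression coefficient reads as a long-short hedge return.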

We use two abnormal return windows in this study. The first is a short window around the earnings release date [−1, +1], where day 0 is the earnings release date (XRET_PRELIM). The second begins on day +2 through one day after the earnings announcement date of the subsequent quarter (XRET_DRIFT). We use XRET_PRELIM to complement the earnings surprise in case additional information is released in the preliminary earnings announcement. As we did for SUE, we rank XRET_PRELIM within a calendar quarter into quintiles, divide the rank by 4, and subtract 0.5. The longer return window is a standard definition of the drift return. We calculate abnormal return as the buy-and-hold return on the stock minus the value-weighted buy-and-hold return on all stocks of the same size (three groups), book/market ratio (B/M; three groups), and 11-month momentum (three groups).
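The abnormal return definition above can be illustrated with a short sketch. The function names and input layout are hypothetical, and one assumption is made that the excerpt does not confirm: the benchmark weights are market capitalizations fixed at the start of the window. The matching of peers by size, B/M, and momentum group is taken as already done.

```python
def buy_and_hold(returns):
    """Compound a sequence of periodic simple returns into one holding-period return."""
    total = 1.0
    for r in returns:
        total *= 1.0 + r
    return total - 1.0


def value_weighted_benchmark(peers):
    """Value-weighted buy-and-hold return of a matched peer group.

    peers: list of (market_cap_at_window_start, returns) tuples.
    Assumption: weights are start-of-window market caps and are then held.
    """
    total_cap = sum(cap for cap, _ in peers)
    return sum(cap * buy_and_hold(rets) for cap, rets in peers) / total_cap


def abnormal_return(stock_returns, peers):
    """Stock buy-and-hold return minus the matched benchmark's return."""
    return buy_and_hold(stock_returns) - value_weighted_benchmark(peers)
```

The same function serves both windows: pass the returns for days −1 through +1 for XRET_PRELIM, or the returns from day +2 through one day after the next announcement for XRET_DRIFT.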

The initial analysis we performed on the conference call transcripts involved counting the number of positive words (POS) and negative words (NEG) according to Loughran and McDonald (2011). For each transcript, we calculated the word-count tone as (POS − NEG)/(POS + NEG). We then calculated the word-count tone change variable as the transcript tone minus the average tone of all available transcripts for this company in the prior 370 days (TONE_CH_L&M). Thus, the tone change was a number in the range of [−2, +2]. In the following, we provide evidence about the incremental contribution of TONE_CH_L&M to the drift return beyond the earnings surprise and the short-window return around the earnings announcement.
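A minimal sketch of the word-count tone and tone change follows. The two word sets are tiny placeholders for illustration only, not the Loughran and McDonald dictionary; the function names, the simple whitespace tokenizer, and the choice to return 0.0 when no listed words appear are all assumptions.

```python
from datetime import date, timedelta

# Placeholder word lists for illustration; NOT the Loughran-McDonald dictionary.
POSITIVE = {"strong", "improved", "gain"}
NEGATIVE = {"loss", "decline", "weak"}


def word_count_tone(text):
    """Tone = (POS - NEG) / (POS + NEG), always within [-1, +1]."""
    words = [w.strip(".,;:!?()").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos + neg == 0:
        return 0.0  # assumption: neutral tone when no listed words occur
    return (pos - neg) / (pos + neg)


def tone_change(call_date, tone, history):
    """Tone minus the average tone of the company's transcripts in the prior 370 days.

    history: list of (transcript_date, tone) tuples for the same company.
    """
    window_start = call_date - timedelta(days=370)
    prior = [t for d, t in history if window_start <= d < call_date]
    if not prior:
        return None  # no earlier transcript available in the window
    return tone - sum(prior) / len(prior)
```

Since each tone lies in [−1, +1], the difference between a tone and an average of tones lies in [−2, +2], which matches the range stated above.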

To assess the contribution of using Amenity's software plus our rule writing, we began by writing additional rules on six areas of interest to us. One of these areas included operational issues discussed by management or analysts. For example, we identified any problems in distributing products, sourcing raw materials, labor strikes, and so on and created specific rules to identify such events under the heading of operational problems. We added approximately 500 rules to the roughly 3,600 rules that Amenity had already written to capture events. Using our own weights for these rules, we obtained a new tone score for each transcript based on a weighted combination of sentiment scores and event scores. In addition, we compiled a list of euphemisms that management or analysts used on the conference call, such as headwinds, speed bumps, and hiccups (Suslava 2016), and created specific rules to identify those. We added the euphemisms score to the combined sentiment and events score and calculated a total tone score as (POS − NEG)/(POS + NEG). As before, we focused on the tone change variable by subtracting the average tone of all available earnings transcripts in the prior 370 days (TONE_CH_AM).
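One way to read the weighted total tone score is sketched below. This is an interpretation, not the study's implementation: the excerpt does not disclose the rule weights or exactly how sentiment, event, and euphemism scores are combined, so the rule labels, weights, and the `weighted_tone` function are hypothetical. Here POS and NEG are taken to be the summed weights of positive and negative rule hits.

```python
def weighted_tone(hits):
    """Total tone from weighted rule hits.

    hits: list of (rule_label, polarity, weight) tuples, where polarity is
    +1 or -1. Labels and weights are hypothetical examples.
    """
    pos = sum(weight for _, polarity, weight in hits if polarity > 0)
    neg = sum(weight for _, polarity, weight in hits if polarity < 0)
    if pos + neg == 0:
        return 0.0  # assumption: neutral tone when no rules fire
    return (pos - neg) / (pos + neg)


# Example: two positive hits and one euphemism counted as negative.
example_hits = [
    ("guidance_raised", +1, 2.0),
    ("positive_sentiment", +1, 1.0),
    ("euphemism_headwinds", -1, 1.0),
]
example_tone = weighted_tone(example_hits)
```

Under this reading, a euphemism such as "headwinds" enters as a negative hit, so it lowers the tone even though the word itself does not sound negative.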

*The subtraction of the average differences adjusts for cases in which earnings grow (or decline) by a constant amount each time period.

Access The Complete White Paper – Submit a Request Above

This communication does not represent investment advice. Transcript text provided by S&P Global Market Intelligence. Analysis provided in collaboration with Quantitative Management Associates.

Copyright ©2019 Amenity Analytics.

No black-box outputs. Get full visibility into every sentence analyzed.


Extract the data that matters. Our models cut through the noise and eliminate the false positives.


Gather insights out of a universe of data, from SEC filings and industry reports to chats and emails.


Amenity's NLP

Our Promise

We are committed to being a strategic partner that provides clients with a platform for discovering highly accurate, valuable, and actionable insights.

Get meaningful results, every time. Many of our models in production are capable of hitting near perfect levels of precision and recall.


Access, query, and navigate in minutes and seconds—not hours and days. Our speed also applies to deployment. We deliver in terms of weeks, not years.


AI and Data Science in Trading Conference 2019

Amenity Analytics exhibited upcoming features at the AI and Data Science in Trading Conference, March 19-20 in NYC.

Amenity Analytics
Amenity Team

BattleFin Discovery Day 2019 New York: Visit Our Booth!

Amenity Analytics will be showcasing its latest NLP accomplishments with hedge funds and investment firms.

Amenity Analytics
Amenity Team

Eagle Alpha: Access and Alpha Conference

Amenity Analytics CEO Nathaniel Storch presented at Eagle Alpha's "Alternative Data: Access and Alpha" conference.

Nathaniel Storch
Co-founder & CEO

Amenity Analytics vs AlphaSense, Sentieo, and Yewno

Rules-Based Approach
Portfolio Scorecard
Neural Network in Progress
Full Transparency Into Insights (Not a Black Box Score)
Unique Dashboards & Lenses
Topic Search (Query Insights)
Machine Learning/Deep Learning Enabled
Document Viewer (Sentiment & Search)
Keyword Search
Link Back to News Source Within Platform
Link Back to Filings & Transcripts Within Platform
Reads Grammar & Context (Non "Bag of Words" NLP)
Private Company Search
Financial Data Extraction from Tables