Technical Session

Igniting Your Coding Engine – A Hands-on Workshop on Extracting and Working with Textual Data
Presenters: Khrystyna Bochkay, University of Miami; Roman Chychyla, University of Miami; Andrew J. Leone, Northwestern University
Date: Friday, January 11, 2019
Time: 8:00 am – 11:30 am (includes Continental breakfast)

In recent years, textual data has become increasingly popular among accounting researchers. To help researchers understand and use textual data, this “hands-on” research workshop will focus on how to collect and process large volumes of data. The first part of the workshop will introduce different methods of scraping data from EDGAR (or similar data gathering systems). You will learn the basics of programming in Python, including: 1) how to get a simple programming environment up and running; 2) how to write a simple program to collect and clean textual data; 3) how to write and use regular expressions; and 4) where to find online resources and examples, among other topics. The second part of the workshop will give you the opportunity to work with textual data by focusing on specific methods and techniques. You will learn how to use advanced regular expressions to extract specific excerpts from filings and how to “quantify” the collected data, that is, how to get from countless pages of text to a single number representing your variable of interest. For example, you will learn dictionary-based methods of 1) measuring document sentiment, 2) identifying forward-looking sentences and risk disclosures, and 3) collecting informative numbers in text, among others.
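
To give a flavor of what "getting from text to a single number" can look like, here is a minimal Python sketch of a dictionary-based tone measure. It is not taken from the workshop materials: the word lists, the function names (`tokenize`, `net_tone`), and the sample sentence are hypothetical and chosen only for illustration. A real study would use a published dictionary suited to the research setting.

```python
import re

# Hypothetical toy word lists, for illustration only (an assumption,
# not the workshop's dictionary).
POSITIVE_WORDS = {"growth", "improve", "improved", "strong", "gain"}
NEGATIVE_WORDS = {"loss", "decline", "declined", "weak", "impairment"}


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def net_tone(text):
    """Return (positive count - negative count) / total word count."""
    words = tokenize(text)
    if not words:
        return 0.0  # avoid division by zero on empty documents
    positive = sum(1 for word in words if word in POSITIVE_WORDS)
    negative = sum(1 for word in words if word in NEGATIVE_WORDS)
    return (positive - negative) / len(words)


# Hypothetical sample sentence standing in for a filing excerpt.
sample = "Revenue growth was strong, but the segment reported a loss."
print(net_tone(sample))
```

Scaling by the total word count keeps the measure comparable across documents of different lengths; other scalings (for example, by the number of matched words) are equally plausible design choices.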

Overall, this research workshop will introduce you to a basic framework for coding textual data, which can be easily adapted to your future projects and needs.