NYU FRE 7871: Special Topics - News Analytics and Machine Learning

Overview • Abstract • Course outcomes • Prerequisites • Schedule • Supplementary material • Homework • Course project

Overview:

Course: News Analytics and Machine Learning (NYU FRE GY 7871 - I2)

Term: Fall 2019, first half

Instructor: Andrew Arnold (aoa216@nyu.edu)

Disclaimer: All views and opinions expressed by the instructor in this course are his own and do not reflect the views, opinions, or confidential information of any of his current or former employers.

Office hours: by appointment

GA/Grader/Tutor: TBD

Location: Rogers Hall, Room 216 (Brooklyn Campus)

Time: Tuesdays, 6:00 PM - 8:41 PM

Course style: Given the small class size, the course will be taught as a colloquium. New topics will be introduced in an interactive lecture format and then discussed and expanded on by the group. These topics will then be built upon in the team projects, which will be further discussed and presented to the class. Active class attendance and participation are required.

Grading:

  • Attendance and participation*: 25%
  • Homework*: 10%
  • Midterm exam: 15%
  • Course project: 50% total, divided as follows (percentages are of the project grade):
    • Project proposal: 15%
    • Midterm presentation: 35%
    • Final presentation: 50%
* Note about late registration: Since the class only meets seven times and the first homework is assigned on the first day of class, it may be difficult to make up for missed homework and attendance if you miss even the first day of class. Please let me know if you are considering joining the class late so we can discuss the implications.
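To make the nested project weights concrete, here is a small Python sketch (not an official grading script) that combines component scores into a course grade. It assumes each component is scored on a 0-100 scale; the dictionary keys and function names are illustrative only.

```python
# Top-level weights from the grading list (they sum to 1.0).
COURSE_WEIGHTS = {
    "attendance_participation": 0.25,
    "homework": 0.10,
    "midterm_exam": 0.15,
    "course_project": 0.50,
}

# Project sub-weights are shares of the project grade, not of the course grade.
PROJECT_WEIGHTS = {
    "proposal": 0.15,
    "midterm_presentation": 0.35,
    "final_presentation": 0.50,
}


def project_grade(project_scores: dict[str, float]) -> float:
    """Combine the three project components (each 0-100) into one 0-100 project score."""
    return sum(w * project_scores[k] for k, w in PROJECT_WEIGHTS.items())


def course_grade(scores: dict[str, float], project_scores: dict[str, float]) -> float:
    """Weighted course grade; `scores` holds the three non-project components (each 0-100)."""
    combined = dict(scores, course_project=project_grade(project_scores))
    return sum(w * combined[k] for k, w in COURSE_WEIGHTS.items())


# Effective share of the whole course grade carried by each project component.
EFFECTIVE_PROJECT_WEIGHTS = {
    k: COURSE_WEIGHTS["course_project"] * w for k, w in PROJECT_WEIGHTS.items()
}

if __name__ == "__main__":
    print(EFFECTIVE_PROJECT_WEIGHTS)
    print(course_grade(
        {"attendance_participation": 80, "homework": 90, "midterm_exam": 70},
        {"proposal": 100, "midterm_presentation": 80, "final_presentation": 60},
    ))
```

Under these weights the final presentation alone counts for 25% of the course grade (0.50 × 0.50), the midterm presentation for 17.5%, and the project proposal for 7.5%. For example, scores of 80 (participation), 90 (homework) and 70 (midterm exam), with project components of 100, 80 and 60, give a project score of 73 and a course grade of 76.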


Collaboration policy: As in the real world, collaboration is encouraged, but plagiarism is not. Transparency is the difference. If you collaborate (with other members of this class, other classes, colleagues, friends, random people on the internet), that is fine; just say so. If the contributions of the authors of a particular work are uneven, give a rough estimate of each author's contribution (e.g., A did most of the math, B did most of the programming, and C did the literature review). Feel free to use all publicly available resources on the internet, but please cite them if they are used as more than basic background research (both to give proper credit to the original author and to help your peers discover new resources). Since this is a special topics course, I tend to assume students are interested in learning the material and thus give them the benefit of the doubt. If proper credit is not given, however, or if bad faith or dishonesty is shown, the consequences can be severe, including failing the class and referral to the administration.

Abstract:

The fast-growing field of news analytics requires large databases, fast computation, and robust statistics. This course introduces tools and techniques for analyzing news, including how to quantify textual items by, for example, their positive or negative sentiment, their relevance to each stock, and the amount of novelty in their content. Applications to trading strategies are discussed, including both absolute and relative return strategies, as well as risk management strategies. Students will be exposed to leading software in this space.
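As a rough illustration of what quantifying a news item can mean, the Python sketch below scores a story on the three dimensions mentioned: sentiment, relevance, and novelty. It is a toy under stated assumptions: the tiny word lists `POSITIVE` and `NEGATIVE` are hypothetical, relevance is taken to be the share of tokens that name the company, and novelty is one minus the highest bag-of-words cosine similarity to earlier stories. These are simple stand-ins, not the methods of any particular vendor or the ones used in the course.

```python
import math
import re
from collections import Counter

# Hypothetical toy lexicons for illustration only; real work would use a
# curated finance lexicon or a trained model.
POSITIVE = {"beat", "beats", "growth", "upgrade", "record", "strong"}
NEGATIVE = {"miss", "misses", "loss", "downgrade", "weak", "lawsuit"}


def tokenize(text: str) -> list[str]:
    """Lowercase and keep runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def sentiment(tokens: list[str]) -> float:
    """Net lexicon score in [-1, 1]: (pos - neg) / (pos + neg), or 0 with no hits."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def relevance(tokens: list[str], aliases: set[str]) -> float:
    """Fraction of tokens that name the company (ticker or name aliases)."""
    return 0.0 if not tokens else sum(t in aliases for t in tokens) / len(tokens)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 0.0 if norm == 0 else dot / norm


def novelty(tokens: list[str], history: list[list[str]]) -> float:
    """1 minus the highest similarity to any earlier story (1.0 if there is none)."""
    bag = Counter(tokens)
    return 1.0 - max((cosine(bag, Counter(h)) for h in history), default=0.0)


if __name__ == "__main__":
    story = tokenize("Acme beats estimates on record growth, shares of ACME jump")
    print(sentiment(story))            # 1.0: three positive hits, no negative ones
    print(relevance(story, {"acme"}))  # 0.2: 2 of 10 tokens name the company
    print(novelty(story, []))          # 1.0: no earlier stories to compare against
```

Each score is deliberately crude; the point is that every story becomes a few numbers that can be joined to price data by timestamp and ticker.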

Students will benefit from some familiarity with basic probability, statistics and programming (Python), and an interest in natural language processing (NLP) or computational linguistics. While the course will introduce a few trading strategies, it will also focus on NLP as a tool in its own right, applicable to domains outside of quantitative trading.

There will be readings, discussion, homework, a midterm exam and a final project.

Course outcomes:

After this course you should be able to:

  • Build a basic trading strategy based on natural language signals:
    • Identify, locate and clean appropriate data sources.
    • Formulate a trading hypothesis based on natural language signals.
    • Investigate this hypothesis qualitatively and quantitatively, using statistical, programming, NLP and trading best practices.
    • Present the results of your investigation to your peers for feedback and analysis.

  • Read an academic paper / industry whitepaper about natural language techniques applied to trading and have a basic understanding of it.

  • Have a sense of where the state of the art is currently and where it might head in the near future. Know the difference between science fiction and reality.

  • Decide if you would like to pursue further research in this area.
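The first outcome above (building a basic trading strategy from natural language signals) can be sketched in miniature. The Python function below is a hypothetical, minimal backtest of a single-stock absolute-return rule: go long the day after positive news sentiment, short after negative, and stay flat otherwise. It assumes daily signals and close-to-close returns that are already aligned by date, and it ignores transaction costs, borrow costs and slippage, all of which matter in practice.

```python
import statistics


def backtest_sign_strategy(signals: list[float], returns: list[float]) -> dict[str, float]:
    """Trade the sign of each day's news signal on the following day's return.

    signals[t] is assumed known at the close of day t; returns[t] is the return
    from the close of day t-1 to the close of day t.
    """
    if len(signals) != len(returns):
        raise ValueError("signals and returns must be aligned day by day")
    if len(signals) < 2:
        raise ValueError("need at least two days of data")
    # Position held over day t+1 is the sign of the signal observed on day t.
    positions = [1.0 if s > 0 else -1.0 if s < 0 else 0.0 for s in signals[:-1]]
    pnl = [p * r for p, r in zip(positions, returns[1:])]
    # Hit rate counts only days on which a position was actually held.
    traded = [x for x, p in zip(pnl, positions) if p != 0.0]
    return {
        "mean_daily_pnl": statistics.fmean(pnl),
        "stdev_daily_pnl": statistics.stdev(pnl) if len(pnl) > 1 else 0.0,
        "hit_rate": sum(x > 0 for x in traded) / len(traded) if traded else 0.0,
    }


if __name__ == "__main__":
    # Toy, made-up inputs purely to show the call shape.
    print(backtest_sign_strategy([0.5, -0.2, 0.0, 0.3], [0.0, 0.01, -0.02, 0.005]))
```

The one-day lag between signal and return is the key design choice: pairing day t's signal with day t's return would leak information from the future into the backtest and overstate performance.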

Prerequisites:


  • Foundations of Financial Technology (FRE-GY 6153) or equivalent:
    • Basic knowledge of financial markets (What is a stock? How does it trade?)

  • Basic statistics (What is variance?)

  • Big Data in Finance (FRE-GY 7221) or equivalent:
    • Basic programming ability in Python, R, or Matlab (e.g., parse a CSV file and calculate the variance of the values)

  • Test: Given enough time and access to the internet could you:
    • Determine the 10 largest US stocks by market capitalization as of 12/31/2018
    • Download the closing prices for these stocks for the last 5 Tuesdays of 2018
    • Calculate the variance of each stock during that period
    If so, you are qualified to take this course.
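For the CSV exercise and the last two steps of the test, a minimal Python sketch follows. It finds the last five Tuesdays of 2018 from the calendar and then computes the variance of closing prices per ticker on those dates. The CSV layout (`date,ticker,close`) and the function names are assumptions, and the code uses the sample variance (dividing by n - 1); `statistics.pvariance` gives the population version.

```python
import csv
import datetime as dt
import io
import statistics
from collections import defaultdict


def last_n_weekdays(year: int, weekday: int, n: int) -> list[dt.date]:
    """Last n calendar dates in `year` on `weekday` (Mon=0 ... Sun=6), oldest first."""
    day = dt.date(year, 12, 31)
    # Step back from Dec 31 to the most recent date with the requested weekday.
    day -= dt.timedelta(days=(day.weekday() - weekday) % 7)
    return sorted(day - dt.timedelta(weeks=k) for k in range(n))


def variance_by_ticker(csv_text: str, dates: set[dt.date]) -> dict[str, float]:
    """Sample variance of closing prices per ticker, restricted to `dates`.

    Assumes a hypothetical CSV with header date,ticker,close and ISO dates.
    """
    closes: dict[str, list[float]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if dt.date.fromisoformat(row["date"]) in dates:
            closes[row["ticker"]].append(float(row["close"]))
    # statistics.variance needs at least two observations.
    return {t: statistics.variance(v) for t, v in closes.items() if len(v) > 1}


if __name__ == "__main__":
    tuesdays = last_n_weekdays(2018, weekday=1, n=5)
    print(tuesdays)  # Nov 27 and Dec 4, 11, 18, 25 of 2018
```

Note that December 25, 2018 was a Tuesday and US exchanges are closed on Christmas Day, so one of the five calendar Tuesdays has no closing price; you will need to decide whether to skip that date or use the last five trading Tuesdays instead.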

Schedule:

Supplementary material:

There are many excellent NLP courses taught around the world each year, most with lectures freely available on the internet. If there is a particular topic you would like more background on, or a further topic we did not have time to explore in class, I encourage you to take advantage of these resources. As always, if you reference this material in your work, please cite it.

Unfortunately, there are not as many publicly available resources on developing quantitative trading strategies. Nevertheless, there are still a (growing) number of excellent resources, including:

Here are some publicly available datasets:

Homework:

  • HW 1 assigned (HW 1 data), due 6:00 PM (beginning of class) on Tuesday, September 10, 2019, via e-mail to the instructor.

Course project: