Andrew O. Arnold

Principal Applied Machine Learning Engineer, Shopify
Adjunct Professor, New York University
~~Quantitative Portfolio Manager~~
Ph.D., Machine Learning, Carnegie Mellon University

About me

I am a hands-on technical and scientific expert with experience as both a senior individual contributor and an org leader, delivering complex, high-impact applied machine learning research projects in science, technology, commerce and quantitative trading. I am currently a Principal Applied Machine Learning Engineer at Shopify, building machine-learning-based products to help make commerce better for everyone. Previously, I was Chief Scientist at Oracle Alpha, leading machine learning and natural language processing research and production for an emerging systematic fundamental hedge fund. I am also an Adjunct Professor in NYU's Department of Finance and Risk Engineering, lecturing on natural language processing and machine learning applied to quantitative trading and finance. I received my Ph.D. in Machine Learning from Carnegie Mellon University and my BA in Computer Science and Artificial Intelligence from Columbia University.

With expertise in machine learning, natural language processing and quantitative trading, my personal research interests are in robust machine learning, developing models and features that cope with:

- extremely low signal-to-noise ratios and low sample size regimes
- changes in the distribution of features and labels across train and test sets (transfer learning)
- extracting features from unstructured data

I am particularly interested in applications of robust machine learning to time series and natural language processing models in financial and other domains.

Professional

Current:

Principal Applied Machine Learning Engineer at Shopify.
- Commerce Foundation Models
  NeurIPS 2024, Vancouver, BC, Canada (December 10, 2024). [Video and Slides]
Adjunct Professor at New York University lecturing on natural language processing and machine learning applied to quantitative trading and finance. (Press)
Academic descendant of Galileo, Newton, Leibniz, Euler, Bernoulli, Poisson, Laplace, Lagrange and Church (thanks to William Cohen) [1].

Former:

Chief Scientist at Oracle Alpha.
Senior Science Manager, AI Labs at Amazon Web Services (AWS), leading a team of 45+ applied research scientists and science managers, working closely with our product and engineering partners, to build novel AI-enabled cloud services driven by large scale machine learning and natural language understanding research, specifically in the areas of Large Language Models (LLM) and No-Code/Low-Code Application Development:

Amazon CodeWhisperer: Build apps faster with a large language model (LLM) powered coding companion.

Multi-lingual Evaluation of Code Generation Models, International Conference on Learning Representations (ICLR) (notable: top 25%), 2023. [paper]
Exploring Continual Learning for Code Generation Models, Association for Computational Linguistics (ACL), 2023. [paper]
Multitask Pretraining with Structured Knowledge for Text-to-SQL Generation, Association for Computational Linguistics (ACL), 2023. [paper]
ContraCLM: Contrastive Learning For Causal Language Model, Association for Computational Linguistics (ACL), 2023. [paper]
ReCode: Robustness Evaluation of Code Generation Models, Association for Computational Linguistics (ACL), 2023. [paper]
A Static Evaluation of Code Completion by Large Language Models, Association for Computational Linguistics (ACL), 2023. [paper]
Greener yet Powerful: Taming Large Code Generation Models with Quantization, Foundations of Software Engineering (FSE), 2023. [paper]
DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization, Association for Computational Linguistics (ACL), 2022. [paper, blog]
CoCoMIC: Code Completion By Jointly Modeling In-file and Cross-file Context, arXiv:2212.10007, 2023. [paper]
Constrained prefix matching for generating next token predictions. US patent application #20230418567 A1, 2023. [patent]
Programmatically generating evaluation data sets for code generation models. US patent application #20230418566 A1, 2023. [patent]
Random token segmentation for training next token prediction models. US patent application #20230419036 A1, 2023. [patent]
Validating and providing proactively generated code suggestions. US patent application #20230418565 A1, 2023. [patent]

Amazon Kendra: An intelligent search service powered by machine learning.
Contact Lens for Amazon Connect: Understanding the sentiment and trends of customer conversations in real-time.
Amazon Connect Voice ID: Real-time caller authentication using ML-powered voice analysis.
Amazon QuickSight Q: Natural language business intelligence querying.
Amazon Lookout for Metrics: Automatic time series anomaly detection and diagnosis.

Machine learning research at Google Research.
Area Chair at North American Chapter of the Association for Computational Linguistics (NAACL), 2021
Portfolio manager and research director at Cubist Systematic Strategies, applying machine learning to quantitative trading.
Cofounder, Partner and Portfolio Manager at Ophir Partners.
Chief technology officer at Trexquant.
Quantitative portfolio manager at WorldQuant.
Quantitative research intern at Merrill Lynch.
Machine learning research intern at Microsoft Research.
Machine learning research intern at IBM Research.
Software engineer at Bloomberg.

Teaching

News Analytics and Machine Learning (NYU FRE GY 7871, Fall 2022) (Fall 2021, Fall 2020, Fall 2019, Fall 2018)
This course introduces students to machine learning (ML) and natural language processing (NLP), in particular as used to develop quantitative trading strategies. Students learn the mathematical fundamentals underlying many of the latest ML and NLP techniques (including deep neural networks, embeddings, and sentiment models), along with the basics of developing practical quantitative trading strategies based on these insights (such as quantifying the positive or negative sentiment of text, determining the relevance of text to particular stocks or classes of stocks, and measuring the amount of novelty contained in textual content).

Patents

Constrained prefix matching for generating next token predictions. US patent application #20230418567 A1, 2023. Work done at Amazon Web Services. [patent]
Programmatically generating evaluation data sets for code generation models. US patent application #20230418566 A1, 2023. Work done at Amazon Web Services. [patent]
Random token segmentation for training next token prediction models. US patent application #20230419036 A1, 2023. Work done at Amazon Web Services. [patent]
Validating and providing proactively generated code suggestions. US patent application #20230418565 A1, 2023. Work done at Amazon Web Services. [patent]
Methods of unsupervised anomaly detection using a geometric framework. U.S. patent granted #8544087 B1, 2013. Work done at Columbia University. [patent]
Query-Dependent Ranking Using K-Nearest Neighbor. US patent application #20100169323 A1, 2008. Work done at Microsoft Research Asia. [patent]

Publications

Jing Wang, Jie Shen, Xiaofei Ma and Andrew O. Arnold
Uncertainty-based Active Learning for Reading Comprehension
Transactions on Machine Learning Research (TMLR), 2022. [paper, video, code]
Yuantong Li, Xiaokai Wei, Zijian Wang, Shen Wang, Parminder Bhatia, Xiaofei Ma and Andrew O. Arnold
Debiasing Neural Retrieval via In-batch Balancing Regularization
NAACL Workshop on Gender Bias in Natural Language Processing (NAACL:GeBNLP), 2022. [paper]
Zhihan Zhou, Dejiao Zhang, Wei Xiao, Nicholas Dingwall, Xiaofei Ma, Andrew O. Arnold and Bing Xiang
Learning Dialogue Representations from Consecutive Utterances
North American Chapter of the Association for Computational Linguistics (NAACL), 2022. [paper]
Xisen Jin, Dejiao Zhang, Henghui Zhu, Wei Xiao, Shang-Wen Li, Xiaokai Wei, Andrew O. Arnold and Xiang Ren
Lifelong Pretraining: Continually Adapting Language Models to Emerging Corpora
North American Chapter of the Association for Computational Linguistics (NAACL), 2022. [paper]
Danilo Neves Ribeiro, Shen Wang, Xiaofei Ma, Xiaokai Wei, Henghui Zhu, Rui Dong, Xinchi Chen, Peng Xu, Zhiheng Huang, Andrew O. Arnold and Dan Roth
Entailment Tree Explanations via Iterative Retrieval-Generation Reasoner
Findings of the North American Chapter of the Association for Computational Linguistics (NAACL), 2022. [paper]
Zheng Li, Zijian Wang, Ming Tan, Ramesh Nallapati, Parminder Bhatia, Andrew O. Arnold, Dan Roth and Bing Xiang
DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization
Association for Computational Linguistics (ACL), 2022. [paper, blog]
Mufan Sang, Haoqi Li, Fang Liu, Andrew O. Arnold and Li Wan
Self-Supervised Speaker Verification with Simple Siamese Network and Self-Supervised Regularization
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022. [paper]
Shen Wang, Xiaokai Wei, Cicero Nogueira dos Santos, Zhiguo Wang, Ramesh Nallapati, Andrew O. Arnold and Philip S. Yu
Knowledge Graph Representation via Hierarchical Hyperbolic Neural Graph Embedding
IEEE International Conference on Big Data (BigData), 2021. [paper]
Dejiao Zhang, Wei Xiao, Henghui Zhu, Xiaofei Ma and Andrew O. Arnold
Virtual Augmentation Supported Contrastive Learning of Sentence Representations
Findings of the Association for Computational Linguistics (ACL), 2022. [paper, code]
Andy T. Liu, Wei Xiao, Henghui Zhu, Dejiao Zhang, Shang-Wen Li and Andrew O. Arnold
QaNER: Prompting Question Answering Models for Few-shot Named Entity Recognition
arXiv:2203.01543, 2022. [paper]
Xiaokai Wei, Shen Wang, Dejiao Zhang, Parminder Bhatia and Andrew O. Arnold
Knowledge Enhanced Pretrained Language Models: A Comprehensive Survey
arXiv:2110.08455, 2021. [paper]
Dejiao Zhang, Shang-Wen Li, Wei Xiao, Henghui Zhu, Ramesh Nallapati, Andrew O. Arnold and Bing Xiang
Pairwise Supervised Contrastive Learning of Sentence Representations
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2021. [paper, code]
Yifan Gao, Henghui Zhu, Patrick Ng, Cicero Nogueira dos Santos, Zhiguo Wang, Feng Nan, Dejiao Zhang, Ramesh Nallapati, Andrew O. Arnold and Bing Xiang
Answering Ambiguous Questions through Generative Evidence Fusion and Round-Trip Prediction
Association for Computational Linguistics (ACL), 2021. [paper]
Feng Nan, Cicero Nogueira dos Santos, Henghui Zhu, Patrick Ng, Kathleen McKeown, Ramesh Nallapati, Dejiao Zhang, Zhiguo Wang, Andrew O. Arnold and Bing Xiang
Improving Factual Consistency of Abstractive Summarization via Question Answering
Association for Computational Linguistics (ACL), 2021. [paper]
Xiaofei Ma, Cicero Nogueira dos Santos and Andrew O. Arnold
Contrastive Fine-tuning Improves Robustness for Neural Rankers
Findings of the Association for Computational Linguistics (ACL), 2021. [paper]
Dejiao Zhang, Feng Nan, Xiaokai Wei, Shang-Wen Li, Henghui Zhu, Kathleen McKeown, Ramesh Nallapati, Andrew O. Arnold and Bing Xiang
Supporting Clustering with Contrastive Learning
North American Chapter of the Association for Computational Linguistics (NAACL), 2021. [paper, code]
Shen Wang, Xiaokai Wei, Cicero Nogueira dos Santos, Zhiguo Wang, Ramesh Nallapati, Andrew O. Arnold, Bing Xiang, Isabel F. Cruz and Philip S. Yu
Mixed-Curvature Multi-relational Graph Neural Network for Knowledge Graph Completion
The Web Conference (WWW), 2021. [paper]
Andrew O. Arnold and William W. Cohen
Instance-based Transfer Learning for Multilingual Deep Retrieval
The Web Conference Workshop on Multilingual Search (WWW), 2021. [paper]
Haitian Sun, Andrew O. Arnold, Tania Bedrax-Weiss, Fernando Pereira and William W. Cohen
Faithful Embeddings for Knowledge Base Queries
Neural Information Processing Systems (NeurIPS), 2020. [paper]
Cheng Tang and Andrew O. Arnold
Neural document expansion for ad-hoc information retrieval
arXiv:2012.14005, 2020. [paper]
Andrew O. Arnold
Exploiting Domain and Task Regularities for Robust Named Entity Recognition
Ph.D. Thesis, Carnegie Mellon University (CMU), 2009. [paper, slides, proposal, proposal slides]
Andrew O. Arnold and William W. Cohen
Information Extraction as Link Prediction: Using Curated Citation Networks to Improve Gene Detection
International AAAI Conference on Weblogs and Social Media (ICWSM), 2009. [paper, extended version, poster]
Amr Ahmed, Andrew O. Arnold, Luis Pedro Coelho, Joshua Kangas, Abdul-Saboor Sheikh, Eric Xing, William Cohen and Robert F. Murphy
Structured Literature Image Finder
ISMB BioLINK Special Interest Group (BioLINK), 2009. [paper]
Andrew O. Arnold and William W. Cohen
Intra-document Structural Frequency Features for Semi-supervised Domain Adaptation
Conference on Information and Knowledge Management (CIKM), 2008. [paper, slides]
Andrew O. Arnold, Ramesh Nallapati and William W. Cohen
Exploiting Feature Hierarchy for Transfer Learning in Named Entity Recognition
Association for Computational Linguistics: Human Language Technologies (ACL:HLT), 2008. [paper, slides]
Xiubo Geng, Tie-Yan Liu, Tao Qin, Andrew O. Arnold, Hang Li and Harry Shum
Query Dependent Ranking Using K-Nearest Neighbor
Special Interest Group on Information Retrieval (SIGIR), 2008. [paper]
Andrew O. Arnold, Ramesh Nallapati and William W. Cohen
A Comparative Study of Methods for Transductive Transfer Learning
International Conference on Data Mining Workshop on Mining and Management of Biological Data (ICDM), 2007. [paper, extended version, slides]
Andrew O. Arnold, Yan Liu and Naoki Abe
Temporal Causal Modeling with Graphical Granger Methods
International Conference on Knowledge Discovery and Data Mining (KDD), 2007. [paper, slides, video]
Andrew O. Arnold, Joseph E. Beck and Richard Scheines
Feature Discovery in the Context of Educational Data Mining: An Inductive Approach
AAAI Workshop on Educational Data Mining (AAAI), 2006. [paper]
Andrew O. Arnold, Richard Scheines, Joseph E. Beck and Bill Jerome
Time and Attention: Students, Sessions, and Tasks
AAAI Workshop on Educational Data Mining (AAAI), 2005. [paper]
Eleazar Eskin, Andrew O. Arnold, Michael Prerau, Leonid Portnoy and Salvatore Stolfo
A Geometric Framework for Unsupervised Anomaly Detection: Detecting Intrusions in Unlabeled Data
Applications of Data Mining in Computer Security, 2002. [paper]
Kristinn R. Thorisson, Hrvoje Benko, Denis Abramov, Andrew O. Arnold, Sameer Maskey, and Aruchunan Vaseekaran
Constructionist Design Methodology for Interactive Intelligences
AI Magazine (AAAI), 2004. [paper, abstract, article, video]
Andrew O. Arnold and Andrew Howard
Reinforcement Learning in the Presence of Hidden States
Computer Science Department, Columbia University, 2002. [paper]

Invited Talks

Commerce Foundation Models
NeurIPS 2024, Vancouver, BC, Canada (December 10, 2024). [Video and Slides]
A Shallow Introduction to Deep AI for Finance
AI Disruption in Investment Management Conference. Wolfe Research, New York, NY (May 22, 2023).
ChatGPT, NLP, Predictive Analytics: Is Artificial Intelligence Finally Here?
Battle of the Quants, Panel with Li Deng (Vatic/Citadel). University Club, New York, NY (May 10, 2023).
Recruiting for the Next Generation of Quant
Carnegie Mellon University's Master of Science in Computational Finance (MSCF) 25th Anniversary Celebration. Pittsburgh, PA (March 21, 2020) [postponed].
Building the Optimal Data/ML Teams
AI & Data Science in Trading Conference. New York, NY (March 17-18, 2020) [postponed].
Transfer Learning for Machine Learning and NLP: Adapting Models to Changing Markets
AI & Data Science in Trading Conference. New York, NY (March 17-18, 2020) [postponed].
Navigating Data Challenges with the Latest ML/AI Applications
AI & Data Science in Trading Conference. New York, NY (March 17-18, 2020) [postponed].
AI in the Workplace
CMU NYC Tech & Entrepreneurship Panel, with Manuela Veloso, Tom Doris and Evan Schnidman. Liquidnet, New York, NY (September 26, 2019).
Machine Learning Developments and Applications to Quantitative Trading
AI & Data Science in Trading Conference. New York, NY (March 19-20, 2019) [invited].
Transfer Learning for Quantitative Trading
machineByte 2018: The Global Machine Learning in Quantitative Investment Management Forum. Half Moon Bay, CA (December 13, 2018) [invited].
Machine Learning and Trading
Career Speaker Series. Bendheim Center for Finance, Princeton University, Princeton, NJ (March 29, 2017).
Intra-document Structural Frequency Features for Semi-supervised Domain Adaptation
Association for Computing Machinery Conference on Information and Knowledge Management (CIKM), Napa, CA (October 29, 2008). [slides]
Exploiting Document Structure and Feature Hierarchy for Semi-supervised Domain Adaptation
Machine Learning Lunch. Carnegie Mellon University, Pittsburgh, PA (September 29, 2008). [slides, video]
Exploiting Feature Hierarchy for Transfer Learning in Named Entity Recognition
46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL:HLT), Columbus, OH (June 16, 2008). [slides]
A Comparative Study of Methods for Transductive Transfer Learning
IEEE International Conference on Data Mining (ICDM) 2007 Workshop on Mining and Management of Biological Data, Omaha, NE (October 28, 2007). [slides]
Temporal Causal Modeling with Graphical Granger Methods
Thirteenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Jose, CA (August 13, 2007). [slides, video]
A Comparison of Methods for Transductive Transfer Learning
Information Retrieval and Mining Seminar. Microsoft Research Asia, Beijing, China (May 30, 2007). [slides]
Feature Discovery in the Context of Educational Data Mining: An Inductive Approach
IBM Mathematical Sciences Department Seminar. IBM Watson Research, Yorktown Heights, NY (July 6, 2006). [slides]
Causal Modeling for Anomaly Detection
IBM Mathematical Sciences Department 2006 Summer Student Seminar Series. IBM Watson Research, Yorktown Heights, NY (June 23, 2006). [slides]

Software

Network-based Information Extraction System (NIES), part of The Querendipity Project.