Big Data Science in Finance examines the mathematics, theory, and practical use of the revolutionary techniques that are transforming the industry. Designed for mathematically advanced students and discerning financial practitioners alike, this energizing book presents new, cutting-edge content based on world-class research taught in the world's leading Financial Mathematics and Engineering programs. Marco Avellaneda, a leader in quantitative finance, and quantitative methodology author Irene Aldridge help readers harness the power of Big Data.
Financial technology has been advancing steadily through much of the last 100 years, and through the last 50 or so years in particular. In the 1980s, for example, the main obstacle to implementing technology in financial companies was the prohibitively high cost of computers. Bloomberg and his peers helped usher in Fintech 1.0 by creating wide computer leasing networks that propelled data distribution, selected analytics, and more into trading rooms and research departments. The next break, Fintech 2.0, came in the 1990s: the Internet led the way in low-cost electronic trading, globalization of trading desks, a new frontier for data dissemination, and much more. Today, we find ourselves in the midst of Fintech 3.0: data and communications have been taken to the next level thanks to their sheer volume and 5G connectivity, while Artificial Intelligence (AI) and Blockchain create meaningful advances in the way we do business.
To summarize, Fintech 3.0 spans the A, B, C, and D of modern finance:
- A: Artificial Intelligence (AI)
- B: Blockchain technology and its applications
- C: Connectivity, including 5G
- D: Data, including Alternative Data
Big Data Science in finance spans the A and the D of Fintech, while benefiting immensely from B and C.
The intersection of just these two areas, AI and Data, comprises the field of Big Data Science. When applied to finance, the field is brimming with possibilities. Unsupervised learning, for example, is capable of removing the researcher's bias by eliminating the need to specify a hypothesis. As discussed in the classic book How to Lie with Statistics (Huff 1991), in traditional statistical or econometric analysis the outcome of a statistical experiment is only as good as the question posed. In the traditional environment, the researcher forms a hypothesis, and the data say "yes" or "no" to the researcher's ideas. Both the binary nature of the answer and the framing of the researcher's question may embed whatever biases the researcher holds.
As shown in this book, unsupervised learning, on the other hand, is hypothesis-free. You read that correctly: in unsupervised learning, the data are asked to produce their key drivers themselves. Such factorization enables us to abstract away human biases and distill the true data story.
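As a minimal sketch of this idea (not taken from the book, and using synthetic data for illustration), principal component analysis extracts a data set's dominant drivers from its covariance structure alone; the researcher specifies no hypothesis, and the factors emerge from the data:

```python
import numpy as np

# Hypothetical example: 1,000 observations of 5 correlated "asset returns"
# driven by one common factor plus idiosyncratic noise.
rng = np.random.default_rng(0)
market = rng.normal(0, 0.01, size=1000)              # common driver
noise = rng.normal(0, 0.002, size=(1000, 5))         # idiosyncratic noise
returns = market[:, None] * np.array([1.0, 0.8, 1.2, 0.9, 1.1]) + noise

# Unsupervised factorization: eigendecompose the sample covariance matrix.
# No yes/no question is posed; the data reveal their own dominant drivers.
X = returns - returns.mean(axis=0)
cov = X.T @ X / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)               # ascending eigenvalues
explained = eigvals[::-1] / eigvals.sum()            # variance share, descending

print(explained)  # the first factor accounts for most of the variance
```

Here the single common driver is recovered as the first principal component without anyone having hypothesized its existence; the names and parameters above are illustrative, not the book's.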
As an example, consider the case of minority lending. It is no secret that most traditional statisticians and econometricians are white males, who may carry race- and gender-specific biases with them throughout their analyses. For instance, when one looks at the sadly now-classic problem of lending in predominantly black neighborhoods, traditional modelers may pose hypotheses like "Is it worth investing our money there?," "Will the borrowers repay the loans?," and other yes/no questions biased from inception. Unsupervised learning, when given a sizable sample of the population, will instead deliver a set of individual characteristics within the population that the data deem important to lending, without yes/no arbitration or implicit assumptions.
What if the data inputs are biased? What if the inputs are collected in a way intended to dupe the machines into producing false outcomes? What if critical data are missing or, worse, erased? The answer to these questions often lies in data quantity. As this book shows, if your sample is large enough, numbering in the millions of data points, even missing or intentionally distorted data are cast off by unsupervised learning techniques, revealing simple data relationships unencumbered by anyone's opinion or influence.
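This robustness-through-scale claim can be sketched with synthetic data (again, an illustration of the general principle rather than the book's own method): corrupting a small fraction of entries in a large sample barely moves the dominant factor that principal component analysis recovers.

```python
import numpy as np

# Hypothetical illustration: distort 1% of entries in a large sample and
# check that the dominant PCA factor survives nearly unchanged.
rng = np.random.default_rng(1)
n = 100_000                                          # a large sample
market = rng.normal(0, 0.01, size=n)
loadings = np.array([1.0, 0.8, 1.2, 0.9, 1.1])
clean = market[:, None] * loadings + rng.normal(0, 0.002, size=(n, 5))

corrupt = clean.copy()
mask = rng.random(corrupt.shape) < 0.01              # 1% of entries distorted
corrupt[mask] = rng.normal(0, 0.05, size=mask.sum())

def top_factor(X):
    """Return the leading eigenvector of the sample covariance of X."""
    X = X - X.mean(axis=0)
    _, vecs = np.linalg.eigh(X.T @ X / (len(X) - 1))
    return vecs[:, -1]                               # largest eigenvalue is last

# Cosine similarity between the clean and corrupted leading factors.
similarity = abs(top_factor(clean) @ top_factor(corrupt))
print(similarity)  # close to 1: the factor survives the corruption
```

The distorted entries add roughly isotropic noise to the covariance matrix, which shifts eigenvalues but leaves the leading eigenvector nearly intact once the sample is large; the sizes and corruption rate above are illustrative choices.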
While many rejoice in the knowledge of unbiased outcomes, some are understandably wary of the impact that artificial intelligence may have on jobs. Will AI replace humans? Is it capable of eliminating jobs? The answers to these questions may surprise you. According to the Jevons paradox, when a new technology is convenient and simplifies daily tasks, it does not replace jobs but creates many new ones built around the invention. In finance, all previous Fintech innovations fit the bill: Bloomberg's terminals paved the way for the era of quants trained to work on structured data; the Internet brought in millions of individual investors. Similarly, advances in AI and the proliferation of all kinds of data will usher in a generation of new finance practitioners. This book offers a guide to the techniques that will realize the promise of this technology.
The book's chapters include:
- Why Big Data?
- Neural Networks in Finance
- Supervised Learning
- Modeling Human Behavior with Semi-Supervised Learning
- Letting the Data Speak with Unsupervised Learning
- Big Data Factor Models
- Data as a Signal versus Noise
- Applications: Unsupervised Learning in Option Pricing and Stochastic Modeling
- Data Clustering