Deep probabilistic programming (DPP) combines three fields: Bayesian statistics and machine learning, deep learning, and probabilistic programming. Many applications in trading and risk management, such as portfolio optimization and risk estimation, involve uncertainty quantification. DPP provides a general purpose Bayesian framework for fitting high dimensional models to large datasets. Wielded correctly, it provides a richer set of statistical properties and more explanatory power than deterministic models such as deep learning. Furthermore, the increasing low-cost compute power made available through many-core processors can be readily exploited by open-source DPP packages such as Edward. This short positional article provides a brief and informal background on deep probabilistic programming and discusses its implications for trading, risk management and financial stability modelling.
Deep probabilistic programming (DPP) combines three fields: Bayesian statistics and machine learning, deep learning (DL), and probabilistic programming. Although in its infancy, DPP is a powerful combination of several different probabilistic modelling approaches and inference techniques that have historically been treated as separate mathematical ideas in the financial modelling literature. Modelling approaches include Bayesian neural networks (BNNs) and directed graphical models, supported by a wide range of techniques for efficient and robust model training. DPP can be regarded as a compositional modelling framework, composing previously disparate modelling concepts and algorithms over a scalable computational graph implementation. Just as TensorFlow's (Abadi et al., 2016) computational graphs have made DL synonymous with modelling high dimensional input spaces with very large data sets, so too is Edward (Tran et al., 2017) poised to make DPP a tool of choice alongside Stan (Carpenter et al., 2017) and PyMC3 (Salvatier, Wiecki, & Fonnesbeck, 2016b) for Bayesian modelling and inference of computationally intractable problems.
1.1 Gaussian Processes
In order to explore some of the key capabilities of DPP for financial modelling and uncertainty quantification, we shall first revisit more familiar territory: Gaussian processes (GPs). The idea of GPs is to place a prior P[F(x)] directly on the space of functions, without parameterizing F(x) (MacKay, 1998). Gaussian processes can hence be viewed as a generalization of Gaussian distributions from finite dimensional vector spaces to infinite dimensional function spaces. A main advantage of GPs in finance is in modelling co-movements of time series. GPs learn the relationship between the time series directly, building up an empirical model of the co-movements rather than assuming linear correlations between stochastic processes.
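To make the idea of a prior over functions concrete, the following is a minimal sketch of GP regression in NumPy. The squared-exponential kernel, its hyperparameters, the noise variance and the synthetic sine data are all assumptions chosen for illustration; they are not taken from any of the cited works.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs (assumed kernel)."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return variance * np.exp(-0.5 * sq_dist / length_scale ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-2):
    """Posterior mean and covariance of a zero-mean GP at the test inputs."""
    k_tt = rbf_kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    k_ts = rbf_kernel(x_train, x_test)
    k_ss = rbf_kernel(x_test, x_test)
    chol = np.linalg.cholesky(k_tt)                        # k_tt = chol @ chol.T
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    v = np.linalg.solve(chol, k_ts)
    mean = k_ts.T @ alpha                                  # posterior mean
    cov = k_ss - v.T @ v                                   # posterior covariance
    return mean, cov

rng = np.random.default_rng(0)
x_train = np.linspace(-3.0, 3.0, 15)
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(x_train.shape)  # synthetic data
x_test = np.array([0.0, 6.0])   # one point inside the data range, one far outside
mean, cov = gp_posterior(x_train, y_train, x_test)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
print("posterior mean:", mean)
print("posterior std: ", std)
```

The posterior standard deviation is small where data are available and reverts towards the prior standard deviation away from the data, which is the kind of explicit uncertainty statement a deterministic fit does not provide.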
1.2 Bayesian Neural Networks
Gaussian processes have been shown to be a limiting case of Bayesian neural networks (BNNs). BNNs offer a probabilistic interpretation of neural network models by inferring distributions over the models' weights. To model uncertainty, BNNs assume that model parameters (weights and biases) are random variables. A prior distribution is placed over the weights and biases, which induces a distribution over a parametric set of functions. Neal (2012) and Williams (1997) show that, as the number of weights goes to infinity, Bayesian neural networks converge to GPs when a standard matrix of Gaussian prior distributions is placed over the weights and point estimates are used for the priors. Of course, there is no reason to restrict modelling assumptions to these limiting conditions and, in fact, one of the main benefits of either approach is the flexibility to combine different distributions and include interaction effects.
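The way a prior over weights induces a distribution over functions can be illustrated numerically. The sketch below draws one-hidden-layer networks from a standard Gaussian prior and compares the distribution of the output at a single input for a narrow and a wide hidden layer. The tanh activation, the 1/sqrt(width) scaling of the output weights and the use of excess kurtosis as a rough check of Gaussianity are assumptions made for this illustration; it is a numerical hint at the limiting behaviour, not a proof.

```python
import numpy as np

def sample_bnn_prior(x, n_hidden, n_draws, rng):
    """Evaluate n_draws networks sampled from the prior at the inputs x.

    Weights and biases are i.i.d. standard Gaussians; output weights are
    scaled by 1/sqrt(n_hidden) so the output variance stays finite as the
    hidden layer widens (assumed scaling).
    """
    w1 = rng.standard_normal((n_draws, n_hidden, 1))
    b1 = rng.standard_normal((n_draws, n_hidden, 1))
    w2 = rng.standard_normal((n_draws, 1, n_hidden)) / np.sqrt(n_hidden)
    hidden = np.tanh(w1 * x[None, None, :] + b1)   # shape (n_draws, n_hidden, n_x)
    return (w2 @ hidden)[:, 0, :]                  # shape (n_draws, n_x)

def excess_kurtosis(samples):
    """Sample excess kurtosis; zero for a Gaussian distribution."""
    centred = samples - samples.mean()
    return np.mean(centred ** 4) / np.mean(centred ** 2) ** 2 - 3.0

rng = np.random.default_rng(0)
x = np.array([1.0])
narrow = sample_bnn_prior(x, n_hidden=1, n_draws=20000, rng=rng)[:, 0]
wide = sample_bnn_prior(x, n_hidden=200, n_draws=20000, rng=rng)[:, 0]
kurt_narrow = excess_kurtosis(narrow)
kurt_wide = excess_kurtosis(wide)
print("excess kurtosis, 1 hidden unit:   ", kurt_narrow)
print("excess kurtosis, 200 hidden units:", kurt_wide)
```

With a single hidden unit the output distribution is visibly heavy-tailed, while with a few hundred units its excess kurtosis is close to zero, consistent with the Gaussian limit described above.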
2 Deep Learning
DL has shown remarkable success in a wide range of applications, including Artificial Intelligence (AI) (DeepMind, 2016; Kubota, 2017; Esteva et al., 2017), image processing (Simonyan & Zisserman, 2014), learning in games (DeepMind, 2017), neuroscience (Poggio, 2016), energy conservation (DeepMind, 2016) and skin cancer diagnostics (Kubota, 2017; Esteva et al., 2017). Current advances in hardware (GPU/TPU) and software (Theano/TensorFlow) allow for developing scalable deep learners. However, in many trading and risk management applications, it is necessary to model the uncertainty of a predictive model. Deep learners are typically used to build deterministic approximations to an unknown nonlinear function and are largely unusable when uncertainty needs to be explicitly represented. Deep probabilistic programming builds on deep learners by providing inferential techniques to estimate the posterior distribution and hence characterize the uncertainty in a model estimate.
2.1 Bayesian Computations
From a Bayesian perspective, the choice of architecture of a deep network, combined with regularization, can be viewed as defining a prior probability distribution over non-linear functions, and deep learning can be viewed as finding the posterior probability distribution of the unknown function over the network weights (Gal, 2016). In general, the posterior distribution is analytically intractable and Bayesian computational methods are required to approximate it. One common class of solution approaches, referred to as variational inference, solves an optimization problem by minimizing the Kullback-Leibler (KL) divergence between an approximating variational distribution and the posterior obtained from the original model. Bayes' filtering techniques, such as Kalman filters, HMMs and particle filters, are examples of variational inference techniques commonplace in trading and risk management, albeit for low dimensional input spaces. The other well-known class of posterior approximation techniques uses sampling rather than variational techniques, a prime example being MCMC simulation. Neither of these inference engines, however, scales well with respect to model and data size.
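Variational inference as an optimization problem can be sketched on a toy model where the exact posterior is known. The example below assumes a conjugate Gaussian model (unknown mean with a Gaussian prior, unit observation noise) and a Gaussian variational family, and maximizes the evidence lower bound, which is equivalent to minimizing the KL divergence to the posterior. The model, the synthetic data, the step size and the hand-derived gradients are assumptions for illustration; DPP packages obtain such gradients by automatic differentiation instead.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=0.5, scale=1.0, size=50)   # synthetic observations
n, prior_var = len(y), 10.0 ** 2

# Exact posterior for the assumed model: mu ~ N(0, prior_var), y_i ~ N(mu, 1)
precision = n + 1.0 / prior_var
exact_mean = y.sum() / precision
exact_std = precision ** -0.5

# Variational family q(mu) = N(m, exp(rho)^2), fitted by gradient ascent on
# the evidence lower bound (ELBO), whose gradients are available in closed form
m, rho, step = 0.0, 0.0, 0.01
for _ in range(2000):
    s2 = np.exp(2.0 * rho)
    grad_m = y.sum() - precision * m     # d ELBO / d m
    grad_rho = 1.0 - precision * s2      # d ELBO / d rho
    m += step * grad_m
    rho += step * grad_rho

vi_mean, vi_std = m, np.exp(rho)
print("exact posterior:      ", exact_mean, exact_std)
print("variational posterior:", vi_mean, vi_std)
```

Because the variational family here contains the exact posterior, the optimizer recovers it. For a deep network the family is only an approximation, and the remaining KL divergence measures what is lost.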
Recent advances in variational inference techniques and software can now represent probabilistic models as a computational graph (Blundell, Cornebise, Kavukcuoglu, & Wierstra, 2015; Tran et al., 2017; Salvatier, Wiecki, & Fonnesbeck, 2016a). The key benefit here is the ability to build scalable probabilistic deep learners without having to implement testing (forward propagation) or inference (gradient-based optimization, with back propagation and automatic differentiation) from scratch. Alternatives to variational and MCMC algorithms were recently proposed by Gal (2016) and build on efficient Dropout regularization techniques, a variable selection approach which has fuelled the popularity of DL.
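The dropout-based approach can be sketched as keeping the dropout masks switched on at prediction time and summarizing the spread of repeated stochastic forward passes. The following is a minimal NumPy illustration under stated assumptions: the network weights are random placeholders standing in for a trained model, and the layer sizes, keep probability and number of passes are arbitrary choices.

```python
import numpy as np

def mc_dropout_predict(x, w1, b1, w2, b2, keep_prob, n_passes, rng):
    """Mean and standard deviation over stochastic forward passes with dropout."""
    preds = []
    for _ in range(n_passes):
        hidden = np.maximum(0.0, x @ w1 + b1)          # ReLU hidden layer
        mask = rng.random(hidden.shape) < keep_prob    # fresh dropout mask per pass
        hidden = hidden * mask / keep_prob             # inverted dropout scaling
        preds.append(hidden @ w2 + b2)
    preds = np.stack(preds)                            # (n_passes, n_samples, 1)
    return preds.mean(axis=0), preds.std(axis=0)

rng = np.random.default_rng(0)
n_features, n_hidden = 3, 64
# Placeholder weights standing in for a trained network (hypothetical values)
w1 = rng.standard_normal((n_features, n_hidden)) / np.sqrt(n_features)
b1 = np.zeros(n_hidden)
w2 = rng.standard_normal((n_hidden, 1)) / np.sqrt(n_hidden)
b2 = np.zeros(1)

x = rng.standard_normal((5, n_features))
pred_mean, pred_std = mc_dropout_predict(
    x, w1, b1, w2, b2, keep_prob=0.8, n_passes=200, rng=rng
)
print(np.hstack([pred_mean, pred_std]))
```

The standard deviation across passes serves as an approximate measure of predictive uncertainty at no extra training cost, since only the forward pass is repeated.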
The combination of Edward and TensorFlow is the first time that we have seen large-scale Bayesian modelling fused with AI and HPC, and this is likely to spawn critical developments in applications where uncertainty and model risk play a central role, such as finance. The remainder of this article briefly discusses some areas which are likely to be suitable financial modelling applications for DPP.
3 Financial Applications
Oftentimes, an automated decision to trade or a price forecast may be shrouded in uncertainty arising from noisy data or model risk, either through incorrect model assumptions or parameter error. This uncertainty should be accounted for in predictions so that the model output can be more clearly interpreted. DPP thrives in high dimensional input spaces. In trading and risk management applications, high dimensionality may arise naturally where there is spatial structure in the data. Examples include deep portfolios (Heaton, Polson, & Witte, 2017), large scale loan modelling (Sirignano, Tsoukalas, & Giesecke, 2016), predictors for prices using the many depth levels in the limit order book (Dixon, Polson, & Sokolov, 2017; Dixon, 2017; Sirignano, 2016), predictors arising from combining several data sets as in mortgage risk (Sirignano, Sadhwani, & Giesecke, 2016) or consumer lending (Dixon, Giesecke, Sheshardi, & Troha, 2017), or from text mining and natural language processing of news and other documents (Glasserman & Mamaysky, 2015).
Portfolio Optimization and Market Risk
For some of these applications, we need look no further than existing applications of Gaussian processes to finance. da Barrosa, Salles, and de Oliveira Ribeiro (2016) present a spatio-temporal GP method for optimizing financial asset portfolios which allows for approximating the risk surface. Cousin, Maatouk, and Rulliere (2016) propose a new term-structure interpolation method that extends classical spline techniques by quantifying uncertainty. Other examples of GPs include spatial meta-modelling for Expected Shortfall through nested simulation (Liu & Staum, 2012). Spatio-temporal GPs are used to infer portfolio values in a scenario based on inner-level simulation of nearby scenarios. This significantly reduces the required computational effort by avoiding inner-level simulation in every scenario and takes account of the variance that arises from inner-level simulation. DPP could improve these existing applications by using Bayesian neural networks and robust regularization techniques to generalize the models and improve their robustness and scalability.
Contagion and Credit Risk
Capturing dependencies between financial institutions is a significant aspect of contagion and counterparty credit risk modelling. Examples in contagion modelling include the inter-dependency between banks participating in the Federal Reserve System's emergency programs (Battiston, Puliga, Kaushik, Tasca, & Caldarelli, 2012), and joint dependencies of banks and other institutional investors through transactions in the repurchase agreement market or through various other channels of systemic risk (Bisias, Flood, Lo, & Valavanis, 2012). Other examples in counterparty credit risk modelling include wrong way risk assessment of OTC portfolios. It is well known that the dependence structure between variables can be explicitly modelled with a directed graph. Given this capability, a promising research direction is how to use probabilistic graphical programming to build Bayesian network copulas (Elidan, 2010) for contagion and credit risk.
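As a simple illustration of dependence encoded by a directed graph, the sketch below samples from a three-node network in which the distress of one institution raises the default probability of two others. This is a plain discrete Bayesian network, not a Bayesian network copula, and the graph structure and all probabilities are hypothetical values chosen for the example.

```python
import numpy as np

# Hypothetical directed graph: A -> B and A -> C
P_A_DISTRESS = 0.05                       # marginal probability that A is distressed
P_B_DEFAULT = {True: 0.40, False: 0.02}   # P(B defaults | A distressed?)
P_C_DEFAULT = {True: 0.30, False: 0.01}   # P(C defaults | A distressed?)

def sample_network(n_samples, rng):
    """Ancestral sampling: draw the parent first, then children given the parent."""
    a = rng.random(n_samples) < P_A_DISTRESS
    b = rng.random(n_samples) < np.where(a, P_B_DEFAULT[True], P_B_DEFAULT[False])
    c = rng.random(n_samples) < np.where(a, P_C_DEFAULT[True], P_C_DEFAULT[False])
    return a, b, c

rng = np.random.default_rng(0)
a, b, c = sample_network(200_000, rng)
joint_bc = np.mean(b & c)                    # estimated P(B and C both default)
independent_bc = np.mean(b) * np.mean(c)     # what independence would imply
print("joint default probability: ", joint_bc)
print("product of the marginals:  ", independent_bc)
```

Although B and C have no direct link, their shared dependence on A makes joint defaults several times more likely than the product of their marginal default probabilities, which is the kind of effect contagion models aim to capture.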
Deep probabilistic programming (DPP) combines three fields: Bayesian statistics and machine learning, deep learning, and probabilistic programming. Many applications in trading and risk management involve uncertainty quantification, such as portfolio optimization and risk estimation. DPP provides a general purpose Bayesian framework for fitting high dimensional models to large datasets to provide a richer set of statistical properties and more explanatory power.
This article provides some background on deep probabilistic programming and discusses its implications for trading and risk management. In particular, we see merit in exploring high dimensional modelling problems arising in large-scale portfolios, in systemic risk and contagion in financial networks, and wherever spatial structure is present in the application, such as limit order book modelling. Tools such as Edward lower the barrier to entry and add fuel to the fire ignited by TensorFlow. The barrier to entry is deceptively low, however, and some discipline is needed. The programming convenience of Keras for TensorFlow is alluring, but the ability to build effective and interpretable financial models is still in its infancy and should be investigated more fully by cross-disciplinary researchers in computational science, statistics and finance before any definitive and practical conclusions can be drawn.
The author would like to thank Keiran Thompson, Stanford University, and Saeed Amen, Cuemacro, for their useful feedback.
Matthew Dixon is a co-founder of the Thalesians and an Assistant Professor of Finance at the Illinois Institute of Technology, Chicago. Matthew will be speaking at two sessions at the Global Derivatives USA event, on machine learning and the impacts on trading and risk management.