
John Hawver
Hacking the Fed - A Machine Learning Exercise
This week the Federal Open Market Committee (the Fed) meets and makes yet another rate decision. The market fully expects the Fed to take a dovish stance. Need evidence? Look no further than the CME FedWatch tool, which publishes probabilities for rate cuts based on the futures markets. By the end of the year, the market thinks the target rate will be between 1.5 and 1.75 percent.
It seems the trade war(s) have powerfully colored the market’s future expectations. That’s called Recency Bias (link). Traders specialize in this skill, taking the most recent news and economic information (see my Noize Trading post) and extrapolating it far into the future.
The expectation that the Fed will be dovish for the rest of the year has certainly buoyed the stock market (see this post). But should we really want the Fed to cut rates?
I pulled the data (code below) and looked at the forward equity returns after each Fed decision since 2003. Like the market, the Fed trends: when it moves rates up, it goes slowly and steadily; when it cuts, it moves less often and more aggressively. Looking at the returns, especially the 60-day forward numbers, it looks like we really should be rooting for the Fed to have cause to raise, NOT lower, rates.
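To pin down the convention: the h-day forward return is the price h trading days after the decision, divided by the price on the decision date, minus one. Here’s a toy sketch (made-up prices) mirroring the fLead helper in the code below:
# toy sketch: h-day forward return = P(t+h) / P(t) - 1 (prices are made up)
prices <- c(100, 101, 103, 102, 105, 107)
h <- 2
fwd_h <- c(prices[(h + 1):length(prices)], rep(NA, h)) / prices - 1
round(fwd_h, 4) # first entry: 103/100 - 1 = 0.03; last h entries are NA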

Looking at this chart, I started wondering: can the decisions of these 12 board members be forecast? Amazon and Google seem to be able to guess my every next move on my phone; people are pretty predictable. Why should these 12 be any different? Plus, they have actually *told* us exactly what factors bear on their decisions in their meeting minutes, interviews, books, and academic papers. And they have the explicit goals of “price stability and sustainable economic growth.” The Fed is just begging to be hacked.
So let’s get to it. After a reasonable review of the above sources, I boiled the Fed’s decision making down to 8 features: the previous 3 months of S&P 500 returns, the GDP deflator, real GDP, the yield curve slope (10-year minus 2-year), the change in the unemployment rate, the change in non-farm payrolls, the change in CPI, and the change in industrial production.
Some of these features are correlated, and the relationship between these features and the Fed’s decisions certainly does not have to be linear. Given this, I decided to use a machine learning technique called Random Forest (here’s a link that provides a good explanation). In short, it is a collection of decision trees, and a decision tree is actually a bit simpler to understand than a standard regression. All in, we’re using a technique a bit fancier than a multi-factor regression.
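If you want a feel for what the forest buys you before wading into the full code, here is a minimal, self-contained sketch on simulated data (toy features, not the Fed model): the forest’s importance scores pick up a nonlinear effect that a straight linear fit would understate.
# toy illustration only: three made-up features, one with a nonlinear effect
library('randomForest')
set.seed(42)
toy_X <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
toy_y <- 0.5 * toy_X$x1 - 0.3 * toy_X$x2^2 + rnorm(200, sd = 0.1) # x3 is pure noise
toy_fit <- randomForest(y = toy_y, x = toy_X, importance = TRUE)
importance(toy_fit) # x1 and x2 should rank high, x3 near zero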
Let’s not get too bogged down in the gory statistical details. You can follow my approach in the code below, which is replicable in R with minimal work. We have a small data set: 61 Fed decisions. I calculated the 8 features above, linked them to the decision dates, and split the data into one set for training the model and one for testing it, standard practice. The training results look pretty decent:
    R2      F1    Ac  AUC
 0.839  0.8571  0.75  0.8
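If F1 is rusty, here is how the classification numbers fall out of a confusion matrix of up/down calls (the counts below are made up for illustration, not the model’s):
# toy confusion matrix: rows = actual, cols = predicted (made-up counts)
cm <- matrix(c(8, 2, 1, 4), nrow = 2, byrow = TRUE,
             dimnames = list(c('Down', 'Up'), c('Down', 'Up')))
accuracy <- sum(diag(cm)) / sum(cm) # (8 + 4) / 15 = 0.8
precision <- cm['Down', 'Down'] / sum(cm[, 'Down']) # 8 / 9
recall <- cm['Down', 'Down'] / sum(cm['Down', ]) # 8 / 10
f1 <- 2 * precision * recall / (precision + recall) # harmonic mean, ~0.84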
Most will recognize the R2 metric. At 84%, that’s not bad. Out of sample, with a test set of only 12 data points, the R2 holds up, but the accuracy falls. That’s not unexpected; for a data set this small, we’ll have to roll with it. The results are in the plots below.


The first plot above shows how the predicted rate changes line up with the actual ones. Again, not too bad. In the second, we can see how important each factor is to the Fed’s thinking. Apparently, the stock market, employment, GDP, and the yield curve have the most bearing when the Fed meets.
Here’s the kicker. We’ve built this model, so what does it say the Fed will do this week?
+20 basis points.
So, Mr. Market’s dovish optimism may be wishful thinking. I’m not an economist, nor am I a “strategist” at an investment bank with a sales story to spin, so take my model’s prediction with a healthy dose of salt. And note that most of the model’s features will not incorporate the effects of the trade war just yet. But, people are predictable, and the Fed, for all its mystery, is just 12 people.
Time for me to go click on that ad for a new hockey stick…
Good luck this week!
John
##### Code to Replicate #####
# setup
library('httr')
library('XML')
library('quantmod')
library('Quandl'); Quandl.api_key("<your key>")
library('randomForest')
library('randomForestExplainer')
library('pROC')
library('reshape') # for melt() with the variable_name argument
library('ggplot2') # the plots below use ggplot()
# helper functions
fGetHTMLTable <- function(url) { try(readHTMLTable(rawToChar(GET(url)$content), stringsAsFactors = F), silent = T) } # fetch a page and parse its HTML tables
fLead <- function(vec, leadN) { c(vec[(leadN + 1):(length(vec)) ], rep(NA, leadN)) } # pulls data from t+h to t
fLag <- function(vec, lagN) { c(rep(NA, lagN), vec[1:(length(vec) - lagN)]) } # pushes data from t to t+h
fFitStatsSimple <- function(target, prediction, round_digits = 4, print_cm = F) {
fConvert2Direction <- function(in_vector) {
direction <- as.numeric(as.numeric(as.vector(in_vector)) > 0)
direction[direction == 1] <- 'Up'; direction[direction == 0] <- 'Down'
direction
}
fAccuracy <- function(CM) { sum(diag(CM))/sum(CM) }
fPrecision <- function(CM) { CM[1, 1] / sum(CM[, 1]) } # correct 'Down' calls / predicted 'Down'
fRecall <- function(CM) { CM[1, 1] / sum(CM[1, ]) } # correct 'Down' calls / actual 'Down'
fF1Score <- function(CM) { 2 / ((1/fRecall(CM)) + (1/fPrecision(CM))) } # harmonic mean of precision and recall
cm <- table(fConvert2Direction(target), fConvert2Direction(prediction))
temp <- na.omit(cbind(target, prediction))
R2 <- as.numeric(cor(temp[, 1], temp[, 2])^2)
F1 <- fF1Score(cm)
Acc <- fAccuracy(cm)
AUC <- auc(roc(as.numeric(target>0), as.numeric(prediction>0)))
df <- data.frame(R2 = round(R2, round_digits),
F1 = round(F1, round_digits),
Ac = round(Acc, round_digits),
AUC = round(AUC, round_digits))
if (print_cm) { print(cm); cat('\n'); print(df) }
return(df)
}
# get fed dates and clean up data
fed_wiki_page <- fGetHTMLTable('https://en.wikipedia.org/wiki/History_of_Federal_Open_Market_Committee_actions')
fedwp_df <- fed_wiki_page[[1]]
names(fedwp_df) <- fedwp_df[1, ]
fedwp_df <- fedwp_df[-1, ]
fedwp_df$Date <- as.Date(fedwp_df$Date, format = '%B %d, %Y')
fedwp_df$DiscRate <- as.numeric(unlist(strsplit(fedwp_df$`Discount Rate`, '%'))) # strip the '%' signs
fedwp_df$DR_diff <- c(-diff(fedwp_df$DiscRate), NA) # table runs newest-first, so negate diff to get the change made at each meeting
fed_xts <- xts(fedwp_df$DR_diff, as.Date(fedwp_df$Date))
# Get SPY data and construct aligned data with forward returns
spy_q <- Quandl('EOD/SPY')
spy <- xts(spy_q[, 'Adj_Close'], spy_q$Date); names(spy) <- 'Adj_Close'
spy$Rtn_1d_fwd <- (fLead(coredata(spy$Adj_Close), 1) / spy$Adj_Close) - 1
spy$Rtn_5d_fwd <- (fLead(coredata(spy$Adj_Close), 5) / spy$Adj_Close) - 1
spy$Rtn_10d_fwd <- (fLead(coredata(spy$Adj_Close), 10) / spy$Adj_Close) - 1
spy$Rtn_20d_fwd <- (fLead(coredata(spy$Adj_Close), 20) / spy$Adj_Close) - 1
spy$Rtn_60d_fwd <- (fLead(coredata(spy$Adj_Close), 60) / spy$Adj_Close) - 1
# subset spy by the fed dates
fed_dates <- c(as.Date(fedwp_df$Date), as.Date('2008-03-17')) # note this is to account for a sunday emergency meeting
spy_fed <- spy[fed_dates, ]
spy_fed$fed_change <- fed_xts[, 1]
# explore the data
unique(spy_fed$fed_change)
summary(spy_fed)
hist(spy_fed$fed_change, 10)
# melt data and a ggplot of returns vs changes, per horizon
sfdf <- as.data.frame(spy_fed)
sfdf$Date <- as.Date(rownames(sfdf))
sfdf <- sfdf[, c('Date', 'fed_change', 'Rtn_1d_fwd', 'Rtn_5d_fwd', 'Rtn_10d_fwd', 'Rtn_20d_fwd', 'Rtn_60d_fwd')]
sfm <- melt(data = sfdf, id.vars = c('Date', 'fed_change'), variable_name = 'Fwd_Returns')
ggplot(sfm, aes(x=fed_change, y=value, col=Fwd_Returns)) + geom_point(size = 4) + xlab('Fed Rate Change') +
ylab('Forward Equity Returns') + ggtitle('Forward Equity Returns vs. Fed Rate Changes')
# can we predict rate changes?
# get data
spy$Rtn_60d_prev <- (spy$Adj_Close / fLag(coredata(spy$Adj_Close), 60)) - 1
GDPC1 <- getSymbols('GDPC1', src = 'FRED', auto.assign = F) # Real Gross Domestic Product
GDPDEF <- getSymbols('GDPDEF', src = 'FRED', auto.assign = F) # Gross Domestic Product: Implicit Price Deflator
tsy2y <- getSymbols('DGS2', src='FRED', auto.assign = F) # 2y UST rate
tsy10y <- getSymbols('DGS10', src='FRED', auto.assign = F) # 10y UST rate
UNRATE <- getSymbols('UNRATE', src = 'FRED', auto.assign = F) # civilian unemployment rate
NFP <- getSymbols('PAYEMS', src = 'FRED', auto.assign = F) # NFP
CPI <- getSymbols('CPIAUCSL', src = 'FRED', auto.assign = F) # cpi
INDPROD <- getSymbols('INDPRO', src = 'FRED', auto.assign = F) # industrial production
# construct features, merge and trim to FOMC dates
SPYRTNS_feature <- spy$Rtn_60d_prev # trailing ~3-month (60 trading day) equity return
GDPDEF_feature <- 100 * diff(GDPDEF, 4) / GDPDEF # approx. YoY % change (4 quarters)
GDPC1_feature <- log(GDPC1) * 100 # log real GDP level, scaled
CURVE_feature <- tsy10y - tsy2y; names(CURVE_feature) <- 'Curve' # 10y-2y slope
UNRATE_feature <- 100 * diff(UNRATE, 12) / UNRATE # approx. YoY % change (12 months)
NFP_feature <- 100 * diff(NFP, 12) / NFP; names(NFP_feature) <- 'NFP' # approx. YoY % change
CPI_feature <- 100 * diff(CPI, 12) / CPI # approx. YoY % change
INDPROD_feature <- 100 * diff(INDPROD, 12) / INDPROD # approx. YoY % change
features <- na.omit(na.locf(merge(SPYRTNS_feature, GDPDEF_feature, GDPC1_feature, CURVE_feature, UNRATE_feature, NFP_feature, CPI_feature, INDPROD_feature)))
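# note: na.locf carries the monthly/quarterly series forward so everything lines up on daily dates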
feat_fed <- features[fed_dates, ]
feat_names <- names(feat_fed)
feat_last <- features[as.Date('2019-06-14'), ]
# Split data into training and test; Construct model
Y <- as.vector(fed_xts[, 1])[-1] # drop the first observation (its differenced rate change is NA)
X <- as.matrix(feat_fed)[-1, ]
X_last <- as.matrix(feat_last)
split <- floor(.8 * length(Y))
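# note: the 80/20 split is chronological, no shuffling; earliest decisions train, the most recent dozen test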
Y_trn <- Y[1:split]
X_trn <- X[1:split, ]
Y_tst <- Y[(split+1):length(Y)]
X_tst <- X[(split+1):length(Y), ]
# random forest
rf_fit <- randomForest(y = Y_trn, x = X_trn, mtry = ceiling(sqrt(length(feat_names))), importance = T)
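# mtry = ceiling(sqrt(8)) = 3 features tried per split; sqrt(p) is the usual classification default, borrowed here for regression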
importance(rf_fit); varImpPlot(rf_fit)
# model stats
Y_hat_rf_trn <- predict(rf_fit, newdata = X_trn)
Y_hat_rf_tst <- predict(rf_fit, newdata = X_tst)
fFitStatsSimple(Y_trn, Y_hat_rf_trn, print_cm = T)
fFitStatsSimple(Y_tst, Y_hat_rf_tst, print_cm = T)
# Fit whole data set
rf_fit_all <- randomForest(y = Y, x = X, mtry = ceiling(sqrt(length(feat_names))), importance = T)
importance(rf_fit_all); varImpPlot(rf_fit_all);
Y_hat_all <- predict(rf_fit_all, newdata = X)
fFitStatsSimple(Y, Y_hat_all, print_cm = T)
# Plot a scatter plot
plt_df <- data.frame(actual = Y, prediction = Y_hat_all)
ggplot(data = plt_df, aes(x = prediction, y = actual)) +
geom_point(color = 'red') + xlab('Predicted Fed Changes') + ylab('Actual Fed Changes') +
geom_smooth(method = 'lm') + ggtitle('Actual Fed Changes vs Predicted Fed Changes')
# final prediction
fed_prediction <- predict(rf_fit_all, newdata = X_last)
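fed_prediction # prints the model's call for the upcoming meeting (the +20bp figure quoted above)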
# Tree plots
min_depth_frame <- min_depth_distribution(rf_fit_all)
plot_min_depth_distribution(min_depth_frame)
importance_frame <- measure_importance(rf_fit_all)
plot_multi_way_importance(importance_frame, size_measure = "no_of_nodes")
plot_multi_way_importance(importance_frame, x_measure = "mse_increase", y_measure = "node_purity_increase", size_measure = "p_value", no_of_labels = 5)