Time series as a Bayesian regression model with TensorFlow Probability and ArviZ

Utkarsh Maheshwari
3 min read · Jul 22, 2021

Let’s go step by step and build a regression model for time series data. A time series can be modelled as a sum of multiple components, such as trend, seasonality and residuals (there could be more, but these are the three classical components).
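As a rough sketch of that idea (purely synthetic numbers, unrelated to the dataset we use below):

import numpy as np

t = np.arange(120)                                  # 10 years of monthly time steps
trend = 0.05 * t                                    # slow upward drift
seasonality = 2.0 * np.sin(2 * np.pi * t / 12)      # repeating yearly cycle
residual = np.random.normal(0., 0.5, size=len(t))   # unexplained noise
y = trend + seasonality + residual                  # observed series = sum of components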

Let’s begin with some formalities.

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import xarray as xr
import tensorflow as tf
import tensorflow_probability as tfp

We’ll use a demo dataset. Atmospheric CO₂ measurements have been taken regularly at the Mauna Loa observatory in Hawaii since the late 1950s at hourly intervals. The observations are aggregated into monthly averages.

co2_by_month = pd.read_csv("/content/monthly_mauna_loa_co2.csv")
co2_by_month["DateMonth"] = pd.to_datetime(co2_by_month["DateMonth"])
co2_by_month["CO2"] = co2_by_month["CO2"].astype(np.float32)
co2_by_month.set_index("DateMonth", drop=True, inplace=True)
co2_by_month.plot()
plt.xlabel("year")

Trend: a linear predictor to capture the upward trend we see in the data.

Seasonality: a month-of-the-year predictor used directly to index a vector of regression coefficients.

trend_all = np.linspace(0., 1., len(co2_by_month)).astype(np.float32)
seasonality_all = pd.get_dummies(co2_by_month.index.month).values.astype(np.float32)
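A quick sanity check on the predictor shapes (my own check, matching the einsums used in the model below):

# trend_all is a 1-D ramp; seasonality_all is a one-hot month matrix
print(trend_all.shape)        # (n_months,)
print(seasonality_all.shape)  # (n_months, 12)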

Now let’s write down the first time series model using TFP.

tfd = tfp.distributions
root = tfd.JointDistributionCoroutine.Root

@tfd.JointDistributionCoroutine
def ts_regression_model():
    intercept = yield root(tfd.Normal(0., 100., name='intercept'))
    trend_coeff = yield root(
        tfd.Normal(0., 10., name='trend_coeff')
    )
    seasonality_coeff = yield root(
        tfd.Sample(
            tfd.Normal(0., 1.),
            sample_shape=seasonality_all.shape[-1],
            name='seasonality_coeff')
    )
    noise = yield root(
        tfd.HalfCauchy(loc=0., scale=5., name='noise_sigma')
    )
    y_hat = (
        intercept[..., None] +
        tf.einsum('i,...->...i', trend_all, trend_coeff) +
        tf.einsum('ij,...j->...i', seasonality_all, seasonality_coeff)
    )
    observed = yield tfd.Independent(
        tfd.Normal(y_hat, noise[..., None]),
        reinterpreted_batch_ndims=1,
        name='observed'
    )
# Taking a sample from the joint distribution object will give us some simulated observations
trace = ts_regression_model.sample()
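The returned sample is a namedtuple-like structure with one field per yield in the model; a quick peek (my own check):

# One field per model variable, plus the simulated observations
print(trace._fields)
print(trace.observed.shape)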

# Sampling prior observations from our generative model, without conditioning on any data
*prior_samples, prior_predictive = ts_regression_model.sample((2, 500))
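A minimal way to eyeball those prior draws against the observed series (the plotting code here is my own sketch, not from the notebook):

# Overlay a few prior predictive trajectories on the observed CO2 series
fig, ax = plt.subplots(figsize=(10, 4))
for draw in prior_predictive[0, :50]:  # 50 draws from the first batch
    ax.plot(co2_by_month.index, draw, color="gray", alpha=0.1)
ax.plot(co2_by_month.index, co2_by_month["CO2"], color="C0", label="observed")
ax.legend()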

Now that the prior predictive sampling is done, we can move on to a harder problem: conditioning on the data and estimating the posterior and posterior predictive distributions.

run_mcmc = tf.function(
    tfp.experimental.mcmc.windowed_adaptive_nuts,
    autograph=False, jit_compile=True)

mcmc_samples, sampler_stats = run_mcmc(
    500, ts_regression_model, n_chains=2,
    observed=co2_by_month["CO2"].values)

*_, posterior_predictive = ts_regression_model.sample(value=mcmc_samples)

# posterior_predictive is reshaped so that it follows the ArviZ convention of (chain, draw, ...)
posterior_predictive = tf.transpose(posterior_predictive, [1, 0, 2])

parameter_names = ts_regression_model.experimental_pin(
    observed=trace.observed
)._flat_resolve_names()

As from_tfp is under development, we’ll use from_dict to convert the data to an InferenceData object. After converting it, we can use ArviZ’s diagnostics to analyse the results in an easy way!

idata = az.from_dict(
    prior={k: v for k, v in zip(parameter_names, prior_samples)},
    prior_predictive={"y": prior_predictive},
    posterior_predictive={
        "y": posterior_predictive,
        # posterior_predictive_test (forecasts for the holdout period) is computed in the notebook
        "y_test": posterior_predictive_test,
    },
    posterior={
        k: np.swapaxes(v.numpy(), 1, 0)
        for k, v in mcmc_samples._asdict().items()
    },
    sample_stats={
        k: np.swapaxes(sampler_stats[k], 1, 0)
        for k in [
            "target_log_prob",
            "diverging",
            "accept_ratio",
            "n_steps",
        ]
    },
    coords={
        "time": np.array(co2_by_month.index),
        "seasons": np.arange(12),
    },
    observed_data={"y": np.array(co2_by_month["CO2"])},
    dims={
        "y": ["time"],
        "seasonality_coeff": ["seasons"],
        "seasonality": ["time", "seasons"],
        "trend": ["time"],
    },
    constant_data={"seasonality": seasonality_all, "trend": trend_all},
)
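With the InferenceData object in hand, the usual ArviZ diagnostics apply. For example (my own quick checks, not part of the original notebook):

# Summarise the posterior and inspect the traces for mixing problems
az.summary(idata, var_names=["intercept", "trend_coeff", "noise_sigma"])
az.plot_trace(idata, var_names=["intercept", "trend_coeff", "noise_sigma"])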

Feel free to experiment with the notebook!

I am gonna do some magic here, and unlike typical magicians, I am gonna share the trick as well, but in the next blog. It will drop in the coming week(s): “From design to prototype to deployment”.
For now, I am gonna call a function plot_ts, which doesn’t exist for you yet.

plot_ts(idata, y="y", x="time", components=["trend", "seasonality"], holdout=120)

Ah! I know you think it’s not complete yet and that magic is just an excuse to escape. Yeah, that’s true. Just enjoy the results for now. We’ll be tweaking the inputs and digging into more details in the coming blogs! If you’re interested in seeing the function, see you in the next blog!

Have a good day!
