Gas Usage Systems Identification Model Backtesting

In this notebook, we evaluate the daily performance of our systems identification VAR(X) model of gas usage percentage by method, which was initially defined using minute-level data. We backtest over a period that contains the atypical patterns identified in our initial ad hoc analysis of gas usage by actor and method. In this test case, the model performs one-step predictions and is retrained after each step. We evaluate it using the Root Mean Squared Error (RMSE) criterion.

RMSE is a commonly used measure of the difference between actual and forecasted values. RMSE is always $\geq 0$, with 0 being a perfect forecast. Because each error is squared before averaging, large errors carry disproportionate weight, so RMSE is sensitive to outliers.

$$RMSE =\sqrt{\frac{\sum_{t=1}^T (\hat y_t - y_t)^2}{T}}$$
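As a quick illustration of the formula, here is a minimal pure-Python sketch of RMSE. The function name `rmse` and the sample values are illustrative only.

The last two calls have the same total absolute error (6). Spreading that error evenly gives an RMSE of 2.0. Concentrating it in a single point raises the RMSE to about 3.46, which is the outlier sensitivity described above.

```python
import math
from typing import Sequence


def rmse(forecast: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between forecasts (y_hat_t) and actual values (y_t)."""
    if len(forecast) != len(actual) or len(forecast) == 0:
        raise ValueError("forecast and actual must be non-empty and of equal length")
    squared_errors = [(f - a) ** 2 for f, a in zip(forecast, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect forecast: 0.0
print(rmse([1.0, 2.0, 3.0], [3.0, 4.0, 5.0]))  # error of 2 at every step: 2.0
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 9.0]))  # one miss of 6: sqrt(12), about 3.4641
```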

What is Systems Identification?

Systems Identification uses statistical methods to build models of dynamical systems from observed input and output signals. A dynamical system is any system whose state evolves over time. Economic systems such as a stock market are examples, as is, in our case, Filecoin's network gas economy. Our goal is to build a mathematical model of what is occurring, using measurements of the system's behavior and of its external inputs. Depending on how much is known about the system in advance, we can use a white-box, grey-box, or black-box modeling approach. Since no prior model of gas usage by method is available, we use the black-box modeling paradigm.
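To make the black-box idea concrete, the sketch below identifies a small linear system with an exogenous input, $x_{t+1} = A x_t + B u_t + \varepsilon_t$. It estimates the system purely from simulated input and output data, using ordinary least squares. The following are hypothetical assumptions chosen for illustration:

- the two-dimensional state
- the single input
- the lag order of 1
- the coefficient values
- the helper name `identify_varx1`

This is not the notebook's gas usage model. The estimator never sees `A_true` or `B_true`, only the observed `x` and `u`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" system, unknown to the modeller: x_{t+1} = A x_t + B u_t + noise
A_true = np.array([[0.6, 0.1],
                   [0.0, 0.5]])
B_true = np.array([[0.3],
                   [0.2]])

T = 2000
u = rng.normal(size=(T, 1))   # observed exogenous input
x = np.zeros((T + 1, 2))      # observed system output (state)
for t in range(T):
    x[t + 1] = A_true @ x[t] + B_true @ u[t] + 0.05 * rng.normal(size=2)


def identify_varx1(x: np.ndarray, u: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Estimate A and B of x_{t+1} = A x_t + B u_t from data alone (black box)."""
    regressors = np.hstack([x[:-1], u])  # each row: [x_t, u_t]
    targets = x[1:]                      # each row: x_{t+1}
    theta, *_ = np.linalg.lstsq(regressors, targets, rcond=None)
    n = x.shape[1]
    # theta has shape (n + m, n); its blocks are A^T and B^T, so transpose back.
    return theta[:n].T, theta[n:].T


A_hat, B_hat = identify_varx1(x, u)
print(np.round(A_hat, 2))
print(np.round(B_hat, 2))
```

With 2,000 simulated steps and little noise, the printed estimates should land close to the true coefficients. A grey-box approach would instead constrain some entries of $A$ or $B$ using prior knowledge of the system.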

To learn more about systems identification, visit the links in this write-up.

Our model:

What is Backtesting?

Backtesting is a process used to validate a model on historical data. In a backtest, the model is run over a historical time series and its forecasts are compared with the actual values. This shows how the model would have performed had it been used during that period. Backtesting is a valuable tool for assessing how a model performs in its domain, as long as its limitations are understood. Below we enumerate the pros and cons.

Pros

Cons

Below we perform one-step forecasts with retraining, forecasting from August 30, 2021, through October 15, 2021.
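The walk-forward loop behind a one-step backtest with retraining can be sketched as follows. At each step:

1. The model is refit on every observation up to day $t-1$.
2. The model forecasts day $t$.
3. The training window grows by one day.

Because of this ordering, no future data leaks into a forecast. As a stand-in for the VAR(X) model, the sketch uses a hypothetical univariate AR(1) fit (`fit_ar1`) on a synthetic series. The 60 training days and 47 forecast days only mirror the lengths of the July 1 to August 29 and August 30 to October 15 windows.

```python
import numpy as np


def fit_ar1(history: np.ndarray) -> tuple[float, float]:
    """Hypothetical stand-in model: least-squares fit of y_t = c + phi * y_{t-1}."""
    X = np.column_stack([np.ones(len(history) - 1), history[:-1]])
    (c, phi), *_ = np.linalg.lstsq(X, history[1:], rcond=None)
    return float(c), float(phi)


def one_step_backtest(series: np.ndarray, n_train: int) -> np.ndarray:
    """Walk forward: refit on series[:t], forecast series[t], then advance one step."""
    forecasts = []
    for t in range(n_train, len(series)):
        c, phi = fit_ar1(series[:t])               # retrain on data observed so far
        forecasts.append(c + phi * series[t - 1])  # one-step-ahead forecast of series[t]
    return np.array(forecasts)


# Synthetic daily series (illustrative only): 60 training days + 47 forecast days.
rng = np.random.default_rng(1)
y = np.empty(107)
y[0] = 0.5
for t in range(1, len(y)):
    y[t] = 0.2 + 0.7 * y[t - 1] + 0.1 * rng.normal()

n_train = 60
preds = one_step_backtest(y, n_train)
actual = y[n_train:]
rmse = float(np.sqrt(np.mean((preds - actual) ** 2)))
print(len(preds), round(rmse, 3))
```

Swapping `fit_ar1` for a multivariate VAR(X) fit changes the model but not the structure of the loop. Refitting at every step is what distinguishes this from a single fit followed by 47 forecasts.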

The first step is to obtain all of the data for the full period of study. We begin with daily values from July 1, 2021, which gives us one month and 29 days (July 1 through August 29) of 'training' data. We query data through the end of the backtest, October 15, 2021.
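The window arithmetic can be checked with the standard library alone. This sketch assumes one observation per calendar day and only builds the date index. It does not reproduce the notebook's data query, and the constant and helper names are hypothetical.

```python
from datetime import date, timedelta

DATA_START = date(2021, 7, 1)       # first day of queried data
FORECAST_START = date(2021, 8, 30)  # first day forecast in the backtest
DATA_END = date(2021, 10, 15)       # last day of the backtest


def daily_range(start: date, end: date) -> list[date]:
    """Inclusive list of calendar days from start to end."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


all_days = daily_range(DATA_START, DATA_END)
train_days = [d for d in all_days if d < FORECAST_START]
test_days = [d for d in all_days if d >= FORECAST_START]

print(len(all_days), len(train_days), len(test_days))  # 107 60 47
print(train_days[-1])                                   # 2021-08-29
```

With retraining after each step, the 60-day initial window grows by one day per forecast, reaching 106 days before the final forecast for October 15.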

Conclusion

Our gas usage percentage forecasting model performed well at one-step forecasting during the backtesting period. August 30 through October 15, 2021, was a volatile period. The model's forecasts are more volatile than the actual data, but the model corrects itself quickly. In a subsequent notebook, we will examine ways to refine the model so that it behaves more like a filter, muting volatility.

Future enhancements