a research paper about FDSLRM modeling with supplementary materials - software, notebooks
Authors: Jozef Hanč, Martina Hančová, Andrej Gajdoš
Faculty of Science, P. J. Šafárik University in Košice, Slovakia
EBLUP-NE for tourism
Python-based computational tools - SciPy, CVXPY
Table of Contents
Data and model - data and model description of empirical data
Natural estimators - EBLUP-NE based on NE
NN-DOOLSE, MLE - EBLUP-NE based on nonnegative DOOLSE (same as MLE)
NN-MDOOLSE, REMLE - EBLUP-NE based on nonnegative MDOOLSE (same as REMLE)
To get back to the contents, use the Home key.
Python libraries
CVXPY: A Python-Embedded Modeling Language for Convex Optimization
Purpose: scientific Python library for solving convex optimization tasks
Version: 1.0.1, 2018
Computational parameters of CVXPY:
* solver - the convex optimization solver (ECOS, OSQP, or SCS), chosen according to the given problem
  * OSQP for convex quadratic problems
    * max_iter - maximum number of iterations (default: 10000).
    * eps_abs - absolute accuracy (default: 1e-4).
    * eps_rel - relative accuracy (default: 1e-4).
  * ECOS for convex second-order cone problems
    * max_iters - maximum number of iterations (default: 100).
    * abstol - absolute accuracy (default: 1e-7).
    * reltol - relative accuracy (default: 1e-6).
    * feastol - tolerance for feasibility conditions (default: 1e-7).
    * abstol_inacc - absolute accuracy for inaccurate solution (default: 5e-5).
    * reltol_inacc - relative accuracy for inaccurate solution (default: 5e-5).
    * feastol_inacc - tolerance for feasibility condition for inaccurate solution (default: 1e-4).
  * SCS for large-scale convex cone problems
    * max_iters - maximum number of iterations (default: 2500).
    * eps - convergence tolerance (default: 1e-4).
    * alpha - relaxation parameter (default: 1.8).
    * scale - balance between minimizing primal and dual residual (default: 5.0).
    * normalize - whether to precondition data matrices (default: True).
    * use_indirect - whether to use indirect solver for KKT system (instead of direct) (default: True).
SciPy - NumPy, Pandas
NumPy is the fundamental Python library of the SciPy ecosystem for fast scientific computing with large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
default precision: double floating-point precision
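A short check of what double precision means in NumPy, offered as an illustrative sketch rather than part of the original notebook: floating-point arrays default to `float64`, whose machine epsilon is 2^-52.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])      # floats default to double precision
eps = np.finfo(np.float64).eps     # spacing between 1.0 and the next float64

print(x.dtype)                     # float64
print(eps == 2.0 ** -52)           # True
print(1.0 + eps / 2 == 1.0)        # True: half an epsilon is rounded away
```

This is the precision limit against which the solver tolerances above should be read.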
Pandas is the Python library providing high-performance, easy-to-use data structures.
In this econometric FDSLRM application, we consider the time series data set called visnights, representing total quarterly visitor nights (in millions) from 1998 to 2016 in one of the regions of Australia, the inner zone of the state of Victoria. The number of time series observations is $n = 76$ (19 years of quarterly data). The data was adapted from Hyndman, 2018.
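The observation count follows directly from the quarterly span. The sketch below builds only the time index with pandas, assuming the series runs from the first quarter of 1998 to the last quarter of 2016; it contains no actual visnights values.

```python
import pandas as pd

# Quarterly index for 1998-2016 (assumed to start at Q1 and end at Q4).
quarters = pd.period_range(start="1998Q1", end="2016Q4", freq="Q")
n = len(quarters)

print(n, quarters[0], quarters[-1])
```

Such an index could serve as the index of a `pandas.Series` once the data are loaded from the fpp2 package.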
The Gaussian orthogonal FDSLRM fitting the tourism data has the following form:
where .
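The regressors of the fitted tourism model are not reproduced here. As a hedged illustration of what orthogonality means for an FDSLRM, the sketch below builds hypothetical design matrices `F` (mean-value part) and `V` (random-effects part) from cosine and sine terms at Fourier frequencies and checks that their columns are mutually orthogonal. The harmonic indices 19 and 7 are assumptions made only for this example.

```python
import numpy as np

n = 76                      # number of quarterly observations
t = np.arange(1, n + 1)

def harmonic_pair(k):
    """Cosine and sine columns at the Fourier frequency 2*pi*k/n."""
    w = 2 * np.pi * k / n
    return np.column_stack([np.cos(w * t), np.sin(w * t)])

# Hypothetical structure: intercept plus one harmonic in F, one harmonic in V.
F = np.column_stack([np.ones(n), harmonic_pair(19)])   # k = 19 gives a period of 4 quarters
V = harmonic_pair(7)

print(np.allclose(F.T @ V, 0.0))                  # F and V are orthogonal
print(np.allclose(V.T @ V, (n / 2) * np.eye(2)))  # columns of V are orthogonal
```

With Fourier frequencies, distinct harmonics are exactly orthogonal over a full sample of length n, which is what makes closed-form estimators tractable in the orthogonal case.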
We identified the given and most parsimonious structure of the FDSLRM using an iterative process of model building and selection based on exploratory tools of spectral analysis and residual diagnostics (for details see our Jupyter notebook tourism.ipynb).
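One common exploratory tool of spectral analysis is the periodogram. The sketch below applies `scipy.signal.periodogram` to a synthetic quarterly series with a known yearly cycle. The series is simulated and is not the visnights data, and this is not necessarily the procedure used in tourism.ipynb.

```python
import numpy as np
from scipy.signal import periodogram

n = 76
t = np.arange(1, n + 1)
rng = np.random.default_rng(0)

# Synthetic series: level + cycle with a period of 4 quarters + small noise.
x = 2.0 + 1.5 * np.cos(2 * np.pi * t / 4) + 0.1 * rng.standard_normal(n)

# Frequencies are in cycles per observation (sampling frequency 1).
freqs, power = periodogram(x, detrend="constant")
peak = freqs[np.argmax(power)]

print(peak, 1 / peak)   # dominant frequency and its period in quarters
```

The dominant peak sits at 0.25 cycles per quarter, a period of 4 quarters. Peaks of this kind suggest candidate frequencies for the model regressors.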
SciPy(Numpy)
SciPy(Numpy)
CVXPY
NE as a convex optimization problem
stage of EBLUP-NE
using formula (3.10) from Hančová et al. 2019.
where are NE, are initial estimates for EBLUP-NE
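Formula (3.10) itself is not reproduced here. As a hedged sketch of a plug-in BLUP computation under standard linear mixed model theory, the code below evaluates the general BLUP of the random effects, `D V' Sigma^{-1} (x - F beta)`, and checks that in an orthogonal design it reduces to a componentwise shrinkage of the least squares estimates. The design, the data, and the plug-in variances `nu0` and `nu` are hypothetical.

```python
import numpy as np

n = 24
t = np.arange(1, n + 1)
w = 2 * np.pi / n

# Hypothetical orthogonal design built from Fourier frequencies.
F = np.column_stack([np.ones(n), np.cos(2 * w * t), np.sin(2 * w * t)])
V = np.column_stack([np.cos(5 * w * t), np.sin(5 * w * t)])

rng = np.random.default_rng(1)
x = F @ np.array([3.0, 1.0, -0.5]) + V @ np.array([1.2, -0.8]) + 0.3 * rng.standard_normal(n)

# Plug-in variance estimates (hypothetical values standing in for initial estimates).
nu0, nu = 0.5, np.array([2.0, 1.0])
Sigma = nu0 * np.eye(n) + V @ np.diag(nu) @ V.T

# General LMM formulas: GLS estimate of beta, then BLUP of the random effects.
beta_gls = np.linalg.solve(F.T @ np.linalg.solve(Sigma, F), F.T @ np.linalg.solve(Sigma, x))
Y_blup = np.diag(nu) @ V.T @ np.linalg.solve(Sigma, x - F @ beta_gls)

# Orthogonal case: shrink the least squares estimates of Y componentwise.
g = np.sum(V ** 2, axis=0)            # squared norms of the columns of V
Y_ols = (V.T @ x) / g
rho = nu * g / (nu0 + nu * g)         # shrinkage factors in (0, 1)
Y_shrunk = rho * Y_ols

print(np.allclose(Y_blup, Y_shrunk))  # True
```

The agreement of the two computations is why orthogonal models need no inversion of the full covariance matrix.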
SciPy(Numpy)
cross-checking
using formula (3.6) for general FDSLRM from Hančová et al. 2019.
KKT algorithm
using the KKT algorithm (tab. 3, Hančová et al. 2019)
SciPy(Numpy)
CVXPY
nonnegative DOOLSE as a convex optimization problem
$$
\begin{array}{ll}
\textit{minimize} & \quad ||\mathbf{e}\mathbf{e}' - \Sigma_{\boldsymbol{\nu}}||^2 \\[6pt]
\textit{subject to} & \quad \boldsymbol{\nu} \geq \mathbf{0}
\end{array}
$$
CVXPY
using the equivalent (RE)MLE convex problem (proposition 5, Hančová et al. 2019)
stage of EBLUP-NE
SciPy(Numpy)
KKT algorithm
SciPy(Numpy)
CVXPY
nonnegative MDOOLSE as a convex optimization problem
$$
\begin{array}{ll}
\textit{minimize} & \quad ||\mathbf{e}\mathbf{e}' - \mathrm{M_F}\Sigma_{\boldsymbol{\nu}}\mathrm{M_F}||^2 \\[6pt]
\textit{subject to} & \quad \boldsymbol{\nu} \geq \mathbf{0}
\end{array}
$$
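Since M_F Sigma M_F is linear in the variance parameters, the problem can also be vectorized into an ordinary nonnegative least squares problem. The sketch below uses `scipy.optimize.nnls` instead of CVXPY as an independent cross-check. It assumes the Frobenius norm and the covariance form Sigma = nu_0 I + sum_j nu_j v_j v_j', and uses synthetic data.

```python
import numpy as np
from scipy.optimize import nnls

n = 24
t = np.arange(1, n + 1)
w = 2 * np.pi / n

# Hypothetical orthogonal design and simulated observations.
F = np.column_stack([np.ones(n), np.cos(2 * w * t), np.sin(2 * w * t)])
V = np.column_stack([np.cos(5 * w * t), np.sin(5 * w * t)])
rng = np.random.default_rng(3)
x = F @ np.array([3.0, 1.0, -0.5]) + V @ np.array([1.2, -0.8]) + 0.3 * rng.standard_normal(n)

M_F = np.eye(n) - F @ np.linalg.solve(F.T @ F, F.T)
e = M_F @ x

# M_F Sigma M_F = nu_0 M_F + sum_j nu_j (M_F v_j)(M_F v_j)'
MV = M_F @ V
basis = [M_F] + [np.outer(MV[:, j], MV[:, j]) for j in range(V.shape[1])]

# Stack vectorized basis matrices as columns: Frobenius norm becomes Euclidean norm.
A = np.column_stack([B.ravel() for B in basis])
b = np.outer(e, e).ravel()

nu_hat, residual_norm = nnls(A, b)
print(nu_hat, residual_norm)
```

The same vectorization applies to the DOOLSE problem by replacing the basis matrices with the identity and the outer products of the columns of V.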
CVXPY
using the equivalent (RE)MLE convex problem (proposition 5, Hančová et al. 2019)
stage of EBLUP-NE
SciPy(Numpy)
References
This notebook belongs to the supplementary materials of the paper submitted to Statistical Papers and available at https://arxiv.org/abs/1905.07771.
Hančová, M., Vozáriková, G., Gajdoš, A., Hanč, J. (2019). Estimating variance components in time series linear regression models using empirical BLUPs and convex optimization. arXiv:1905.07771, https://arxiv.org/abs/1905.07771
Abstract of the paper
We propose a two-stage estimation method of variance components in time series models known as FDSLRMs, whose observations can be described by a linear mixed model (LMM). We based estimating variances, fundamental quantities in a time series forecasting approach called kriging, on the empirical (plug-in) best linear unbiased predictions of unobservable random components in FDSLRM.
The method, providing invariant non-negative quadratic estimators, can be used for any absolutely continuous probability distribution of time series data. As a result of applying the convex optimization and the LMM methodology, we resolved two problems - theoretical existence and equivalence between least squares estimators, non-negative (M)DOOLSE, and maximum likelihood estimators, (RE)MLE, as possible starting points of our method and a practical lack of computational implementation for FDSLRM. As for computing (RE)MLE in the case of $n$ observed time series values, we also discovered a new algorithm of order $\mathcal{O}(n)$, which at the default precision is $10^7$ times more accurate and $n^2$ times faster than the best current Python (or R)-based computational packages, namely CVXPY, CVXR, nlme, sommer and mixed.
We illustrate our results on three real data sets - electricity consumption, tourism and cyber security - which are easily available, reproducible, sharable and modifiable in the form of interactive Jupyter notebooks.
Hyndman R.J., Athanasopoulos G. (2018). Forecasting: Principles and Practice (2nd Edition), OTexts, Monash University, Australia. Data in R package fpp2 version 2.3. https://CRAN.R-project.org/package=fpp2