
Comparison of different Methods for Univariate Time Series Imputation in R

Published 13 Oct 2015 in stat.AP and cs.OH | (1510.03924v1)

Abstract: Missing values in datasets are a well-known problem and there are quite a lot of R packages offering imputation functions. But while imputation in general is well covered within R, it is hard to find functions for imputation of univariate time series. The problem is, most standard imputation techniques can not be applied directly. Most algorithms rely on inter-attribute correlations, while univariate time series imputation needs to employ time dependencies. This paper provides an overview of univariate time series imputation in general and an in-detail insight into the respective implementations within R packages. Furthermore, we experimentally compare the R functions on different time series using four different ratios of missing data. Our results show that either an interpolation with seasonal kalman filter from the zoo package or a linear interpolation on seasonal loess decomposed data from the forecast package were the most effective methods for dealing with missing data in most of the scenarios assessed in this paper.

Citations (163)

Summary

  • The paper compares basic, time series specific, and lagged multivariate methods for imputing missing values in univariate time series data using R.
  • Experiments demonstrate that R methods accounting for time series characteristics, especially seasonality (e.g., na.interp, na.StructTS), outperform basic techniques as measured by RMSE and MAPE.
  • The study concludes specialized univariate time series imputation techniques are necessary and advocates for developing more robust, efficient R packages tailored for this challenge.

Overview of Time Series Imputation Methods in R

The paper "Comparison of Different Methods for Univariate Time Series Imputation in R" by Moritz et al. examines strategies for handling missing values in univariate time series within R, a statistical programming language. The authors identify a gap in the existing literature and tools: with only one attribute, the inter-attribute correlations that multivariate imputation methods rely on are not available.

Imputation Challenges and Context

Univariate time series data, characterized by measurements recorded at consistent time intervals, are prevalent across various domains like biology, finance, social sciences, and climate change studies. Missing values pose significant challenges, impacting further analysis and decision-making processes. Traditional imputation techniques, effective for datasets with multiple attributes, often fail in the univariate context where time dependency becomes crucial. Hence, specialized algorithms leveraging this aspect are necessary.

Time Series Characteristics and Methodologies

The paper delineates time series decomposition into trend, seasonal, and irregular components, emphasizing the importance of focusing on such characteristics when devising imputation strategies. It evaluates several datasets representing distinct time series characteristics: google (white noise), SP (trend), beersales (seasonality), and airpass (trend and seasonality). These datasets were chosen to illustrate varied scenarios an imputation algorithm might encounter.
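To make the three components concrete, the following is a minimal Python sketch of a classical additive decomposition (series = trend + seasonal + irregular). It is an illustration only, not the procedure used in the paper or in any R package: the function name `decompose_additive`, the moving-average trend, and the per-position seasonal means are all assumptions chosen for simplicity, and the input is assumed to be complete (no missing values).

```python
from statistics import mean


def decompose_additive(series, period):
    """Split a complete series into trend, seasonal, and irregular parts.

    Hypothetical helper for illustration; assumes an additive model and
    at least two full seasonal cycles.
    """
    n = len(series)
    half = period // 2
    trend = [None] * n  # undefined near the edges, where the window does not fit
    for t in range(half, n - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 0:
            # Even period: centred moving average with half weight on both ends.
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
        else:
            trend[t] = mean(window)

    # Seasonal effect: mean detrended value at each position in the cycle,
    # shifted so that the effects sum to zero over one cycle.
    detrended = [None if trend[t] is None else series[t] - trend[t] for t in range(n)]
    raw = [mean(d for d in detrended[k::period] if d is not None) for k in range(period)]
    offset = mean(raw)
    seasonal = [raw[t % period] - offset for t in range(n)]

    # Irregular component: whatever trend and seasonality do not explain.
    irregular = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, irregular
```

On a series with a strong trend or seasonal component, most of the signal lands in those two parts and the irregular part stays small, which is why imputation methods that model them can fill gaps more accurately than a global mean.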

The authors classify imputation methodologies into three principal categories:

  1. Univariate Algorithms: Basic methods such as mean, mode, and median fall here. These do not leverage time series characteristics effectively and show limited applicability for imputation in time-dependent datasets.
  2. Univariate Time Series Algorithms: Techniques such as locf (last observation carried forward), seasonal Kalman filters, and linear interpolation on seasonally decomposed data exploit the temporal ordering of the observations and thereby improve imputation accuracy.
  3. Multivariate Algorithms on Lagged Data: By converting time data into covariates using lagged observations, multivariate imputation techniques can be indirectly applied to univariate time series.
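The three categories can be illustrated with a short Python sketch, one function per category. These are simplified stand-ins written for this summary, not the R implementations compared in the paper; the names `impute_mean`, `impute_locf`, and `lagged_matrix` are hypothetical, and missing values are represented as `None`.

```python
from statistics import mean


def impute_mean(series):
    """Category 1: replace every gap with the mean of the observed values."""
    fill = mean(x for x in series if x is not None)
    return [fill if x is None else x for x in series]


def impute_locf(series):
    """Category 2: carry the last observation forward into each gap."""
    out, last = [], None
    for x in series:
        if x is not None:
            last = x
        out.append(last)  # gaps before the first observation stay None
    return out


def lagged_matrix(series, lags):
    """Category 3 (preparation step): rows of [x_t, x_(t-1), ..., x_(t-lags)].

    The lagged columns play the role of extra attributes, so a multivariate
    imputation method could then be applied to the resulting table.
    """
    return [[series[t - k] for k in range(lags + 1)] for t in range(lags, len(series))]


series = [2.0, None, 4.0, 6.0, None]
print(impute_mean(series))        # ignores the ordering of the observations
print(impute_locf(series))        # uses the ordering
print(lagged_matrix(series, 1))   # each row pairs a value with its predecessor
```

The lagged table only prepares the data; the multivariate imputation step itself is not shown here.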

Experimental Insights

The experiments compared the R functions na.aggregate, na.locf, na.StructTS, na.interp, and na.approx, along with a custom implementation, ar.irmi, across the four datasets at four different ratios of missing data. Performance was evaluated using RMSE (root mean square error) and MAPE (mean absolute percentage error).
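For reference, the two error metrics can be written down in a few lines of Python. This sketch assumes the errors are computed only at the positions whose values were removed before imputation (the `missing_idx` argument is a hypothetical name), and that no true value is zero, since MAPE divides by it. The paper's exact evaluation setup may differ.

```python
from math import sqrt


def rmse(truth, imputed, missing_idx):
    """Root mean square error over the artificially removed positions."""
    squared = [(imputed[i] - truth[i]) ** 2 for i in missing_idx]
    return sqrt(sum(squared) / len(squared))


def mape(truth, imputed, missing_idx):
    """Mean absolute percentage error (in percent) over the removed positions."""
    ratios = [abs((imputed[i] - truth[i]) / truth[i]) for i in missing_idx]
    return 100 * sum(ratios) / len(ratios)


truth = [10.0, 20.0, 30.0, 40.0]
imputed = [10.0, 22.0, 30.0, 36.0]   # positions 1 and 3 were imputed
print(rmse(truth, imputed, [1, 3]))
print(mape(truth, imputed, [1, 3]))
```

RMSE is expressed in the units of the series and penalizes large errors more heavily, while MAPE is scale-free, which makes it easier to compare results across datasets with different value ranges.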

The results indicate that methods tailored to univariate time series, particularly those that account for seasonality, such as na.interp from the forecast package and na.StructTS from zoo, impute more accurately than both the basic univariate methods and the multivariate methods applied to lagged data. Running times varied notably, however: simple methods such as na.aggregate were the fastest but less accurate, performing especially poorly on datasets with a pronounced trend.
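The benefit of accounting for seasonality can be seen in a small Python sketch of the general idea "remove the seasonal effect, interpolate linearly, add the effect back". This is a deliberately crude simplification and not the algorithm in the forecast package: it estimates the seasonal effect from per-position means instead of a loess decomposition, and both function names are hypothetical. It assumes the series contains at least one observed value.

```python
from statistics import mean


def interpolate_linear(series):
    """Fill None gaps on a straight line between neighbouring observations."""
    out = list(series)
    known = [i for i, x in enumerate(out) if x is not None]  # assumed non-empty
    for left, right in zip(known, known[1:]):
        step = (out[right] - out[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = out[left] + step * (i - left)
    # Gaps before the first or after the last observation take the nearest value.
    for i in range(known[0]):
        out[i] = out[known[0]]
    for i in range(known[-1] + 1, len(out)):
        out[i] = out[known[-1]]
    return out


def interpolate_seasonal(series, period):
    """Seasonally adjust, interpolate linearly, then restore the seasonal effect."""
    overall = mean(x for x in series if x is not None)
    effect = []
    for k in range(period):
        at_k = [x for x in series[k::period] if x is not None]
        # Crude seasonal effect: deviation of this cycle position from the overall mean.
        effect.append(mean(at_k) - overall if at_k else 0.0)
    adjusted = [None if x is None else x - effect[i % period] for i, x in enumerate(series)]
    smooth = interpolate_linear(adjusted)
    return [s + effect[i % period] for i, s in enumerate(smooth)]


# Purely seasonal toy series (cycle 15, 11, 13) with one value removed.
series = [15.0, 11.0, 13.0, 15.0, None, 13.0, 15.0, 11.0, 13.0]
print(interpolate_linear(series)[4])       # 14.0: midpoint of the neighbours
print(interpolate_seasonal(series, 3)[4])  # 11.0: matches the seasonal pattern
```

In this toy case plain linear interpolation misses the seasonal dip, while the seasonally adjusted variant recovers it. On real data with trend and noise the per-position means would be only a rough estimate of the seasonal effect.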

Implications and Future Directions

This study underscores the necessity of specialized imputation techniques for univariate time series, opening avenues for further exploration into optimizing and potentially integrating multivariate techniques with univariate time dependencies. The authors call for the development of more robust and efficient R packages tailored explicitly for univariate time series imputation, addressing current limitations.

Future research could examine more closely how time series characteristics might inform multivariate algorithms adapted for univariate data, potentially improving both accuracy and computation time. Regular updates and extensions to existing R packages would also give researchers more flexible and robust options for time series imputation.
