Rethinking Streaming Machine Learning Evaluation

Published 23 May 2022 in cs.LG, cs.AI, and stat.ML (arXiv:2205.11473v1)

Abstract: While most work on evaluating ML models focuses on computing accuracy on batches of data, tracking accuracy alone in a streaming setting (i.e., unbounded, timestamp-ordered datasets) fails to appropriately identify when models are performing unexpectedly. In this position paper, we discuss how the nature of streaming ML problems introduces new real-world challenges (e.g., delayed arrival of labels) and recommend additional metrics to assess streaming ML performance.
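
To make the delayed-label challenge concrete, the following is a minimal sketch, not the paper's method. It assumes each prediction carries an identifier and that its ground-truth label may arrive later, out of order, or never. The class name `StreamingAccuracy`, its methods, and the window size are hypothetical names introduced only for illustration. The sketch reports rolling accuracy over the most recently labeled predictions together with the number of predictions still awaiting labels, since accuracy can only be computed on the labeled subset of the stream.

```python
from collections import deque


class StreamingAccuracy:
    """Rolling accuracy over the last `window` predictions whose labels arrived.

    Hypothetical helper for illustration; not taken from the paper.
    """

    def __init__(self, window=100):
        self.pending = {}                     # prediction id -> predicted label
        self.outcomes = deque(maxlen=window)  # True/False per scored prediction

    def record_prediction(self, pred_id, predicted):
        # Store the prediction until its label shows up (possibly much later).
        self.pending[pred_id] = predicted

    def record_label(self, pred_id, actual):
        # Labels may arrive late and out of order; ignore unknown ids.
        if pred_id not in self.pending:
            return
        predicted = self.pending.pop(pred_id)
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        # Undefined until at least one label has arrived.
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def pending_count(self):
        # Predictions that cannot be scored yet because no label has arrived.
        return len(self.pending)


tracker = StreamingAccuracy(window=3)
for pred_id, predicted in [(1, "a"), (2, "b"), (3, "a")]:
    tracker.record_prediction(pred_id, predicted)

tracker.record_label(2, "b")  # the label for prediction 2 arrives first
print(tracker.accuracy(), tracker.pending_count())
```

In this sketch the accuracy figure reflects only the one prediction that has been labeled so far, while two predictions remain unscored. Reporting the pending count next to accuracy is one way, under these assumptions, to show how much of the stream the accuracy number actually covers.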