Exponential Moving Averages at Scale: Building Smart Time-Decay Systems
Editor’s note: Tulika Bhatt is a speaker for ODSC East this May 13th-15th! Be sure to check out her talk, “How Netflix Delivers Real-time Impressions at Scale,” to learn more about exponential moving averages in production.
Exponential Moving Averages (EMAs) have been a cornerstone of financial analysis for decades, offering a mathematically elegant way to smooth time series data by giving greater weight to recent observations. Originally devised for stock market analysis to help traders identify trends amidst volatile price fluctuations, the underlying principles of EMAs are now finding new life in fields such as recommendation systems and real-time analytics. In these modern applications, the challenge is analogous: how do you weigh user interactions over time when their relevance naturally decays?
In this post, we’ll explore how exponential moving averages can be retooled to create intelligent time-decay systems. We’ll examine the core mathematics behind exponential moving averages, review a simplified implementation, and discuss the complex engineering challenges that arise when scaling these systems to process millions of events in real time.
The Mathematics of Time-Decay
At the heart of the exponential moving average is a simple idea: recent events are more relevant than older ones. This is achieved by applying an exponential decay to the weight assigned to past data points: an event’s weight is the decay factor α raised to the event’s age measured in windows, i.e. weight = α^(age / window). For instance, if you set α to 0.5 and define your window as one day, an event occurring right now has full weight (0.5⁰ = 1.0), an event one day old contributes half as much (0.5¹ = 0.5), and an event two days old contributes only a quarter (0.5² = 0.25).
A compact Python function captures this logic. The sketch below is a minimal reconstruction, with illustrative function and parameter names:
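```python
def time_decay_weight(age_seconds: float,
                      window_seconds: float,
                      alpha: float = 0.5) -> float:
    """Weight of an event observed `age_seconds` ago.

    The weight shrinks by a factor of `alpha` for every
    `window_seconds` of elapsed time: alpha ** (age / window).
    """
    return alpha ** (age_seconds / window_seconds)

# With a one-day window and alpha = 0.5:
#   time_decay_weight(0,       86_400)  -> 1.0
#   time_decay_weight(86_400,  86_400)  -> 0.5
#   time_decay_weight(172_800, 86_400)  -> 0.25
```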
The weight decays exponentially with an event’s age, controlled by the decay factor and the specified window size.
Implementation Challenges
While the underlying formula is straightforward, applying EMAs in a high-throughput, real-time system introduces a new set of challenges. In practice, a recommendation system might receive millions of user interactions per second, and each interaction must be incorporated into the exponential moving average while maintaining sub-100ms response times. Consider a more comprehensive implementation in Python. The class below is a sketch (the names and exact blending rule are illustrative, not a production implementation); its key trick is decaying the stored EMA by the elapsed time instead of recomputing weights over every historical event:
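```python
import time

class EMACalculator:
    """Time-decayed average of a stream of observations (illustrative sketch).

    Instead of storing every event, the running EMA is decayed by
    alpha ** (elapsed / window) and the new observation is blended in
    with the complementary weight.
    """

    def __init__(self, window_seconds: float, alpha: float = 0.5):
        self.window = window_seconds  # time over which weight decays by alpha
        self.alpha = alpha            # decay factor per window
        self.value = 0.0              # current EMA value
        self.last_update: float | None = None

    def update(self, observation: float, now: float | None = None) -> float:
        """Fold a new observation into the EMA and return the new value."""
        now = time.time() if now is None else now
        if self.last_update is None:
            self.value = observation  # first observation seeds the average
        else:
            elapsed = max(0.0, now - self.last_update)
            decay = self.alpha ** (elapsed / self.window)
            self.value = decay * self.value + (1.0 - decay) * observation
        self.last_update = now
        return self.value
```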
In this class, each new interaction is blended in with its time-based weight, and the previous exponential moving average is decayed according to the elapsed time. For example, if you instantiate the calculator with a one-day window and α set to 0.5, you might update the EMA as follows:
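```python
# Illustrative usage: one-day window, alpha = 0.5, explicit timestamps.
calc = EMACalculator(window_seconds=86_400, alpha=0.5)

calc.update(10.0, now=0.0)       # seeds the EMA at 10.0
calc.update(20.0, now=43_200.0)  # half a day later: decay = 0.5 ** 0.5 ≈ 0.707
print(round(calc.value, 2))      # ≈ 0.707 * 10.0 + 0.293 * 20.0 ≈ 12.93
```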
This code snippet demonstrates a practical scenario where the EMA is updated by blending freshly recorded data with the decayed value of past data.
Beyond Basic Implementation
Deploying such a system at scale is not merely about getting the math right — it’s about architecting a solution that can handle extreme loads while preserving accuracy. Real-world systems must process millions of interactions with very low latency. Achieving this requires a robust interplay between computational efficiency and system design:
- High Throughput and Low Latency: The system must update the EMA quickly as new data arrives. This often means using different techniques for EMAs computed over short windows than for those over long windows.
- Distributed Data Consistency: In large-scale distributed systems, every node must share a consistent view of the EMA value. This calls for advanced caching, load balancing, and fault-tolerance strategies that can reconcile updates in real time (one way to make per-node state mergeable is sketched after this list).
- Dynamic Data Volumes: User interaction volumes can fluctuate significantly. The architecture must absorb sudden spikes without compromising response time or accuracy, which requires scalable, elastic processing pipelines.
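As a purely illustrative take on the consistency point above: if the EMA state is kept as a count-style decayed total rather than a blended average, per-node shards become exactly mergeable, because each shard’s total can be decayed to a common timestamp and summed. The `DecayedCounter` class and `merge` helper below are hypothetical names, not part of any production system:

```python
class DecayedCounter:
    """Count-style cousin of the EMA: updates add to a decaying total.

    Because decay is multiplicative, per-shard totals decayed to the
    same timestamp are exactly additive, which makes distributed
    aggregation straightforward.
    """

    def __init__(self, window_seconds: float, alpha: float = 0.5):
        self.window = window_seconds
        self.alpha = alpha
        self.total = 0.0
        self.last_update = 0.0

    def add(self, count: float, now: float) -> None:
        """Decay the running total to `now`, then add the new count."""
        elapsed = max(0.0, now - self.last_update)
        self.total = self.total * self.alpha ** (elapsed / self.window) + count
        self.last_update = now

    def value_at(self, now: float) -> float:
        """The total as seen at time `now`, with decay applied."""
        elapsed = max(0.0, now - self.last_update)
        return self.total * self.alpha ** (elapsed / self.window)


def merge(shards: list[DecayedCounter], now: float) -> float:
    """Combine per-node shards by decaying each to a common timestamp."""
    return sum(shard.value_at(now) for shard in shards)
```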
Looking Ahead with Exponential Moving Averages
In my upcoming ODSC East 2025 talk, “How Netflix Delivers Real-time Impressions at Scale,” I’ll share deeper insights into the architectural patterns and engineering practices that enable Netflix to handle trillions of data points while keeping response times below 100ms. I’ll explore how advanced distributed time-series processing and real-time data orchestration techniques come together to create a robust EMA system that powers content personalization for over 250 million subscribers.
About the Author: Tulika Bhatt is a Senior Software Engineer at Netflix, focusing on building scalable real-time data systems. She brings extensive experience in distributed systems and data engineering to solve complex challenges in content personalization.
Connect with Tulika:
- LinkedIn: https://www.linkedin.com/in/tulikabhatt/
- Email: tbhatt@netflix.com