Disclaimer: The details in this post have been derived from the articles shared online by the Spotify Engineering Team. All credit for the technical details goes to the Spotify Engineering Team. The links to the original articles and sources are present in the references section at the end of the post. We've attempted to analyze the details and provide our input about them. If you find any inaccuracies or omissions, please leave a comment, and we will do our best to fix them.

Spotify applies machine learning across its catalog to support key features. One set of models assigns tracks and albums to the correct artist pages, handling cases where metadata is missing, inconsistent, or duplicated. Another set analyzes podcasts to detect platform policy violations. These models review audio, video, and metadata to flag restricted content before it reaches listeners.

All of these activities depend on large volumes of high-quality annotations. These annotations act as the ground truth for model training and evaluation. Without them, model accuracy drops, feedback loops fail, and feature development slows down.

As the number of use cases increased, the existing annotation workflows at Spotify became a bottleneck. Each team built isolated tools, managed its own reviewers, and shipped data through manual processes that didn't scale or integrate with machine learning pipelines.

The problem was structural. Annotation was treated as an isolated task instead of a core part of the machine learning workflow. There was no shared tooling, no centralized workforce model, and no infrastructure to automate annotation at scale.

This article explains how Spotify addressed these challenges by building an annotation platform designed to scale with its machine learning needs.
Moving from Manual Workflow to Scalable Annotation

The starting point was a straightforward machine learning (ML) classification task. The team needed annotations to evaluate model predictions and improve training quality, so they built a minimal pipeline to collect them.

They began by sampling model outputs and serving them to human annotators through simple scripts. Each annotation was reviewed, captured, and passed back into the system. The annotated data was then integrated directly into model training and evaluation workflows; a simplified sketch of this loop is shown below. There was no full-fledged platform yet, just a focused attempt to connect annotations to something real and measurable.

Even with this basic setup, the results were significant.
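To make the workflow concrete, here is a minimal sketch of what such a sample-annotate-merge loop could look like. It is illustrative only, not Spotify's implementation: the names `sample_predictions`, `collect_annotation`, and `merge_annotations` are assumptions, and the human review step is simulated with a placeholder function.

```python
import random
from dataclasses import dataclass


@dataclass
class Example:
    """A single model output queued for human review."""
    item_id: str
    features: dict
    predicted_label: str
    human_label: str | None = None


def sample_predictions(predictions: list[Example], k: int) -> list[Example]:
    """Pick a subset of model outputs to send to annotators.
    A real pipeline might prioritize low-confidence or disputed items."""
    return random.sample(predictions, min(k, len(predictions)))


def collect_annotation(example: Example) -> str:
    """Stand-in for the human annotation step. In practice this would be
    a task served through an annotation tool, not a function call."""
    # Placeholder: pretend the reviewer confirms the model's prediction.
    return example.predicted_label


def merge_annotations(batch: list[Example]) -> tuple[list[dict], float]:
    """Fold reviewed labels back into training/eval data and report
    how often the model agreed with the human ground truth."""
    training_rows, agreements = [], 0
    for ex in batch:
        training_rows.append({**ex.features, "label": ex.human_label})
        agreements += int(ex.human_label == ex.predicted_label)
    accuracy = agreements / len(batch) if batch else 0.0
    return training_rows, accuracy


if __name__ == "__main__":
    # Fake model outputs standing in for a real prediction job.
    predictions = [
        Example(item_id=f"track-{i}", features={"duration_s": 180 + i},
                predicted_label=random.choice(["artist_a", "artist_b"]))
        for i in range(100)
    ]

    batch = sample_predictions(predictions, k=20)
    for ex in batch:
        ex.human_label = collect_annotation(ex)

    new_rows, agreement = merge_annotations(batch)
    print(f"Collected {len(new_rows)} annotated rows; "
          f"model/human agreement: {agreement:.0%}")
```

In a production setup, the review step would be handled by an annotation tool rather than a local function, and the merged rows would feed directly into training and evaluation jobs, which is exactly the kind of generalization the later platform was built to provide.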
This early success wasn't just about volume. It showed that when annotation is directly tied into the model lifecycle, feedback loops become more useful and productivity improves. The outcome was enough to justify further investment. From here, the focus shifted from running isolated tasks to building a dedicated platform that could generalize the workflow and support many ML use cases in parallel.

Platform Architecture

The overall platform architecture consists of three pillars. See the diagram below for reference: