🎧 Rolling Stones Song Clustering – Spotify x ML

INDUSTRY

Music Analytics, Machine Learning, Data Science

DATE

2025

SERVICES

Exploratory Data Analysis (EDA), Dimensionality Reduction (PCA) & Unsupervised Learning (KMeans Clustering)
Clustered Rolling Stones tracks using Spotify audio features to uncover mood-based cohorts and explore how machine learning can enhance music discovery and playlist design.

🧠 Project Overview

Cohorts of Songs is an unsupervised learning project that applies clustering algorithms to Spotify audio features from Rolling Stones tracks. The goal is to create interpretable cohorts of songs that capture patterns in acousticness, energy, danceability, and valence, enabling personalized music recommendations.

85%

of variance retained while PCA reduced 11 features to 5

4.1x

2 Albums Identified as most recommendable based on popularity distribution.

72%

4 Distinct Song Clusters identified and interpreted.

🛠️ Methodology

Data Preprocessing
Cleaned dataset (rolling stones_spotify.csv), removed irrelevant fields.
Normalized numerical features (loudness, tempo, duration_ms).
Ensured unique identifiers (id) for each track.
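
The preprocessing steps above could be sketched roughly as follows. This is a minimal illustration on a tiny synthetic table, not the project's actual code: the preprocess helper is hypothetical, and the choice of StandardScaler (zero mean, unit variance) is an assumption, since the write-up does not say which normalization was used.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def preprocess(df, numeric_cols, id_col="id"):
    """Hypothetical helper: drop duplicate ids, then standardize numeric columns."""
    out = df.drop_duplicates(subset=id_col).reset_index(drop=True).copy()
    # Assumption: z-score scaling; the original normalization method is not stated.
    out[numeric_cols] = StandardScaler().fit_transform(out[numeric_cols])
    return out


# Tiny synthetic stand-in for the real CSV (values are made up).
toy = pd.DataFrame({
    "id": ["a", "b", "b", "c"],
    "loudness": [-5.0, -7.0, -7.0, -9.0],
    "tempo": [120.0, 100.0, 100.0, 140.0],
    "duration_ms": [200000.0, 180000.0, 180000.0, 240000.0],
})
clean = preprocess(toy, ["loudness", "tempo", "duration_ms"])
print(clean)
```

Scaling matters here because loudness (dB), tempo (BPM), and duration_ms live on very different ranges, and both PCA and KMeans are sensitive to feature scale.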

Exploratory Data Analysis
Feature distributions: analyzed energy, valence, tempo, danceability.
Correlation analysis: strong association between loudness and energy, moderate between valence and popularity.
Identified albums with highest concentration of popular tracks.
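
A correlation matrix and a per-album popularity summary along these lines might look like the sketch below. The data is synthetic (loudness is deliberately built to track energy), the column names album and popularity are assumed, and "popular" is defined here as above the median, which is an illustrative choice rather than the project's definition.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200
energy = rng.uniform(0, 1, n)

# Synthetic data only; loudness is constructed to correlate with energy.
toy = pd.DataFrame({
    "energy": energy,
    "loudness": -20 + 15 * energy + rng.normal(0, 1, n),
    "valence": rng.uniform(0, 1, n),
    "popularity": rng.integers(0, 100, n),
    "album": rng.choice(["Album A", "Album B", "Album C"], n),
})

# Pairwise Pearson correlations between numeric features.
corr = toy[["energy", "loudness", "valence", "popularity"]].corr()
print(corr.round(2))

# Share of each album's tracks that sit above the overall median popularity.
popular = toy["popularity"] > toy["popularity"].median()
album_share = popular.groupby(toy["album"]).mean().sort_values(ascending=False)
print(album_share)
```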

Dimensionality Reduction (PCA)
PCA reduced correlated features to orthogonal components.
Retained 5 principal components covering about 85% of the variance.
Used PCA projection for visual separation of clusters.
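
One way to select components by a variance target is to pass a fraction to scikit-learn's PCA, as in the sketch below. It runs on 11 synthetic correlated features, so the number of components it keeps need not match the 5 reported above; the 0.85 threshold is taken from the figure quoted in this write-up.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in: 11 correlated features driven by 4 latent factors plus noise.
latent = rng.normal(size=(150, 4))
mixing = rng.normal(size=(4, 11))
X = latent @ mixing + 0.3 * rng.normal(size=(150, 11))

# Standardize first so no single feature dominates the components.
X_scaled = StandardScaler().fit_transform(X)

# A float n_components keeps the fewest components reaching that variance share.
pca = PCA(n_components=0.85, svd_solver="full")
X_pca = pca.fit_transform(X_scaled)

cum = np.cumsum(pca.explained_variance_ratio_)
print("components kept:", pca.n_components_)
print("cumulative explained variance:", cum.round(3))
```

The first two columns of X_pca can then be used as axes for a 2D scatter plot of the clusters.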

Clustering (KMeans)
Determined K = 4 as the optimal number of clusters (Elbow Method).
Trained KMeans on PCA-transformed features.
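
The Elbow Method amounts to fitting KMeans for a range of K values and looking for the point where inertia (within-cluster sum of squares) stops dropping sharply. A minimal sketch, using synthetic blobs in place of the PCA-transformed track features; the range of K, n_init, and random_state are illustrative assumptions:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic 5-dimensional data standing in for the PCA-transformed features.
X, _ = make_blobs(n_samples=200, centers=4, n_features=5, random_state=0)

# Fit KMeans for K = 1..8 and record inertia for each K.
inertias = {}
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = km.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))

# Final model with the chosen K.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
```

Plotting inertias against K shows the elbow visually. The elbow is a heuristic, so a silhouette score is a common cross-check when the bend is ambiguous.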

Cluster definitions:
Cluster 1: High-energy rock anthems (loud, danceable).
Cluster 2: Acoustic/low-energy ballads.
Cluster 3: Upbeat positive tracks (high valence, fast tempo).
Cluster 4: Live/instrumental sessions (high liveness, instrumentalness).
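
Labels like these are typically derived by comparing per-cluster feature averages. The sketch below shows the idea on a small made-up table; the cluster column name and all values are hypothetical, and in practice the features should be on a comparable scale before picking a "dominant" one.

```python
import pandas as pd

# Made-up tracks with cluster labels already assigned.
toy = pd.DataFrame({
    "cluster": [0, 0, 1, 1, 2, 2],
    "energy": [0.9, 0.8, 0.2, 0.3, 0.7, 0.6],
    "acousticness": [0.1, 0.2, 0.9, 0.8, 0.3, 0.2],
    "valence": [0.5, 0.6, 0.3, 0.2, 0.9, 0.8],
})

# Mean of each feature per cluster: the basis for a human-readable label.
profile = toy.groupby("cluster").mean()
print(profile)

# Feature with the highest average in each cluster.
top_feature = profile.idxmax(axis=1)
print(top_feature)
```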

Key Achievements

✅ Segmented 100+ Rolling Stones tracks into stylistically meaningful clusters
✅ Identified high-energy, danceable tracks vs. low-valence, acoustic-heavy songs
✅ Highlighted key feature trends within clusters, such as tempo, loudness, and instrumentalness
✅ Demonstrated how unsupervised learning can support intelligent playlist design and music discovery
✅ Delivered interpretable clusters for music personalization
✅ Applied PCA for dimensionality reduction, improving cluster quality
✅ Provided insights into song popularity drivers (energy, loudness, valence)
✅ Identified the albums with the most recommendable songs for fan engagement

Technical Stack
Languages: Python
Libraries: Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn (PCA, KMeans)
Data Source: Spotify API (Rolling Stones Discography)
Techniques: EDA, PCA, Clustering, Correlation Analysis

Future Extensions
Incorporate deep learning embeddings (e.g., autoencoders) for feature representation.
Experiment with DBSCAN or HDBSCAN for density-based clusters.
Deploy as a Spotify Recommender Web App (Flask/Streamlit).