CrossMuSim: A Cross-Modal Framework for Music Similarity Retrieval with LLM-Powered Text Description Sourcing and Mining
Abstract
Music similarity retrieval is fundamental for managing and exploring relevant content from large collections in streaming platforms. This paper presents a novel cross-modal
contrastive learning framework that leverages the open-ended nature of text descriptions to guide music similarity modeling, addressing the limitations of traditional uni-modal
approaches in capturing complex musical relationships. To overcome the scarcity of high-quality text-music paired data, this paper introduces a dual-source data acquisition approach
combining online scraping and LLM-based prompting, where carefully designed prompts leverage LLMs’ comprehensive music knowledge to generate contextually rich descriptions.
Extensive experiments demonstrate that the proposed framework achieves significant performance improvements over existing benchmarks through objective metrics, subjective evaluations,
and real-world A/B testing on a commercial music streaming platform.
LLM-Based Prompting
Prompt Structure
Category
Details
System
You are a music critic. Please provide a detailed description of the song's musical semantics based on the information I provide, including the artist name and song title.
The paragraph should cover the following elements:
genre,
melody,
accompaniment,
themes,
and
emotions.
User
Artist: [ARTIST], Title: [TITLE]
Generated Examples by GPT-4o-mini
Example 1: Coldplay - Viva la Vida
"Viva la Vida" by Coldplay is a poignant blend of alternative rock and baroque pop, characterized by its sweeping orchestral arrangement and anthemic quality.
The melody is both uplifting and melancholic, driven by a memorable string section that weaves through the verses, creating a sense of grandeur.
The accompaniment features a combination of lush strings and rhythmic percussion, providing a rich backdrop that enhances the song's emotional depth.Lyrically, the song explores themes of power, loss, and redemption, reflecting on the fleeting nature of fame and the inevitable decline that follows.
The narrator's journey from a position of authority to a state of vulnerability evokes a profound sense of nostalgia and introspection.Emotions range from triumphant to sorrowful, as the chorus bursts forth with an almost euphoric energy, juxtaposed against the introspective verses.
This dynamic interplay between joy and melancholy resonates deeply, making "Viva la Vida" a powerful anthem of reflection and resilience.
Example 2: Adele - Take It All
"Take It All" by Adele is a moving ballad rooted in the pop and soul genres, showcasing Adele's powerful vocal prowess and emotional depth.
The song features a haunting piano melody that serves as a poignant backdrop, allowing her rich, emotive voice to take center stage.The sparse yet effective accompaniment gradually builds, incorporating strings that add a layer of intensity and drama.Thematically, "Take It All" delves into the complexities of heartbreak and vulnerability, capturing the feelings of loss and surrender.
The lyrics convey a sense of resignation, reflecting a deep emotional struggle and the desire to relinquish what once was, even when it hurts.
Emotionally, the song evokes a profound sense of sadness and longing, perfectly encapsulated in Adele's soulful delivery.
Her ability to convey raw emotion through her voice resonates deeply with listeners, making the experience both intimate and universal.
Overall, "Take It All" stands as a testament to Adele's artistry, blending exquisite melody with heartfelt themes that leave a lasting impact.
Example 3: DJ Khaled featuring T-Pain, Ludacris, Snoop Dogg, and Rick Ross - All I Do Is Win
"All I Do Is Win" by DJ Khaled, featuring T-Pain, Ludacris, Snoop Dogg, and Rick Ross, is a quintessential hip-hop anthem that encapsulates the genre's celebratory spirit.
The song features a catchy, infectious melody driven by a punchy, synth-heavy instrumental that propels the track forward with relentless energy.The production includes crisp snares and booming bass, creating an empowering backdrop that encourages listeners to embrace a winning mindset.Lyrically, the song revolves around themes of success, perseverance, and triumph over adversity.
Each artist contributes their own verses that highlight personal victories and the determination to rise above challenges, creating a collective narrative of confidence and resilience.The repeated refrain of "All I do is win" serves as a rallying cry, fostering an atmosphere of motivation and celebration.
Emotions conveyed in the track range from exuberance to defiance, making it a staple in both sports arenas and parties.
The collaboration of these prominent artists adds depth and variety, with T-Pain’s melodic hooks, Ludacris’s rapid-fire delivery, Snoop Dogg’s laid-back charisma, and Rick Ross’s authoritative presence, all coming together to create an unforgettable, high-energy anthem that inspires listeners to embrace their own victories.
Music Similarity Retrieval Demo
Here, we present demo results based on the evaluation set of 50k samples selected from our in-house music collection. We evaluate the performance of three methods: open-source MTT (baseline), online scraping (ours), and online scraping with LLM-based prompting (ours).