"Separating Explicit and Implicit Control for Expressive Neural Audio Synthesis" (en français : « Séparation de contrôles explicites et implicites pour la synthèse neurale expressive en temps réel »)
Doctorant au sein de l'EDITE (ED 130) de Sorbonne Université, Nils Demerlé a réalisé sa thèse au sein de l'équipe Analyse et synthèse des sons du laboratoire STMS (Ircam - Sorbonne Université - CNRS - ministère de la Culture). Soutenance en anglais.
//
PhD candidate within the EDITE doctoral school (ED 130) of Sorbonne University, Nils Demerlé completed his thesis within the Sound Analysis-Synthesis team at the STMS laboratory (IRCAM - Sorbonne University - CNRS - French Ministry of Culture). Defense in English.
Résumé // Abstract
Les récents progrès en apprentissage automatique ont profondément transformé notre rapport au son et à la création musicale. Les modèles génératifs profonds s’imposent aujourd’hui comme de nouveaux instruments potentiels, capables de soutenir et d’étendre les pratiques créatives. Leur adoption reste toutefois limitée par la question du contrôle : les approches actuelles offrent soit des paramètres explicites bien définis (note, instrument, description textuelle), soit des espaces de représentation plus abstraits permettant d’explorer des dimensions subjectives comme le timbre ou le style, mais plus difficiles à intégrer dans un contexte musical.
//
Recent advances in machine learning have profoundly transformed our relationship with sound and musical creation. Deep generative models are emerging as powerful tools that can support and extend creative practices, yet their adoption by artists remains limited by the question of control. Current approaches either rely on explicit parameters (notes, instruments, textual descriptions) or on abstract representation spaces that enable the exploration of subjective concepts such as timbre and style, but are harder to integrate into musical workflows.

This thesis aims to reconcile these two paradigms of explicit and implicit control to design expressive audio synthesis tools that can be seamlessly integrated into music production environments. We begin with a systematic study of neural audio codecs, the building blocks of most modern generative models, identifying design choices that influence both audio quality and controllability. We then explore methods to jointly learn explicit and implicit control spaces, first in a supervised setting, and later through AFTER, a framework designed for the unsupervised case. AFTER enables realistic and continuous timbre transfer across a wide range of instruments while preserving control over pitch and rhythm.

Finally, we adapt these models for real-time use through lightweight, streamable diffusion architectures and develop an intuitive interface integrated into digital audio workstations. The thesis concludes with several artistic collaborations, demonstrating the creative potential and practical impact of these generative approaches.
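As an aside for readers unfamiliar with the neural audio codecs mentioned above: a common ingredient in such codecs is residual vector quantization (RVQ), where successive codebooks each quantize the residual left over by the previous stage. The sketch below is a minimal, purely illustrative toy (random codebooks, a single latent vector); it is not taken from the thesis and makes no claim about the specific architectures studied there.

```python
import numpy as np

def rvq_encode(latent, codebooks):
    """Toy residual vector quantization: each stage picks the codebook
    entry nearest to the current residual, then subtracts it, so later
    stages refine the approximation left by earlier ones."""
    residual = latent.copy()
    indices = []
    quantized = np.zeros_like(latent)
    for cb in codebooks:
        # nearest entry (Euclidean) to the remaining residual
        dists = np.linalg.norm(cb - residual, axis=1)
        idx = int(np.argmin(dists))
        indices.append(idx)
        quantized += cb[idx]
        residual -= cb[idx]
    return indices, quantized

# Hypothetical setup: 4 stages, 16 entries per codebook, latent dim 8.
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(16, 8)) for _ in range(4)]
latent = rng.normal(size=8)

indices, quantized = rvq_encode(latent, codebooks)
```

The token sequence `indices` (one integer per stage) is what a generative model would actually predict; the decoder side would map those tokens back to `quantized` and synthesize audio from it.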
ℹ️ Infos https://www.stms-lab.fr/agenda/soutenance-de-these-de-nils-demerle/detail
#STMS #thèse #laboratoire