Authors

* External authors

Venue

Date

Share

Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects

Junghyun Koo*

Marco A. Martínez-Ramírez

Wei-Hsiang Liao

Stefan Uhlich*

Kyogu Lee*

Yuki Mitsufuji*

* External authors

ICASSP 2023

2023

Abstract

We propose an end-to-end music mixing style transfer system that converts the mixing style of an input multitrack to that of a reference song. This is achieved with an encoder pre-trained with a contrastive objective to extract only audio effects related information from a reference music recording. All our models are trained in a self-supervised manner from an already-processed wet multitrack dataset with an effective data preprocessing method that alleviates the data scarcity of obtaining unprocessed dry data. We analyze the proposed encoder for the disentanglement capability of audio effects and also validate its performance for mixing style transfer through both objective and subjective evaluations. From the results, we show the proposed system not only converts the mixing style of multitrack audio close to a reference but is also robust with mixture-wise style transfer upon using a music source separation model.

Related Publications

BigVSAN: Enhancing GAN-based Neural Vocoders with Slicing Adversarial Network

ICASSP, 2024
Takashi Shibuya, Yuhta Takida, Yuki Mitsufuji*

Generative adversarial network (GAN)-based vocoders have been intensively studied because they can synthesize high-fidelity audio waveforms faster than real-time. However, it has been reported that most GANs fail to obtain the optimal projection for discriminating between re…

HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes

TMLR, 2024
Yuhta Takida, Yukara Ikemiya, Takashi Shibuya, Kazuki Shimada, Woosung Choi, Chieh-Hsin Lai, Naoki Murata, Toshimitsu Uesaka, Kengo Uchida, Wei-Hsiang Liao, Yuki Mitsufuji*

Vector quantization (VQ) is a technique to deterministically learn features with discrete codebook representations. It is commonly performed with a variational autoencoding model, VQ-VAE, which can be further extended to hierarchical structures for making high-fidelity recon…

Enhancing Semantic Communication with Deep Generative Models -- An ICASSP Special Session Overview

ICASSP, 2024
Eleonora Grassucci*, Yuki Mitsufuji*, Ping Zhang*, Danilo Comminiello*

Generative adversarial network (GAN)-based vocoders have been intensively studied because they can synthesize high-fidelity audio waveforms faster than real-time. However, it has been reported that most GANs fail to obtain the optimal projection for discriminating between re…

  • HOME
  • Publications
  • Music Mixing Style Transfer: A Contrastive Learning Approach to Disentangle Audio Effects

JOIN US

Shape the Future of AI with Sony AI

We want to hear from those of you who have a strong desire
to shape the future of AI.