Google AI model to generate music from text could be bigger than ChatGPT

Researchers from Google have published a paper describing a model that generates high-fidelity music from text descriptions. It’s called MusicLM, and according to AI scientist Keunwoo Choi, its overall structure builds on other models, combining MuLan + AudioLM and MuLan + w2v-BERT + SoundStream.

Choi explains a bit about how each of these models works:

  • MuLan is a text-music joint embedding model, trained contrastively on 44M pairs of music audio and text descriptions from YouTube;
  • AudioLM uses an intermediate layer from a speech-pre-trained model for semantic information;
  • w2v-BERT is a BERT-style model (Bidirectional Encoder Representations from Transformers), a deep learning tool originally built for speech and here applied to audio;
  • SoundStream is a neural audio codec.
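To make "contrastive training" of a joint embedding model more concrete, here is a minimal sketch of a generic symmetric contrastive loss. It is not MuLan's actual implementation: the function names, the temperature value, and the toy two-dimensional embeddings are all assumptions made for illustration. The idea is that matching audio-text pairs should score higher than mismatched ones.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def contrastive_loss(audio_embs, text_embs, temperature=0.1):
    """Generic symmetric contrastive loss (illustrative, not Google's code).

    audio_embs[i] and text_embs[i] are assumed to describe the same clip.
    The temperature of 0.1 is an arbitrary choice for this sketch.
    """
    audio = [l2_normalize(v) for v in audio_embs]
    text = [l2_normalize(v) for v in text_embs]
    n = len(audio)
    # sims[i][j]: temperature-scaled cosine similarity of audio i and text j
    sims = [[sum(a * t for a, t in zip(audio[i], text[j])) / temperature
             for j in range(n)] for i in range(n)]

    def mean_cross_entropy(matrix):
        # Cross-entropy where the correct "class" for row i is column i.
        total = 0.0
        for i, row in enumerate(matrix):
            peak = max(row)  # subtract the max for numerical stability
            log_denominator = peak + math.log(sum(math.exp(s - peak) for s in row))
            total += log_denominator - row[i]
        return total / len(matrix)

    transposed = [list(col) for col in zip(*sims)]
    # Average the audio-to-text and text-to-audio directions.
    return 0.5 * (mean_cross_entropy(sims) + mean_cross_entropy(transposed))

# Toy embeddings: in "matched", each audio vector points the same way as its text.
matched = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.9, 0.1], [0.1, 0.9]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.1, 0.9], [0.9, 0.1]])
print(matched < shuffled)  # True: aligned pairs give the lower loss
```

Training on a loss like this pushes the two encoders toward a shared space, so that a text description lands near the music it describes.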

Google combined all of this to generate music from text. Here’s how the researchers explain MusicLM:

We introduce MusicLM, a model generating high-fidelity music from text descriptions such as ‘a calming violin melody backed by a distorted guitar riff’. MusicLM casts the process of conditional music generation as a hierarchical sequence-to-sequence modeling task, and it generates music at 24 kHz that remains consistent over several minutes. Our experiments show that MusicLM outperforms previous systems both in audio quality and adherence to the text description. Moreover, we demonstrate that MusicLM can be conditioned on both text and a melody in that it can transform whistled and hummed melodies according to the style described in a text caption. To support future research, we publicly release MusicCaps, a dataset composed of 5.5k music-text pairs, with rich text descriptions provided by human experts.


By comparison, it’s interesting to think about what ChatGPT has been able to do: pass tough exams, analyze complex code, write laws for Congress, and even create poems and song lyrics. MusicLM goes a step further and transforms an intention, a story, or a painting into a song. Seeing The Persistence of Memory by Salvador Dalí transformed into a melody is fascinating.

Google’s researchers have released MusicCaps, a dataset of more than 5,000 music-text pairs, for people to experiment with. Unfortunately, the company doesn’t plan to release the model itself to the public. That said, you can still see, and hear, how this AI model generates music from text in the examples the researchers have published.
