Facebook AI Research (FAIR) scientists yesterday unveiled a neural network capable of translating music from one style, genre, and set of instruments to another. Soon, you won’t have to blow your own horn; you can just whistle to an AI and it’ll turn your song into the symphony or dance hit of your dreams.
The AI takes one input, such as a symphony orchestra playing Bach, and translates it into something else, such as the same piece played on a piano in the style of Beethoven.
FAIR becomes the first AI research team to create an unsupervised learning method for recreating high-fidelity music with a neural network. According to the researchers:
Our results present abilities that are, as far as we know, unheard of. Asked to convert one musical instrument to another, our network is on par or slightly worse than professional musicians. Many times, people find it hard to tell which is the original audio file and which is the output of the conversion that mimics a completely different instrument.
The incredible level of fidelity is achieved by teaching a neural network how to auto-encode audio. As far as the AI is concerned, it’s just making a bunch of noise sound like a different bunch of noise – but don’t call it style transfer. The team says:
We distance ourselves from style transfer and do not try to employ such methods since we believe that a melody played by a piano is not similar except for audio texture differences to the same melody sung by a chorus. The mapping has to be done at a higher level and the modifications are not simple local changes.
FAIR’s approach involves a complex method of auto-encoding that allows the network to process audio from inputs it’s never been trained on. Rather than try to match pitch or memorize notes, its unsupervised learning method uses high-level semantic interpretation – you could say it plays by ear.
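To make the auto-encoding idea a little more concrete, here is a deliberately tiny sketch. It assumes an architecture with one encoder shared by every instrument and a separate decoder per target instrument, and it stands in for both with fixed arithmetic: the "instruments" are just a gain and an offset applied to a signal. Everything in it (the names encode, make_decoder, DECODERS, and translate, the domain labels, the numbers) is hypothetical and chosen for illustration. The real system learns its encoder and decoders from raw audio, so treat this as a picture of the data flow, not as FAIR's implementation.

```python
from statistics import fmean, pstdev
from typing import Callable, Dict, List

Signal = List[float]


def encode(audio: Signal) -> Signal:
    """Toy shared encoder: strip the domain-specific gain and offset.

    Normalizing to zero mean and unit spread gives the same code no
    matter which toy "instrument" produced the input. A real encoder
    would be a learned neural network, not a fixed formula.
    """
    mean = fmean(audio)
    spread = pstdev(audio) or 1.0  # avoid dividing by zero on silence
    return [(x - mean) / spread for x in audio]


def make_decoder(gain: float, offset: float) -> Callable[[Signal], Signal]:
    """Toy per-domain decoder: re-apply one instrument's gain and offset."""
    def decode(code: Signal) -> Signal:
        return [gain * z + offset for z in code]
    return decode


# Hypothetical domains; the gain/offset pairs are arbitrary stand-ins
# for whatever makes one instrument sound different from another.
DECODERS: Dict[str, Callable[[Signal], Signal]] = {
    "piano": make_decoder(gain=0.5, offset=0.0),
    "strings": make_decoder(gain=2.0, offset=1.0),
}


def translate(audio: Signal, target: str) -> Signal:
    """Encode with the shared encoder, decode with the target's decoder."""
    return DECODERS[target](encode(audio))


melody = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
piano_take = DECODERS["piano"](encode(melody))   # the melody "on piano"
strings_take = translate(piano_take, "strings")  # piano -> strings
```

Because the encoder throws away everything the toy decoders add, the piano take and the strings take share the same code, and translating between them only requires swapping decoders. In this sketch that property holds by construction; in a learned system it has to be encouraged during training.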
This is another example of how far the field of AI has come in just the past few years. Other examples of musical AI we’ve seen, most of which produce something closer to abstract sound that could reasonably be interpreted as music, don’t even come close. This one, we’d argue, is the first that could be mistaken for actual humans playing real instruments.