Why we are far from having usable AI music creation tools

So today I was pointed to ProducerAI and thought, ok, let’s do my regular test to see how this machine will create solo instrument lines. I will share the whole back and forth prompting session below but I just want to collect some general observations first.

The overall sound of the cello that was produced in this session is frankly quite outstanding. There are aspects that I don’t like: I would dial down the reverb a bit, and there are rarely any pauses in the playing, which gets super tiresome… I wonder what would have happened with woodwind or brass instruments… infinite breath and such. It reminds me of my composition and arranging teachers, who always said that writing for strings has the big downside that string players don’t have to breathe, so people tend to write these ongoing lines that never stop and get boring quickly.

The other general observation is that the system very soon started to do its own thing and went down a rabbit hole, getting further and further away from what I actually intended and wanted. I’ve noticed this in all kinds of creation (not just here with producerAI): the first try of these systems is usually kind of good, but if you then try to modify that first result it gets horrible quickly. From a user perspective this gets quite frustrating, as you want to shake the machine to make it wake up again and realize that it’s not “listening” to your instructions anymore. I guess starting over again and again is the way to go…

This producerAI system first “translated” my prompt into another prompt, and I have to admit that I learned from these second prompts, because they usually contained very precise descriptive terms and words that sometimes elude me as a non-native speaker. These generated prompts also read as if the system understood exactly what I wanted, but the result… well, judge for yourself. For me there might be the occasional fragment I could cut out of these to use in a sketch – but it will probably have to be recorded to make sense in a real piece the way I imagine it. Overall, at this point this system is almost useless to me as a writing tool.

We’ll see. This technology is in its infancy and perhaps they will get better at it, but my hunch is that because all of these systems are designed to not offend the user, no art that is challenging will come out of this. Only bland regurgitation of mainstream pleasantness. Yuck.

Anyway, so here is the exchange:

Resulting audio:

Prompt 1:
create a dramatic cello melody that has a fast part (like 3 or 4 notes) going into a held note – create 6 of those motives

Prompt 1 (producerAI version):
dramatic solo cello melody, dark and intense, fast four-note staccato run leading into a long sustained vibrato note, cinematic and emotive, 110 bpm

Prompt 2:
I was thinking more like dagadagadagadaaaaaaahhhhhdgadigidagadigiduuuhhhhhhh

Prompt 2 (producerAI version):
solo cello, aggressive fast sextuplet runs followed by a heavy long held vibrato note, dramatic and rhythmic, dagadagadagadaaaaaaah, 120 bpm

Prompt 3:
now create it in Ab minor

Prompt 3 (producerAI version):
solo cello in Ab minor, fast rhythmic staccato bursts followed by a long intense held note, dagadagadagadaaaaaaah, 120 bpm, dark classical style

Prompt 4:
No. you are moving away from dagadagadagadaaaaaaaahhhh – next phrase digidigidigduuuuuuhhhh…. next phrase etdc…

Prompt 4 (producerAI version):
solo cello, sequence of dramatic phrases, fast staccato runs of 4-5 notes followed by a long held vibrato note, dagadagadagadaaaaaaaahhhh then digidigidigduuuuuuhhhh, dark Ab minor, aggressive classical, 120 bpm

Prompt 5:
it’s all sounding very nice – but it is much too baroque and not really what I want. I imagined at 150bpm these very short 4 , 6 or note 16th runs that consist only of three notes into a note that is held for almost 2 measures.

Prompt 5 (producerAI version):
aggressive solo cello, Ab minor, rapid 16th note bursts of three distinct notes followed by a massive two-measure held vibrato note, cinematic and high tension, 150 bpm

Prompt 6:
that was all based on 8th notes in relation to 150 bpm – I was talking 16th notes and then a long held not (very little vibrato if at all!!!)

Prompt 6 (producerAI version):
dramatic solo cello, 150 bpm, sharp 16th note runs, three note cells, leading into a two measure held note, zero vibrato, cold and mechanical, Ab minor, cinematic intensity

Prompt 7:
Nothing I’m asking for is in here. I’m getting music that Bach would have written 300 hundred years ago. I’m not interested in that.

Prompt 7 (producerAI version):
modern solo cello, Ab minor, industrial precision, fast 16th-note cells, 3 unique notes, slamming into a 2-measure dead-tone sustain, no vibrato, flat and intense, cinematic horror aesthetic, 150 bpm
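
For reference, the timing argument in prompts 5 and 6 comes down to simple arithmetic. The following is only a rough sketch in Python, assuming 4/4 time and that the 150 bpm refers to quarter notes (neither is stated in the prompts); the function name `note_seconds` is made up for illustration.

```python
def note_seconds(bpm: float, beats: float) -> float:
    """Length in seconds of a note lasting `beats` quarter-note beats."""
    return 60.0 / bpm * beats

BPM = 150            # tempo from prompts 5 to 7
BEATS_PER_BAR = 4    # assumption: 4/4 time

sixteenth = note_seconds(BPM, 0.25)           # one 16th note
eighth = note_seconds(BPM, 0.5)               # one 8th note
held = note_seconds(BPM, 2 * BEATS_PER_BAR)   # a note held for two full measures

print(f"16th note: {sixteenth:.2f} s, 8th note: {eighth:.2f} s")
for count in (4, 6):
    # A run of `count` 16th notes, as described in prompt 5
    print(f"run of {count} 16ths: {count * sixteenth:.2f} s")
print(f"two-measure held note: {held:.2f} s")
```

Under these assumptions a run played in 8th notes takes twice as long as the same run in 16ths, which is the difference prompt 6 complains about.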

The front page of today’s LA Times featured the headline, “Workers lead the resistance as AI risk emerges.”

The online edition uses the title, “Column: Your boss wants AI to replace you. The writers’ strike shows how to fight back”

From the article…

A big reason that the AI hype machine has been in overdrive, issuing apocalyptic claims about its vast power, is that the companies selling the tools want to make it all feel inevitable — to feel like the future — and have you believe that resisting it is both futile and stupid. Conveniently, most of these discussions eschew questions such as: Whose future? Whose future does AI really serve?
The answer to that is “Big Tech” and, to a lesser degree, “your boss.”
Matt Nicholas, a 30-year-old writer and WGA member, who was all too aware exactly how AI was going to be used by the film and television industry — not to replace writers, but to undermine them.
Too many executives in too many industries, such as entertainment, tech and journalism, recognize generative AI for what it is: an opportunity to wield leverage over already precarious workforces. There’s going to be a long, hard struggle, but it’s one worth fighting.

I liked this post on VI-Control by user JJP so much, I wanted to archive it here.

Why Can Everyone Spot Fake News But The Tech Companies?

From Daring Fireball’s John Gruber:

It’s the same reason why Facebook, Twitter, and YouTube are overrun with state-backed troll accounts from Russia. Engagement leads to growth, growth is all that matters, and if the trolls and fake news are engaging, better not to look for them. The oft-quoted Upton Sinclair quote fits perfectly: “It is difficult to get a man to understand something, when his salary depends upon his not understanding it.”

Source: Why Can Everyone Spot Fake News But The Tech Companies?

So in the end, we will have Scambots talking to Anti-Scambots. Result: huge waste of energy and increased CO2 emissions. Make #tech #green already!

While I totally love the idea of keeping the scammers busy using a chatbot, in the grand scheme of things this is just wasted energy.

We really need to start thinking about the energy use for these stupid things.

Source: Somebody wrote an email bot to waste scammers’ time / Boing Boing

SoundCloud tries to allay fears, but streaming needs a business model – CDM Create Digital Music

And, look, while all of this shakes out, musicians and labels continue to pursue a strategy that caters to building relations on all these services. Some of them have great success stories with YouTube, with SoundCloud, with Spotify.

But maybe that’s the point. It seems to be the businesses in between that are non-functioning – or (in the case of futuristic blockchain propositions) just not ready for primetime.

Musicians and labels keep doing the hard work of making the music and fighting to get it heard. Yet investment and attention pours into the middleware between us and listeners – and that middleware really isn’t working terribly well.

Source: SoundCloud tries to allay fears, but streaming needs a business model – CDM Create Digital Music