My Procedurally Generated Music Is Awful (groovelet.com)
120 points by mproud on Dec 6, 2020 | 111 comments


I like this quote “Don’t try to procedurally generate something that you can’t already create”.

I’ve been actively making music full time for a good 15 years now. I work as a writer/producer and despite all of my experience and theoretical understanding I’d probably be hard pressed to come up with a way to algorithmically describe my musical instincts in a way that generates satisfying music.

I think a large part of the trouble comes from the notion that most music requires a certain degree of subversion of expectation. And not randomly selected subversions either. Deliberate choices that feel meaningful. I could spend a lot of time defining one melodic phrase and immediately need to tweak the rule for the second half or a reharmonization. Not to mention rhythmic cadences and groove.

I’ve been doing a deep dive on Autechre recently and am revisiting interviews where they hint at their process. The duo is notorious for using self-built tools that algorithmically generate the sounds, texture, and rhythms. These programs are not expected to run independently or generate great music on their own. They are actively guided and performed by the band as they jam and compose. In this way it’s more of an augmentation of their will that allows them to create such rhythmically dense compositions.

In re-reading these interviews what I found interesting is that for them the act of building the program is one and the same with the act of composition. They’re defining the parameters, limits, and characteristics of a composition and then using their judgement to steer and flex the sonic structures that they’ve built up.

When it comes to generative music, I think they’ve got the right attitude in not expecting the software to do the heavy lifting. It’s instead treated as an extension of their performance.


Experimenting with generative music I put together an extremely simple (and ugly) prototype that only uses random generation (no machine learning of any kind) and very few rules; it is very basic but can sometimes produce things that could pass for elevator music.

For example (with some imagination) this could be the background music in a Chinese restaurant (because of the pentatonic scale):

http://autopedie.medusis.com/#011004155|g113220|g213410|c114...

What's missing is repetition (see the other thread by MauranKilom) and some kind of structure. But as a minimalist experiment it's fun.


It's a fun experiment, but I feel like my cat walking across the piano produces more musical results due to the presence of intent.

Music, like all art, is communication. Randomly generated communication is just noise.

The intent in this exhibit seems to be "this is what these rules sound like", but then it's the generator itself that is the artwork.


Your cat walking across the piano certainly doesn't follow scales, and probably not tempo either. But yes, the more rules there are, the more "musical" it can sound.


It could if it avoided all the black keys.


I'm just a hobbyist but I play guitar and am getting more into production (drum machines, synths, sequencing multi-sampled instruments, etc.). Recording my guitar gives me an appreciation for the fact that as great as virtual instruments are, there is still so much left on the table that a good instrumentalist (let alone a vocalist) is doing, often without even thinking about it. There's much more to it than playing the right notes at the right times if you want it to sound right, much less good. Stuff like the Linnstrument is taking it to higher levels, but I think the horizon for Faux-y Bonnamosa is a little further out than some in the tech crowds might imagine.

The bar might seem a little lower for pure electronic genres but it'll still be some time before Steve Faoki is consistently better than the real deal.


Well, playing and recording a live instrument is much more expressive for many reasons. But the biggest thing is these little random bits and noises that can be present when recording with a microphone, and inconsistencies such as pitch fluctuations. This can be emulated with soft synths and samplers, but with many limitations.

When I produce music digitally, I usually get the most interesting results by randomising devices. For example, in Ableton Live there is a device called "Device Randomiser" that can randomise other devices. A few of these in a chain can produce crazy sounds over time. Then these can be used together with more traditional soft synths, for example by using them as an external vocoder signal.


Algorithmic music has a history stretching back at least as far as Mozart. His algorithmic music sounded good. His non-algorithmic music sounded good too. Not a coincidence.

https://en.wikipedia.org/wiki/Musikalisches_W%C3%BCrfelspiel


I'd be curious to know how tightly Mozart would have stuck to the outcomes that the dice dealt him. I could imagine him also using the dice as a jumping off point for compositional suggestion but disobeying the dice for a round if he wanted to follow an idea in a different direction. Because he could hear the composition as it was being generated, I can just as easily imagine him deviating from the dice-scheme if he saw an opportunity for something interesting. Who knows.

A similar thing happens in Schoenberg's twelve-tone system. For melodic writing, the composer is locked into using the next pitch in their tone-row, but the art of it comes in exercising their ability to repeat that prescribed pitch or change its register however they see fit to get the best possible musical effect.

It's quasi-algorithmic in that the pitch sequence is prescribed, but the writer is allowed to get creative with the other musical dimensions to make it as compelling as possible.

That said, it generally doesn't make for very enjoyable listening. At least, it's an acquired taste. 20th century classical music got pretty far up its ass with "games for games' sake".


Mozart was not rolling the dice. Mozart created the melodic fragments; people who bought the sheet music got instructions to roll their dice to combine the fragments into new melodies.


I produce music as well. I think some genres like techno could have procedurally generated tracks that could then be brought to an artistic level by a DJ mix of those tracks. The DJ would provide the needed subversion and unpredictability to make it worth listening to.


Hmm, I play and make music, but I can see a place for randomly generated music, or at least randomly generated music as a tool for creation. But, as with a lot of randomly generated video games, it's easy to make them suck. I think something more like A Robot Named Fight's approach to procedural generation may be better for music: random generation based on pre-assembled parts, or maybe random generation to create variation for pre-written parts.

A lot of arpeggiators already do something similar.

Though, I do agree with the general premise of that, and think any randomly generated parts should be touched up by hand by someone who knows what they're doing, and should be used as any other tool in a musician's arsenal and not as a way to just generate songs wholesale.

There is a potential for random generation to be used creatively in music, it just shouldn't be used as a crutch to cover lack of skill or as the sole means by which a song is created.


Well for the band in question, Autechre, I'm not trying to suggest that they compose by having the machines generate parts and then they go back to tweak those parts. It's more as though the program they write for a composition is something to be steered and actively adjusted on the fly. THAT's the performance. They're not micro-managing note and rhythm choice. They're performing by adjusting probabilities, or adding new seed conditions on the fly, or controlling meta functions.

I believe their compositional style is more akin to a long form improvisational jam that may then be edited down for conciseness and redundancy. I don't get the impression that they're going back and tweaking the fine details at this stage in their long career. They're way too prolific to keep that up.

Here's an example of some tracks that are typical of them:

https://www.youtube.com/watch?v=cMD_oenMGVk&list=OLAK5uy_lvg...

https://www.youtube.com/watch?v=CRBKryLk29E

As you'll hear it's pretty removed from conventional rhythm, harmony, and form. There are certainly underlying foundations, but they've transitioned into a mode that's almost entirely rhythm and timbre. In fact, by leaning into the generative components and ignoring the need to replicate traditional song forms, melody, or cadence in a credible way they've pioneered a very radical and exciting form of composition with electronic instruments.


I dunno, the first one sounds like random dissonant noise to me; the second one is a bit less grating to listen to, but still not what I would enjoy, or what I would really consider music. Not quite what I had in mind.


Yeah. It’s definitely an acquired taste. I would describe their musical arc as a pair of British B-boys who took super-hard to the beatmaking aspect of early electro hiphop and then pushed that as far as they possibly could.

They’re certainly not songs, in the western sense of having chords and melody. It’s a very different and unrelated paradigm. It’s primarily an exploration of rhythm and timbre. I would not be surprised to discover that a lot of what sounds like musical texture is, in fact, very dense rhythmic sequences.

I find it thrilling. Once you live with it for a bit you can grow accustomed to it and start to recognize the inner logic of how it works. I know that they perceive it as dance music, but I perceive it more as braindance.


Exposure is a lot of it. They have tons of devoted fans and are some of the most beloved musicians on music sites I go on. It just takes one's brain some time to carve new grooves and perceptions of meter, like for a lot of avant-garde music. (Not to say everyone who exposes themselves to a lot of it will like it - it's necessary but not sufficient.)


To give you perspective of the range Autechre has, this track is much closer to mainstream song writing than their generative and left field stuff: https://www.youtube.com/watch?v=qkTJTf7Yvk8

It's an acquired taste. When you go through their discography you start to notice the intricacies of the creations; their rhythms and sound design are really damn hard to replicate.


>The duo is notorious for using self-built tools that algorithmically generate the sounds, texture, and rhythms.

I didn't know that about them but now their music makes much more sense to me.


I was noting this exact thing while watching Township Rebellion mix music live that they produced in the studio.


Do you mind sharing some of your music?


I'd be happy to email you a track or two. I'd want to share recent work, but it's also unreleased so I'd like to not have it be made public until it's complete.


Nice, my email’s up now. Likewise I’ll send something in return, but there’s no obligation to provide feedback.


> most music requires a certain degree of subversion of expectation

And that's usually a really small amount within the confines of "what works".


I've started down the same path that Autechre has gone recently - just using Sonic Pi as the live-looper to build up a library of part-players and a conductor. Then output MIDI to a DAW to use plugins and make further arrangement and mix decisions.

The thing is, the sound design is, in a sense, the easy part. You can buy the sound you want, with the substantial exception of some acoustic and vocal performances. You can feed it into a tremendous number of processing effects and get artistically interesting results by turning the knobs. This has been commodified: you can buy up software bundles for holiday sales right now and have an absolutely vast kit of professional production tools for perhaps $500 USD; less, going to zero, if you focus just on essentials and make good use of freebies, more if you aim for high-effort multisampled instruments and certain high-end plugins. And so we have a golden age for the improvisational electronic jam session. There's no need to build your own patches, though it helps from a sound engineering perspective to have those skills.

But compositions aren't jam sessions; they're a tighter, more directed layer of structure than that. And the problem that rears its head with algorithmic approaches is that it isn't reducible to one algorithm with different values; rather, a different piece is a different algorithm, and the final result is assembled most easily by switching between layers of intentional, hand-picked sequences and automations that extend upon them.

But if you go in agnostic to the principles driving that material, as it seems the author did, the result doesn't go in any coherent direction, and so it doesn't sound musical.

Many people like the old chiptunes on the C64. One of the distinguishing features of many of them is the fact that the sequences are, in fact, algorithmic in this way. They were not composed entirely on the piano and then painstakingly input as a sequence of events. Some were, of course, but memory constraints drove many more towards compression, if only at the level of "loop this pattern", and the lengthiest tunes leaned heavily on algorithms (one example: Times of Lore). There is no requirement for algorithmic approaches to also be academic experiments, and these tunes often demonstrate that.

Somehow this fact gets very little representation in the modern DAW environment. Sequences are uniformly multitrack tapes, printed scores, piano rolls, tracker grids, or lists of events. If they're particularly cheeky they might think to use divisions of a circle. But you can define an interesting sequence in a few lines of code, really: you just need to tap into a pattern that has some intrinsic chaos (predictable, but in a difficult-to-compute way: cellular automata, double pendulums, fractals, or even just a polyrhythm all work) and subsequently shape it to the needs of the larger structure. It's the use of the mathematics that matters, and traditional theory is kind of burdensome in this way, because it often obscures the mathematics in trying to convert it into something that can be muscle-memory trained and called out verbally.
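
To make that concrete, here's a minimal sketch in JavaScript (everything is illustrative: the 16-step grid, the rule-90 choice, the console output) deriving a drum sequence from a polyrhythm plus a cellular automaton:

  // Two regular pulses with periods 3 and 4 give a polyrhythm; one
  // generation of CA rule 90 per bar perturbs the hat line predictably
  // but in a hard-to-eyeball way.
  const STEPS = 16;
  const pulse = (period) =>
    Array.from({ length: STEPS }, (_, i) => (i % period === 0 ? 1 : 0));
  const rule90 = (row) =>
    row.map((_, i) => row[(i + STEPS - 1) % STEPS] ^ row[(i + 1) % STEPS]);

  let hats = pulse(3);    // fires every 3rd sixteenth
  const kicks = pulse(4); // fires every 4th sixteenth
  for (let bar = 0; bar < 4; bar++) {
    console.log("kick: " + kicks.join("") + "  hat: " + hats.join(""));
    hats = rule90(hats);
  }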


Most of the music that most people enjoy doesn't follow the rules of math, but subversions of it that collectively we call a musical culture.

In fact, most of the music ever created (not all, but a majority) was written to dance to, and typically as part of a co-evolutionary process that connected the music and the dance together.

Western art music in the 20th century made a good effort at creating music without reference to existing musical cultures, and although there have been some interesting developments along the way, for the most part (minimalism being a principal exception), it didn't really succeed at generating much music that was enjoyed by a lot of people.


> I’ve been actively making music full time for a good 15 years now. I work as a writer/producer and despite all of my experience and theoretical understanding I’d probably be hard pressed to come up with a way to algorithmically describe my musical instincts in a way that generates satisfying music.

> I think a large part of the trouble comes from the notion that most music requires a certain degree of subversion of expectation. And not randomly selected subversions either. Deliberate choices that feel meaningful. I could spend a lot of time defining one melodic phrase and immediately need to tweak the rule for the second half or a reharmonization. Not to mention rhythmic cadences and groove.

I mean this in the kindest way but... you’re right. You should not seed generated music algorithms. You’re describing something people feel viscerally as something mechanical. I don’t know if that’s how you experience music but... it’s not what most people feel hearing it. And that’s okay.


Ha. I'm not sure that I understand exactly what you think I'm right about or why that requires you to gently explain it as a good thing. But thank you?

I don't think that there's anything fundamentally objectionable about generating music algorithmically. It just tends to be disappointing and not as engaging as more traditional forms of composition at this moment in time. I'm sure it'll be really exciting once it matures.

I like Brian Eno's idea of records someday being programs that you purchase instead of recordings that are frozen in time. Something that you can play and hear a different version every time that it generates. In theory that sounds really lovely.

Sometimes I do think of music as something mechanical, or procedural. A lot of artists do stuff like this with pleasing results. Steve Reich's 'Come Out', 'It's Gonna Rain', or 'Clapping Music' are prominent examples for me. Granted, those three pieces probably resonate more with the academically and musically nerdy types, but they're powerful if you keep an open mind going in and submit to it.

But right now I think the most compelling generated stuff still has an element of human input to steer its development and isn't quite ready to run on its own.

I think some of the most interesting stuff today is music that's composed by hand but borrows the affect of procedural music. Stuff like minimal techno that evolves very slowly. I'm particularly fond of the work of Ricardo Villalobos:

https://www.youtube.com/watch?v=OZWdWzMdndc


Random 2 cents: A lot of what makes music (at least the kind that most people like to consume) pleasant is repetition and self-similarity. Consider the "sometimes behave so strangely" speech-to-song illusion (https://en.wikipedia.org/wiki/Speech-to-song_illusion) for a fascinating illustration of this.

For many genres, you fundamentally have repetitions at every power-of-two interval. For example, the hi-hat repeats every eighth. Kick drum every quarter. Snare every second quarter. Maybe an effect every bar, a rhythmic variation every second bar, phrasing covering four bars. Cadence repeats every eight bars. Melody maybe 16 bars. Verse or chorus 32 bars. ABAB song structure. It's repetitions all the way down.
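
As a sketch of just how mechanical that layering is (JavaScript; the instrument names, periods, and 16-step grid are my own illustration):

  // Each layer repeats at a power-of-two interval within a 16-step bar.
  const layers = [
    { name: "hi-hat", period: 2,  offset: 0 }, // every eighth note
    { name: "kick",   period: 4,  offset: 0 }, // every quarter note
    { name: "snare",  period: 8,  offset: 4 }, // every second quarter (backbeat)
    { name: "crash",  period: 16, offset: 0 }, // once per bar
  ];
  const grid = layers.map(({ name, period, offset }) =>
    name.padEnd(7) +
    Array.from({ length: 16 }, (_, i) => (i % period === offset ? "x" : ".")).join("")
  );
  console.log(grid.join("\n"));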

As with all "rules" in music you can ignore them to whatever degree you like. But ignore too many and it just sounds like a mess.

On a related note (no pun intended), I am starting to notice that, for me, getting tired of a song coincides with "hearing" the repetition of material (e.g. chorus) at larger and larger intervals. Of course you know that the chorus repeats, but here I specifically mean that it sounds like an echo of the chorus a minute ago.


Yep, I was thinking along the same lines. The music we hear in day-to-day is often very simple.

Besides structure, chords are often the same. Most of the time it's I-IV-V, sometimes I-IV-V-vi... I even saw a video today mentioning how the I-ii-IV-V of Cure's "Just Like Heaven" is an uncommon progression. For soundtracks it's often just two chords vamping. Melody notes also rarely stray from what sounds consonant with the current chord. Dissonance is only for passing notes. On the strong beats it's certainly something consonant. Sometimes it's all just a pentatonic. You also gotta emphasise that tonic everywhere. Melodies also need structure: you gotta have that AABA (or something like that) in your verse/theme/phrase, almost like a fractal. Broken rules only happen once or twice per song, or not at all.

The more good music I dissect the more impressed I am with how simple it is.


Not all but too much of that description applies only to western popular music.

Indian classical music (probably among the most melodically advanced in the world) doesn't use anything like the song structures you're referring to (it also doesn't use harmony).

The ease of certain kinds of music production has led to us living in a decade (or two) of huge amounts of what I can only call "audio candy for ears trained on western popular music". There's nothing wrong with this music, but it's like eating cookies all the time, not realizing there are other delicious kinds of food out there.


Of course, I never claimed it was a universal description.

My point was not that we should create only simple music, but rather that we should understand how simple music is created before jumping into anything more complex than that, or even into a general solution like so many people in procedural generation seem to do. You need to understand the craft itself, not just the theory.

Musical styles are not unconstrained. If you want to produce procedural Carnatic music, you can replace my description of western music with the equivalent for it, like zozbot234 started doing. Then you'll have a starting point.


Indian classical music can be understood (admittedly in very broad terms) as being based on melodic elaboration of what in Western music would be the bass line (aka basso continuo or thoroughbass). The way this elaboration works is very similar to what one finds in the Western musical tradition (as clarified e.g. by Schenker), it's just applied in a very different way in terms of structure (indeed there's no equivalent to the 'lead' melody, to which diminution would be applied in Western music!), and obviously with a strong focus on a single monophonic 'line'.


I feel as if you just said "A is a lot like B, or rather A is made of the same stuff as B but built totally differently".

From my admittedly not deep knowledge about Carnatic music, it would be a serious mistake to think of it as "being based on melodic elaboration of what in Western music would be the bass line". That's not how its composers and practitioners think of it, and critically, it misses out entirely on the whole concept of a raga, which defines not just a set of notes (the way a western scale does), but also their ordering when ascending and descending, and comes with a large amount of cultural attribution (evening ragas, seasonal ragas, emotional ragas and so on). What is played is not an elaboration of a line, but an exploration of the raga.

If you were going to draw comparisons to western musical practice, I would think that Schoenberg's 12 tone/serial experiments have at least as much in common.


I'm not sure if you've seen it already, but here's an essay-length elaboration of some of these ideas:

https://aeon.co/essays/why-repetition-can-turn-almost-anythi...

It references the same speech-to-song illusion you have described.

I found this essay, and its ideas, really helpful in understanding why my brain likes certain types of music -- some compositions really trigger this particular part of my perception.


Interesting, the speech to song illusion reminds me of a related visual effect ... https://en.wikipedia.org/wiki/Leaning_tower_illusion


Repetition legitimizes.

Repetition legitimizes.[0]

[0]: https://www.youtube.com/watch?v=9MzKx0fKg5o&feature=youtu.be...


Hello, I made some ML generated videogame music: https://soundcloud.com/theshawwn/sets/ai-generated-videogame...

There are ballroom themes, battle themes, boss fights, and the last one is my attempt at a final battle track. If there's interest, I can walk through how the model works. Gwern's writeup is here: https://www.gwern.net/GPT-2-music

All 25 tracks were made in a single night. That's the kind of boost you can get from ML.

Note that it's not entirely automatic. But that's a benefit, not a downside. It's more like an AI instrument that you play.

The songs each contain the ABC source code that generated it. (Click on the song's description to see.) You can replicate the song for yourself by copying the source code into a text file, then running something like this:

  abc2midi "${1}.abc" -o "${1}.mid" && timidity "${1}.mid" -Ov -o "${1}.ogg" && timidity "${1}.mid"
(That's from my bash script called "abc2ogg".)

An effective model will "group" its data in a way that the model can understand. That's the essence of why this model worked so well.


To be fair, these results can be taken as representative of ML art output being distinct in its qualities and in its ease of production. It immediately reads as a hazy simulacrum when listened to by an interested, critical ear. Which can work to your point: there are vast production use cases for ML-quality media as background dressing, e.g. music for cheap/free video games, infotainment videos, or any application where cheap production libraries would suffice. Huge amounts of human labor have been thrown at making throwaway media assets. Similarly, a large proportion of serious, foreground art works consist of mundane background chores that ML can achieve with incredible efficiency.

But when ML generation is placed in the foreground, as the main focus, it appears as an unnervingly vacuous simulacrum. I can’t yet see it being something beyond shovel-like in its function and form.


I partly agree, but I like "Crossing The Channel": https://soundcloud.com/theshawwn/crossing-the-channel?in=the...

The reason the beat is so unique is because the model started out by making a mistake. Or at least I thought it was a mistake; it took several measures to hear what it had in mind. GPT did its usual "generate the next most likely thing," which happens to be a rhythm that sounds good. In some sense, it's impossible to say whether it was a mistake, or if it "planned" to write it like this.

"The Cruise" is lovely and relaxing: https://soundcloud.com/theshawwn/the-cruise?in=theshawwn/set...

and "Green Mountain" is just fun: https://soundcloud.com/theshawwn/the-green-mountain?in=thesh...

As for the effect on the art world, I think indie gamedevs will be happy they have some nice tunes without having to work with a musician. There's no way to know what the wider implications will be, though. Perhaps kids will grow up hearing stories of how granddad learned to play the guitar himself rather than asking the computer to write a hit song.


I think they can be used as inspiration sources. Like recording automatically created arpeggios and keeping only what pleases you.


This may be marginally relevant: the latest post on the Digital Antiquarian [0] briefly talks about "Sid Meier's CPU Bach", a procedural music generator that (the year being 1994) does not use ML in any way and only relies on music theory. It's not so famous because it was a 3DO exclusive (!), but there are some YouTube videos that showcase its output [1]. It may or may not fall into the "bland and samey" category. After all it just tries to emulate the style of one artist. You be the judge.

[0]: https://www.filfre.net/2020/12/ethics-in-strategy-gaming-par...

[1]: https://www.youtube.com/watch?v=8QLQ_mYXf1I


Based on the replies so far, I think most of you would find Schmidhuber's Compression Progress Drive interesting. The idea is essentially that of an adversarial agent with two components. A attempts to compress its entire life history, breaking it all into neat chunks. B gets off on A's progress in so doing, and finds/creates patterns that are just within A's reach. So B's intrinsic motivation comes from discovering novel patterns that are broken down by A.

You can get really far with this idea, and music is the perfect example. We enjoy patterns. Complicated patterns become more interesting with listening experience. Simple patterns are quickly exhausted. So long as there is structure, there is opportunity for enjoyment.

Of course, Schmidhuber's theory is a bit more complicated than that. And his own opinion of it is a bit ... grandiose. Title of one of his papers on the topic: "Driven by Compression Progress: A Simple Principle Explains Essential Aspects of Subjective Beauty, Novelty, Surprise, Interestingness, Attention, Curiosity, Creativity, Art, Science, Music, Jokes"

Nonetheless, I think it's a neat theory.


I've spent days on this problem and the hard part is not the music theory. Even the Moonlight Sonata will sound dinky if you are playing it with sine waves. Generating an interesting synth sound is at least as hard as generating an interesting melody.

If anyone has any good psychoacoustic resources, I'd be all ears.

Melody is quickly becoming secondary to sound design in modern music, for better or worse.


Here's some Magenta-generated music played through a Moog One[!]:

https://soundcloud.com/zndx/20201102-01-415v2

I think it would be great to set up a studio with a few modern analog synths 'in the loop' for automated training.

[!] recorded through a Tascam 688 I found at a thrift store.


For sure! I've been putting random magenta drum and melody generations through this free synth Vital ( https://vital.audio/ )

The second step is to choose a lead preset based on random letter generation (whatever is closest). (I believe this is the hard part as far as generation goes... but there are patterns to oscillator presets.)

Third step is import the generated MIDI into your DAW like Logic or Ableton. Fourth, pick a key and quantize the notes. Finally pick a pad and match the bass notes.

I'll generate one right now ... all AI/randomized except for the design of the preset. Be back in 5 minutes.


https://soundcloud.com/nicholasbulka/random-magenta

lol, if this is interesting to anyone let me know and I'll put together a yt video.


OP's examples all use generated waves, but tone.js can use samples that sound pretty realistic and can be much more pleasant to listen to IMHO.


I figured out how to do it :)

Repetition is key to people being able to latch onto music, but you can still deeply randomize things. What I'm doing is driving everything off Music Thing Modular Chord Organs, with firmware I programmed. Everything's based off a chord (or note) choice within one of twelve keys, and the secret is that you've got to modulate by circle of fifths, not chromatically. Also, chords are better arranged off pentatonic roots, rather than diatonically.
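
A rough sketch of that core move (this is not the poster's Chord Organ firmware, just the idea in JavaScript; the major-triad shape and section count are placeholders):

  const NOTES = ["C","C#","D","D#","E","F","F#","G","G#","A","A#","B"];
  const PENTATONIC = [0, 2, 4, 7, 9]; // chord roots drawn from pentatonic offsets

  let key = 0; // start in C
  for (let section = 0; section < 4; section++) {
    const root =
      (key + PENTATONIC[Math.floor(Math.random() * PENTATONIC.length)]) % 12;
    const triad = [0, 4, 7].map(iv => NOTES[(root + iv) % 12]);
    console.log("key " + NOTES[key] + ": " + triad.join("-"));
    key = (key + 7) % 12; // modulate by circle of fifths, not chromatically
  }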

Do that and you can go to challenging jazz chords, but things hang together in a logical/musical way. I've been evolving this system for years: here's a jam with heavy reliance on procedurally-generated chords and arps: https://soundcloud.com/airwindows/therefore

And this one's more recent, but set to wander a bit more far afield as far as the random range goes. Chords and roots and arps generated procedurally, plus other playing on top. https://soundcloud.com/airwindows/fogbound-2020-11-03


Are you Chris Johnson? If yes, thanks for everything!! ;-)


Yup! And you're quite welcome! I try to put as much code as I can up on GitHub, recently working on some more Music Thing Modular Chord Organ firmware. I've got simple versions of the oscillators running up to 12 bit/2.9 megahertz, and a chord version running happily at 768k :) this greatly improves aliasing performance, both at high and low frequencies.


> explaining in painful detail that the soundtrack to Fart Chalice IV actually takes advantage of phrygian pentameter and post-modern phrasing to create artificial dissonance between the fourth and sixteenth notes in an alternating jazz-inspired progression.

> I can just barely read music. I can’t deal with that!

Sadly, the author has this backwards. Reading music has almost nothing whatsoever to do with the music theory (admittedly gobbledy-gook) in the first paragraph.

If you're a coder, you can do music theory. You might choose not to, and that's OK. But don't make it out to be harder than it is. There are lots of places to start.

The problem with not reading music is not that you won't be able to understand music theory. It's that it is very hard to find anyone who will explain it to you (in person or in writing) without the terminology (and conceptual skeleton) of western notation and 18th century western tonal harmony. Neither of these things is actually central at all, but good luck avoiding them if you ever try.


My problem is that whenever I've picked up a music theory textbook, the author opens with "put your hands on the keyboard... doesn't playing <foo> <bar> and <baz> feel natural?"

As someone who has had zero music training, but has considered generative music as a possible hobby, I'd like to at least start trying to understand the domain before committing to a physical instrument and working on developing muscle memory. I'd really love a rigorous music theory textbook that does not assume you already play an instrument.


To be honest the math is super simple, but standard music theory piles a heap of messy terminology on top of it. For years I've been meaning to write a "programmer's guide to 85% of western music theory" article, but never quite gotten around to it.

The core of it boils down to stuff like:

    * scales are arrays of note offsets:
      * majorScale = [0, 2, 4, 5, 7, 9, 11]
    * to change a scale's key, offset each note:
      * dMajor = cMajor.map(n => n+2)
    * "modes" typically means scales that are reindexed:
      * dorian = major.map((n,i) => major[(i+1) % 7])
and so on. Obviously knowing that stuff doesn't make the music theory any more intuitive - if a textbook says "a V/ii chord", I have no idea what that sounds like. But deriving what notes it has, or writing a function to play them, is super easy once the terminology is demystified.
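
For instance, deriving the notes of a V/ii chord really is just a couple of array lookups. A sketch (the function name and dominant-7th shape are my own choices):

  const major = [0, 2, 4, 5, 7, 9, 11];
  const NAMES = ["C","C#","D","D#","E","F","F#","G","G#","A","A#","B"];

  // V/x = the dominant chord built a fifth above scale degree x.
  function secondaryDominant(keyRoot, degree) {
    const target = (keyRoot + major[degree - 1]) % 12; // root of degree x
    const domRoot = (target + 7) % 12;                 // a fifth above it
    return [0, 4, 7, 10].map(iv => NAMES[(domRoot + iv) % 12]); // dominant 7th
  }

  console.log(secondaryDominant(0, 2)); // V/ii in C -> [ 'A', 'C#', 'E', 'G' ]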


Here are my notes on it [0] [1].

Feel free to extend, "steal", add or whatever.

I'd also love to see a book, blog post or article from someone who's more competent than myself.

[0] https://github.com/abetusk/dev/blob/release/notes/MusicNotes...

[1] https://github.com/abetusk/scratch/blob/release/notes/Music-...


A book is a lot of work, but a blog post would be a great start!

Regarding modes, your presentation is excellent and true (AFAIK), but why don't some scales have the same modes as other scales? Given scales with the same number of notes, shouldn't they have the same modes?

Yet for example, while the modes of the major scale are Ionian then Dorian, Phrygian, Lydian, Mixolydian, Aeolian and Locrian, the modes of the double harmonic major scale are [name of the scale] then Lydian, Ultraphrygian, Hungarian, Oriental, Ionian and Locrian.

Is it only a problem of nomenclature or is there a profound truth behind it?


I think the short answer is nomenclature. Much of what makes music theory difficult is that the terms and notation systems have built up over centuries, and they come from musicians who were concerned with what's useful, not from programmers who understood the math. So pretty much everything has nine different names, and pretty much every term gets used to mean nine different things.

That said, for modes, AFAIK the names are all 100% arbitrary and used for historic reasons. For scales with varying numbers of modes, the first thing that jumps to mind is that some scales have modes that are identical to the scale, or to other modes, so references probably leave them out (e.g. consider the modes of the whole tone scale!). But there are probably also cases where this or that mode gets omitted because whoever wrote the authoritative textbook about that scale thought it sounded awful, or felt it shouldn't be considered a mode for whatever reason.

There could be other stuff going on I don't know about - when I was learning this stuff I ignored big swaths of material that didn't seem useful for programmatic music.


> That said, for modes, AFAIK the names are all 100% arbitrary

This annoyed me a lot when learning the basics. One thing I noticed when I experimented with what things would be called without the traditional names is that I would lose the "namespace" of what I'm thinking about. I think the names are awful, but Dorian means something specific, C something specific, etc. You can map it to just numbers like... scale 1 for note 1, mode +1, chord 5 and it may be initially easier to "calculate" the result, but also more confusing.

But then you just get used to the names with enough practice. A bit of a Stockholm syndrome. They're still awful, but familiar enough you stop caring.


One unfortunate dangling problem with the historical nomenclature is this:

   "scales have modes"
This is wrong. A scale is an interval series. A mode is an interval series. There is no difference between them.

It just so happens that there are ways to permute some interval series to generate others. Because western music is so focused on the interval series we call the major scale, people noted that you can permute this into a series of other interval series. But none of those interval series (Aeolian, Phrygian, Lydian etc) "own" or are the basis for the others: they are all related by permutation. You could just as easily say "Modes of the Phrygian scale" and you would still be referring to the same set of interval series.
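
In code the symmetry is obvious: every mode is a rotation of one array of step sizes. A sketch (the name list is just the conventional ordering):

  const majorSteps = [2, 2, 1, 2, 2, 2, 1]; // the interval series, as step sizes
  const rotate = (a, k) => a.slice(k).concat(a.slice(0, k));

  ["Ionian","Dorian","Phrygian","Lydian","Mixolydian","Aeolian","Locrian"]
    .forEach((name, k) =>
      console.log(name.padEnd(11) + rotate(majorSteps, k).join(" ")));
  // Start from any of the seven (say Phrygian) and rotate again: you reach
  // the same set, so no single interval series "owns" the others.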


> This is wrong.

Modes are (typically) permutations of scales. The fact that a permutation of a scale is also a scale doesn't mean it's wrong to say that scales have permutations.


Would love to read more about this.

Until your article is done, can you recommend other sources that go into this?


It doesn't contain code, but "Musimathics" by Gareth Loy was way more useful to me than any vanilla theory book. I think most other musically ignorant programmers would agree.


I don't really remember finding any good dense sources, I mostly slogged through wikipedia articles and gradually decoded things. I really ought to get around to that article...


Music theory is based on generalizing patterns that are found in actual music. Your "muscle memory" is an integral part of understanding this generalization semi-rigorously. About the only thing that can be called loosely 'rigorous' wrt. music theory is the derivation of the diatonic scale from "stacking" fifths.


I don't agree with your dismal final sentence.

Musical set theory [1] is an examination of scales/harmony without giving preference to western modes. The harmonic series is a physical phenomenon at the root of all musical cultural behavior, and is a completely rigorous subject.

The diatonic scale isn't derived by stacking fifths, since that would be tautological: a "fifth" is a concept taken from the diatonic scale. What you likely meant was "derivation of the diatonic scale by stacking key integer ratios between frequencies, particularly the ratio 3:2".

That already elides the whole concept that dissonance and consonance correspond to the presence or absence of non-integer ratios between frequencies with notable energy. It also skips over the admittedly subjective way in which introducing more dissonance by way of "odd" ratios can still be perceived as musically interesting (though almost always in culturally determined ways).

[1] this isn't pure MST, but it's a great place to start even if it is limited to 12TET tuning: https://ianring.com/musictheory/scales/


A book that I came across the other day (via a recommendation from YouTuber Adam Neely): "Harmonic Experience" by W.A. Mathieu. It's a little expensive as a first dive into this stuff, but it does look like a very natural way to lead you into and through the actual experience of western tonal harmony, often by singing along with a drone (easy to generate on today's computers).

https://www.amazon.com/Harmonic-Experience-Harmony-Natural-E...


If you want something practical / simpler, have a look at https://www.youtube.com/user/MusicTheoryForGuitar (it's guitar-oriented, but contains general content) - they're not in a specific order so you'll have to pick and choose your topics. It's a bit better than a book, because you'll hear the examples in most cases too. Then you could go to more advanced ones like 12tone or Adam Neely's Q&As.


I feel for you on this. I was in the same boat. Even talking to musicians, they couldn't explain it to me with the rigor I wanted.

For me, the real insight came when I understood why the 12 note chromatic scale was chosen, why chords "sound good" and that a lot of the confusing terminology hides pretty simple concepts.

I don't feel like I could make any music, generative or otherwise, that sounds passable but here's what I've learned so far:

* As a good approximation of the human auditory system, frequencies are perceived on a logarithmic scale

* Rhythm is tied heavily with language and the tempo of speech, word length and frequency [0]

* Note combinations sound more "pleasing" when the ratio of their frequencies is small as a reduced fraction [1]

* The 12 note chromatic scale is a good compromise of number of notes and enough pleasantly sounding note combinations

* The 12 note chromatic scale is generated from a base frequency of 440 Hz multiplied by powers of the twelfth root of 2

* The 12 note chromatic scale can be further restricted to the diatonic scales to help further restrict the note set to allow for ease of composition

As far as I can tell, a lot of music theory centers around the different diatonic scales ("Ionian", "Aeolian", etc.), which are just restrictions of the chromatic 12 note scale to some notes that sound reasonable together. I have some very rough notes on it [2], though I'm not sure how readable it is.
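
A sketch of those frequency bullets above: pitches from powers of the twelfth root of 2, then the diatonic restriction (A4 = 440 Hz; the rounding is just for display):

  const A4 = 440;
  const semitone = Math.pow(2, 1 / 12);
  const freq = (n) => A4 * Math.pow(semitone, n); // n semitones above A4

  const MAJOR = [0, 2, 4, 5, 7, 9, 11]; // diatonic restriction of the 12 notes
  console.log(MAJOR.map(n => freq(n).toFixed(1) + " Hz"));
  // -> [ '440.0 Hz', '493.9 Hz', '554.4 Hz', '587.3 Hz', '659.3 Hz', ... ]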

I'm also in the same boat in terms of "instrument choice", though I would much rather invest in some live coding environment, like Gibber or Sonic Pi, or some software DAW system like LMMS. I still haven't found something that gets at the right level of functionality and abstraction that I want.

[0] https://github.com/abetusk/papers/blob/release/Music/patel20...

[1] https://github.com/abetusk/papers/blob/release/Music/measure...

[2] https://github.com/abetusk/dev/blob/release/notes/MusicNotes...


Don't feel bad, plenty of human-made music is awful too.


Among the many ways of creating good music, one is to create music that is awful but becomes great through the passage of time and changing tastes. Most of the music that I perform sounds awful to most people. It's kind of funny when it comes up in conversation: people ask me what kind of music I play, and I tell them, then there's that awkward moment when they're trying to think of something nice to say about it. I've figured out the easiest thing to say is: "I play music that most people hate."

Sometimes awful music stays awful too.


I think most people would put Trout Mask Replica in this category. For reference: https://www.youtube.com/watch?v=9CeLjmIW5wk

After you give it a listen, consider that it got one of Pitchfork's rare 10/10 reviews and ranks high in many lists of the greatest albums of all time.


Indeed, my friends and I threw a band together, to cover that album, and quickly discovered that it was impossible, so we turned to punk rock instead.


I'm intrigued. Anything you can share?


Experimental, avant garde jazz. Never been recorded.


If you don't record it, how can it possibly ever become great with passage of time?


Ah, good point. I'd say the genre can gain traction, and our performances can also evolve as we figure this stuff out ourselves. But the bands that I play with don't tend to record.


This is a Very Hard Problem™. All current music gen is either:

1) obviously algorithmic and pattern/rule based, or
2a) generated from computational statistics (deep learning) but handpicked to show only the best results, or
2b) applied to very simplistic musical styles that sound roughly fine no matter what you do (techno, elevator pop, etc.), or
2c) not handpicked, but existing within a strange "uncanny valley" where you can tell it's not quite right and just feels a little "off" the whole time


I think (1) potentially has a place for cases where it’s supposed to be background music that you wouldn’t really pay attention to (but shouldn’t repeat).

It’s pretty easy to take a style of music and produce something that sort of goes round and round modulating into different keys and so forth without actually going anywhere in particular.

But what you’re really doing is composing a parameterised piece of music and then randomly substituting parameters as you play it. This is what people are doing in real life with things like cocktail piano anyway. If you can’t compose to begin with then you don’t stand a chance, however.
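
As a sketch of "parameterised piece, random substitution" (JavaScript; the progressions and fill names are placeholders, not a real cocktail-piano model):

  // The form is composed by hand; only leaf-level choices vary per playback.
  const progressions = [["I","vi","IV","V"], ["I","IV","ii","V"]];
  const fills = ["arpeggio-up", "arpeggio-down", "held-chord"];
  const pick = (xs) => xs[Math.floor(Math.random() * xs.length)];

  function playSection() {
    return pick(progressions).map(ch => ch + " (" + pick(fills) + ")");
  }
  console.log(playSection()); // different surface each time, same composed shape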


"Uncanny" is quite subjective when it comes to music. I know of at least 'DeepBach', Feynman Liang et al.'s (2016) 'BachBot' and Daniel Johnson's (2015) 'biaxial-rnn' as algorithmic music systems that could all be described as 'quirky' in many ways, but definitely not 'off'. And they're not limited to 'simplistic' music styles - all of them deal with Western music of the common practice period, which is quite non-trivial. Their main limitation is modeling structure beyond the 'short musical phrase' level, but that's to be expected.


Those are all very clearly quirky enough to be considered "off" by people who understand Bach's music and the classical tradition more generally. And the large scale structure problem is one of the primary reasons the whole thing is so hard in the first place.


But my point is that 'quirky enough to be considered off' is a rather subjective and context-dependent POV. The main issue really is the lack of consistent structure beyond the very short term, but aside from that the music seems to replicate many interesting features of the style, and it's certainly interesting enough to be listenable. Clearly it doesn't sit within the classical tradition of composed music, but it really is quite passable as 'noodling' or improvisation.


Regarding 2a: human music that most people listen to (e.g. on Spotify) is also handpicked to show only the best results.


As a composer, I know my craft is far more than just deciding how to make music. It's about getting in the mind of the listener, making them second guess what's coming, or to make them experience an emotion that's already in their brain, it just has to be coaxed out. All art does this, and it takes a uniquely human understanding of the human brain to make it happen successfully.


Not sure why you’ve been downvoted.

I agree it’s about triggering emotions... and for that having the ability to empathise helps.


These attempts at procedurally generated music remind me a bit of twelve-tone technique (https://en.wikipedia.org/wiki/Twelve-tone_technique) where Schoenberg and other composers came up with new rules for writing music. The resulting twelve-tone music sounds awful (to me), worse than the procedurally-generated examples from the article.


The twelve-tone technique is like a Gaussian blur for melody. It generates a melody which is all-but-impossible for a listener to focus on actively.

That can be useful in certain scoring contexts where you want the music to be background while dramatic on-screen elements occupy the foreground.

That's a pretty limited use case, though... which is basically my point. In most circumstances, the twelve-tone technique doesn't produce results many people find compelling.


Here's an example of some nice procgen music.

https://github.com/devinroth/GenerativeMusic

It's a lot more opinionated; it's not just a blank slate at the start, but that's the only way to generate stuff that sounds nice, as things stand at the moment.


The music being generated in the article doesn't sound like it has any coherent structure. When you procedurally generate terrain, for example, you do it in a way that is coherent and has structure similar to what you find in nature. You have areas that are the same height, areas where the height varies slowly or quickly. But you don't generally just have random bits of terrain stuck together. This music sounds like random bits of notation stuck together. There doesn't appear to be a time signature or chord progression or any melodic development. I think building that type of structure into it would be a good first step.


As someone who's worked on this a lot, this article resonated but I disagree with the conclusions. (Or at any rate I feel like my procedural stuff is listenable, though I can't read or compose music.)

I think the author's biggest problem is the bottom-up approach of creating lots of raw clips and then at a higher level trying to assemble them. I've found music generation to consistently be a very similar task to text generation, and to me the author's approach seems analogous to generating lots of random sentences and then trying to assemble them into a short story.

Everything I've done that's worked has been completely top down - the core of the engine determines the state of the song (we're in the A section, bar 7, the current chord is V/V, etc.) and the low level parts generate something to match all that. E.g. the melody part might decide to play the root tone of the current chord, but it has to ask the core what note that would be.
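
A toy version of that top-down shape, with the part querying the core rather than inventing notes (all names here are mine, not the actual engine):

  // The core owns the song state; parts ask it for concrete pitches.
  const song = {
    section: "A",
    bar: 7,
    chord: [2, 7, 11], // pitch classes of the current chord (D, G, B)
    chordRoot() { return this.chord[0]; },
  };

  const melodyPart = {
    nextNote(state) {
      return 60 + state.chordRoot(); // MIDI 60 = middle C, so 62 = D
    },
  };

  console.log(melodyPart.nextNote(song)); // 62: the root of the current chord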

The hard thing about this approach is that you need an environment where you can dynamically play notes of an arbitrary pitch/length/instrument, which either means lots of sound fonts or realtime audio gen. I've been doing this with WebAudio, and while it works it's been very painful - for me the "play sounds" part of procedural music has been more difficult than the "compose music" part.


Have you tried tone.js? It's pretty easy to set up and play with, and lets one use a few samples to imitate any instrument.

Edit: tune -> tone.


Do you mean tone.js? If so then yes I started out using it, but wound up rolling my own engines. I don't remember specifically why, but something to do with wanting less magic and more low-level control.

I wanted everything to be realtime web audio, so I wound up just building instruments out of oscillators and so on. Here's a very old sample to give the idea:

https://fenomas.com/2017/12/advent-17/


Yes, sorry, it's tone.js

I find using the sampler very easy and convenient.


Interesting. Is there anywhere I can listen to the music generated by your algorithm?


What I've done has all been ad-hoc logic to build melodies out of patterns and whatnot, as opposed to an ML-style algorithm that learns from input data. That said, here are some samples:

a jazz chiptune: https://fenomas.com/2017/12/advent-17/

Zelda as gypsy jazz: https://fenomas.com/2017/12/advent-13/

vaguely fuguelike: https://fenomas.com/2017/12/advent-18/


Dunno what you were expecting. Granted, it's not Mozart, but then, if music were easy, someone would have solved it and put the whole industry out of business. But the examples you posted - which, granted, were apparently the cream of the crop - sound decent enough. I think this voxel fellow is going too hard on you. If you filtered it to only capture the decent stuff, it would be fine to have in the background while the player was focused on the game. Though I suppose whatever you get someone to compose for you will sound that much better.

On that basis, I would definitely grant the implied generalization of the title: if the audio samples in this post aren't good enough for you, then procedurally generated music won't cut it. On the other hand, if you want procedurally generated music, and you could see yourself living with the music from this post, it would definitely be workable.


Iannis Xenakis experimented with stochastic music, as well as compositions that were derived from his own architectural designs. A lot of it is very hard to listen to, but he was a trailblazer in this direction.


Dance/electro/pop/(?) band YACHT released an album that basically started as an AI mashup of their previous albums. It's actually not awful(!), but presumably is heavily reworked.

https://arstechnica.com/gaming/2019/08/yachts-chain-tripping...


A lot of sounds are procedurally generated (e.g. noise sources in synths). Composition is much harder though.

My friend once said: Would you ever read a randomly generated book? ... so why do you think that any other art form would become palatable under procedural generation? (game levels, music etc.).

Procedural generation is quantity over quality. It makes sense in some settings, but not in 1st order art.


> It makes sense in some settings, but not in 1st order art.

I'm forced to disagree -- in part -- here. There are plenty of beautiful and stirring (or even just intriguing) generative pictures out there. I agree in part, though: from my own experience, generative music composition is harder than generative imagemaking. Probably because there are many, many more dimensions of choices that need to be made.


> Would you ever read a randomly generated book?

1. You may have, and didn't notice! ;-)

2. Games use randomly generated terrains and it feels just fine.


Terrain yes, NPC faces yes. Levels... no


Algorithmic composition is indeed a wicked hard problem.

But are you aware that there exists a whole industry for licensing music for the kind of thing you're doing? Getting music licenses is not hard. It might be too expensive for your use case, but this is what many game companies do.


Whatever is generated needs to align with the scales’ “mode” so that each part’s i,iv,v are the same notes. Then sections of 16 bars verse, chorus, 16 bars, chorus, bridge, chorus or some variant of song structure and I’m sure this would crank out hits.


Listening, I feel the tunes have maybe some short-term pattern from line to line. But they don't go anywhere. Just noodling along without a plan. That's got to be part of music: it has to go somewhere?


It would take a long time for one person to make PGM-created music (melodies, variations, chord progressions, rhythms, ABACA-type forms, etc) decent enough to be sort of satisfying.

I'd bet that in that same time or less, you could learn to improvise equally decent stuff - probably far better - yourself on at least two instruments. If you've been listening to music for 5-10 years, you'll probably never 'train' a machine to know what music is as well as you've been trained.

Whichever way you go, if the product can't compete with the flood of stuff that's already out there (including production quality), I can guarantee you what people's reaction will be. Not pretty.


I think part of the problem is that the 'simple plan' at the foundation of the attempt was this:

1. Have a background server process generate 8-second three-part-tune clips

2. Use some basic heuristics to guess at the key signature (“C# major”) of the clips, evaluate their intensity (lots of drum hits? loads of notes?) and save them in a huge clip database.

3. Create a search interface for the clip database.

4. Have the client request clips from the server in a specific key and intensity.

5. Weave two or three clips together, repeating them a couple of times, to make a full “song”.

I think that's just a bad plan, and if you start from that, any embellishments or improvements you try to make later just won't get anywhere.

Why is it a bad plan?

Working out what key signature a short phrase of music is in is not that simple.

e.g. you can look at the notes and say 'well that fits C major', but actually C major and A minor contain the same notes, and the difference between them is subtle; it's to do with how often the root note is used, or even more abstract stuff like how often the root note is implied. There is kind of a probability distribution of notes for each key signature, and that distribution is how we recognise them. Sometimes there's ambiguity, and that's part of the art.

If the generated clip only uses (say) 6 semitones, there might be a multitude of different key signatures or scales that it could _potentially_ fit into (pentatonic, Phrygian, major, minor, etc., with various different root notes), but the most accurate one musically would be very hard to determine out of context, because in reality the context around the clip (say the 20 bars before and after) plays a big part in how a 'clip' would be experienced.
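
For what it's worth, the standard answer to this in music information retrieval is the Krumhansl-Schmuckler key-finding algorithm: correlate the clip's pitch-class histogram against a rotated major/minor profile and take the best match. A compressed sketch (the profile numbers are the published Krumhansl weightings):

  const MAJOR_PROFILE = [6.35,2.23,3.48,2.33,4.38,4.09,2.52,5.19,2.39,3.66,2.29,2.88];
  const MINOR_PROFILE = [6.33,2.68,3.52,5.38,2.60,3.53,2.54,4.75,3.98,2.69,3.34,3.17];

  function correlation(a, b) {
    const mean = (v) => v.reduce((s, x) => s + x, 0) / v.length;
    const ma = mean(a), mb = mean(b);
    let num = 0, da = 0, db = 0;
    for (let i = 0; i < a.length; i++) {
      num += (a[i] - ma) * (b[i] - mb);
      da += (a[i] - ma) ** 2;
      db += (b[i] - mb) ** 2;
    }
    return num / Math.sqrt(da * db);
  }

  // histogram: total duration of each of the 12 pitch classes in the clip
  function guessKey(histogram) {
    let best = { score: -Infinity };
    for (let root = 0; root < 12; root++) {
      const rotated = (p) => p.map((_, i) => p[(i - root + 12) % 12]);
      for (const [quality, profile] of
           [["major", MAJOR_PROFILE], ["minor", MINOR_PROFILE]]) {
        const score = correlation(histogram, rotated(profile));
        if (score > best.score) best = { root, quality, score };
      }
    }
    return best; // C major and A minor now score differently, as noted above
  }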

And if you're going to generate random clips to begin with (step 1), I'm hoping those weren't random clips where any of the 8 tones or 12 semitones in play have an equal probability of being used, because that's just going to sound muddy.

I guess he was going for a system that had very few rules and then tried to get the system to learn from there, but I think that's just too muddy a starting point to ever get anywhere.

And obviously purely generative music is hard, but I think what I'm talking about here explains why the stuff he generated was classified by him as 'awful' rather than just 'weak' or a bit boring.

TLDR: I think a misunderstanding of what a key signature really is might be to blame. It's not just 'which notes'; it's a whole probability curve, and it's very contextual.


Well, Punk-o-matic's Random Song sounded pretty good.


I don't know, I quite enjoy some of the generative.fm streams, but they're kind of highly rigged to sound decent.


I am currently rocking out to cube drone and enjoying it. I feel like I'm on a quest to slay a dragon and save the village of cubedroneville.


Weird that nobody mentioned Brian Eno or Damon Albarn here.


I thought the last song, the goodbye, was great.


I loved reading this very much. Well done.



