Hacker News Comments on
Algorithmic Music Generation with Recurrent Neural Networks

Matt Vitelli · YouTube · 67 HN points · 2 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention Matt Vitelli's video "Algorithmic Music Generation with Recurrent Neural Networks".
YouTube Summary
This is the result of a project I worked on for CS224D with Aran Nayebi. The idea is to design a neural network that can generate music using your music library as training data. While I believe we could probably improve upon this model significantly, it serves as a good proof of concept for showing that it is indeed possible. We use Long Short-Term Memory (LSTM) networks to model hidden recurrences over time. For this demo, we used a network that was trained on a variety of Madeon songs.

UPDATE:
Due to popular demand, we've released the source code on GitHub. Check it out here: https://github.com/MattVitelli/GRUV

Happy training! :D
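
The summary is light on mechanics, so here is a minimal sketch of the waveform-as-sequence idea it describes (PyTorch; frame size, model size and training loop are all illustrative, not the actual GRUV code): split the raw audio into fixed-size frames and train an LSTM to predict each next frame from the previous ones.

    import numpy as np
    import torch
    import torch.nn as nn

    FRAME = 2048  # samples per frame; illustrative, not GRUV's actual value

    def frames(wave):
        # Split a 1-D float32 waveform into rows of FRAME samples.
        n = len(wave) // FRAME
        return wave[: n * FRAME].reshape(n, FRAME)

    class FramePredictor(nn.Module):
        def __init__(self, hidden=256):
            super().__init__()
            self.lstm = nn.LSTM(FRAME, hidden, batch_first=True)
            self.out = nn.Linear(hidden, FRAME)

        def forward(self, x):          # x: (batch, time, FRAME)
            h, _ = self.lstm(x)
            return self.out(h)         # prediction of the next frame at each step

    wave = np.random.randn(FRAME * 100).astype("float32")  # stand-in for a real song
    seq = torch.from_numpy(frames(wave)).unsqueeze(0)      # shape (1, 100, FRAME)
    model = FramePredictor()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(10):                                    # toy training loop
        loss = nn.functional.mse_loss(model(seq[:, :-1]), seq[:, 1:])
        opt.zero_grad(); loss.backward(); opt.step()
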
Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
> But doing automated dance music makes sense.

This has been done with LSTM[1][2]. Impressively, the NN is generating waveforms and not MIDI notes - at around 3:50 it even attempts some singing.

[1]: https://www.youtube.com/watch?v=0VTI1BBLydE [2]: https://github.com/MattVitelli/GRUV

jerf
"at around 3:50 it even attempts some singing."

Mmmmm... that sounds like overfitting. That's not "attempting some singing", that's "playing back one of the things it trained on". Which really raises questions about the rest of what you hear, too; it seems like what is being produced is probably in some sense the "average" of the training data, rather than something able to generate new samples from it. But it's a very interesting "average" full of interesting information.

Since I wouldn't expect this to produce much else, I'm not being critical of the effort, just pointing it out so others understand what they are hearing. It was an interesting and worthy experiment that I had wondered about myself.

illumin8
Very cool! I actually think Jazz improvisation would not be very hard to do:

One of the things you realize if you study jazz improvisation deeply is that it isn't that random. Someone like Charlie Parker learned over 100 interesting riffs or patterns, then learned to play them in any key (transposing as necessary); when improvising, he would transpose each into the chord the band was currently playing and arrange them in a unique and interesting order.

Indeed, this is why many great Jazz musicians learn to improvise by transcribing solos of Charlie Parker, John Coltrane, etc, and learning their riffs in every possible key. Transposing is one of the best possible ways to learn to improvise, because it teaches you to listen and hear notes, as well as the patterns/riffs that everyone copies from each other.

There is even a great book called "Patterns for Jazz" that captures many of the most powerful riffs used by these musicians.

The really interesting thing about this is that while most of the listening public assumes jazz is pure improvisation, much of it is copied riffs rearranged in unique and interesting ways. I don't mean to detract from it; jazz is still a great musical style, but like all styles it results from derivatives of previous works.

edit: grammar
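
As a toy sketch of the practice described above (Python, hypothetical MIDI note numbers): store a riff as semitone offsets from a root, then replay it over each chord in a progression by transposing to that chord's root.

    RIFF = [0, 2, 3, 5, 7, 5, 3, 2]           # a riff stored as semitone offsets

    def transpose(riff, root):
        # Replay the riff starting from a given MIDI root note.
        return [root + step for step in riff]

    # A ii-V-I in C: chord roots D, G, C as MIDI notes.
    progression = [62, 67, 60]
    solo = [note for root in progression for note in transpose(RIFF, root)]
    print(solo)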

Jun 24, 2015 · 67 points, 31 comments · submitted by rndn
mpdehaan2
I'm not 100% up to speed on my AI, but this sounds a lot like what you'd get with random variations on a signal, where the neural net is the "which sounds like X" filter that picks which variation survives. That would be using both some form of genetic algorithm (details TBD?) and the neural net as the checker. But is it?

If they aren't doing it that way, I'd be interested in hearing how it's evolving the signal in that given direction - and also how that filter works (what libraries does it use?).

Sounds like it hit some sort of local maximum, so this system won't ever reproduce the original song, only something a percentage of the way toward it.

I'm a bit more interested in algorithmic composition, but this could be interesting if trying to blend genres. For a long time I've wanted to build a program that could produce essentially an infinite song morphing between genres with lots of tunable parameters.

msamwald
It would be interesting to know how novel those sequences are (obviously, the outcome would be far less impressive if what we hear is basically a looped, noisy sample of a song that already exists).
m-i-l
Not much information in the video on how this was achieved, but a quick search for "gruv algorithmic music generation" returns the following: https://sites.google.com/site/anayebihomepage/cs224dfinalpro... . Extract:

We compare the performance of two different types of recurrent neural networks (RNNs) for the task of algorithmic music generation, with audio waveforms as input (as opposed to the standard MIDI). In particular, we focus on RNNs that have a sophisticated gating mechanism, namely, the Long Short-Term Memory (LSTM) network and the recently introduced Gated Recurrent Unit (GRU). Our results indicate that the generated outputs of the LSTM network were significantly more musically plausible than those of the GRU.
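
For readers wondering what the LSTM/GRU comparison involves mechanically: in a modern framework the two variants differ only in the choice of recurrent cell; a PyTorch sketch (not the authors' code, sizes illustrative):

    import torch.nn as nn

    # The comparison in the extract amounts to swapping the recurrent cell;
    # everything else (data pipeline, training loop) can stay identical.
    lstm = nn.LSTM(input_size=2048, hidden_size=512, batch_first=True)
    gru = nn.GRU(input_size=2048, hidden_size=512, batch_first=True)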

leaveyou
Another promising field is RNN applied to TED talks: youtube.com/watch?v=-OodHtJ1saY
acd
I think it would sound better if we taught the neural network to play notes and music theory.
anentropic
This is not music. Music is not simply organised sound. Music is a cultural practice.
yyyyes
This is not a cultural practice?
durbin
Source code link?
crucialfelix
The funny thing is that the only really good ones are the first few, where they claim it's just random noise. The later ones just sound like a crappy radio.

With images the technique works because we like looking at the dense artifacts. Millions of dog heads essentially copied and pasted onto any appendage that looks like it should be a head. It looks like drugs and overwhelms in the same way.

If you just take white noise and throw it through a tuned filter bank (which is in essence and in final effect what they are doing here) then you just get crappy audio.
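
For reference, that comparison is easy to try yourself; a minimal scipy sketch of white noise through a small filter bank (band edges illustrative):

    import numpy as np
    from scipy.signal import butter, lfilter

    sr = 44100
    noise = np.random.randn(sr * 2)                     # two seconds of white noise
    bands = [(100, 200), (400, 800), (2000, 4000)]      # illustrative band edges in Hz

    out = np.zeros_like(noise)
    for lo, hi in bands:
        b, a = butter(2, [lo / (sr / 2), hi / (sr / 2)], btype="band")
        out += lfilter(b, a, noise)                     # sum the band-passed copies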

The more standard and successful use of NNs in composition is to apply them to pitch series and compositional forms, like feeding one all of Beethoven and then getting it to generate similar compositions. That's been going on for decades. You can do it with that kind of data.
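
That symbolic approach fits in a few lines; here a first-order Markov chain over pitches stands in for the fancier sequence models (a toy sketch, not any particular system):

    import random
    from collections import defaultdict

    melody = [60, 62, 64, 65, 67, 65, 64, 62, 60]    # toy training "score" (MIDI notes)
    counts = defaultdict(list)
    for a, b in zip(melody, melody[1:]):
        counts[a].append(b)                          # first-order pitch transitions

    note, generated = 60, [60]
    for _ in range(16):
        note = random.choice(counts[note]) if counts[note] else note
        generated.append(note)
    print(generated)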

But the thing about pop and electronic music is that the easily machine-observable elements are not very interesting. Listen to the 4/4 kick and snare pattern in the video. It's boring as hell. (Other tracks can be just a kick and snare and be amazing, and we celebrate them as classics and play them for 20 years. Machines will never understand why.)

What's great and essential are things like the spatial relationships between elements in the mix: how the surge of the compressed synth/guitar causes the beat to tumble outward and stirs you up; how, after a series of peaks in a synth melody, the next one pulls back, creating a space that pulls at your heartstrings. You create a negative space that the listener goes into. You play with listeners' expectations based on what songs, conventions and tropes they already know and respond to.

melloclello
> Machines will never understand why

Humans don't really understand why either. If that didn't stop us, why should it stop a machine?

rndn
> Its boring as hell. Machines will never understand why

That statement seems overly strong. Perhaps we are far from having a machine that can figure out the high-level aspects of a song on its own, but I don't think it's implausible that, with some more guidance (for example by also learning loop arrangements and filters instead of only the waveforms), these neural networks could create quite interesting music today (especially in the EDM/IDM genres). This development might be scary because it possibly replaces human creativity to a large extent, but you can't stop it by claiming that it's impossible or that it will always be of poor quality. People said the same when synthesized music arrived, that it lacked the human aspects etc., and now it has high cultural significance, even though it uses things like auto-tune and consists of super-clean loops.

romaniv
> This development might be scary because it possibly replaces human creativity to a large extent, but you can't stop it by claiming that it's impossible or that it will always be of poor quality.

What is scary is the willingness of some people to accept dubious randomly-generated artifacts as "art" and as evidence of consciousness and intelligence. You should not trivialize actual human intelligence and creativity simply because you want to believe in strong AI.

The GP post makes very good points, and (as usual, it seems) they get completely ignored in the replies.

Setting up some 4/4 beat is easy. And you don't need an AI to throw a randomly arpeggiated chord in there. It will sound okay-ish. Maybe it will even sound better than the stuff in this video. But that doesn't mean it is equivalent to proper music composition, or that anyone would actually listen to it outside of a technical demo.

BTW, if you want to see real procedurally generated music that is recorded by musicians for real listeners (as opposed to by CS grads for other CS grads), you might want to take a look at Karma: http://www.karma-lab.com .

rndn
I was actually referring more to the general public, not to people with highly sophisticated musical tastes. This is a first-cultural-world problem, so to speak. The vast majority will probably be completely happy with an app that composes their infinite personal soundtracks based on their Facebook profiles (similar to how Ray Kurzweil suggested that we may have chat agents that fool most people very soon). That might be a harsh reality for people in the first cultural world, but it wouldn't be the result of the general public wanting to believe in strong AI; the music would actually give them the same satisfaction they receive from man-made music (or even more, due to the personalization). I believe this is a somewhat plausible prediction, and claims like "computers are incapable of human emotion and ingenuity!" will not change anything (except your mood).
romaniv
Are we discussing AI or the supposed lack of musical taste in the general public? Again, saying "people are stupid" does not demonstrate that a particular AI is smart or interesting. And anyway, people are smarter than they often get credit for.

> the music would actually give them the same satisfaction they receive from manmade music

You believe this because...? None of the generated tracks I've heard so far is anywhere near the kind of music people would listen to for the sake of the music itself.

Some people do listen to music to drown out distracting conversations around them. A perfectly sensible thing to do. However, in those cases music can be replaced with white noise or nature sounds. This does not demonstrate that music is equivalent to noise.

...

The problem with such AI demos is that they lack a particular purpose (i.e. success criteria), use existing material for training, and do not go a step beyond that material. You can take a track and slow it down by a random fraction to get a different pitch and tempo. Voilà: "new music", procedurally generated. But that does not mean you've composed something. You've used a randomized algorithm hand-crafted to produce a particular effect. It has no generative power.
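
The slowed-down-copy trick really is trivial, which is the point; a naive numpy sketch:

    import numpy as np

    def slow_down(wave, rate):
        # Naive nearest-sample resampling: rate < 1.0 stretches the track, and
        # played back at the original sample rate it is both slower and lower.
        idx = np.arange(0, len(wave) - 1, rate).astype(int)
        return wave[idx]

    wave = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)  # one second of A440
    slowed = slow_down(wave, 0.9)                              # ~10% slower and flatter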

rndn
I've already given the examples of auto-tune, loops and synthetic sounds. People care surprisingly little about authenticity, and the top charts are telling about the intellectual preferences of the masses. Perhaps an RNN could even be trained on recordings of human emotional responses like heart rate and goosebumps. It's not just about changing the pitch here and there, but about learning deep features of what makes music interesting to listen to.

Music doesn't need to be as coherent as a text (in fact some space for interpretation is often preferred), which also increases my confidence in this prediction.

crucialfelix
> for example by learning also loop and sample arrangements instead of only the waveforms

This is what I mean: it's not about the loop and sample arrangements.

I've been doing exactly that since the 90s and I've made some great twisted computer-generated stuff. Logical or predictable methods will always result in logical and predictable music. It's all about setting up code environments so that you can generate accidents and mutations and capture them. It's all about reacting, capturing it and putting it on wax (historically speaking).

But that doesn't mean that a computer can understand why, or even recommend what is amazing. We humans don't even know why some things are so great. A major thrill is finding some sound that is so twisted (ill, stoopid, sick) that it totally bypasses the rational mind, shuts your thinking down, and you get a big smile and start jumping up and down acting like an idiot.

Then somebody else copies your track, then it becomes a style, then it becomes a cliche, then Beatport fills up with boring copies, then it becomes a sample set that people can buy, then somebody makes an app that can auto-generate that style, and then they claim that computers are making music.

But they are just playing it back, just like all the human copycats further upstream.

And the entire network of software, creators, audience and culture is what we call music.

> This development might be scary because it possibly replaces human creativity to a large extent, but you can't stop it by claiming that it's impossible or that it will always be poor quality.

That's what I've spent a lot of my life engaged in. It often makes great music, but the machine cannot understand why. You'll need strong AI for that, and it will need to have tensions, depressions, a body, chemical feelings, a sex drive, longing and a strong psychological need to be lost in song. Then it could say, "David, I think I've found a song you might like."

aflinik
I think I like your musical taste.
sweezyjeezy
The image stuff works because we have found a way to model a good prior for it: convolution layers are basically enforcing some positional invariance and locality constraints on what our model believes the world looks like. Without this very strict prior, image recognition with neural networks just wouldn't really work.

We haven't found a way to enforce a good prior for temporal data like sound yet.
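
One way to see how strict that prior is: a convolutional layer reuses one small patch of weights everywhere, so it needs a tiny fraction of the parameters of a fully connected layer producing the same output from a 28x28 image (PyTorch sketch, sizes illustrative):

    import torch.nn as nn

    conv = nn.Conv2d(1, 16, kernel_size=3)      # 16*(1*3*3) + 16 = 160 parameters
    dense = nn.Linear(28 * 28, 16 * 26 * 26)    # ~8.5M parameters for the same output size
    count = lambda m: sum(p.numel() for p in m.parameters())
    print(count(conv), count(dense))            # 160 vs 8490560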

murbard2
It helps, but it's not necessary. Schmidhuber obtained state-of-the-art performance on MNIST with a very vanilla deep net (no convolutional layers, no pooling, nothing fancy, just fully connected sigmoid units).
sweezyjeezy
MNIST is really easy; it's not a useful benchmark any more.
murbard2
Sure, but the fact that you get state-of-the-art results without a convolutional prior ought to at least support the argument that the prior is not necessary.
sweezyjeezy
You can get something like a 5% error rate on MNIST with a well-tuned linear classifier; it's not comparable with ImageNet. Note that the computer vision techniques used before convnets relied on things like SIFT features, which are another way of imposing a (sort-of) prior. I do believe that some sort of strong prior is necessary for the problem.
crucialfelix
I think that music is not just the sound and not just the observable elements. It literally moves your emotional system around. We get shivers, we get horny, our minds shut down and we get swept away in memories. That's not because of any specific arrangement of machine-observable sound objects.

> on what our model believes the world looks like

It's not even doing that. Those images are just simple 2D fields of pixels with color data.

It has no idea about any worlds. The spatial cues are way off, but we understand art (impressionism through messy expressionism, glitch), so we forgive it. And because they chose images with puppy eyes and we like puppies. Take away the animal elements and everybody would say the images were boring crap.

TheOtherHobbes
I realised a while back that a lot of computer music is really computer science music: people who know something about computers but not much about the art of music, playing with relatively trivial algorithms to create music-like results.

There's also the academic musical equivalent - music professors using stock faddy techniques like serialism or (currently) number and group theory.

It's not that this is an impossible problem. It's more that the set of people who can code machine learning algorithms and understand music theory and are creative enough to invent new algorithmic techniques and to create more-than-listenable music is incredibly small - double figures, if that.

So progress in non-trivial computer music has been incredibly slow. The DSP side has been far more successful, because DSP is - in most ways - a much simpler problem.

skwosh
I somewhat agree about computer music, though it appears to be an extension of the process-driven composition that has been part of Western (art) music for a while now (e.g. modulation).

Sometimes I wonder if linguistics has more to offer composition than algebra (speculation, as I know next to nothing about (non-CS) linguistics).

> It's not that this is an impossible problem. It's more that the set of people who can code machine learning algorithms and understand music theory and are creative enough to invent new algorithmic techniques and to create more-than-listenable music is incredibly small - double figures, if that.

Are you pursuing something like this? Or do you know anyone who is? This is one of my main interests (alongside better interfaces for composition and, well, making music). I'm actually back at university for my second degree to study this sort of thing.

If you have a blog or anything I'd be interested to find out...

crucialfelix
The reason for it has a lot to do with how academia works. You need to write papers and produce innovative works. If you don't, then you won't get funding or advance your career.

> or (currently) number and group theory.

That was Xenakis back in the 1950s!

Xenakis is the exception. He's really the foundation of academic computer music and his music is amazing and moving and his compositional concepts are still being hacked out by music programmers today.

tudorw
http://www.cristianvogel.com/neverenginelabs/about
ThomPete
Music is (dis)harmonies over rhythmic patterns. There isn't anything inherently artistic about humans that computers can't replicate with time, even the ability to compose an original song. That is, besides the lives of humans and their appearance and history, which are important but not the only factors.

The irony is that musicians are actually striving for, but failing to reach, the perfection level that computers have.

And so, to make computers sound more human-like, they have algorithms that make them more "sloppy".
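
That "sloppiness" pass is easy to sketch: take rigidly quantized events and add a little random jitter to timing and velocity (Python, hypothetical event structure):

    import random

    # Toy MIDI-style events (hypothetical structure): eighth notes on middle C.
    notes = [{"start": i * 0.5, "pitch": 60, "vel": 100} for i in range(8)]

    def humanize(notes, t_jitter=0.02, v_jitter=8):
        # Nudge each onset by up to 20 ms and each velocity by up to 8 steps.
        return [{**n,
                 "start": n["start"] + random.uniform(-t_jitter, t_jitter),
                 "vel": max(1, min(127, n["vel"] + random.randint(-v_jitter, v_jitter)))}
                for n in notes]

    print(humanize(notes))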

What composition algorithms lack is not the ability to compose like humans but a life that will give them angels and a story.

Then again, a lot of music is really formulaic anyway, and computers are used for most of it. Nothing will hinder some sort of computer star from being born in a few years. But it's probably never going to connect with us the same way another human can. Not for now, at least.

BasDirks
"What composition algorithms lack is not the ability to compose like humans but a life that will give them angels and a story."

Quote of the month for me. Bravo =]

TheOtherHobbes
I think that's a good example of what I'm saying: just because you don't understand the details doesn't mean professional musicians and composers don't have much deeper insight into music than you do.

If you think music is [list of numbers] that can be made more "human" with a bit of timing randomisation, then of course it's all perfectly straightforward.

In reality there's rather more happening.

>What composition algorithms lack is not the ability to compose like humans but a life that will give them angels and a story.

No, the music basically sucks as music. The number of people willing to listen to it voluntarily without being paid to - usually as students or academics - is vanishingly small.

The story part only becomes relevant after that problem is solved.

And while it's true that music is formulaic, it's also true that computer music hasn't yet worked out how to copy all the details of the formulas - never mind produce original and memorable new formulas from scratch.

The best formula copier is probably Cope's EMI, and that sounds exactly like what it is - a slightly confused cut-and-paste cliche machine, not a human composer with a point to make.

ThomPete
I think you have it the wrong way around.

Music becomes meaningful in the listener's mind, and the things that make it meaningful are both that it's formulaic (structure) and whatever the performer instills in the listener.

HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.