
From a comment on Hacker News:

[...] physical playing has been replaced by software quite a while ago. [...] almost nobody can distinguish a computer rendering from the real thing on a recording.

Can software really convert sheet music into an audio recording as well as a skilled human with a physical instrument and a microphone (as judged by the audience)?

MWB

4 Answers


Can software really convert sheet music into an audio recording as well as a skilled human with a physical instrument and a microphone (as judged by the audience)?

"Converting sheet music into an audio recording" consists of two steps: recreating instrument sound and recreating artist expression.

Recreating instrument sound

Virtual instruments do exist and are widely used. Modeling the sounds of real instruments was explored in the past, but presently most virtual instruments that emulate real acoustic instruments are based on sampling. Recording technology has been developed for over 100 years, and today it works near-perfectly: you record a note played on an instrument, and then you play it back. It can't go wrong.

Typically such an instrument consists of several, or even several tens of, gigabytes of audio samples. The challenge, besides the obvious one of getting consistent, high-quality recordings of every note of the instrument, is to collect samples representing various dynamic levels and articulations. Another difficulty is representing various nuances: moving from one note to another, repeating the same note, ending a note, the additional mechanical sounds made by the instrument...
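To make the sampling approach concrete, here is a minimal Python sketch of how such a library might be organized: a lookup table keyed by pitch, dynamic layer, and articulation. The instrument, file names, and velocity thresholds are hypothetical; a real engine adds streaming, looping, and crossfading between layers on top of this.

    # A sampled virtual instrument is, at its core, a big lookup table:
    # pitch + dynamic layer + articulation -> recorded audio file.
    # File names, layers, and thresholds below are hypothetical.
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class SampleKey:
        midi_note: int      # e.g. 60 = middle C
        dynamic_layer: str  # e.g. "pp", "mf", "ff"
        articulation: str   # e.g. "sustain", "staccato"

    # A real library maps keys like these to gigabytes of recordings.
    SAMPLE_LIBRARY = {
        SampleKey(60, "mf", "sustain"): "violin/C4_mf_sustain.wav",
        SampleKey(60, "ff", "staccato"): "violin/C4_ff_staccato.wav",
        # ...thousands more entries...
    }

    def velocity_to_layer(velocity: int) -> str:
        """Map a MIDI velocity (0-127) to the nearest recorded dynamic layer."""
        if velocity < 43:
            return "pp"
        if velocity < 86:
            return "mf"
        return "ff"

    def pick_sample(midi_note: int, velocity: int, articulation: str) -> str:
        """Choose the recording the playback engine should stream."""
        key = SampleKey(midi_note, velocity_to_layer(velocity), articulation)
        return SAMPLE_LIBRARY[key]

    print(pick_sample(60, 70, "sustain"))  # -> violin/C4_mf_sustain.wav

The lookup itself is trivial; as the paragraph above notes, the hard part is filling the table with consistent recordings of every note, layer, and transition.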

Imitation is most successful in music styles that don't rely too heavily on subtle changes of articulation, though instrument producers keep pushing these boundaries further.

Recreating artist expression

This is more difficult, as it's not well defined in technical terms. Sheet music represents a certain idea, but the composer must rely on the musician's ability to interpret it. A musician will play each note differently. They will emphasize metric accents and phrases using the capabilities of their instrument. A crescendo mark represents the idea of increasing the volume, but in practice a simple gradual change may not create the appropriate artistic effect. And "appropriate artistic effect" is a highly subjective term.

Various instruments offer various expressive capabilities. The most universal ones are timing and changes of tempo. Most instruments can also be played with various dynamics and may provide various articulations. These nuances typically are not, or even cannot be, precisely notated in sheet music meant for humans. A music score fed directly into a virtual instrument, and thus devoid of these nuances, will sound rather mechanical and uninteresting.

Much better results can be obtained by converting the sheet music to a MIDI sequence and manually adjusting the dynamics, timing, and articulation of each note. It is a tedious process and, most importantly, a manual one. While some software tools can aid in it, human involvement is necessary. And recreating the expression of a skilled musician this way is still very challenging.
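As a rough illustration of the difference this per-note editing makes, here is a minimal Python sketch (standard library only; the note format and all numbers are invented for the example). It contrasts a score rendered literally, on the grid at a single velocity, with the same notes given a shaped crescendo and slight timing looseness; a real editor would tune each note by ear rather than apply a formula like this.

    import random

    # Note format and numbers are made up for illustration; this is not
    # a real sequencer. Eight notes straight from the score: perfectly
    # on the grid, all at one velocity -- the "mechanical" rendering.
    PPQ = 480  # ticks per quarter note
    SCALE = [60, 62, 64, 65, 67, 69, 71, 72]  # C major, MIDI numbers
    mechanical = [
        {"tick": i * PPQ, "pitch": p, "velocity": 64}
        for i, p in enumerate(SCALE)
    ]

    def humanize(notes, start_vel=40, end_vel=100, jitter=10):
        """Shape a crescendo and loosen the timing slightly.

        A plain linear ramp plus random jitter is only a crude stand-in
        for what a human editor would tune note by note, by ear.
        """
        out = []
        last = len(notes) - 1
        for i, note in enumerate(notes):
            shaped = dict(note)
            # Linear crescendo: the simple gradual change that, as noted
            # above, may still miss the intended artistic effect.
            shaped["velocity"] = round(start_vel + (end_vel - start_vel) * i / last)
            # Micro-timing: humans are never exactly on the grid.
            shaped["tick"] = max(0, note["tick"] + random.randint(-jitter, jitter))
            out.append(shaped)
        return out

    for note in humanize(mechanical):
        print(note)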

In practice, MIDI sequences played by virtual instruments are often recorded by human musicians on MIDI controllers that capture their expression. This can yield very good results, but as a musician is involved, it doesn't fulfill the requirement of "converting sheet music into an audio recording by software".

The use of programmed music sequences emulating real instruments is therefore most successful in styles and situations that don't rely heavily on artist expression and interpretation, that have a fixed tempo, or where the instrument in question is in the background. But as soon as we focus on artistic interpretation, a machine cannot replace a skilled musician. At least not yet.

user1079505
  • The "fun" part: MIDIs recorded by human musicians on MIDI controllers often come with horrific tuplets, off-beat 64th notes, and readability issues. (At least that's my experience seeing others' MIDIs on Musescore and Finale Notepad.) – Dekkadeci Aug 31 '21 at 10:34
  • A MIDI file is not a notation format, but a sequence of events (note on, note off, control change). If the musician doesn't play precisely on the beat, naively converting event timestamps to notation will produce horrible results (see the sketch after these comments). – ojs Aug 31 '21 at 11:09
  • @ojs that's exactly the point of my answer. A good MIDI sequence doesn't have a simple correspondence to sheet music notation, and conversion between the two is not trivial. I made an edit to make it clearer. – user1079505 Aug 31 '21 at 13:12
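For readers unfamiliar with the problem Dekkadeci and ojs describe, here is a tiny Python sketch of naive quantization (tick values are invented for the example): snapping loosely played timestamps to a fine 64th-note grid turns an intended quarter note on the beat into off-beat 64th-note clutter.

    # Tick values are illustrative; a naive converter snaps raw
    # timestamps to a fine notation grid instead of inferring intent.
    PPQ = 480           # MIDI ticks per quarter note
    GRID = PPQ // 16    # 30 ticks = one 64th note

    def snap(ticks: int) -> int:
        """Quantize a timestamp to the nearest 64th-note grid line."""
        return round(ticks / GRID) * GRID

    # The player intends a quarter note on the beat, but actually plays
    # 17 ticks late and 29 ticks short -- musically fine, slightly loose.
    played_start, played_length = 17, 451

    print(snap(played_start))   # 30  -> notated one 64th after the beat
    print(snap(played_length))  # 450 -> fifteen 64ths instead of a quarter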

No, software that could take any given piece of sheet music and render it with the inflections and expressive devices expected of a human performer (or group) is not yet generally available. A musical score generally isn't intended to be rendered precisely, but in a stylistically appropriate way (see What does it mean to "play what is not written"? and What does play with feeling mean?), and you need your performer to add that stylistic 'flavour'.

There's no reason that it would be impossible to create such software, and a Google search for "render sheet music expressively" brings up links to a number of research projects - but we're certainly not at the point where "physical playing has been replaced by software quite a while ago" is a sensible statement.

Even if you involve a human in playing or programming in a digital performance for a synthesizer to render, it's arguable that synthesizer technology itself isn't quite at the point where it can reproduce all physical instruments in a way that would satisfy critical ears - so the statement "physical playing has been replaced by software quite a while ago" seems to me to fail on that count as well.

It seems rather similar to making a statement like "physical acting has been replaced by software quite a while ago". Clearly computers are able to help humans generate some very realistic artificial imagery, but you can't yet feed a script into a computer and render a photo-realistic movie.

Нет войне

The question was "Can software render music as well as a skilled human with a physical instrument and a microphone?", and before you answer no (which seems to be most people's answer here), you should check out NotePerformer, which performs music scores (Sibelius, Dorico, Finale) using AI. I'm only mentioning it because no one else has.

Sure, you can tell it's not live musicians, but it's far better than the sampled or GM playback that music typesetting programs generally provide out of the box.

It sounds like an actual performance because of the way it renders legato and dynamic changes using AI. Check out some of the sample audio snippets on the site.

Brian THOMAS
  • Being aware of NotePerformer, I only didn't mention it because it still doesn't replace humans – Todd Wilcox Aug 31 '21 at 13:45
  • Wow! Although the samples can be misleading if the same pieces were part of their training dataset (the trained neural network would then essentially copy the training data). – MWB Aug 31 '21 at 17:11

The word "well" is subjective. If by "well," you mean "precisely," then 100%, computers can do infinitely better than any person can. Scales at 1000 BPM, all notes played at exactly the same volume? Check!

If by "well," you mean "played with dynamics that give human beings deep and complex emotional reactions," then YES. . . but not yet.

As long as you can hook up an AI system to something real (like a brain) and correlate that to something abstract (like subjective reports of feelings), computers can absolutely learn to do that, infinitely better than even the best people.

Give it about 10-20 years before computers start playing "feedback music" that dynamically interacts with YOUR brain at a given MOMENT, to maximize mood.

Bennyboy1973
  • Any sources to back up the claim would be nice. – ojs Aug 31 '21 at 03:26
  • Which one? I made several claims in that answer. – Bennyboy1973 Aug 31 '21 at 09:06
  • All of them, of course. But let's start with the 10-20 years. It sounds suspiciously close to the time when fusion power will be available. – ojs Aug 31 '21 at 09:55
  • I'm also interested in what "hooking up an AI system to a brain" means – ojs Aug 31 '21 at 09:56
  • Well, we can't measure experiences objectively, so we need to establish some physical correlate that CAN be-- brain waves, blood flow, and so on, often taken in conjunction with verbal reports: "What happens if I do THIS?" (pokes brain with a pin). "I smell smoke!" – Bennyboy1973 Aug 31 '21 at 10:27
  • 10-20 years is a guesstimate. However, we already have most of the tools right now-- the ability to measure brain function, and the ability to process massive amounts of data. The problem is that we can't get 6 billion people to sit down while we collect information directly from their brains-- so we probably have to use things like subjective ratings on websites-- at least until I'm president of the world. – Bennyboy1973 Aug 31 '21 at 10:29
  • Sorry if this sounds arrogant, but do you have any actual neuroscience background? – ojs Aug 31 '21 at 10:32
  • Yes, I have a degree in psychology, with an emphasis in special state psychology ("the zone", OBE, etc.). I'm also a professional programmer (I came here from Stack Overflow). I've even modeled basic ANNs on my own (though mine were very poor compared to modern models). If you want to say "prove it," then I'm out. If you'd like to learn something about the fields of the study of neurology and how it might be modeled in AI systems, then I'm happy to give some pointers on where to look. Be warned-- the material gets dense pretty fast. – Bennyboy1973 Aug 31 '21 at 20:44
  • Also... asking for credentials isn't arrogant. However, it's a poor man's response to an assertion-- the equivalent of a kid arguing, "Well, MY Dad works for NASA, and HE says..." A better approach would be either a philosophical position of your own, or a request for sources-- credible information that might help you understand my position. – Bennyboy1973 Aug 31 '21 at 20:46
  • The people featured on thedailywtf are professional programmers too. Anyway, do you have any sources? – ojs Sep 01 '21 at 05:15
  • I no longer have access to academic periodicals and so on, but let me see what I can find. Why don't you narrow down your point of skepticism? Is it with the ability to measure aspects of mind (either philosophical or practical), the ability to collect enough data for AI, or a general skepticism about AI's ability to ever "get" the subtleties of performance as they relate to feeling? – Bennyboy1973 Sep 01 '21 at 07:22
  • (Wait, did you just delete a post, or am I hallucinating?) You don't map "subjective reports of feelings to music," at least not very effectively-- that's basically taking a survey, which is FAR too imprecise (and slow) to train an AI system well. You map subjective reports of feelings to brain function (waves, blood flow, etc.), and then treat those neural correlates as targets for an AI system. Until we all have the Musk-o-matic Brain Sensing Headset at home, it's going to be hard to do this well. – Bennyboy1973 Sep 01 '21 at 09:29
  • I think there would be enough information to generate maximally irritating noise, but playing generated music to test subjects at machine-learning scale while controlling for other variables would take an impractically long time. The magic of GANs is eliminating the test subject, and I don't really see how that would happen here. – ojs Sep 01 '21 at 09:34
  • Sorry about deleting the comment. I was just rewriting it to more polite form when your answer appeared. – ojs Sep 01 '21 at 09:35
  • Anyway, reacting to the listener's mood is just one track here. The actual question was about computer-generated sound that is indistinguishable from a good human musician. We already have lots of studio-made music that clearly contains only processed samples of human-played music, or not even that, and various top-ten lists suggest listeners like it :) – ojs Sep 01 '21 at 09:45
  • In my opinion, we're actually closer to that than to completely computer-composed and -performed music. Modeled sounds are pretty impressive already, and I suspect we'll get something akin to orchestral and vocal "deep fakes" soon enough. "Make me sound like Pavarotti" will be a thing sooner rather than later, I guess. (only guess, though) – Bennyboy1973 Sep 01 '21 at 10:24
  • Yup, though the question is in the present tense. In my opinion we're not that close: keyboard instrument models are pretty good, but a human player is needed, and synthesizer orchestras are passable if you don't listen too closely. Of course it's subjective, so the more deaf your subject is, the better the computer becomes. – ojs Sep 01 '21 at 12:56