I don't blog enough; also sound

Posted by JoshDreamland on Sept. 8, 2013, 11:43 a.m.

Hell, I don't community enough. I hang out on the IRC, but there's just something inconvenient about going from blog to blog, which has, without my noticing, been made much less inconvenient lately.

Anyway, hi.

I was delving back through the forgotten pages of my late childhood, thanks in huge part to the Harmony of a Hunter album I had playing while I went about my usual coding routine of poking at the piece of code that doesn't work, recompiling it occasionally to see if it wants to just start working randomly. It reminded me of the explosive excitement erupting through me as I waited for the new Metroid installment, Metroid Prime 3: Corruption, for which they had bothered to compose a really awesome new intro tune which did just as good a job of capturing the Metroid feel as the original. I remember watching the spiderball boost jumping intro, and thinking of how I'd do that in Game Maker, because at the time, GM was the shit. I must have had some incredible imagination back then, because now, when I try to think of ways to do that in Game Maker, I no longer possess a neuron which does not immediately return the string, "YOU CAN'T LOL; GIVE UP."

It's amazing how many memories a little tune can have attached. Music is actually a pretty integral piece of my life, despite my not having any musical talent and almost no musical appreciation. That brings me to my next point: we need a new sequencing format. I've been meaning to mention this for a while.

I'm sure a lot of people around here (or who used to be around here) have played or even created MOD/XM/S3M/IT sequences; they're the (generally) chiptooney arrangements that store all their samples and such in the audio file, then sequence them in real time as they play. Back in the day, all video game music worked that way, whether the sequencing program shipped with the console or as part of the game (or indeed, as part of the sound file).

If you know a thing or two about sequenced audio, you'd know that XM, S3M, MOD, IT… all these sequence formats consist of <INSTRUMENTS/SAMPLES><PATTERNS><TRACKS>, where samples are notes of an instrument stored in one way or another, which are then arranged into patterns, or small melodies, which are then played in tracks. If you know three or four things, you might also know that this isn't how console music works. NSF, GSF, PSF, VGM… all these sound formats instead are laid out as follows: <PLAYER PROGRAM>.
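To make the samples/patterns/tracks layout concrete, here is a minimal sketch of that data model. Every name in it (Sample, Pattern, Module, notes_in_play_order) is made up for illustration and does not come from the XM, S3M, MOD, or IT specifications, which each pack these pieces differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Sample:
    name: str
    pcm: List[float]      # waveform of one recorded note, values in [-1.0, 1.0]
    base_pitch: int = 60  # pitch the recording was made at (hypothetical numbering)


# One cell of a pattern: (sample index, pitch), or None for "no new note here".
Cell = Optional[Tuple[int, int]]


@dataclass
class Pattern:
    rows: List[List[Cell]]  # rows[row][channel]


@dataclass
class Module:
    samples: List[Sample] = field(default_factory=list)
    patterns: List[Pattern] = field(default_factory=list)
    order: List[int] = field(default_factory=list)  # the "track": pattern indices in play order


def notes_in_play_order(module):
    """Flatten a module into (global_row, channel, sample_name, pitch) events."""
    events = []
    row_offset = 0
    for pattern_index in module.order:
        pattern = module.patterns[pattern_index]
        for r, row in enumerate(pattern.rows):
            for channel, cell in enumerate(row):
                if cell is not None:
                    sample_index, pitch = cell
                    name = module.samples[sample_index].name
                    events.append((row_offset + r, channel, name, pitch))
        row_offset += len(pattern.rows)
    return events
```

The point of the sketch is that a sequencer can read the notes straight out of the file, which is exactly what a player-program format does not allow.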

So yeah, there isn't much to video game music formats; the entirety of the format is a specification for a little machine-code program (6502 code for NSF, ARM code for GSF) that generates the song as it is run. Good luck converting that to XM. An AI could do it; a patient human could do it. In some cases, a program can do it, because, historically speaking, there is a divide between artists and programmers, which has led to the need for artists to have a simple, understandable way of sequencing notes that doesn't involve writing assembly. What I'm saying is that many of these sound files use the same player program as a base, creating a sort of sub-format which is theoretically simple to convert to a regular sequence file. However, this isn't something I'd recommend doing; someone informed me that it has been attempted, but he couldn't name a program that did it (or he named one that I was unable to find and have since forgotten about).

The point is, these formats are not trivial to convert, and impossible to convert in the general case. There are a bazillion different sequence formats, and the players that exist for them aren't very portable and have all sorts of nasty licensing problems. Wouldn't it be great if we had a sound format that was backward-compatible with all of them?

It really isn't that hard to accomplish, when you consider the implications of supporting conversion from a sound format that is a damn ARM program. The solution? Support basic scripting, and compile it to byte code. Like Java byte code, only hopefully not anything like Java byte code, because it gives programmers the heebies.
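As a rough illustration of "support basic scripting, and compile it to byte code", here is a toy compiler and stack machine. The opcodes, the nested-tuple expression syntax, and the function names are all assumptions invented for this sketch; the post does not define an instruction set.

```python
import math

# Hypothetical opcodes for a tiny stack machine (not part of any real format).
PUSH, LOAD_T, ADD, MUL, SIN = range(5)
OPCODES = {"+": ADD, "*": MUL, "sin": SIN}


def compile_expr(expr):
    """Compile a nested-tuple expression into a flat list of (opcode, operand)."""
    if isinstance(expr, (int, float)):
        return [(PUSH, float(expr))]
    if expr == "t":  # the current time in seconds
        return [(LOAD_T, None)]
    op, *args = expr
    code = []
    for arg in args:  # operands first, operator last (postfix order)
        code.extend(compile_expr(arg))
    code.append((OPCODES[op], None))
    return code


def run(code, t):
    """Interpret the bytecode for one point in time and return one sample value."""
    stack = []
    for opcode, operand in code:
        if opcode == PUSH:
            stack.append(operand)
        elif opcode == LOAD_T:
            stack.append(t)
        elif opcode == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif opcode == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif opcode == SIN:
            stack.append(math.sin(stack.pop()))
    return stack.pop()


# A 440 Hz sine tone written as an expression, then compiled once.
tone = compile_expr(("sin", ("*", 2.0 * math.pi * 440.0, "t")))
```

A real player would JIT or at least bounds-check this; the interpreter loop here only shows the shape of the idea.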

Maybe you're not convinced. Consider the original Super Mario Bros music. The samples were a square wave multiplied by a sine wave. With time, samples became more complicated, eventually becoming small wave files, but the concept was the same, and the point stands: You can make extremely memorable music using a wave function that fits in a kilobyte and uses nothing but sin() and some loops.
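A sketch of how small such a wave function can be, taking the "square wave multiplied by a sine wave" description at face value. The function names, the 8000 Hz sample rate, and the 4 Hz modulating sine are arbitrary choices for illustration; this makes no claim about how the NES hardware actually produced its sound.

```python
import math


def square(phase):
    """Square wave derived from the sign of a sine: +1.0 or -1.0."""
    return 1.0 if math.sin(phase) >= 0.0 else -1.0


def note(freq_hz, seconds, sample_rate=8000, mod_hz=4.0):
    """Render one note as a list of floats in [-1.0, 1.0]."""
    frames = []
    for i in range(int(seconds * sample_rate)):
        t = i / sample_rate
        carrier = square(2.0 * math.pi * freq_hz * t)   # the audible pitch
        modulator = math.sin(2.0 * math.pi * mod_hz * t)  # slow sine shaping the volume
        frames.append(carrier * modulator)
    return frames
```

The whole "sample" is two short functions and a loop, which is the point being made: nothing here needs to be stored as recorded audio.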

Now, a player program alone is not sufficient. There are obvious problems with it, most of which I've already mentioned. Even with a good GUI to generate sample programs for you, a program for each wav file seems like overkill. And it is. The bytecode would need to be emulated, or at best, JIT-compiled, but either way, you have overhead from safety bounds checking. So we'd want to support the basic framework of samples, patterns, and tracks, too.

So here's the idea: Two sample types: file, and function. Sample functions can be played for any length, and are sampled like any other stream. Sample files are just included and transformed by note pitch + effect, where effect can be a built-in preset, as are available in XM/S3M/IT, or, you guessed it, another program! So you can have a program whose only purpose is to transform input given source and destination pitch(es), for the purpose of creating more complicated, ear-pleasing audio effects.
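One way the two sample types might look in code. The class names, the render() signature, and the nearest-neighbour resampling used as the default pitch transform are all assumptions for the sake of the sketch; the proposal itself only says that file samples are transformed by pitch plus an effect, and that the effect may be a user program.

```python
class FileSample:
    """Sample backed by stored PCM data, recorded at base_freq."""

    def __init__(self, pcm, base_freq, effect=None):
        self.pcm = pcm
        self.base_freq = base_freq
        # The effect is a program taking (pcm, src_freq, dst_freq, n_frames).
        self.effect = effect or resample_nearest

    def render(self, freq, n_frames, sample_rate):
        return self.effect(self.pcm, self.base_freq, freq, n_frames)


class FunctionSample:
    """Sample backed by a function fn(t_seconds, freq_hz) -> float."""

    def __init__(self, fn):
        self.fn = fn

    def render(self, freq, n_frames, sample_rate):
        # A function sample has no fixed length: it is probed once per frame.
        return [self.fn(i / sample_rate, freq) for i in range(n_frames)]


def resample_nearest(pcm, src_freq, dst_freq, n_frames):
    """Default pitch transform (an assumption): step through the PCM data
    faster or slower, padding with silence once the recording runs out."""
    step = dst_freq / src_freq
    out = []
    for i in range(n_frames):
        pos = int(i * step)
        out.append(pcm[pos] if pos < len(pcm) else 0.0)
    return out
```

Because both classes expose the same render() call, the sequencer does not need to know which kind of sample it is playing.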

Now, sprinkle on an optional compositing program and, look at that: you have an entire audio pipeline. So you have samples which can be files or programs, which can be modulated to the correct pitch(es) by the default sequencing program or your own program, and then composited into your song by the default compositing program (multiply by volume, sum as floats, cast to bytes), or your own program. And of course, all pattern generation can be run in parallel.
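The default compositing step (multiply by volume, sum as floats, cast to bytes) is short enough to sketch directly. The function name, the clipping to [-1.0, 1.0], and the choice of unsigned 8-bit output are assumptions; the post does not say how out-of-range sums or the byte encoding should be handled.

```python
def composite(channels, volumes):
    """Mix rendered channels into unsigned 8-bit PCM.

    channels: list of lists of floats in [-1.0, 1.0], one list per channel
    volumes:  one gain factor per channel
    """
    n_frames = max((len(frames) for frames in channels), default=0)
    out = bytearray()
    for i in range(n_frames):
        mixed = 0.0
        for frames, volume in zip(channels, volumes):
            if i < len(frames):  # shorter channels contribute silence
                mixed += frames[i] * volume
        mixed = max(-1.0, min(1.0, mixed))  # clip (an assumption, see above)
        out.append(int((mixed + 1.0) * 127.5))  # map [-1, 1] onto 0..255
    return bytes(out)
```

Swapping in your own compositing program would mean replacing this one function while leaving sample rendering untouched.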

The actual programming language would be imperative and support such functions as sin(), tan(), oggdecode(), flacdecode(), and random()/random_start()/random_next().

Sounds cool, but what have we accomplished?

You can convert to this (Turing complete!) audio format from *ANY* audio format.

I.e.: You can convert to this (Turing complete!) audio format from *ANY* audio format.

To clarify: You can convert to this (Turing complete!) audio format from *ANY* audio format.

To convert from MIDI:

1. Grab all relevant samples from the sound font, adding them as file-based samples to the new file.

2. Convert each instrument track to a pattern + track

3. ???????

4. MICROSOFT LAWSUIT

To convert from XM:

1. Figure out which way to hold the XM specification

2. Attempt to read and understand the XM specification

3. Fail; repeat from (1).

4. ???????

5. Copy over samples as file samples.

6. Copy over patterns.

7. Translate the effect indices (see step 2)

8. PROFIT

To convert from NSF:

1. Binary translation. Yes, you.

2. Place translated binary in a new function sample.

3. Create single pattern containing single note of infinite length

4. Create track containing that pattern

To convert from any other existing format:

1. Decode to WAV

2. Encode to OGG

3. oggdecode() in function sample using original sample as raw data

4. Single sample playing decoded ogg

5. Single track with single pattern playing that sample
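Both the NSF recipe and this catch-all recipe end in the same shape: one function sample, one pattern holding one note, one track. A minimal sketch of that wrapper, using plain dictionaries; the field names, the ("CALL", ...) instruction, and the use of None for "play until the stream ends" are invented for illustration, and oggdecode is only the name the post gives to a built-in of the proposed language.

```python
def wrap_as_single_note(encoded_bytes, decoder_name="oggdecode"):
    """Wrap an opaque encoded stream as a one-sample, one-note song."""
    sample = {
        "type": "function",
        # The sample's program just hands its attached data to the decoder.
        "program": [("CALL", decoder_name)],
        "data": encoded_bytes,
    }
    pattern = {
        # One row, one channel, one note with no fixed length.
        "rows": [[{"sample": 0, "length": None}]],
    }
    return {"samples": [sample], "patterns": [pattern], "track": [0]}
```

Nothing inside the resulting song is editable as notes; it only guarantees that the player can reproduce the original audio.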

As an interesting side-effect, you could make this continue forever: http://www.wimp.com/pisounds/

The more you know ♪

TL;DR: Hi.

Comments

JuurianChi 11 years, 3 months ago

Interesting.

I smell a Kickstarter Campaign.

Jabberwock 11 years, 3 months ago

I think I can already do all this with the .xm format unless I am missing something? There's a program that can convert MIDI to .xm, and everything else you suggest can, I think, be done by converting the file to a .wav, loading that .wav as a sample, then creating a pattern that plays that sample back. Which isn't really very useful anyway. In fact, there's already a program that lets you convert .nsf files to FamiTracker modules, which can then be converted to MIDI:

http://famitracker.com/wiki/index.php?title=NSF_Importer

Perhaps I'm misunderstanding your point. I mean, it would be nice to have all this functionality in one place, and it'd be nice to be able to take modules from other consoles and turn them into editable MIDI/.xm files (such converters might already exist, I dunno). Also, I don't think a direct .ftm to .xm converter exists, and that'd be pretty useful. But I am not sure if there is a need for a new file format like you describe.

sirxemic 11 years, 3 months ago

JoshDreamland 11 years, 3 months ago

@Jabberwock: The point was that such a conversion isn't possible; an NSF file is nothing but a program that generates music. There are no notes, no samples, no nothing in NSF. The program you found is brilliant; it works by emulating the NSF to create a wav file, then analyzing the wave sound to reconstruct a sequence. It can be tricked, and won't work right on all files.

Again, the issue is that there are no notes to read. You have to reconstruct them. So an NSF→XM converter is AI-complete.

My format instead does a binary translation; you get a program in, and a program out. Thus, my audio files are Turing complete. The converter is a regular program. [:P]

@sirxemic: See, there really aren't any competing standards. There are thirty-five different tracker formats, and none of them offer anything similar. The single greatest tracker library is libDUMB, which supports MOD, XM, S3M, and IT files. The single greatest video game music library is libGME, which supports NSF(E), VGM/VGZ, GBS, SPC, SAP, GYM, AY, HES, and KSS. A binary format would be the end-all, be-all.

Jabberwock 11 years, 3 months ago

Okay, I guess I needed to read some stuff more carefully. Are you suggesting that your program would read the assembly code and convert it to some more high-level code? That would be cool, and I think your idea of being able to create "samples" that are functions could be really useful. I'm just skeptical of your claim that it would solve the problem of multiple formats once-and-for-all.

I dunno, perhaps I am looking at this too much from my own peculiar perspective. To put it simply, as someone who actually wants to make music that can be played back on an NES, I don't want some master format, because an NES won't be able to understand it. (Unless you wanted to create a virtual machine on the NES. If you can do that I'll never question you again. ;) )

JoshDreamland 11 years, 3 months ago

I'm suggesting the opposite; the low-level code gets converted to a similar flavor of low-level code. It might be slightly higher level in an effort to reduce the number of instructions to JIT, but it isn't a scripting language. The scripting language would be for designing the sounds, and it would be compiled into the bytecode to minimize player size.

Now, as far as playing this on the NES… That's a really cool idea. Backward compatibility wasn't a consideration. Technically, it is possible, but I'd have to keep the goal in mind while designing the byte code. I think it would be entirely possible to compile these sounds back to NSF, provided only that your sound program isn't too big to run on it.

I'm not sure of the NES's hardware limitations; it might be that even converting NSF→my format→NSF would make it too big to be played on the console anymore. Or it might be that anything you can fit in memory is fair game, meaning you could do anything but play a converted OGG on it. In an emulator, you might be able to play a complete OGG file as an NSF by converting and compiling it.

aeron 11 years, 3 months ago

I really like the idea behind this, especially the idea of programmable music in general. The possibilities beyond the realm of simple playback and compatibility with other formats are what fascinate me most. Namely, procedural music and the ability to write synthesizers and effects from the ground up.

As an aside: I once considered how reasonable it would be to have a game that, instead of including pre-rendered music, came with a project from a DAW with all the necessary VST plugins used. The music would be played back and mixed in realtime, and could even be altered and automated based on what was happening in game. Pitches could be changed or swapped out, channels could be muted, effects could be added. Lots of possibilities within that space, but it would take HEAPS of memory and processing power that most rigs couldn't handle alongside a full game (since you're essentially running the engine of the DAW in the background).

Your proposed format could have many of the same benefits, but wouldn't have the overhead of so many VSTs in favor of integrated effects and synths. Though to be honest some aspects confuse me. For one, it seems unnecessary to have the compatibility with formats like NSF or OGG the way you describe them, since you are essentially re-implementing them behind the JIT the same way they would be implemented outside the JIT. You are essentially describing another container format. When you consider that a particular product/game using the library would likely only use one type of music for consistency, why should someone use this container when they could just include libvorbis or libGME to the same effect?

In the case of NSF and other bytecode formats, I can only really see the benefit if it acted as more of a translation than a reimplemented player. I.e., the five channels of the NSF are reverse engineered into instruments, tracks, and patterns that can be manipulated further by scripts. This, as you've stated, is not a trivial task, but without it you're left with a single note that says nothing more than "play the whole song like the NES would", analogous to the elusive "make_game()" function that never made its way into Game Maker. Perhaps to get around this type of problem, the compatibility could be with FamiTracker's native project format as opposed to raw NSF files. FTM maintains instruments, patterns, and song and would thus be compatible with the generalized tracker format.

I feel like I had something else to say, but it's escaped me at the moment so I'll wrap this up. To me, the benefit in the format you describe is the ability to write music that mimics chiptunes, tracker modules, or anything else (including a fully synthesized track), all from one unified editor. The editing experience would be consistent across schemes: you would be able to go into any converted or from-scratch project and rewrite any given note, pattern, or even the entire piece. Finally, the icing on the cake is that the playback is consistent through a unified library.

firestormx 11 years, 3 months ago

Quote:
but it would take HEAPS of memory and processing power that most rigs couldn't handle

Why does Stevenup need 64GBs of RAM?!

Also, hi Josh.

JoshDreamland 11 years, 3 months ago

@aeron

Quote:
Though to be honest some aspects confuse me. For one, it seems unnecessary to have the compatibility with formats like NSF or OGG the way you describe them, since you are essentially re-implementing them behind the JIT the same way they would be implemented outside the JIT.

Let me clarify: OGG and FLAC are for reducing file-based sample sizes. So if you do need a multiple-second sample, you are able to compress it. It's like PNG depending on zlib. Being able to play an ogg is a pleasant side-effect.

Also, you can't just include an arbitrary NSF file in the format and expect it to work, unless your sound program is in fact the emulator. I was suggesting an ahead-of-time binary translation. It's still my binary format which is JIT'd, not NSF. So there's no overhead, and no need to include any info about NSF. Users can only JIT my format, and only decode OGG/FLAC. But by doing a conversion from MP3 to OGG ahead of time, they can also play converted MP3s, and by doing a conversion from NSF's 6502 machine code to my format, they can also play NSFs. If they wanted to play arbitrary NSFs, they'd have to include the converter program, or just use libGME.

The goal was only to create a format which could be backward-compatible with all those other formats, not a format that offered to actually play them. [:P]

Quote:
This, as you've stated, is not a trivial task, but without it you're left with a single note that says nothing more than "play the whole song like the NES would", analogous to the elusive "make_game()" function that never made its way into Game Maker. Perhaps to get around this type of problem, the compatibility could be with FamiTracker's native project format as opposed to raw NSF files.

The ability to tell a note, "play the whole song like the NES would," is pretty novel. The idea wasn't to let you manipulate the notes as in a conventional tracker, but instead to say "my tracker format is so great, there's an NSF 2 MyTrackerFormat converter!" Quite a boast, but a much more accurate boast than FamiTracker's, which I'd bet my last dollar fails with considerable frequency. So yeah, the internal representation of NES Mario wouldn't be very pretty, and it wouldn't light up the face of the first kid that went to edit the song. But it'd play with 100% accuracy in a (I would hope) well-supported tracker format.

So yeah, it's not that we would offer compatibility with FamiTracker; we'd just offer a converter. NSF2MTF, Fami2MTF, GSF2MTF, MIDI2MTF…

Quote:
I can only really see the benefit if it acted as more of a translation than a reimplemented player.

Bingo.

What I was really going for isn't a unified experience for the composer, so much as a unified experience for the player. If you support this tracker format, a determined user can port all of his songs over such that you can play them. No matter if they're notes and samples or a huge, ugly, impossible-to-maintain binary blob.

Also, hi, fsx.