Trippr audio programming: Interpolated noise (part 4)

As a little addendum I’ll talk a bit about the secret noise I’ve added to the Trippr tone. It sits silently in the background. It’s a kind of Brownian noise, made by interpolating between random points in the signal.

We’ll divide our buffer into evenly spaced blocks. We’ll use a power of two so it fits nicely: in this case 64 (2^6), which fits sixteen times in our buffer of 1024 (2^10) samples. The signal will smoothly step between random points, one per block. We’ll save all our samples in an array called Noise.

uint32_t Smooth = 64;
float Random[SamplesToWrite / Smooth + 1]; // One extra slot so the last endpoint fits.
int16_t Noise[SamplesToWrite];

Now we’ll get random values between -1 and 1 to interpolate between. We will have to set the first and last one to 0 so the ends will always meet nicely.

Random[0] = 0.0f;
Random[SamplesToWrite/Smooth] = 0.0f;
for (uint32_t i = 1; i < SamplesToWrite/Smooth; i++)
  Random[i] = (float)rand()/(float)RAND_MAX * 2.0f - 1.0f;

Now we’ll use the Smoothstep() function to interpolate between these points, then multiply the result by the 16-bit maximum, 32,767.

for (uint32_t i = 0; i < SamplesToWrite/Smooth; i++)
  for (uint32_t j = 0; j < Smooth; j++)
    Noise[i*Smooth + j] = (int16_t)(32767 * Smoothstep(Random[i], Random[i+1], Smooth, j));

The Smoothstep() function I got from this great site. This is it:

float Smoothstep(float A, float B, uint32_t N, uint32_t i)
{
  float t = i/(float)N;
  t = t * t * (3.0f - 2.0f*t);
  return (1.0f - t)*A + t*B;
}
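Putting the pieces together, a self-contained sketch of the whole noise pass could look like this. The buffer size is fixed as a constant here for illustration, and the Random array gets one extra slot so the final endpoint fits:

```c
#include <stdint.h>
#include <stdlib.h>

#define SAMPLES_TO_WRITE 1024  /* buffer size from the post */
#define SMOOTH 64              /* block size from the post */

/* Smoothstep between A and B; i runs from 0 to N-1 within a block. */
static float Smoothstep(float A, float B, uint32_t N, uint32_t i)
{
    float t = i / (float)N;
    t = t * t * (3.0f - 2.0f * t);
    return (1.0f - t) * A + t * B;
}

/* Fill Noise with smoothed random steps; both endpoints pinned to zero. */
static void FillNoise(int16_t *Noise)
{
    float Random[SAMPLES_TO_WRITE / SMOOTH + 1];
    Random[0] = 0.0f;
    Random[SAMPLES_TO_WRITE / SMOOTH] = 0.0f;
    for (uint32_t i = 1; i < SAMPLES_TO_WRITE / SMOOTH; i++)
        Random[i] = (float)rand() / (float)RAND_MAX * 2.0f - 1.0f;

    for (uint32_t i = 0; i < SAMPLES_TO_WRITE / SMOOTH; i++)
        for (uint32_t j = 0; j < SMOOTH; j++)
            Noise[i * SMOOTH + j] = (int16_t)(32767.0f *
                Smoothstep(Random[i], Random[i + 1], SMOOTH, j));
}
```

Because Random[0] is pinned to zero and Smoothstep() returns A when i is 0, the very first sample of every generated block of noise is guaranteed to be 0.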

Trippr audio programming: The catch (part 3)

In part 2 we created a buffer in memory for our audio data. Every time our audio card needs more sound to play, our SDL_AudioCallback() function gets called automatically. A writtenToDev flag gets set and we have to fill our buffer with new data. Right now, there’s not much to worry about. All we have to know is:

void *Buffer;
uint32_t SamplesToWrite;

This buffer will be filled with our samples. Each sample holds two channels of two bytes (16-bits).

At the start of the program we allocate memory for the buffer with malloc().

#define BUFFER_SIZE 1024
Buffer = malloc(BUFFER_SIZE * sizeof(int16_t) * 2);

For the binaural beating effect we need two tones, left and right. The two frequencies differ by a given amount: the left channel subtracts amount/2 from the base frequency, and the right channel adds amount/2.
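That split could be sketched like this (the function and parameter names are mine, not Trippr’s):

```c
/* Split a base frequency into a left and right channel frequency
   that differ by BeatAmount, centred on BaseFrequency. */
static void BinauralSplit(float BaseFrequency, float BeatAmount,
                          float *Left, float *Right)
{
    *Left  = BaseFrequency - BeatAmount / 2.0f;
    *Right = BaseFrequency + BeatAmount / 2.0f;
}
```

For example, a 200 Hz base tone with a 10 Hz beat gives 195 Hz on the left and 205 Hz on the right.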

When we have the right frequency for a channel we can calculate the correct value for a sample with the sin() function. This results in a value between -1 and 1. We have to scale this to the signed 16-bit range of -32,768 to 32,767, so we multiply the sine function by the Amplitude variable:

#define SAMPLING_RATE 44100
uint32_t Amplitude = 32767;
int16_t *Output = Buffer;
// For every SampleIndex:
float Sine = sin(TWO_PI * Frequency * SampleIndex / SAMPLING_RATE);
int16_t Sample = Amplitude * Sine;
*Output++ = Sample;

If we use a for-loop over the whole buffer we get a tone at the given frequency.

But there is one catch.

Every time we start calculating the buffer, the value of the very first sample will be 0. The wave will start at the same point at the start of every buffer. This results in a loud click every 1024 samples; that is, 43 times every second. A very bad noise!

Also, what if the frequency was changed last frame by the user? How do we make sure the new waveform cleanly glues to the previous one?

We need to know what the value is at the end of the buffer, but also whether the signal was going up or down. A way to do this is to check what the phase of the signal was. The period of the sine wave is the number of samples per cycle. So if the number of samples per second is 44,100, then the samples per period of, for example, a 100 Hz tone is 441, or SAMPLING_RATE / Frequency. If we then take the modulo (the remainder after dividing) of the complete buffer length by this period, we can figure out our position, between 0 and 1, within the cycle.

float Period = SAMPLING_RATE / Frequency;
float Phase = (SamplesToWrite - Period * (int32_t)(SamplesToWrite/Period)) / Period;
// The % operator only works on integers.
// For floats you can also write Phase = fmodf(SamplesToWrite, Period) / Period;

If we multiply this phase by TWO_PI and add that to the sin() function when we calculate our samples, the new buffer has a phase offset that makes it stick perfectly to the last buffer.

This is maybe hard to visualise but you can find the final sine wave algorithm here:

int16_t *Output = Buffer;
uint32_t SamplesToWrite = BUFFER_SIZE;
for (uint32_t SampleIndex = 0; SampleIndex < SamplesToWrite; SampleIndex++)
{
  int16_t Sample = 32767 * sin(TWO_PI * Frequency * SampleIndex / SAMPLING_RATE + Phase * TWO_PI);
  *Output++ = Sample;
}
float Period = SAMPLING_RATE / Frequency;
Phase = (SamplesToWrite - Period * (uint32_t)(SamplesToWrite/Period)) / Period + Phase;

In the next, last part we will visit a neat trick to create a soft noise as a layer underneath the tones. But as a technical review of how the audio programming was done in Trippr, this is about everything I wanted to talk about. Thanks for reading!

Trippr audio programming: The callback (part 2)

So last time we discussed the basics of digital audio, and concluded with a simple command-line tool that exported WAV files with audio. But what if we wanted to listen to audio in real time? Plus, what if we wanted to change the parameters on the fly?

In that case we will have to talk more directly with our sound card. How this works depends a whole lot on your hardware. Thankfully our bloated and rusty operating system can help us with that: we have APIs. These application programming interfaces help us talk to the hardware. On Windows that is XAudio2, macOS has Core Audio, and Linux has problems.

In this particular case I used a cross-platform interface on top of these, called the Simple DirectMedia Layer. SDL is fairly low-level, small, and super portable, so you’re welcome to follow along on whatever system you like.

We need an audio buffer.

I’ve talked about sample rate and bit depth. Now we’ll add buffers. An audio buffer is a very small snippet of audio in memory. But not too small. And not too large… You’ll see.

The engine behind Trippr runs at 60 frames per second. There are a few reasons for that. The refresh rates of most modern computer screens run at that speed. Most of the time the computer’s calculations for the next frame are done quicker than the 16.6 milliseconds (1/60th of a second) available per frame, but it’s still clever to wait out the rest of the time: to stay in sync with the refresh rate mentioned before, and also to save battery life on a mobile device. The CPU will be sleeping most of the time! Remember that even one millisecond is a long time for a computer.

One extra reason is that we have some time to play our sound samples. We can calculate new samples for the buffer every frame, every 16 milliseconds. We know that our audio holds 44,100 samples per second, so we need 44100 / 60 = 735 samples per frame. The audio buffer has to hold at least that amount (per channel) to have enough data to send to the audio device before it’s time to calculate new samples again.

Setting up the buffer with SDL.

We set up this system by opening a so-called audio device with SDL. We call SDL_OpenAudioDevice() for this at the initialisation of our program. This takes the info we talked about: the sampling rate (44100), the bit depth (16), the number of channels (2) and the buffer size. The buffer likes to be a power of two, so this has to be 1024: the smallest power of two that’s larger than the 735 we decided we needed. This also gives us a bit of room when for some reason we don’t make it to the next frame within the 16.6 ms (for example when the CPU is very busy for a moment).

AudioSpec.freq = 44100;
AudioSpec.format = AUDIO_S16;
AudioSpec.channels = 2;
AudioSpec.samples = 1024;
AudioSpec.callback = SDL_AudioCallback;
AudioSpec.userdata = (void *)Buffer;

In this case the number of samples in the buffer is actually a number of “sample frames”. A sample frame is channel-inclusive. So let’s say there are 1024 stereo sample frames in the buffer (or double that in interleaved samples). The buffer size is 1024 x 2 bytes x 2 channels = 4096 bytes.
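As an aside, the “smallest power of two that fits” can be computed with a tiny helper. This is an illustration, not code from Trippr:

```c
#include <stdint.h>

/* Smallest power of two that is >= n (for n >= 1). */
static uint32_t NextPow2(uint32_t n)
{
    uint32_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}
```

NextPow2(44100 / 60) indeed rounds the 735 samples per frame up to the 1024 we give to SDL.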

Why not make the buffer huge?! We would never run out of data, right? Well, when the user changes the parameters in the user interface, it takes the length of the buffer before the new sound can be calculated. So the user experiences a lag before the changes they made are heard back.

Why not make the buffer super tiny?! We would always have a super quick and responsive user experience! Well, any time this tiny buffer runs out of data before new data is written to it, the audio device doesn’t know what to do, and I can promise you it will sound horrible.

The callback function.

So how do we get those samples we carefully calculated to the device? A way to do this is with the SDL_AudioCallback() function. This is a callback that we write ourselves and hand to SDL. It gets called every time the audio stream to the device needs new data. So we need to implement this function to get our sample buffer into the audio stream as quickly as possible. We’ll use a memcpy() for this.

void SDL_AudioCallback(void *Buffer,
                       uint8_t *Stream,
                       int32_t SizeInBytes)
{
  // Straight up copy our buffer to the audio device.
  memcpy(Stream, Buffer, SizeInBytes);
  writtenToDev = 1;
}

The pointer “Buffer” is our audio data. We copy that to the “Stream” pointer, where SDL expects the new data. When this has happened we set a boolean flag writtenToDev, so the next time a new frame is set up we know it’s time to calculate a new buffer. And this loops on and on:

  1. Calculate 1024 samples into our local audio buffer.
  2. Wait until SDL_AudioCallback gets called.
  3. Send the buffer to the audio device.
  4. Set up our writtenToDev flag.
  5. New frame: Calculate 1024 samples into our local buffer.
  6. Wait until SDL_AudioCallback gets called.

But how do we fill our local buffer? What are these (16-bit) values we will send to the audio card? What makes the sound? Well, that’s the topic of part 3.

Trippr audio programming: Binaural beats (part 1)

The initial idea to create an interactive binaural beats synthesizer dates back to about a year ago, when I started talking to artist Seán Hannan. He asked me to write a hip-hop-inspired soundtrack for his upcoming artwork in New York City. It would be conspiracy themed, and I remembered there were some conspiracy theories floating around about the US government influencing its citizens’ brainwaves through their TV sets using binaural beats.

Actually, I had already used this effect back in 2013 when I was writing/programming the album Triumph Without Euphoria. It is the basis for every track on there.

If you would like to know more about this audio phenomenon and its sound, you can check out this seminal article by Gerald Oster: Auditory Beats in the Brain (Scientific American, 1973).

What are these binaural beats?

I wrote my first version of the algorithm in Max. It basically generates two sine waves, one per channel, with a slight difference in frequency. There are two parameters: the frequency of the average tone, and the difference between the left and right channels. Max, being a real-time environment developed for live performers, makes it pretty easy to implement this as live synthesis. But it was far from portable. Max is a heavy and proprietary environment, and you have to bring your expensive MacBook to let anyone hear your music. Not really a good fit for creating, let’s say, a mobile app.

Digital audio basics.

Digital audio stands on three legs: sampling rate, bit depth and channels. It uses these to translate an analog signal into a stream of values (or the other way around). The number of values per second is the sampling rate. The type of the values is the bit depth. And there is one such stream per channel: in this case two, left and right, stereo.

The sampling rate has been standardised at 44,100 samples per second, or 44.1 kHz, since the introduction of the CD. If this value doesn’t tell you how fast computers have become, then this will: every sample can have a 16-bit value. That means from -32,768 to 32,767. A crazy amount of resolution. This gives a quick look at the data we will have to deal with. The reason we need this fidelity is that our ears are very sensitive to any glitch or bump in the audio signal. We can hear every digital pin drop.

Since audio chips and files need their data in a one-dimensional array, both channels are interleaved in the stream: L-R-L-R-L-… and so on. So in conclusion, our digital synthesizer will have to deal with 44,100 x 16 x 2 = 1,411,200 bits = 176,400 bytes per second. (Nothing compared to 475 MB per second for 1080p video streams, but that has a lot more complications involved.)
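Interleaving the two channels into one stream is a tiny loop. A sketch (names are mine, not from the original code):

```c
#include <stdint.h>

/* Interleave Count left/right sample pairs into Out: L-R-L-R-... */
static void Interleave(int16_t *Out, const int16_t *Left,
                       const int16_t *Right, uint32_t Count)
{
    for (uint32_t i = 0; i < Count; i++)
    {
        Out[2 * i]     = Left[i];
        Out[2 * i + 1] = Right[i];
    }
}
```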

Push power button to start computer.

I started with a small offline command-line tool that exported a sample with binaural beats. I had lots of fun implementing the medieval WAV-file header. The command accepted arguments on the command line and spat out a WAV file. I used these on the album for Seán Hannan: When The Iron Bird Flies And Horses Run On Wheels.
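For reference, a minimal version of that “medieval” 44-byte header could look like the sketch below. This follows the canonical RIFF/WAVE PCM layout, is not Trippr’s actual code, and assumes a little-endian machine (which is what the WAV format expects anyway):

```c
#include <stdint.h>
#include <stdio.h>

/* Write a minimal 44-byte WAV header for 16-bit stereo PCM at 44.1 kHz.
   DataBytes is the size of the sample data that follows the header. */
static void WriteWavHeader(FILE *f, uint32_t DataBytes)
{
    uint32_t SampleRate = 44100;
    uint16_t Channels = 2, BitsPerSample = 16, Pcm = 1;
    uint32_t ByteRate = SampleRate * Channels * BitsPerSample / 8;
    uint16_t BlockAlign = Channels * BitsPerSample / 8;
    uint32_t ChunkSize = 36 + DataBytes, FmtSize = 16;

    fwrite("RIFF", 1, 4, f); fwrite(&ChunkSize, 4, 1, f);
    fwrite("WAVE", 1, 4, f);
    fwrite("fmt ", 1, 4, f); fwrite(&FmtSize, 4, 1, f);
    fwrite(&Pcm, 2, 1, f); fwrite(&Channels, 2, 1, f);
    fwrite(&SampleRate, 4, 1, f); fwrite(&ByteRate, 4, 1, f);
    fwrite(&BlockAlign, 2, 1, f); fwrite(&BitsPerSample, 2, 1, f);
    fwrite("data", 1, 4, f); fwrite(&DataBytes, 4, 1, f);
}
```

After the header you simply append the interleaved 16-bit samples and the file plays in any media player.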

In part 2 I will go in further and explain the code behind the synth. And how I implemented the algorithms that create the audio streams in real-time… We’ll talk buffers!

Using a shovel as a hammer

“Everything is a view,” says the Apple Developer Documentation. We’re talking about the Interface Builder in Xcode, their WYSIWYG (do people still use that term?) tool for developing graphical user interface (GUI) files for macOS and iOS apps. The XIB files it creates are actually XML files that hold the information for laying out the user interface. At least that is what I’ve heard. I never really went into it because, yeah, modern OSes lack a lot of good, easy-to-use documentation.

So everything is a view. I believe that every view has its own render clock. So all NSView or UIView objects handle their own calculations on their own clocks and graphics buffers. That could be smart: I see different windows stacked on top of each other on iPads, and it might be faster to handle each window as a different texture to send to the GPU. But what if everything is a view? Is every button, image and even every text label a view? I know they are subclasses of NS/UIView. So, do all those UI elements have their own rendering context running? That can’t be smart!

Maybe it is. Maybe it isn’t. I just don’t know!

It’s the trouble that came with operating systems implementing object-oriented programming in the 90s. We have to work with these huge objects that inherit class after class after class. And most of the time we can’t even figure out what’s going on, because we’re supposed to privatise or “protect” most or all of the data these classes contain. I don’t know what my lovely little baby button inherits from its parent class, or how many members it has, or how many messages can be sent to its methods. (How about that terminology, eh?)

We just don’t know how many lines of code the AppKit UI objects each contain… When you carelessly swipe your iPhone screen to the left to get to the next tab, how far up every UI object’s class hierarchy do we have to trace for code to get executed? And I don’t even want to know how many lines of code Apple’s Foundation type classes like NSNumber, NSString and NSArray contain. I guess that’s also just a bit more than a typedef:

@interface NSArray (NSExtendedArray)
- (NSArray *)arrayByAddingObject:(ObjectType)anObject;
- (NSArray *)arrayByAddingObjectsFromArray:(NSArray *)otherArray;
- (NSString *)componentsJoinedByString:(NSString *)separator;
- (BOOL)containsObject:(ObjectType)anObject;
@property (readonly, copy) NSString *description;
- (NSString *)descriptionWithLocale:(nullable id)locale;
- (NSString *)descriptionWithLocale:(nullable id)locale indent:(NSUInteger)level;
- (nullable ObjectType)firstObjectCommonWithArray:(NSArray *)otherArray;
- (void)getObjects:(ObjectType _Nonnull __unsafe_unretained [_Nonnull])objects range:(NSRange)range NS_SWIFT_UNAVAILABLE("Use 'subarrayWithRange()' instead");
- (NSUInteger)indexOfObject:(ObjectType)anObject;
- (NSUInteger)indexOfObject:(ObjectType)anObject inRange:(NSRange)range;
- (NSUInteger)indexOfObjectIdenticalTo:(ObjectType)anObject;
- (NSUInteger)indexOfObjectIdenticalTo:(ObjectType)anObject inRange:(NSRange)range;
- (BOOL)isEqualToArray:(NSArray *)otherArray;
@property (nullable, nonatomic, readonly) ObjectType firstObject API_AVAILABLE(macos(10.6), ios(4.0), watchos(2.0), tvos(9.0));
@property (nullable, nonatomic, readonly) ObjectType lastObject;
- (NSEnumerator *)objectEnumerator;
- (NSEnumerator *)reverseObjectEnumerator;
@property (readonly, copy) NSData *sortedArrayHint;
- (NSArray *)sortedArrayUsingFunction:(NSInteger (NS_NOESCAPE *)(ObjectType, ObjectType, void * _Nullable))comparator context:(nullable void *)context;
- (NSArray *)sortedArrayUsingFunction:(NSInteger (NS_NOESCAPE *)(ObjectType, ObjectType, void * _Nullable))comparator context:(nullable void *)context hint:(nullable NSData *)hint;
- (NSArray *)sortedArrayUsingSelector:(SEL)comparator;
- (NSArray *)subarrayWithRange:(NSRange)range;

This is the interface/header file for the NSArray class, and it goes on for a while: 90+ methods and a whole lot of extra unnecessary data we can’t see because it’s encapsulated, hidden. Pretty insane for a data structure that is, in fact, simply a linearly arranged set of same-sized elements at a location in memory. I guess this is part of the reason we need crazy fast CPUs for just some simple PDA functionality we already had on our PalmPilots, which ran a 50 MHz Motorola processor.

Behold! Our array.

But back to inheritance. Our button inherits from NSButton, NSControl, NSView, NSResponder and NSObject. And all of these have their own connections and properties, with a lot of functionality built in. In my opinion a basic button is just a struct holding a picture, values for x, y, width and height, and maybe a state value. Nothing more. Your input function runs a simple if-statement to decide if the button was pressed.

typedef struct button_type // 28 bytes total
{
  uint32_t *bitmap;
  uint32_t x, y, w, h;
  enum { BUTTON_UP, BUTTON_DOWN } state;
} button;
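And that “simple if-statement” can be as small as a rectangle test. A sketch (the function name is mine):

```c
#include <stdint.h>

/* Is the cursor inside the button's rectangle? */
static int ButtonHit(uint32_t x, uint32_t y, uint32_t w, uint32_t h,
                     uint32_t CursorX, uint32_t CursorY)
{
    return CursorX >= x && CursorX < x + w &&
           CursorY >= y && CursorY < y + h;
}
```

No classes, no hierarchy: one comparison per edge and you know whether the button was pressed.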

It is a fundamental problem to try to translate the computer’s data to real-life objects. A computer doesn’t know what a chair is. It knows what a switch is, nothing else. Trying to implement OOP is using a shovel as a hammer: it works, but it is overkill and possibly not very smart to do. I know, it would be crazy to all go back and write our new version of MS Word in x86 assembly. But use whatever language you use in a sane, data-oriented way. We all get better from it, and we won’t have to wait around for minutes before Photoshop finally starts up.

Watch this great talk by Casey Muratori, “The Thirty Million Line Problem”, for more about this problem from someone who knows what he’s talking about.

A programmer’s story

I never was the gamer. The new edition of Modern Warfare could never excite me. Even World of Warcraft bored me pretty quickly. The world was amazing; I had never experienced anything like it before. (I earned all the exploration achievements.) But waiting six hours for a dungeon to open up to get that one epic magic sword never really got me. I wasn’t interested in mechanics, I was interested in worlds.

I never was the sporty one. Playing outside was mostly a drag and competition never interested me. You know, that kind of kid. Instead I programmed Logo. Those were my C64 and ZX Spectrum days: typing over programs from books, not really caring about what the code did. But still, it stuck with me. In the end the only thing you could really do with it was create screensavers with pretty pictures for your Windows machine. Logo (or I) wasn’t really capable of creating a simulation of any sort. Later I would create a little “guess the number” game or two in Visual Basic. But I couldn’t be much bothered.

What is a world? A carefully constructed balance of autonomous objects. Like a flock of birds, but on a larger scale, with many more parameters, internal details and external influences. It’s like playing God, man! These objects span a multitude of scales, from large cities and continents to the smallest molecules and electrons that make up our very existence.

The magic of code never really left me. In 1996 I would create my own World Wide Web by creating BMP bitmaps that I would load in a browser. I didn’t even understand what the Internet was. A few years later, when we finally got a connection at home, I started creating a whole lot of frame- and table-based websites in HTML. Also for other people. Now I can’t even imagine how crazy it would be to ask a 16-year-old kid (with no real skills whatsoever) to design the website for your business.

Sometimes I would find one of those magical worlds through my computer, via the internet or one of the dozens of CD-ROMs I bought at the toy store. I vividly remember the window of Norns I carefully bred, or the ant-like citizens of SimCity that I tried to give everything they wanted for the lowest taxes possible. (No disaster menu for me!) It was way more exciting than those stick insects in our terrarium in the living room or some fish tank at the Chinese restaurant!

Later, when I studied at art school, I got back in touch with real programming. I learned Max, Csound, ChucK and Processing. At first I was really into Max, using it for almost everything: composing music, running a webcam, making coffee. But soon I would try to create my own objects and met the limits of such a purpose-built language. I switched to Processing and discovered how GPUs and shaders worked. Dan Shiffman’s amazing book The Nature Of Code also showed me what would be possible if you wanted to simulate natural behaviour.

It may very well be that I’m the only person who is still a bit excited about No Man’s Sky and Spore. Since I’m not a game designer at all (or a gamer, for that matter), those worlds scratch my itch. I couldn’t care less that there is no interesting gameplay. They both lack natural evolution but have interesting procedural content generation (PCG) for creature and environment design.

After my studies I travelled a whole lot for a year or two, and I started to really learn programming: different programming paradigms (which I would soon start to hate), really understanding memory management, understanding the hardware, understanding what programming is and isn’t. I slowly got rid of any overhead and libraries. I went from Processing (Java) to openFrameworks (C++) to SDL (C) to writing my own custom platform code.

Programming computers is about arranging values (data) in memory with the right algorithms. It’s not about using a big complicated language that encapsulates and abstracts everything into “real-life” objects. It’s not about any language at all; it’s about what you do with it. (Also, don’t stick your code into a huge game engine. That’s the reason your game runs slow.)

And this gives you a clear canvas to really create your own worlds. The plan is to create a playground engine where I can test simulation algorithms. As an experiment I will try to build this natively on macOS. If it gets too messy I will go back to SDL as a platform layer.