Isen Kasa

Building myGTuner: A Real-Time Guitar Tuner with C++, TypeScript, and the Browser

A detailed walkthrough of building myGTuner, a real-time guitar tuner that explains the audio, music, and software concepts from microphone input to browser UI.

cpp, typescript, audio, web

I needed a guitar tuner, and I did not really want to depend on another app for it.

Most tuner apps are fine, but this felt like the kind of tool I should be able to build myself: listen through the microphone, detect the note, and tell me if the string is flat, sharp, or in tune.

So I built myGTuner.

The code is here: github.com/isenkasa/guitar-tuner.

The final app has three pieces:

guitar / microphone
  -> C++ audio agent
  -> TypeScript server
  -> browser UI

The C++ process captures and analyzes audio. The TypeScript server starts that C++ process and exposes the latest reading over HTTP. The browser shows the tuner.

This article walks through the build from the basic idea to the working app. I also explain the music and audio terms along the way, because words like frequency, MIDI note, and cents are easy to use without really explaining what they mean.


The Problem

A guitar tuner answers a simple question:

What note am I playing, and am I above or below the note I meant to play?

For standard guitar tuning, the open strings are:

6th string: E2
5th string: A2
4th string: D3
3rd string: G3
2nd string: B3
1st string: E4

If I play the A string, the tuner should ideally say something like:

A2
in tune
110.00 Hz

If the string is a little low, it should say flat. If the string is a little high, it should say sharp.

That means the app needs to solve a few smaller problems:

  1. Capture audio from the microphone.
  2. Measure whether the input signal is usable.
  3. Detect the main pitch in the sound.
  4. Convert that pitch into a musical note.
  5. Decide if that note is flat, sharp, or in tune.
  6. Display the result quickly enough to feel live.

Those became the project plan.


The Architecture

The app uses a native audio process plus a web UI:

microphone samples
  -> C++ agent
  -> JSON lines on stdout
  -> TypeScript server
  -> GET /api/current
  -> browser

I used this split because audio capture and signal processing are a good fit for C++, while the browser is a good fit for the display.

The C++ agent is responsible for:

  • opening the default microphone
  • reading audio samples
  • calculating RMS and peak
  • detecting clipping
  • estimating frequency
  • mapping frequency to note name
  • printing JSON readings

The TypeScript server is responsible for:

  • starting the C++ agent
  • reading JSON lines from stdout
  • keeping the latest valid reading in memory
  • serving GET /api/current
  • serving the static HTML UI

The browser is responsible for:

  • polling GET /api/current
  • rendering note, cents, frequency, input level, peak, and clipping
  • highlighting standard guitar strings

This kept the app simple. I did not need a database because the only state that matters is the latest tuner reading. I also did not need a separate frontend project because the server can serve one static HTML page directly.


What Is Sound, In Code?

Before talking about pitch detection, it helps to know what the microphone gives us.

A microphone records air pressure changes. In code, that becomes a stream of numbers called samples.

This app captures audio as mono floating-point samples at 48 kHz:

config.capture.format = ma_format_f32;
config.capture.channels = 1;
config.sampleRate = 48000;

That means:

  • ma_format_f32: each sample is a float
  • channels = 1: mono input
  • sampleRate = 48000: 48,000 samples per second

A sample value near 0.0 means quiet or no movement. Positive and negative values represent the waveform moving above and below its center line. In floating-point audio, sample values usually live between -1.0 and 1.0.

So if we record one second of mono audio at 48 kHz, we get 48,000 numbers.

The tuner's job is to look at those numbers and answer:

What repeating pattern is inside this waveform?

That repeating pattern is what we hear as pitch.


RMS, Peak, and Clipping

Before detecting notes, the app measures the input signal.

This matters because pitch detection on silence or room noise can produce nonsense. If the input is too loud, it can clip and distort. Both cases make the tuner less trustworthy.

RMS

RMS stands for root mean square.

In normal language, RMS is a way to estimate the average loudness or energy of the signal.

The formula is:

rms = sqrt(sum(sample * sample) / sampleCount)

Why square the samples?

Because audio samples swing positive and negative. If we averaged the raw samples directly, positive and negative values would cancel each other out. Squaring makes every value positive before averaging.

Then we take the square root to bring the result back into the same general scale as the original samples.

In the UI, RMS becomes the input level.

Peak

Peak is simpler. It is the loudest absolute sample in the window:

peak = max(abs(sample))

If RMS is average loudness, peak is the loudest instant.

Both are useful. A signal can have a low average level but still have sharp peaks. The tuner displays both.

Clipping

Clipping happens when the input is too loud and the waveform hits the maximum range the system can represent.

For floating-point audio, values near 1.0 or -1.0 are suspicious. The app reports clipping when:

peak >= 0.98

That gives the UI enough information to show:

Input is clipping. Lower your gain.
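
As a rough sketch, all three measurements can be computed in a single pass over a block of samples. The LevelStats struct and measureLevels function are illustrative names chosen for this example, not taken from the agent's source; only the 0.98 threshold comes from the text above.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical container for one window's level measurements.
struct LevelStats {
    float rms;
    float peak;
    bool clipping;
};

// Threshold from the article: a peak at or above 0.98 counts as clipping.
constexpr float kClippingThreshold = 0.98f;

// Computes RMS, peak, and the clipping flag for a block of mono float samples.
LevelStats measureLevels(const float* samples, std::size_t count) {
    if (count == 0) {
        return {0.0f, 0.0f, false};
    }
    double sumSquares = 0.0;  // accumulate in double to limit rounding error
    float peak = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        const float s = samples[i];
        sumSquares += static_cast<double>(s) * s;
        peak = std::fmax(peak, std::fabs(s));
    }
    const float rms =
        static_cast<float>(std::sqrt(sumSquares / static_cast<double>(count)));
    return {rms, peak, peak >= kClippingThreshold};
}
```

For a block that alternates between 0.5 and -0.5, both rms and peak come out as 0.5 and clipping stays false.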

Why the App Prints Every 100ms

The audio callback runs frequently. Printing JSON on every callback would be noisy and wasteful.

Instead, the agent accumulates stats and prints one reading every 100ms:

constexpr ma_uint32 kOutputIntervalFrames = kSampleRate / 10;

At 48 kHz:

48000 / 10 = 4800 frames

So 4800 frames is exactly 100 ms of audio.

That is fast enough to feel live, but not so fast that the server and browser are flooded.
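
One way to picture the timing is a small frame counter that the audio callback advances on every call. This is a sketch under assumptions: OutputTimer is a hypothetical helper, and std::uint32_t stands in for ma_uint32 so the example compiles on its own.

```cpp
#include <cstdint>

constexpr std::uint32_t kSampleRate = 48000;
constexpr std::uint32_t kOutputIntervalFrames = kSampleRate / 10;  // 4800 frames = 100 ms

// Hypothetical helper: counts frames across callbacks and reports how many
// output intervals have elapsed since the last call.
class OutputTimer {
public:
    // Adds frameCount frames and returns the number of readings now due.
    std::uint32_t advance(std::uint32_t frameCount) {
        framesSinceOutput_ += frameCount;
        const std::uint32_t due = framesSinceOutput_ / kOutputIntervalFrames;
        framesSinceOutput_ %= kOutputIntervalFrames;  // keep the leftover frames
        return due;
    }

private:
    std::uint32_t framesSinceOutput_ = 0;
};
```

With 480-frame callbacks, advance returns 0 nine times and 1 on the tenth call, which is the moment to print a reading and reset the accumulated stats.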

One small but important fix was disabling stdout buffering:

std::setvbuf(stdout, nullptr, _IONBF, 0);

When a program prints to a terminal, stdout is usually line-buffered, so each line appears immediately. But when stdout is a pipe, as it is for a C++ child process talking to Node, the C runtime typically buffers output and only flushes it once a larger chunk has accumulated.

That made the UI feel delayed. It could take a few strums before Node saw the latest lines.

Disabling buffering made every JSON line available as soon as it was printed.


Pitch, Frequency, and Hertz

Pitch is how high or low a sound feels.

In physics terms, pitch is related to frequency. Frequency means how many times a waveform repeats per second. The unit is hertz, written as Hz.

Examples:

82.41 Hz  -> low E string, E2
110.00 Hz -> A string, A2
329.63 Hz -> high E string, E4
440.00 Hz -> A4, common tuning reference

If something vibrates 110 times per second, its frequency is 110 Hz.

So the tuner has to find the strongest repeating period in the waveform, then convert that period into frequency.


The Rolling Pitch Buffer

A single tiny audio callback may not contain enough sound to detect a note reliably.

The agent keeps a rolling buffer:

constexpr ma_uint32 kPitchWindowFrames = 4096;

At 48 kHz:

4096 / 48000 = 0.085 seconds

So the pitch detector looks at about 85ms of recent audio.

The buffer is circular. New samples overwrite the oldest samples. Before the pitch algorithm runs, the circular buffer is copied into normal oldest-to-newest order.

That gives the detector a clean window of recent sound.
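
A minimal sketch of such a buffer, assuming a fixed-size array and a write index that wraps around. RollingBuffer and its method names are made up for this example, and std::size_t replaces ma_uint32 to keep it free of audio-library headers.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t kPitchWindowFrames = 4096;

// Hypothetical circular buffer holding the most recent kPitchWindowFrames samples.
class RollingBuffer {
public:
    // Stores one sample, overwriting the oldest one once the buffer is full.
    void push(float sample) {
        data_[writeIndex_] = sample;
        writeIndex_ = (writeIndex_ + 1) % kPitchWindowFrames;
        if (filled_ < kPitchWindowFrames) {
            ++filled_;
        }
    }

    bool full() const { return filled_ == kPitchWindowFrames; }

    // Copies the stored samples into oldest-to-newest order.
    std::vector<float> snapshot() const {
        std::vector<float> out;
        out.reserve(filled_);
        // Once full, the write index points at the oldest sample.
        const std::size_t start = full() ? writeIndex_ : 0;
        for (std::size_t i = 0; i < filled_; ++i) {
            out.push_back(data_[(start + i) % kPitchWindowFrames]);
        }
        return out;
    }

private:
    std::array<float, kPitchWindowFrames> data_{};
    std::size_t writeIndex_ = 0;
    std::size_t filled_ = 0;
};
```

The audio callback would call push for every incoming sample, and the pitch detector would call snapshot only when a reading is due, so the copy happens about ten times per second rather than on every callback.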


Detecting Pitch with Autocorrelation

The pitch detector uses autocorrelation.

The idea sounds more intimidating than it is.

Imagine you have a waveform. If you slide a copy of that waveform over itself, there will be certain positions where the peaks and valleys line up again. When they line up well, that delay is probably one period of the sound.

That delay is called a lag.

If the waveform repeats every 436 samples, and we are recording 48,000 samples per second:

frequency = sampleRate / lag
frequency = 48000 / 436
frequency = about 110 Hz

That is close to A2.

The code searches a range of possible lags. For the MVP, the detector focuses on open-string guitar tuning with a little room on both sides:

70 Hz to 400 Hz

This covers:

E2 = 82.41 Hz
A2 = 110.00 Hz
D3 = 146.83 Hz
G3 = 196.00 Hz
B3 = 246.94 Hz
E4 = 329.63 Hz

Why not detect the full guitar fretboard immediately?

Because wider pitch detection ranges can make a simple algorithm jumpier. It has more possible answers, including harmonics and noise. Since the first goal is open-string tuning, a narrower search range makes the MVP more stable.

The conversion from frequency range to lag range looks reversed at first:

minLag = sampleRate / maxFrequency
maxLag = sampleRate / minFrequency

That is because high notes repeat faster, so they have shorter lags. Low notes repeat slower, so they have longer lags.

At 48 kHz, rounding down to whole samples:

minLag = 48000 / 400 = 120
maxLag = 48000 / 70  = 685

For each candidate lag, the app compares the signal with a delayed version of itself:

correlation += currentSample * delayedSample

It also calculates the energy of both signals and normalizes the score:

score = correlation / sqrt(currentEnergy * delayedEnergy)

Normalization keeps the score between -1 and 1 regardless of volume, so a louder signal does not score better just because it is louder.

The detector keeps the lag with the best score. If the best score is weak, it returns no pitch:

minimum confidence = 0.55

The agent also skips pitch detection when the input is too quiet:

minimum pitch RMS = 0.003

That prevents quiet room noise from becoming fake notes.
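
Putting the pieces of this section together, a normalized autocorrelation search might look like the sketch below. The constants are the ones quoted above; detectPitch and its convention of returning 0.0 for "no pitch" are assumptions made for the example and may differ from the agent's actual code.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Values quoted in the article; names are illustrative.
constexpr double kSampleRate = 48000.0;
constexpr double kMinFrequency = 70.0;
constexpr double kMaxFrequency = 400.0;
constexpr double kMinConfidence = 0.55;
constexpr double kMinPitchRms = 0.003;

// Returns the estimated frequency in Hz, or 0.0 when no pitch is found.
double detectPitch(const std::vector<float>& window) {
    const std::size_t n = window.size();
    const auto minLag = static_cast<std::size_t>(kSampleRate / kMaxFrequency);  // 120
    const auto maxLag = static_cast<std::size_t>(kSampleRate / kMinFrequency);  // 685
    if (n <= maxLag) {
        return 0.0;  // not enough samples to test the longest lag
    }

    // Skip windows that are too quiet to trust.
    double sumSquares = 0.0;
    for (float s : window) {
        sumSquares += static_cast<double>(s) * s;
    }
    if (std::sqrt(sumSquares / static_cast<double>(n)) < kMinPitchRms) {
        return 0.0;
    }

    double bestScore = 0.0;
    std::size_t bestLag = 0;
    for (std::size_t lag = minLag; lag <= maxLag; ++lag) {
        double correlation = 0.0;
        double currentEnergy = 0.0;
        double delayedEnergy = 0.0;
        for (std::size_t i = 0; i + lag < n; ++i) {
            const double current = window[i];
            const double delayed = window[i + lag];
            correlation += current * delayed;
            currentEnergy += current * current;
            delayedEnergy += delayed * delayed;
        }
        const double denominator = std::sqrt(currentEnergy * delayedEnergy);
        if (denominator <= 0.0) {
            continue;
        }
        const double score = correlation / denominator;
        if (score > bestScore) {
            bestScore = score;
            bestLag = lag;
        }
    }

    if (bestLag == 0 || bestScore < kMinConfidence) {
        return 0.0;
    }
    return kSampleRate / static_cast<double>(bestLag);
}
```

One caveat with any pick-the-best-score search: lags at two or three periods correlate almost as well as the true period, so for the higher strings the winner can land an octave or more too low. A common refinement is to prefer the shortest lag whose score is within a small margin of the best.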


What Is a Musical Note?

Now we have frequency. But musicians do not usually say:

I am playing 110 Hz.

They say:

I am playing A2.

A musical note name has two parts:

A2

The A is the note name. The 2 is the octave.

Western music divides an octave into 12 steps:

C, C#, D, D#, E, F, F#, G, G#, A, A#, B

After B, the pattern repeats at the next octave.

The octave number tells us which version of the note we mean. A2 is lower than A3, and A3 is lower than A4.


What Is MIDI, and Why Use It?

MIDI is a music technology standard. For this app, the useful part is that MIDI gives every piano-style note a number.

The important reference is:

A4 = 440 Hz = MIDI note 69

That gives us a convenient math bridge between frequency and note names.

To convert frequency to the nearest MIDI note:

midiNote = round(69 + 12 * log2(frequency / 440))

The 12 is there because there are 12 notes in an octave.

The log2 is there because frequency doubles every octave. For example:

A3 = 220 Hz
A4 = 440 Hz
A5 = 880 Hz

Each octave up doubles the frequency. Each octave down halves it.

Once we have the MIDI note, we get the note name with:

noteIndex = midiNote % 12

That index points into:

C, C#, D, D#, E, F, F#, G, G#, A, A#, B

The octave comes from integer division:

octave = midiNote / 12 - 1

Example:

110 Hz -> MIDI 45 -> A2

So we do not need a giant table of all possible note frequencies. The formula handles it.
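
Here is a small sketch of that conversion, assuming A4 = 440 Hz = MIDI 69 as above. The function names frequencyToMidi and midiToNoteName are hypothetical, and the code assumes a positive frequency and a non-negative MIDI number.

```cpp
#include <cmath>
#include <string>

// Nearest MIDI note for a frequency in Hz (assumes frequency > 0).
int frequencyToMidi(double frequency) {
    return static_cast<int>(std::lround(69.0 + 12.0 * std::log2(frequency / 440.0)));
}

// Note name such as "A2" for a MIDI note number (assumes midiNote >= 0).
std::string midiToNoteName(int midiNote) {
    static const char* const kNoteNames[12] = {
        "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"};
    const int noteIndex = midiNote % 12;
    const int octave = midiNote / 12 - 1;  // integer division, so 45 / 12 is 3
    return std::string(kNoteNames[noteIndex]) + std::to_string(octave);
}
```

Calling midiToNoteName(frequencyToMidi(110.0)) gives "A2", and the same two calls map 82.41 Hz to "E2" and 329.63 Hz to "E4".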


What Are Cents?

Finding the nearest note is not enough.

If the app says A2, we also need to know whether the string is exactly A2, a little below A2, or a little above A2.

That difference is measured in cents.

A semitone is the step between two adjacent notes, such as A and A#. Each semitone is divided into 100 cents.

So:

0 cents    -> exactly on the target note
+10 cents  -> a little sharp
-10 cents  -> a little flat
+50 cents  -> halfway to the next note
-50 cents  -> halfway to the previous note

To calculate cents, the app first converts the nearest MIDI note back into its ideal frequency:

nearestFrequency = 440 * 2^((midiNote - 69) / 12)

Then it compares the detected frequency to that ideal frequency:

cents = 1200 * log2(frequency / nearestFrequency)

Why 1200?

There are 12 semitones in an octave and 100 cents in each semitone:

12 * 100 = 1200 cents per octave

The app uses cents to decide tuner status:

cents < -5 -> flat
cents > 5  -> sharp
otherwise  -> in_tune

That means the tuner treats anything within 5 cents on either side of the exact note as in tune.
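
The cents calculation and the status thresholds can be sketched as two short functions. The names centsFromNearest and tuningStatus are illustrative; the formulas and the 5-cent thresholds are the ones given in this section.

```cpp
#include <cmath>
#include <string>

// Signed distance in cents from the nearest equal-tempered note
// (assumes frequency > 0 and A4 = 440 Hz = MIDI 69).
double centsFromNearest(double frequency) {
    const long midiNote = std::lround(69.0 + 12.0 * std::log2(frequency / 440.0));
    const double nearestFrequency =
        440.0 * std::pow(2.0, static_cast<double>(midiNote - 69) / 12.0);
    return 1200.0 * std::log2(frequency / nearestFrequency);
}

// Maps a cents offset to the status strings used in the tuner's readings.
std::string tuningStatus(double cents) {
    if (cents < -5.0) {
        return "flat";
    }
    if (cents > 5.0) {
        return "sharp";
    }
    return "in_tune";
}
```

For example, 110.2 Hz is about 3.1 cents above A2, which falls inside the window and reads as in_tune, while 108 Hz is roughly 32 cents below A2 and reads as flat.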


The JSON Contract

The C++ agent prints JSON lines. When pitch is detected, a reading looks like this:

{
  "frequency": 110.2,
  "note": "A2",
  "cents": 3.1,
  "status": "in_tune",
  "rms": 0.08,
  "peak": 0.41,
  "clipping": false
}

Each field has a job:

  • frequency: detected pitch in Hz
  • note: nearest musical note
  • cents: signed offset from the nearest note, in cents
  • status: flat, sharp, in_tune, or unknown
  • rms: average input level
  • peak: largest absolute sample in the window
  • clipping: whether the signal is too hot

If no pitch is detected, the agent sends:

{
  "frequency": null,
  "note": null,
  "cents": null,
  "status": "unknown",
  "rms": 0.01,
  "peak": 0.04,
  "clipping": false
}

That shape is useful because the browser can always expect the same keys. Some values are just null when there is no note.
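
On the C++ side, producing that shape can be as simple as one snprintf call per reading. The sketch below is an assumption about how this could be done, not the agent's actual code: the Reading struct, the toJsonLine function, and the number of decimal places are all invented for the example. Each object is written on a single line, since a JSON-lines stream needs one complete object per line.

```cpp
#include <cstdio>
#include <string>

// Hypothetical in-memory form of one reading; fields mirror the JSON keys.
struct Reading {
    bool hasPitch = false;  // false means frequency, note, and cents are null
    double frequency = 0.0;
    std::string note;
    double cents = 0.0;
    std::string status = "unknown";
    double rms = 0.0;
    double peak = 0.0;
    bool clipping = false;
};

// Formats a reading as one JSON object on a single line (no trailing newline).
// Note names and statuses contain no characters that need JSON escaping.
std::string toJsonLine(const Reading& r) {
    char buffer[256];
    const char* clipping = r.clipping ? "true" : "false";
    if (r.hasPitch) {
        std::snprintf(buffer, sizeof(buffer),
                      "{\"frequency\":%.2f,\"note\":\"%s\",\"cents\":%.1f,"
                      "\"status\":\"%s\",\"rms\":%.4f,\"peak\":%.4f,\"clipping\":%s}",
                      r.frequency, r.note.c_str(), r.cents, r.status.c_str(),
                      r.rms, r.peak, clipping);
    } else {
        std::snprintf(buffer, sizeof(buffer),
                      "{\"frequency\":null,\"note\":null,\"cents\":null,"
                      "\"status\":\"unknown\",\"rms\":%.4f,\"peak\":%.4f,\"clipping\":%s}",
                      r.rms, r.peak, clipping);
    }
    return buffer;
}
```

The agent would then print the returned string followed by a newline, and the server can parse each line independently.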


The TypeScript Server

The TypeScript server is intentionally small.

It starts the C++ agent:

child_process.spawn("../agent/tuner-agent")

It reads stdout line by line. Every line should be a JSON object from the agent. If the line matches the expected tuner shape, the server stores it as the latest reading.

Then it exposes:

GET /api/current

The response is just the latest reading in memory.

There is no database. There is no queue. There is no auth. For a local tuner, those would be distractions.

The server also serves the static browser UI from server/public.

I used Node's built-in HTTP modules instead of Express because the server only needs two things:

  • one API endpoint
  • static file serving

For this MVP, built-in Node is enough.


The Browser UI

The browser polls:

GET /api/current

every 100ms.

It displays:

  • the current note
  • the tuning status
  • the cents offset
  • the detected frequency
  • input RMS level
  • peak level
  • clipping warning
  • standard string highlighting

The UI is static HTML, CSS, and JavaScript. That was deliberate.

I could have created a React or Vite app, but that would have added a second build system before the tuner needed one. For now, the browser UI is simple enough to live directly in server/public/index.html.

That choice makes the app easier to run:

cd server
npm install
npm run build
npm start

Open:

http://localhost:3000

How It Came Together

The part I liked most about this project is that it completes a real loop:

physical instrument
  -> microphone input
  -> native signal processing
  -> process communication
  -> TypeScript API
  -> browser interface

The C++ agent listens, measures, and reports. The TypeScript server starts the agent and exposes the latest reading. The browser does not need to know anything about audio processing; it just renders the current state.


What Comes Next

The MVP works, but there are clear improvements:

  • smoother note changes
  • better rejection of noisy input
  • a more polished cents meter
  • selectable input devices
  • alternate tunings
  • broader chromatic detection for fretted notes
  • packaging so the app is easier to launch

The first version focuses on standard open-string guitar tuning because that is what I needed first.

It is not trying to replace every tuner app. It is just a local tool that does the job, and because I built it, I can change it when I want to.

If you want to check it out, the repo is here: github.com/isenkasa/guitar-tuner.