Write-Ahead Logging (WAL)
Modern databases need to guarantee that your data is not lost, even when things go wrong: crashes, power failures, or abrupt shutdowns. At the heart of this guarantee lies a simple yet powerful mechanism : Write-Ahead Logging (WAL).
This blog is the first in a series on how databases work under the hood, and we begin by addressing a fundamental challenge: how can a database ensure durability and consistency in the face of crashes?
The Problem: Data Loss on Crashes
Let’s say you’re building a simple key-value store that writes to disk like this:
void put(const char* key, const char* value) {
FILE* f = fopen("store.txt", "w");
fprintf(f, "%s=%s\n", key, value);
fclose(f);
}
Every time you put, it overwrites the store file.
Seems simple, but what happens if your application crashes in the middle of writing? You might end up with:
- A partially written file
- A corrupted store
- Loss of previously stored keys
Even with fsync() or fflush(), things can go wrong. For instance:
- What if power is lost before the file is fully written?
- What if a write partially succeeds, corrupting the file?
Clearly, we need a safer method.
The Need for a Safer Write Path
Overwriting the entire store every time is wasteful and risky. Instead, we want a method that:
- Does not destroy the current state on each update
- Allows us to recover from crashes
- Can be flushed to disk reliably
This leads us to a design that’s central to database internals: the append-only log.
Enter Write-Ahead Logging (WAL)
Write-Ahead Logging turns the problem on its head:
Instead of writing directly to the main store, we first write the update to a separate log file and only then apply it to the in-memory or persistent state.
In other words, before mutating any data structure, you record the intention to mutate it in a log.
Example Flow
- You want to
put("name", "alice") - You first append
{"put": {"key": "name", "value": "alice"}}to a file calledwal.log - You flush this file to disk
- Then you apply the operation to the main key-value store
If your program crashes at any point after step 2, you can simply replay the log on restart and reconstruct the state.
Why WAL Works
The strength of WAL lies in two properties:
1. Append-Only Writes
Log entries are always written at the end of the file. This avoids partial overwrites and allows for atomic disk operations.
2. Replay Capability
Since the log contains the full history of changes, the current state can be rebuilt from scratch at any time by replaying the log.
3. Separation of Concerns
The WAL serves as a durable history of changes. The main store (e.g., a file or in-memory structure) can be optimized for performance and updated asynchronously.
WAL isn’t just for databases. Modern filesystems like XFS and Btrfs use a similar journaling mechanism for reliability, and even Microsoft Word uses a form of WAL to recover your unsaved documents.
Where WAL Is Used
Write-Ahead Logging is used in almost every major database:
- PostgreSQL: Uses WAL to support crash recovery and replication
- LevelDB / RocksDB: Use WAL before flushing to SSTables
- SQLite: Switches to WAL mode for performance and durability
- Kafka: Uses an append-only log as its core design
It’s not just a low-level hack, it’s a deliberate design for correctness.
DIY Time - Build Your Own Basic WAL (in C)
Let’s roll up our sleeves and build a minimal Write-Ahead Logging (WAL) system from scratch in C.
We’ll create a simple key-value database that:
- Logs every
putoperation to an append-only file - Replays the log to recover the latest state
- Can survive crashes using the WAL
You can replicate this in Python, Rust, or Go, the principles remain the same.
System Components
Following are the three core parts of our WAL system:
1. WAL File – wal.log
This file captures every write (aka put) operation, line by line:
name=Alice
age=30
- Append-only: Never delete or modify old entries.
- Durable: Persisted to disk immediately.
This is our reliable source of truth, used to replay state on recovery.
2. State File – state.txt
This represents the current snapshot of the database.
It is updated every time we call replay_log(), which processes all entries in the WAL and writes the latest value of each key.
- Easy to parse
- Rewritten fully on each update (for simplicity)
3. Crash Recovery Logic
Simulates recovery after a crash by reconstructing the full state from just the WAL file.
This is the essence of WAL:
If the app crashes before saving state, the WAL can replay it exactly.
Full C Implementation
Here's the code. We'll explain it block by block below:
// ------------------------------
// Write-Ahead Log (WAL) in C
// ------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#define LOG_FILE "wal.log"
#define STATE_FILE "state.txt"
#define MAX_LINE 256
These includes and macros set up the environment:
wal.log: Our log filestate.txt: Our latest state file
append_log(): Appending to WAL
void append_log(const char *key, const char *value) {
FILE *log = fopen(LOG_FILE, "a");
if (!log) {
perror("Failed to open WAL");
exit(1);
}
fprintf(log, "%s=%s\n", key, value);
fclose(log);
}
What it does:
- Appends a single
key=valueline to the WAL. - Simple and durable. If you
fsync()afterfclose(), it becomes crash-resilient.
Fun Fact: Almost every database, from SQLite to Postgres, has an equivalent log append function under the hood!
replay_log(): Rebuilding the State
void replay_log() {
FILE *log = fopen(LOG_FILE, "r");
if (!log) {
perror("No WAL found. Starting fresh.");
return;
}
FILE *state = fopen(STATE_FILE, "w");
if (!state) {
perror("Failed to open state file");
exit(1);
}
char line[MAX_LINE];
while (fgets(line, sizeof(line), log)) {
fprintf(state, "%s", line); // naive overwrite
}
fclose(log);
fclose(state);
}
Purpose:
- Reads each log line.
- Writes it to the
state.txtfile, not very smart yet, since duplicates aren’t removed.
Later, you could replace this with a hashmap to keep only the latest value per key!
get_value(): Querying the State File
void get_value(const char *key) {
FILE *state = fopen(STATE_FILE, "r");
if (!state) {
printf("No state available.\n");
return;
}
char line[MAX_LINE];
while (fgets(line, sizeof(line), state)) {
char fkey[64], fval[128];
sscanf(line, "%[^=]=%s", fkey, fval);
if (strcmp(fkey, key) == 0) {
printf("%s => %s\n", fkey, fval);
fclose(state);
return;
}
}
printf("Key '%s' not found.\n", key);
fclose(state);
}
Why it matters:
- Allows querying values from the rebuilt
state.txt. - Uses
sscanf()to parsekey=valuestrings. - Currently shows the first match, which works because we overwrite the whole state each time.
main(): Tying It All Together
int main() {
printf("Replaying WAL to reconstruct state...\n");
replay_log(); // crash recovery
char command[10];
char key[64], value[128];
while (1) {
printf("\nEnter command (put/get/exit): ");
scanf("%s", command);
if (strcmp(command, "put") == 0) {
printf("Key: ");
scanf("%s", key);
printf("Value: ");
scanf("%s", value);
append_log(key, value);
replay_log(); // naive checkpointing
} else if (strcmp(command, "get") == 0) {
printf("Key: ");
scanf("%s", key);
get_value(key);
} else if (strcmp(command, "exit") == 0) {
break;
} else {
printf("Unknown command\n");
}
}
return 0;
}
What's happening:
- On start, we simulate a crash recovery by calling
replay_log(). - Then we enter a REPL (read-eval-print loop) for
put,get, andexit.
What’s Next?
Here are some ways to evolve this system:
- Use an in-memory map while replaying so only the latest key-values are kept
- Add a
deletecommand with tombstones (likekey=__deleted__) - Add a
fsync()for real durability - Implement checkpointing after N writes to flush WAL
Thanks for reading! I hope this hands-on introduction to Write-Ahead Logging gave you a solid understanding of how databases handle durability and recovery behind the scenes. Stay tuned and happy coding :)