Writing a Redis Clone in Rust - Learning to vibe code with constraints
Over the past couple of weeks, I continued my journey to learn Rust by building a Redis clone from scratch. I signed up for CodeCrafters, who were offering their Build your own Redis course for free during the month of October. The goal of the course isn't to create a production-ready alternative to Redis, but to work on a challenging problem that builds a deeper understanding of your favourite programming language and gives you experience with the challenges of larger projects and codebases. Given the time constraints, I used this as an opportunity to build my own intuition and workflow with Claude Code and AI-assisted coding.
My Redis clone is built around a TCP server that handles multiple concurrent connections via a thread pool. The core components include:
- A RESP protocol parser
- A shared in-memory store with `Arc<Mutex<T>>` for thread-safe access
- Transaction support with MULTI/EXEC/DISCARD
- List and Stream operations
- Replication support with a leader-follower configuration
- Pub/Sub support for message broadcasting
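To make the protocol layer concrete, here is a minimal sketch of parsing a single RESP command, which arrives as an array of bulk strings. The function name and error handling are illustrative, not the project's actual parser API:

```rust
// A sketch of parsing a RESP array of bulk strings, e.g. the encoding of
// `ECHO hey`: "*2\r\n$4\r\nECHO\r\n$3\r\nhey\r\n".
// Illustrative only; the real parser handles more RESP types.
fn parse_resp_array(input: &str) -> Option<Vec<String>> {
    let mut lines = input.split("\r\n");
    // First line: "*<n>" gives the number of array elements
    let count: usize = lines.next()?.strip_prefix('*')?.parse().ok()?;
    let mut parts = Vec::with_capacity(count);
    for _ in 0..count {
        // Each element is a bulk string: "$<len>" followed by the payload
        let len: usize = lines.next()?.strip_prefix('$')?.parse().ok()?;
        let payload = lines.next()?;
        if payload.len() != len {
            return None; // malformed frame: declared length doesn't match payload
        }
        parts.push(payload.to_string());
    }
    Some(parts)
}

fn main() {
    let cmd = parse_resp_array("*2\r\n$4\r\nECHO\r\n$3\r\nhey\r\n").unwrap();
    assert_eq!(cmd, vec!["ECHO".to_string(), "hey".to_string()]);
}
```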
Concurrency and Thread Safety
Managing concurrent access to shared state is a difficult challenge in most programming languages. Rust's philosophy is to leverage ownership and type checking, which turns many concurrency errors into compile-time errors. This made concurrency particularly amenable to agentic coding, since it shortens the agent's validation loop. Two common patterns for handling concurrency are:
- Shared-state concurrency, where multiple threads have access to some piece of data
- Message-passing concurrency, where channels send messages between threads
I ended up using both patterns in my implementation to handle shared memory states for building replication support and implementing Pub/Sub.
Shared-State Concurrency
The first way of handling concurrent access most folks learn is via mutual exclusion. The core idea is to allow only one thread access to a piece of data at any given time. This is usually implemented via a lock or mutex, a data structure that keeps track of who currently has exclusive access to the data. Mutexes have a reputation for being difficult to use because you have to remember to acquire the lock before using the data and remember to release it when you're done.
```rust
let store = Arc::new(Mutex::new(Store::new()));

// Clone for each connection handler
let store_clone = Arc::clone(&store);
pool.execute(move || {
    handle_connection(stream, store_clone, ...);
});

...

// SET command acquires the lock before saving to the store
Command::Set(key, value, expiry_ms) => {
    let mut store = store.lock().unwrap();
    let expiry = expiry_ms.map(|ms| Instant::now() + Duration::from_millis(ms));
    store.set(key, value, expiry);
    "+OK\r\n".to_string()
}
```
Rust's ownership system makes mutexes much safer to use. The mutex owns the data it protects, and you can only access the data by calling lock(), which returns a MutexGuard. This guard implements Drop, so when it goes out of scope, the lock is automatically released! The compiler prevents you from accessing the data without holding the lock, eliminating entire classes of bugs.
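As a minimal standalone sketch of this pattern (not code from the project), the guard returned by `lock()` releases the mutex automatically as soon as it goes out of scope:

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Scope-based lock release: the MutexGuard returned by lock() frees the
// mutex when it is dropped, so there is no explicit unlock call.
fn parallel_increment(threads: usize) -> i32 {
    let counter = Arc::new(Mutex::new(0));
    let mut handles = Vec::new();
    for _ in 0..threads {
        let counter = Arc::clone(&counter);
        handles.push(thread::spawn(move || {
            // The lock is held only while this guard is alive
            let mut n = counter.lock().unwrap();
            *n += 1;
        })); // guard dropped at the end of the closure, lock released
    }
    for h in handles {
        h.join().unwrap();
    }
    let result = *counter.lock().unwrap();
    result
}

fn main() {
    assert_eq!(parallel_increment(4), 4);
}
```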
Message Passing & Pub/Sub
Another popular approach to implementing safe concurrency is message passing, where threads communicate by sending data to each other. Rust's standard library provides channels for message-passing concurrency through std::sync::mpsc (multi-producer, single-consumer). A channel has two halves: a transmitter and a receiver. One part of your code calls methods on the transmitter with the data you want to send, and another part checks the receiving end for arriving messages. A channel is closed when either the transmitter or receiver half is dropped. For our Redis Pub/Sub implementation, each subscriber connection gets its own receiver end of a channel, while the publisher holds sender ends that can be cloned and distributed:
```rust
struct PubSub {
    // Map of channel name to list of (connection_id, sender) pairs
    channels: HashMap<String, Vec<(usize, mpsc::Sender<(String, String)>)>>,
}
```

When a message is published, we send it through all the senders for that channel. Each connection in subscribe mode receives a dedicated channel receiver for push-based message delivery:
```rust
// Create a channel for receiving published messages if not already created
if subscription_tx.is_none() {
    let (tx, rx) = mpsc::channel::<(String, String)>();
    subscription_tx = Some(tx);
    subscription_rx = Some(rx);
}
let tx = subscription_tx.as_ref().unwrap();

// Subscribe to each channel
let mut pubsub_lock = pubsub.lock().unwrap();
for channel in &channels {
    if !subscribed_channels.contains(channel) {
        pubsub_lock.subscribe(channel.clone(), connection_id, tx.clone());
        subscribed_channels.insert(channel.clone());
    }
    // Send subscription confirmation for each channel
    // Format: *3\r\n$9\r\nsubscribe\r\n$<channel_len>\r\n<channel>\r\n:<count>\r\n
    let response = format!(
        "*3\r\n$9\r\nsubscribe\r\n${}\r\n{}\r\n:{}\r\n",
        channel.len(),
        channel,
        subscribed_channels.len()
    );
    stream.write_all(response.as_bytes()).ok();
}
drop(pubsub_lock);
Ok(())
```
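The publish side isn't shown above, so here is a hedged sketch of what it might look like under the same `PubSub` layout. The `publish` method name and its return value (the number of live subscribers) are assumptions, not the project's actual API:

```rust
use std::collections::HashMap;
use std::sync::mpsc;

struct PubSub {
    // Map of channel name to list of (connection_id, sender) pairs
    channels: HashMap<String, Vec<(usize, mpsc::Sender<(String, String)>)>>,
}

impl PubSub {
    // Illustrative publish: forward the message to every sender registered
    // for this channel, dropping any whose receiver end has been closed.
    fn publish(&mut self, channel: &str, message: &str) -> usize {
        let Some(subscribers) = self.channels.get_mut(channel) else {
            return 0; // nobody subscribed to this channel
        };
        subscribers.retain(|(_, tx)| {
            tx.send((channel.to_string(), message.to_string())).is_ok()
        });
        subscribers.len()
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let mut pubsub = PubSub { channels: HashMap::new() };
    pubsub.channels.insert("news".to_string(), vec![(1, tx)]);
    assert_eq!(pubsub.publish("news", "hello"), 1);
    assert_eq!(rx.recv().unwrap(), ("news".to_string(), "hello".to_string()));
}
```

Using `retain` to prune disconnected subscribers keeps the channel map from accumulating dead senders as connections come and go.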
The receiver on each connection polls for messages without blocking other operations:
```rust
// Check for published messages if subscribed
if let Some(ref rx) = subscription_rx {
    match rx.try_recv() {
        Ok((channel, message)) => {
            // Only forward messages for channels we're still subscribed to
            if subscribed_channels.contains(&channel) {
                // Send published message to subscriber
                // Format: *3\r\n$7\r\nmessage\r\n$<channel_len>\r\n<channel>\r\n$<msg_len>\r\n<message>\r\n
                let response = format!(
                    "*3\r\n$7\r\nmessage\r\n${}\r\n{}\r\n${}\r\n{}\r\n",
                    channel.len(),
                    channel,
                    message.len(),
                    message
                );
                if stream.write_all(response.as_bytes()).is_err() {
                    break;
                }
            }
            continue; // Check for more messages
        }
        Err(mpsc::TryRecvError::Empty) => {
            // No messages, continue to read commands
        }
        Err(mpsc::TryRecvError::Disconnected) => {
            // Channel closed
            break;
        }
    }
}
```

Task-Driven Coding with Agents
While working with Claude Code, I found certain coding practices translated to better performance and accuracy:
- Task-Driven Coding - Scope tasks and prompts to the smallest unit of work you can. Larger plans add more complexity and carry a higher risk of not producing what you expect
- Write Lots of Tests - Adding unit and integration tests lets your agents fully validate their changes and tightens the feedback loop. Rust's compiler helped even further, with most issues surfacing immediately as compile-time errors with actionable fixes
- Explicit is better than clever - Agents seem to prefer simplicity and readability over complicated abstractions. More verbose implementations, thorough documentation, and smaller single-responsibility functions all pay off
My workflow became: describe what you want → AI writes code → compiler validates → iterate only on logic errors. This tight feedback loop meant I could move quickly through implementation details while maintaining correctness.
Struggling with Leader-Follower Replication
One of the more interesting features CodeCrafters asks you to implement is Leader-Follower Replication. This feature allows replica Redis instances to be exact copies of a leader instance. At a high level each instance does the following:
Leader Instance:
- Accepts the `PSYNC` command from replicas
- Sends a `FULLRESYNC` response with an empty RDB file
- Propagates write commands to all connected replicas
- Implements the `WAIT` command to ensure replicas acknowledge writes

Replica Instance:
- Performs a three-way handshake (PING, REPLCONF, PSYNC)
- Receives and applies write commands from the leader
- Tracks its replication offset for acknowledgments
- Responds to `REPLCONF GETACK` queries
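The handshake commands themselves are plain RESP arrays of bulk strings. Here is a small sketch of the frames a replica might send; the `encode_command` helper and the port value are illustrative, not the project's code:

```rust
// RESP-encode a command as an array of bulk strings, e.g.
// ["PING"] -> "*1\r\n$4\r\nPING\r\n". Helper name is illustrative.
fn encode_command(parts: &[&str]) -> String {
    let mut out = format!("*{}\r\n", parts.len());
    for p in parts {
        out.push_str(&format!("${}\r\n{}\r\n", p.len(), p));
    }
    out
}

fn main() {
    // Step 1: PING verifies the leader is reachable
    let ping = encode_command(&["PING"]);
    // Step 2: REPLCONF announces the replica's listening port and capabilities
    let replconf_port = encode_command(&["REPLCONF", "listening-port", "6380"]);
    let replconf_capa = encode_command(&["REPLCONF", "capa", "psync2"]);
    // Step 3: PSYNC ? -1 requests a full resynchronization
    let psync = encode_command(&["PSYNC", "?", "-1"]);

    assert_eq!(ping, "*1\r\n$4\r\nPING\r\n");
    print!("{}{}{}{}", ping, replconf_port, replconf_capa, psync);
}
```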
The replication offset tracking was particularly nuanced. Replicas need to track how many bytes they've processed and respond with their offset when the leader instance requests acknowledgment.
When implementing this feature, I found that Claude struggled compared to the simpler aspects of the Redis specification. In particular, I kept hitting snags with:
- Replication Offset Tracking - Replicas must count the RESP protocol bytes of each command. The AI incorrectly assumed you count bytes per batch rather than per command
- Duplicate Stream Reading - After PSYNC, the replica connection serves dual purposes: it's both a command receiver and a replication stream. The AI struggled to reason about when to read from the connection versus when to stay passive
- WAIT Command Stream Handling - Implementing WAIT and write acknowledgement was non-trivial. You had to clone the stream once per WAIT operation instead of multiple times, force blocking mode before reading, and handle retries on WouldBlock errors
What usually broke Claude and me out of the loop was similar to working with any other teammate on a complicated programming task or bug:
- Extensive debug output - Inserting print statements for every read/write with byte counts made concurrency issues triageable
- More tests - Each failing test case became a specific, narrow problem the AI could validate against
- Lead your AI to a potential solution - Claude seemed to be struggling with offset tracking, until I realized it was simply implementing the specification incorrectly. Adding a `parse_commands_with_sizes()` function helped it break out of that loop immediately
Once this foundation existed, the rest fell into place because the AI could reason about offset tracking locally within each function rather than across the whole system.
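For illustration, here is a sketch of the idea behind such a function: splitting a buffer of RESP frames into (command, byte-size) pairs so the offset can be advanced per command rather than per batch. The real implementation differs; this just shows the shape of the fix:

```rust
// Parse a buffer of RESP command frames into (command, bytes_consumed)
// pairs, so a replica can advance its replication offset per command.
// Sketch only: assumes well-formed arrays of bulk strings.
fn parse_commands_with_sizes(buf: &str) -> Vec<(Vec<String>, usize)> {
    let mut out = Vec::new();
    let mut pos = 0;
    while pos < buf.len() {
        let rest = &buf[pos..];
        if !rest.starts_with('*') {
            break; // not a command frame
        }
        // "*<n>\r\n" — number of bulk strings in this command
        let Some(header_end) = rest.find("\r\n") else { break };
        let Ok(n) = rest[1..header_end].parse::<usize>() else { break };
        let mut cursor = header_end + 2;
        let mut parts = Vec::with_capacity(n);
        for _ in 0..n {
            // "$<len>\r\n<payload>\r\n"
            let Some(len_end) = rest[cursor..].find("\r\n") else { return out };
            let Ok(len) = rest[cursor + 1..cursor + len_end].parse::<usize>() else { return out };
            let start = cursor + len_end + 2;
            parts.push(rest[start..start + len].to_string());
            cursor = start + len + 2;
        }
        out.push((parts, cursor)); // cursor == bytes consumed by this command
        pos += cursor;
    }
    out
}

fn main() {
    let buf = "*1\r\n$4\r\nPING\r\n*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n";
    let cmds = parse_commands_with_sizes(buf);
    assert_eq!(cmds.len(), 2);
    assert_eq!(cmds[0].1, 14); // "*1\r\n$4\r\nPING\r\n" is 14 bytes
    assert_eq!(cmds[1].0[0], "SET");
}
```

Returning the byte size alongside each parsed command lets the offset logic live next to the parse, which is exactly what made it tractable for the AI to reason about locally.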
Conclusion
Building this Redis clone was an incredible learning experience. It deepened my understanding of concurrent programming in Rust, protocol design, and trade-offs in distributed systems. It also helped me develop an intuition for agentic coding. For a large coding project to be successful, it's important to think deeply about your interfaces and abstractions, your codebase's testability and feedback loop, and how explicit and readable your code and documentation are. These are good software engineering practices regardless of whether you're working with AI, but with AI-assisted development they pay off immediately because your feedback loop is so much tighter. Bad structure produces confused AI output quickly; good structure lets you move fast while maintaining quality.
My project's full source code is available on GitHub. If you're interested in database internals or Rust systems programming, I highly recommend building something similar!
Have questions or suggestions? Feel free to open an issue on the GitHub repo or reach out!