Most transactional email platforms are built on Node.js, Python, or Go. We built euromail on Rust. Not because it's trendy, but because the specific problems in email infrastructure map almost perfectly to Rust's strengths: long-lived concurrent connections, cryptographic signing on every message, and zero tolerance for resource leaks under sustained load.
This post walks through the real engineering reasons behind the choice, with concrete examples from our codebase.
The Problem with Email at Scale
Transactional email sounds simple: accept an API call, build a MIME message, connect to an MX server, deliver it. But at scale, the system is doing dozens of things simultaneously:
- Maintaining SMTP connections to hundreds of recipient domains
- DKIM-signing every message with per-customer RSA keys
- Enforcing per-domain rate limits and IP warmup schedules
- Processing bounces, complaints, and DMARC reports in real time
- Running circuit breakers for domains that are temporarily rejecting mail
Each of these operations involves shared mutable state, concurrent access, and resources that must be cleaned up correctly. A leaked connection or a missed semaphore release under load doesn't just slow things down. It cascades.
Ownership Makes Resource Leaks Impossible
The single biggest win from Rust is deterministic resource cleanup. In our SMTP connection pool, each domain gets a semaphore controlling how many concurrent connections we open:
pub struct ConnectionGuard {
    domain: String,
    semaphores: Arc<DashMap<String, Arc<Semaphore>>>,
}

impl Drop for ConnectionGuard {
    fn drop(&mut self) {
        if let Some(sem) = self.semaphores.get(&self.domain) {
            sem.add_permits(1);
        }
    }
}
When a ConnectionGuard goes out of scope, whether the delivery succeeds, fails, panics and unwinds, or the task gets cancelled, the semaphore permit is released. There is no try-finally to forget. No defer to put in the wrong place. The release is tied to the value's lifetime, and the compiler inserts the drop call on every exit path.
In Node.js, a forgotten finally block under a rare error path means a domain's connection slot leaks permanently. We've seen this pattern cause cascading delivery failures in other systems. In Rust, an error path cannot skip the release; you would have to leak the guard deliberately, for example with std::mem::forget.
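To make the exit-path claim concrete, here is a minimal std-only sketch of the same guard pattern. It is illustrative rather than euromail's production code: `Slots`, `SlotGuard`, and `deliver` are hypothetical names, and an atomic counter stands in for the Tokio semaphore.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a per-domain semaphore: a count of free slots.
pub struct Slots {
    free: AtomicUsize,
}

impl Slots {
    pub fn new(n: usize) -> Arc<Self> {
        Arc::new(Slots { free: AtomicUsize::new(n) })
    }

    pub fn free(&self) -> usize {
        self.free.load(Ordering::SeqCst)
    }

    /// Takes one slot if any is free; the returned guard gives it back on drop.
    pub fn try_acquire(self: &Arc<Self>) -> Option<SlotGuard> {
        self.free
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |n| n.checked_sub(1))
            .ok()
            .map(|_| SlotGuard { slots: Arc::clone(self) })
    }
}

pub struct SlotGuard {
    slots: Arc<Slots>,
}

impl Drop for SlotGuard {
    fn drop(&mut self) {
        // Runs on normal return, early return via `?`, and panic unwinding.
        self.slots.free.fetch_add(1, Ordering::SeqCst);
    }
}

/// Simulated delivery that holds a slot and leaves through one of three paths.
pub fn deliver(slots: &Arc<Slots>, outcome: &str) -> Result<(), String> {
    let _guard = slots.try_acquire().ok_or("no free slot")?;
    match outcome {
        "ok" => Ok(()),
        "panic" => panic!("simulated bug mid-delivery"),
        other => Err(format!("delivery failed: {other}")),
    }
}
```

Calling `deliver` with any of the three outcomes leaves `slots.free()` where it started. The panic case relies on unwinding, so it does not hold in a binary built with `panic = "abort"`.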
Async Without the Garbage Collector Tax
Our worker process runs 18+ concurrent tasks on the Tokio async runtime: email consumers, bounce processors, webhook dispatchers, retry schedulers, reputation analyzers, and more. Each task is a lightweight future, not a thread.
let email_consumer = {
    let w = Arc::clone(&worker);
    tokio::spawn(async move { w.run_email_consumer().await })
};
The critical difference from Go or Node.js: there is no garbage collector. Rust allocates and frees memory deterministically. Under sustained load of thousands of emails per second, our memory usage stays flat. There are no GC pauses causing delivery latency spikes.
Benchmarks back this up. In comparable server workloads, Rust uses roughly 15-16 MB of memory where Node.js uses 64 MB. More importantly, Rust's latency profile has no tail spikes from garbage collection, a real problem when you're holding open SMTP connections that have timeout windows.
Type-Safe State Machines Prevent Logic Bugs
Email delivery has complex state: queued, processing, sent, delivered, bounced, failed. SMTP domains cycle through circuit breaker states: closed, open, half-open. Bounce types are hard, soft, or undetermined. Getting any of these wrong means lost mail or duplicate deliveries.
Rust's enum system makes invalid states unrepresentable:
pub enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}
Pattern matching on these enums is exhaustive. If we add a new state, the compiler flags every match that doesn't handle it, provided the match has no catch-all arm. In Python or JavaScript, a new status string silently falls through to a default case, and you find out in production.
Our email status, bounce type, plan tier, and operation status types are all enums. The compiler catches category errors that unit tests would miss.
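As an illustration of how exhaustive matching plays out, the sketch below models circuit breaker transitions. The `Event` enum and the transition rules are assumptions made for the example (a real breaker would typically open after a failure threshold, not a single failure); only `CircuitState` comes from the text above.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}

/// Hypothetical events driving the breaker; not taken from the original code.
#[derive(Debug, Clone, Copy)]
pub enum Event {
    Success,
    Failure,
    CooldownElapsed,
}

/// Every (state, event) pair is listed explicitly, with no `_` arm,
/// so adding a variant to either enum turns this match into a compile error.
pub fn next_state(state: CircuitState, event: Event) -> CircuitState {
    use CircuitState::*;
    match (state, event) {
        (Closed, Event::Failure) => Open,
        (Closed, Event::Success | Event::CooldownElapsed) => Closed,
        (Open, Event::CooldownElapsed) => HalfOpen,
        (Open, Event::Success | Event::Failure) => Open,
        (HalfOpen, Event::Success) => Closed,
        (HalfOpen, Event::Failure) => Open,
        (HalfOpen, Event::CooldownElapsed) => HalfOpen,
    }
}

/// Only an open circuit blocks delivery; half-open lets a probe through.
pub fn allows_delivery(state: CircuitState) -> bool {
    match state {
        CircuitState::Closed | CircuitState::HalfOpen => true,
        CircuitState::Open => false,
    }
}
```

Adding, say, a `ForcedOpen` variant would make both functions fail to compile until each one decides what to do with it.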
Cryptographic Safety by Default
Every outgoing email gets DKIM-signed with the customer's RSA key, and optionally ARC-sealed. That's at least one RSA-2048 signing operation on every single message. The signing pipeline:
pub fn sign_message(
    message: &[u8],
    domain: &str,
    selector: &str,
    private_key_pem: &str,
) -> Result<Vec<u8>, SmtpError>
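The function borrows the key for the duration of the call and nothing more. The borrow checker guarantees that sign_message cannot stash the reference anywhere that outlives the caller's copy. Borrowing alone does not stop a &str from being logged or serialized, though; that protection comes from wrapping key material in a type that does not implement Display or Serialize and whose Debug output is redacted.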
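One common way to keep key material out of logs is a newtype whose Debug output is redacted. The sketch below is an assumption about how that could look, not a description of euromail's code; `PrivateKeyPem` and `SigningConfig` are hypothetical names.

```rust
use std::fmt;

/// Hypothetical wrapper for PEM key material.
pub struct PrivateKeyPem(String);

impl PrivateKeyPem {
    pub fn new(pem: String) -> Self {
        PrivateKeyPem(pem)
    }

    /// The only way to read the key, so call sites are easy to audit.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

// Debug prints a placeholder. Display is deliberately not implemented,
// so formatting the key with `{}` fails to compile.
impl fmt::Debug for PrivateKeyPem {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("PrivateKeyPem([REDACTED])")
    }
}

/// Deriving Debug on a containing struct stays safe, because the
/// derive delegates to the redacted impl above.
#[derive(Debug)]
pub struct SigningConfig {
    pub domain: String,
    pub selector: String,
    pub key: PrivateKeyPem,
}
```

With this in place, a stray `{:?}` on a `SigningConfig` prints the domain and selector but only a placeholder for the key. Crates such as secrecy package the same idea and add zeroing on drop.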
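We use rustls for TLS instead of OpenSSL. That means the TLS protocol logic, including the handshake for SMTP STARTTLS, is written in safe Rust rather than C. The cryptographic primitives underneath still come from rustls's crypto provider (ring or aws-lc-rs), which contains C and assembly, but the parsing and state-machine code, where buffer overflows and mismanaged SSL contexts have historically caused memory corruption, is memory-safe by construction.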
Lock-Free Concurrency for Rate Limiting
Email providers enforce per-domain rate limits, per-IP warmup schedules, and per-account quotas. These all require concurrent counters that thousands of tasks read and write simultaneously.
We use DashMap (a sharded concurrent hashmap that locks per shard rather than globally) and atomic operations instead of a single global mutex:
pub struct DomainThrottle {
    limiters: Arc<DashMap<String, Arc<Limiter>>>,
    rate_per_second: NonZeroU32,
}
For IP warmup, hourly send counters use AtomicU64 with compare-and-swap:
match counter.value().compare_exchange(
    current,
    current + 1,
    Ordering::AcqRel,
    Ordering::Relaxed,
) { ... }
No global lock. Contention is limited to a single shard or a single counter. No poisoned mutex if a thread panics. These patterns are common in Rust's ecosystem, and they are harder to get right in languages where the compiler does not track which data is shared across threads.
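The compare-and-swap call above is usually wrapped in a retry loop so that the counter never passes its limit. A minimal sketch, assuming a plain `AtomicU64` per IP and a fixed `limit`; `try_claim_send` and `count_claims` are hypothetical names:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Increments `counter` only while it is below `limit`.
/// Returns true if this caller claimed a send slot.
pub fn try_claim_send(counter: &AtomicU64, limit: u64) -> bool {
    let mut current = counter.load(Ordering::Acquire);
    loop {
        if current >= limit {
            return false;
        }
        match counter.compare_exchange(
            current,
            current + 1,
            Ordering::AcqRel,
            Ordering::Relaxed,
        ) {
            Ok(_) => return true,
            // Another thread won the race; retry with the value it wrote.
            Err(actual) => current = actual,
        }
    }
}

/// Demo helper: spawns `threads` workers that each attempt `attempts_each`
/// claims against one shared counter, and returns how many claims succeeded.
pub fn count_claims(threads: usize, attempts_each: usize, limit: u64) -> usize {
    let counter = AtomicU64::new(0);
    std::thread::scope(|s| {
        let handles: Vec<_> = (0..threads)
            .map(|_| {
                s.spawn(|| {
                    (0..attempts_each)
                        .filter(|_| try_claim_send(&counter, limit))
                        .count()
                })
            })
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum::<usize>()
    })
}
```

A plain `fetch_add` would be simpler but can overshoot the limit under contention; the loop checks the limit and increments in one atomic step, so the number of successful claims equals the limit exactly.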
SSRF Protection at the DNS Layer
Webhook delivery is a common feature in email platforms: notify the customer's server when an email bounces or gets complained about. But webhooks are also a vector for SSRF attacks, where an attacker configures a webhook URL that resolves to an internal IP.
Most implementations validate the IP, then connect. That leaves a TOCTOU (time-of-check to time-of-use) gap where DNS can rebind between validation and connection.
We solve this by implementing reqwest's Resolve trait, so the client only ever connects to addresses that have already passed validation:
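use std::net::SocketAddr;
use reqwest::dns::{Addrs, Name, Resolve, Resolving};

impl Resolve for SsrfSafeResolver {
    fn resolve(&self, name: Name) -> Resolving {
        Box::pin(async move {
            // lookup_host requires a port; 0 is used as a placeholder here.
            let addrs: Vec<SocketAddr> =
                tokio::net::lookup_host((name.as_str(), 0)).await?.collect();
            // Block private/reserved IPs before returning
            if addrs.iter().any(|a| is_private_or_reserved(a.ip())) {
                return Err(/* SSRF blocked */);
            }
            // Hand the client only the addresses that passed the check.
            let addrs: Addrs = Box::new(addrs.into_iter());
            Ok(addrs)
        })
    }
}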
DNS rebinding attacks cannot work because resolution and validation happen in the same step. The HTTP client never sees the private IP.
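The resolver depends on an `is_private_or_reserved` check that is not shown here. The following is a std-only sketch of what such a check might cover, not the actual implementation; the helper names `is_blocked_v4` and `is_blocked_v6` are hypothetical, and the list of ranges is a reasonable starting point rather than an exhaustive one.

```rust
use std::net::{IpAddr, Ipv4Addr, Ipv6Addr};

fn is_blocked_v4(ip: Ipv4Addr) -> bool {
    let o = ip.octets();
    ip.is_private()            // 10/8, 172.16/12, 192.168/16
        || ip.is_loopback()    // 127/8
        || ip.is_link_local()  // 169.254/16, which includes cloud metadata endpoints
        || ip.is_unspecified() // 0.0.0.0
        || ip.is_broadcast()   // 255.255.255.255
        || ip.is_multicast()   // 224/4
        || ip.is_documentation()
        || o[0] == 0                            // rest of 0.0.0.0/8
        || (o[0] == 100 && (o[1] & 0xc0) == 64) // 100.64/10 carrier-grade NAT
}

fn is_blocked_v6(ip: Ipv6Addr) -> bool {
    // ::ffff:a.b.c.d must be judged by its embedded IPv4 address.
    if let Some(v4) = ip.to_ipv4_mapped() {
        return is_blocked_v4(v4);
    }
    let first = ip.segments()[0];
    ip.is_loopback()
        || ip.is_unspecified()
        || ip.is_multicast()
        || (first & 0xfe00) == 0xfc00 // fc00::/7 unique local
        || (first & 0xffc0) == 0xfe80 // fe80::/10 link-local
}

pub fn is_private_or_reserved(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => is_blocked_v4(v4),
        IpAddr::V6(v6) => is_blocked_v6(v6),
    }
}
```

The IPv4-mapped branch matters: without it, `::ffff:192.168.1.1` would pass an IPv6-only check and still reach an internal host.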
Structured Error Handling Across the Stack
Our error types form a layered hierarchy using thiserror:
#[derive(Debug, thiserror::Error)]
pub enum SmtpError {
    #[error("DNS resolution failed for {domain}: {source}")]
    DnsError { domain: String, source: Box<dyn Error + Send + Sync> },

    #[error("Circuit open for domain {domain}")]
    CircuitOpen { domain: String },

    #[error("Warm-up limit reached for IP {ip} (limit: {limit})")]
    WarmupLimitReached { ip: String, limit: u64 },
}
Each error variant carries structured context. When a delivery fails, we know exactly why: DNS failure, circuit breaker tripped, warmup limit hit, permanent SMTP rejection. Metrics can tag by error type. Retry logic can branch on the variant. Alert thresholds differ per category.
In Node.js or Python, this is typically a string message or a generic error code. In Rust, the type system guarantees you handle every case.
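To show what branching on the variant can look like, here is a std-only sketch. It implements Display by hand instead of using thiserror so that it compiles without dependencies. `DeliveryError`, `RetryDecision`, the `PermanentRejection` variant, and the retry policy itself are all assumptions made for the example.

```rust
use std::fmt;

/// Simplified, hypothetical mirror of the error enum above.
#[derive(Debug)]
pub enum DeliveryError {
    Dns { domain: String },
    CircuitOpen { domain: String },
    WarmupLimitReached { ip: String, limit: u64 },
    PermanentRejection { code: u16 },
}

impl fmt::Display for DeliveryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            DeliveryError::Dns { domain } => write!(f, "DNS resolution failed for {domain}"),
            DeliveryError::CircuitOpen { domain } => write!(f, "Circuit open for domain {domain}"),
            DeliveryError::WarmupLimitReached { ip, limit } => {
                write!(f, "Warm-up limit reached for IP {ip} (limit: {limit})")
            }
            DeliveryError::PermanentRejection { code } => {
                write!(f, "Permanent SMTP rejection ({code})")
            }
        }
    }
}

impl std::error::Error for DeliveryError {}

#[derive(Debug, PartialEq, Eq)]
pub enum RetryDecision {
    RetryLater,
    GiveUp,
}

/// Assumed policy: transient conditions are retried, permanent ones are not.
pub fn retry_decision(err: &DeliveryError) -> RetryDecision {
    match err {
        DeliveryError::Dns { .. }
        | DeliveryError::CircuitOpen { .. }
        | DeliveryError::WarmupLimitReached { .. } => RetryDecision::RetryLater,
        DeliveryError::PermanentRejection { .. } => RetryDecision::GiveUp,
    }
}

/// Stable label for tagging metrics by error category.
pub fn metric_label(err: &DeliveryError) -> &'static str {
    match err {
        DeliveryError::Dns { .. } => "dns",
        DeliveryError::CircuitOpen { .. } => "circuit_open",
        DeliveryError::WarmupLimitReached { .. } => "warmup_limit",
        DeliveryError::PermanentRejection { .. } => "permanent_rejection",
    }
}
```

Because neither match has a catch-all arm, a new error variant forces an explicit decision about both its retry behavior and its metric label.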
The Workspace: Six Crates, One Build
Our codebase is a Cargo workspace with six crates:
| Crate | Responsibility |
|---|---|
| euromail-api | HTTP API server (Axum) |
| euromail-worker | Background job consumer, retry logic |
| euromail-smtp | SMTP delivery, DKIM, ARC, circuit breakers |
| euromail-common | Shared types, auth, database, queue |
| euromail-dashboard | Admin dashboard (server-rendered HTML) |
| euromail-tests | Integration test suite |
All crates share workspace-level dependency versions, preventing version drift. A single cargo build --release compiles all three production binaries (API, worker, dashboard) with shared dependency compilation. Changes to the common crate propagate type-checking across the entire system instantly.
The release profile uses thin LTO and symbol stripping:
[profile.release]
lto = "thin"
strip = true
The API binary compiles to roughly 30 MB stripped. The worker with the full SMTP engine is about 40 MB. These run on debian:bookworm-slim Docker images, keeping the total image size small enough for fast container scheduling on Kubernetes.
Graceful Shutdown Without Lost Messages
Email systems cannot afford to lose messages during deploys. Our shutdown sequence uses tokio_util::CancellationToken to coordinate all 18+ worker tasks:
let messages = tokio::select! {
    result = queue::read_messages(&self.redis, stream, group, &self.consumer_name, 10)
        => result,
    _ = self.shutdown.cancelled() => break,
};
When SIGTERM arrives, the token cancels. Every consumer loop breaks at its next select! point. In-flight message processing runs to completion. Then telemetry flushes and the process exits cleanly.
Compare this to Node.js, where process.on('SIGTERM') gives you a callback but no structured way to drain async work. Or Python, where signal handling in async contexts is notoriously fragile. Tokio's select! macro makes the shutdown path as explicit and testable as the happy path.
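The drain behavior can be illustrated without an async runtime. The sketch below is a synchronous std-only analogue, not the Tokio code: an `AtomicBool` stands in for the CancellationToken, an mpsc channel stands in for the Redis stream, and `run_consumer` is a hypothetical name. The point it demonstrates is the same: shutdown is observed only between messages, so a message that has been picked up always finishes.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Consumes messages until shutdown is requested or the queue closes.
/// Returns how many messages were fully processed.
pub fn run_consumer<F>(queue: &Receiver<u32>, shutdown: &AtomicBool, mut handle: F) -> usize
where
    F: FnMut(u32),
{
    let mut processed = 0;
    loop {
        // Shutdown is checked between messages, never in the middle of one.
        if shutdown.load(Ordering::Acquire) {
            break;
        }
        match queue.recv_timeout(Duration::from_millis(50)) {
            Ok(msg) => {
                handle(msg); // in-flight work always runs to completion
                processed += 1;
            }
            // Nothing arrived; loop around and re-check the shutdown flag.
            Err(RecvTimeoutError::Timeout) => continue,
            Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    processed
}
```

If shutdown is signalled while the second of three queued messages is being handled, the consumer finishes that message, stops, and leaves the third in the queue for the next process to read. One difference from the async version: select! wakes up as soon as the token is cancelled, while this loop notices only after the current receive times out.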
What We'd Choose Differently
Rust isn't without trade-offs.
Compile times are real. A clean release build takes about 5 minutes with sccache. Incremental builds are fast (under 30 seconds), but CI cold builds require caching strategies to stay reasonable.
The learning curve is steep. Lifetimes, trait bounds, and async pinning take time to internalize. We wouldn't recommend Rust for a prototype or a CRUD API. We'd reach for it again for anything that holds open thousands of concurrent connections and signs cryptographic material on every request.
The ecosystem is younger. Some email-specific crates are less mature than their Go or Python equivalents. We've contributed patches upstream and occasionally vendor-forked.
The Bottom Line
We chose Rust because email infrastructure is fundamentally about managing concurrent I/O, shared state, and resource lifetimes correctly under sustained load. These are exactly the problems that Rust's ownership model, async ecosystem, and type system are well suited to.
After two years of development, the choice has paid off in ways we expected (no memory leaks, no GC pauses) and ways we didn't (the type system catches logic bugs that would be integration test failures in other languages). Our worker process handles thousands of concurrent SMTP connections with flat memory usage and predictable latency, shipped as a roughly 40 MB binary.
If you're building infrastructure that's long-lived, concurrent, and security-critical, Rust is worth the investment.