experimental · rust · edge ai

poolboy

Ship model updates to 10,000 edge hosts without melting anything.

~/gw-lab/experimental/poolboy
$
encoded 20.4 GB → 312 MB patch (1.5%)
peak RAM: 98 MB · BLAKE3: 7f3e…a9c1
~100 MB memory ceiling
20 GB target file size
~150 LoC of net code
0 external crypto deps
01 · the problem

The memory wall.

Classical binary diff tools build a suffix array over the entire base file. Memory use scales at roughly 5× the file size.

A 20 GB fine-tuned transformer. An 8 GB Jetson Orin Nano. Do the math: 100 GB of RAM to apply a patch that moves a few million weights.

The field solution today is to ship the whole model, every time. That's gigabytes of cellular uplink per robot, per update. We can do better.

[Chart: peak RAM (0 to 100 GB) against file size (0 to 20 GB). bsdiff climbs past the 16 GB Jetson RAM line and crashes there (OOM); poolboy stays flat at 100 MB.]
02 · how it works

A streaming pipeline, capped at 100 MB.

Data flows through bounded buffers. Nothing in the hot path allocates unbounded structures.

[Pipeline diagram]
base.bin (20 GB on disk) → FastCDC chunker (content-defined · 1 MiB avg) → BLAKE3 identity (dedupe via digest) → COPY / LITERAL instruction stream → update.pool (312 MB · streaming-applyable)
buffers: 32 MB BufReader ▸ 64 KiB scratch ▸ 32 MB BufWriter
peak resident memory never exceeds ~100 MB regardless of file size
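The dedupe step in the pipeline can be sketched roughly as follows. This is not poolboy's encoder: it assumes both files are already chunked and held in memory (the real pipeline streams), `Op` and `plan_patch` are hypothetical names, and `DefaultHasher` stands in for BLAKE3 so the sketch needs no external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// One patch instruction, mirroring the COPY / LITERAL ops in the pipeline.
#[derive(Debug, PartialEq)]
pub enum Op {
    /// Reuse `len` bytes starting at `offset` in the base file.
    Copy { offset: u64, len: u64 },
    /// Embed bytes that do not exist in the base file.
    Literal(Vec<u8>),
}

/// Stand-in digest. ASSUMPTION: the real tool uses BLAKE3 here.
fn digest(chunk: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    chunk.hash(&mut h);
    h.finish()
}

/// Plan a patch from pre-chunked inputs. Chunks are (offset, len) pairs.
pub fn plan_patch(
    base: &[u8],
    base_chunks: &[(usize, usize)],
    target: &[u8],
    target_chunks: &[(usize, usize)],
) -> Vec<Op> {
    // Index every base chunk by its digest.
    let mut index: HashMap<u64, (usize, usize)> = HashMap::new();
    for &(off, len) in base_chunks {
        index.entry(digest(&base[off..off + len])).or_insert((off, len));
    }

    let mut ops = Vec::new();
    for &(off, len) in target_chunks {
        let chunk = &target[off..off + len];
        match index.get(&digest(chunk)) {
            // The byte comparison guards against collisions in the weak stand-in hash.
            Some(&(b_off, b_len)) if &base[b_off..b_off + b_len] == chunk => {
                ops.push(Op::Copy { offset: b_off as u64, len: b_len as u64 });
            }
            _ => ops.push(Op::Literal(chunk.to_vec())),
        }
    }
    ops
}
```

Given a base of `aaaabbbbccccdddd` and a target of `aaaaXXXXccccdddd`, both split into 4-byte chunks, the plan is three COPY ops and one 4-byte LITERAL.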
03 · content-defined chunking

One inserted byte shouldn't invalidate the whole file.

FastCDC picks chunk boundaries based on rolling-hash fingerprints, not fixed offsets. A perturbation near the front of the file disturbs only nearby chunks. Downstream chunks keep the same digests and are emitted as COPY instructions of 17 bytes each (opcode, offset, length).

fixed-block (bsdiff-style)
c₁ c₂ c₃ c₄ c₅ c₆ c₇ c₈
+1 byte ↓
c₁′ c₂′ c₃′ c₄′ c₅′ c₆′ c₇′ c₈′
all chunks drift · 100% re-upload
content-defined (poolboy)
c₁ c₂ c₃ c₄ c₅ c₆ c₇ c₈
+1 byte ↓
c₁′ c₂ c₃ c₄ c₅ c₆ c₇ c₈
1 chunk changed · 7× COPY · 99% reused
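The resync behaviour can be reproduced with a toy gear-hash chunker. This is a simplified stand-in, not FastCDC itself: it has no min/max chunk bounds and no normalized chunking, the gear table comes from an arbitrary fixed seed, and `chunk` is a hypothetical name. Because the hash shifts left one bit per byte, a byte stops influencing it after 64 steps, so boundaries more than 64 bytes past an edit land on the same content as before.

```rust
/// SplitMix64: a small deterministic PRNG used to fill the gear table.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// One pseudo-random 64-bit value per possible byte value.
fn gear_table() -> [u64; 256] {
    let mut seed: u64 = 0x5EED; // arbitrary fixed seed (assumption)
    let mut table = [0u64; 256];
    for slot in table.iter_mut() {
        *slot = splitmix64(&mut seed);
    }
    table
}

/// Cut `data` after every byte where the top `bits` bits of the rolling
/// gear hash are zero. `bits` must be in 1..=63; the average chunk length
/// is about 2^bits bytes on random-looking input.
pub fn chunk(data: &[u8], bits: u32) -> Vec<&[u8]> {
    let gear = gear_table();
    let mut chunks = Vec::new();
    let mut hash = 0u64;
    let mut start = 0;
    for (i, &byte) in data.iter().enumerate() {
        // Each shift pushes older bytes toward the top and eventually out.
        hash = (hash << 1).wrapping_add(gear[byte as usize]);
        if hash >> (64 - bits) == 0 {
            chunks.push(&data[start..=i]);
            start = i + 1;
        }
    }
    // Whatever follows the last boundary forms the final chunk.
    if start < data.len() {
        chunks.push(&data[start..]);
    }
    chunks
}
```

Chunking a buffer, prepending one byte, and chunking again should leave all but the first chunk or two byte-identical, which is what lets the encoder emit COPY for them.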
04 · wire format

54-byte header. Tiny state machine.

The applier never comprehends the file as a whole — it just loops over COPY and LITERAL ops, streaming the reconstructed bytes through a BLAKE3 hasher. At EOF, if the digest doesn't match the header, the applier refuses the output.

field           type    example value
magic           4 B     "POOL"
version         u16     0x0001
base_size       u64     21_474_836_480
target_size     u64     21_481_127_168
target_blake3   32 B    7f3e…a9c1
header · 54 bytes total

opcode   name      operands              purpose
0x01     COPY      offset:u64 len:u64    reuse bytes from base
0x02     LITERAL   len:u64 bytes[len]    embed new bytes
instruction body · repeat until target_size reached
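A minimal applier for this format might look like the sketch below. Assumptions, none confirmed by the source: integers are little-endian, `Header`, `read_header` and `apply` are hypothetical names, and the BLAKE3 check is left as a comment because the sketch sticks to the standard library.

```rust
use std::io::{self, Read, Seek, SeekFrom, Write};

/// Parsed 54-byte header. ASSUMPTION: integers are little-endian.
#[derive(Debug)]
pub struct Header {
    pub version: u16,
    pub base_size: u64,
    pub target_size: u64,
    pub target_blake3: [u8; 32],
}

fn bad(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

fn read_u64<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut b = [0u8; 8];
    r.read_exact(&mut b)?;
    Ok(u64::from_le_bytes(b))
}

pub fn read_header<R: Read>(patch: &mut R) -> io::Result<Header> {
    let mut magic = [0u8; 4];
    patch.read_exact(&mut magic)?;
    if &magic != b"POOL" {
        return Err(bad("bad magic"));
    }
    let mut version = [0u8; 2];
    patch.read_exact(&mut version)?;
    let base_size = read_u64(patch)?;
    let target_size = read_u64(patch)?;
    let mut target_blake3 = [0u8; 32];
    patch.read_exact(&mut target_blake3)?;
    Ok(Header { version: u16::from_le_bytes(version), base_size, target_size, target_blake3 })
}

/// Rebuild the target by looping over COPY / LITERAL ops with one fixed buffer.
pub fn apply<B: Read + Seek, P: Read, W: Write>(
    base: &mut B,
    patch: &mut P,
    out: &mut W,
) -> io::Result<Header> {
    let header = read_header(patch)?;
    let mut scratch = vec![0u8; 64 * 1024]; // reused for every op
    let mut written = 0u64;
    while written < header.target_size {
        let mut op = [0u8; 1];
        patch.read_exact(&mut op)?;
        let len = match op[0] {
            0x01 => {
                let offset = read_u64(patch)?;
                base.seek(SeekFrom::Start(offset))?;
                read_u64(patch)?
            }
            0x02 => read_u64(patch)?,
            _ => return Err(bad("unknown opcode")),
        };
        if len > header.target_size - written {
            return Err(bad("op overruns target_size"));
        }
        let mut remaining = len;
        while remaining > 0 {
            let take = remaining.min(scratch.len() as u64) as usize;
            if op[0] == 0x01 {
                base.read_exact(&mut scratch[..take])?; // COPY: bytes come from base
            } else {
                patch.read_exact(&mut scratch[..take])?; // LITERAL: bytes come from patch
            }
            // A real applier would also feed these bytes to a BLAKE3 hasher here
            // and compare the result with header.target_blake3 at the end.
            out.write_all(&scratch[..take])?;
            remaining -= take as u64;
        }
        written += len;
    }
    out.flush()?;
    Ok(header)
}
```

Memory use is the scratch buffer plus whatever buffering the caller wraps around `base`, `patch` and `out`, independent of file size.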
05 · distribution

Leaderless election. P2P cascade. Tailscale-native.

The control plane unicasts an Announce. Agents race to Bid. First bid wins via atomic CAS. The winner downloads once, then re-seeds every other agent. The control plane uplink stays cool.
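"First bid wins via atomic CAS" can be sketched with a single compare-exchange. This is an illustration only: `Election` is a hypothetical type, and node ids are modelled as non-zero integers rather than the strings used on the wire.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Sentinel meaning "no leader yet". ASSUMPTION: real node ids are never 0.
const NO_LEADER: u32 = 0;

pub struct Election {
    leader: AtomicU32,
}

impl Election {
    pub fn new() -> Self {
        Election { leader: AtomicU32::new(NO_LEADER) }
    }

    /// Record a bid. Returns true only for the first bidder; later bids
    /// leave the stored leader untouched.
    pub fn bid(&self, node_id: u32) -> bool {
        assert_ne!(node_id, NO_LEADER, "0 is reserved for 'no leader'");
        self.leader
            .compare_exchange(NO_LEADER, node_id, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// The elected leader, if any bid has arrived.
    pub fn leader(&self) -> Option<u32> {
        match self.leader.load(Ordering::Acquire) {
            NO_LEADER => None,
            id => Some(id),
        }
    }
}
```

However many bids race, exactly one `bid` call returns true, so the control plane needs no lock and no tie-breaking rule.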

[Topology diagram: control plane (poolboy-controlplane) → leader (agent-02) → followers (agent-01, agent-03, agent-04, agent-05). Caption: tailscale overlay · encrypted p2p · no NAT traversal needed]
06 · election timeline

Microsecond-grain. Stateless recovery.

Each message fits in a single 1500-byte UDP datagram. Crash an agent and it rejoins the next round — the ManifestId tags every message to the election it belongs to.
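One way to read "stateless recovery": an agent keeps only the id of the election it last saw announced and drops anything tagged differently. The sketch below assumes that reading; `Event` and `Agent` are hypothetical, payload fields are omitted, and `ManifestId` is modelled as a plain integer wrapper.

```rust
/// Hypothetical stand-in for the real ManifestId type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ManifestId(pub u64);

/// The messages an agent receives, reduced to their election tag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    Announce(ManifestId),
    LeaderAck(ManifestId),
    ElectionClosed(ManifestId),
}

/// An agent that has just restarted starts with `current == None`.
#[derive(Debug, Default)]
pub struct Agent {
    current: Option<ManifestId>,
}

impl Agent {
    /// Returns true if the event was acted on, false if it was dropped as stale.
    pub fn handle(&mut self, event: Event) -> bool {
        match event {
            // An Announce always starts tracking that election.
            Event::Announce(id) => {
                self.current = Some(id);
                true
            }
            // Everything else must match the election being tracked.
            Event::LeaderAck(id) | Event::ElectionClosed(id) => self.current == Some(id),
        }
    }
}
```

A freshly restarted agent ignores leftovers from the round it missed and joins again at the next Announce, with nothing to persist across the crash.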

participants: control plane · agent-01 · agent-02 (winner) · agent-03 · agent-04

t=0µs      control plane → agents      Announce {manifest_id, patch_blake3, cp_addr}
t=~900µs   agents → control plane      Bid (first arrival wins, atomic CAS)
t=~1.2ms   control plane → agent-02    LeaderAck
t=~3.1s    agent-02 → control plane    LeaderReady {leader_p2p_port=42042} (after download)
t=~3.1s    control plane → agents      ElectionClosed {leader_tailscale_ip, port}
07 · why it's different

Built for the edge, not the cloud.

01

~100 MB ceiling

Flat resident memory on any file size. Your edge system doesn't care if the model is 2 GB or 200 GB.

02

FastCDC chunking

Edits don't cascade. One-byte perturbation ≠ whole-file re-upload.

03

BLAKE3 verified

Streaming hash over reconstructed bytes. Mismatch → refuse. No corrupt weights.

04

Leaderless election

First bid wins. Atomic CAS on the control plane. Stateless agent recovery.

05

Tailscale-native

No custom crypto. No NAT traversal code. The overlay already solved both.

06

~150 LoC net code

No Quinn. No rustls. No iroh-blobs. Boring TCP + hash verification.

08 · use cases

Where poolboy shines.

→ 01

Fleet-wide model rollout

Ship a fine-tuned YOLO update to 500 live systems in the field. Leader pulls once at 4G uplink speed; followers pull from the leader over local Tailscale. Control plane uplink stays cool.

~50× less egress from the control plane
→ 02

Emergency rollback

A deploy turns sour. Revert is just another patch — from v2 weights back to v1 weights, most chunks identical, delta under a megabyte. Deployed fleet-wide in under five minutes.

< 5 min fleet rollback
→ 03

Multi-region cascade

One leader per regional Tailscale subnet. Control plane sends N announcements, each region elects locally, each region cascades internally. No cross-region traffic duplication.

O(regions) uplink, not O(agents)
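The scaling claim is simple arithmetic: the control plane uploads one copy of the patch per direct download. A tiny sketch, where `control_plane_egress` is a hypothetical helper and the region count in the usage note is made up for illustration:

```rust
/// Bytes the control plane uploads when it serves `downloads` full copies
/// of a patch. With a cascade, `downloads` is the number of leaders (one per
/// region); without one, it is the number of agents.
pub fn control_plane_egress(patch_bytes: u64, downloads: u64) -> u64 {
    patch_bytes * downloads
}
```

For a 312 MB patch and 500 agents, serving every agent directly costs 156 GB of uplink, while 10 regional leaders (an illustrative number, not from the source) would cost 3.12 GB.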
09 · under the hood

The real code. No framework noise.

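// Imports this excerpt relies on. anyhow is assumed from the use of `bail!`
// and the one-parameter `Result`; STATUS_OK is defined elsewhere in the crate.
use std::{net::SocketAddr, path::Path};

use anyhow::{bail, Result};
use tokio::{
    fs::File,
    io::{AsyncReadExt, AsyncWriteExt, BufWriter},
    net::TcpStream,
};

/// Download a blob from `server`, verifying its BLAKE3 digest as it streams.
/// Memory cost: 1 MiB BufWriter + 64 KiB scratch + hasher state.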
pub async fn download(
    server: SocketAddr,
    expected_hash: [u8; 32],
    out: &Path,
) -> Result<u64> {
    let mut stream = TcpStream::connect(server).await?;
    stream.write_all(&expected_hash).await?;

    // status + length header
    let mut status = [0u8; 1];
    stream.read_exact(&mut status).await?;
    if status[0] != STATUS_OK { bail!("peer lacks blob"); }

    let mut len_buf = [0u8; 8];
    stream.read_exact(&mut len_buf).await?;
    let total = u64::from_le_bytes(len_buf);

    // stream socket → disk, hashing as we go
    let file = File::create(out).await?;
    let mut writer = BufWriter::with_capacity(1<<20, file);
    let mut hasher = blake3::Hasher::new();
    let mut buf = vec![0u8; 64 * 1024];
    let mut remaining = total;
    while remaining > 0 {
        let take = remaining.min(buf.len() as u64) as usize;
        stream.read_exact(&mut buf[..take]).await?;
        writer.write_all(&buf[..take]).await?;
        hasher.update(&buf[..take]);
        remaining -= take as u64;
    }
    writer.flush().await?;

    // verify or refuse
    let digest = *hasher.finalize().as_bytes();
    if digest != expected_hash {
        bail!("BLAKE3 mismatch after download");
    }
    Ok(total)
}
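
The serving side is not shown. Reading the protocol off `download`, a peer receives a 32-byte digest and replies with a status byte, a little-endian u64 length, and the blob. A blocking, std-only sketch of that exchange follows; `serve_one` is a hypothetical name, the status values are placeholders (only the name `STATUS_OK` appears in the source), and blobs sit in memory here where a real peer would stream from disk.

```rust
use std::collections::HashMap;
use std::io::{self, Read, Write};

// ASSUMPTION: placeholder values; the source only names STATUS_OK.
const STATUS_OK: u8 = 0;
const STATUS_MISSING: u8 = 1;

/// Answer one request on `stream`: read a 32-byte digest, then reply with
/// status (1 byte), length (u64, little-endian) and the blob bytes.
/// Returns the number of blob bytes sent.
pub fn serve_one<S: Read + Write>(
    stream: &mut S,
    blobs: &HashMap<[u8; 32], Vec<u8>>,
) -> io::Result<u64> {
    let mut wanted = [0u8; 32];
    stream.read_exact(&mut wanted)?;
    match blobs.get(&wanted) {
        Some(blob) => {
            stream.write_all(&[STATUS_OK])?;
            stream.write_all(&(blob.len() as u64).to_le_bytes())?;
            stream.write_all(blob)?;
            stream.flush()?;
            Ok(blob.len() as u64)
        }
        None => {
            // No length or body follows a non-OK status.
            stream.write_all(&[STATUS_MISSING])?;
            stream.flush()?;
            Ok(0)
        }
    }
}
```

The function is generic over `Read + Write`, so it accepts a `std::net::TcpStream` as well as an in-memory `Cursor` for testing.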
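
use std::net::{IpAddr, SocketAddr};

use serde::{Deserialize, Serialize};

/// Every UDP datagram on the control socket carries one of these.
/// All fit in a single 1500-byte datagram via bincode.
// The serde derives are assumed (bincode via serde); `ManifestId` is defined
// elsewhere and must implement the same traits.
#[derive(Serialize, Deserialize)]
pub enum Msg {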
    /// control plane → every agent
    Announce {
        manifest_id: ManifestId,
        version: String,
        patch_blake3: [u8; 32],
        controlplane_p2p_addr: Option<SocketAddr>,
    },
    /// agent → control plane (race!)
    Bid { manifest_id: ManifestId, node_id: String },
    /// control plane → winning agent
    LeaderAck { manifest_id: ManifestId },
    /// leader → control plane (ready to serve followers)
    LeaderReady {
        manifest_id: ManifestId,
        leader_p2p_port: u16,
    },
    /// control plane → every agent (the pivot signal)
    ElectionClosed {
        manifest_id: ManifestId,
        leader_node_id: String,
        leader_tailscale_ip: IpAddr,
        leader_p2p_port: u16,
    },
}

//! Wire format
//!
//! ┌──────────── header (54 bytes) ────────────┐
//! │ magic "POOL"   (4)                        │
//! │ version u16    (2)                        │
//! │ base_size u64  (8)                        │
//! │ target_size u64 (8)                       │
//! │ target_blake3 [u8; 32]   (32)             │
//! ├──────────── instruction body ─────────────┤
//! │ 0x01 COPY    : op(1) offset(8) len(8)     │
//! │ 0x02 LITERAL : op(1) len(8) bytes[len]    │
//! └───────────────────────────────────────────┘

pub const MAGIC: &[u8; 4]  = b"POOL";
pub const VERSION: u16     = 1;
pub const MIN_CHUNK: u32   = 256 * 1024;       // 256 KiB
pub const AVG_CHUNK: u32   = 1024 * 1024;      // 1 MiB
pub const MAX_CHUNK: u32   = 4 * 1024 * 1024;  // 4 MiB
10 · stack

Rust 2021 · Tokio · FastCDC · BLAKE3 · Bincode · Tailscale · Apache-2.0