26 Unix tools. One binary. Zero dependencies. · the missing coreutils for AI pipelines · vrk mcp - expose all 26 tools to any AI agent · brew install vrk - ready in 5 seconds

vrk sip

About

Samples lines from a stream without loading the whole thing into memory. You can take the first N lines, a random sample of exactly N lines, every Nth line, or each line with N% probability. The exact-N random sample uses reservoir sampling, so every line in the stream has an equal chance of being picked no matter how long the stream is.
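To see why an exact-N random sample needs memory for only N lines, here is a minimal Python sketch of the classic reservoir sampling algorithm (Algorithm R). It illustrates the technique and is not vrk's source. The function name reservoir_sample is hypothetical, and Python's random.Random(seed) stands in for whatever generator --seed actually drives.

```python
import random
from typing import Iterable, List, Optional


def reservoir_sample(lines: Iterable[str], k: int,
                     seed: Optional[int] = None) -> List[str]:
    """Hypothetical helper: pick k lines uniformly at random from a stream.

    Only the k-slot reservoir is kept in memory, however long the stream is.
    """
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    reservoir: List[str] = []
    for i, line in enumerate(lines):
        if i < k:
            # Fill the reservoir with the first k lines.
            reservoir.append(line)
        else:
            # Line i (0-based) enters with probability k / (i + 1),
            # evicting a uniformly chosen slot.
            j = rng.randint(0, i)  # inclusive on both ends: i + 1 outcomes
            if j < k:
                reservoir[j] = line
    return reservoir  # shorter than k only if the stream had fewer lines
```

Calling reservoir_sample(sys.stdin, 100, seed=42) does roughly what cat huge.jsonl | vrk sip --count 100 --seed 42 describes. Do not expect the two to pick the same lines for the same seed, though, and note that this sketch returns lines in reservoir order rather than input order.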

The problem

You have a 10GB JSONL file and need a random sample of 100 records. head gives you the first 100, not a random sample. shuf loads the entire file into memory. You write a Python script with reservoir sampling and it takes 20 lines.

Before and after

Before

shuf -n 100 huge.jsonl
# loads entire file into memory
# not available on macOS without coreutils

After

cat huge.jsonl | vrk sip --count 100 --seed 42

Example

cat huge.jsonl | vrk sip --count 100 --seed 42

Exit codes

Code  Meaning
0     Success
1     I/O failure reading stdin
2     Usage error: no strategy specified, more than one strategy, --sample outside 1-100, or stdin is an interactive TTY
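The exit code 2 conditions amount to a few argument checks. The Python sketch below restates them as a function. It is not vrk's own validation code. usage_exit_code is a hypothetical name, the sketch assumes --first, --count, --every and --sample are the four mutually exclusive strategies, and it reads "interactive TTY" as stdin being a terminal rather than a pipe or file.

```python
from typing import Optional


def usage_exit_code(first: Optional[int] = None,
                    count: Optional[int] = None,
                    every: Optional[int] = None,
                    sample: Optional[int] = None,
                    stdin_is_tty: bool = False) -> int:
    """Hypothetical helper: return 2 for the documented usage errors, else 0."""
    # Assumption: these four flags are the mutually exclusive strategies.
    chosen = [v for v in (first, count, every, sample) if v is not None]
    if len(chosen) != 1:
        return 2  # no strategy, or more than one
    if sample is not None and not 1 <= sample <= 100:
        return 2  # --sample is a percentage and must be in 1-100
    if stdin_is_tty:
        return 2  # assumption: nothing is piped in, so there is no stream
    return 0
```

In a real script, stdin_is_tty would come from sys.stdin.isatty().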

Flags

Flag      Short  Type   Description
--first          int    Take first N lines
--count   -n     int    Reservoir sample of exactly N lines
--every          int    Emit every Nth line
--sample         int    Include each line with N% probability (1-100)
--seed           int64  Random seed for reproducibility
--json    -j     bool   Append metadata record after all output
--quiet   -q     bool   Suppress stderr output
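The three strategies other than --count need no buffering at all. This Python sketch shows one way to implement them as generators. The function names are hypothetical. The line numbering for --every (keeping lines N, 2N, 3N, counting from 1) is an assumption, because the table does not say whether counting starts at the first line.

```python
import itertools
import random
from typing import Iterable, Iterator, Optional


def take_first(lines: Iterable[str], n: int) -> Iterator[str]:
    """Like --first N: yields N lines, then stops consuming the input."""
    return itertools.islice(lines, n)


def take_every(lines: Iterable[str], n: int) -> Iterator[str]:
    """Like --every N: keeps lines N, 2N, 3N, ... (1-based, an assumption)."""
    return itertools.islice(lines, n - 1, None, n)


def bernoulli_sample(lines: Iterable[str], percent: int,
                     seed: Optional[int] = None) -> Iterator[str]:
    """Like --sample N: keeps each line independently with probability N/100."""
    rng = random.Random(seed)  # seeded for reproducible output
    p = percent / 100
    for line in lines:
        if rng.random() < p:  # random() is in [0, 1), so 100% keeps every line
            yield line
```

Because bernoulli_sample decides each line independently, its output size only averages N% of the input. Use --count when you need exactly N lines.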