26 Unix tools. One binary. Zero dependencies. · the missing coreutils for AI pipelines · vrk mcp - expose all 26 tools to any AI agent · brew install vrk - ready in 5 seconds

vrk chunk

About

Splits long documents into chunks that fit within an LLM’s context window. Each chunk comes out as a JSONL record with its index, text, and exact token count. Splits at paragraph boundaries when possible and supports overlap so you don’t lose context at the edges.
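The paragraph-boundary strategy with overlap can be sketched as follows. This is a minimal illustration, not vrk's implementation: the whitespace-based `count_tokens` is a stand-in for whatever real tokenizer vrk uses, and the record keys (`index`, `text`, `tokens`) are assumed from the description above.

```python
import json

def count_tokens(text: str) -> int:
    # Assumption: stand-in for vrk's real tokenizer; counts whitespace-separated words.
    return len(text.split())

def chunk(text: str, size: int, overlap: int = 0):
    """Greedily pack paragraphs into chunks of at most `size` tokens,
    carrying the last `overlap` tokens of each chunk into the next."""
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        candidate = "\n\n".join(current + [para])
        if current and count_tokens(candidate) > size:
            chunks.append("\n\n".join(current))
            # Carry trailing tokens forward so context survives the split.
            tail = " ".join("\n\n".join(current).split()[-overlap:]) if overlap else ""
            current = [tail, para] if tail else [para]
        else:
            current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return [
        {"index": i, "text": c, "tokens": count_tokens(c)}
        for i, c in enumerate(chunks)
    ]

for record in chunk("one two three\n\nfour five six\n\nseven eight", size=5, overlap=2):
    print(json.dumps(record))
```

Note that a single paragraph longer than `size` would need a further fallback split (e.g. on sentences), which this sketch omits.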

The problem

You need to send a long document to an LLM but it exceeds the context window. You split on line count or character count, but neither maps to tokens. Chunks end up too large or too small, and you lose context at split boundaries.

Before and after

Before

split -l 100 document.txt chunk_
# no idea how many tokens each chunk is
# no overlap for context continuity

After

cat document.txt | vrk chunk --size 4000 --overlap 200

Example

cat long-doc.md | vrk chunk --size 4000
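Downstream code can consume the JSONL output one record per line. A minimal sketch, assuming the record keys (`index`, `text`, `tokens`) described above; check your vrk version's actual output:

```python
import json
import sys

def read_chunks(stream):
    # Each non-empty line of vrk chunk's output is one JSON record.
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)

if __name__ == "__main__":
    # e.g. cat long-doc.md | vrk chunk --size 4000 | python consume.py
    for rec in read_chunks(sys.stdin):
        print(f"chunk {rec['index']}: {rec['tokens']} tokens")
```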

Exit codes

Code  Meaning
0     Success, including empty input
1     I/O error
2     No input; --size missing or < 1; --overlap >= --size; unknown flag

Flags

Flag       Short  Type    Description
--size            int     Max tokens per chunk (required)
--overlap         int     Token overlap between adjacent chunks
--by              string  Chunking strategy: paragraph
--quiet    -q     bool    Suppress stderr output