vrk chunk
About
Splits long documents into chunks that fit within an LLM’s context window. Each chunk comes out as a JSONL record with its index, text, and exact token count. Splits at paragraph boundaries when possible and supports overlap so you don’t lose context at the edges.
The problem
You need to send a long document to an LLM but it exceeds the context window. You split on line count or character count, but neither maps to tokens. Chunks end up too large or too small, and you lose context at split boundaries.
Before and after
Before
split -l 100 document.txt chunk_
# no idea how many tokens each chunk is
# no overlap for context continuity
After
cat document.txt | vrk chunk --size 4000 --overlap 200
Example
cat long-doc.md | vrk chunk --size 4000
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success, including empty input |
| 1 | I/O error |
| 2 | No input, –size missing or < 1, –overlap >= –size, unknown flag |
Flags
| Flag | Short | Type | Description |
|---|---|---|---|
--size | int | Max tokens per chunk (required) | |
--overlap | int | Token overlap between adjacent chunks | |
--by | string | Chunking strategy: paragraph | |
--quiet | -q | bool | Suppress stderr output |