vrk grab
About
Fetches a web page and extracts the readable content as clean markdown. Strips navigation, ads, scripts, and boilerplate - you get the article text, not the page chrome. Also supports plain text and raw HTML output modes.
The problem
You need the content of a web page for an LLM pipeline. curl gives you raw HTML full of nav bars, ads, and scripts. You write a Python script with BeautifulSoup to extract the article text, and it breaks on the next site.
Before and after
Before
curl -s https://example.com/article | \
python3 -c "
from bs4 import BeautifulSoup
import sys
soup = BeautifulSoup(sys.stdin.read(), 'html.parser')
print(soup.get_text())"
After
vrk grab https://example.com/article
Example
vrk grab --text https://example.com/article
Exit codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | HTTP error, fetch timeout, or I/O error |
| 2 | Usage error - invalid URL, no input, mutually exclusive flags |
Flags
| Flag | Short | Type | Description |
|---|---|---|---|
--text | -t | bool | Plain prose output, no markdown syntax |
--raw | bool | Raw HTML, no processing | |
--json | -j | bool | Emit JSON envelope with metadata |
--quiet | -q | bool | Suppress stderr output |