Web Scraper
The web scraper library extracts clean text from web pages for use as LLM input. It started as html_fetcher.py in the recipes server and was extracted into a standalone, reusable library.
Quick Reference
| Package | jarvis-web-scraper |
| Source | jarvis-web-scraper/ |
| Tests | 27 tests |
Usage
```python
from jarvis_web_scraper import WebScraper

scraper = WebScraper()

# Extract clean text from a URL
result = scraper.scrape(url="https://example.com/article")
print(result.text)      # Clean extracted text
print(result.title)     # Page title
print(result.metadata)  # Extracted metadata
```
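Extracted text often still needs trimming before it goes into a prompt. The helper below is a minimal sketch of one way a caller might do that. `fit_to_budget` and `max_chars` are hypothetical names and not part of jarvis-web-scraper, and character count is only a rough stand-in for a token budget.

```python
def fit_to_budget(text, max_chars):
    """Trim text to at most max_chars, cutting at a paragraph or word boundary.

    Hypothetical helper for illustration; not part of jarvis-web-scraper.
    """
    if max_chars <= 0:
        return ""
    if len(text) <= max_chars:
        return text
    window = text[:max_chars]
    # Prefer the last paragraph break inside the window...
    cut = window.rfind("\n\n")
    if cut <= 0:
        # ...then fall back to the last whitespace character.
        cut = max(window.rfind(" "), window.rfind("\n"))
    if cut <= 0:
        cut = max_chars  # No boundary found: hard cut at the limit.
    return window[:cut].rstrip()
```

A caller could then pass `fit_to_budget(result.text, 8000)` to the model instead of the full text, with the limit chosen to suit the model in use.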
Features
- Extracts main content, stripping navigation, ads, and boilerplate
- Returns clean text suitable for LLM context windows
- Handles common web page structures and formats
- Configurable extraction strategies
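To make the idea of stripping boilerplate concrete, here is a minimal sketch built on the standard library's `html.parser`. It is not the library's actual implementation: `MainTextExtractor`, `extract_main_text`, and `DEFAULT_SKIP_TAGS` are hypothetical names, and the `skip_tags` parameter only hints at what a configurable strategy could look like.

```python
from html.parser import HTMLParser

# Assumed default: tags whose contents are treated as boilerplate.
DEFAULT_SKIP_TAGS = frozenset(
    {"script", "style", "nav", "header", "footer", "aside"}
)


class MainTextExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested inside skipped tags."""

    def __init__(self, skip_tags=DEFAULT_SKIP_TAGS):
        super().__init__()
        self.skip_tags = skip_tags
        self._skip_depth = 0  # Greater than 0 while inside a skipped element.
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.skip_tags:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.skip_tags and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self._chunks.append(text)

    def text(self):
        return "\n".join(self._chunks)


def extract_main_text(html, skip_tags=DEFAULT_SKIP_TAGS):
    """Return the text of an HTML string with skipped elements removed."""
    parser = MainTextExtractor(skip_tags)
    parser.feed(html)
    parser.close()
    return parser.text()
```

Passing a different `skip_tags` set changes what counts as boilerplate. A production extractor would likely also score content density or use site-specific rules, which this sketch leaves out.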
Consumers
- jarvis-command-center -- deep research tool (web search, scrape, summarize)
- jarvis-recipes-server -- URL recipe import (HTML parsing)