Web scraping articles

1/26/2026 • EN

I'm swearing off APIs entirely

A developer explains why they are giving up on building apps that rely on external APIs, citing access issues, ethical concerns, and platform risks.

API Access Data Dependency oauth side projects web scraping

Dave Rupert

12/29/2025 • EN

shot-scraper 1.9

Release of version 1.9 of the shot-scraper CLI tool, featuring a new -x option to extract page resources and fixes to the accessibility command.

cli Digital Forensics Playwright Shot Scraper web scraping

Simon Willison

12/28/2025 • EN

TIL wget can download full sites

A technical guide explaining how to use wget with recursive options to download entire websites for offline viewing, including a breakdown of key command-line flags.

CLI Tools command line Offline Browsing web scraping Wget

Vincent Warmerdam
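
As a rough illustration of the recursive wget approach summarized above, here is a minimal Python sketch that assembles such a command. The specific flags are an assumption (they are standard GNU wget options, not necessarily the ones the linked article breaks down), and the function name, output directory, and example URL are hypothetical.

```python
import shlex
from typing import List


def build_wget_mirror_command(url: str, output_dir: str = "site-mirror") -> List[str]:
    """Build a wget command line for downloading a site for offline viewing.

    The flag selection is an assumption for illustration, not taken from
    the linked article.
    """
    return [
        "wget",
        "--recursive",          # follow links and download linked pages
        "--no-parent",          # never ascend above the starting directory
        "--page-requisites",    # also fetch images, CSS, and scripts
        "--convert-links",      # rewrite links so they work offline
        "--adjust-extension",   # save HTML/CSS with matching file extensions
        "--wait=1",             # pause one second between requests
        f"--directory-prefix={output_dir}",
        url,
    ]


if __name__ == "__main__":
    # Hypothetical target URL; print the command instead of running it.
    command = build_wget_mirror_command("https://example.com/docs/")
    print(shlex.join(command))
    # To actually run it (requires wget to be installed):
    # import subprocess; subprocess.run(command, check=True)
```

Returning the command as a list rather than a string keeps it safe to pass to subprocess.run without shell quoting problems.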

12/8/2025 • EN

Millions of Locations for Thousands of Brands

Analyzing All The Places' open-source location data project, detailing the technical setup and process for downloading and examining millions of brand locations.

data analysis Duckdb github Python web scraping

Mark Litwintschik

11/24/2025 • EN

Progressive Web Scraping with a Four-Tier Fallback System

A technical tutorial on building a smart web scraping system that automatically escalates through four tiers of complexity until it succeeds.

Bright Data curl Fallback System Playwright web scraping

Daniel Miessler
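
The escalation idea in the entry above can be sketched generically: try the cheapest fetcher first and move to the next tier only when one fails. This is a minimal sketch under stated assumptions, not the linked article's implementation. The function name and the stub tiers are hypothetical stand-ins; in a real system they might wrap tools such as curl or Playwright.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A fetcher takes a URL and returns page content, or None/"" on failure.
Fetcher = Callable[[str], Optional[str]]


def fetch_with_fallback(url: str, tiers: Sequence[Tuple[str, Fetcher]]) -> Tuple[str, str]:
    """Try each tier in order; return (tier_name, content) for the first success."""
    errors: List[str] = []
    for name, fetch in tiers:
        try:
            content = fetch(url)
        except Exception as exc:  # any failure escalates to the next tier
            errors.append(f"{name}: {exc}")
            continue
        if content:
            return name, content
        errors.append(f"{name}: empty response")
    raise RuntimeError("all tiers failed: " + "; ".join(errors))


# Hypothetical stub tiers that simulate increasingly capable fetchers.
def plain_http(url: str) -> Optional[str]:
    raise ConnectionError("403 Forbidden")


def http_with_browser_headers(url: str) -> Optional[str]:
    return ""  # simulates a page that renders nothing without JavaScript


def headless_browser(url: str) -> Optional[str]:
    return "<html>rendered</html>"


DEMO_TIERS = [
    ("plain-http", plain_http),
    ("browser-headers", http_with_browser_headers),
    ("headless-browser", headless_browser),
]

if __name__ == "__main__":
    tier, body = fetch_with_fallback("https://example.com/", DEMO_TIERS)
    print(f"succeeded at tier {tier!r} with {len(body)} characters")
```

Keeping the tiers as an ordered list of named callables means a fourth or fifth tier can be added without touching the escalation loop.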

10/6/2025 • EN

Inside Claude Code's Web Tools: WebFetch vs WebSearch

A technical analysis of Claude Code's WebFetch and WebSearch tools, detailing their internal architecture and processing pipelines.

api design Claude Code LLM Agents web scraping Web Tools

Mikhail Shilkov

9/27/2025 • EN

Get xkcd Cartoons at 2x Resolution

Discover an undocumented trick to get xkcd comics at double resolution using a simple URL modification and a Python script to check availability.

Image Resolution Python srcset URL Manipulation web scraping

Michael Lynch

9/13/2025 • EN

“We’re Walling Off The Open Internet To Stop AI”

Discusses the trend of websites walling off content from AI bots, arguing it undermines open internet principles and may concentrate power.

ai ethics Content Protection Open Internet Tech Policy web scraping

Alex Seifert

8/29/2025 • EN

Cat and Mouse: Challenges in Adversarial Web Scraping

A talk exploring adversarial web scraping, covering bot detection techniques and ethical ways to bypass them, from the perspectives of both scrapers and site operators.

Adversarial Techniques Bot Detection Elixir http web scraping

Tyler A. Young

7/2/2025 • EN

goHardDrive Leaked Personal Data for Thousands of Customers

A security researcher discovers goHardDrive exposed thousands of customer records via an insecure RMA status check form with no authentication.

API Security Data Breach Information Disclosure privacy web scraping

Michael Lynch

3/31/2025 • EN

Poisoning Well

Explores the ethics of LLM training data and proposes a technical method to poison AI crawlers using nofollow links.

ai ethics Data Poisoning llm robots.txt web scraping

Heydon Pickering

3/26/2025 • EN

Building a Personal Content Recommendation System, Part Two: Data Processing and Cleaning

Part two of building a personal recommendation system, covering data collection from Pocket and content extraction using the Jina Reader API.

Data Cleaning data processing github recommendation systems web scraping

Saeed Esmaili

3/17/2025 • EN

Please stop externalizing your costs directly into my face

A developer's frustration with aggressive LLM crawlers causing outages and consuming resources, detailing past abuse like crypto mining and Go module mirror issues.

Abuse Mitigation GIT Hosting LLM Crawlers robots.txt web scraping

Drew DeVault