I'm swearing off APIs entirely
A developer explains why they are giving up on building apps that rely on external APIs due to access issues, ethical concerns, and platform risks.
Release of the shot-scraper 1.9 CLI tool, featuring a new -x option to extract page resources, along with fixes to the accessibility command.
A technical guide explaining how to use wget with recursive options to download entire websites for offline viewing, including a breakdown of key command-line flags.
Analyzing All The Places' open-source location data project, detailing the technical setup and process for downloading and examining millions of brand locations.
A technical tutorial on building a smart web scraping system that automatically escalates through four tiers of complexity until it succeeds.
A technical analysis of Claude Code's WebFetch and WebSearch tools, detailing their internal architecture and processing pipelines.
Discover an undocumented trick to get xkcd comics at double resolution using a simple URL modification and a Python script to check availability.
Discusses the trend of websites walling off content from AI bots, arguing it undermines open internet principles and may concentrate power.
A talk exploring adversarial web scraping, covering bot detection techniques and ethical methods to bypass them from both scraper and site operator perspectives.
A security researcher discovers goHardDrive exposed thousands of customer records via an insecure RMA status check form with no authentication.
Explores the ethics of LLM training data and proposes a technical method to poison AI crawlers using nofollow links.
Part two of building a personal recommendation system, covering data collection from Pocket and content extraction using the Jina Reader API.
A developer's frustration with aggressive LLM crawlers causing outages and consuming resources, detailing past abuse like crypto mining and Go module mirror issues.
Explains the LLMs.txt file, a new standard for providing context and metadata to Large Language Models to improve accuracy and reduce hallucinations.
A guide to using browser-use, a scriptable AI agent built with Playwright and LLMs to automate repetitive browser tasks.
Explores using Bing Search API to ground LLM responses for website assistants, comparing custom implementation with Azure AI Agent Service.
A technical tutorial demonstrating mouse right-click operations and stream()/has-text() methods using Playwright Java for test automation.
A technical tutorial on creating interactive data tables by web-scraping with R's rvest package and styling with reactable.
Analysis of 5 years of Hacker News 'Who's Hiring' thread data using Deno and the HN API to visualize tech hiring trends.
Cloudflare now offers a simple setting to block AI bots from scraping your website, available even on free plans.