Web scraping articles

6/20/2018 • EN

An Intro to Web Scraping With lxml and Python

A tutorial on web scraping basics using Python's lxml library and XPaths, demonstrated by extracting data from Steam.

Lxml Python web scraping Xpath

Yasoob Khalid

4/27/2018 • EN

Quicker Storify export

A programmer shares a script to automate exporting Storify content before it shuts down, saving time over the manual process.

automation Console Script Data Export web scraping

Lea Verou

4/24/2018 • EN

Reverse Engineering Facebook API: Private Video Downloader

A technical guide on reverse engineering the Facebook API to download private videos using Python and browser analysis.

Facebook API Python reverse engineering Video Downloader web scraping

Yasoob Khalid

4/23/2018 • EN

Reverse Engineering Facebook: Public Video Downloader

A technical guide on reverse engineering Facebook to create a tool for downloading public videos by analyzing network requests.

Facebook API reverse engineering Video Downloader web scraping

Yasoob Khalid

4/15/2018 • EN

Reverse Engineering Soundcloud API

A guide on reverse engineering the Soundcloud API to bypass download restrictions using Python.

api Python reverse engineering Soundcloud web scraping

Yasoob Khalid

2/20/2018 • EN

Saving and scraping a website with Puppeteer

A technical guide on using Puppeteer to scrape and save a complete copy of a website, including all assets, for performance audits.

headless chrome JavaScript Node.js puppeteer web scraping

Stefan Baumgartner

12/6/2017 • EN

The Perils of Outsourcing Your MVP

A developer shares a cautionary tale about the pitfalls of outsourcing an MVP, using a personal project as an example of what can go wrong.

Freelancing mvp Outsourcing Web Development web scraping

Michael Lynch

11/9/2017 • EN

Scraping the web with Scrapy

A summary of a Python Frederick talk on using Scrapy, a Python framework for web scraping, including a link to the presentation.

Python Scrapy web scraping

Matt Layman

10/17/2017 • EN

Goodreads 👍📚 Part 2: rvesting descriptions

A technical guide on using R's rvest package to scrape book descriptions and genres from Goodreads, adapting code from an existing project.

data analysis Goodreads R Programming Rvest web scraping

Mara Averick

9/13/2017 • EN

Analyzing HN moderation & censorship

An analysis of Hacker News moderation tools and practices, based on data scraped from the site's API.

API Scraping data analysis Hacker News Moderation web scraping

Drew DeVault

2/17/2016 • EN

Did your Fedora live cd build fail?

A Fedora maintainer shares a Python script to scrape and email daily reports of failed live CD builds from Koji.

automation Cron linux Python web scraping

Amit Saha

1/22/2016 • EN

Scraping Euro Parliament Website Using Python

A technical guide on using Python to scrape public data, including answers to questions, from the European Parliament website.

data extraction Lxml Python Requests web scraping

Yasoob Khalid

12/29/2015 • EN

Polyglot Twitter Bot, Part 1: Node.js

First part of a series on building a Twitter bot using Node.js, covering setup, authentication, and basic search functionality.

AWS Lambda Bot Development Node.js twitter api web scraping

Joel Grus

8/13/2015 • EN

Reading LWN.net with Pocket

A developer shares a Python script to save subscriber-only LWN.net articles to Pocket for offline reading.

automation github programming Python web scraping

Julien Danjou

7/26/2015 • EN

Making the Whitney Houston API

A developer documents building a Whitney Houston song API from scratch using Python, web scraping, and JSON.

API Development json Python web scraping Wikipedia

Cassidy Williams

3/24/2015 • EN

Inject JavaScript with PhantomJS to inspect websites

Learn how to use PhantomJS, a headless browser, to inject and execute JavaScript for inspecting dynamic websites.

dom manipulation Headless Browser Javascript Injection Phantomjs web scraping

Matt Layman

1/15/2015 • EN

Want suggestions for next post

The author asks readers to choose which of his two Python web scraping projects he should write a tutorial about next.

Beautifulsoup Facebook Bot Htmlparser Requests web scraping

Yasoob Khalid

1/15/2015 • EN

Simple way to scrape web with Ruby

A tutorial on using Ruby and the Mechanize gem to scrape personal fitness data from MyFitnessPal when API access is unavailable.

api data extraction Mechanize ruby web scraping

Krzysztof Zabłocki

11/27/2014 • EN

Extracting schedule information from timeedit

A technical analysis of reverse-engineering TimeEdit's web interface to extract and parse schedule data via its JSON and HTML endpoints.

API Integration Data Parsing json Timeedit web scraping

Jonas Hietala

6/29/2014 • EN

EuroPython 2014 and me

The author announces that they will give their first conference talk, on web scraping in Python, at EuroPython 2014.

conference Python web scraping

Yasoob Khalid

