academic-web-scraping


Academic Web Scraping Guide

Overview

Research often requires collecting data from the web: bibliographic metadata from academic databases, experimental datasets from public repositories, social media posts for computational social science, or economic indicators from government portals. Web scraping and API-based data collection are therefore core skills for researchers across disciplines.

This guide covers both approaches: structured API access for platforms that provide one, and web scraping for when no API exists. It emphasizes ethical data collection practices, including respecting robots.txt, rate limiting, terms of service compliance, and IRB considerations for human-subject data. The goal is to collect research data reliably and responsibly.
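As a minimal sketch of two of the practices named above (respecting robots.txt and rate limiting), the following uses only the Python standard library. The robots.txt content, the user-agent string `research-bot-example`, and the helper names `build_robots_parser` and `RateLimiter` are illustrative assumptions, not part of any particular site or library. Terms of service and IRB review cannot be checked in code and still need to be handled separately.

```python
import time
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent string; use one that identifies your project.
USER_AGENT = "research-bot-example"

# Made-up robots.txt content, used here so the sketch runs offline.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def build_robots_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class RateLimiter:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval_s: float) -> None:
        self.min_interval_s = min_interval_s
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Sleep if the previous call was too recent; return seconds slept."""
        slept = 0.0
        if self._last_call is not None:
            elapsed = time.monotonic() - self._last_call
            remaining = self.min_interval_s - elapsed
            if remaining > 0:
                time.sleep(remaining)
                slept = remaining
        self._last_call = time.monotonic()
        return slept


robots = build_robots_parser(EXAMPLE_ROBOTS_TXT)

# Check permission before requesting a URL.
allowed_public = robots.can_fetch(USER_AGENT, "https://example.org/papers/1")
allowed_private = robots.can_fetch(USER_AGENT, "https://example.org/private/x")

# Honor the site's Crawl-delay if it declares one; otherwise fall back
# to a conservative default (1 second here is an arbitrary choice).
declared_delay = robots.crawl_delay(USER_AGENT)
limiter = RateLimiter(min_interval_s=declared_delay or 1.0)
```

In a real collection loop, one would call `limiter.wait()` immediately before each request and skip any URL for which `can_fetch` returns `False`. For a live site, `RobotFileParser.set_url(...)` followed by `read()` fetches the file over the network instead of parsing a local string.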

Whether you are building a dataset for a machine learning paper, collecting metadata for a systematic review, or gathering public data for policy research, these patterns help you do it correctly and efficiently.

API-Based Data Collection

When a platform offers an API, it is almost always preferable to scraping. APIs return structured data, are officially supported, and come with explicit usage terms.
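To illustrate what "structured data" means in practice, here is a hedged sketch against the public Crossref REST API, chosen only as one example of an academic API. The function names, the user-agent string, and the choice of fields to keep are assumptions made for illustration; consult the Crossref documentation for current parameters and etiquette before relying on it.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS_URL = "https://api.crossref.org/works"


def build_query_url(query: str, rows: int = 5, mailto: str = "") -> str:
    """Build a Crossref works search URL.

    Supplying a contact email via `mailto` is the etiquette Crossref
    asks for; the address itself is up to the caller.
    """
    params = {"query": query, "rows": rows}
    if mailto:
        params["mailto"] = mailto
    return f"{CROSSREF_WORKS_URL}?{urllib.parse.urlencode(params)}"


def extract_records(payload: dict) -> list[dict]:
    """Reduce a Crossref response to DOI and first title per work."""
    items = payload.get("message", {}).get("items", [])
    records = []
    for item in items:
        # Crossref returns `title` as a list; it may be missing or empty.
        titles = item.get("title") or [""]
        records.append({"doi": item.get("DOI", ""), "title": titles[0]})
    return records


def fetch_records(query: str, rows: int = 5, mailto: str = "",
                  timeout_s: float = 30.0) -> list[dict]:
    """Query the API over the network and return simplified records."""
    request = urllib.request.Request(
        build_query_url(query, rows, mailto),
        # Hypothetical user-agent; identify your own project here.
        headers={"User-Agent": "research-bot-example"},
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return extract_records(json.load(response))
```

Because the response is already JSON with named fields, no HTML parsing is needed: `fetch_records("systematic review methods", rows=3, mailto="you@example.org")` would return a list of small dictionaries (the query and email here are placeholders). Separating URL construction and parsing from the network call also makes both easy to test offline.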

Academic APIs

First Seen
Mar 10, 2026
academic-web-scraping — wentorai/research-plugins