edX Institution Course Scraper

A Python web scraping tool that extracts course information from edX institution pages using Selenium WebDriver automation.

🚀 Project Overview

This scraper opens edX’s “Schools & Partners” page, collects the profile URL of every listed institution, and extracts the title of the first course each one offers. It supports JavaScript-rendered content and keeps running when an individual page fails.
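The discovery step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it uses the standard library's `html.parser` so it runs without extra dependencies (the project itself lists BeautifulSoup4), and the `/school/` path prefix, `SchoolLinkParser`, and `discover_institution_urls` are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "https://www.edx.org"
SCHOOL_PATH_PREFIX = "/school/"  # assumption: institution profile links start with this path


class SchoolLinkParser(HTMLParser):
    """Collect absolute URLs of anchors whose href looks like a school profile."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith(SCHOOL_PATH_PREFIX):
            url = urljoin(BASE_URL, href)
            if url not in self.links:  # keep the first occurrence, skip duplicates
                self.links.append(url)


def discover_institution_urls(page_html):
    """Return the de-duplicated profile URLs found in a chunk of HTML."""
    parser = SchoolLinkParser()
    parser.feed(page_html)
    return parser.links


# Hand-written sample markup, not copied from edX.
sample = """
<a href="/school/harvardx">Harvard University</a>
<a href="/school/mitx">MIT</a>
<a href="/school/harvardx">Harvard University</a>
<a href="/about">About</a>
"""
print(discover_institution_urls(sample))
```

In a real run the HTML would come from the browser after the page has rendered, since the listing is loaded with JavaScript.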

✨ Key Features

🔄 Dynamic Content Scraping: Uses Selenium WebDriver to handle JavaScript-loaded content
🏫 Institution Discovery: Automatically extracts profile URLs for all edX schools and partners
📚 Course Identification: Navigates to each institution’s page and finds their first course
💾 Incremental CSV Output: Saves data progressively to prevent loss during long scraping sessions
🛡️ Robust Element Selection: Multiple CSS selectors and fallback mechanisms for reliability
🚀 Headless Operation: Runs in the background without a browser UI
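One way the fallback selection mentioned above might work is to try a list of CSS selectors in order and take the first non-empty match. The sketch below is an assumption about the approach, not the script's real implementation; the selectors in `COURSE_TITLE_SELECTORS` and the helper `first_text` are hypothetical.

```python
from types import SimpleNamespace

# Hypothetical selectors, ordered from most to least specific.
COURSE_TITLE_SELECTORS = [
    "[data-testid='course-card'] h3",
    "div.course-card h3",
    "a[href*='/learn/'] h3",
]


def first_text(find_all, selectors):
    """Return the text of the first element matched by any selector, else None.

    find_all is any callable that maps a CSS selector to a list of objects
    with a .text attribute.
    """
    for css in selectors:
        for element in find_all(css):
            text = element.text.strip()
            if text:
                return text
    return None


# Stand-in for a rendered page: only the second selector matches anything.
fake_page = {"div.course-card h3": [SimpleNamespace(text="  Machine Learning ")]}


def fake_find(css):
    return fake_page.get(css, [])


print(first_text(fake_find, COURSE_TITLE_SELECTORS))
```

With Selenium, `find_all` could be `lambda css: driver.find_elements(By.CSS_SELECTOR, css)`, which returns an empty list rather than raising when nothing matches.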

🛠️ Technology Stack

Python 3.x - Core programming language
Selenium WebDriver - Browser automation and dynamic content handling
BeautifulSoup4 - HTML parsing and data extraction
Pandas - CSV data manipulation and export
ChromeDriver - Automated browser control
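As a rough illustration of how the Selenium and ChromeDriver pieces might fit together in headless mode, consider the sketch below. The helper names `chrome_arguments` and `fetch_rendered_html` and the particular Chrome switches are assumptions for the example; the original script may configure the browser differently.

```python
def chrome_arguments(headless=True):
    """Chrome command-line switches; this particular set is an assumption."""
    args = ["--window-size=1920,1080", "--disable-gpu"]
    if headless:
        args.insert(0, "--headless=new")
    return args


def fetch_rendered_html(url, headless=True):
    """Load url in Chrome and return the page source as it stands after loading."""
    # Imported lazily so chrome_arguments() works even without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process
```

The returned HTML could then be handed to `BeautifulSoup(html, "html.parser")` for parsing. Pages that load content after the initial render would probably also need an explicit wait (for example Selenium's `WebDriverWait`) before reading `page_source`.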

📁 Project Files

edx_course_scrapper.py - Main scraper implementation with comprehensive error handling
README.md - Complete setup guide and usage documentation
code-viewer.html - Interactive source code viewer with syntax highlighting
Documentation - Detailed technical specifications and examples

Quick Start

Prerequisites

    pip install selenium pandas requests beautifulsoup4

Download ChromeDriver

Visit ChromeDriver Downloads and install the version matching your Chrome browser.

Run the Scraper

    python edx_course_scrapper.py

Sample Output

The script generates edx_institution_courses.csv:

| Institution | First Course Offered |
| --- | --- |
| Harvard University | CS50’s Introduction to Computer Science |
| MIT | Introduction to Computer Science and Programming in Python |
| Stanford University | Machine Learning |
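Rows like these could be written incrementally, one institution at a time, so that an interrupted run still leaves a usable file. The sketch below is an assumption about how that might look: it uses the standard library's `csv` module rather than Pandas (which the project lists for CSV export), and `append_row` is a hypothetical helper. The file name and column headers are taken from the sample output above.

```python
import csv
import os

CSV_PATH = "edx_institution_courses.csv"
FIELDNAMES = ["Institution", "First Course Offered"]


def append_row(institution, first_course, path=CSV_PATH):
    """Append one result to the CSV, writing the header if the file is new or empty."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        if is_new:
            writer.writeheader()
        writer.writerow(
            {"Institution": institution, "First Course Offered": first_course}
        )
```

Calling `append_row("MIT", "Introduction to Computer Science and Programming in Python")` right after each institution is scraped keeps everything collected so far on disk, because the file is opened, written, and closed on every call.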

💻 Interactive Code Viewer

Explore the complete source code with syntax highlighting and easy copying:

📄 edx_course_scrapper.py
259 lines • Complete Python implementation with documentation

⬇️ Download Options

Download Python Script - Direct file download
📋 Copy from Viewer - Use the copy button in the code viewer above
📁 View on GitHub - Browse project repository

🎯 Use Cases

Educational Research: Analyze course offerings across institutions
🔍 Market Analysis: Track trends in online education
🏫 Institutional Comparison: Compare course portfolios between universities
📊 Data Science Projects: Build educational datasets for analysis

🚀 Advanced Features

Error Recovery: Continues scraping even if individual pages fail
Rate Limiting: Respectful delays between requests
Multiple Selectors: Handles different page layouts automatically
Headless Mode: Efficient background operation
Progress Tracking: Real-time status updates during scraping
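Error recovery, rate limiting, and progress tracking can all live in one loop around the per-page work. The following is a minimal sketch under stated assumptions, not the script's actual loop: `scrape_all`, the two-second default delay, and the injectable `sleep` parameter are choices made for the example.

```python
import time


def scrape_all(urls, scrape_one, delay_seconds=2.0, sleep=time.sleep):
    """Call scrape_one(url) for each URL and survive per-page failures.

    Returns (results, failures): results maps url -> scraped value,
    failures maps url -> error message.
    """
    results, failures = {}, {}
    total = len(urls)
    for index, url in enumerate(urls, start=1):
        try:
            results[url] = scrape_one(url)
            status = "ok"
        except Exception as exc:  # broad on purpose: one bad page must not stop the run
            failures[url] = str(exc)
            status = f"failed ({exc})"
        print(f"[{index}/{total}] {url}: {status}")  # progress line per page
        if index < total:
            sleep(delay_seconds)  # fixed pause between consecutive requests
    return results, failures
```

Passing `sleep` as a parameter is only a convenience for testing: a test can substitute a no-op and run instantly, while a real run keeps the default `time.sleep`.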

📖 Need Help?

Check the detailed README for complete setup instructions, troubleshooting tips, and advanced configuration options.
© 2025 Usama Sadiq. Some rights reserved.