edX Institution Course Scraper

This Python script scrapes the edX "Schools & Partners" page, collects the institutions listed there, and extracts the title of the first course offered by each. It uses Selenium to render the page in a real browser, so content that is loaded dynamically by JavaScript is captured as well.

✨ Features

⚙️ Prerequisites

Before running this script, ensure you have the following installed:

ChromeDriver Setup

  1. Download ChromeDriver: Visit the ChromeDriver Downloads page.
  2. Match Chrome Version: Download the ChromeDriver version that matches your installed Google Chrome browser version. You can check your Chrome version by going to chrome://version in your browser.
  3. Place ChromeDriver:
    • Recommended: Place the chromedriver executable in a directory that is included in your system's PATH environment variable (e.g., /usr/local/bin on macOS/Linux, or a directory added to PATH on Windows).
    • Alternative: If you don't want to modify your PATH, you can specify the full path to the chromedriver executable in the initialize_driver function within the script (uncomment and modify the Service(executable_path='...') line).
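
The two placement options above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: resolve_chromedriver is a hypothetical helper, and the body of initialize_driver shown here is an assumption about how such a function could use Selenium 4's Service(executable_path=...). Selenium is imported inside the function so that the path lookup can be tried on its own.

```python
import shutil
from pathlib import Path
from typing import Optional


def resolve_chromedriver(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return a usable chromedriver path, or None if none was found.

    Hypothetical helper, not part of the original script.
    """
    if explicit_path:
        # Alternative setup: the caller supplies the full path to the executable.
        return explicit_path if Path(explicit_path).is_file() else None
    # Recommended setup: look the executable up on the PATH environment variable.
    return shutil.which("chromedriver")


def initialize_driver(explicit_path: Optional[str] = None):
    """Start Chrome through Selenium using the resolved chromedriver path.

    Assumed shape only; the real function in the script may differ.
    """
    # Imported lazily so resolve_chromedriver works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    driver_path = resolve_chromedriver(explicit_path)
    if driver_path is None:
        raise FileNotFoundError("chromedriver not found; see the setup steps above")
    return webdriver.Chrome(service=Service(executable_path=driver_path))
```

Calling resolve_chromedriver() with no argument before running the scraper is a quick way to confirm that the PATH setup worked: it returns the location of the executable, or None if it cannot be found.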

🚀 Installation

  1. Clone this repository (or copy the script content) to your local machine.
  2. Navigate to the project directory in your terminal.
  3. Install the required Python libraries:
    pip install selenium pandas requests beautifulsoup4

🏃‍♀️ Usage

To run the scraper, simply execute the Python script from your terminal:

python your_script_name.py

(Replace your_script_name.py with the name you saved the script under, e.g., edx_scraper.py)

The script will:

📊 Output

The edx_institution_courses.csv file will contain two columns: Institution and First Course Offered.

Example edx_institution_courses.csv content:

Institution,First Course Offered
ACCA,Financial Accounting
Harvard University,CS50's Introduction to Computer Science
MIT,Introduction to Computer Science and Programming in Python
...
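
As an illustration of this file layout, here is a minimal sketch that writes and reads rows in the same two-column format. It uses Python's standard csv module to stay dependency-free, which is an assumption for this example only; the scraper itself may well use pandas instead. The function names write_courses_csv and read_courses_csv are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Tuple

# Column names taken from the example header above.
FIELDNAMES = ["Institution", "First Course Offered"]


def write_courses_csv(
    rows: Iterable[Tuple[str, str]],
    path: str = "edx_institution_courses.csv",
) -> None:
    """Write (institution, first course) pairs under the two-column header."""
    # newline="" lets the csv module control line endings on every platform.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(FIELDNAMES)
        writer.writerows(rows)


def read_courses_csv(path: str = "edx_institution_courses.csv") -> List[Dict[str, str]]:
    """Read the file back as a list of dicts keyed by column name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))
```

Reading the file back with read_courses_csv gives one dictionary per institution, which is convenient for filtering or for checking that a run produced the expected columns.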

⚠️ Important Notes

💡 Future Enhancements

⬇️ Download the Code

You can download the Python scraper script from my GitHub repository:

Download edx_scraper.py