Udemy Instructor & Course Scraper
Large-scale extraction of instructor profiles and course metadata from Udemy. Bot detection was bypassed using undetected-chromedriver. The scraper uses a multi-threaded architecture with resumable execution, proxy rotation, and automatic URL discovery via sitemaps.
Problem
Client needed to extract comprehensive instructor profiles and course data from Udemy at scale. Udemy employs aggressive bot detection that blocks standard Selenium-based approaches.
Technical Challenges
- Bypassed bot detection using undetected-chromedriver
- Built multi-threaded architecture for parallel processing
- Implemented resumable execution with URL caching
- Automated instructor URL discovery via Udemy sitemaps
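As a rough illustration of how sitemap discovery, URL caching, and threading can fit together, here is a minimal sketch. It is not the project's actual code. The function names, the `/user/` URL marker, the one-URL-per-line cache file, and the `scrape_one` callback (which stands in for the undetected-chromedriver page fetch) are all assumptions made for the example.

```python
import threading
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Standard sitemap protocol namespace.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_instructor_urls(sitemap_xml: str, marker: str = "/user/") -> list[str]:
    """Return <loc> URLs from a sitemap document that contain `marker`.

    The "/user/" marker is an assumption about how instructor pages are named.
    """
    root = ET.fromstring(sitemap_xml)
    urls = [loc.text.strip() for loc in root.iterfind(".//sm:loc", SITEMAP_NS) if loc.text]
    return [u for u in urls if marker in u]


def load_done(cache_path: Path) -> set[str]:
    """Read the set of already-scraped URLs (hypothetical one-URL-per-line cache)."""
    if not cache_path.exists():
        return set()
    return {line.strip() for line in cache_path.read_text().splitlines() if line.strip()}


def run_resumable(urls, scrape_one, cache_path: Path, workers: int = 4) -> dict:
    """Scrape only URLs missing from the cache, in parallel worker threads.

    `scrape_one(url)` is a placeholder for the real page scraper. A URL is
    appended to the cache only after it succeeds, so failures are retried
    on the next run.
    """
    done = load_done(cache_path)
    pending = [u for u in urls if u not in done]
    results = {}
    lock = threading.Lock()

    def task(url):
        try:
            data = scrape_one(url)
        except Exception:
            return  # not cached, so the next run picks it up again
        with lock:  # serialize writes to the shared dict and the cache file
            results[url] = data
            with cache_path.open("a") as fh:
                fh.write(url + "\n")

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(task, pending))  # consume the iterator to wait for all tasks
    return results
```

Calling `run_resumable` a second time with the same cache file skips everything that already succeeded, which is the behavior the resumable-execution bullet describes. Proxy rotation and browser setup are left out here because they depend on details the write-up does not give.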
Data Extracted
Instructor name, bio, photo, total students, reviews, social links, course titles, lecture counts, ratings, pricing, and content duration.
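One possible way to model the extracted fields is sketched below. The class names, field names, types, and the split between instructor-level and course-level fields are assumptions for illustration. The source lists the fields but does not specify a schema or output format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Course:
    # Course-level fields from the list above (grouping is an assumption).
    title: str
    lecture_count: Optional[int] = None
    rating: Optional[float] = None
    price: Optional[str] = None             # kept as displayed text
    content_duration: Optional[str] = None  # kept as displayed text


@dataclass
class InstructorProfile:
    # Instructor-level fields from the list above (grouping is an assumption).
    name: str
    bio: str = ""
    photo_url: Optional[str] = None
    total_students: Optional[int] = None
    reviews: Optional[int] = None
    social_links: list[str] = field(default_factory=list)
    courses: list[Course] = field(default_factory=list)


def to_json_line(profile: InstructorProfile) -> str:
    """Serialize one profile, including nested courses, as a single JSON line."""
    return json.dumps(asdict(profile), ensure_ascii=False)
```

Optional fields default to `None` so that a profile with a missing element on the page can still be recorded rather than dropped.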