Available for new projects

Get reliable data from any website.

I build scrapers that work on sites others can't crack. Anti-bot bypass, large-scale extraction, automated pipelines that run for years.

10+ Years experience
50+ Clients served
100% Job success (Upwork)
Anti-bot bypass specialist
Production systems running 2+ years
Millions of records extracted
30-day bug fix guarantee
Michal, freelance data engineer

Hi, I'm Michal Búci.

Freelance data engineer with over 10 years of experience building web scrapers and data pipelines. I specialize in the projects other developers can't finish: sites with aggressive anti-bot protection, complex multi-source data aggregation, and systems that need to run reliably for years without babysitting.

I've helped clients whose existing scraping solutions broke when sites added new protection. If you're stuck, I can probably help.

What you get when you work with me

Every project is different, but these are the problems clients come to me with most often.

Large-Scale Extraction

You get clean, structured data from thousands of pages across dozens of sites, all in one consistent format. No gaps, no duplicates, ready for your team to use.

Anti-Bot Bypass

You get the data even when Cloudflare, Akamai, DataDome, or CAPTCHAs stand in the way. If your previous scraping solution got blocked, this is what I do.

Legal & Government Data

You get consistent, structured data from court records, regulatory filings, and tax codes across all 50 states, even though every state has a completely different website.

Automated Data Pipelines

You get fresh data on your schedule without lifting a finger. Daily runs, monitoring, error alerts, and automatic recovery. Set it up once, rely on it for years.

★★★★★

"Had a great experience. Timely and cost-effective delivery of a web-scraper. Kept it running without glitches for the time-period we needed. Handled feedback and changes. Would definitely work with Michael again."

Social Media & News Scraper · Upwork Client

Have a data extraction challenge? Tell me about it and I'll give you an honest assessment.

Discuss Your Project

Real problems I've solved

Each project had unique challenges. Here's how I approached them and what I delivered.

Vacation Rentals · Anti-Bot Bypass

Vacation Rental Price Intelligence System

A property management company needed current pricing and availability data from a major vacation rental platform to optimize their revenue. The platform's enterprise-level anti-bot protection had blocked every scraping tool they tried.

Problem

The client was manually checking competitor pricing across dozens of properties. They needed automated daily data to feed their revenue optimization strategy, but the target site's Akamai and DataDome protection blocked all standard approaches.

Approach

  • Mimicked real browser fingerprints at the network level so anti-bot systems couldn't tell the difference (see the sketch after this list)
  • Built self-healing cookie management that automatically refreshes sessions
  • Rotated through 20+ proxies with sticky sessions for consistency
  • Created a web dashboard for managing properties and exporting data
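
Here's a minimal sketch of how the first three points fit together, using curl_cffi from my toolkit below. The proxy addresses and retry policy are illustrative placeholders, not the client's production code:

```python
import random
from curl_cffi import requests

PROXIES = [
    "http://user:pass@proxy-1.example.com:8000",  # placeholder addresses
    "http://user:pass@proxy-2.example.com:8000",
]

def new_session() -> requests.Session:
    # One sticky proxy per session keeps the IP consistent across requests.
    proxy = random.choice(PROXIES)
    return requests.Session(proxies={"http": proxy, "https": proxy})

def fetch(url: str, retries: int = 3) -> str:
    session = new_session()
    for _ in range(retries):
        # impersonate="chrome" matches a real Chrome TLS/HTTP2 fingerprint,
        # which is exactly what network-level anti-bot checks inspect.
        resp = session.get(url, impersonate="chrome", timeout=30)
        if resp.status_code in (403, 429):
            session = new_session()  # session burned: fresh cookies, fresh proxy
            continue
        resp.raise_for_status()
        return resp.text
    raise RuntimeError(f"Still blocked after {retries} attempts: {url}")
```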

Result

Fully automated pipeline running 3x daily, pulling 365 days of availability and pricing per property. Data syncs directly to the client's Guesty platform. Zero manual intervention since deployment.

365 days of pricing data per property
3x daily fully automated runs
Zero manual intervention needed
★★★★★

"Michal is easy to work with and delivers perfect code. Pleasant communication, too. No complaints. Thank you Michal!"

WordPress Product Feeds · 3-year engagement (Nov 2022 - Sep 2025)

E-Commerce · Distributed Systems

Shopify Tax Compliance Checker

A client needed to verify sales tax collection across thousands of Shopify stores in all 50 US states plus Washington, DC. Manually checking each store in each jurisdiction was impossible at scale.

Problem

The client needed a structured matrix showing which Shopify stores charge sales tax in which states. With thousands of stores and 51 jurisdictions, that meant hundreds of thousands of checkout simulations.

Approach

  • Built a distributed system running on up to 10 cloud servers simultaneously
  • Simulated full checkout flows with realistic browsing behavior
  • Coordinated tasks across every instance with a job queue (see the sketch after this list)
  • Synced results to Google Drive every 15 minutes
  • Designed for fault tolerance: the system resumes automatically after any interruption
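
A simplified sketch of that queue pattern, assuming Redis as the coordinator (it's in my stack below). The key names and check_tax() are illustrative placeholders:

```python
import json
import redis

r = redis.Redis(decode_responses=True)

def check_tax(store: str, state: str) -> dict:
    # Placeholder for the real checkout simulation against one store/state.
    return {"store": store, "state": state, "charges_tax": None}

def enqueue_jobs(stores: list[str], states: list[str]) -> None:
    # Seed one job per (store, state) pair. Checking the "done" set makes
    # re-seeding idempotent, so a restart never duplicates finished work.
    for store in stores:
        for state in states:
            job = json.dumps({"store": store, "state": state})
            if not r.sismember("done", job):
                r.rpush("pending", job)

def worker() -> None:
    while True:
        # Atomically move one job to an in-progress list; if a worker dies,
        # a janitor process pushes "processing" items back onto "pending".
        job = r.lmove("pending", "processing", "LEFT", "RIGHT")
        if job is None:
            break  # queue drained
        task = json.loads(job)
        result = check_tax(task["store"], task["state"])
        r.hset("results", job, json.dumps(result))  # survives any restart
        r.sadd("done", job)
        r.lrem("processing", 1, job)
```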

Result

Delivered a clean CSV matrix: one row per store, one column per state, each cell showing tax status. The system ran unattended for weeks.

50 states covered per store
10 servers running in parallel
Zero data lost on interruption

Government · Anti-Bot Bypass

Government Procurement Data Scrapers

A B2B SaaS company aggregating government procurement data had their existing scrapers blocked by state portal anti-bot protections. I rewrote and hardened 5 scrapers from scratch.

Problem

Five U.S. state procurement portals were blocking the client's existing scrapers. Each portal had different protection: reCAPTCHA, hCaptcha, fingerprinting, and authenticated sessions. Their data pipeline was completely stalled.

Approach

  • Switched to undetectable browser automation that drives a real Chrome instance (see the sketch after this list)
  • Integrated automated CAPTCHA solving for reCAPTCHA v2/v3 and hCaptcha
  • Built a shared base framework so future scrapers take half the time to build
  • Rotated residential proxies with per-session binding
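
A minimal sketch of the browser side, using nodriver from my toolkit below. The portal URL is a placeholder, and the CAPTCHA step is stubbed because every solver service has its own API:

```python
import asyncio
import nodriver as uc

async def scrape_portal(url: str) -> str:
    # nodriver speaks CDP to a real Chrome instance and sets none of the
    # webdriver flags that fingerprinting scripts look for.
    browser = await uc.start()
    page = await browser.get(url)
    await asyncio.sleep(5)  # give the anti-bot JavaScript time to run and pass

    # CAPTCHA handling stubbed: in practice the page's sitekey goes to a
    # solving service and the returned token is injected into the form.
    # await solve_captcha(page)  # placeholder, service-specific

    html = await page.get_content()
    browser.stop()
    return html

if __name__ == "__main__":
    uc.loop().run_until_complete(scrape_portal("https://procurement.example.gov"))
```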

Result

All 5 portals were successfully scraped after the previous developer's solution was blocked. The shared framework cut development time for future scrapers in half, and the client's data pipeline was back online within weeks.

5 of 5 blocked portals now working
50% faster to add new scrapers
Weeks from stalled to production
★★★★★

"Michal worked very diligently and was able to complete the task precisely as asked. He even had the finished products submitted a day before the deadline. I look forward to working with him again!"

Reddit Scraper · Upwork Client

Finance · Regulatory · Crypto

Regulatory & Crypto Data Platform

A fintech company needed daily automated collection of regulatory news, cryptocurrency market data, and SEC filings across dozens of international sources. The system has been running in production for over 2 years.

Problem

The client tracked news from 9 international financial regulators, pricing from 47+ crypto exchanges, and SEC EDGAR filings. Doing this manually consumed their entire data team. They needed a zero-intervention automated system.

Approach

  • Built a modular scraping platform covering 56+ data sources
  • Wrote a deduplication engine that prevents repeat articles across years of data (see the sketch after this list)
  • Automated publishing to WordPress and uploads to AWS S3 and GCS
  • Parsed SEC Form ADV filings: private funds, CRD numbers, PDF brochure downloads
  • Set up monitoring with email alerts for both successes and errors
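
The dedup idea in miniature: hash a normalized fingerprint of each article and skip anything already recorded. SQLite keeps this sketch self-contained; the real system isn't tied to any particular store:

```python
import hashlib
import sqlite3

db = sqlite3.connect("seen_articles.db")
db.execute("CREATE TABLE IF NOT EXISTS seen (fingerprint TEXT PRIMARY KEY)")

def fingerprint(title: str, url: str) -> str:
    # Normalize before hashing so trivial differences (case, whitespace,
    # tracking parameters) can't sneak duplicates through.
    base = title.strip().lower() + "|" + url.split("?")[0].rstrip("/")
    return hashlib.sha256(base.encode()).hexdigest()

def is_new(title: str, url: str) -> bool:
    try:
        db.execute("INSERT INTO seen (fingerprint) VALUES (?)",
                   (fingerprint(title, url),))
        db.commit()
        return True   # first sighting: safe to publish
    except sqlite3.IntegrityError:
        return False  # already published at some point: skip
```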

Result

The platform has run daily for over 2 years with zero downtime. It freed the client's data team to focus on analysis instead of manual data collection.

2+ years zero-downtime operation
56+ sources covered daily
Zero manual intervention needed

Finance · AI / LLM · Open Source

Stock News Scraper for AI Portfolio Analysis

I built a scraper that feeds my daily AI portfolio advisor with fresh stock news. It extracts headlines and full article text from Finviz by ticker, outputs clean Markdown for LLM consumption, and runs as a scheduled cloud service.

Problem

Generic search tools return inconsistent, stale, or irrelevant financial news when used as LLM tools. I needed a reliable daily source of recent articles per ticker, with full-text extraction, formatted for AI consumption.

Approach

  • Mimics a real browser so anti-bot systems can't tell the difference
  • Bypasses Yahoo Finance's consent wall programmatically
  • Detects and skips paywalled sources automatically
  • Converts articles to clean Markdown for LLM context (see the sketch after this list)
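
A minimal sketch of the extract-and-convert step, using curl_cffi and BeautifulSoup from my toolkit below. The paywall hints and selectors are illustrative; each real source needs its own tweaks:

```python
from curl_cffi import requests
from bs4 import BeautifulSoup

PAYWALL_HINTS = ("subscribe to continue", "this article is for subscribers")

def article_to_markdown(url: str) -> str | None:
    html = requests.get(url, impersonate="chrome", timeout=30).text
    soup = BeautifulSoup(html, "html.parser")

    # Skip paywalled pages rather than feed the LLM a teaser paragraph.
    text = soup.get_text(" ", strip=True).lower()
    if any(hint in text for hint in PAYWALL_HINTS):
        return None

    title = soup.title.get_text(strip=True) if soup.title else url
    body = soup.find("article") or soup.body or soup
    paragraphs = [p.get_text(" ", strip=True) for p in body.find_all("p")]

    # Plain Markdown, one H1 plus clean paragraphs: easy for an LLM to read.
    return "\n\n".join([f"# {title}"] + [p for p in paragraphs if p])
```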

Result

Runs daily on Apify with zero server management. My AI advisor queries it every morning for fresh portfolio news, and the data quality is consistently more reliable than what generic search tools return.

Daily automated runs
30 days of news coverage

Need something similar built for your business? I'd be happy to discuss your requirements.

Let's Talk

How it works

Three steps from your first message to reliable data flowing into your systems.

1

Tell me what you need

Share the target website, the data you're after, and how you plan to use it. I'll ask smart follow-up questions, not obvious ones.

2

I assess and quote

Within 24 hours, you get an honest assessment of feasibility, a clear timeline, and a fixed price or hourly estimate. No surprises.

3

I build and deliver

You get clean, structured data in your preferred format. Plus 30 days of bug fixes included, or ongoing maintenance if you need it.

My toolkit

The stack I use on most projects.

Languages

  • Python
  • JavaScript / Node.js
  • SQL

Scraping

  • Playwright
  • Selenium
  • nodriver
  • curl_cffi
  • BeautifulSoup

Anti-Detection

  • Residential Proxies
  • TLS Impersonation
  • CAPTCHA Solving
  • Stealth Plugins

Databases

  • PostgreSQL
  • MySQL
  • Redis
  • MongoDB

Infrastructure

  • DigitalOcean / AWS
  • Docker
  • Linux
  • GitHub Actions

Data & Parsing

  • pandas
  • pdfplumber
  • JSON / CSV / Excel
  • REST APIs

Common questions

How much does a project cost?

It depends on complexity, scale, and anti-bot protection. Here are typical ranges:

  • Single-site scraper (e-commerce, 10k products): $400-800
  • Social media / news scraper (multiple sources): $800-1,500
  • Anti-bot bypass project (Cloudflare/Akamai protected): $1,500-4,000
  • Multi-state government data (50 states, varying structures): $3,000-6,000
  • Ongoing data pipeline (daily runs, monitoring): $500-1,500/month

Most projects fall in the $1,000-2,000 range.

Can you scrape a site that blocks scrapers?

Probably. I specialize in sites that block standard scraping tools, including sites protected by Cloudflare, Akamai, DataDome, PerimeterX, and custom anti-bot systems. If you've tried other scrapers and they got blocked, that's exactly the kind of project I take on.

Send me the URL and I'll give you an honest assessment.

Is web scraping legal?

Generally yes, for publicly available data. The 2022 hiQ v. LinkedIn ruling affirmed that scraping publicly available data does not violate the Computer Fraud and Abuse Act. I don't scrape data behind logins without authorization, and I don't collect personal data for prohibited purposes.

How long does a project take?

Simple scrapers: 2-5 days. Complex projects with anti-bot bypass: 1-3 weeks. Multi-state or large-scale systems: 3-6 weeks. I'll give you a realistic timeline after understanding your requirements.

What happens when the website changes?

Websites change. It happens. For one-time projects, I include a 30-day bug fix period after delivery. For ongoing data needs, I offer maintenance retainers that include monitoring, updates when sites change, and priority support.

What format do you deliver the data in?

Whatever you need: JSON, CSV, Excel, direct database insertion (PostgreSQL, MySQL, MongoDB), API endpoints, or webhook delivery. I'll match your existing systems.

Let's work together

Tell me about your project and I'll get back to you within 24 hours with an honest assessment and timeline.

Based in Europe (CET), with flexible hours and strong US timezone overlap. I typically respond within 24 hours.