2026-05-06

I Built a Personal Data Removal Service. Then I Quit.

There's a moment, early in any scraping project, where everything feels possible. You write a function, point it at a website, and out comes structured data. Addresses, phone numbers, relatives' names. It's almost magic. For a few days, building Traceless felt like that.

Traceless was a personal data removal service. The idea was simple: your personal information is scattered across hundreds of data broker websites -- Spokeo, Whitepages, BeenVerified, and dozens of sites you've never heard of -- and for most people there's no good way to get it taken down. You could pay someone like DeleteMe or Optery to do it, or you could spend weeks manually submitting opt-out forms to each site. I wanted to build the third option: an AI-powered system that found your data, scored each match for confidence, and submitted opt-out requests on your behalf.

The first ten scrapers were fun. ZabaSearch, FastPeopleSearch, TruePeopleSearch -- these sites have basic bot checks. Rotate the user agent, add a delay, parse the DOM. Done. The feedback loop is addictive: you write 50 lines of TypeScript, run the scanner, and watch your name and home address come back from a website that sells it to anyone.
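
For a sense of scale, here's roughly what one of those early scrapers looks like -- a minimal sketch assuming Node 18+ (for global `fetch`) and cheerio, with a made-up broker URL and CSS selectors rather than any real site's markup:

```ts
// Minimal Level 1-2 broker scraper sketch. The URL and selectors below are
// hypothetical -- every broker's markup is different and needs its own rules.
import * as cheerio from "cheerio";

const USER_AGENTS = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.4 Safari/605.1.15",
];

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

interface BrokerMatch {
  name: string;
  location: string;
  profileUrl: string;
}

async function scanBroker(firstName: string, lastName: string): Promise<BrokerMatch[]> {
  // Rotate the user agent and wait a few seconds between requests.
  const ua = USER_AGENTS[Math.floor(Math.random() * USER_AGENTS.length)];
  await sleep(2000 + Math.random() * 3000);

  // Hypothetical search endpoint -- the real one differs per broker.
  const url = `https://example-broker.com/search?fn=${firstName}&ln=${lastName}`;
  const res = await fetch(url, { headers: { "User-Agent": ua } });
  const $ = cheerio.load(await res.text());

  // Parse the DOM into structured matches (selectors are placeholders).
  return $(".search-result")
    .map((_, el) => ({
      name: $(el).find(".name").text().trim(),
      location: $(el).find(".location").text().trim(),
      profileUrl: $(el).find("a").attr("href") ?? "",
    }))
    .get();
}
```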

Then I hit Spokeo.

Spokeo uses PerimeterX, a behavioral analysis system that watches how you move through a page. Not just what requests you make, but how fast, in what order, with what timing. It can tell a real user from a script within seconds. To beat it, you have to simulate human behavior at the browser level: random scroll increments with variable pauses, per-keystroke typing delays, randomized viewports, WebGL vendor spoofing, timezone rotation. And you have to route all of it through residential proxies, because datacenter IPs are flagged immediately.
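
The behavior layer ends up looking something like this -- a sketch assuming Playwright and a residential proxy provider, with placeholder proxy credentials, URLs, and selectors; the WebGL vendor spoofing needs a separate init script that I'm leaving out here:

```ts
// Human-behavior simulation sketch: randomized viewport and timezone per
// session, scrolling in small random increments, per-keystroke typing delays,
// all routed through a residential proxy. Credentials and selectors are placeholders.
import { chromium } from "playwright";

const rand = (min: number, max: number) => min + Math.random() * (max - min);

async function humanScan(targetUrl: string, query: string): Promise<string> {
  const browser = await chromium.launch({
    proxy: {
      server: "http://proxy.example-residential.net:8000", // placeholder provider
      username: "user",
      password: "pass",
    },
  });

  // Randomize the viewport and timezone for each session.
  const context = await browser.newContext({
    viewport: { width: Math.floor(rand(1280, 1680)), height: Math.floor(rand(720, 1000)) },
    timezoneId: ["America/New_York", "America/Chicago", "America/Denver"][Math.floor(rand(0, 3))],
  });
  const page = await context.newPage();
  await page.goto(targetUrl, { waitUntil: "domcontentloaded" });

  // Scroll in small random increments with variable pauses, like a reader would.
  for (let i = 0; i < 6; i++) {
    await page.mouse.wheel(0, rand(200, 600));
    await page.waitForTimeout(rand(400, 1500));
  }

  // Type the query with per-keystroke delays instead of filling it instantly.
  const searchBox = page.locator("input[name='q']"); // hypothetical selector
  await searchBox.click();
  await searchBox.pressSequentially(query, { delay: rand(80, 180) });
  await page.keyboard.press("Enter");

  await page.waitForTimeout(rand(2000, 4000));
  const html = await page.content();
  await browser.close();
  return html;
}
```

No single knob here is the trick; the point is that every one of them is a signal a behavioral model can start scoring differently after its next update.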

This is where the project stopped being fun and started being a maintenance contract.

The bot difficulty scale I ended up building has five levels. Level 1-2 sites work fine without proxies, around 95% success. Level 3 sites need residential proxies and fingerprint patching, but you can reach 80%. Level 4 sites like Spokeo and Whitepages need all of that plus slow, human-like interactions, and you still only hit 60-70%. Level 5 sites are effectively out of reach for automation; you flag those as manual and move on.
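
In code, the scale is really just a routing table from difficulty level to scraping strategy -- this is an illustrative sketch, not the actual Traceless schema:

```ts
// Illustrative mapping from bot-difficulty level to scraping strategy and the
// rough success rates described above. Names are made up for this sketch.
type Strategy = "plain-http" | "residential-proxy" | "human-behavior" | "manual";

interface DifficultyTier {
  levels: number[];
  strategy: Strategy;
  expectedSuccess: string;
}

const TIERS: DifficultyTier[] = [
  { levels: [1, 2], strategy: "plain-http",        expectedSuccess: "~95%" },
  { levels: [3],    strategy: "residential-proxy", expectedSuccess: "~80%" },
  { levels: [4],    strategy: "human-behavior",    expectedSuccess: "60-70%" },
  { levels: [5],    strategy: "manual",            expectedSuccess: "flagged for manual handling" },
];

// Pick the strategy for a broker given its assigned difficulty level.
function strategyFor(level: number): Strategy {
  return TIERS.find((tier) => tier.levels.includes(level))?.strategy ?? "manual";
}
```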

When I ran the full stack in production, I scanned 49 brokers and 39 came back successful. Ten were blocked. That's a 20% block rate -- right at the upper threshold I'd set as acceptable, with no degradation headroom, on day one.

The other number that stuck with me: 841. That's how many unique data broker sites I catalogued while building a prototype scanner. Of those, 657 are accessible via plain HTTP. The rest are dead, redirected, or blocked. To build a genuinely comprehensive removal service, you'd need scrapers for all of them. I built 51. Optery covers 65+. DeleteMe covers 750+. The gap isn't a product gap -- it's an operational one. Someone has been maintaining scrapers for those 700 additional sites for years.

I signed up for Optery while building Traceless, and I learned more from their marketing emails than from any competitive analysis I could have done. Their first email showed screenshots of my actual data on real sites: InmatesSearcher, PropertyRecs, TexasCourtRecords.us. Not a list of site names -- screenshots. Seeing your address displayed on a site called InmatesSearcher is visceral in a way that "your data is on 68 sites" is not. That's the insight that drives their conversion. I built the same thing for Traceless -- screenshot capture during the scan, PDF exposure report, a three-email drip sequence escalating from informational to fear-based. The code works. The insight was right.
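
The drip sequence itself is a small data structure handed to a scheduler -- here's an illustrative sketch, with made-up subject lines and timings rather than the real copy:

```ts
// Illustrative three-step drip sequence escalating from informational to
// fear-based. Offsets, tones, and subjects are placeholders for this sketch.
interface DripStep {
  dayOffset: number; // days after the exposure scan completes
  tone: "informational" | "urgent" | "fear-based";
  subject: string;
  attachExposureReport: boolean; // the PDF built from scan screenshots
}

const DRIP_SEQUENCE: DripStep[] = [
  { dayOffset: 0, tone: "informational", subject: "Your exposure report is ready", attachExposureReport: true },
  { dayOffset: 3, tone: "urgent", subject: "Your address is still listed on data broker sites", attachExposureReport: false },
  { dayOffset: 7, tone: "fear-based", subject: "Anyone can look this up about you right now", attachExposureReport: false },
];
```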

But here's what I kept coming back to: the moat in this business isn't the product insight. It's the scraper maintenance.

Optery and DeleteMe aren't winning because they have a smarter matching algorithm or a more beautiful dashboard. They're winning because they've been running the anti-bot arms race longer. PerimeterX updates its behavioral models. Cloudflare ships new challenge pages. The techniques that work today might fail after the next model update. Staying current isn't a launch problem -- it's an ongoing operational cost that compounds over time.

A useful way to think about it: the first 50 scrapers cost about as much to build as the next 50. But the incumbents have years of accumulated scraper history. That's not an advantage you can buy at launch; you accumulate it by doing the work.

I killed the project in late March. Ran a proper strategic analysis -- the kind where you lay out eight different business paths and score them against your actual constraints. Traceless came out seventh of eight. The core problem: B2C pricing, crowded market, no existing audience, long build cycle. At $99 per year, I'd need over 3,600 paying customers for $30,000 per month. With no distribution and a product that takes months to prove its value, that's a very slow ramp.

Two days later, I revived it. That's the part I'm least proud of, in hindsight. The code was working. There was a running deployment. Walking away from something that works requires a different kind of discipline than abandoning something that failed. I went back and shipped the conversion engine in a single session, deployed it, and left it live.

Then I moved on to other things.

The other things turned out to be more interesting. Projects where I can go from idea to deployed in a day, get feedback in a week, and decide whether to continue based on something real. Data removal is the opposite of that. It's slow validation, slow growth, slow everything.

I'm shutting the site down. The full stack -- Vercel frontend, Railway backend, 49 broker scrapers -- is going offline. I'm not the right person to turn this into a business, and keeping it running for the sake of keeping it running doesn't make sense. The people who win in this space are the ones who treat scraper maintenance as a core competency, not a side effect of the real product.

The lesson, I think, is this: when you hit walls early in a project, it's worth asking whether they're accidental complexity or essential complexity. Accidental complexity is the problems you made for yourself -- bad architecture, the wrong abstraction. Essential complexity is inherent to the problem itself. The anti-bot arms race isn't an engineering problem you can solve once. It's a permanent feature of the landscape. If you build in this space, you sign up for it indefinitely.

I decided I'd rather build in spaces where most of the walls are accidental.