GitHub · MAY 28, 2026 · OPEN SOURCE · AI AGENTS

Browser-Use Hits 96K Stars. The Agent Layer Is the New War.

browser-use at 96K stars in 14 months. Stagehand at 22.8K. Browserbase at $67.5M raised. The browser-agent infrastructure layer is the new fight — and most of it still breaks on a Cloudflare check.

By Kadin Nestler · May 28, 2026 · 12 min read


Browser-agent infrastructure: who's shipping what

1. Anthropic Computer Use: Pixel-driven · Claude 4.7 hits 78.0% on OSWorld-Verified · Released Oct 2024
2. browser-use (Müller / Zunic): Open source · Python · MIT · YC W25 · 89.1% WebVoyager · 96K stars · $17M seed
3. Stagehand (Browserbase): TypeScript SDK · MIT · DOM-aware · v3 shipped Q1 2026 · 22.8K stars · $67.5M total
4. OpenAI ChatGPT Agent: Folded into ChatGPT Agent · 68.9% on BrowseComp · Operator killed Aug 31, 2025
5. WebVoyager benchmark: Magnitude SOTA at 93.9% · browser-use at 89.1% · 586 live web tasks

The 14-month sprint nobody saw coming

When Anthropic shipped Computer Use in October 2024, the framing was modest — a beta capability that let Claude move a mouse and read a screen. Demos were rough. Latency was bad. The model would mis-click buttons, hallucinate dropdowns, and occasionally try to fill a search box with the contents of a different tab. The reaction in the AI infra crowd was split. Half called it a research artifact. The other half saw the same thing the Anthropic team did: this was the door opening on a category that had been waiting a decade.

Fourteen months later, the category has a name — browser-agent infrastructure — and a leaderboard. The open-source library browser-use from a two-person Zurich team is sitting at 96,000 stars on GitHub at the time of writing, with 10,800 forks and an MIT license. Browserbase has raised $67.5M across seed, Series A, and Series B, with the most recent round in mid-2025 valuing the company at $300M post-money. Their open-source SDK, Stagehand, is at 22.8K stars and shipped v3 earlier this year with a redesigned agent primitive.

Meanwhile OpenAI launched Operator in January 2025, folded it into ChatGPT Agent in July 2025, and deprecated the standalone product on August 31, 2025. The category moved that fast. The post-mortem on Operator is instructive — it shipped at 38.1% on OSWorld and 58.1% on WebArena, and OpenAI's own messaging admitted users were routing the wrong queries to it. ChatGPT Agent now hits 68.9% on BrowseComp, and the Operator brand is gone.

That is the chronology. The strategic read is more interesting: every major AI lab and a half-dozen well-funded startups have decided the browser is the surface where agentic AI either becomes useful or stays a research toy. The fight to own the layer between an LLM and a Chromium instance is the new infrastructure war. Most of the players will lose. The two open-source contenders worth watching are browser-use and Stagehand, and the architectural choice they made about how an agent sees a webpage is what separates the survivors from the casualties.

Pixel-driven vs DOM-aware: the architectural fork

Every browser agent has to answer one question before it does anything else. When the agent looks at a page, what does it look at?

There are two answers. The first is pixel-driven — the agent takes a screenshot, the multimodal LLM reads the image, and the agent outputs coordinate-based actions. Click at (847, 312). Type into the field at (445, 580). This is the Anthropic Computer Use approach, and it is also what OpenAI Operator used. The advantage is universality: the agent can interact with literally anything visible on screen, including non-web apps, Flash plugins, embedded canvas elements, screen-share windows. The downside is brutal — coordinate-based clicks fail constantly on responsive layouts, the screenshot-to-action loop is slow (Anthropic's own demos run at one action every 5-10 seconds), and the agent has no notion of semantic structure. It is looking at pixels and pattern-matching against its training corpus of web UIs.

The second answer is DOM-aware — the agent reads the actual rendered DOM tree, identifies interactable elements by tag and attribute, and outputs structured actions like "click the button labeled 'Submit Order'." This is what browser-use does, and it is what Stagehand does. The advantage is precision: clicking by selector is deterministic, latency drops to 1-3 seconds per action, and the agent has a real model of what is on the page. The cost is that the agent only works inside a browser context — no native apps, no PDFs that the browser hasn't rendered, no screen-share. The agent is a webpage agent, not a computer agent.

Both libraries hedge slightly. browser-use accepts an optional screenshot alongside the DOM snapshot for cases where layout matters more than structure. Stagehand processes the DOM but uses chunking and ranking to reduce token spend on large pages. Neither is purely one or the other. But the architectural commitment is real — both teams bet that for the 90% of valuable agent work that happens inside a browser, DOM beats pixels on speed, cost, and reliability.
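The fork is easier to see in miniature. The sketch below is a toy model, not the API of either library or of Computer Use: `Element`, `click_by_coordinate`, and `click_by_label` are hypothetical names, and the page is just a list of rectangles. It shows why a coordinate that was correct on one layout can land on the wrong button after a reflow, while a lookup by tag and label still resolves.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Element:
    """Toy stand-in for one interactable DOM node (hypothetical type)."""
    tag: str
    label: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def click_by_coordinate(page: List[Element], px: int, py: int) -> Optional[Element]:
    """Pixel-driven: whatever sits under (px, py) receives the click."""
    for element in page:
        if element.contains(px, py):
            return element
    return None


def click_by_label(page: List[Element], tag: str, label: str) -> Optional[Element]:
    """DOM-aware: resolve the target by tag and visible label, ignoring position."""
    for element in page:
        if element.tag == tag and element.label == label:
            return element
    return None


# Layout the agent saw when it chose the coordinate (847, 312).
desktop = [
    Element("button", "Cancel", 700, 300, 120, 40),
    Element("button", "Submit Order", 830, 300, 160, 40),
]

# Same page after a reflow: the buttons now stack, and Cancel sits
# where Submit Order used to be.
reflowed = [
    Element("button", "Cancel", 830, 300, 160, 40),
    Element("button", "Submit Order", 830, 350, 160, 40),
]

print(click_by_coordinate(desktop, 847, 312).label)                # Submit Order
print(click_by_coordinate(reflowed, 847, 312).label)               # Cancel
print(click_by_label(reflowed, "button", "Submit Order").label)    # Submit Order
```

A real DOM-aware agent resolves elements through the browser rather than a Python list, and labels can be ambiguous or change too, so this illustrates the failure mode rather than measuring it.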

The benchmark data backs the bet. On WebVoyager (586 live web tasks across 15 sites, the de facto standard for browser-agent reliability), browser-use hits 89.1% task completion. Stagehand's agent primitive lands around 75% in third-party comparisons. The pixel-driven incumbents are nowhere close on equivalent web tasks. Anthropic Computer Use's recent gains have come on OSWorld-Verified, where Claude Opus 4.7 hit 78.0% in early 2026, but OSWorld is a different test: it measures desktop computer tasks on Ubuntu rather than focused browser interaction.

What the funding rounds actually bought

Browserbase raised a $40M Series B in June 2025 at a $300M post-money valuation, led by Notable Capital with CRV and Kleiner Perkins doubling down. The round brought total funding to $67.5M in fifteen months. Patrick Collison, Jeff Lawson, and Guillermo Rauch joined as angels. The pitch was not Stagehand directly. The pitch was the underlying infrastructure — headless Chromium-as-a-service with stealth fingerprinting, residential proxy rotation, persistent session storage, and a control plane built to run thousands of parallel browser instances on demand. Stagehand is the SDK that makes the infra usable; the infra is the moat.

browser-use raised a $17M seed round in March 2025, led by Felicis with A Capital, Nexus Ventures, Y Combinator (the company was YC W25), Paul Graham, Liquid2, SV Angel, and Pioneer Fund. The seed-stage thesis was different: browser-use is positioned as the open-source standard, with a commercial cloud product layered on top for teams that want stealth, parallel execution, and managed sessions without running the infra themselves. The founders, Magnus Müller and Gregor Zunic, met at ETH Zurich's Student Project House while working on web-scraping tools during their data science master's studies. The library shipped in late 2024 and crossed 79K stars by the time of the seed announcement; it is now at 96K, putting it in the top-10 fastest-growing AI infra repos of 2025.

The customer lists overlap and tell the same story. browser-use lists Airbnb, Amazon, and Anthropic among its production users. Browserbase customers include Perplexity, Cognition, and a long tail of vibe-coding startups that need a browser their agents can drive. Both companies are selling the same picks-and-shovels economics — when every AI agent needs a browser, the team that owns the browser layer collects rent on the whole sector.

Where the agent layer still trips

The honest read, after running both libraries against real production sites for the better part of 2025, is that the demos promise more than production delivers.

WHAT EVERY BROWSER AGENT STILL TRIPS ON

The gap between the WebVoyager benchmark score and the production reliability number is real, and predictable. Browser agents reliably fail on Cloudflare bot challenges (which by default now stop a headless browser on first contact across ~20% of the public web), unexpected modal dialogs that hijack focus, infinite-scroll containers that load state lazily, OAuth handoffs that open in popup windows the agent loses track of, and any flow where session state persists across multiple tabs. The benchmark sites are picked partly because they don't do these things. Your production targets will. Budget for failure rates in the 30-50% range on uninstrumented sites, even with the best libraries.

Cloudflare alone is the single largest reliability cliff. The company has deployed AI Labyrinth, pay-per-crawl walls, and tuned bot-detection models specifically targeting Chromium DevTools Protocol fingerprints in 2025. A naked browser-use agent walking onto a Cloudflare-protected target gets a CAPTCHA on first contact and stalls. The workarounds are infrastructure-level, not library-level — residential proxy rotation, browser fingerprint spoofing, session warming on real human-like traffic patterns. Browserbase sells exactly this stack as a premium tier. browser-use cloud offers a similar wrapper. Running either library against the open web without one of these stacks is asking for a 50%+ failure rate.
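Before reaching for proxies, it helps to at least know when a run died on a challenge page rather than on the task itself. The sketch below is a heuristic under stated assumptions, not a feature of browser-use or Stagehand. `looks_like_cloudflare_challenge` is a hypothetical helper. The `cf-mitigated: challenge` response header is the signal Cloudflare documents for challenge responses; the `Just a moment` title and the 403/503 status check are assumptions based on commonly observed interstitials and may change without notice.

```python
from typing import Mapping


def looks_like_cloudflare_challenge(
    status: int, headers: Mapping[str, str], html: str
) -> bool:
    """Best-effort guess at whether a response is a Cloudflare challenge page."""
    normalized = {key.lower(): value.lower() for key, value in headers.items()}

    # Documented signal: Cloudflare marks challenge responses with this header.
    if normalized.get("cf-mitigated") == "challenge":
        return True

    # Fallback heuristic (assumption): Cloudflare-served error status plus
    # the interstitial page title.
    served_by_cloudflare = normalized.get("server", "") == "cloudflare"
    blocked_status = status in (403, 503)
    interstitial_title = "<title>just a moment" in html.lower()
    return served_by_cloudflare and blocked_status and interstitial_title


if __name__ == "__main__":
    blocked = looks_like_cloudflare_challenge(
        403, {"Server": "cloudflare", "CF-Mitigated": "challenge"}, ""
    )
    print("challenge detected:", blocked)
```

Tagging these runs separately keeps the failure statistics honest: a challenge is an infrastructure problem to route to the proxy layer, not evidence that the agent misread the page.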

The second cliff is modal dialogs and dynamic state. Both libraries handle the happy path well — page loads, the agent reads the DOM, the agent clicks. The unhappy path — a cookie banner that wasn't there a second ago, a session-timeout modal, a region-blocked redirect, a captcha intermezzo, an A/B-tested layout variant that the LLM has never seen — produces stuck agents that either retry forever or quit with vague errors. The GitHub issue queues on both projects are full of these. The libraries are improving, but the fundamental problem is that an LLM-driven loop has no equivalent of Playwright's waitForLoadState('networkidle') that always works. The agent's notion of "the page is ready" is a guess based on what the DOM looked like a second ago.
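One common mitigation is to make the guess explicit: poll the page and treat it as ready only after it has stopped changing for a few polls in a row. The sketch below is a generic heuristic, not how either library decides readiness. `wait_until_dom_stable` and its parameters are hypothetical, and `get_snapshot` is any callable that returns the current page markup as a string.

```python
import hashlib
import time
from typing import Callable, Optional


def wait_until_dom_stable(
    get_snapshot: Callable[[], str],
    quiet_polls: int = 3,
    poll_interval: float = 0.25,
    timeout: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once `quiet_polls` consecutive polls match the previous snapshot.

    Returns False if the page never settles before `timeout` seconds elapse.
    """
    deadline = time.monotonic() + timeout
    last_digest: Optional[str] = None
    unchanged = 0

    while time.monotonic() < deadline:
        digest = hashlib.sha256(get_snapshot().encode("utf-8")).hexdigest()
        if digest == last_digest:
            unchanged += 1
            if unchanged >= quiet_polls:
                return True
        else:
            # Something moved (banner, modal, lazy content): restart the count.
            unchanged = 0
            last_digest = digest
        sleep(poll_interval)

    return False


if __name__ == "__main__":
    # Simulated page: two transient states, then it settles.
    frames = iter(["<p>loading</p>", "<p>cookie banner</p>"] + ["<p>ready</p>"] * 10)
    print(wait_until_dom_stable(lambda: next(frames), sleep=lambda _: None))  # True
```

With Playwright's Python sync API, `get_snapshot` could plausibly be `page.content`. This remains a guess too: a page with a ticking clock or rotating ad never settles and will hit the timeout, which is why the function reports that case instead of hiding it.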

The third cliff is multi-tab and popup OAuth flows. Most useful agent work involves logging into something. Most login flows involve a popup, an OAuth redirect, or a token-bearing iframe. Both browser-use and Stagehand have improved support for multi-tab orchestration in their 2026 releases, but anyone who has run a serious agent through a real OAuth flow knows the failure rate is higher than the docs admit. The agent loses track of which tab has focus, the parent context goes stale, and the OAuth callback either lands in the wrong session or doesn't fire at all.

The fourth, and possibly most important: agents still cannot reliably tell you when they have failed. A browser-use run that clicks the wrong button on a checkout flow will happily complete with status "success." Verifying agent output requires a separate verification layer — a deterministic script that confirms the order actually placed, the email actually sent, the form actually submitted. This is the part of agent-driven automation that nobody has solved at the library level, because it is necessarily domain-specific. Every production deployment of a browser agent in 2026 includes a custom verification harness wrapped around it. The libraries got the easy 80%. The verification problem is the remaining 20% and it is not getting solved soon.
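The shape of such a harness is simple even though the checks inside it are domain-specific. The sketch below is a minimal illustration with hypothetical names (`VerifiedResult`, `run_with_verification`): the agent run and the verifier are plain callables, and the verifier is assumed to check ground truth through a channel the agent did not control, such as an orders API or a database query.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class VerifiedResult:
    agent_reported_success: bool
    verified: bool

    @property
    def trustworthy_success(self) -> bool:
        """Count a run as successful only if the independent check agrees."""
        return self.agent_reported_success and self.verified


def run_with_verification(
    run_agent: Callable[[], bool],
    verify: Callable[[], bool],
) -> VerifiedResult:
    """Run the agent, then confirm the outcome with a deterministic check."""
    reported = run_agent()
    # Verify regardless of what the agent reported: a run marked "failed"
    # may still have placed the order, and that matters for retries.
    verified = verify()
    return VerifiedResult(agent_reported_success=reported, verified=verified)


if __name__ == "__main__":
    placed_orders = set()  # stand-in for the real system of record

    def careless_agent() -> bool:
        # Simulates clicking the wrong button: nothing is written,
        # but the run still reports success.
        return True

    def order_exists() -> bool:
        return "order-123" in placed_orders

    result = run_with_verification(careless_agent, order_exists)
    print(result.agent_reported_success)  # True
    print(result.trustworthy_success)     # False
```

Keeping both flags, rather than collapsing them into one, is a deliberate choice here: the disagreement rate between what the agent reports and what the verifier finds is itself the number worth tracking.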

Why the category still wins anyway

The structural argument for browser-agent infrastructure is not that the agents are reliable today. They are not. The argument is that they are 5-10x more reliable than they were when Computer Use shipped 14 months ago, the gap between benchmark and production is narrowing, and the alternative — hand-written Playwright scripts that break every time a target site ships a CSS change — has costs that compound the other direction.

The production calculus most teams are running looks like this. A Playwright script written by a senior engineer in 2026 takes about 4 hours to ship and breaks roughly every 30 days, requiring another hour to fix. Annualized maintenance: 12 hours per script. A browser-use or Stagehand agent for the same task takes about 30 minutes to ship, completes the task 75-89% of the time on the first try, and survives target-site UI changes that would have killed the Playwright version. Annualized maintenance: closer to 2 hours, plus the LLM inference cost per run. For tasks that need to run once a week against a UI you don't control, the math swings hard toward the agent. For tasks that run once a second against a UI you do control, Playwright still wins.
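The arithmetic above can be written down directly. The hour figures in the sketch come from the paragraph; the hourly rate and per-attempt inference cost are placeholder assumptions, not measured numbers, and `break_even_runs` is a hypothetical helper that assumes independent attempts retried until one succeeds (so the expected cost of a successful run is the per-attempt cost divided by the success rate).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Approach:
    build_hours: float
    maintenance_hours_per_year: float

    def first_year_hours(self) -> float:
        return self.build_hours + self.maintenance_hours_per_year


# Playwright: 4 hours to ship, breaks about every 30 days, 1 hour per fix.
playwright = Approach(build_hours=4.0, maintenance_hours_per_year=12 * 1.0)

# Agent: 30 minutes to ship, roughly 2 hours of upkeep per year.
agent = Approach(build_hours=0.5, maintenance_hours_per_year=2.0)

hours_saved = playwright.first_year_hours() - agent.first_year_hours()


def break_even_runs(
    hours_saved: float,
    hourly_rate: float,
    cost_per_attempt: float,
    success_rate: float,
) -> float:
    """Runs per year at which inference spend cancels the engineering time saved."""
    cost_per_successful_run = cost_per_attempt / success_rate
    return hours_saved * hourly_rate / cost_per_successful_run


if __name__ == "__main__":
    print(playwright.first_year_hours())  # 16.0
    print(agent.first_year_hours())       # 2.5
    print(hours_saved)                    # 13.5

    # Placeholder assumptions: $100/hour engineer, $0.10 per agent attempt,
    # and the low end of the 75-89% first-try success range.
    runs = break_even_runs(hours_saved, hourly_rate=100.0,
                           cost_per_attempt=0.10, success_rate=0.75)
    print(round(runs))
```

Under those placeholder figures the break-even lands at roughly ten thousand runs a year. A weekly task is 52 runs, far below it; a once-a-second task is about 31.5 million runs, far above it. Different rates move the line, but not the direction of the conclusion.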

This is why the category gets to keep its valuations even with the reliability gap. The structural shift is real. The teams shipping browser agents in production are not betting on perfect reliability today. They are betting that the LLM-driven layer plus the verification harness compounds faster than the deterministic-script ecosystem. So far the data supports the bet.

What is genuinely uncertain is which library wins the open-source category and which infrastructure provider wins the cloud layer underneath it. browser-use has the star count and the Python audience, which matters because Python is the language most AI engineers ship in. Stagehand has the better TypeScript story and the Browserbase infra integration, which matters because frontend-aware teams build with Node and want one vendor for both pieces. Both are open-source MIT. Both ship every two weeks. The winner is going to be decided by who gets the verification problem right first, because that is the next ceiling on production adoption.

What to do this week

If you are evaluating browser-agent infrastructure for a production deployment, the practical sequence is simple. Pick one library — browser-use if your stack is Python, Stagehand if it is TypeScript. Build a single end-to-end agent against a non-Cloudflare target as a baseline. Measure the gap between your benchmark success rate and your production success rate against your real target site. That gap is the budget you owe to the verification layer, the proxy infrastructure, and the retry logic. Most teams under-budget it by 5x.
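Measuring that gap takes very little code. The sketch below uses hypothetical helpers (`success_rate`, `attempts_for_target`) and made-up run outcomes; the only figure taken from this article is the 89.1% WebVoyager score. The retry estimate assumes attempts are independent and that a verification step can tell a failed attempt from a successful one, which is a strong assumption for failures like a persistent bot challenge.

```python
from typing import List


def success_rate(outcomes: List[bool]) -> float:
    """Fraction of verified-successful runs in a batch."""
    if not outcomes:
        raise ValueError("need at least one run to measure a success rate")
    return sum(outcomes) / len(outcomes)


def attempts_for_target(per_attempt_success: float, target: float) -> int:
    """Smallest number of attempts n such that 1 - (1 - p)^n >= target."""
    if not 0.0 < per_attempt_success <= 1.0:
        raise ValueError("per_attempt_success must be in (0, 1]")
    if not 0.0 < target < 1.0:
        raise ValueError("target must be in (0, 1)")

    attempts = 0
    all_failed = 1.0  # probability that every attempt so far has failed
    while 1.0 - all_failed < target:
        all_failed *= 1.0 - per_attempt_success
        attempts += 1
    return attempts


if __name__ == "__main__":
    benchmark = 0.891  # published WebVoyager score for browser-use

    # Made-up example: 20 verified runs against a real target, 11 succeeded.
    production_runs = [True] * 11 + [False] * 9
    production = success_rate(production_runs)

    print(f"production success rate: {production:.2f}")
    print(f"gap vs benchmark: {benchmark - production:.3f}")

    # Attempts needed to reach 99% overall success at each per-attempt rate.
    print(attempts_for_target(0.891, 0.99))  # 3
    print(attempts_for_target(0.5, 0.99))    # 7
```

Going from three attempts to seven is a concrete view of what the gap costs in retries alone, before proxies and verification are counted.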

The category is real. The reliability is improving. The fight to own the layer is going to define what agentic AI looks like in production for the next three years. Browser-use and Stagehand are the two open-source bets that matter. Bet accordingly.

"The browser is the surface. The agent is the actor. The verification layer is the part nobody talks about and the part that decides whether any of this ships."


Cite this article

Ascero AI. “Browser-Use Hits 96K Stars. The Agent Layer Is the New War.” May 28, 2026. https://asceroai.com/news/browser-use-stagehand-agent-frontier-2026

Free to reference with attribution and a link back to this page.
