the differentiator

The stack.

Most AI agencies resell Vapi or Bland and pay a per-minute platform tax on every client call. Ascero AI owns the stack from telephony to memory — open-source, multi-model, multi-cloud. The result: flat-rate client pricing, no single-vendor lock-in, and a margin curve that improves with scale instead of getting worse.

Below is the actual architecture, with every component named and linked. No black boxes.

Voice layer

Owned voice receptionist — Pipecat + LiveKit on Twilio

Most AI agencies resell Vapi or Bland and pay a per-minute platform tax on every call. Ascero AI runs Pipecat (open-source real-time voice orchestration) on LiveKit (WebRTC transport) with Twilio handling the PSTN interconnect. Your receptionist runs on our infra, not someone else's billing meter — so margins compound as you grow.

Pipecat

Real-time voice agent orchestration (open source)

LiveKit Agents

WebRTC transport for sub-300ms latency voice

Twilio Programmable Voice

Inbound + outbound PSTN telephony

Why this matters: Per-minute platform fees compound to the wrong side of the trade as call volume grows. Owning the voice stack lets Ascero AI offer flat-rate pricing instead of usage-meter pricing, which the research shows is the #1 SMB conversion lift in the AI receptionist category.

Reasoning layer

Multi-model routing across Anthropic, OpenAI, Google

Every model has a different shape. Claude is the default for nuanced reasoning, multi-step planning, and the Ascero chat agent. GPT-realtime carries the voice receptionist's conversational layer. Gemini handles long-context document reasoning and Google Workspace integration. Switching the model is a config change, not a rewrite.

Anthropic Claude

Default reasoning + agent workflows

OpenAI GPT + Realtime + Whisper

Voice, vision, transcription, embeddings

Google Vertex AI / Gemini

Long-context + Workspace integrations

NVIDIA NIM

Self-hosted inference for hybrid + on-prem clients

Why this matters: Single-model agencies are downstream of one vendor pricing, latency, and policy decision. Multi-model routing keeps Ascero AI insulated from any single provider — and lets each task land on the model that does it best.

Browser + research layer

Browser-native agents — Stagehand + browser-use

Half of small-business work happens inside web portals: insurance carriers, MLS listings, scheduling tools, e-commerce admins. Stagehand v3 (Browserbase) handles deterministic, reliability-critical flows like booking and checkout. browser-use handles exploratory research and data-entry tasks via accessibility-tree navigation — no XPath, no brittle selectors. Together they replace anywhere a human currently logs into a portal.

Browserbase Stagehand v3

Hybrid AI + Playwright for reliability-critical flows

browser-use

LLM-native browser via accessibility trees

Why this matters: Outsourced VAs cost $1,200-$3,000/mo per worker and break the moment a portal redesigns. Browser-native agents are deterministic where it matters (checkout, booking) and resilient where it doesn't (research). Ascero AI productizes this as a deployable tier.

Memory + ops layer

Persistent agency memory — claude-mem + multica

Every Ascero AI client engagement gets its own persistent memory store. The chat agents, the receptionist, the workflows all share context across sessions instead of cold-starting every conversation. multica runs the internal agency ops — content drafts, outbound campaigns, build tasks — assigned to agents like a Linear board assigns to humans.

claude-mem

Persistent context across Claude Code / Codex sessions

multica

Agent task assignment + tracking (Linear-for-agents)

Why this matters: Most AI agencies lose context between sessions and re-explain every project from scratch. Persistent memory is the difference between an agency that gets better over time and one that re-introduces itself every Monday.

Tool + skill layer

Skill marketplaces — anthropics/skills + curated agency stack

Ascero AI ships every client engagement with a curated skill set: official Anthropic skills (document generators, code skills) plus an Ascero-specific layer (vertical-tuned receptionist scripts, lead-qualification logic, ROI math templates). Skills compose like Lego bricks — every project starts further down the cost curve than the last.

anthropics/skills

Official Skills repo (doc generation, code skills)

addyosmani/agent-skills

Production-grade engineering skills baseline

Composio

250+ SaaS auth layer for outbound + CRM-sync agents

Why this matters: Building from scratch every engagement is the agency anti-pattern. A curated skill marketplace + persistent memory means month-12 engagements cost 30% of the labor that month-1 engagements did.

the thesis

Every other AI agency is downstream of someone else's billing meter.

Ascero AI owns the voice stack, the browser stack, and the memory layer. Open source where it counts. Multi-model where it matters. Self-hostable when a client demands it.

That's the moat — and it's why Ascero can sell flat-rate while the rest of the field sells per-minute.

Book a stack walkthrough →

last reviewed · 2026-05-11

Written and maintained by Kadin Nestler + Jaiden Lawlor. The two co-founders of Ascero AI.

Find a mistake? Email us — we update inside 48 hours.