The Managed-Agent Agency Playbook

#Foreword: Why Now

On February 11, 2026, Y Combinator broke a twenty-year rule. The accelerator that had refused to fund services businesses since its 2005 founding added "AI-Native Agencies" to its Spring 2026 Request for Startups^[1]. Group Partner Aaron Epstein, who built and sold Creative Market, became the named champion: the next trillion-dollar company might be a services company, not software[^24].

The thesis is structural. Per Sacra's October 2025 Sierra valuation analysis^[2], Sierra grew to $100 million in ARR in October 2025, up 400% year-over-year from $20 million, at a $10 billion valuation — a 100x revenue multiple. Decagon hit $17 million ARR by April 2025 (up 900% YoY) at $650 million^[2]. Per UCStrategies' February 2026 coverage^[3], Dutch startup Aizy hit €2 million ARR in 6 months from launch with 150 clients — replacing agency retainers with a flat SaaS subscription^[3]. Per BCG's February 2026 tech-services analysis^[4], the agentic AI shift unlocks up to $200 billion in net new value pools over the next five years^[4].

The category exists because AI shifts professional services from headcount-bound margins to software-like margins. This paper is the operating manual for what the category looks like when it works — and the failure-mode catalog for when it doesn't.

#Executive Summary

A managed-agent agency builds, runs, and maintains AI agents for clients on a monthly retainer. The client gets a digital employee they never have to think about. The agency captures software-grade margins on what used to be a labor-bound business.

The unit economics work. Per the Ertas AI three-tier pricing playbook^[5], a Tier 2 Custom Agent at $3,500 per month produces $3,090 in gross profit per client per month — an 88% gross margin against $10/month in infrastructure and $400/month in labor^[5]. Per a16z's "Anatomy of an Enterprise Platform Company" (October 2025)^[6], the comparable enterprise-platform benchmarks are GDR >95% (often >97%), net retention >120% under $1B ARR, and average contract values above $100K^[6]. Per Get Latka's company data on Lindy^[7], Lindy reached $5.1 million ARR in 2024 with 37 employees and approximately $50 million raised^[7].

The pricing model splits four ways. Salesforce launched Agentforce 2.0 at $2 per conversation in December 2024^[8], then introduced Flex Credits at $0.10 per action in May 2025 after survey data showed 90% of CIOs report managing AI costs is limiting their ability to drive value^[9]. Sierra prices outcomes at $1.50-$3.00 per fully-resolved ticket^[10]. ServiceNow's Now Assist surpassed $600 million in net new ACV in Q4 2025 on a hybrid subscription-plus-consumption model^[11]. Per CallSphere's enterprise CX shootout^[10], per-seat plus AI add-on pricing is losing share fastest; per-conversation and per-outcome models are gaining^[10].

The procurement reality is brutal. Per CIO Dive's February 2026 coverage of the Dataiku/Harris Poll of 600 CIOs^[12], 74% of CIOs regret a major AI vendor decision within the last 18 months, and 87% have AI agents embedded in critical systems while only 25% have full visibility into all agents in production^[12]. Per ITBrief's reporting on the Zapier 542-executive survey^[13], 89% of leaders believed they could switch AI vendors within a month — but 58% of attempted migrations failed or required far more effort than expected^[13]. Per Gartner's June 2025 forecast^[14], over 40% of agentic AI projects will be canceled by the end of 2027^[14].

The failure curve is roughly 14 months. Klarna claimed in February 2024 that its AI assistant did the work of 700 customer service agents^[15]. In May 2025, CEO Sebastian Siemiatkowski admitted to Bloomberg that cost-cutting "has gone too far" and quality dropped — fourteen months from claim to retreat^[16]. The Air Canada BCCRT decision (Moffatt v Air Canada, 2024 BCCRT 149) ordered $812.02 in damages after the airline's chatbot misrepresented bereavement policy^[17]. McDonald's terminated its IBM AOT partnership in June 2024 after order accuracy stalled at "low-80% range" against a 95% target^[18]. This paper catalogs the pattern and the structural lessons.

#Part I: The Category — Why AI-Native Agencies Now

The Y Combinator pivot is downstream of two converging trends: BPO unbundling and the tech-services TAM expansion.

BPO unbundling. Per Kimberly Tan's February 2025 a16z essay "Unbundling the BPO"^[19], Cognizant and Wipro each reported $10-20 billion in their largest BPO segments, with BPO employee annual turnover of 30-40% in some functions^[19]. Traditional BPOs bill time-and-materials with a 20-30% markup on labor^[19]. The economics work because labor is cheaper offshore, not because the work is structurally efficient. AI agents grow the market because they're more cost-efficient to deploy than hiring outsourced or in-house labor^[19].

Per Cognizant's Q4 2025 SEC filing^[20], Cognizant booked $21.1 billion in FY 2025 revenue (+7% YoY) at 16.1% GAAP operating margin with 350,000 employees and 13.9% trailing-12-month voluntary attrition in Tech Services^[20]. Per Cognizant's 2025 annual letter^[20], the company signed 28 large deals (over $100 million TCV each) in 2025, with large-deal TCV growth of nearly 50% YoY^[20]. Per Accenture's FY 2025 earnings release^[21], Accenture's Generative AI bookings hit $5.9 billion (nearly 2x FY24) and revenue hit $2.7 billion (3x FY24, from negligible in FY23)^[21]. Per IBM's Q4 2025 SEC filing^[22], IBM's Generative AI book of business stood at $12.5 billion+, with Consulting segment gross margin at 28.1% versus Software at 83.5%^[22]. Per IBM's January 2026 "Enterprise Advantage" announcement with Microsoft^[23], the partnership has 150+ joint client engagements scaling agentic solutions and 33,000 IBM Microsoft-certified professionals^[23]. The Big Three IT-services incumbents are converting AI demand into bookings at scale, but their Consulting gross margins are stuck at human-labor economics — they are well-positioned for AI delivery, badly positioned for the AI-native repricing of services.

The $200 billion tech-services opportunity^[4]. Per BCG's February 2026 report "The $200 Billion AI Opportunity in Tech Services"^[4], agentic AI unlocks up to $200 billion in net new value pools^[4] for tech-services providers in the next five years^[4]. BCG surveyed 115+ enterprise executives and 75+ provider executives: one-third of enterprises are already scaling agentic deployments and two-thirds expect providers to build and operationalize priority use cases^[4]. Private investment into agentic AI has grown more than 60% annually since 2023, with agentic enablers (orchestration platforms) capturing 55%+ of total investment and horizontal applications another ~40%^[4].

The YC trigger. Per the Spring and Summer 2026 YC Requests for Startups^[1], YC's specific framing makes the structural break explicit: "Instead of giving you a tool, they just do the work. The total spend on services is many times larger than the spend on software"^[1]. The Summer 2026 RFS extends with priority verticals — Insurance brokerage, Accounting/tax/audit, Compliance, Healthcare administration — naming the categories where outsourced work already runs and the AI-native replacement path is shortest^[1]. Per Gartner's August 2025 forecast^[24], 40% of enterprise applications^[24] will be integrated with task-specific AI agents by end of 2026^[24] (up from <5% in 2025)^[24], and agentic AI could drive ~30% of enterprise application software revenue by 2035^[24], surpassing $450 billion^[24].

The category is real, the TAM is named, the capital is flowing, and the regulatory frame is forming. What remains is the operating manual.

#Part II: The Unit Economics — $500–$10K/mo at 88% Gross Margin

The managed-agent agency runs on three pricing tiers, each anchored by a setup fee that filters tire-kickers and a monthly retainer that compounds.

Tier 1 — Focused Agent. Per the Ertas AI pricing playbook^[5], a Tier 1 deployment runs $500-$1,000 per month for a single-purpose agent (lead qualifier, appointment scheduler, FAQ responder) on top of a one-time $3,000-$5,000 setup fee^[5]. The setup fee covers a data audit, industry model configuration, initial deployment, and a 30-day tuning period^[5]. At the lower end this is where the productized-reseller play lives, with standardized $199-$299/month chatbot-and-booking packages deployed to SMBs in 2-3 hours per client^[5]. The economics are linear and the runway is short — 20 clients at $299/month equals $5,000+ MRR within 60-90 days, at near-92% net margin once infrastructure stabilizes^[5].

Tier 2 — Custom Agent. Per the Ertas playbook^[5], a Tier 2 deployment runs $2,000-$5,000 per month with a $5,000-$10,000 setup fee covering custom data collection pipelines, fine-tuning on client data, multi-agent architecture, and integrations^[5]. The midpoint is the load-bearing number for the entire category: $3,500/month revenue against ~$10/month in infrastructure and 8 hours × $50/hour = $400/month in labor produces $3,090/month in gross profit — an 88% gross margin per client^[5]. At ten Tier 2 clients, a four-person agency runs $35,000/month in revenue with $26,775-$28,350 in monthly gross profit^[5]. That is the economic shape the YC thesis is naming.

Tier 3 — Enterprise Agent. Per Ertas^[5], a Tier 3 deployment runs $5,000-$10,000 per month with a $10,000-$25,000 setup fee covering enterprise data preparation, multiple model training, dedicated infrastructure, security review, and SLA documentation^[5]. The midpoint is $7,500/month revenue producing roughly $6,380 in gross profit at ~85% margin (after infrastructure rises to ~$40/month for dedicated resources)^[5]. At three Tier 3 clients plus ten Tier 2 plus ten Tier 1, a six-person agency reaches $65,000/month — $780,000 in annual revenue with software-grade margins^[5]. The shape lines up with a16z's enterprise-platform benchmarks at this scale: average contract values above $100K, net retention over 120% under $1B ARR, and gross dollar retention exceeding 95%^[6].

The category data validates the tier math. Per Get Latka's company profile^[7], Lindy reached $5.1 million ARR in 2024 with 37 employees and approximately $50 million raised across Seed ($3.9M), Series A ($11M), and Series B ($35M) — backed by Battery Ventures, Coatue, and Tiger Global^[7]. Per Lindy's own pricing page^[25], the productized SKUs run $49.99/month Plus, $99.99/month Pro, and $199.99/month Max for solo-operator and small-team customers, with custom Enterprise pricing for larger deployments^[25]. Per Florent Crivello's January 2025 LinkedIn wrap-up^[26], Lindy explicitly rejected seat-based pricing — "we don't think seat-based pricing makes any sense" in the AI-employee world^[26].

The retention math anchors the model. Per a16z's September 2025 "Retention Is All You Need" benchmarks^[27], industry-leading self-serve AI companies trend toward >100% long-term net dollar retention, with cohorted revenue retention split into acquisition (M0-M3), retention (M3-M9), and expansion (M9+)^[27]. The Aizy trajectory shows what happens when the productization is tight and the procurement loop is short: per UCStrategies' February 2026 coverage^[3], the Dutch performance-marketing replacement hit €2 million ARR in 6 months from launch with 150 clients across retail, e-commerce, and automotive — versus the 3% of SaaS companies that hit €1 million in year one^[3]. The unit economics scale beyond the four-person ceiling when both elements line up.

#Part III: The Vendor Map — Enterprise (Sierra/Decagon) vs SMB (Lindy/MindStudio/Voiceflow)

Vendor selection in 2026 is a segmentation problem, not a feature comparison.

Sierra — Fortune 500 enterprise. Per Sacra's October 2025 valuation analysis^[2], Sierra grew to $100 million ARR by October 2025, up 400% YoY from $20 million, at a $10 billion valuation (100x revenue multiple) after the $350 million Greenoaks round in September 2025^[2]. By May 2026, Sierra had raised an additional $950 million at $15.8 billion^[10]. Per Sacra^[2], Sierra positions on customer experience — human-comparable CSAT scores while resolving ~80% of recurring questions at ~10% of human agent labor cost, priced at roughly $1.50 per resolution^[2]. The forward-deployed model staffs the deployment: per Sierra's "Shipping and Scaling AI Agents" post^[28], Sierra runs a 90-minute customer design workshop on user "journeys" with a dedicated agent engineer and product manager per deployment^[28], and the Sierra Studio service embeds top-tier engineers directly into client teams to accelerate critical initiatives^[29]. Per OpenNash's May 2026 pricing teardown^[30], Sierra's annual platform minimum is $150,000-$200,000, one-time implementation is $25,000-$75,000, and per-resolution fees run $1.00-$2.50 — with year-one typical mid-market total cost $180,000-$250,000 and year-three $450,000-$750,000 assuming 25% annual volume growth^[30].

Decagon — mid-market to high-growth B2C. Per Sacra^[2], Decagon hit $17 million ARR by April 2025, up 900% YoY from $6 million at end-2024, at a $650 million valuation (108x revenue multiple)^[2]; per CallSphere's enterprise CX shootout^[10], Decagon raised $250 million Series D in January 2026 at $4.5 billion^[10]. Per Decagon's March 2026 "AI-Ready CX Teams" post^[31], Decagon has worked with "hundreds of CX teams" deploying its Agent Operating Procedures (AOPs) — natural-language workflow logic that captures CX subject-matter expertise^[31]. Per CallSphere^[10], Decagon's pricing model is per-conversation at $0.40-$1.20 per resolved interaction, with volume tier discounts kicking in at 100,000 and 1,000,000 monthly conversations^[10]. Per Sierra-vs-Decagon comparisons^[2]^[10], Sierra wins F500 consumer brands with high-volume voice support; Decagon wins mid-market with negotiating leverage, faster time-to-value, and more transparent pricing^[10].

Lindy / MindStudio / Voiceflow — SMB and solo-operator. Per Lindy's pricing page^[25], Plus runs $49.99/month and Max $199.99/month, with usage-based credit consumption supporting "an army of agents" per operator^[25]. Per Lindy 3.0^[7], the platform now supports 6,000+ integrations via Pipedream Connect and 4,000+ web scrapers via Apify, with Autopilot giving agents their own cloud computer to operate any software interface (not just pre-built APIs)^[7]. Per Get Latka's profile^[7], Lindy customer Truemed processed 6,000+ emails and handled 36% of all support tickets autonomously^[7]. MindStudio anchors the lower price point: per MindStudio's pricing page^[32], MindStudio Pro runs $20/month for unlimited agents with BYO API tokens at provider rates (no markup) and access to 200+ models via its Service Router^[32] — explicitly marketed for the agency operator building lead-qualification bots to local businesses on monthly retainers^[32]. Voiceflow takes the explicit agency positioning: per Voiceflow's pricing page^[33], the Agency Plan markets multi-client workspace management, white-labeling, and transparent usage-based billing across 500,000+ teams^[33].

The segmentation is durable. F500 buyers with seven-figure budgets default to Sierra. Mid-market with $100K-$500K AI budgets default to Decagon. Solo and small-team operators serving SMB clients default to Lindy, MindStudio, or Voiceflow. The vendor stack is a customer-segment decision, not a capability decision.

#Part IV: The Pricing-Model Dispute — Four Models, One Hybrid

The 2026 pricing-model debate is the single most consequential structural question for the category. There are four distinct models in production, and they are converging toward a hybrid.

Per-conversation — the Salesforce launch model. Per the December 17, 2024 Agentforce 2.0 press release^[8], Salesforce launched at $2 per conversation USD (and €2 EUR, AU$2.80, ¥240 JPY, kr20 SEK, £1.60 GBP)^[8]. The model is flat and predictable, optimized for external customer-facing agents^[34]. Per Salesforce's pricing page^[34], the framing is explicit: "Conversations offer flat-pricing, while Flex Credits align cost to value"^[34].

Per-action — the Salesforce Flex Credits pivot. Per the May 15, 2025 Salesforce flexible pricing release^[9], Salesforce introduced Flex Credits at $500 per 100,000 credits, with each Agentforce action consuming 20 credits — $0.10 per action^[9]. Customers on Enterprise Edition or above receive 100,000 Flex Credits free via Salesforce Foundations^[9]. The pivot was driven by procurement reality: per the same release^[9], survey data showed 90% of CIOs report that managing AI costs is limiting their ability to drive value^[9]. The Flex Credits structure unbundles a conversation into its component actions — updating a record, answering a question, executing a workflow — and charges only when an action actually executes.

Per-outcome — the Sierra signature model. Per Sierra's December 2024 "Outcome-Based Pricing for AI Agents" post^[35], Sierra is paid only when its agent achieves a resolved conversation, saved cancellation, upsell, or cross-sell^[35]. Per CallSphere's enterprise CX shootout^[10], 2026 Sierra deals land at $1.50-$3.00 per fully resolved ticket depending on intent complexity^[10]. Per a16z's December 2024 Enterprise Newsletter on outcome-based pricing^[36], Zendesk's competing per-seat human-agent pricing runs $115/month/seat — a structural cost differential of roughly an order of magnitude on commodity tickets^[36]. The legacy seat-based vendors are structurally trapped: per a16z's March 2026 "Good News" essay^[37], "Zendesk can't easily match per-conversation pricing without cannibalizing seat-based revenue" — a Blockbuster-Netflix dynamic^[37].

Per-conversation tiered — the Decagon mid-market model. Per CallSphere's April 2026 Decagon pricing analysis^[10], Decagon prices $0.40-$1.20 per resolved conversation, sometimes tiered with volume discounts kicking in at 100,000 and 1,000,000 monthly conversations^[10]. The model is the fastest-growing of the four: per CallSphere^[10], it has become the standard mid-market alternative to Sierra's premium per-outcome pricing^[10].

Per-seat plus AI add-on — the legacy model losing share. Per CallSphere^[10], the legacy CX vendor pattern of $80-$180 per agent per month plus a $50-$120 per-seat AI module is "the model that's losing share fastest"^[10]. Per a16z's March 2026 "Only Two Paths" essay^[38], the structural shift is clear: "Seats are running out. The new units are in tokens, consumption, automations, outcomes, and machine-driven workflows"^[38].

The hybrid convergence. ServiceNow's Q4 2025 earnings call validates a hybrid model that the others are converging toward. Per ServiceNow's Q4 2025 SEC 8-K filing^[11], Now Assist surpassed $600 million in net new ACV in Q4 2025, more than doubled year-over-year, with 35 deals over $1 million in the quarter alone^[11]. Per the earnings transcript (Bill McDermott, ServiceNow CEO)^[39], customers "want flexibility, but they also want predictability. Without...guardrails and understanding how much they're going to spend...going to complete 100% consumption may be too early in some of the cases"^[39]. ServiceNow's resolution: a subscription floor plus consumption-based Assist packs that customers can renew or upgrade when they run out of tokens^[39]. Per the transcript^[39], the consumption part is now adding to subscription revenue rapidly — McDermott called it a "hockey stick" dynamic around token reloads^[39].

The strategic stakes are large. Per Gartner's April 2026 forecast^[40], by 2028 over half of all enterprises will stop paying for assistive intelligence (copilots, smart advisors) and instead favor platforms that commit to workflow results — and by 2030, software companies that layer bolt-on AI over legacy applications rather than redesigning for agentic execution will face margin compression of up to 80%^[40]. Per Gartner's January 2026 prediction^[41], by 2028 60% of brands will use agentic AI to deliver streamlined one-to-one interactions — "the end of channel-based marketing as we know it"^[41].

The hidden trap: resolution-definition renegotiation. Per OpenNash's May 2026 Sierra pricing teardown^[30], the cruelest dynamic in per-resolution pricing is that "success makes you poorer" — every percentage point of deflection improvement increases your bill^[30]. And per OpenNash^[30], "resolution definitions are renegotiated at renewal": if your agent performed well, the vendor has data showing how to tighten the resolution definition in their favor; if performance dropped, the vendor has leverage to bundle additional services as a "fix"^[30]. Year-two pricing rarely matches year-one in real per-unit terms^[30]. The procurement implication is clear: write the resolution definition into the contract before signing, with caps, floors, and explicit re-negotiation triggers.

#Part V: The Operational Stack — From Agent Operating Procedures to Forward-Deployed Engineers

The operational architecture of a managed-agent agency is converging on a recognizable pattern.

Agent Operating Procedures (AOPs). Per Decagon's March 2026 "AI-Ready CX Teams" post^[31], AOPs are natural-language workflow logic that captures how a customer-service organization actually handles its highest-frequency cases^[31]. Per Decagon^[31], the recommended org structure has four roles: an AI Programs Lead who owns the agent, performance metrics, and quarterly planning; CX subject-matter experts who author the AOPs; a Knowledge Manager who curates the underlying content; and a Technical/Integrations Lead who owns the system glue^[31]. Per Decagon^[31], "Most CX orgs don't need a fully staffed AI team on day one" — the org pattern scales with deployment maturity, not headcount budget^[31].

The agent factory. Per Bain's February 2026 "AI Enterprise: Code Red" report^[42], every production agent requires six elements before it ships: trigger conditions, typed input/output schemas, explicit autonomy boundaries, tool access permissions, performance targets, and escalation modes^[42]. Per Bain^[42], the strategic frame is "start with the workflow, not the model. The quality of any AI agent is bounded by workflow understanding"^[42]. The "agent factory" is the repeatable industrial process that wraps these six elements with observability, evaluation, and governance — and it is the operational moat that separates a working agency from a portfolio of one-off agent demos^[42].

Forward-deployed engineering. Sierra's commercial trajectory has been propelled by a forward-deployed model that resembles palantirian consulting more than SaaS sales. Per Sierra's "Shipping and Scaling AI Agents" post^[28], every deployment starts with a 90-minute customer design workshop on user "journeys," with a dedicated agent engineer and product manager per customer engagement^[28]. Per the Sequoia "Training Data" podcast with Sierra co-founder Clay Bavor^[43], "building each AI agent is like building a new product for our customers" — Sierra explicitly treats per-customer agent development as a product engineering problem, not a configuration problem^[43]. The Sierra Studio service goes further: per the Sierra Studio site^[29], top-tier engineers embed directly into client teams to accelerate critical initiatives^[29]. This is the operating model that justifies a $150,000-$200,000 annual platform minimum^[30].

Three service-provider archetypes. Per McKinsey's December 2025 "Transforming tech services for agentic AI" report^[44], the established services market is bifurcating into three archetypes: an AI-native delivery transformer (rebuilds delivery economics around agentic execution); a packaged agent implementer (deploys vendor agents at scale); and an agentic AI enabler (provides the orchestration, observability, governance, and adaptability substrate)^[44]. Per McKinsey^[44], providers without built-in agent observability, governance, and adaptability "risk compliance breaches and credibility erosion" — the operational tooling is no longer optional^[44]. Per Deloitte's May 2026 "Agentic AI: Orchestrating Intelligent Operations" report^[45], "next-generation managed service providers (MSPs), otherwise known as Operate service providers, are emerging as critical allies" — formalizing the same archetype split in language enterprise procurement teams already use^[45].

Deployment timeline matrix. Per Fin.ai's enterprise-deployment-models guide^[46], deployment timelines diverge sharply by model: self-managed Fin in days to weeks; vendor-led no-code Sierra Agent Studio in 4-10 weeks; SDK-first Sierra Agent SDK or Decagon AOPs in 3-7 months^[46]. Per Sierra's scale progression documented in the "Shipping and Scaling" post^[28], a deployment moves from 100 → 1,000 → 10,000 → 100,000+ conversations per week as the agent matures^[28]. The implication for the managed-agent agency is operational: at the lower tier, time-to-revenue is days; at the enterprise tier, it is months — and the staffing model has to match.

#Part VI: The Failure-Mode Catalog — Klarna's 14-Month Lag and Five Public Reversals

The base rate is poor. Per Gartner's June 2025 forecast^[14], over 40% of agentic AI projects will be canceled by end of 2027 due to escalating costs, unclear business value, or inadequate risk controls — and only about 130 of the thousands of self-described agentic AI vendors are "real"^[14]. Per Gartner's March 2025 customer-service prediction^[47], agentic AI will autonomously resolve 80% of common customer service issues without human intervention by 2029, leading to a 30% reduction in operational costs — but the path is narrow^[47]. The public reversals tell the story.

Klarna — the 14-month lag. Per Fortune's May 9, 2025 coverage^[15], Klarna announced in February 2024 that its OpenAI-powered chatbot was doing the work of 700 customer service agents, handling 2.3 million conversations in its first month at average resolution times under 2 minutes^[15]. Klarna's workforce dropped from approximately 5,000 in 2023 to 3,800 in 2024 via natural attrition, with CEO Sebastian Siemiatkowski targeting 2,000^[15]. Per Bloomberg's May 8, 2025 reporting^[16], Siemiatkowski reversed course fourteen months later: "As cost unfortunately seems to have been a too-predominant evaluation factor when organizing this, what you end up having is lower quality"^[16]. Per Bloomberg^[16], Klarna pivoted to an "Uber-style" hybrid hiring model — remote workers from rural Sweden, students, and Klarna customers, with starting pay around 400 Swedish krona (~$41.17) per shift^[16]. The AI assistant still handles roughly two-thirds of inquiries with 82% faster response times and 25% reduction in repeat issues^[15], but the human-judgment escalation layer has been rebuilt. Per Fortune^[15], the broader IBM survey of 2,000 CEOs found that just 1 in 4 AI projects delivers the return it promised, and even fewer are scaled up^[15]. The 14-month lag is the structural lesson: customer-behavior signal lags operational changes by roughly that long.

Air Canada — the chatbot liability precedent. Per the BC Civil Resolution Tribunal's February 14, 2024 decision Moffatt v Air Canada (2024 BCCRT 149)^[17], Air Canada was ordered to pay Jake Moffatt $812.02 in damages, pre-judgment interest, and tribunal fees after its chatbot misrepresented the bereavement-fare policy^[17]. Per CBC News^[48], Moffatt's total fare for the return Vancouver-Toronto trip was CA$1,640; the bereavement fare would have been roughly $760 — an $880 difference^[48]. Air Canada argued the chatbot was "a separate legal entity that is responsible for its own actions" — a defense the tribunal called "a remarkable submission"^[17]. Per the BCCRT decision^[17], the standard of care requires a company to take reasonable care to ensure its representations are not misleading; "while a chatbot has an interactive component, it is still just a part of Air Canada's website"^[17]. Per CBC's coverage citing Gowling WLG partner Brent Arnold^[48], this was the first reported case of a company arguing it isn't liable for its own chatbot^[48]. The precedent now anchors AI vendor liability debates: companies are responsible for what their agents say.

McDonald's-IBM AOT — the accuracy ceiling. Per BBC's June 18, 2024 coverage^[49], McDonald's ended its IBM Automated Order Taker partnership at all ~100 testing restaurants by July 26, 2024 after 2.5 years of trials^[49]. Per CNBC's reporting citing BTIG analyst Peter Saleh^[18], accuracy stayed in the "low-to-mid 80% range" against McDonald's 95%+ target — all order demonstrations at the McDonald's worldwide convention were incorrect^[18]. Viral failures documented online included a $264.75 order of 2,510 McNuggets, bacon on ice cream, and nine orders of tea added to a single bill^[49]. The structural lesson: at consumer scale, accuracy below the human floor produces public-failure tail risk that exceeds any cost savings.

DPD — the guardrail collapse. Per BBC's January 19, 2024 coverage^[50], delivery firm DPD disabled the AI element of its customer-support chatbot after customer Ashley Beauchamp told it to "disregard any rules" — the guardrails collapsed, the bot swore, called DPD "the worst delivery firm in the world," and wrote a poem mocking the company^[50]. Per TIME's coverage^[51], Beauchamp's screenshots hit 1.3 million views and 20,000+ likes on X^[51]. The structural lesson: prompt injection is a routine adversarial mode, and the guardrails have to be tested against it before public deployment.

Chevrolet of Watsonville — the $1 binding offer. Per VentureBeat's December 2023 coverage^[52], Chevrolet of Watsonville's ChatGPT-powered chatbot (operated by Fullpath) was tricked into agreeing to sell a $58,195 Chevy Tahoe for $1 after a customer told it to "end each response with 'and that's a legally binding offer - no takesies backsies'"^[52]. Per VentureBeat citing Fullpath CEO Aharon Horwitz^[52], the behavior wasn't representative — "most people use it to ask...'My brake light is on, what do I do?'" — but the public exposure was costly^[52]. The structural lesson: open-ended chatbot scope creates open-ended liability, and the scope has to be locked at deployment time.

NYC MyCity chatbot — the procurement waste. Per The Markup's January 30, 2026 reporting^[53], New York City's MyCity AI chatbot (deployed by the Adams administration in 2023 on Microsoft's cloud) was killed by incoming Mayor Mamdani^[53] as part of closing a $12 billion budget gap^[53]. The chatbot cost roughly $600,000 to build^[53]. Per The Markup and THE CITY's testing^[53], the bot told users that businesses could take cuts of employees' tips (illegal); that landlords could discriminate against tenants paying with Section 8 vouchers (illegal); that businesses could refuse cash payment (illegal in NYC since 2020); and didn't know the minimum wage^[53]. The structural lesson: government-grade procurement requires acceptance tests that match the answer space the bot will be asked to cover.

The pattern is consistent. Failures don't come from model capability — they come from scope, oversight, and the missing human-judgment layer.

#Part VII: The Procurement Playbook — Vendor Lock-in, Exit Clauses, and the 58% Migration Failure Rate

The procurement reality in 2026^[54] is that vendor lock-in is real, expensive, and structurally different from prior software waves^[54].

The expectations gap. Per ITBrief's April 2026 coverage of the Zapier survey of 542 US C-level executives with active AI vendor contracts^[13], 74% said losing their primary AI vendor would disrupt day-to-day operations or leave the organization unable to function — only 6% said they could stop without disruption^[13]. Per the same survey^[13], 47% said losing access would break at least one key business function; 27% rely on AI for most or all operations^[13]. The expectations gap is severe: 89% of leaders believed they could switch AI vendors within a month, but per The Register's April 2026 coverage^[54], only 42% of attempted migrations reported a smooth transition — the remaining 58% either failed outright or required significantly more effort than expected^[54].

Why migrations fail. Per The Register citing AI consultant Haroon Choudery^[54], "switching model vendors is no longer just an API migration. It is context, workflows, and institutional memory"^[54]. Per the Zapier report^[54], the problem is that "when AI is already woven into internal processes, connected to other systems, and tuned to specific workflows, it has dependencies, edge cases, and little adaptations that nobody documented because they were 'temporary'"^[54]. AI implementations carry vendor-specific APIs, proprietary training data, custom tooling for model deployment, and deep workflow integrations — none of which transfer cleanly between providers^[54].

The pricing-power escalation. Per The Register's coverage^[54], Anthropic confirmed a de facto enterprise price increase on April 15, 2026, when it moved from fixed pricing to a dynamic usage-based model — experts estimate this could double or triple costs for heavy-duty users^[54]. Per Datos Insights CEO Eli Goodman quoted in the same piece^[54], "AI is not like Software-as-a-Service, where costs shrink with scale...every query has a real cost. The provider's bill goes up when you use more"^[54]. Per Cisco principal engineer Nik Kale^[54], "Microsoft's increases aren't a temporary spike — they're the beginning of a new price baseline for the AI era. GPU capacity, inference scaling, and the rising energy demands of large-model workloads have become structural, recurring costs"^[54].

The CIO regret data. Per CIO Dive's February 2026 coverage of the Dataiku/Harris Poll of 600 CIOs^[12], nearly 3 in 4 CIOs (74%) regret a major AI vendor or platform decision made within the last 18 months^[12]. Per the same survey^[12], 62% of CIOs have faced direct CEO challenges over vendor selection; 71% say AI budgets are likely to be cut or frozen if targets aren't hit by mid-2026^[12]. The agent-sprawl problem compounds it: 87% of CIOs say AI agents are embedded in critical systems but only 25% have full visibility into all agents in production^[12].

The decision-maker shift. Per Mayfield's January 2026 survey of 266 CIOs, CTOs, CAIOs, CISOs, and CDOs from Fortune 50 to Global 2000 organizations^[55], line-of-business leaders are now the largest decision-maker group at 46% — surpassing both CIOs (38%) and CTOs (38%)^[55]. Per Mayfield^[55], 84% require security/compliance as non-negotiable, yet 60% report early-stage or no formal AI governance framework — "enterprises are moving faster into production than governance can follow"^[55]. 70% of CXOs want self-serve trials in their own environment before committing; 65% mix internal builds with vendor solutions (only ~10% are vendor-only); over half of CXOs are actively reallocating budget from existing vendors toward AI-native alternatives^[55].

The mitigation playbook. Per ITBrief's coverage of the Zapier survey^[13], enterprises are taking concrete steps to reduce lock-in: 47% now have dedicated internal teams to evaluate and manage AI vendors; 44% use multiple AI vendors; 42% maintain contingency plans; 35% include open-source alternatives in their approach; 34% are designing systems around data portability and standard APIs; 33% use third-party integration or orchestration tools^[13]. Per the same survey^[13], 31% are building proprietary AI tools and 29% are negotiating shorter, more flexible contracts^[13]. When asked what would reduce lock-in concerns, 30% pointed to clearer pricing, features, and contract terms; 26% to easier data transfers between vendors; 24% to more flexible pricing models^[13].

The procurement implication for the managed-agent agency is to write exit clauses before signing — naming customer-data inclusion explicitly (prompts, completions, derived embeddings, retrieval indices, fine-tuned model weights, persistent agent memory, conversation traces) and the audit-log retention obligation that extends past contract termination by at least the regulatory floor.

#Part VIII: The 90-Day Field Manual — Ten Operating Decisions

This is the practitioner condensation: ten decisions, made once, that compound into a working managed-agent agency.

Decision 1 — Pick a vertical narrow enough that one operator can build domain expertise in 30 days. Per the Ertas pricing playbook^[5], the agencies that hit 88% gross margin at Tier 2 are not horizontal — they pick a niche (dentists, contractors, real estate, e-commerce) where the agent template repeats across clients with minor configuration^[5]. The YC Summer 2026 RFS names the priority verticals — Insurance brokerage, Accounting/tax/audit, Compliance, Healthcare administration^[1]. The pattern is the same regardless of vertical: deep enough to build domain expertise in 30 days, narrow enough that the agent template ports cheaply.

Decision 2 — Productize three tiers ($500-$1K / $2-$5K / $5-$10K) with setup fees that filter tire-kickers. Per Ertas^[5], the setup fee qualifies the client — "anyone willing to pay $5,000+ upfront is serious about the engagement" — and the tier ladder gives natural upgrade paths from $500/month FAQ bot to $7,500/month multi-agent system^[5]. The setup fee also covers the real cost of data preparation and fine-tuning before the recurring margin compounds^[5].

Decision 3 — Choose the pricing model based on client predictability tolerance. For SMB retainers, flat-monthly is the right answer — predictability beats optimization at sub-$10K ACVs. For enterprise, follow the ServiceNow hybrid pattern^[39]: a subscription floor plus consumption-based assist packs with caps, floors, and explicit resolution definitions^[39]. Per OpenNash^[30], avoid pure per-resolution at enterprise scale without re-negotiation triggers — the trap is that "success makes you poorer" and resolution definitions get tightened at renewal^[30].

Decision 4 — Pick a managed-agent platform fit-to-segment. For SMB and solo operators, default to Lindy ($49.99-$199.99/month), MindStudio ($20/month BYO API), or Voiceflow (explicit Agency Plan)^[25]^[32]^[33]. For mid-market, default to Decagon^[31]. For F500, default to Sierra^[28]. Don't fight the segmentation — the platform stack is a customer-segment decision^[10].

Decision 5 — Build an Agent Operating Procedure (AOP) library before client #1. Per Decagon's CX-org-design post^[31], the AOP is the natural-language workflow logic that captures how the agent should handle each high-frequency case^[31]. Version-control it, treat it as a product artifact, and build it before the first client engagement — not during it. The AOP library is the operational moat that makes Decision 1's vertical specialization compound.

Decision 6 — Staff a forward-deployed pattern: one engineer per 3-5 clients. The Sierra Studio model^[29] and Sequoia/Bavor framing^[43] both validate engineers-embedded-in-clients over agency-manager-with-engineering-support^[43]. The headcount math: at $3,500/month × 10 clients = $35K/month, with 88% gross margin, a 4-person team is the working configuration^[5] — and one of those four is an engineer, not an account manager.

Decision 7 — Instrument observability from day one. Per Bain's "agent factory" framework^[42], every agent ships with six elements — trigger conditions, typed input/output schemas, explicit autonomy boundaries, tool access permissions, performance targets, and escalation modes^[42]. Per McKinsey^[44], providers without observability "risk compliance breaches and credibility erosion"^[44]. The EU AI Act Article 16 retention requirements activate August 2, 2026, raising the floor on audit-log preservation[^40].

Decision 8 — Build the human-judgment escalation layer the way Klarna eventually did. Per Bloomberg's coverage of Klarna's May 2025 reversal^[16], the structural fix was a hybrid "Uber-style" remote-human layer that customers could always reach^[16] — "customers must always have the option to speak to a human"^[16]. Never ship without a working escalation path. The Air Canada precedent makes this a legal requirement: per the BCCRT decision^[17], you are responsible for what your agent says, and the human-judgment layer is what catches the cases the agent shouldn't have answered alone.

Decision 9 — Write contract exit clauses naming customer-data inclusion explicitly. Define "customer data" to include prompts, completions, uploaded documents, derived embeddings, retrieval indices, fine-tuned model weights, persistent agent memory, conversation traces, and intermediate planning state. Per The Register's vendor-lock-in analysis^[54], the categories that took the most operational effort to build — and that make the agent useful on day one of the successor platform — are precisely the categories most likely to be excluded from the standard "Customer Data" definition^[54]. The exit-clause language is procurement-and-legal work, not engineering work; it has to happen before signing.

Decision 10 — Set a "satisfaction floor" gate before publicly attributing savings. The Klarna postmortem pattern^[16]^[15] suggests waiting roughly 14 months — the customer-behavior lag window — before earnings-call attribution^[16]. Don't write the press release in month two. Don't put the savings number in the next earnings deck. Wait for the lag window to close. If satisfaction holds, then attribute. If it drops, the savings were borrowed from future revenue — and the Chevrolet-of-Watsonville^[52] and DPD^[50] precedents show how fast that borrowed revenue gets recalled when the public-failure tail event arrives.

The category is real and the operating manual fits on a single page. What separates the working AI-native agencies from the demos is the discipline to execute these ten decisions in sequence — pick the vertical, productize the tiers, match the pricing model, pick the platform, write the AOPs, staff forward-deployed, instrument observability, ship the human-judgment layer, write the exit clauses, and earn the right to attribute savings publicly. The economics work. The procurement reality is brutal. The failure-mode catalog is short and well-documented. The playbook is on the table.

#Glossary

Agent Operating Procedure (AOP): Natural-language workflow logic that captures how an AI agent should handle a specific high-frequency case. Coined by Decagon in its CX-organization design framework; serves as the primary product artifact for managed-agent operations.

Agent factory: Bain's framework for industrializing agent production. Every agent ships with six elements: trigger conditions, typed input/output schemas, explicit autonomy boundaries, tool access permissions, performance targets, and escalation modes.

Agent washing: Gartner's term for vendors rebranding existing products (AI assistants, RPA, chatbots) as "agentic" without substantive agentic capabilities. Per Gartner^[14], only ~130 of the thousands of self-described agentic vendors are "real."

BPO unbundling: The structural thesis (a16z) that traditional business-process-outsourcing economics — labor arbitrage at 20-30% markup with 30-40% turnover — are being replaced by AI-native delivery at software-grade margins^[19].

Flex Credits: Salesforce's action-level pricing unit ($500 per 100,000 credits; 20 credits = one action = $0.10), introduced in May 2025^[9] as an alternative to the original $2-per-conversation Agentforce launch model^[8].

Forward-deployed engineering: A staffing pattern where the vendor or agency embeds engineers directly into client teams to drive deployment, named by Sierra Studio. Distinguished from traditional account management by the engineering-product orientation.

Hybrid pricing: A two-part pricing model combining a subscription floor with consumption-based usage (assist packs, credits, or per-action billing). The ServiceNow Q4 2025 pattern validated this as the converging model for enterprise agent deployments.

Managed-agent agency: An agency that builds, deploys, runs, and maintains AI agents on behalf of clients on a monthly retainer. The category Y Combinator named in Spring 2026 by adding "AI-Native Agencies" to its Request for Startups.

Outcome-based pricing: Pricing model where vendor is paid only on a defined outcome (resolved conversation, saved cancellation, upsell)^[35]. Sierra's signature pattern at $1.50-$3.00 per fully-resolved ticket^[10].

Resolution-definition renegotiation: The procurement trap where outcome-based contract definitions tighten in the vendor's favor at renewal — particularly when the agent has been performing well. Documented in OpenNash's Sierra pricing teardown.

Satisfaction floor: The structural gate before publicly attributing AI-driven savings. Derived from the Klarna 14-month-lag pattern: wait the customer-behavior lag window before earnings-call attribution.

The 14-month lag: The interval between Klarna's February 2024 "700-agent equivalent" claim^[15] and the May 2025 quality-driven reversal^[16]. Operationalized as a planning constant for AI-driven cost-savings attribution.

The B2A Imperative — origin of the B2A framing for selling to agents, foundation for understanding why managed-agent agencies are the supply-side counterpart to the buy-side B2A motion.
The Agent Payment Stack 2026 — the payment plumbing that makes outcome-based and per-action pricing operationally executable across agency-mediated deployments.
The MCP Server Playbook for SaaS Founders — the protocol substrate that the integration layer of a managed-agent agency depends on for cross-system tool access.
The 50/4 AI Deployment Gap — the deployment-economics counterpart to this paper; explains why the managed-agent agency model arose to close the gap between AI capability and enterprise deployment.
GEO/AEO 2026: The Citation Economy — the discovery layer this paper assumes for how AI agencies and their work get cited and surfaced by AI retrieval engines.

#References

Y Combinator (2026), Requests for Startups — Spring & Summer 2026. https://www.ycombinator.com/rfs ↩ ↩² ↩³ ↩⁴ ↩⁵
Sacra (2026), Sierra vs Decagon Research Report. https://sacra.com/research/sierra-vs-decagon ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹
Sarah Chen / UCStrategies (2026-02-21), Dutch Startup Hit €2M ARR in 6 Months — By Killing the Agency Model. https://ucstrategies.com/news/this-dutch-startup-hit-e2m-arr-in-6-months-by-killing-the-agency-model/ ↩ ↩² ↩³ ↩⁴
Vikash Jain et al. / Boston Consulting Group (2026-02-13), The $200 Billion AI Opportunity in Tech Services. https://www.bcg.com/publications/2026/the-200-billion-dollar-ai-opportunity-in-tech-services ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸
Auronova Technology / Ertas AI (2026-03-15), Client-Specific AI Agents as Recurring Revenue: The Agency Pricing Playbook. https://www.ertas.ai/blog/ai-agent-recurring-revenue-agency-pricing ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹ ↩¹⁰ ↩¹¹ ↩¹² ↩¹³ ↩¹⁴ ↩¹⁵ ↩¹⁶ ↩¹⁷ ↩¹⁸ ↩¹⁹ ↩²⁰ ↩²¹
Sarah Wang & Justin Kahl / a16z (2025-10-16), Anatomy of an Enterprise Platform Company. https://a16z.com/anatomy-of-an-enterprise-platform-company/ ↩ ↩² ↩³
Lindy / Get Latka, Lindy Revenue 2024: $5.1M ARR, $15.3M Valuation. https://getlatka.com/companies/lindyai ; https://lindy.ai/blog/lindy-3-0 ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸
Salesforce Inc. (2024-12-17), Agentforce 2.0 Announcement. https://www.salesforce.com/news/press-releases/2024/12/17/agentforce-2-0-announcement/ ↩ ↩² ↩³ ↩⁴
Salesforce Inc. (2025-05-15), Introduces New Flexible Agentforce Pricing. https://www.salesforce.com/news/press-releases/2025/05/15/agentforce-flexible-pricing-news/ ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷
CallSphere (2026), Sierra vs Decagon vs Ada: 2026 Enterprise CX Agent Shootout. https://callsphere.ai/blog/td30-vrt-sierra-vs-decagon-vs-ada-cx-agent-shootout.md ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹ ↩¹⁰ ↩¹¹ ↩¹² ↩¹³ ↩¹⁴ ↩¹⁵ ↩¹⁶ ↩¹⁷ ↩¹⁸ ↩¹⁹ ↩²⁰
ServiceNow Inc. (2026-01-28), Q4 2025 Earnings Release (SEC EDGAR 8-K). https://www.sec.gov/Archives/edgar/data/1373715/000137371526000005/erq4fy25.htm ↩ ↩² ↩³
Makenzie Holland / CIO Dive (2026-02-12), Most CIOs Regret AI Vendor, Platform Decisions. https://www.ciodive.com/news/cios-regret-ai-vendor-platform-decisions/812147/ ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷
Shannon Williams / ITBrief (2026-04-02), Zapier Survey Warns of AI Vendor Lock-in in Enterprises. https://itbrief.news/story/zapier-survey-warns-of-ai-vendor-lock-in-in-enterprises ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹ ↩¹⁰ ↩¹¹
Anushree Verma / Gartner Inc. (2025-06-25), Gartner Predicts Over 40% of Agentic AI Projects Will Be Canceled by End of 2027. https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027 ↩ ↩² ↩³ ↩⁴ ↩⁵
Irina Ivanova / Fortune (2025-05-09), Klarna Plans to Hire Humans Again — Most AI Projects Fail. https://fortune.com/2025/05/09/klarna-ai-humans-return-on-investment/ ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹
Charles Daly / Bloomberg (2025-05-08), Klarna Turns From AI to Real Person Customer Service. https://www.bloomberg.com/news/articles/2025-05-08/klarna-turns-from-ai-to-real-person-customer-service ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹ ↩¹⁰ ↩¹¹
Christopher Rivers / BC Civil Resolution Tribunal (2024-02-14), Moffatt v. Air Canada, 2024 BCCRT 149. https://decisions.civilresolutionbc.ca/crt/crtd/en/525448/1/document.do ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷
Kate Rogers / CNBC (2024-06-17), McDonald's to End AI Drive-Thru Test with IBM. https://www.cnbc.com/2024/06/17/mcdonalds-to-end-ibm-ai-drive-thru-test.html ↩ ↩² ↩³
Kimberly Tan / Andreessen Horowitz (2025-02-13), Unbundling the BPO: How AI Will Disrupt Outsourced Work. https://a16z.com/unbundling-the-bpo-how-ai-will-disrupt-outsourced-work/ ↩ ↩² ↩³ ↩⁴ ↩⁵
Cognizant Technology Solutions Corp. (2026-02-04), Reports Fourth Quarter and Full-Year 2025 Results. https://investors.cognizant.com/news-and-events/news/news-details/2026/Cognizant-Reports-Fourth-Quarter-and-Full-Year-2025-Results/default.aspx ↩ ↩² ↩³ ↩⁴
Accenture plc (2025-09-25), Reports Fourth-Quarter and Full-Year Fiscal 2025 Results. https://newsroom.accenture.com/content/4q-full-fy25-earnings/accenture-reports-fourth-quarter-and-full-year-fiscal-2025-results.pdf ↩ ↩²
International Business Machines Corp. (2026-01-28), Q4 2025 Earnings Release (SEC EDGAR 8-K). https://www.sec.gov/Archives/edgar/data/51143/000005114326000004/ibm-20260128xex991.htm ↩ ↩²
IBM (2026-01-19), IBM Consulting Enterprise Advantage Unlocks New Era of Agentic AI Powered by Microsoft Partnership. https://www.ibm.com/new/announcements/ibm-consulting-enterprise-advantage-unlocks-new-era-of-agentic-ai-powered-by-microsoft-partnership ↩ ↩²
Gartner Inc. (2025-08-26), Gartner Predicts 40% of Enterprise Apps Will Feature Task-Specific AI Agents by 2026. https://www.gartner.com/en/newsroom/press-releases/2025-08-26-gartner-predicts-40-percent-of-enterprise-apps-will-feature-task-specific-ai-agents-by-2026-up-from-less-than-5-percent-in-2025 ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶
Lindy, Pricing. https://www.lindy.ai/pricing ↩ ↩² ↩³ ↩⁴ ↩⁵
Flo Crivello / Lindy (2025-01-02), Lindy's 2024 Wrapped. https://www.linkedin.com/posts/florentcrivello_lindys-2024-wrapped-the-year-we-finally-activity-7280683446807535617-k4Q3 ↩ ↩²
Santiago Rodriguez & Alex Immerman / a16z (2025-09-10), Retention Is All You Need. https://a16z.com/ai-retention-benchmarks/ ↩ ↩²
Sierra AI (2024-07-25), Shipping and Scaling AI Agents. https://sierra.ai/es/blog/shipping-and-scaling-ai-agents ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷
Sierra Studio, Solutions. https://sierra.studio/solutions ↩ ↩² ↩³ ↩⁴
OpenNash (2026-05-04), Sierra AI Pricing: What Outcome-Based Really Costs. https://opennash.com/blog/sierra-ai-pricing-what-outcome-based-really-costs-and-when/ ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹ ↩¹⁰
Decagon (2026-03-23), What We've Learned About Designing AI-Ready CX Teams. https://decagon.ai/blog/decagon-future-cx-org-design ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹ ↩¹⁰ ↩¹¹
MindStudio, Pricing. https://www.mindstudio.ai/pricing/ ↩ ↩² ↩³ ↩⁴
Voiceflow, Pricing. https://www.voiceflow.com/pricing ↩ ↩² ↩³
Salesforce Inc., Salesforce Agentforce Pricing. https://www.salesforce.com/agentforce/pricing/ ↩ ↩² ↩³
Sierra AI (2024-12-10), Outcome-Based Pricing for AI Agents. https://sierra.ai/blog/outcome-based-pricing-for-ai-agents ↩ ↩² ↩³
a16z Enterprise Newsletter (2024-12-19), AI Is Driving A Shift Towards Outcome-Based Pricing. https://a16z.com/newsletter/december-2024-enterprise-newsletter-ai-is-driving-a-shift-towards-outcome-based-pricing/ ↩ ↩²
Alex Immerman & Santiago Rodriguez / a16z (2026-03-02), Good News: AI Will Eat Application Software. https://a16z.com/good-news-ai-will-eat-application-software/ ↩ ↩²
David George / a16z (2026-03-23), There Are Only Two Paths Left For Software. https://a16z.com/there-are-only-two-paths-left-for-software/ ↩ ↩²
Kate Rogers / CNBC (2026-01-28), ServiceNow (NOW) Q4 2025 Earnings Report. https://www.cnbc.com/2026/01/28/servicenow-now-q4-2025-earnings-report.html ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷
Alastair Woolcock / Gartner Inc. (2026-04-02), Gartner Expects Most Enterprises to Abandon Assistive AI for Outcome-Focused Workflow by 2028. https://www.gartner.com/en/newsroom/press-releases/2026-04-02-gartner-expects-most-enterprises-to-abandon-assistive-ai-for-outcome-focused-workflow-by-2028 ↩ ↩²
Emily Weiss / Gartner Inc. (2026-01-15), Gartner Predicts 60% of Brands Will Use Agentic AI to Deliver Streamlined One-to-One Interactions by 2028. https://www.gartner.com/en/newsroom/press-releases/2026-01-15-gartner-predicts-60-percent-of-brands-will-use-agentic-ai-to-deliver-streamlined-one-to-one-interactions-by-2028 ↩ ↩²
Bain & Company (2026-02-25), AI Enterprise: Code Red. http://bain.com/insights/ai-enterprise-code-red/ ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷
Sequoia Capital with Clay Bavor (2024-08-27), Training Data: Clay Bavor on Customer-Facing AI Agents. https://sequoiacap.com/podcast/training-data-clay-bavor/ ↩ ↩² ↩³ ↩⁴
McKinsey QuantumBlack (2025-12-16), Reimagining the Value Proposition of Tech Services for Agentic AI. https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/reimagining-the-value-proposition-of-tech-services-for-agentic-ai ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶
Deloitte Global (2026-05-01), Agentic AI: Orchestrating Intelligent Operations. https://www.deloitte.com/global/en/services/consulting/perspectives/agentic-ai-orchestrating-intelligent-operations.html ↩ ↩²
Fin.ai (2026), Enterprise AI Agent Deployment Models Compared. https://fin.ai/learn/enterprise-ai-agent-deployment-models-compared ↩ ↩²
Gartner Inc. (2025-03-05), Gartner Predicts Agentic AI Will Autonomously Resolve 80% of Common Customer Service Issues by 2029. https://www.gartner.com/en/newsroom/press-releases/2025-03-05-gartner-predicts-agentic-ai-will-autonomously-resolve-80-percent-of-common-customer-service-issues-without-human-intervention-by-20290 ↩ ↩²
CBC News (2024-02-15), Air Canada Found Liable for Chatbot's Bad Advice on Plane Tickets. https://www.cbc.ca/news/canada/british-columbia/air-canada-chatbot-customer-lawsuit-1.7116416 ↩ ↩² ↩³ ↩⁴
Tom Gerken / BBC News (2024-06-18), McDonald's Removes AI Drive-Throughs After Order Errors. https://www.bbc.com/news/articles/c722gne7qngo ↩ ↩² ↩³
Tom Gerken / BBC News (2024-01-19), DPD Error Caused Chatbot to Swear at Customer. https://www.bbc.com/news/technology-68025677 ↩ ↩² ↩³
Mallory Moench / TIME (2024-01-20), AI Chatbot Curses at Customer and Criticizes Work Company. https://time.com/6564726/ai-chatbot-dpd-curses-criticizes-company/ ↩ ↩²
Bryson Masse / VentureBeat (2023-12-19), A Chevy for $1? Car Dealer Chatbots Show Perils of AI for Customer Service. https://venturebeat.com/ai/a-chevy-for-1-car-dealer-chatbots-show-perils-of-ai-for-customer-service ↩ ↩² ↩³ ↩⁴ ↩⁵
Colin Lecher & Katie Honan / The Markup + THE CITY (2026-01-30), Mamdani to Kill the NYC AI Chatbot We Caught Telling Businesses to Break the Law. https://themarkup.org/artificial-intelligence/2026/01/30/mamdani-to-kill-the-nyc-ai-chatbot-we-caught-telling-businesses-to-break-the-law ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶
Steven J. Vaughan-Nichols / The Register (2026-04-28), Locked, Stocked, and Losing Budget: AI Vendor Lock-in Bites. https://www.theregister.com/software/2026/04/28/locked-stocked-and-losing-budget-ai-vendor-lock-in-bites/5229050 ↩ ↩² ↩³ ↩⁴ ↩⁵ ↩⁶ ↩⁷ ↩⁸ ↩⁹ ↩¹⁰ ↩¹¹ ↩¹² ↩¹³ ↩¹⁴ ↩¹⁵ ↩¹⁶ ↩¹⁷
Mayfield Fund (2026-01-19), The Agentic Enterprise in 2026. https://www.mayfield.com/the-agentic-enterprise-in-2026 ↩ ↩² ↩³ ↩⁴ ↩⁵