Autonomous AI Agent Market: Mid-2026 Vendor Assessment

An ace8 scorecard across 13 vendors | May 2026

May 12, 2026

It’s hard to believe that by May 2026 we’re already ‘used to’ OpenClaw in enterprise-level deployment conversations, but that’s the pace of this market. With security incidents on the rise, this space deserves close attention.

In the previous report, we established the Holon Model as an ideal conceptual model for evaluating autonomous agentic behavior. This report is a state-of-the-market survey of tools including Hermes, Claude Code, and Codex. The good news is that customers appreciate the baseline experience; in general, though, security and governance are nowhere near ready.

Today the autonomous agent is a billable line item on enterprise budgets, a regulated risk class at frontier labs, and the organizing problem of every model release. Six trends define the mid-2026 landscape:

1. The category bifurcated into three structurally distinct buyer segments. Open self-hosted runtimes (OpenClaw, Hermes Agent, AutoGPT, CrewAI) serve technical operators who want to own their stack. Managed enterprise infrastructure (Anthropic, OpenAI, Microsoft, Google, AWS, Salesforce, LangChain) serves organizations that need governance and SLAs. Domain-specific autonomous workers (Devin, Manus, Claude Cowork) serve specific jobs-to-be-done. These are not substitutes. Treating them as one market — as press coverage still does — produces the buyer confusion that drives Gartner’s projection that 40%+ of agentic AI projects will be cancelled by end of 2027 (Gartner, June 25, 2025).

2. Revenue has arrived; reliability has not caught up. Salesforce Agentforce hit $800M ARR (+169% YoY) in Q4 FY26 (Salesforce earnings, Feb 25, 2026). Microsoft 365 Copilot reached 20M paid seats by April 2026 (Microsoft FY26 Q3 earnings). Anthropic API revenue grew 17–70x YoY (Shashi.co, May 2026). Cognition’s Devin is deployed across ~52,000 developers between Goldman Sachs and Citi (American Banker). Yet METR’s task-length research shows 50%-reliable autonomous horizons are doubling every four months from a low base, MIT NANDA found 95% of enterprise GenAI pilots fail to deliver measurable P&L, and Princeton’s HAL benchmark pivoted to a “Reliability Dashboard” after concluding headline benchmark scores were systematically inflated.

3. Open standards won the protocol layer. Anthropic donated Model Context Protocol (MCP) to the Linux Foundation’s Agentic AI Foundation in December 2025 (Anthropic announcement); Google donated Agent2Agent (A2A) to the Linux Foundation in June 2025 (Linux Foundation press release). OpenAI, Microsoft, AWS, Salesforce, and Google have all adopted MCP (TechCrunch on Google MCP adoption). Pento estimates 97M monthly MCP SDK downloads (Pento.ai year-in-review). The fight has moved up the stack, to harnesses, governance, and customer success.

4. The frontier is durable H3, not H4. Despite marketing claims, the documented production frontier is reliable supervisor-to-task delegation with persistent memory and human approval gates on high-stakes actions. Claude Agent Teams, Devin’s “Managed Devins,” and AgentCore Runtime are the closest things to real H4. True peer-to-peer federated agent networks at scale remain a 2027 story.

5. The reference architecture is now legible. OpenClaw’s SOUL.md / HEARTBEAT.md / AGENTS.md / TOOLS.md / MEMORY.md decomposition (OpenClaw docs) and Hermes Agent’s five-pillar pattern (Memory, Skills, Soul, Crons, Experience Loop; Hermes Agent docs) have given the industry a shared vocabulary for what an agent actually needs to be: identity, scheduled behavior, delegation rules, tools, and bounded memory. Every serious product built on top of LLM runtimes in the next 24 months will inherit some version of this decomposition.

6. Customer sentiment is the single best leading indicator. Vendors with the largest gap between marketing and verified user experience — Manus most starkly (Trustpilot manus.im), OpenClaw in second place, Salesforce in third (The Information; TheStreet) — are the vendors with the most volatile commercial outcomes. Vendors whose sentiment compounds quietly (Hermes Agent, Claude Code, Devin in narrow domains) are the ones whose ARR is most likely to be real in 12 months.
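To make the doubling claim in trend 2 concrete, the sketch below projects a 50%-reliable task horizon forward under a fixed four-month doubling period. Only the doubling period comes from the text above; the function name and the 60-minute starting horizon are illustrative assumptions, not METR figures.

```python
def projected_horizon(initial_minutes: float, months_elapsed: float,
                      doubling_months: float = 4.0) -> float:
    """Project a task horizon under simple exponential doubling.

    doubling_months defaults to the four-month period cited in the report.
    """
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return initial_minutes * 2 ** (months_elapsed / doubling_months)


if __name__ == "__main__":
    start = 60.0  # hypothetical starting horizon in minutes, for illustration only
    for months in (0, 4, 8, 12, 24):
        print(f"{months:>2} months: {projected_horizon(start, months):.0f} minutes")
```

Under this simple model a horizon grows 8x in twelve months, so the starting base matters as much as the growth rate when judging whether a given workload is within reach.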

This report grades 13 vendors across five dimensions: Holon Depth (HD), Production Readiness (PR), Governance & Trust (GT), Developer Leverage (DL), and Customer Sentiment (CS) — using the H0–H5 holonic framework defined below.

Holon levels (H0–H5) Framework

See Holon Levels of Agentic LLM and Orchestration: A Vendor Taxonomy, by ace8.

Grading dimensions and scale

Holon Depth (HD) — deepest level reliably demonstrated in production
Production Readiness (PR) — GA status, disclosed ARR/customers, uptime, governance maturity
Governance & Trust (GT) — identity, observability, certifications, incident history
Developer Leverage (DL) — SDK quality, MCP/A2A interop, extensibility, ecosystem
Customer Sentiment (CS) — aggregated from G2, Trustpilot, Reddit, Hacker News, independent reviews

Grades: A = market-leading at scale | B = strong with documented gaps | C = uneven, material public failures | D = significantly behind peers
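For readers who want to track these grades in their own tooling, the five dimensions and four-letter scale above can be captured in a small record type. This is a minimal sketch: the class name, field names, and the example vendor are hypothetical, and the placeholder grades are not taken from this report.

```python
from dataclasses import dataclass
from typing import Dict

# Dimension codes and grade meanings as defined in the report.
DIMENSIONS = ("HD", "PR", "GT", "DL", "CS")
GRADE_MEANINGS = {
    "A": "market-leading at scale",
    "B": "strong with documented gaps",
    "C": "uneven, material public failures",
    "D": "significantly behind peers",
}


@dataclass(frozen=True)
class Scorecard:
    vendor: str
    grades: Dict[str, str]  # dimension code -> letter grade

    def __post_init__(self) -> None:
        # Every dimension must be graded, and only with a known letter.
        missing = [d for d in DIMENSIONS if d not in self.grades]
        unknown = [d for d in self.grades if d not in DIMENSIONS]
        invalid = [g for g in self.grades.values() if g not in GRADE_MEANINGS]
        if missing or unknown or invalid:
            raise ValueError(
                f"missing={missing} unknown={unknown} invalid={invalid}"
            )

    def describe(self, dimension: str) -> str:
        """Return the report's wording for the grade on one dimension."""
        return GRADE_MEANINGS[self.grades[dimension]]


if __name__ == "__main__":
    # Placeholder grades for a made-up vendor.
    card = Scorecard("ExampleVendor",
                     {"HD": "B", "PR": "A", "GT": "C", "DL": "B", "CS": "D"})
    print(card.vendor, "PR:", card.describe("PR"))
```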

Section 1: Open self-hosted runtimes

1.1 OpenClaw (Peter Steinberger / OpenClaw Foundation)

Category: Open self-hosted autonomous agent harness Max Holon Level: H3 (architecturally present; H2 in reliable production) Repo: github.com/openclaw/openclaw | License: MIT | Pricing: free software, BYO LLM API (~$30–150/mo typical)

Origin and trajectory. Originally shipped as Clawdbot in November 2025 by Peter Steinberger (github.com/steipete), rebranded to Moltbot in late January 2026 after Anthropic raised trademark concerns, and settled on OpenClaw on January 30, 2026 (Wikipedia: OpenClaw). The project hit ~295,000 GitHub stars by April 2026 (WebSearchAPI: Inside OpenClaw), overtaking React among runnable-software projects (OpenClaw blog: 250K Stars). Steinberger was hired by OpenAI in February 2026 while the project was spun out to a foundation. Lex Fridman dedicated podcast episode #491 to him. The Pragmatic Engineer ran a long-form feature titled “The creator of Clawd: ‘I ship code I don’t read.’”

Architecture. The workspace-kernel pattern wraps an LLM with a gateway, scheduler, and file-centric configuration. The instruction stack uses SOUL.md (identity), HEARTBEAT.md (scheduled behavior), AGENTS.md (delegation rules), TOOLS.md (capabilities), and MEMORY.md (state), plus a skill system using SKILL.md files at three precedence levels: bundled, globally installed, and workspace-scoped (Ken Huang: OpenClaw Design Patterns). ClawHub is the public skill registry. The gateway routes across WhatsApp, Telegram, Slack, email, Discord, and custom webhooks. The mascot is a lobster named Molty.
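The three-level skill precedence can be pictured as a layered lookup in which a later layer shadows an earlier one. The sketch below is not OpenClaw’s code: the source names the three levels but not which one wins, so the order used here (workspace over global over bundled) is an assumption, as are the function name and the example paths.

```python
from typing import Dict, Mapping

# Assumed order, lowest to highest priority. Which level actually wins in
# OpenClaw is not stated in the report; this ordering is a guess.
PRECEDENCE = ("bundled", "global", "workspace")


def resolve_skills(layers: Mapping[str, Mapping[str, str]]) -> Dict[str, str]:
    """Merge skill-name -> SKILL.md path maps, letting later layers override."""
    resolved: Dict[str, str] = {}
    for level in PRECEDENCE:
        resolved.update(layers.get(level, {}))
    return resolved


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    layers = {
        "bundled": {"summarize": "bundled/summarize/SKILL.md",
                    "calendar": "bundled/calendar/SKILL.md"},
        "workspace": {"summarize": "workspace/skills/summarize/SKILL.md"},
    }
    for name, path in sorted(resolve_skills(layers).items()):
        print(name, "->", path)
```

A layered lookup of this kind is also why registry hygiene matters: whichever layer wins can silently replace a trusted skill of the same name.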

The Anthropic dispute. On April 4, 2026, Anthropic announced that Claude Code subscriptions would no longer cover third-party harnesses including OpenClaw — users had to switch to pay-as-you-go API access (TechCrunch, April 4, 2026). TechCrunch followed up on April 10, 2026 reporting that Steinberger’s own Claude account was briefly suspended before being reinstated (TechCrunch, April 10, 2026). VentureBeat called Anthropic’s competing Channels feature “an OpenClaw killer” (VentureBeat).

Documented security posture. This is the report’s largest single concern. Figures from Steinberger’s April 2026 “State of the Claw” talk: 1,142 security advisories were filed in the project’s first five months, roughly 16.6/day and more than double the Linux kernel’s rate (WebSearchAPI). An audit found ~12% of submitted ClawHub skills contained malicious code. 155,000+ publicly exposed OpenClaw instances were identified. 36% of ClawHub marketplace skills were found to contain prompt injections. Cisco’s AI security team tested a third-party ClawHub skill and documented prompt injection plus data exfiltration without user awareness (Medium: Your OpenClaw Agent Just Leaked Its Secrets). In March 2026 the Chinese government restricted state agencies and SOEs from using OpenClaw, citing security and energy concerns (Wikipedia: OpenClaw).

1.2 Hermes Agent (Nous Research)

Category: Open self-hosted self-improving autonomous runtime Max Holon Level: H3 Repo: github.com/NousResearch/hermes-agent | License: MIT | Pricing: free; works against any LLM API

Origin and trajectory. Released by Nous Research on approximately February 25, 2026 with the tagline “The agent that grows with you” (Nous Research on X). Nous Research (github.com/nousresearch) is the lab behind the Hermes, Nomos, and Psyche model families; Hermes Agent is explicitly a distinct product. Latest release as of this writing is v0.13.0 (“Tenacity Release”), shipped May 7, 2026 (releases page), with multi-agent Kanban, a /goal Ralph-loop primitive, Checkpoints v2, eight P0 security fixes, Google Chat as the 20th supported messaging platform, and seven i18n locales.

Architecture — five pillars. Memory (FTS5 full-text search across past sessions plus Honcho dialectic user modeling; bounded and consolidated rather than unbounded context accumulation), Skills (auto-created after 5+ tool-call tasks, agentskills.io-compatible, with an Autonomous Curator process for periodic consolidation), Soul (SOUL.md persona files), Crons (natural-language scheduled behavior via the gateway), and Experience Loop (skills patched during use, plus the separate hermes-agent-self-evolution repo using DSPy + GEPA) — documented at hermes-agent.nousresearch.com/docs and the hermes-agent README. The critical architectural difference from OpenClaw is the memory model: Hermes compresses and consolidates rather than accumulating, and turns repeated behavior into reusable skills.
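The memory pillar’s reliance on FTS5 is easy to picture with SQLite’s built-in full-text index. The sketch below is not Hermes Agent’s schema: the table name, column names, and sample sessions are hypothetical, and it assumes the local SQLite build ships with the FTS5 extension enabled.

```python
import sqlite3
from typing import Iterable, List, Tuple


def build_session_index(sessions: Iterable[Tuple[str, str]]) -> sqlite3.Connection:
    """Create an in-memory FTS5 index over (session_id, body) pairs."""
    conn = sqlite3.connect(":memory:")
    # session_id is stored but not tokenized; only body is searchable.
    conn.execute(
        "CREATE VIRTUAL TABLE session_notes USING fts5(session_id UNINDEXED, body)"
    )
    conn.executemany(
        "INSERT INTO session_notes (session_id, body) VALUES (?, ?)", sessions
    )
    conn.commit()
    return conn


def search_sessions(conn: sqlite3.Connection, query: str, limit: int = 5) -> List[str]:
    """Return ids of past sessions matching the query, best match first."""
    rows = conn.execute(
        "SELECT session_id FROM session_notes "
        "WHERE session_notes MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = build_session_index([
        ("s1", "deployed the staging server with docker"),
        ("s2", "drafted the quarterly budget email"),
        ("s3", "rotated docker registry credentials"),
    ])
    print(search_sessions(conn, "docker"))
```

Searching an index like this returns a handful of relevant past sessions instead of replaying every transcript into the context window, which is the practical meaning of bounded memory.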

Capabilities. CLI plus an Ink/React-based TUI (hermes --tui) (quickstart docs). Messaging gateway across 20+ platforms including Telegram, Discord, Slack, WhatsApp, Signal, DingTalk, SMS via Twilio, Mattermost, Matrix, Webhook, Email IMAP/SMTP, Home Assistant, Feishu/Lark, WeCom, BlueBubbles iMessage, Microsoft Teams, Google Chat, and LINE. Seven terminal sandboxing backends: local, Docker, SSH, Singularity, Modal, Daytona, Vercel Sandbox. An OpenAI-compatible /v1/chat/completions server, REST cron job management, MCP support, and ACP-based IDE integration for VS Code, Zed, and JetBrains. Community resources catalogued at awesome-hermes-agent; independent documentation mirror at mudrii/hermes-agent-docs.

Direct competitive positioning vs OpenClaw. The README includes a built-in hermes claw migrate command that detects ~/.openclaw and imports OpenClaw settings, memories, skills, and API keys. A community bridge called “HermesClaw” runs both agents on the same WeChat account. Turing Post and MindStudio both published comparison pieces.

Scorecard

1.3 AutoGPT (Significant Gravitas)

Category: Open-source autonomous agent framework Max Holon Level: H2–H3 Repo: github.com/Significant-Gravitas/AutoGPT | License: mixed (MIT + Polyform Shield for autogpt_platform/) | Pricing: free tier; paid from ~$25/mo

Current state. AutoGPT has matured from its viral 2023 prototype into AutoGPT Platform — a visual workflow builder plus modular block backend with MCP support, credit-billed execution, distributed Redis locking, durable execution, and a marketplace (PyShine technical overview). 184,055 stars and 46,246 forks as of May 7, 2026 (releases page). Hosted at platform.agpt.co.

Sentiment. Independent reviewers describe AutoGPT as “pioneering but aging. Other frameworks have surpassed it for production use cases” (Tencent Cloud techpedia). Pasquale Pillitteri’s 2026 OSS review: “Today it is no longer the ‘first cult autonomous agent’ that burned through GPT-4 tokens uncontrollably, but a mature platform for building reusable agents… it suffers compared to LangGraph in production scenarios.” Vibeagentmaking retrospective on the 100K-stars milestone.

Scorecard

1.4 CrewAI

Category: Multi-agent OSS framework Max Holon Level: H3 Repo: github.com/crewAIInc/crewAI | License: MIT | Pricing: free; Professional from $25/mo; Enterprise custom

Current state. 50.8k GitHub stars, 7k forks, release 1.14.5a3 on May 6, 2026 (Panto AI statistics). ~27M total PyPI downloads with roughly 5M monthly. Raised $18M total in October 2024 (Pulse 2.0); PitchBook indicates total funding to $44.5M across 14 investors.

Architecture pitch. “Agents as team members with roles.” A 2026 cross-framework review: “CrewAI thinks in teams… if your problem maps to a team analogy, CrewAI will feel natural and productive.” Reported drawbacks in Vibecoding review: high token consumption, limited OSS observability.

Adoption. CrewAI claims “used by nearly half of the Fortune 500” (Insight Partners launch announcement) — self-reported. 2 billion agentic executions in the prior 12 months as of Jan 2026. 150 beta enterprise customers in <6 months following enterprise launch.

Scorecard

Section 2: Managed enterprise agent infrastructure

2.1 Anthropic — Claude Code, Cowork, Computer Use, Agent SDK, MCP

Category: Managed frontier-model agent infrastructure Max Holon Level: H3–H4 (H4 via Agent Teams; H3 for managed workflows) Pricing: Free; Pro $20/mo; Max 5x $100/mo; Max 20x $200/mo; Team Premium $125/mo (5-seat min); Enterprise custom (Claude pricing)

Position. Anthropic is the de facto coding-agent leader by benchmark and developer revealed preference. Claude Opus 4.5 (Nov 24, 2025) scored 80.9% on SWE-bench Verified, 37.6% on ARC-AGI-2, and 59.3% on Terminal-Bench (Vellum benchmarks) while cutting Opus pricing 67% from $15/$75 to $5/$25 per MTok (Claude Fast). Sonnet 4.6 (Feb 17, 2026) and Opus 4.7 (April 16, 2026) extended the run (Vellum Opus 4.7 explained). At the May 6, 2026 Code w/ Claude event, Anthropic disclosed API volume up 17–70x YoY (Shashi.co).

Claude Code. Subagents (markdown-defined, isolated context windows), Skills (Oct 15, 2025), Plugins (marketplace launched April 2026 — Build with Claude), Hooks, and a Channels Telegram/Discord interface via MCP (VentureBeat). The April 22, 2026 misstep — briefly removing Claude Code from the Pro plan as “a small test on ~2% of new prosumer signups” — was reverted within hours (The Register). Reddit sentiment analysis at AI Tool Discovery.

Claude Cowork. Announced as a research preview in January 2026, GA on macOS and Windows in April 2026 as “Claude Code for the rest of your work” (Claude Cowork product page; DataCamp tutorial). Architecture uses Apple’s VZVirtualMachine framework with a custom Linux root filesystem sandbox per Simon Willison’s first impressions. Tygart Media on April/May 2026 Claude updates.

Computer Use. OSWorld trajectory: 14.9% (Oct 2024) → 61.4% (Sonnet 4.5, Sep 2025) → 72.5% (Sonnet 4.6) → 72.7% (Opus 4.6) → ~78% (Opus 4.7) (Anthropic 3.5 Models; Coasty OSWorld 2026 results; TokenMix Computer Use API 2026). Suprmind on Claude features 2026.

Claude for Microsoft 365. GA across Excel, PowerPoint, and Word with Outlook in public beta around March 11, 2026 (Intellectia). Claude for Chrome covered by DataCamp.

Agent SDK. Renamed from Claude Code SDK (migration guide). @anthropic-ai/claude-agent-sdk with ~1,106 dependent npm packages. DataCamp SDK tutorial.

MCP — the open standard play. Launched November 2024. Adopted by OpenAI in March 2025 (Zuplo year of MCP), Google DeepMind in April 2025 (TechCrunch), Microsoft in Windows 11 and Microsoft 365. In December 2025 Anthropic donated MCP to the Agentic AI Foundation under the Linux Foundation (Anthropic announcement). 10,000+ active public MCP servers; Pento estimates 97M monthly SDK downloads (Pento year of MCP).

Enterprise customers. ServiceNow (Jan 28, 2026) — Claude is default for ServiceNow Build Agent, rolled to 29,000+ employees (Anthropic ServiceNow announcement). InfoQ on Sonnet 4.5 SWE-bench Verified leadership. Release notes catalogued at Claude Help Center and Releasebot.

Scorecard

2.2 OpenAI — ChatGPT agent, Codex, AgentKit

Category: Managed frontier-model agent infrastructure Max Holon Level: H3 Pricing: ChatGPT Plus $20/mo, Pro $200/mo; Codex bundled across tiers; AgentKit at standard API pricing

Position. Operator (Jan 23, 2025 — Wikipedia) and Deep Research (OpenAI announcement) merged into ChatGPT agent on July 17, 2025 (TechCrunch; TechRadar live coverage). Codex now reports 3–4M weekly users and 9M+ paying business users (OpenAI Codex flexible pricing). ChatGPT agent release notes.

ChatGPT agent benchmarks (OpenAI claims). Humanity’s Last Exam 41.6% pass@1, BrowseComp 68.9% SOTA, SpreadsheetBench 45.5% vs Copilot’s 20.0%, DSBench modeling 85.5% (Belitsoft on ChatGPT agent). First product designated “high capability” in biological/chemical risk under OpenAI’s Preparedness Framework.

Codex evolution. codex-1 (May 2025) → GPT-5-Codex (Sept 2025, 74.5% SWE-bench Verified by OpenAI vs 69.4% by Vals.ai) → GPT-5.1-Codex-Max → GPT-5.3-Codex (SWE-bench Pro leaderboard). Quantumrun Codex Explained. 9 Must-Have Skills for Codex in 2026. Plugin governance feature: MLQ.

AgentKit (DevDay Oct 6, 2025). OpenAI announcement; Digital Applied step-by-step guide. Codex product page.

APIs and SDK. Responses API replaced Assistants API (deprecation page; migration guide). Predecessor framework: OpenAI Swarm.

Sentiment. Early Operator coverage was harsh — TweakTown aggregating Reddit: “Operator is quite simply too slow, expensive, and error-prone.” This triggered absorption into ChatGPT agent.

Scorecard

2.3 Microsoft — Copilot Studio, M365 Copilot, GitHub Copilot, Foundry

Category: Managed enterprise agent infrastructure (volume leader) Max Holon Level: H3–H4 Pricing: Copilot Studio $200/mo for 25,000 credits; M365 Copilot $30/user/mo

Position. 20M paid M365 Copilot seats by April 2026 (CNBC on FY26 Q3; Microsoft FY26 Q2 earnings); 4.7M paid GitHub Copilot subscribers; 230,000+ Copilot Studio organizations and 400,000+ custom agents built in one quarter (Microsoft 2025 Annual Report; M365 sales agents announcement). But share-of-preference is contracting per Perspectives.plus citing Recon Analytics.

The “Frontier Firm” stack. Ignite 2025 Book of News; Ignite 2025 M365 blog post; Hong Kong recap. Microsoft Agent Framework 1.0 consolidates AutoGen (now in maintenance mode — Discussion #7066) and Semantic Kernel.

M365 Copilot agents. Researcher and Analyst reached GA May 30, 2025, capped at 25 combined queries per user per month (M365 Admin). Plus Facilitator, Interpreter, and more (Redmondmag agent enhancements; November 2025 Copilot Studio updates). Copilot Pre-Purchase Plan; Copilot Studio licensing; Microsoft Learn licensing; HSO Copilot vs Copilot Studio.

GitHub Copilot agent mode. GA April 2025; SWE-bench Verified 56.0% with Claude 3.7 Sonnet per GitHub’s official disclosure (GitHub Blog; Developer Tech; coding agent press release). Tech Insider GitHub Copilot vs Cursor 2026. GitHub Spark: Cryptopolitan; GitHub Spark feature page; Medium overview; Hackster.io; Microsoft Community Hub.

Foundry. Azure AI Foundry Agent Service pricing; Foundry Models pricing; Microsoft Foundry pricing. Microsoft Copilot actions and agents IT post.

Critique. Ragnar Heil’s “Brutally Honest Review” (late 2025): “When placing a Copilot Studio agent in a managed solution, you suddenly get vague SQL errors.” Per Xenoss summarizing Gartner field work: “Only 5% of organizations moved from a pilot to larger-scale deployments.” Practitioner thread on outages: Microsoft Q&A. Microsoft FY25 Q4 earnings call.

Scorecard

2.4 Google — Gemini Enterprise, ADK, A2A, Antigravity, Jules

Category: Managed enterprise agent infrastructure (openness leader) Max Holon Level: H3–H4 Pricing: Gemini Business $21/user/mo; Gemini Enterprise $30/user/mo (Revolgy guide)

Position. A2A Protocol donated to Linux Foundation June 23, 2025. ADK docs repo; adk-python (17.6K stars, 3.2K dependents). Agent Development Kit documentation.

Project Mariner discontinued. Shut down May 4, 2026 (Digital Trends; Android Authority; Android Headlines). Tech folded into Gemini Agent (9to5Google; Android Central).

Gemini Enterprise. Launched October 9, 2025 (Google Cloud blog; CNBC; TechRepublic; Business Chief; Technology Magazine; AI Magazine; Max Productive AI). 48 languages, IL5 and FedRAMP High authorizations. UI Bakery 2026 Vertex Agent Builder guide.

Cloud Next ‘26. Vertex AI rebrand to “Gemini Enterprise Agent Platform” (Promevo recap; Google Cloud welcome post; Cloud Next wrap-up; Google blog Cloud Next ‘26 highlights).

Jules. Out of beta August 6, 2025 (SiliconANGLE; TechCrunch). Gemini 3.1 Pro Jules score on SWE-bench Verified: 80.6% (Morph LLM 14 Best AI Coding Agents 2026).

Antigravity. Launched November 18, 2025 alongside Gemini 3 (The New Stack; Wikipedia; Google Developers blog).

Gemini model lineup. Gemini 3 blog post; ALM Corp Gemini 3.1 Pro complete guide; Vellum Gemini 3 benchmarks. NotebookLM agentic evolution: Medium 2023–2026 analysis; Jorgep on Gemini Notebooks vs NotebookLM; DigitalOcean.

Scorecard

2.5 AWS — Bedrock AgentCore

Category: Managed enterprise agent infrastructure (consumption-priced) Max Holon Level: H3–H4 Pricing: Runtime $0.0895/vCPU-hour + $0.00945/GB-hour (AWS pricing)

Position. AgentCore launched July 16, 2025 (AWS launch blog) and reached GA October 13, 2025. SDK was downloaded 1M+ times by GA day. AWS production-ready agents at scale; VentureBeat coverage.

Architecture — seven services. Runtime (8-hour async session, Firecracker microVM isolation), Memory, Identity (OAuth + Cognito/Entra ID/Okta), Gateway (turns Lambda/APIs into MCP tools), Browser (sandboxed with Live View + Session Replay), Code Interpreter, Observability (81 metrics via CloudWatch, OpenTelemetry).
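The consumption pricing quoted for the Runtime service lends itself to a quick back-of-envelope estimate. The sketch below uses the two listed rates and assumes, for simplicity, that vCPU and memory are billed at a flat allocation for the whole session; actual billing granularity may differ, and the function name and the 1 vCPU / 2 GB sizing are illustrative assumptions.

```python
# Rates as listed in this report's pricing line for AgentCore Runtime.
VCPU_HOUR_USD = 0.0895
GB_HOUR_USD = 0.00945


def runtime_cost_usd(vcpus: float, memory_gb: float, hours: float) -> float:
    """Estimate session cost assuming flat vCPU and memory use for the whole run."""
    if min(vcpus, memory_gb, hours) < 0:
        raise ValueError("inputs must be non-negative")
    return hours * (vcpus * VCPU_HOUR_USD + memory_gb * GB_HOUR_USD)


if __name__ == "__main__":
    # Hypothetical sizing: one vCPU and 2 GB held for a full 8-hour session.
    cost = runtime_cost_usd(vcpus=1, memory_gb=2, hours=8)
    print(f"Estimated cost per 8-hour session: ${cost:.4f}")
```

At these rates the compute term dominates, so an estimate is far more sensitive to vCPU time than to memory footprint.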

Public incident. BeyondTrust/Phantom Labs disclosed a DNS-based data leakage flaw in AgentCore Code Interpreter sandbox mode (Hackread). Community sentiment on Gateway: Hacker News.

Scorecard

2.6 Salesforce — Agentforce

Category: Managed application-layer agent infrastructure (revenue leader) Max Holon Level: H3–H4 Pricing: Flex Credits at $500 per 100,000 credits (Salesforce Flexible Pricing; Magicfuse 2026 cost guide; Oliv AI breakdown)
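The Flex Credits list price converts directly to a per-credit cost, which makes rough budgeting straightforward. In the sketch below only the $500 per 100,000 credits figure comes from the pricing line above; the function name, the credits-per-action value, and the monthly volume are hypothetical inputs a buyer would replace with their own contract terms.

```python
# List price from the report: $500 buys 100,000 Flex Credits.
PACK_PRICE_USD = 500
PACK_SIZE_CREDITS = 100_000


def flex_credit_cost_usd(credits: int) -> float:
    """Convert a credit count to list-price dollars."""
    if credits < 0:
        raise ValueError("credits must be non-negative")
    return credits * PACK_PRICE_USD / PACK_SIZE_CREDITS


if __name__ == "__main__":
    credits_per_action = 20   # hypothetical consumption rate, not from the report
    actions_per_month = 1_000  # hypothetical workload
    total = flex_credit_cost_usd(credits_per_action * actions_per_month)
    print(f"Per credit: ${flex_credit_cost_usd(1):.3f}")
    print(f"Monthly estimate: ${total:.2f}")
```

At list price each credit costs half a cent, so the budget question reduces to how many credits a typical agent action consumes.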

Position. Agentforce ARR reached $800M, up 169% YoY in Q4 FY26 (Salesforce earnings; SEC 8-K).

Version history. Agentforce 1.0 (Oct 29, 2024) → 2.0 (Dec 2024) → 2dx (March 2025) → 3 (Salesforce June 23, 2025) → Agentforce 360 (Oct 13, 2025) → Agentforce Operations (April 29, 2026) (Pulse 2.0 coverage).

Customer evidence. Wiley case studies on CX Today; Agentforce in Action: customer success stories. Salesforce on combating hallucinations. Benioff layoffs framing: CNBC; CIO Dive on IT automation pivot; Salesforce HR extension into Workday partnership.

The hype-vs-reality discourse. Salesforce Ben on Agentforce hallucinations; TheStreet on customer backlash; CIO on Salesforce layoffs and exec churn.

G2 reviews. Agentforce ranked #1 Agentic AI Product 2026 at Salesforce G2 Awards page. Agentforce Reviews G2; Agentforce Sales reviews; Agentforce Service reviews.

Analyst recognition. Salesforce named leader in 2025 IDC MarketScape.

Scorecard

2.7 LangChain / LangGraph

Category: Open-source agent framework with managed observability layer Max Holon Level: H3–H4 Pricing: LangSmith Developer free; Plus $39/seat/mo; Enterprise custom (LangChain pricing; LangSmith pricing FAQ; MetaCTO breakdown; Coverge 2026 tier analysis; ZenML LangGraph pricing)

Position. LangChain $125M Series B at $1.25B valuation Oct 20, 2025 (Sacra LangChain profile). Revenue trajectory: $8.5M (June 2024) → $16M ARR (Oct 2025) (Latka). Sequoia: From Agent 0-to-1 to Agentic Engineering.

LangGraph 1.0. Shipped October 2025. LangChain GitHub organization. The legacy critique receded but is documented at Medium: Challenges and Criticisms of LangChain.

Enterprise customers. Klarna case study: 80% faster resolution. LangGraph used in production; Built with LangGraph. Best open-source frameworks overview: Firecrawl 2026.

Scorecard

Section 3: Domain-specific autonomous workers

3.1 Cognition AI — Devin

Category: Domain-specific autonomous software engineer Max Holon Level: H4 (Managed Devins) | H3 (typical use) Pricing: Free; Pro $20/mo; Max $200/mo; Teams $80/mo; Enterprise custom (AI Agent Square Devin review)

Position. Cognition funding and growth blog; Sacra company profile. Digitalapplied complete guide. Awesome AI agents catalog: GitHub.

Architecture. Devin 2.2 released March 1, 2026 (Cognition blog). Managed Devins demonstrates real H4 delegation in isolated VMs. Long-running agent technical analysis: Zylos Research.

Benchmarks. Devin’s original 2024 number — 13.86% end-to-end on SWE-bench — has never been formally updated for Devin 2.x on Verified (AI Code Review SWE-bench leaderboard). Rapid Claw AI Agent Framework Scorecard 2026.

Enterprise deployments. Citi: American Banker. Goldman Sachs, Nubank, Ramp, Dell, Cisco, Palantir, Mercado Libre, Santander, NASA, Microsoft, OpenSea, Gumroad — named across Cognition’s funding blog.

Customer sentiment. The defining critique is Answer.AI’s January 8, 2025 evaluation by Hamel Husain, Isaac Flath, and Johno Whitaker: “Out of 20 tasks we attempted, we saw 14 failures, 3 inconclusive results, and just 3 successes.” Jeremy Howard tweeted (X post): “We tried really really hard to make Devin (the coding agent) work for us. But it didn’t… We remain less than bullish on agents.” The Register summary: “‘First AI software engineer’ is bad at its job: Nailed just 15% of assigned tasks.” Hacker News thread on Answer.AI evaluation (item 42734681).

Scorecard

3.2 Manus AI (Butterfly Effect)

Category: Open-ended autonomous task agent Max Holon Level: H2–H3 Pricing: Manus Plans & Pricing; Spectrum AI Labs 2026 cost guide; Lindy 2026 plan breakdown; Fello AI plans explained; Get AI Perks credits explained; Electro IQ statistics; Manus AI team page

Position — the most extreme arc of the past year. Manus context: Substack: Is Manus the ‘DeepSeek Moment’; Substack: Meta’s Manus Acquisition Playbook; Wikipedia: Manus (AI agent). NDRC unwound the Meta deal on April 27, 2026 (UPI; CNBC).

Customer sentiment — the defining problem. Trustpilot fragmented across three profiles: main manus.im profile (121 reviews); page 4 of reviews; manus-ai.sbs profile. Independent test: Rio Times: 14 Failures in Two Weeks of Testing.

Scorecard

Section 4: Consolidated grading matrix

Section 5: Customer sentiment summary

Section 6: Structural observations

6.1 The hype-reality gap is largest in open self-hosted runtimes. OpenClaw achieved the fastest GitHub star accumulation of any runnable-software project in history, yet community threads consistently conclude it has no reliable production use cases for general autonomy. This isn’t a failure of the underlying technology — it’s a structural property of the category. The 30% of OpenClaw users who migrated to Hermes Agent cite better memory defaults and “actually getting stuff done instead of debugging.”

6.2 Governance is the real H5 gap across every vendor. No vendor in this study offers a credible H5 governance membrane. Microsoft’s Entra Agent ID and Defender for Agents come closest. Anthropic’s MCP-plus-prompt-injection-mitigations posture is the model-layer equivalent. Devin’s planning + PR gates are the most explicitly governance-structured at the domain level. But none provides the fleet-level RBAC, audit trails, entitlement scoping, and policy enforcement that enterprise buyers in regulated industries actually need.

6.3 The real production frontier is reliable H3, not H4. Despite H4 marketing claims, the documented production frontier across all vendors is durable H3. Customers expecting H4 autonomy and getting H2 results are the primary driver of negative sentiment patterns in OpenClaw, Manus, and AutoGPT. METR’s underlying time-horizon framing: EA Forum summary; AI Digest Moore’s Law for agents.

6.4 Self-hosted vs managed is the primary buyer segmentation. Every vendor with strong positive customer sentiment is either fully managed (Anthropic, Claude Cowork, Devin in narrow domains) or architecturally disciplined enough to be stable on $5 VPS infrastructure (Hermes Agent).

6.5 OpenClaw is still the most important reference architecture in the category. Despite security posture and production limitations, OpenClaw matters more than any other open project for understanding what the autonomous agent category is. The SOUL.md / HEARTBEAT.md / AGENTS.md / TOOLS.md / MEMORY.md decomposition is the most legible model of what an agent needs to be. Hermes Agent’s five-pillar pattern is a direct architectural answer.

6.6 The category is splitting decisively. Counter-narratives have hardened into thought leadership: Dwarkesh interview with Andrej Karpathy: “AGI is still a decade away”; Utkarsh Kanwat: Why I’m Betting Against AI Agents in 2025; Gary Marcus comments thread. Anthropic’s own Project Vend (TechCrunch on terrible business owner experiment; Maxpool on Vending-Bench) and the Klarna AI reversal are the canonical cautionary case studies. Air Canada chatbot ruling: McCarthy Tétrault.

Full source list

Vendor product pages, pricing, and official docs

Anthropic / Claude:

Claude pricing: https://claude.com/pricing
Claude Cowork product page: https://claude.com/product/cowork
Claude Code subagents: https://code.claude.com/docs/en/sub-agents
Claude API release notes: https://platform.claude.com/docs/en/release-notes/overview
Claude Help Center release notes: https://support.claude.com/en/articles/12138966-release-notes
Build with Claude marketplace: https://buildwithclaude.com/

Claude Agent SDK migration: https://platform.claude.com/docs/en/agent-sdk/migration-guide
npm: @anthropic-ai/claude-agent-sdk: https://www.npmjs.com/package/@anthropic-ai/claude-agent-sdk
DataCamp Claude Agent SDK tutorial: https://www.datacamp.com/tutorial/how-to-use-claude-agent-sdk
DataCamp Claude Cowork tutorial: https://www.datacamp.com/tutorial/claude-cowork-tutorial
DataCamp Claude for Chrome: https://www.datacamp.com/tutorial/claude-for-chrome-ai-powered-browser-assistance-automation
Anthropic MCP donation: https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation
Anthropic 3.5 models + Computer Use: https://www.anthropic.com/news/3-5-models-and-computer-use
Anthropic ServiceNow partnership: https://www.anthropic.com/news/servicenow-anthropic-claude
Tygart Media April–May 2026 Claude updates: https://tygartmedia.com/claude-updates-april-2026/
Simon Willison Cowork first impressions: Simon Willison’s Newsletter, “First impressions of Claude Cowork, Anthropic’s general agent”

Suprmind Claude features 2026: https://suprmind.ai/hub/claude/features/
TokenMix Computer Use API 2026: https://tokenmix.ai/blog/claude-computer-use-api-2026
Coasty OSWorld 2026 results: https://coasty.ai/blog/osworld-benchmark-results-2026-who-actually-wins
Coasty OSWorld ranked: https://coasty.ai/blog/osworld-benchmark-results-2026-computer-use-ranked
Vellum Claude Opus 4.5 benchmarks: https://www.vellum.ai/blog/claude-opus-4-5-benchmarks
Vellum Claude Opus 4.7 explained: https://www.vellum.ai/blog/claude-opus-4-7-benchmarks-explained
Claude Fast Opus 4.5: https://claudefa.st/blog/models/claude-opus-4-5
Max Productive AI Claude review 2026: https://max-productive.ai/ai-tools/claude/
Intellectia: Claude for Office apps: https://intellectia.ai/news/stock/anthropic-launches-claude-for-office-apps
Shashi.co Anthropic Platform Bet: https://www.shashi.co/2026/05/anthropics-platform-bet-code-with.html
Releasebot Anthropic updates: https://releasebot.io/updates/anthropic
InfoQ Sonnet 4.5 SWE-bench: https://www.infoq.com/news/2025/10/claude-sonnet-4-5/
VentureBeat “OpenClaw killer” Channels: https://venturebeat.com/orchestration/anthropic-just-shipped-an-openclaw-killer-called-claude-code-channels
TechCrunch April 4: subscription block: https://techcrunch.com/2026/04/04/anthropic-says-claude-code-subscribers-will-need-to-pay-extra-for-openclaw-support/
TechCrunch April 10: Steinberger ban: https://techcrunch.com/2026/04/10/anthropic-temporarily-banned-openclaws-creator-from-accessing-claude/
The Register: Pro tier removal: https://www.theregister.com/2026/04/22/anthropic_removes_claude_code_pro/
AI Tool Discovery: Claude Code Reddit: https://www.aitooldiscovery.com/guides/claude-code-reddit

OpenAI:

ChatGPT agent release notes: https://help.openai.com/en/articles/11794368-chatgpt-agent-release-notes
Codex flexible pricing: https://openai.com/index/codex-flexible-pricing-for-teams/
Codex product page: https://openai.com/codex/
AgentKit announcement: https://openai.com/index/introducing-agentkit/
Deep Research announcement: https://openai.com/index/introducing-deep-research/
GPT-5.1-Codex-Max: https://openai.com/index/gpt-5-1-codex-max/
OpenAI deprecations: https://developers.openai.com/api/docs/deprecations
Responses API migration: https://platform.openai.com/docs/guides/migrate-to-responses
OpenAI Operator (Wikipedia): https://en.wikipedia.org/wiki/OpenAI_Operator
TechCrunch July 17 ChatGPT agent: https://techcrunch.com/2025/07/17/openai-launches-a-general-purpose-agent-in-chatgpt/
TechRadar July 17 live coverage: https://www.techradar.com/news/live/openai-july-17-announcement-live-event
Belitsoft on ChatGPT agent: https://belitsoft.com/news/chatgpt-agent-openai-20250717
TweakTown on Operator: https://www.tweaktown.com/news/102985/ai-agents-like-openais-operator-have-long-way-to-go-before-replacing-humans/index.html
Quantumrun Codex Explained: https://www.quantumrun.com/consulting/openai-codex/
Medium GPT-5-Codex vs Sonnet 4.5: https://medium.com/@leucopsis/how-gpt-5-codex-compares-to-claude-sonnet-4-5-1c1c0c2120b0
Medium 9 must-have Codex skills 2026: https://medium.com/@unicodeveloper/9-must-have-skills-for-codex-in-2026-b5124b375eec
MLQ Codex plugin governance: https://mlq.ai/news/openai-introduces-plugin-feature-for-codex-for-enterprise-ai-coding-governance/
Digital Applied AgentKit guide: https://www.digitalapplied.com/blog/openai-agentkit-complete-guide
MorphLLM OpenAI Swarm guide: https://www.morphllm.com/openai-swarm

Microsoft:

Copilot Studio pricing: https://www.microsoft.com/en-us/microsoft-365-copilot/pricing/copilot-studio
Microsoft Foundry pricing: https://azure.microsoft.com/en-us/pricing/details/microsoft-foundry/
Azure AI Foundry Agent Service pricing: https://azure.microsoft.com/en-us/pricing/details/azure-ai-agent-service/
Foundry Models pricing: https://azure.microsoft.com/en-us/pricing/details/ai-foundry-models/microsoft/
Microsoft FY26 Q2 earnings: https://www.microsoft.com/en-us/investor/events/fy-2026/earnings-fy-2026-q2
Microsoft FY25 Q4 earnings: https://www.microsoft.com/en-us/investor/events/fy-2025/earnings-fy-2025-q4
Microsoft 2025 Annual Report: https://www.microsoft.com/investor/reports/ar25/index.html
CNBC FY26 Q3 earnings: https://www.cnbc.com/2026/04/29/microsoft-msft-q3-earnings-report-2026.html
Microsoft 365 sales agents announcement: https://www.microsoft.com/en-us/microsoft-365/blog/2025/03/05/new-sales-agents-accessible-in-microsoft-365-copilot-help-teams-close-more-deals-faster/
Microsoft Copilot actions and agents IT post: https://www.microsoft.com/en-us/microsoft-365/blog/2024/11/19/introducing-copilot-actions-new-agents-and-tools-to-empower-it-teams/
Microsoft Ignite 2025 Hong Kong: https://news.microsoft.com/en-hk/2025/11/19/microsoft-ignite-2025-empowering-the-frontier-firm-with-ai-agents-and-copilot/
Microsoft Ignite 2025 Frontier Firm M365 post: https://www.microsoft.com/en-us/microsoft-365/blog/2025/11/18/microsoft-ignite-2025-copilot-and-agents-built-to-power-the-frontier-firm/
Microsoft Ignite 2025 Book of News: https://news.microsoft.com/ignite-2025-book-of-news/
Microsoft Copilot Studio November 2025 updates: https://www.microsoft.com/en-us/microsoft-copilot/blog/copilot-studio/whats-new-in-microsoft-copilot-studio-november-2025/
Microsoft Copilot Credit Pre-Purchase Plan: https://www.microsoft.com/en-us/microsoft-copilot/blog/copilot-studio/scale-your-agent-rollout-with-confidence-introducing-copilot-credit-pre-purchase-plan/
Microsoft Learn Copilot Studio licensing: https://learn.microsoft.com/en-us/microsoft-copilot-studio/billing-licensing
LicenseQ Copilot Studio licensing: https://licenseq.com/copilot-studio-licensing/
HSO Copilot vs Copilot Studio: https://www.hso.com/blog/microsoft-copilot-vs-studio
Microsoft Agent Framework 1.0: https://devblogs.microsoft.com/agent-framework/microsoft-agent-framework-version-1-0/
AutoGen repo: https://github.com/microsoft/autogen
AutoGen maintenance discussion #7066: https://github.com/microsoft/autogen/discussions/7066
GitHub Copilot agent mode launch (GitHub Blog): https://github.blog/news-insights/product-news/github-copilot-agent-mode-activated/
GitHub Copilot agent press release: https://github.com/newsroom/press-releases/coding-agent-for-github-copilot
Developer Tech GitHub Copilot agents: https://www.developer-tech.com/news/github-boosts-copilot-agents-new-models-and-mcp-support/
Tech Insider GitHub Copilot vs Cursor 2026: https://tech-insider.org/github-copilot-vs-cursor-2026-2/
GitHub Spark feature page: https://github.com/features/spark
Cryptopolitan: GitHub Spark launch: https://www.cryptopolitan.com/microsofts-github-spark-launches/
Medium GitHub Spark overview: https://medium.com/@servifyspheresolutions/github-spark-microsofts-ai-powered-app-development-platform-c17bd174a74b
Hackster.io GitHub Spark: https://www.hackster.io/news/github-goes-after-vibe-coding-fans-with-the-public-preview-of-github-spark-d54d80e3a6f4
Microsoft Community Hub partner news: https://techcommunity.microsoft.com/blog/partnernews/dream-it-see-it-ship-it-/4435962
M365 Admin Researcher/Analyst GA: https://m365admin.handsontek.net/researcher-analyst-moving-general-availability/
Redmondmag new Copilot agents: https://redmondmag.com/articles/2025/04/23/microsoft-announces-new-copilot-agents-and-enhancements.aspx
Perspectives Microsoft AI numbers: https://www.perspectives.plus/p/microsoft-ai-numbers-good-bad-ugly
Xenoss Copilot enterprise limitations: https://xenoss.io/blog/microsoft-copilot-enterprise-limitations
Ragnarheil Copilot Studio review: https://ragnarheil.de/the-good-the-bad-and-the-ugly-of-copilot-studio-a-brutally-honest-review-going-into-late-2025/
Microsoft Learn copilotstudio not loading thread: https://learn.microsoft.com/en-us/answers/questions/5836354/copilotstudio-microsoft-com-is-not-loading

Google:

Linux Foundation A2A announcement: https://www.linuxfoundation.org/press/linux-foundation-launches-the-agent2agent-protocol-project-to-enable-secure-intelligent-communication-between-ai-agents
ADK docs repo: https://github.com/google/adk-docs
ADK Python repo: https://github.com/google/adk-python
ADK documentation site: https://google.github.io/adk-docs/
Gemini Agent overview: https://gemini.google/overview/agent/
9to5Google: Gemini Agent planner upgrade: https://9to5google.com/2026/05/06/gemini-agent-planner-upgrade/
Android Central Gemini Agent: https://www.androidcentral.com/apps-software/google-gemini-is-finally-becoming-the-personal-assistant-we-were-promised
Digital Trends Mariner shutdown: https://www.digitaltrends.com/computing/google-pulls-the-plug-on-project-mariner-the-ai-agent-that-browsed-the-web-like-a-human/
Android Authority Mariner shutdown: https://www.androidauthority.com/google-project-mariner-shutdown-3664323/
Android Headlines Mariner shutdown: https://www.androidheadlines.com/2026/05/google-shuts-down-project-mariner-ai-agent.html
Gemini Enterprise launch (Google Cloud): https://cloud.google.com/blog/products/ai-machine-learning/introducing-gemini-enterprise
TechRepublic Gemini Enterprise: https://www.techrepublic.com/article/news-google-releases-gemini-enterprise/
CNBC Gemini Enterprise: https://www.cnbc.com/2025/10/09/google-launches-gemini-enterprise-to-boost-ai-agent-use-at-work.html
Business Chief Gemini Enterprise: https://businesschief.com/news/what-can-googles-gemini-enterprise-suite-offer-businesses
AI Magazine Gemini Enterprise: https://aimagazine.com/news/what-can-googles-gemini-enterprise-suite-offer-businesses
Technology Magazine Gemini Enterprise: https://technologymagazine.com/news/inside-googles-comprehensive-new-gemini-enterprise-offering
Max Productive AI Gemini Enterprise launch: https://max-productive.ai/blog/google-gemini-enterprise-platform-launch/
Revolgy Gemini Enterprise guide: https://www.revolgy.com/insights/blog/guide-to-gemini-enterprise-features-pricing-and-implementation
UI Bakery 2026 Vertex AI Agent Builder: https://uibakery.io/blog/vertex-ai-agent-builder
Promevo Cloud Next 2026 recap: https://promevo.com/blog/google-cloud-next-2026-recap
Google Cloud Next ’26 welcome: https://cloud.google.com/blog/topics/google-cloud-next/welcome-to-google-cloud-next26
Google Cloud Next 2026 wrap-up: https://cloud.google.com/blog/topics/google-cloud-next/google-cloud-next-2026-wrap-up
Google Cloud Next ’26 highlights (Google blog): https://blog.google/innovation-and-ai/infrastructure-and-cloud/google-cloud/google-cloud-next-26-recap/
Jules announcement (Google blog): https://blog.google/innovation-and-ai/models-and-research/google-labs/jules/
SiliconANGLE Jules launch: https://siliconangle.com/2025/08/06/google-makes-jules-ai-coding-agent-available-everyone-free-paid-plans/
TechCrunch Jules beta exit: https://techcrunch.com/2025/08/06/googles-ai-coding-agent-jules-is-now-out-of-beta/
Antigravity (New Stack): https://thenewstack.io/antigravity-is-googles-new-agentic-development-platform/
Antigravity (Wikipedia): https://en.wikipedia.org/wiki/Google_Antigravity
Google Developers blog: Antigravity: https://developers.googleblog.com/build-with-google-antigravity-our-new-agentic-development-platform/
Morph LLM 14 Best AI Coding Agents 2026: https://www.morphllm.com/best-ai-coding-agents-2026
Gemini 3 blog post: https://blog.google/products-and-platforms/products/gemini/gemini-3/
Vellum Gemini 3 benchmarks: https://www.vellum.ai/blog/google-gemini-3-benchmarks
ALM Corp Gemini 3.1 Pro guide: https://almcorp.com/blog/gemini-3-1-pro-complete-guide/
NotebookLM evolution (Medium): https://medium.com/@jimmisound/the-cognitive-engine-a-comprehensive-analysis-of-notebooklms-evolution-2023-2026-90b7a7c2df36
Jorgep Gemini Notebooks vs NotebookLM: https://jorgep.com/blog/google-gemini-notebooks-vs-notebooklm/
DigitalOcean NotebookLM 2026 guide: https://www.digitalocean.com/resources/articles/what-is-notebooklm
TechCrunch Google MCP adoption: https://techcrunch.com/2025/04/09/google-says-itll-embrace-anthropics-standard-for-connecting-ai-models-to-data/

AWS:

AgentCore launch blog: https://aws.amazon.com/blogs/aws/introducing-amazon-bedrock-agentcore-securely-deploy-and-operate-ai-agents-at-any-scale/
AWS production-ready agents at scale: https://aws.amazon.com/blogs/machine-learning/enabling-customers-to-deliver-production-ready-ai-agents-at-scale/
VentureBeat AgentCore: https://venturebeat.com/ai/aws-unveils-bedrock-agentcore-a-new-platform-for-building-enterprise-ai-agents-with-open-source-frameworks-and-tools
Hackread AgentCore Code Interpreter leak: https://hackread.com/data-leak-risk-in-aws-bedrock-ai-code-interpreter/
HN AgentCore Gateway thread: https://news.ycombinator.com/item?id=44928468

Salesforce / Agentforce:

Salesforce FY26 Q4 earnings: https://www.salesforce.com/news/press-releases/2026/02/25/fy26-q4-earnings/
Salesforce 8-K Q3 FY26: https://www.sec.gov/Archives/edgar/data/0001108524/000110852425000234/crm-q3fy26xexhibit991.htm
Agentforce 3 announcement: https://www.salesforce.com/news/press-releases/2025/06/23/agentforce-3-announcement/
Agentforce Operations announcement: https://www.salesforce.com/news/stories/agentforce-operations-announcement/
Agentforce Operations (Pulse 2.0): https://pulse2.com/salesforce-launches-agentforce-operations-to-end-back-office-bottlenecks/
Agentforce flexible pricing news: https://www.salesforce.com/news/press-releases/2025/05/15/agentforce-flexible-pricing-news/
Magicfuse Agentforce cost guide: https://magicfuse.co/blog/agentforce-cost
Oliv AI Agentforce pricing breakdown: https://www.oliv.ai/blog/salesforce-agentforce-pricing-breakdown
CX Today Agentforce case studies: https://www.cxtoday.com/crm/agentforce-case-studies/
Salesforce Agentforce customer success stories: https://www.salesforce.com/news/stories/agentforce-customer-success-stories/?bc=OTH
Salesforce combating AI hallucinations: https://www.salesforce.com/news/stories/combating-ai-hallucinations/
CNBC Benioff 4,000 layoffs: https://www.cnbc.com/2025/09/02/salesforce-ceo-confirms-4000-layoffs-because-i-need-less-heads-with-ai.html
CIO Dive Salesforce IT automation: https://www.ciodive.com/news/salesforce-agentforce-IT-services-marc-benioff/807103/
Salesforce Devops extends AI into HR: https://salesforcedevops.net/index.php/2025/05/06/salesforce-extends-ai-strategy-into-hr/
Salesforce Ben Agentforce hallucinations: https://www.salesforceben.com/are-agentforce-hallucinations-a-problem-or-is-it-just-your-bad-data/
TheStreet Salesforce AI backlash: https://www.thestreet.com/technology/salesforce-ai-faces-backlash-from-customers
CIO Salesforce layoffs and exec churn: https://www.cio.com/article/4130028/salesforce-lays-off-staffers-as-executive-leadership-churn-continues.html
Agentforce G2 Awards page: https://www.salesforce.com/agentforce/g2-awards/
Agentforce G2 reviews: https://www.g2.com/products/salesforce-agentforce/reviews
Agentforce Sales G2 reviews: https://www.g2.com/products/agentforce-sales-formerly-salesforce-sales-cloud/reviews
Agentforce Service G2 reviews: https://www.g2.com/products/agentforce-service-formerly-salesforce-service-cloud/reviews
IDC MarketScape Application Platform Marketplaces 2025–2026: https://www.salesforce.com/news/stories/idc-marketscape-application-platform-marketplaces-2025-2026/?bc=OTH

LangChain:

LangChain Series B at $1.25B: https://blog.langchain.com/series-b/
Klarna case study: https://blog.langchain.com/customers-klarna/
Is LangGraph used in production: https://blog.langchain.com/is-langgraph-used-in-production/
Built with LangGraph: https://www.langchain.com/built-with-langgraph
LangChain pricing: https://www.langchain.com/pricing
LangSmith pricing FAQ: https://docs.langchain.com/langsmith/pricing-faq
LangChain GitHub organization: https://github.com/langchain-ai
Sacra LangChain profile: https://sacra.com/c/langchain/
Latka LangChain revenue: https://getlatka.com/companies/langchain
Sequoia LangChain feature: https://sequoiacap.com/article/langchain-from-agent-0-to-1-to-agentic-engineering/
MetaCTO LangSmith pricing breakdown: https://www.metacto.com/blogs/the-true-cost-of-langsmith-a-comprehensive-pricing-integration-guide
Coverge LangSmith 2026 pricing analysis: https://coverge.ai/blog/langsmith-pricing
ZenML LangGraph pricing: https://www.zenml.io/blog/langgraph-pricing
Firecrawl best OSS frameworks 2026: https://www.firecrawl.dev/blog/best-open-source-agent-frameworks
Medium criticisms of LangChain: https://shashankguda.medium.com/challenges-criticisms-of-langchain-b26afcef94e7

Cognition / Devin:

Cognition funding and growth blog: https://cognition.ai/blog/funding-growth-and-the-next-frontier-of-ai-coding-agents
Cognition Devin 2.2 announcement: https://cognition.ai/blog/introducing-devin-2-2
Devin pricing: https://devin.ai/pricing
AI Agent Square Devin review 2026: https://aiagentsquare.com/agents/devin.html
Sacra Cognition profile: https://sacra.com/c/cognition/
Digital Applied Devin complete guide: https://www.digitalapplied.com/blog/devin-ai-autonomous-coding-complete-guide
Awesome AI agents 2026 (GitHub): https://github.com/caramaschiHG/awesome-ai-agents-2026
Zylos long-running agents research: https://zylos.ai/research/2026-01-16-long-running-ai-agents
Rapid Claw AI Agent Framework Scorecard 2026: https://rapidclaw.dev/blog/ai-agent-benchmarks-2026
AI Code Review SWE-bench leaderboard: https://aicodereview.cc/blog/swe-bench-scores-leaderboard/
American Banker Citi rollout: https://www.americanbanker.com/news/citi-is-rolling-out-agentic-ai-to-its-40-000-developers
Answer.AI Thoughts on a Month with Devin: https://www.answer.ai/posts/2025-01-08-devin.html
Jeremy Howard X post on Devin (@jeremyphoward, Jan 17, 2025): “We tried really really hard to make Devin (the coding agent) work for us. But it didn't. Check out Hamel's detailed writeup blog linked below, describing the many tasks of many types we explored, nearly all of which failed. We remain less than bullish on agents...” Quoting Hamel Husain (@HamelHusain): “New post re: Devin (the AI SWE). We couldn't find many reviews of people using it for real tasks, so we went MKBHD mode and put Devin through its paces. We documented our findings here. Would love to know if others have had a different experience.” https://t.co/DDqzoAXKkl
The Register: ‘First AI software engineer’ is bad at its job: https://www.theregister.com/2025/01/23/ai_developer_devin_poor_reviews/
HN thread on Answer.AI evaluation: https://news.ycombinator.com/item?id=42734681

Manus AI:

Manus pricing: https://manus.im/pricing
Spectrum AI Labs 2026 cost guide: https://spectrumailab.com/blog/manus-ai-pricing-plans-cost-guide-2026
Lindy 2026 plan breakdown: https://www.lindy.ai/blog/manus-ai-pricing
Fello AI plans explained: https://felloai.com/manus-ai-pricing/
Get AI Perks credits explained: https://www.getaiperks.com/en/ai/manus-credits-explained
Electro IQ Manus statistics: https://electroiq.com/stats/manus-ai-statistics/
Manus AI team page: https://manusai.online/team
Substack: Is Manus the DeepSeek Moment: Recode China AI, “Is Manus the 'DeepSeek Moment' for AI Agents or Just a Claude Wrapper?” (Tony Peng, issue for the week of March 3, 2025)
Substack: Meta’s Manus Acquisition Playbook: Recode China AI, “Meta's Manus Acquisition: A New Playbook for Chinese-Founded AI Startups” (Tony Peng)
Wikipedia: Manus (AI agent): https://en.wikipedia.org/wiki/Manus_(AI_agent)
UPI: China blocks Meta acquisition: https://www.upi.com/Top_News/2026/04/27/blocks-meta-acquisition-manus/9181777298190/
CNBC: China blocks Meta Manus takeover: https://www.cnbc.com/2026/04/27/meta-manus-china-blocks-acquisition-ai-startup.html
Trustpilot manus.im: https://www.trustpilot.com/review/manus.im
Trustpilot manus.im page 4: https://www.trustpilot.com/review/manus.im?page=4
Trustpilot manus-ai.sbs: https://www.trustpilot.com/review/manus-ai.sbs
Rio Times: 14 Failures in Two Weeks: https://www.riotimesonline.com/manus-a-i-review-14-failures-in-two-weeks-of-testing/

OpenClaw:

OpenClaw GitHub repo: https://github.com/openclaw/openclaw
Peter Steinberger GitHub profile: https://github.com/steipete
Wikipedia: OpenClaw: https://en.wikipedia.org/wiki/OpenClaw
OpenClaw blog 250K stars milestone: https://openclaws.io/blog/openclaw-250k-stars-milestone
WebSearchAPI Inside OpenClaw: https://websearchapi.ai/blog/openclaw-state-of-the-claw-peter-steinberger
Ken Huang OpenClaw Design Patterns Part 1: Agentic AI, “OpenClaw Design Patterns (Part 1 of 7)” (Ken Huang)
Medium: OpenClaw Agent Leaked Secrets: https://medium.com/@upadhyay.suraj09/your-openclaw-agent-just-leaked-its-secrets-to-github-heres-how-i-fixed-it-9bdcea7d27a7

Hermes Agent / Nous Research:

Nous Research GitHub org: https://github.com/nousresearch
Hermes Agent repo: https://github.com/nousresearch/hermes-agent
Hermes Agent README: https://github.com/NousResearch/hermes-agent/blob/main/README.md
Hermes Agent quickstart: https://github.com/NousResearch/hermes-agent/blob/main/website/docs/getting-started/quickstart.md
Hermes Agent releases: https://github.com/NousResearch/hermes-agent/releases
Nous Research X announcement (@NousResearch, Feb 25, 2026): “Meet Hermes Agent, the open source agent that grows with you. Hermes Agent remembers what it learns and gets more capable over time, with a multi-level memory system and persistent dedicated machine access.”
Hermes Agent docs: https://hermes-agent.nousresearch.com/docs/
Hermes Agent landing: https://hermes-agent.nousresearch.com/
Awesome Hermes Agent: https://github.com/0xNyk/awesome-hermes-agent
Mudrii Hermes Agent docs mirror: https://github.com/mudrii/hermes-agent-docs

AutoGPT:

AutoGPT GitHub: https://github.com/significant-gravitas
AutoGPT releases: https://github.com/Significant-Gravitas/AutoGPT/releases
AutoGPT blog: https://agpt.co/blog
PyShine AutoGPT Platform overview: https://pyshine.com/2026/04/20/autogpt-platform-continuous-ai-agents/
Progressive Robot What Is AutoGPT: https://www.progressiverobot.com/2026/04/14/what-is-autogpt/
Vibeagentmaking 100K stars retrospective: https://vibeagentmaking.com/blog/autogpt-got-100k-stars-and-then-what/
Tencent Cloud Best Open Source AI Agents 2026: https://www.tencentcloud.com/techpedia/144032
Pasquale Pillitteri 10 OSS Agent Frameworks 2026: https://pasqualepillitteri.it/en/news/1476/10-open-source-ai-agent-frameworks-2026

CrewAI:

CrewAI releases: https://github.com/crewAIInc/crewAI/releases
Panto AI CrewAI statistics 2026: https://www.getpanto.ai/blog/crewai-platform-statistics
Pulse 2.0 CrewAI Series A: https://pulse2.com/crewai-multi-agent-platform-raises-18-million-series-a/
PitchBook CrewAI 2026 profile: https://pitchbook.com/profiles/company/590845-78
Medium CrewAI vs ADK vs LangGraph: https://medium.com/@saniakawale/ai-agents-in-academia-crewai-google-adk-langgraph-compared-53efbc1d5727
NxCode CrewAI vs LangChain 2026: https://www.nxcode.io/resources/news/crewai-vs-langchain-ai-agent-framework-comparison-2026
Vibecoding CrewAI Review: https://vibecoding.app/blog/crewai-review
Insight Partners CrewAI launch: https://www.insightpartners.com/ideas/crewai-launches-multi-agentic-platform-to-deliver-on-the-promise-of-generative-ai-for-enterprise/

Benchmarks, market analysis, and counter-narratives

Gartner 40% cancellation projection: https://www.gartner.com/en/newsroom/press-releases/2025-06-25-gartner-predicts-over-40-percent-of-agentic-ai-projects-will-be-canceled-by-end-of-2027
Legal.io MIT NANDA 95% pilot failure: https://www.legal.io/articles/5719519/MIT-Report-Finds-95-of-AI-Pilots-Fail-to-Deliver-ROI-Exposing-GenAI-Divide
VICE: Company that replaced workers with AI hiring humans again: https://www.vice.com/en/article/this-company-replaced-workers-with-ai-now-theyre-looking-for-humans-again/
SlideFactory: Agentic AI for business 2026: https://www.theslidefactory.com/post/agentic-ai-for-business-2026
Utkarsh Kanwat: Why I’m Betting Against AI Agents 2025: https://utkarshkanwat.com/writing/betting-against-agents
Gary Marcus AI Agents dud comments: https://garymarcus.substack.com/p/ai-agents-have-so-far-mostly-been/comments
METR task-length research (EA Forum): https://forum.effectivealtruism.org/posts/YJ7Pk2bwTd3ieimG8/metr-measuring-ai-ability-to-complete-long-tasks
AI Digest Moore’s Law for agents: https://theaidigest.org/time-horizons
Dwarkesh Karpathy interview: Dwarkesh Podcast, “Andrej Karpathy: AGI is still a decade away” (Dwarkesh Patel)
arXiv SWE-ABS adversarial benchmark: https://arxiv.org/pdf/2603.00520
Codeant SWE-bench leaderboard 2026: https://www.codeant.ai/blogs/swe-bench-scores
Princeton HAL holistic agent leaderboard: https://hal.cs.princeton.edu/
arXiv Measuring Agents in Production: https://arxiv.org/pdf/2512.04123
Maxpool Vending-Bench Project Vend: https://maxpool.dev/research-papers/vending_bench_report.html
TechCrunch Anthropic Claude vending experiment: https://techcrunch.com/2025/06/28/anthropics-claude-ai-became-a-terrible-business-owner-in-experiment-that-got-weird/
Fortune Klarna AI return on investment: https://fortune.com/2025/05/09/klarna-ai-humans-return-on-investment/
McCarthy: Moffatt v. Air Canada: https://www.mccarthy.ca/en/insights/blogs/techlex/moffatt-v-air-canada-misrepresentation-ai-chatbot

Protocol and platform infrastructure

Anthropic MCP donation announcement: https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation
Zuplo One Year of MCP: https://zuplo.com/blog/one-year-of-mcp
Pento A Year of MCP: https://www.pento.ai/blog/a-year-of-mcp-2025-review

ace8: AI and Society
