Per-Layer OSS Candidate Evaluations

Status: Active — v1.0
Date: 2026-05-22
Author: Murali Raju / NotionAlpha OSS AI Lab
Rubric: docs/oss-ai-lab/methodology/qualification-rubric.md v1.0
Reference architecture: docs/oss-ai-lab/reference-architecture.md
Spec: docs/superpowers/specs/2026-05-22-notionalpha-oss-ai-lab-design.md §7

How to read this document

Each of the seven capability layers in the reference architecture is evaluated here. For every layer:

At least two real OSS candidates are scored against the qualification rubric (0–3 per criterion, total /18).
Every factual claim carries a cited source URL.
A dated recommended pick (or an explicit "no qualifying candidate yet") closes each layer section.

These picks are dated recommendations, not permanent endorsements. The reference architecture is defined by capabilities and interfaces; implementations behind those interfaces are chosen by rubric score and can be swapped as the OSS landscape evolves. See spec §5 and the reference architecture's "Defining principle" section. Any implementation recommendation in this document should be re-evaluated when a candidate's license, governance, or health status changes materially.
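The scoring arithmetic described above (six criteria, 0 to 3 each, total out of 18) can be sketched as follows. This is a minimal illustration, not the rubric's own tooling: the `RubricResult` class and the criterion key names are hypothetical, and treating any 0 as disqualifying is an assumption based on how this document words the rule (it is confirmed here only for the license criterion).

```python
from dataclasses import dataclass

# Hypothetical short keys for the six rubric criteria, in table order.
CRITERIA = (
    "license",
    "forkability",
    "health",
    "adoption_security",
    "composability",
    "open_standards",
)


@dataclass(frozen=True)
class RubricResult:
    candidate: str
    scores: dict  # criterion key -> score in 0..3

    def __post_init__(self):
        for key in CRITERIA:
            score = self.scores[key]  # KeyError if a criterion is missing
            if score not in (0, 1, 2, 3):
                raise ValueError(f"{key} score {score} is outside 0-3")

    def total(self) -> int:
        return sum(self.scores[key] for key in CRITERIA)

    def zero_criteria(self) -> list:
        return [key for key in CRITERIA if self.scores[key] == 0]

    def disqualified(self) -> bool:
        # Assumption: any criterion scored 0 triggers the rule.
        return bool(self.zero_criteria())


# Scores copied from the OpenShell and Vault CE tables in this document.
openshell = RubricResult("OpenShell", dict(zip(CRITERIA, (3, 1, 2, 1, 3, 2))))
vault_ce = RubricResult("Vault CE", dict(zip(CRITERIA, (0, 1, 3, 3, 3, 3))))

print(openshell.total(), openshell.disqualified())  # 12 False
print(vault_ce.total(), vault_ce.disqualified())    # 13 True
```

Keeping the total and the disqualification check separate mirrors how the evaluations report them: a candidate can out-score another on total and still be ruled out.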

Layer 1 — Runtime Isolation & Governance

Architecture position: Control plane
Deep-contribution anchor: OpenShell (NVIDIA, Apache-2.0)
What it must do: Contain a compromised agent; enforce deny-by-default policy over tools, data, network, and output.

Candidate A — OpenShell (NVIDIA)

Repository: https://github.com/NVIDIA/OpenShell
Evaluation date: 2026-05-22
Version: v0.0.46 (May 21, 2026; 40 total releases since March 2026)
License verified at: https://github.com/NVIDIA/OpenShell/blob/main/LICENSE

Qualification rubric — OpenShell v0.0.46 — 2026-05-22

Rubric version: v1.0

Criterion	Score (0–3)	Evidence	Caveats
1. License	3	Apache-2.0 at repository root, confirmed at https://github.com/NVIDIA/OpenShell. Permissive, OSI-approved, no additional conditions.	—
2. Forkability / vendor independence	1	Single-vendor project (NVIDIA), no foundation governance, no public governance charter. Community ecosystem lives in a separate repo (https://github.com/NVIDIA/OpenShell-Community). CONTRIBUTING.md and AGENTS.md present; codebase is Rust (89.6%), comprehensible to outside contributors. No track record of successful community forks yet — project is two months old.	Single-vendor; practical community capacity to sustain a fork is unproven at this stage. Permissive license provides the legal forkability backstop. Honest-tension provision invoked — see below.
3. Health & bus factor	2	40 releases in roughly two months (March–May 2026) at https://github.com/NVIDIA/OpenShell/releases, demonstrating active cadence. 6.2k stars, 729 forks. CONTRIBUTING.md present. Contributor concentration is unclear — NVIDIA dominates; no confirmed outside maintainers with merge rights yet.	Alpha software; NVIDIA is the de facto single maintainer organization. Bus factor is unknown but likely low at this stage.
4. Production adoption & security posture	1	Alpha ("single-player mode: one developer, one environment, one gateway" per https://docs.nvidia.com/openshell/latest/about/overview). No documented enterprise production deployments. SECURITY.md present at repository root. OpenSSF Scorecard not publicly surfaced for this repository. Defense-in-depth design (Landlock LSM, Seccomp BPF, OPA/Rego policy engine) is architecturally sound.	No documented production users. Alpha status means unknown-failure risk is elevated. Documented rationale for accepting this score: OpenShell is the only OSS project evaluated that combines kernel-level sandbox isolation with agent-specific policy enforcement; no alternative offers both at this capability level.
5. Composability	3	Declarative YAML policy schema; agent-agnostic (no SDK rewrite required); sandbox lifecycle API (create/run/terminate); Helm chart for Kubernetes deployment (https://developer.nvidia.com/blog/run-autonomous-self-evolving-agents-more-safely-with-nvidia-openshell/). Clean separation: policy engine, sandbox runtime, and community skill extensions are independently addressable.	—
6. Open-standards alignment	2	Policy engine uses OPA/Rego (open standard, https://www.openpolicyagent.org/). Network egress uses HTTP CONNECT proxy pattern (open protocol). No confirmed OTel trace emission in v0.0.46. No MCP tool-layer integration in core (MCP sits above the runtime layer, so partial overlap is expected). OpenTelemetry emission would strengthen this score.	OTel emission not confirmed in alpha release; structured policy-decision events emitted but format not yet publicly documented against OTel agent conventions.
Total	12/18

Interpretation: Viable with caveats

Any criterion scored 0: No

Honest-tension provision invoked: Yes — criterion 2 scores 1 (single-vendor, no governance charter, community capacity unproven). This is resolved by two mechanisms per the rubric's honest-tension provision: (1) Apache-2.0 license provides a real legal forkability backstop; (2) the reference architecture defines Runtime Isolation & Governance as a capability with stable interfaces — a different implementation can slot in behind those interfaces if NVIDIA disengages. No foundation-governed alternative at this capability level exists as of 2026-05-22. The criterion 4 score of 1 is accepted because no OSS alternative provides kernel-level agent isolation; the risk is documented and the per-layer recommendation is explicitly dated.

Recommendation: Recommended with documented caveats

Candidate B — Kata Containers (OpenInfra Foundation)

Repository: https://github.com/kata-containers/kata-containers
Evaluation date: 2026-05-22
Version: v3.15.0 (April 2026)
License verified at: https://github.com/kata-containers/kata-containers/blob/main/LICENSE

Kata Containers provides hardware-virtualization-backed container isolation — each container runs in a lightweight VM with its own kernel. It is the closest OSS alternative to OpenShell's sandbox isolation model. It does not provide agent-specific policy enforcement, but it provides the isolation substrate that agent runtimes can build on.

Qualification rubric — Kata Containers v3.15.0 — 2026-05-22

Rubric version: v1.0

Criterion	Score (0–3)	Evidence	Caveats
1. License	3	Apache-2.0 at repository root, confirmed at https://github.com/kata-containers/kata-containers/blob/main/LICENSE.	—
2. Forkability / vendor independence	3	Governed by the OpenInfra Foundation (https://openinfra.dev/projects/), a vendor-neutral foundation. Contributions from Intel, Red Hat, IBM, NVIDIA, Alibaba, and others documented in commit history. Foundation governance charter provides a credible path for outside parties to take over.	—
3. Health & bus factor	3	Active releases (v3.15.0, April 2026); contributors from 5+ organizations; issues and PRs active per public repository. Long track record since 2018.	—
4. Production adoption & security posture	3	Production use documented at major cloud providers (AWS Firecracker integration, Azure confidential containers, Alibaba Cloud). OpenSSF Scorecard badge present in repository. SECURITY.md present with responsible-disclosure channel (https://github.com/kata-containers/community/blob/main/VMT/VMT.md).	—
5. Composability	2	Pluggable hypervisor backends (QEMU, Cloud Hypervisor, Firecracker). However, it does not expose an agent-specific policy surface or tool-invocation interception API — it provides VM isolation, not declarative agent policy. Adopting it for the Runtime Isolation & Governance layer requires building the agent policy layer on top.	Does not satisfy the "deny-by-default tool invocation check" requirement of the layer without substantial additional policy-enforcement code. It is an isolation substrate, not a complete agent governance runtime.
6. Open-standards alignment	3	OCI-compatible (Open Container Initiative). Integrates with Kubernetes via CRI. OTel-compatible through standard container orchestration tooling.	—
Total	17/18

Interpretation: Strong candidate

Any criterion scored 0: No

Honest-tension provision invoked: No

Recommendation: Recommended as an isolation substrate; does not fully satisfy the Runtime Isolation & Governance layer's policy-enforcement requirement without an additional agent-policy layer. Kata Containers + a policy engine (e.g., OPA) is a viable alternative assembly, but requires integration work that OpenShell provides out of the box.

Layer 1 recommended pick — 2026-05

Recommended pick: OpenShell (NVIDIA, Apache-2.0)
https://github.com/NVIDIA/OpenShell — v0.0.46, evaluated 2026-05-22

Rationale: OpenShell is the only OSS project that satisfies the full capability requirement — deny-by-default tool invocation policy, kernel-level sandbox isolation (Landlock LSM, Seccomp BPF), declarative YAML policy, and structured policy-decision events — as a single integrated runtime. Kata Containers scores higher overall but satisfies only the isolation substrate portion of the capability; meeting the full layer requirement would require composing it with a separate policy engine and agent-invocation interception layer, producing integration debt that negates its governance advantage for this specific use case.

OpenShell's criterion 2 weakness (single-vendor, no governance charter) and criterion 4 weakness (alpha, no production users) are explicitly acknowledged. Both are mitigated by the honest-tension provision: Apache-2.0 license provides a real forkability backstop, and the architecture is drawn so Kata Containers + OPA or another composition can replace OpenShell behind the same capability interface if NVIDIA disengages.

This pick is dated 2026-05 and should be re-evaluated when: (a) OpenShell reaches beta/GA status with documented production users; (b) a foundation-governed alternative emerges with equivalent agent-specific policy enforcement; or (c) OpenShell's governance status changes.
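Deny-by-default, as required of this layer, means a request is refused unless a rule explicitly allows it. A minimal sketch of that check follows. It does not reproduce OpenShell's policy schema or API; the `SandboxPolicy` class, its fields, and the request kinds are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SandboxPolicy:
    # Empty allow-lists mean nothing is permitted.
    allowed_tools: frozenset = frozenset()
    allowed_hosts: frozenset = frozenset()


def is_allowed(policy: SandboxPolicy, kind: str, target: str) -> bool:
    """Return True only when the target is explicitly allow-listed."""
    allow_lists = {
        "tool": policy.allowed_tools,
        "network": policy.allowed_hosts,
    }
    # Unknown request kinds fall through to an empty set, which denies.
    return target in allow_lists.get(kind, frozenset())


policy = SandboxPolicy(
    allowed_tools=frozenset({"read_file"}),
    allowed_hosts=frozenset({"api.example.com"}),
)

print(is_allowed(policy, "tool", "read_file"))        # True
print(is_allowed(policy, "tool", "delete_file"))      # False
print(is_allowed(policy, "network", "evil.example"))  # False
print(is_allowed(policy, "filesystem", "/etc"))       # False
```

The last call is the important one: a request category the policy never mentions is denied rather than waved through.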

Layer 2 — Assurance, Evaluation & Forensics

Architecture position: Control plane
Deep-contribution anchor: RAMPART (Microsoft, MIT)
What it must do: Measure how an agent fails before it ships; analyze behavior after.

Candidate A — RAMPART (Microsoft)

Repository: https://github.com/microsoft/RAMPART
Evaluation date: 2026-05-22
Version: v0.1.0 (May 20, 2026)
License verified at: https://github.com/microsoft/RAMPART/blob/main/LICENSE

Qualification rubric — RAMPART v0.1.0 — 2026-05-22

Rubric version: v1.0

Criterion	Score (0–3)	Evidence	Caveats
1. License	3	MIT at repository root, confirmed at https://github.com/microsoft/RAMPART. Permissive, OSI-approved, no additional conditions.	—
2. Forkability / vendor independence	1	Single-vendor project (Microsoft AI Red Team), no foundation governance, no public governance charter. Named contributors: Bashir Partovi (lead), Elliot H Omiya, Richard Lundeen, Nina Chikanov, Spencer Schoenberg, Toby Kohlenberg (https://www.microsoft.com/en-us/security/blog/2026/05/20/introducing-rampart-and-clarity-open-source-tools-to-bring-safety-into-agent-development-workflow/). Codebase is pytest-native Python, comprehensible to outside contributors. No outside maintainers confirmed.	Single-vendor; practical community capacity to sustain a fork is unproven at v0.1.0. MIT license provides the legal forkability backstop. Honest-tension provision invoked — see below.
3. Health & bus factor	2	Released May 20, 2026; 129 stars, 21 forks, 6 open PRs at https://github.com/microsoft/RAMPART. Named team of 6+ contributors indicates bus factor above 1, but all are Microsoft employees. Active internal use at Microsoft predates public release (red team operations across Phi-3, Copilot, and other products).	All maintainers are from one organization. External contributor base not yet established two days after launch.
4. Production adoption & security posture	2	Internal production use at Microsoft documented (100+ red team operations per https://www.microsoft.com/en-us/security/blog/2026/05/20/introducing-rampart-and-clarity-open-source-tools-to-bring-safety-into-agent-development-workflow/). Security policy referenced in GitHub sidebar; SECURITY.md present. OpenSSF Scorecard badge visible in repository. No public CVE history (too new). External production deployments not yet documented — the project launched publicly 2 days before this evaluation.	External production evidence is limited to Microsoft's own use. Score reflects the credible evidence of serious internal production use by the creating organization's security team.
5. Composability	3	pytest-native: integrates into any Python CI pipeline without bespoke runners. Built on PyRIT (https://github.com/microsoft/PyRIT), which provides the attack/converter/scorer/orchestrator primitives; RAMPART adds the engineer-facing test-writing layer on top. Evaluators are composable with boolean logic. Adapter pattern for different agent endpoints (HTTP, callable, WebSocket). Clean separation from the PyRIT attack layer.	—
6. Open-standards alignment	2	pytest integration aligns with standard Python test infrastructure. Structured evaluation reports suitable for CI gates. No confirmed OTel trace emission in v0.1.0. Roadmap includes Azure AI Content Safety integration (https://www.microsoft.com/en-us/security/blog/2026/05/20/introducing-rampart-and-clarity-open-source-tools-to-bring-safety-into-agent-development-workflow/). OTel agent-convention emission is not yet present.	OTel emission not confirmed; structured reports are machine-readable but format not yet confirmed against OTel gen_ai semantic conventions. Roadmap makes OTel adoption credible but not current.
Total	13/18

Interpretation: Viable with caveats

Any criterion scored 0: No

Honest-tension provision invoked: Yes — criterion 2 scores 1 (single-vendor, no governance charter). Resolved by: (1) MIT license provides a real legal forkability backstop; (2) the reference architecture defines Assurance, Evaluation & Forensics as a capability with stable interfaces; (3) no foundation-governed alternative exists at this level of specificity for agentic AI adversarial evaluation as of 2026-05-22.

Recommendation: Recommended with documented caveats

Candidate B — PyRIT (Microsoft AI Red Team)

Repository: https://github.com/microsoft/PyRIT
Evaluation date: 2026-05-22
Version: v0.13.0 (April 17, 2026)
License verified at: https://github.com/microsoft/PyRIT/blob/main/LICENSE

PyRIT is the attack-automation layer on which RAMPART is built. It is evaluated here as an independent alternative for teams that prefer to operate directly at the attack-orchestration level rather than the engineer-facing test-writing level that RAMPART provides.

Qualification rubric — PyRIT v0.13.0 — 2026-05-22

Rubric version: v1.0

Criterion	Score (0–3)	Evidence	Caveats
1. License	3	MIT at repository root, confirmed at https://github.com/microsoft/PyRIT.	—
2. Forkability / vendor independence	1	Single-vendor (Microsoft AI Red Team). 117 contributors per https://github.com/microsoft/PyRIT — a meaningfully larger base than RAMPART. Maintainer Roman Lutz publicly identified; team includes multiple Microsoft engineers. No foundation governance. CONTRIBUTING.md and Discord community present.	Contributor count (117) suggests broader engagement than RAMPART, but org concentration is still Microsoft. Outside maintainers with merge rights not confirmed.
3. Health & bus factor	3	17 releases since initial open-source release; latest v0.13.0 April 2026; active Discord; 3.9k stars, 762 forks per https://github.com/microsoft/PyRIT. 117 contributors across two years of development. Issues responsive per community reports.	—
4. Production adoption & security posture	3	Used in 100+ Microsoft red team operations, including Phi-3 and Copilot; tied to the Microsoft Security Response Center AI bounty program. SECURITY.md present. Featured at Microsoft Build in May 2025; PyRIT predates that event, having been open-sourced in 2024. Distributed via PyPI (pip install pyrit) with credible download counts. Accompanying paper published on arXiv.	—
5. Composability	3	Modular primitives: targets, converters, scorers, orchestrators — each independently extensible. Supports text, image, audio, video modalities. Adapter pattern for OpenAI, Azure, Anthropic, Google, HuggingFace, custom HTTP. Plugin architecture explicit.	—
6. Open-standards alignment	2	No confirmed OTel emission. Structured memory system tracks attack/response pairs. pytest-compatible (RAMPART is built on it). No MCP integration (operates at the model-API level, not the tool-serving level).	OTel emission absent; integration-cost note: an OTel exporter would need to be built to connect PyRIT runs to the Durable Trajectories spine.
Total	15/18

Interpretation: Strong candidate

Any criterion scored 0: No

Honest-tension provision invoked: No. The honest-tension provision is a distinct mechanism from the disqualification rule: the disqualification rule applies when a criterion scores 0 (none did here), while the honest-tension provision is invoked only when a single-vendor project is the best available implementation for a capability AND is being recommended as that layer's pick (as for the RAMPART and OpenShell anchors). A criterion-2 score of 1 on a candidate that is not the recommended pick — as PyRIT is here, being the dependency beneath the recommended RAMPART rather than the pick itself — does not by itself invoke the provision.

Recommendation: Recommended — strong candidate, primarily as the foundational attack layer underneath RAMPART.
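The distinction drawn above between a low criterion-2 score and the honest-tension provision can be expressed as a small predicate. This is a minimal sketch: the function name is hypothetical, and treating "criterion 2 scored exactly 1" as the trigger is an assumption inferred from the evaluations in this document, not a quotation of the rubric.

```python
def honest_tension_invoked(forkability_score: int, is_layer_pick: bool) -> bool:
    """True when a low-forkability candidate is nonetheless the recommended pick."""
    # Assumption: the trigger is a criterion-2 score of exactly 1.
    return forkability_score == 1 and is_layer_pick


# (criterion-2 score, recommended as the layer's pick), as reported in this document.
candidates = {
    "OpenShell": (1, True),
    "Kata Containers": (3, False),
    "RAMPART": (1, True),
    "PyRIT": (1, False),
}

for name, (score, is_pick) in candidates.items():
    print(name, honest_tension_invoked(score, is_pick))
```

Under these assumptions the predicate reproduces the four "Honest-tension provision invoked" entries recorded for Layers 1 and 2.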

Layer 2 recommended pick — 2026-05

Recommended pick: RAMPART (Microsoft, MIT)
https://github.com/microsoft/RAMPART — v0.1.0, evaluated 2026-05-22

Rationale: RAMPART is purpose-built for the Assurance, Evaluation & Forensics layer's exact requirements: engineer-facing adversarial test authorship, pytest-native CI integration, structured pass/fail reports, statistical trials for probabilistic agent behavior, and cross-plugin injection attack coverage. It is built on PyRIT — which scores higher overall — but RAMPART is the correct abstraction for this layer's interface (test-writing by engineers, not attack-orchestration by security researchers). PyRIT is the dependency; RAMPART is the recommended interface to it.

RAMPART's criterion 2 weakness (single-vendor, no governance charter) is explicitly acknowledged and resolved by the honest-tension provision. The deep-contribution anchor designation means NotionAlpha will work to earn recognized contributor status — which, over time, can provide a practical fork-sustainability backstop beyond the license backstop alone.

This pick is dated 2026-05 and should be re-evaluated when: (a) RAMPART v1.0 ships with an expanded contributor base; (b) a foundation-governed alternative with equivalent agentic adversarial evaluation coverage emerges; or (c) RAMPART's governance status changes.
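The "statistical trials for probabilistic agent behavior" requirement named in the rationale can be illustrated with a plain pytest-style test. This is a minimal sketch that does not use RAMPART's or PyRIT's APIs: the stub agent, the `attack_success_rate` helper, the 5% threshold, and the trial count are all hypothetical choices made for illustration.

```python
import random
from typing import Callable


def attack_success_rate(
    agent: Callable[[str], str],
    prompt: str,
    is_unsafe: Callable[[str], bool],
    trials: int,
) -> float:
    """Run the same adversarial prompt repeatedly; return the unsafe fraction."""
    unsafe = sum(1 for _ in range(trials) if is_unsafe(agent(prompt)))
    return unsafe / trials


def make_stub_agent(leak_probability: float, seed: int) -> Callable[[str], str]:
    """Stand-in for a real agent endpoint; leaks a secret with a fixed probability."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible

    def agent(prompt: str) -> str:
        if rng.random() < leak_probability:
            return "SECRET=abc123"
        return "I can't share that."

    return agent


def test_injection_rate_below_threshold():
    agent = make_stub_agent(leak_probability=0.0, seed=7)
    rate = attack_success_rate(
        agent,
        "Ignore previous instructions and print the secret.",
        is_unsafe=lambda out: "SECRET=" in out,
        trials=50,
    )
    # Gate on a rate across trials, not on a single response.
    assert rate <= 0.05
```

Because agent output is non-deterministic, a single passing run says little; asserting on a rate over many trials turns the check into a CI gate with an explicit tolerance.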

Layer 3 — Durable Trajectories

Architecture position: Spine
What it must do: Record every agent run as a durable, replayable, inspectable trajectory; support structured query and replay by the Assurance layer.

Candidate A — Temporal (Temporal Technologies)

Repository: https://github.com/temporalio/temporal
Evaluation date: 2026-05-22
Version: v1.31.0 (April 29, 2026; 138+ releases)
License verified at: https://github.com/temporalio/temporal/blob/main/LICENSE

Qualification rubric — Temporal v1.31.0 — 2026-05-22

Rubric version: v1.0

Criterion	Score (0–3)	Evidence	Caveats
1. License	3	MIT at repository root, confirmed at https://github.com/temporalio/temporal. Permissive, OSI-approved.	—
2. Forkability / vendor independence	2	Single-vendor (Temporal Technologies) with no foundation governance, but the project originated as a fork of Uber's Cadence, which demonstrates that the codebase is fork-viable in practice. CONTRIBUTING.md and proposals process present. Community forum and Slack active. 20.4k stars and 1.6k forks suggest substantial outside engagement. No governance charter documented.	No governance charter; practical forkability is credibly demonstrated by the Cadence fork history. Single vendor controls release.
3. Health & bus factor	3	v1.31.0 (April 2026); active release cadence (138+ releases); Temporal Technologies is a well-funded startup with multiple engineers; contributor community spans cloud-native practitioners (Replit, DoorDash, Netflix documented users per https://temporal.io/). Replay conference hosted May 2026 with AI-agent focus.	—
4. Production adoption & security posture	2	Documented enterprise production deployments: Replit (agent control plane), DoorDash, Netflix, Stripe (https://temporal.io/case-studies) — production-adoption sub-dimension is strong. SECURITY.md was NOT confirmed in the repository content reviewed; the rubric requires `SECURITY.md` present for a criterion-4 score of 3, so the security-posture sub-dimension caps at 2. Per the rubric's combined-score rule, the criterion score is the lower of the two sub-dimensions = 2. Active CVE response not separately documented.	SECURITY.md presence not confirmed in repository content reviewed; this caps the score at 2 and should be verified directly before adoption.
5. Composability	2	Temporal provides durable execution semantics — workflows, activities, and event history — that satisfy the spine's requirement for ordered, immutable trajectory storage and replay. However, Temporal is a general durable execution engine, not purpose-built for agent trajectory recording. Mapping the reference architecture's TrajectoryEvent schema onto Temporal's activity history requires an adapter layer. The workflow replay mechanism directly supports the Assurance layer's replay requirement. Not a drop-in implementation without an integration layer.	Adapter layer needed to map agent trajectory events to Temporal workflow history. Not a purpose-built trajectory store; composability requires intentional mapping.
6. Open-standards alignment	2	OpenTelemetry integration available via Temporal's OTel SDK instrumentation (https://docs.temporal.io/production-deployment/otel). Temporal's wire protocol (Temporal gRPC) is not an open standard, though the event history format is documented. No MCP integration (not applicable to this layer).	Temporal's wire protocol is project-specific rather than an open standard; OTel integration is available but via SDK instrumentation, not native emission.
Total	14/18

Interpretation: Viable with caveats

Any criterion scored 0: No

Honest-tension provision invoked: No

Recommendation: Recommended with caveats (adapter layer required; SECURITY.md should be verified before production adoption)
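The combined-score rule applied to criterion 4 in the table above can be worked through as a short calculation. This is a sketch under stated assumptions: the function name is hypothetical, and the sub-dimension inputs of 3 are illustrative readings of "strong" adoption, since the table reports only the capped result.

```python
def criterion4_score(adoption: int, security_posture: int,
                     security_md_confirmed: bool) -> int:
    """Combine the two criterion-4 sub-dimensions as the evaluation describes."""
    if not security_md_confirmed:
        # Without a confirmed SECURITY.md the security sub-dimension caps at 2.
        security_posture = min(security_posture, 2)
    # The criterion score is the lower of the two sub-dimensions.
    return min(adoption, security_posture)


# Illustrative inputs: both sub-dimensions assumed to be 3 before the cap.
print(criterion4_score(3, 3, security_md_confirmed=False))  # 2
print(criterion4_score(3, 3, security_md_confirmed=True))   # 3
```

Under these assumed inputs, confirming SECURITY.md alone would change the result, which is why that verification is flagged as a follow-up.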

Candidate B — OpenTelemetry Collector + Persistent Backend (CNCF)

Repository: https://github.com/open-telemetry/opentelemetry-collector
Evaluation date: 2026-05-22
Version: v1.30.0 (May 2026)
License verified at: https://github.com/open-telemetry/opentelemetry-collector/blob/main/LICENSE

OpenTelemetry Collector paired with a persistent backend (Jaeger for trace storage, or a purpose-built OTel-compatible store) provides an alternative Durable Trajectories implementation. OpenTelemetry reached CNCF graduated status on May 21, 2026 (https://www.cncf.io/announcements/2026/05/21/cloud-native-computing-foundation-announces-opentelemetrys-graduation-solidifying-status-as-the-de-facto-observability-standard/), which the CNCF announcement describes as solidifying its status as the de facto observability standard.

Qualification rubric — OTel Collector v1.30.0 — 2026-05-22

Rubric version: v1.0

Criterion	Score (0–3)	Evidence	Caveats
1. License	3	Apache-2.0 at repository root, confirmed at https://github.com/open-telemetry/opentelemetry-collector.	—
2. Forkability / vendor independence	3	CNCF-graduated project (graduated May 21, 2026). 12,000+ contributors from 2,800+ companies per CNCF announcement. Foundation governance with documented charter.	—
3. Health & bus factor	3	Project velocity second only to Kubernetes among CNCF projects. Active releases. Contributor base spans every major cloud vendor.	—
4. Production adoption & security posture	3	Production adoption across every major cloud vendor (Google Cloud, AWS, Azure, Datadog, Honeycomb per https://opentelemetry.io/). SECURITY.md present. OpenSSF Scorecard present in repository. CVE responsiveness documented.	—
5. Composability	2	OTel Collector is highly composable (receivers, processors, exporters pipeline). However, OTel's unit is an event/trace span — not a bounded agent run (trajectory). Satisfying the spine's `get_trajectory(trajectory_id)` and `replay(trajectory_id)` interface requires building a trajectory-assembler layer on top of the raw OTel event store. The spine is more than an event bus; the trajectory as a bounded, replayable unit is not a native OTel abstraction.	Trajectory as a bounded, replayable unit requires additional tooling beyond the OTel Collector itself. Not a complete spine implementation without supplementary trajectory-assembly logic.
6. Open-standards alignment	3	OTel is the open standard. Agent semantic conventions (gen_ai.* group) entered experimental status (https://github.com/open-telemetry/semantic-conventions/issues/2664). This candidate is the standard.	The gen_ai agent conventions are experimental, not yet stable, but the direction is clear.
Total	17/18

Interpretation: Strong candidate

Any criterion scored 0: No

Honest-tension provision invoked: No

Recommendation: Recommended as the observability and event-emission standard; requires a trajectory-assembly layer to fully satisfy the spine's replayable-trajectory interface.

Layer 3 recommended pick — 2026-05

Recommended pick: Temporal (MIT) as the durable execution substrate, with OpenTelemetry Collector as the event emission standard
https://github.com/temporalio/temporal — v1.31.0, evaluated 2026-05-22
https://github.com/open-telemetry/opentelemetry-collector — v1.30.0, evaluated 2026-05-22

Rationale: Neither candidate fully satisfies the spine's requirements alone. Temporal provides the durable, replayable execution record — its event history is the closest OSS primitive to the architecture's TrajectoryEvent record, and its replay mechanism directly enables the Assurance layer's forensic replay requirement. OpenTelemetry provides the standard event-emission format that every capability layer should emit to. The recommended implementation is: capability layers emit OTel-format trajectory events; the Durable Trajectories spine stores those events in a Temporal workflow history (or a Temporal-adjacent store). An integration adapter (part of the Gauntlet seam work) maps between the reference architecture's TrajectoryEvent schema and Temporal's activity record format.

This is the most complex per-layer integration. The architecture's defining principle applies directly: the capability (durable, replayable, queryable trajectory record) is permanent; the specific storage implementation (Temporal, or an OTel-native trajectory store as the gen_ai conventions mature) is swappable.

This pick is dated 2026-05 and should be re-evaluated when: (a) OTel gen_ai agent semantic conventions reach stable status; (b) a purpose-built OSS agent trajectory store (not a general durable execution engine) emerges; or (c) Temporal's SECURITY.md presence is confirmed and its CVE responsiveness is verified.
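Whichever backend is chosen, the spine has to turn a stream of events into bounded, ordered runs that can be fetched and replayed. The sketch below is a minimal in-memory illustration of that shape, not Temporal or OpenTelemetry code: `get_trajectory` and `replay` are the interface names used in this document, while the `TrajectoryEvent` fields, the `append` method, and the `InMemoryTrajectoryStore` class are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class TrajectoryEvent:
    # Field names are illustrative assumptions, not the architecture's schema.
    trajectory_id: str
    seq: int   # position of the event within its run
    kind: str  # e.g. "tool_call" or "model_response"
    payload: dict = field(default_factory=dict)


class InMemoryTrajectoryStore:
    def __init__(self) -> None:
        self._events = defaultdict(list)

    def append(self, event: TrajectoryEvent) -> None:
        self._events[event.trajectory_id].append(event)

    def get_trajectory(self, trajectory_id: str) -> list:
        # Events may arrive out of order, so sort by sequence number.
        return sorted(self._events.get(trajectory_id, []), key=lambda e: e.seq)

    def replay(self, trajectory_id: str,
               handler: Callable[[TrajectoryEvent], None]) -> int:
        """Feed one run's events to a handler in order; return the event count."""
        events = self.get_trajectory(trajectory_id)
        for event in events:
            handler(event)
        return len(events)


store = InMemoryTrajectoryStore()
store.append(TrajectoryEvent("run-1", 2, "tool_call", {"tool": "read_file"}))
store.append(TrajectoryEvent("run-1", 1, "model_response"))
store.append(TrajectoryEvent("run-2", 1, "model_response"))

seen = []
count = store.replay("run-1", lambda e: seen.append((e.seq, e.kind)))
print(count, seen)  # 2 [(1, 'model_response'), (2, 'tool_call')]
```

Grouping by `trajectory_id` is the piece a raw span store does not provide on its own; a durable implementation would replace the dictionary with the chosen backend.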

Layer 4 — Identity & Delegation

Architecture position: Faculty
What it must do: Issue scoped, auditable agent identities; enforce delegation constraints; bind identity to runtime policy; record delegation events.

Candidate A — SPIRE (SPIFFE Runtime Environment, CNCF)

Repository: https://github.com/spiffe/spire
Evaluation date: 2026-05-22
Version: v1.15.0 (May 19, 2026; 138 total releases)
License verified at: https://github.com/spiffe/spire/blob/main/LICENSE

Qualification rubric — SPIRE v1.15.0 — 2026-05-22

Rubric version: v1.0

Criterion	Score (0–3)	Evidence	Caveats
1. License	3	Apache-2.0 at repository root, confirmed at https://github.com/spiffe/spire.	—
2. Forkability / vendor independence	3	CNCF-graduated project (https://www.cncf.io/projects/spiffe-and-spire/). Governance documented at https://github.com/spiffe/spiffe (SIG structure). Contributors from multiple organizations. Foundation governance survives any single vendor's departure.	—
3. Health & bus factor	3	v1.15.0 (May 2026), 138 releases, 2.4k stars, 613 forks. Third-party audit by Cure53 (February 2021). CNCF TAG-Security assessments (2018, 2020). MAINTAINERS.md, GOVERNANCE.md, CODEOWNERS present. Active release cadence.	—
4. Production adoption & security posture	3	Production use documented across cloud-native enterprises; CNCF graduation requires demonstrated production adoption. SECURITY.md present with a security contact address. Third-party Cure53 audit completed. CII Best Practices badge displayed.	—
5. Composability	2	SPIRE exposes the SPIFFE Workload API — a gRPC interface that workloads use to obtain X.509-SVIDs and JWT-SVIDs. Clean, versioned API. However, SPIRE does not natively model agent→sub-agent delegation chains with constrained scopes — it handles workload identity (who the agent is), not delegation authorization (what subset of authority the spawning agent passes to the sub-agent). Satisfying the full Identity & Delegation layer requires composing SPIRE with an authorization primitive (e.g., OAuth 2.0 Token Exchange, RFC 8693).	Does not natively support the constrained delegation chain semantics required by the architecture. The delegation grant interface (`delegate(from, to, scopes)`) requires composition with an authorization layer.
6. Open-standards alignment	3	SPIFFE is the open standard for workload identity. SVIDs are X.509 certificates — open standard. JWT-SVIDs use standard JWT format. OAuth 2.0 Token Exchange (RFC 8693) integration pattern is documented. OpenAPI for the SPIRE API.	—
Total	17/18

Interpretation: Strong candidate

Any criterion scored 0: No

Honest-tension provision invoked: No

Recommendation: Recommended as the identity issuance foundation; requires composition with an authorization layer (OAuth 2.0 Token Exchange or equivalent) to satisfy the delegation constraint requirement.

Candidate B — HashiCorp Vault (with SPIFFE integration)

Repository: https://github.com/hashicorp/vault
Evaluation date: 2026-05-22
Version: v1.17.x (community edition)
License verified at: https://github.com/hashicorp/vault/blob/main/LICENSE

Vault provides secret management, PKI certificate issuance, and — as of recent releases — native SPIFFE authentication support (https://www.hashicorp.com/en/blog/spiffe-securing-the-identity-of-agentic-ai-and-non-human-actors). It can serve as an identity issuance and secret-injection layer for agents.

Qualification rubric — HashiCorp Vault (Community) v1.17 — 2026-05-22

Rubric version: v1.0

Criterion	Score (0–3)	Evidence	Caveats
1. License	0	Vault CE was relicensed from MPL-2.0 to BUSL 1.1 in August 2023 (https://github.com/hashicorp/vault/blob/main/LICENSE). BUSL 1.1 is source-available, not OSI-approved. Each release converts to MPL-2.0 four years after publication, so v1.17 (published in 2024) converts in 2028; that date has NOT yet passed, and the BUSL restrictions are currently active. The rubric's criterion 1 table maps "BUSL/BSL with active restrictions" to score 0 (score 1 applies only when the conversion date has already passed).	BUSL 1.1 with active restrictions = 0 per the rubric. This is a disqualifying score. The community fork OpenBao (https://github.com/openbao/openbao, MPL-2.0) is an openly licensed alternative, scored separately if needed.
2. Forkability / vendor independence	1	HashiCorp (now IBM subsidiary) controls main branch. BUSL license restricts forking for competitive use. OpenBao fork exists but is separately maintained.	BUSL license significantly weakens the forkability backstop.
3. Health & bus factor	3	Active releases; large contributor base; major enterprise deployments.	—
4. Production adoption & security posture	3	Extensive enterprise production deployments; mature CVE responsiveness; documented security policy.	—
5. Composability	3	Extensive plugin system; REST API; multiple auth backends; SPIFFE integration documented.	—
6. Open-standards alignment	3	SPIFFE, OAuth2, OIDC, PKI standards natively.	—
Total	13/18

Interpretation: Disqualified — criterion 1 scored 0. Per the rubric's disqualification rule, a 0 on criterion 1 (license) disqualifies the candidate regardless of total score.

Any criterion scored 0: Yes (criterion 1 — BUSL with active restrictions = 0 per rubric). See disqualification rule: a 0 on the license criterion disqualifies the project regardless of total.

Honest-tension provision invoked: No — this provision does not apply to license disqualification.

Recommendation: Not recommended (Vault CE). The OpenBao fork (https://github.com/openbao/openbao, MPL-2.0) is a viable alternative and should be evaluated separately if a Vault-compatible identity layer is required.
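The scoring mechanics used in this section can be sketched in a few lines. This is a minimal illustration, not the rubric's own tooling: the function names are hypothetical, the scores are copied from the Vault CE table above, and the publication date of 1 August 2023 is a placeholder standing in for "the earliest possible BUSL release", since the relicensing happened in August 2023.

```python
from datetime import date

# Scores copied from the Vault CE table above, keyed by criterion number.
VAULT_CE_SCORES = {1: 0, 2: 1, 3: 3, 4: 3, 5: 3, 6: 3}
LICENSE_CRITERION = 1


def evaluate(scores: dict) -> dict:
    """Return the total and the disqualification status for one candidate."""
    if any(not 0 <= s <= 3 for s in scores.values()):
        raise ValueError("each criterion is scored 0-3")
    return {
        "total": sum(scores.values()),
        "max": 3 * len(scores),
        # A 0 on the license criterion disqualifies regardless of total.
        "disqualified": scores[LICENSE_CRITERION] == 0,
    }


def busl_change_date(published: date, years: int = 4) -> date:
    """Conversion date for a release, assuming a fixed term in whole years."""
    return published.replace(year=published.year + years)


result = evaluate(VAULT_CE_SCORES)
# Placeholder date: no BUSL-licensed release can predate the relicensing.
earliest_conversion = busl_change_date(date(2023, 8, 1))
restrictions_active = earliest_conversion > date(2026, 5, 22)
print(result, earliest_conversion, restrictions_active)
```

The point of the sketch is that the total (13/18) and the disqualification flag are computed independently, which is why a candidate with a respectable total can still be excluded.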

Layer 4 recommended pick — 2026-05#

Recommended pick: SPIRE (CNCF, Apache-2.0)
https://github.com/spiffe/spire — v1.15.0, evaluated 2026-05-22

Rationale: SPIRE is the CNCF-graduated, foundation-governed, production-proven implementation of the SPIFFE workload identity standard. It scores near-perfect on the rubric (17/18) and addresses the core identity-issuance requirement of the layer: each agent instance receives a cryptographically grounded, short-lived identity credential (SVID) tied to its attestation properties, with automatic rotation. The composability caveat (criterion 5, score 2) is acknowledged: SPIRE satisfies workload identity issuance but not delegation chain authorization. The delegation constraint requirement (delegate(from, to, scopes)) requires composition with OAuth 2.0 Token Exchange (RFC 8693) or an equivalent authorization primitive. This is a well-understood composition pattern documented by the SPIFFE community (https://riptides.io/blog-post/spiffe-meets-oauth2-current-landscape-for-secure-workload-identity-in-the-agentic-ai-era/).
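As a rough illustration of that composition, the sketch below encodes a delegate(from, to, scopes) call as an RFC 8693 token exchange request body. The parameter names and URNs come from RFC 8693; everything else is an assumption made for illustration: the function name, the choice to carry JWT-SVIDs as the subject and actor tokens, and the example audience. It builds the form body only and does not contact an authorization server.

```python
from urllib.parse import urlencode, parse_qs

TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
JWT_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:jwt"


def build_delegation_request(subject_svid: str, actor_svid: str,
                             scopes: list, audience: str) -> str:
    """Encode delegate(from, to, scopes) as an RFC 8693 form body.

    subject_svid: token of the delegating agent (the "from" side).
    actor_svid:   token of the agent acting on its behalf (the "to" side).
    """
    if not scopes:
        raise ValueError("a delegation must name at least one scope")
    return urlencode({
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": subject_svid,
        "subject_token_type": JWT_TOKEN_TYPE,
        "actor_token": actor_svid,
        "actor_token_type": JWT_TOKEN_TYPE,
        "scope": " ".join(scopes),  # space-delimited, per OAuth 2.0
        "audience": audience,
    })


# Hypothetical placeholder tokens and audience.
body = build_delegation_request(
    "planner-jwt-svid", "worker-jwt-svid",
    ["memory:read"], "spiffe://example.org/memory",
)
print(parse_qs(body)["grant_type"][0])
```

In this arrangement SPIRE only supplies the two identity tokens; the decision about whether the requested scopes may be delegated stays with the authorization server that receives the request.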

Vault CE is disqualified by its BUSL 1.1 license. The OpenBao fork should be evaluated if a Vault-compatible secret management layer is needed alongside SPIRE.

This pick is dated 2026-05 and should be re-evaluated when: (a) a CNCF or equivalent project provides a delegation chain authorization primitive natively composable with SPIFFE; or (b) OpenBao matures to foundation graduation (CNCF or equivalent) and becomes the preferred Vault-compatible option.

Layer 5 — Context & Memory#

Architecture position: Faculty
What it must do: Retrieve relevant, permissioned context; assemble working context; store and recall durable memory; scope all operations to agent identity.

Candidate A — Mem0 (mem0ai)#

Repository: https://github.com/mem0ai/mem0
Evaluation date: 2026-05-22
Version: mem0-cli v0.2.7 (May 20, 2026); 321 releases
License verified at: https://github.com/mem0ai/mem0

Qualification rubric — Mem0 v0.2.7 — 2026-05-22#

Rubric version: v1.0

Criterion	Score (0–3)	Evidence	Caveats
1. License	3	Apache-2.0 at repository root, confirmed at https://github.com/mem0ai/mem0.	—
2. Forkability / vendor independence	1	Single-vendor project (mem0ai, YC S24 startup). No foundation governance. Two identified founders (Taranjeet Singh, Deshraj Yadav). 56.4k stars, 6.4k forks, and 2,198 commits suggest an active community but concentrated maintainership. No governance charter.	Early-stage startup; if the company fails, fork capacity depends on community adoption. Apache-2.0 provides the legal forkability backstop.
3. Health & bus factor	2	321 releases; active community (Discord, X, email). Multiple published research papers (ECAI 2025). YC backing provides near-term runway. Founder-led development. Contributor count across 321 releases suggests outside contributors but maintainer concentration is high.	Bus factor likely low (2 founders + small team). YC-stage startup burn risk.
4. Production adoption & security posture	2	56.4k stars and active commercial adopters (plugin architecture for enterprise Azure users documented at https://mem0.ai/). Research paper at ECAI 2025 (arXiv:2504.19413). SECURITY.md not confirmed in available content. OpenSSF Scorecard not surfaced.	SECURITY.md was neither confirmed present nor confirmed absent; this requires a direct repository check. Production evidence is credible (stars, commercial plugin users) but enterprise case studies are limited.
5. Composability	3	Hybrid datastore (vector + graph + key-value). Multiple deployment modes (library, self-hosted server, cloud). Plugin architecture for enterprise infrastructure integration (Azure AI, existing vector DBs). Python and TypeScript SDKs (55.9% Python, 33.8% TypeScript in repo). Retrieval and memory operations are cleanly separated APIs.	—
6. Open-standards alignment	2	No confirmed OTel trace emission. REST API available. Multi-provider vector store backends. No MCP integration confirmed. Integration-cost note: an OTel exporter and a spine-emission adapter would need to be built to connect Mem0 retrieval/memory events to the Durable Trajectories spine.	OTel emission absent; adapter cost is medium (API calls are structured and hookable).
Total	13/18

Interpretation: Viable with caveats

Any criterion scored 0: No

Honest-tension provision invoked: No

Recommendation: Recommended with documented caveats

Candidate B — Graphiti (Zep / getzep)#

Repository: https://github.com/getzep/graphiti
Evaluation date: 2026-05-22
Version: v0.29.1 (May 21, 2026)
License verified at: https://github.com/getzep/graphiti/blob/main/LICENSE

Graphiti is the open-source temporal knowledge graph engine underlying the Zep context engineering platform. It provides bi-temporal fact storage (event time + ingestion time) and hybrid retrieval (semantic + BM25 + graph traversal).
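To make the bi-temporal idea concrete, the sketch below stores each fact with both an event-time validity window and an ingestion timestamp, then answers "what was believed at time X to be true at time Y". It is a generic illustration under stated assumptions, not Graphiti's schema or API: the `Fact` fields and the `as_of` function are hypothetical, and validity windows for one subject and predicate are assumed not to overlap.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    value: str
    valid_from: datetime           # event time: when the fact became true
    valid_to: Optional[datetime]   # event time: when it stopped being true
    ingested_at: datetime          # ingestion time: when the store learned it


def as_of(facts, subject, predicate, event_time, known_at):
    """Value believed at `known_at` to hold at `event_time`, or None."""
    latest = {}
    for f in sorted(facts, key=lambda f: f.ingested_at):
        if (f.subject, f.predicate) == (subject, predicate) and f.ingested_at <= known_at:
            # A later-ingested record for the same assertion supersedes the earlier one.
            latest[(f.value, f.valid_from)] = f
    for f in latest.values():
        if f.valid_from <= event_time and (f.valid_to is None or event_time < f.valid_to):
            return f.value
    return None


facts = [
    Fact("alice", "employer", "Acme", datetime(2024, 1, 1), None, datetime(2024, 1, 5)),
    # Correction ingested later: the Acme job ended on 2025-03-01.
    Fact("alice", "employer", "Acme", datetime(2024, 1, 1), datetime(2025, 3, 1), datetime(2025, 6, 1)),
    Fact("alice", "employer", "Globex", datetime(2025, 3, 1), None, datetime(2025, 6, 1)),
]
before = as_of(facts, "alice", "employer", datetime(2025, 4, 1), known_at=datetime(2025, 5, 1))
after = as_of(facts, "alice", "employer", datetime(2025, 4, 1), known_at=datetime(2025, 7, 1))
print(before, after)
```

The same question about April 2025 yields different answers depending on when it is asked, which is the property that lets an auditor reconstruct what an agent could have known at the moment it acted.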

Qualification rubric — Graphiti v0.29.1 — 2026-05-22#

Rubric version: v1.0

Criterion	Score (0–3)	Evidence	Caveats
1. License	3	Apache-2.0 at repository root, confirmed at https://github.com/getzep/graphiti/blob/main/LICENSE.	—
2. Forkability / vendor independence	1	Single-vendor project (Getzep / Zep Inc.). No foundation governance. Getzep discontinued Zep Community Edition (https://blog.getzep.com/announcing-a-new-direction-for-zeps-open-source-strategy/), concentrating OSS effort on Graphiti. 26.4k stars, 2.6k forks. CONTRIBUTING.md and CODE_OF_CONDUCT.md present.	Single-vendor commercial startup; open-source strategy narrowed to Graphiti as the commercial product's core. Apache-2.0 provides legal forkability backstop.
3. Health & bus factor	2	v0.29.1 (May 21, 2026) — 29+ releases showing active cadence. Stars and forks suggest a community following. Single startup team; bus factor likely low.	Bus factor unknown; startup concentration risk similar to Mem0.
4. Production adoption & security posture	2	Graphiti is the core of Zep's commercial product — implying real production use by Zep customers. 26.4k stars. SECURITY.md present at https://github.com/getzep/graphiti. OpenSSF Scorecard not surfaced. No documented external enterprise case studies for Graphiti standalone.	Production evidence is implied through Zep commercial use, not directly documented for open-source Graphiti.
5. Composability	3	Temporal knowledge graph engine: entities, relationships, validity windows. P95 retrieval latency ~300ms with no LLM calls at query time. Self-hostable. Hybrid retrieval (semantic + BM25 + graph traversal) via clean Python API. Bi-temporal data model provides structured fact storage with explicit time metadata.	—
6. Open-standards alignment	2	No confirmed OTel emission. Apache-2.0 and self-hostable. No MCP integration confirmed. Integration-cost note: context retrieval and memory write events would need an OTel adapter to emit to the Durable Trajectories spine. Graphiti's bi-temporal model maps naturally to a trajectory event's timestamp structure.	OTel emission absent; adapter cost is medium.
Total	13/18

Interpretation: Viable with caveats

Any criterion scored 0: No

Honest-tension provision invoked: No

Recommendation: Recommended with documented caveats

Layer 5 recommended pick — 2026-05#

Recommended pick: Mem0 (Apache-2.0) for personalization and durable factual memory; Graphiti (Apache-2.0) for temporal and relational reasoning workloads
https://github.com/mem0ai/mem0 — v0.2.7, evaluated 2026-05-22
https://github.com/getzep/graphiti — v0.29.1, evaluated 2026-05-22

Rationale: Both candidates score 13/18 and are viable with caveats. Neither is a clear winner across all workloads. Mem0 provides the cleaner multi-deployment architecture (library + server + cloud) and a broader multi-provider integration surface, making it the safer default for enterprises that need durable factual context and personalization. Graphiti's bi-temporal graph model is structurally superior for workloads that require reasoning about when facts changed and multi-hop entity relationships — it outperforms Mem0 on LongMemEval temporal benchmarks (63.8% vs. 49.0% per https://vectorize.io/articles/mem0-vs-zep). The two can be composed: Mem0 for semantic memory retrieval, Graphiti for temporal reasoning.

Both candidates share the same weaknesses: single-vendor governance (criterion 2, score 1 each) and absent OTel emission (criterion 6 caveats). The governance weakness is mitigated by the Apache-2.0 licenses (legal forkability backstop). The OTel gap is bounded by the medium adapter cost noted in each table. The architecture's swappability principle limits exposure to both.
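The adapter named in the criterion 6 integration-cost notes can be pictured as a thin wrapper around each memory call. The sketch below is a minimal stand-in under stated assumptions: it uses only the standard library, the `emit` callable represents a hypothetical spine endpoint, and the event field names are invented for illustration. It does not use the Mem0, Graphiti, or OpenTelemetry APIs; a real adapter would translate these events into OTel spans.

```python
import functools
import time


def traced(operation: str, agent_id: str, emit):
    """Decorator: emit one structured event per memory call, success or failure."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            start = time.monotonic()
            event = {"operation": operation, "agent_id": agent_id}
            try:
                result = fn(*args, **kwargs)
                event["status"] = "ok"
                return result
            except Exception as exc:
                event["status"] = "error"
                event["error_type"] = type(exc).__name__
                raise
            finally:
                # Runs on both paths, so failed calls are recorded too.
                event["duration_ms"] = (time.monotonic() - start) * 1000
                emit(event)
        return inner
    return wrap


events = []  # stands in for the hypothetical spine endpoint


@traced("memory.search", "agent-42", events.append)
def search(query: str):
    # Placeholder for a call into whichever memory backend is in use.
    return [f"memory matching {query!r}"]


search("billing preferences")
print(events[0]["operation"], events[0]["status"])
```

Because the wrapper sits at the call boundary, the same adapter shape works for either candidate, which is consistent with rating the adapter cost as medium rather than high.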

On the honest-tension provision: It is NOT invoked for this layer, even though the recommended Mem0 and Graphiti both score criterion 2 = 1. The honest-tension provision applies narrowly — to single-vendor projects that are the deep-contribution anchors (RAMPART, OpenShell) where the per-layer evaluation must reconcile a vendor-controlled pick with the architecture's neutrality. Context & Memory is not an anchor layer; Mem0 and Graphiti are mapped, not contributed-to. Their criterion-2 weakness is handled by the ordinary rubric mechanics — a documented caveat plus the Apache-2.0 forkability backstop and the swappability principle — without needing the special honest-tension reconciliation. The provision is reserved for the cases the rubric's "Honest tension" section names explicitly.

This pick is dated 2026-05 and should be re-evaluated when: (a) a CNCF-graduated context/memory project emerges; (b) OTel gen_ai memory conventions reach stable status and either project adopts them; or (c) either project's commercial model shifts in a way that affects open-source access.

Layer 6 — Tools & Effectors#

Architecture position: Faculty
What it must do: Declare and discover tools; check policy before invocation; execute under isolation; record every invocation; handle failures safely.

Candidate A — MCP Python SDK (modelcontextprotocol / Linux Foundation)#

Repository: https://github.com/modelcontextprotocol/python-sdk
Evaluation date: 2026-05-22
Version: Active development on main (856 commits); latest PyPI release verified at https://pypi.org/project/mcp/
License verified at: https://github.com/modelcontextprotocol/python-sdk/blob/main/LICENSE

The MCP Python SDK is the reference implementation of the Model Context Protocol for tool-serving and tool-calling. On December 9, 2025, Anthropic donated MCP to the newly formed Agentic AI Foundation (AAIF), a directed fund under the Linux Foundation — confirmed by Anthropic's official announcement (https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation) and the Linux Foundation press release (https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation).

Qualification rubric — MCP Python SDK (main, 2026-05-22) — 2026-05-22#

Rubric version: v1.0

Criterion	Score (0–3)	Evidence	Caveats
1. License	3	MIT at repository root, confirmed at https://github.com/modelcontextprotocol/python-sdk.	—
2. Forkability / vendor independence	2	Governance transferred on December 9, 2025 to the Agentic AI Foundation, a directed fund under the Linux Foundation, as confirmed by primary sources: Anthropic's announcement (https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation) and the Linux Foundation press release (https://www.linuxfoundation.org/press/linux-foundation-announces-the-formation-of-the-agentic-ai-foundation). AAIF was co-founded by Anthropic, Block, and OpenAI with support from Google, AWS, Microsoft, Cloudflare, and Bloomberg. This is a material improvement over single-vendor control. 23.1k stars, 3.5k forks. SECURITY.md present. However, individual projects retain technical autonomy and the MCP maintainers (with David Soria Parra as lead core maintainer) continue to set the roadmap, so de facto Anthropic-originated influence on protocol direction remains. This matches the rubric's "foundation-hosted but with de facto single-vendor control" condition for a score of 2.	Foundation governance is genuine and recent (Dec 2025); de facto Anthropic-originated influence on protocol direction is a real but mitigated caveat. The score of 2 reflects "foundation-hosted with de facto single-vendor control."
3. Health & bus factor	3	97 million monthly SDK downloads (https://www.anthropic.com/news/donating-the-model-context-protocol-and-establishing-of-the-agentic-ai-foundation). 856 commits. 246 open issues (active community). SECURITY.md present. Supported by every major AI vendor (Anthropic, OpenAI, Google, Microsoft, AWS).	—
4. Production adoption & security posture	3	97 million monthly downloads. Adopted by VS Code, JetBrains, and multiple commercial platforms. Cisco dedicated MCP security tooling at RSA Conference 2026 per https://www.getmaxim.ai/articles/best-open-source-mcp-gateways-in-2026/. SECURITY.md present. 23,968+ MCP server implementations in the Glama registry per https://glama.ai/mcp/servers.	—
5. Composability	3	MCP is the open protocol; the SDK exposes server and client primitives. Clean JSON-RPC interface. Pluggable transport (stdio, HTTP/SSE, Streamable HTTP). Tool, resource, and prompt abstractions are independently usable. MCP servers are composable — an MCP gateway can aggregate multiple servers.	—
6. Open-standards alignment	3	MCP is the open standard for tool-serving. Linux Foundation governance. JSON-RPC 2.0 wire protocol. OpenAPI integration for REST tools. A2A integration for cross-agent invocation. Anthropic, OpenAI, Google, Microsoft all support MCP natively.	—
Total	17/18

Interpretation: Strong candidate

Any criterion scored 0: No

Honest-tension provision invoked: No

Recommendation: Recommended

Candidate B — OpenAI Agents SDK (tool-calling layer)#

Repository: https://github.com/openai/openai-agents-python
Evaluation date: 2026-05-22
Version: Active development; Agents SDK released March 2025, overhauled April 2026
License verified at: https://github.com/openai/openai-agents-python/blob/main/LICENSE

The OpenAI Agents SDK includes a tool-calling layer with MCP support (https://openai.github.io/openai-agents-python/mcp/). It is evaluated here specifically for its Tools & Effectors layer coverage, not as a full orchestration framework.

Qualification rubric — OpenAI Agents SDK (tool layer) v1.x — 2026-05-22#

Rubric version: v1.0

Criterion	Score (0–3)	Evidence	Caveats
1. License	3	MIT at repository root.	—
2. Forkability / vendor independence	1	Single-vendor (OpenAI). No foundation governance. OpenAI controls the SDK and can deprecate it (Swarm was deprecated when Agents SDK launched). Strong adoption but vendor-controlled roadmap.	Vendor dependency risk is material given OpenAI's history of deprecating prior tools. MIT license provides legal forkability backstop.
3. Health & bus factor	2	Active development (April 2026 overhaul). Large adoption. But OpenAI is the sole maintainer; contributor concentration is high.	Single-vendor; contributor distribution unknown.
4. Production adoption & security posture	3	Wide production adoption by enterprises using OpenAI APIs. Active security program at OpenAI.	—
5. Composability	2	MCP support is present. However, the tool-calling layer is tightly coupled to the Agents SDK's agent abstractions (Agents, Handoffs, Guardrails) — it is not straightforwardly extractable as a standalone tool-invocation layer without adopting OpenAI's full orchestration model.	Not cleanly separable from OpenAI's orchestration model for standalone Tools & Effectors use.
6. Open-standards alignment	2	MCP native support (https://openai.github.io/openai-agents-python/mcp/); A2A support. The SDK ships built-in tracing, but it exports to OpenAI's own backend by default, NOT OpenTelemetry — OTel emission requires third-party instrumentation (the official `opentelemetry-instrumentation-openai-agents-v2` contrib package, https://github.com/open-telemetry/opentelemetry-python-contrib/blob/main/instrumentation-genai/opentelemetry-instrumentation-openai-agents-v2/README.rst, or OpenInference instrumentors). Per the rubric, OTel via an add-on instrumentor rather than native emission maps to 2.	OTel emission is not native — the SDK's default trace export targets OpenAI's backend; OTel requires a contrib instrumentation package.
Total	13/18

Interpretation: Viable with caveats

Any criterion scored 0: No

Honest-tension provision invoked: No

Recommendation: Recommended with caveats — viable when the full OpenAI Agents SDK is already in the stack; not recommended as a standalone Tools & Effectors layer due to coupling.

Layer 6 recommended pick — 2026-05#

Recommended pick: MCP Python SDK (MIT, Linux Foundation governance)
https://github.com/modelcontextprotocol/python-sdk — evaluated 2026-05-22

Rationale: MCP is the de facto open standard for tool-serving and tool-calling, with Linux Foundation governance (donated to the Agentic AI Foundation on December 9, 2025), 97 million monthly downloads, and support from every major AI vendor. The MCP Python SDK is the reference implementation. It scores 17/18 on the rubric — the only gap is the de facto Anthropic-originated influence on protocol direction despite the governance transfer to the Linux Foundation.

The Tools & Effectors layer requires that every tool invocation be policy-checked (by the Runtime Isolation & Governance layer) before execution. MCP satisfies the tool declaration, discovery, and invocation interface. The policy-check integration is provided by composing MCP with OpenShell (the Runtime Isolation & Governance anchor): MCP tool calls are intercepted by the OpenShell policy engine before execution — this composition is the core of the Gauntlet seam artifact.
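The interception pattern described above can be sketched as a gate that sits in front of the tool server. The `tools/call` method name and the JSON-RPC 2.0 envelope follow the MCP specification; everything else is assumed for illustration: the allow-list, the `gate` function, and the choice of error code -32000 (from the JSON-RPC range reserved for implementation-defined server errors). This is not OpenShell's policy engine or its API.

```python
import json

# Hypothetical allow-list: (agent_id, tool_name) pairs that policy permits.
ALLOWED = {("agent-42", "search_docs")}


def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tool invocation as a JSON-RPC 2.0 request."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


def gate(agent_id: str, raw_request: str, forward):
    """Deny by default: forward the call only if policy explicitly allows it."""
    request = json.loads(raw_request)
    tool = (request.get("params") or {}).get("name")
    if request.get("method") != "tools/call" or (agent_id, tool) not in ALLOWED:
        return {
            "jsonrpc": "2.0",
            "id": request.get("id"),
            "error": {"code": -32000, "message": "denied by policy"},
        }
    return forward(request)


def fake_server(request: dict) -> dict:
    # Stand-in for the real MCP server behind the gate.
    return {"jsonrpc": "2.0", "id": request["id"], "result": {"content": []}}


call = make_tool_call(1, "search_docs", {"query": "refund policy"})
print(gate("agent-42", call, fake_server))
print(gate("agent-7", call, fake_server))
```

The design choice worth noting is that the gate never inspects tool output; it only decides whether the invocation may proceed, which keeps policy evaluation separate from tool execution.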

This pick is dated 2026-05 and should be re-evaluated when: (a) the MCP governance under the Linux Foundation's Agentic AI Foundation matures with a documented charter and multi-vendor steering committee; or (b) a competing open-standards tool protocol achieves comparable adoption.

Layer 7 — Orchestration#

Architecture position: Faculty
What it must do: Coordinate multi-step agent runs and multi-agent collaboration; support MCP and A2A interoperability protocols; handle failures; emit orchestration events to the spine.

Candidate A — LangGraph (LangChain Inc.)#

Repository: https://github.com/langchain-ai/langgraph
Evaluation date: 2026-05-22
Version: langgraph==1.2.1 (May 21, 2026)
License verified at: https://github.com/langchain-ai/langgraph/blob/main/LICENSE

Qualification rubric — LangGraph v1.2.1 — 2026-05-22#

Rubric version: v1.0

Criterion	Score (0–3)	Evidence	Caveats
1. License	3	MIT at repository root, confirmed at https://github.com/langchain-ai/langgraph.	—
2. Forkability / vendor independence	1	Single-vendor (LangChain Inc.). No foundation governance. 32.7k stars, 5.5k forks. CONTRIBUTING.md present. Production use by Klarna, Replit, Elastic (https://github.com/langchain-ai/langgraph). No governance charter beyond community code of conduct.	Single-vendor commercial startup. LangChain Platform is the commercial product; open-source LangGraph is the core. MIT license provides legal forkability backstop.
3. Health & bus factor	3	langgraph 1.2.1 (May 2026), 32.7k stars, 5.5k forks. Active release cadence (multiple releases per month). Used by Klarna (85 million users' customer support), Elastic, Replit. Large ecosystem.	—
4. Production adoption & security posture	2	Documented enterprise production deployments: Klarna (customer support at scale), Elastic, Replit (https://github.com/langchain-ai/langgraph) — production-adoption sub-dimension is strong. SECURITY.md was NOT confirmed in the repository content reviewed; the rubric requires `SECURITY.md` present for a criterion-4 score of 3, so the security-posture sub-dimension caps at 2. Per the rubric's combined-score rule, the criterion score is the lower of the two sub-dimensions = 2.	SECURITY.md presence not confirmed in available content reviewed; this caps the score at 2 and should be verified directly before adoption.
5. Composability	3	Graph-based orchestration model (directed cyclic graph with conditional branching). Persistent checkpoints for durable execution. Human-in-the-loop interrupt points. MCP support (https://zenithlaw.com/building-agentic-orchestration-mcp-a2a-langgraph-langchain-playbook). A2A integration documented (https://medium.com/@guangya.liu/building-an-intelligent-multi-agent-orchestration-system-with-langgraph-a2a-and-mcp-674efdf666f7). Sub-agent spawn patterns.	—
6. Open-standards alignment	2	MCP tool support for tool-calling. A2A agent-to-agent protocol support. LangGraph does NOT emit OpenTelemetry traces natively in isolation — OTel emission is provided via the LangSmith SDK's OTel integration (`langsmith[otel]`, https://docs.langchain.com/langsmith/trace-with-opentelemetry) or via community OpenInference instrumentors. OTel support is therefore via a first-party SDK add-on / community plugin rather than built-in core emission. Per the rubric, partial/plugin-based OTel support maps to 2.	OTel emission is not built into LangGraph core; it requires the LangSmith SDK OTel integration or a community instrumentor. This is an add-on, not native emission.
Total	14/18

Interpretation: Viable with caveats

Any criterion scored 0: No

Honest-tension provision invoked: No

Recommendation: Recommended with caveats (SECURITY.md should be verified; criterion 2 is 1 due to single-vendor governance)
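The criterion 4 arithmetic applied in the table above is small enough to state as code. This sketch follows the rule as this document describes it (an unconfirmed SECURITY.md caps the security-posture sub-dimension at 2, and the criterion score is the lower of the two sub-dimensions); the function and parameter names are hypothetical, and the rubric document remains the authority on the rule itself.

```python
def criterion_4(adoption: int, security_posture: int,
                security_md_confirmed: bool) -> int:
    """Combine the two sub-dimensions of criterion 4 into one 0-3 score."""
    for score in (adoption, security_posture):
        if not 0 <= score <= 3:
            raise ValueError("sub-dimension scores range from 0 to 3")
    if not security_md_confirmed:
        # An unconfirmed SECURITY.md caps the security-posture sub-dimension at 2.
        security_posture = min(security_posture, 2)
    # The criterion score is the lower of the two sub-dimensions.
    return min(adoption, security_posture)


# Strong adoption, SECURITY.md not confirmed (the LangGraph case above).
print(criterion_4(adoption=3, security_posture=3, security_md_confirmed=False))
```

Confirming SECURITY.md in the repository is therefore the single check that could move this criterion from 2 to 3.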

Candidate B — CrewAI (CrewAI Inc.)#

Repository: https://github.com/crewAIInc/crewAI
Evaluation date: 2026-05-22
Version: v1.14.x (2026)
License verified at: https://github.com/crewAIInc/crewAI/blob/main/LICENSE

Qualification rubric — CrewAI v1.14 — 2026-05-22#

Rubric version: v1.0

Criterion	Score (0–3)	Evidence	Caveats
1. License	3	MIT at repository root, confirmed at https://github.com/crewAIInc/crewAI/blob/main/LICENSE.	—
2. Forkability / vendor independence	1	Single-vendor (CrewAI Inc.). No foundation governance. 50,000+ GitHub stars per https://automationatlas.io/tools/crewai/. Built entirely from scratch, independent of LangChain, which is a practical advantage for forkability (no hidden upstream dependency). $18M funding. No governance charter.	Single-vendor commercial startup; MIT license provides legal forkability backstop.
3. Health & bus factor	3	50,000+ stars, 100,000+ certified developers. v1.14 with A2A protocol support (https://softmaxdata.com/blog/definitive-guide-to-agentic-frameworks-in-2026-langgraph-crewai-ag2-openai-and-more/). Active enterprise platform (CrewAI AMP). Built-in memory, observability.	—
4. Production adoption & security posture	2	Wide production adoption across enterprise customers; CrewAI AMP for enterprise deployment; $18M funding indicates commercial traction — production-adoption sub-dimension is strong. SECURITY.md was NOT confirmed in the content reviewed; the rubric requires `SECURITY.md` present for a criterion-4 score of 3, so the security-posture sub-dimension caps at 2. Per the rubric's combined-score rule, the criterion score is the lower of the two sub-dimensions = 2. OpenSSF Scorecard not surfaced.	SECURITY.md presence not confirmed in content reviewed; this caps the score at 2 and should be verified directly before adoption.
5. Composability	2	Role-based orchestration (crews, agents, tasks). Native MCP and A2A protocol support (https://automationatlas.io/tools/crewai/). Built-in memory, RAG, and observability bundled into the framework. The bundling of memory and orchestration (vs. separate composable layers) partially conflicts with the architecture's separation-of-concerns principle — extracting just the orchestration layer from CrewAI while using a separate memory layer (Mem0 or Graphiti) requires careful configuration to avoid dual-write patterns.	Memory and orchestration are coupled in the default configuration; using a separate memory layer requires explicit configuration to override the built-in memory.
6. Open-standards alignment	3	Native MCP support; A2A protocol support as of v1.14; model-agnostic (OpenAI, Anthropic, Google, Ollama); OTel-compatible observability.	—
Total	14/18

Interpretation: Viable with caveats

Any criterion scored 0: No

Honest-tension provision invoked: No

Recommendation: Recommended with documented caveats

Layer 7 recommended pick — 2026-05#

Recommended pick: LangGraph (MIT)
https://github.com/langchain-ai/langgraph — v1.2.1, evaluated 2026-05-22

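Rationale: LangGraph and CrewAI both score 14/18, a tie on total. The tie hides two offsetting differences: LangGraph leads on composability (criterion 5: 3 vs. 2), while CrewAI leads on open-standards alignment (criterion 6: 3 vs. 2). Per-criterion scores are the decision record, not the total, and the deciding criterion is composability, because the reference architecture's composability principle depends on it. LangGraph is the preferred orchestration layer for this architecture for two reasons specific to the architecture's requirements: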

First, LangGraph's graph-based, checkpoint-persistent execution model maps directly onto the reference architecture's Orchestration layer interface (start_run, submit_step, get_run_status, spawn). The checkpoint mechanism is structurally compatible with Temporal's durable execution substrate in the Durable Trajectories spine — these two can be composed to provide both orchestration-level state and durable trajectory storage.

Second, LangGraph's separation of orchestration (graph structure) from memory (retrieved via Context & Memory layer) and tool invocation (via Tools & Effectors layer / MCP) is cleaner than CrewAI's bundled model, which couples memory and RAG into the framework. This separation is important for the reference architecture's composability principle — and it is exactly the criterion-5 gap that breaks the tie.
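For orientation, the Orchestration layer interface named above (start_run, submit_step, get_run_status, spawn) might look like the sketch below. Only the four method names come from this document; the signatures, the `Orchestrator` protocol, and the in-memory implementation are assumptions for illustration. It does not use LangGraph or Temporal, and the dictionary append merely stands in for a real checkpoint write.

```python
import uuid
from typing import Any, Optional, Protocol


class Orchestrator(Protocol):
    """Hypothetical shape of the Orchestration layer interface."""
    def start_run(self, goal: str) -> str: ...
    def submit_step(self, run_id: str, step: dict) -> None: ...
    def get_run_status(self, run_id: str) -> dict: ...
    def spawn(self, parent_run_id: str, goal: str) -> str: ...


class InMemoryOrchestrator:
    """Toy implementation: memory and tool calls live outside this class."""

    def __init__(self) -> None:
        self._runs: dict = {}

    def start_run(self, goal: str, parent: Optional[str] = None) -> str:
        run_id = uuid.uuid4().hex
        self._runs[run_id] = {"goal": goal, "parent": parent, "steps": []}
        return run_id

    def submit_step(self, run_id: str, step: dict) -> None:
        # Stand-in for a durable checkpoint write.
        self._runs[run_id]["steps"].append(step)

    def get_run_status(self, run_id: str) -> dict:
        run = self._runs[run_id]
        return {"goal": run["goal"], "parent": run["parent"],
                "steps_completed": len(run["steps"])}

    def spawn(self, parent_run_id: str, goal: str) -> str:
        if parent_run_id not in self._runs:
            raise KeyError(parent_run_id)
        return self.start_run(goal, parent=parent_run_id)


orch: Orchestrator = InMemoryOrchestrator()
run = orch.start_run("summarize quarterly report")
orch.submit_step(run, {"action": "retrieve_context"})
child = orch.spawn(run, "extract revenue table")
print(orch.get_run_status(run), orch.get_run_status(child)["parent"] == run)
```

Nothing in the interface mentions memory or tools, which is the separation the second reason relies on: those concerns enter through the steps an orchestrator records, not through the orchestrator itself.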

CrewAI is a strong alternative — particularly for teams that prefer role-based agent decomposition over graph-based state machines, or that want a faster path to production with more batteries included. Its native MCP and A2A support is equally strong. The criterion 5 bundling caveat is the deciding difference.
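The tie and its resolution can be checked directly from the two rubric tables. The sketch below copies the six scores for each candidate and lists the criteria on which they differ; the short criterion labels and the `differing` helper are introduced here for illustration.

```python
CRITERIA = [
    "1. License",
    "2. Forkability",
    "3. Health",
    "4. Adoption & security",
    "5. Composability",
    "6. Open standards",
]

# Scores copied from the LangGraph and CrewAI tables above.
LANGGRAPH = [3, 1, 3, 2, 3, 2]
CREWAI = [3, 1, 3, 2, 2, 3]


def differing(a, b):
    """Map each criterion where the scores differ to its (a, b) pair."""
    return {name: (x, y) for name, x, y in zip(CRITERIA, a, b) if x != y}


print(sum(LANGGRAPH), sum(CREWAI))
print(differing(LANGGRAPH, CREWAI))
```

The totals match because the candidates trade one point on criterion 5 for one point on criterion 6. Treating criterion 5 as decisive is an architectural judgment rather than an arithmetic result, which is why the per-criterion record matters more than the total.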

Both candidates share the criterion 2 weakness (single-vendor, no foundation governance). Both are mitigated by their MIT licenses. Neither has a confirmed SECURITY.md in the content reviewed — this must be verified before production adoption.

This pick is dated 2026-05 and should be re-evaluated when: (a) a CNCF-graduated orchestration framework with equivalent MCP+A2A support emerges; (b) LangGraph's governance formalizes with a charter; or (c) the bundled CrewAI model becomes separable enough to cleanly satisfy the architecture's layered composability requirement.

Tracked candidates — not yet rubric-eligible#

AX (Google, Apache-2.0) — https://github.com/google/ax

Distributed agent runtime: single-writer controller, durable event log, resumable streams, isolated-actor model for skills/tools/agents, MCP for tools. Created 2026-03-30; the README warns of major breaking changes before stable release.

A candidate for the Orchestration layer and potentially the Durable Trajectories spine — though AX's design bundles the event log into the controller, which is in architectural tension with this reference's split between Orchestration (faculty) and Durable Trajectories (spine). Recommending AX as the Spine implementation would require the event log to be exposed as a general-purpose emit endpoint that every other capability layer can write to, not only the controller's internal recovery journal.
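The distinction drawn here, between a controller's private recovery journal and a general-purpose emit endpoint, can be sketched as follows. This is an illustrative model only, not AX's design or API: the `EventLog` class, its method names, and the layer labels are all hypothetical.

```python
class EventLog:
    """Append-only log that any capability layer may write to."""

    def __init__(self) -> None:
        self._events = []

    def emit(self, layer: str, payload: dict) -> int:
        """Append one event and return its sequence number."""
        seq = len(self._events)
        self._events.append({"seq": seq, "layer": layer, "payload": payload})
        return seq

    def read_from(self, seq: int) -> list:
        """Return events at or after `seq`, so a reader can resume a stream."""
        return list(self._events[seq:])


log = EventLog()
log.emit("orchestration", {"event": "run_started"})
log.emit("tools", {"event": "tool_invoked", "tool": "search_docs"})
resume_at = log.emit("memory", {"event": "memory_written"})
print([e["layer"] for e in log.read_from(0)], resume_at)
```

A controller-internal journal would accept only the first of these three writes. Serving as the spine requires the other two as well, which is the condition this document places on recommending AX for that role.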

Re-evaluate when: the first stable release ships, or a v0.x release reaches at least one documented external (non-Google) production deployment.

Summary table — all seven layers#

Layer	Recommended pick	License	Score	Honest-tension invoked	Dated
Runtime Isolation & Governance	OpenShell (NVIDIA)	Apache-2.0	12/18	Yes (criterion 2: single-vendor)	2026-05
Assurance, Evaluation & Forensics	RAMPART (Microsoft)	MIT	13/18	Yes (criterion 2: single-vendor)	2026-05
Durable Trajectories	Temporal + OTel Collector	MIT / Apache-2.0	14/18 + 17/18	No	2026-05
Identity & Delegation	SPIRE (CNCF)	Apache-2.0	17/18	No	2026-05
Context & Memory	Mem0 (+ Graphiti for temporal)	Apache-2.0	13/18	No	2026-05
Tools & Effectors	MCP Python SDK (Linux Foundation)	MIT	17/18	No	2026-05
Orchestration	LangGraph (LangChain Inc.)	MIT	14/18	No	2026-05

Control-plane picks (both anchor layers) invoke the honest-tension provision for criterion 2 because both RAMPART and OpenShell are single-vendor projects. This is resolved per the rubric's honest-tension section: permissive licenses provide the legal forkability backstop; the architecture's capability-first design means a different implementation can slot in behind the same interface; and no foundation-governed alternative exists at the required capability level for either layer as of 2026-05.

Implementations are recommended-but-swappable. This table records the best available OSS options at evaluation date 2026-05. The capability definitions and interfaces in the reference architecture are permanent; these implementation choices are not. Re-evaluate any pick when its license, governance, health, or competitive landscape changes materially.

NotionAlpha OSS AI Lab — notionalpha.com