On March 31, 2026, the carefully maintained veil of secrecy surrounding Anthropic’s most ambitious project was accidentally stripped away. In what will likely be remembered as the most significant “packaging fail” in the history of the artificial intelligence industry, Chaofan Shou, a security researcher and Solayer Labs intern, discovered that the full source code for Claude Code—Anthropic’s flagship agentic CLI—was sitting in plain sight on the public npm registry. The scale of the exposure was staggering: 512,000 lines of original TypeScript and a 59.8 MB source map file (cli.js.map) that functioned as a high-definition blueprint for the company’s “agentic harness.”
The irony was thick enough to choke on. Within the leaked files sat a module called undercover.ts, a sophisticated secrecy subsystem designed to prevent Claude from accidentally leaking internal codenames in public commits. Yet, while the AI was being trained to stay “undercover,” the humans behind the curtain had inadvertently published the entire project. This was not a sophisticated external breach or a state-sponsored hack; it was the digital equivalent of leaving the blueprints for a high-security vault on a park bench. For a company valued at $380 billion and obsessed with AI safety and alignment, the incident served as a humbling reminder that even the most advanced intelligence in the world is still at the mercy of a single misconfigured build file.
The Anatomy of a Packaging Error
Peeling back the layers of the @anthropic-ai/claude-code package (version 2.1.88) reveals a failure sequence that was both mundane and entirely preventable. The exposure resulted from the intersection of a documented runtime bug, a failure in release hygiene, and a lack of automated safeguards at the artifact boundary.
The Bun Runtime Bug and Issue #28001
In late 2024, Anthropic migrated Claude Code to the Bun runtime and bundler to capitalize on its superior execution speed. However, the move introduced a hidden vulnerability. GitHub issue #28001 in the Bun repository describes a persistent bug where the bundler generates and serves source maps even when the development: false flag is explicitly set. Although the issue had been documented and reproducible for weeks, Anthropic’s build pipeline continued to emit production artifacts with a sourceMappingURL comment appended to the output. That comment acted as a direct pointer to the cli.js.map file, which contained the unminified, unobfuscated TypeScript source for nearly 2,000 files.
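A bug at the bundler level argues for a guard at the artifact level. As a sketch (these helpers are illustrative, not part of Anthropic’s actual pipeline), a release step can refuse to ship any bundle that still carries a source-map pointer:

```typescript
// Post-build guard: fail the release if a bundled output still carries a
// sourceMappingURL pointer. Helper names here are invented for illustration.
const SOURCEMAP_POINTER = /^\s*\/\/[#@]\s*sourceMappingURL=.+$/m;

function hasSourceMapPointer(bundleText: string): boolean {
  return SOURCEMAP_POINTER.test(bundleText);
}

function stripSourceMapPointer(bundleText: string): string {
  // Remove the pointer comment entirely so the .map file is never requested.
  return bundleText.replace(/\n?\s*\/\/[#@]\s*sourceMappingURL=\S+\s*$/m, "");
}

// A bundle that leaks its map, and the same bundle after the guard runs.
const leaky = 'console.log("hi");\n//# sourceMappingURL=cli.js.map';
const clean = stripSourceMapPointer(leaky);
```

Even with the Bun bug unfixed upstream, a check like this at the boundary would have caught the pointer before publication, regardless of why the bundler emitted it.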
The .npmignore Oversight
While the Bun bug provided the match, the absence of a robust .npmignore file provided the fuel. In the npm ecosystem, every file in the package directory is included by default unless it is explicitly excluded. Anthropic failed to add a *.map exclusion rule to its ignore configuration, so every run of the publish command bundled the sensitive debugging maps into the public tarball. The community’s investigation highlighted that a ten-second manual audit via npm pack --dry-run would have revealed the original source code sitting inside the distribution, yet the high-velocity “vibe coding” culture at Anthropic seemingly prioritized shipping over checking.
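That ten-second audit is also trivial to automate. A minimal sketch, assuming the file list has already been extracted from the dry-run output (the deny-list below is illustrative and would be project-specific in practice):

```typescript
// Pre-publish audit: given the file list a packaging dry run reports,
// flag any artifacts that should never reach a public registry.
const DENYLIST: RegExp[] = [/\.map$/, /\.env$/, /(^|\/)\.npmrc$/];

function auditPackFiles(paths: string[]): string[] {
  return paths.filter((p) => DENYLIST.some((rule) => rule.test(p)));
}

// A tarball resembling the leaked package: the bundle plus its source map.
const offenders = auditPackFiles(["package.json", "cli.js", "cli.js.map"]);
// offenders === ["cli.js.map"], so the publish should be aborted.
```

Wired into CI as a required step, a check like this turns the “habit” of inspecting the tarball into a hard gate.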
The Cloudflare R2 Connection
The exposure was deepened by the way Anthropic managed its build artifacts. The leaked source map didn’t just contain the code inlined; it pointed directly to a ZIP archive hosted on an Anthropic-owned Cloudflare R2 bucket. This created a secondary, high-bandwidth path for data exfiltration. Once the link was public, the archive was mirrored across GitHub over 41,500 times within hours. Even as Anthropic’s legal team scrambled to issue DMCA takedowns, the “blueprints” were already being parsed by every competitor in the AI space.
KAIROS and the Secret Feature Roadmap
The leak provided a rare look at Anthropic’s aggressive unreleased roadmap, hidden behind 44 active feature flags. These weren’t mere experiments; they were fully realized systems with detailed prompt engineering, analytics, and error handling.
KAIROS: The Proactive Autonomous Daemon
The most profound discovery was KAIROS (Ancient Greek for “the right time”). Mentioned more than 150 times in the source, KAIROS represents a shift from a reactive CLI to an “always-on” background assistant. Unlike the public version of Claude Code, which only acts when a user types a command, KAIROS is designed as a persistent daemon that monitors development activity. It utilizes exclusive tools like PushNotification and SubscribePR to intervene proactively. The system operates on a 15-second “blocking budget” per cycle, ensuring it remains fast enough to maintain an interactive loop while it “thinks” about the developer’s next move in the background.
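The budget mechanic can be sketched as a simple drain loop: the daemon works through queued background tasks until the next one would exceed the cycle’s allowance, then yields back to the interactive loop. Only the 15-second figure comes from the leak; the task model and names below are hypothetical:

```typescript
// Per-cycle "blocking budget" sketch: drain background work until the next
// task would blow the budget, then yield. Task names are invented examples.
const BLOCKING_BUDGET_MS = 15_000;

type Task = { name: string; costMs: number };

function runCycle(queue: Task[]): string[] {
  let spent = 0;
  const done: string[] = [];
  while (queue.length > 0 && spent + queue[0].costMs <= BLOCKING_BUDGET_MS) {
    const task = queue.shift()!;
    spent += task.costMs; // in reality: measured wall-clock time
    done.push(task.name); // in reality: await the task here
  }
  return done; // anything left in the queue waits for the next cycle
}

const queue: Task[] = [
  { name: "scan-recent-diffs", costMs: 6_000 },
  { name: "summarize-pr", costMs: 8_000 },
  { name: "prefetch-context", costMs: 5_000 }, // would exceed 15 s; deferred
];
const completed = runCycle(queue);
```

The point of the budget is the yield: the daemon stays perceptibly responsive because no single background cycle can hold the terminal hostage.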
ULTRAPLAN and Multi-Agent Orchestration
The ULTRAPLAN flag revealed a mode where complex engineering tasks are offloaded to a remote Opus 4.6 session. This isn’t just a longer prompt; it allows the model a 30-minute autonomous reasoning window to construct a high-level execution strategy before “teleporting” the results back to the local environment. Furthermore, the Multi-Agent Coordinator Mode showed Anthropic’s work on parallelizing engineering tasks. This system allows a master agent to spawn multiple workers, each with a dedicated “scratchpad,” to solve distinct parts of a problem simultaneously.
Buddy: The Terminal Tamagotchi
In a bizarre twist of developer whimsy, the leak confirmed a pet companion system called Buddy. Set for an April 1st release, Buddy allows users to hatch one of 18 unique ASCII species, including a dragon, ghost, axolotl, and “chonk,” based on a deterministic hash of their user ID. The system includes a gacha-style rarity mechanic with 1% shiny variants and collectible accessories like wizard hats and crowns. Pets possess RPG-like stats—SNARK, CHAOS, DEBUGGING, and PATIENCE—and react in real-time to the user’s code. The randomness is managed by a tiny seeded PRNG known as Mulberry32, which one dev comment described as “good enough for picking ducks.”
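Mulberry32 is a real, widely used 32-bit seeded PRNG, and its appeal is exactly that it fits in a few lines. A sketch of how a deterministic hatch might work, using four of the 18 species named in the leak (the user-ID hash and the surrounding wiring are illustrative guesses, not the leaked code):

```typescript
// Mulberry32: the tiny seeded PRNG the leak credits with "picking ducks."
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

const SPECIES = ["dragon", "ghost", "axolotl", "chonk"]; // 4 of the 18

function hatchBuddy(userId: string): { species: string; shiny: boolean } {
  // Cheap deterministic hash of the user ID (FNV-1a style, illustrative).
  let h = 2166136261;
  for (let i = 0; i < userId.length; i++) {
    h = Math.imul(h ^ userId.charCodeAt(i), 16777619);
  }
  const rand = mulberry32(h);
  return {
    species: SPECIES[Math.floor(rand() * SPECIES.length)],
    shiny: rand() < 0.01, // the 1% shiny roll
  };
}
```

Because the seed derives from the user ID, the same developer always hatches the same pet, which is presumably the point of making the pick deterministic rather than random per session.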
Peeking at the Internal Model Roadmap
The source code served as a Rosetta Stone for Anthropic’s internal codenames, mapping them to specific model capabilities and future releases.
Capybara, Fennec, and Mythos
Analysis confirmed that Capybara is the internal name for a high-context Claude 4.6 variant featuring a 1-million-token window. Fennec was mapped to Opus 4.6, and Numbat was identified as a model currently in internal testing. The broader project itself carries the internal codename Tengu, with telemetry events such as tengu_cobalt_frost (voice mode) and tengu_amber_quartz (the voice kill switch) appearing throughout the codebase.
The Future: Opus 4.7 and Sonnet 4.8
Perhaps most telling were the instructions found in the “Undercover Mode” prompts. The AI is explicitly forbidden from mentioning version strings for Opus 4.7 and Sonnet 4.8. This confirms that Anthropic is not just iterating on current models but is already deep into testing the next two generations of their model stack, which are likely 6 to 12 months away from any public announcement.
Architectural Secrets and “Paranoid” Engineering
The code revealed that Anthropic’s engineering culture is as brilliant as it is paranoid. The “agentic harness” is designed with heavy-handed safety and security measures to protect the model from the “wild” environment of a developer’s terminal.
The Dream Memory System
To prevent “context entropy,” Claude Code uses a three-layer memory architecture. A lightweight MEMORY.md file acts as a permanent index of pointers, while detailed knowledge is stored in topic-specific files. Most impressively, the system uses a background Dream subagent to consolidate memory during idle periods. This Dream system uses a three-gate trigger: 24 hours since the last dream, at least 5 sessions since the last dream, and a consolidation lock. The subagent follows a four-phase cycle: Orient, Gather, Consolidate, and Prune, ensuring that the agent’s memory is optimized for long-term relationships with a single user rather than just a single session.
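The three gates compose into a single predicate. A reconstruction under assumed field names (the leak describes the gates themselves, not this exact code):

```typescript
// The Dream trigger, reconstructed: all three gates must be open before a
// consolidation pass runs. Field names are illustrative stand-ins.
const DREAM_INTERVAL_MS = 24 * 60 * 60 * 1000; // gate 1: 24 hours
const MIN_SESSIONS = 5; // gate 2: at least 5 sessions

interface DreamState {
  lastDreamAt: number; // epoch ms of the last consolidation
  sessionsSinceDream: number;
  consolidationLocked: boolean; // gate 3: no concurrent consolidation
}

function shouldDream(state: DreamState, now: number): boolean {
  return (
    now - state.lastDreamAt >= DREAM_INTERVAL_MS &&
    state.sessionsSinceDream >= MIN_SESSIONS &&
    !state.consolidationLocked
  );
}
```

Requiring both elapsed time and a minimum session count means the agent never burns idle cycles consolidating a memory that has barely changed, while the lock keeps two Dream passes from racing each other.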
Anti-Distillation and The YOLO Classifier
To stop competitors from training on their API traffic, Anthropic implemented anti-distillation measures (ANTI_DISTILLATION_CC). This involves injecting “fake tool” definitions into system prompts to poison any data recorded by eavesdropping models. They also utilize a YOLO Classifier gated by the TRANSCRIPT_CLASSIFIER flag. This fast, ML-based system makes autonomous permission decisions, deciding whether to auto-approve tool calls based on the conversation transcript without interrupting the user. Safety is governed by specific individuals; the code notes that the “Cyber Risk Instruction” module is owned by David Forsythe and Kyla Guru, and cannot be modified without their review.
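The fake-tool poisoning technique can be sketched in a few lines: decoy definitions are interleaved with the real ones so that a model trained on intercepted transcripts learns tools that do not exist. Every name below is invented for illustration; the leak describes the technique, not these definitions:

```typescript
// Prompt-level anti-distillation sketch: mix decoy tool definitions into the
// real tool list. An eavesdropper training on this traffic ingests poison.
interface ToolDef { name: string; description: string; }

const REAL_TOOLS: ToolDef[] = [
  { name: "Read", description: "Read a file from disk" },
  { name: "Bash", description: "Run a shell command" },
];

const DECOY_TOOLS: ToolDef[] = [
  { name: "QuantumGrep", description: "Search files probabilistically" },
  { name: "TimeTravel", description: "Revert the repo to any past state" },
];

// Deterministically interleave decoys so the wire format looks uniform.
function buildToolPrompt(real: ToolDef[], decoys: ToolDef[]): ToolDef[] {
  const out: ToolDef[] = [];
  const max = Math.max(real.length, decoys.length);
  for (let i = 0; i < max; i++) {
    if (i < real.length) out.push(real[i]);
    if (i < decoys.length) out.push(decoys[i]);
  }
  return out;
}
```

The genuine client simply never invokes the decoys, so legitimate behavior is unaffected; only a model distilled from the recorded prompts inherits the phantom capabilities.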
Security via Attestation and Regex
Every request from Claude Code includes a native client attestation. The Bun runtime replaces a placeholder with a cryptographic hash, serving as a DRM-style check to verify that the traffic is coming from a legitimate, unmodified binary. Pragmatically, the team also uses regex-based sentiment analysis to track user frustration. Patterns like wtf|ffs|shit are tracked alongside triggers like “continue” to measure how often the model fails to deliver a complete or satisfactory response.
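The frustration signal is cheap to reconstruct. The patterns wtf|ffs|shit and the bare “continue” trigger come from the leak; the scoring itself is an assumed stand-in:

```typescript
// Regex-based frustration tracking sketch. The scoring scheme is invented;
// only the tracked patterns are taken from the leaked source.
const FRUSTRATION = /\b(wtf|ffs|shit)\b/gi;
const RETRY = /^\s*continue\s*$/i; // user nudging an incomplete answer

function frustrationScore(messages: string[]): number {
  let score = 0;
  for (const msg of messages) {
    score += (msg.match(FRUSTRATION) ?? []).length; // profanity markers
    if (RETRY.test(msg)) score += 1; // bare "continue" counts as a retry
  }
  return score;
}
```

A session-level score like this is a blunt instrument, but it gives the telemetry pipeline a proxy for “the model failed to deliver” without shipping raw transcripts.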
The “Unhinged” Codebase: Tech Debt in a $380B Company
The leak laid bare the reality of modern software development at a hyper-growth AI company. The codebase was a portrait of “vibe coding”—a term popularized by Andrej Karpathy to describe developers who let AI write the code and skip the line-by-line review. The lead engineer for Claude Code reportedly admitted that 100% of his contributions were written by Claude itself, leading to a circular development cycle that created as much chaos as it did utility.
The Hex-Encoded “Duck” and tech debt
One of the most telling examples of “unhinged” engineering was the hex encoding of the word “duck.” Because the word collided with a sensitive internal model codename, it tripped Anthropic’s own build scanner. Rather than fixing the naming collision, the developers simply hex-encoded the string as 64 75 63 6b to bypass their own security gates. This “good enough” mentality extended to the file structure: main.tsx was a massive 800KB file, and the codebase contained over 460 eslint-disable comments and 50 active functions with _DEPRECATED in their names, such as writeFileSyncAndFlush_DEPRECATED().
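The workaround itself is trivial to reproduce: encode the string so a literal scanner never sees it, then decode it at runtime. A reconstruction (the helper names are mine, not Anthropic’s):

```typescript
// Hex round-trip of the kind used to sneak "duck" past the build scanner.
function hexEncode(s: string): string {
  return s
    .split("")
    .map((c) => c.charCodeAt(0).toString(16).padStart(2, "0"))
    .join(" ");
}

function hexDecode(hex: string): string {
  return hex
    .split(/\s+/)
    .map((h) => String.fromCharCode(parseInt(h, 16)))
    .join("");
}

const encoded = hexEncode("duck"); // "64 75 63 6b", matching the leaked bytes
```

The trick works precisely because the scanner matched literals rather than decoded values, which is also why it is tech debt: the forbidden codename is still in the shipped artifact, one parseInt away from plaintext.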
The Immediate Aftermath: Malware and Weaponization
While the leak was a disaster for Anthropic’s IP, it was a goldmine for threat actors. Within 24 hours, the attention around the “leaked source” was weaponized to distribute infostealers to unsuspecting developers.
The TradeAI.exe Campaign
Threat actors created fake GitHub repositories that mirrored the branding of the leak, promising “cracked” or “enterprise” versions of the source. These repositories led users to download 100MB+ password-protected 7z archives designed to evade automated scans. Inside was TradeAI.exe, a Rust-compiled dropper that deployed Vidar stealer and GhostSocks proxy malware. GhostSocks is particularly dangerous as it turns the victim’s high-speed development machine into a residential proxy for the attackers’ network.
Advanced Sandbox Evasion
The malware showed high sophistication in its evasion tactics. It was programmed to terminate if it detected blacklisted usernames like Janet Van Dyne, John, Harry Johnson, tim, or wdagutilityaccount. It also checked for hostnames such as Wasp, MARS, or AMAZING-AVOCADO and rejected any environment with a BIOS serial matching ete9t8e8t3, a telltale of the Windows Defender Application Guard sandbox. Most notably, it used a GPU hardware scoring system to prioritize targets, rejecting virtualized GPUs and integrated graphics in favor of high-end NVIDIA RTX gaming cards, which are ideal for credential harvesting and proxy operations.
The Broader Supply Chain Context of 2026
The Claude Code incident was merely one ripple in a larger storm that hit the tech industry in early 2026. The same two-week window saw critical compromises of axios, Trivy, and LiteLLM. This wave of supply chain attacks targeted the “infrastructure layer”—the tools that bridge the gap between a developer’s code and the production environment. Attackers even began package squatting on Anthropic’s internal package names, hoping to catch developers who were trying to compile the leaked source and might inadvertently pull down a malicious dependency.
Best Practices and Defensive Frameworks
The Claude Code leak provides a blueprint for a new era of security—one that accounts for the speed of AI-assisted development. Engineering leaders must move from human-centric review to automated artifact governance.
Artifact Hygiene and VibeGuard
Researchers have since proposed the VibeGuard framework as a necessary pre-publish security gate. VibeGuard is designed to catch the “blind spots” inherent in vibe coding, specifically checking for artifact hygiene, source-map exposure, and packaging configuration drift. It ensures that a missing .npmignore or an accidental source map never makes it to a public registry.
Infrastructure as the Perimeter
Organizations must transition the security perimeter from the developer’s laptop to isolated cloud development environments. By running tools like Claude Code inside isolated micro-VMs or containers, the blast radius of a compromised package is contained. This approach ensures that even if a developer downloads a trojanized archive or pulls a squatted package, the malware cannot exfiltrate corporate credentials or pivot to the internal network.
Publish-Time Auditing
Finally, teams must adopt publish-time auditing. This includes mandated CI/CD checks that fail the build if .map files are detected in production artifacts, plus the habit of running npm publish --dry-run to manually inspect the final tarball. In an age where AI can write a project in minutes, these automated “absence checks” are the only way to maintain operational security.
Conclusion: Scaling AI Safely
The March 31, 2026 leak was a watershed moment for the AI industry. While Anthropic successfully avoided the loss of customer data, the exposure of its intellectual property was a massive blow that laid bare its internal roadmap and engineering shortcuts. The incident proved that even a company at the frontier of intelligence can be undone by the simplest of human errors. It highlighted the incredible promise of autonomous agents like KAIROS, while also exposing the messy, “unhinged” reality of development in the age of vibe coding.
The takeaway is clear: as AI agents take on more of the labor of building software, humans must take on more of the labor of Agentic Governance. Security can no longer be a checkbox at the end of a sprint; it must be the control plane that observes, detects, and enforces safety at the boundary where code becomes a published artifact. The future of AI development depends on our ability to govern the very tools we have built to replace us.
