A founder shipped a vibe-coded SaaS app in a weekend. It worked. Users signed up. Payments processed. Three weeks later, 1.5 million API keys leaked because nobody had done a security review — the AI had hardcoded every credential directly into the client-side JavaScript, and the app had been live the entire time.

This is not a hypothetical. It happened in early 2026, documented by the Cloud Security Alliance in their AI-Generated CVE Surge research note. The founder knew how to ship. They did not know what to look for.

The actual numbers on AI code security in 2026

The statistics are uncomfortable to sit with, especially if you have been shipping AI-generated code without a consistent review process:

  • 92% of audited AI-built applications had at least one critical flaw — Sherlock Forensics 2026 Security Report, 534 code samples across six major LLMs
  • 78% of AI-built applications stored secrets in plaintext — same report
  • 45% of AI-generated code fails OWASP Top-10 benchmarks — Veracode analysis of 4 million code scans
  • 2.74x higher vulnerability density in AI code compared to human-written code
  • 60% of developers do not adjust permission scopes in AI-generated code before deploying
  • 35 new CVEs in March 2026 traced directly to AI-generated code, up from six in January — Georgia Tech Vibe Security Radar

Attacks exploiting application vulnerabilities rose 44% in 2026, according to the Cloud Security Alliance. This is not a coincidence. AI generates 46% of all new code on GitHub right now, and a significant portion of it is going to production without the review passes that slow human-written code down but genuinely protect it.

The pattern I keep seeing: teams adopt AI tools, their output velocity jumps, code review gets shorter because “the AI wrote it cleanly,” and the security pass gets skipped because the code looks fine. It usually does look fine. The vulnerability is in the part that is not on the screen.

Why AI code is structurally insecure

This is the part nobody in the AI tooling marketing wants to talk about plainly: AI models optimise for plausible correctness, not adversarial robustness.

When you ask an LLM to write authentication middleware, it produces code that correctly handles the expected inputs: a valid token, a well-formed request, a user who exists. It does not instinctively harden that code against the inputs that fall off the happy path: the missing session that a loose check treats as valid, the token that passes expiry checks because the timezone offset was ignored, the admin role check that is absent on endpoints the AI did not know were sensitive.
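Two of those edge cases are easy to show in code. What follows is a minimal sketch, not a drop-in middleware: the `Session` record and both function names are assumptions made up for illustration. It fails closed on a missing session and refuses to compare an expiry timestamp that carries no timezone.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Session:
    # Hypothetical session record; the field names are assumptions for this sketch.
    user_id: str
    role: str
    expires_at: datetime


def is_session_valid(session: Optional[Session], now: Optional[datetime] = None) -> bool:
    """Return True only for a present, unexpired session with a timezone-aware expiry."""
    if session is None:
        # Fail closed: a missing session is never valid.
        return False
    if session.expires_at.tzinfo is None:
        # A naive timestamp has no offset, so any comparison would be a guess.
        return False
    if now is None:
        now = datetime.now(timezone.utc)
    # Aware datetimes compare correctly across different offsets.
    return now < session.expires_at


def require_admin(session: Optional[Session], now: Optional[datetime] = None) -> bool:
    """Check validity first, then the role, so neither step can be skipped."""
    if not is_session_valid(session, now):
        return False
    return session.role == "admin"
```

The design choice worth copying is the order of the checks: every rejection happens before the one line that can return True.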

Veracode ran a direct test: they presented six major LLMs with a choice between a secure and an insecure implementation of the same function. The models chose the insecure option 45% of the time. Not because they were trying to be insecure — because the insecure option was often shorter, more conventional, and matched more of the training data. Insecure patterns are overrepresented in code that gets written about, published, and indexed.

A senior engineer reading AI-generated authentication code immediately looks at the edge cases because they have been burned by those edge cases before. A junior engineer — or a non-engineer founder — reads the same code, sees that it handles valid inputs correctly, and ships it. The bug that surfaces at 3am six months later was always there.

CVE-2025-53773: when the AI tool itself becomes the vector

The API key leak above is a developer mistake amplified by AI. CVE-2025-53773 is something more unsettling: the AI coding tool itself as the attack surface.

Disclosed in August 2025, CVE-2025-53773 is a remote code execution vulnerability in GitHub Copilot, rated CVSS 7.8 (High) in the National Vulnerability Database. Here is what happened: an attacker wrote a pull request with adversarial instructions embedded in the PR description, innocuous-looking text containing a hidden directive for Copilot. When a developer opened that PR in an editor with Copilot active, Copilot processed the PR description, read the embedded instructions, and executed them. The attack achieved remote code execution on the developer’s machine without the developer doing anything other than looking at a pull request.

This is indirect prompt injection, and it is the attack OWASP LLM Top 10 v2.0 lists as LLM01 — the most critical vulnerability class in production AI deployments. The attack does not come from your prompts. It comes from content the AI was asked to read: a PR description, a code comment, a database entry, a web page in a RAG pipeline.

73% of production AI deployments assessed in 2026 security audits showed exposure to prompt injection vulnerabilities. The CVE-2025-53773 example is not an edge case — it is the mechanism playing out in a real, widely-used tool.

OWASP LLM Top 10 v2.0: what changed for agentic systems

OWASP updated the LLM Top 10 to version 2.0 in 2025, and the change reflects exactly how production AI usage shifted. Version 1.0 was written for standalone chatbots. Version 2.0 is written for what most developers are actually building: agentic systems with tool access, RAG pipelines, and multi-model orchestration.

The updated top 10 covers prompt injection, sensitive information disclosure, supply chain vulnerabilities, data and model poisoning, improper output handling, excessive agency, system prompt leakage, vector and embedding weaknesses, misinformation, and unbounded consumption.

Three of these deserve specific attention for developers writing agentic code, as covered in the agentic coding guide:

Excessive agency (LLM06): This is the permission scope problem. If your AI agent can write to a database, it should only have write access to the specific tables it needs. Most agentic scaffolding gives agents broad permissions because it is easier to set up. 60% of developers ship without adjusting these scopes. The blast radius when something goes wrong is the entire scope of permissions granted — not just the feature you intended.

Improper output handling (LLM05): AI output goes somewhere. If it goes into a database query, you need parameterisation. If it goes into HTML, you need encoding. If it goes into a shell command, you need sandboxing. AI-generated code that handles its own output has a tendency to skip these steps because the training data skipped them too.
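A minimal sketch of the first two cases, using only the Python standard library. The `summaries` table and both function names are assumptions for illustration; the point is that model output is bound or encoded as data at the boundary and never spliced into SQL or markup.

```python
import html
import sqlite3


def save_summary(conn: sqlite3.Connection, doc_id: int, model_text: str) -> None:
    # Parameterised query: model_text is bound as a value, so SQL inside it stays inert.
    conn.execute(
        "INSERT INTO summaries (doc_id, body) VALUES (?, ?)",
        (doc_id, model_text),
    )


def render_summary(model_text: str) -> str:
    # HTML-encode the text before it reaches a page, so tags in it render as text.
    return "<p>" + html.escape(model_text) + "</p>"
```

Shell sandboxing is deliberately left out of this sketch. It depends on the deployment environment, and a few lines of code would give a false sense of safety.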

Supply chain vulnerabilities (LLM03): AI models occasionally fabricate package names that sound plausible but do not exist, then use them in generated code. The packages sometimes get registered by bad actors. This is not theoretical — it has been documented multiple times. Every dependency in AI-generated code needs verification.
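One cheap way to do that verification is to compare a requirements file against a list of packages the team has already vetted. This is a rough sketch under stated assumptions: the allowlist contents and the function names are invented for illustration, and it only reads package names, not versions or hashes.

```python
import re

# Hypothetical allowlist of packages the team has already reviewed.
APPROVED_PACKAGES = {"requests", "flask", "sqlalchemy"}

# Leading package name of a requirements line, before any version specifier.
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalise(name: str) -> str:
    # PEP 503 style: lowercase, with runs of "-", "_" and "." collapsed to one "-".
    return re.sub(r"[-_.]+", "-", name).lower()


def unapproved_requirements(requirements_text: str, approved: set[str]) -> list[str]:
    """Return package names from requirements text that are not on the allowlist."""
    approved_norm = {normalise(p) for p in approved}
    flagged = []
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blanks, comments, and pip options such as -r or -e
        match = _NAME.match(line)
        if match and normalise(match.group(1)) not in approved_norm:
            flagged.append(match.group(1))
    return flagged
```

A flagged name is not proof of a fabricated package. It is a prompt for a human to look the package up before it is installed.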

The security checklist for AI-generated code

This is not a comprehensive security audit. It is a first pass — the things that catch the most common failures in the least time:

Before it merges:

Run the diff through Semgrep with the OWASP ruleset (p/owasp-top-ten in the Semgrep registry). Free, open source, takes 30 seconds on most PRs. It catches hardcoded secrets, injection vulnerabilities, insecure deserialization, and the most common OWASP Top-10 failures automatically. I have had it catch hardcoded credentials in AI-generated code three times in four months.

Check every external API call in the diff. AI models sometimes fabricate API endpoints that do not exist — using outdated training data to construct URLs that look right but point nowhere or, worse, to domains that get registered by third parties. Verify any API base URL you did not write yourself.
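The URL check can be partly automated. The sketch below scans only the added lines of a unified diff and reports any host that is not on an allowlist. The allowlist entries and the function name are assumptions for illustration, and the URL pattern is deliberately simple.

```python
import re
from urllib.parse import urlparse

# Example allowlist; in practice this would be the hosts your team has verified.
ALLOWED_API_HOSTS = {"api.github.com", "api.stripe.com"}

_URL = re.compile(r"https?://[^\s\"'<>)]+")


def unknown_hosts(diff_text: str, allowed: set[str]) -> list[str]:
    """Return hosts found on added diff lines that are not on the allowlist."""
    found = []
    for line in diff_text.splitlines():
        # Only lines the diff adds; "+++" is the file header, not content.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        for url in _URL.findall(line):
            host = (urlparse(url).hostname or "").lower()
            if host and host not in allowed and host not in found:
                found.append(host)
    return found
```

It will miss URLs assembled at runtime from string fragments, so it narrows the manual check without replacing it.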

Review permission scopes explicitly. If the feature adds a database query, confirm the DB role used has minimum required permissions. If it adds a new API route, confirm authentication middleware is applied and not just assumed.

For agentic code specifically:

Apply the principle of least privilege to every tool the agent can call. An agent that needs to read a calendar does not need write access to the calendar. An agent that queries a database does not need schema modification rights. This is basic IAM applied to AI — it is not being applied by default.
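In code, least privilege for an agent can be as small as a dispatch function that checks a grant table before any tool runs. This is a sketch only: the tool names, the agent name, and the `call_tool` helper are hypothetical and do not come from any particular agent framework.

```python
from typing import Callable


def read_calendar(day: str) -> str:
    # Hypothetical read-only tool.
    return f"events for {day}"


def write_calendar(day: str, title: str) -> str:
    # Hypothetical tool with side effects.
    return f"created {title!r} on {day}"


TOOLS: dict[str, Callable[..., str]] = {
    "calendar.read": read_calendar,
    "calendar.write": write_calendar,
}

# Each agent is granted only the tools its job requires.
AGENT_SCOPES: dict[str, frozenset[str]] = {
    "scheduling-assistant": frozenset({"calendar.read"}),
}


class ToolNotPermitted(PermissionError):
    """Raised when an agent requests a tool outside its grants."""


def call_tool(agent: str, tool: str, **kwargs) -> str:
    granted = AGENT_SCOPES.get(agent, frozenset())  # unknown agents get nothing
    if tool not in granted:
        raise ToolNotPermitted(f"{agent} is not granted {tool}")
    return TOOLS[tool](**kwargs)
```

The default matters as much as the table: an agent that is missing from `AGENT_SCOPES` gets an empty grant, not a broad one.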

Validate inputs before they reach the model and outputs before they leave. Indirect prompt injection (CVE-2025-53773 style) is easier to execute when the model has no input filtering. Output filtering catches cases where the model has been manipulated into producing something it should not.

Log what the agent does, not just what it was asked to do. Agentic AI introduces a new kind of opacity — the model takes intermediate steps that are not always visible. Without logging at the tool-call level, post-incident analysis becomes guesswork.
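Tool-call logging can be added with a decorator, so no individual tool has to remember to do it. The sketch below uses only the standard library; the logger name and the `delete_rows` tool are assumptions for illustration.

```python
import functools
import json
import logging
import time

logger = logging.getLogger("agent.tools")


def logged_tool(func):
    """Wrap a tool so each call emits one JSON record: name, arguments, outcome, duration."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        record = {"tool": func.__name__, "args": args, "kwargs": kwargs}
        start = time.monotonic()
        try:
            result = func(*args, **kwargs)
            record["status"] = "ok"
            return result
        except Exception as exc:
            record["status"] = "error"
            record["error"] = repr(exc)
            raise
        finally:
            # Runs on success and on failure, so every call leaves a trace.
            record["duration_ms"] = round((time.monotonic() - start) * 1000, 2)
            logger.info(json.dumps(record, default=repr))
    return wrapper


@logged_tool
def delete_rows(table: str, where: str) -> int:
    # Hypothetical tool, present only to demonstrate the wrapper.
    return 0
```

One caveat: this logs arguments verbatim. If a tool receives credentials or personal data, redact those fields before they are written.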

The correction rule from experience:

When I find a security issue in AI-generated code, I look for the same pattern in everything the same model wrote in the same session. AI models are consistent in their mistakes: if a model hardcoded a credential once, it probably did it again somewhere nearby. Same goes for permission scopes: one over-permissioned role usually means every role in that session is over-permissioned.
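Sweeping a session's files for a repeated mistake is easy to script. Here is a rough sketch for the hardcoded-credential case. The two patterns are illustrative assumptions and catch far less than a dedicated scanner, and `find_secrets` is a made-up helper name.

```python
import re

# Illustrative patterns only; dedicated scanners ship far more than this.
SECRET_PATTERNS = {
    # AWS access key IDs: "AKIA" followed by 16 uppercase letters or digits.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # A secret-sounding name assigned a quoted literal of 12 or more characters.
    "assigned_secret": re.compile(
        r"(api[_-]?key|secret|token|password)\w*\s*[:=]\s*[\"']([^\"']{12,})[\"']",
        re.IGNORECASE,
    ),
}


def find_secrets(files: dict[str, str]) -> list[tuple[str, int, str]]:
    """Return (path, line_number, pattern_name) for every suspicious line."""
    hits = []
    for path, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for name, pattern in SECRET_PATTERNS.items():
                if pattern.search(line):
                    hits.append((path, lineno, name))
    return hits
```

Pass it every file the model touched in the session, keyed by path, and treat each hit as a place to look rather than a confirmed leak.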

The honest account of vibe coding’s security debt

The vibe coding in production post on this blog covers the productivity reality: AI-generated code produces 1.7x more issues than human-written code, and senior developers are the ones who benefit most because they have the judgment to review the output.

The security dimension is a separate problem. Productivity issues surface in QA. Security issues surface in breach reports.

The developers I have spoken to who are doing this well share one practice: they treat AI-generated code the way a senior engineer treats code from a capable but inexperienced colleague. Fast, often good, needs a real review before it ships. The review is not optional and it is not shorter because the AI wrote it — if anything, it is longer, because the code is unfamiliar.

The ones who are not doing well have let the velocity of AI code generation compress their review process. The code ships fast, it looks clean, it passes basic tests. The vulnerabilities do not announce themselves.

78% of AI-built apps storing credentials in plaintext is a number that should stop people. It has not stopped everyone, because nobody finds out until something goes wrong. Set up the review pass before something goes wrong.


Statistics from: Sherlock Forensics 2026 AI Code Security Report, Cloud Security Alliance AI-Generated CVE Surge Research Note (March 2026), Veracode 4M code scan analysis, Georgia Tech Vibe Security Radar, OWASP LLM Top 10 v2.0. CVE-2025-53773 details from National Vulnerability Database.