AI-Generated Code Contains 2.74x More Vulnerabilities Than Human Code
Veracode tested over 100 AI models and found that AI-generated code is significantly less secure than human-written code: 48% of the AI-generated code was flagged as vulnerable, and AI is now a causal factor in one in five security incidents. Here's what that means in practice, and why "we review it before merge" isn't enough of an answer.

What the numbers say
Veracode tested more than 100 large language models on the security of the code they generate. The result: AI-generated code contains, on average, 2.74x more vulnerabilities than human-written code from the same repositories. 48% of the AI code analysed was classified as insecure.
These aren't lab numbers. This is code flowing into production systems every day.
A separate analysis by AppSec Santa confirms the scale: of 534 code samples, 25.1% contained confirmed vulnerabilities — ranging from 19% to nearly 30% depending on the model. The most common issues: Server-Side Request Forgery (SSRF), debug information leaks, insecure deserialisation, and injection vulnerabilities of all kinds. Nearly a third of all findings fell into the injection category — the same category that's been in the OWASP Top 10 for decades and should be well understood by now.
Aikido Security's 2026 report adds another number worth sitting with: AI-generated code is now a causal factor in one in five security incidents. 69% of developers surveyed have already discovered vulnerabilities introduced by AI tools. In one in five of those cases, there were measurable business consequences.
And all of this against the backdrop that 100% of organisations surveyed have AI-generated code in their codebase — but 81% have no visibility into where or to what extent AI is actually being used.
The problem isn't the model
The first instinct when you hear these numbers is usually: "We'll just use a better model." That doesn't hold up.
The safest model in the AppSec Santa study still had a vulnerability rate of 19%. The least safe had 29%. The difference is real — but even the best model produces insecure code in nearly one out of five cases. Upgrading to a premium model doesn't buy you security. It buys you slightly less risk.
The second problem is more subtle. Developers using AI tools demonstrably review generated code less carefully. Stanford documented this in 2023 — and found that those same developers reported greater confidence in the security of their code than before. More output, less scrutiny, higher trust. That's a dangerous combination.
What vulnerabilities actually show up
The patterns are recognisable. The most common issues in AI-generated code aren't exotic — they're well-known:
Hardcoded credentials and API keys. AI models optimise for code that works. Placing credentials directly in the code is the path of least resistance, and when a prompt is oriented toward speed, the key ends up in the file (see the sketch after this list).
SQL injection via string concatenation. Instead of parameterised queries, AI builds queries through string concatenation — because that's how it appears in a large share of training data. Not malicious. Still dangerous.
Missing input validation. Especially at API endpoints. Generated code tends to do what the prompt describes — but external inputs aren't treated as potential attack vectors.
SSRF and path traversal. Both arise when user-controlled inputs flow into file paths or HTTP requests without validation. These were two of the most common classes in the analysed samples; a validation sketch follows below.
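To make the first two items concrete, here is a minimal Java sketch contrasting the pattern AI assistants tend to emit with the fix a reviewer should insist on. The table, column names, and environment variable are illustrative assumptions, not taken from any of the cited studies.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class UserLookup {

    // Pattern frequently seen in generated code:
    //   String apiKey = "sk-live-abc123";                                 // secret lands in the repo
    //   String sql = "SELECT * FROM users WHERE name = '" + name + "'";   // SQL injection

    // Safer equivalent: the secret comes from the environment,
    // and user input is bound as a parameter instead of concatenated.
    private final String apiKey = System.getenv("SERVICE_API_KEY"); // illustrative variable name

    public ResultSet findUser(Connection conn, String name) throws SQLException {
        PreparedStatement stmt =
                conn.prepareStatement("SELECT id, name FROM users WHERE name = ?");
        stmt.setString(1, name); // the driver binds the value; no string building with user input
        return stmt.executeQuery();
    }
}
```

The shape of the fix is the point: parameters instead of concatenation, configuration instead of literals.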
Java was the language with the highest risk profile in the Veracode study — which matters particularly for enterprise environments.
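Since Java carried the highest risk profile, a second Java sketch shows the kind of validation the SSRF and path-traversal items call for: an allowlist for outbound hosts and path normalisation against a fixed base directory. The host and base directory are assumptions chosen for illustration.

```java
import java.net.URI;
import java.nio.file.Path;
import java.util.Set;

public class InputGuards {

    // SSRF guard: only allow outbound requests to known hosts over HTTPS.
    private static final Set<String> ALLOWED_HOSTS = Set.of("api.example.com"); // illustrative

    public static URI requireAllowedUrl(String userSuppliedUrl) {
        URI uri = URI.create(userSuppliedUrl);
        String host = uri.getHost();
        if (!"https".equals(uri.getScheme()) || host == null || !ALLOWED_HOSTS.contains(host)) {
            throw new IllegalArgumentException("URL host is not on the allowlist");
        }
        return uri;
    }

    // Path traversal guard: resolve against a fixed base directory and
    // reject anything that escapes it (e.g. "../../etc/passwd").
    private static final Path BASE_DIR = Path.of("/var/app/uploads").normalize(); // illustrative

    public static Path requireInsideBase(String userSuppliedName) {
        Path resolved = BASE_DIR.resolve(userSuppliedName).normalize();
        if (!resolved.startsWith(BASE_DIR)) {
            throw new IllegalArgumentException("Path escapes the base directory");
        }
        return resolved;
    }
}
```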
Europe performs better — and there's a reason
One detail from the Aikido data stands out: in the US, 43% of organisations report serious security incidents related to AI-generated code. In Europe, it's 20%.
The most plausible explanation: GDPR, NIS2, and similar regulations have built a stronger baseline discipline around code review and security testing. Organisations that already have processes for data protection impact assessments and security-by-design find it easier to embed AI-generated code into those existing review frameworks.
That's an argument for taking compliance processes seriously rather than treating them as overhead.
What this means for the SDLC — and where Shipwright fits
"We review it" is necessary but not sufficient. Human review doesn't scale with the speed and volume at which AI produces code. And as noted, developer trust in AI output changes the quality of that review.
A few structural measures actually help:
SAST on every commit, not just pull requests. Static analysis needs to be a gate, not an occasional check. And running multiple scanners in parallel finds significantly more: 78% of the vulnerabilities in the AppSec Santa study were flagged by only a single scanner, so any one tool on its own misses a large share of findings.
Keep sensitive areas out of AI-assisted generation. Authentication, cryptography, access control — a single vulnerability in these modules can compromise an entire system. Higher scrutiny here is reasonable.
Create visibility. 81% with no overview of where AI is used in the SDLC is a structural problem. You can't manage risk you can't see.
Activate secrets detection explicitly. Hardcoded credentials are one of the most direct paths to an incident. Most SAST tools have the rules — they're just often not enabled. A sketch of what such a rule matches follows this list.
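To make "the rules" concrete, here is a minimal, illustrative sketch of the kind of pattern a secrets-detection rule matches: an AWS-style access key ID and a generic hard-coded API key assignment. Real scanners ship hundreds of such rules plus entropy checks; this is only meant to show the category, not to replace a tool.

```java
import java.util.List;
import java.util.regex.Pattern;

public class SecretsCheck {

    // Two illustrative rules: AWS access key IDs and generic "api_key = '...'" assignments.
    private static final List<Pattern> SECRET_PATTERNS = List.of(
            Pattern.compile("AKIA[0-9A-Z]{16}"), // AWS access key ID format
            Pattern.compile("(?i)(api[_-]?key|secret|token)\\s*[:=]\\s*['\"][^'\"]{12,}['\"]")
    );

    public static boolean looksLikeSecret(String sourceLine) {
        return SECRET_PATTERNS.stream().anyMatch(p -> p.matcher(sourceLine).find());
    }

    public static void main(String[] args) {
        System.out.println(looksLikeSecret("String apiKey = \"sk-live-0123456789abcdef\";"));       // true
        System.out.println(looksLikeSecret("String apiKey = System.getenv(\"SERVICE_API_KEY\");")); // false
    }
}
```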
This is where I think the design of the development process matters as much as the tooling. I built Shipwright precisely around this tension. It's an open-source AI framework for spec-driven development — a structured 7-phase SDLC where security requirements are defined before code is generated, not retrofitted after. Self-healing CI catches regressions automatically, and compliance checks are built into the process rather than bolted on at the end. The idea is to give AI the guardrails it needs to produce code that's actually production-ready.
The actual question
What concerns me most about this conversation isn't the number 2.74. It's how many organisations are right now adopting AI tools without scaling their security processes to match.
The productivity gains from AI assistants are real — I see them every day. But they come with a companion risk that doesn't manage itself. That's not an argument against AI tools. It's an argument for adapting the SDLC accordingly.
Sources
- Veracode: We Asked 100+ AI Models to Write Code. Here's How Many Failed Security Tests. (2025) — veracode.com
- AppSec Santa: AI Code Security (2026) — appsecsanta.com
- Aikido Security: AI-Generated Code Blamed for 1-in-5 Breaches (2026), via rg-cs.co.uk
- Dark Reading: As Coders Adopt AI Agents, Security Pitfalls Lurk in 2026 — darkreading.com
- Stanford University / GitHub Research: cited via AppSec Santa (2023/2024)
