Eyes Open, Vulnerabilities Shipping: The AI Code Security Paradox

A Checkmarx survey of 2,350 developers and security leaders finds 70% believe AI-generated code is more vulnerable — and nearly a third ship it to production anyway. The tools exist. The process doesn't.

Emeka Okafor

Security Editor · Jun 9, 2026 · 6 min read

The scenario is more uncomfortable than any CVE: developers who know the code has problems, have the tooling to find them, and push to production regardless. That's not a hypothetical failure mode. According to Checkmarx's latest application security survey, it's routine industry practice.

The report, based on responses from 2,350 global developers, CISOs, and AppSec managers — a 54 percent larger sample than the prior year — paints a picture of an industry that has quietly redefined acceptable risk upward, just as AI is accelerating the volume of code being written.

The Numbers Are Hard to Dismiss

Seventy percent of respondents reported seeing "significantly more vulnerabilities" in AI-generated code compared to human-written code. Thirty percent said they knowingly ship vulnerable code to production. Ninety-three percent reported experiencing at least one security breach resulting from a vulnerable application — a figure that, while slightly improved from 98 percent the year prior, is still a near-universal failure rate.

The reasons developers give for shipping anyway are telling: pressure to deploy quickly, vulnerabilities that are too difficult to fix on schedule, and a quiet bet that downstream controls — WAFs, runtime protection, monitoring — will catch what development didn't. Checkmarx's framing is blunt: "Risk is normalized."

On the composition side, AI-generated code now accounts for approximately 49 percent of what goes to production, down slightly from 54 percent in last year's survey. Open source components account for 59 percent of production applications. Those are self-reported estimates, and the true numbers may differ, but the picture is clear: a large share of production code was never written by the engineer responsible for shipping it. Either it came from an LLM or it came from node_modules.

Why AI Code Skews Insecure

The vulnerability gap in AI-generated code isn't arbitrary. LLMs are trained predominantly on public code — code that reflects years of accumulated shortcuts, deprecated patterns, and pre-CVE assumptions baked into Stack Overflow answers and public repositories.

Research from the University of Central Florida and Birzeit University examined how AI-generated code security varied across Java, Python, C, and C++, finding significant variation by language: C code tended to carry the most security issues; Python the fewest. Crucially, the researchers found that LLMs "underutilize modern language and compiler features, often favoring outdated practices over more secure alternatives" — a direct artifact of training data that skews toward older idioms. The researchers themselves acknowledge the study is a "time-stamped view" given how rapidly LLMs are evolving, so the specific rankings may shift, but the structural problem — models encoding historical insecurity — isn't going away without deliberate intervention in training or fine-tuning.

The result is a code generator that confidently produces plausible-looking output in patterns that security engineers have spent years trying to retire.
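To make "older idioms" concrete, here is a minimal Python sketch contrasting a query built by string formatting with a parameterized one. It is an illustration only and is not taken from the study; the table, the function names, and the payload are all hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "admin"), ("bob", "viewer")],
    )
    return conn


def find_user_legacy(conn, name):
    # Older idiom: user input is spliced into the SQL text, so the input
    # can change the structure of the query (SQL injection).
    query = f"SELECT name, role FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_parameterized(conn, name):
    # Safer idiom: the driver binds the value separately from the SQL text,
    # so the input is only ever treated as data.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = make_db()
    payload = "nobody' OR '1'='1"
    print(len(find_user_legacy(conn, payload)))         # every row comes back
    print(len(find_user_parameterized(conn, payload)))  # no rows match
```

Both functions look equally plausible in a diff, which is why the difference is easy to miss under review pressure.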

The 3.4x Risk Multiplier

Checkmarx quantified the dose-response relationship between AI adoption level and vulnerable-code deployment. Organizations where 81–100 percent of code is AI-generated ship vulnerable code at 3.4 times the rate of those with 1–20 percent AI adoption. The report's summary of that correlation, "AI code volume correlates directly with vulnerable code deployment, which correlates directly with breach frequency," is the clearest statement yet that AI-assisted development, as currently practiced, is not security-neutral.

The multiplication factor matters for threat modeling. It's not just that AI code contains more bugs; it's that the deployment cadence AI enables means those bugs reach production faster and at higher density. If you're moving from manual review of 100 PRs a week to AI-assisted throughput of 300, and each carries a higher baseline vulnerability probability, the exposure surface compounds quickly.
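The compounding can be sketched with simple expected-value arithmetic. Only the 100 and 300 PR volumes come from the paragraph above; the per-PR vulnerability probabilities (5 percent and 8 percent) are hypothetical placeholders, not survey figures.

```python
def expected_vulnerable_prs(prs_per_week: int, p_vulnerable: float) -> float:
    """Expected number of merged PRs per week carrying a vulnerability."""
    if not 0.0 <= p_vulnerable <= 1.0:
        raise ValueError("p_vulnerable must be a probability between 0 and 1")
    return prs_per_week * p_vulnerable


# Hypothetical probabilities: substitute your own measured rates.
baseline = expected_vulnerable_prs(100, 0.05)  # manual workflow
assisted = expected_vulnerable_prs(300, 0.08)  # AI-assisted workflow

print(f"baseline: {baseline:.1f} vulnerable PRs/week")
print(f"assisted: {assisted:.1f} vulnerable PRs/week")
print(f"ratio:    {assisted / baseline:.1f}x")
```

Under these assumed inputs, tripling the volume while raising the per-PR probability by a factor of 1.6 multiplies expected weekly exposure by 4.8, more than either factor alone.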

The Process Gap, Not the Tool Gap

Perhaps the most frustrating finding in the Checkmarx data is that the tooling problem is largely solved. Static application security testing (SAST) and AI-assisted remediation, the infrastructure to catch these issues, exist and are widely deployed. Checkmarx's own conclusion: "The tools do the work, but organizations lack in translating this into process."

Veracode has independently reported the same dynamic: AI assistance accelerates development velocity to a point where security review cycles, which haven't scaled equivalently, simply can't keep up.

This is a governance problem dressed up as a tooling problem. Most engineering organizations have SAST in their CI/CD pipelines. What they often lack is a hard gate — a policy that treats a high-severity finding the same way a failing unit test is treated: as a blocker, not a post-ship ticket. When security tools emit warnings and engineers have the discretion to click through them under deadline pressure, the tools don't actually produce security. They produce audit trails.
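A hard gate of this kind can be sketched as a small CI step that reads a scanner's SARIF report and returns a failing exit code. This is a minimal sketch under stated assumptions: the scanner emits SARIF, it maps high-severity findings to the "error" level, and results with no explicit level are treated as "warning". The function names are hypothetical.

```python
import json
import sys

# Assumption: the scanner maps high severity to the SARIF "error" level.
BLOCKING_LEVELS = {"error"}


def blocking_findings(sarif: dict, blocking_levels=BLOCKING_LEVELS) -> list[dict]:
    """Return every SARIF result whose level is in blocking_levels."""
    found = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Assumption: a result without an explicit level counts as "warning".
            if result.get("level", "warning") in blocking_levels:
                found.append(result)
    return found


def gate(sarif: dict) -> int:
    """Exit code for CI: 0 lets the build continue, 1 fails it like a red test."""
    blockers = blocking_findings(sarif)
    for result in blockers:
        message = result.get("message", {}).get("text", "<no message>")
        print(f"BLOCKING {result.get('ruleId', '<unknown rule>')}: {message}")
    return 1 if blockers else 0


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python gate.py report.sarif
    with open(sys.argv[1], encoding="utf-8") as fh:
        sys.exit(gate(json.load(fh)))
```

The point of the design is that the script has no "continue anyway" path: the only way past it is to fix the finding or change the policy in version control, where the change is visible.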

What Defensible Practice Looks Like

Given the data, a few practical adjustments make the threat model materially better:

Treat AI code as untrusted input during review. The same skepticism applied to third-party libraries should apply to LLM output. Don't assume the model applied current security idioms.
Harden the pipeline gate, not just the scanner. Static analysis that can be overridden on deadline is decorative. Integrate severity thresholds as blocking conditions with a documented exception process — not a checkbox.
Audit open source transitive dependencies continuously. With open source components making up 59 percent of production applications, npm audit and pip-audit runs need to be scheduled events, not one-time checks at install time. Malicious packages and newly discovered CVEs don't respect your release schedule.
Track AI code share as a risk metric. If the 3.4x multiplier holds directionally, an org moving from 20 percent to 80 percent AI-generated code should expect a corresponding rise in its vulnerable-code deployment rate, and should plan security resourcing accordingly.
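The "documented exception process" from the second recommendation can be sketched as data the gate consults rather than a button a developer clicks. This is one possible shape, not a prescribed one: the Waiver fields, the finding IDs, and the rule that a waiver needs an approver, a reason, and an unexpired date are all assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Waiver:
    """A hypothetical record of an approved, time-limited exception."""
    finding_id: str
    approved_by: str
    reason: str
    expires: date


def unwaived(finding_ids, waivers, today):
    """Return the finding IDs that still block the release on `today`."""
    active = {
        w.finding_id
        for w in waivers
        # A waiver only counts if someone signed it, gave a reason,
        # and it has not expired.
        if w.approved_by and w.reason and w.expires >= today
    }
    return [f for f in finding_ids if f not in active]


waivers = [
    Waiver("FIND-1", "appsec-lead", "mitigated at the WAF, fix scheduled", date(2026, 7, 1)),
    Waiver("FIND-2", "", "", date(2026, 12, 31)),  # unsigned, so it is ignored
]
findings = ["FIND-1", "FIND-2", "FIND-3"]

print(unwaived(findings, waivers, date(2026, 6, 9)))  # ['FIND-2', 'FIND-3']
print(unwaived(findings, waivers, date(2026, 7, 2)))  # ['FIND-1', 'FIND-2', 'FIND-3']
```

Because every waiver expires, a deferred fix returns as a blocker on its own instead of sitting in a backlog.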

The developers who ship known-vulnerable code aren't acting irrationally given the incentives they face. Delivery pressure is real; vulnerability backlog is overwhelming; and the calculus usually lands on "ship and monitor." The problem is that this calculus is being made millions of times across the industry simultaneously, on a growing base of AI-generated code that starts from a worse security baseline. That's not a sustainable equilibrium — it's a breach pipeline.

Sources & further reading

Devs know AI code is riddled with holes, but ship it anyway — theregister.com

