Bleeding Llama: Responding to CVE-2026-7482 in Ollama Across Your Fleet

May 10, 2026 - Ollama, CVE, AI Security, Detection Engineering, Velociraptor, Vulnerability Management

There’s a good chance Ollama is running somewhere in your environment right now. You might not know about it. The developer who installed it probably didn’t file a ticket. It’s listening on port 11434, bound to all network interfaces, with no authentication — because that’s the default.

That was already a risk. Then Cyera’s research team found CVE-2026-7482, and the risk got a lot more concrete.

This post covers what Ollama is, what the vulnerability does, how to find every instance running across your fleet, and what to do about it. We’ll include Velociraptor hunt queries and Sigma detection rules you can put to work today.

What Is Ollama?

Ollama is the most popular platform for running large language models locally. It lets developers run open-source models — Llama 3, Mistral, Phi, Gemma, and dozens more — entirely on their own hardware. No cloud. No API fees. No data leaving the machine.

That privacy-first pitch is genuinely compelling. It’s also why Ollama has over 170,000 GitHub stars and more than 100 million Docker Hub pulls. It installs as a single binary, runs as a local HTTP server, and works out of the box in minutes.

The problem is “works out of the box.” Ollama’s default configuration:

Binds to all network interfaces (0.0.0.0), not just localhost
Requires no authentication on any API endpoint
Exposes a full REST API for model management, including uploading files and pushing models to external registries

For a developer running it on a laptop for personal experiments, that’s fine. For a developer who installed it on a shared server, or whose laptop is on a corporate network, it’s a different story. And for an organization with no inventory visibility into what AI tooling is running on its endpoints — which describes most organizations right now — it’s a risk you can’t manage because you can’t see it.

This is the shadow AI problem. Ollama is easy to install, powerful, and completely outside most security teams’ radar.

The Vulnerability: Bleeding Llama (CVE-2026-7482)

Full technical details are available in Cyera’s original research. This is a working summary.

CVE-2026-7482 | CVSS 9.1 — Critical | Unauthenticated, Remote

What It Is

The bug lives in Ollama’s model quantization pipeline — the code responsible for processing GGUF files. GGUF is the standard file format for packaging LLM weights. When you load a model into Ollama, it reads tensor data (structured chunks of model parameters) from a GGUF file into memory.

The vulnerability is an out-of-bounds heap read. An attacker crafts a GGUF file that declares a tensor far larger than the data actually present in the file. Ollama trusts that declaration and reads past the end of the legitimate buffer — into whatever happens to be sitting adjacent on the heap.

That adjacent memory can contain anything the process has touched: system prompts, user messages, conversation history, environment variables, API keys, tokens, proprietary code, customer data. Whatever Ollama has been handling ends up in the overread.
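The missing check can be illustrated in a few lines. The sketch below is not Ollama's code (Ollama is written in Go) and it does not parse real GGUF. The function name, its parameters, and the flat offset-and-size model are assumptions chosen only to show the idea: a declared tensor shape has to be compared against the bytes actually present before anything reads it.

```python
from math import prod


def tensor_fits(shape, bytes_per_element, data_offset, buffer_size):
    """Return True only if the declared tensor lies entirely inside the buffer.

    Hypothetical helper for illustration. A real GGUF parser also has to
    deal with block-quantized types and alignment.
    """
    if bytes_per_element <= 0 or data_offset < 0:
        return False
    if not shape or any(dim <= 0 for dim in shape):
        return False
    declared_bytes = prod(shape) * bytes_per_element
    return data_offset + declared_bytes <= buffer_size


# A 4x4 tensor of 4-byte elements needs 64 bytes, which is exactly what is present.
print(tensor_fits([4, 4], 4, data_offset=0, buffer_size=64))        # True
# The same 64 bytes with an inflated shape declaration must be rejected.
print(tensor_fits([4096, 4096], 4, data_offset=0, buffer_size=64))  # False
```

A reader that skips this comparison and trusts the declared shape is the pattern described above: the loop runs for the declared size, not the real one.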

How the Attack Works

The entire exploit requires three unauthenticated API calls:

Step 1 — Upload the crafted GGUF file

POST /api/blobs/sha256:<hash>

The attacker uploads a malformed GGUF file with an inflated tensor shape declaration.

Step 2 — Trigger model creation

POST /api/create

Ollama processes the GGUF file. The out-of-bounds read executes. Heap memory fills the resulting model output file.

Step 3 — Exfiltrate via model push

POST /api/push

The attacker pushes the model — now containing embedded heap data — to an attacker-controlled registry. The exfiltration primitive is built into Ollama itself.

No credentials. No user interaction. Three HTTP requests.
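For defenders, that three-step shape is itself a signal. The following is a minimal sketch, assuming you already have access logs parsed into (timestamp, client IP, method, path) tuples. That record shape, the function name, and the ten-minute default window are all assumptions for illustration, not something taken from Cyera's research.

```python
from collections import defaultdict

# The three API paths named in the attack chain, in order.
CHAIN = ("/api/blobs/", "/api/create", "/api/push")


def find_chain_clients(events, window_seconds=600):
    """Return client IPs that POST to all three endpoints, in order, within the window.

    `events` is an iterable of (timestamp_seconds, client_ip, method, path)
    tuples: a hypothetical shape for already-parsed access log records.
    """
    by_client = defaultdict(list)
    for ts, client, method, path in sorted(events):
        if method.upper() == "POST":
            by_client[client].append((ts, path))

    flagged = []
    for client, hits in by_client.items():
        stage, started = 0, None
        for ts, path in hits:
            if path.startswith(CHAIN[0]):
                # The most recent blob upload (re)starts the chain.
                stage, started = 1, ts
                continue
            in_window = stage and ts - started <= window_seconds
            if in_window and path.startswith(CHAIN[stage]):
                stage += 1
                if stage == len(CHAIN):
                    flagged.append(client)
                    break
    return flagged
```

Legitimate model imports use the same endpoints, so a hit here is a lead to investigate alongside the push destination, not proof of exploitation.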

What an Attacker Can Steal

From AI conversations running on the server:

User prompts and chat messages
System prompts from all running models
Full conversation history across all users of that instance

From the host environment:

Environment variables — including API keys, tokens, and database credentials
Proprietary code submitted to the model for review or generation
Customer data, contracts, or regulated content reviewed via the AI

Who Is Most at Risk

With approximately 300,000 Ollama servers currently exposed on the public internet, the blast radius is wide. Particularly at risk:

Shared internal Ollama instances used as team AI assistants — every employee conversation is potentially exposed
Development environments where Ollama is behind Claude Code, LangChain, AutoGen, or similar agentic tooling — tool outputs, file contents, and code summaries all pass through memory
Regulated industries — healthcare, finance, and legal teams where prompt content includes PHI, PII, or privileged information
Any deployment reachable from the network without an authentication layer

Finding Ollama Across Your Fleet

You cannot patch what you cannot find. Before anything else, you need inventory. Here’s how to hunt for Ollama across your endpoints using Velociraptor — and what to look for if you’re using other tools.

Velociraptor Hunt Queries

Velociraptor is an open-source DFIR and endpoint visibility platform. These VQL queries can be run as fleet-wide hunts from the Velociraptor console. Each one takes seconds to execute across thousands of endpoints.

Hunt 1: Find the Ollama process running on any endpoint

SELECT Pid, Name, Exe, CommandLine, Username, Fqdn
FROM pslist()
WHERE Name =~ "(?i)ollama"

This catches Ollama whether it was installed as a service, launched manually, or running under an unexpected username. The case-insensitive regex handles platform variations.

Hunt 2: Find port 11434 listening on any interface

SELECT Pid, FamilyString, TypeString, Status,
       Laddr.IP AS LocalAddress, Laddr.Port AS LocalPort,
       Raddr.IP AS RemoteAddress, Raddr.Port AS RemotePort, Fqdn
FROM netstat()
WHERE Status = "LISTEN" AND Laddr.Port = 11434

This is your misconfiguration detector. If LocalAddress comes back as 0.0.0.0 or :: (any interface) rather than 127.0.0.1 (localhost only), that instance is network-accessible. Flag it immediately.
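If you export the hunt results and want to triage them in bulk, the address check is easy to script. This is a small sketch using Python's standard ipaddress module. The function name and the three labels are assumptions for illustration.

```python
import ipaddress


def bind_scope(local_address):
    """Classify a listening address as loopback, all-interfaces, or specific-interface."""
    addr = ipaddress.ip_address(local_address)
    if addr.is_loopback:        # 127.0.0.0/8 or ::1
        return "loopback"
    if addr.is_unspecified:     # 0.0.0.0 or ::
        return "all-interfaces"
    return "specific-interface"


for address in ("127.0.0.1", "0.0.0.0", "::", "10.20.30.40"):
    print(address, bind_scope(address))
```

Anything other than loopback is reachable from the network and belongs on the follow-up list.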

Hunt 3: Find the Ollama binary on disk, including installs that aren’t currently running

SELECT FullPath, Size, Mtime, Atime, Fqdn
FROM glob(globs=[
  "C:/Users/*/AppData/Local/Programs/Ollama/ollama.exe",
  "C:/Program Files/Ollama/ollama.exe",
  "/usr/local/bin/ollama",
  "/usr/bin/ollama",
  "/home/*/.ollama/ollama",
  "/opt/homebrew/bin/ollama"
])

A binary on disk that isn’t currently running is still a risk — it may start after a reboot, or a user may launch it manually. This hunt gives you full inventory, not just active processes.

Hunt 4: Check installed version against the patched release

SELECT * FROM foreach(
  row={
    SELECT FullPath FROM glob(globs=[
      "/usr/local/bin/ollama",
      "C:/Users/*/AppData/Local/Programs/Ollama/ollama.exe"])
  },
  query={
    SELECT Fqdn, FullPath, Stdout AS RawVersion,
      parse_string_with_regex(
        string=Stdout,
        regex="(?P<Version>[0-9]+\\.[0-9]+\\.[0-9]+)"
      ).Version AS Version
    FROM execve(argv=[FullPath, "--version"])
  })

Compare the Version field against the vendor-released patched version. Any endpoint returning an older version is unpatched and needs immediate attention.
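Version strings need to be compared numerically, not as text, or 0.1.9 will sort after 0.1.10. Here is a minimal sketch of that comparison for exported hunt results. The function names are hypothetical, and the version numbers in the example are placeholders, not the actual fixed release. Substitute the version from the vendor advisory.

```python
import re


def parse_version(text):
    """Extract the first X.Y.Z version in text as a tuple of ints, or None."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    return tuple(int(part) for part in match.groups()) if match else None


def is_unpatched(raw_version_output, patched_version):
    """Return True if the reported version is older than patched_version."""
    fixed = parse_version(patched_version)
    if fixed is None:
        raise ValueError("patched_version must look like X.Y.Z")
    found = parse_version(raw_version_output)
    # Unparseable output is treated as unpatched so it gets a manual look.
    return found is None or found < fixed


# Placeholder versions only: a plain string comparison would get this wrong.
print(is_unpatched("ollama version is 0.1.9", "0.1.10"))  # True
```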

Other Discovery Approaches

EDR process inventory. Most enterprise EDRs (CrowdStrike Falcon, Microsoft Defender for Endpoint, SentinelOne) support fleet-wide process queries. Search for ollama as a process name or binary path. This gets you coverage without requiring a Velociraptor deployment.

Network flow analysis. Look for internal hosts initiating or receiving connections on port 11434. This surfaces Ollama instances that aren’t in your process inventory — for example, a containerized deployment or a VM that your EDR doesn’t cover. If you see 11434 traffic between internal hosts, that’s worth investigating even before you know whether the endpoint is patched.

Shodan and Censys. Query for port:11434 and cross-reference results against your organization’s registered IP ranges. This tells you which of your internet-facing assets are directly exploitable right now. This should be one of your first moves.
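The cross-reference step is simple to script once you have exported the result IPs. The sketch below assumes a plain list of IP strings and a list of your CIDR ranges. The function name is hypothetical and the addresses shown are reserved documentation ranges, not real findings.

```python
import ipaddress


def exposed_in_scope(candidate_ips, org_cidrs):
    """Return the candidate IPs that fall inside any of the organization's ranges."""
    networks = [ipaddress.ip_network(cidr) for cidr in org_cidrs]
    hits = []
    for ip in candidate_ips:
        addr = ipaddress.ip_address(ip)
        # Membership is simply False when address and network families differ.
        if any(addr in net for net in networks):
            hits.append(ip)
    return hits


results = ["203.0.113.10", "198.51.100.7", "192.0.2.1"]
ours = ["203.0.113.0/24", "192.0.2.0/28"]
print(exposed_in_scope(results, ours))  # ['203.0.113.10', '192.0.2.1']
```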

Package managers. On macOS: brew list | grep ollama. On Linux: snap list, dpkg -l | grep ollama, rpm -qa | grep ollama. On Windows: check Add/Remove Programs or use winget list. These catch installs that aren’t currently running and won’t show up in process-based queries.

Sigma Detection Rules

Two rules. The first catches the active exploit. The second catches the misconfiguration before it becomes an exploit.

Rule 1: Ollama Model Push to External Registry

The highest-fidelity event in the attack chain is Step 3 — the POST /api/push that exfiltrates heap data to an attacker-controlled registry. This rule targets web proxy, WAF, or API gateway logs where Ollama API traffic is observable.

A legitimate model push goes to registry.ollama.ai or an internal registry you control. A push to any other destination is anomalous. That’s the signal.

title: Ollama Model Push to External or Unexpected Registry
id: a3f7c2d1-8b4e-4a9f-b1c5-6d8e0f2a4b7c
status: experimental
description: >
  Detects an Ollama API push request targeting a registry other than the
  official Ollama registry or known internal registries. CVE-2026-7482 uses
  the built-in push endpoint to exfiltrate heap memory contents (including
  prompts, conversation history, and environment variables) to an
  attacker-controlled model registry. No authentication is required.  
references:
  - https://www.cyera.com/blog/bleeding-llama-a-critical-memory-leak-in-the-worlds-most-popular-local-ai-platform
  - https://attack.mitre.org/techniques/T1567/002/
author: Cyber Mixology
date: 2026-05-10
tags:
  - attack.exfiltration
  - attack.t1567.002
  - attack.t1190
  - cve.2026-7482
  - ollama-bleeding-llama-suite
logsource:
  category: proxy
  product: any
detection:
  selection:
    cs-method: "POST"
    cs-uri-stem|endswith: "/api/push"
    c-ip|startswith:
      - "10."
      - "172.16."
      - "172.17."
      - "172.18."
      - "172.19."
      - "172.20."
      - "172.21."
      - "172.22."
      - "172.23."
      - "172.24."
      - "172.25."
      - "172.26."
      - "172.27."
      - "172.28."
      - "172.29."
      - "172.30."
      - "172.31."
      - "192.168."
  filter_legitimate_registry:
    cs-host|endswith:
      - "registry.ollama.ai"
      - ".your-internal-registry.example.com"
  condition: selection and not filter_legitimate_registry
falsepositives:
  - Developers legitimately publishing models to personal or organization-owned
    registries not listed in the filter block. Expand the filter with known
    legitimate registry domains rather than removing the rule.
level: high

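Tuning note: Populate filter_legitimate_registry with every registry domain your organization uses legitimately. The goal is to make any push to an unknown destination high-confidence, not to alert on normal model publishing workflows. Also check which hop your logs capture. A request to /api/push is addressed to the Ollama server, so in logs taken in front of Ollama, cs-host is the Ollama host rather than the destination registry. In that case every internal POST to /api/push will match, and you will need the request body or the Ollama host's egress logs to identify the registry.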
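Before deploying the rule, it can help to replay its logic over a sample of historical proxy records and see what your allowlist would let through. The following is a rough Python mirror of the rule's selection and filter. The function name and the dict-shaped record are assumptions, and the field names are copied from the rule above.

```python
import ipaddress

# The same private ranges the rule lists as c-ip prefixes.
RFC1918 = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def matches_rule(record, allowed_suffixes):
    """Return True if a proxy record would fire the push rule."""
    if record.get("cs-method", "").upper() != "POST":
        return False
    if not record.get("cs-uri-stem", "").endswith("/api/push"):
        return False
    try:
        client = ipaddress.ip_address(record.get("c-ip", ""))
    except ValueError:
        return False
    if not any(client in net for net in RFC1918):
        return False
    host = record.get("cs-host", "").lower()
    # The filter block: an allowlisted registry suppresses the alert.
    return not any(host.endswith(suffix) for suffix in allowed_suffixes)


allowed = ("registry.ollama.ai", ".your-internal-registry.example.com")
sample = {"cs-method": "POST", "cs-uri-stem": "/api/push",
          "c-ip": "10.1.2.3", "cs-host": "models.attacker.example"}
print(matches_rule(sample, allowed))  # True
```

Counting matches per cs-host over a few weeks of logs gives you the list of destinations to either allowlist or investigate.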

Rule 2: Ollama Listening on All Network Interfaces

This is a posture rule — it tells you about misconfiguration, not active exploitation. Ollama bound to 0.0.0.0 is network-accessible. That’s a precondition for remote exploitation of CVE-2026-7482 and any future vulnerability in Ollama’s unauthenticated API.

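This rule targets network connection telemetry: Sysmon Event ID 3 on Windows, or the equivalent on Linux via auditd or eBPF-based sensors. These sources record connections rather than listening sockets, so the rule fires when Ollama accepts an inbound connection on a non-loopback address, which only happens if it is bound beyond localhost. The logsource below is scoped to Windows. To cover Linux hosts, deploy a copy of the rule with product: linux.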

title: Ollama Service Listening on All Network Interfaces
id: b5d9e3f1-2c6a-4b8d-9e0f-1a3c5e7f9b2d
status: experimental
description: >
  Detects the Ollama process establishing a listening socket on all network
  interfaces (0.0.0.0 or ::) rather than localhost only. Ollama's default
  configuration binds to all interfaces with no authentication, making it
  network-accessible to any host that can reach port 11434. This is the
  prerequisite misconfiguration for remote exploitation of CVE-2026-7482 and
  other unauthenticated Ollama API vulnerabilities.  
references:
  - https://www.cyera.com/blog/bleeding-llama-a-critical-memory-leak-in-the-worlds-most-popular-local-ai-platform
  - https://attack.mitre.org/techniques/T1190/
author: Cyber Mixology
date: 2026-05-10
tags:
  - attack.initial_access
  - attack.t1190
  - cve.2026-7482
  - ollama-bleeding-llama-suite
  - posture
logsource:
  category: network_connection
  product: windows
detection:
  selection:
    Image|endswith: '\ollama.exe'
    DestinationPort: 11434
    Initiated: "false"
  selection_linux:
    Image|endswith: '/ollama'
    DestinationPort: 11434
    Initiated: "false"
  filter_localhost:
    DestinationIp:
      - "127.0.0.1"
      - "::1"
  condition: (selection or selection_linux) and not filter_localhost
falsepositives:
  - Intentionally network-accessible Ollama deployments behind an authenticated
    reverse proxy. If this is your architecture, suppress by hostname and ensure
    the proxy authentication is verified separately.
level: medium

Tuning note: This rule will fire on every Ollama instance that isn’t bound to localhost — which is most of them, because that’s the default. Use it as an inventory and compliance signal first. Alert volume tells you how widespread the misconfiguration is in your environment.

Patching and Hardening

Immediate Actions — Within 24 Hours

Patch Ollama. Apply the vendor-released fix. The patch validates tensor element counts against actual buffer sizes before any quantization loop executes, closing the out-of-bounds read entirely. After updating, run ollama --version to confirm the patched version is running.

Block port 11434 at the firewall. If patching isn’t immediately possible, enforce a network-level block on port 11434 for any host you cannot patch in the next 24 hours. This is a stopgap, not a fix — but it closes the remote attack vector while you work through the patch queue.

Treat internet-facing instances as compromised. If any Ollama instance was reachable from the public internet before patching, assume the worst. That means rotating every API key, token, and credential that may have passed through the environment — either in prompts or as environment variables. Check what models were pushed externally using your proxy or WAF logs.
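To scope the rotation, start with an inventory of which secrets the Ollama process could actually see. The sketch below lists environment variable names that look like credentials and never prints their values. The name pattern is a heuristic and the function name is hypothetical. To be meaningful it needs the environment of the Ollama process itself (on Linux, readable from /proc/<pid>/environ with sufficient privileges) rather than your own shell.

```python
import os
import re

# Heuristic only: names that commonly hold credentials.
SECRET_NAME = re.compile(
    r"(KEY|TOKEN|SECRET|PASSWORD|PASSWD|CREDENTIAL)", re.IGNORECASE
)


def rotation_candidates(environ):
    """Return sorted variable names that look like secrets. Values are not returned."""
    return sorted(name for name in environ if SECRET_NAME.search(name))


if __name__ == "__main__":
    # Shown against the current process for demonstration.
    for name in rotation_candidates(os.environ):
        print(name)
```

Secrets pasted into prompts will not appear here, so this covers only the environment-variable half of the exposure.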

Short-Term Hardening — Within One Week

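Bind to localhost only. Set OLLAMA_HOST=127.0.0.1 in Ollama’s environment before starting the service. This prevents network access regardless of firewall rules. It’s a one-line config change and should be your baseline for any instance that doesn’t explicitly need to be network-accessible. On a Linux systemd install, open an override file with the command below, add Environment="OLLAMA_HOST=127.0.0.1" under a [Service] section, then run sudo systemctl daemon-reload and sudo systemctl restart ollama.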

sudo systemctl edit ollama
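After restarting the service, it is worth confirming the change took effect. The check below is a minimal TCP probe, with a hypothetical function name. Run it on the host against 127.0.0.1, where the port should still answer, and from a second machine against the host's network address, where it should no longer connect.

```python
import socket


def port_open(host, port=11434, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("loopback reachable:", port_open("127.0.0.1"))
```

A successful connection only shows the port is reachable. It says nothing about whether the instance is patched, so this complements the version hunt and does not replace it.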

Deploy an authentication proxy. For instances that genuinely need to be network-accessible, put an authenticated reverse proxy in front. NGINX with auth_basic is the low-friction option. OAuth2 Proxy or an API gateway with token-based auth is more robust for team deployments.

Segment and restrict egress. Ollama servers should not have unrestricted outbound internet access. The exploit’s exfiltration step requires pushing a model to an external registry. Egress filtering to known-good registry domains (or a complete block on outbound model pushes) eliminates the exfil primitive even if the read still occurs.

Audit agentic integrations. If Claude Code, LangChain, AutoGen, CrewAI, or any similar framework is routing requests through Ollama, everything those tools have passed through the model — file contents, code, tool outputs — is in scope for exposure. Review what those integrations have been submitting and treat sensitive outputs as potentially disclosed.

Longer-Term — Policy and Visibility

Establish an approved process for AI tooling. Ollama’s shadow footprint exists because there’s no friction to installing it. A developer can go from zero to a running local LLM in five minutes. That’s a feature, not a bug — but it means security teams need a policy that acknowledges AI tooling as a distinct category requiring inventory, review, and approval, rather than treating it like any other developer tool.

Build ongoing detection. The Sigma rules above are a starting point. Add them to your detection suite, tag them with ollama-bleeding-llama-suite, and run the Velociraptor hunts on a scheduled cadence. New Ollama installs will appear in your environment regularly as more developers adopt local AI tooling. Detection should be continuous, not a one-time exercise.

The Bigger Picture: Shadow AI Is the New Shadow IT

A decade ago, security teams learned to worry about shadow SaaS — developers spinning up Dropbox accounts, using personal Gmail for work, running unapproved cloud storage. The response was a combination of policy, DLP tooling, and CASB solutions that could see and control what data was going where.

Shadow AI is the same problem, one generation later. Except instead of a Dropbox account storing files, it’s a local inference server processing every sensitive prompt your team feeds it — with no audit log, no DLP integration, and until last week, a critical unauthenticated memory leak that could hand all of that to anyone who knew to look.

CVE-2026-7482 is a serious vulnerability. Patch it. But the real question it surfaces is broader: do you know what AI tooling is running in your environment? Do you know what data is flowing through it? Do you have any visibility into whether those systems are configured safely?

The Velociraptor hunts and Sigma rules in this post are a starting point for that visibility. Use them as the foundation of an ongoing AI infrastructure inventory — not just a one-time response to this CVE.

The patch closes one hole. The visibility program closes the category.

Cyera’s research team discovered and responsibly disclosed CVE-2026-7482. Read their full technical report for the complete exploit walkthrough and memory analysis. If you’re building detection or response capability around AI infrastructure risk, their research is worth following.