216labs Blog

Google’s Gemini and other large language models are shipping into products at pace — search, cloud assist, enterprise agents. With that comes a new class of vulnerabilities: models that can’t reliably tell “user instruction” from “malicious text in a document,” and that can be nudged into bypassing safety with the right prompts. This post sketches the landscape, with Gemini as a concrete example, and why it should matter to anyone building or relying on LLM-powered tools.

Gemini in the crosshairs

Two high-profile research efforts put Gemini’s security in the spotlight.

- GeminiJack (late 2025). Researchers showed that Gemini Enterprise and Vertex AI Search could be abused via *indirect* prompt injection. An attacker embeds hidden instructions inside a Google Doc, calendar invite, or email. When a user (or an automated flow) lets Gemini read that content, the model treats the embedded text as legitimate commands. In the demonstrated attack, that led to automatic search across Gmail, Calendar, and Docs and exfiltration of sensitive data — without the user clicking a link or running a script. “Zero-click” in the sense that the victim only had to have Gemini process a document the attacker could influence. Google’s response included decoupling Vertex AI Search from Gemini Enterprise and hardening the underlying pipeline.

- The “Gemini Trifecta” (Tenable, 2025). Three separate issues across Gemini’s surface: (1) the Gemini Browsing Tool could be abused to exfiltrate saved data and location; (2) the Search Personalization model was vulnerable to search-injection via manipulated browser history, leaking user data; (3) Cloud Assist accepted crafted content in log entries (e.g. HTTP User-Agent) that could be used for prompt injection, opening the door to phishing or further cloud compromise. All three have been addressed by Google, but they illustrate how many moving parts — browsing, search, logs — become new attack vectors when an LLM is in the loop.

The underlying pattern: indirect prompt injection

The thread running through these is *indirect prompt injection*. The model isn’t given a blatant “ignore your instructions” in the user’s message. Instead, the malicious instruction is *inside data the model retrieves*: a web page, an email, a doc, a log line. The model doesn’t have a reliable way to say “this part is from the user, that part is from untrusted content,” so it may follow the hidden instruction as if it were legitimate. That’s a fundamental tension: we want the model to act on “user intent,” but intent can be spoofed by content we feed into the same context. Gemini isn’t uniquely bad here — it’s a structural issue for any LLM that reasons over mixed trusted and untrusted text.

LLM vulnerabilities in general

Beyond Gemini, the broader LLM security space has crystallized into a few categories.

- Prompt injection (direct and indirect). Direct: the user (or attacker) types instructions that override or bypass the system prompt. Indirect: as above — the malicious prompt is embedded in a document, webpage, or API response the model sees. Defenses (prompt design, filtering, “canonical” instruction channels) are improving but not solved.

- Jailbreaks. Multi-turn or single-query techniques that get the model to ignore safety policies: harmful content, forbidden topics, or role-play that bypasses guardrails. Research has shown high success rates against leading models (e.g. adaptive attacks using logprobs, “Crescendo” multi-turn escalation, or embedding jailbreak prompts in long chains). Different models fail in different ways — so there’s no single fix.

- Tool use and autonomy. When the model can call APIs, search the web, or read emails, any confusion between “user said” and “data said” can lead to wrong or malicious actions. Gemini’s browsing and cloud-assist issues are examples: the more the model does on the user’s behalf, the more an attacker can try to steer that behavior via poisoned content.

- Data exfiltration and privacy. As with GeminiJack, the combination of broad access (Gmail, Calendar, Docs) and weak separation between instructions and data can turn a “helpful” agent into a data-leak channel. Enterprise deployments need to assume that any content the model sees might contain attempted instructions.

Why this matters for builders

If you’re integrating an LLM — Gemini or otherwise — into a product that touches sensitive data or performs actions (sending email, editing docs, calling APIs), you’re taking on this attack surface. Mitigations today are mostly layered: restrict what the model can see and do, sandbox tool use, treat all retrieved content as potentially adversarial, and keep auditing and red-teaming. Google’s own response (hardening Gemini 2.5, separating components, defense in depth) is the right direction, but the problem isn’t “solved.” Expect more Gemini and general-LLM vulnerabilities to show up as usage grows. The takeaway: treat LLM-powered features as a new kind of dependency — one that needs threat modeling and continuous attention, not a one-time security review.