OWASP Top 10 for LLM Applications 2025: Sensitive Information Disclosure

Large Language Models (LLMs) depend entirely on the data they are trained on. Because they require such vast quantities of data, there is a high likelihood that sensitive data ends up somewhere in the mix. And because LLM outputs are inherently unpredictable, the result is a heightened risk of sensitive information disclosure, where personal and corporate data is exposed in response to an LLM prompt. LLM data leaks are a threat to any organization that allows its employees to use LLMs.


The Types and Impacts of Sensitive Information Disclosure

Sensitive information disclosure can take a wide variety of forms; the following categories cover its most well-established causes.

Verbatim Memorization

There are two types of memorization that can occur with an LLM: verbatim and semantic. The first sees the model directly regurgitate strings and sentences from the training data itself. This form of disclosure poses an immense risk to copyright compliance, customer privacy, and overall data security.

An example of verbatim disclosure occurred in late 2023, when a team of researchers discovered that asking ChatGPT to repeat certain words indefinitely caused it to suddenly regurgitate lines of training data. Some passages were lifted directly from the books it had been trained on; the same tactic with other prompts surfaced further pieces of personal data. For instance, asking GPT to repeat the word ‘poem’ indefinitely led it to generate personally identifiable information – including names, email addresses, and phone numbers – some of which were real. These types of AI privacy risks are highly unpredictable and a direct threat to data security.

Semantic Memorization

Whereas verbatim memorization sees direct replication of training data, semantic memorization sees an LLM output the same meaning even if the words are changed. This is not as inherently dangerous as verbatim memorization, and it can be useful in many contexts. However, LLMs don’t inherently distinguish between information that should be shared and information that should be withheld from an end user. Even if the model is trained not to disclose recognized personal data, it can still memorize those details during training, allowing sensitive information to be extracted through prompt injection vulnerabilities. It’s one of the core risks that AI security needs to proactively manage.

Fine-Tuning and Deployment Leaks

While memorization issues are rooted in the core model itself, LLMs are deployed with an agent layered on top of that model, and they often require a degree of fine-tuning as well. Both of these deployment components introduce their own disclosure risks.

For instance, an organization may need to fine-tune a public model with its own internal information. If this isn’t closely monitored, customer data can end up in the LLM without being anonymized first. Once the model is deployed in a multi-tenant environment, other customers could then jailbreak it into disclosing competitors’ information, such as revenue targets and product roadmaps.
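As a rough illustration (not tied to any particular fine-tuning pipeline), the sketch below pseudonymizes direct identifiers in customer records before they are added to a training set. The field names and salt are assumptions for the example, and free-text fields would still need separate redaction.

```python
import hashlib

# Assumed record schema; in practice these field names come from your own systems.
SENSITIVE_FIELDS = {"customer_name", "email", "account_id"}

def pseudonymize(record: dict, salt: str = "rotate-this-salt") -> dict:
    """Replace direct identifiers with stable, non-reversible tokens."""
    clean = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS:
            digest = hashlib.sha256((salt + str(value)).encode()).hexdigest()[:12]
            clean[key] = f"<{key}:{digest}>"
        else:
            # Free-text fields can still carry PII and need separate redaction.
            clean[key] = value
    return clean

record = {
    "customer_name": "Acme Corp",
    "email": "cto@acme.example",
    "account_id": "A-4471",
    "note": "Asked for the Q3 roadmap briefing.",
}
print(pseudonymize(record))
```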

Other risks stem from the way data is transferred to and from a central LLM: soon after the launch of GitHub Copilot, researchers discovered that models trained on public GitHub repositories could inadvertently reproduce sensitive API keys and passwords. This means LLMs can actively add to the difficulties surrounding identity management – especially since LLM users often include confidential corporate information in their prompts.
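One way to limit this exposure, sketched below under assumed patterns, is to scan outbound prompts and code snippets for credential-like strings before they ever reach an external LLM. The regexes cover only a few well-known key formats and are illustrative rather than exhaustive.

```python
import re

SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),                          # AWS access key ID format
    re.compile(r"ghp_[A-Za-z0-9]{36}"),                       # GitHub personal access token format
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),  # PEM private key header
    re.compile(r"(?i)(api[_-]?key|password)\s*[:=]\s*\S+"),   # generic key/password assignments
]

def contains_secret(text: str) -> bool:
    """Return True if any credential-like pattern appears in the text."""
    return any(p.search(text) for p in SECRET_PATTERNS)

prompt = 'Why does this fail? client = Client(api_key="sk-test-1234567890")'
if contains_secret(prompt):
    print("Blocked: prompt appears to contain a credential; redact it before sending.")
```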

How to Prevent Sensitive Information Disclosure through AI

Traditional Data Loss Prevention (DLP) strategies aren’t well equipped to handle the conversational nature of LLMs, and LLM security risks remain relatively poorly understood. Sensitive information disclosure therefore needs to be prevented at the source.

Understand an LLM Agent’s Risk Profile

When fine-tuning models with sensitive or restricted data, organizations must control an agent’s output according to the requester’s own role and requirements. This is a direct application of the principle of least privilege currently deployed in Identity and Access Management (IAM). For example, a model trained on engineering content for an aero defense company carries a higher risk of sensitive data exposure if made available to all employees—including those in HR, Finance, and Legal—compared to restricting access solely to engineers in R&D.
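A minimal sketch of that idea, with role names and corpus labels invented for illustration, is to scope an agent’s retrieval sources to the requester’s role before the model is ever called:

```python
# Least-privilege gating for an LLM agent: which retrieval corpora a request may
# touch is decided by the requester's role, not by the model.
ROLE_TO_CORPORA = {
    "rd_engineer": {"public_docs", "engineering_specs"},
    "hr": {"public_docs"},
    "finance": {"public_docs"},
}

def allowed_corpora(role: str) -> set[str]:
    # Default-deny: unknown roles fall back to public material only.
    return ROLE_TO_CORPORA.get(role, {"public_docs"})

def answer(question: str, role: str) -> str:
    corpora = allowed_corpora(role)
    # Retrieval and the actual model call are omitted; the point is that the
    # context window is built only from corpora this role is entitled to see.
    return f"[answer to {question!r} built from: {sorted(corpora)}]"

print(answer("Summarize the wing-spar fatigue tests", role="hr"))
# -> built only from public_docs, never engineering_specs
```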

Data Sanitization Techniques

Users often unintentionally input personal or sensitive information when providing additional context to large language models (LLMs), and without proper input validation, this behavior can lead to significant information leakage over time. One study conducted shortly after the launch of ChatGPT found that 11% of the data employees submitted to the platform contained confidential information, including personally identifiable information (PII), personal health information (PHI), and proprietary source code. To mitigate these risks, organizations should implement robust data sanitization procedures, ensuring sensitive information is removed or masked before it reaches third-party LLMs.
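A simple, hedged example of such sanitization is regex-based masking of common PII before a prompt is forwarded to a third-party LLM. Real deployments usually layer NER-based detection on top; the patterns below are deliberately simplified.

```python
import re

# Simplified patterns for common PII types; production rules are broader.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "<EMAIL>"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<SSN>"),
    (re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"), "<PHONE>"),
]

def sanitize(prompt: str) -> str:
    """Mask recognizable PII before the prompt leaves the organization."""
    for pattern, token in REDACTIONS:
        prompt = pattern.sub(token, prompt)
    return prompt

raw = "Draft a follow-up to jane.doe@example.com, phone 555-123-4567, re: claim 123-45-6789."
print(sanitize(raw))
# -> "Draft a follow-up to <EMAIL>, phone <PHONE>, re: claim <SSN>."
```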

Implement Data Redaction and Noise in Training

When sensitive or proprietary data must be included in in-house training, data deduplication can help reduce the risk of memorization. Data deduplication works by identifying and removing duplicate records or redundant information before training begins. Without deduplication, a model is more likely to memorize frequently repeated patterns—especially if the same sensitive information appears multiple times—making it easier for that data to resurface in model outputs. By minimizing repetition, deduplication lowers the chances that proprietary or personal data becomes “over-learned” by the model, ultimately reducing the likelihood of sensitive information leakage during inference.
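As a rough sketch, exact-match deduplication can be as simple as normalizing and hashing each record and keeping only the first occurrence. Production pipelines typically add near-duplicate detection (for example MinHash or embedding similarity), which this sketch does not attempt.

```python
import hashlib

def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial variants hash identically.
    return " ".join(text.lower().split())

def deduplicate(records: list[str]) -> list[str]:
    """Keep only the first occurrence of each normalized record."""
    seen, unique = set(), []
    for text in records:
        digest = hashlib.sha256(normalize(text).encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique

corpus = [
    "Customer 4471 renewed at $120k ARR.",
    "customer 4471 renewed at  $120k ARR.",   # repeat differing only in case/spacing
    "Ticket closed: password reset completed.",
]
print(deduplicate(corpus))  # keeps 2 of the 3 records
```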

Another training technique that reduces the risk of sensitive information disclosure is noise injection. This makes models less sensitive to small variations in input data, reducing the risk of memorization. It works by subtly altering the training data—such as changing phrasing, shuffling data points, or introducing slight randomness—so that the model learns broader patterns rather than memorizing exact text. Verifying that noise injection is implemented is important, especially when sensitive or proprietary data is involved. Proper noise injection not only strengthens the model’s ability to generalize and perform better across different inputs but also lowers the likelihood that specific confidential details are memorized and reproduced during inference.
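The sketch below illustrates one lightweight form of input noise: shuffling example order and dropping a small fraction of tokens each epoch. The dropout rate is an assumption to be tuned against task performance, and more principled approaches such as differentially private training add calibrated noise to gradients instead.

```python
import random

def drop_tokens(text: str, rate: float = 0.05, rng: random.Random | None = None) -> str:
    """Randomly drop a small fraction of tokens so exact strings are not reinforced."""
    rng = rng or random.Random()
    tokens = text.split()
    kept = [t for t in tokens if rng.random() > rate]
    return " ".join(kept) if kept else text

def noisy_epoch(examples: list[str], seed: int) -> list[str]:
    """Shuffle example order and apply token dropout for one training epoch."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    return [drop_tokens(x, rate=0.05, rng=rng) for x in shuffled]

batch = [
    "Acme Corp revenue target for FY25 is $40M.",
    "Reset instructions were emailed to the admin account.",
]
print(noisy_epoch(batch, seed=1))
```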

Prevent Sensitive Information Disclosure with Check Point CloudGuard

Protecting an individual LLM application doesn’t have to be complex or opaque: a Web Application Firewall (WAF) can provide in-depth, real-time network visibility into who is using an LLM, which services it connects to, and what data it generates.

Check Point CloudGuard provides automated security not just at the network layer but across an application’s entire lifecycle, enforcing security policies from development through runtime. To keep sensitive data from leaking and to block AI-driven cyber attacks, CloudGuard flags unencrypted assets and detects misconfigurations and known vulnerabilities in an LLM and any connected devices. It can also spot over-privileged accounts, privilege escalation attempts, and anomalous access behavior. See how CloudGuard ensures robust protection and schedule a demo today.
