A frustrating reality for creators today is that their work is exposed to AI training. Tech companies developing generative AI such as ChatGPT, Gemini, DeepSeek, Stable Diffusion, and Midjourney need massive datasets to train their models, and they’ve been scraping the public internet to do so.
This has raised important questions about consent, attribution, and control over creative work once it’s shared online.
If you’re looking for ways to better protect your creative data, this guide outlines practical steps you can take to reduce the chances of your work being used in AI training, while still engaging with the internet on your own terms.
- How AI training on public content can put creative work at risk
- How to keep AI from using your art
- Your art is sensitive data
How AI training on public content can put creative work at risk
Generative AI tools require vast amounts of data to work, and much of that data is sourced from the internet. OpenAI has publicly stated that it would be “impossible” to train AI like ChatGPT without access to copyrighted material found online.
Creative works from online portfolios, social media platforms, and blogs are being used to train these models without consent or attribution. For example, Meta has admitted to scraping publicly shared Facebook and Instagram posts, photos, and comments going back to 2007 to train its generative AI models. That means any creative work you’ve ever shared publicly on those platforms — like wedding photos, portfolio shots, or illustrations — could be used for AI training unless you’ve set visibility to private.
AI companies have argued in ongoing lawsuits that training on scraped internet data falls under “fair use.” At the same time, they treat the resulting models and datasets as proprietary assets. OpenAI’s terms of service prohibit “using Output to develop models that compete with OpenAI,” and the company has accused DeepSeek of “inappropriately” copying its models — the same models trained on publicly available internet data.
This apparent double standard helps explain why many creators feel their work is vulnerable to AI training without their consent, credit, or compensation. It also raises broader questions about how “publicly available” content is interpreted, particularly when creative work is shared on platforms whose licenses explicitly limit unauthorized reuse or commercial exploitation. As a result, many artists, writers, and photographers are increasingly pushing back against AI data scraping.
How to keep AI from using your art
With courts still issuing case-by-case decisions and no clear legal standard in place, creators can’t rely on the legal system alone to protect their work. In the meantime, there are practical steps you can take right now to limit how your work is used in AI training.
None of these strategies are foolproof, though; protection tools and AI companies are constantly trying to outmaneuver one another. For now, think of them as individual pieces of armor that work better together. These are the best ways to give yourself more control over your creative data:
Cloak your art style
Cloaking tools like Glaze make tiny, pixel-level changes to your images that confuse AI models trying to train on them. The image appears as intended to humans, but to an AI scraper, it registers as a different or distorted style.
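To give a feel for the general idea, here is a toy Python sketch that adds a bounded, imperceptible perturbation to every pixel. This is not Glaze’s actual method (Glaze optimizes its perturbations adversarially against the feature extractors used by image generators, and plain random noise won’t reliably fool modern models); the function name and file paths are placeholders:

```python
# Toy illustration of cloaking: perturb pixels so slightly that humans
# see no difference. Real tools like Glaze optimize the perturbation to
# shift the image's style as seen by AI feature extractors; the plain
# random noise used here only demonstrates the mechanics.
import numpy as np
from PIL import Image

def cloak_naive(in_path: str, out_path: str, epsilon: int = 3) -> None:
    """Add a small, bounded random perturbation to each pixel channel.

    epsilon is the maximum change per channel (out of 255), kept tiny
    so the result looks identical to the original.
    """
    img = np.asarray(Image.open(in_path).convert("RGB"), dtype=np.int16)
    noise = np.random.randint(-epsilon, epsilon + 1, size=img.shape)
    cloaked = np.clip(img + noise, 0, 255).astype(np.uint8)
    Image.fromarray(cloaked).save(out_path)

cloak_naive("artwork.png", "artwork_cloaked.png")
```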
“Poison” your artwork
Using tools like Nightshade, you can make your artwork poisonous to AI scrapers. A “poisoned” image contains subtle, invisible changes that interfere with AI training, causing the system to misinterpret what it’s seeing, such as cars instead of cats, or clouds instead of planes. Over time, if enough poisoned images are used for training, those wrong associations can show up in future versions of the model.
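The effect on training is easy to demonstrate with a toy classifier. The hypothetical scikit-learn sketch below is not how Nightshade works (Nightshade hides the wrong associations inside imperceptible pixel changes rather than flipping labels outright), but it shows how accuracy degrades as more training labels are deliberately wrong:

```python
# Toy demonstration of data poisoning: when enough training samples
# carry deliberately wrong labels, the model learns wrong associations
# and its accuracy on clean test data drops.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for poison_rate in (0.0, 0.2, 0.4):
    y_poisoned = y_tr.copy()
    n_flipped = int(poison_rate * len(y_poisoned))
    y_poisoned[:n_flipped] = 1 - y_poisoned[:n_flipped]  # e.g. label cats as cars
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_poisoned)
    print(f"{poison_rate:.0%} poisoned -> test accuracy {model.score(X_te, y_te):.2f}")
```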
However, technical protections like Glaze and Nightshade are not foolproof, and research shows they may be weakened as AI systems evolve.
Opt out of AI training
If your work exists online, chances are it has been scraped into an AI model. Using websites like Have I Been Trained and The Atlantic’s AI Watchdog, you can check whether your images, writing, or other creative work appear in known datasets used to train AI models. The first allows you to submit your work to a Do Not Train registry, where participating companies can identify and exclude those images from future training runs. However, these measures are voluntary, depend on individual companies’ willingness to honor them, and do not affect models that have already been trained using your work.
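If you host your own portfolio site, you can also publish a robots.txt file asking known AI training crawlers to stay away. The user-agent tokens below are ones the respective companies have documented, but, like the Do Not Train registry, robots.txt is a request rather than an enforcement mechanism:

```
# robots.txt at the root of your site
User-agent: GPTBot            # OpenAI's training crawler
Disallow: /

User-agent: Google-Extended   # opt-out token for Google's AI training
Disallow: /

User-agent: CCBot             # Common Crawl, widely used in AI datasets
Disallow: /
```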
If you live in the EU, you can use data protection laws like GDPR to your advantage by requesting that companies exclude your content from AI training. Some companies have opt-out processes buried in the settings of their apps; for example, here’s how to opt out of Meta AI data use on Facebook, Instagram, and WhatsApp.
Lock down your privacy settings
Reduce what you post publicly on social media and make sure your profile is set to private. The less content that’s openly accessible, the harder it is for external AI systems to scrape. But this may not be enough to protect you from the platform itself: many companies are integrating AI features into their own products (Meta, for example, uses all Meta AI interactions for training and ads), raising questions about how both public and private content may be used over time. It’s best to avoid using social media as your primary archive or portfolio.
Be intentional about public sharing
When posting publicly to reach your audience, share smaller, lower-resolution, or watermarked versions of your work. Keep full-quality files stored in offline backups or cloud services that clearly do not use private content for AI training.
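As a rough example, here is one way to produce a web-ready copy with Python and Pillow; the file names, size cap, and watermark text are all placeholders you’d adjust to your own workflow:

```python
# Minimal sketch: prepare a lower-resolution, watermarked copy for
# public posting while the full-quality original stays in private
# storage. File names and watermark text are placeholders.
from PIL import Image, ImageDraw

def make_public_copy(in_path: str, out_path: str, credit: str,
                     max_side: int = 1200) -> None:
    img = Image.open(in_path).convert("RGB")
    img.thumbnail((max_side, max_side))   # cap the longest side, keep aspect ratio
    draw = ImageDraw.Draw(img)
    # Simple visible watermark in the bottom-left corner.
    draw.text((10, img.height - 24), credit, fill=(255, 255, 255))
    img.save(out_path, quality=85)        # lighter JPEG copy for the web

make_public_copy("original.tif", "web_copy.jpg", "© Jane Doe 2025")
```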
Safely store and share files
As AI tools become more deeply integrated across major platforms — such as Google adding Gemini across its products, including Google Drive and Gmail — people are increasingly cautious about how they store and share their work.
Proton Drive provides end-to-end encrypted storage and sharing for your photos, videos, albums, documents, spreadsheets, and other files. We never collect or process your data, share it with third parties, or use it for AI training. Unlike Big Tech, Proton is fully supported by our community of paying subscribers rather than by advertising or monetizing your data.
You can share password-protected links, set expiration dates, grant access only to specific people via email, and revoke access at any time. You can also securely collect files from people who don’t have a Proton Account.
Use private AI without giving up control
If you want the benefits of AI without giving up control over your work, and without worrying that a future policy change could suddenly turn your files into training data, use our private AI assistant. Lumo never trains on your files or conversations, and it’s based on open-source code, which means anyone can verify our claims.
Lumo integrates with Proton Drive, allowing you to safely work with your files and generate images, without contributing to the AI scraping ecosystem that so many individuals and organizations are actively pushing back against.
Your art is sensitive data
Stopping AI art theft doesn’t mean rejecting AI altogether. But it does mean recognizing that creative work is sensitive data, whether it’s an illustration, a novel, or a song. Creators deserve agency and fair treatment, including the ability to decide how and whether their work is used.
No single strategy can fully prevent AI systems from absorbing publicly available content — and in some cases, indirectly exposed private content — into their training. And AI companies would have you believe there’s no way to build AI tools without using your data. We disagree.
Until regulators and courts provide clearer guidance, the most effective approach is to be proactive about how you interact with the internet and to choose platforms that clearly respect your privacy and creative rights.
