A common misconception is that if you don’t use generative AI tools like ChatGPT for chat, Midjourney for generating images, or Sora for generating videos, you’re safe from artificial intelligence (AI). Whether you use these tools or not, AI already touches your life.

For example, when you take photos on Android, Google Photos automatically scans and labels every person in your photo library using facial recognition. Posting on social media without changing your default settings creates public posts that can be scraped into the massive datasets used to train large language models (LLMs). Even personalized ads on websites or in apps are powered by AI trained on your browsing and shopping preferences.

That convenience comes at the cost of your privacy and can have a serious impact on your life. Twenty photos taken from your child’s social feed are enough to build a 30-second deepfake video that can be used for blackmail, bullying, or identity theft.

Here’s what’s at stake when AI has access to your personal data and what you can do to protect your privacy online:

How does AI put your privacy at risk?

AI can erode privacy by collecting too much, inferring too much, and sharing too much.

Your personal data may be collected

AI systems become more accurate by training on massive amounts of data, often scraped from publicly available sources, such as your Facebook posts, Flickr photos, or Reddit threads. Casual social media posts, family photos, and profile details — often containing sensitive information and originally shared for personal or social reasons — have been included in datasets used to train billion-dollar LLMs and facial recognition systems. This happens because Big Tech treats online content as freely available for AI use, without explicit consent or regard for intellectual property.

You may be re-identified

Tech companies claim your personal data can’t be traced back to you once it has been de-identified or pseudonymized, meaning obvious identifiers like names or phone numbers are stripped away. But this protection is fragile, as anonymized datasets can be re-identified by cross-referencing them with other data sources, such as social media profiles or geolocation trails.

For instance, Netflix users have been re-identified by comparing their anonymous movie ratings with IMDb information. One study shows that almost every American can be singled out in any dataset with only 15 demographic markers. With AI’s pattern-matching power added to the mix, re-identification has become faster, easier, and accessible to anyone.
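To make the mechanism concrete, here is a minimal, hypothetical sketch in Python of how linkage re-identification works: an “anonymized” dataset and a public one are joined on shared quasi-identifiers such as ZIP code, birth date, and gender. The data and column names are invented purely for illustration.

```python
import pandas as pd

# Hypothetical "anonymized" records: names removed, quasi-identifiers kept.
anonymized = pd.DataFrame({
    "zip": ["10001", "94107"],
    "birth_date": ["1985-03-02", "1990-07-19"],
    "gender": ["F", "M"],
    "sensitive_note": ["fertility clinic visit", "antidepressant refill"],
})

# Hypothetical public dataset (e.g. a scraped social profile or voter roll).
public = pd.DataFrame({
    "name": ["Alice Example", "Bob Example"],
    "zip": ["10001", "94107"],
    "birth_date": ["1985-03-02", "1990-07-19"],
    "gender": ["F", "M"],
})

# A plain join on the shared attributes re-attaches names to "anonymous" rows.
re_identified = anonymized.merge(public, on=["zip", "birth_date", "gender"])
print(re_identified[["name", "sensitive_note"]])
```

Real attacks scale the same idea to millions of rows and fuzzier matches, which is where AI’s pattern matching makes the linkage faster and more tolerant of messy data.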

Data deletion requests may not work

Once your data trains an AI model, taking it back is almost impossible because it shapes the model’s overall behavior. Machine unlearning — techniques to make a model forget — is still in its early stages, so the only option today would be to retrain the model. And even if a company claims to have honored your data deletion request, there’s practically no way to confirm it.

Other people may see your private chats

LLMs such as ChatGPT, Meta AI, and Grok have exposed private conversations through their share features, with chats being indexed by search engines and made publicly discoverable. The platforms weren’t transparent enough about this risk, leaving users unaware that what felt like a private exchange could end up visible to anyone on the internet.

You may be treated unfairly

If the data used by AI systems to learn patterns contains hidden biases — such as from historical inequalities or incomplete datasets — the AI can reinforce or amplify those patterns. The stakes are higher with Big Tech’s non-private AI systems, which are closed-source and operate as black boxes that can’t be independently reviewed. These systems may use sensitive attributes like race, gender, or ZIP code to make automated decisions in predictive policing, hiring, healthcare, or credit scoring.

Ad targeting is getting sharper

While non-private AI makes ads smarter by enabling hyper-targeting, it often invades the privacy of your whole family. For example, Publicis, the world’s largest advertising company, claims to profile 2.3 billion people and track details like family preferences and income to decide whether to target them with budget or premium products.

With AI chatbots replacing traditional search, ads are following us into this new space. For instance, Perplexity is embedding ads in AI-generated responses and has placed a $34.5 billion bid to buy Google Chrome — a move aimed at gaining access to the browser’s 3+ billion users and the intimate behavioral data that comes with it.

Cloud storage may expose your data

Cloud storage providers without end-to-end encryption (E2EE) can access the photos, documents, and sensitive files you upload. They may also use that data to power AI tools, generate insights about you, or show personalized ads.

Google Drive, for example, retains access to your data and uses it for AI features like spell check and autocomplete in Google Docs. And with Gemini, Google’s AI assistant, tightly integrated into Google Workspace, queries you make about your Drive files could also feed into AI training.

Similarly, Microsoft has announced that Word, Excel, and PowerPoint will soon autosave to OneDrive by default, another non-E2EE service where the future use of your data for advertising or AI training remains uncertain.
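The difference end-to-end encryption makes can be shown with a minimal sketch, assuming Python and the third-party cryptography package: when a file is encrypted on your device before upload, the provider only ever stores ciphertext it cannot read, scan, or feed into AI training. The filenames are placeholders, and E2EE services handle this step automatically; the snippet only illustrates the principle.

```python
# pip install cryptography
from cryptography.fernet import Fernet

# The key is generated and kept on your device; it is never sent to the provider.
key = Fernet.generate_key()
cipher = Fernet(key)

# Encrypt the file locally, before it ever leaves your machine.
with open("tax_return.pdf", "rb") as f:  # placeholder filename
    ciphertext = cipher.encrypt(f.read())

# Only this opaque blob would be uploaded to the cloud.
with open("tax_return.pdf.enc", "wb") as f:
    f.write(ciphertext)
```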

AI can make mistakes

Automated systems could scan your private communications and flag them as suspicious. The EU’s proposed Chat Control law would require messaging services like WhatsApp and Signal to use AI to scan every private message and photo to detect child sexual abuse material (CSAM).

But this means monitoring everyone’s conversations, not just those of suspected criminals. And history shows how easily AI can make mistakes. For example, a father’s Google account was terminated and he was reported to authorities after he sent a photo of his child to a doctor. What should stay between you and your doctor, or you and your family, could suddenly be exposed to tech companies and law enforcement.

Anyone can make deepfakes

AI can be used to create deepfakes — highly realistic fake photos, videos, or audio. For example, someone could take your social media photos and create a video of you saying or doing things you never did.

Bad actors exploit deepfakes for identity theft, fraud, blackmail, or reputational damage. In 2019, criminals used deepfake audio to mimic a CEO’s voice and tricked an employee into transferring €220,000. The risks extend to children, too: in one incident, a predator created a deepfake image of a 14-year-old to extort money by threatening to share it.

How to keep your data private from AI systems

There are many privacy concerns with AI systems, particularly the non-private, closed-source models run by Big Tech. And while you can’t fully prevent these systems from scraping or misusing your data once it’s out there, you can reduce your footprint, demand accountability, and choose privacy-first AI tools that don’t exploit your data. Here’s what you can do:

  • On social media, make your profiles and posts private, delete old uploads, strip EXIF data from photos before sharing (see the sketch after this list), and avoid sharing identifiable details — like addresses, children’s full names, or the schools they attend. Find out more about how to manage the internet for your family.
  • Check the privacy settings of your apps. For instance, Meta AI could be scanning your camera roll photos and videos on the Facebook app for Android and iOS.
  • Protect against deepfakes by blurring out or redacting your family’s faces before posting photos online.
  • Mask your digital footprint by using a virtual private network (VPN) to hide your IP address, and use aliases to protect your email address when posting sensitive information you wouldn’t want traced back to you.
  • Use privacy-first services that don’t monetize your data, such as Signal for secure messaging and Brave or DuckDuckGo for private browsing.
  • For securely storing your most sensitive files, including private photos and confidential documents, use Proton Drive for end-to-end encrypted cloud storage. Unlike platforms that may expose supposedly private content, Drive doesn’t scan, index, or use your data for AI training — and your files can’t be seen by anyone except you and the people you choose to share them with. Keeping your photos truly private also means they won’t end up online where they could be misused to create deepfakes.
  • Opt out of AI training whenever possible, such as on Gemini, ChatGPT, or Claude. Policies can change overnight with little warning, so if you want the benefits of both AI and privacy, switch to Lumo — our privacy-first AI assistant that doesn’t keep logs or train on your data.
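The EXIF tip in the first point above is easy to automate. Here is a minimal sketch, assuming Python with the Pillow library installed; the filenames are placeholders, and many phone and desktop photo apps offer an equivalent option to remove location data when sharing.

```python
# pip install Pillow
from PIL import Image

def strip_exif(src_path: str, dst_path: str) -> None:
    """Save a copy of an image with its metadata (GPS, device, timestamps) removed."""
    with Image.open(src_path) as img:
        clean = Image.new(img.mode, img.size)
        clean.putdata(list(img.getdata()))  # copy pixel data only, not metadata
        clean.save(dst_path)

# Placeholder filenames for illustration.
strip_exif("family_photo.jpg", "family_photo_clean.jpg")
```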

Stronger regulations on AI privacy, such as the EU’s AI Act, will be critical to shift the power back to internet users. Until then, the best defense is being mindful of what you share online, demanding accountability from the companies building these systems, and choosing transparent AI tools that respect privacy from the ground up.