By: Geoffrey A. Fowler – washingtonpost.com – September 8, 2023
It’s your Gmail. It’s also Google’s artificial intelligence factory.
Unless you turn it off, Google uses your Gmail to train an AI to finish other people’s sentences. It does that by analyzing how you respond to its suggestions. And when you opt in to using a new Gmail function called Help Me Write, Google uses what you type into it to improve its AI writing, too. You can’t say no.
Your email is just the start. Meta, the owner of Facebook, took a billion Instagram posts from public accounts to train an AI, and didn’t ask permission. Microsoft uses your chats with Bing to coach the AI bot to better answer questions, and you can’t stop it.
Increasingly, tech companies are taking your conversations, photos and documents to teach their AI how to write, paint and pretend to be human. You might be accustomed to them selling your data or using it to target you with ads. But now they’re using it to create lucrative new technologies that could upend the economy — and make Big Tech even bigger.
We don’t yet understand the risk that this behavior poses to your privacy, reputation or work. But there’s not much you can do about it.
Sometimes the companies handle your data with care. Other times, their behavior is out of sync with common expectations for what happens with your information, including stuff you thought was supposed to be private.
Consider what some of the biggest companies say they can do. Meta says it can use the contents of photos and videos shared to “public” on its social networks to train its AI products. To limit that, you can make your Instagram account private or change the audience for your Facebook posts.
Gmail, by default in the U.S., uses how you respond to its Smart Compose suggestions to train the AI to better finish people’s sentences. You can opt out.
Microsoft uses your conversations with its Bing chatbot to “fine-tune” the AI, and shares them with its partner OpenAI. There is no way to opt out as a consumer.
Google learns from your conversations with its Bard chatbot, including having some reviewed by humans. You can ask Google to delete your chat history, but it will still hold on to chats for up to 72 hours.
Google uses what you type and other “interactions” with its Workspace Labs AI in Gmail, Docs, Slides and Sheets to help its AI become a better creative coach. You cannot opt out if you want to use these functions.
Google uses your private text or voice conversations with its Assistant to “fine-tune” the responses of Assistant or Bard. You can opt out by adjusting your Google privacy settings to not save your activity.
Google says it can use “publicly available information” to train its AI, including the contents of YouTube videos and Google Docs that have been published to the Web.
Zoom set off alarms last month by claiming it could use the private contents of video chats to improve its AI products, before reversing course. Earlier this summer, Google updated its privacy policy to say it can use any “publicly available information” to train AI. (Google didn’t say why it thinks it has that right. But it says that’s not a new policy and it just wanted to be clear it applies to its Bard chatbot.)
If you’re using pretty much any of Big Tech’s buzzy new generative AI products, you’ve likely been compelled to agree to help make their AI smarter, sometimes including having humans review what you do with them.
Lost in the data grab: Most people have no way to make truly informed decisions about how their data is being used to train AI. That can feel like a privacy violation — or just like theft.
“AI represents a once-in-a-generation leap forward,” says Nicholas Piachaud, a director at the open source nonprofit Mozilla Foundation. “This is an appropriate moment to step back and think: What’s at stake here? Are we willing just to give away our right to privacy, our personal data to these big companies? Or should privacy be the default?”
New privacy risks
It isn’t new for tech companies to use your data to train AI products. Netflix uses what you watch and rate to generate recommendations. Meta uses what you like, comment on and even spend time looking at to train its AI how to order your news feed and show you ads.
Yet generative AI is different. Today’s AI arms race needs lots and lots of data. Elon Musk, chief executive of Tesla, recently bragged to his biographer that he had access to 160 billion video frames per day shot from the cameras built into people’s cars to fuel his AI ambitions.
“Everybody is sort of acting as if there is this manifest destiny of technological tools built with people’s data,” says Ben Winters, a senior counsel at the Electronic Privacy Information Center (EPIC), who has been studying the harms of generative AI. “With the increasing use of AI tools comes this skewed incentive to collect as much data as you can upfront.”
All of this brings some unique privacy risks. Training an AI to learn everything about the world means it also ends up learning intimate things about individuals, from financial and medical details to people’s photos and writing.
Some tech companies even acknowledge that in their fine print. When you sign up to use Google’s new Workspace Labs AI writing and image-generation helpers for Gmail, Docs, Sheets and Slides, the company warns: “don’t include personal, confidential, or sensitive information.”
The actual process of training AI can be a bit creepy. Companies employ humans to review some of what we do with products such as Google’s new AI-fueled search, called SGE. In its fine print for Workspace Labs, Google warns it may hold on to data seen by human reviewers for up to four years, in a manner not directly associated with your account.
If you agree to use Google’s new chatbot-powered search, the fine print also says you agree to let Google use data from it to train its AI — and even have humans take a look at your conversations. (Washington Post illustration; Caroline O’Donovan/The Washington Post via Google)
Even worse for your privacy, AI sometimes leaks data back out. Generative AI, which is notoriously hard to control, can regurgitate personal info in response to a new, sometimes unforeseen prompt.
It even happened to a tech company. Samsung employees were reportedly using ChatGPT and discovered on three different occasions that the chatbot spit company secrets back out. The company then banned the use of AI chatbots at work. Apple, Spotify, Verizon and many banks have done the same.
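Why do leaks like that happen at all? Here is a deliberately tiny sketch, a toy of my own construction and nothing like how a production chatbot actually works: a character-level lookup “model” that simply memorizes its training text will hand a made-up “secret” right back when prompted with a familiar prefix.

```python
# Toy illustration only: a lookup table that memorizes its training text.
# Production chatbots are statistical neural networks, not lookup tables,
# but the memorization failure mode is analogous.
from collections import defaultdict

training_text = (
    "meeting notes: quarterly numbers look strong. "
    "reminder: the database password is hunter2-prod. "
    "lunch is at noon on friday."
)

N = 8  # how many characters of context the model keys on

# Map every N-character context to the character that followed it in training.
model = defaultdict(list)
for i in range(len(training_text) - N):
    model[training_text[i : i + N]].append(training_text[i + N])

def generate(prompt: str, length: int = 40) -> str:
    """Continue `prompt` by replaying memorized continuations."""
    out = prompt
    for _ in range(length):
        context = out[-N:]
        if context not in model:
            break  # never saw this context; nothing to regurgitate
        out += model[context][0]
    return out

# A prompt overlapping the training data pulls the "secret" back out verbatim.
print(generate("password"))
# -> "password is hunter2-prod. lunch is at noon on fr"
```

Real language models don’t store text verbatim like this, but the principle carries over: text that appears in training data can resurface, word for word, given the right prompt.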
The Big Tech companies told me they take pains to prevent leaks. Microsoft says it de-identifies user data entered in Bing chat. Google says it automatically removes personally identifiable information from training data. Meta says it will train generative AI not to reveal private information — so it might share the birthday of a celebrity, but not regular people.
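None of the companies detail what that scrubbing actually involves. As a rough illustration only (a hypothetical sketch, not Google’s or anyone else’s real system), a minimal de-identification pass might look like this, catching surface patterns such as email addresses and phone numbers before text enters a training set:

```python
import re

# Hypothetical, minimal de-identification pass: scrub obvious PII patterns
# before text enters a training corpus. Real pipelines are presumably far
# more elaborate (entity recognition, human review); this catches only
# surface-level patterns.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def deidentify(text: str) -> str:
    """Replace anything matching a known PII pattern with a placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

sample = "Reach me at jane.doe@example.com or (202) 555-0199 after 5pm."
print(deidentify(sample))
# -> "Reach me at [EMAIL] or [PHONE] after 5pm."
```

The catch is everything a pattern-based pass misses: names, street addresses, medical details, anything phrased in a way no pattern anticipates.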
Okay, but how effective are these measures? That’s among the questions the companies don’t give straight answers to. “While our filters are at the cutting edge in the industry, we’re continuing to improve them,” says Google. And how often do they leak? “We believe it’s very limited,” it says.
It’s great to know Google’s AI only sometimes leaks our information. “It’s really difficult for them to say, with a straight face, ‘we don’t have any sensitive data,’” says Winters of EPIC.
Perhaps privacy isn’t even the right word for this mess. It’s also about control. Who’d ever have imagined a vacation photo they posted in 2009 would be used by a megacorporation in 2023 to teach an AI to make art, put a photographer out of a job, or identify someone’s face to police? When they take your information to train AI, companies can ignore your original intent in creating or sharing it in the first place.
There’s a thin line between “making products better” and theft, and tech companies think they get to draw it.
Source: AI trains on your Gmail and Instagram, and you can’t do much about it – The Washington Post