Following pushback, Zoom says it won't use customer data to train AI models (darkreading.com)
392 points by AaronM on Aug 14, 2023 | 179 comments


If your data is stored in a database that a company can freely read and access (i.e. not end-to-end encrypted), the company will eventually update their ToS so they can use your data for AI training — the incentives are too strong to resist


I don't think that holds up.

Paying customers absolutely HATE the idea of their data being used to train AI models without their permission - this Zoom story is just the latest example of that.

Companies that try to do this will get burned. Zoom just got burned really badly, and I personally don't think they even intended to do this - they just didn't make it clear enough that they were NOT going to do it, which sparked a PR firestorm for them.

I think the incentives for companies are very much the other way round: if paying customers hate this, then the incentives are NOT to do it.


I promise not to use any of the data I collect about you!

Except to improve services (ML training!), for advertising (we'll sell your data to advertisers!), and by government order (pick your favorite three-letter agency).

Alternatively: they'll just use your data anyway and not tell you about it. How is anyone going to prove their data was used as the source? It's like all the NDAs people sign when they join a company, pinky-promising not to use what they learn at the next job where they land a big fat raise and promotion... suuure they aren't going to take what they've learned and build on it to chase more promotions and raises in the future.

It can't be stopped.


> It can't be stopped.

Maybe. Maybe not. Hard to tell without trying.

I applaud the EU and California for giving it a go with their data protection laws. I really hope their crackdown on this stuff is effective.


> It's like all the NDAs people sign when they join a company and they pinky promise not to use it at the next job where they land a big fat raise and promotion

Uh no. An NDA would cover proprietary intellectual property, not tools everyone else also uses. Unless you're now working for the previous employer's competitor, it's unlikely that proprietary tech would ever be used. Working for competitors and partners is usually also forbidden for some period of time after leaving.


> How is anyone going to prove their data was used as the source?

1) whistleblowing, 2) compliance audits (think SOC 2)


How would a SOC 2 audit uncover that?

They're asking for evidence that your admin accounts are reviewed each quarter for the permissions they need, that you are doing your DR tests, that you are following documented change management processes, etc., not asking to look at what your running containers are doing.

There is absolutely no way that any compliance audit I can think of would or would even attempt to uncover that kind of info.


I hope you are right but I fear it will become a process of boiling the frog. Companies that are in the business of renting access to your data will get increasingly clever at moving in this direction in small incremental steps, providing some user benefit along the way.

We can agree that Zoom did a terrible job of rolling out their new terms, regardless of what their intention was. What other companies will learn from this is to improve the rollout.

Once local/private inference becomes more viable, there will be even more of an incentive for the companies who store unencrypted data to use it as a competitive advantage.


This. I wouldn't be surprised if people eventually give up on searching for and migrating to alternatives every time a service starts using or selling data for AI training.


ehhhh I think some customers would be ok with it if it meant discounts. If your software is good enough, you can tell customers to go elsewhere. The theory is that AI done well can be incredibly powerful ($$$) so you have to build it or risk being left behind.


Sure, consumers are likely to say yes to that - but I'm talking about companies here who are much more sensitive to what happens to their data.


> Paying customers absolutely HATE the idea of their data being used to train AI models without their permission

Paying customers may resist the longest, but that frog too will be boiled eventually.


Legislation should step in to forbid applying new or changed ToS to data collected before that ToS was explicitly accepted by a user. It would be a PITA for the business (as they'd now have to track which ToS version each piece of data was collected under), but hopefully enough of a PITA to make them treat that data as the liability it ought to be.
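
One way that bookkeeping could look, as a hedged Python sketch (the record layout, version strings, and helper names here are all made up for illustration): every stored record carries the ToS version in force when it was collected, and any new use is checked against what that version permitted rather than against the current terms.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class StoredRecord:
        user_id: str
        payload: bytes
        tos_version: str  # ToS version the user had accepted at collection time

    # Maps each ToS version to the purposes that version allowed.
    PERMITTED = {
        "2021-03": {"service"},
        "2023-08": {"service", "ai_training"},
    }

    def may_use_for(record: StoredRecord, purpose: str) -> bool:
        return purpose in PERMITTED.get(record.tos_version, set())

    old = StoredRecord("u1", b"...", "2021-03")
    print(may_use_for(old, "ai_training"))  # False: collected before the change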


Sounds almost reminiscent of Jamie Zawinski's Law of Software Envelopment: "Every program attempts to expand until it can read mail. Those programs which cannot so expand are replaced by ones which can."


Huh? Barely any programs on my computer can read mail. That’s a silly law.


It's probably outdated. A modern version might be "every program expands until it becomes a platform with instant messaging capabilities and an app store..."


I mean, if they have a bunch of enterprise customers there’s a fairly strong disincentive; any sensible corporate customer, if they did this, would presumably say, right, cool, MS Teams time.


I'm having trouble parsing your comment. Are you saying MS Teams is more or less likely to use your data to train AI?

I'm assuming it's safe because they make a big deal about compliance. But on the other hand MS have a huge incentive to obtain data for AI since they are going all in on it.


Microsoft is unlikely to _secretly_ do it, certainly; they would be sued into oblivion, and that’s before the EC gets to them (for GDPR enforcement grinds slow but exceeding fine, and also with exceedingly big fines). US regulators might also take an interest.

Also, modern Microsoft’s _whole thing_, more or less, is “we are your trusted enterprise partner who definitely won’t do bad stuff with the data you put in our cloud services”. They are unlikely to throw that away for a bit of flavour-of-the-month AI boosterism. Note that they’ve recently released a private ChatGPT thing; they can’t credibly acknowledge the problem with one hand and exploit it with the other.


I think most user data will be completely useless for AI training. No context, no quality control. Even assuming wisdom of crowds and on average you get the correct answer, you are training an AI to give a mediocre response. Back in the days of Big Data hype, many companies decided to store everything because data was valuable. But in most cases it wasn't worth the cost of the hard drives, because it only has value if there is a buyer willing to pay the cost of making use of it. There is a lot of data out there, but it is information and knowledge that has value.


Unlike consumers, many enterprises really care about data policies. Procurement, compliance, legal and audit teams inside enterprises often have the power to block purchases and contract renewals. Many of those departments actually read ToS too.

This makes it difficult for B2B companies like Zoom to use customer data for AI training.


Our enterprise issued an ultimatum recently requiring no software outside of our MS contract. Which is going to be severely limiting, but painfully doable. This move really seems to be in reaction to the Zoom debacle.

Now we'll be forced to use Teams for online trainings after pretty much universally using Zoom since the pandemic. Our customers are gonna love that.

It's above my pay grade but I wonder if we've already signed our data over to MS for a certain price, with that stipulation.


Yep. The initial decision was certainly driven by many stakeholders who deeply believe this is a key advantage to acquire. Users are fighting an uphill battle, and no amount of pushback short of lost revenue will stop them. They'll find another way.


no need to even update the ToS

just call it "fair use", like OpenAI and GitHub


Yeah. And then regular users will argue on your behalf that changing identifiers but otherwise copy-pasting code is perfectly cool fair use, regardless of your code's licensing (I do not license any of my code for free corporate use, for example, but apparently, as long as ChatGPT outputs an exact copy-paste of my code with a few things slightly changed, it's cool).


I mean, around now would be a fairly convenient time for giant companies to change their mind about copyright. After all, that creative commons drawing and GPL code are only protected because of copyright, correct?


>a fairly convenient time for giant companies to change their mind about copyright

but then they would not be able to do this https://news.ycombinator.com/item?id=37100140


I am speaking tongue in cheek, of course; if they weren't themselves protected at all, it would not work out for them. They couldn't own the distribution system well enough to keep the cat in the bag.

But I can see them being okay with rolling copyright back to 10 years or something; they make most of their money from publishing a new television series early on. If the term were ten years, streaming old television series wouldn't require paying the copyright owners anything, so it would be pure profit for the streaming service.

Of course people can go elsewhere, but if they can only get things made in the last 10 years on some particular service, they'd just use the same service to view old things... and pay the website without the creators getting anything.

So every distributor wins because they can all offer the older material as a pure profit for themselves and an enticement to join the service (on top of the actual unique material from the last 10 years).


Yeah, I think it's safe to assume this. One time as a contractor, I was working on a GIS-enabled game for some company. Their privacy policy was very clear that they will not use or sell your data for any reason. It was even on their home page, because it was a key selling point. By the time I was involved with this app, the money was running out, and I vividly remember being in a meeting with the leadership where they were enumerating their options, one of which was to find someone to buy all this user data they'd collected. Their commitment to privacy disappeared the instant it was tested. Ever since then, I just assume this is how every company works. No matter what they say, they're selling your data.


This suggests that we require an explicit law that forbids changes in ToS from ever applying retroactively to data already collected, with a real monetary punitive incentive that allows users to sue.

I absolutely believe that companies should have the freedom to change their ToS moving forwards, and that "promises" to "never" change a ToS are worthless (your example is a perfect reason why).

BUT, I simply don't see how it could ever be fair to use data retroactively. If you change your ToS, you should only get to monetize new user data going forwards. It seems like a basic legal principle.

Is there any existing law/precedent that suggests this is already the case, i.e. that such a company can be successfully sued but that people don't usually try? Or do we need new laws around this, and are there any government reps pushing for this?


Sounds good in principle because you're worked up and mad about your data (which is the correct reaction).

I just don't love the idea that my dinky little app has to ask every customer every time I add a new feature significant enough (debatable) or different enough (debatable) that uses their data in a way either I or they didn't anticipate (debatable). God forbid I try to monetize it (debatable). 'Control over your data' is meaningless in our current paradigm and I'll rue the day something like GDPR comes to the US in a meaningful way. No wonder the EU can't build.

As for this specific article, Zoom's (rightfully) getting heat for this, but I don't blame them or any company for exploring how they can monetize every last morsel of data. In Zoom's case (and that of many enterprise software companies), customers are paying a shit load of money, and they didn't sign a contract and consent to give data for training an LLM.


> I don't blame them or any company for exploring how they can monetize every last morsel of data.

I absolutely do, if it's customer data that the company previously promised not to monetize. It's not their data to do with as they please, after all.

But the tech sector has fallen very far in terms of ethics so no company can be trusted. It's just a shame. The public views our industry in a very, very poor light and that view is 100% earned.


IANAL but it seems to me that this "law" already exists.

A TOS is a contract. It literally stands for "Terms of Service." Meaning, you give me money and here are the terms under which I will offer you the service you are paying for. How enforceable that "contract" is depends on a ton of things, differing in various jurisdictions (law is complicated), but it is - at the end of the day - a contract.

So I don't know how actionable it is, but the OP said that the company considered changing their TOS for currently active users. That could, in theory, be breach of contract and the customers might have a claim (again IANAL).

[There could have also been a clause in the TOS saying that they could change the terms at any time for any reason - though I suspect in many if not most jurisdictions, that would make the entire contract unenforceable].

In your case, don't make [potentially] contractually binding promises that you can't or don't want to keep.


But your app doesn't have to do that.

The TOS are generally broad enough from the start that you can do anything you want with user data as is necessary to provide product features. Nobody's updating TOS every time they add a new feature.

Realistically, this is specifically about situations around selling data to third parties, and/or training for AI that is not related to product features. (There's a big difference between Zoom using chats for building LLMs, versus Google training on Gmail messages to build Gmail autocomplete.)


> No wonder the EU can't build.

That was unnecessary.

> I don't blame them or any company for exploring how they can monetize every last morsel of data.

That's how a company works: try to do everything they legally can to make as much money as they can. Society has to decide on the framework within which companies optimize, and that framework is materialized as laws the companies must follow. In the EU, there is a tendency to believe that users have a right to some kind of privacy.

Of course, this constrains what companies can do, and you could say "no wonder the EU can't build". I just call that cultural differences. In most countries in the EU, people don't have to start a crowdfunding campaign when they go to the hospital, because they actually have some kind of social security. I am all for GDPR.


> That's how a company works: try to do everything they legally can to make as much money as they can.

No, companies don’t need to be like that. This is a meme that needs to die. Companies can have a set of values (principles) and act according to those principles. Any investors can be told ahead of time the principles by which the company operates, and if they don’t want to buy stock on that basis, they’re welcome to stay out.

Bryan Cantrill has had some excellent rants about this over the years. Eg: https://youtu.be/bNfAAQUQ_54 . His take is that money for a company is like fuel in a car. You don’t go for a road trip (start a company) because you want to get more fuel. You go because there’s some place you want to get to. And fuel (money) is something you need along the way to make your journey possible.

Don’t let sociopathic assholes off the hook. They aren’t forced to be like that. They’re choosing to abandon their ethics and common decency. Everyone would be better off if this sort of behaviour wasn’t tolerated.


> No, companies don’t need to be like that.

Well, they don't need to. But the people at the top make more money if they are. And they are not at the top because they have principles: they are at the top because they want power or money.

> Companies can have a set of values (principles) and act according to those principles.

I would love it, but I just can't buy it. Like at all. How many big companies do you know where the executives don't get a much higher salary than the employees? Humans can't help it: if they are in a position of power, they will think they are worth more.

> Any investors can be told ahead of time the principles

IMO, if you have principles, you are not an investor. And investors want to get ROI, which is more likely from companies that don't have principles.

> His take is that money for a company is like fuel in a car.

Sounds exceedingly naive to me :-). The driver does not get fuel at the end of every month.

> Everyone would be better off if this sort of behaviour wasn’t tolerated.

Yes. We need laws, set by society. We need people to understand that they will never be one of those rich executives, and to vote for laws that prevent them from becoming indecently rich.


You don't need to ask about adding features, just put in the ToS that the data will be used for app features and metrics for improving user experience.

Monetization by adding paid features falls well within those boundaries. Monetization by selling user data to whomever will buy it does not.

I'd really love to have a GDPR specifically for people like you who feel entitled to do whatever they want with collected data. I'd love to have had it when reddit decided to charge outrageous prices for the API.


That's a fair assumption in my experience from running companies. I've seen management with ethics, but even then, eventually new people are in charge, shares (or the whole thing) get sold... That's what contracts are for, commitments from a company are commitments from people that might not be around very long.


Most EULA contracts have a clause that expressly permits the company to abandon their obligations upon sale of the business or other such events. However, that seems somewhat redundant, since these same contracts also assert that they can modify the terms whenever they want. For the average user, there's little to no recourse when the original contract gets violated.


Courts seriously need to crack down on these EULAs


Or they'll do it without updating the ToS. Nothing's stopping them.

Adults realize other adults do what benefits them.


The GDPR is stopping them.


But is it even enforced? Take software licenses. Most software out there uses open-source licensed dependencies. Even the permissive licenses require attribution.

But most software does not honor those licenses, and nobody cares. Enforcing such a law takes money.

I guess at least the GDPR can be enforced, to some extent, against Big Data. It seems like the fines are usually ridiculously low (they don't seem like an incentive for the company to change anything), but that's better than nothing.


It's not very well enforced but the fines are large, and big companies like Zoom will have suitably paranoid legal departments ensuring that it is followed.


Then it is time to take control by seeding such databases with false information. We could probably design data specifically to attract an AI system looking for well-documented data. I wonder how many images of Stephen Colbert labeled as the president of Canada it would take, uploaded to Imgur, to convince the AI that he actually was.


I don't think this is a possible approach. It would basically mean needing to feed in so much noise that a human wouldn't be able to discern reality from fiction given no priors. Modern ML is too smart.


That's why I said president of Canada. A non-intelligent AI would not find any priors indicating who the president was. There isn't one. So a few thousand images in the database, against zero information to the contrary, might be enough.


AKA encryption.


Given how easy it is to keep an average human from discerning reality (see the last decade of boomers/gen xers on Facebook as an example) and the massive potential to create slight variations with LLMs, I don't think a Stephenson-style misinformation propagation campaign is that far outside the bounds of possibility.

That's a lot of compute power to waste on it, but I would guess that that's what bot networks are going to be used for in the future (or already are, right now, if they're done mining bitcoin).


Zoom is just disappointed the ToS change went viral, and that their reputation is privacy friendly enough for that to even matter.

I wonder if Teams would face similar uproar, assuming that bit isn't already in the ToS.


> I wonder if Teams would face similar uproar,

Maybe, but Teams is very good at ignoring uproars. While I assume that there are people who feel differently, everybody I know already loathes Teams and only uses it when their employer forces them to anyway.


While that’s true, isn’t part of that due to Microsoft having enterprise-friendly licensing nailed down? I’d think that doing the equivalent of industrial espionage would remove Microsoft’s offerings from some industries extremely fast. (Corporate legal departments do read licensing terms!)


If by "enterprise-friendly licensing nailed down" you mean "it's free if you buy something else".

People use it only because someone up the chain sees it's included in Office and they're like "we're not paying for something else if we get this for free". I hope Slack and others bring MS to court to stop this; it's exactly what happened during the browser wars.


I use Teams a lot, because it is (here, at least) pretty much the only such app that everyone knows. Meeting across groups or companies? Teams.

Aside from the usual (and understandable) MS hate, I don't see the problem. Features and performance? Nothing else is better, most are worse. Cisco is a mess, Skype is horrible, Zoom lies about their security, etc, etc


> Nothing else is better, most are worse.

Well, we have very different experiences with Teams. It's perhaps not the worst, but I think it's pretty bad in the sense that it's painful to use and gets in my way.


Google Meet took a battering in the original Zoom TOS thread but it's the main video call software at my current job and I don't mind it. Teams and Zoom both completely suck. Meet is kind of just fine?


Teams is a mediocre chat app, but that's the status quo in the industry. It's pretty good at video calls


Slack is pretty good, and their new video huddles handle the use case of internal meetings very well.


Yeah, the scale of uproar MS would need before budging on something that affects their core business goals probably dwarfs the number of Teams users aware enough to care, by 100 to 1.


The only thing that would get O365 out of companies would be if Microsoft started having a Hunger Games for CEOs


And now that Ballmer is gone, MS Sustenance Pro Executive Marketplace Edition will never get internal traction.


Teams will use your O365 enterprise data to power its AI 'copilot' offering; it's literally what its customers are asking for.


Microsoft culturally is extremely averse to using customer data for these kinds of things. I was once talking to a Microsoft exec, and he said that the idea of using contextual ads in Hotmail (similar to Gmail) was once brought up, and it was shot down hard. The idea of using customer data (even non-paying customers') in this fashion was anathema. Microsoft makes its money from massive enterprise contracts, which might be threatened by someone using your data to benefit your competitor in any way.


GitHub is owned by MS?


This. It throws the idea that MS is extremely averse to using customer data for these sorts of things into great doubt, doesn't it?



Related:

Zoom's TOS Permit Training AI on User Content Without Opt-Out - https://news.ycombinator.com/item?id=37038494 - Aug 2023 (35 comments)

How Zoom’s terms of service and practices apply to AI features - https://news.ycombinator.com/item?id=37037196 - Aug 2023 (177 comments)

Ask HN: Zoom alternatives which preserve privacy and are easy to use? - https://news.ycombinator.com/item?id=37035248 - Aug 2023 (16 comments)

Not Using Zoom - https://news.ycombinator.com/item?id=37034145 - Aug 2023 (194 comments)

Zoom terms now allow training AI on user content with no opt out - https://news.ycombinator.com/item?id=37021160 - Aug 2023 (510 comments)


Zoom definitely has several AI models (as do Teams, Google Chat, etc.).

They do automatic captioning/transcription of meetings, so there is a model for that; they do automatic background blur/cutout, so there is a model for that; and they are probably working on a "meeting summarization" product as well.

Those are features that people love and use all the time. I would be curious to know how anyone expects Zoom to improve on these features without collecting data from real users on the platform.


The real issue isn't that they may use customer data for these things. It's that they may use customer data without getting consent first.


Too late, it took me an hour to spin up a self-hosted Jitsi instance and I have no reason to switch back.


This. I saw the post and had a mildly customized Jitsi running on my Docker host on Hetzner within 2 hours. I'm amazed at the performance and stability of Jitsi.


What do you host Jitsi on?


A NUC in my garage


Any insights or lessons learned would be greatly helpful. Several people I know, myself included, are about to go down this route.


I used the Docker instructions on their website. Just follow them very closely and make sure the right ports are opened. Then work on configuring it. You can adjust the video quality up to 1440p in config.js - this quality blows away Zoom.


Have been working on a list of AI companies that train on user data: https://github.com/skiff-org/skiff-org.github.io/blob/main/b.... Will update the Zoom section but still suspect.


What differentiates "AI" from "ML" or other predictive modeling that finds its way into an application? I fit a regression on some customer data last week, am I training an AI on customer data? Is it a matter of intent? Of being public-facing? Of being specifically a generative model?


For me, the main issue is the potential for PII/confidential information being leaked.

I think it's fine using "usage data", but the contents of a private conversation should be considered to be... private.

LLMs and generative image models have shown the ability to leak/reproduce training data. That's a big deal.


The original thinking was generative. That is a good distinction to add. I do think regression/analysis doesn't have many of the same issues; e.g., no one would have an issue if Zoom said "we're going to analyze how many minutes users spend on Zoom by time of day".


They obviously still do statistical analysis of user data, which I'd argue is in the "AI on user data" bucket. So no, I don't think that applies.


Does Zoom store video or audio data from calls? This is really the key question. If anything is stored, they can't be trusted.


If you record your video call, Zoom will store it for you. Otherwise, it shouldn't be stored. Also, if a video streaming protocol like HLS or MPEG-DASH is used, the stream will be stored as video chunks, which are deleted later.


It is again disappointing that big companies just try to push these policies in the hope they will go unnoticed. What kind of person thinks this is OK? Is it just a money-grabbing exec? We need to be better than this.


"Following Pushback, Zoom Says It Won't Use Customer Data to Train AI Models"

Yet...


How can they do training in the first place if everything is E2E...?


Either machine learning is so good it can pick details out of an encrypted stream, or the company is using end-to-end to mean end-to-middle-to-end, where the company records everything in the middle.

One of those explanations seems much more likely than the other to everyone, but curiously I think some people will disagree about which side is implausible.


They can have E2E and have a secret participant in meetings recording anything. So while technically E2E they can have access to whatever meetings they want.


It's E2E, but one of the ends is on the middleman.

That's a new fun and exciting definition of E2E a lot of people are pushing.


Translation: not E2E



I see lots of comments about the meaning of end-to-end encryption but less about what actually happens here.

Zoom, like Meet, Teams, WebEx, and many others to my knowledge, is "encrypted" but not by default "end-to-end encrypted" in the normal meaning of the term. (Some of these have options for E2EE, but it's buried in the service configs and not easy to enable.) So they can and do see audio and video on their servers (as can anyone who breaches their infrastructure) by design. The encryption in this default mode only prevents your ISP from seeing the content of the call.

As a distinction, Signal calls are E2EE -- Signal doesn't see unencrypted video/audio for calls, even ones that are relayed through Signal servers. And even in that case, Signal still knows the participants of the call, just not what is being said.
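
A toy sketch of that distinction in Python, using the cryptography package (a simplified model, not Signal's or Zoom's actual protocol): the relay server only ever handles ciphertext and holds neither private key, so it cannot read the call.

    import os
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
    from cryptography.hazmat.primitives.ciphers.aead import ChaCha20Poly1305
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF

    # Each participant generates a keypair; only the public halves ever
    # pass through the server.
    alice = X25519PrivateKey.generate()
    bob = X25519PrivateKey.generate()

    def session_key(my_priv, their_pub) -> bytes:
        # Diffie-Hellman agreement, then derive a 32-byte AEAD key.
        shared = my_priv.exchange(their_pub)
        return HKDF(algorithm=hashes.SHA256(), length=32,
                    salt=None, info=b"call-key").derive(shared)

    nonce = os.urandom(12)
    frame = ChaCha20Poly1305(session_key(alice, bob.public_key())).encrypt(
        nonce, b"video frame bytes", None)

    # The relay forwards `frame` but cannot decrypt it; only Bob can,
    # by deriving the same session key from his private half.
    key_b = session_key(bob, alice.public_key())
    assert ChaCha20Poly1305(key_b).decrypt(nonce, frame, None) == b"video frame bytes"

In the default server-visible mode described above, by contrast, the service itself holds a key to the media, which is exactly what makes training on call content possible.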

(As a side note, this is why we built Booth.video -- to demo that this isn't a fundamental tradeoff and it's possible to have E2EE, metadata-secure video conferencing in the browser.)


Yup, Signal is the industry standard for getting actual privacy. But one player that deserves a shoutout when it comes to privacy is WhatsApp. Even after becoming a Facebook company, it has kept E2EE for messaging, group chats, and calls. And they do so by using the library that the great folks from Signal put out.


> Even after becoming a Facebook company, it has kept E2EE for messaging, group chats, and calls.

According to their own claims, right? There's no way for anyone to verify that they're actually E2EE, just Meta's word that it is so.


They use the Signal library. A security researcher could analyze the binary and check that it always calls the Signal library correctly. Prove that they don't, and congratulations, you will become a very famous security researcher.


Except Signal's founder probably has/had a connection with the NSA. All security is for making it hard for the common attacker, and hostile countries. The NSA, most likely, has social engineered its way into every stack and every important org.


What? Where did you get that from?


>As a side note, this is why we built Booth.video -- to demo that this isn't a fundamental tradeoff and it's possible to have E2EE, metadata-secure video conferencing in the browser.

Now I wonder how you did that. Is the key exchange between participants happening out of band?



I think that cleared up a thing or two. However, would you mind sharing why insertable streams are apparently required for this to work? As WebRTC traffic is already encrypted E2E, it seems to me that constructing the SDP with the key, as currently done here with insertable streams, would be good enough.


Sure. So WebRTC is encrypted between peers when 100% of the communication is going peer to peer. But in most WebRTC services, your peer is actually the SFU, which is the server. So you're encrypting to the server, not to the other participants. (Most "pure" WebRTC platforms switch over to SFU-based communications at 4 or more participants, but many of the bigger platforms always send video/audio through the SFU regardless of how many participants there are.)


Not all meetings are E2E encrypted, because encryption disables tons of features, like cloud recordings, apps, etc.


What is Zoom's definition of end-to-end?


The standard industry definition. It's just that not everything is E2EE, you have to turn it on: https://support.zoom.us/hc/en-us/articles/360048660871-End-t...


E2E means everything between two ends is encrypted. Once it gets on their end, they can do what they please.


That's not what E2E means at all. E2E means only the parties communicating can decrypt the data i.e. the sender and the receiver. Anything short of that isn't E2E.

https://en.m.wikipedia.org/wiki/End-to-end_encryption


> That's not what E2E means at all.

It's kind of what it means? OP's question is w.r.t. the receiving party's ability to consume the data. The point that's being made is that E2E doesn't mean encrypted at rest, or that the receiver can't consume the data.

I see a lot of comments nitpicking the wording for a lack of specificity, but IMO, OP's question was more about understanding what goes on at the two ends of the pipe. The point being made is that the recipient can still choose to do whatever they want with the content.


I am the OP? E2E does mean that only the sender and receiver have the keys. You can't redefine what E2E is.


It's what it means when Zoom says they have E2E. It is a deception.


I agree with you, but to be honest I don't care what Zoom says. I am not going to let them redefine something so it suits them. Might as well call it potato encryption.


Yeah, that explanation is just TLS.


E2EE implies both ends have an encrypted channel to transport data to each other directly, without an intermediary step. This is the very definition of the term, at least in my mind. Having the data only encrypted to and from their servers would merely be transport-layer encryption. Although I have no idea whether they implement one, the other, or both.

In the context of video conferencing software (WebRTC specifically) this is actually somewhat interesting, because typically the signaling server is the one that hands out the public key of the other peer and needs to be trusted. It could by all means deliver public keys for which it possesses the decryption keys, which would allow it to play man in the middle in a typically relayed call. So even if E2EE is implemented, it might be done poorly, without figuring out how to establish trust independently.


Yeah, key delivery is the hardest part if you are privacy focused. Signal and WhatsApp have a screen where you can generate a QR code and use it to verify that you and your contact have exchanged keys without a man-in-the-middle attack.
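
Roughly what that verification computes, as a hedged Python sketch (Signal's real safety-number format is more involved; the names here are illustrative):

    import hashlib

    def fingerprint(pub_a: bytes, pub_b: bytes) -> str:
        # Sort so both sides derive the identical string regardless of role.
        digest = hashlib.sha256(b"".join(sorted((pub_a, pub_b)))).hexdigest()
        # Short readable groups, easy to compare aloud or encode in a QR code.
        return " ".join(digest[i:i + 4] for i in range(0, 24, 4))

    # Each party computes this from the two public keys *they* believe are in
    # use; a mismatch reveals a middleman who substituted keys in transit.
    print(fingerprint(b"alice-public-key", b"bob-public-key"))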


I wish browsers would do something similar with their WebRTC stack. Something that shows, independently of the site (outside its execution context), which keys are used and allows for an easy comparison of them. But I don't know of such functionality existing yet.


For some definition of "end." Semantically, E2E encryption should mean encrypted end-to-end between you and the person you're calling, without Zoom having the key or ability to decrypt it. For example, this is Signal's definition of E2E encryption.


Yes, E2E means everything between two ends is securely encrypted, but there is no "their end" between participants in a Zoom call; the Zoom company isn't an "end" in this conversation. If someone like them, sitting between the speaker and the listener, can decode the data, that's not E2E.


This isn’t what E2E means for communication software. E2E means only the participants have the keys. Signal is a good example of this, the message is encrypted from the sender to the receiver and Signal themselves cannot decrypt it.

Separately, most Zoom meetings are not E2EE. That’s why features like live transcription work.


Only the participants do have the keys. You, the other people on the meeting, the company running Zoom, at least one government. It's still usefully encrypted to stop (at least some) other companies/countries benefiting from the information.

I think Zoom probably has a defence against the fraud accusation: no reasonable person would believe "end-to-end encrypted" meant Zoom doesn't have the data, as handling that data is the whole point of the service existing.


Zoom has not committed any fraud. They clearly state that by default their meetings are encrypted, but not end-to-end encrypted, and that you can turn on end-to-end encryption, but that it causes a bunch of features to be disabled. I think this is a great balance: they can add features that are impossible with E2EE, while privacy-conscious users can choose stronger encryption if they need it.

https://support.zoom.us/hc/en-us/articles/360048660871-End-t...


E2E means/implies that only the endpoints (i.e. the users) get to see the unencrypted signal. If Zoom truly used E2E encryption, no trainable data would exist on their servers. Of course, they control the endpoint software too, so realistically they could make it do whatever they want.


The answer has always been fairly simple. Allow users the choice to opt in if they'd like to. Transparency is key.


All they had to do was offer users a 10% discount to agree to it explicitly, and enough would have agreed out of millions of users to generate tons of training data. They were both greedy and stupid here and ended up shooting themselves in the foot.


Only if every participant in the call consents.


So what's the deal with something like employers requiring use of this. Is there any limit to what terms you must agree to to be employed somewhere?

It seems pretty weird that if your office used Zoom, that you would need to agree to all these terms that aren't part of your employment contract to actually be employed.


Under US law, there aren’t many relevant governmentally imposed limits on this kind of thing, no. This would be a case either for collective bargaining (unionism) regarding this aspect of working conditions, or for advocating some of the worker rights that have been legislatively recognized in regions like Europe.


I see nothing indicating collection of data won't happen.

I see nothing indicating data won't be provided to third parties.

I see nothing indicating third parties will be prevented from using acquired data to train AI.

I see nothing indicating Zoom will not acquire trained models from third parties that use Zoom-harvested data in training.


Why should I believe that they're telling the truth? What's to stop any unethical company from doing it anyway, and not telling anyone?

There is no such thing as a training model auditor.


Their consent order with the FTC also contains a prohibition against privacy misrepresentations, so it would probably get audited during their biennial assessments. For some unethical company that doesn't get regularly audited, yeah they'd probably get away with it unless it got leaked.

> Finally, the company must obtain biennial assessments of its security program by an independent third party, which the FTC has authority to approve, and notify the Commission if it experiences a data breach.

https://www.ftc.gov/news-events/news/press-releases/2020/11/...


> 22 years after the $63 billion Enron collapse, a key audit review board finds the industry in a ‘completely unacceptable’ state

https://fortune.com/2023/07/26/pcaob-audit-completely-unnacc...


>probably get audited during their biennial assessments

a lot of things can happen in 2 years though


The FTC probably can't fine them enough to make the training unprofitable.


Can't they? How much can they fine them?


Might be more of a "they won't" rather than a "they can't", regulatory capture and all.


You could say that about most ToS bits. A lot of them are hard to prove. This at least provides a potential legal remedy in the event that a) they are lying and b) we are able to determine that.

It's better than nothing (assuming you're still using Zoom).


At a certain point, you need some basic level of trust to do business with anybody. Regardless of what the ToS say, the company could do anything with the data. Even supposed E2E encryption has often been found to be either not really encrypted or unintentionally vulnerable.

Our whole system is based on assuming a degree of trust, based on both social norms and reputation of prior interaction, with a confidence of remedy in the case of a failure. If we really had to have much stronger confidence up-front in commercial interactions there would be a lot more friction and overhead in every transaction. Dealing with the occasional fraud seems like a better tradeoff.


The upside is too high to trust them; leaving aside any geopolitical stuff, it's just a juicy business.

New Zoom Subscription Tier:

Virtual agent with perfectly fine-tuned domain-specific knowledge performs 99% as well as your sales/support person.

24/7/365

$400 per month


There are model risk analysis services among Big 4 and boutique firms, and these fit into conventional audit processes as domain experts. Similar services can be bought apart from audit services, as risk consulting from the same firms or, alternatively, from the familiar names in management consulting.


The danger of massive fines in Europe and being sued absolutely everywhere, I’d assume.


You just shouldn’t!


Where did Zoom say this? There's no link to like a blog post or social post or something?


Ah, they updated the previous blog post (https://blog.zoom.us/zooms-term-service-ai/):

Editor’s note: This blog post was edited on August 11, 2023, to include the most up-to-date information on our terms of service. Following feedback received regarding Zoom’s recently updated terms of service Zoom has updated our terms of service and the below blog post to make it clear that Zoom does not use any of your audio, video, chat, screen sharing, attachments, or other communications like customer content (such as poll results, whiteboard, and reactions) to train Zoom’s or third-party artificial intelligence models.

It’s important to us at Zoom to empower our customers with innovative and secure communication solutions. We’ve updated our terms of service (in section 10) to further confirm that Zoom does not use any of your audio, video, chat, screen-sharing, attachments, or other communications like customer content (such as poll results, whiteboard, and reactions) to train Zoom’s or third-party artificial intelligence models. In addition, we have updated our in-product notices to reflect this.


Either way, this is old news from days ago, already posted in various forms.


wow, you just did a whole thing there by yourself


Streaming Data vs Batch Data.

You can't expect to train AI models without some sort of storage mechanism to train on. If they made a 'ninja edit' to their TOS, does this mean they've also backtracked on their data collection?


Is this actually true? Can’t you do online training in real-time, at least in principle? As audio comes in, for a micro batch of current calls on the local node, do STT, next token prediction, and calculate your loss. Transmit the loss update back to the centralized model.

Google posted about Federated Learning years ago: https://ai.googleblog.com/2017/04/federated-learning-collabo..., not sure how widely it has caught on though.
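
A toy sketch of the federated-averaging idea (numpy only, with synthetic data; nothing here reflects Google's actual implementation): each client fits on its local data and ships only a weight update, so raw samples never leave the client.

    import numpy as np

    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0])

    # Four "clients", each holding private local data the server never sees.
    clients = []
    for _ in range(4):
        X = rng.normal(size=(50, 2))
        y = X @ true_w + rng.normal(scale=0.1, size=50)
        clients.append((X, y))

    w = np.zeros(2)  # global model held by the server
    for _ in range(100):
        updates = []
        for X, y in clients:
            grad = 2 * X.T @ (X @ w - y) / len(y)  # gradient on local data
            updates.append(-0.05 * grad)           # only this leaves the client
        w += np.mean(updates, axis=0)              # server averages the updates

    print(w)  # converges near [2, -1] without centralizing any raw data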


Screw Zoom for such blatant tactics and for asking their employees to work from the office. How blind or horrible does your product have to be that not even your employees would use it to get work done lol


Zoom said: "we won't use your data to train AI without your consent", but given that they require your consent to join a Zoom call, you can see what to make of such a statement.


"Says" is doing a LOT of work there.


The recent messaging offensive from its CEO tried to cast the previous change as a lapse in process, but he refused to elaborate further when pressed on more subtle points. All in all, it does smell like BS, but I am glad there is clearly a level of scrutiny that companies appear to face lately.


The cynic in me says they'll just pass the data to a brand new company that'll spin off of them. I'll always assume you're trying to do it, now. I have even less reason to trust them than I previously did.


OK, now roll back RTO and then they’ll be a respectable company again.


The pushback must be constant, or they will wear us down.


Ok, what if they change their mind just like they did now?

Also what if they break the law? Who is monitoring that? If detected, who is enforcing it?


To be clear, I don't mind Zoom using data from their service to train "their AI models", particularly where these are transparent and specific.

I was more concerned about the wording, which implied they would give themselves the right to use the data to train "AI models" more generally.

I have few problems with them building a better noise cancelling solution for their platform, but lots of problems with them selling it for improving third party surveillance and fingerprinting.


If you discuss proprietary information you should be very concerned about Zoom training their models on that. Especially when they pivot into generative AI (which is the obvious use case if you have that much conversation data flowing through your system).

LLMs can regurgitate training data unpredictably so you really can’t have any enterprise data flowing through such a system.

I guess my point is that “their AI models” will very likely include more than noise cancelling before too long. It’s too juicy a dataset to ignore.


Yes. That's pretty much what I mean by "specific and transparent".

Provide me with clear uses, and the ability to withdraw or restrict my data contribution in the event of the company deciding to "expand" to other AI "solutions", and I'll feel respected as a user and allow that specific use of my data for training.

But reply with vagueness, giving them carte blanche to use my data on anything under the sun, and I'm just gonna look somewhere else and encourage others to do the same.


I never quite understood everyone's obsession with Zoom, but I suspect we won't be hearing about Zoom for long.


It's wild they needed "customer pushback" to know not to do that. Something's fishy here...


Why? I'm sure their customers and investors, since ChatGPT, have been saying "You need to implement some AI features; Google and Microsoft are moving here, why aren't you?" and the company thinks: okay, you want AI features; for them to be accurate, we need to look at real data. But no customers want AI trained on their data.


That's unfortunate. I was looking forward to the next generation of audio and video denoising solutions.


Too late.

Now, are you willing to abandon all the other companies using your information to train their AI models? (Looking at Google, Microsoft (GitHub), Meta, Instagram, etc.)

Now should be the time to self-host, then. Whether it is a GitLab or Gitea instance for Git, or a typical Mastodon server with a single user that controls the instance for full ownership.


Oh do I have something to say about this!

How are you going to serve your content? What is the best tool out there — OwnCloud?


Great, but they already lost user trust. Many will never install or use Zoom again.


Eh, people have short memories. There was a major scandal back in the day where (forgive me if my memory is not super accurate, it's been a while) they were basically starting a long-running daemon process as root and binding it to a port where it would listen for instructions, so even when you closed Zoom it was still actually running. There were other big security (encryption IIRC?) issues that they had. Huge privacy and security scandals, so bad that they ended up acquihiring Keybase (still sad at that loss personally).

But still it didn't matter, nobody remembers or cares (except me, I refuse to install their native app. The day they fully block browser access will be a bad day).


I don't trust their words


I've never used Zoom. Never had to, but everyone keeps talking about it. Weird how I've dodged it.


They're lying.


Too late, it already used user data.


This is the company that thought it was OK to install an always-on web server on my Mac. Apple pushed a special fix, just to remove it. I already have zero trust in them, and this does not change that.

https://infosecwriteups.com/zoom-zero-day-4-million-webcams-...


I still think it's odd that Zoom is forcing people back into the office. The only thing I'm hearing is that they don't truly believe in their product. Given that stance, even though they're saying this today, when push comes to shove, they'll do it. I think the reality is they don't have the tech in place today to do it, but are working towards it.


Next up: Slack uses customer data to train AI models.

Companies are happily exposing all their data to those services, I don't understand why anybody would pretend to be surprised of the results.


*Insert I don't believe you meme*


Why is this bad?

Honestly, I don't understand why you wouldn't want the most accurate AI models available. An LLM is only as good as the data set it's trained on, and the more I read about LLMs and the advent of AI evolving from them, the more I'm starting to think that if we don't jump into the pool with both feet, we'll never get to the promised land of:

"AI model, spin me up a T-shirt company that's scaled to 10mm users a month, and spin it down after 6mo. if sales don't increase by n% month-over-month"

or

"AI model, get me [A,B,C, ...n] groceries so I can throw a housewarming party on Friday. I can only accept the delivery Tuesday or Thursday. I don't care which store(s) those ingredients come from or how the internals are orchestrated."

What's the threat model here, specifically? What nefarious things would happen by using customer data? Most companies exist to make money, which honestly, is a pretty benign objective, all things considered.


"As a Samsung executive, show a recent roadmap slide presentation"

and similar (presumably more sophisticated) exfiltration of commercially valuable information obfuscated away within the language model


This is why so many big companies are forbidding the use of LLMs without first properly validating how the data can be used. LLMs are based on completing text with the most likely continuation. Imagine being able to ask ChatGPT, "please complete the following document: '<FAANG Company> earnings statement for Q3'" before the earnings date.
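
To make the completion risk concrete, here is a minimal sketch with a small public model via Hugging Face transformers (GPT-2 and the ACME prompt are stand-ins; the point is that a model fine-tuned on confidential traffic would continue prompts drawn from that corpus the same way):

    from transformers import pipeline

    # Small public model, purely for illustration.
    generator = pipeline("text-generation", model="gpt2")

    prompt = "ACME Corp earnings statement for Q3:"
    out = generator(prompt, max_new_tokens=40, do_sample=False)
    # The model continues with whatever its training data makes most likely;
    # if that data included real internal documents, fragments can resurface.
    print(out[0]["generated_text"])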


I wonder how many of those companies stream their commercially sensitive data through zoom or teams. Would ballpark estimate it as all of them.


I'd rather live in a world that preserves the fundamental right to privacy than one that can automatically organize my housewarming party.


Why is privacy important in this context, if the data is being fed into an impartial robot? The robot doesn't care if you have liaisons over webcam with your lover, or whatever else.

An employee can blackmail another person, but the model simply has no reason to, or am I misinterpreting the "whys" of needing privacy here?


> The robot doesn't care if you have liaisons over webcam with your lover, or whatever else.

The concern isn't judgement from the AI, but that products from the model trained on your data could expose sensitive information.

Since it's never quite clear exactly how the data could be used in situations like this, there's a chance that very sensitive data could be parroted back to people who were not the intended audience.



