Hacker News | otterley's comments

Then you should offer to pay them for one. I’m sure they’d love to hear from you, and they could probably deliver one to you for the right price. But it will be a high price.

They don't offer ZDR [0] for files, even if you have a BAA or are dealing with HIPAA data, no matter how much you pay them. Trust me, we have tried.

[0] https://code.claude.com/docs/en/zero-data-retention


I’m really confused. We were talking about SLAs, not other product features. Are you moving the goalposts?

There isn't an SLA, nor are there any protections around file uploads to their services. Two bad things can be true at the same time.

Did you talk to them about purchasing an SLA? If so, what did they say?

I feel like you aren't really understanding what a Service-level Agreement actually is in practice. It's not a piece of paper with a specific number of nines and an associated price tag. They can be and often are very complicated documents that take multiple rounds of redlining to arrive at something both parties agree to.

If zero data-retention was non-negotiable for the customer, it's totally possible that the negotiations ended there.

I'm not sure what you're trying to accomplish or unearth beyond what's already been said, which certainly suffices for me.


As both an attorney and SRE, I understand what an SLA is. And you can absolutely get an SLA when you buy cloud services from many vendors, including AWS. Some vendors provide it at all price points; others include it at higher service tiers, without complex negotiations needed at all. And, yes, if it’s not on the menu, you may need to negotiate one. But you can’t conclusively say “they don’t offer one” unless you’ve actually gone to the company and asked.

https://aws.amazon.com/legal/service-level-agreements/

https://trailhead.salesforce.com/content/learn/modules/slack...

https://support.atlassian.com/subscriptions-and-billing/docs...

Before you casually accuse someone of not knowing what they’re talking about, first make sure you’re on firm ground yourself.


It seems like you could save a lot of time and confusion by talking about the SLA that you pay for from Anthropic instead of establishing your bona fides by posting links to various unrelated companies’ SLA pages.

Like how was your experience negotiating your SLA with Anthropic? What ballpark are you paying for the SLA with Anthropic that you have in place? How many 9s does your Anthropic SLA cover? Obviously you haven’t posted a half dozen times in this thread about how Anthropic by nature of existing offers SLAs without any knowledge of that, so some simple stuff about your SLA with Anthropic would be helpful.


I make no unqualified claims as to whether Anthropic offers an SLA. I never did. But I do know that it's unreasonable to claim they don't when you didn't even take the steps to conclusively determine it for yourself.

As I said: "I’m sure they’d love to hear from you, and they could probably deliver one to you for the right price. But it will be a high price."


Oh, well in that case, if posting URLs counts as proof of… something, there doesn’t appear to be any SLA page anywhere in their sitemap. https://www.anthropic.com/sitemap.xml

Maybe it is just common for enterprise SaaS businesses to offer SLAs without having a page about it though. Something like that could possibly be unjustifiably burdensome as well because it’s not like they could just type “make a page about how we offer SLAs” and have it magically appear


Not everything a business might be willing to do is listed on their public website.

That’s a good point. Having an SLA page is an indicator that a business offers SLAs, not having an SLA page is also an indicator that they offer SLAs, just secretly. If you think about it all of the people constantly complaining about uptime and saying stuff like “I would pay money for an SLA from Anthropic if I could” probably means that they are killing it with all those secret SLAs.

I mean obviously they have to offer them, because they exist, as otherwise you’d have to believe something crazy like “they don’t currently offer them” for reasons “that they haven’t disclosed”


Again, many companies will do things they don’t ordinarily offer for the right price. I’ve seen it happen myself (on both the buyer and seller side) on many occasions.

It goes to the extent of the company itself! Very few businesses publicize that they’re for sale or put their company’s purchase price on their website. But acquisitions happen all the time.

Anyway, I don’t appreciate your sarcasm coupled with what seems to be willful ignorance about how the world works, so I won’t be participating in this discussion with you anymore.


I don’t get it. If you wanted to convince everybody about a vast universe of secret business and your expertise in it, why would you start with telling people that weren’t able to get an SLA from Anthropic that Anthropic offers SLAs? And then admit that you don’t actually know and then double down?

Like if I wanted to convince people that In’N’Out has a secret menu (they do) I wouldn’t start by saying “They have the ingredients to make onion rings, therefore they sell onion rings” (they do not). They offer burgers with lettuce instead of a bun (“protein style”) though. That’s a fact that you can verify by going there or calling them and asking about it. I didn’t rely on my assumptions based on other fast food restaurants, I relied on my knowledge of the topic!

Edit: It seems like bad faith to admit that you’re using “probably” interchangeably with “I don’t know” and then editing in “for a billion dollars” several posts into a conversation.

I guess enjoy posting about entirely unrelated conversations in other threads though. (otterley’s post about my having previously had a short amicable exchange with dang in a different thread was deleted, but I’ll leave this part up. I think digging through people’s post histories to find unrelated grievances is icky, for lack of a better word, and wildly unhelpful for any type of discussion)

Even with the “for a billion dollars” addition, admitting “I don’t know” and “probably” are interchangeable doesn’t really change anything from a logical standpoint. Nobody argued against you not knowing, so I don’t understand the purpose of the repetition.


> why would you start with telling people that weren’t able to get an SLA

That hasn’t been established. There’s no evidence that they went to Anthropic and tried to negotiate one.

> that Anthropic offers SLAs

I didn’t. I said “they probably will for the right price.” There are two modifiers in that statement. And the price is unspecified. Their first offer could be a billion dollars. Too expensive? Negotiate down.


I would invite you to notice your interlocutor's assumptions, especially as revealed in his prior comment. Look at how he misunderstands the situation:

> If you wanted to convince everybody about a vast universe of secret business and your expertise in it...

> Like if I wanted to convince people that In’N’Out has a secret menu...

You are discussing business. He is understanding you to be attempting to "mog" him, because he cannot adopt a perspective wherein the conversation represents anything other than a vacuous social challenge or "brodown."

In short, you're wasting your time.


I am so old :(

I looked up “mogging” and I’d think “my assumptions about stuff are valid because I’m a lawyer and don’t know what you do” would count more as mogging than “that doesn’t quite sound right, this is a conversation about something specific and not your general cleverness” but I’ve got a Benny Hill archive to get through


Those are not assumptions on your interlocutor's part. You've embarrassed yourself quite badly, I'm afraid. I know you don't understand how, but that doesn't change the fact of it.

> You've embarrassed yourself quite badly, I'm afraid.

:( you are right. This isn’t the first time I’ve lost an argument because hours into a discussion somebody introduced “what if a billion dollars” or “magic amulet” or “ブルマの母” etc


It's just a world you've never seen. Don't take it too personally.

I appreciate your kindness. While I’ve got you, did you know that the Benny Hill show started in 1955 and a good chunk of what aired from then to 1969 was lost? There are a lot of fans that don’t even realize that what is sometimes labeled as season 1 is season 15! Crazy stuff!

A billion dollars is just an example. I could have said a million. When someone says "a high price" that's unspecified, you can use your imagination to hazard a guess at what that might be. Such a figure might seem unreasonable or unrealistic to you, but deals are done between companies under terms most individuals wouldn't come close to considering.

The only reason I mentioned being an attorney was because someone in the thread above accused me of not understanding SLAs. I don't ordinarily bring it up unless we're talking about law or contracts and I feel the need to defend myself or correct misunderstandings. I don't try to use it to browbeat anyone into submission, although I do believe that respect for others' lived experiences and education is relatively uncommon here on HN.

I also don't care for my words to be misconstrued to mean something I didn't say. I rarely speak in absolutes because I've learned over time that there are very few absolutes in the world. Thus, I include qualifying language in nearly everything I write. So when someone accuses me of making claims of certainty that I didn't make, I can get pretty defensive about that.



What right as a consumer do you have that is pertinent here, other than to have the vendor adhere to the terms of the agreement you have with them?

Anthropic has many customers despite the fact that they have occasional problems. They’re not suing Anthropic because Anthropic isn’t promising in its agreement something they can’t deliver.

I think you’re reading into the agreement something that isn’t there, and that’s the cause of your confusion.


I am not reading into an agreement; I am saying there is no agreement to be found that ensures service delivery and the associated liability that would come with any SLA. Also, where is the Anthropic SLA for Enterprise?

Does it exist?

Just because people pay for things doesn't mean they know or understand what they are paying for. Nor is there the legal precedent to actually understand where the rub lies or how that impacts business.


> Just because people pay for things doesn't mean they know or understand what they are paying for.

I believe, respectfully, that’s precisely what is happening in this thread because you keep complaining about the absence of an SLA that was never in the agreement, as though it is—or is supposed to be—there, and therefore the existence of some “rights” that would flow from that.


There are no SLAs in any agreement; that's the problem.


I know this is an unpopular opinion among freedom maximalists, but:

It’s precisely because CloudFlare isn’t responding like other CDNs to reasonable demands to cut off pirate origin sites that this mess exists. If they reacted quickly to remove configurations that are obviously facilitating copyright infringement, Spain wouldn’t resort to full scale ASN blocking.

How do we know it’s CloudFlare? Because other CDNs like CloudFront, Akamai, Fastly, etc. respond to takedown demands and aren’t being blocked. (Those also cost money and require customer identification.)

In an escalating war between the state and a corporation, the state will always prevail if they have the public’s backing. In Spain it’s clear that most people are happy to watch the match through legitimate channels even at the cost of blocking CloudFlare.


Sounds like the solution is for legitimate services to move away from Cloudflare. They contribute to the single point of failure by remaining their customers.

> It’s precisely because CloudFlare isn’t responding like other CDNs to reasonable demands to cut off pirate origin sites that this mess exists. If they reacted quickly to remove configurations that are obviously facilitating copyright infringement, Spain wouldn’t resort to full scale ASN blocking.

Apropos of anything else, CF is (reasonably) requiring a court order to remove offending material rather than just "well, company said so, so eh, just do as they say". La Liga complains that "oh, that's too slow for what we want" and just got a blanket ruling.

I am not a fan of CF but your argument seems to be "CF should just roll over any time someone says "hey, delete this", because, obviously, everyone knows it's problematic, right? Right?".


At least the DMCA in the U.S. has guardrails: not just anyone can send a takedown demand for everything. The requester has to identify the works and declare under penalty of perjury that they are operating on behalf of the owner. I imagine the equivalent EU law has similar requirements.

CloudFlare uses legal chicanery to try to subvert the DMCA by claiming that because they’re not the origin server, they’re not subject to takedown demands. So far no court has told them to knock it off. I expect that day will eventually come. Every lawsuit against them to date has ended in a settlement because CloudFlare would rather pay up than get an unfavorable ruling on the books.

CloudFlare has consistently treated loss of DMCA safe harbor protection as a material business risk; it’s been cited in every SEC filing from the 2019 IPO S-1 through the FY2025 10-K.


> At least the DMCA in the U.S. has guardrails: not just anyone can send a takedown demand for everything. The requester has identify the works and declare under penalty of perjury that they are operating on the behalf of the owner.

You'd think so, but no.

DMCA came into effect 28 years ago. All those decades, all those billions of takedowns, and you don't even need the fingers of one hand to count those who've been hit with perjury for a false takedown request, because the number is ... zero.


You might misunderstand what the law requires. The person making the complaint (demand) only has to declare under penalty of perjury that they represent the copyright holder. It does not require them, under penalty of perjury, to be correct about the underlying facts.

See 17 U.S.C. 512(c)(3)(A):

"(A) To be effective under this subsection, a notification of claimed infringement must be a written communication provided to the designated agent of a service provider that includes substantially the following: ...

"(vi) A statement that the information in the notification is accurate, and under penalty of perjury, that the complaining party is authorized to act on behalf of the owner of an exclusive right that is allegedly infringed."

In other words: someone issuing a notice of infringement relating to a Disney work must declare under penalty of perjury that they represent Disney. They don't have to declare under penalty of perjury that the work is in fact a Disney work, that the title is correct, that the use in question is not fair use, etc.

This would explain why you're not seeing what you expect to see.


Nobody cares about the DMCA guardrails and they are never meaningfully enforced. Case in point, Anthropic DMCAing thousands of repositories that simply mentioned the word "claude".

Can you explain how your example supports your conclusion? I don't follow.

So use S3.

While not obvious from the article, it appears that they want something S3-like that isn’t from Amazon, and possibly want to self-host it. The article could be much clearer about the goals.

Ah, thanks. Yeah I was confused because in his long list of vendors he didn't mention Wasabi, Backblaze etc. It appears that I do not know the context of his post.

or cloudflare R2 for that matter (very useful for egress-heavy workloads for which it is ~free)

I was curious why this didn't come up in the article

I’ve never had an issue with Backblaze. I mirror my buckets to iDrive who, so far, have also been perfectly fine.

or Tigris

From the article:

> [W]e shipped an optimization. Detect duplicate files by their content hash, use hardlinks instead of downloading each copy.
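
A minimal sketch of the application-level approach that quote describes, for readers who want to see it concretely. This is not the article's code: the function names `file_digest` and `link_duplicates` are hypothetical, SHA-256 is an assumed choice of hash, and it deduplicates files already on disk rather than skipping downloads.

```python
import hashlib
import os
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def link_duplicates(paths) -> int:
    """Replace later files with identical content by hardlinks to the first.

    Returns the number of files that were replaced by a hardlink.
    """
    first_seen = {}  # digest -> first path seen with that content
    linked = 0
    for path in paths:
        original = first_seen.setdefault(file_digest(path), path)
        if original is path:
            continue  # first file with this content; keep it as-is
        if os.path.samefile(original, path):
            continue  # already the same inode, nothing to do
        # Create the link under a temporary name, then swap it into place,
        # so the duplicate is never missing if os.link fails.
        tmp = path.with_name(path.name + ".tmp-link")
        os.link(original, tmp)
        os.replace(tmp, path)
        linked += 1
    return linked
```

Note that `os.link` raises `OSError` once a filesystem's per-inode link limit is reached, so a real implementation would need a fallback (for example, a plain copy).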


I meant TRANSPARENT filesystem level dedupe. They are doing it at the application level. filesystem level dedupe makes it impossible to store the same file more than once and doesn't consume hardlinks for the references. It is really awesome.

Filesystem/file level dedupe is for suckers. =D

If the greatest filesystem in the world were a living being, it would be our God. That filesystem, of course, is ZFS.

Handles this correctly:

https://www.truenas.com/docs/references/zfsdeduplication/


I was talking about block level dedupe.

I thought you might be.

I just wanted to mention ZFS.

Have I mentioned how great ZFS is yet?


ZFS is great! However, it's too complicated for most Linux server use cases (especially with just one block device attached); it's not the default (root filesystem); and it's not supported for at least one major enterprise Linux distro family.


File system dedupe is expensive because it requires another hash calculation that cannot be shared with application-level hashing, is a relatively rare OS-fs feature, doesn't play nice with backups (because files will be duplicated), and doesn't scale across boxes.

A simpler solution is application-level dedupe that doesn't require fs-specific features. Simple scales and wins. And plays nice with backups.

Hash = sha256 of file, and abs filename = {{aa}}/{{bb}}/{{cc}}/{{d}} where

aa = hash 2 hex most significant digits

bb = hash next 2 hex digits

cc = hash next 2 hex after that

d = remaining hex digits
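
The layout above can be sketched in a few lines. This is only an illustration under assumptions: `content_path` and `store` are hypothetical names, and the whole file is hashed in memory for brevity.

```python
import hashlib
from pathlib import Path


def content_path(root: Path, data: bytes) -> Path:
    """Map content to root/aa/bb/cc/d, split from its SHA-256 hex digest."""
    digest = hashlib.sha256(data).hexdigest()
    # aa, bb, cc = first three pairs of hex digits; d = the remaining digits
    return root / digest[0:2] / digest[2:4] / digest[4:6] / digest[6:]


def store(root: Path, data: bytes) -> Path:
    """Write data at its content-derived path unless it is already there."""
    target = content_path(root, data)
    if not target.exists():
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)
    return target
```

Storing the same bytes twice returns the same path and writes only once, which is where the dedupe comes from.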


All good backup software should be able to do deduped incremental backups at the block level. I'm used to Veeam and Commvault.

That costs even more non-reusable time and effort. It's simpler to dedupe at the application level rather than shift the burden onto N things. I guess you don't understand or appreciate simplicity.

This article shows it really isn't that simple and is easy to mess up. Who cares if your storage and backup software both dedupe?

For ZFS, at least, `zfs send` is the backup solution. And it performs incremental backups with the `-i` argument.

zfs send is really awesome when combined with dedupe and incremental

Another reason to use XFS -- it doesn't have per-inode hard link limits.

(Some say ZFS as well, but it's not nearly as easy to use, and its license is still not GPL-friendly.)


XFS on mdraid is what I use on my homelab NAS across several giant RAID arrays. While it lacks some integrity and CoW features, it's really, really stable. I had ZoL ZFS troubles that the maintainers shrugged off, which required transferring everything to another volume, so I won't ever use or recommend ZFS unless it's Sun-Oracle.

Those were not his ideas. Before Git, the Linux kernel team was using BitKeeper for DVCS (and other DVCS implementations like Perforce existed as well). Git was created as a BitKeeper replacement after a fight erupted between Andrew Tridgell (who was accused of trying to reverse engineer BitKeeper in violation of its license) and Larry McVoy (the author of BitKeeper).

https://graphite.com/blog/bitkeeper-linux-story-of-git-creat...

You may find this 10-year-old thread on HN enlightening, too: https://news.ycombinator.com/item?id=11667494


I agree and that’s the point I was trying to make.

Linus’s contribution is a great one. He learned from prior tools and contributions, made a lot of smart technical decisions, got stuff moving with a prototype, then displayed good technical leadership by handing it off to a dedicated development team.

That’s such a good lesson for all of us devs.

So why the urge to lie and pretend he coded it in a week with no help? I know you’re not saying this, but this is the common myth.


Might the conclusions be correct even if some of the facts are not? Even a stopped clock is right twice a day. And, "approximately correct" is still sometimes valuable.

I think the court dropped the ball here. On the one hand, I think they were right that using existing works--copyrighted or otherwise--to train a model was transformative fair use. On the other hand, Anthropic and others trained their models on illicit copies of the works; they (more often than not) didn't pay the copyright holders.

There's a doctrine in Fourth Amendment law called "fruit of the poisonous tree." The general rule is that prosecutors don't get to present evidence in a criminal trial that they gained unlawfully. It's excluded. The jury never gets to see it even if it provides incontrovertible evidence of guilt. The point is to discourage law enforcement from violating the rights of the accused during the investigative process, and to encourage them to obtain a warrant as the Amendment requires.

It seems to me that the same logic ought to be applied to these companies. They want to make money by building the best models they can. That's fine! They should be able to use all the source data they can legitimately obtain to feed their training process. But if they refuse to do so and resort to piracy, they mustn't be allowed to claim that they then used it fairly in the transformative process.


I mean, that is what the court said! Training on pirated data was not fair use. Training on legally acquired data is fair use.

Anthropic legally acquired the data and re-trained on it before release.


It did not say that. See Judge Alsup's order (https://fingfx.thomsonreuters.com/gfx/legaldocs/jnvwbgqlzpw/...), pp. 29-30, Section IV(B)(ii) ("The Pirated Library Copies").

"[T]he test requires that we contemplate the likely result were the conduct to be condoned as a fair use — namely to steal a work you could otherwise buy (a book, millions of books) so long as you at least loosely intend to make further copies for a purportedly transformative use (writing a book review with excerpts, training LLMs, etc.), without any accountability."

See also p. 31:

"The downloaded pirated copies used to build a central library were not justified by a fair use. Every factor points against fair use. Anthropic employees said copies of works (pirated ones, too) would be retained 'forever' for 'general purpose' even after Anthropic determined they would never be used for training LLMs. A separate justification was required for each use. None is even offered here except for Anthropic’s pocketbook and convenience."

Despite this consideration, the court still found for Anthropic on the question of fair use.


I don't read how that opposes what I said, that's part of the "training on pirated data is not fair use." That said, I am not a lawyer. From those pages:

> The copies used to train specific LLMs were justified as a fair use.

This is (in my understanding) because those were not the pirated copies.

> The copies used to convert purchased print library copies into digital library copies were justified, too, though for a different fair use.

Buying a book and then digitizing it for purposes of training is fair use.

> The downloaded pirated copies used to build a central library were not justified by a fair use.

Piracy is not fair use, you quoted this part as well.

In the conclusions section at the end of p. 31:

> This order grants summary judgment for Anthropic that the training use was a fair use. And, it grants that the print-to-digital format change was a fair use for a different reason. But it denies summary judgment for Anthropic that the pirated library copies must be treated as training copies.

Training is fair use. Pirating is not fair use, and therefore, you can't train on that either.

What part am I missing?


I think that's a reasonable way to interpret the court's order, but unfortunately the judge didn't really articulate the consequences of training on pirated copies being "not fair use" as clearly as I would have liked. Does that mean they're simply liable for infringement of those works, or does it mean that they'd be enjoined from using them altogether to train the model? The genie was out of the bottle; how could it be put back in?

Anthropic settled the case with the publishers just a few months later, leaving the question mostly unsettled still.


I see. Thanks. I cannot wait until this is settled law too.

Nothing today; but in a democracy, we have the power to make it possible, if people vote the right way.
