
Even beyond that, the initial legal opinion we do have did in fact point to training being fair use: https://www.reuters.com/legal/litigation/anthropic-wins-key-...

However, I don't feel comfortable suggesting that this is settled just yet. One district judge's opinion does not mean that future cases won't disagree, and we may at some point get explicit legislation one way or the other.




I think the court dropped the ball here. On the one hand, I think they were right that using existing works--copyrighted or otherwise--to train a model was transformative fair use. On the other hand, Anthropic and others trained their models on illicit copies of the works; they (more often than not) didn't pay the copyright holders.

There's a doctrine in Fourth Amendment law called "fruit of the poisonous tree." The general rule is that prosecutors don't get to present evidence in a criminal trial that they gained unlawfully. It's excluded. The jury never gets to see it even if it provides incontrovertible evidence of guilt. The point is to discourage law enforcement from violating the rights of the accused during the investigative process, and to push them to obtain a warrant as the Amendment requires.

It seems to me that the same logic ought to be applied to these companies. They want to make money by building the best models they can. That's fine! They should be able to use all the source data they can legitimately obtain to feed their training process. But if they refuse to do so and resort to piracy, they mustn't be allowed to claim that they then used it fairly in the transformative process.


I mean, that is what the court said! Training on pirated data was not fair use. Training on legally acquired data is fair use.

Anthropic legally acquired the data and re-trained on it before release.


It did not say that. See Judge Alsup's order (https://fingfx.thomsonreuters.com/gfx/legaldocs/jnvwbgqlzpw/...), pp. 29-30, Section IV(B)(ii) ("The Pirated Library Copies").

"[T]he test requires that we contemplate the likely result were the conduct to be condoned as a fair use — namely to steal a work you could otherwise buy (a book, millions of books) so long as you at least loosely intend to make further copies for a purportedly transformative use (writing a book review with excerpts, training LLMs, etc.), without any accountability."

See also p. 31:

"The downloaded pirated copies used to build a central library were not justified by a fair use. Every factor points against fair use. Anthropic employees said copies of works (pirated ones, too) would be retained 'forever' for 'general purpose' even after Anthropic determined they would never be used for training LLMs. A separate justification was required for each use. None is even offered here except for Anthropic’s pocketbook and convenience."

Despite this consideration, the court still found for Anthropic on the question of fair use.


I don't see how that contradicts what I said; that's part of the "training on pirated data is not fair use" point. That said, I am not a lawyer. From those pages:

> The copies used to train specific LLMs were justified as a fair use.

This is (in my understanding) because those were not the pirated copies.

> The copies used to convert purchased print library copies into digital library copies were justified, too, though for a different fair use.

Buying a book and then digitizing it for purposes of training is fair use.

> The downloaded pirated copies used to build a central library were not justified by a fair use.

Piracy is not fair use; you quoted this part as well.

In the conclusion section at the end of p. 31:

> This order grants summary judgment for Anthropic that the training use was a fair use. And, it grants that the print-to-digital format change was a fair use for a different reason. But it denies summary judgment for Anthropic that the pirated library copies must be treated as training copies.

Training is fair use. Pirating is not fair use, and therefore, you can't train on that either.

What part am I missing?


I think that's a reasonable way to interpret the court's order, but unfortunately the judge didn't really articulate the consequences of training on pirated copies being "not fair use" as clearly as I would have liked. Does that mean they're simply liable for infringement of those works, or does it mean that they'd be enjoined from using them altogether to train the model? The genie was out of the bottle; how could it be put back in?

Anthropic settled the case with the publishers just a few months later, leaving the question mostly unsettled still.


I see. Thanks. I cannot wait until this is settled law too.


