vessenes an hour ago

Just read (most) of the ruling.

The ruling is fine. The judge is not Alsop but he’s not technically incompetent either, which is good.

The torrent comments in general are nothing to get het up about. In summary:

1) Meta wanted to download, but not upload, LibGen and Anna’s Archive after they couldn’t find anyone with the rights to license the books who would talk to them.

2) They didn't want to distribute, just download. An engineer put in evidence that they had successfully restricted seeding.

3) Late in the case, Silverman et al. claimed that while Meta hadn't been seeding, it had been leeching, and that this counts as distribution (?!)

The judge commented as follows:

1. Just downloading is probably fine because it could be for purposes of fair use, and fair use concerns generally trump even good faith and fair dealing.

2. Nobody could get Llama to spit out more than a 60-token quote from a plaintiff's book; thus Llama is not made for infringement.

3. We will need more briefing on this leeching issue, which is alleged to be a form of distribution.

The judge lays out what he thinks would be a workable claim to get to the Supreme Court: that these LLMs defeat the purpose of our copyright laws by reducing the amount of human creativity and expression available to those who want to create economic value through creativity. E.g., where will the jobs for biographers go?

I will say that debate is an active topic worldwide right now and a good question, with answers ranging from: “this maximizes human creativity bro” to “laser printers disrupted lead type foundries, that was great” to “nobody will ever write again and we are murdering our creative class and burning down their craftsman mid century modern homes.”

It seems to me this will get taken up by SCOTUS next session, but also that it’s a little early; we just don’t know exactly where this is going. Either way, I expect our current judge will learn that leeching is precisely NOT seeding once the defense legal team has time to brief him.

  • bilekas 39 minutes ago

    > 1. just downloading is probably fine because it could be for purposes of fair use, and fair use concerns generally trump even good faith and fair dealing

    This smells a bit strange to me; it's a "for-profit" company. Fair use is a bit of a pipe dream here. Also, are there no conditions on the source of the content? If the source was obtained illegally, i.e. through illegal distribution of copyrighted materials, does that not play a part?

    Also, will this set a precedent that if I download HBO's collection but don't seed it or use it for any commercial reason, it will be considered fair use?

    This whole thing just reeks of "rules for thee but not for me".

    • bawolff 22 minutes ago

      > This smells a bit strange to me; it's a "for-profit" company. Fair use is a bit of a pipe dream here

      Why do you think that? For-profit companies use fair use all the time. It's not unusual.

      Yes, a use being non-commercial can be a factor in favour of fair use, but it's just one factor. It's definitely not a necessary condition, nor is it a sufficient one.

      > If the source was obtained illegally, i.e. through illegal distribution of copyrighted materials, does that not play a part?

      Why would it? That isn't really how copyright works. It's about the right to "copy" (or not to), not about distribution methods.

      > Also, will this set a precedent that if I download HBO's collection but don't seed it or use it for any commercial reason, it will be considered fair use?

      No. That's not the reason this is potentially fair use.

      [Although, as an aside, it used to be that in Canada only uploading was illegal.]

      • bilekas 4 minutes ago

        > Why would it? That isn't really how copyright works. It's about the right to "copy" (or not to), not about distribution methods.

        Okay, but the owners gave no permission to "copy" the content. I wish I knew more about it all, but it seems to me that quoting a snippet from a book while commenting on it would be classic fair use. Consuming the entire collection for free in order to charge for transformative services really doesn't feel 'fair'.

        And again, I can't shake the feeling that if I did this and was brought to court, I would be laughed at for claiming fair use.

    • graemep 20 minutes ago

      > This smells a bit strange to me; it's a "for-profit" company. Fair use is a bit of a pipe dream here

      Fair use can be for profit.

      > if I download HBO's collection but don't seed it or use it for any commercial reason, it will be considered fair use

      No, seeding is automatically not fair use. Leeching does not automatically mean it's not fair use, just that it might be.

Incipient 2 hours ago

I'm not sure how LLMs count as fair use. Is it just that because we can't show HOW the works have been encoded in the model, it's fair use? Or that statistical representations are fair use? Or is it the generation aspect? I can't sell you a Harry Potter book, but I can sell you some service that lets you generate it yourself?

I feel like this has really blown a hole in copyright.

  • JimDabell 39 minutes ago

    > Is it just that because we can't show HOW the works have been encoded in the model, it's fair use?

    Describing training as “encoding them in the model” doesn’t seem like an accurate description of what is happening. We know for certain that a typical copyrighted work that is trained on is not contained within the model. It’s simply not possible to represent the entirety of the training set within a model of that size in any meaningful way. There are also papers showing that memorisation plateaus at a reasonably low rate according to the size of the model. Training on more works doesn’t result in more memorisation, it results in more generalisation. So arguments based on the idea that those works are being copied into the model don’t seem to be founded in fact.

    > I can't sell you a Harry Potter book, but I can sell you some service that lets you generate it yourself?

    That’s the reason why cases like this are doomed to fail: No model can output any of the Harry Potter books. Memorisation doesn’t happen at that scale. At best, they can output snippets. That’s clearly below the proportionality threshold for copyright to matter.

    • mattigames 17 minutes ago

      Copyright was built to protect artists from unauthorized copying by humans, not by machines (I mean machines wildly beyond anything imaginable at the time), so the input and output limitations of humans were absolutely taken into account when such laws were written. If LLMs were treated in a similar fashion, authors would have had a say in whether their works could be used as inputs to such models, or could forbid it.

      • JimDabell 7 minutes ago

        This reply doesn’t seem to relate to either of the points I made.

        • mattigames 6 minutes ago

          Yes it does; the spirit of the law matters in many cases.

          • JimDabell 2 minutes ago

            I made two points:

            - It is not accurate to describe training as “encoding works into the model”.

            - A model cannot recreate a Harry Potter book.

            Neither of these have anything to do with “the spirit of the law”.

  • duskwuff an hour ago

    Same. If I invented a novel new way of encoding video and used it to pack a bunch of movies into a single file, I would fully expect to be sued if I tried distributing that file, and equally so if I let people use a web site that let them extract individual videos from that file. Why should text be treated differently?

    • adastra22 39 minutes ago

      You are allowed to quote from copyrighted works without needing permission. Trying to assert copyright because of a quote of, say, a mere 60 words in length would get you thrown out of any judge’s court.

      It was shown, in this case, that the llms wouldn’t generate accurate quotes more than 60 words in length.

      This is not comparable to encoding a full video file.

  • userbinator 2 hours ago

    One should also keep in mind the countless people who got much of their education from pirated books.

    • bilekas 36 minutes ago

      This seems to be a bad faith argument, although it would be amusing to see Facebook use it.

      "Your honor, it's fair use because students have downloaded educational books for years."

    • Arainach 2 hours ago

      That has nothing to do with whether LLMs are fair use.

    • timeon 22 minutes ago

      People can process only a fraction of that, and it takes them much more time. And I'm using 'process' here to meet you on your nihilist argument that these algorithms are the same as humans, which is pretty strange, because people barely acknowledge similarities with other mammals, yet suddenly software is their equal.

  • mattigames 34 minutes ago

    The word transformative was put there in a time of manual transformative processes, like when you paint something similar to what you saw in a painting by another artist, with all the limitations that entails, such as the time it takes you to look at that painting and the time it takes you to create the new one. That has nothing at all to do with the way LLMs operate. An honest assessment would have found that the word was meant for a wildly different use case and therefore required a bigger, more nuanced discussion.

    • bawolff 12 minutes ago

      > The word transformative was put there in a time of manual transformative processes, like when you paint something similar to what you saw in a painting by another artist

      Do you have any citation showing that this is how the word "transformative" was understood historically? Because what you're suggesting seems to be the opposite of what I've read.

      My understanding is that even back in the 1800s (e.g. https://en.wikipedia.org/wiki/Folsom_v._Marsh ), your example would not have been considered transformative if your intention was to make a similar painting to serve a similar purpose.

  • 7speter 2 hours ago

    The judge is claiming that because the use of the books is “so transformative,” using them to train an LLM is fair use.

    I’m not familiar with the facts of the case and IANAL, and it's late, but how did the plaintiffs determine their books were being used for training of the LLM? Was the model spitting out language that was similar or verbatim to their works?

    • bawolff 11 minutes ago

      > but how did the plaintiffs determine their books were being used for training of the LLM?

      I think Facebook admitted this. I don't think that fact is in dispute.

    • bilekas 34 minutes ago

      > The judge is claiming that because the use of the books is “so transformative,” using them to train an LLM is fair use.

      Maybe I'm mistaken, but shouldn't the material come from a legal source? This is not public domain material.

      Again, if I download HBO's entire catalog of TV shows and then make a "transformative" version on my iPhone, how can that be considered fair use?

      • bawolff 4 minutes ago

        > Maybe I'm mistaken, but shouldn't the material come from a legal source

        There is no such thing as a legal or illegal source, only legal or illegal uses.

        If the use was legal, then it doesn't matter where you got the material from. Similarly, if you got the material via more conventional means, it would still be copyright infringement if you used it in an illegal way.

        > Again, if I download HBO's entire catalog of TV shows and then make a "transformative" version on my iPhone, how can that be considered fair use?

        That wouldn't be considered transformative. In this context "transformative" means you transformed it into something with a different purpose than the original.

        However, if you, for example, made a video essay for YouTube talking about the themes (or whatever) of the TV show and including clips from it, that would be transformative and probably fine.

      • zx8080 19 minutes ago

        It will be, but in a slightly different way: it will be considered _fair_ for Warner Bros. to sue you dry.

    • Incipient an hour ago

      >The judge is claiming that because the use of the books is “so transformative,” using them to train an LLM is fair use.

      "You're doing something so critical to our (country's) success that we're OK waiving copyright." I get that: if the US doesn't do it, then China will (or already is).

      Interesting judgement, and interesting implications, if you are correct, haha.

JimDabell an hour ago

That’s one hell of a headline for a story about Meta winning summary judgement for most of the claims against them. You’d be forgiven for thinking Meta lost this case, going by the headline.

blitzar 2 hours ago

Surely it should be a whole separate copyright case with fines of up to $150,000 per work infringed.

  • Schnitz 2 hours ago

    It’s crazy, yet so predictable, that while the system tries to bankrupt individuals for torrenting a single book or movie, in this case the excuse “it was just to train an LLM” will fly. Imagine a private individual arguing that in court.

    • HPsquared 42 minutes ago

      Ironically, the Llama models enable people to fine-tune on their own material. A lot of people are doing exactly this.

ggm an hour ago

When do the outputs cease to be a derived work?

  • guappa 43 minutes ago

    When you're rich.

ngold 3 hours ago

Sort of, not really. They said the plaintiff's argument was non-relevant or something.

bawolff 2 hours ago

So the argument is that by torrenting ebooks, Meta provided bandwidth to the torrent network, and thus provided (financial??!) benefit to pirate sites?

I've got to be honest, that sounds extremely weak to me. The benefit to the pirate site of Meta joining the torrent swarm seems like it would be extremely slight.

  • Lio 2 hours ago

    The pirate sites? There is no pirate site hosting files here, BitTorrent is peer to peer.

    If it’s fine for a large corp. like Meta to pirate books then it’s fine for everyone else. If it’s a crime for ordinary consumers then it’s a crime for Meta too.

    Especially as Meta aren’t doing this for charity. They train LLMs for their own gain.

    • bawolff 29 minutes ago

      > The pirate sites? There is no pirate site hosting files here, BitTorrent is peer to peer.

      Or "shadow library" or whatever you want to call it. The argument, according to the article, is that the entity that created the torrent, which as far as I understand also operates a traditional website, benefits from Meta's actions.

      I think that is really far-fetched.

  • packetlost 2 hours ago

    The argument is that distribution is the infringement. This is basically the only thing torrenters get charged with.

    • bawolff 28 minutes ago

      That does not seem to be the argument that was presented in the article.