OpenAI now tries to hide that ChatGPT was trained on copyrighted books, including J.K. Rowling’s Harry Potter series

A new research paper laid out ways in which AI developers should try to avoid showing that LLMs have been trained on copyrighted material.

    • Corkyskog@sh.itjust.works
      English
      arrow-up
      10
      ·
      1 year ago

      They used to be a nonprofit that immediately turned into a for-profit once their product was refined. They took a bunch of people’s effort, whether it be training materials or the “training monkeys” using the product, and then slapped a huge price tag on it.

      • Touching_Grass@lemmy.world
        English
        arrow-up
        1
        arrow-down
        3
        ·
        1 year ago

        I didn’t know they were a nonprofit. I’m good as long as they keep the current model: release older models free to use while charging for extra or the latest features.

    • BURN@lemmy.world
      English
      arrow-up
      6
      arrow-down
      2
      ·
      1 year ago

      They’re stealing a ridiculous amount of copyrighted works to use to train their model without the consent of the copyright holders.

      This includes the single-person operations creating art that’s being used to feed the models that will take their jobs.

      OpenAI should not be allowed to train on copyrighted material without paying a licensing fee at minimum.

      • uzay@infosec.pub
        English
        arrow-up
        4
        arrow-down
        2
        ·
        1 year ago

        Also, Sam Altman is a grifter who gives people in need small amounts of Monopoly money to get their biometric data.

        • LifeInMultipleChoice@lemmy.ml
          English
          arrow-up
          3
          arrow-down
          1
          ·
          1 year ago

          So, a hypothetical here: if Reddit launched a system that let users trade karma in for real currency or some alternative, would all fan fiction and all other material created by fanboy accounts become copyright infringement, since they would now be making money off the original works?

      • Stamets@startrek.website
        English
        arrow-up
        1
        arrow-down
        1
        ·
        1 year ago

        “Stealing”.

        It cannot be theft as the product is publicly available and the original product is still available to other consumers.

        You can dislike this and you can argue against it, but it isn’t theft. It never has been and never will be, the same way piracy isn’t theft.

        People might respect this bizarre corporate protection stance if you used the correct terminology. And yes, you’re defending larger companies here, not individual artists. Copyright was invented for companies and corporations. They have extended copyright for decades to be able to hold on to stuff they believe to be theirs. They suppress creatives to take their work and put a copyright on it themselves.

        The only people you’re protecting with your argument are massive corporations. Have fun with that.

      • Touching_Grass@lemmy.world
        English
        arrow-up
        2
        arrow-down
        5
        ·
        1 year ago

        If they purchased the data or the data is free, it’s theirs to do what they want with, as long as they don’t violate the copyright by, for example, reselling the original work as their own. Training off it should not violate any copyright if the work was available for free or was purchased by at least one person involved. Capitalism should work both ways.

        • BURN@lemmy.world
          English
          arrow-up
          5
          arrow-down
          2
          ·
          1 year ago

          But they don’t purchase the data. That’s the whole problem.

          And copyright is absolutely violated by training off it. It’s being used to make money and no longer falls under even the widest interpretation of fair use.

            • BURN@lemmy.world
              English
              arrow-up
              4
              arrow-down
              1
              ·
              1 year ago

              It may be freely available for non-commercial works, e.g., photos on Photobucket, the Internet Archive’s free book archives, etc.

              Most everything is on the internet these days, copyrighted or not. I’m sure if I googled enough I could find the entire text of Harry Potter for free. I still haven’t purchased it, and technically it’s not legally freely available. But in training these models I guarantee they didn’t care where the data came from, just that it was data.

              I’m against piracy as well for the record, but pretty much everything is available through torrenting and pirate sites at this point, copyright be damned.

              • Touching_Grass@lemmy.world
                English
                arrow-up
                2
                arrow-down
                4
                ·
                edit-2
                1 year ago

                Don’t care. It’s not my problem or these LLMs’ problem that they don’t secure their copyright. They shouldn’t come asking others to pay for their failure to secure their data. I see it as a double-edged sword.

                I really hope this is a wake up call to all creative types to pack up and not use the internet like a street corner while they busk.

                If they want to come online to contribute like everybody else, to just have fun and post stuff, that’s great. But all of them are no different than any other greedy corporation. They all want more toll roads. When they do make it and earn millions and get our attention, they exploit it with more ads. It swallows all the free good content. Sites gear themselves towards these rich creators. They lawyer up and sue everybody and everything that looks or sounds like them. We lose all our good spaces to them.

                I hope LLMs finally allow regular people to shitpost in peace.

                • adrian783@lemmy.world
                  English
                  arrow-up
                  1
                  ·
                  1 year ago

                  Creative types are greedy for wanting compensation for their creations? Is a car mechanic greedy for wanting money for fixing your car?

          • GroggyGuava@lemmy.world
            English
            arrow-up
            1
            arrow-down
            2
            ·
            edit-2
            1 year ago

            You need to expand on how learning from something to make money is somehow using the original material to make money. Considering that’s how art works in general, I’m having a hard time taking the side of “learning from media to make your own is against copyright”. As long as they don’t reproduce the same thing as the original, I don’t see any issues with it. If they learned from Lord of the Rings to then make “The Lord of the Rings”, then yes, that’d be infringement. But if they use that data to make a new IP with original ideas, then how is that bad for the world or for artists?

            • BURN@lemmy.world
              English
              arrow-up
              3
              arrow-down
              1
              ·
              1 year ago

              Creating an AI model is a commercial work. They’re made to make money. Now these models are dependent on other artists’ data to train on. The models would be useless if they weren’t able to train on anything.

              I hold the stance that using copyrighted data as part of a training set is a violation of copyright. That still hasn’t been fully challenged in court, so there’s no specific legal definition yet.

              Because copyrighted materials are required to make the model function, I feel that they are using copyrighted works in order to build a commercial product.

              Also AI doesn’t learn. LLMs build statistical models based on sentence structure of what they’ve seen before. There’s no level of understanding or inherent knowledge, and there’s nothing new being added.