‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says::Pressure grows on artificial intelligence firms over the content used to train their products

  • kibiz0r@lemmy.world · 11 months ago · 38 up, 11 down

    I’m dumbfounded that any Lemmy user supports OpenAI in this.

    We’re mostly refugees from Reddit, right?

    Reddit invited us to make stuff and share it with our peers, and that was great. Some posts were just links to the content’s real home: YouTube, a random WordPress blog, a GitHub project, or whatever. The post text, the comments, and the replies only lived on Reddit. That wasn’t a huge problem, because that’s the part that was specific to Reddit. And besides, there were plenty of third-party apps to interact with those bits of content however you wanted to.

    But as Reddit started to dominate Google search results, it displaced results that might have linked to the “real home” of that content. And Reddit realized a tremendous opportunity: They now had a chokehold on not just user comments and text posts, but anything that people dare to promote online.

    At the same time, Reddit slowly moved from a place where something might get posted by the author of the original thing to a place where you’ll only see the post if it came from a high-karma user or bot. Mutated or distorted copies of the original instance, reformatted to cut through the noise and gain the favor of the algorithm. Re-posts of re-posts, with no reference back to the original, divorced from whatever context or commentary the original creator may have provided. No way for the audience to respond to the author in any meaningful way and start a dialogue.

    This is a miniature preview of the future brought to you by LLM vendors. A monetized portal to a dead internet. A one-way street. An incestuous ouroboros of re-posts of re-posts. Automated remixes of automated remixes.

    There are genuine problems with copyright law. Don’t get me wrong. Perhaps the most glaring problem is the fact that many prominent creators don’t even own the copyright to the stuff they make. Copyright was invented to protect creators, but in practice this “protection” gets assigned to a publisher immediately after the protected work comes into being.

    And then that copyright – the very same thing that was intended to protect creators – is used as a weapon against the creator and against their audience. Publishers insert a copyright chokepoint between the two, and they squeeze as hard as they desire, wringing out every drop of profit, keeping creators and audiences far away from each other. Creators can’t speak out of turn. Fans can’t remix their favorite content and share it back to the community.

    This is a dysfunctional system. Audiences are denied the ability to access information or participate in culture if they can’t pay for admission. Creators are underpaid, and their creative ambitions are redirected to what’s popular. We end up with an auto-tuned culture – insular, uncritical, and predictable. Creativity reduced to a product.

    But.

    If the problem is that copyright law has severed the connection between creator and audience in order to set up a toll booth along the way, then we won’t solve it by giving OpenAI a free pass to do the exact same thing at massive scale.

    • Milk_Sheikh@lemm.ee · 11 months ago · 6 up

      Mutated or distorted copies of the original instance, reformatted to cut through the noise and gain the favor of the algorithm. Re-posts of re-posts, with no reference back to the original, divorced from whatever context or commentary the original creator may have provided… This is a miniature preview of the future brought to you by LLM vendors. A monetized portal to a dead internet. A one-way street. An incestuous ouroboros of re-posts of re-posts. Automated remixes of automated remixes.

      The internet is genuinely already trending this way just from LLMs writing things like articles and bot reviews, listicle and ‘review’ websites that laser-focus on SEO hits, and social media comments and posts meant to propagandize or astroturf…

      We are going to live and die by how the Captcha-AI arms race is run against the malicious actors, but that won’t help when governments or capital give themselves root access.

    • flamingarms@feddit.uk · 11 months ago · 7 up, 3 down

      And yet, I believe LLMs are a natural evolutionary product of NLP and a powerful tool that is a necessary step forward for humanity. They are already capable of scaffolding out basic tasks exceptionally quickly. In them, I see the assumptions that all human knowledge is for all humans, that rudimentary tasks are worth automating, and that a truly creative idea is often seeded by information that already exists, so creativity can be sparked by something that has access to all information.

      I am not sure what we are defending by not developing them. Is it a capitalism issue of defending people’s money so they can survive? Then that’s a capitalism problem. Is it that we don’t want to get exactly plagiarized by AI? That’s certainly something companies are taking into account and need to keep taking into account. But researchers repeat research and come to the same conclusions all the time, so we’re clearly comfortable with sharing ideas. Even in the Writers Guild strikes in the States, both sides agreed that AI is helpful in script-writing; they just didn’t want production companies to use it as leverage to pay them less or not give them credit for their part in the production.

      • EldritchFeminity@lemmy.blahaj.zone · 11 months ago · 2 up, 1 down

        The big issue is, as you said, a capitalism problem, as people need money from their work in order to eat. But it goes deeper than that, and it doesn’t change the fact that something needs to be done to protect the people creating the stuff that goes into the learning models. Ultimately, it comes down to the fact that datasets aren’t ethically sourced and that people want to use AI to replace the same people whose work they used to create said AI, but it is also rooted in how society devalues creative work. People feel entitled to the work of artists. For decades, people have believed that artists shouldn’t be fairly compensated for their work, and the recent AI issue is just another stone on the pile. If you want to see how disgusting it is, look up stuff like “paid in exposure” and the other kinds of things people tell artists they should accept as payment instead of money.

        In my mind, there are two major groups when it comes to AI: those whose work would benefit from the increased efficiency AI would bring, and those who want the reward for work without actually doing the work or paying somebody with the skills and knowledge to do the work. Midjourney is in the middle of a lawsuit right now, and the developers were caught talking about how you “just need to launder it through a fine tuned Codex,” with “it” here being artists’ work. The vast majority of the time, these are the kinds of people I see defending AI; they aren’t people sharing and collaborating to make things better; they’re people who feel entitled to benefit from others’ work without doing anything themselves. Making art is about the process and developing yourself as a person as much as it is about the end result, but these people don’t want all that. They just want to push a button and get a pretty picture or a story or whatever, and then feel smug and superior about how great an artist they are.

        All that needs to be done is to require the company that creates the AI to pay a licensing fee for copyrighted material, while allowing copyright-free material, and content they have gotten express (opt-in) permission to use, to be used freely. Businesses with huge libraries of copyright-free music that you pay a subscription fee to use work like this. They pay musicians to create songs for them; they don’t go around downloading songs and cutting them up to create synthesizers that they sell.