The problem with copyright law is you need, well, copies. AI systems don’t have a database of images that they reference. They learn like we do. When you picture SpongeBob in your mind, you’re not pulling up a reference image from a database. You just “learned” what he looks like. That’s how AI models work. They are like giant strings of math loosely modeled on the structure of the human brain. You train them by showing them a bunch of images: this is SpongeBob, this is a horse, this is a cowboy hat. The model learns what these things are, but doesn’t literally copy the images. Then when you ask for “SpongeBob on a horse wearing a cowboy hat,” the model uses the patterns it learned to produce the image you asked for. When you’re doing the training, presumably you made copies of the images for that (which is arguably fair use), but the model itself has no copies. I don’t know how all of this shakes out, since I’m not an expert in copyright law. But I do know an essential element is the existence of copies, which AI models do not contain. That’s why these lawsuits haven’t gone anywhere yet, and why AI companies and their lawyers were comfortable enough to invest billions in this in the first place. I mostly just want to clear up the “database” misconception since it’s pretty common.
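A toy sketch of the “learns patterns, doesn’t keep the images” idea, assuming a one-line model (y = w·x + b) as a stand-in for an image generator. The names `make_examples` and `train` are made up for illustration. Every training example gets read on every pass (so the data does have to exist somewhere while training runs), but what comes out at the end is just two learned numbers, whether you trained on 10 examples or 500. This only illustrates the general principle; it doesn’t settle whether a huge model can end up effectively memorizing particular images.

```python
import random

def make_examples(n, seed=0):
    """Hypothetical toy 'training set': n points near the line y = 3x + 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.random()
        data.append((x, 3 * x + 1 + rng.gauss(0, 0.1)))
    return data

def train(data, epochs=500, lr=0.5):
    """Fit y = w*x + b with plain gradient descent on mean squared error.

    Every example is read on every pass, but the only thing kept
    afterwards is the pair of learned numbers (w, b).
    """
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return (w, b)

if __name__ == "__main__":
    for n in (10, 500):
        model = train(make_examples(n))
        print(n, "examples ->", len(model), "stored numbers:", model)
```

The “model” you’d ship is the returned tuple; the list of examples can be thrown away once training is done.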
It doesn’t matter whether actual copies of the original images exist; reproducing copyrighted material like characters is still infringement. It’s the same whether it’s done by a human or a machine. The real question is who is liable. Obviously whoever distributes the images is liable (except for those exempted by Section 230), but there’s no precedent for the trainers.
I think there are arguments to be made for fair use with open models run by end users where there is no profit motive, but for closed models where users pay, the trainers are potentially profiting directly from the creation of infringing work.
You can’t sue the paint and brush manufacturer because their products made it possible for John Doe to make replicas or copy someone’s style. Generative AI and LLMs are a tool, not a product. It is the exploitation of them as a product that is the problem. I can make the same stuff in Blender or some shit program from Autodesk, and you won’t sue them. No one tells people what to prompt. Current AI is like the internet: it is access to enormous amounts of human knowledge, but with more practical utility and a few caveats. AI just happened a lot quicker, and the Luddites and lawyers are going to whine as much as the criminal billionaire class is going to exploit those who fail to understand the technology. Almost all of these situations/articles/lawsuits/politics are about trying to create a monopoly for Altman and prevent public adoption of open-source offline AI as much as possible. If everyone had access to a 70B or larger LLM right now, the whole internet would change overnight. You wouldn’t need search engines or much of the internet’s structure of exploitation any more. If a company can control this technology, with majority adoption, that company will be the successor of Microsoft and then Google. All the peripheral nonsense is about controlling the market by any means necessary, preventing open-minded people from looking into the utility, and playing gatekeeper to the new world paradigm of the next decade or two.
Paint-making companies typically don’t have massive databases of illegally obtained copies of other people’s copyrighted images. Nor does the manufacture of paint fundamentally require the existence of such a database. That’s where the “it’s just a tool” argument falls apart.
I love your enthusiasm, though. But to think that giving everyone access to a massive LLM would rid the internet of exploitation is extremely naïve and hopelessly optimistic.
How do you show a computer something? Do you perhaps add pictures to a database that the program then processes? I understand it’s not a folder called SpongeBob, but at some point somebody fed it pictures of SpongeBob, and now those pictures exist in a database.
The reason the legal system is slow is that it’s complicated, and everyone turns into a philosophy major when discussing things like what a database is, yet somehow uses words like “show” without any explanation.
The reason investors are comfortable pouring billions into AI is that they either think they are going to make the money back before regulation catches up, or they are just coked-up maniacs investing in anything that sounds shiny.
Saving pictures of SpongeBob in a file system to use as drawing reference also doesn’t equal copyright infringement.