We present Audio Flamingo 3 (AF3), a fully open state-of-the-art (SOTA) large audio-language model that advances reasoning and understanding across speech, sound, and music. AF3 introduces: (i) AF-Whisper, a unified audio encoder trained using a novel strategy for joint representation learning across all 3 modalities of speech, sound, and music; (ii) flexible, on-demand thinking, allowing the model to do chain-of-thought-type reasoning before answering; (iii) multi-turn, multi-audio chat; (iv) long audio understanding and reasoning (including speech) up to 10 minutes; and (v) voice-to-voice interaction. To enable these capabilities, we propose several large-scale training datasets curated using novel strategies, including AudioSkills-XL, LongAudio-XL, AF-Think, and AF-Chat, and train AF3 with a novel five-stage curriculum-based training strategy. Trained on only open-source audio data, AF3 achieves new SOTA results on more than 20 (long) audio understanding and reasoning benchmarks, surpassing both open-weight and closed-source models trained on much larger datasets.
The current thing is that Meta is very vocal about the EU AI Act.
And they’re not wrong.
That doesn’t quite back up what you claimed, though. You wrote: “They want everyone else to lose rights, while they themselves retain full rights,”
Their claim of Fair Use seems straightforward. That’s not everyone else losing their rights. I am not aware where they lobby for “full rights” for themselves, whatever that means.
And Meta are super strict with trademark law and parts of copyright when it’s the other way around. I recently spent some time reading how you can and cannot use or mention their trademark or embed it into your website. And they’re very strict if it’s me using their stuff. The other way around they want free rein.
There are different kinds of intellectual property. Trademarks are different from copyright. Then there’s also trade secrets, patents, publicity rights, privacy, etc.
Generally, you can use any Trademark as long as you don’t use it for trade or harm the business that owns it. I’m not going to look it up but I’m guessing that the rules are around not giving a misleading impression of your page’s relationship with Meta.
As for copyright, when you are in the US you can make Fair Use of their materials, regardless of what the license says.
That you can’t do that in Europe is not Meta’s fault.
I mean manufacturing a supply chain for them where they get things practically for free.
Oh. You’re talking about Net Neutrality and not copyright. I’m afraid I don’t know enough about the network business to form an opinion on that.
I don’t think what happened to you was a subsidy, though. You’re offering something for free, and apparently Alibaba took advantage of you for that. That’s just how it is, sometimes.
We were talking about a specific lecture that questions the entire concept of copyright as we have it now.
I touched on a lot of subjects. In a nutshell, I am against rent-seeking. No more, no less.
That’s correct. My point was that they’re following an agenda as well. But they’re correct that that signature has consequences and doesn’t translate into unlimited corporate growth.
where they lobby for “full rights” for themselves, whatever that means
OpenAI is very secretive and not transparent at all. They promised to release a model which they’ve delayed several times now. But other than that, they haven’t written papers for some time now, they don’t share stuff. And they do other small little things for their own benefit and so the competition can’t do the same. They even go ahead and keep simple numbers like the model size a big trade secret. They guard everything closely and they like it that way. It’s the literal opposite of free exchange of information. And they do that with most of their business decisions.
And Meta’s models come with a license plus an EULA. And I’ve lost track of the current situation, but as a European I’ve been prohibited from downloading and using Meta’s LLMs for some time. Sometimes they also want my e-mail address, I have to abide by their terms and I don’t like the terms… Those are their rights. And they’re making use of them. It’s not like I can just download it and do whatever because that would be Fair Use as well… They retain rights, and many of them.
Trademark is definitely part of the conversation. Can models paint a Mickey Mouse? Other trademarked stuff? Sure they do. And it’s the same trademark that protects fictional characters and other concepts. So once AI ingests that, it needs addressing as well. And it’s not just that. They (Meta/Instagram) also address copyright and they also have a lot of rules about that. With that specific thing I was more concerned with their logo, though, and that is mostly trademark law.
You’re talking about Net Neutrality and not copyright […]
No, I am talking about copyright. Net neutrality has nothing to do with any of this.
[…] and apparently Alibaba took advantage of you for that. That’s just how it is, sometimes.
Yeah, that’s kind of my point. They’re taking advantage of people. And kind of in a mischievous way, because they’ve thought about how they can defeat the usual defenses. How do you think I’m supposed to deal with that? Let everyone take advantage of me? Take down my server and quit this place?
I am against rent-seeking. No more, no less
I’m with you on this. As long as it’s fair. Make sure AI companies aren’t rent-seeking either. Because currently that’s a big part of their business model.
I mean what do you think the big piles of information they gather for training are? That they don’t share and do contracts and even buy up companies to get exclusive access… How they gobble up the resources? And how prices for graphics cards skyrocket first due to crypto and then due to AI? That’s kinda rent-seeking on several different levels…
Scarlett Johansson […] that turned out to be a false
It’s definitely inspired by her performance on “Her”. Sam Altman himself made a reference, connecting his product and that specific movie. It’s likely not a coincidence. And they kind of followed up and removed that voice along with a few others. Clearly not because they were right and this is an uncontroversial topic.
as a European I’ve been prohibited from downloading and using Meta’s LLMs for some time.
The vision models are not for the EU. Meta trained them on Facebook data. The EU did not allow that. Meta said that this would mean that their models would not have the necessary knowledge to be useful for European users, and disallowed their use in the EU. It also means that some EU regulations don’t apply, but they did not give that as a reason, I think.
In any case, it seems quite fair to me. If Europe does not want to pitch in, but only makes demands, then why should it reap the benefits?
Some other recent open models by Tencent and Huawei are also not for the EU. That is in response to the AI Act. I am surprised that it is not a standard clause yet.
And they’re making use of them. It’s not like I can just download it and do whatever because that would be Fair Use as well… They retain rights, and many of them.
No. They can’t override fair use. That’s the point of fair use. You cannot do what you like with it because you are in Europe and don’t have fair use.
I really don’t understand how that is supposed to make sense. You demand that American companies should be giving more free stuff to Europe. But also, they should be following European laws in the US and pay rent-seekers for the privilege. It’s ridiculous.
No, I am talking about copyright. Net neutrality has nothing to do with any of this.
I don’t see how that is about copyright.
Make sure AI companies aren’t rent-seeking either. Because currently that’s a big part of their business model.
Back that up or retract the statement.
It’s definitely inspired by her performance on “Her”. Sam Altman himself made a reference, connecting his product and that specific movie.
What you are saying is that someone who sounds a bit like Scarlett Johansson must get permission from her to speak in public.
Maybe there is a language issue here. But from what you are writing, you are not against rent-seeking. You demand privileges and free money for special people; a new aristocracy. You even want privileges for Meta, even though you use these privileges as arguments why these privileges should exist. This is all absolutely ridiculous.
Let me rephrase it a bit: OpenAI is one of the prime examples. They wrote one or two scientific papers early on. And then they stopped. Deliberately. They’re not contributing anything to science. All they invent is strictly for-profit and happens behind closed doors. They take, they don’t contribute back.
And the main asset in the digital age is information. It’s necessary for AI training to pile that up in a dataset. So that’s their supply and they want it cheap because they need a lot of it. That’s where they generate their “rent” from. Do they contribute anything back with that? No. They “seek” it and pile it up and that becomes their trade secret. And that’s why I call them “rent-seeking”. (Thanks for the Wikipedia article, yours was way better than the convoluted definition I read yesterday…) And it even translates to the illegal activities mentioned in the Wikipedia article. Meta has admitted to pirating books to pile up datasets faster. OpenAI likely did the same(?) It’s just that they keep everything a secret. No company tells you anymore whether your content went into a dataset, since you might be able to use the legal system against them.
We can see that also with some platforms like GitHub, which turned out to be a great resource for AI training for Microsoft. Harvesting data is one of the main business models these days. And having that data is what pays the rent. It’s not all there is to it. There’s a lot of work in compiling it, curating datasets, RLHF… And then of course the science behind AI itself. But the last one aside, that’s also often done with negative effects on society. We all know about the precarious situation of the data labellers in Africa.
And then all of this, plus the experts they get from the public universities and all the GPUs in the datacenters and some electricity get turned into their (OpenAI’s) intellectual property.
You demand that American companies should be giving more free stuff to Europe. But also, they should be following European laws in the US and pay rent-seekers for the privilege. It’s ridiculous.
Maybe tell me what they contribute back? Is there anything they give? I don’t think so. They mainly seem like parasites to me, freeloading on all the information they can gather in electronic form. And then? Is there anything we get in return?
And maybe we’re having a small misunderstanding here. I’m not Anti-AI or anything. I just want people who take something from society, to contribute something back to society. And they really like to take, but they themselves painstakingly avoid disclosing the smallest little details.
I’d say there are two options. Either they do contribute back and we find a healthy relationship between society and big-tech AI companies. That’d make it completely fine if they also take things and it’s give-and-take. Or they want to do a for-profit dubious service with no-one having a say in it or look inside or be able to use it aside from what they devised for society… But then the same rules apply to them. They then also have to contribute back in the form of money to pay for their supplies and license the content that goes into their product.
My own opinion: Allow AI and cater to scientific progress. In a healthy way, though. The companies do AI and they get resources. But they’re obligated to transparency and contribute back. For example open-weight models are a good idea. I’d go further than that, because science and society also need to address biases, what AI can be used for, and a bunch of issues that come with it. Like misinformation, spam… The companies aren’t incentivised to address that. And it starts to show impact on the internet and society. And regulations are the way to make them do what’s necessary or beneficial in the long run.
you are not against rent-seeking
I’m generally against hyper-capitalism and big corporations. They often don’t do us any good. It’s a bit complicated with AI since those companies are over-valued and there is a big investment bubble, which isn’t necessarily about society. But the copyright-industry is part of the same picture. Spotify for example isn’t healthy for society at all. And the Höffner video you linked had a lot of good points about that. I’m not sure whether you’re aware of the other side of the coin… For example I’ve talked to some musicians (copyright holders) and I’ve written a few pages of technical documentation and I’m aware that it takes several weeks behind the desk to produce 40 pages. And like half a year or more to write a novel. And somehow you need to eat something during those months… So with capitalism it’s not always easy. The current situation is subpar. And the copyright industry is mainly a business model to leech on people who create something. We’d be better off if we cut out the middle men.
I see. Thank you. I’m afraid you don’t quite understand what rent-seeking means. Let me try a hypothetical example.
Food is pretty cheap. But suppose a single company had a monopoly on supplying food. How much would people be willing to pay? People would give almost anything they have.
The reason food is cheap, is because there is no monopoly. If someone charges more than the competition, you go to the competition. You get a market price. It’s complicated but one thing that goes into the price of food is the cost of labor. Many people must work to supply food.
These workers could do other things with their time. But also, other people could do their work of supplying food. No one has a monopoly. Eventually, the cost of labor depends on how much money you must offer to people to be willing to put up with the work.
If someone had a monopoly on food supply, they could charge fantastic prices. Their cost would not change. The difference between the market price and the monopoly price is the monopoly rent.
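To put rough numbers on that, here is a minimal sketch. The linear demand curve, the constant cost per unit and all the figures are my own assumptions for illustration; real food markets are far messier. It only shows the gap between the competitive price (which gets pushed down to cost) and the price a monopolist would pick.

```python
def monopoly_rent(a: float, b: float, cost: float) -> dict:
    """Toy model with assumed linear demand Q = a - b * P and constant unit cost.

    With competition, the price is pushed down to the unit cost.
    A monopolist instead picks the price that maximizes (P - cost) * Q.
    """
    if b <= 0 or a - b * cost <= 0:
        raise ValueError("need b > 0 and positive demand at the cost price")

    competitive_price = cost
    # Maximizing (P - cost) * (a - b * P) over P gives P = (a / b + cost) / 2.
    monopoly_price = (a / b + cost) / 2
    quantity_sold = a - b * monopoly_price
    rent_per_unit = monopoly_price - competitive_price

    return {
        "competitive_price": competitive_price,
        "monopoly_price": monopoly_price,
        "quantity_sold": quantity_sold,
        "rent_per_unit": rent_per_unit,
        "total_rent": rent_per_unit * quantity_sold,
    }


# Hypothetical numbers: demand Q = 100 - P, unit cost of 20.
result = monopoly_rent(a=100.0, b=1.0, cost=20.0)
print(result)
```

With these made-up numbers the monopolist charges 60 instead of 20. The cost of producing each unit has not changed; the 40 on top of it is the rent.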
Let’s take this closer to AI training.
Let’s say there’s some guy who’s searching through libraries and archives for stuff to digitize so that it can be sold to AI companies for training. He finds an archive of old newspapers. How much would the market price for scans of these newspapers be? Let’s ignore copyright for now.
Maybe the potential buyer could send someone else to scan the papers. So our guy could only ask to be paid for the labor in scanning the papers.
So our guy will not say where he found that archive. That is his trade secret. The potential buyer would have to send someone to search for that archive and scan it. That means our guy can ask to be paid for his labor in finding the archive AND scanning it. The potential buyer will only hire someone else to do that if our guy asks too high a price.
There is a way our guy can get more. If he destroys all remaining copies of these newspapers, then he has a monopoly. Now he can ask for as much as the potential buyer is willing to pay. That’s a monopoly rent.
Now copyright… Those newspapers are probably under copyright. If our guy is in Europe, he will have to get permission from the rights-holder to scan the papers. Copyright is a monopoly enforced by the state. The rights-holder can now extract the monopoly rent from our guy.
If the publisher has gone out of business, the rights-holders may be hard to find but he has to make the effort. In practice, this means that there is really no point in making the effort to preserve European culture and history. The copyright people don’t just harm technological progress and the European economy, they harm European culture. That’s parasitic.
You’re making the argument that OpenAI and others are trying to get paid. That’s not rent-seeking. Ideally, our laws ensure that seeking money makes you work for the benefit of other people.
Farmers work for money, and everyone else gets a lot of good, cheap food out of it. If you demand that farmers should work for free, then you’re demanding that many of us should starve.
Yes, thanks. I think I agree with you here. The copyright model is rent-seeking by nature. And we could likely do better.
Ultimately a book author wants to sell his product to me. How it’s done isn’t ideal at all, but that’s kind of his motivation. So I don’t think you want him to starve because books aren’t a valid product to sell, but it’s about the way it’s done.
My single argument here is: Look at the AI industry and compare what you just said. They’re doing exactly the same thing, just 20 times worse. And you should be opposed to that, too!
You’re making the argument that OpenAI and others are trying to get paid. That’s not rent-seeking. Ideally, our laws ensure that seeking money makes you work for the benefit of other people.
I’ve laid down how the big AI companies do nothing for the benefit of other people. I’ve asked you what you think they contribute (in case I’m wrong) and you also came up with zero things they do for other people. So it boils down pretty much to the same. A book author creates intellectual property to sell a product to people “trying to get paid”. An AI company creates intellectual property to sell a product in order “to get paid”. It’s the same thing.
Let’s tackle monopolies: Everyone can read a book in case they can get ahold of it. And with some intelligence and time, everyone can write a book. That’s a monopoly in your eyes. And while we have weird concepts like Fixed book price, that’s mainly meant to foster healthy competition and promote the sales of interesting books rather than just blockbusters. Though, I really have a wide selection of sci-fi books available and I’ve bought several of them for 50ct. And I have a public library card for 26€ a year and I can read 500 books a day if I like, and I get a selection of Blu-rays on top. That’s what the monopoly does to me. (With everything else I agree with you. It’s bad that they pile that information up and that it’s not freely available but a business model.)
Now AI: I wanted to try Sora because they pioneered video models on that scale. For a long time they said “no thanks” to me. We won’t provide that service to you, it’s just for testing and a select few people we like. You get none of it. You can’t even pay, no matter how much. Then I waited for half a year and wanted to try Google’s Veo 3 and seems the interesting stuff is in the $100 a month tier. And what the fuck, the output is supposed to come with copyright? And terms and conditions?
I can’t get that service anywhere else and they just say tough luck, it’s gonna be $100 to try 8s video snippets because the company is amongst the select few who offer that (…cough…monopoly…). Or use Sora, now that it’s available, but they’ve changed the model to their liking and it became a bit worse than the initial trailers, and by the way: that is $200 a month.
So yeah, fantastic prices, also quite random, offered by less than a handful of mega corporations, based on their IP, they design “the food” so I need to eat what they devised for me. And I can’t even eat the food the way I like, but have to follow their terms and procedures.
Same applies to text gen AI. It’s a monopoly of billion dollar companies who get to shape it. Me or you, we can’t do it. It’s almost impossible to train a base model on that scale. And I can’t even use them for what I like. I wanted to try story-writing and chose some dark sci-fi and a murder mystery story, and it’s designed to refuse service to me. Instead it’ll give long lectures about ethics to me about how murder is wrong. Yeah, no shit, Sherlock. Interestingly, AIstudio did help me write exploits for computer security vulnerabilities for some blue-teaming I did.
In your analogy with the food: I’m hungry. Now a company comes. Of course they don’t offer me the food I like, but they say I have to eat what they designed for me. And it’s going to be a random $100 or $200. And I can’t touch the food or eat it myself, they’re adamant in spoon-feeding it as a very specific service to me. I can never cook my own food, since the resources for that cost like $100 million. And they keep the recipe a closely guarded secret and they’re so obsessed with it, they don’t even tell me the nutritional value or anything about what went into the designer food I need to lick off their spoon.
If you want in on the business. Also tough luck. You now need to start from scratch with everything, since the data is hoarded by the big players and they don’t share. On the level of ChatGPT… Well, you can get in like Microsoft and pay some billion dollars. But with that kind of money, it’s not super accessible, exactly like you’d expect from a monopoly. Other players can get in, like the Chinese. And how do they do it? It’s sponsored/subsidised with billions of dollars by the government. And that’s what it takes and they have been doing it this way for more than a decade now.
[…] some guy who’s searching through libraries and archives for stuff to digitize […]
That’s kind of a difficult example. I think archiving and digitizing is okay and in most cases he can do it. Copying for own use is always fine and that’s phrased so it applies to companies as well. Archival is such an allowed use. Public libraries have a separate paragraph. They can copy and can do necessary changes like digitizing. That applies to commercial libraries as well, as long as they’re open to the public. So we have you covered here. And there is more. For some works it’s mandatory to preserve them. They need to be sent to a library and the government specifically takes care to preserve (European) culture with these things. They’re mandated for example to show up in the shelves of the national library.
I seriously doubt the AI companies are going to help with preserving culture, though. The incident with Meta torrenting books for example had them on the opposite side. They took care to “leech”. That is, they took out information from the network and made sure not to balance that out. Resulting in a negative balance on the network and “free” information exchange.
If you’re worried “our guy can get more. If he destroys all remaining copies of these newspapers […]” I believe you found him. It’s not exactly that, since that kind of information is duplicated and can’t be burnt that way. But Meta do the closest thing there is to it. There are resources to exchange information and culture, and they deliberately “burn” those resources for their own benefit and to the disadvantage of everyone else.
If the publisher has gone out of business […]
And that has also already happened in the realm of AI. They change their service or cease operation. And since AI is just a service and the users don’t own anything, they’re then left with nothing. First big thing I’m aware of is how Replika AI dropped the main use-case of their service and millions of people were affected. And that is way worse than books. I have been banned from services. They just said “suspicious behaviour” and deactivated my account and I was stripped of access. A book author cannot do that. I can still buy his book even if he doesn’t like me. Cancelling service and doing whatever they like with the userbase is what big tech companies do.
So my argument is: You’ve really made a good argument in pointing out countless severe shortcomings of current copyright culture. And I’ve learned a lot. The AI industry is an even worse manifestation of that. They also pile up intellectual property for their product. And contrary to a book, I don’t even own the darn physical thing, but they introduce all kinds of other shenanigans and make it something I rent, boarded-up, and then they often also apply copyright on top. They stepped up everything that is bad about copyright, several notches.
And then the successful players are all ruthless. They’re not just selling me a book. Currently they’re mainly interested in investment money and I’m not really 100% their customer. They happily weigh down on society. In my last comment I addressed how they deliberately evade law and some big players even pirate and do things that are currently illegal. Just for their own benefit. Enshittification of the internet is a side-effect they gladly accept. And they’re expected to displace more things with their product (including culture) and neither do they contribute back, nor do they care about the consequences.
I think Fair Use might be a nice concept. It definitely is a regulation mechanism. The government/society is taking away privileges of people (copyright holders) with that. To the benefit of society and progress. Now go ahead and apply the same thing to AI companies! Regulate them as well!
It doesn’t have to be very bad. For example, you can’t just become a farmer. You must buy a farm. There are problems with that, but they aren’t big. Food is cheap and plentiful.
The people who make AIs want to be paid for their work. The people who build and maintain the datacenters, the hardware, the electricity, and so on. Should they work for free?
The problem starts when people want more than that.
I really have a wide selection of sci-fi books available
Have you ever noticed how many of these books were written in the USA and cheaply translated into German?
Let’s tackle monopolies: Everyone can read a book in case they can get ahold of it. And with some intelligence and time, everyone can write a book. That’s a monopoly in your eyes
No. I think you misunderstood. An exclusive copyright is a monopoly by definition.
The incident with Meta torrenting books for example had them on the opposite side. They took care to “leech”.
They were legally required to do that. Downloading the books for their purposes was fair use. Uploading would certainly not have been.
I don’t understand how this accusation makes the slightest bit of sense. These torrents are a violation of EU copyright law. Your argument means that these torrents shouldn’t exist in the first place. You are not demanding that Meta should be allowed to upload these books. You’re saying they shouldn’t be allowed to download them, either.
BTW, that turned out to be a false.
Here’s rent-seeking in the German Wikipedia: https://de.wikipedia.org/wiki/Rentenökonomie
Let me rephrase it a bit: OpenAI is one of the prime examples. They wrote one or two scientific papers early on. And then they stopped. Deliberately. They’re not contributing anything to science. All they invent is strictly for-profit and happens behind closed doors. They take, they don’t contribute back.
And the main asset in the digital age is information. It’s necessary for AI training to pile that up in a dataset. So that’s their supply and they want it cheap because they need a lot of it. That’s where they generate their “rent” from. Do they contribute anything back with that? No. They “seek” it and pile it up and that becomes their trade secret. And that’s why I call them “rent-seeking”. (Thanks for the Wikipedia article, yours was way better than the convoluted definition I read yesterday…) And it even translates to the illegal activities mentioned in the Wikipedia article. Meta has admitted to pirating books to pile up datasets faster. OpenAI likely did the same(?) It’s just that they keep everything a secret. No company tells you anymore whether your content went into a dataset, since you might be able to use the legal system against them.
We can also see that with some platforms like GitHub, which turned out to be a great resource for AI training for Microsoft. Harvesting data is one of the main business models these days. And having that data is what pays the rent. It’s not all there is to it. There’s a lot of work in compiling it, curating datasets, RLHF… And then of course the science behind AI itself. But the last one aside, that’s also often done with negative effects on society. We all know about the precarious situation of the data labellers in Africa.
And then all of this, plus the experts they get from the public universities and all the GPUs in the datacenters and some electricity get turned into their (OpenAI’s) intellectual property.
Maybe tell me what they contribute back? Is there anything they give? I don’t think so. They mainly seem like parasites to me, freeloading on all the information they can gather in electronic form. And then? Is there anything we get in return?
And maybe we’re having a small misunderstanding here. I’m not Anti-AI or anything. I just want people who take something from society, to contribute something back to society. And they really like to take, but they themselves painstakingly avoid disclosing the smallest little details.
I’d say there are two options. Either they do contribute back and we find a healthy relationship between society and big-tech AI companies. That’d make it completely fine if they also take things and it’s give-and-take. Or they want to run a dubious for-profit service where no one has a say in it, can look inside, or can use it in any way other than what they devised for society… But then the same rules apply to them. They then also have to contribute back in the form of money to pay for their supplies and license the content that goes into their product.
My own opinion: Allow AI and cater to scientific progress. In a healthy way, though. The companies do AI and they get resources. But they’re obligated to be transparent and contribute back. For example open-weight models are a good idea. I’d go further than that, because science and society also need to address biases, what AI can be used for, and a bunch of issues that come with it. Like misinformation, spam… The companies aren’t incentivised to address that. And it starts to show impact on the internet and society. And regulations are the way to make them do what’s necessary or beneficial in the long run.
I’m generally against hyper-capitalism and big corporations. They often don’t do us any good. It’s a bit complicated with AI since those companies are over-valued and there is a big investment bubble, which isn’t necessarily about society. But the copyright-industry is part of the same picture. Spotify for example isn’t healthy for society at all. And the Höffner video you linked had a lot of good points about that. I’m not sure whether you’re aware of the other side of the coin… For example I’ve talked to some musicians (copyright holders) and I’ve written a few pages of technical documentation and I’m aware that it takes several weeks behind the desk to produce 40 pages. And like half a year or more to write a novel. And somehow you need to eat something during those months… So with capitalism it’s not always easy. The current situation is subpar. And the copyright industry is mainly a business model to leech on people who create something. We’d be better off if we cut out the middle men.
I see. Thank you. I’m afraid you don’t quite understand what rent-seeking means. Let me try a hypothetical example.
Food is pretty cheap. But suppose a single company had a monopoly on supplying food. How much would people be willing to pay? People would give almost anything they have.
The reason food is cheap, is because there is no monopoly. If someone charges more than the competition, you go to the competition. You get a market price. It’s complicated but one thing that goes into the price of food is the cost of labor. Many people must work to supply food.
These workers could do other things with their time. But also, other people could do their work of supplying food. No one has a monopoly. Eventually, the cost of labor depends on how much money you must offer to people to be willing to put up with the work.
If someone had a monopoly on food supply, they could charge fantastic prices. Their cost would not change. The difference between the market price and the monopoly price is the monopoly rent.
Let’s take this closer to AI training.
Let’s say there’s some guy who’s searching through libraries and archives for stuff to digitize so that it can be sold to AI companies for training. He finds an archive of old newspapers. How much would the market price for scans of these newspapers be? Let’s ignore copyright for now.
Maybe the potential buyer could send someone else to scan the papers. So our guy could only ask to be paid for the labor in scanning the papers.
So our guy will not say where he found that archive. That is his trade secret. The potential buyer would have to send someone to search for that archive and scan it. That means our guy can ask to be paid for his labor in finding the archive AND scanning it. The potential buyer will only hire someone else to do that if our guy asks too high a price.
There is a way our guy can get more. If he destroys all remaining copies of these newspapers, then he has a monopoly. Now he can ask for as much as the potential buyer is willing to pay. That’s a monopoly rent.
Now copyright… Those newspapers are probably under copyright. If our guy is in Europe, he will have to get permission from the rights-holder to scan the papers. Copyright is a monopoly enforced by the state. The rights-holder can now extract the monopoly rent from our guy.
If the publisher has gone out of business, the rights-holders may be hard to find but he has to make the effort. In practice, this means that there is really no point in making the effort to preserve European culture and history. The copyright people don’t just harm technological progress and the European economy, they harm European culture. That’s parasitic.
You’re making the argument that OpenAI and others are trying to get paid. That’s not rent-seeking. Ideally, our laws ensure that seeking money makes you work for the benefit of other people.
Farmers work for money, and everyone else gets a lot of good, cheap food out of it. If you demand that farmers should work for free, then you’re demanding that many of us should starve.
Yes, thanks. I think I agree with you here. The copyright model is rent-seeking by nature. And we could likely do better.
Ultimately a book author wants to sell his product to me. How it’s done isn’t ideal at all, but that’s kind of his motivation. So I don’t think you want him to starve because books aren’t a valid product to sell, but it’s about the way it’s done.
My single argument here is: Look at the AI industry and compare what you just said. They’re doing exactly the same thing, just 20 times worse. And you should be opposed to that, too!
I’ve laid down how the big AI companies do nothing for the benefit of other people. I’ve asked you what you think they contribute (in case I’m wrong) and you also came up with zero things they do for other people. So it boils down pretty much to the same. A book author creates intellectual property to sell a product to people “trying to get paid”. An AI company creates intellectual property to sell a product in order “to get paid”. It’s the same thing.
Let’s tackle monopolies: Everyone can read a book in case they can get ahold of it. And with some intelligence and time, everyone can write a book. That’s a monopoly in your eyes. And while we have weird concepts like Fixed book price, that’s mainly meant to foster healthy competition and promote the sales of interesting books rather than just blockbusters. Though, I really have a wide selection of sci-fi books available and I’ve bought several of them for 50ct. And I have a public library card for 26€ a year and I can read 500 books a day if I like, and I get a selection of Blu-rays on top. That’s what the monopoly does to me. (With everything else I agree with you. It’s bad that they pile that information up and that it’s not freely available but a business model.)
Now AI: I wanted to try Sora because they pioneered video models on that scale. For a long time they said “no thanks” to me. We won’t provide that service to you, it’s just for testing and a select few people we like. You get none of it. You can’t even pay, no matter how much. Then I waited for half a year and wanted to try Google’s Veo 3 and it seems the interesting stuff is in the $100 a month tier. And what the fuck, the output is supposed to come with copyright? And terms and conditions?
I can’t get that service anywhere else and they just say tough luck, it’s gonna be $100 to try 8s video snippets because the company is amongst the select few who offer that (…cough…monopoly…). Or use Sora, now that it’s available, but they’ve changed the model to their liking and it became a bit worse than the initial trailers, and by the way: that is $200 a month.
So yeah, fantastic prices, also quite random, offered by less than a handful of mega corporations, based on their IP, they design “the food” so I need to eat what they devised for me. And I can’t even eat the food the way I like, but have to follow their terms and procedures.
Same applies to text gen AI. It’s a monopoly of billion dollar companies who get to shape it. Me or you, we can’t do it. It’s almost impossible to train a base model on that scale. And I can’t even use them for what I like. I wanted to try story-writing and chose some dark sci-fi and a murder mystery story, and it’s designed to refuse service to me. Instead it’ll give long lectures about ethics to me about how murder is wrong. Yeah, no shit sherlock. Interestingly, AIstudio did help me write exploits for computer security vulnerabilities for some blue-teaming I did.
In your analogy with the food: I’m hungry. Now a company comes. Of course they don’t offer me the food I like, but they say I have to eat what they designed for me. And it’s going to be a random $100 or $200. And I can’t touch the food or eat it myself, they’re adamant in spoon-feeding it as a very specific service to me. I can never cook my own food, since the resources for that cost like $100 million. And they keep the recipe a closely guarded secret and they’re so obsessed with it, they don’t even tell me the nutritional value or anything about what went into the designer food I need to lick off their spoon.
If you want in on the business. Also tough luck. You now need to start from scratch with everything, since the data is hoarded by the big players and they don’t share. On the level of ChatGPT… Well, you can get in like Microsoft and pay some billion dollars. But with that kind of money, it’s not super accessible, exactly like you’d expect from a monopoly. Other players can get in, like the Chinese. And how do they do it? It’s sponsored/subsidised with billions of dollars by the government. And that’s what it takes and they do it this way for more than a decade now.
That’s kind of a difficult example. I think archiving and digitizing is okay and in most cases he can do it. Copying for own use is always fine and that’s phrased so it applies to companies as well. Archival is such an allowed use. Public libraries have a separate paragraph. They can copy and can do necessary changes like digitizing. That applies to commercial libraries as well, as long as they’re open to the public. So we have you covered here. And there is more. For some works it’s mandatory to preserve them. They need to be sent to a library and the government specifically takes care to preserve (European) culture with these things. They’re mandated for example to show up in the shelves of the national library.
I seriously doubt the AI companies are going to help with preserving culture, though. The incident with Meta torrenting books for example had them on the opposite side. They took care to “leech”. That is, they took out information from the network and made sure not to balance that out. Resulting in a negative balance on the network and “free” information exchange.
If you’re worried “our guy can get more. If he destroys all remaining copies of these newspapers […]” I believe you found him. It’s not exactly that, since that kind of information is duplicated and can’t be burnt that way. But Meta do the closest thing there is to it. There are resources to exchange information and culture, and they deliberately “burn” those resources for their own benefit and to the disadvantage of everyone else.
And that has also already happened in the realm of AI. They change their service or cease operation. And since AI is just a service and the users don’t own anything, they’re then left with nothing. First big thing I’m aware of is how Replika AI dropped the main use-case of their service and millions of people were affected. And that is way worse than books. I have been banned from services. They just said “suspicious behaviour” and deactivated my account and I was stripped of access. A book author cannot do that. I can still buy his book even if he doesn’t like me. Cancelling service and doing whatever they like with the userbase is what big tech companies do.
So my argument is: You’ve really made a good argument in pointing out countless severe shortcomings of current copyright culture. And I’ve learned a lot. The AI industry is an even worse manifestation of that. They also pile up intellectual property for their product. And contrary to a book, I don’t even own the darn physical thing, but they introduce all kinds of other shenanigans and make it something I rent, boarded-up, and then they often also apply copyright on top. They stepped up everything that is bad about copyright, several notches.
And then the successful players are all ruthless. They’re not just selling me a book. Currently they’re mainly interested in investment money and I’m not really 100% their customer. They happily weigh down on society. In my last comment I addressed how they deliberately evade law and some big players even pirate and do things that are currently illegal. Just for their own benefit. Enshittification of the internet is a side-effect they gladly accept. And they’re expected to displace more things with their product (including culture) and neither do they contribute back, nor do they care about the consequences.
I think Fair Use might be a nice concept. It definitely is a regulation mechanism. The government/society is taking away privileges of people (copyright holders) with that. To the benefit of society and progress. Now go ahead and apply the same thing to AI companies! Regulate them as well!
Sorry, but you have not understood the concept yet.
You demand that AI companies should work for free and give things away for free. But they also should pay people that make no contribution.
They do, just like farmers. If people did not find their services beneficial, they would not pay.
This is called a barrier to entry (Marktschranke).
It doesn’t have to be very bad. For example, you can’t just become a farmer. You must buy a farm. There are problems with that, but they aren’t big. Food is cheap and plentiful.
The people who make AIs want to be paid for their work. The people who build and maintain the datacenters, the hardware, the electricity, and so on. Should they work for free?
The problem starts when people want more than that.
Have you ever noticed how many of these books were written in the USA and cheaply translated into German?
No. I think you misunderstood. An exclusive copyright is a monopoly by definition.
They were legally required to do that. Downloading the books for their purposes was fair use. Uploading would certainly not have been.
I don’t understand how this accusation makes the slightest bit of sense. These torrents are a violation of EU copyright law. Your argument means that these torrents shouldn’t exist in the first place. You are not demanding that Meta should be allowed to upload these books. You’re saying they shouldn’t be allowed to download them, either.