Kernochan Symposium 2024: Panel 1
[CAITE MCGRAIL] Good morning, everyone. If you could please grab your coffees and take your seats; we're about to get started. You'll have a better view in the center of the room, as opposed to the sides, if you want to move there. So good morning, everyone. My name is Caite McGrail, and I'm the fellow here at the Kernochan Center. And I'm delighted to welcome all of you to our annual symposium, "The Past, Present, and Future of Copyright Licensing."
Before we get started, I want to go over a few logistics. The first is that you will find bios for all of our moderators and panelists and the readings for today using the QR codes on the back of your agendas or on our website. And I'd like you all to be aware that today's symposium is being recorded, and the video will be available on our website for later viewing. And so, because of that, if you're answering questions or asking questions during the panels, please make sure to turn your mics on and off at your seats.
And this is the first of many reminders today that you will hear about CLE credit. So if you have not signed in for CLE credit and you want CLE credit, now would be the time to do so. And also, bathrooms are down the hall and to the left after exiting the room. The Wi-Fi is Columbia University's just open network, and there's no password. And, finally, there are many staff and student volunteers available to help direct you if you have questions.
And so I'm excited to announce the schedule for today. We will first begin with a panel on the publishing industry. We'll have a quick break and then a panel on the image and motion picture industry. We'll then break for lunch before a third panel on the music industry, another break, and concluding with a panel on antitrust considerations and licensing. So without further ado, I would like to introduce Professor Shyam Balganesh. He's the co-faculty director of the Kernochan Center, and he will offer some introductory remarks. Thank you.
[APPLAUSE]
[PROFESSOR SHYAM BALGANESH] All right, good morning and welcome. On behalf of the Kernochan Center and Columbia Law School, before I turn it over to my colleague Jane Ginsburg, who's going to be moderating the first panel, I thought I would take a couple of minutes to tell you a little bit about how we landed on this topic, how it's part of something bigger that we are trying to do, and what, really, the goals are with the framing that we have for today's event.
So here's the story. For the last year or so, we at the Kernochan Center have been engaged in a long project trying to understand the copyright implications of artificial intelligence, and machine learning models specifically, around the potential bases for liability. And very soon, we grew unconvinced that fair use was in any way or form a scalable or long-term solution. And we landed on the recognition that licensing in some way or form was going to be the way forward in the medium term and in the long term.
But here's the interesting thing that happened right after. As we began talking to different actors within the industry, within the academy, we encountered a somewhat surprising state of affairs, a somewhat instinctive-- one might even say "visceral"-- distrust of licensing. And as we probed deeper, it soon became clear to us that everyone who was talking about licensing had a different conception of what licensing meant. The options were collective, compulsory, extended collective, blanket, voluntary, managed, statutory. Everyone had a particular conception of what licensing meant and a visceral reaction to whether that was good or bad-- mostly bad.
And so every creative industry, in some sense, had a myopic conception of what licensing meant in order to formulate an answer to whether licensing was a scalable long-term solution for AI and machine learning.
So we thought that the answer to solving that problem was to begin by doing something more comprehensive and trying to get a handle on the different models and forms of licensing that we have and have been in existence, primarily in the United States but also beyond, for the last several decades, and to take stock of what has and what hasn't worked, where tweaks and significant modifications may be necessary, and if, as we believe, we need to design the ideal or optimal framework for artificial intelligence and machine learning.
So today's event is really that effort -- to try and be comprehensive and initiate a conversation about copyright licensing in its broadest sense. And each of our panels, as you will see, is built around a specific creative industry, where we bring in expertise from that industry around licensing to see how things have worked, to take stock of what tweaks might be necessary, whether licensing is, in fact, a solution in hindsight in some of these domains where it has existed, and to really talk about what we mean when we talk about licensing as a grander solution in terms of a menu or a variety of different options.
So that is our goal. And by 5:15 today, hopefully, you will have a sense of whether we have failed or succeeded. So I just wanted to stop there and give you a sense of what we are trying to do with this project, really initiate a comprehensive inter-industry conversation about licensing to document the different models and get a sense of what we might mix and match as we think about AI and ML in the context of licensing.
So without further ado, I will turn it over to my colleague on the faculty and the Kernochan Center faculty codirector, Professor Jane Ginsburg, who will be moderating this session. Thank you.
[APPLAUSE]
[PROFESSOR JANE GINSBURG] Thank you very much, Shyam, and thank you, all of you, for coming to this session. And we look forward to your views, also, as each of these panels has built in time for Q&A with the audience. You have everybody's bios. One change to the slide, Anita Huss-Ekerhult will be the first speaker, and she will start from a global perspective, and then we will turn the focus to the United States with our other speakers who will speak in the order in which they are seated. So you want to come up?
[ANITA HUSS-EKERHULT] Thank you so much, Jane. Thank you so much, Shyam. Thank you so much to all faculty members for this very kind invitation. I'm very, very honored and privileged to be here. I have prepared a little PowerPoint, and before I start with that-- thank you. I just wanted to tell you a few words about IFRRO, the International Federation of Reproduction Rights Organizations that I represent, what we are doing, and who our members are, speak a little bit about how they are starting to move towards AI licensing, and speak about the key issues in this regard.
So IFRRO is based in Brussels, an international federation. We have 154 members at the moment in over 85 countries, and we represent over 2 million authors and publishers of text and image-based works. So that is publishers, whether trade publishers, scientific publishers, newspaper publishers, but also authors, including visual authors, all around the world. And we will have, for instance, Roy from CCC, our US member, speaking further about the US perspective. And they are actually-- they have a lot of exciting news already to share on AI and licensing. So I will defer to Roy in that regard.
More generally, IFRRO has been sharing information on AI, of course, with our members on a regular basis. There is so much happening around the world, but we focus very much on specific legal matters, especially generative AI, because that's what mainly affects our members. And this goes through all our activities. I'm not going through all these slides, but I wanted to highlight that there is a lot of work being done.
And next week, we are going to present, actually, an AI advocacy toolkit. We have been working with Professor Eleonora Rosati on this. Perhaps some of you here know her. She's at Stockholm University now. And it's a toolkit to help our members really educate policy makers about the uses of AI. When large language models have used works-- copyright-protected works from our members-- those works were ingested into those AI systems. Copies have been stored, and they add perpetual benefits to these systems.
And this toolkit then looks at the different rights, focusing, of course, on the reproduction right, which most of our members are administering, and how it applies to both the input and output phases. It starts with a glossary, of course, and then a mapping of the main policy and legal developments all around the world. So it looks not only at the EU and US, but also at what's happening in Japan, in Singapore, and other areas, and distills these legal findings into key talking points for our members.
We also look at the terms of service of generative AI providers, because when you look closely at these clauses, it's not the case that they indemnify any infringements. That's not at all the case. And, basically, the main message is: licensing is essential. Licensing is being offered by our members, and relying solely on exceptions and limitations is certainly not enough.
There are different exceptions and limitations all around the world. In the US, it's fair use. And I'm sure we will speak a lot about that. In other areas in the EU, we just had the EU AI Act and the DSM directive. We have the text and data mining exception, which also exists in the UK, in Japan, in Singapore, and is being discussed to be introduced in Hong Kong. So there are different ways on how these exceptions and limitations work.
But our main point is that you cannot only rely on that, but that you need licensing, and that there is actually a big market for licensing, which has been shown throughout the world. So there are licensing activities on a direct level already happening. You have probably heard about the different activities. So, for instance, in Germany, but also in France, newspaper publishers are licensing, trade publishers are licensing AI users.
And, well, CCC was the first one; they announced it in mid-July: a collective licensing solution for content usage in internal AI systems. This is actually an inclusion of AI reuse rights in the annual copyright license. It's the first-ever collective licensing solution for the internal use of copyright materials among our IFRRO membership. And a lot of members are actually looking at what's happening here, and they want to try a similar approach.
In the UK, our member the Copyright Licensing Agency, CLA, has been working with its member organizations, which are both authors and publishers, on a new UK text and data mining license for companies that allows them to copy and use content for TDM purposes. So this will come later this year. It will exclude generative AI, though. And I will leave it at that; more will be announced.
Other members have been surveying-- because they need to ask their members. So I should say that our collective management organizations, or reproduction rights organizations, RROs, as they are called, usually represent both authors and publishers. And they are surveying, of course, their members: What do you want to do? What are your views on AI? What are your concerns?
And, for instance, ALCS in the UK had a survey out to their members, which showed that they are actually very concerned about the uses of their works. They have not been asked, and they want to license those works. Similarly, DACS, our visual arts CMO member, also had a survey out to their members, which showed that they strongly feel they should be compensated financially when their work is used to train artificial intelligence, and that they would also sign up for a licensing mechanism to be paid when their work is used.
More visual arts members-- so this is Pictoright, our Dutch member. They went even further. So after consulting with their members, they are offering a collective opt-out system. So this is specifically related to the EU legal framework, Article 15 of the Copyright Act and Article 4 of the DSM directive, related to the text and data mining exception that I mentioned. And there is this opt-out provision. So Pictoright does that on a collective basis, ensuring the collective opt-out and helping their members to license those users subsequently.
Other members are also looking at the different options. And I know that our German member is currently consulting with its members-- also in Australia and New Zealand-- and I think more will be announced soon. I just wanted to give you a brief overview of this, and I'm here for any questions and any concerns that you have. So thank you so much for having me.
[APPLAUSE]
[GINSBURG] Thank you. And we will have questions at the end of the panel. We're going to turn to what Anita referenced as the market for licensing, particularly in the United States, although I can't resist an anecdote. I have some books with Cambridge University Press, and they sent me an email, requesting my permission to have the book included in the content that they are going to be licensing for AI purposes. And they said that authors should share in this as well.
So I would get a royalty of 20%. But it doesn't say 20% of what. So I am rather curious. Perhaps Roy will have an answer to that. I do hope that our panelists will talk not only about the industries, but also the authors who are encompassed within those industries. And the last thing I want to say is that Roy is a proud graduate of Columbia Law School, so thank you.
[ROY KAUFMAN] Thank you, Professor Ginsburg. For those students here, if 35 years from now, you find yourself on the other side thinking, "How on Earth did I get here?" you won't be the first.
So how do I-- oh, wrong clicker. Never mind. I'm not up yet. Thank you. It's definitely a slide here you've already seen before, but we'll get there. So at CCC, we believe responsible AI starts with licensing. So let's talk a little bit, very briefly, about my organization, Copyright Clearance Center, CCC. People who know us tend to think of us as the collective management licensing organization in the United States. And I'm here to talk to you about collective licensing. So that certainly sounds very self-reinforcing.
However-- and I think this is important when we start talking about AI-- that's just one part of CCC's business. We have-- I lose track-- about 600-700 full-time employees. The vast majority of them are doing technology. I haven't done the math, but I realized, when I was thinking about this, there must be 5 to 10 technology experts at my company for every copyright expert. And it becomes really important with AI because I've got colleagues who have written books about AI, colleagues who build AI systems, who train AI systems, and who help clients do all of those things.
So it's really helpful for me, when I have a question about how the technology works, particularly around business development and government relations. In the government relations context, we hear a lot of things that are simply not true. Yes, I can disagree with you on where fair use begins and ends. You cannot tell me copies aren't being made. That's just not a true statement.
So with that, again, this is sort of our slogan, and it's very important. I'm working often, although not exclusively-- and I'll get into that-- in the corporate space with some of your highest-quality content creators: scientific publishers who subject everything they publish to peer review, news publishers with very strong editorial rules. And that is the content that will create better AI. It is not-- I mean, yes, you can scrape-- take your website of choice--
This is going in and out, so I'm just going to have to-- I'll switch to this. Sorry about that. I just don't like standing still.
This is the content that has the best information about vaccines, has been best vetted for bias, is less likely than your random Reddit feeds or your random Facebook feeds-- and I post on Reddit-- but the quality is iffy. It's inconsistent.
Pro-copyright is pro-AI. You need high-quality material to train AI systems. And we know a lot of the AI systems have been trained by indiscriminate scraping of everything online. And now, as they scrape everything online, they're getting AI-generated content. And there's a lot of research about something called model collapse. If you go onto my LinkedIn, I've posted some of this. There was a great New York Times article-- it's sort of like the telephone game. The more you train something on something that was trained by the machine, the quicker it turns into gibberish.
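The "telephone game" dynamic described here can be mimicked with a toy statistical sketch. This is an editor's illustration, not from the talk: it stands in for a generative model with a one-dimensional Gaussian, refits it to its own samples each "generation," and shows the diversity of the data collapsing.

```python
import random
import statistics

# Toy analogue of "model collapse" (illustrative only): stand in for a
# generative model with a 1-D Gaussian, then repeatedly refit it to
# samples drawn from the previous fit. With small samples, estimation
# error compounds and the spread of the data collapses toward zero.
random.seed(0)
n, generations = 10, 500

data = [random.gauss(0, 1) for _ in range(n)]
initial_spread = statistics.stdev(data)

for _ in range(generations):
    mu, sigma = statistics.fmean(data), statistics.stdev(data)
    data = [random.gauss(mu, sigma) for _ in range(n)]  # train on own output

final_spread = statistics.stdev(data)
print(initial_spread, final_spread)  # final spread is a tiny fraction of the initial
```

Real model collapse involves high-dimensional language models, but the mechanism sketched here-- each generation learning only from the previous generation's output-- is the same one the research literature describes.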
I've already touched on this-- copying takes place. When you ingest content, you're making a copy. When you vectorize it, you're making a copy. A vector is just a stored copy. I was at something where someone on the other side said, "Well, we don't copy it. In fact, we just turn it into 0's and 1's." I'm like, all digital content is 0's and 1's. What are you talking about? So just keep this in mind.
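The point that tokenized or vectorized content is still a copy can be shown with a toy sketch (an editor's illustration using a made-up word-level vocabulary, not any real tokenizer): the numeric encoding is reversible, so it carries the original expression.

```python
# Toy sketch: a made-up word-level "tokenizer". Each word maps to an
# integer id; the mapping is reversible, so the numeric form is just
# another encoding of the same text, i.e., a stored copy.
text = "all digital content is zeros and ones"

vocab = {w: i for i, w in enumerate(dict.fromkeys(text.split()))}
ids = [vocab[w] for w in text.split()]            # the "vectorized" form
inverse = {i: w for w, i in vocab.items()}
recovered = " ".join(inverse[i] for i in ids)

print(ids)                 # [0, 1, 2, 3, 4, 5, 6]
print(recovered == text)   # True
```

Real LLM tokenizers (BPE and the like) and learned embeddings are far more complex, but the underlying point is the same: the transformation is an encoding of the content, not its erasure.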
Training will benefit that LLM forever. Once you've trained it, that benefit exists-- which makes it tricky in terms of licensing, but let's at least be honest about what's happening. We believe, obviously, this stuff should be licensed. There are direct licenses and voluntary collective licenses, and there's some movement in Europe around extended collective licenses. As long as the license is well-created and fair, and people are compensated for their works, we're fine with that.
And there is no one legal standard. I'm so tired of hearing "it's fair use." I'm like, what country are you in? We're in the UK. There is no fair use in the UK. There's fair dealing. Do you have an exception? No? It's not fair dealing. Can we move on now? The AI Act is very clear that it takes in the text and data mining exception-- it's very clear in the EU. Not all countries are the same, and not all countries follow US law.
You can tell my slides from the slides I got from my colleagues because they're uglier.
I talk about precursors-- I was having a brief conversation with some people before-- we have an AI license, and I'm going to talk about that. But we had AI licenses before we knew they were AI licenses. We've been licensing text and data mining of scientific articles, I think, for more than a decade-- way more than 11 million articles, billions of tokens, if you're an AI person. That's for training. That's for text and data mining.
Now there's a lot of debate. Is text and data mining AI? Is it not? It's certainly not exactly the same. To me, AI is software that learns through example. That's what AI is. But text and data mining, according to the AI Act, is how AI is often trained. And then, according to something else that came out two weeks ago, the same people said, "Oh, this seems good-- text and data mining isn't how you train AI." So there's a lot of debate about that. But nonetheless, text and data mining can be used to train.
And I have a license for curriculum and instruction that I launched with a colleague in the business development department in the middle of the pandemic, and no one was talking about AI at the time, but people were saying, I need to generate questions. I need to generate passages. I need to do all of these things. So we got the rights to generate.
Now that, to me, is completely what everyone thinks of as AI today. Generative AI-- that's the whole point. That's the generate part-- but it doesn't say AI. I've got lots of publishers. I've got millions of works in that license. Is it AI? I think so, but it isn't called AI.
So Ann [INAUDIBLE] is here from the USPTO. Hey, Ann. So a couple of weeks ago, a number of us were at a PTO roundtable event, and the question was, if licensing is required, what will those licenses look like? And I answered, "Well, exactly like they look today," because, again, I do business development. We've got a voluntary collective license, which I'll talk about on the next slide and which Anita has already touched on. I do transactional licensing. I'll have an LLM developer call me and say, "I'm looking for long-form fiction. Can you help me find it? Can you help me get a license to it? I'm looking for-- even though we're a text CMO-- I'm looking for video. I'm looking for high-res video."
And we help them find it because we have clients and colleagues who know how to access that stuff. And that's transactional. And also coming historically from scientific publishing, as I do-- I used to work at Wiley. There's a lot of open science, what's called open access-- it's way off-topic here. But a lot of content is openly licensed for reuse, including all uses under a CC-BY or sometimes even a CC0 license.
So we'll probably have to get to this in the questions. Now, again, AI isn't one thing. Remember, it's software that learns by example. So when you're creating a license, just like any other license, you have to know what's your use case. What are you trying to do with the content? Not how are you-- not are you using a machine to make a copy. That's not useful.
I've got two minutes.
So we started-- and this will not be the end; I've already talked to you about other things we're doing-- with our corporate clients. We sell our license for internal corporate reuse to almost all large corporations you can think of on a global basis. And they said to us, "Our people are using AI. We don't know what they're doing. Give us a license, please." So when we add rights-- we're fully voluntary-- I have to go to every rights holder and say, "Can I have this right?" And if they say yes, then we go to the corporations and say, "You can have this."
It's been really interesting in the rollout. So all of our corporate clients are getting this. We don't think that these uses are going to be any more distinguishable. This is the evolution of how content is used. We went through this in the '80s, because we were a photocopy shop, and then, all of a sudden, digital uses happened. To us, you go from digital use to AI. It's just how content is being used. All of our clients will be getting these rights.
So I probably have less than a minute left. We have a website, and there's information here. We have tons and tons of information about copyright law, about AI. We have a very big webinar coming up. One of our general counsel just coauthored, with some academics and others, a piece on copying and how copies are made. It includes our CTO, so it's both technology and law-- Daniel Gervais, if you know him, is an author on this. There's one other-- Noam Shemtov. I think that's come out already; if not, any minute.
So there's a lot of stuff. Follow our page. Follow us on LinkedIn. We're always posting things about this. And I look forward to your questions, and I'll step down. Thank you very much.
[APPLAUSE]
[MATT STRATTON] Good morning, everybody. I am Matt Stratton, deputy general counsel for the Association of American Publishers. AAP is the national trade association for the US publishing industry. I should preface by saying that the views I express today are in my personal capacity and not on behalf of AAP or any of its members.
The US publishing industry creates works that make vital contributions to the nation's political, intellectual, and cultural systems. They present novel ideas and new facts unearthed by authors, hold governments, businesses, and citizens accountable, contribute to a vibrant culture, educate and inspire citizens of all ages, and progress medicine and science.
They are also a source of truth at a time when there is rampant misinformation spread through the internet, social networks, and other media, a harm which generative AI only risks accelerating.
Remuneration for publishers is the result of licensing based on exclusive rights under the Copyright Act. There is nothing exceptional about the development and commercial exploitation of generative AI models by tech companies to change that basic premise. There is no question that books are the foundation of LLMs. The developers sought the expressive content of these works, which they have stored in the models by the encoding of tokens into vectors. And that training violates publishers' exclusive rights.
Furthermore, there is no public policy reason to deviate from the basic operation of copyright law, which requires downstream users to license what they do not own but need for their uses. The current legal framework can accommodate and support the continued development of AI.
But let's step back for a moment and consider why licensing deals are not yet ubiquitous. It was only two years ago that generative AI technologies became a public phenomenon. The tech companies made a calculated decision not to license content. They wanted to get to market first and accepted that, after the product was out there, there would be the prospect of lawsuits and licenses. For an internet-based service with a worldwide market, being first or having an early competitive advantage could separate the billion-dollar winners from the also-rans.
Winning is all that matters, and a trail of lawsuits and the potential for reform is mere ancillary cleanup work. "Move fast and break things" is nothing new. Claims to the contrary, such as the one made by a genAI developer in its comments to the US Copyright Office that it had acted on the settled expectations of fair use, are nothing but disingenuous, self-serving, ex-post rhetoric.
Since 2022, creators and copyright owners have become aware of the infringement, resulting in over 30 lawsuits and also licensing deals, of which more than 30 have been publicly reported. There would be many more genuine arm's-length negotiated deals if court rulings rejected the tech companies' fair use defense. Tech companies have effectively obstructed licensing deals by denying that they are legally required. It is disingenuous for these companies to plead the impracticality of licensing when they are not negotiating in earnest.
Tech companies' arguments against the practicality of licensing are not credible as they relate to the type of content published by AAP's members. First, the bemoaning of cost is unpersuasive. Generative AI developers count among their investors some of the largest and most profitable technology companies in the world.
For example, The New York Times reported earlier this month that OpenAI is aiming for a $150 billion valuation in its latest venture funding round. They can clearly afford to pay reasonable license fees to the copyright owners whose works are the very building blocks of generative AI and whose livelihoods are threatened by the same systems.
Second, the quantity of the content is also not prohibitive. It has been reported, for example, that the Books3 pirate training set, with 190,000 books, was used by large generative AI developers to train their models. Compare that to a legitimate license. A single publisher may have 100,000-plus digital titles in its catalog.
Also, to give a sense of scale, AAP has approximately 130 members across its trade, education, and professional scholarly publishing sectors. For training purposes within any given sector, negotiating licenses with some percentage of that subset is quite feasible.
Copyrighted works are as integral to generative AI technologies as they are to services like Netflix and Spotify, both of which license the works made available through their platforms. Generative AI developers are capable of doing the same, and publishers are willing to work with them to effect such licensing.
Next, I'd like to turn to some high-level observations about generative AI licensing options. For publishers, licensing is about marketplace innovation and competition. The drafters of the Copyright Act set forth the exclusive rights in broad terms to ensure that copyright protection would encompass not only the breadth of technological uses known at the time of enactment but also future technological uses.
Publishers have demonstrated time and time again their ability to adapt their licensing models to the technologies of the day, including the emergence of the internet and handheld devices. For years, publishers have innovated by making books available in e-book and/or digital audio formats for lending for a subscription term or for purchase. Professional and scholarly publishers have a multitude of licensing options for their journal databases, including for text and data mining purposes.
And with the latest tech breakthrough, generative AI, authors and publishers should remain free to exercise their exclusive rights to determine how and in what ways their works are to be used and by whom, and to not exercise them if the circumstances or terms do not merit a deal.
Publishers welcome all voluntary marketplace developments. Marketplace licensing solutions are the superior tool for facilitating development of AI systems while respecting and protecting the exclusive rights of authors and publishers. There is room for direct licensing and voluntary collective licensing. It is ultimately up to the rights holder and the user.
In some publishing sectors, mainly professional and scholarly, collective licensing has worked well for publishers for certain specific use cases, which you've already heard about earlier today. But for most publishing houses, their experience and preference is for direct licensing. I expect that, in the majority of circumstances, publishers will prefer direct licensing with AI developers. AAP has no role in or visibility into the private contracts of its members.
But publicly available sources have reported that publishers, including our members, have concluded licensing deals. Like markets before it, the market for generative AI is happening and developing. There are also startups, such as Calliope, Created by Humans, and Human Native AI, that are seeking to facilitate licenses, particularly for smaller AI developers. Whether through one of these models or a different voluntary licensing model, publishers will innovate and compete to serve these users, too.
Direct deals benefit all stakeholders. First, rights holders, AI developers, and their customers can have certainty with respect to rights and obligations. Second, market-rate licensing fees best promote investment by authors and publishers in new human-created works that ultimately benefit the public. This public benefit includes protection against some of the potential ills of generative AI systems, including misinformation and bias.
Third, since generative AI systems will require high-quality new publications to remain state-of-the-art, a flourishing publishing industry will in turn increase the value of future generative AI systems.
Fourth, publications licensed from authorized sources are more reliable. For example, in the case of professional and scholarly journal articles, it is important to use the version of record with appropriate licensing. Earlier versions or pirated versions may be subject to post-publication modification or retraction, which could create serious and cascading scientific or medical errors in AI-generated outputs.
Finally, given that AI technologies are being and will be used in ways that impact the lives and well-being of individuals, whether financially, physically, mentally, or professionally, it is critically important that the highest quality material is used to create current and future training sets.
Publishers strongly oppose the introduction of a compulsory licensing regime to redress unauthorized uses of publishers' works to train generative AI systems. Fundamentally, authors and publishers should remain free to exercise their exclusive rights to determine how and in what ways their works are to be used and by whom.
In addition, there has been no market failure with respect to publishers' works that would necessitate such an extreme regime. Generative AI developers willfully decided to get to market without licensing, and they willfully persist in asserting fair use defenses, which together delay the ubiquity of licenses. Obstinacy is not market failure.
To sum up, publishers welcome market-based licensing solutions for facilitating development of AI systems because they benefit all stakeholders. They are undoubtedly feasible, and they will help ensure that publishers and authors may continue their vital contributions to the nation's political, intellectual, and cultural systems, benefiting society as a whole. Thank you.
[APPLAUSE]
[GINSBURG] I imagine that many of the people here already know what compulsory licensing is, voluntary collective licensing, direct licensing. But we will actually have-- our last panel will be talking about compulsory licensing, rate setting, voluntary collective, extended collective licensing, antitrust issues. So stay tuned.
The last speaker on this panel is Regan Smith, and her association represents a number of the rightholders who have brought some of those 30 lawsuits. And I imagine we'll hear something about that.
[REGAN SMITH] Thank you all for having me here today. As Jane mentioned, and thank you, right now, I am working at the News/Media Alliance, representing over 2,200 newspaper, magazine, and digital website publishers, ranging from large global titles like The New York Times to snowboarder.com and a lot of small businesses that you may or may not read, but that have been part of the fabric of our internet.
And, today, I want to talk about how AI licensing for text might work, but also drawing on some of the other ways we've seen mass licensing or licensing regulation arise or not arise in other contexts. So I was lucky enough to be able to be in the US government when the newest collective, the Mechanical Licensing Collective, was set up.
And I heard from collective management organizations, rights holders, and users all over the world about considerations in setting that up, and I was also able to engage in other markets via Spotify, looking at both direct and collective regionalized licensing in the music market.
Let me make sure I've got this.
So I think maybe we have some agreement here that part of what is going on right now is there's not really alignment on what needs to be licensed. So not talking about fair use or the subject of these litigations. That's what, I think, is the point we're at in time and where the real sticking point is.
We have a new technology where developers don't want to be locked in, rights holders don't want to agree to a structure that, 10 years from now, they're going to look back and regret. And there's not alignment on the overall subset of uses to be included or excluded in a licensing conversation.
I think my overall thesis is that licensing, right now, for this nascent technology, should be part of market experimentation. And we want to permit experimentation and innovation in deal structures as well as these uses because that will end up sort of lifting all boats. It's not necessarily Earth-shattering, but I think we need to keep in mind where we are today in 2024 is a particular moment in time.
So here's an example of what I mean. These are two examples of ChatGPT. So left is from OpenAI's website, and they say, this is great. You can use Oscar. Oscar will bring AI to health insurance, reducing costs and improving patient care. And at the bottom is a blurb from OpenAI's website. That's one particular use of content that has been trained and deployed using news media material and other publishing material.
On the right, you have an answer from Copilot. When I put this together, they had just sort of broken the news story about the North Carolina governor's race. And you see, Copilot is telling me everything I might need to know if I'm just doing a quick check on my phone. There's not a need to click through to the underlying article. There's not attribution.
And so you're going to stay in the Microsoft environment, which is built on OpenAI's technology, to get your answer rather than going to a journalist's reporting. There's not alignment on how a deal structure would accommodate all of those different uses, leaving aside what should be in or out, or the remuneration for these different types of uses.
Here's another example from Meta. This week, at Meta Connect, Mark Zuckerberg said, well, if push comes to shove, if publishers and content creators don't want us to use their content, we just won't use it. It's not like it's going to change the outcome that much. And coming from the news side, Meta has been pretty vocal about the desire to just get out of the news business. That's not why people go to Facebook or Instagram.
And then, on the right, you see an example of Meta AI when it was deployed. And this is an article from The Boston Globe-- The Boston Globe is entirely paywalled. They don't give a lot of free access to this-- talking about an article, and then it says, "Do you want the full text? Why don't I give it to you unprompted?" So even within a large company, there's volatility in uses and experimentation, and we would want to see that taken into account in licensing.
So certain licensing approaches-- let's step back. Marketplace licensing is the norm for the ways digital content gets used. So on the left, I've given examples of different types of aggregated or collective licensing. I haven't listed the ones that you might picture the most, such as ASCAP and BMI, or SoundExchange, for example.
I want to take a broader view of what is a collective or a group licensing. So you might have a joint venture that exists just for a particular purpose. You might have something like Overdrive, that's looking at library licenses; the Merlin Network, which services a lot of independent music being put onto streaming services; and in news, there's a lot of syndication and working together to exchange news stories from local to regional to global audiences.
In the emerging market, we have seen over 40 deals announced, and a lot of that is encouraging. So I think, right now, we're at a point where we see a development, and we believe that this will take hold and expand into aggregate licensing as well. But we also see a number of startups that aren't maybe what you think of as traditional CMOs. And so Matt mentioned a few of those. We're looking at those to make sure that the right structure gets set up as well.
Another thing we need to consider is competition law considerations. In the US, we have established precedent, sort of finding a procompetitive benefit of aggregation to promote efficiency and reduce transaction costs. In the EU and the Copyright Directive, we also have Articles 18 and 19, so an obligation for appropriate and proportional remuneration with a corresponding transparency obligation as to how that deal is working. And so Meta, for example, was brought before an Italian court for potential violations of that a couple of years ago by music artists. But that's a way of looking at the information asymmetry that is now emerging in our digital marketplace.
When we take that to AI, I think that information asymmetry is really a key thing to think about. As others have mentioned, AI is already being deployed, kind of went to market, and we're now in the cleanup phase of this. But these are very large actors who are not worried about a holdout or must-have content. As the quote from Mr. Zuckerberg said, we don't need any one particular piece of content. There's also an information asymmetry, where it's difficult to figure out where content has been used, or what output you see or I see, in order to value a transaction.
A word to start on compulsory licensing-- I think we're a very long way from showing any sort of a need, but these are different quotes from the Copyright Office, stemming back to 2004. This is Register Peters, where she very clearly says, "A compulsory license is a last resort mechanism. It limits an author's bargaining power." And it is very rare internationally as well as in the United States. It's something only for exceptional cases when the marketplace has not worked.
And what happens when we have it-- sometimes, it can work well to shore up a situation like this. But a statutory license is sort of freezing in time, through legislative language, what the uses are. So that's restricting the ways a technology might deploy or be used. And it's also encouraging a certain type of content to be created to fit under the license to be monetized.
I think there may be other panels that will touch upon this, but there are issues with long-standing rate setting or tariff processes, the shadow of the government rate affecting the rate that can be set, the time frame to which payouts go to rights holders or to authors who may have contracted with rights holders if they are not the rights holder themselves, and a high prevalence of litigation.
Extended collective licensing-- again, taking the EU as a starting point; I understand we'll talk about it more. The Copyright Directive in the EU says-- Article 12-- this is something to be considered when it's typically onerous and impractical to obtain individual licenses. We haven't necessarily seen that happen in AI because we're at the very early stages. And then it goes through and provides a list of criteria to define the ECL if you're going to have it. So it needs to be sufficiently representative of rights holders in a member state or in a particular use.
And, in the case of AI, we need to figure out how would we define that group of rights holders in the first place. Is it my members? Is it my members plus Matt's members? There's a broad range of text uses, and it's not-- that would be a starting point before you would draw a circle and establish a particular use.
So next, I want to turn to collective management watchouts in general. NMA is very supportive of collective management in a voluntary capacity. We work with CCC. Some of our members do. And we work with a lot of other collectives around the world. But when we think of collective management in a broad lens of people coming together and some of these new startups, there's a lot of questions as to whether they're ones that news publishers or others should affiliate with because we want to get the structure right for AI.
So thinking about what is the level of the administration? What is the transparency offered to the membership or to the general public? What is the allocation methodology? AI and digital publishing are sort of born digital now. There's no reason to recreate a nationalization or a membership-based distribution that we see in things like radio. And we don't have to deal with legacy problems that have saddled some, for example, music uses when they're coming out of an analog or a nondigital place.
So just two examples on this, because I think we're running out of time. One is taking a look at the administration fee. Some of the AI licensing agents that we've seen in the market have charged fees of 30% to 50%-- to do what? We need to be very careful that we set up a structure that encourages the balance of both the technological uses we want to see, as well as continuing the purpose of copyright, incentivizing new production, and ultimately getting payment going to authors.
So some of the lowest administrative fees would be the MLC or SoundExchange-- well under 10%. In the text licensing marketplace, at least for the digital publishing I'm focusing on, it's not clear why it should be any higher.
Additionally, transparency rules-- CISAC, which is sort of an umbrella organization for lots and lots of CMOs, has some professional rules for this, saying we recommend you publish an annual report. You might summarize your income. You explain what your deductions are for. EU Article 21 has a similar list of things saying, what is your general policy in distribution? What is your policy on disputes? What is your use of deductions for other services?
And I think we would like to see more transparency across all of these AI group licensing agents, which we agree make sense to pursue in this age of AI, because it's important to know what we're getting into. And, I mean, I think CCC is an example-- we've had those conversations as well, and we would like to see some of that license detail provided to our membership, more than just a rollup, because we're more interested in the particular licensing for AI than maybe some of the other services.
Similar rules apply to distribution or representation. Some of these newer vehicles are trying to serve both an end-user and a rights-holder perspective. And so we really want to be clear about, when we add an intermediary, who are they representing, and what is the value they're adding to the table, while we look forward to voluntary collective management as well as direct licenses and figure out how to make AI work in a way that is beneficial for writers, authors, and publishers, as well as technologists. Thank you.
[APPLAUSE]
[GINSBURG] Thank you very much. Before we turn to questions from you, a couple of questions for the panel. First of all, I'll go back to an earlier request. I didn't hear that much about authors. So in this licensing universe, do you see authors with a choice to authorize or not authorize the inclusion of their works in AI training data? And then, of course, the money-- how are authors going to get paid for this? So I don't know if we want to go down the table with a response to that question?
[SIDE CONVERSATION]
[HUSS-EKERHULT] Thank you. Well, I started speaking about authors because in our membership, the CMOs, as I said, they represent authors and publishers. They are also our members. And they are consulting them because this is a really important development. So I gave the example from Germany, for instance. It takes a long time because they are changing the mandate that they are offering the CMO to the authors and publishers, and they are giving them sufficient time to get back to them with feedback. So it takes several months to develop that new license, and it is taking place.
And I mentioned surveys as well-- so, for instance, our UK members, but also other members, have sent out surveys. They are looking at how their members are addressing these concerns. And, indeed, the result of these surveys is that, worldwide, authors have not actually been consulted on the works that have been ingested into large language models. And remuneration, of course, is something that needs to be discussed with them. So definitely, our members are including authors as well, yes.
I should perhaps also mention the Nordic countries. I know Johan will speak a lot about the extended collective license and how it functions later, but there are also many interesting developments in the Nordic countries. So, for instance, in Norway, there is a big project with the National Library and authors, publishers, and the CMOs together have a project and are looking into Norwegian language large language models actually. Also, others are being developed because it's not just English language large language models, but also in other parts of Europe, especially.
And the result of that is also authors want remuneration. Authors, of course, need to authorize the works and remuneration needs to be discussed and ensured if that's the case. Yeah.
Thank you.
[GINSBURG] And Johan Axhamn from the University of Lund in Sweden will be talking about the Nordic countries and extended collective licensing on the last panel.
[KAUFMAN] Sure. There we go. A bunch of different points-- first of all, it's got to go by the contract. So I'm aware-- I'd actually heard that Cambridge was writing to all their authors. My contract with Oxford says they have the rights to do exploitation in all media now known or hereafter developed. And then they have to pay me a royalty.
I won't be offended if they ask my consent to put me in a deal, but I also don't think they're required to. But if they do, they're required to pay me the royalty specified in the contract. And I've spoken to some of the publishers who have announced these large deals, and they said, of course we checked that we have the rights. And we're going to pay royalties.
Now if anyone has ever worked in publishing, between announcing a deal, getting the payment, allocating the royalties, and paying them, there's a time gap. So I wouldn't assume the worst. When I talk to people, I assume that they're going to follow the contracts.
No author should have their materials used in AI without their consent. The Books3 data set, which is just a pirated collection-- it was appalling and, as Matt said, not even necessary. Plenty of publishers have that many books. You could do one license.
One of the challenges, though, with authors is scale, because the AI companies-- and Regan touched on this-- say, I don't need that one piece of content. Particularly when they're building an LLM, they want long-form fiction, but which long-form fiction is-- I hate to say this, but I've had this conversation with publishers-- to some degree, the content is a commodity. It's how many words, as long as they're structured coherently. That's what they're looking for.
And so authors-- and this is something I've been struggling with. I have one LLM that approached us and said, I want long-form fiction, world-building. So I'm working with an organization that works with lots of authors, getting their consent to see if there is a deal to happen. That's that specific use case. That's not the license I announced, where there isn't a lot of demand for long-form fiction in a corporate license.
So you have to look at the use. But I do worry, and we are trying different things at CCC-- can we automate this? Can we aggregate it? How do you bring together enough authors so it's worth it? But any one book is not worth that much in the going rates to LLMs right now. Whether it's $100 or $1,000, that's not a lot of money for an LLM to reach out for an author. That's not a lot of money for an author to negotiate.
So there's got to be a way to do that better. And it's things that we are working on, but nothing I can tell you what we're doing exactly today.
[GINSBURG] That brings me back to the question of 20% of what because the contract may be 20% of the list price minus returns. But that is-- the reference point is going to be completely different for AI training. So 20% of what?
[KAUFMAN] You've got to ask your publisher that. I don't know. They're the ones who said it.
[SMITH] I guess to speak for a second on authors-- I think this is the big question in copyright. Is it on? OK. I think the big question in copyright is how you can make sure that the return goes to the original human creator, and that you incentivize that continued production. So right now, in news, a lot of our members are unionized, and they're having conversations via contract law, via employment, because there are employees who are journalists looking at what this means for AI licenses.
So Ziff Davis, for example-- it was publicly reported-- renegotiated what would happen in a newsroom for licensing for AI. And one way you can take stock of authors in that case is through those conversations.
I think we also should think through what is the alternative if we're putting this into a collective managed administration that either sits on top of a publisher relationship or side by side to it. And that I don't know the answer necessarily for AI licensing for text.
But I would say that we've seen some experiments, for example, neighboring rights for streaming, where the collectives, in that case, are not representative of all of the artists or the original creators, and they're not seeing the payouts go through. Instead, there's a lot of sticky membership rules, a lot of minimum thresholds like, to your point, Roy, of is any individual piece not worth a lot. If you don't hit a minimum level of use, you get nothing. You're not even getting the 20%. And 20% of what is a very good question.
And I think, similarly, I would be really interested to see, in the Nordics, if ECL has resulted in payouts to the longer tail of creators or not, because I don't know what the answer is. And then I think the last thing I would say is for time of distribution, I think that's also important when we're considering payments to individual authors. I know that, in the US, with the MLC, by regulation, we said payout needs to be in 75 days, which seemed extremely fast compared to a lot of how collectives work. But they are actually able to do that and have now paid out billions of dollars.
[STRATTON] And I can just add quickly, for us, it's primarily a question of the contract between the author and the publisher. And as a trade organization, we don't have visibility into our members' contracts. But I think, in addition, the relationship between publishers and authors is something that the publishers really value highly, and that could be a factor that comes into play in terms of approach.
[GINSBURG] OK, let me ask the panel another question, which is discussed-- or some of you discussed negotiations with the AI developers. And how does the posture of many of these AI developers, as already having your content in their training data, affect your ability to negotiate?
[SMITH] I think I've heard from individual members, and we also are looking at what NMA can do to facilitate a collective voluntary solution, and there's a real difficulty in starting at a fair level point because we don't have the information as to how many copies have been made and for what-- what's in certain data sets, the fine-tuning, the uses for grounding or retrieval-augmented generation.
And so you're sort of chasing the train after it's left the station, which makes it difficult, as well as not being in a must-have position. We've seen, "OK, I'll talk to you," but it gets a little tricky sometimes. And it might be sort of a bandwidth issue on partnership teams, but there's not a-- I mean, I think, Matt, you might have said that sometimes they're not necessarily answering the phone calls. And so that makes it difficult to really power through on certain negotiations. But I do think that the market is clearly emerging, and that this is a bump that will be overcome.
[STRATTON] Yeah, I largely agree with Regan. Licenses are happening. We'll see more of them. But they're not at the level they ought to be at as a result of these litigations and claims of fair use. They're not coming to the table to negotiate.
[KAUFMAN] Yeah, one of the things I'm seeing a lot-- and yeah, I agree with everything you guys said-- and it gets to that point of, yes, they shouldn't have trained on the Books3 data set, but because they did, now the value of that next training piece is just less. Until they get sued and have liability, it's very hard to negotiate the value.
One of the things, though-- my cause of optimism here is this: everyone who's trained on the internet has trained on the internet. Now it's less safe to train on the internet, and with the AI Act restricting, I think, effectively, certain crawling and other behaviors, they need access to the content. A lot of times, they won't take a license. They'll take access because they don't want a license with you to be used against them as evidence as a defendant. But they'll say, but I'm paying for access. It's a license-nonlicense. So just sort of very weird behaviors, but I do think there is increasing evidence of licensing, obviously.
Our license was actually cited in one of the more recent cases. There's evidence of licensing that will impact not just the litigations, but the need to license going forward and, hopefully, increase the value.
[HUSS-EKERHULT] Thank you. If I may add, so in Europe, you mentioned already the AI Act. There will be this obligation for users to have a sufficiently detailed summary of the content used, as it's called. This is in Article 53(1)d of the AI Act and in Recital 107. We will need to see what exactly the result will be.
But the key is transparency. So Regan has also mentioned transparency from the CMO perspective. And, of course, we have our code of conduct. But what about transparency from the user's perspective? And I think, there, it's also important to highlight that WIPO, the World Intellectual Property Organization, has a good practice toolkit for CMOs. And there is a specific chapter not only on transparency by the CMOs, but also transparency by users. So when they are using content, they are also bound by transparency rules and have to disclose that information, of course.
[GINSBURG] Yeah, the references to the AI Act are the recent EU regulation on AI, which covers a lot more than just copyright. But one of the very interesting aspects with respect to the transparency obligations is what one of my colleagues here at Columbia calls the Brussels effect-- that is, that the regulation, although supposedly territorially limited to the EU, is in this case explicitly not territorially limited to the EU because any non-EU business that makes these AI platforms available to users in the EU is covered by these transparency obligations.
This is not very popular over here, but it's quite an interesting development to see how that plays out. It may not be entirely coincidental that a whole bunch of technology companies sent a letter to the commission saying how the commission is overregulating and is stifling innovation in the EU. So I think the battle lines are being drawn.
Now do any of the panelists have questions for each other? Then, in that case, I think we will open it up to anybody. And so if I don't say who you are, then please do say who you are and what your affiliation, if any, is. But I'll start with Professor Balganesh.
[BALGANESH] Great. Let me start by thanking all of you for your comments. So this is a question that I guess picks up on a couple of threads that Roy talked about and Regan talked about. And one of the things that I couldn't but happen to notice is that when we're talking about text and the publishing industry, we're talking about variations in the nature of content.
And especially when it comes to Regan, we're talking about news, where a large component of what may be, let's use the word "pilfered," or mined by the AI, may not be explicitly copyrightable content-- purely factual content, for example. So let's say there's a generative AI model that comes in and takes purely factual content-- if Copilot were to do that, as opposed to taking the expression underlying it-- wouldn't there be a stronger impetus to use licensing as a mechanism of controlling it, given that you would be technically outside the copyright regime in terms of negotiating an individual license?
[SMITH] Let's scrutinize that premise, because we have, for example, the Hachette decision of a couple of weeks ago, saying that nonfiction content, when it's expression, is still at the core of copyright under the third factor for fair use. And I think our tradition of judicial doctrine shows that much news content is fully protected by copyright. So AP v. Meltwater would be an example, or should be an example. So it's true that facts belong to no one, and facts need to be for everyone.
But I think, in the case of the AI copying that we're seeing, it is a memorization of the written expression. It is taken for the particular aspects of the writing that often make it protectable by copyright. And so if someone were to really try to say, hey, but not this, and back some of it out, that might be an interesting conversation. But we're not at all at that point. Instead, it's just, well, it's the news-- of course, maybe there's no copyright. And that's not our law whatsoever.
But I do think, as Roy said, we're seeing a trend to willingness to license for access and skirt around whether they are licensing for the copyrightable content. That's not dissimilar to some of the ways, I think, some of the very large platforms, social media or search engines, they may pay you for a certain piece of thing because they don't want to admit that they're also immunizing some of the copying of expressive content that they do. And you could get creative in any sort of arrangement to make sure that all of the uses and value that's transferred from one side to the other is captured in this. But I don't think, as a starting point, it makes licensing more likely just because the work is factual or nonfiction.
[GINSBURG] I think, also, that Professor Balganesh's question highlights the what are we licensing issue. Are we licensing on the way in, when it's the entire content, which will necessarily capture the expression? And/or are we licensing on the way out, when perhaps the output has siphoned the factual content or the noncopyrightable content off of the initial source material? Hmm?
[BALGANESH] The mining.
[GINSBURG] Yeah.
[KAUFMAN] Yeah, so thank you, Professor Ginsburg, because I do think, to begin with, there's a full copy, and it is the expression. And I think where you allow the facts to come out-- there's something called a RAG model, which I think of as enhanced search, where you actually have the facts coming out.
I do just want to call attention to the forgotten AI case, which I think is Thomson Reuters v. ROSS. If anyone's following it, I'm still waiting-- we're going to get summary judgment any day, because the court was supposed to have oral arguments in late August and then said, "I don't need oral arguments. Everyone give me cross-motions for summary judgment." I'll tell you what the facts are.
And that's a case where someone, frankly, through dishonest means, downloaded all of Westlaw-- for AI purposes-- to generate, as I understand the facts, exactly what you're talking about, Shyam: the idea that you'd get the answer, the facts, but not the expression, not the case notes. But everything was copied.
So we're going to have some decisions on that before we have decisions on anything else of substance, which is why I'm nervous about making a prediction. But it gets to-- as someone who creates licenses, every license has to look at what the facts are.
So I'll give you an example. I was working with a publisher who, probably, most of you have never heard of because they're not someone-- you don't know their brand, but they own a lot of imprints, and they probably have 50,000 books, and they were negotiating with an LLM I put them in touch with.
And their number one thing is "I just don't want competing books." And the LLM is like, "I don't care. I just want to train my language model. I'm not looking for the facts in your book. And I'm not looking for being able to make competing books because none of our customers are going to want to make your types of books." So that's one use case. And that's one license. And those terms would then be in that license between those two parties.
If someone else is trying to create something that answers the questions-- and I think news is really at risk here because a lot of why you read news is for the news, not for the expression, not for the entertainment. So I think news models-- first of all, a lot of the reasons the companies want the news is for the facts. And that needs to be in the license, and that needs to be covered for, and that's what the parties are going to negotiate and pay for. And there are different value props behind that.
[SMITH] I guess I would say, in the case of AI, a lot of what we're seeing is that the value is coming and being extracted through making copies of copyrightable content. So, of course, everyone can, should, and does take facts, but it almost reminds me of the Texaco case, where, of course, Texaco as a company could use the scientific learnings. But the court said, still, you're getting this through making reproductions and looking at the copyrighted work. And that is a licensing market that we can consider in a copyright context.
And then I think, separately, as we've discussed, the news industry is also thinking about whether there are unfair competition elements ongoing as well, but we'll save that for another time.
[KAUFMAN] Yeah, just to build on that-- and I know we should stop talking about this and get to the next question-- but everyone says, "Well, how is text and data mining different from a human learning?" And I'm like, text and data mining is making many, many copies of expressive content. How is this the same? You can make a 100% reproduction.
So it very much is the facts. But you start with the copies. And this is why the other side keeps saying, "We don't make copies." And then you say, "What do you mean you don't make copies?" And they say, "Well, we only make copies up until this point, but then we stop making copies. And then that's not a copy because we vectorize them." I'm like, "No, it's still a copy!" So yeah, sorry-- high horse.
[HUSS-EKERHULT] Sorry, perhaps we should also look outside the US. And I'm thinking now of the broadest TDM exception, which is Japan's. They look at enjoyment. And even there, they say that the output is not covered by the exception. So that's also clear. It has never been contested in court, though. Thank you.
[GINSBURG] Makena?
[MAKENA JOY BINKER COSEN] Hi, thank you so much--
[GINSBURG] Makena is a third-year student at Columbia and one of the editors of the Journal of Law & the Arts.
[BINKER COSEN] So Regan, at the end of your presentation, you were showing us the things that distinguish, I guess, the intermediaries between the authors of works and the AI companies-- you were talking about speed to payout, data management, allocation methods.
If we really are moving towards an experimental system instead of one source of collective licensing, my question comes from two sides: What will distinguish these intermediaries from each other in the marketplace, both from the perspective of authors choosing whom to use as their intermediary and from the perspective of AI companies-- Roy, you talked a lot about use itself and how they might make the choice depending on use. And I'm sure that they'll also make a choice depending on who has the most works. But if we really are moving towards a marketplace of intermediaries, how do you see unique value propositions between them?
[SMITH] Yeah, I'll start. And I guess I was taking some collective management principles and also applying them to a lot of the businesses we see trying to emerge to meet the demand in the marketplace. So, definitely, also some of these companies which are not members of IFRRO but were just, like, born yesterday, and maybe one day will mature into that role-- because I don't know that they have necessarily thought through all of these aspects.
But I think a lot of those guidelines are important for both authors and rights holders. And whether you are adding a layer or whether you are replacing one, what matters on that side is: are you going to know what is being licensed? I know some of our members have been chasing me-- and I think I pinged some of your colleagues, Roy, to get a copy of the CCC license-- because once you license for training once, it's going to step on your abilities to license later.
And what are the restrictions on sublicensing? So transparency, distribution, remuneration, how digitally and technologically equipped you are to handle the financial payments and distribution of content if that's required, and your deal negotiation, I think, are all really important on the creator side of things, or the rights administration side of things.
And on the licensee side of things, you're right. Size of the repertoire, scale, price-- all of that is going to be important. And I think because AI is global and has such a massive appetite for text-- we know it's a lot, although no one thing is determinative-- there will probably be a variety of solutions, and that eventually starts to become less volatile and more known. But it makes sense to encourage these aggregated solutions, as well as some of the deals that are being reported in the press by very large actors, to see what can take hold and let the businesses on all sides figure out what works for them.
[KAUFMAN] It's like any other business decision. So will this company make more sales for me? Will they charge me more or less? Will I get paid faster or slower? All of our licenses at CCC are fully voluntary and fully nonexclusive. So someone might say, "I'll do better on my own," or "No, I'll do better with CCC," or "I'll do better on my own and with CCC because, on my own, I can offer something a little bit more."
So it's basic fundamental business decisions as to what you think the reach will be, whether you have the bandwidth to do it on your own-- for an individual author, the answer is almost always going to be no, but for a large publisher, maybe-- and then, what is the "it"? Because we have publishers in our repertory who have given us those corporate reuse rights but are doing direct deals with LLMs for access. So it's basic business decisions and seeing which of these things bloom. And, certainly, from a CCC perspective but, even more importantly, from a government relations perspective, I want to see everything. I want to see everyone licensing everywhere. That's my goal.
[GINSBURG] Josh Simmons, Kirkland & Ellis, and an alum.
[JOSH SIMMONS] So Professor Ginsburg had asked the question of 20% of what, and we started talking about "for what," and I thought I would ask: can we talk a little bit more about the nonmonetary terms of these licenses? The AI technology that we have today that people see, where there's not that much expressive content coming out of it, is not inherent to the technology. That's something the AI platforms are building in, probably to deal with fair use and other copyright infringement concerns. When they first came out, a lot more content was coming through to the outputs.
And so one of the questions I have is, when we're talking about licensing, to what extent are you building in that concern about the outputs, including the expressive material? Or is it, "If you pay us, we're fine with having more of our material come through on the output side"?
[KAUFMAN] I touched on this before. So our license has restrictions-- our corporate license. It's an internal corporate license. And that's like any license. What's the license for? What's it allow? What does it not allow?
Back to the example I was giving of that one publisher who said, "I don't want my text in the output. I'm happy to have it trained on." A lot of publishers say that. And then for other publishers, some of the deals-- particularly, I'm sure, the ones that are around RAG models-- are absolutely "you're going to have my content, and you're going to cite to it and maybe link to it." So that's another way of approaching it.
So I think each license and each thing has a cost because I think if the LLM in my first example said, well, I really want to be able to have your content out there, the publisher-- knowing that publisher-- would say, OK, but now I'm going to charge you twice as much. It's a negotiation, so what do you need?
[SMITH] I thought-- oh, go ahead.
[STRATTON] I think that issue will necessarily be something that's negotiated as a part of a license for the training data. And the publisher will just have the flexibility to negotiate the terms that they want.
[SMITH] I agree that we're seeing more restrictions on use now than we were a couple of years ago, and that is not inherent to the technology. And that's, I think, being motivated by some of the litigation more than some of the licensing deals. And so what we had seen reports of-- and, like Matt, we're just looking at what is publicly reported, not knowing what any individual member has been in discussions on.
But on the developer side, you don't want any encumbrances. You would rather have a "for this year, I can use your content for this amount of money, and it's a flat fee" arrangement. And that's a more extreme side of things, but that's where you'd want to be, to be able to go back to your developer team and say, "OK, guys, go experiment." So the parties were relatively far apart. Versus, on the publisher side, you don't want to sell your house for firewood. And so you need to build in what uses you're permitting and what uses you're not.
And I think that we're seeing that develop in a more productive manner so you get the right balance as to where there's flexibility and where you're really defining what's a typical output limit, for example.
[GINSBURG] But the question implies a certain amount of technological cooperation because I can imagine that one answer would be, once it's in there, we don't control what comes out. So how do you deal with that sort of objection?
[SMITH] I mean, you could use contract law to enforce these guardrails and mitigations after the fact as to what the technology is going to do and how it's going to be deployed for consumers and attach different service restrictions or usage restrictions or permissions, which, to me, coming from the music side, wouldn't be odd because that's how those deals all work.
I mean, that's how any platform or agreement for-- in our digital age, it usually says, "You can use my content for this or that." And even though, perhaps, the content is sitting on a server somewhere and whatever level of encoding, you know what you have the rights to use and what rights you don't have, and you just need to follow your deal.
[HUSS-EKERHULT] I think we should also speak about standards and identifiers in that regard, as a way to control what kinds of works have been ingested. And there are several identifiers in that regard. There is the ISCC, but others as well-- IFRRO is also involved, for instance, in ISNI, [INAUDIBLE]. All of those are really important in order to track the content that is being used.
[GINSBURG] Which gives rise to a different problem, which is somewhat beyond the scope of this symposium, which is the problem of stripping out that kind of identifying information. And in a number of these 30 lawsuits, there are claims that the data stripping occurs in violation of Section 1202 of the Copyright Act. But that is beyond the scope of this symposium. I wanted to point out that it is a solution or a partial solution, but it's also a big problem. Other questions? Yes, please say who you are.
[RALPH SEVUSH] Hi, my name is Ralph Sevush. I'm the general counsel for the Dramatists Guild of America. We're a member of the Authors Coalition of America, too. So first, I want to thank Anita and IFRRO. We receive revenues every year from European countries who, by statute, tax copying and pay monies over to American authors for the use of that property. We also, I think in recent years, have made a similar arrangement with CCC. So, Roy, thank you.
I have a fundamental issue, however, with a notion a few of you expressed, which is, simply, how are the writers going to get paid? Well, you're going to defer to the contract, the publishing agreement. Publishing agreements are made with writers who have no union status. There is a disproportionate power imbalance in that relationship.
So collective rights and ASCAP-type models, or market-type models, may be insufficient to handle this, since you're buying product by the pound, essentially, in these kinds of markets. It's not individual negotiation, and power is irrelevant. Stephen King's work is no more valuable to them than mine, in a sense, if they're just looking to train the data, if they just want sentences.
So how do we protect the creators of the content-- that is my basic underlying question-- when the publishing agreements themselves are not helpful in this regard because of the power imbalance, because writers can't organize?
[HUSS-EKERHULT] Well, that's more a question for individual rights management. And as I represent IFRRO and CMOs, I think my other panelists are better suited. But let me answer this a little bit-- as we're speaking about the past as well, I used to work at the World Intellectual Property Organization, WIPO.
And there are two projects at WIPO, I think, that are of interest to you. One is called WIPO for Creators, CLIP. The first part was on music, but the next will most likely be publishing. And it's a big website with information, with tools, with videos, with famous authors speaking about these issues-- so it's really important to raise awareness about that and to speak about that. That's one aspect that WIPO has been working on.
The other is a publishing toolkit. There was an author nominated by the International Authors Forum and a publisher nominated by the International Publishers Association. And I have been working with both, and they created a toolkit-- a publishing toolkit looking into contractual clauses. It's not prescriptive; rather, it asks what kinds of contractual clauses authors and publishers need to look at when they come together to negotiate contracts. It's on the WIPO website, and I'm happy to share the link later.
[KAUFMAN] I don't have an easy answer. There's a power imbalance. And I'm also not familiar enough with your publishing sector to know how it plays out or what a normal contract would look like. At the end of the day, when we're looking at AI, if you start from the premise that the rights of the creator, and of the creator's assignee or licensee, need to be respected, at least you're starting somewhere. At least you're getting, in theory, either some money or the ability to at least not have your materials taken from you.
Within that power imbalance, I don't know. But I think we've seen-- again, just from what I read in the paper-- when you look at the drama-- sorry, the Directors Guild and others like that. You don't have a union, but you are starting to see that people are aware of this in a way that, two years ago, they weren't. And it will filter into negotiations. Whether it alters the power imbalance dynamic, I don't know.
[SEVUSH] Matt?
[STRATTON] And I do think the authors' voices have been loud in this area, and it's been publicly reported in many different resources. And I think publishers will be listening to that, and it will be factored in.
[SEVUSH] Is there some way to get-- for an author who has had their work used for training an AI, is there any way, technically, for that material to be withdrawn if the author doesn't want it included? Or is it once it's in, it's in, and there's no way to get it out?
[STRATTON] There have been research articles on the topic, and my understanding is that it may be possible to remove works, but I think it could be a difficult process.
[KAUFMAN] I'm not a technology person-- layperson here. My understanding-- and this is why I get really hostile when some of the AI companies say, "Oh, we'll let you opt out." Once you've been trained on, you've been trained on. What are you going to opt out of? Maybe they'll say, "OK, that thing that isn't a copy, that's been vectorized-- I will remove that vector that isn't a copy, so don't call it a copy, but I will no longer have that." But a lot of the damage has been done. And a lot of the value has been generated.
And so I've been on record at times talking about one tech company that says, "Oh, well, we're going to let you opt out," and that's their response to-- we're not getting into Article 4 of the Directive on Copyright in the Digital Single Market, but under EU law, they're saying, "Oh, you can opt out." Opt-out is not the same as someone getting permission in advance. And don't fall for it.
So, unfortunately, I will say you're probably part of a lawsuit, whether you know it or not-- you're probably a class plaintiff. There are so many class actions. I assume I'm a class plaintiff. I don't even know in which cases. So, yeah, at least you have some legal recourse. But once it's trained, they're getting that value.
[SMITH] I just wanted to add one thing that I think we are paying attention to: when a model is improved or a company puts out a new model, whether they retrain from scratch, because that would be an opportunity to change the material on which something is trained and not have to deal with an opt-out situation. So I know one of the larger AI developers was just in a new training phase over the summer, and some of the news publishers were reporting that their websites were almost degrading in their ability to serve readers-- human readers-- because the bots and crawlers were so intense, because the theory was that they were starting from scratch again.
[BALGANESH] I just wanted to respond to that last thing. So as part of this project, we've been talking to some computer scientists and engineers specifically about trying to figure out whether you can eliminate things. So I think you're absolutely right that forgetting is not the same thing as going in and hitting a Delete button.
But there is this emerging science within the AI world called machine unlearning that is evolving and is, in 2024, an area where they're trying to figure out how to develop a cost-effective mechanism of retraining the model-- in recognition of the fact that the concern is not just about copyright. One of the early concerns that triggered unlearning was privacy: a lot of personal information that the machine was trained on and was doxing in its output. They had to figure out whether you could untrain that in a way that didn't rely on it in any way, shape, or form, even to suggest signals. So machine unlearning is the next phase in this development that I think could be the kind of solution we're looking for.
[GINSBURG] We have time for one more question.
Yes, David Lightman, an alum.
[DAVID LIGHTMAN] Yes, and former editor of the Journal of Law and the Arts. So we have a new editor and all that.
[GINSBURG] And former playwright, right? Member of the Dramatists Guild?
[LIGHTMAN] Correct, yes. Can you speak a little bit more to the fair use question? We now have this kind of tension between the Oracle case-- which says it's OK if it's to create a new product-- versus the Warhol analysis, requiring maybe a much higher degree of justification than we had before. Can the panelists speak to that tension a little bit?
[GINSBURG] In one minute.
[KAUFMAN] I've written a bit in The Scholarly Kitchen, which is a site for science and academic publishing-- I've written quite a bit about this. Look, fair use is a continuum. You look at Texaco or the Kinko's case-- I think of these as the fundamental CCC cases-- where a use would be fair if this person does it, but the exact same act for the exact same purpose by Kinko's is an infringement. So all I'm going to say is, I meet with a lot of government people, and one side comes in and says everything's fair use. And another side comes in-- it's closer to my side, probably-- and says nothing is fair use. I would never say that.
You've got to look at the market factors-- back to why I said I want to see every license. It's great if it's through CCC, but it's also great if it isn't, because, at the end of the day, what is the fourth fair use factor going to look at? Market harm. And you don't need to have a license to have market harm. Everyone remembers the Google case, but no one remembers the Fox case that came after, from the same court, saying, yes, it's transformative, and no, there isn't a license, but it's still an infringement.
All licensing will have a bearing on the results of lawsuits-- I would say the horse and cart are a little bit flipped there. But I do think facts were created on the ground. I'm always asked, "Well, they said they couldn't have gotten a license." And I'm like, yeah, because they didn't ask anyone for a license, and we didn't know what they were doing. So of course there's no AI license when there's no AI.
Then you come out and say, "Well, there was no license for this." Well, if you had actually sat down with some of us and said, "This is what I want to do," we would just have said, "OK, what do you need? What are you looking for? Which content are you looking for? Let us talk to the rights holders. Let's build a license." Most licenses CCC creates don't get created because the rights holders want a license out; they get created because there's a business use case that someone needs a license for. It's much easier to find sellers than it is to find buyers.
So all of this licensing will have an impact on fair use. But, at the end of the day, fair use, as Shyam said, is not the answer. It doesn't scale, because just because it's fair use for this academic institution to do something doesn't mean it's fair use for this corporation. And someone answer me: is OpenAI a nonprofit or a $50 billion startup? The answer is it's both. Is that fair use or not fair use? I don't know. So all of these things are just going to factor in.
And then I've got Anita here-- it doesn't matter; there's no fair use in the UK. There's no commercial text and data mining exception in the UK. Look at the Getty case there, where the court found a high likelihood of success on claims about training AI on Getty Images. Look at the EU, where fair use doesn't apply and Article 4 says you can do commercial text and data mining unless the rights holder has reserved the rights. So we've got to look at this a little bit globally. But back to your question on fair use-- yeah, this impacts it. But how, in each case, is going to be a little bit different.
[GINSBURG] Just a last word here on fair use-- which I said we were actually not really going to get into-- but after Warhol, the existence of a licensing market, I think, will be the principal issue. And fair use tends to depress licensing markets. So we'll see how that works out going forward.
So are we having a coffee break now, Caite?
[MCGRAIL] Yeah, 15 minutes.
[GINSBURG] So first of all, I'd like to thank the panel very much.
[APPLAUSE]
And there is coffee and goodies in the back of the room. So mill around here, and we will resume in 15 minutes. That's correct? OK.