COPYRIGHT – Page 3

The New York Times and a group of writers sue OpenAI and Microsoft for copyright infringement, consisting in the reproduction of articles (or of their books) to train their artificial intelligence and in their inclusion in the output

– I –

On 27 December the NYT reports that it has sued over the plundering of its articles and materials to train ChatGPT and other AI systems, and over their use in the output generated from users' prompts.

It also provides the link to the complaint, filed with the Southern District of New York on 27 December 2023, Case 1:23-cv-11195.

What is interesting here is the description of how generative AI and its training work, as well as the history of OpenAI, which, contrary to its beginnings (open only strategically back then, one is tempted to say), is no longer open: §§ 55 ff.; see §§ 75 ff.

The infringing conduct (with many real examples, in the form of screenshots, of the tests run by the plaintiff, often in colour; indeed, another article states that Exhibit J contains 100 examples, and the same site, in a further article, offers the direct link to this Exhibit J) is:

– Unauthorized Reproduction of Times Works During GPT Model Training, § 83 ff.

– Embodiment of Unauthorized Reproductions and Derivatives of Times Works in GPT Models, § 98 ff.

– Unauthorized Public Display of Times Works in GPT Product Outputs, § 102 ff.

– Unauthorized Retrieval and Dissemination of Current News, § 108 ff.

See now on YouTube the interesting line-by-line analysis of the complaint by Giovanni Ziccardi.

– II –

News then arrives of a similar lawsuit (this time, however, as a class action) brought by US writers. See the complaint filed on 19 December 2023 in the Southern District of New York by Alter, Bird, Branch and others against more or less the same defendants. The training datasets are taken from Common Crawl, WebText, Books1 and Books2, Wikipedia, etc., § 72 (as OpenAI itself says).

The alleged manner of infringement:

<<90. Defendants used works authored and owned by Plaintiffs in the training of their GPT models, and in doing so reproduced these works and commercially exploited them without a license.
91. While OpenAI and Microsoft have kept the contents of their training data secret, it is likely that, in training their GPT models, they reproduced all or nearly all commercially successful nonfiction books. As OpenAI investor Andreesen Horowitz has admitted, “large language models,” like Defendants’ GPT models, “are trained on something approaching the entire corpus of the written word,” a corpus that would of course include Plaintiffs’ works.
92. The size of the Books2 database—the “internet based books corpora” that
Defendants used to train GPT-3, GPT-3.5, and possibly GPT-4 as well—has led commentators to believe that Books2 is comprised of books scraped from entire pirated online libraries such as LibGen, ZLibrary, or Bibliotik. Shawn Presser, an independent software developer, created an open-source set of training data called Books3, which was intended to give developers, in his words, “OpenAI-grade training data.” The Books3 dataset, similar in size to Books2, was built
from a corpus of pirated copies of books available on the site Bibliotik. Works authored and owned by Plaintiffs Alter, Bird, Branch, Cohen, Linden, Okrent, Sancton, Sides, Schiff, Shapiro, Tolentino, and Winchester are available on Books3, an indication that these works were also likely included in the similarly sized Books2>>.

We shall see the outcome (perhaps as early as the defendants' answer, let us hope).

– III –

“Chat GPT Is Eating the World” publishes a useful list of the lawsuits pending in the USA that assert copyright against use in AI (there are 15, almost all class actions).

There you will also find the case file of the above-mentioned New York Times v. Microsoft-OpenAI (see DOCKET, direct link here, and here, in the various Exhibits, the list of the enormous quantity of articles copied).

– IV –

It remains to be seen, however, whether training LARGE LANGUAGE MODELS on protected material really entails a “reproduction” from a technical/computing standpoint: or rather, whether technically a phenomenon occurs that can legally be classified as “reproduction”. Kevin Bryan on X says no; Lemley and Casey likewise argue for lawfulness on policy grounds. But given the rule in force, it must be ascertained whether or not there is reproduction: if there is, any creative elaboration (whether it occurs, and how its creativity should be assessed, remains to be seen) cannot dispense with the consent of the owners of the reproduced works.

That these AI systems need access to mostly protected materials is understandable: OpenAI says so (see Dan Milmo, 8 Jan. 2023, in the Guardian). But that does not help resolve this technical-legal doubt.

The three-dimensional trade mark consisting of a design with artistic value: the Supreme Court rules on (the umpteenth dispute in) the Piaggio v. ZHEJIANG ZHONGNENG INDUSTRY GROUP case

Cass., Sec. 1, 28 November 2023, no. 33100, rapporteur Ioffrida, addresses three important questions in one of the most interesting IP disputes of recent years:

This is the Piaggio trade mark, consisting of the front of the scooter:

“THREE-DIMENSIONAL. THE MARK CONSISTS OF THE THREE-DIMENSIONAL REPRESENTATION OF A SCOOTER. THE REPRESENTATION IS PROVIDED IN 5 ORTHOGONAL VIEWS AND 1 PERSPECTIVE VIEW, AS PER THE ATTACHMENT” (from the file)

See here the file in TMview.

Three, as noted, are the questions addressed:

1) whether or not the shape has substantial value, such as to render the mark invalid under art. 9.1.c) of the Industrial Property Code (cpi);

2) the relationship between that assessment and the possible artistic character of the object under art. 2 no. 10 of the Copyright Act (l. aut.);

3) the identification of the sign on which to carry out the infringement assessment, given that the front has been slightly altered several times over the years.

On 1), the Supreme Court (§ 3.6) accepts the Chinese manufacturer's argument: substantial value does not mean “decisive” or “predominant”; it is enough that it contributes in some way to the purchasing decision.

This seems to us a mistaken view: the term “substantial” means far more than the Supreme Court asserts.

The European case law is irrelevant, since it cannot override the crystal-clear semantic scope of the word.

On 2): given the substantial overlap between substantial value (art. 9.c cpi) and artistic value (art. 2.10 l. aut.), <<it follows that the recognition of artistic value in the shape of a product as a work of design, for the purposes of protection under the Copyright Act, since the (Omissis) has indeed become, as a result of numerous acknowledgements by the artistic, not merely industrial, milieu (such as also the countless appearances in “films, advertisements, photographs, which have a legend as their protagonist”, p. 29 of the judgment under appeal), “an iconic symbol of Italian custom and artistic design”, means, as a rule, that the same shape gives the product that “substantial value” which precludes registration of the shape as a trade mark>> (§ 3.9).

The assessment is hasty, since the consumers/users differ in the two cases.

Moreover, the time factor is not considered: before the shape sign became “iconic”, the mark may have been validly filed.

On 3): this is the most complex point, both in theory and on the facts. The Supreme Court finds a single sign on which to carry out the infringement assessment (what would count is an abstract, ahistorical conception of it), even though the sign varied (modestly) over the years (§ 4.3). It is a difficult question, which also calls for an approach grounded in aesthetic theory, and on which I take no position. Certainly, rejecting the Court's view would raise very significant practical problems, in light of the gradual but constant evolution of the front over time.

It therefore remands the case to a different section of the Turin Court of Appeal.

Finally, as regards drafting technique, it is a serious matter that our courts persist in not including a complete reproduction (here there is not even a minimal one) of the signs or products at issue: this defeats the rationale for publishing the judgment, which is to allow public and democratic scrutiny of it.

This is a gap that ought to be filled.

The English Court of Appeal on the originality, as an artistic work, of a graphical user interface

Copyright is claimed in the underlying graphic work (GUIs: graphical user interfaces), created by using software:

The Court of Appeal, 20.11.2023, [2023] EWCA Civ 1354 – Case No: CA-2023-000920, THJ SYSTEMS LIMITED – OPTIONNET LLP (Claimants) v. DANIEL SHERIDAN – SHERIDAN OPTIONS MENTORING CORPORATION, finds originality.

It does so, however, not under the traditional English “skill and labour” approach, as the judge at first instance had done: << I am satisfied that the work of creating the look and functionality of interface including the arrangements of the tables and graphs did involve the exercise of sufficient skill and labour for the result to amount to an artistic work>>. § 21

It does so instead under the EU law concept, developed by the Court of Justice in the 2009 Infopaq judgment (<< “… original in the sense that it is its author’s own intellectual creation”>>, § 15):

<<23 In my judgment the Defendants are right that the judge did not apply the correct test, which I have set out in paragraph 16 above. This is not because of his reference to “functionality” in [214], which appears to be a slip of the pen having regard to what he went on to say in the last sentence of [215]. It is because the test he applied was that of “skill and labour”, which was the test applied by the English courts prior to Infopaq, including in Navitaire Inc v easyJet Airline Co Ltd [2004] EWHC 1725 (Ch), [2006] RPC 3 and Nova Productions Ltd v Mazooma Games Ltd [2006] EWHC 24 (Ch), [2006] RPC 14, and not the test of “author’s own intellectual creation” laid down by the Court of Justice. As can be seen from cases such as Football Dataco and Funke Medien, these two tests are not the same, and the European test is more demanding; although Painer establishes that even a simple portrait photograph may satisfy it in an appropriate case. In fairness to the judge, I should make it clear that he was not referred to any of the relevant case law on this question (although the Defendants cited BSA, they did so in relation to a different issue).

It follows that this Court must re-assess the originality of the R & P Charts applying the correct test. Before turning to consider the evidence, it is important to make five points. First, the test is an objective one. Secondly, the test is not one of artistic merit: section 4(1)(a) of the 1988 Act expressly provides that graphic works qualify as artistic works “irrespective of artistic quality”, and nothing in the case law of the CJEU suggests otherwise. Thirdly, the burden of proof lies on the Claimants. Fourthly, particularly given that we are concerned with graphic works, a key item of evidence is the works themselves. Fifthly, as counsel for the Defendants rightly emphasised, the functionality of the Software is irrelevant to this question. The enquiry concerns the visual appearance of the R & P Charts. Given the informative purpose of the R & P Charts, the visual appearance is primarily a matter of the layout of the R & P Charts.

It can be seen from the example of the R & P Charts reproduced above, particularly when enlarged, that the various component parts of the image have been laid out with some care. Mr Mitchell has designed the display so as to cram quite a large amount of information into a single screen. Moreover, he has made choices as to what to put where, including such matters as which commands to put into the ribbon and in what order. He also selected what fonts and colours to use.

When one turns to Mr Mitchell’s evidence, his statement that “the look and feel of it is my brainchild” was not challenged. Nor were his statements that “[e]verything is original” and “everything on there is my design” because, although he had sourced components from a library, he had put them “into various locations”. The cross-examiner used the analogy of building something from Lego bricks, and in my view the analogy is a good one. As the Court of Justice held in BSA at [48], “the national court must take account, inter alia, of the specific arrangement or configuration of all the components which form part of the graphic user interface”. Mr Mitchell did not enlarge upon the choices he had made, but he was not asked about this. Nor was it put to Mr Mitchell that the visual appearance of the R & P Charts was dictated by technical considerations, rules or other constraints which left no room for creative freedom. Nor did the Defendants adduce any evidence to contradict Mr Mitchell’s evidence, such as similar graphical user interfaces produced by third parties. As the judge observed, the evidence was limited, but nevertheless it was all one way.

It is plain that the degree of visual creativity which went into the R & P Charts was low. But that does not mean that there was no creativity at all. The consequence of the low degree of creativity is that the scope of protection conferred by copyright in the R & P Charts is correspondingly narrow, so that only a close copy would infringe: see Infopaq at [45]-[48]. (It is sometimes suggested that Painer at [95]-[98] is authority to the contrary, but all that passage establishes is that the protection conferred by copyright on portrait photographs as a category is not inferior to that enjoyed by other categories of works, including other kinds of photographs.) It does not mean that the R & P Charts are not protected by copyright at all, which would have the consequence that even an identical copy would not infringe.

I therefore conclude that, even though the judge applied the wrong test, he was correct to find that the R & P Charts were original. I would therefore dismiss the Defendants’ appeal, save that I would restrict the declaration made by the judge to the R & P Charts>>.

(news and link to Bailii from Jeremy Blum and Toby Headdon in Kluwer Copyright law).

The broadcasting organisation is entitled to compensation for private copies (art. 5.2.b) dir. 2001/29)

Court of Justice, 23.11.2023, Seven.One Entertainment v. Corint, C-260/22, sheds light (not much, in truth) on the provision in question, in relation to its German national transposition, which excludes broadcasters, as regards their broadcasts, from the right to compensation through a collecting society.

The Court's solution is a foregone conclusion, given the literal wording of dir. 2001/29. Plainly irrelevant, moreover, is the fact that certain TV organisations are also film producers and thereby accrue the related claim, as the Court of Justice rightly observes.

Less simple is its implementation, and in particular whether the State may exempt certain rightholders on the basis of non-existent or minimal harm: or rather, since it may, when that specific factual situation arises. Equal treatment is also required, in the sense that it remains to be seen whether the State may exempt everyone or only case by case (and then on the basis of which objective criteria).

But on all of this only the national court can decide, the Court of Justice concludes.

Does Facebook's artificial intelligence infringe the adaptation right in the literary works used?

Large Language Model Meta AI (LLaMA) (see its description on Meta's website) does not infringe the right to prepare derivative works from the literary works used to create such models, says the District Court for the Northern District of California, Case No. 23-cv-03417-VC, 20 November 2023, Kadrey v. Meta.

Neither in the building of the models themselves nor in the output generated by their use:

<<1. The plaintiffs allege that the “LLaMA language models are themselves infringing
derivative works” because the “models cannot function without the expressive information
extracted” from the plaintiffs’ books. This is nonsensical. A derivative work is “a work based
upon one or more preexisting works” in any “form in which a work may be recast, transformed,
or adapted.” 17 U.S.C. § 101. There is no way to understand the LLaMA models themselves as a
recasting or adaptation of any of the plaintiffs’ books.

[rather, there is no proof: it cannot be said to be impossible in the abstract]
2. Another theory is that “every output of the LLaMA language models is an infringing
derivative work,” and that because third-party users initiate queries of LLaMA, “every output
from the LLaMA language models constitutes an act of vicarious copyright infringement.” But
the complaint offers no allegation of the contents of any output, let alone of one that could be understood as recasting, transforming, or adapting the plaintiffs’ books. Without any plausible
allegation of an infringing output, there can be no vicarious infringement. See Perfect 10, Inc. v.
Amazon.com, Inc., 508 F.3d 1146, 1169 (9th Cir. 2007).
The plaintiffs are wrong to say that, because their books were duplicated in full as part of
the LLaMA training process, they do not need to allege any similarity between LLaMA outputs
and their books to maintain a claim based on derivative infringement. To prevail on a theory that
LLaMA’s outputs constitute derivative infringement, the plaintiffs would indeed need to allege
and ultimately prove that the outputs “incorporate in some form a portion of” the plaintiffs’
books. Litchfield v. Spielberg, 736 F.2d 1352, 1357 (9th Cir. 1984); see also Andersen v.
Stability AI Ltd., No. 23-CV-00201-WHO, 2023 WL 7132064, at *7-8 (N.D. Cal. Oct. 30, 2023)
(“[T]he alleged infringer’s derivative work must still bear some similarity to the original work or
contain the protected elements of the original work.”); 2 Melville B. Nimmer & David Nimmer,
Nimmer on Copyright § 8.09 (Matthew Bender Rev. Ed. 2023) (“Unless enough of the pre-
existing work is contained in the later work to constitute the latter an infringement of the former,
the latter, by definition, is not a derivative work.”); 1 Melville B. Nimmer & David Nimmer,
Nimmer on Copyright § 3.01 (Matthew Bender Rev. Ed. 2023) (“A work is not derivative unless
it has substantially copied from a prior work.” (emphasis omitted)). The plaintiffs cite Range
Road Music, Inc. v. East Coast Foods, Inc., 668 F.3d 1148 (9th Cir. 2012), but that case is not
applicable here. In Range Road, the infringement was the public performance of copyrighted
songs at a bar. Id. at 1151-52. The plaintiffs presented evidence (namely, the testimony of
someone they sent to the bar) that the songs performed were, in fact, the protected songs. Id. at
1151-53. The defendants presented no evidence of their own that the protected songs were not
performed. Nor did they present evidence that the performed songs were different in any
meaningful way from the protected songs. Id. at 1154. The Ninth Circuit held that, under these
circumstances, summary judgment for the plaintiffs was appropriate. And the Court rejected the
defendants’ contention that the plaintiffs, under these circumstances, were also required to
present evidence that the performed songs were “substantially similar” to the protected songs.
That contention made no sense, because the plaintiffs had already offered unrebutted evidence
that the songs performed at the bar were the protected songs. Id. at 1154. Of course, if the
defendants had presented evidence at summary judgment that the songs performed at the bar
were meaningfully different from the protected songs, then there would have been a dispute over
whether the performances were infringing, and the case would have needed to go to trial. At that
trial, the plaintiffs would have needed to prove that the performed songs (or portions of the
performed songs) were “substantially similar” to the protected songs. That’s the same thing the
plaintiffs would need to do here with respect to the content of LLaMA’s outputs. To the extent
that they are not contending LLaMa spits out actual copies of their protected works, they would
need to prove that the outputs (or portions of the outputs) are similar enough to the plaintiffs’
books to be infringing derivative works. And because the plaintiffs would ultimately need to
prove this, they must adequately allege it at the pleading stage>>

[here too, proof is lacking]

A rather skimpy statement of reasons, in truth.

(news and link from Eric Goldman's blog)

Choreography v. “emotes” in copyright: the appeal in Hanagami v. Epic Games

On 05.09.2022 I reported on Central District of California, 24 August 2022, case 2:22-cv-02063-SVW-MRW, which had rejected the claim for protection against use in the game Fortnite.

Now comes the appeal decision (in 14 months!!!), which reverses the first-instance judgment.

The judgment is of some importance for understanding how copyright protects choreographic works.

From the opening summary:

<<Games, Inc., the creator of the videogame Fortnite, infringed
the copyright of a choreographic work when the company
created and sold a virtual animation, known as an “emote,”
depicting portions of the registered choreography.
The panel held that, under the “extrinsic test” for
assessing substantial similarity, Hanagami plausibly alleged
that his choreography and Epic’s emote shared substantial
similarities. The panel held that, like other forms of
copyrightable material such as music, choreography is
composed of various elements that are unprotectable when
viewed in isolation. What is protectable is the
choreographer’s selection and arrangement of the work’s
otherwise unprotectable elements. The panel held that
“poses” are not the only relevant element, and a
choreographic work also may include body position, body
shape, body actions, transitions, use of space, timing, pauses,
energy, canon, motif, contrast, and repetition. The panel
concluded that Hanagami plausibly alleged that the creative
choices he made in selecting and arranging elements of the
choreography—the movement of the limbs, movement of the hands and fingers, head and shoulder movement, and tempo—were substantially similar to the choices Epic made in creating the emote.
The panel held that the district court also erred in dismissing Hanagami’s claim on the ground that the allegedly copied choreography was “short” and a “small component” of Hanagami’s overall work. The panel declined to address the issue whether the work was entitled to broad or only thin copyright protection>>.

(news and link from Eric Goldman)

More on AI, data scraping and copyright infringement (this time mostly denied)

The District Court for the Northern District of California, 30 October 2023, Case 3:23-cv-00201-WHO, Andersen v. Stability AI, DeviantArt, Midjourney, examines the topic (pointer and link from Jess Miers on X).

The claims are all dismissed, with leave to amend, except the direct infringement claim against Stability:

<<3. Direct Infringement Allegations Against Stability Plaintiffs’ primary theory of direct copyright infringement is based on Stability’s creation and use of “Training Images” scraped from the internet into the LAION datasets and then used to train Stable Diffusion. Plaintiffs have adequately alleged direct infringement based on the allegations that Stability “downloaded or otherwise acquired copies of billions of copyrighted images without permission to create Stable Diffusion,” and used those images (called “Training Images”) to train Stable Diffusion and caused those “images to be stored at and incorporated into Stable Diffusion as compressed copies.” Compl. ¶¶ 3-4, 25-26, 57. In its “Preliminary Statement” in support of its motion to dismiss, Stability opposes the truth of plaintiffs’ assertions. See Stability Motion to Dismiss (Dkt. No. 58) at 1. However, even Stability recognizes that determination of the truth of these allegations – whether copying in violation of the Copyright Act occurred in the context of training Stable Diffusion or occurs when Stable Diffusion is run – cannot be resolved at this juncture. Id. Stability does not otherwise oppose the sufficiency of the allegations supporting Anderson’s direct copyright infringement claims with respect to the Training Images>>.

An interesting decision for those working on the subject, given that none has yet been seen here in Italy.

Copyright and standards

The Court of Appeals for the District of Columbia Circuit, 12.09.2023, No. 22-7063, AMERICAN SOCIETY FOR TESTING AND MATERIALS, ET AL v. PUBLIC.RESOURCE.ORG, INC., offers some interesting guidance on the subject (here the court's page and here the direct link to the pdf).

Three organisations that develop standards for certain business sectors sue public.resource.org for having published hundreds of standards, which allegedly infringes the copyright in them.

Most of these had also been incorporated into US legislation.

The Court of Appeals says that this publication by https://public.resource.org/ constitutes fair use (for the incorporated part).

The first three factors of 17 U.S. Code § 107 favour the defendant.

The last one (economic effects on the market for the protected work) is instead uncertain, but that is not enough to outweigh the other three.

<<In ASTM II, we noted that Public Resource’s copying may harm the market for the plaintiffs’ standards, but we found the extent of any such harm to be unclear. 896 F.3d at 453. We noted three considerations that might reduce the amount of harm: First, the plaintiffs themselves make the incorporated standards available for free in their reading rooms. Second, Public Resource may not copy unincorporated standards—or unincorporated portions of standards only partially incorporated. Third, the plaintiffs have developed and copyrighted updated versions of the relevant standards, and these updated versions have not yet been incorporated into law. We asked the parties to address these issues, among others, on remand. See id.
The updated record remains equivocal. The plaintiffs press heavily on what seems to be a common-sense inference: If users can download an identical copy of an incorporated standard for free, few will pay to buy the standard. Despite its intuitive appeal, this argument overlooks the fact that the plaintiffs regularly update their standards—including all 185 standards at issue in this appeal. And regulators apparently are much less nimble in updating the incorporations. So, many of the builders, engineers, and other regular consumers of the plaintiffs’ standards may simply purchase up-to-date versions as a matter of course. Moreover, some evidence casts doubt on the plaintiffs’ claims of significant market injury. Public Resource has been posting incorporated standards for fifteen years. Yet the plaintiffs have been unable to produce any economic analysis showing that Public Resource’s activity has harmed any relevant market for their standards. To the contrary, ASTM’s sales have increased over that time; NFPA’s sales have decreased in recent years but are cyclical with publications; and ASHRAE has not pointed to any evidence of its harm. See ASTM III, 597 F. Supp. 3d at 240.
The plaintiffs’ primary evidence of harm is an expert report opining that Public Resource’s activities could put the plaintiffs’ revenues at risk. Yet although the report qualitatively describes harms the plaintiffs could suffer, it makes no serious attempt to quantify past or future harms. Like the district court, we find it “telling” that the plaintiffs “do not provide any quantifiable evidence, and instead rely on conclusory assertions and speculation long after [Public Resource] first began posting the standards.” ASTM III, 597 F. Supp. 3d at 240.
Finally, our analysis of market effects must balance any monetary losses to the copyright holders against any “public benefits” of the copying. Oracle, 141 S. Ct. at 1206. Thus, even if Public Resource’s postings were likely to lower demand for the plaintiffs’ standards, we would also have to consider the substantial public benefits of free and easy access to the law. As the Supreme Court recently confirmed: “Every citizen is presumed to know the law, and it needs no argument to show that all should have free access” to it. Georgia v. Public.Resource.Org., Inc., 140 S. Ct. 1498, 1507 (2020) (cleaned up)>>.

Summary on the fourth factor:

<<We conclude that the fourth fair-use factor does not significantly tip the balance one way or the other. Common sense suggests that free online access to many of the plaintiffs’ standards would tamp down the demand for their works. But there are reasons to doubt this claim, the record evidence does not strongly support it, and the countervailing public benefits are substantial.>>

Overall summary: <<In sum, the first three factors under section 107 strongly favor fair use, and the fourth is equivocal. We thus conclude that Public Resource’s non-commercial posting of incorporated standards is fair use>>

Another action against an AI company, based on copyright: Concord Music, Universal Music and others v. Anthropic PBC

Through the AI model called Claude2, Anthropic allegedly infringes the copyright in many songs (in their lyrics). So says the complaint brought by many music publishers (among the largest in the world, it would seem).

The Verge reports it today, 19 October (article by Emilia David), where you will also find the link to the complaint.

I reproduce only the passages on how the training and output of Claude2 work, and then on where the infringement lies.

<<6. Anthropic is in the business of developing, operating, selling, and licensing AI technologies. Its primary product is a series of AI models referred to as “Claude.” Anthropic builds its AI models by scraping and ingesting massive amounts of text from the internet and potentially other sources, and then using that vast corpus to train its AI models and generate output based on this copied text. Included in the text that Anthropic copies to fuel its AI models are the lyrics to innumerable musical compositions for which Publishers own or control the copyrights, among countless other copyrighted works harvested from the internet. This copyrighted material is not free for the taking simply because it can be found on the internet. Anthropic has neither sought nor secured Publishers’ permission to use their valuable copyrighted works in this way. Just as Anthropic does not want its code taken without its authorization, neither do music publishers or any other copyright owners want their works to be exploited without permission.
7. Anthropic claims to be different from other AI businesses. It calls itself an AI “safety and research” company, and it claims that, by training its AI models using a so-called “constitution,” it ensures that those programs are more “helpful, honest, and harmless.” Yet, despite its purportedly principled approach, Anthropic infringes on copyrights without regard for the law or respect for the creative community whose contributions are the backbone of Anthropic’s infringing service.
8. As a result of Anthropic’s mass copying and ingestion of Publishers’ song lyrics, Anthropic’s AI models generate identical or nearly identical copies of those lyrics, in clear violation of Publishers’ copyrights. When a user prompts Anthropic’s Claude AI chatbot to provide the lyrics to songs such as “A Change Is Gonna Come,” “God Only Knows,” “What a Wonderful World,” “Gimme Shelter,” “American Pie,” “Sweet Home Alabama,” “Every Breath You Take,” “Life Is a Highway,” “Somewhere Only We Know,” “Halo,” “Moves Like Jagger,” “Uptown Funk,” or any other number of Publishers’ musical compositions, the chatbot will provide responses that contain all or significant portions of those lyrics>>.

<<11. By copying and exploiting Publishers’ lyrics in this manner—both as the input it uses to train its AI models and as the output those AI models generate—Anthropic directly infringes Publishers’ exclusive rights as copyright holders, including the rights of reproduction, preparation of derivative works, distribution, and public display. In addition, because Anthropic unlawfully enables, encourages, and profits from massive copyright infringement by its users, it is secondarily liable for the infringing acts of its users under well-established theories of contributory infringement and vicarious infringement. Moreover, Anthropic’s AI output often omits critical copyright management information regarding these works, in further violation of Publishers’ rights; in this respect, the composers of the song lyrics frequently do not get recognition for being the creators of the works that are being distributed. It is unfathomable for Anthropic to treat itself as exempt from the ethical and legal rules it purports to embrace>>

How AI training works:

<<54. Specifically, Anthropic “trains” its Claude AI models how to generate text by taking the following steps:
a. First, Anthropic copies massive amounts of text from the internet and potentially other sources. Anthropic collects this material by “scraping” (or copying or downloading) the text directly from websites and other digital sources and onto Anthropic’s servers, using automated tools, such as bots and web crawlers, and/or by working from collections prepared by third parties, which in turn may have been harvested through web scraping. This vast collection of text forms the input, or “corpus,” upon which the Claude AI model is then trained.
b. Second, as it deems fit, Anthropic “cleans” the copied text to remove material it perceives as inconsistent with its business model, whether technical or subjective in nature (such as deduplication or removal of offensive language), or for other reasons. In most instances, this “cleaning” process appears to entirely ignore copyright infringements embodied in the copied text.
c. Third, Anthropic copies this massive corpus of previously copied text into computer memory and processes this data in multiple ways to train the Claude AI models, or establish the values of billions of parameters that form the model. That includes copying, dividing, and converting the collected text into units known as “tokens,” which are words or parts of words and punctuation, for storage. This process is referred to as “encoding” the text into tokens. For Claude, the average token is about 3.5 characters long.
d. Fourth, Anthropic processes the data further as it “finetunes” the Claude AI model and engages in additional “reinforcement learning,” based both on human feedback and AI feedback, all of which may require additional copying of the collected text.
55. Once this input and training process is complete, Anthropic’s Claude AI models generate output consistent in structure and style with both the text in their training corpora and the reinforcement feedback. When given a prompt, Claude will formulate a response based on its model, which is a product of its pretraining on a large corpus of text and finetuning, including based on reinforcement learning from human feedback. According to Anthropic, “Claude is not a bare language model; it has already been fine-tuned to be a helpful assistant.” Claude works with text in the form of tokens during this processing, but the output is ordinary readable text>>.
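
To make steps b and c of the quoted description more concrete, here is a minimal sketch of "cleaning" (limited here to deduplication) and of "encoding" text into tokens. It is only a toy illustration under stated assumptions: the function names (`clean`, `tokenize`, `encode`) and the sample corpus are hypothetical, and the tokenizer splits on whole words and punctuation, whereas production models typically use sub-word tokenizers. Nothing here reflects Anthropic's actual code.

```python
import re


def clean(documents):
    """Step b (toy version): drop empty texts and exact duplicates.

    Duplicates are detected after collapsing whitespace and lowercasing.
    """
    seen = set()
    cleaned = []
    for doc in documents:
        normalised = " ".join(doc.split())
        key = normalised.lower()
        if normalised and key not in seen:
            seen.add(key)
            cleaned.append(normalised)
    return cleaned


def tokenize(text):
    """Step c (toy version): split text into word and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


def encode(tokens, vocab):
    """Map each token to an integer id, growing the vocabulary as needed."""
    return [vocab.setdefault(tok, len(vocab)) for tok in tokens]


# Hypothetical three-document corpus; the second entry duplicates the first.
corpus = [
    "The quick brown fox jumps.",
    "the quick  brown fox jumps.",
    "A lazy dog sleeps.",
]

docs = clean(corpus)
vocab = {}
encoded = [encode(tokenize(d), vocab) for d in docs]
```

Note that every stage holds its own copy of the text (the raw list, the cleaned list, the token lists), which is the point the complaint makes about repeated copying during ingestion and training.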

Violations:

<<56. First, Anthropic engages in the wholesale copying of Publishers’ copyrighted lyrics as part of the initial data ingestion process to formulate the training data used to program its AI models.
57. Anthropic fuels its AI models with enormous collections of text harvested from the internet. But just because something may be available on the internet does not mean it is free for Anthropic to exploit to its own ends.
58. For instance, the text corpus upon which Anthropic trained its Claude AI models and upon which these models rely to generate text includes vast amounts of Publishers’ copyrighted lyrics, for which they own or control the exclusive rights.
59. Anthropic largely conceals the specific sources of the text it uses to train its AI models. Anthropic has stated only that “Claude models are trained on a proprietary mix of publicly available information from the Internet, datasets that we license from third party businesses, and data that our users affirmatively share or that crowd workers provide,” and that the text on which Claude 2 was trained continues through early 2023 and is 90 percent English-language. The reason that Anthropic refuses to disclose the materials it has used for training Claude is because it is aware that it is copying copyrighted materials without authorization from the copyright owners.
60. Anthropic’s limited disclosures make clear that it has relied heavily on datasets (e.g., the “Common Crawl” dataset) that include massive amounts of content from popular lyrics websites such as genius.com, lyrics.com, and azlyrics.com, among other standard large text collections, to train its AI models.
61. Moreover, the fact that Anthropic’s AI models respond to user prompts by generating identical or near-identical copies of Publishers’ copyrighted lyrics makes clear that Anthropic fed the models copies of those lyrics when developing the programs. Anthropic had to first copy these lyrics and process them through its AI models during training, in order for the models to subsequently disseminate copies of the lyrics as output.
62. Second, Anthropic creates additional unauthorized reproductions of Publishers’ copyrighted lyrics when it cleans, processes, trains with, and/or finetunes the data ingested into its AI models, including when it tokenizes the data. Notably, although Anthropic “cleans” the text it ingests to remove offensive language and filter out other materials that it wishes to exclude from its training corpus, Anthropic has not indicated that it takes any steps to remove copyrighted content.
63. By copying Publishers’ lyrics without authorization during this ingestion and training process, Anthropic violates Publishers’ copyrights in those works.
64. Third, Anthropic’s AI models disseminate identical or near-identical copies of a wide range of Publishers’ copyrighted lyrics, in further violation of Publishers’ rights.
65. Upon accessing Anthropic’s Claude AI models through Anthropic’s commercially available API or via its public website, users can request and obtain through Claude verbatim or near-verbatim copies of lyrics for a wide variety of songs, including copyrighted lyrics owned and controlled by Publishers. These copies of lyrics are not only substantially but strikingly similar to the original copyrighted works>>

<<70. Claude’s output is likewise identical or substantially and strikingly similar to Publishers’ copyrighted lyrics for each of the compositions listed in Exhibit A. These works that have been infringed by Anthropic include timeless classics as well as today’s chart-topping hits, spanning a range of musical genres. And this represents just a small fraction of Anthropic’s infringement of Publishers’ works and the works of others, through both the input and output of its AI models.
71. Anthropic’s Claude is also capable of generating lyrics for new songs that incorporate the lyrics from existing copyrighted songs. In these cases, Claude’s output may include portions of one copyrighted work, alongside portions of other copyrighted works, in a manner that is entirely inconsistent and even inimical to how the songwriter intended them.
72. Moreover, Anthropic’s Claude also copies and distributes Publishers’ copyrighted lyrics even in instances when it is not asked to do so. Indeed, when Claude is prompted to write a song about a given topic—without any reference to a specific song title, artist, or songwriter—Claude will often respond by generating lyrics that it claims it wrote that, in fact, copy directly from portions of Publishers’ copyrighted lyrics>>.

<<80. In other words, Anthropic infringes Publishers’ copyrighted lyrics not only in response to specific requests for those lyrics. Rather, once Anthropic copies Publishers’ lyrics as input to train its AI models, those AI models then copy and distribute Publishers’ lyrics as output in response to a wide range of more generic queries related to songs and various other subject matter>>.

The US authors' association's complaint against OpenAI

The copyright infringement complaint against OpenAI, filed before the Southern District of New York by the prominent Authors Guild and others (including very well-known writers), is available online (e.g. here).

Training its AI does in fact appear to entail reproduction and therefore (absent an exception or countervailing right) infringement.

Under EU law, Art. 4 of Directive (EU) 2019/790 presupposes lawful access to the work before the commercial text and data mining exception can be invoked:

<< 1. Member States shall provide for an exception or limitation to the rights provided for in Article 5(a) and Article 7(1) of Directive 96/9/EC, Article 2 of Directive 2001/29/EC, Article 4(1)(a) and (b) of Directive 2009/24/EC and Article 15(1) of this Directive for reproductions and extractions of lawfully accessible works and other subject matter for the purposes of text and data mining.

2. Reproductions and extractions made pursuant to paragraph 1 may be retained for as long as is necessary for the purposes of text and data mining.

3. The exception or limitation provided for in paragraph 1 shall apply on condition that the use of works and other subject matter referred to in that paragraph has not been expressly reserved by their rightholders in an appropriate manner, such as machine-readable means in the case of content made publicly available online.

4. This Article shall not affect the application of Article 3 of this Directive>>.
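
Paragraph 3 of the article refers to machine-readable reservations for content available online. One mechanism often discussed in this connection is a site's robots.txt file; whether it amounts to a valid reservation under Art. 4(3) is a legal question not settled here. The sketch below, using Python's standard `urllib.robotparser`, only shows how a crawler could check such a file before copying a page. The bot name `ExampleAIBot`, the helper `may_fetch` and the sample file are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is excluded, all others allowed.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://example.com/lyrics/song.html"
ai_bot_allowed = may_fetch(ROBOTS_TXT, "ExampleAIBot", url)
other_bot_allowed = may_fetch(ROBOTS_TXT, "OtherBot", url)
```

With this sample file the named crawler is refused and any other crawler is admitted, which is the kind of selective, automatically readable reservation the provision seems to have in mind.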

The central passage of that complaint (on whether there is infringement under US law) is at §§ 51-64:

<<51. The terms “artificial intelligence” or “AI” refer generally to computer systems designed to imitate human cognitive functions.
52. The terms “generative artificial intelligence” or “generative AI” refer specifically to systems that are capable of generating “new” content in response to user inputs called “prompts.”
53. For example, the user of a generative AI system capable of generating images from text prompts might input the prompt, “A lawyer working at her desk.” The system would then attempt to construct the prompted image. Similarly, the user of a generative AI system capable of generating text from text prompts might input the prompt, “Tell me a story about a lawyer working at her desk.” The system would then attempt to generate the prompted text.
54. Recent generative AI systems designed to recognize input text and generate output text are built on “large language models” or “LLMs.”
55. LLMs use predictive algorithms that are designed to detect statistical patterns in the text datasets on which they are “trained” and, on the basis of these patterns, generate responses to user prompts. “Training” an LLM refers to the process by which the parameters that define an LLM’s behavior are adjusted through the LLM’s ingestion and analysis of large “training” datasets.
56. Once “trained,” the LLM analyzes the relationships among words in an input prompt and generates a response that is an approximation of similar relationships among words in the LLM’s “training” data. In this way, LLMs can be capable of generating sentences, paragraphs, and even complete texts, from cover letters to novels.
57. “Training” an LLM requires supplying the LLM with large amounts of text for the LLM to ingest—the more text, the better. That is, in part, the large in large language model.
58. As the U.S. Patent and Trademark Office has observed, LLM “training” “almost by definition involve[s] the reproduction of entire works or substantial portions thereof.”
59. “Training” in this context is therefore a technical-sounding euphemism for “copying and ingesting.”
60. The quality of the LLM (that is, its capacity to generate human-seeming responses to prompts) is dependent on the quality of the datasets used to “train” the LLM.
61. Professionally authored, edited, and published books—such as those authored by Plaintiffs here—are an especially important source of LLM “training” data.
62. As one group of AI researchers (not affiliated with Defendants) has observed, “[b]ooks are a rich source of both fine-grained information, how a character, an object or a scene looks like, as well as high-level semantics, what someone is thinking, feeling and how these states evolve through a story.”
63. In other words, books are the high-quality materials Defendants want, need, and have therefore outright pilfered to develop generative AI products that produce high-quality results: text that appears to have been written by a human writer.
64. This use is highly commercial>>
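
The idea in §§ 55-56, that parameters are adjusted by ingesting text and that output then approximates word relationships found in that text, can be illustrated with a deliberately tiny model. The sketch below is a word-pair (bigram) counter, not a neural network, and is in no way how the defendants' systems are built; the functions `train` and `generate` and the three-sentence corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train(corpus):
    """Count, for every word, which words follow it in the training text.

    These counts play the role of the model's "parameters".
    """
    counts = defaultdict(Counter)
    for text in corpus:
        words = text.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def generate(counts, prompt_word, max_words=8):
    """Greedily extend the prompt with the most frequent next word."""
    output = [prompt_word]
    # Stop at max_words or when the last word was never followed by anything.
    while len(output) < max_words and counts.get(output[-1]):
        next_word, _ = counts[output[-1]].most_common(1)[0]
        output.append(next_word)
    return " ".join(output)


# Hypothetical training corpus.
corpus = [
    "the lawyer works at her desk",
    "the lawyer works at her desk today",
    "the lawyer reads",
]

model = train(corpus)
completion = generate(model, "the")
```

Even in this toy, a short prompt leads the model to reproduce a sentence from its training text word for word, which gives an intuition for why the complaints treat memorised output as evidence of what was copied as input.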