February 3, 2025

[Copyright, AI – Whether AI's data learning constitutes copyright infringement]



Hello. I am Shin Jun-seon, an attorney at the law firm Cheongchul.


Recently, the three major domestic broadcasters, KBS, MBC, and SBS, filed a lawsuit against Naver, alleging that its generative AI 'HyperCLOVA' used their news content for machine learning without permission and claiming copyright infringement and violation of the Unfair Competition Prevention Act. Copyright in AI training data has become a significant issue, and many similar lawsuits are pending overseas. With courts increasingly being asked to draw the boundary between advancing AI technology and copyright protection, we would like to examine what the main issues are likely to be.


[Question] Can AI's learning of news articles be considered copyright infringement?


[Answer]

1. Examination from a domestic legal perspective


a. Perspective of copyright law

Under South Korea's Copyright Act and Unfair Competition Prevention Act, the first issue is whether AI training constitutes copyright infringement. According to Article 2, Paragraph 1 of the Copyright Act, news content recognized as creative is protected as a work. If the AI is found to have reproduced or distributed the original articles during the learning process, there is a high possibility that copyright infringement will be recognized. However, if the learning process is regarded as mere data analysis, fair use under Article 35-3 of the Copyright Act may also apply.

Under current law, however, courts decide whether fair use applies on a case-by-case basis, so it remains uncertain whether AI learning would be recognized as fair use. If 'regurgitation' is observed, meaning the AI outputs articles verbatim, copyright infringement could be recognized. Conversely, if the AI simply generates new information from what it has learned, there may be grounds for fair use.

Moreover, South Korea's Copyright Act contains provisions protecting the rights of database producers (such as Article 93), so the party claiming infringement can also argue on that basis that its rights as a database producer have been violated.


b. Perspective of the Unfair Competition Prevention Act

The broadcasters view Naver's AI learning as an attempt to secure an "unfair competitive advantage" and argue that using, without authorization, data they created through legitimate effort constitutes an act of unfair competition under Article 2, Paragraph 1 of the Unfair Competition Prevention Act. South Korean courts have not yet clearly ruled on whether AI learning constitutes unfair competition, but in light of the overseas cases introduced below, it could be interpreted as unfair competition in the future.


c. Reference to domestic crawling cases

There has yet to be a definitive South Korean ruling on whether AI's data learning process constitutes copyright infringement. However, the Supreme Court has ruled on whether a 'crawling program' that collected information from a competitor's API servers violated the Copyright Act and the Information and Communications Network Act. It held that repetitively and systematically collecting significant portions of a database could be considered copyright infringement. The court indicated that it must comprehensively evaluate the quantitative and qualitative aspects of the data to determine whether the collection conflicts with the normal use of the database or unjustly harms the producer's interests (Supreme Court decision 2022. 5. 12. Case No. 2021Do1533).

The data collection in that case was performed by a crawling program, which may differ somewhat in nature from the AI at issue in the broadcasters' lawsuit. However, since both involve "automated program-based online data collection," it is likely to be a meaningful reference case.


2. Comparison with overseas cases

Many lawsuits over AI data learning are underway abroad. In the dispute between the broadcasters and Naver, the court may have little choice but to refer to the outcomes of overseas lawsuits when judging the merits of the broadcasters' claims. Key overseas cases include:


a. New York Times vs OpenAI & Microsoft

The New York Times filed a lawsuit against OpenAI and Microsoft at the end of 2023, claiming that they used its articles without permission for AI training. The lawsuit is ongoing, and the central issue is whether OpenAI's unauthorized use of millions of New York Times articles to develop ChatGPT constitutes copyright infringement. OpenAI has countered that using publicly available materials to train AI models falls under fair use.


b. Raw Story & AlterNet vs OpenAI

In the first half of 2024, these two media companies sued OpenAI, claiming that it illegally collected their articles as training data and removed copyright management information (CMI)*. Rather than directly asserting copyright infringement, the lawsuit argues that OpenAI undermined the copyright protection system by deleting the CMI during the training process. However, the U.S. District Court for the Southern District of New York dismissed the lawsuit, stating that the plaintiffs had not sufficiently shown that they suffered actual harm from OpenAI's actions. The plaintiffs are expected to try to prove damages and continue the legal battle.


* Copyright Management Information (CMI)

Information about copyright that is embedded in a work for the purpose of digital copyright protection. It is protected under copyright law and the Digital Millennium Copyright Act (DMCA).


c. Lawsuit related to Stable Diffusion image generation

Stable Diffusion is a deep-learning model that generates images through text input, released by Stability AI in August 2022. This AI model can generate new images by learning from image data collected from the internet.

Getty Images, an image provider, filed a lawsuit against Stability AI in the UK, claiming that Stable Diffusion unlawfully collected and used millions of its images as training data, infringing its copyright and trademark rights (because the Getty Images watermark appears in the images). Individual artists have also filed copyright infringement lawsuits in the US, with similar claims, against companies behind image generation AI models, including Stability AI, Midjourney, and DeviantArt.

These lawsuits raise issues such as whether an AI model's training on images constitutes copyright infringement because it directly reproduces the original images, whether it is permissible for AI to learn to imitate a specific artist's style, and whether AI-generated images may be considered transformative works under fair use.


3. Future prospects

Domestically, it has also previously been claimed that Naver's AI model violated the 'news content partnership agreement' by learning from news articles. Domestic courts are expected to review such contractual and other ancillary violations as well when determining whether AI learning constitutes copyright infringement or unfair competition.

As AI technology evolves, the business scope of AI models will expand, making conflicts with copyright law inevitable. The broadcasters' lawsuit will be critical in setting the direction of the domestic AI industry and of copyright protection policy. However, since this area lacks clear precedents, courts are likely to closely monitor and refer to the progress of ongoing lawsuits abroad. Going forward, a balance will likely need to be found at the institutional level that protects creators' rights without hindering AI innovation.


Attorney Shin Jun-seon has extensive experience in advising on and resolving disputes over a wide range of copyright issues. Should you require legal advice, please feel free to contact us.



403 Teheran-ro, Gangnam-gu, Seoul, Rich Tower, 7th floor

Tel. 02-6959-9936

Fax. 02-6959-9967

cheongchul@cheongchul.com

Privacy Policy

Disclaimer

© 2025. Cheongchul. All rights reserved
