OpenAI accidentally deleted potential evidence in New York Times copyright lawsuit case

OpenAI may have accidentally deleted important data related to its ongoing copyright lawsuit brought by the New York Times.

As first reported by TechCrunch, counsel for the Times and its co-plaintiff, the Daily News, sent a letter to the judge overseeing the case, detailing how "an entire week’s worth of its experts' and lawyers' work" was "irretrievably lost." OpenAI had provided the plaintiffs with two dedicated virtual machines for researching alleged instances of copyright infringement. According to the letter, on Nov. 14, "programs and search result data stored on one of the dedicated virtual machines was erased by OpenAI engineers."


The Times has accused OpenAI, and Microsoft, which uses OpenAI's models for its Bing AI chatbot, of copyright infringement by training their models on paywalled and unauthorized content. The lawsuit detailed multiple instances of "near-verbatim" copying in ChatGPT responses. OpenAI has disputed this claim, saying its models were trained on publicly available data and that this training constitutes fair use under copyright law. The case hinges on the Times being able to prove that OpenAI's models copied and used its content without compensation or credit.


OpenAI was able to recover most of the erased data, but the "folder structure and file names" of the work were unrecoverable, rendering the data unusable. As a result, the plaintiffs' counsel must start their evidence gathering from scratch. In the letter, counsel affirmed that there's "no reason to believe [the erasure] was intentional," but also pointed out that "OpenAI is in the best position to search its own datasets." The AI company has avoided sharing any details about its training data.