AUTHENTICATION IN THE AGE OF SYNTHETIC MEDIA: THE CASE OF OPENAI’S SORA ANNOUNCEMENT

Author: Bertram Lyons, CEO & Co-founder at Medex Forensics

Yesterday, OpenAI demonstrated the capabilities of its new text-to-video model, Sora. Advances in AI technology for creating realistic imagery will not cease, and there is nothing inherently bad about creative imagery. As a society we have loved CGI in cinema, and for years we have enjoyed the creative opportunities that tools from companies like Adobe make available to us in the digital realm.

The challenge that Generative AI brings, in my opinion, is the difficulty of understanding intent and of differentiating creative output from documentary output as we encounter digital media in the wild, whether on websites or published and disseminated via social media platforms.

Yesterday, I enjoyed viewing and attempting to visually critique the output from Sora. However, today, as I reviewed the files more carefully wearing my digital forensics hat, I encountered a confusing state of affairs.

On the OpenAI announcement page for Sora, there is a disclaimer front and center that reads: “All videos on this page were generated directly by Sora without modification.”

Screenshot of the disclaimer posted on the OpenAI Sora announcement at https://openai.com/sora. Last accessed 2024-02-16.

I run a digital video forensics company called Medex Forensics. We build technology that supports the investigation of the authenticity of digital video files. OpenAI kindly allowed the videos displayed on its site to be downloaded as part of its big splash announcement. Insatiably curious, I downloaded all of the videos made available on the site (48 videos in total) in order to review them from a forensic perspective and understand their origin. Because of the disclaimer above, I expected to find a homogeneous cache of videos that all contained the same set of internal elements and features from the encoder used by Sora. But that’s not what I found at all.

(Access my high-level analysis notes here.)
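For anyone who wants to run a similar first pass over a set of downloads, a minimal sketch of that kind of survey follows. It assumes the 48 files sit in a local folder (hypothetically named sora_videos/) and that the exiftool utility is installed; grouping the files by whatever creator or encoder tool each one reports is one quick way to test whether a cache of files is as homogeneous as a claim implies.

import json
import subprocess
from collections import Counter
from pathlib import Path

# Hypothetical folder holding the 48 downloaded Sora videos.
VIDEO_DIR = Path("sora_videos")

def survey(video_dir):
    """Group files by the creator/encoder tool reported in their metadata."""
    files = sorted(video_dir.glob("*.mp4"))
    # -j asks exiftool for JSON output: one dictionary per file.
    raw = subprocess.run(
        ["exiftool", "-j", *map(str, files)],
        capture_output=True, text=True, check=True,
    ).stdout
    groups = Counter()
    for record in json.loads(raw):
        tool = record.get("CreatorTool") or record.get("Encoder") or "unknown"
        groups[tool] += 1
    return groups

if __name__ == "__main__":
    for tool, count in survey(VIDEO_DIR).most_common():
        print(f"{count:3d}  {tool}")

A truly homogeneous set would collapse into a single group; the set described below does not.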

What I found was that 26 of the files had been post-processed via Adobe Photoshop and/or After Effects. The embedded XMP metadata suggests that these files were modified and re-saved, and that a watermark was applied to them after their creation.

Some of these were edited on Windows machines to apply, at the least, a watermark. This is suggested by the presence of file paths such as the following embedded in the XMP within the files themselves:

"G:\Shared drives\OpenAI_Timon_OAICA-L0004__Rafiki_OAICA-L0005\Rafiki_\Rafiki_Production\RF010_watermark\Work\AE\Dogs downtown.aep"

Some of these were edited on macOS to apply, at the least, a watermark. This is suggested by the presence of file paths such as the following embedded in the XMP within the files themselves:

 "/Users/adrianmorantv/Library/CloudStorage/[email protected]/Shared drives/OpenAI_Timon_OAICA-L0004__Rafiki_OAICA-L0005/Rafiki_/Rafiki_Production/Work/evan/0212_Rafiki_v03b_EB folder/0212_Rafiki_v03b_EB.aep"

The work of creating these 26 files took place from UTC 2024-02-14 02:51:32 to UTC 2024-02-14 17:20:19 – a fifteen-hour window on February 14th.

However, another cache of 15 files from the full set does not contain the Adobe XMP, nor any internal Adobe structural elements. This doesn’t mean they never passed through an Adobe product; it could mean either that they did not receive the same watermark treatment, or that they did receive it but were subsequently re-encoded by an encoder that does not recognize Adobe’s internal structural elements and does not recreate them in the files it produces. Here, then, we have a set of files that contain a timecode track (not present in the 26 Adobe files above) as well as Apple-specific internal elements – signaling that these files were encoded on an Apple product. These files have no internal embedded dates to review.
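Those structural differences are visible in the container itself. Each track in an MP4 or QuickTime file declares its purpose in an 'hdlr' box, so a timecode track surfaces as a handler of type 'tmcd' and an audio track as 'soun'. The sketch below walks the box tree and prints the handler type of every track it finds; the file name is hypothetical.

import struct

# Container boxes we descend into to reach each track's media handler declaration.
CONTAINERS = {b"moov", b"trak", b"mdia"}

def handler_types(data, offset=0, end=None, found=None):
    """Walk ISO BMFF / QuickTime boxes and collect the handler type of every track."""
    end = len(data) if end is None else end
    found = [] if found is None else found
    while offset + 8 <= end:
        size, box_type = struct.unpack(">I4s", data[offset:offset + 8])
        header = 8
        if size == 1:  # 64-bit size stored just after the box type
            size = struct.unpack(">Q", data[offset + 8:offset + 16])[0]
            header = 16
        elif size == 0:  # box runs to the end of the enclosing space
            size = end - offset
        if size < header:
            break  # malformed box; stop rather than loop forever
        if box_type in CONTAINERS:
            handler_types(data, offset + header, offset + size, found)
        elif box_type == b"hdlr":
            # hdlr payload: version/flags (4 bytes), pre_defined (4), then the handler type (4)
            found.append(data[offset + header + 8:offset + header + 12].decode("ascii", "replace"))
        offset += size
    return found

# Hypothetical file name from the downloaded set.
with open("dogs_downtown.mp4", "rb") as f:
    print(handler_types(f.read()))  # e.g. ['vide', 'tmcd'] when a timecode track is present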

Then there are 6 files that, strangely, contain audio tracks. These also contain embedded XMP documents that record hundreds of tracked edits to content and metadata. I am not under the impression that Sora can create audio in videos, so why do we find audio tracks (containing silence) only in these six files?
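Stream-level differences like that are straightforward to confirm with ffprobe (part of FFmpeg). The sketch below assumes ffprobe is installed and is pointed at one of the downloaded files; the file name is hypothetical.

import json
import subprocess

def stream_types(path):
    """Return the codec type (video, audio, data, ...) of each stream in a file."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        capture_output=True, text=True, check=True,
    ).stdout
    return [s["codec_type"] for s in json.loads(out)["streams"]]

# Hypothetical file name from the downloaded set.
print(stream_types("dogs_downtown.mp4"))  # an unexpected 'audio' entry would flag one of the six files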

The work of creating these 6 files took place from UTC 2024-02-14 12:03:34 to UTC 2024-02-14 17:36:35 – a window of roughly five and a half hours on February 14th.

Then there is one outlier, as strange as can be: it contains an XMP element, embedded via ExifTool, that leverages a TikTok namespace (i.e., https://business.tiktok.com) to embed three identifiers within the file.
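Namespace oddities like that one are easy to surface once the XMP has been pulled out of the file (for example with the extract_xmp sketch above): simply list every XML namespace a packet declares. The packet below is a tiny illustrative stand-in, not the actual content of the outlier file.

import re

def declared_namespaces(xmp):
    """Collect the URI of every xmlns declaration in an XMP packet."""
    return set(re.findall(r'xmlns:[A-Za-z0-9_]+="([^"]+)"', xmp))

# Illustrative packet only; in practice, feed in a packet recovered from the file itself.
sample = '<x:xmpmeta xmlns:x="adobe:ns:meta/" xmlns:tiktok="https://business.tiktok.com"/>'
print(declared_namespaces(sample))  # a result containing https://business.tiktok.com flags the outlier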

What does it all mean?

Well, I’m not in the business of ascribing intention as an outcome of an analysis. I’m in the business of describing provenance, that is, describing the lifecycle of a digital file. These files from Sora are great and there’s nothing to suggest they were not “generated directly by Sora without modification” from an intentionality perspective. But there’s obviously more to the story of these files. There is evidence of at least four distinct lifecycles across this set of 48 files.

Initiatives such as the C2PA specification and Content Credentials from the Content Authenticity Initiative are working tirelessly to establish frameworks that allow organizations such as OpenAI to document the lifecycle of a digital file. OpenAI notes that it does intend to leverage C2PA within its output files at some point in the future. Until then, we are left wondering what happened to these files between creation and publication. How many are actually unmodified, as OpenAI suggests, and how many were augmented for publication in one way or another?
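Once that happens, anyone will be able to inspect the signed provenance manifest for themselves. A minimal sketch of such a check is below; it assumes the open-source c2patool utility from the Content Authenticity Initiative is installed and that the file actually carries Content Credentials, and the file name is hypothetical. Today’s Sora files would simply report that no manifest is present.

import subprocess

def read_content_credentials(path):
    """Ask c2patool to report the C2PA manifest store embedded in a file, if any."""
    result = subprocess.run(["c2patool", path], capture_output=True, text=True)
    # c2patool prints the manifest report on success; otherwise report that none was found.
    return result.stdout if result.returncode == 0 else "no Content Credentials found"

# Hypothetical file name from the downloaded set.
print(read_content_credentials("dogs_downtown.mp4"))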

In a world where synthetic imagery (and audio) from tools like OpenAI’s Sora and Google’s Lumiere may reduce the trust we have in the media we see, it is paramount that claims made by organizations who publish media can be trusted, vetted, and validated. In this case, the claim cannot be validated.
