For several days now, the Internet has been busy dissecting the new and revolutionary prodigy born from the depths of OpenAI: Sora, an AI tool for generating videos from text prompts.
OpenAI's new invention is an AI model capable of creating videos up to a minute long from text prompts alone, without using any base image.
To demonstrate Sora's phenomenal virtues, Sam Altman, CEO of OpenAI, took a handful of user suggestions on X (formerly Twitter) last Thursday and quickly metamorphosed them into videos that were as realistic as they were rich in detail. It took Altman at least 20 minutes to transform the prompts suggested by X users into videos.
It is worth keeping in mind, however, that Sora is an experimental AI model and that the speed it shows now will not necessarily match the speed it shows once it is available to the general public.
Sora is a system capable of simulating aspects of the physical world in motion thanks to its architecture, which combines diffusion technology with a transformer-based engine.
When generating moving images, Sora sources videos at their original resolution and then divides them into smaller sections, or visual patches. To achieve this, the videos are simplified in a so-called latent space, where the clips Sora uses as raw material are compressed both temporally and spatially.
And it is precisely in this latent space that Sora is trained to generate moving images. In this latent space, OpenAI's tool learns, for example, how a drop of water should look as it hits the ground in a sequence, and then uses different spatiotemporal sequences to represent rain correctly. "We train a network that reduces the dimensionality of visual data. This network takes raw video as input and generates a latent representation that is compressed both temporally and spatially. Sora is trained and subsequently generates videos within this compressed space. We train a corresponding decoder model that maps the generated latents to pixel space," says OpenAI.
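OpenAI has not published Sora's encoder, but the shape bookkeeping behind "compressed both temporally and spatially" can be sketched with a toy average-pooling encoder and a nearest-neighbour decoder (the pooling factors and shapes below are illustrative assumptions, not Sora's actual values):

```python
import numpy as np

def encode(video, t_factor=4, s_factor=8):
    """Toy 'encoder': compress a video (T, H, W, C) temporally and
    spatially by average pooling. A real model uses a learned network,
    but the dimensionality reduction works the same way."""
    T, H, W, C = video.shape
    v = video.reshape(T // t_factor, t_factor,
                      H // s_factor, s_factor,
                      W // s_factor, s_factor, C)
    # Average over the pooled time axis and the two pooled space axes.
    return v.mean(axis=(1, 3, 5))

def decode(latent, t_factor=4, s_factor=8):
    """Toy 'decoder': map latents back to pixel space by
    nearest-neighbour upsampling (stand-in for the learned decoder)."""
    z = np.repeat(latent, t_factor, axis=0)   # restore frames
    z = np.repeat(z, s_factor, axis=1)        # restore height
    return np.repeat(z, s_factor, axis=2)     # restore width

video = np.random.rand(16, 64, 64, 3)   # 16 frames of 64x64 RGB
latent = encode(video)
print(latent.shape)                      # (4, 8, 8, 3): 4x smaller in time, 8x in space
print(decode(latent).shape)              # back to (16, 64, 64, 3)
```

Generating in this much smaller latent grid, rather than in raw pixels, is what makes training and sampling tractable.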
Dissecting the workings of Sora
“By unifying the way we represent data, we train diffusion transformers with a wider range of visual data than previously possible, spanning different durations, resolutions, and aspect ratios,” OpenAI notes.
Dividing videos into smaller sections (or spatiotemporal patches) results in the creation of a three-dimensional grid that is progressively filled with compressed visual data.
An associated decoding model then links the spatiotemporal patches (which are nothing more than small units that group information) to pixel data, which is ultimately used to generate realistic-looking videos.
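The patch construction itself is simple to sketch. Assuming a hypothetical patch size of t frames by p×p pixels, a latent video of shape (T, H, W, C) becomes a sequence of (T/t)·(H/p)·(W/p) tokens, which is also why clips of different durations and resolutions simply yield token sequences of different lengths:

```python
import numpy as np

def to_patches(latent, t=2, p=4):
    """Cut a latent video (T, H, W, C) into spatiotemporal patches of
    t frames x p x p pixels, each flattened into one token vector.
    Patch size and shapes are illustrative, not Sora's actual values."""
    T, H, W, C = latent.shape
    x = latent.reshape(T // t, t, H // p, p, W // p, p, C)
    # Bring the three "grid" axes to the front, then flatten each patch.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, t * p * p * C)   # (num_tokens, token_dim)

# Different durations/resolutions -> different sequence lengths, same token size.
print(to_patches(np.zeros((8, 16, 16, 4))).shape)   # (64, 128)
print(to_patches(np.zeros((4, 32, 16, 4))).shape)   # (64, 128)
```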
The diffusion model that Sora relies on is fed with noisy patches (raw image fragments) and is trained to predict what those patches will look like free of any noise. “Sora is a diffusion model. By receiving noisy input patches and conditioning information such as text prompts, it is trained to predict the original clean patches,” OpenAI emphasizes.
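That training objective can be sketched in a few lines. Here a toy linear "model" stands in for Sora's diffusion transformer, and the loss simply measures how well the clean patch tokens are recovered from their noisy versions (everything below is an illustrative simplification of the general diffusion recipe, not OpenAI's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def diffusion_training_step(model_weights, clean_patches, noise_level=0.5):
    """One conceptual diffusion training step: corrupt clean patch
    tokens with Gaussian noise, predict the clean tokens back, and
    score the prediction with mean-squared error."""
    noise = rng.normal(scale=noise_level, size=clean_patches.shape)
    noisy = clean_patches + noise               # what the model sees
    predicted_clean = noisy @ model_weights     # toy stand-in for the transformer
    loss = np.mean((predicted_clean - clean_patches) ** 2)
    return loss

tokens = rng.normal(size=(64, 128))             # 64 patch tokens, dim 128
weights = np.eye(128)                           # identity "model" for illustration
print(diffusion_training_step(weights, tokens))
```

In real training the weights would be updated to drive this loss down; conditioning information such as the text prompt is fed to the model alongside the noisy patches.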
Like other large language models, Sora relies on a huge amount of data for training. The videos used for training are processed at their original resolution and aspect ratio. This provides the model with insights into the physical world and lets it mimic details of people, animals and all kinds of real environments with incredible accuracy.
The deeper the OpenAI tool's training goes, the better the results Sora is able to deliver. In other words, what Sora can achieve is largely due to the progress made in its training.
In this sense, what OpenAI has shown so far is just the tip of the iceberg of a tool that in the future could generate complete films with solid and coherent plots and specific main characters.
The marketing and advertising industry has a lot to gain from Sora (even in its most primitive version)
Cinema, the seventh art, will still have to wait a long time to exploit Sora fully, but the marketing and advertising industry will almost certainly find many practical applications for the first version of this model.
After all, Sora can not only generate completely new videos but also turn existing images into videos or change certain aspects of a previously recorded clip.
Brands can use photos of their products to create videos: a bicycle brand, for example, could easily turn a still image of its product into a video showing how it works.
Sora can also easily replace protagonists in specific environments. In one of the tool's demos, a drone flying over Roman ruins transforms, in the next section of the video, into a butterfly fluttering over the water.
Taking this example to a marketing context, a diving equipment manufacturer could easily replace a fish in water with a diver to show their products in action and illustrate how they work.
There are plenty of considerations to weigh when creating videos with AI (deepfakes and the proper labelling of such material, for example), but tools like Sora make quality video creation accessible to businesses of all kinds, regardless of their resources. And this ultimately translates into the democratisation of video advertising.