Meta unveiled its “Movie Gen” AI suite on October 4, which can generate photorealistic videos up to 16 seconds long, complete with sound effects and music.
Though not the first AI model to generate video and audio from text prompts, Movie Gen represents the current state of the art.
Per the developer, the model outperformed competing systems in tests judged by human participants.
Meta’s blog post reveals that Movie Gen can output clips as long as 16 seconds at a frame rate of 16 frames per second (FPS).
Before digital technology, Hollywood films were shot at 24 FPS, the standard that produces the classic “film look.”
Higher frame rates are typically favored in gaming and graphics applications for smoother motion and enhanced visual clarity.
However, Meta’s 16 FPS output is not far below the 24 FPS that professionals consider the baseline for quality motion-picture imagery.
This suggests the technology can deliver visually appealing content even without matching the high frame rates of video games.
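For a sense of the numbers, the arithmetic is straightforward. The sketch below (plain Python for illustration; the constants reflect the publicly stated specs, not any Movie Gen API) compares the total frame count of a maximum-length clip at both rates.

```python
# Illustrative arithmetic only; these constants come from published specs,
# not from any Movie Gen interface.
CLIP_SECONDS = 16    # maximum clip length per Meta's blog post
MOVIE_GEN_FPS = 16   # Movie Gen's output frame rate
CINEMA_FPS = 24      # traditional Hollywood film standard

movie_gen_frames = CLIP_SECONDS * MOVIE_GEN_FPS  # 16 * 16 = 256 frames
cinema_frames = CLIP_SECONDS * CINEMA_FPS        # 16 * 24 = 384 frames

print(f"Movie Gen clip:        {movie_gen_frames} frames")
print(f"Same length at 24 FPS: {cinema_frames} frames")
```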
Meta’s new models could revolutionize content creation, allowing complete videos to be produced from simple text prompts and existing visuals to be edited.
This includes replacing or adjusting objects and backgrounds, granting creators unprecedented flexibility and creative control.
Perhaps the most notable innovation in Meta’s AI suite is its ability to generate up to 45 seconds of synchronized audio, complete with sound effects and background music.
Meta claims its Movie Gen technology seamlessly aligns this audio with the motion in generated videos, offering a cohesive and immersive experience.
For now, Meta is keeping the foundation models behind Movie Gen under wraps.
According to the company, it has not set a timeline for the product’s launch, as it requires further safety testing before release.
As stated in a research paper by the AI team at Meta:
“The Movie Gen cast of foundation models were developed for research purposes and need multiple improvements before deploying them … when we do deploy these models, we will incorporate safety models that can reject input prompts or generations that violate our policies to prevent misuse.”