Algorithm Predicts What Happens Next in a Photo And Makes It Into a Video

—

Using a deep learning algorithm, MIT’s Carl Vondrick, Hamed Pirsiavash, and Antonio Torralba recently generated one second of predictive video based on a single still frame.

Called Scene Dynamics, the software was trained on roughly two million unlabeled videos. Given a new image, the system runs two competing neural networks: the first generates the predictive video, while the second judges whether a video is real or generated. Beyond predicting an impressive number of frames based on assumed motion, the algorithm also classifies the specific action occurring. The results are clearly not perfect, but they are already striking.
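To make the "two competing networks" idea concrete, here is a minimal sketch of the kind of objective such a pair typically optimizes. It assumes the standard adversarial setup with binary cross-entropy losses; the function names and the exact loss form are illustrative assumptions, not taken from the Scene Dynamics code.

```python
import math

def discriminator_loss(p_real, p_fake):
    """Loss for the second network (the judge); lower is better for it.

    p_real: its estimated probability that a real video is real.
    p_fake: its estimated probability that a generated video is real.
    Both names are hypothetical, chosen for this sketch.
    """
    return -(math.log(p_real) + math.log(1.0 - p_fake))

def generator_loss(p_fake):
    """Loss for the first network (the generator).

    It shrinks as the judge is fooled into rating generated video as real.
    This is the common non-saturating form, assumed here for illustration.
    """
    return -math.log(p_fake)

# A confident, correct judge has a low loss; the generator's loss is then high.
sharp_judge = discriminator_loss(p_real=0.9, p_fake=0.1)
fooled_judge = discriminator_loss(p_real=0.6, p_fake=0.4)
print(sharp_judge < fooled_judge)                  # judge prefers being sharp
print(generator_loss(0.9) < generator_loss(0.1))   # generator prefers fooling it
```

The two losses pull in opposite directions on `p_fake`, which is what makes the networks "compete": improving one tends to worsen the other until the generated video becomes hard to tell from real footage.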

Notably, the software learned from unlabeled videos. Deep learning programs are usually fed masses of meticulously labeled data (images, for example). Labeling takes a lot of time and effort and limits learning to narrowly tailored examples. The researchers hope their work will advance less laborious “unsupervised learning,” reducing the need for special data sets and allowing machines to learn from messier information.

This isn’t the only project aiming at predictive video.

Visual Dynamics is a similar project (also out of MIT) that generates new frames of predictive video from a source frame. The difference? Visual Dynamics predicts short snippets of what may plausibly happen next, while Scene Dynamics creates entirely new, longer sequences of video that didn’t exist before. Scene Dynamics can also separate background from subjects and generate new content for each.
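One plausible way to picture the background/subject separation is as a per-pixel blend of a moving foreground and a static background, controlled by a mask. The sketch below assumes that blending rule (mask × foreground + (1 − mask) × background) purely for illustration; the function names and the tiny 2×2 "frames" are hypothetical, and the article itself does not describe the mechanism.

```python
def composite_frame(mask, foreground, background):
    """Blend one frame per pixel: mask * foreground + (1 - mask) * background.

    A mask value of 1.0 shows only the subject, 0.0 only the background.
    """
    return [
        [m * f + (1.0 - m) * b for m, f, b in zip(m_row, f_row, b_row)]
        for m_row, f_row, b_row in zip(mask, foreground, background)
    ]

def composite_video(masks, foregrounds, background):
    """Build a clip by reusing one static background for every frame."""
    return [composite_frame(m, f, background) for m, f in zip(masks, foregrounds)]

# Toy grayscale data: one 2x2 frame with a bright subject on a dark background.
mask = [[1.0, 0.0], [0.5, 0.0]]
foreground = [[10.0, 10.0], [10.0, 10.0]]
background = [[2.0, 2.0], [2.0, 2.0]]

frame = composite_frame(mask, foreground, background)
print(frame)  # [[10.0, 2.0], [6.0, 2.0]]
```

Keeping the background fixed and letting only the masked region change is one way a model could generate new content for subjects and scenery separately.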

Predictive video from stills has a variety of immediate applications, most notably creating video “out of thin air.” And there might even be room for more creative endeavors down the road.

“I sort of fantasize about a machine creating a short movie or TV show,” lead author Carl Vondrick told Motherboard. “We’re generating just one second of video, but as we start scaling up maybe it can generate a few minutes of video where it actually tells a coherent story. We’re not near being able to do that, but I think we’re taking a first step.”

Beyond video creation, similar motion prediction capabilities might be integrated into computer vision systems, allowing robots to better guess how people and objects in front of them will move. Such powers might help them avoid damaging themselves or hurting others around them.

More speculatively, if software like this can predict motion, what else might it be trained to predict?

One possible use in the future could be predicting what blurry or distorted pixels in videos should look like if sharpened. Low-resolution, compressed, or artifact-laden video would then be automatically upgraded to high resolution.

The researchers also see use cases in security and self-driving technology. But the dark side of multimedia manipulation is clear too. We may eventually see it power propaganda or generate falsified evidence (assuming fakeness can’t be easily detected).

Thankfully, we still have quite a way to go before this concern becomes pressing. But for better or worse, as media manipulation becomes more flexible and widespread, video as a medium will become more fluid than static. Ultimately, how such technology is used will depend on the motivation of each user.

The code is already available on GitHub if anyone wants to start playing around today. And the original video data set is also available on the Scene Dynamics website.
—

This post was previously published on Singularity Hub and is republished here under a Creative Commons license CC BY-ND 4.0.


Photo credit: Shutterstock


Imagine if your favorite picture could automatically be converted into a short video and labeled. Sound like a fantasy? Maybe not for much longer.
