Looking back three years to the 2017 NAB Show, we can see that one of the notable new technology trends was the influx of artificial intelligence (AI) applications and services. To say that AI simply “appeared” would understate what happened: IBM showed up with a 12-foot-diameter, supercomputer-esque display of Watson and the AI services that it (or he?) provided. Microsoft and Quantum also devoted show space to this new AI technology. The takeaway was that there were AI engines (learning machines, or brains) that were hungry for video content. Few knew what the implications were at the time.
It was clear, however, that cloud-based AI was outpacing the network bandwidth and cloud service capacity needed to analyze the vast content archives held in on-premises storage at the major media companies. Thousands of hours of material were sitting in LTO tape libraries, object storage systems, and disk arrays. Companies with content libraries found the concept of AI-based metadata creation quite compelling, but weren’t ready to undertake the massive data migration and retrieval needed to process the content in the cloud, which at the time was the only place AI engines lived.
The Value of AI
A better title for this section might be the value of machines. From the discovery of the lever, the inclined plane, and the wedge, we have taken advantage of machines to amplify human capabilities and reduce the work associated with human labor. For most of the history of film and video, the process of adding intelligence to the flow of images that make up “media” was a thankless human effort. From the earliest chronological logs identifying the feet and frames of film from one edit to another, to shot-logging multicamera shoots to expedite editing, to the modern tagging of content for later retrieval and reuse, we have had to dedicate human energy and intelligence to the process of understanding and remembering what we just created.
AI, in the context of video production, gave us a “machine” that could mechanically and electronically “look” at every frame. It gave that machine the “intelligence” to recognize the arrangement of pixels in the image as something meaningful. And it gave that intelligent machine the ability to “learn” how to be better at recognizing what it was being asked to look for. Each of these three levels (automation, intelligence, and learning) reduces the burden of human effort associated with the task.
For many years, during the “democratization” of content driven by the Internet and personal video networks like YouTube, Netflix, and Cognicast (a late entry in 2019), paying actual human beings to examine every frame of video for valuable information simply didn’t return enough to justify the cost. Most companies weren’t even entering basic metadata identifying place, participants, and personalities, let alone revenue-generating information like brand appearances and celebrity associations.
What Are You Doing About It?
If you’ve read this far, we can assume that you are a content owner and that you’re not yet capitalizing on the value of the metadata in your content. You are, in essence, a mine owner who is not mining. Your first question should be, “Do I have ore in my mine?”
If you have content that involves live events and real-time footage, the answer is almost certainly yes. This includes news, sports, corporate activities, college and university events, even house-of-worship activities. Here’s one way to look at it. Have you ever been asked for content that includes:
- A person
- A statement by a person
- The sentiment of the person or the audience
- A location where something happened
- An image of a particular object
- An image of a logo or other commercially identifiable object
- Any combination of the above
There are hundreds, if not thousands, of AI engines available to inspect your content and return metadata on a frame-by-frame basis. Even though it may cost a few dollars per hour of content, this insight represents highly valuable information that you would have paid a person far more per hour to glean from the video, and that person would have taken much longer. Think about it this way: the public relations department of your company wants to know about any video clips in the archives where your CEO mentions the word “innovation.” This task can quickly turn from “impossible” when approached manually to “how soon do you need it?” using AI engines.
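To make the CEO example concrete, here is a minimal sketch of how time-coded AI metadata might be queried once engines have indexed an archive. It assumes a hypothetical record schema (`MetadataEvent`, with an asset ID, start and end times in seconds, a kind such as face or transcript, and a value); real engines, including Veritone’s, return their own formats, so the field names, asset IDs, and the person “Jane Doe” are illustrative only. The query keeps the spans where a face detection for the named person overlaps a transcript segment containing the word.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MetadataEvent:
    """One time-coded result returned by an AI engine (hypothetical schema)."""
    asset_id: str   # which clip in the archive
    start_s: float  # start of the detection, in seconds from the head of the clip
    end_s: float    # end of the detection, in seconds
    kind: str       # e.g. "face" (face recognition) or "transcript" (speech-to-text)
    value: str      # recognized person's name, or the transcribed words


def words(text: str) -> set[str]:
    """Lowercased words of a transcript segment, with surrounding punctuation stripped."""
    return {t.strip(".,!?;:\"'").lower() for t in text.split()}


def overlaps(a: MetadataEvent, b: MetadataEvent) -> bool:
    """True if both events belong to the same asset and their time ranges intersect."""
    return a.asset_id == b.asset_id and a.start_s < b.end_s and b.start_s < a.end_s


def find_clips(events: Iterable[MetadataEvent], person: str, word: str) -> list[tuple[str, float, float]]:
    """Return (asset_id, start_s, end_s) spans where `person` is on screen
    while `word` is being spoken, sorted by asset and start time."""
    events = list(events)
    faces = [e for e in events if e.kind == "face" and e.value == person]
    speech = [e for e in events if e.kind == "transcript" and word.lower() in words(e.value)]
    hits = set()
    for f in faces:
        for s in speech:
            if overlaps(f, s):
                # Keep only the interval where both detections hold at once.
                hits.add((f.asset_id, max(f.start_s, s.start_s), min(f.end_s, s.end_s)))
    return sorted(hits)


# Hypothetical sample output from two engines run against an archive.
sample = [
    MetadataEvent("town_hall_2018", 10.0, 40.0, "face", "Jane Doe"),
    MetadataEvent("town_hall_2018", 25.0, 31.0, "transcript", "Our future depends on innovation."),
    MetadataEvent("town_hall_2018", 50.0, 55.0, "transcript", "Innovation, again."),  # CEO off screen
    MetadataEvent("press_event_2019", 0.0, 5.0, "transcript", "innovation"),         # CEO never detected
]
print(find_clips(sample, "Jane Doe", "innovation"))
# → [('town_hall_2018', 25.0, 31.0)]
```

Matching on overlapping time ranges, rather than on whole assets, is what turns frame-level metadata into clip-level answers: in the sample, the result points an editor to seconds 25 through 31 of one recording instead of to every recording in which the word was spoken.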
I Can’t Send Everything to the Cloud
You’re right, you can’t. Even with today’s ubiquitous multi-hundred-gigabit connectivity, network performance still lags demand. Performing an AI task on all the content you have in primary storage and archives by moving it to the cloud for evaluation would be impossibly time-consuming and costly. The innovations introduced back in 2017 around moving the AI engines to the content, rather than the content to the AI, broke the mold. They allowed multiple AI engines, even stacked on the same job, to be “called down” from the cloud cost-effectively and applied to an entire library of content with no content movement. As it turns out, this proved to be the breakthrough that the media industry was looking for to take advantage of AI at scale. The most notable alliance, that of Quantum and Veritone, provided content owners with access to AI engines without moving data. This not only increased the speed of analysis, but also reduced the cost. Shortly after the Quantum and Veritone solution became available, content owners were running machines against petabytes of content and monetizing video material that had previously been lying dormant in archives, even on tape.
For More Information
For additional “historical” information about the use of Veritone AI engine orchestration with Quantum’s StorNext® infrastructure, click here.