Cloud Blog: Enhance viewer engagement with gen AI-powered scene detection for ads

Source URL: https://cloud.google.com/blog/topics/developers-practitioners/use-ai-powered-scene-detection-for-more-effective-ad-placement/
Source: Cloud Blog
Title: Enhance viewer engagement with gen AI-powered scene detection for ads

Feedly Summary: Online video consumption has skyrocketed. A staggering 1.8 billion people globally subscribed to streaming services in 2023 [1], and 92% of internet users worldwide watched online videos every month in 2024 [2]. This growth creates a significant opportunity for advertisers who want to reach their customers with great creative, but ineffective ad placement can disrupt their customers’ viewing experiences.
An important way to deliver a better ad experience is seamless ad integration, which means placing ads at natural breaks in video content to avoid interrupting the narrative flow. Scene change detection technology identifies these natural breaks by analyzing a video’s visual, audio, and textual elements. Google’s AI models such as Gemini offer a win-win for viewers and advertisers:

Increased viewer engagement: Seamless ad integration minimizes disruption and enhances the viewing experience.

Higher ad revenue: More relevant ads lead to better click-through rates and increased advertiser ROI.

Simplified workflows: Google Cloud’s Vertex AI platform streamlines the entire video monetization process, from scene detection to ad placement.

To help you maximize the potential of your ad inventory, we’ll share how Google Cloud’s generative AI revolutionizes scene detection, leading to more effective ad placement, improved reach, higher viewer engagement, and ultimately, increased revenue for publishers.

The challenges of traditional ad break detection 
Traditional ad break detection methods, designed primarily for structured television content with fade-outs and fixed commercial breaks, often struggle to identify ideal ad placement points in today’s diverse video landscape. These methods—including shot boundary detection, motion analysis, audio analysis, and rule-based systems—can miss subtle transitions, misinterpret rapid movement, operate independently of visual context, lack flexibility, and rely on manual tagging. This is where Google’s Gemini models can help.
Intelligent scene detection with Google’s Gemini models
Gemini’s multimodal capabilities can analyze video, audio, and text simultaneously, enabling a level of nuanced scene understanding that was previously impossible. We can now ask Gemini to interpret video content and generate very granular contextual metadata, something that was difficult to do efficiently before.
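As a rough illustration of what asking Gemini for this metadata could look like, the sketch below builds a prompt and sends a video stored in Cloud Storage to a Gemini model through the Vertex AI Python SDK. The prompt wording, the JSON field names (modeled on the metadata fields in the examples that follow), the model ID string, the default region, and the function names are all assumptions made for illustration; they are not the prompt or schema used by the authors.

```python
import json

# Hypothetical field names, modeled on the metadata fields in the examples.
FIELDS = ["timestamp", "transition_feeling", "transition_type",
          "narrative_type", "prior_scene_summary"]


def build_prompt(fields=FIELDS):
    """Build an instruction asking for ad-break candidates as a JSON array."""
    field_list = ", ".join(fields)
    return (
        "Watch the video and listen to its audio. Identify natural scene "
        "changes that would make suitable ad breaks. Return a JSON array in "
        f"which each object has exactly these keys: {field_list}. "
        "Use HH:MM:SS for timestamp."
    )


def detect_ad_breaks(gcs_uri, project, location="us-central1",
                     model_id="gemini-1.5-pro"):
    """Send one video to Gemini and return the parsed list of ad breaks.

    Requires the google-cloud-aiplatform package and valid credentials.
    """
    # Imported lazily so build_prompt() works without the SDK installed.
    import vertexai
    from vertexai.generative_models import (
        GenerationConfig, GenerativeModel, Part)

    vertexai.init(project=project, location=location)
    model = GenerativeModel(model_id)
    video = Part.from_uri(gcs_uri, mime_type="video/mp4")
    response = model.generate_content(
        [video, build_prompt()],
        generation_config=GenerationConfig(
            response_mime_type="application/json", temperature=0.0),
    )
    return json.loads(response.text)
```

Requesting JSON output keeps the response machine-readable, but the model can still return malformed or incomplete entries, so the parsed result should be validated before it is used.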
Here are some examples of how Gemini identifies ad breaks and provides detailed contextual metadata:

| Ad Break Example | Transition Feeling | Transition Type | Narrative Type | Prior Scene Summary |
| --- | --- | --- | --- | --- |
| Daytime to Evening Dinner | Cheerful, relaxed | Outdoor to indoor | Scene transition from plot to end | A group of friends enjoying dinner at a restaurant. |
| End of Tense Dialogue Scene | Tense, dramatic | Fade-out | Scene of rising conflict | Two characters arguing over a specific issue. |
| Busy Street to Quiet Cafe | Neutral | Hard cut, outdoor to indoor | Scene transition | A character walking along a busy street. |

This enriched metadata allows for the precise matching of the right ad to the right user at the right time. For example, the first ad break (Daytime to Evening Dinner), with its associated sentiment of “cheerful and relaxed,” might be ideal for advertisements that resonate with those feelings, such as travel, entertainment, or leisure products, rather than just a product like cookware. By understanding not just the basic context but also the emotional tone of a scene, Gemini facilitates a new level of contextual advertising that is far more engaging for the viewer.
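To make the matching idea concrete, here is a minimal sketch that turns a scene's transition feeling into candidate ad categories. The mapping table, the `general` fallback, and the function name are hypothetical; only the "cheerful, relaxed" to travel, entertainment, and leisure pairing comes from the example above, and a production system would likely learn such associations from data instead of hard-coding them.

```python
# Hypothetical lookup from a scene feeling to ad categories that suit it.
# Only the cheerful/relaxed pairing is taken from the article's example.
FEELING_TO_CATEGORIES = {
    "cheerful": ["travel", "entertainment"],
    "relaxed": ["travel", "leisure"],
}


def candidate_categories(transition_feeling,
                         mapping=FEELING_TO_CATEGORIES,
                         default=("general",)):
    """Return ad categories for a feeling string such as 'Cheerful, relaxed'.

    Feelings are comma-separated and matched case-insensitively. Categories
    keep their first-seen order and duplicates are dropped. If nothing
    matches, the default categories are returned.
    """
    categories = []
    for feeling in transition_feeling.lower().split(","):
        for category in mapping.get(feeling.strip(), []):
            if category not in categories:
                categories.append(category)
    return categories or list(default)


print(candidate_categories("Cheerful, relaxed"))
# ['travel', 'entertainment', 'leisure']
print(candidate_categories("Neutral"))
# ['general']
```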

Image 1 – Sample of detected scene change with corresponding metadata from Ep12 Pororo – Pretty, The Great Storyteller

Proof point: The Google Cloud architecture 
Google Cloud, powered by the Gemini 1.5 Pro model, delivers a robust and scalable solution for intelligent ad break detection. Its multimodal analysis capabilities simultaneously process video, audio, and text to detect even subtle transitions, enabling seamless ad integration. Gemini’s ability to process up to 2 million tokens ensures comprehensive analysis of long videos across diverse genres with minimal retraining, offering versatility for media providers. This large context window allows the model to analyze approximately 2 hours of video and audio content in a single pass, which significantly reduces processing time and complexity compared to methods that require breaking videos into smaller chunks.
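The relationship between the context window and video length can be checked with simple arithmetic. The sketch below assumes per-second token rates of 258 for video (one sampled frame per second) and 32 for audio; these rates are assumptions for illustration and should be checked against the current Gemini documentation. Only the 2 million token limit comes from the text above.

```python
CONTEXT_WINDOW_TOKENS = 2_000_000  # limit stated in the article

# Assumed tokenization rates, for illustration only.
VIDEO_TOKENS_PER_SECOND = 258
AUDIO_TOKENS_PER_SECOND = 32


def estimate_tokens(duration_seconds, prompt_tokens=0):
    """Estimate the tokens consumed by a video with audio plus a text prompt."""
    rate = VIDEO_TOKENS_PER_SECOND + AUDIO_TOKENS_PER_SECOND
    return duration_seconds * rate + prompt_tokens


def max_duration_seconds(context_tokens=CONTEXT_WINDOW_TOKENS,
                         reserved_tokens=0):
    """Longest whole-second duration that fits after reserving some tokens."""
    rate = VIDEO_TOKENS_PER_SECOND + AUDIO_TOKENS_PER_SECOND
    return (context_tokens - reserved_tokens) // rate


def fits_in_one_pass(duration_seconds, prompt_tokens=0):
    """True if the estimated token count stays within the context window."""
    return estimate_tokens(duration_seconds, prompt_tokens) <= CONTEXT_WINDOW_TOKENS


limit = max_duration_seconds()
print(limit, "seconds, or about", round(limit / 60), "minutes")
```

Under these assumed rates the budget works out to 6,896 seconds, or roughly 115 minutes, which is consistent with the "approximately 2 hours" figure. A video slightly longer than that, or a long prompt, would need to be trimmed or split.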
The architecture ensures high performance and reliability through these key stages:

Image 2 – Architecture diagram for the scene change detection

1. Video Ingestion and Storage (GCS): Videos are ingested and stored in Google Cloud Storage (GCS), a highly scalable and durable object storage service offering various storage classes to optimize cost and performance. GCS ensures high availability and accessibility for processing.  Robust security measures, including Identity and Access Management (IAM) roles and fine-grained access controls, are in place.
2. Orchestration and simultaneous processing (Vertex AI pipelines & Gemini): Vertex AI pipelines orchestrate the end-to-end video analysis process, ensuring seamless execution of each stage. Vertex AI manages simultaneous processing of multiple videos using Google Gemini’s multimodal analysis, significantly accelerating the workflow while maintaining scalability. This includes built-in safety filters powered by Gemini, which perform a nuanced contextual analysis of video, audio, and text to discern potentially inappropriate content. The results are returned in JSON format, detailing scene change timestamps, video metadata, and contextual insights.
Post-processing is then applied to the JSON output to structure the data in a tabular format, ensuring compatibility with downstream storage and analysis tools. This includes:

Standardizing timestamps: Ensuring uniform time formats for consistent querying and integration.

Metadata mapping: Beyond basic metadata extraction, this stage includes the classification of scenes (or entire video programs) into industry-standard taxonomies, such as the IAB’s, or the customer’s own custom taxonomies. This allows for more granular organization of video content based on its type and makes ad targeting easier.

Error handling and data validation: Filtering out incomplete or invalid entries to maintain data quality.
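The timestamp standardization and validation steps above could be sketched as follows. The record keys (`timestamp`, `transition_type`), the added `offset_seconds` column, and the choice of `HH:MM:SS` as the uniform format are assumptions for illustration; the article does not specify the actual schema. Taxonomy mapping is not covered here.

```python
# Hypothetical keys that every scene-change record must carry.
REQUIRED_KEYS = ("timestamp", "transition_type")


def to_seconds(timestamp):
    """Convert 'SS', 'MM:SS', or 'HH:MM:SS' into whole seconds.

    Raises ValueError for any other shape or for non-numeric parts.
    """
    parts = str(timestamp).strip().split(":")
    if not 1 <= len(parts) <= 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"unrecognized timestamp: {timestamp!r}")
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + int(part)
    return seconds


def format_hms(seconds):
    """Format whole seconds as zero-padded HH:MM:SS."""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def postprocess(records):
    """Drop invalid records, normalize timestamps, and sort by time."""
    rows = []
    for record in records:
        # Skip entries that are not objects or lack a required value.
        if not isinstance(record, dict):
            continue
        if any(not record.get(key) for key in REQUIRED_KEYS):
            continue
        try:
            offset = to_seconds(record["timestamp"])
        except ValueError:
            continue
        row = dict(record)  # copy so the input is left untouched
        row["timestamp"] = format_hms(offset)
        row["offset_seconds"] = offset
        rows.append(row)
    rows.sort(key=lambda r: r["offset_seconds"])
    return rows
```

Each returned row is a flat dictionary with the same keys, which is a convenient shape for loading into a tabular store.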

3. Structured data storage and enrichment (BigQuery): The structured data resulting from Gemini’s scene change detection analysis, including timestamps, metadata, and contextual insights, is stored in BigQuery. BigQuery ML can leverage this integrated data to build predictive models for ad placement optimization. For example, you can schedule a 15-second action-themed ad during a scene change in an action sequence, targeting viewers who frequently watch action movies in the evening.
4. Monitoring and logging (GCP operations suite): GCP Operations Suite provides comprehensive monitoring and alerting for the entire pipeline, including real-time visibility into job progress and system health.  This includes detailed logging, automated alerts for failures, and dashboards for key performance indicators.  This proactive approach ensures timely issue resolution and maximizes system reliability.
Conclusion: A win-win for viewers and advertisers
Ready to transform your video ad strategy? Learn more about Google Cloud, Gemini and BigQuery. For developers looking to get hands-on experience, you can also explore this notebook detailing how to use the Gemini API for video analysis.

1. Statista. (2024). Online video viewers worldwide quarterly.
2. Exploding Topics. (2024). 50+ video streaming stats: Key trends in 2024.

AI Summary and Description: Yes

**Short Summary with Insight:**
The rapid growth of online video consumption presents both challenges and opportunities for advertisers seeking effective ad placement. Google Cloud’s advanced AI technology, specifically the Gemini model, offers innovative solutions for seamless ad integration through intelligent scene detection, enhancing viewer engagement and optimizing ad revenue.

**Detailed Description:**
This text emphasizes the significant rise in online video consumption, which has become a critical arena for advertising strategies. The advancements in AI, particularly with Google Cloud’s Gemini, introduce powerful capabilities for scene detection and ad placement that cater to the evolving landscape of video content consumption.

Key points include:

– **Explosion of Online Video Consumption:**
  – Over 1.8 billion global subscriptions to streaming services as of 2023.
  – 92% of internet users watch online videos monthly, signifying an expansive audience for ads.

– **Challenges in Ad Placement:**
  – Traditional ad break detection struggles with modern, diverse video formats.
  – Existing methods may not effectively identify optimal ad placements, leading to potential viewer disruption.

– **Gemini’s Intelligent Scene Detection:**
  – Gemini leverages multimodal capabilities (video, audio, text) for nuanced scene understanding.
  – Enhanced contextual metadata allows for targeted ad placement that resonates with viewer sentiment.

– **Efficiency and Workflow Improvements:**
  – Google Cloud’s Vertex AI streamlines the video monetization process from scene detection to ad integration.
  – The architecture includes stages for video ingestion, orchestration of processing, structured data storage, and monitoring, ensuring scalability and reliability.

– **Practical Application of Metadata:**
  – Contextual advertising based on emotional tone and scene transitions enhances viewer engagement.
  – Specific ad placements can be made by correlating audience sentiments with ad content, improving overall effectiveness and ROI.

– **Scalable Cloud Architecture:**
  – Built upon robust Google Cloud services ensuring secure and efficient processing of large volumes of video content.
  – Using BigQuery for data analysis, advertisers can optimize placements based on real-time audience behavior.

**Conclusion:**
The insights from Google Cloud’s advancements highlight a transformative approach in video advertising, enabling a win-win situation for both viewers and advertisers. As more businesses look to leverage AI for smart ad placements, understanding these innovative technologies will be crucial for professionals in security, compliance, and ad optimization fields.