Cloud Blog: Expanding Vertex AI with the next wave of generative AI media models

Source URL: https://cloud.google.com/blog/products/ai-machine-learning/announcing-veo-3-imagen-4-and-lyria-2-on-vertex-ai/
Source: Cloud Blog
Title: Expanding Vertex AI with the next wave of generative AI media models

Feedly Summary: Today, we are introducing the next wave of generative AI media models on Vertex AI: Imagen 4, Veo 3, and Lyria 2. 
We’ve already seen customers generate stunning, photorealistic images with Imagen 3, Google’s image generation model. Customers have taken these images and transformed them into high-quality videos and assets with Veo 2. We’ve even seen customers take these remarkable videos and bring them to life with professional-grade audio using Lyria, Google’s advanced AI music generation model.
With a surge of momentum in the generative AI media space across marketing, media, and more, storytelling has never been easier. Users are creating campaign assets more quickly and building breakthrough creative content. Let’s look at each model and the ways you can get started today.
Imagen 4: Higher quality image generation
Today we’re introducing Imagen 4 text-to-image generation on Vertex AI in public preview. As Google’s highest quality image generation model, Imagen 4 delivers:

Outstanding text rendering and prompt adherence 

Higher overall image quality across all styles

Multilingual prompt support to help creators globally

Prompt: Capture an intimate close-up bathed in warm, soft, late-afternoon sunlight filtering into a quintessential 1960s kitchen. The focal point is a charmingly designed vintage package of all-purpose flour, resting invitingly on a speckled Formica countertop. The packaging itself evokes pure nostalgia: perhaps thick, slightly textured paper in a warm cream tone, adorned with simple, bold typography (a friendly serif or script) in classic red and blue “ALL-PURPOSE FLOUR”, featuring a delightful illustration like a stylized sheaf of wheat or a cheerful baker character. In smaller bold print at the bottom of the package: “NET WT 5 LBS (80 OZ) 2.27kg”. Focus sharply on the package details – the slightly soft edges of the paper bag, the texture of the vintage printing, the inviting “All-Purpose Flour" text. Subtle hints of the 1960s kitchen frame the shot – the chrome edge of the counter gleaming softly, a blurred glimpse of a pastel yellow ceramic tile backsplash, or the corner of a vintage metal canister set just out of focus. The shallow depth of field keeps attention locked on the beautifully designed package, creating an aesthetic rich in warmth, authenticity, and nostalgic appeal.

Prompt: This four-panel comic strip uses a charming, deliberately pixelated art style reminiscent of classic 8-bit video games, featuring simple shapes and a limited, bright color palette dominated by greens, blues, browns, and the dinosaur’s iconic grey/black. The setting is a stylized pixel beach. Panel one shows the familiar Google Chrome T-Rex dinosaur, complete with its characteristic pixelated form, wearing tiny pixel sunglasses and lounging on a pixelated beach towel under a blocky yellow sun. Pixelated palm trees sway gently in the background against a blue pixel sky. A caption box with pixelated font reads, "Even error messages need a vacation." Panel two is a close-up of the T-Rex attempting to build a pixel sandcastle. It awkwardly pats a mound of brown pixels with its tiny pixel arms, looking focused. Small pixelated shells dot the sand around it. Panel three depicts the T-Rex joyfully hopping over a series of pixelated cacti planted near the beach, mimicking its game obstacle avoidance. Small "Boing! Boing!" sound effect text appears in a blocky font above each jump. A pixelated crab watches from the side, waving its pixel claw. The final panel shows the T-Rex floating peacefully on its back in the blocky blue pixel water, sunglasses still on, with a contented expression. A small thought bubble above it contains pixelated "Zzz…" indicating relaxation.

Prompt: Filmed cinematically from the driver’s seat, offering a clear profile view of the young passenger on the front seat with striking red hair. Her gaze is fixed ahead, concentrated on navigating the dusty, lonely highway visible through her side window, which shows a blurred expanse of dry earth and perhaps distant, hazy mountains. Her arm rests on the window ledge or steering wheel. The shot includes part of the aged truck interior beside her – the door panel, maybe a glimpse of the worn seat fabric. The lighting could be late afternoon sun, casting long shadows and warm highlights across her face and the truck’s interior. This angle emphasizes her individual presence and contemplative state within the vast, empty landscape.
To get started with Imagen 4 in public preview on Vertex AI, you can use Media Studio or call the model with the Google Gen AI SDK for Python.
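A minimal sketch of such a call with the Google Gen AI SDK for Python is shown here. The project ID, region, model ID, and list of aspect ratios are assumptions rather than values from this post, so check the Imagen documentation before relying on them. The SDK import is deferred so the option helper can be tried without credentials.

```python
# Assumed values, not taken from the post: replace them with your own.
PROJECT_ID = "your-project-id"                    # hypothetical placeholder
LOCATION = "us-central1"                          # assumed region
MODEL_ID = "imagen-4.0-generate-preview-05-20"    # assumed preview model ID

# Aspect ratios assumed to be accepted by Imagen on Vertex AI.
ASPECT_RATIOS = {"1:1", "3:4", "4:3", "9:16", "16:9"}


def build_image_config(aspect_ratio="1:1", number_of_images=1):
    """Validate generation options and return them as a plain dict."""
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {aspect_ratio}")
    if number_of_images < 1:
        raise ValueError("number_of_images must be at least 1")
    return {"aspect_ratio": aspect_ratio, "number_of_images": number_of_images}


def generate_image(prompt, output_path="imagen4-output.png", **options):
    """Generate one image and save it locally.

    Requires the google-genai package and Google Cloud credentials.
    """
    # Imported lazily so this module loads even without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)
    response = client.models.generate_images(
        model=MODEL_ID,
        prompt=prompt,
        config=types.GenerateImagesConfig(**build_image_config(**options)),
    )
    # Save the first generated image to disk.
    response.generated_images[0].image.save(output_path)
    return output_path
```

With credentials configured, `generate_image("A vintage flour package on a Formica countertop", aspect_ratio="4:3")` would write the result to `imagen4-output.png`.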

Veo 3: Higher-quality video generation with audio and speech
Veo 3 is our latest state-of-the-art video generation model from Google DeepMind. With Veo 3, you can generate videos with:

Improved quality when generating videos from text and image prompts 

Speech, such as dialogue and voice-overs

Audio, such as music and sound effects

Here’s what a few of our customers have to say about productivity and creative gains with Veo: 
Klarna, a leader in digital payments, is leveraging Veo and Imagen on Vertex AI to boost content creation efficiency. From b-roll to YouTube bumpers, the company is significantly reducing production timelines.
“At Klarna, we’re constantly exploring ways to push the boundaries of innovation in our marketing efforts, and Veo has been a game-changer in our creative workflows. With Veo and Imagen, we’ve transformed what used to be time-intensive production processes into quick, efficient tasks that allow us to scale content creation rapidly. Whether it’s producing engaging b-roll, crafting eye-catching YouTube bumpers, or developing dynamic social media animations, these tools have empowered our teams to be more agile and creative. The results speak for themselves, driving increased engagement and content performance. With Google Cloud, we’re laying the groundwork for the future of commerce and revolutionizing how we bring our brand to life.” – David Sandström, Chief Marketing Officer, Klarna
Jellyfish, a renowned digital marketing company within The Brandtech Group, has integrated Veo into their top-performing AI marketing platform, Pencil, and teamed up with Japan Airlines to offer AI-generated in-flight entertainment.

Japan Airlines transforms in-flight entertainment with Google’s latest GenAI models – Jellyfish

"The addition of Veo 2 in Pencil reinforces our commitment to empowering marketers with sophisticated AI, enabling them to produce campaigns that are not only smarter and faster but also bolder and more artistically inspired. Our pilots have shown incredible results, with an average 50% reduction in costs and time-to-market efficiencies. This step change in control and quality turns previously impossible ideas into real marketing content in minutes. Japan Airlines is leading the way in applying Gen AI to the travel industry, and we’re excited to see how other brands follow suit." – David Jones, Founder & CEO, Brandtech
Kraft Heinz’s Tastemaker platform empowers their teams with access to Imagen and Veo, dramatically accelerating creative and campaign development processes. 
"With Veo and Imagen on Vertex AI as part of our Tastemaker platform, Kraft Heinz has unlocked unprecedented speed and efficiency in our creative workflows. What once took us eight weeks is now only taking eight hours, resulting in substantial cost savings.” – Justin Thomas, Head Digital Experience & Growth
Envato, a global leader for digital creative assets and templates, used Veo 2 to develop their newly launched video generation feature, VideoGen, to enable creative professionals to turn text or images into hyper-realistic and cinematic video content.
“We’ve tried many of the top video models, and Veo 2 has driven the most impressive results in terms of speed and quality across a diverse set of text and image inputs. Within the first few days of launch, tens of thousands of Envato subscribers were already accessing VideoGen, with nearly 60% of their generated videos being downloaded for use in creative projects. Since March, Envato has seen VideoGen usage surpass 100%+ month over month. It’s been a pleasure working with Google Cloud to bring Envato’s VideoGen feature to life with Veo.” said Aaron Rutley, Head of Product for AI at Envato.
See how it works: Veo 3 is capable of handling intricate prompt details, as demonstrated in the following examples.

Lost Island

Prompt: A medium shot, historical adventure setting: Warm lamplight illuminates a cartographer in a cluttered study, poring over an ancient, sprawling map spread across a large table. Cartographer: "According to this old sea chart, the lost island isn’t myth! We must prepare an expedition immediately!"

Purple Door

Prompt: A low-angle shot shows an open, light purple door leading from a room with light purple walls and a gray floor to a vibrant outdoor scene. Lush green grass and wildflowers spill from the doorway onto the indoor floor, creating a whimsical transition between spaces. Beyond the door, rolling green hills dotted with more wildflowers stretch towards a bright, clear sky. A single tree stands prominently in the foreground of the outdoor scene, its leaves adding depth to the view. The sunlight and natural elements contrast with the simplicity of the indoor space, inviting a sense of wonder and escape.
Veo 3 is in private preview on Vertex AI and will be available more broadly in the coming weeks. If you’re interested in early access, please fill out this form. 
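For readers who do have access, video generation through the Google Gen AI SDK for Python runs as a long-running operation that has to be polled. The sketch below shows that pattern under assumptions: the model ID, region, and polling intervals are illustrative and not taken from this post, and `wait_for_operation` is a hypothetical helper, not part of the SDK.

```python
import time

MODEL_ID = "veo-3.0-generate-preview"  # assumed preview model ID


def wait_for_operation(operation, refresh, poll_seconds=15.0, max_polls=40,
                       sleep=time.sleep):
    """Poll a long-running operation until it reports done, or give up.

    `refresh` takes the current operation and returns its latest state.
    """
    for _ in range(max_polls):
        if operation.done:
            return operation
        sleep(poll_seconds)
        operation = refresh(operation)
    if operation.done:
        return operation
    raise TimeoutError(f"operation not finished after {max_polls} polls")


def generate_video(prompt, project, location="us-central1"):
    """Start a Veo generation job and wait for it to finish.

    Requires the google-genai package, credentials, and model access.
    """
    # Imported lazily so the polling helper can be used without the SDK.
    from google import genai
    from google.genai import types

    client = genai.Client(vertexai=True, project=project, location=location)
    operation = client.models.generate_videos(
        model=MODEL_ID,
        prompt=prompt,
        config=types.GenerateVideosConfig(aspect_ratio="16:9", number_of_videos=1),
    )
    # The finished operation carries the generated videos in its result.
    return wait_for_operation(operation, client.operations.get)
```

Passing `refresh` and `sleep` as parameters keeps the polling loop independent of the SDK, so it can be exercised with a stand-in operation object.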
Lyria 2: Greater creative control with music generation
At Google Cloud Next 2025, we announced Lyria in Vertex AI, Google’s text-to-music model. Today, we’re announcing Lyria 2 is generally available in Vertex AI. As Google’s latest music generation model, Lyria 2 features high-fidelity music across a range of styles. As your next creative collaborator, Lyria 2 provides:

High-quality audio content from text prompts

Greater creative control over instruments, BPM, and other characteristics

To start creating content with Lyria 2, check out Media Studio on Vertex AI. Once there, you can start generating music from text prompts or access the model API via Vertex AI. For inspiration, check out some of the music clips and prompts below.
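As a rough illustration of API access, the sketch below folds musical controls such as instruments and BPM into a text prompt and assembles a Vertex AI `predict` request. The model ID `lyria-002`, the region, and the `prompt`, `negative_prompt`, and `seed` field names are assumptions based on the usual Vertex AI prediction format. `describe_track` and `build_predict_request` are hypothetical helpers. Confirm the details in the Lyria documentation.

```python
# Assumptions, not stated in the post: model ID, region, and field names.
MODEL_ID = "lyria-002"      # assumed model ID
LOCATION = "us-central1"    # assumed region


def describe_track(genre, instruments, bpm=None, mood=None):
    """Fold musical controls into a single text prompt."""
    parts = [genre, "featuring " + ", ".join(instruments)]
    if bpm is not None:
        parts.append(f"{bpm} BPM")
    if mood:
        parts.append(mood)
    return ", ".join(parts)


def build_predict_request(project_id, prompt, negative_prompt=None, seed=None):
    """Return (url, body) for a predict call.

    Sending the request also needs an OAuth bearer token, omitted here.
    """
    url = (
        f"https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{project_id}"
        f"/locations/{LOCATION}/publishers/google/models/{MODEL_ID}:predict"
    )
    instance = {"prompt": prompt}
    if negative_prompt:
        instance["negative_prompt"] = negative_prompt
    if seed is not None:
        instance["seed"] = seed
    return url, {"instances": [instance], "parameters": {}}
```

For example, `describe_track("Peruvian cumbia", ["electric guitar", "bass", "timbales"], bpm=100)` produces a prompt that can be passed straight to `build_predict_request`.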

Creating music with Lyria 2: Psychedelic Cumbia

Prompt: Upbeat, Rhythmic Peruvian Cumbia with a psychedelic edge, LA, Live performance at a Latin music Festival, incorporating electric guitars, bass, and often utilizing a prominent timbales percussion section, creating a powerful and danceable vibe. Vibrant and energetic.

Creating music with Lyria 2: Epic Orchestral

Prompt: Sweeping Orchestral Film Score, Pristine Studio recording, London, 100-piece Orchestra, Majestic and profound. A blend of soaring melodies, dramatic harmonic shifts, and powerful percussive elements, with instruments such as french horns, strings, and timpani, and a thematic approach, featuring intricate orchestrations, dynamic range, and emotional depth, evoking a cinematic and awe-inspiring atmosphere.
See what some of our customers have to say about Lyria 2 so far: 
Captions is an AI-powered video creation tool that allows users to create studio-grade talking videos quickly and easily. They have integrated Lyria 2 into their Mirage Edit feature enabling customers to quickly generate complete videos with customized sound.  
“At Captions, our Mirage Edit feature already gives subscribers the power to go from prompt to fully-edited AI talking video — complete with images, B-roll clips, voiceovers, and transitions. Now, we’re adding a keystone element: adaptive music powered by Google’s Lyria 2. With a single prompt, Lyria composes a score that syncs to the script, pacing, and transitions at every emotional beat, so our customers can publish cinematic short-form videos without ever leaving Captions or shuffling through stock libraries.” said Dwight Churchill, Co-Founder and COO, Captions.ai
Dashverse, owner of digital content platforms such as Dashtoon and DashReels, is leveraging Google’s Lyria 2 on Vertex AI to provide the next generation of AI-native creators with advanced music generation capabilities. This integration allows users to craft dynamic and emotionally responsive soundtracks that seamlessly adapt to the narrative and pacing of their content on platforms like DashReels. 
“We’ve always believed in empowering everyday creators at Dashverse — whether they’re making comics with Dashtoon or short dramas on DashReels. Our move into dynamic, emotionally resonant storytelling with DashReels needed a music engine that was just as expressive and responsive. Lyria 2 on Vertex AI delivers exactly that. It gives our users studio-level control over music — adapting to emotion, scene, and pacing — without the overhead. It’s not just a soundtrack generator; it’s a storytelling amplifier. We’re incredibly excited about what this unlocks for the next generation of AI-native creators.” said Soumyadeep Mukherjee, CTO, Dashverse

Create securely and share responsibly 
The security and safety of any AI-generated content is crucial. These models are therefore designed with built-in safeguards, allowing you to concentrate on your creative work. Veo 3, Imagen 4, and Lyria 2 are all built with safety as a fundamental design principle in partnership with Google DeepMind.
Watermarking: By default, all creations generated with Veo, Imagen, and Lyria use SynthID, a technology that embeds an invisible watermark directly into the generated output. This watermark allows AI-generated media to be identified, supporting transparency.
Safety filters: Both input prompts and output content for all generative AI media models are assessed against a list of safety filters. You can configure how aggressively content is filtered so that assets meet your brand values. For visual output, you also have control over person generation.
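The filtering and person-generation controls described above can be sketched as options for an image request. The field names and string values below follow the Google Gen AI SDK's `GenerateImagesConfig` as best understood here and should be treated as assumptions. The `strict`, `balanced`, and `permissive` labels and the `build_safety_options` helper are hypothetical conveniences, not part of any Google API.

```python
# Hypothetical friendly labels mapped to assumed SDK values.
FILTER_LEVELS = {
    "strict": "BLOCK_LOW_AND_ABOVE",
    "balanced": "BLOCK_MEDIUM_AND_ABOVE",
    "permissive": "BLOCK_ONLY_HIGH",
}
PERSON_MODES = {
    "none": "DONT_ALLOW",
    "adults": "ALLOW_ADULT",
    "all": "ALLOW_ALL",
}


def build_safety_options(strictness="balanced", people="adults", watermark=True):
    """Translate brand-level choices into image generation config fields."""
    if strictness not in FILTER_LEVELS:
        raise ValueError(f"unknown strictness: {strictness}")
    if people not in PERSON_MODES:
        raise ValueError(f"unknown person mode: {people}")
    return {
        "safety_filter_level": FILTER_LEVELS[strictness],
        "person_generation": PERSON_MODES[people],
        "add_watermark": watermark,  # SynthID watermarking stays on by default
    }
```

Assuming those field names hold, the resulting dict could be unpacked into the SDK config, for example `types.GenerateImagesConfig(**build_safety_options("strict", "none"))`.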
Get started 
You can learn more about these new models by checking out the resources below: 

Imagen documentation

Veo documentation

Lyria documentation

AI Summary and Description: Yes

**Summary:** The text details the introduction of three new generative AI media models available on Google Cloud’s Vertex AI: Imagen 4, Veo 3, and Lyria 2. These models enhance capabilities in image, video, and music generation respectively, showcasing innovative features that significantly improve content creation in various industries. The introduction emphasizes built-in security and safety features, critical for professionals in AI and cloud security.

**Detailed Description:**

The text introduces significant advancements in generative AI technology from Google on its Vertex AI platform. This development is particularly relevant to professionals in the fields of AI and cloud computing, as it highlights the potential for enhanced creative workflows through the use of AI.

Key Models Introduced:
– **Imagen 4:**
  – Positioned as the highest quality image generation model.
  – Features include:
    – Outstanding text rendering and adherence to prompts.
    – Higher image quality across various styles.
    – Multilingual prompt support for global creators.

– **Veo 3:**
  – A state-of-the-art video generation model.
  – Capabilities include:
    – Improved quality in generating videos from text and image prompts.
    – Options for speech and voice-overs.
    – Ability to integrate audio elements like music and sound effects.
  – Notable customer feedback indicates significant productivity improvements and decreased production timelines.

– **Lyria 2:**
  – This is an advanced music generation model.
  – It offers:
    – High-fidelity music generation from text prompts.
    – Enhanced creative control over musical characteristics.
  – Integrated by companies such as Captions.ai, which have used it to create dynamic soundtracks that sync with visual content.

**Security and Compliance Features:**
– All three models incorporate critical security principles including:
  – **Watermarking via SynthID:** An invisible watermark that identifies AI-generated media, ensuring transparency and traceability.
  – **Safety Filters:** Mechanisms to filter content based on safety standards, allowing for brand alignment and responsible usage of generated assets.

**Customer Success Stories:**
– Various companies, including Klarna, Kraft Heinz, and Japan Airlines, share how these models have transformed their content creation processes, offering notable reductions in time and cost while enhancing creativity and engagement.

In conclusion, the introduction of Imagen 4, Veo 3, and Lyria 2 on Vertex AI not only marks a significant leap in generative AI capabilities but also emphasizes the importance of security and ethical use in AI-generated content. This is essential for professionals managing security, privacy, and compliance in AI and cloud services.