Source URL: https://cloud.google.com/blog/topics/retail/ai-for-retailers-boost-roi-without-straining-budget-or-resources/
Source: Cloud Blog
Title: How inference at the edge unlocks new AI use cases for retailers
Feedly Summary: For retailers, making intelligent, data-driven decisions in real time isn’t an advantage — it’s a necessity. Staying ahead of the curve means embracing AI, but many retailers hesitate to adopt because it’s costly to overhaul their technology. While traditional AI implementations may require significant upfront investments, retailers can leverage existing assets to harness the power of AI.
These assets, ranging from security cameras to point-of-sale systems, can unlock store analytics, faster transactions, staff enablement, loss prevention, and personalization — all without straining the budget. In this post, we’ll explore how inference at the edge, a technique that runs AI-optimized applications on local devices without relying on distant cloud servers, can transform retail assets into powerful tools.
How retailers can build an AI foundation
Retailers can find assets to fuel their AI in all corners of the business. You can unlock employee productivity by transforming your vast repository of handbooks, training materials, and operational procedures into working assets for AI.
Digitized manuals for store equipment, human resources, loss prevention, and other domain-specific information can also be combined with agent-based AI assistants to provide contextually aware “next action” assistants. With AI-optimized applications extended from the cloud to the edge, a retail associate can now ask the AI assistant, “What do I do next?” and receive a fast, detailed response tailored to their question.
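To make the pattern concrete, here is a minimal sketch of the retrieval step behind such an assistant: embed the digitized manual passages once, then pull the passages closest to the associate’s question to ground a locally served model. This is an illustration only, not Google’s implementation; the toy hashed bag-of-words embedding and the sample passages are stand-ins for a real edge embedding model and a real document corpus.

```python
import numpy as np

# Toy embedding: hashed bag-of-words. A real edge deployment would use a
# proper embedding model; this stand-in just keeps the sketch runnable.
def embed(text: str, dim: int = 64) -> np.ndarray:
    v = np.zeros(dim)
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0
    return v / (np.linalg.norm(v) or 1.0)

# Digitized manual passages (hypothetical examples).
passages = [
    "To restart the POS terminal, hold the power button for ten seconds.",
    "For a suspected theft, notify the manager and file a loss report.",
    "New hires must complete safety training before operating the baler.",
]
index = np.stack([embed(p) for p in passages])

def next_action_context(question: str, k: int = 2) -> str:
    """Retrieve the k most relevant passages to ground an AI assistant."""
    scores = index @ embed(question)
    top = np.argsort(scores)[-k:][::-1]
    return "\n".join(passages[i] for i in top)

# The retrieved context would be passed to a locally served LLM as grounding.
print(next_action_context("What do I do next? The register froze."))
```

In practice, the retrieved context would be appended to the prompt of a model served on the in-store device, keeping both the documents and the inference local to the edge.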
Edge processing power decision point: CPU vs GPU
Next, we’ll explore a critical decision: choosing the right hardware to power your applications. The two primary options are CPUs (central processing units) and GPUs (graphics processing units), each with its own strengths and weaknesses. Making an informed choice requires understanding your specific use cases and balancing performance requirements, bandwidth, and model processing against cost. Use the decision matrix below to guide the process, especially when choosing between deploying at a regional data center (DC) or at the edge.
| Feature | CPU | GPU | Use cases (examples) |
| --- | --- | --- | --- |
| Cost | Lower | Higher | Basic analytics, people counting, simple object detection |
| Performance | Required; good for general-purpose tasks | Optional; good for parallel processing | Complex AI, video analytics, high-resolution image processing, ML model training |
| Power consumption | Lower | Higher | Remote locations, small form-factor devices |
| Latency | Moderate | Lower (for parallel tasks) | Real-time applications, immediate insights |
| Deployment location | Edge or regional DC | Typically edge, but feasible in regional DC | Determined by latency, bandwidth, and data processing needs |
Key decision criteria for retail decision makers
Complexity of AI models: Simpler, retail-focused AI models, like basic object detection, can often run efficiently on CPUs. More complex models, such as those used for real-time video analytics or personalized recommendations over large datasets, typically require the parallel processing power of GPUs.
Data volume and velocity: If you’re processing large amounts of data at high speed, a GPU may be necessary to keep up with the demand. For smaller datasets and lower throughput, a CPU may suffice.
Latency requirements: For use cases requiring ultra-low latency, such as real-time fraud detection, GPUs can provide faster processing, especially when located at the edge, closer to the data source. However, network latency between the edge and a regional DC might negate this benefit if the GPU is located regionally.
Budget: GPUs usually have a higher price tag than CPUs. Carefully consider your budget and the potential ROI of investing in GPU-powered solutions before making a decision. Start with CPU-based solutions where possible and upgrade to GPUs only when absolutely necessary.
Power consumption: GPUs generally consume more power than CPUs. This is an important factor to consider for edge deployments, especially in locations with limited power availability. This is less of a concern if deploying at a regional DC where power and cooling are centralized.
Deployment location: The proximity of the processing power to the data source has major implications for latency. Deploying at the edge (in-store) minimizes latency for real-time use cases. Regional DCs introduce network latency, making them less suitable for applications requiring immediate action. However, certain tasks requiring heavy compute but not low latency (e.g., nightly inventory analysis) might be better suited for a regional DC where resources can be pooled and managed centrally.
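Taken together, these criteria can be encoded as a rough first-pass heuristic. The sketch below is illustrative only: the thresholds and the `Workload` fields are hypothetical, and no heuristic replaces benchmarking the actual model on the actual hardware.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    complex_model: bool       # e.g., real-time video analytics vs. people counting
    high_throughput: bool     # large data volume at high velocity
    latency_ms_budget: float  # end-to-end latency requirement
    power_constrained: bool   # small form factor / limited in-store power

def recommend_hardware(w: Workload) -> str:
    """First-pass CPU/GPU and placement heuristic (illustrative thresholds)."""
    needs_gpu = w.complex_model or w.high_throughput
    if needs_gpu and w.power_constrained:
        return "GPU at regional DC, if the latency budget tolerates the network hop"
    if needs_gpu:
        return "GPU at the edge"
    if w.latency_ms_budget < 100:  # hypothetical real-time threshold
        return "CPU at the edge"
    return "CPU at edge or regional DC; decide on bandwidth and cost"

print(recommend_hardware(Workload(False, False, 50, True)))   # CPU at the edge
print(recommend_hardware(Workload(True, True, 200, False)))   # GPU at the edge
```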
Remember, not all AI and ML require new investments in emerging technology. Many AI/ML use cases can produce the desired outcome without a GPU. For example, consider the visual inspection for store analytics and fast checkout referenced in the Google Distributed Cloud Price-a-Tray interactive game. Inference is performed at 5 FPS while the video stream continues to run at 25 FPS; the bounding boxes are then drawn on top of the returned detections rather than having a single pipeline handle video decoding, detection, and rendering together. This enables more efficient use of the CPU, since the work in this example can be split across cores and threads.
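A minimal sketch of that decoupling, assuming an OpenCV-readable 25 FPS stream and a hypothetical `detect_objects` stand-in for the real CPU model: detection runs on every fifth frame (5 FPS), while the most recent boxes are redrawn on every frame so playback stays smooth.

```python
import cv2

def detect_objects(frame):
    """Hypothetical detector stub; swap in your real CPU model here."""
    return []  # list of (x, y, w, h) boxes

cap = cv2.VideoCapture(0)   # assumed 25 FPS camera stream
DETECT_EVERY = 5            # 25 FPS / 5 = 5 FPS inference rate
last_boxes = []
frame_idx = 0

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Run the (expensive) detector on every fifth frame only.
    if frame_idx % DETECT_EVERY == 0:
        last_boxes = detect_objects(frame)
    # Draw the most recent boxes on every frame, keeping playback at 25 FPS.
    for (x, y, w, h) in last_boxes:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("store-analytics", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
    frame_idx += 1

cap.release()
cv2.destroyAllWindows()
```

In production, the detector would typically run on its own thread or process so a slow inference never stalls rendering, which is what lets the work spread across cores.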
But there are cases where GPUs do make sense. When very high precision is required, GPUs are often needed, as the fidelity lost in quantizing a model may reduce quality beyond acceptable thresholds. In the example of tracking an item, if millimeter-level movement accuracy is required, 5 FPS would not be sufficient for a reasonably fast-moving item, and a GPU would likely be required.
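The fidelity cost of quantization is straightforward to measure directly. The sketch below uses PyTorch post-training dynamic quantization on a toy model (hypothetical; other frameworks offer similar tooling) to compare INT8 outputs against the FP32 original before deciding whether a CPU-friendly quantized model stays within your accuracy threshold.

```python
import torch
import torch.nn as nn

# A toy FP32 model standing in for a real detector head (hypothetical).
model_fp32 = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 4))
model_fp32.eval()

# Post-training dynamic quantization: weights stored as INT8, activations
# quantized on the fly. Smaller and faster on CPU, at some fidelity cost.
model_int8 = torch.ao.quantization.quantize_dynamic(
    model_fp32, {nn.Linear}, dtype=torch.qint8
)

# Measure how far the quantized outputs drift from the FP32 original.
x = torch.randn(1, 256)
with torch.no_grad():
    drift = (model_fp32(x) - model_int8(x)).abs().max().item()
print(f"max output drift after quantization: {drift:.6f}")
```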
There is a middle ground between GPUs and CPUs: the world of specialty accelerators. Accelerators come either as peripherals attached to a system or as special instruction sets built into a CPU. CPUs are now manufactured with advanced matrix-multiplication instructions that assist tensor manipulation on-chip, greatly improving the performance of ML and AI models. One concrete example is running models compiled for OpenVINO. In addition, Google Distributed Cloud (GDC) Server and Rack editions utilize Intel Core processors, an architecture designed to be more flexible, with matrix-math support that improves the performance of ML models on CPU over traditional ML model serving.
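As a minimal sketch of that OpenVINO path, assuming a model already converted to OpenVINO IR at a hypothetical path `model.xml` with static input shapes: compiling for the CPU device lets the runtime select the best matrix instructions available on the host automatically.

```python
import numpy as np
import openvino as ov

core = ov.Core()
# Load a model previously converted to OpenVINO IR (hypothetical path).
model = core.read_model("model.xml")

# Compile for CPU; OpenVINO picks the best available instruction sets
# (e.g., AVX-512 / AMX matrix extensions) on the host automatically.
compiled = core.compile_model(model, device_name="CPU")

# Run a single inference with dummy input matching the model's input shape.
input_shape = compiled.input(0).shape
dummy = np.random.rand(*input_shape).astype(np.float32)
result = compiled([dummy])[compiled.output(0)]
print("output shape:", result.shape)
```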
Bring AI to your business
By tapping into the power of existing infrastructure and deploying AI at the edge, retailers can deliver modern customer experiences, streamline operations, and unlock employee productivity.
Learn more about how to transform your retail brand with Google Distributed Cloud.
AI Summary and Description: Yes
Summary: The text discusses how retailers can leverage existing assets, including edge computing and AI technologies, to optimize operations, employee productivity, and customer experiences without substantial new investments.
Detailed Description: The content explores several major points regarding the use of AI in retail, emphasizing the following insights:
– **Edge Computing**: It highlights the importance of inference at the edge, allowing AI applications to run on local devices. This reduces the need for constant connectivity to distant cloud servers and minimizes latency in decision-making processes.
– **Existing Assets**: Retailers can utilize current technologies such as security cameras and point-of-sale systems to gather analytics, enhance loss prevention, and personalize customer experiences, thereby maximizing their existing investments.
– **AI-Optimized Applications**: The article discusses the implementation of AI assistant tools that can help employees perform their tasks more efficiently by providing real-time, context-aware responses tailored to specific queries.
– **Hardware Choices**: It outlines the decision-making process involved in selecting the right hardware (CPU vs. GPU) for different AI applications, stressing the significance of understanding use cases, cost implications, and performance requirements.
– **Key Decision Criteria**: Emphasizes important factors for retail decision-makers:
  – **Complexity of AI Models**: Identifying whether a CPU is adequate for simpler tasks or if a GPU is necessary for complex processing.
  – **Data Volume and Velocity**: Recognizing when high-speed data processing requires GPU support.
  – **Latency Requirements**: Considering deployment location’s impact on application performance.
  – **Budget Considerations**: Weighing the cost of GPU investments against expected return on investment.
  – **Power Consumption**: Acknowledging the energy demands of different hardware solutions, especially in edge deployments.
– **Specialty Accelerators**: Introduces advanced processor capabilities designed for machine learning, which improve efficiency outside of traditional CPU/GPU frameworks.
– **Call to Action for Retailers**: Encourages retailers to adopt AI and leverage their existing infrastructure to enhance customer interactions and streamline operations.
By focusing on the emerging trends in AI implementation and hardware optimization, this content is particularly relevant to professionals in the realms of AI, cloud computing, and infrastructure security, indicating practical applications and avenues for improvement within the retail sector.