Simon Willison’s Weblog: Meta AI release Llama 3.3

Source URL: https://simonwillison.net/2024/Dec/6/llama-33/#atom-everything
Source: Simon Willison’s Weblog
Title: Meta AI release Llama 3.3

Feedly Summary: Meta AI release Llama 3.3
This new Llama-3.3-70B-Instruct model from Meta AI makes some bold claims:

This model delivers similar performance to Llama 3.1 405B with cost effective inference that’s feasible to run locally on common developer workstations.

I have 64GB of RAM in my M2 MacBook Pro, so I’m looking forward to trying a slightly quantized GGUF of this model to see if I can run it while still leaving some memory free for other applications.
It has 70B parameters, a 128,000 token context length and was trained to support English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.
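
As a back-of-envelope check on whether that plan is realistic, here is a rough sketch of quantized weight sizes for a 70B-parameter model. The bits-per-weight figures are my approximations of common llama.cpp quantization schemes, not official numbers:

```python
# Rough weight-memory estimates for a 70B-parameter model at common
# llama.cpp quantization levels. Bits-per-weight values are approximate
# averages; real GGUF files vary a little by tensor mix.
PARAMS = 70e9

quant_bits = {
    "Q8_0": 8.5,    # near-lossless 8-bit
    "Q6_K": 6.6,
    "Q5_K_M": 5.7,
    "Q4_K_M": 4.8,  # popular quality/size trade-off
    "Q3_K_M": 3.9,
}

for name, bits in quant_bits.items():
    gib = PARAMS * bits / 8 / 1024**3
    print(f"{name}: ~{gib:.0f} GiB of weights")
```

On those numbers a Q4_K_M build comes in around 39 GiB, leaving a 64 GB machine roughly 25 GB for the KV cache and other applications, while an 8-bit build at ~69 GiB would not fit. This counts weights only; long contexts add KV cache on top.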
The model card says that the training data was “A new mix of publicly available online data” – 15 trillion tokens with a December 2023 cut-off.
They used "39.3M GPU hours of computation on H100-80GB (TDP of 700W) type hardware" which they calculate as 11,390 tons CO2eq. I believe that’s equivalent to around 20 fully loaded passenger flights from New York to London (at ~550 tons per flight).
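
The flight equivalence is simple division over the quoted figures:

```python
# Sanity-check the flight comparison using the numbers from the post.
total_co2_tons = 11_390   # reported training emissions, tons CO2eq
per_flight_tons = 550     # ~one fully loaded New York-London flight
print(total_co2_tons / per_flight_tons)  # ≈ 20.7 flights
```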
Tags: meta, generative-ai, llama, training-data, ai, edge-llms, llms

AI Summary and Description: Yes

Summary: Meta AI’s Llama 3.3 release pairs performance claimed to be comparable to Llama 3.1 405B with inference costs low enough for local deployment on developer workstations. Its 70B parameters, 128,000-token context, and 15-trillion-token training mix carry practical implications for AI infrastructure planning and for security and compliance in organizations that adopt it.

Detailed Description: The Llama 3.3 model from Meta AI has made several noteworthy advancements that could influence various aspects of AI deployment and operational efficiency:

– **Performance**: The Llama 3.3 model reportedly delivers comparable performance to the larger Llama 3.1 405B model, but with a focus on cost-effective inference, making it more accessible for local runs on standard developer workstations.

– **Memory Efficiency**: With 70 billion parameters and a 128,000 token context length, the model is substantial, yet a moderately quantized build shrinks its weights to roughly 40 GB, small enough for a 64 GB machine such as an M2 MacBook Pro (see the sketch after this list). That puts capable large models within reach of developers without dedicated inference infrastructure.

– **Training Data**: The model was trained on around 15 trillion tokens described as “a new mix of publicly available online data”, with a December 2023 knowledge cut-off. Training explicitly targeted multilingual use, covering English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai.

– **Environmental Considerations**: The model card reports 39.3 million GPU hours on H100-80GB hardware, with training emissions estimated at 11,390 tons CO2 equivalent. Figures like these keep the environmental cost of training large models in the conversation, echoing broader sustainability concerns in technology development.

– **Technical Aspects**: The disclosed training figures (39.3M GPU hours on H100-80GB hardware) give a concrete sense of the infrastructure required to produce a model of this scale, even as inference is pushed down to single-workstation hardware.

– **Implications for Security and Compliance**: As organizations adopt such models, data privacy, software security, and responsible AI usage become increasingly pertinent concerns. Practitioners in AI and infrastructure security should track the compliance and governance frameworks that cover both deployment and operation of AI capabilities.
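
To make the local-deployment point concrete, here is a minimal sketch using the llama-cpp-python bindings; the GGUF filename is hypothetical, standing in for whichever quantized Llama 3.3 build you download:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Hypothetical path to a locally downloaded 4-bit quantized build.
llm = Llama(
    model_path="./Llama-3.3-70B-Instruct-Q4_K_M.gguf",
    n_ctx=8192,       # far below the 128k maximum, keeping the KV cache small
    n_gpu_layers=-1,  # offload every layer to Metal/GPU where available
)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello in Thai."}],
    max_tokens=64,
)
print(result["choices"][0]["message"]["content"])
```

Choosing a context length well under the 128k maximum matters on a 64 GB machine, since the KV cache grows linearly with context and competes with the model weights for memory.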

This development underscores the continuous evolution and impact of generative AI technologies, which could significantly shape future security and compliance efforts across the industry.