Source URL: https://simonwillison.net/2025/Jun/3/shisa-v2/
Source: Simon Willison’s Weblog
Title: Shisa V2 405B: Japan’s Highest Performing LLM
Feedly Summary: Shisa V2 405B: Japan’s Highest Performing LLM
Leonard Lin and Adam Lensenmayer have been working on Shisa for a while. They describe their latest release as “Japan’s Highest Performing LLM”.
Shisa V2 405B is the highest-performing LLM ever developed in Japan, and surpasses GPT-4 (0613) and GPT-4 Turbo (2024-04-09) in our eval battery. (It also goes toe-to-toe with GPT-4o (2024-11-20) and DeepSeek-V3 (0324) on Japanese MT-Bench!)
This 405B release is a follow-up to the six smaller Shisa v2 models they released back in April, which took a similar approach to DeepSeek-R1 in producing different models that each extended a different existing base model from Llama, Qwen, Mistral and Phi-4.
The new 405B model uses Llama 3.1 405B Instruct as a base, and is available under the Llama 3.1 community license.
Shisa is a prominent example of Sovereign AI – the ability for nations to build models that reflect their own language and culture:
We strongly believe that it’s important for homegrown AI to be developed both in Japan (and globally!), and not just for the sake of cultural diversity and linguistic preservation, but also for data privacy and security, geopolitical resilience, and ultimately, independence.
We believe the open-source approach is the only realistic way to achieve sovereignty in AI, not just for Japan, or even for nation states, but for the global community at large.
The accompanying overview report has some fascinating details:
Training the 405B model was extremely difficult. Only three other groups that we know of: Nous Research, Bllossom, and AI2 have published Llama 405B full fine-tunes. […] We implemented every optimization at our disposal including: DeepSpeed ZeRO-3 parameter and activation offloading, gradient accumulation, 8-bit paged optimizer, and sequence parallelism. Even so, the 405B model still barely fit within the H100’s memory limits
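For a sense of what that stack looks like in practice, here is a minimal sketch combining DeepSpeed ZeRO-3 offloading, gradient accumulation, and a bitsandbytes paged 8-bit optimizer. This is an illustrative configuration assembled from the techniques the report names, not the Shisa team's actual setup; every numeric value is a placeholder assumption.

```python
import torch
import bitsandbytes as bnb  # provides paged 8-bit optimizers

# Illustrative DeepSpeed ZeRO-3 config combining the optimizations the
# report lists: parameter/optimizer offload to CPU, activation
# checkpointing with CPU offload, and gradient accumulation. All numeric
# values are placeholder assumptions, not Shisa's actual settings.
ds_config = {
    "zero_optimization": {
        "stage": 3,  # shard parameters, gradients, and optimizer state
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "activation_checkpointing": {
        "partition_activations": True,
        "cpu_checkpointing": True,  # push checkpointed activations off-GPU too
    },
    "gradient_accumulation_steps": 16,  # trade step time for memory
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
}

# Stand-in module; in practice this would be the 405B Llama model, and
# model, optimizer, and ds_config would be handed to deepspeed.initialize().
model = torch.nn.Linear(16, 16)
optimizer = bnb.optim.PagedAdamW8bit(model.parameters(), lr=1e-5)
```

The paged optimizer keeps its 8-bit state in pageable memory so spikes spill to CPU RAM instead of triggering GPU out-of-memory errors, which is one plausible way to combine it with the ZeRO-3 offloading described above.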
In addition to the new model, the Shisa team have published shisa-ai/shisa-v2-sharegpt, 180,000 records which they describe as "a best-in-class synthetic dataset, freely available for use to improve the Japanese capabilities of any model. Licensed under Apache 2.0".
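The dataset can be pulled straight from Hugging Face for inspection. A minimal sketch with the `datasets` library; the `train` split and the ShareGPT-style `conversations`/`from`/`value` field names are conventional assumptions, not confirmed from the dataset card.

```python
from datasets import load_dataset

# Dataset id as given in the post; the "train" split is an assumption.
ds = load_dataset("shisa-ai/shisa-v2-sharegpt", split="train")
print(len(ds))  # expected on the order of 180,000 records

# ShareGPT-style records conventionally hold a list of turns under a
# "conversations" key with "from"/"value" fields (assumed here).
for turn in ds[0].get("conversations", []):
    print(turn["from"], ":", turn["value"][:80])
```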
An interesting note: they found that since Shisa out-performs GPT-4 at Japanese, that model was no longer able to serve as an evaluation judge, so they had to upgrade to GPT-4.1.
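The mechanics of that swap are standard LLM-as-judge evaluation: a stronger model grades the candidate's Japanese answers. Below is a minimal sketch using the OpenAI Python client with gpt-4.1 as the judge; the prompt wording and the 1-10 scale are illustrative assumptions, not the Shisa team's actual eval harness.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def judge_japanese_answer(question: str, answer: str) -> str:
    """Ask gpt-4.1 to grade a Japanese answer on a 1-10 scale.

    Illustrative LLM-as-judge sketch; not the Shisa team's harness.
    """
    prompt = (
        "あなたは日本語の回答を採点する審査員です。\n"  # "You are a judge scoring Japanese answers."
        f"質問: {question}\n"
        f"回答: {answer}\n"
        "1から10のスケールで採点し、点数のみを返してください。"  # "Score 1-10, return only the number."
    )
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# Example: score a short model answer.
print(judge_japanese_answer("日本の首都はどこですか？", "東京です。"))
```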
Tags: translation, llm-release, evals, generative-ai, llama, ai, llms, fine-tuning, leonard-lin
AI Summary and Description: Yes
Summary: The text details the release of Shisa V2 405B, described as Japan’s highest-performing LLM. It surpasses GPT-4 and GPT-4 Turbo on the team’s eval battery and reflects Japan’s commitment to developing sovereign AI that preserves cultural and linguistic diversity while ensuring data privacy and security.
Detailed Description:
– **Introduction of Shisa V2 405B**:
– Launched by Leonard Lin and Adam Lensenmayer, this LLM (Large Language Model) is described as Japan’s most advanced, surpassing GPT-4 and GPT-4 Turbo on the team’s eval battery and matching GPT-4o and DeepSeek-V3 on Japanese MT-Bench.
– The release represents a critical advancement in AI technology within Japan, emphasizing a push for localized development.
– **Model Specifications**:
– Shisa V2 405B uses Llama 3.1 405B Instruct as its base model and is available under the Llama 3.1 community license.
– This version follows earlier smaller models and indicates a strategic effort to create tailored AI solutions for specific language and cultural contexts.
– **Sovereign AI Concept**:
– The authors highlight the importance of homegrown AI initiatives, speaking to cultural diversity, data privacy, and geopolitical resilience.
– There is a strong belief in the necessity of an open-source approach to foster AI sovereignty, not just for Japan but as a model for the global community.
– **Technical Challenges**:
– Training the 405B model involved significant technical challenges; only three other known groups (Nous Research, Bllossom, and AI2) have published full fine-tunes of Llama 405B.
– Techniques such as DeepSpeed ZeRO-3 parameter and activation offloading, gradient accumulation, an 8-bit paged optimizer, and sequence parallelism were required just to fit the model within the H100’s memory limits.
– **Dataset Contribution**:
– The Shisa team has also released shisa-ai/shisa-v2-sharegpt, a synthetic dataset of 180,000 records under the Apache 2.0 license, intended to improve the Japanese capabilities of any model.
– The dataset reflects a commitment to improving generative AI across platforms while promoting accessibility.
– **Performance Insights**:
– Because Shisa out-performs GPT-4 at Japanese, GPT-4 could no longer serve as an evaluation judge; the team upgraded to GPT-4.1 for their evals.
This development is particularly relevant for professionals in AI and cloud computing security, as it underscores the value of localized AI solutions that prioritize privacy, compliance, and cultural relevance. The emphasis on open source also opens the door to broader collaboration on AI safety and innovation.