Source URL: https://simonwillison.net/2025/Jun/3/shisa-v2/
Source: Simon Willison’s Weblog
Title: Shisa V2 405B: Japan’s Highest Performing LLM
Feedly Summary: Shisa V2 405B: Japan’s Highest Performing LLM
Leonard Lin and Adam Lensenmayer have been working on Shisa for a while. They describe their latest release as “Japan’s Highest Performing LLM”.
Shisa V2 405B is the highest-performing LLM ever developed in Japan, and surpasses GPT-4 (0613) and GPT-4 Turbo (2024-04-09) in our eval battery. (It also goes toe-to-toe with GPT-4o (2024-11-20) and DeepSeek-V3 (0324) on Japanese MT-Bench!)
This 405B release is a follow-up to the six smaller Shisa v2 models they released back in April, which took a similar approach to DeepSeek-R1 in producing different models that each extended a different existing base model from Llama, Qwen, Mistral and Phi-4.
The new 405B model uses Llama 3.1 405B Instruct as a base, and is available under the Llama 3.1 community license.
Shisa is a prominent example of Sovereign AI – the ability for nations to build models that reflect their own language and culture:
We strongly believe that it’s important for homegrown AI to be developed both in Japan (and globally!), and not just for the sake of cultural diversity and linguistic preservation, but also for data privacy and security, geopolitical resilience, and ultimately, independence.
We believe the open-source approach is the only realistic way to achieve sovereignty in AI, not just for Japan, or even for nation states, but for the global community at large.
The accompanying overview report has some fascinating details:
Training the 405B model was extremely difficult. Only three other groups that we know of: Nous Research, Bllossom, and AI2 have published Llama 405B full fine-tunes. […] We implemented every optimization at our disposal including: DeepSpeed ZeRO-3 parameter and activation offloading, gradient accumulation, 8-bit paged optimizer, and sequence parallelism. Even so, the 405B model still barely fit within the H100’s memory limits
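For a sense of what that stack looks like in practice, here is a minimal sketch combining DeepSpeed ZeRO-3 offloading, gradient accumulation, and a bitsandbytes paged 8-bit optimizer. This is an illustrative configuration assembled from the techniques the report names, not the Shisa team's actual setup; every numeric value is a placeholder assumption.

```python
import torch
import bitsandbytes as bnb  # provides paged 8-bit optimizers

# Illustrative DeepSpeed ZeRO-3 config combining the optimizations the
# report lists: parameter/optimizer offload to CPU, activation
# checkpointing with CPU offload, and gradient accumulation. All numeric
# values are placeholder assumptions, not Shisa's actual settings.
ds_config = {
    "zero_optimization": {
        "stage": 3,  # shard parameters, gradients, and optimizer state
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
    "activation_checkpointing": {
        "partition_activations": True,
        "cpu_checkpointing": True,  # push checkpointed activations off-GPU too
    },
    "gradient_accumulation_steps": 16,  # trade step time for memory
    "train_micro_batch_size_per_gpu": 1,
    "bf16": {"enabled": True},
}

# Stand-in module; in practice this would be the 405B Llama model, and
# model, optimizer, and ds_config would be handed to deepspeed.initialize().
model = torch.nn.Linear(16, 16)
optimizer = bnb.optim.PagedAdamW8bit(model.parameters(), lr=1e-5)
```

The paged optimizer keeps its 8-bit state in pageable memory so spikes spill to CPU RAM instead of triggering GPU out-of-memory errors, which is one plausible way to combine it with the ZeRO-3 offloading described above.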
In addition to the new model, the Shisa team have published shisa-ai/shisa-v2-sharegpt, 180,000 records which they describe as "a best-in-class synthetic dataset, freely available for use to improve the Japanese capabilities of any model. Licensed under Apache 2.0".
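The dataset can be pulled straight from Hugging Face for inspection. A minimal sketch with the `datasets` library; the `train` split and the ShareGPT-style `conversations`/`from`/`value` field names are conventional assumptions, not confirmed from the dataset card.

```python
from datasets import load_dataset

# Dataset id as given in the post; the "train" split is an assumption.
ds = load_dataset("shisa-ai/shisa-v2-sharegpt", split="train")
print(len(ds))  # expected on the order of 180,000 records

# ShareGPT-style records conventionally hold a list of turns under a
# "conversations" key with "from"/"value" fields (assumed here).
for turn in ds[0].get("conversations", []):
    print(turn["from"], ":", turn["value"][:80])
```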
An interesting note: they found that since Shisa out-performs GPT-4 at Japanese, that model was no longer able to serve as an evaluation judge, so they had to upgrade to GPT-4.1.
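The mechanics of that swap are standard LLM-as-judge evaluation: a stronger model grades the candidate's Japanese answers. Below is a minimal sketch using the OpenAI Python client with gpt-4.1 as the judge; the prompt wording and the 1-10 scale are illustrative assumptions, not the Shisa team's actual eval harness.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def judge_japanese_answer(question: str, answer: str) -> str:
    """Ask gpt-4.1 to grade a Japanese answer on a 1-10 scale.

    Illustrative LLM-as-judge sketch; not the Shisa team's harness.
    """
    prompt = (
        "あなたは日本語の回答を採点する審査員です。\n"  # "You are a judge scoring Japanese answers."
        f"質問: {question}\n"
        f"回答: {answer}\n"
        "1から10のスケールで採点し、点数のみを返してください。"  # "Score 1-10, return only the number."
    )
    response = client.chat.completions.create(
        model="gpt-4.1",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content


# Example: score a short model answer.
print(judge_japanese_answer("日本の首都はどこですか？", "東京です。"))
```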
Tags: translation, llm-release, evals, generative-ai, llama, ai, llms, fine-tuning, leonard-lin
AI Summary and Description: Yes
Summary: The text details the release of Shisa V2 405B, described as Japan’s highest-performing LLM. It surpasses GPT-4 and GPT-4 Turbo on the team’s eval battery and reflects Japan’s commitment to developing sovereign AI that preserves cultural and linguistic diversity while ensuring data privacy and security.
Detailed Description:
– **Introduction of Shisa V2 405B**:
– Launched by Leonard Lin and Adam Lensenmayer, this LLM (Large Language Model) is described as Japan’s most advanced, surpassing GPT-4 and GPT-4 Turbo on the team’s eval battery and matching GPT-4o and DeepSeek-V3 on Japanese MT-Bench.
– The release represents a critical advancement in AI technology within Japan, emphasizing a push for localized development.
– **Model Specifications**:
– Shisa V2 405B uses Llama 3.1 405B Instruct as its base model and is available under the Llama 3.1 community license.
– This version follows earlier smaller models and indicates a strategic effort to create tailored AI solutions for specific language and cultural contexts.
– **Sovereign AI Concept**:
– The authors highlight the importance of homegrown AI initiatives, speaking to cultural diversity, data privacy, and geopolitical resilience.
– There is a strong belief in the necessity of an open-source approach to foster AI sovereignty, not just for Japan but as a model for the global community.
– **Technical Challenges**:
– Training the 405B model involved significant technical challenges; only three other known groups (Nous Research, Bllossom, and AI2) have published full fine-tunes of Llama 405B.
– Techniques such as DeepSpeed ZeRO-3 parameter and activation offloading, gradient accumulation, an 8-bit paged optimizer, and sequence parallelism were required just to fit the model within the H100’s memory limits.
– **Dataset Contribution**:
– The Shisa team has also released shisa-ai/shisa-v2-sharegpt, a synthetic dataset of 180,000 records under the Apache 2.0 license, intended to improve the Japanese capabilities of any model.
– The dataset reflects a commitment to improving generative AI across platforms while promoting accessibility.
– **Performance Insights**:
– Because Shisa out-performs GPT-4 at Japanese, GPT-4 could no longer serve as an evaluation judge; the team upgraded to GPT-4.1 for their evals.
This development is particularly relevant for professionals in AI and cloud computing security, as it underscores the value of localized AI solutions that prioritize privacy, compliance, and cultural relevance. The emphasis on open source also opens the door to broader collaboration on AI safety and innovation.