The Register: DeepSeek isn’t done yet with OpenAI – image-maker Janus Pro is gunning for DALL-E 3

Jan 27, 2025

—

Source URL: https://www.theregister.com/2025/01/27/deepseek_image_openai/
Source: The Register
Title: DeepSeek isn’t done yet with OpenAI – image-maker Janus Pro is gunning for DALL-E 3

Feedly Summary: Crouching tiger, hidden layer(s)
Barely a week after DeepSeek’s R1 LLM turned Silicon Valley on its head, the Chinese outfit is back with a new release it claims is ready to challenge OpenAI’s DALL-E 3.…

AI Summary and Description: Yes

Summary: The text discusses the release of DeepSeek’s Janus Pro family of multimodal large language models (LLMs), which the company pitches as a competitor to OpenAI’s DALL-E 3 for image generation. The models improve on their predecessor, Janus, and the release raises broader questions about how such advances affect the AI market.

Detailed Description:
– DeepSeek, a Chinese company, has launched two new multimodal models, Janus Pro 1B and Janus Pro 7B, which it says rival OpenAI’s DALL-E 3 at image generation.
– The models handle both image generation and visual understanding tasks, and build on the previously released Janus model.
– Key design points and improvements:
  – Visual encoding is decoupled into separate pathways while a single, unified transformer architecture is retained (an approach carried over from the original Janus).
  – Higher parameter counts and a larger training dataset.
– According to DeepSeek’s research, the original Janus model performed poorly on short prompts and produced unstable image-generation results; Janus Pro largely addresses both problems.
– On the GenEval and DPG-Bench benchmarks, Janus Pro 7B reportedly outperformed DALL-E 3 and Stable Diffusion 3 Medium, although it is limited to images of 384×384 pixels.
– DeepSeek says the Janus Pro models were trained on a comparatively modest GPU cluster, suggesting that competitive models can be built and deployed without massive infrastructure.
– DeepSeek acknowledges that the models still have limitations: the low 384×384 resolution hampers fine-grained multimodal understanding and leaves generated images short on fine detail.
– The Janus codebase is open source under the MIT license, but use of the Pro model weights is governed by DeepSeek’s Model License.
– The release comes amid a sharp market reaction to DeepSeek that has raised questions about US dominance in AI and about how much infrastructure AI development really requires, while the company is also dealing with an ongoing cyberattack.

Key Implications:
– The arrival of competitive models from new players shows how quickly AI and image generation are evolving, and security practices across the AI sector need to keep pace.
– The market’s reaction to these performance gains points to a need for compliance and governance frameworks in AI development, particularly where a single release can move markets.
– DeepSeek’s handling of an ongoing cyberattack underscores the need for robust cybersecurity measures as organizations expand their AI capabilities.
