Tag: context length

  • Hacker News: RWKV Language Model

    Source URL: https://www.rwkv.com/
    Source: Hacker News
    Summary: The RWKV (RNN with LLM capabilities) presents a significant innovation in language model design by combining the advantages of recurrent neural networks (RNNs) and transformers. Its unique features, including linear time processing and lack of attention…
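
    The summary's key claims are linear-time processing and the absence of full attention. The sketch below is not RWKV's actual formulation; it is a generic recurrent update, included only to show why carrying a fixed-size state gives constant work per token, whereas attention re-reads the whole history.

        import numpy as np

        # Generic linear-time recurrence (illustrative only, not the RWKV equations):
        # the model carries a fixed-size state, so each new token costs the same
        # amount of work no matter how long the context already is. Full attention
        # instead scores every previous token, so per-token cost grows with history.
        d = 8                                           # toy hidden size
        rng = np.random.default_rng(0)
        W_state = rng.normal(scale=0.1, size=(d, d))    # hypothetical state weights
        W_in = rng.normal(scale=0.1, size=(d, d))       # hypothetical input weights

        def step(state, x):
            # New state depends only on the previous state and the current input.
            return np.tanh(state @ W_state + x @ W_in)

        state = np.zeros(d)
        for x in rng.normal(size=(1000, d)):            # a 1,000-token "sequence"
            state = step(state, x)                      # constant memory and work per token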

  • Simon Willison’s Weblog: Things we learned about LLMs in 2024

    Source URL: https://simonwillison.net/2024/Dec/31/llms-in-2024/#atom-everything
    Source: Simon Willison’s Weblog
    Feedly Summary: A lot has happened in the world of Large Language Models over the course of 2024. Here’s a review of things we figured out about the field in the past twelve months, plus my attempt at identifying…

  • Simon Willison’s Weblog: Finally, a Replacement for BERT: Introducing ModernBERT

    Source URL: https://simonwillison.net/2024/Dec/24/modernbert/
    Source: Simon Willison’s Weblog
    Feedly Summary: BERT was an early language model released by Google in October 2018. Unlike modern LLMs it wasn’t designed for generating text. BERT was trained for masked token prediction and was generally…
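
    A minimal sketch of masked token prediction, the objective described above, using the Hugging Face transformers fill-mask pipeline. The checkpoint name answerdotai/ModernBERT-base is an assumption based on the release announcement, and ModernBERT support requires a recent transformers version.

        from transformers import pipeline

        # Fill in the [MASK] token; the pipeline returns candidate tokens with scores.
        fill = pipeline("fill-mask", model="answerdotai/ModernBERT-base")
        for candidate in fill("The capital of France is [MASK]."):
            print(candidate["token_str"], round(candidate["score"], 3))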

  • Hacker News: A Replacement for BERT

    Source URL: https://huggingface.co/blog/modernbert
    Source: Hacker News
    Summary: The text discusses the introduction of ModernBERT, an advanced encoder-only model that surpasses older models like BERT in both performance and efficiency. Boasting an increased context length of 8192 tokens, faster processing…

  • Simon Willison’s Weblog: Meta AI release Llama 3.3

    Source URL: https://simonwillison.net/2024/Dec/6/llama-33/#atom-everything
    Source: Simon Willison’s Weblog
    Feedly Summary: This new Llama-3.3-70B-Instruct model from Meta AI makes some bold claims: This model delivers similar performance to Llama 3.1 405B with cost effective inference that’s feasible to run locally on common developer workstations. I have…

  • Simon Willison’s Weblog: First impressions of the new Amazon Nova LLMs (via a new llm-bedrock plugin)

    Source URL: https://simonwillison.net/2024/Dec/4/amazon-nova/
    Source: Simon Willison’s Weblog
    Feedly Summary: Amazon released three new Large Language Models yesterday at their AWS re:Invent conference. The new model family is called Amazon Nova and comes in three sizes: Micro, Lite and Pro. I built…

  • Hacker News: 32k context length text embedding models

    Source URL: https://blog.voyageai.com/2024/09/18/voyage-3/
    Source: Hacker News
    Summary: The text highlights the launch of the Voyage 3 series embedding models, which provide significant advancements in retrieval quality, latency, and cost-effectiveness compared to existing models like OpenAI’s. Specifically, the Voyage 3 models…
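
    In practice, a larger embedding context window mostly changes how much text fits into a single embedding call. The rough sketch below only counts chunks; the actual embedding call to a provider (Voyage, OpenAI, etc.) and the tokenizer are deliberately left out.

        # Naive fixed-size chunker over a token list; illustrative numbers only.
        def chunk(tokens, max_tokens):
            return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]

        doc = ["tok"] * 30_000                  # a ~30k-token document
        print(len(chunk(doc, 8_000)))           # 4 embedding calls for an 8k-context model
        print(len(chunk(doc, 32_000)))          # 1 embedding call for a 32k-context model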

  • Hacker News: Llama 3.1 405B now runs at 969 tokens/s on Cerebras Inference

    Source URL: https://cerebras.ai/blog/llama-405b-inference/
    Source: Hacker News
    Summary: The text discusses breakthrough advancements in AI inference speed, specifically highlighting Llama 3.1 405B served on Cerebras’s inference hardware, which showcases significantly superior performance metrics compared to traditional GPU solutions. This…
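
    For scale, a back-of-envelope check of what 969 tokens/s means for a single long completion (the completion length here is just an assumed example):

        # Illustrative arithmetic only; 2,000 output tokens is an assumed example.
        tokens_per_second = 969
        completion_tokens = 2_000
        print(f"{completion_tokens / tokens_per_second:.1f} s")   # ≈ 2.1 s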

  • Hacker News: Qwen2.5 Turbo extends context length to 1M tokens

    Source URL: http://qwenlm.github.io/blog/qwen2.5-turbo/
    Source: Hacker News
    Summary: The text discusses the introduction of Qwen2.5-Turbo, a large language model (LLM) that significantly enhances processing capabilities, particularly with longer contexts, which are critical for many applications involving AI-driven natural language…

  • Simon Willison’s Weblog: Qwen: Extending the Context Length to 1M Tokens

    Source URL: https://simonwillison.net/2024/Nov/18/qwen-turbo/#atom-everything
    Source: Simon Willison’s Weblog
    Feedly Summary: The new Qwen2.5-Turbo boasts a million token context window (up from 128,000 for Qwen 2.5) and faster performance: Using sparse attention mechanisms, we successfully reduced the time to first…
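
    The excerpt credits sparse attention for the speedup but does not say which variant Qwen2.5-Turbo uses. The sketch below shows one common family (a causal local window) purely to illustrate how sparsity avoids scoring every new token against the full million-token prefix; it is not the model's documented mechanism.

        import numpy as np

        # One sparse-attention pattern: each position attends only to a local window
        # of recent positions rather than the whole prefix, so per-token work is
        # O(window) instead of O(context length).
        def local_attention_mask(n, window):
            i = np.arange(n)[:, None]               # query positions
            j = np.arange(n)[None, :]               # key positions
            return (j <= i) & (j > i - window)      # causal and within the window

        print(local_attention_mask(n=8, window=3).astype(int))
        # Scores outside the mask would be set to -inf before the softmax, so a
        # 1M-token context no longer requires ~1M score computations per new token.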