Tag: vocabulary

  • Hacker News: IETF setting standards for AI preferences

    Source URL: https://www.ietf.org/blog/aipref-wg/ Source: Hacker News Title: IETF setting standards for AI preferences Feedly Summary: Comments AI Summary and Description: Yes Summary: The text discusses the formation of the AI Preferences (AIPREF) Working Group, aimed at standardizing how content preferences are expressed for AI model training, amid concerns from content publishers about unauthorized use. This…

  • Hacker News: Sketch-of-Thought: Efficient LLM Reasoning

    Source URL: https://arxiv.org/abs/2503.05179 Source: Hacker News Title: Sketch-of-Thought: Efficient LLM Reasoning Feedly Summary: Comments AI Summary and Description: Yes Summary: The provided text discusses a novel prompting framework called Sketch-of-Thought (SoT) aimed at optimizing large language models (LLMs) by minimizing token usage while maintaining or improving reasoning accuracy. This innovation is particularly relevant for AI…

  • Hacker News: A minimal PyTorch implementation for training your own small LLM from scratch

    Source URL: https://github.com/Om-Alve/smolGPT Source: Hacker News Title: A minimal PyTorch implementation for training your own small LLM from scratch Feedly Summary: Comments AI Summary and Description: Yes **Summary:** This text describes a minimal PyTorch implementation for training a small Language Model (LLM) from scratch, intended primarily for educational purposes. It showcases modern techniques in LLM…

  • Simon Willison’s Weblog: Anomalous Tokens in DeepSeek-V3 and r1

    Source URL: https://simonwillison.net/2025/Jan/26/anomalous-tokens-in-deepseek-v3-and-r1/#atom-everything Source: Simon Willison’s Weblog Title: Anomalous Tokens in DeepSeek-V3 and r1 Feedly Summary: Anomalous Tokens in DeepSeek-V3 and r1 Glitch tokens (previously) are tokens or strings that trigger strange behavior in LLMs, hinting at oddities in their tokenizers or model weights. Here’s a fun exploration of them across DeepSeek v3 and R1.…

  • Simon Willison’s Weblog: deepseek-ai/DeepSeek-V3-Base

    Source URL: https://simonwillison.net/2024/Dec/25/deepseek-v3/#atom-everything Source: Simon Willison’s Weblog Title: deepseek-ai/DeepSeek-V3-Base Feedly Summary: deepseek-ai/DeepSeek-V3-Base No model card or announcement yet, but this new model release from Chinese AI lab DeepSeek (an arm of Chinese hedge fund High-Flyer) looks very significant. It’s a huge model – 685B parameters, 687.9 GB on disk (TIL how to size a git-lfs…

  • Hacker News: Probably pay attention to tokenizers

    Source URL: https://cybernetist.com/2024/10/21/you-should-probably-pay-attention-to-tokenizers/ Source: Hacker News Title: Probably pay attention to tokenizers Feedly Summary: Comments AI Summary and Description: Yes **Summary:** The text delves into the critical role of tokenization in AI applications, particularly those utilizing Retrieval-Augmented Generation (RAG). It emphasizes how understanding tokenization can significantly affect the performance of AI models, especially in contexts…