Tag: inference speeds
-
The Register: Cheat codes for LLM performance: An introduction to speculative decoding
Source URL: https://www.theregister.com/2024/12/15/speculative_decoding/
Source: The Register
Title: Cheat codes for LLM performance: An introduction to speculative decoding
Feedly Summary: Sometimes two models really are faster than one. Hands on: When it comes to AI inferencing, the faster you can generate a response, the better – and over the past few weeks, we’ve seen a number…
-
Simon Willison’s Weblog: Claude 3.5 Haiku price drops by 20%
Source URL: https://simonwillison.net/2024/Dec/5/claude-35-haiku-price-drops-by-20/#atom-everything
Source: Simon Willison’s Weblog
Title: Claude 3.5 Haiku price drops by 20%
Feedly Summary: Claude 3.5 Haiku price drops by 20%. Buried in this otherwise quite dry post about Anthropic’s ongoing partnership with AWS: To make this model even more accessible for a wide range of use cases, we’re lowering the price…