Source URL: https://softwaredoug.com/blog/2025/01/21/llm-judge-decision-tree
Source: Hacker News
Title: Coping with dumb LLMs using classic ML
Feedly Summary: Comments
AI Summary and Description: Yes
Summary: The text describes an approach to using local LLMs (large language models) to judge product relevance for e-commerce search queries. By collecting the LLM's decisions and comparing them against human evaluations, the author refines search relevance estimates and improves search quality without the high costs of external AI services or human raters. The method is relevant to anyone optimizing e-commerce search and combining LLM outputs with classic machine learning for decision-making.
Detailed Description:
– **Local LLM Evaluation**: The author explores the potential of a local LLM to identify relevant products for search queries by measuring its decisions against human preferences. This method is aimed at increasing efficiency and reducing costs associated with human evaluators and cloud compute resources.
– **Experimental Approach**: Through numerous experiments, the author assesses the effectiveness of various prompt configurations (forcing decisions, allowing for ‘Neither’, double-checking preferences, etc.) to evaluate products based on different attributes (name, description, classification, etc.).
– **Data Collection**: The process involves generating a substantial dataset in which the LLM compares pairs of products for a query (e.g. “entrance table”), recording its preference for each attribute (name, description, classification), and juxtaposing these judgments against human preferences to measure agreement.
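A dataset like the one described might be assembled as below; the column names, preference labels, and rows are hypothetical stand-ins for whatever the actual prompts return, not the author's data:

```python
import pandas as pd

# Hypothetical rows: each records the LLM's pairwise preference ("LHS",
# "RHS", or "Neither") for one query/product pair, once per attribute
# prompt, plus the human-labeled preference to learn against.
rows = [
    {"query": "entrance table", "name_pref": "LHS", "desc_pref": "LHS",
     "class_pref": "Neither", "human_pref": "LHS"},
    {"query": "entrance table", "name_pref": "RHS", "desc_pref": "LHS",
     "class_pref": "RHS", "human_pref": "RHS"},
]
df = pd.DataFrame(rows)

# One-hot encode the categorical LLM decisions so they can serve as
# features for a downstream classifier.
features = pd.get_dummies(df[["name_pref", "desc_pref", "class_pref"]])
labels = df["human_pref"]
```

Each attribute-level prompt contributes its own feature columns, so adding a new prompt variant only widens the feature matrix.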
– **Machine Learning Integration**:
  – The collected evaluations feed a classification model in which the LLM's per-attribute judgments serve as features for predicting human preferences.
  – A decision tree, chosen as a simple yet interpretable model, reveals which attributes matter most when deciding which product is the better match, and in what order to consult them.
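The feature-encoding and tree could be sketched roughly as follows; the ±1/0 encoding and the toy labels are assumptions for illustration, not the author's actual data or code:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical encoding of each LLM judgment:
#   1 = prefers left product, -1 = prefers right, 0 = "Neither".
# Columns: [name_pref, desc_pref, class_pref]
X = np.array([
    [1, 1, 0],
    [1, -1, 1],
    [-1, -1, 0],
    [0, -1, -1],
    [1, 0, 1],
    [-1, 1, -1],
])
y = np.array([1, 1, 0, 0, 1, 0])  # human preference: 1 = left, 0 = right

# A shallow tree stays interpretable: its printed rules show which
# attribute judgments the model trusts first.
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["name_pref", "desc_pref",
                                       "class_pref"]))
```

Reading the printed rules tells you, for example, whether the name judgment alone decides most pairs or whether the description prompt carries extra signal.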
– **Precision and Recall Analysis**: The experiments include a detailed look at how different prompt configurations shift precision and recall, highlighting trade-offs that can guide whether to tune for sensitivity or specificity.
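A toy version of that precision/recall comparison, with invented labels standing in for the real evaluation data: forcing a decision answers every pair (higher recall), while allowing “Neither” abstains on unsure pairs (higher precision).

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical ground truth: 1 = left product is the human-preferred one.
human = [1, 1, 0, 1, 0, 0, 1, 0]
# Two hypothetical prompt configurations: "forced" always picks a side;
# "cautious" may answer "Neither", scored here as predicting 0.
forced   = [1, 1, 1, 1, 0, 1, 1, 0]
cautious = [1, 0, 0, 1, 0, 0, 1, 0]

for name, pred in [("forced", forced), ("cautious", cautious)]:
    p = precision_score(human, pred)
    r = recall_score(human, pred)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
```

With these made-up labels the forced prompt reaches full recall but lower precision, while the cautious prompt is the reverse, mirroring the trade-off the post measures.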
– **Future Directions**: The author considers the potential to refine this approach beyond basic decision trees, exploring more complex models like gradient boosting to improve classification accuracy. They suggest that simpler LLM outputs, when integrated effectively with traditional machine learning, could create robust search solutions while remaining interpretable.
This text presents significant insights for professionals in AI and e-commerce, particularly in the application of LLMs and machine learning for understanding user preferences and enhancing search functionality—key avenues in improving the overall consumer experience and operational efficiency in digital marketplaces.