data quality – Page 5 – Experimental News Clipping Site

Hacker News: Narrow finetuning can produce broadly misaligned LLM [pdf]

Feb 25, 2025

—

by

Source URL: https://martins1612.github.io/emergent_misalignment_betley.pdf Source: Hacker News Title: Narrow finetuning can produce broadly misaligned LLM [pdf] Feedly Summary: Comments AI Summary and Description: Yes **Summary:** The document presents findings on the phenomenon of “emergent misalignment” in large language models (LLMs) like GPT-4o when finetuned on specific narrow tasks, particularly the creation of insecure code. The results…

Hacker News: SWE-Bench tainted by answer leakage; real pass rates significantly lower

Feb 21, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://arxiv.org/abs/2410.06992 Source: Hacker News Title: SWE-Bench tainted by answer leakage; real pass rates significantly lower Feedly Summary: Comments AI Summary and Description: Yes Summary: The paper “SWE-Bench+: Enhanced Coding Benchmark for LLMs” addresses significant data quality issues in the evaluation of Large Language Models (LLMs) for coding tasks. It presents empirical analysis revealing…

Cloud Blog: How to use gen AI for better data schema handling, data quality, and data generation

Feb 18, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://cloud.google.com/blog/products/data-analytics/how-gemini-in-bigquery-helps-with-data-engineering-tasks/ Source: Cloud Blog Title: How to use gen AI for better data schema handling, data quality, and data generation Feedly Summary: In the realm of data engineering, generative AI models are quietly revolutionizing how we handle, process, and ultimately utilize data. For example, large language models (LLMs) can help with data schema…

Anton on Security – Medium: Cross-post: Office of the CISO 2024 Year in Review: AI Trust and Security

Jan 28, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://medium.com/anton-on-security/cross-post-office-of-the-ciso-2024-year-in-review-ai-trust-and-security-e73af11fb374?source=rss—-8e8c3ed26c4c—4 Source: Anton on Security – Medium Title: Cross-post: Office of the CISO 2024 Year in Review: AI Trust and Security Feedly Summary: AI Summary and Description: Yes Summary: The text provides a comprehensive overview of Google’s insights and resources regarding the secure implementation of generative AI in 2024. It covers critical security…

Hacker News: Lessons from building a small-scale AI application

Jan 23, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.thelis.org/blog/lessons-from-ai Source: Hacker News Title: Lessons from building a small-scale AI application Feedly Summary: Comments AI Summary and Description: Yes Summary: The text encapsulates critical lessons learned from constructing a small-scale AI application, emphasizing the differences between traditional programming and AI development, alongside the intricacies of managing data quality, training pipelines, and system…

Cloud Blog: Introducing BigQuery metastore, a unified metadata service with Apache Iceberg support

Jan 22, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://cloud.google.com/blog/products/data-analytics/introducing-bigquery-metastore-fully-managed-metadata-service/ Source: Cloud Blog Title: Introducing BigQuery metastore, a unified metadata service with Apache Iceberg support Feedly Summary: Does your organization use multiple data processing engines like BigQuery, Apache Spark, Apache Flink and Apache Hive? Wouldn’t it be great if you could provide a single source of truth for all of your analytics…

Hacker News: PostgreSQL Anonymizer

Jan 17, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://postgresql-anonymizer.readthedocs.io/en/stable/ Source: Hacker News Title: PostgreSQL Anonymizer Feedly Summary: Comments AI Summary and Description: Yes Summary: The text discusses the PostgreSQL Anonymizer, an extension aimed at masking personally identifiable information (PII) and commercially sensitive data within PostgreSQL databases. This tool offers a declarative approach to anonymization, enabling application developers to integrate data masking…

CSA: How Can Businesses Mitigate AI "Lying" Risks Effectively?

Jan 13, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://www.schellman.com/blog/cybersecurity/llms-and-how-to-address-ai-lying Source: CSA Title: How Can Businesses Mitigate AI "Lying" Risks Effectively? Feedly Summary: AI Summary and Description: Yes Summary: The text addresses the accuracy of outputs generated by large language models (LLMs) in AI systems, emphasizing the risk of AI “hallucinations” and the importance of robust data management to mitigate these concerns.…

Cloud Blog: Supervised Fine Tuning for Gemini: A best practices guide

Jan 7, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://cloud.google.com/blog/products/ai-machine-learning/master-gemini-sft/ Source: Cloud Blog Title: Supervised Fine Tuning for Gemini: A best practices guide Feedly Summary: Foundation models such as Gemini have revolutionized how we work, but sometimes they need guidance to excel at specific business tasks. Perhaps their answers are too long, or their summaries miss the mark. That’s where supervised fine-tuning…

Cloud Blog: Enhance viewer engagement with gen AI-powered scene detection for ads

Jan 7, 2025

—

by

Kurt Seifried

in Uncategorized

Source URL: https://cloud.google.com/blog/topics/developers-practitioners/use-ai-powered-scene-detection-for-more-effective-ad-placement/ Source: Cloud Blog Title: Enhance viewer engagement with gen AI-powered scene detection for ads Feedly Summary: Online video consumption has skyrocketed. A staggering 1.8 billion people globally subscribed to streaming services in 20231, and 92% of internet users worldwide watched online videos every month in 20242. This growth creates a significant opportunity…

Tag: data quality