#83 Who’s Minding the Metadata? Why Data Quality Matters in GenAI (Quality Time With Paolo)
Welcome to the cozy corner of the tech world where ones and zeros mingle with casual chit-chat. Datatopics Unplugged is your go-to spot for relaxed discussions around tech, news, data, and society.
Dive into conversations that should flow as smoothly as your morning coffee (but don't), where industry insights meet laid-back banter. Whether you're a data aficionado or just someone curious about the digital age, pull up a chair, relax, and let's get into the heart of data, unplugged style!
In this episode, host Murilo is joined by returning guest Paolo, Data Management Team Lead at dataroots, for a deep dive into the often-overlooked but rapidly evolving domain of unstructured data quality. Tune in for a field guide to navigating documents, images, and embeddings without losing your sanity.
What we unpack:
- Data management basics: Metadata, ownership, and why Excel isn’t everything.
- Structured vs unstructured data: How the wild west of PDFs, images, and audio is redefining quality.
- Data quality challenges for LLMs: From apples and pears to rogue chatbots with “legally binding” hallucinations.
- Practical checks for document hygiene: Versioning, ownership, embedding similarity, and tagging strategies.
- Retrieval-Augmented Generation (RAG): When ChatGPT meets your HR policies and things get weird.
- Monitoring and governance: Building systems that flag rot before your chatbot gives out 2017 vacation rules.
- Tooling and gaps: Where open source is doing well—and where we’re still duct-taping workflows.
- Real-world inspirations: A look at how QuantumBlack (McKinsey) is tackling similar issues with their AI for DQ framework.
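To make the document-hygiene checks above a little more concrete (ownership, freshness as a proxy for versioning, tagging, and embedding similarity), here is a minimal Python sketch. It is not the approach described in the episode. The `Document` fields, the one-year review window, and the 0.95 similarity cutoff are all hypothetical assumptions, and embeddings are assumed to be precomputed by whatever model you already use.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple
import math


@dataclass
class Document:
    """Hypothetical document record; field names are illustrative assumptions."""
    doc_id: str
    owner: Optional[str]      # who is accountable for keeping it current
    last_reviewed: date       # proxy for versioning / freshness
    tags: List[str]           # e.g. department or topic labels
    embedding: List[float]    # precomputed by an embedding model (assumed)


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def hygiene_report(
    docs: List[Document],
    today: date,
    max_age_days: int = 365,      # assumed review window
    dup_threshold: float = 0.95,  # assumed near-duplicate cutoff
) -> List[Tuple[str, str]]:
    """Return (doc_id, issue) pairs for documents that fail simple hygiene checks."""
    issues: List[Tuple[str, str]] = []
    for doc in docs:
        if not doc.owner:
            issues.append((doc.doc_id, "missing owner"))
        if (today - doc.last_reviewed).days > max_age_days:
            issues.append((doc.doc_id, "stale"))
        if not doc.tags:
            issues.append((doc.doc_id, "untagged"))
    # Pairwise comparison: flags possible duplicates or conflicting versions.
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine_similarity(docs[i].embedding, docs[j].embedding) >= dup_threshold:
                issues.append((docs[i].doc_id, f"near-duplicate of {docs[j].doc_id}"))
    return issues
```

Given two HR vacation-policy documents, one from 2017 with no owner and one reviewed in 2024, with nearly parallel embeddings, this report would flag the 2017 file as ownerless, stale, and a near-duplicate of the newer one. That is the kind of conflict that lets a RAG chatbot serve outdated rules. The pairwise loop is quadratic, so a large corpus would likely need an approximate nearest-neighbour index instead.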
Chapters
1. #83 Who’s Minding the Metadata? Why Data Quality Matters in GenAI (Quality Time With Paolo) (00:00:00)
2. Welcome to Data Topics (00:00:46)
3. Introducing Data Management (00:01:30)
4. Unstructured vs Structured Data (00:06:30)
5. RAG and Chatbot Applications (00:09:38)
6. Data Quality Issues in Documents (00:17:15)
7. Metadata Checks and Content Analysis (00:25:05)
8. Testing Outputs and Monitoring (00:34:18)
9. Governance and Available Tools (00:42:52)
10. Summary and Additional Resources (00:48:09)
83 episodes