Data Science in the AI Era: What's New, What's Not with Dustin Tingley
Published December 2, 2024
Course Mentioned in this Post: Data Science Principles
“These generative AI tools for writing code and even conducting data analysis have been truly transformative,” Dustin Tingley says. “I still do write code, but a fraction of what I used to write. And that's not to say that people who are writing code and doing data science are going to be replaced. But it is certainly transforming the set of tasks that they're having to do.”
Recently, Harvard on Digital past participants had the exclusive opportunity to join a 60-minute webinar with Dustin Tingley, a Professor of Government and Public Policy with a joint appointment in the Harvard Government Department and the Harvard Kennedy School of Public Policy. During this session, Professor Tingley explored data science in the AI era, beginning with a short presentation grounded in his research and insights in statistics, machine learning, data science, AI, and generative AI. Topics discussed included the distinction between analyzing data and creating data using pre-trained models, the roles of data pipelines and big data, and the implications of AI for the practice of data science. Following the presentation, Professor Tingley opened the floor to questions from the audience. You can find a recap of the Q&A's key takeaways below.
Key Takeaways from the Q&A
Question 1: Training up generative AI models on big data is one thing, but the role of understanding the causality is another. Is that something that's enhanced by generative AI? Is that still very much a human thing?
Key Points from Dustin Tingley's Answer:
- Establishing causal relationships is not something AI tools have really transformed yet
- To address causal questions, AI needs a model of human behavior
- Humans are better at causal reasoning because they can understand context and behavioral models
- While AI might enhance the process, fundamental causal inference still requires human design and interpretation
- Establishing causality is difficult even for humans, making it an especially challenging task for AI
Question 2: To what extent might AI tools manipulate data or present biased insights, and how can we safeguard against that?
Key Points from Dustin Tingley's Answer:
- The quality and nature of training data determine AI outputs
- If training data is biased or limited, the AI will reflect those biases
- Transparency is crucial — organizations need to understand their data sources
- Companies should dedicate personnel to monitoring and questioning AI systems
- Ethics and bias considerations should be integrated from the start, not added later
Question 3: Has facial recognition improved since the Data Science Principles course was created, and how has that evolved with these ideas around generative AI being able to generate images?
Key Points from Dustin Tingley's Answer:
- Facial recognition tools have improved significantly and are being commercially deployed
- New concerns have emerged with deep fakes — AI-generated video and photo content
- Voice generation combined with facial deep fakes creates new security challenges
- Traditional identity verification methods (voice, face) are becoming less reliable
- There are both concerning applications (identity theft) and beneficial uses (language dubbing)
Question 4: What's the opportunity for the non-coder like myself? Somebody who conceptually gets it, but otherwise doesn't have their hands in the middle of it?
Key Points from Dustin Tingley's Answer:
- The Harvard online course, Data Science Principles, was intentionally designed with no code and no math
- Understanding conceptual pieces is crucial for working with data scientists
- Key concepts include bias in data, missing data, causality, and measurement improvement
- Gen AI tools are transforming coding itself, making technical skills less critical
- Non-coders with strong conceptual understanding are becoming more valuable in organizations
Question 5: Evaluation of content generated by AI, specifically LLMs... what's the latest and greatest in ensuring that the content is accurate enough to be consumed?
Key Points from Dustin Tingley's Answer:
- Performance benchmarks exist but shouldn't be the only consideration
- Evaluate LLMs as part of complete products/solutions, not in isolation
- Consider practical factors like cost and computational requirements
- Debate continues between broad vs. narrow training approaches
- Quality assessment should include context, cost, and practical application
Question 6: Do you think that the significant increase in data used for training could diminish the value of data?
Key Points from Dustin Tingley's Answer:
- No, AI tools will likely inspire new questions and generate new forms of data
- Time saved by AI can be redirected to asking better questions
- The technology may unlock human potential for higher-level thinking
- New developments like collaborative AI will create new opportunities
- The evolution of human-AI interaction will lead to new types of data
Question 7: Do you think it's even worth pursuing a mid-career change in AI privacy law?
Key Points from Dustin Tingley's Answer:
- Privacy issues will persist regardless of political changes
- Global nature of digital products requires maintaining privacy standards
- Consumer demand for privacy will continue regardless of regulations
- Market forces may drive privacy protection even without government mandates
- Career opportunities in privacy and AI will remain valuable
To continue learning about data science with Dustin Tingley through a nearly code- and math-free introduction to prediction, causality, visualization, data wrangling, privacy, and ethics, apply for the Data Science Principles course. This course is part of the Harvard on Digital Learning Path, offering past and present learners access to exclusive events, like this webinar, where they can engage directly with faculty and ask their most pressing questions.
Dustin Tingley is a data scientist at Harvard University. He is the Thomas D. Cabot Professor of Public Policy with a joint appointment in the Harvard Kennedy School of Public Policy and Harvard Government Department. Professor Tingley is Deputy Vice Provost for Advances in Learning and helps to direct Harvard's education-focused data science and technology team. He has helped a variety of organizations use the tools of data science and helped to develop machine learning algorithms and accompanying software for the social sciences. He has written on a variety of topics using data science techniques, including education, politics, and economics.