My Reading List: Data Science

This is a living post that collects links to data science posts and videos I consider interesting, high-quality, or even essential for understanding particular topics within such a wide field.
Data Preprocessing
Target Encoding
Extending Target Encoding: post by Daniele Micci-Barreca explaining how he came up with the idea of target encoding, and its possible extensions.
Target encoding done the right way: post by Max Halford, Head of Data at Carbonfact, explaining in detail how to combine additive smoothing and target encoding.
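To make the idea concrete, here is a minimal sketch of target encoding with additive smoothing, assuming the commonly used formula (n * category_mean + m * global_mean) / (n + m). The function name `smoothed_target_encoding` and the weight `m` are illustrative choices of mine, not code taken from either post.

```python
from collections import defaultdict


def smoothed_target_encoding(categories, targets, m=10.0):
    """Map each category to a smoothed mean of the target.

    m is a hypothetical smoothing weight: the larger it is, the more
    each category's mean is pulled toward the global mean.
    """
    if len(categories) != len(targets):
        raise ValueError("categories and targets must have the same length")
    if not targets:
        return {}

    global_mean = sum(targets) / len(targets)

    # Accumulate the per-category target sum and row count.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1

    # (n * category_mean + m * global_mean) / (n + m),
    # where n * category_mean is simply the per-category sum.
    return {
        cat: (sums[cat] + m * global_mean) / (counts[cat] + m)
        for cat in counts
    }


categories = ["a", "a", "a", "b"]
targets = [1, 1, 0, 1]
encoding = smoothed_target_encoding(categories, targets, m=2.0)
```

With m set to 0 this reduces to the plain per-category mean, so rare categories such as "b" keep their noisy estimate. In practice the encoding should be fitted on training data only (or out-of-fold) to avoid leaking the target.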
Analysis and Modeling
Modeling Methods
Generalized Additive Models: A good online book on Generalized Additive Models by Michael Clark, Senior Machine Learning Scientist at Strong Analytics.
Model Explainability
Model-Independent Score Explanation: Post by Daniele Micci-Barreca on model explainability. It also explains a very clever method to better understand any model just from its predictions.
AI Explanations whitepaper: White paper on Google’s “AI Explanations” product, with a pretty good overview of the state of the art in model explainability.
Towards A Rigorous Science of Interpretable Machine Learning: Pre-print by Finale Doshi-Velez and Been Kim offering a rigorous definition and evaluation of model interpretability.
Spatial Analysis
PostGEESE? Introducing The DuckDB Spatial Extension: In this post, the authors of DuckDB present the new PostGIS-like spatial extension for this popular in-process database engine.
Coding
Why You Shouldn’t Nest Your Code: In this wonderful video, CodeAesthetic explains in detail (and with beautiful graphics!) a couple of methods to reduce the level of nesting in our code to improve readability and maintainability. This video has truly changed how I code in R!
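As a small illustration of what reducing nesting can look like, here is a sketch that rewrites a nested function using early exits (guard clauses). The example is my own and is written in Python rather than R; the function names and the `orders` structure are hypothetical, and it is not taken from the video.

```python
def total_nested(orders):
    """Sum the paid, positive amounts using nested conditions."""
    total = 0.0
    if orders is not None:
        for order in orders:
            if order.get("paid"):
                if order.get("amount", 0) > 0:
                    total += order["amount"]
    return total


def total_flat(orders):
    """Same behaviour, with each condition inverted into an early exit."""
    if orders is None:
        return 0.0

    total = 0.0
    for order in orders:
        if not order.get("paid"):
            continue  # skip unpaid orders
        amount = order.get("amount", 0)
        if amount <= 0:
            continue  # skip missing or non-positive amounts
        total += amount
    return total


orders = [
    {"paid": True, "amount": 10.0},
    {"paid": False, "amount": 5.0},
    {"paid": True, "amount": -1},
    {"paid": True},
]
```

Both functions return the same result; the second keeps the main logic at a single indentation level inside the loop, which makes the "happy path" easier to follow.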
Miscellany
What is Retrieval-Augmented Generation (RAG)?: In this video, Marina Danilevsky, Senior Data Scientist at IBM, offers a pretty good explanation of how the Retrieval-Augmented Generation method can improve the credibility of large language models.