My Reading List: Data Science

Image generated by craiyon.com

This is a live post listing links to Data Science-related posts and videos that I consider interesting, high-quality, or even essential for understanding particular topics within such a wide field.

Data Preprocessing

Target Encoding

Extending Target Encoding: post by Daniele Micci-Barreca explaining how he came up with the idea of target encoding, and its possible extensions.

Target encoding done the right way: post by Max Halford, Head of Data at Carbonfact, explaining in detail how to combine additive smoothing and target encoding.
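To make the idea concrete, here is a minimal sketch of additive smoothing applied to target encoding, written in plain Python. It is my own illustration rather than code from either post: the function name `smoothed_target_encoding` and the smoothing weight `m` are assumptions chosen for the example. Each category's encoded value is a weighted average of the category mean and the global mean, so categories with few observations are pulled towards the global mean.

```python
from collections import defaultdict


def smoothed_target_encoding(categories, targets, m=10.0):
    """Map each category to a smoothed mean of the target.

    `m` is a hypothetical smoothing weight: the larger it is, the more
    each category is pulled towards the global mean of the target.
    """
    if len(categories) != len(targets):
        raise ValueError("categories and targets must have the same length")
    if not targets:
        return {}

    global_mean = sum(targets) / len(targets)

    # Accumulate the target sum and the number of rows per category.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, y in zip(categories, targets):
        sums[category] += y
        counts[category] += 1

    # Equivalent to (n * category_mean + m * global_mean) / (n + m).
    return {
        category: (sums[category] + m * global_mean) / (counts[category] + m)
        for category in counts
    }


cats = ["a", "a", "a", "b"]
ys = [1, 1, 0, 1]
encoding = smoothed_target_encoding(cats, ys, m=2.0)
print(encoding)
```

With `m=0` the function falls back to the plain per-category mean, which is the unsmoothed form of target encoding. In practice the encoding should be computed on training data only, to avoid leaking the target into validation or test sets.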

Analysis and Modeling

Modeling Methods

Generalized Additive Models: A good online book on Generalized Additive Models by Michael Clark, Senior Machine Learning Scientist at Strong Analytics.

Model Explainability

Model-Independent Score Explanation: Post by Daniele Micci-Barreca on model explainability. It also explains a very clever method to better understand any model just from its predictions.

AI Explanations whitepaper: White paper on Google’s “AI Explanations” product, with a pretty good overview of the state of the art in model explainability.

Towards A Rigorous Science of Interpretable Machine Learning: Pre-print by Finale Doshi-Velez and Been Kim offering a rigorous definition and evaluation of model interpretability.

Spatial Analysis

PostGEESE? Introducing The DuckDB Spatial Extension: In this post, the authors of DuckDB present the new PostGIS-like spatial extension for this popular in-process database engine.

Coding

Why You Shouldn’t Nest Your Code: In this wonderful video, CodeAesthetic explains in detail (and with beautiful graphics!) a couple of methods to reduce the level of nesting in our code and improve its readability and maintainability. This video has truly changed how I code in R!

Miscellany

What is Retrieval-Augmented Generation (RAG)?: In this video, Marina Danilevsky, Senior Data Scientist at IBM, offers a pretty good explanation of how the Retrieval-Augmented Generation method can improve the credibility of large language models.

Blas M. Benito
Data Scientist and Team Lead
