Knowing the underlying (probability) distribution of your data has many modeling advantages. The easiest way to determine the underlying distribution is to visually inspect the random variable(s) using a histogram. With a candidate distribution, various plots can then be created, such as the Probability Density Function (PDF) plot, the Cumulative Distribution Function (CDF) plot, and the QQ plot. However, to determine the exact distribution parameters (e.g., loc, scale), quantitative methods are essential. In this blog, I will describe why it is important to determine the underlying probability distribution for your data set, what the differences are between parametric and non-parametric distributions, how to determine the best fit using a quantitative approach, and how to confirm it using visual inspections. Analyses are performed using the distfit library, and a notebook accompanies this post for easy access and experimenting.
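To make the quantitative approach concrete, here is a minimal sketch using scipy.stats as a stand-in for what a library like distfit automates: fit a few candidate distributions by maximum likelihood and rank them by a goodness-of-fit statistic (here the Kolmogorov-Smirnov statistic). The candidate list and the synthetic data are illustrative choices, not part of the original post.

```python
import numpy as np
from scipy import stats

# Synthetic data for illustration: 1000 draws from a normal distribution
rng = np.random.default_rng(42)
X = rng.normal(loc=5.0, scale=2.0, size=1000)

# Fit each candidate by MLE and score it with the KS statistic
candidates = ["norm", "expon", "uniform"]
results = []
for name in candidates:
    dist = getattr(stats, name)
    params = dist.fit(X)                     # MLE estimates (shape, loc, scale)
    ks = stats.kstest(X, name, args=params)  # goodness of fit vs. fitted CDF
    results.append((name, ks.statistic, params))

# Smaller KS statistic = better fit; the true (normal) family should rank first
results.sort(key=lambda r: r[1])
best_name, best_stat, best_params = results[0]
```

The fitted `loc` and `scale` in `best_params` are exactly the kind of distribution parameters the quantitative step is meant to recover.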
The importance of distribution fitting and Probability Density Functions.
The probability density function is a fundamental concept in statistics. Briefly, for a given random variable X, we aim to specify the function f that gives a natural description of the distribution of X. See also the terminology section at the bottom for more about probability density functions. Although there is a lot of great material that describes these concepts [1], it can remain challenging to understand why it is important to know the underlying distribution for your data set. Let me try to explain the importance with a small analogy. Suppose you need to go from location A to B: which type of car would you prefer? The answer is straightforward. You will likely start by exploring the terrain. With that information, you can then select the best-suited car (a sports car, a four-wheel drive, etc.). Logically, a sports car is better suited for smooth, flat terrain, while a four-wheel drive is better suited for rough, hilly terrain. In other words, without an exploratory analysis of the terrain, it is hard to select the best possible car. Yet such an exploratory step is easily forgotten or neglected in data modeling.
Before making modeling decisions, you need to know the underlying data distribution.
When it comes to data, it is equally important to explore its fundamental characteristics, such as skewness, kurtosis, outliers, and distribution shape (unimodal, bimodal, etc.). Based on these characteristics, it is easier to decide which models are best to use, because most models have prerequisites for the data. As an example, a well-known and popular technique is Principal Component Analysis (PCA). This method computes the covariance matrix, and its results are most reliable when the data is approximately multivariate normal. In addition, PCA is also known to be sensitive to outliers. Thus, before doing a PCA step, you need to know whether your data needs a (log)normalization or whether outliers need to be removed. More details about PCA can be found here [2].
Histograms can build a sense of intuition.
The histogram is a well-known plot in data analysis: a graphical representation of the distribution of a dataset. The histogram summarizes the number of observations that fall within each bin. With functions such as matplotlib's hist(), it is straightforward to make a visual inspection of the data. Varying the number of bins helps to identify whether the density resembles a common probability distribution by the shape of the histogram. An inspection will also give hints as to whether the data is symmetric or skewed and whether it has multiple peaks or outliers. In most cases, you will observe a distribution shape as depicted in Figure 1.
The bell shape of the Normal distribution.
The descending or ascending shape of an Exponential or Pareto distribution.
The flat shape of the Uniform distribution.
The complex shape that does not fit any of the theoretical distributions (e.g., multiple peaks).
In case you find distributions with multiple peaks (bimodal or multimodal), the peaks should not disappear with different numbers of bins. Bimodal distributions usually hint toward mixed populations. In addition, if you observe large spikes in density for a given value or a small range of values, it may point toward possible outliers. Outliers are expected to be far away from the rest of the density.
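That bin-stability check can be sketched in a few lines of numpy (the bimodal mixture and the peak-counting helper are illustrative, not part of the original post): if the two peaks are real, they should survive as the bin count changes.

```python
import numpy as np

# Synthetic bimodal sample: mixture of two well-separated normals
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-3, 0.5, 1000), rng.normal(3, 0.5, 1000)])

def count_peaks(data, bins):
    """Count strict local maxima among the interior histogram bins."""
    counts, _ = np.histogram(data, bins=bins)
    return sum(
        counts[i] > counts[i - 1] and counts[i] > counts[i + 1]
        for i in range(1, len(counts) - 1)
    )

# Real modes persist across a range of bin counts; spurious ones do not
peaks_coarse = count_peaks(x, 10)
```

With very fine binning, sampling noise can add spurious local maxima, which is exactly why the check should be run over a range of bin counts rather than a single one.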
A histogram is a great way to inspect a relatively small number of samples (random variables, or data points). However, when the number of samples increases, or more than two histograms are plotted, the visuals become cluttered, and a visual comparison with a theoretical distribution becomes difficult to judge. Instead, a Cumulative Distribution Function (CDF) plot or Quantile-Quantile (QQ) plot can be more insightful. But these plots require candidate theoretical distribution(s) that best match (or fit) the empirical data distribution. So let’s determine the best theoretical distribution in the next section! See also the terminology section at the bottom for more information about random variables and theoretical distributions.
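The ingredients of both plots can be sketched with scipy/numpy. This is a hedged stand-in using a normal candidate fitted by MLE (distfit produces equivalent plots from its best fit); the data and the candidate family are illustrative assumptions.

```python
import numpy as np
from scipy import stats

# Synthetic sample for illustration
rng = np.random.default_rng(7)
X = np.sort(rng.normal(10, 3, 500))

# Empirical CDF: step heights at the sorted sample values
ecdf = np.arange(1, len(X) + 1) / len(X)

# Candidate theoretical CDF with MLE-fitted parameters
loc, scale = stats.norm.fit(X)
tcdf = stats.norm.cdf(X, loc=loc, scale=scale)

# Largest vertical gap between the two curves (roughly the KS statistic);
# in a CDF plot this is the visual distance between the curves
gap = np.max(np.abs(ecdf - tcdf))

# QQ plot ingredients: theoretical vs. empirical quantiles, plus a
# correlation coefficient r that measures how straight the QQ line is
(osm, osr), (slope, intercept, r) = stats.probplot(X, dist="norm")
```

A small `gap` and an `r` close to 1 indicate that the candidate distribution tracks the empirical one closely, which is exactly what the CDF and QQ plots let you judge by eye.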