Understanding how strongly the variables in a data set are related matters because variables that behave in statistically similar ways can undermine the reliability of a model. For continuous variables, correlation measures can detect such relationships so that the resulting multicollinearity can be removed. Mixed data sets, which also contain categorical variables, make testing for multicollinearity harder. Statistical tests such as the hypergeometric test and the Mann-Whitney U test can be used to test for associations across variables in mixed data sets. This works well, but it requires several intermediate steps, including typing the variables, one-hot encoding, and multiple-testing correction. This entire pipeline is implemented in a method named HNet. In this blog, I will demonstrate how to detect variables with similar behavior so that multicollinearity can be easily detected.
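To make the intermediate steps concrete, here is a minimal sketch of the categorical part of such a pipeline: one-hot encoding, a hypergeometric test on every pair of encoded labels, and a Holm correction for multiple testing. It is not HNet's implementation. The function names (`one_hot`, `hypergeom_pvalue`, `holm`), the toy records, and the choice of Holm as the correction method are all assumptions made for illustration, and the Mann-Whitney U step for continuous variables is left out.

```python
from itertools import combinations
from math import comb


def one_hot(rows, columns):
    """Encode categorical columns as boolean vectors keyed by 'column=value'."""
    encoded = {}
    for col in columns:
        for value in sorted({row[col] for row in rows}):
            encoded[f"{col}={value}"] = [row[col] == value for row in rows]
    return encoded


def hypergeom_pvalue(a, b):
    """Upper-tail hypergeometric p-value: P(overlap >= observed overlap)."""
    N = len(a)                               # population size
    K = sum(a)                               # samples where a is true
    n = sum(b)                               # samples where b is true
    k = sum(x and y for x, y in zip(a, b))   # observed overlap
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


def holm(pvalues):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        running = max(running, (m - rank) * pvalues[i])
        adjusted[i] = min(1.0, running)
    return adjusted


# Hypothetical toy data: 'disease' copies 'smoker', 'city' alternates independently.
rows = [
    {"smoker": "yes" if i < 6 else "no",
     "disease": "yes" if i < 6 else "no",
     "city": "A" if i % 2 == 0 else "B"}
    for i in range(12)
]

encoded = one_hot(rows, ["smoker", "disease", "city"])

# Test only pairs of labels that come from different source columns.
pairs = [
    (a, b) for a, b in combinations(encoded, 2)
    if a.split("=")[0] != b.split("=")[0]
]
raw = [hypergeom_pvalue(encoded[a], encoded[b]) for a, b in pairs]
adjusted = holm(raw)

significant = [pair for pair, p in zip(pairs, adjusted) if p < 0.05]
for a, b in significant:
    print(f"{a} is associated with {b}")
```

Labels that end up significantly associated after correction are candidates for the redundant, similarly behaving variables described above. The upper tail is used because the question here is whether two labels co-occur more often than chance would suggest.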
