On generalized Gower distance for mixed-type data: extensive simulation study and new software tools

How to Cite

Grané, Aurea; Scielzo-Ortiz, Fabio. “On generalized Gower distance for mixed-type data: extensive simulation study and new software tools”. SORT-Statistics and Operations Research Transactions, pp. 213-44, doi:10.57645/20.8080.02.28.


Abstract

Data scientists address real-world problems using multivariate, heterogeneous datasets that combine variables of different natures. Selecting a suitable distance function between units is crucial, as many statistical techniques and machine learning algorithms depend on it. Traditional distances, such as Euclidean or Manhattan, are unsuitable for mixed-type data. Although the Gower distance was designed to handle this kind of data, it may lead to suboptimal results in the presence of outlying units or an underlying correlation structure. In this work, robust distances for mixed-type data are defined and explored, namely robust generalized Gower and robust related metric scaling. A new Python package is developed that computes these robust proposals as well as the classical ones.
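
For readers unfamiliar with the classical Gower distance mentioned above, the following is a minimal sketch of one common simplified form: range-normalised absolute differences for quantitative variables and a simple mismatch indicator for categorical ones, averaged over all variables. It is not the package described in the paper, and it does not implement the robust or generalized variants. The function name `gower_distance` and the toy data are assumptions made for illustration only.

```python
import numpy as np

def gower_distance(num, cat):
    """Pairwise Gower-style dissimilarity between n units (illustrative sketch).

    num: (n, p_num) array of quantitative variables.
    cat: (n, p_cat) array of categorical labels.
    """
    num = np.asarray(num, dtype=float)
    cat = np.asarray(cat)
    # Quantitative part: absolute differences scaled by each variable's range.
    ranges = num.max(axis=0) - num.min(axis=0)
    ranges[ranges == 0] = 1.0  # constant columns then contribute zero
    d_num = np.abs(num[:, None, :] - num[None, :, :]) / ranges
    # Categorical part: 1 when the labels differ, 0 when they match.
    d_cat = (cat[:, None, :] != cat[None, :, :]).astype(float)
    # Average the per-variable dissimilarities over all variables.
    p = num.shape[1] + cat.shape[1]
    return (d_num.sum(axis=2) + d_cat.sum(axis=2)) / p

# Hypothetical toy data: two quantitative variables and one categorical variable.
num = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
cat = np.array([["a"], ["a"], ["b"]])
D = gower_distance(num, cat)
```

Because each quantitative variable is scaled by its observed range, a single extreme unit stretches that range and compresses the differences among all other units, which is one way to see why outliers can degrade the classical distance.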

Keywords

  • distances
  • generalized Gower
  • multivariate heterogeneous data
  • outliers
  • robust Mahalanobis
  • related metric scaling
https://doi.org/10.57645/20.8080.02.28