Abstract
Data scientists address real-world problems using multivariate, heterogeneous datasets, characterized by multiple variables of different types. Selecting a suitable distance function between units is crucial, as many statistical techniques and machine learning algorithms depend on this concept. Traditional distances, such as Euclidean or Manhattan, are unsuitable for mixed-type data. Although Gower distance was designed to handle this kind of data, it may lead to suboptimal results in the presence of outlying units or an underlying correlation structure. In this work, robust distances for mixed-type data are defined and explored, namely robust generalized Gower and robust related metric scaling. A new Python package is developed that computes these robust proposals as well as the classical ones.
Keywords
Rights

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
(c) 2025. Authors transfer the exploitation rights of their works to the journal. The Institut d'estadística de Catalunya holds the copyright ownership of the contents published in the journal. Authors may deposit a copy of their works in repositories, as specified in the self-archiving policy.
Copyright
All content in the journal SORT is published under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license (CC BY-NC-ND 4.0), the terms of which are available at https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en


