Feature selection is a widely used strategy in machine learning to reduce feature sets to their relevant essence, improving both predictions and performance. It is also employed for knowledge discovery in applied disciplines such as biology and medicine to identify potentially causal factors. However, machine learning models often do not represent a unique solution to a given problem, especially in high-dimensional settings, where redundant factors are likely and spurious correlations exist.
Basing decisions about causal elements on feature selection is therefore inaccurate, or simply wrong, when the presence of redundant but still relevant features is not taken into account. Most existing selection algorithms specifically remove redundancies and are thus unsuitable for all-relevant feature selection, or they require careful parametrization and are hard to interpret, which makes them difficult to use.
This thesis focuses on feature selection methods for the analytical use case of understanding potential causal factors, in both linear and non-linear problems. We propose several new algorithms and methods for all-relevant feature selection to improve knowledge discovery, building on statistical methods that increase the accuracy of existing solutions and allow differentiation between types of relevance. Furthermore, we offer a new heuristic to automatically group related features, and we analyse the definition of relevance in the context of privileged information, where data is available only during training.
We also introduce software implementations, which were specifically designed to be modular, efficient, and parallelizable for applications to high-dimensional problems. The methods and implementations were evaluated on a wide range of synthetic and real datasets to demonstrate their performance in comparison with existing algorithms.