This thesis focuses on feature selection for analytical use, where the goal is to help understand potential causal factors in both linear and non-linear problems. We propose several new algorithms and methods for all-relevant feature selection that improve knowledge discovery. They rely on statistical methods to improve the accuracy of existing solutions and to differentiate between types of relevance. Furthermore, we offer a new heuristic that automatically groups related features, and we analyse the definition of relevance in the context of privileged information, where some data is available only during training.