- The paper introduces a novel approach that jointly selects feature subsets by maximizing the lower bound of the Fisher score to capture feature interdependencies.
- It reformulates feature selection as a mixed integer programming problem, converting it into a quadratically constrained linear program solved with a cutting plane algorithm.
- Experimental results on UCI, face, and digit recognition datasets demonstrate superior classification accuracy and reduced redundancy compared to traditional methods.
Generalized Fisher Score for Feature Selection
Introduction
Feature selection is critical in high-dimensional data processing due to the curse of dimensionality, which significantly increases computational complexity and susceptibility to overfitting. Traditional feature selection methodologies can be broadly classified into three categories: filter-based, wrapper-based, and embedded methods. Among these, filter-based methods like the Fisher score are widely used for their computational efficiency. However, the standard Fisher score evaluates each feature independently, ignoring feature interactions and redundancy, which often leads to suboptimal subsets.
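To make the baseline concrete, here is a minimal NumPy sketch of the standard per-feature Fisher score that GFS generalizes: the ratio of between-class scatter to within-class scatter, computed for each feature in isolation. The function name `fisher_score` is illustrative, not from the paper.

```python
import numpy as np

def fisher_score(X, y):
    """Per-feature Fisher score: between-class scatter / within-class scatter.

    X: (n_samples, n_features) data matrix; y: (n_samples,) class labels.
    Returns one score per feature; features are scored independently,
    which is exactly the limitation GFS addresses.
    """
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        nc = Xc.shape[0]
        between += nc * (Xc.mean(axis=0) - overall_mean) ** 2
        within += nc * Xc.var(axis=0)
    # Guard against zero within-class variance
    return between / np.maximum(within, 1e-12)
```

Ranking features by this score and keeping the top k is the traditional filter approach; note that two highly correlated discriminative features both receive high scores, so redundancy is not penalized.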
Generalized Fisher Score
The paper proposes a novel method, the Generalized Fisher Score (GFS), which aims to address the limitations of traditional Fisher scores by evaluating feature subsets jointly rather than independently. GFS seeks to maximize the lower bound of the Fisher score, effectively capturing the combined discriminative power of feature subsets and managing redundancy among features.
The reformulation of the Fisher score into a mixed integer programming (MIP) problem is central to GFS. The MIP problem is further transformed into a quadratically constrained linear programming (QCLP) problem, allowing it to be solved using a cutting plane algorithm. In each iteration of this algorithm, a multiple kernel learning problem is addressed using multivariate ridge regression and projected gradient descent.
Theoretical Framework
The proposed method leverages indicator variables to denote feature selection and reformulates the Fisher score maximization problem into a regularized discriminant analysis (RDA) problem. The equivalence relationship between RDA and multivariate linear regression provides a path to solving the QCLP problem efficiently. An essential component of this approach is the dual formulation, which is tackled using multiple kernel learning techniques, ensuring the complexity remains manageable even for high-dimensional datasets.
Algorithm and Computational Complexity
The cutting plane algorithm iteratively adds the most violated constraint to a working set of constraints, enabling a polynomial-time solution to the problem. The algorithm's efficiency is reflected in its time complexity, O(T(cns + s log m)), where T is the number of iterations, c the number of classes, n the number of samples, s the average number of non-zero features per sample, and m the total number of features.
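The constraint-generation loop can be sketched generically as follows. This skeleton only illustrates the cutting plane pattern; the two callbacks stand in for the paper's actual subproblems (the multiple kernel learning step for the restricted problem and the search for the most violated constraint), and all names here are hypothetical.

```python
def cutting_plane(find_most_violated, solve_restricted, initial_solution,
                  eps=1e-3, max_iter=100):
    """Generic cutting-plane loop.

    Alternates between (1) finding the most violated constraint at the
    current solution and (2) re-solving the problem restricted to the
    working set, until no constraint is violated by more than eps.
    """
    working_set = []
    solution = initial_solution
    for _ in range(max_iter):
        constraint, violation = find_most_violated(solution)
        if violation <= eps:
            break  # all constraints satisfied within tolerance
        working_set.append(constraint)
        solution = solve_restricted(working_set)
    return solution
```

Because only a small working set of constraints is ever active, each restricted problem stays cheap, which is what keeps the overall complexity linear in the number of iterations T.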
Experimental Evaluation
GFS was experimentally validated on multiple benchmark datasets. The results demonstrated its superiority over the traditional Fisher score and other contemporary feature selection methods such as the Laplacian Score, HSIC, and the Trace Ratio criterion.
UCI Datasets
The UCI datasets experiment showcases GFS's consistent performance across various high-dimensional datasets. The proposed method outperformed the competing methods, achieving higher classification accuracies in most cases. For instance, on the ionosphere dataset, GFS achieved an accuracy of 89.14% compared to 87.97% for the traditional Fisher score.
Face Recognition
In the ORL face recognition dataset, GFS achieved superior performance with an accuracy of 88.78% when 100 features were selected. The evaluation highlighted GFS's effectiveness in discarding redundant features and selecting highly discriminative features, even in challenging real-world datasets.
Digit Recognition
On the USPS handwritten digit recognition dataset, GFS again performed strongly, achieving an accuracy of 92.69% with 100 selected features. The method proved robust, maintaining high performance even when only a small number of features were selected.
Practical and Theoretical Implications
The proposed GFS method has significant implications for feature selection in high-dimensional datasets. Practically, it provides a robust tool for reducing computational complexity while enhancing classification performance. Theoretically, GFS offers a more holistic approach to feature selection by considering feature subsets jointly, leading to better capture of inherent data structures and relationships among features.
Future Directions
Future research could explore several avenues, including extending GFS to unsupervised and semi-supervised settings, investigating its performance in different domains such as genomics and text classification, and further improving the computational efficiency of the cutting plane algorithm.
Conclusion
The paper presents a sophisticated method for feature selection that effectively addresses the shortcomings of traditional Fisher scores. The Generalized Fisher Score (GFS) method provides a promising approach for improving classification performance on high-dimensional datasets by jointly selecting features and managing redundancy, as evidenced by comprehensive experimental results.