- The paper introduces a novel approach that jointly selects feature subsets by maximizing the lower bound of the Fisher score to capture feature interdependencies.
- It reformulates feature selection as a mixed integer programming problem, converting it into a quadratically constrained linear program solved with a cutting plane algorithm.
- Experimental results on UCI, face, and digit recognition datasets demonstrate superior classification accuracy and reduced redundancy compared to traditional methods.
Generalized Fisher Score for Feature Selection
Introduction
Feature selection is critical in high-dimensional data processing due to the curse of dimensionality, which significantly increases computational complexity and susceptibility to overfitting. Traditional feature selection methodologies can be broadly classified into three categories: filter-based, wrapper-based, and embedded methods. Among these, filter-based methods like the Fisher score are widely used for their computational efficiency. However, the standard Fisher score evaluates each feature independently, ignoring feature interactions and redundancy, which often leads to suboptimal subsets.
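To make the baseline concrete, here is a minimal NumPy sketch of the standard per-feature Fisher score that GFS generalizes: the ratio of between-class scatter to within-class scatter, computed for each feature in isolation. The function name `fisher_score` is illustrative, not from the paper.

```python
import numpy as np

def fisher_score(X, y):
    """Per-feature Fisher score: between-class scatter / within-class scatter.

    X: (n_samples, n_features) data matrix; y: (n_samples,) class labels.
    Returns one score per feature; features are scored independently,
    which is exactly the limitation GFS addresses.
    """
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        nc = Xc.shape[0]
        between += nc * (Xc.mean(axis=0) - overall_mean) ** 2
        within += nc * Xc.var(axis=0)
    # Guard against zero within-class variance
    return between / np.maximum(within, 1e-12)
```

Ranking features by this score and keeping the top k is the traditional filter approach; note that two highly correlated discriminative features both receive high scores, so redundancy is not penalized.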
Generalized Fisher Score
The paper proposes a novel method, the Generalized Fisher Score (GFS), which aims to address the limitations of traditional Fisher scores by evaluating feature subsets jointly rather than independently. GFS seeks to maximize the lower bound of the Fisher score, effectively capturing the combined discriminative power of feature subsets and managing redundancy among features.
The reformulation of the Fisher score into a mixed integer programming (MIP) problem is central to GFS. The MIP problem is further transformed into a quadratically constrained linear programming (QCLP) problem, allowing it to be solved using a cutting plane algorithm. In each iteration of this algorithm, a multiple kernel learning problem is addressed using multivariate ridge regression and projected gradient descent.
Theoretical Framework
The proposed method leverages indicator variables to denote feature selection and reformulates the Fisher score maximization problem into a regularized discriminant analysis (RDA) problem. The equivalence relationship between RDA and multivariate linear regression provides a path to solving the QCLP problem efficiently. An essential component of this approach is the dual formulation, which is tackled using multiple kernel learning techniques, ensuring the complexity remains manageable even for high-dimensional datasets.
Algorithm and Computational Complexity
The cutting plane algorithm iteratively adds the most violated constraint to a working set of constraints, enabling a polynomial-time solution to the problem. The algorithm's efficiency is reflected in its time complexity, O(T(cns + s log m)), where T is the number of iterations, c the number of classes, n the number of samples, s the average number of non-zero features per sample, and m the total number of features.
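The constraint-generation loop can be sketched generically as follows. This skeleton only illustrates the cutting plane pattern; the two callbacks stand in for the paper's actual subproblems (the multiple kernel learning step for the restricted problem and the search for the most violated constraint), and all names here are hypothetical.

```python
def cutting_plane(find_most_violated, solve_restricted, initial_solution,
                  eps=1e-3, max_iter=100):
    """Generic cutting-plane loop.

    Alternates between (1) finding the most violated constraint at the
    current solution and (2) re-solving the problem restricted to the
    working set, until no constraint is violated by more than eps.
    """
    working_set = []
    solution = initial_solution
    for _ in range(max_iter):
        constraint, violation = find_most_violated(solution)
        if violation <= eps:
            break  # all constraints satisfied within tolerance
        working_set.append(constraint)
        solution = solve_restricted(working_set)
    return solution
```

Because only a small working set of constraints is ever active, each restricted problem stays cheap, which is what keeps the overall complexity linear in the number of iterations T.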
Experimental Evaluation
GFS was experimentally validated on multiple benchmark datasets. The results demonstrated its superiority over the traditional Fisher score and other contemporary feature selection methods such as the Laplacian Score, HSIC, and the Trace Ratio criterion.
UCI Datasets
The UCI datasets experiment showcases GFS's consistent performance across various high-dimensional datasets. The proposed method outperformed the competing methods, achieving higher classification accuracies in most cases. For instance, on the ionosphere dataset, GFS achieved an accuracy of 89.14% compared to 87.97% for the traditional Fisher score.
Face Recognition
In the ORL face recognition dataset, GFS achieved superior performance with an accuracy of 88.78% when 100 features were selected. The evaluation highlighted GFS's effectiveness in discarding redundant features and selecting highly discriminative features, even in challenging real-world datasets.
Digit Recognition
On the USPS handwritten digit recognition dataset, GFS again performed strongly, achieving an accuracy of 92.69% with 100 selected features. The method proved robust, maintaining high performance even when only a small number of features were selected.
Practical and Theoretical Implications
The proposed GFS method has significant implications for feature selection in high-dimensional datasets. Practically, it provides a robust tool for reducing computational complexity while enhancing classification performance. Theoretically, GFS offers a more holistic approach to feature selection by considering feature subsets jointly, leading to better capture of inherent data structures and relationships among features.
Future Directions
Future research could explore several avenues, including extending GFS to unsupervised and semi-supervised settings, investigating its performance in different domains such as genomics and text classification, and further improving the computational efficiency of the cutting plane algorithm.
Conclusion
The paper presents a sophisticated method for feature selection that effectively addresses the shortcomings of traditional Fisher scores. The Generalized Fisher Score (GFS) method provides a promising approach for improving classification performance on high-dimensional datasets by jointly selecting features and managing redundancy, as evidenced by comprehensive experimental results.