Unsupervised Feature Selection with Adaptive Structure Learning

Published 3 Apr 2015 in cs.LG | (1504.00736v1)

Abstract: The problem of feature selection has attracted considerable interest in the past decade. Traditional unsupervised methods select the features that most faithfully preserve the intrinsic structures of the data, where those structures are estimated using all of the input features. However, the estimated structures are unreliable when redundant and noisy features have not been removed. This creates a dilemma: one needs the true structures of the data to identify the informative features, and one needs the informative features to accurately estimate the true structures of the data. To address this, we propose a unified learning framework that performs structure learning and feature selection simultaneously. The structures are adaptively learned from the results of feature selection, and the informative features are reselected to preserve the refined structures of the data. By leveraging the interactions between these two essential tasks, we are able to capture accurate structures and select more informative features. Experimental results on many benchmark data sets demonstrate that the proposed method outperforms many state-of-the-art unsupervised feature selection methods.


Authors (2)

Citations (209)


Summary

The paper presents FSASL, which integrates adaptive global and local structure learning to improve unsupervised feature selection.
FSASL leverages sparse representations and probabilistic neighborhood graphs to accurately capture both global and local data structures.
Extensive experiments on benchmark datasets demonstrate that FSASL significantly boosts clustering accuracy and normalized mutual information.


Unsupervised feature selection is difficult for high-dimensional data because no labels are available to indicate which features are relevant. This paper presents an approach termed "Feature Selection with Adaptive Structure Learning" (FSASL), which performs structure learning and feature selection jointly rather than one after the other.

Overview of the Approach

The FSASL framework targets the chicken-and-egg problem inherent in unsupervised feature selection: identifying the intrinsic structure of the data reliably requires informative features, yet selecting informative features requires an accurate estimate of that structure. FSASL resolves this with a unified approach in which structure learning and feature selection are updated in a mutually reinforcing cycle: the structures are re-estimated from the currently selected features, and the features are reselected to preserve the refined structures.

The proposed method draws from recent advances in sparse representation and structure preservation techniques. In particular, the intrinsic structure of the data is adaptively captured through sparse representation for global structures and probabilistic neighborhood relationships for local structures. The key innovations in FSASL can be summarized as follows:

Adaptive Global Structure Learning: The global structure of the data is captured through sparse representation, in which each sample is reconstructed as a sparse linear combination of the other samples in the dataset. This approach benefits from the discriminative nature of sparse representations.
Adaptive Local Structure Learning: The local manifold structure is captured through a learned probabilistic neighborhood graph, rather than the traditional fixed nearest-neighbor graph, allowing for more refined local structure characterization following feature selection steps.
Unified Learning Framework: By integrating adaptive global and local structure learning with feature selection, the framework lets the two tasks reinforce each other, so that each iteratively improves the result of the other.
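The two structure-learning steps listed above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the parameters `alpha` and `mu`, the use of ISTA (proximal gradient) for the sparse step, and the exclusion of self-neighbors are all choices made here for illustration. The input `Z` stands for the data after the current feature selection or projection, with one column per sample.

```python
import numpy as np


def sparse_self_representation(Z, alpha=0.1, n_iter=200):
    """Global structure (assumed form): ISTA for
    min_S 0.5 * ||Z - Z S||_F^2 + alpha * ||S||_1  subject to diag(S) = 0.
    Z has shape (d, n), one column per sample."""
    n = Z.shape[1]
    G = Z.T @ Z                                    # Gram matrix, shape (n, n)
    step = 1.0 / max(np.linalg.norm(G, 2), 1e-12)  # 1 / Lipschitz constant
    S = np.zeros((n, n))
    for _ in range(n_iter):
        S = S - step * (G @ S - G)                 # gradient step on the smooth term
        S = np.sign(S) * np.maximum(np.abs(S) - step * alpha, 0.0)  # soft threshold
        np.fill_diagonal(S, 0.0)                   # a sample may not represent itself
    return S


def project_to_simplex(v):
    """Euclidean projection of a vector onto {p : p >= 0, sum(p) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def probabilistic_neighbors(Z, mu=1.0):
    """Local structure (assumed form): for each sample i solve
    min_p sum_j (d_ij * p_j + mu * p_j^2)  subject to p >= 0, sum(p) = 1,
    where d_ij is the squared distance between samples i and j.
    The minimizer is the simplex projection of -d_i / (2 * mu)."""
    n = Z.shape[1]
    sq = np.sum(Z ** 2, axis=0)
    D = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Z.T @ Z), 0.0)
    P = np.zeros((n, n))
    for i in range(n):
        others = np.arange(n) != i                 # exclude the sample itself
        P[i, others] = project_to_simplex(-D[i, others] / (2.0 * mu))
    return P


# Toy data: two well-separated groups of three samples each (columns).
X = np.array([[0.0, 0.1, 0.2, 5.0, 5.1, 5.2],
              [0.0, 0.1, 0.0, 5.0, 5.0, 5.1]])
S = sparse_self_representation(X, alpha=0.1)
P = probabilistic_neighbors(X, mu=1.0)
```

In an adaptive scheme of the kind described, both functions would be called again on the re-projected data each time the feature weights change, so the graphs follow the selected features instead of staying fixed. A smaller `mu` concentrates each row of `P` on fewer neighbors, while a larger `mu` spreads the probability more evenly.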

Strong Numerical Results and Implications

The approach is evaluated empirically through extensive experiments on multiple benchmark datasets, spanning digit/letter recognition, facial image data, and biomedical data. FSASL consistently outperforms established unsupervised feature selection methods in terms of clustering performance, as measured by accuracy (ACC) and normalized mutual information (NMI), which supports the effectiveness of the adaptive learning approach.
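For readers unfamiliar with the two metrics, the following is a small standard-library sketch of how ACC and NMI are commonly computed for a clustering result. It is illustrative only: the function names are made up here, the brute-force search over cluster-to-class assignments is practical only for a handful of clusters (the Hungarian algorithm is the usual choice at scale), and the geometric-mean normalization of NMI is an assumption, since several normalizations are in use.

```python
import math
from collections import Counter
from itertools import permutations


def clustering_accuracy(y_true, y_pred):
    """Fraction of samples matched under the best one-to-one
    mapping from predicted clusters to true classes (brute force)."""
    classes = sorted(set(y_true))
    clusters = sorted(set(y_pred))
    # Pad with dummy classes so that every cluster can be assigned something.
    padded = classes + [None] * max(0, len(clusters) - len(classes))
    overlap = Counter(zip(y_pred, y_true))
    best = 0
    for assigned in permutations(padded, len(clusters)):
        hits = sum(overlap[(c, k)] for c, k in zip(clusters, assigned))
        best = max(best, hits)
    return best / len(y_true)


def normalized_mutual_info(y_true, y_pred):
    """Mutual information divided by sqrt(H(true) * H(pred))."""
    n = len(y_true)
    joint = Counter(zip(y_true, y_pred))
    count_true = Counter(y_true)
    count_pred = Counter(y_pred)
    mi = sum((c / n) * math.log(c * n / (count_true[u] * count_pred[v]))
             for (u, v), c in joint.items())
    h_true = -sum((c / n) * math.log(c / n) for c in count_true.values())
    h_pred = -sum((c / n) * math.log(c / n) for c in count_pred.values())
    if h_true == 0.0 or h_pred == 0.0:
        # Degenerate case: at least one labeling has a single group.
        return 1.0 if h_true == h_pred else 0.0
    return mi / math.sqrt(h_true * h_pred)


# Cluster ids are arbitrary, so a relabeled perfect clustering still scores 1.
labels = [0, 0, 1, 1, 2, 2]
relabeled = [1, 1, 0, 0, 2, 2]
acc = clustering_accuracy(labels, relabeled)
nmi = normalized_mutual_info(labels, relabeled)
```

Both metrics are invariant to how the clusters are numbered, which is why they suit the evaluation of unsupervised methods: the selected features are fed to a clustering algorithm, and the resulting partition is compared against held-out class labels.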

Results indicate that FSASL's ability to concurrently leverage both global and local structures, while adapting these structures based on selected features, is crucial in offering improved performance over existing models. The adaptive nature of the structure learning within FSASL offers a substantial advantage over fixed-structure methods, particularly in the presence of high-dimensional noisy data.
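This summary does not reproduce the paper's objective function. As a hedged illustration only, a joint objective of the kind described, combining a sparse-reconstruction term for global structure, a probabilistic-neighbor term for local structure, and a row-sparsity penalty for feature selection, could be written as follows. All symbols are assumptions introduced here: $X$ is the data matrix with samples $x_i$ as columns, $W$ the feature weight matrix, $S$ the sparse reconstruction coefficients, $P$ the neighbor probabilities, and $\alpha, \beta, \gamma, \mu$ trade-off parameters.

$$
\min_{W,S,P}\; \|W^\top X - W^\top X S\|_F^2 + \alpha \|S\|_1
+ \beta \sum_{i,j} \left( \|W^\top x_i - W^\top x_j\|_2^2 \, P_{ij} + \mu P_{ij}^2 \right)
+ \gamma \|W\|_{2,1}
$$

$$
\text{subject to}\quad S_{ii} = 0,\quad P\mathbf{1} = \mathbf{1},\quad P \ge 0,\quad W^\top X X^\top W = I .
$$

In a form like this, both structure terms are measured in the space $W^\top X$ defined by the current feature weights, which is what makes the structures adaptive: fixing $W$ and updating $S$ and $P$ refines the structures, and fixing $S$ and $P$ and updating $W$ reselects the features.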

Practical and Theoretical Implications

The implications of this research are twofold:

Practical Implications: FSASL provides a scalable and adaptable solution for unsupervised feature selection, suited to the high-dimensional datasets characteristic of contemporary applications in image analysis and bioinformatics. By restricting subsequent processing to a refined subset of features, it reduces the computational cost of downstream data mining and can improve its effectiveness.
Theoretical Contributions: By framing feature selection as an interactive process with structure learning, FSASL introduces a new paradigm for addressing unsupervised feature selection challenges. This interactive process challenges traditional models and lays the groundwork for future exploration into adaptive learning frameworks that can be used in various facets of machine learning and data transformation.

Future Directions

The paper suggests several potential future directions for improving the FSASL framework. These include the optimization of parameter settings to reduce computational overhead, the exploration of alternative regularization frameworks (e.g., non-convex regularizers such as the $\ell_0$ norm), and the development of parallelized algorithms to handle large-scale datasets efficiently.
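To make the regularization choice concrete, the sketch below contrasts a convex row-sparsity penalty (the $\ell_{2,1}$ norm, a common choice in sparse feature selection and an assumption here, since this summary does not name the regularizer used) with a non-convex $\ell_0$-style count of non-zero rows. The weight matrix `W`, with one row per feature, and all function names are hypothetical.

```python
import numpy as np


def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (convex)."""
    return float(np.sum(np.linalg.norm(W, axis=1)))


def l20_count(W, tol=1e-12):
    """Number of rows of W with non-zero norm (non-convex, l0-style)."""
    return int(np.sum(np.linalg.norm(W, axis=1) > tol))


def top_features(W, k):
    """Indices of the k features whose rows have the largest norm."""
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(-scores, kind="stable")[:k]


# Hypothetical weights for three features projected to two dimensions.
W = np.array([[3.0, 4.0],
              [0.0, 0.0],
              [0.0, 1.0]])
penalty_convex = l21_norm(W)
active_rows = l20_count(W)
selected = top_features(W, 2)
```

The row count states directly how many features are in use but is hard to optimize, whereas the $\ell_{2,1}$ norm is a convex surrogate that shrinks entire rows toward zero. Ranking features by row norm is a typical way to read a selection off such a weight matrix.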

In conclusion, FSASL presents a robust, adaptable framework for unsupervised feature selection, effectively leveraging mutual reinforcement between adaptive global and local structure learning. Through both strong performance on benchmark datasets and insightful theoretical contributions, it marks a noteworthy advancement in the field of feature selection methodologies.
