Fast Correlation-Based Filter
Contact: Haralambos Sarimveis
Categories: Feature selection
Exposed methods:
Feature selection | |
---|---|
Input: | |
Output: | |
Input format: | Weka's ARFF format |
Output format: | Weka's ARFF format |
User-specified parameters: | A predefined threshold |
Reporting information: | The optimal subset of variables |
Description:
The FCBF (Fast Correlation-Based Filter) algorithm consists of two stages. The first is a relevance analysis,
which ranks the input variables by a relevance score, computed as the symmetric uncertainty of each variable
with respect to the target output. This stage also discards irrelevant variables, i.e. those whose score falls
below a predefined threshold. The second stage is a redundancy analysis, which selects the predominant
features from the relevant set obtained in the first stage. This selection is an iterative process that removes
each variable for which an already selected, more relevant variable forms an approximate Markov blanket.
The method is described in detail in [YUL04].
More information can be found on the following web page:
http://www.public.asu.edu/~huanliu/FCBF/FCBFsoftware.html
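The two stages described above can be sketched in a few lines. The block below is a minimal illustration in Python, not the Java/WEKA implementation this entry refers to. The function names (`entropy`, `symmetric_uncertainty`, `fcbf`) and the dict-of-lists input layout are assumptions made for the example. Symmetric uncertainty is taken as SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), and a variable is treated as redundant when its symmetric uncertainty with an already selected variable is at least as large as its own relevance score.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), a value between 0 and 1."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant
    # Information gain (mutual information) from the joint entropy.
    info_gain = hx + hy - entropy(list(zip(x, y)))
    return 2.0 * info_gain / (hx + hy)


def fcbf(features, target, threshold=0.0):
    """Return the names of the selected features, most relevant first.

    features: dict mapping a feature name to its list of discrete values
    (an illustrative layout, not the ARFF input of the actual software).
    """
    # Stage 1: relevance analysis. Score each feature against the target,
    # drop those below the threshold, and rank the rest in descending order.
    relevance = {name: symmetric_uncertainty(col, target)
                 for name, col in features.items()}
    ranked = sorted((n for n in relevance if relevance[n] >= threshold),
                    key=lambda n: relevance[n], reverse=True)

    # Stage 2: redundancy analysis. Keep a feature only if no already
    # selected (more relevant) feature is at least as correlated with it
    # as it is with the target.
    selected = []
    for name in ranked:
        redundant = any(
            symmetric_uncertainty(features[kept], features[name]) >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


# Toy data: "b" is the inverse of "a" (redundant), "c" is unrelated to y.
y = [0, 0, 0, 0, 1, 1, 1, 1]
data = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],
    "b": [1, 1, 1, 1, 0, 0, 0, 0],
    "c": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(fcbf(data, y, threshold=0.1))  # prints ['a']
```

In this toy run, "c" is discarded in the first stage because its score is below the threshold, and "b" is discarded in the second stage because it carries the same information as "a".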
Background (publication date, popularity/level of familiarity, rationale of approach, further comments)
A widely used, standard feature selection method. Its main disadvantage is that the input variables
must be discretized before the method can be applied.
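As an illustration of the discretization requirement, the sketch below bins a numeric variable into equal-width intervals. Equal-width binning is only a simple stand-in chosen for this example; this entry does not state which discretization scheme the software expects. The helper name `equal_width_bins` and the `n_bins` parameter are hypothetical.

```python
def equal_width_bins(values, n_bins=3):
    """Map numeric values to integer bin indices 0..n_bins-1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # a constant variable falls into a single bin
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it to the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


print(equal_width_bins([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0], n_bins=3))
# prints [0, 0, 1, 1, 2, 2, 2]
```

The resulting bin indices can be treated as nominal values when computing entropy-based scores such as symmetric uncertainty.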
Class-blind/class-sensitive feature selection
Class-sensitive feature selection
Type (optimal, greedy, randomized)
Optimal
Filter/wrapper/hybrid approach
Filter
Type of Descriptor:
Interfaces:
Priority: Medium
Development status:
Homepage:
Dependencies:
External components: WEKA
Technical details
Data: No
Software: Yes
Programming language(s): Java
Operating system(s): Linux, Win, Mac OS
Input format: Weka's ARFF format
Output format: Weka's ARFF format
License: GPL
References
[YUL04] Yu, L., Liu, H. (2004). Efficient Feature Selection via Analysis of Relevance and Redundancy. Journal of Machine Learning Research 5:1205-1224.