Module es.upm.fi.cig.multictbnc
Package es.upm.fi.cig.multictbnc.fss
Class StatisticalBasedFeatureSelection
java.lang.Object
es.upm.fi.cig.multictbnc.fss.StatisticalBasedFeatureSelection
Direct Known Subclasses:
ConInd

Provides the basis for statistical-based feature subset selection algorithms.

Constructor Summary
Constructor and Description
StatisticalBasedFeatureSelection(List<String> nameClassVariables, ParameterLearningAlgorithm cimPLA, int maxSeparatingSizeFSS, double sigTimeTransitionHyp, double sigStateToStateTransitionHyp)
    Initializes the structures needed for feature subset selection, including the parameter learning algorithm, the significance levels, and a cache for redundancy checks.

Method Summary
Modifier and Type   Method and Description
static boolean   conditionalIndependenceTest(CIMNode featureNode, CPTNode classNode, List<CIMNode> sepSet)
    Tests whether a feature variable and a class variable are conditionally independent given a separating set.
boolean   redundancyAnalysis(CIMNode evaluatedFeatureNode, CPTNode classNode, CIMNode featureNode)
    Determines whether a feature node is redundant with respect to the class node, given another feature node.
boolean   redundancyAnalysis(CIMNode evaluatedFeatureNode, CPTNode classNode, List<CIMNode> featureNodes)
    Determines whether a feature node is redundant with respect to a class node, given a set of other feature nodes.
boolean   redundancyAnalysis(CIMNode evaluatedFeatureNode, CPTNode classNode, List<CIMNode> featureNodes, int maxSizeSepSetRedundancyAnalysis)
    Determines whether a feature node is redundant with respect to a class node, given separating sets of feature nodes up to a specified maximum size.
List<CIMNode>   redundancyAnalysis(List<CIMNode> evaluatedFeatureNodes, CPTNode classNode)
    Returns the subset of feature nodes that are non-redundant given the class node.
List<CIMNode>   redundancyAnalysis(List<CIMNode> evaluatedFeatureNodes, CPTNode classNode, CIMNode conditionedFeatureNode)
    Returns the subset of feature nodes that are non-redundant given a conditioned feature node.
void   setDataset(Dataset newDataset)
    Sets the dataset to be used in the feature subset selection.
protected static boolean   testNullStateToStateTransitionHypothesis(CIMNode featureNode, List<String> nameFeatureVariablesSepSet, CTBNSufficientStatistics suffStat, CTBNSufficientStatistics suffStatGivenClass)
    Evaluates the null state-to-state transition hypothesis between a feature node and the class node, given a separating set.
protected static boolean   testNullTimeToTransitionHypothesis(CIMNode featureNode, List<String> nameFeatureVariablesSepSet, CTBNSufficientStatistics suffStat, double[][] qx, CTBNSufficientStatistics suffStatGivenClassVariable, double[][] qxGivenClassVariable)
    Evaluates the null time-to-transition hypothesis between a feature node and the class node, given a separating set.

Constructor Details

StatisticalBasedFeatureSelection
public StatisticalBasedFeatureSelection(List<String> nameClassVariables, ParameterLearningAlgorithm cimPLA, int maxSeparatingSizeFSS, double sigTimeTransitionHyp, double sigStateToStateTransitionHyp)
Initializes the structures needed for feature subset selection, including the parameter learning algorithm, the significance levels, and a cache for redundancy checks.
Parameters:
nameClassVariables - names of the class variables to be considered in the feature selection process
cimPLA - parameter learning algorithm for CIM nodes
maxSeparatingSizeFSS - maximum size of the separating set to be considered in redundancy analysis
sigTimeTransitionHyp - significance level for the time-to-transition hypothesis tests
sigStateToStateTransitionHyp - significance level for the state-to-state transition hypothesis tests
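
The constructor sets up a cache for redundancy checks. The following sketch shows one way such memoization can work. It assumes that results are keyed by the evaluated feature and an unordered conditioning set, with the class variable held fixed. RedundancyCache and its method are hypothetical, and generic values stand in for CIMNode objects. This is not the library's implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiPredicate;

// Hypothetical memoizing wrapper around a redundancy check (not the library's API).
final class RedundancyCache<T> {
    // Cache key: the evaluated feature plus its conditioning set. Using a Set makes
    // the key order-insensitive, so [B, C] and [C, B] share one entry.
    private record Key<E>(E evaluated, Set<E> conditioning) {}

    private final Map<Key<T>, Boolean> results = new HashMap<>();
    private final BiPredicate<T, List<T>> redundancyCheck;

    RedundancyCache(BiPredicate<T, List<T>> redundancyCheck) {
        this.redundancyCheck = redundancyCheck;
    }

    // Runs the (possibly expensive) check only the first time a key is seen.
    boolean isRedundant(T evaluated, List<T> conditioning) {
        Key<T> key = new Key<>(evaluated, Set.copyOf(conditioning));
        return results.computeIfAbsent(key, k -> redundancyCheck.test(evaluated, conditioning));
    }
}
```

Keying on a Set rather than a List lets different orderings of the same separating set share one cached result. The record syntax requires Java 16 or later.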

Method Details

conditionalIndependenceTest
public static boolean conditionalIndependenceTest(CIMNode featureNode, CPTNode classNode, List<CIMNode> sepSet)
Tests whether a feature variable and a class variable are conditionally independent given a separating set.
Parameters:
featureNode - feature node to test
classNode - class node to test
sepSet - list of nodes forming the separating set
Returns:
true if the variables are conditionally independent, false otherwise
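
In continuous-time Bayesian network structure learning (for example, the CTPC algorithm), a conditional independence test typically combines two hypothesis tests: one on the times to transition and one on the state-to-state transitions. This class exposes a method for each. The following minimal sketch shows that combination. It assumes that conditionalIndependenceTest declares independence only when neither null hypothesis is rejected. CtbnIndependence and its BooleanSupplier parameters are hypothetical.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper; not part of the library.
final class CtbnIndependence {
    // Assumed rule: independence holds only if NEITHER null hypothesis is rejected,
    // i.e. both the time-to-transition and the state-to-state tests "pass".
    static boolean conditionallyIndependent(BooleanSupplier timeToTransitionNullHolds,
                                            BooleanSupplier stateToStateNullHolds) {
        // && short-circuits: the second test is skipped once the first rejects.
        return timeToTransitionNullHolds.getAsBoolean()
                && stateToStateNullHolds.getAsBoolean();
    }
}
```

Running the time-to-transition test first is also an assumption. With &&, the second test is skipped as soon as the first one rejects.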

testNullStateToStateTransitionHypothesis
protected static boolean testNullStateToStateTransitionHypothesis(CIMNode featureNode, List<String> nameFeatureVariablesSepSet, CTBNSufficientStatistics suffStat, CTBNSufficientStatistics suffStatGivenClass)
Evaluates the null state-to-state transition hypothesis between a feature node and the class node, given a separating set. For each state of the feature node, the method tests all possible states of its parents (including the nodes in the separating set and the class node). This determines whether the state transitions of the feature node are independent of the class node, conditioned on the separating set.
Parameters:
featureNode - feature node whose state transitions are being tested
nameFeatureVariablesSepSet - names of the feature variables in the separating set
suffStat - sufficient statistics for the feature node without the class variable as a parent
suffStatGivenClass - sufficient statistics for the feature node with the class variable as a parent
Returns:
true if the null hypothesis is not rejected, indicating that the state transitions of the feature node are conditionally independent of the class node given the separating set; false otherwise
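
For a given source state of the feature node, the state-to-state hypothesis compares how its transitions are distributed over destination states with and without the class variable as a parent. The following minimal sketch assumes that the comparison is a two-sample chi-square test on the transition counts, in the form for two samples with different totals. StateToStateTest is hypothetical. The critical value would come from the chi-square distribution at sigStateToStateTransitionHyp; here it is passed in rather than computed.

```java
// Hypothetical sketch; not the library's implementation.
final class StateToStateTest {
    // Two-sample chi-square statistic for transitions out of one source state:
    // r[i] = count to destination i WITHOUT the class variable as a parent,
    // s[i] = count to destination i WITH it. The sqrt(S/R) and sqrt(R/S) factors
    // rescale the samples when their totals differ; proportional rows give 0.
    static double chiSquare(long[] r, long[] s) {
        double rTotal = 0, sTotal = 0;
        for (long v : r) rTotal += v;
        for (long v : s) sTotal += v;
        if (rTotal == 0 || sTotal == 0) {
            return 0; // no observed transitions in one sample: no evidence either way
        }
        double kr = Math.sqrt(sTotal / rTotal);
        double ks = Math.sqrt(rTotal / sTotal);
        double stat = 0;
        for (int i = 0; i < r.length; i++) {
            double denom = r[i] + s[i];
            if (denom == 0) continue; // destination never reached in either sample
            double diff = kr * r[i] - ks * s[i];
            stat += diff * diff / denom;
        }
        return stat;
    }

    // The null hypothesis (identical transition distributions) is kept when the
    // statistic does not exceed the caller-supplied chi-square critical value.
    static boolean nullNotRejected(long[] r, long[] s, double criticalValue) {
        return chiSquare(r, s) <= criticalValue;
    }
}
```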

testNullTimeToTransitionHypothesis
protected static boolean testNullTimeToTransitionHypothesis(CIMNode featureNode, List<String> nameFeatureVariablesSepSet, CTBNSufficientStatistics suffStat, double[][] qx, CTBNSufficientStatistics suffStatGivenClassVariable, double[][] qxGivenClassVariable)
Evaluates the null time-to-transition hypothesis between a feature node and the class node, given a separating set. For each state of the feature node, the method tests all joint states of the class node and the separating set. This checks whether the transition times of the feature node are independent of the class node given the states of the separating set.
Parameters:
featureNode - feature node whose transition times are being tested
nameFeatureVariablesSepSet - names of the feature variables in the separating set
suffStat - sufficient statistics for the feature node without the class variable as a parent
qx - matrix with the intensities of the feature node without the class variable as a parent
suffStatGivenClassVariable - sufficient statistics for the feature node with the class variable as a parent
qxGivenClassVariable - matrix with the intensities of the feature node with the class variable as a parent
Returns:
true if the null hypothesis is not rejected, meaning that the feature and class variables are considered conditionally independent at the given significance level; false otherwise
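
The time-to-transition hypothesis asks whether adding the class variable as a parent changes how long the feature stays in each state, that is, its intensities. The following minimal sketch assumes an F-style test on the ratio of the intensities in qx and qxGivenClassVariable. The matrix layout, the sepSetConfigOf index mapping, and TimeToTransitionTest are hypothetical. As a simplification, one pair of critical values is used for every cell. In an actual F-test, the degrees of freedom (and therefore the critical values) depend on the transition counts of each cell.

```java
// Hypothetical sketch; not the library's implementation.
final class TimeToTransitionTest {
    /**
     * qx[k][x]:           intensity of leaving state x under separating-set configuration k.
     * qxGivenClass[j][x]: same intensity under joint (class, separating-set) configuration j.
     * sepSetConfigOf[j]:  index k of the separating-set configuration contained in j.
     * fLower, fUpper:     two-sided F critical values supplied by the caller.
     */
    static boolean nullNotRejected(double[][] qx, double[][] qxGivenClass,
                                   int[] sepSetConfigOf, double fLower, double fUpper) {
        for (int j = 0; j < qxGivenClass.length; j++) {
            int k = sepSetConfigOf[j];
            for (int x = 0; x < qxGivenClass[j].length; x++) {
                double f = qx[k][x] / qxGivenClass[j][x];
                if (Double.isNaN(f)) continue; // 0/0: no transitions observed, skip cell
                if (f < fLower || f > fUpper) {
                    return false; // intensities differ significantly: reject the null
                }
            }
        }
        return true;
    }
}
```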

redundancyAnalysis
public List<CIMNode> redundancyAnalysis(List<CIMNode> evaluatedFeatureNodes, CPTNode classNode, CIMNode conditionedFeatureNode)
Returns the subset of feature nodes that are non-redundant given a conditioned feature node. Each feature node in the list is evaluated against the conditioned feature node. A feature node is redundant if it is conditionally independent of the class node given the conditioned feature node.
Parameters:
evaluatedFeatureNodes - list of feature nodes to be evaluated for redundancy
classNode - class node used in the redundancy analysis
conditionedFeatureNode - feature node used as a condition in the redundancy analysis
Returns:
list of non-redundant feature nodes
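
The filtering step can be sketched as follows. The sketch assumes that a feature is dropped exactly when it is conditionally independent of the class given the conditioned feature. ConditionedFilter is hypothetical, and the predicate stands in for the conditional independence test.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch; not the library's API.
final class ConditionedFilter {
    // Keeps the features that stay dependent on the class once `conditioned` is known.
    // independentGiven.test(f, c) stands in for a CI test of f and the class given {c}.
    static <T> List<T> nonRedundantGiven(List<T> evaluated, T conditioned,
                                         BiPredicate<T, T> independentGiven) {
        List<T> kept = new ArrayList<>();
        for (T feature : evaluated) {
            if (!independentGiven.test(feature, conditioned)) {
                kept.add(feature);
            }
        }
        return kept;
    }
}
```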

redundancyAnalysis
public List<CIMNode> redundancyAnalysis(List<CIMNode> evaluatedFeatureNodes, CPTNode classNode)
Returns the subset of feature nodes that are non-redundant given the class node. The method performs redundancy analysis on each feature node in the list. It considers all possible separating sets up to a specified maximum size to determine whether any feature is redundant in the context of the class node.
Parameters:
evaluatedFeatureNodes - list of feature nodes to be evaluated for redundancy
classNode - class node used in the redundancy analysis
Returns:
list of non-redundant feature nodes
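
One way to apply a per-feature redundancy check to a whole list is to remove features one at a time, conditioning each check only on the features still selected. Whether this method proceeds sequentially like this is an assumption. SequentialRedundancy is hypothetical, and redundantGiven stands in for the separating-set search.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch; not the library's implementation.
final class SequentialRedundancy {
    // redundantGiven.test(f, others) stands in for a per-feature check that searches
    // separating sets drawn from `others`.
    static <T> List<T> nonRedundant(List<T> evaluated, BiPredicate<T, List<T>> redundantGiven) {
        List<T> kept = new ArrayList<>(evaluated);
        for (T feature : evaluated) {
            // Condition only on features that are still selected.
            List<T> others = new ArrayList<>(kept);
            others.remove(feature);
            if (redundantGiven.test(feature, others)) {
                kept.remove(feature);
            }
        }
        return kept;
    }
}
```

Because each check conditions only on features that are still kept, two features that duplicate each other are not both discarded. Whichever is examined first is removed, and the other survives.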

redundancyAnalysis
public boolean redundancyAnalysis(CIMNode evaluatedFeatureNode, CPTNode classNode, CIMNode featureNode)
Determines whether a feature node is redundant with respect to the class node, given another feature node. The method tests the conditional independence of the evaluated feature node and the class node, using the given feature node as the conditioning variable.
Parameters:
evaluatedFeatureNode - feature node being evaluated for redundancy
classNode - class node used in the redundancy analysis
featureNode - feature node used as a conditioning variable in the analysis
Returns:
true if the evaluated feature node is redundant, false otherwise

redundancyAnalysis
public boolean redundancyAnalysis(CIMNode evaluatedFeatureNode, CPTNode classNode, List<CIMNode> featureNodes)
Determines whether a feature node is redundant with respect to a class node, given a set of other feature nodes. The method tests the conditional independence of the evaluated feature node and the class node, using a set of feature nodes as conditioning variables.
Parameters:
evaluatedFeatureNode - feature node being evaluated for redundancy
classNode - class node used in the redundancy analysis
featureNodes - list of feature nodes used as conditioning variables in the analysis
Returns:
true if the evaluated feature node is redundant, false otherwise
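
The single-node overload of redundancyAnalysis is a special case of the list overload, with a one-element conditioning set. The following minimal sketch illustrates that relationship. RedundancyOverloads is hypothetical, and whether the library delegates between its overloads this way is an assumption.

```java
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch; not the library's API.
final class RedundancyOverloads<T> {
    // independentGiven.test(f, set) stands in for a CI test of feature f and the
    // class variable given the conditioning set.
    private final BiPredicate<T, List<T>> independentGiven;

    RedundancyOverloads(BiPredicate<T, List<T>> independentGiven) {
        this.independentGiven = independentGiven;
    }

    // Redundant when independent of the class given the whole conditioning set.
    boolean isRedundant(T evaluated, List<T> conditioning) {
        return independentGiven.test(evaluated, conditioning);
    }

    // The single-node case delegates by wrapping the node in a one-element set.
    boolean isRedundant(T evaluated, T conditioning) {
        return isRedundant(evaluated, List.of(conditioning));
    }
}
```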

redundancyAnalysis
public boolean redundancyAnalysis(CIMNode evaluatedFeatureNode, CPTNode classNode, List<CIMNode> featureNodes, int maxSizeSepSetRedundancyAnalysis)
Determines whether a feature node is redundant with respect to a class node, given separating sets of feature nodes up to a specified maximum size. The method evaluates the redundancy of the feature node by examining all possible separating sets, drawn from the given feature nodes, up to the specified maximum size.
Parameters:
evaluatedFeatureNode - feature node being evaluated for redundancy
classNode - class node used in the redundancy analysis
featureNodes - list of feature nodes used as conditioning variables
maxSizeSepSetRedundancyAnalysis - maximum size of the separating sets to consider in the analysis
Returns:
true if the evaluated feature node is redundant, false otherwise
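
The search over separating sets can be sketched as an enumeration of all subsets of featureNodes with sizes from 1 to maxSizeSepSetRedundancyAnalysis. The search stops at the first subset that makes the evaluated feature independent of the class. The following are assumptions: starting at size 1 (the empty set is not tested), and the names SepSetSearch and ciGiven.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch; not the library's implementation.
final class SepSetSearch {
    // Returns true if some subset of `candidates` with 1..maxSize elements makes
    // `evaluated` conditionally independent of the class, according to ciGiven.
    static <T> boolean isRedundant(T evaluated, List<T> candidates, int maxSize,
                                   BiPredicate<T, List<T>> ciGiven) {
        int limit = Math.min(maxSize, candidates.size());
        for (int size = 1; size <= limit; size++) { // smaller separating sets first
            if (search(evaluated, candidates, size, 0, new ArrayList<>(), ciGiven)) {
                return true;
            }
        }
        return false;
    }

    // Depth-first enumeration of all combinations of exactly `size` candidates, in index order.
    private static <T> boolean search(T evaluated, List<T> candidates, int size, int start,
                                      List<T> current, BiPredicate<T, List<T>> ciGiven) {
        if (current.size() == size) {
            return ciGiven.test(evaluated, List.copyOf(current));
        }
        for (int i = start; i < candidates.size(); i++) {
            current.add(candidates.get(i));
            boolean found = search(evaluated, candidates, size, i + 1, current, ciGiven);
            current.remove(current.size() - 1); // backtrack
            if (found) {
                return true;
            }
        }
        return false;
    }
}
```

Because smaller sets are checked first, a feature is discarded using the smallest separating set that works. This keeps most tests on conditioning sets with fewer parent configurations, and therefore more data per configuration.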

setDataset
public void setDataset(Dataset newDataset)
Sets the dataset to be used in the feature subset selection.
Parameters:
newDataset - dataset to be used in the feature selection