Class StatisticalBasedFeatureSelection

java.lang.Object
es.upm.fi.cig.multictbnc.fss.StatisticalBasedFeatureSelection
Direct Known Subclasses:
ConInd

public abstract class StatisticalBasedFeatureSelection extends Object
Provides the basis for statistical-based feature subset selection algorithms.
  • Constructor Details

    • StatisticalBasedFeatureSelection

      public StatisticalBasedFeatureSelection(List<String> nameClassVariables, ParameterLearningAlgorithm cimPLA, int maxSeparatingSizeFSS, double sigTimeTransitionHyp, double sigStateToStateTransitionHyp)
      Initializes the structures needed for feature subset selection, including the parameter learning algorithm, the significance levels, and a cache for redundancy checks.
      Parameters:
      nameClassVariables - names of class variables to be considered in the feature selection process
      cimPLA - parameter learning algorithm for CIM nodes
      maxSeparatingSizeFSS - maximum size of the separating set to be considered in redundancy analysis
      sigTimeTransitionHyp - significance level for time-to-transition hypothesis tests
      sigStateToStateTransitionHyp - significance level for state-to-state transition hypothesis tests
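
The constructor description mentions a cache for redundancy checks but does not describe it. The following is a minimal sketch of how test outcomes could be memoized, assuming a string key built from the feature name and the sorted names of the separating set. The class IndependenceTestCache and its methods are hypothetical and are not part of the documented API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper, not part of the documented class.
class IndependenceTestCache {
    private final Map<String, Boolean> results = new HashMap<>();
    private int misses = 0;

    // Builds a key that does not depend on the order of the separating set.
    // Assumes variable names contain neither '|' nor ','.
    static String key(String feature, List<String> sepSet) {
        List<String> sorted = new ArrayList<>(sepSet);
        Collections.sort(sorted);
        return feature + "|" + String.join(",", sorted);
    }

    // Returns the stored outcome, running the supplied test only on a cache miss.
    boolean getOrCompute(String feature, List<String> sepSet, Supplier<Boolean> test) {
        String k = key(feature, sepSet);
        Boolean cached = results.get(k);
        if (cached == null) {
            misses++;
            cached = test.get();
            results.put(k, cached);
        }
        return cached;
    }

    // Number of times a test actually had to be run.
    int misses() {
        return misses;
    }
}
```

Sorting the names means the separating sets {X2, X3} and {X3, X2} share one entry, so a test is not repeated only because the nodes were listed in a different order.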
  • Method Details

    • conditionalIndependenceTest

      public static boolean conditionalIndependenceTest(CIMNode featureNode, CPTNode classNode, List<CIMNode> sepSet)
      Tests whether a feature variable and a class variable are conditionally independent given a separating set.
      Parameters:
      featureNode - feature node to test
      classNode - class node to test
      sepSet - list of nodes forming the separating set
      Returns:
      true if the variables are conditionally independent, false otherwise.
    • testNullStateToStateTransitionHypothesis

      protected static boolean testNullStateToStateTransitionHypothesis(CIMNode featureNode, List<String> nameFeatureVariablesSepSet, CTBNSufficientStatistics suffStat, CTBNSufficientStatistics suffStatGivenClass)
      Evaluates the null state-to-state transition hypothesis between a feature node and the class node, given a separating set. This method iteratively tests each state of the feature node against all possible states of its parents (including those in the separating set and the class node), to determine if state transitions of the feature node are independent of the class node, conditioned on the separating set.
      Parameters:
      featureNode - feature node whose state transitions are being tested
      nameFeatureVariablesSepSet - names of the feature variables in the separating set
      suffStat - sufficient statistics for the feature node without the class variable as a parent
      suffStatGivenClass - sufficient statistics for the feature node with the class variable as a parent
      Returns:
      true if the null hypothesis is not rejected, indicating that the state transitions of the feature node are conditionally independent of the class node given the separating set; false otherwise
    • testNullTimeToTransitionHypothesis

      protected static boolean testNullTimeToTransitionHypothesis(CIMNode featureNode, List<String> nameFeatureVariablesSepSet, CTBNSufficientStatistics suffStat, double[][] qx, CTBNSufficientStatistics suffStatGivenClassVariable, double[][] qxGivenClassVariable)
      Evaluates the null time-to-transition hypothesis between a feature node and the class node, given a separating set. This method iteratively tests each state of the feature node against all combined states of the class node and the separating set. The hypothesis test checks whether the transition times of the feature node are independent of the class node, given the states of the separating set.
      Parameters:
      featureNode - feature node whose transition times are being tested
      nameFeatureVariablesSepSet - names of the feature variables in the separating set
      suffStat - sufficient statistics for the feature node without the class variable as a parent
      qx - matrix with the intensities of the feature node without the class variable
      suffStatGivenClassVariable - sufficient statistics for the feature node with the class variable as a parent
      qxGivenClassVariable - matrix with the intensities of the feature node with the class variable as a parent
      Returns:
      true if the null hypothesis is not rejected, meaning that the feature and class variables are conditionally independent for the given significance level; false otherwise
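
The page does not state which statistic the test uses. As a rough illustration only, the sketch below computes the usual maximum-likelihood estimate of an exit intensity (number of transitions out of a state divided by the time spent in it) and the ratio of two such estimates, which is the kind of quantity a time-to-transition test could compare with and without the class variable. The class IntensityRatioSketch and its methods are hypothetical, and the decision rule against a significance level is deliberately left out.

```java
// Hypothetical helper, not part of the documented class.
class IntensityRatioSketch {

    // Maximum-likelihood estimate of an exit intensity: transitions out of a
    // state divided by the total time spent in that state.
    static double exitIntensity(double transitions, double timeInState) {
        if (timeInState <= 0.0) {
            throw new IllegalArgumentException("time in state must be positive");
        }
        return transitions / timeInState;
    }

    // Ratio of the intensity estimated without the class variable to the one
    // estimated with it. A value near 1 suggests similar transition times.
    static double intensityRatio(double qWithoutClass, double qGivenClass) {
        if (qGivenClass <= 0.0) {
            throw new IllegalArgumentException("intensity must be positive");
        }
        return qWithoutClass / qGivenClass;
    }
}
```

For example, 10 transitions observed over 5.0 time units give an estimated intensity of 2.0. If the estimate given a class state is 4.0, the ratio is 0.5.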
    • redundancyAnalysis

      public List<CIMNode> redundancyAnalysis(List<CIMNode> evaluatedFeatureNodes, CPTNode classNode, CIMNode conditionedFeatureNode)
      Returns a subset of feature nodes that are non-redundant given a conditioned feature node. This method evaluates each feature node in a given list against a conditioned feature node to determine if it is redundant. Redundancy is assessed based on the conditional independence of each feature node from the class node, considering the conditioned feature node.
      Parameters:
      evaluatedFeatureNodes - list of feature nodes to be evaluated for redundancy
      classNode - class node used in the redundancy analysis
      conditionedFeatureNode - a feature node used as a condition in the redundancy analysis
      Returns:
      list of non-redundant feature nodes
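
The filtering described above can be sketched generically, assuming the conditional independence test is supplied as a predicate. The class PairwiseRedundancyFilterSketch and the parameter redundantGiven are hypothetical stand-ins; the sketch does not reproduce the library's test or its handling of the conditioned node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical helper, not part of the documented class.
class PairwiseRedundancyFilterSketch {

    // Keeps every evaluated feature that the supplied test does not declare
    // redundant given the conditioned feature. Input order is preserved.
    static <T> List<T> nonRedundant(List<T> evaluated, T conditioned,
                                    BiPredicate<T, T> redundantGiven) {
        List<T> kept = new ArrayList<>();
        for (T feature : evaluated) {
            if (!redundantGiven.test(feature, conditioned)) {
                kept.add(feature);
            }
        }
        return kept;
    }
}
```

With features X1, X2 and X3 and a predicate that flags only X2 as redundant given X0, the call returns X1 and X3.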
    • redundancyAnalysis

      public List<CIMNode> redundancyAnalysis(List<CIMNode> evaluatedFeatureNodes, CPTNode classNode)
      Returns a subset of feature nodes that are non-redundant given the class node. This method performs redundancy analysis on each feature node in the list, considering all possible separating sets up to a specified maximum size, to determine if any feature is redundant in the context of the class node.
      Parameters:
      evaluatedFeatureNodes - list of feature nodes to be evaluated for redundancy
      classNode - class node used in the redundancy analysis
      Returns:
      list of non-redundant feature nodes
    • redundancyAnalysis

      public boolean redundancyAnalysis(CIMNode evaluatedFeatureNode, CPTNode classNode, CIMNode featureNode)
      Determines if a feature node is redundant with respect to the class node, given another feature node. This method checks the conditional independence of the evaluated feature node from the class node, considering the given feature node as a conditioning variable.
      Parameters:
      evaluatedFeatureNode - feature node being evaluated for redundancy
      classNode - class node used in the redundancy analysis
      featureNode - feature node used as a conditioning variable in the analysis
      Returns:
      true if the evaluated feature node is redundant, false otherwise
    • redundancyAnalysis

      public boolean redundancyAnalysis(CIMNode evaluatedFeatureNode, CPTNode classNode, List<CIMNode> featureNodes)
      Determines if a feature node is redundant with respect to a class node, given a set of other feature nodes. This method checks the conditional independence of the evaluated feature node from the class node, considering a set of feature nodes as conditioning variables.
      Parameters:
      evaluatedFeatureNode - feature node being evaluated for redundancy
      classNode - class node used in the redundancy analysis
      featureNodes - list of feature nodes used as conditioning variables in the analysis
      Returns:
      true if the evaluated feature node is redundant, false otherwise
    • redundancyAnalysis

      public boolean redundancyAnalysis(CIMNode evaluatedFeatureNode, CPTNode classNode, List<CIMNode> featureNodes, int maxSizeSepSetRedundancyAnalysis)
      Determines if a feature node is redundant with respect to a class node, given sets of feature nodes up to a specified maximum size. This method evaluates the redundancy of the feature node by examining all possible separating sets up to that maximum size.
      Parameters:
      evaluatedFeatureNode - feature node being evaluated for redundancy
      classNode - class node used in the redundancy analysis
      featureNodes - list of feature nodes used as conditioning variables
      maxSizeSepSetRedundancyAnalysis - maximum size of separating sets to consider in the analysis
      Returns:
      true if the evaluated feature node is redundant, false otherwise
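
Examining all separating sets up to a maximum size amounts to enumerating bounded subsets of the candidate nodes and stopping at the first one that passes the independence test. The sketch below illustrates that search under two assumptions: the test is passed in as a predicate, and the empty set counts as a separating set. The class SeparatingSetSearchSketch and its methods are hypothetical, and the library may order or prune the search differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical helper, not part of the documented class.
class SeparatingSetSearchSketch {

    // Enumerates every subset of the candidates with size 0..maxSize.
    static <T> List<List<T>> subsetsUpToSize(List<T> candidates, int maxSize) {
        if (maxSize < 0) {
            throw new IllegalArgumentException("maxSize must be non-negative");
        }
        List<List<T>> result = new ArrayList<>();
        collect(candidates, 0, maxSize, new ArrayList<>(), result);
        return result;
    }

    // Depth-first extension of the current subset with later candidates only,
    // so each subset is produced exactly once.
    private static <T> void collect(List<T> candidates, int start, int maxSize,
                                    List<T> current, List<List<T>> result) {
        result.add(new ArrayList<>(current));
        if (current.size() == maxSize) {
            return;
        }
        for (int i = start; i < candidates.size(); i++) {
            current.add(candidates.get(i));
            collect(candidates, i + 1, maxSize, current, result);
            current.remove(current.size() - 1);
        }
    }

    // A feature is treated as redundant if some separating set within the size
    // limit makes it conditionally independent of the class.
    static <T> boolean isRedundant(List<T> candidates, int maxSize,
                                   Predicate<List<T>> independentGiven) {
        for (List<T> sepSet : subsetsUpToSize(candidates, maxSize)) {
            if (independentGiven.test(sepSet)) {
                return true;
            }
        }
        return false;
    }
}
```

The number of subsets grows quickly with the size limit, which is one reason to bound it. Three candidates with a limit of 2 already give 7 sets to test, counting the empty one.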
    • setDataset

      public void setDataset(Dataset newDataset)
      Sets the dataset to be used in the feature subset selection.
      Parameters:
      newDataset - dataset to be used in the feature selection