java.lang.Object
es.upm.fi.cig.multictbnc.data.representation.Dataset

public class Dataset extends Object
Represents a time series dataset, which stores sequences and provides methods to access and modify their information.
  • Constructor Details

    • Dataset

      public Dataset(String nameTimeVariable, List<String> nameClassVariables)
      Creates an empty dataset with the names of the time variable and class variables.
      Parameters:
      nameTimeVariable - name of the time variable
      nameClassVariables - names of the class variables
    • Dataset

      public Dataset(String nameTimeVariable)
      Creates an empty dataset with the name of the time variable.
      Parameters:
      nameTimeVariable - name of the time variable
    • Dataset

      public Dataset(List<Sequence> sequences)
      Creates a dataset with a list of sequences.
      Parameters:
      sequences - list of Sequence objects
  • Method Details

    • addFeatureVariable

      public void addFeatureVariable(String nameFeatureVariable)
      Registers the name of a feature in the dataset to allow the inclusion of sequences that contain it. This method requires a fill state that will be used to include the variable in those sequences.
      Parameters:
      nameFeatureVariable - feature variable name
    • getSequences

      public List<Sequence> getSequences()
      Returns the sequences of the dataset.
      Returns:
      list with the sequences of the dataset
    • addFeatureVariable

      public void addFeatureVariable(String nameFeatureVariable, Dataset dataset)
      Adds a new feature variable to the dataset, given a dataset whose sequences contain the transitions of that variable. The given sequences and those of this dataset are assumed to have the same length. This method ignores the time variable.
      Parameters:
      nameFeatureVariable - name of the feature variable to add
      dataset - dataset containing sequences with the new feature variable
    • addSequence

      public boolean addSequence(List<String[]> data)
      Creates a Sequence from a list of String arrays (a sequence) and adds it to the dataset. The first array must contain the names of the variables.
      Parameters:
      data - list of String arrays (a sequence) where the first array contains the names of the variables
      Returns:
      true if the sequence was successfully added to the dataset; false otherwise.
    • addSequence

      public boolean addSequence(List<String[]> data, String filePath)
      Creates a Sequence from a list of String arrays (a sequence) and the path of the file from which it was extracted, and adds it to the dataset. The first array must contain the names of the variables.
      Parameters:
      data - list of String arrays (a sequence) where the first array contains the names of the variables
      filePath - path of the file from which the sequence was extracted
      Returns:
      true if the sequence was successfully added to the dataset; false otherwise.
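The layout expected for `data` can be sketched as follows. This is a self-contained illustration, not the library's code: the class `SequenceDataSketch`, the helper `hasHeader`, and the variable names `t`, `X1` and `C1` are hypothetical. Only the convention that the first array holds the variable names comes from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SequenceDataSketch {

    // Builds a sequence in the layout described above: the first String[] is the
    // header with the variable names, each following String[] is one observation.
    static List<String[]> buildExample() {
        List<String[]> data = new ArrayList<>();
        data.add(new String[] {"t", "X1", "C1"});    // header (hypothetical names)
        data.add(new String[] {"0.0", "a", "yes"});  // first observation
        data.add(new String[] {"0.5", "b", "yes"});  // second observation
        return data;
    }

    // Hypothetical pre-check: the header must exist and name the time variable.
    static boolean hasHeader(List<String[]> data, String nameTimeVariable) {
        if (data == null || data.isEmpty()) {
            return false;
        }
        return Arrays.asList(data.get(0)).contains(nameTimeVariable);
    }

    public static void main(String[] args) {
        List<String[]> data = buildExample();
        System.out.println("Header present: " + hasHeader(data, "t"));
    }
}
```

A list built this way is the kind of argument that could be passed to addSequence, whose boolean result should be checked because the method reports failure by returning false.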
    • getNumDataPoints

      public int getNumDataPoints()
      Returns the number of data points. In this case, this is the number of sequences.
      Returns:
      number of sequences
    • getNameAllVariables

      public List<String> getNameAllVariables()
      Returns the names of all the variables, including the time variable.
      Returns:
      names of all the variables
    • getNameTimeVariable

      public String getNameTimeVariable()
      Returns the name of the time variable.
      Returns:
      name of the time variable
    • getNameVariables

      public List<String> getNameVariables()
      Returns the names of all the variables except the time variable. The returned list contains the feature variables first, followed by the class variables.
      Returns:
      names of all the variables except the time variable
    • getNameFeatureVariables

      public List<String> getNameFeatureVariables()
      Returns the names of the feature variables.
      Returns:
      list with the names of the feature variables
    • getNameClassVariables

      public List<String> getNameClassVariables()
      Returns the names of the class variables. Class variables that should be ignored are filtered out.
      Returns:
      list with the names of the class variables
    • addSequence

      public boolean addSequence(Sequence sequence)
      Adds the given Sequence to the dataset.
      Parameters:
      sequence - a Sequence
      Returns:
      true if the sequence was successfully added to the dataset; false otherwise.
    • checkVarianceFeatures

      public void checkVarianceFeatures(boolean removeZeroVariance)
      Checks the variance of the feature variables and, if removeZeroVariance is true, removes from the dataset those with zero variance. Call this method only after the entire dataset has been read, because sequences added later could change the result.
      Parameters:
      removeZeroVariance - true to remove variables with no variance, false otherwise
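The reason for running this check only on a fully loaded dataset can be illustrated with a small sketch. It assumes that a discrete feature has zero variance when it takes a single state across all observations. That reading, the class `ZeroVarianceSketch` and the map-based data layout are assumptions made for illustration and are not taken from the library.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ZeroVarianceSketch {

    // Returns the features that take at most one distinct state in the given data.
    static List<String> findZeroVarianceFeatures(Map<String, List<String>> valuesPerFeature) {
        List<String> constantFeatures = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : valuesPerFeature.entrySet()) {
            if (new HashSet<>(entry.getValue()).size() <= 1) {
                constantFeatures.add(entry.getKey());
            }
        }
        return constantFeatures;
    }

    public static void main(String[] args) {
        Map<String, List<String>> values = new LinkedHashMap<>();
        values.put("X1", new ArrayList<>(List.of("a", "a", "a")));
        values.put("X2", new ArrayList<>(List.of("a", "b", "a")));
        // With the data read so far, X1 looks constant.
        System.out.println(findZeroVarianceFeatures(values));

        // A later observation gives X1 a second state, so it is no longer constant.
        values.get("X1").add("b");
        System.out.println(findZeroVarianceFeatures(values));
    }
}
```

In this sketch a feature that looks constant after a partial read stops being constant once more data arrives, which is why removing it early would discard useful information.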
    • getPossibleStatesVariable

      public List<String> getPossibleStatesVariable(String nameVariable)
      Returns the possible states of the specified variable. The states are extracted once and stored in a map to avoid recomputation. To avoid always returning a reference to the same State list, the State objects from the map are copied.
      Parameters:
      nameVariable - variable name
      Returns:
      states of the variable
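The caching and copying behaviour described for getPossibleStatesVariable can be sketched as below. This is a minimal illustration under stated assumptions, not the library's implementation: the class `StatesCacheSketch`, the `observedValues` map standing in for the sequences, the `extractions` counter and the sorted order of the states are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class StatesCacheSketch {

    // Cache: variable name -> states, filled the first time a variable is requested.
    private final Map<String, List<String>> statesVariables = new HashMap<>();
    // Hypothetical raw data: variable name -> every value observed in the sequences.
    private final Map<String, List<String>> observedValues;
    // Counts how many times states were actually extracted (for illustration only).
    int extractions = 0;

    StatesCacheSketch(Map<String, List<String>> observedValues) {
        this.observedValues = observedValues;
    }

    List<String> getPossibleStatesVariable(String nameVariable) {
        List<String> cached = statesVariables.computeIfAbsent(nameVariable, name -> {
            extractions++;
            // Distinct values in sorted order; the ordering is an assumption.
            return new ArrayList<>(new TreeSet<>(observedValues.getOrDefault(name, List.of())));
        });
        // Return a copy so that callers cannot modify the cached list.
        return new ArrayList<>(cached);
    }

    public static void main(String[] args) {
        Map<String, List<String>> raw = new HashMap<>();
        raw.put("X1", List.of("b", "a", "b"));
        StatesCacheSketch cache = new StatesCacheSketch(raw);
        List<String> states = cache.getPossibleStatesVariable("X1");
        states.clear(); // only the caller's copy is emptied
        System.out.println(cache.getPossibleStatesVariable("X1"));
    }
}
```

Returning a copy costs one allocation per call, but it keeps the cached states safe from accidental modification by callers.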
    • getLabelPowerset

      public Dataset getLabelPowerset()
      Returns a multi-class dataset generated from the multidimensional dataset.
      Returns:
      multi-class dataset
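A label powerset transformation is commonly understood as replacing several class variables with a single class variable whose states are the combinations of the original class states. The sketch below illustrates that idea for the class states of one sequence. The class `LabelPowersetSketch`, its method names and the `|` separator are hypothetical, and the names and states actually produced by getLabelPowerset are not specified here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LabelPowersetSketch {

    // Joins the names of the class variables into the name of one compound variable.
    static String toCompoundName(Map<String, String> classStates) {
        return String.join("|", classStates.keySet());
    }

    // Joins the states of the class variables into one compound state.
    static String toCompoundState(Map<String, String> classStates) {
        return String.join("|", classStates.values());
    }

    public static void main(String[] args) {
        // A LinkedHashMap keeps the class variables in insertion order.
        Map<String, String> classStates = new LinkedHashMap<>();
        classStates.put("C1", "yes");
        classStates.put("C2", "low");
        System.out.println(toCompoundName(classStates) + " = " + toCompoundState(classStates));
    }
}
```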
    • getNumClassVariables

      public int getNumClassVariables()
      Returns the number of class variables.
      Returns:
      number of class variables
    • getNumObservation

      public int getNumObservation()
      Returns the number of observations in the dataset, i.e., the total number of observations across all the sequences.
      Returns:
      number of observations
    • getStatesClassVariables

      public State[] getStatesClassVariables()
      Gets the states of the class variables for each of the sequences.
      Returns:
      array of State objects
    • getStatesVariables

      public Map<String,List<String>> getStatesVariables()
      Gets the possible states of all variables.
      Returns:
      Map linking the names of the variables with their possible states
    • setStatesVariables

      public void setStatesVariables(Map<String,List<String>> statesVariables)
      Sets states of all variables. This method is used when training and test datasets are defined, and the training dataset needs to know all possible states.
      Parameters:
      statesVariables - a Map linking the names of the variables with their possible states
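One way to obtain a map that covers all possible states is to take the union of the states found in the training and test data. The sketch below shows such a union on plain maps. The class `StatesUnionSketch` and the method `unionStates` are hypothetical helpers and are not part of the documented API. Whether the library builds the map this way is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class StatesUnionSketch {

    // Merges two "variable name -> states" maps, keeping each state once per variable.
    static Map<String, List<String>> unionStates(Map<String, List<String>> first,
                                                 Map<String, List<String>> second) {
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        for (Map<String, List<String>> source : List.of(first, second)) {
            for (Map.Entry<String, List<String>> entry : source.entrySet()) {
                merged.computeIfAbsent(entry.getKey(), name -> new LinkedHashSet<>())
                      .addAll(entry.getValue());
            }
        }
        // Convert the sets back to lists to match Map<String, List<String>>.
        Map<String, List<String>> result = new LinkedHashMap<>();
        merged.forEach((name, states) -> result.put(name, new ArrayList<>(states)));
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> training = Map.of("X1", List.of("a", "b"));
        Map<String, List<String>> test = Map.of("X1", List.of("b", "c"));
        System.out.println(unionStates(training, test));
    }
}
```

The resulting map has the same type as the statesVariables parameter, so under these assumptions it could be passed to setStatesVariables on the training dataset.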
    • initialiazeStatesClassVariables

      public void initialiazeStatesClassVariables()
      Retrieves the states of the class variables and stores them in a Map.
    • removeFeatureVariables

      public void removeFeatureVariables(List<String> namesFeatureVariables)
      Removes the specified feature variables from the dataset.
      Parameters:
      namesFeatureVariables - names of the feature variables
    • removeFeatureVariable

      public void removeFeatureVariable(String nameFeatureVariable)
      Removes the specified feature variable from the dataset.
      Parameters:
      nameFeatureVariable - feature variable name
    • setIgnoredClassVariables

      public void setIgnoredClassVariables(List<String> ignoredClassVariables)
      Sets the class variables that should be ignored.
      Parameters:
      ignoredClassVariables - names of the class variables to ignore