java.lang.Object
es.upm.fi.cig.multictbnc.data.representation.Dataset
Represents a time series dataset, which stores sequences and provides methods to access and modify their
information.
-
Constructor Summary
Constructor: Description
Dataset(nameTimeVariable): Creates an empty dataset with the name of the time variable.
Dataset(nameTimeVariable, nameClassVariables): Creates an empty dataset with the names of the time variable and class variables.
Dataset(sequences): Creates a dataset with a list of sequences.
-
Method Summary
Modifier and Type, Method: Description
void addFeatureVariable(String nameFeatureVariable): Registers the name of a feature in the dataset to allow the inclusion of sequences that contain it.
void addFeatureVariable(String nameFeatureVariable, Dataset dataset): Adds a new feature variable to the dataset given the sequences containing the transitions of the variable.
boolean addSequence(Sequence sequence): Receives a Sequence and adds it to the dataset.
boolean addSequence(List<String[]> data): Receives a list of Strings (a sequence) from which a Sequence is created and adds it to the dataset.
boolean addSequence(List<String[]> data, String filePath): Receives a list of Strings (a sequence) and the path of the file from which it was extracted.
void checkVarianceFeatures(boolean removeZeroVariance): Removes from the dataset those feature variables with zero variance.
getLabelPowerset(): Returns a multi-class dataset generated from the multidimensional dataset.
getNameAllVariables(): Returns the names of all the variables, including the time variable.
getNameClassVariables(): Returns the names of the class variables.
getNameFeatureVariables(): Returns the names of the feature variables.
getNameTimeVariable(): Returns the name of the time variable.
getNameVariables(): Returns the names of all the variables except the time variable.
int getNumClassVariables(): Returns the number of class variables.
int getNumDataPoints(): Returns the number of data points.
int getNumObservation(): Returns the number of observations in the dataset, i.e., the total number of observations across all the sequences.
getPossibleStatesVariable(String nameVariable): Returns the possible states of the specified variable.
getSequences(): Returns the sequences of the dataset.
State[] getStatesClassVariables(): Gets the states of the class variables for each of the sequences.
getStatesVariables(): Gets the possible states of all variables.
void initialiazeStatesClassVariables(): Retrieves the states of the class variables and stores them in a Map.
void removeFeatureVariable(String nameFeatureVariable): Removes the specified feature variable from the dataset.
void removeFeatureVariables(List<String> namesFeatureVariables): Removes the specified feature variables from the dataset.
void setIgnoredClassVariables(List<String> ignoredClassVariables): Sets the class variables to be ignored.
void setStatesVariables(Map<String, List<String>> statesVariables): Sets the states of all variables.
-
Constructor Details
-
Dataset
Creates an empty dataset with the names of the time variable and class variables.
Parameters:
nameTimeVariable - name of the time variable
nameClassVariables - names of the class variables
-
Dataset
Creates an empty dataset with the name of the time variable.
Parameters:
nameTimeVariable - name of the time variable
-
Dataset
Creates a dataset with a list of sequences.
Parameters:
sequences - list of Sequence objects
-
-
Method Details
-
addFeatureVariable
Registers the name of a feature in the dataset to allow the inclusion of sequences that contain it. This method requires a fill state that will be used to include the variable in those sequences.
Parameters:
nameFeatureVariable - feature variable name
-
getSequences
Returns the sequences of the dataset.
Returns:
list with the sequences of the dataset
-
addFeatureVariable
Adds a new feature variable to the dataset given the sequences containing the transitions of the variable. It is assumed that the given sequences and those of the dataset have the same length. This method ignores the time variable.
Parameters:
nameFeatureVariable - name of the feature variable to add
dataset - dataset containing sequences with the new feature variable
-
addSequence
Receives a list of Strings (a sequence) from which a Sequence is created and adds it to the dataset. The first array of Strings has to contain the names of the variables.
Parameters:
data - list of Strings (a sequence) where the first array contains the names of the variables
Returns:
true if the sequence was successfully added to the dataset; false otherwise.
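To illustrate the input layout described above, the following sketch builds a List<String[]> whose first array holds the variable names and whose remaining arrays hold one observation each. It is a stand-alone illustration that does not call the library: the variable names (t, X1, C1), the values, and the shape check are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Stand-alone sketch of the input layout; it does not use the Dataset class.
class SequenceDataSketch {

    // Builds a hypothetical sequence: header row first, then one row per observation.
    static List<String[]> exampleData() {
        List<String[]> data = new ArrayList<>();
        data.add(new String[] {"t", "X1", "C1"});   // names of the variables (assumed)
        data.add(new String[] {"0.0", "a", "yes"});
        data.add(new String[] {"0.5", "b", "yes"});
        data.add(new String[] {"1.2", "a", "yes"});
        return data;
    }

    // Returns true if every observation row has as many values as the header has names.
    static boolean hasConsistentShape(List<String[]> data) {
        if (data.isEmpty()) {
            return false;
        }
        int numVariables = data.get(0).length;
        for (String[] row : data.subList(1, data.size())) {
            if (row.length != numVariables) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String[]> data = exampleData();
        System.out.println("Variables: " + Arrays.toString(data.get(0)));
        System.out.println("Observations: " + (data.size() - 1));
        System.out.println("Consistent shape: " + hasConsistentShape(data));
    }
}
```

A list shaped like this is what the data parameter is documented to accept; whether the library performs a similar shape check before returning true or false is not stated here.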
-
addSequence
Receives a list of Strings (a sequence) and the path of the file from which it was extracted. Then, it creates a Sequence and adds it to the dataset. The first array of Strings representing the sequence has to contain the names of the variables.
Parameters:
data - list of Strings (a sequence) where the first array contains the names of the variables
filePath - path of the file from which the sequence was extracted
Returns:
true if the sequence was successfully added to the dataset; false otherwise.
-
getNumDataPoints
public int getNumDataPoints()
Returns the number of data points. In this case, this is the number of sequences.
Returns:
number of sequences
-
getNameAllVariables
Returns the names of all the variables, including the time variable.
Returns:
names of all the variables
-
getNameTimeVariable
Returns the name of the time variable.
Returns:
name of the time variable
-
getNameVariables
Returns the names of all the variables except the time variable. The returned list contains the feature variables first, followed by the class variables.
Returns:
names of all the variables except the time variable
-
getNameFeatureVariables
Returns the names of the feature variables.
Returns:
list with the names of the feature variables
-
getNameClassVariables
Returns the names of the class variables. Class variables marked as ignored are filtered out.
Returns:
list with the names of the class variables
-
addSequence
Receives a Sequence and adds it to the dataset.
Parameters:
sequence - a Sequence
Returns:
true if the sequence was successfully added to the dataset; false otherwise.
-
checkVarianceFeatures
public void checkVarianceFeatures(boolean removeZeroVariance)
Removes from the dataset those feature variables with zero variance. This method should be called only after the entire dataset has been read, since sequences added later could change the variance of a feature.
Parameters:
removeZeroVariance - true to remove variables with no variance, false otherwise
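As a rough illustration of what a zero-variance check can look like, the sketch below flags features that take a single value across all observations. This reading of "zero variance" for discrete states is an assumption, and the class, method names, and data are hypothetical stand-ins rather than the library's implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Stand-alone sketch; assumes a feature has zero variance when it never changes value.
class ZeroVarianceSketch {

    // Returns the names of the features whose observed values are all identical.
    static List<String> zeroVarianceFeatures(Map<String, List<String>> valuesPerFeature) {
        List<String> constantFeatures = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : valuesPerFeature.entrySet()) {
            if (new HashSet<>(entry.getValue()).size() <= 1) {
                constantFeatures.add(entry.getKey());
            }
        }
        return constantFeatures;
    }

    // Hypothetical values of two features pooled over every sequence.
    static Map<String, List<String>> exampleValues() {
        Map<String, List<String>> values = new LinkedHashMap<>();
        values.put("X1", List.of("a", "b", "a"));
        values.put("X2", List.of("on", "on", "on"));
        return values;
    }

    public static void main(String[] args) {
        Map<String, List<String>> values = exampleValues();
        List<String> toRemove = zeroVarianceFeatures(values);
        System.out.println("Zero-variance features: " + toRemove);
        values.keySet().removeAll(toRemove);
        System.out.println("Remaining features: " + values.keySet());
    }
}
```

The sketch also shows why the check belongs after loading: a feature that looks constant over the first sequences may take a second value in a later one.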
-
getPossibleStatesVariable
Returns the possible states of the specified variable. The states of the variable are extracted once and stored in a map to avoid recomputation. To avoid returning a reference to the same State list on every call, the State objects from the map are copied.
Parameters:
nameVariable - variable name
Returns:
states of the variable
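The combination of caching and copying described above can be sketched as follows. This is a minimal stand-in, not the library's code: plain strings replace State objects, and the class name, the extractions counter, and the sorted ordering of states are assumptions made for the illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Stand-alone sketch of "extract once, return a copy"; strings stand in for State objects.
class StateCacheSketch {

    private final Map<String, List<String>> observedValues;
    private final Map<String, List<String>> cache = new HashMap<>();
    int extractions = 0; // counts how often states are actually extracted

    StateCacheSketch(Map<String, List<String>> observedValues) {
        this.observedValues = observedValues;
    }

    List<String> getPossibleStates(String nameVariable) {
        // Extract the distinct states only on the first request for this variable.
        List<String> states = cache.computeIfAbsent(nameVariable, name -> {
            extractions++;
            return new ArrayList<>(new TreeSet<>(observedValues.getOrDefault(name, List.of())));
        });
        // Return a copy so that callers cannot alter the cached list.
        return new ArrayList<>(states);
    }

    public static void main(String[] args) {
        StateCacheSketch sketch = new StateCacheSketch(Map.of("X1", List.of("b", "a", "b")));
        List<String> first = sketch.getPossibleStates("X1");
        first.clear(); // modifying the returned copy leaves the cache untouched
        System.out.println(sketch.getPossibleStates("X1"));
        System.out.println("Extractions: " + sketch.extractions);
    }
}
```

Returning a copy costs an allocation per call, but it keeps the cached states safe from accidental modification by callers.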
-
getLabelPowerset
Returns a multi-class dataset generated from the multidimensional dataset.
Returns:
multi-class dataset
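In a label powerset transformation, each distinct combination of class-variable states is treated as one class of a single class variable. The sketch below shows the idea on hypothetical data; the "|" separator, the class name, and the use of plain strings are assumptions, and the library may build its multi-class dataset differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Stand-alone sketch of a label powerset transformation over class-variable states.
class LabelPowersetSketch {

    // Joins the class states of each sequence into one combined label.
    static List<String> toPowersetLabels(List<String[]> classStatesPerSequence) {
        List<String> labels = new ArrayList<>();
        for (String[] states : classStatesPerSequence) {
            labels.add(String.join("|", states)); // separator is an assumption
        }
        return labels;
    }

    // Hypothetical states of two class variables for three sequences.
    static List<String[]> exampleClassStates() {
        List<String[]> states = new ArrayList<>();
        states.add(new String[] {"yes", "low"});
        states.add(new String[] {"no", "low"});
        states.add(new String[] {"yes", "low"});
        return states;
    }

    public static void main(String[] args) {
        List<String> labels = toPowersetLabels(exampleClassStates());
        System.out.println("Combined labels: " + labels);
        System.out.println("Distinct classes: " + new LinkedHashSet<>(labels));
    }
}
```

The number of combined classes can grow quickly with the number of class variables, since it is bounded by the product of their state counts.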
-
getNumClassVariables
public int getNumClassVariables()
Returns the number of class variables.
Returns:
number of class variables
-
getNumObservation
public int getNumObservation()
Returns the number of observations in the dataset, i.e., the total number of observations across all the sequences.
Returns:
number of observations
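The difference between data points and observations can be made concrete with a small sketch: the number of data points counts sequences, while the number of observations sums the lengths of those sequences. The class and the example data are hypothetical, and each sequence is reduced to the array of its observation times.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone sketch; each sequence is represented only by its observation times.
class ObservationCountSketch {

    // Hypothetical dataset with two sequences of three and two observations.
    static List<double[]> exampleSequences() {
        List<double[]> sequences = new ArrayList<>();
        sequences.add(new double[] {0.0, 0.4, 1.1});
        sequences.add(new double[] {0.0, 0.9});
        return sequences;
    }

    // Data points: one per sequence.
    static int numDataPoints(List<double[]> sequences) {
        return sequences.size();
    }

    // Observations: summed over every sequence.
    static int numObservations(List<double[]> sequences) {
        int total = 0;
        for (double[] times : sequences) {
            total += times.length;
        }
        return total;
    }

    public static void main(String[] args) {
        List<double[]> sequences = exampleSequences();
        System.out.println("Data points: " + numDataPoints(sequences));    // prints 2
        System.out.println("Observations: " + numObservations(sequences)); // prints 5
    }
}
```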
-
getStatesClassVariables
Gets the states of the class variables for each of the sequences.
Returns:
array of State objects
-
getStatesVariables
Gets the possible states of all variables.
Returns:
array of State objects
-
setStatesVariables
Sets the states of all variables. This method is used when training and test datasets are defined, and the training dataset needs to know all possible states.
Parameters:
statesVariables - a Map linking the names of the variables with their possible states
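One way to obtain a map of this shape is to take, for each variable, the union of the states seen in the training and test data. The sketch below is only an assumption about how such a map could be prepared before it is passed to setStatesVariables; the class, the method name, and the data are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Stand-alone sketch: merges the per-variable states of two hypothetical datasets.
class StatesUnionSketch {

    static Map<String, List<String>> unionOfStates(Map<String, List<String>> training,
                                                   Map<String, List<String>> test) {
        // LinkedHashSet removes duplicates while keeping first-seen order.
        Map<String, Set<String>> merged = new LinkedHashMap<>();
        for (Map<String, List<String>> source : List.of(training, test)) {
            for (Map.Entry<String, List<String>> entry : source.entrySet()) {
                merged.computeIfAbsent(entry.getKey(), name -> new LinkedHashSet<>())
                      .addAll(entry.getValue());
            }
        }
        Map<String, List<String>> statesVariables = new LinkedHashMap<>();
        merged.forEach((name, states) -> statesVariables.put(name, new ArrayList<>(states)));
        return statesVariables;
    }

    public static void main(String[] args) {
        Map<String, List<String>> training = Map.of("X1", List.of("a", "b"));
        Map<String, List<String>> test = Map.of("X1", List.of("b", "c"));
        // The state "c" appears only in the test data but is kept in the union.
        System.out.println(unionOfStates(training, test));
    }
}
```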
-
initialiazeStatesClassVariables
public void initialiazeStatesClassVariables()
Retrieves the states of the class variables and stores them in a Map.
-
removeFeatureVariables
Removes the specified feature variables from the dataset.
Parameters:
namesFeatureVariables - names of the feature variables
-
removeFeatureVariable
Removes the specified feature variable from the dataset.
Parameters:
nameFeatureVariable - feature variable name
-
setIgnoredClassVariables
Sets the class variables to be ignored.
Parameters:
ignoredClassVariables - names of the class variables to ignore
-