Module es.upm.fi.cig.multictbnc
Class AbstractCSVReader
java.lang.Object
es.upm.fi.cig.multictbnc.data.reader.AbstractCSVReader
- All Implemented Interfaces:
DatasetReader
- Direct Known Subclasses:
MultipleCSVReader, SingleCSVReader
Common attributes and methods for dataset readers.
-
Constructor Summary
Constructor / Description
AbstractCSVReader(String datasetFolder)
- Receives the path to the dataset folder and initialises the reader as out-of-date.
-
Method Summary
Modifier and Type / Method / Description
void extractVariableNames(File csvFile)
- Extracts the names of the variables given in a CSV file.
getNameClassVariables()
- Returns the names of the class variables.
getNameFeatureVariables()
- Returns the names of the feature variables.
getNameTimeVariable()
- Returns the name of the time variable.
getNameVariables()
- Returns the names of all the variables of the dataset, including those that are not used.
boolean isDatasetOutdated()
- Indicates if the dataset is out-of-date.
List<String[]> readCSV(String pathFile, List<String> excludeVariables)
- Reads a CSV file.
Dataset readDataset(int numFiles)
- Creates a dataset using only the specified number of files.
void removeZeroVarianceVariables(boolean removeZeroVarianceVariables)
- Defines if the feature variables with no variance should be removed.
void setDatasetAsOutdated(boolean outdated)
- Defines a previously read dataset as out-of-date, so it should be reloaded.
void setTimeAndClassVariables(String nameTimeVariable, List<String> nameClassVariables)
- Receives the names of the time and class variables of a dataset.
void setTimeAndFeatureVariables(String nameTimeVariable, List<String> nameFeatureVariables)
- Receives the names of the time and feature variables of a dataset.
void setTimeVariable(String nameTimeVariable)
- Receives the name of the time variable of a dataset.
void setVariables(String nameTimeVariable, List<String> nameClassVariables, List<String> nameFeatureVariables)
- Receives the names of the time variable, feature variables and class variables of a dataset.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface es.upm.fi.cig.multictbnc.data.reader.DatasetReader
readDataset
-
Constructor Details
-
AbstractCSVReader
Receives the path to the dataset folder and initialises the reader as out-of-date. In that way, the dataset will be loaded when it is requested.
- Parameters:
datasetFolder
- path to the dataset folder
-
-
Method Details
-
extractVariableNames
Extracts the names of the variables given in a CSV file. This method assumes that the names are in the first row.
- Parameters:
csvFile
- CSV file
- Throws:
FileNotFoundException
- if the CSV file was not found
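The first-row assumption can be illustrated with a minimal, self-contained sketch. It is not the library's code: `HeaderSketch` and its method are hypothetical, the input is taken from a `Reader` rather than a `File` so the example runs without a file on disk, and fields are assumed to be comma-separated without quoting.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the library): reads variable names from the header row.
class HeaderSketch {

    // Returns the trimmed names found in the first row of comma-separated input.
    // Assumption: no quoted fields and no commas inside names.
    static List<String> extractVariableNames(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String header = reader.readLine();
            List<String> names = new ArrayList<>();
            if (header == null || header.trim().isEmpty()) {
                return names; // empty input: no variables
            }
            for (String name : header.split(",")) {
                names.add(name.trim());
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Reader csv = new StringReader("t, X1, X2, C1\n0.0,a,b,yes\n");
        System.out.println(extractVariableNames(csv)); // prints [t, X1, X2, C1]
    }
}
```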
-
readCSV
public List<String[]> readCSV(String pathFile, List<String> excludeVariables) throws VariableNotFoundException, FileNotFoundException
Reads a CSV file.
- Parameters:
pathFile
- path to the CSV file
excludeVariables
- names of variables to ignore when reading the CSV
- Returns:
- list with the rows (arrays of strings) of the CSV
- Throws:
VariableNotFoundException
- if a specified variable was not found in the provided files
FileNotFoundException
- if the CSV file was not found
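As a rough illustration of what ignoring variables can mean for the returned rows, the sketch below drops named columns from rows that are already in memory. It is not the library's implementation: `ExcludeColumnsSketch` is hypothetical, the header is assumed to be the first row and is kept in the result, and an unknown name is reported with `IllegalArgumentException` as a stand-in for `VariableNotFoundException`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch (not the library's implementation) of excluding columns by name.
class ExcludeColumnsSketch {

    // Drops the columns named in excludeVariables from every row, header included.
    // Assumption: rows.get(0) is the header; cells beyond the header width are ignored.
    static List<String[]> dropColumns(List<String[]> rows, List<String> excludeVariables) {
        List<String[]> result = new ArrayList<>();
        if (rows.isEmpty()) {
            return result;
        }
        List<String> header = Arrays.asList(rows.get(0));
        for (String name : excludeVariables) {
            if (!header.contains(name)) {
                // Stand-in for the library's VariableNotFoundException.
                throw new IllegalArgumentException("Variable not found: " + name);
            }
        }
        for (String[] row : rows) {
            List<String> kept = new ArrayList<>();
            for (int i = 0; i < row.length && i < header.size(); i++) {
                if (!excludeVariables.contains(header.get(i))) {
                    kept.add(row[i]);
                }
            }
            result.add(kept.toArray(new String[0]));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"t", "X1", "id"});
        rows.add(new String[] {"0.0", "a", "7"});
        for (String[] row : dropColumns(rows, Arrays.asList("id"))) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```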
-
getNameClassVariables
Description copied from interface: DatasetReader
Returns the names of the class variables.
- Specified by:
getNameClassVariables in interface DatasetReader
- Returns:
- names of the class variables.
-
getNameFeatureVariables
Description copied from interface: DatasetReader
Returns the names of the feature variables.
- Specified by:
getNameFeatureVariables in interface DatasetReader
- Returns:
- names of the feature variables.
-
getNameTimeVariable
Description copied from interface: DatasetReader
Returns the name of the time variable.
- Specified by:
getNameTimeVariable in interface DatasetReader
- Returns:
- name of the time variable.
-
getNameVariables
Description copied from interface: DatasetReader
Returns the names of all the variables of the dataset, including those that are not used.
- Specified by:
getNameVariables in interface DatasetReader
- Returns:
- names of the variables
-
isDatasetOutdated
public boolean isDatasetOutdated()
Description copied from interface: DatasetReader
Indicates if the dataset is out-of-date.
- Specified by:
isDatasetOutdated in interface DatasetReader
- Returns:
- true if the dataset is out-of-date; false otherwise.
-
readDataset
Description copied from interface: DatasetReader
Creates a dataset using only the specified number of files. This method allows reading datasets in batches.
- Specified by:
readDataset in interface DatasetReader
- Parameters:
numFiles
- number of files
- Returns:
- a Dataset
- Throws:
UnreadDatasetException
- thrown if the dataset could not be read
-
removeZeroVarianceVariables
public void removeZeroVarianceVariables(boolean removeZeroVarianceVariables)
Description copied from interface: DatasetReader
Defines if the feature variables with no variance should be removed.
- Specified by:
removeZeroVarianceVariables in interface DatasetReader
- Parameters:
removeZeroVarianceVariables
- true to remove zero-variance feature variables; false otherwise.
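For discrete data, a variable with no variance is one that takes a single value in every observation. The following sketch shows one way such variables could be detected. `ZeroVarianceSketch` is hypothetical and does not reflect how the library performs the check; rows are assumed to contain data only, with no header.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: finds the variables that never change value.
class ZeroVarianceSketch {

    // Returns the names of the columns whose cells all hold the same value.
    // Assumption: every row has one cell per name. With no rows, every column counts as constant.
    static List<String> constantColumns(List<String> names, List<String[]> rows) {
        List<String> constant = new ArrayList<>();
        for (int col = 0; col < names.size(); col++) {
            Set<String> distinct = new HashSet<>();
            for (String[] row : rows) {
                distinct.add(row[col]);
            }
            if (distinct.size() <= 1) {
                constant.add(names.get(col));
            }
        }
        return constant;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("X1", "X2");
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"a", "k"});
        rows.add(new String[] {"b", "k"});
        System.out.println(constantColumns(names, rows)); // prints [X2]
    }
}
```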
-
setDatasetAsOutdated
public void setDatasetAsOutdated(boolean outdated)
Description copied from interface: DatasetReader
Defines a previously read dataset as out-of-date, so it should be reloaded.
- Specified by:
setDatasetAsOutdated in interface DatasetReader
- Parameters:
outdated
- true to set dataset as out-of-date; false otherwise.
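One plausible reading of the out-of-date flag is a cache that reloads only when it has been marked stale. The sketch below illustrates that pattern in isolation. `CachedLoaderSketch` and all of its members are hypothetical, and the library's readers may manage the flag differently.

```java
import java.util.function.Supplier;

// Hypothetical sketch of a value that is reloaded only while flagged as out-of-date.
class CachedLoaderSketch<T> {
    private final Supplier<T> loader;
    private T cached;
    private boolean outdated = true; // starts out-of-date, so the first request loads

    CachedLoaderSketch(Supplier<T> loader) {
        this.loader = loader;
    }

    // Reloads when the cached value is out-of-date; otherwise returns the cached value.
    T get() {
        if (outdated) {
            cached = loader.get();
            outdated = false;
        }
        return cached;
    }

    void setOutdated(boolean outdated) {
        this.outdated = outdated;
    }

    boolean isOutdated() {
        return outdated;
    }

    public static void main(String[] args) {
        int[] loads = {0};
        CachedLoaderSketch<String> reader = new CachedLoaderSketch<>(() -> "dataset#" + (++loads[0]));
        System.out.println(reader.get()); // prints dataset#1
        System.out.println(reader.get()); // prints dataset#1 (cached)
        reader.setOutdated(true);
        System.out.println(reader.get()); // prints dataset#2 (reloaded)
    }
}
```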
-
setTimeAndClassVariables
Description copied from interface: DatasetReader
Receives the names of the time and class variables of a dataset. All the other variables are considered feature variables.
- Specified by:
setTimeAndClassVariables in interface DatasetReader
- Parameters:
nameTimeVariable
- name of the time variable
nameClassVariables
- names of the class variables
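The rule that all other variables are considered feature variables amounts to a set difference over the variable names. A minimal sketch, with the hypothetical `VariableRolesSketch` class standing in for whatever the reader does internally:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: derives feature variables as everything that is not time or class.
class VariableRolesSketch {

    // Returns the variables left after removing the time variable and the class variables,
    // keeping their original order.
    static List<String> inferFeatureVariables(List<String> allVariables, String nameTimeVariable,
            List<String> nameClassVariables) {
        List<String> features = new ArrayList<>(allVariables);
        features.remove(nameTimeVariable);
        features.removeAll(nameClassVariables);
        return features;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("t", "X1", "X2", "C1");
        System.out.println(inferFeatureVariables(all, "t", Arrays.asList("C1"))); // prints [X1, X2]
    }
}
```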
-
setTimeAndFeatureVariables
Description copied from interface: DatasetReader
Receives the names of the time and feature variables of a dataset. This method can be used, for example, when reading datasets to be classified.
- Specified by:
setTimeAndFeatureVariables in interface DatasetReader
- Parameters:
nameTimeVariable
- name of the time variable
nameFeatureVariables
- names of the feature variables
-
setTimeVariable
Description copied from interface: DatasetReader
Receives the name of the time variable of a dataset.
- Specified by:
setTimeVariable in interface DatasetReader
- Parameters:
nameTimeVariable
- name of the time variable
-
setVariables
public void setVariables(String nameTimeVariable, List<String> nameClassVariables, List<String> nameFeatureVariables)
Description copied from interface: DatasetReader
Receives the names of the time variable, feature variables and class variables of a dataset. This method can be used, for example, when reading training datasets.
- Specified by:
setVariables in interface DatasetReader
- Parameters:
nameTimeVariable
- name of the time variable
nameClassVariables
- names of the class variables
nameFeatureVariables
- names of the feature variables
-