Constructor and Description |
---|
SimpleDataset() - Initializes an empty dataset |

Modifier and Type | Method and Description |
---|---|
void | addExample(Example example) - Add an example to the dataset |
void | addExamples(Dataset datasetToBeAdded) - Add all the examples contained in datasetToBeAdded |
static Dataset | extractExamplesOfClasses(Dataset dataset, List<Label> labels) - Extracts the examples of the given labels from dataset |
List<Label> | getClassificationLabels() - Returns all the classification labels in the dataset |
Example | getExample(int exampleIndex) - Return the example stored in the exampleIndex position |
List<Example> | getExamples() - Returns a list containing all the stored examples |
Example | getNextExample() - Returns the next Example stored in the Dataset |
List<Example> | getNextExamples(int n) - Returns the next n Examples stored in the Dataset, or fewer if n examples are not available |
int | getNumberOfExamples() - Returns the number of Examples in the dataset |
int | getNumberOfNegativeExamples(Label positiveClass) - Returns the number of negative Examples of a given class |
int | getNumberOfPositiveExamples(Label positiveClass) - Returns the number of positive Examples of a given class |
Example | getRandExample() |
List<Example> | getRandExamples(int k) |
List<Label> | getRegressionProperties() - Returns all the regression properties in the dataset |
SimpleDataset | getShuffledDataset() |
Vector | getZeroVector(String representationIdentifier) - Returns a zero vector compliant with the representation identified by representationIdentifier, i.e. a vector containing all zeros |
boolean | hasNextExample() - Returns a boolean declaring whether there are other Examples in the dataset |
void | manipulate(Manipulator... manipulators) - Manipulates all the examples in the dataset according to the strategies defined by the given manipulators |
SimpleDataset[] | nFolding(int n) - Returns n datasets |
SimpleDataset[] | nFoldingClassDistributionInvariant(int n) - Returns n datasets |
void | populate(String filename) - Populate the dataset by reading it from a platform-compliant file |
void | reset() - Reset the reading pointer |
void | save(String outputFilePath) - Save the dataset in a file |
void | setSeed(long seed) - Sets the seed of the random generator used for shuffling examples and getting random examples |
void | shuffleExamples(Random randomGenerator) - Shuffles the examples in the dataset |
SimpleDataset[] | split(float percentage) - Returns two datasets created by splitting this dataset according to percentage |
SimpleDataset[] | splitClassDistributionInvariant(float percentage) - Returns two datasets created by splitting this dataset according to percentage |

public void addExample(Example example)
Add an example to the dataset.
Specified by:
addExample in interface Dataset
Parameters:
example - the example to be added

public void addExamples(Dataset datasetToBeAdded)
Add all the examples contained in datasetToBeAdded.
Parameters:
datasetToBeAdded - the dataset containing all the examples to be added

public Example getExample(int exampleIndex)
Return the example stored in the exampleIndex position.
Parameters:
exampleIndex - the index of the example to return
Returns:
the example stored in the exampleIndex position

public boolean hasNextExample()
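Description copied from interface: Dataset
Returns a boolean declaring whether there are other Examples in the dataset.
Specified by:
hasNextExample in interface Dataset
Returns:
true if and only if there is at least another Example in the dataset

public Example getNextExample()
Description copied from interface: Dataset
Returns the next Example stored in the Dataset.
Specified by:
getNextExample in interface Dataset
Returns:
the next Example

public List<Example> getNextExamples(int n)
Description copied from interface: Dataset
Returns the next n Examples stored in the Dataset, or fewer if n examples are not available.
Specified by:
getNextExamples in interface Dataset
Parameters:
n - the number of examples to be returned
Returns:
the next n Examples

public void reset()
Description copied from interface: Dataset
Reset the reading pointer.
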
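The reading-pointer methods (hasNextExample, getNextExample, getNextExamples and reset) behave like a resettable cursor over the stored examples. The following is a minimal sketch of that idea on a plain Java list, not the SimpleDataset implementation: the class CursorSketch, its method names, and the choice to return null when nothing is left are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a dataset with a reading pointer; not library code. */
class CursorSketch<T> {
    private final List<T> items = new ArrayList<>();
    private int next = 0; // reading pointer: index of the next item to return

    void add(T item) {
        items.add(item);
    }

    /** True while at least one item has not been read yet. */
    boolean hasNext() {
        return next < items.size();
    }

    /** Returns the next item; returning null when exhausted is an assumption. */
    T getNext() {
        return hasNext() ? items.get(next++) : null;
    }

    /** Returns up to n items, or fewer if fewer than n remain. */
    List<T> getNext(int n) {
        int count = Math.min(Math.max(n, 0), items.size() - next);
        List<T> batch = new ArrayList<>(items.subList(next, next + count));
        next += count;
        return batch;
    }

    /** Moves the reading pointer back to the first item. */
    void reset() {
        next = 0;
    }
}
```

Reading in batches until hasNext() is false and then calling reset() allows several passes over the same data, which is a common pattern for multi-epoch training loops.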
public int getNumberOfPositiveExamples(Label positiveClass)
Description copied from interface: Dataset
Returns the number of positive Examples of a given class.
Specified by:
getNumberOfPositiveExamples in interface Dataset
Parameters:
positiveClass - the class whose number of positive Examples is required
Returns:
the number of positive Examples of positiveClass

public int getNumberOfNegativeExamples(Label positiveClass)
Description copied from interface: Dataset
Returns the number of negative Examples of a given class.
Specified by:
getNumberOfNegativeExamples in interface Dataset
Parameters:
positiveClass - the class whose number of negative Examples is required
Returns:
the number of negative Examples of positiveClass

public int getNumberOfExamples()
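Description copied from interface: Dataset
Returns the number of Examples in the dataset.
Specified by:
getNumberOfExamples in interface Dataset
Returns:
the number of Examples in the dataset

public List<Label> getClassificationLabels()
Description copied from interface: Dataset
Returns all the classification labels in the dataset.
Specified by:
getClassificationLabels in interface Dataset
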
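The positive and negative counts above are relative to one class at a time. A rough sketch of that counting, under the assumption that an example is positive for a class when it carries that label and negative otherwise, could look like this. Labels are modelled as plain strings and LabelCountSketch is a hypothetical helper, not part of the documented API.

```java
import java.util.List;
import java.util.Set;

/** Hypothetical helper: each example is represented only by its set of labels. */
class LabelCountSketch {
    /** Counts the examples whose label set contains positiveClass. */
    static int countPositive(List<Set<String>> labelsPerExample, String positiveClass) {
        int count = 0;
        for (Set<String> labels : labelsPerExample) {
            if (labels.contains(positiveClass)) {
                count++;
            }
        }
        return count;
    }

    /** Every example that is not positive for the class is counted as negative (assumption). */
    static int countNegative(List<Set<String>> labelsPerExample, String positiveClass) {
        return labelsPerExample.size() - countPositive(labelsPerExample, positiveClass);
    }
}
```

Under this assumption the two counts always sum to the total number of examples.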
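public List<Label> getRegressionProperties()
Description copied from interface: Dataset
Returns all the regression properties in the dataset.
Specified by:
getRegressionProperties in interface Dataset

public void shuffleExamples(Random randomGenerator)
Shuffles the examples in the dataset.
Parameters:
randomGenerator - a random number generator

public SimpleDataset[] splitClassDistributionInvariant(float percentage)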
Returns two datasets created by splitting this dataset according to percentage. The original distribution of the examples among the classes is maintained in the two datasets. The examples are split according to their order. Thus the first dataset consists of the first percentage share of the examples of each class (percentage being a fraction in [0,1]), while the second dataset consists of all the remaining examples.
Parameters:
percentage - should be a number in [0,1]

public SimpleDataset[] split(float percentage)
Returns two datasets created by splitting this dataset according to percentage. The examples are split according to their order, without maintaining the original data distribution among the classes. Thus the first dataset consists of the first percentage share of the examples (percentage being a fraction in [0,1]), while the second dataset consists of all the remaining examples.
Parameters:
percentage - should be a number in [0,1]

public SimpleDataset[] nFoldingClassDistributionInvariant(int n)
Returns n datasets. Each dataset is a fold storing 1/n of the total examples. The folds do not overlap and maintain the original distribution of the examples among the classes. The examples in this dataset are split into n folds according to their order, so that, for instance, the first fold has all the first examples of each class.
Parameters:
n - the number of folds to create
Returns:
n datasets, each one consisting of 1/n of the examples

public SimpleDataset[] nFolding(int n)
Returns n datasets. Each dataset is a fold storing 1/n of the total examples. The folds do not overlap and do not maintain the original distribution of the examples among the classes. The examples in this dataset are split into n folds according to their order, so that, for instance, the first fold has all the first examples.
Parameters:
n - the number of folds to create
Returns:
n datasets, each one consisting of 1/n of the examples

public List<Example> getExamples()
Description copied from interface: Dataset
Returns a list containing all the stored examples.
Specified by:
getExamples in interface Dataset

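The order-based behaviour described for split and nFolding can be illustrated on a plain list. This is only a sketch under stated assumptions: OrderedSplitSketch is a hypothetical class, the rounding of the cut point and the way a remainder is spread over the folds are guesses, and the real methods return SimpleDataset[] rather than lists.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical order-preserving split and folding on plain lists; not library code. */
class OrderedSplitSketch {
    /** First part: the leading fraction of the items; second part: the rest. */
    static <T> List<List<T>> split(List<T> items, float fraction) {
        if (fraction < 0f || fraction > 1f) {
            throw new IllegalArgumentException("fraction must be in [0,1]");
        }
        // Rounding to the nearest integer is an assumption.
        int cut = Math.round(items.size() * fraction);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(items.subList(0, cut)));
        parts.add(new ArrayList<>(items.subList(cut, items.size())));
        return parts;
    }

    /** n contiguous, non-overlapping folds whose sizes differ by at most one. */
    static <T> List<List<T>> nFolding(List<T> items, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<T>> folds = new ArrayList<>();
        int size = items.size();
        for (int i = 0; i < n; i++) {
            int from = (int) ((long) size * i / n);
            int to = (int) ((long) size * (i + 1) / n);
            folds.add(new ArrayList<>(items.subList(from, to)));
        }
        return folds;
    }
}
```

Because both operations depend on the stored order, shuffling first (for example with shuffleExamples) changes which examples end up in each part.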
public static Dataset extractExamplesOfClasses(Dataset dataset, List<Label> labels) throws InstantiationException, IllegalAccessException
This method extracts examples of given labels from dataset.
Parameters:
dataset - original dataset
labels - labels of interest
Throws:
IllegalAccessException
InstantiationException

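Selecting examples by label is also the natural building block for the class-distribution-invariant split described earlier: group the examples by class, then cut each group separately. The sketch below shows both steps on a simplified model. The Item class (one label per example), the helper names, and the rounding rule are assumptions for illustration, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical class-aware helpers on a simplified single-label example type. */
class ClassAwareSketch {
    /** Simplified example: one label plus an opaque payload. */
    static class Item {
        final String label;
        final String payload;

        Item(String label, String payload) {
            this.label = label;
            this.payload = payload;
        }
    }

    /** Keeps, in their original order, the items whose label is of interest. */
    static List<Item> extractOfClasses(List<Item> items, List<String> labels) {
        List<Item> kept = new ArrayList<>();
        for (Item item : items) {
            if (labels.contains(item.label)) {
                kept.add(item);
            }
        }
        return kept;
    }

    /** Cuts every class at the same fraction so both parts keep the class proportions. */
    static List<List<Item>> splitClassInvariant(List<Item> items, float fraction) {
        if (fraction < 0f || fraction > 1f) {
            throw new IllegalArgumentException("fraction must be in [0,1]");
        }
        Map<String, List<Item>> byLabel = new LinkedHashMap<>();
        for (Item item : items) {
            byLabel.computeIfAbsent(item.label, k -> new ArrayList<>()).add(item);
        }
        List<Item> first = new ArrayList<>();
        List<Item> second = new ArrayList<>();
        for (List<Item> group : byLabel.values()) {
            int cut = Math.round(group.size() * fraction); // rounding is an assumption
            first.addAll(group.subList(0, cut));
            second.addAll(group.subList(cut, group.size()));
        }
        return List.of(first, second);
    }
}
```

With four examples of class A, two of class B and a fraction of 0.5, each part receives two A examples and one B example, so the 2:1 ratio is preserved in both.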
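public void populate(String filename) throws Exception
Populate the dataset by reading it from a platform-compliant file.
Parameters:
filename - the path of the file to be read
Throws:
Exception

public Example getRandExample()
Specified by:
getRandExample in interface Dataset

public List<Example> getRandExamples(int k)
Specified by:
getRandExamples in interface Dataset
Parameters:
k - the number of examples to be returned
Returns:
k random examples. NOTE: Duplicates are allowed

public SimpleDataset getShuffledDataset()
Specified by:
getShuffledDataset in interface Dataset

public void setSeed(long seed)
Description copied from interface: Dataset
Sets the seed of the random generator used for shuffling examples and getting random examples.
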
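The note that getRandExamples allows duplicates suggests sampling with replacement, and setSeed makes such draws repeatable. A minimal sketch of both ideas on a plain list follows. RandomAccessSketch and its method names are hypothetical, and the assumption that a shuffled copy leaves the original order untouched is not confirmed by the documentation above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical seeded sampling and shuffling on a plain list; not library code. */
class RandomAccessSketch<T> {
    private final List<T> items;
    private Random random = new Random();

    RandomAccessSketch(List<T> items) {
        this.items = new ArrayList<>(items);
    }

    /** Re-creates the generator so later draws are reproducible for a given seed. */
    void setSeed(long seed) {
        random = new Random(seed);
    }

    /** Draws k items independently, so the same item can appear more than once. */
    List<T> getRandItems(int k) {
        if (items.isEmpty() && k > 0) {
            throw new IllegalStateException("cannot sample from an empty collection");
        }
        List<T> drawn = new ArrayList<>();
        for (int i = 0; i < k; i++) {
            drawn.add(items.get(random.nextInt(items.size())));
        }
        return drawn;
    }

    /** Returns a shuffled copy; keeping the original order intact is an assumption. */
    List<T> getShuffledCopy() {
        List<T> copy = new ArrayList<>(items);
        Collections.shuffle(copy, random);
        return copy;
    }
}
```

Two instances seeded with the same value produce the same sequence of draws, which is useful for making experiments repeatable.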
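public Vector getZeroVector(String representationIdentifier)
Description copied from interface: Dataset
Returns a zero vector compliant with the representation identified by representationIdentifier, i.e. a vector containing all zeros.
Specified by:
getZeroVector in interface Dataset
Parameters:
representationIdentifier - the identifier of the representation
Returns:
a zero vector compliant with the representation identified by representationIdentifier, containing all zeros

public void manipulate(Manipulator... manipulators)
Description copied from interface: Dataset
Manipulates all the examples in the dataset according to the strategies defined by the given manipulators. Every manipulator in the array is applied to the examples.
Specified by:
manipulate in interface Dataset
Parameters:
manipulators - the manipulators that must be applied to all the examples in the dataset

public void save(String outputFilePath) throws FileNotFoundException, IOException
Save the dataset in a file.
Parameters:
outputFilePath - the file path
Throws:
FileNotFoundException
IOException
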
Copyright © 2015 Semantic Analytics Group @ Uniroma2. All rights reserved.