public class SimpleDataset extends Object implements Dataset

Constructor | Description |
---|---|
SimpleDataset() | Initializes an empty dataset. |

Modifier and Type | Method | Description |
---|---|---|
void | addExample(Example example) | Adds an example to the dataset. |
void | addExamples(Dataset datasetToBeAdded) | Adds all the examples contained in datasetToBeAdded. |
static Dataset | extractExamplesOfClasses(Dataset dataset, List<Label> labels) | Extracts the examples of the given labels from dataset. |
List<Label> | getClassificationLabels() | Returns all the classification labels in the dataset. |
Example | getExample(int exampleIndex) | Returns the example stored at position exampleIndex. |
List<Example> | getExamples() | Returns a list containing all the stored examples. |
Example | getNextExample() | Returns the next Example stored in the dataset. |
List<Example> | getNextExamples(int n) | Returns the next n Examples stored in the dataset, or fewer if n examples are not available. |
int | getNumberOfExamples() | Returns the number of Examples in the dataset. |
int | getNumberOfNegativeExamples(Label positiveClass) | Returns the number of negative Examples of a given class. |
int | getNumberOfPositiveExamples(Label positiveClass) | Returns the number of positive Examples of a given class. |
Example | getRandExample() | |
List<Example> | getRandExamples(int k) | |
List<Label> | getRegressionProperties() | Returns all the regression properties in the dataset. |
SimpleDataset | getShuffledDataset() | |
Vector | getZeroVector(String representationIdentifier) | Returns a vector containing all zeros that is compliant with the representation identified by representationIdentifier. |
boolean | hasNextExample() | Returns whether there are other Examples in the dataset. |
boolean | isConsistent() | Evaluates whether the examples included in this dataset are compatible with each other. |
void | manipulate(Manipulator... manipulators) | Manipulates all the examples in the dataset according to the strategies defined by the given manipulators. |
SimpleDataset[] | nFolding(int n) | Returns n non-overlapping folds of this dataset, following the order of the examples and without preserving the class distribution. |
SimpleDataset[] | nFoldingClassDistributionInvariant(int n) | Returns n non-overlapping folds of this dataset, each preserving the original class distribution. |
void | populate(DatasetReader reader) | Populates the dataset using the provided reader. |
void | populate(String filename) | Populates the dataset by reading it from a KeLP-compliant file. |
void | reset() | Resets the reading pointer. |
void | save(String outputFilePath) | Saves the dataset to a file. |
void | setSeed(long seed) | Sets the seed of the random generator used for shuffling examples and drawing random examples. |
void | shuffleExamples(Random randomGenerator) | Shuffles the examples in the dataset. |
SimpleDataset[] | split(float percentage) | Returns two datasets created by splitting this dataset according to percentage, without preserving the class distribution. |
SimpleDataset[] | splitClassDistributionInvariant(float percentage) | Returns two datasets created by splitting this dataset according to percentage, preserving the class distribution. |

public void addExample(Example example)
Adds an example to the dataset.
Specified by: addExample in interface Dataset
Parameters: example - the example to be added

public void addExamples(Dataset datasetToBeAdded)
Adds all the examples contained in datasetToBeAdded.
Parameters: datasetToBeAdded - the dataset containing all the examples to be added

public Example getExample(int exampleIndex)
Returns the example stored at position exampleIndex.
Parameters: exampleIndex - the index of the example to return
Returns: the example stored at position exampleIndex

public boolean hasNextExample()
Returns whether there are other Examples in the dataset.
Specified by: hasNextExample in interface Dataset
Returns: true if and only if there is at least one more Example in the dataset

public Example getNextExample()
Returns the next Example stored in the dataset.
Specified by: getNextExample in interface Dataset
Returns: the next Example

public List<Example> getNextExamples(int n)
Returns the next n Examples stored in the dataset, or fewer if n examples are not available.
Specified by: getNextExamples in interface Dataset
Parameters: n - the number of examples to be returned
Returns: the next n Examples, or fewer if n examples are not available

public void reset()
Resets the reading pointer.
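
The reading-pointer methods above (hasNextExample, getNextExample, getNextExamples and reset) can be pictured with the minimal sketch below. PointerDatasetSketch is a hypothetical class written for illustration, not KeLP's SimpleDataset: a generic type E stands in for Example, and returning null from an exhausted dataset is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for the reading-pointer behaviour of a dataset.
 * Not KeLP code: the element type E stands in for Example.
 */
public class PointerDatasetSketch<E> {
    private final List<E> examples = new ArrayList<>();
    private int pointer = 0; // index of the next example to be returned

    public void addExample(E example) {
        examples.add(example);
    }

    public boolean hasNextExample() {
        return pointer < examples.size();
    }

    public E getNextExample() {
        // Assumption: an exhausted dataset yields null instead of throwing.
        return hasNextExample() ? examples.get(pointer++) : null;
    }

    public List<E> getNextExamples(int n) {
        // At most n examples; fewer if the end of the dataset is reached.
        int end = pointer + Math.min(Math.max(n, 0), examples.size() - pointer);
        List<E> next = new ArrayList<>(examples.subList(pointer, end));
        pointer = end;
        return next;
    }

    public void reset() {
        pointer = 0; // rewind to the first example
    }

    public static void main(String[] args) {
        PointerDatasetSketch<String> ds = new PointerDatasetSketch<>();
        for (String s : new String[] {"a", "b", "c"}) {
            ds.addExample(s);
        }
        System.out.println(ds.getNextExamples(2)); // [a, b]
        System.out.println(ds.getNextExamples(2)); // [c]
        System.out.println(ds.hasNextExample());   // false
        ds.reset();
        System.out.println(ds.getNextExample());   // a
    }
}
```

The second call to getNextExamples(2) returns a single element, mirroring the "or fewer if n examples are not available" clause.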

public int getNumberOfPositiveExamples(Label positiveClass)
Returns the number of positive Examples of a given class.
Specified by: getNumberOfPositiveExamples in interface Dataset
Parameters: positiveClass - the class whose number of positive Examples is required
Returns: the number of positive Examples of positiveClass

public int getNumberOfNegativeExamples(Label positiveClass)
Returns the number of negative Examples of a given class.
Specified by: getNumberOfNegativeExamples in interface Dataset
Parameters: positiveClass - the class whose number of negative Examples is required
Returns: the number of negative Examples of positiveClass

public int getNumberOfExamples()
Returns the number of Examples in the dataset.
Specified by: getNumberOfExamples in interface Dataset
Returns: the number of Examples in the dataset

public List<Label> getClassificationLabels()
Returns all the classification labels in the dataset.
Specified by: getClassificationLabels in interface Dataset
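
A minimal sketch of positive and negative counting for getNumberOfPositiveExamples and getNumberOfNegativeExamples, assuming a one-vs-all reading: an example is positive for positiveClass if it carries that label and negative otherwise. ClassCountSketch is hypothetical; examples are modelled as sets of string labels standing in for Label, and the one-vs-all rule is an assumption rather than KeLP's documented behaviour.

```java
import java.util.List;
import java.util.Set;

/**
 * Hypothetical one-vs-all counting sketch; not KeLP's implementation.
 * Each example is modelled as the set of its labels (strings stand in for Label).
 */
public class ClassCountSketch {

    /** Examples whose label set contains positiveClass. */
    public static int countPositive(List<Set<String>> examples, String positiveClass) {
        int count = 0;
        for (Set<String> labels : examples) {
            if (labels.contains(positiveClass)) {
                count++;
            }
        }
        return count;
    }

    /** Assumption: every example that is not positive counts as negative. */
    public static int countNegative(List<Set<String>> examples, String positiveClass) {
        return examples.size() - countPositive(examples, positiveClass);
    }

    public static void main(String[] args) {
        List<Set<String>> examples = List.of(
                Set.of("sport"), Set.of("politics"), Set.of("sport", "economy"));
        System.out.println(countPositive(examples, "sport")); // 2
        System.out.println(countNegative(examples, "sport")); // 1
    }
}
```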

public List<Label> getRegressionProperties()
Returns all the regression properties in the dataset.
Specified by: getRegressionProperties in interface Dataset

public void shuffleExamples(Random randomGenerator)
Shuffles the examples in the dataset.
Parameters: randomGenerator - a random number generator

public SimpleDataset[] splitClassDistributionInvariant(float percentage)
Returns two datasets created by splitting this dataset according to percentage. The original distribution of the examples among the classes is maintained in the two datasets. The examples are split according to their order: the first dataset consists of the first percentage × 100% of the examples of each class, while the second dataset consists of all the remaining examples.
Parameters: percentage - should be a number in [0,1]

public SimpleDataset[] split(float percentage)
Returns two datasets created by splitting this dataset according to percentage. The examples are split according to their order, without maintaining the original distribution of the examples among the classes: the first dataset consists of the first percentage × 100% of the examples, while the second dataset consists of all the remaining examples.
Parameters: percentage - should be a number in [0,1]

public SimpleDataset[] nFoldingClassDistributionInvariant(int n)
Returns n datasets. Each dataset is a fold storing 1/n of the total examples. The folds do not overlap and maintain the original distribution of the examples among the classes. The examples in this dataset are split into n folds according to their order, so that, for instance, the first fold contains the first examples of each class.
Parameters: n - the number of folds to create
Returns: n datasets, each consisting of 1/n of the examples

public SimpleDataset[] nFolding(int n)
Returns n datasets. Each dataset is a fold storing 1/n of the total examples. The folds do not overlap and do not maintain the original distribution of the examples among the classes. The examples in this dataset are split into n folds according to their order, so that, for instance, the first fold contains the first examples.
Parameters: n - the number of folds to create
Returns: n datasets, each consisting of 1/n of the examples

public List<Example> getExamples()
Returns a list containing all the stored examples.
Specified by: getExamples in interface Dataset
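
The order-based splitting and folding described for split, splitClassDistributionInvariant, nFolding and nFoldingClassDistributionInvariant can be sketched as follows. SplitSketch is hypothetical and works on plain lists rather than SimpleDataset; rounding the cut point to the nearest integer, letting the last fold absorb any remainder, and supplying the class through a labelOf function are all assumptions, not KeLP's documented behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical, order-preserving split and n-fold sketch; not KeLP's implementation. */
public class SplitSketch {

    /** First part: the first round(size * percentage) examples; second part: the rest. */
    public static <E> List<List<E>> split(List<E> examples, float percentage) {
        int cut = Math.round(examples.size() * percentage); // assumption: round to nearest
        List<List<E>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(examples.subList(0, cut)));
        parts.add(new ArrayList<>(examples.subList(cut, examples.size())));
        return parts;
    }

    /** n contiguous, non-overlapping folds; assumption: the last fold absorbs the remainder. */
    public static <E> List<List<E>> nFolding(List<E> examples, int n) {
        int foldSize = examples.size() / n;
        List<List<E>> folds = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            int from = i * foldSize;
            int to = (i == n - 1) ? examples.size() : from + foldSize;
            folds.add(new ArrayList<>(examples.subList(from, to)));
        }
        return folds;
    }

    /** Splits each class separately, so both parts keep the class proportions. */
    public static <E, L> List<List<E>> splitPerClass(List<E> examples, float percentage,
                                                     Function<E, L> labelOf) {
        List<List<E>> parts = emptyParts(2);
        for (List<E> classExamples : groupByClass(examples, labelOf).values()) {
            List<List<E>> classParts = split(classExamples, percentage);
            for (int i = 0; i < 2; i++) {
                parts.get(i).addAll(classParts.get(i));
            }
        }
        return parts;
    }

    /** Folds each class separately and concatenates fold i of every class. */
    public static <E, L> List<List<E>> nFoldingPerClass(List<E> examples, int n,
                                                        Function<E, L> labelOf) {
        List<List<E>> folds = emptyParts(n);
        for (List<E> classExamples : groupByClass(examples, labelOf).values()) {
            List<List<E>> classFolds = nFolding(classExamples, n);
            for (int i = 0; i < n; i++) {
                folds.get(i).addAll(classFolds.get(i));
            }
        }
        return folds;
    }

    // Groups examples by label, keeping classes in first-seen order and examples in input order.
    private static <E, L> Map<L, List<E>> groupByClass(List<E> examples, Function<E, L> labelOf) {
        Map<L, List<E>> byClass = new LinkedHashMap<>();
        for (E e : examples) {
            byClass.computeIfAbsent(labelOf.apply(e), k -> new ArrayList<>()).add(e);
        }
        return byClass;
    }

    private static <E> List<List<E>> emptyParts(int count) {
        List<List<E>> parts = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            parts.add(new ArrayList<>());
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> data = List.of("p1", "p2", "p3", "p4", "n1", "n2");
        System.out.println(split(data, 0.5f));                                 // [[p1, p2, p3], [p4, n1, n2]]
        System.out.println(splitPerClass(data, 0.5f, s -> s.substring(0, 1))); // [[p1, p2, n1], [p3, p4, n2]]
        System.out.println(nFolding(data, 4));                                 // [[p1], [p2], [p3], [p4, n1, n2]]
    }
}
```

Because every variant follows the existing order of the examples, the parts are only random if the examples were shuffled beforehand, for instance with shuffleExamples.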

public static Dataset extractExamplesOfClasses(Dataset dataset, List<Label> labels) throws InstantiationException, IllegalAccessException
Extracts the examples of the given labels from dataset.
Parameters: dataset - the original dataset; labels - the labels of interest
Throws: IllegalAccessException, InstantiationException
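
A minimal filtering sketch of what extracting "examples of given labels" can mean, assuming an example is kept when it carries at least one of the requested labels and that the original order is preserved. ExtractClassesSketch is hypothetical; examples are modelled as sets of string labels standing in for Label, and it returns a plain list rather than a Dataset.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of extracting examples by label; not KeLP's implementation.
 * Each example is modelled as the set of its labels (strings stand in for Label).
 */
public class ExtractClassesSketch {

    /** Keeps, in their original order, the examples that carry at least one of the given labels. */
    public static List<Set<String>> extract(List<Set<String>> examples, List<String> labels) {
        List<Set<String>> kept = new ArrayList<>();
        for (Set<String> exampleLabels : examples) {
            for (String label : labels) {
                if (exampleLabels.contains(label)) {
                    kept.add(exampleLabels);
                    break; // add each example once, even if it matches several labels
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Set<String>> examples = List.of(Set.of("sport"), Set.of("politics"), Set.of("economy"));
        System.out.println(extract(examples, List.of("sport", "economy")).size()); // 2
    }
}
```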

public void populate(String filename) throws Exception
Populates the dataset by reading it from a KeLP-compliant file.
Parameters: filename - the path of the file to be read
Throws: Exception

public void populate(DatasetReader reader) throws Exception
Populates the dataset using the provided reader.
Parameters: reader - the reader
Throws: Exception

public Example getRandExample()
Specified by: getRandExample in interface Dataset

public List<Example> getRandExamples(int k)
Specified by: getRandExamples in interface Dataset
Parameters: k - the number of examples to be returned
Returns: k random examples. NOTE: duplicates are allowed.

public SimpleDataset getShuffledDataset()
Specified by: getShuffledDataset in interface Dataset

public void setSeed(long seed)
Sets the seed of the random generator used for shuffling examples and drawing random examples.
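
The random-access methods above (getRandExample, getRandExamples, getShuffledDataset and setSeed) can be illustrated with the sketch below, which draws with replacement (hence possible duplicates) from a single seeded java.util.Random. RandomSamplingSketch is hypothetical; returning a shuffled copy while leaving the original order untouched is an assumption about getShuffledDataset, not KeLP's documented behaviour.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of seeded sampling and shuffling; not KeLP's implementation. */
public class RandomSamplingSketch<E> {
    private final List<E> examples;
    private final Random random = new Random();

    public RandomSamplingSketch(List<E> examples) {
        this.examples = new ArrayList<>(examples);
    }

    /** Re-seeds the generator so that later draws and shuffles are reproducible. */
    public void setSeed(long seed) {
        random.setSeed(seed);
    }

    /** One example drawn uniformly at random. */
    public E getRandExample() {
        return examples.get(random.nextInt(examples.size()));
    }

    /** k independent draws with replacement, so duplicates are possible and k may exceed the size. */
    public List<E> getRandExamples(int k) {
        List<E> sample = new ArrayList<>(k);
        for (int i = 0; i < k; i++) {
            sample.add(getRandExample());
        }
        return sample;
    }

    /** Assumption: a shuffled copy is returned and the original order is left untouched. */
    public List<E> getShuffledCopy() {
        List<E> copy = new ArrayList<>(examples);
        Collections.shuffle(copy, random);
        return copy;
    }

    public static void main(String[] args) {
        RandomSamplingSketch<String> ds = new RandomSamplingSketch<>(List.of("a", "b", "c"));
        ds.setSeed(42L);
        System.out.println(ds.getRandExamples(5)); // five draws; repeats are possible
        System.out.println(ds.getShuffledCopy());  // some permutation of a, b, c
    }
}
```

Two instances seeded with the same value produce the same sequence of draws, which is the point of exposing setSeed.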

public Vector getZeroVector(String representationIdentifier)
Returns a vector containing all zeros that is compliant with the representation identified by representationIdentifier.
NOTE: it assumes that there is at least one example in the dataset and that the representation is directly available on the example through the getRepresentation method (i.e., the example is not an ExamplePair storing the representation in its left or right element).
Specified by: getZeroVector in interface Dataset
Parameters: representationIdentifier - the identifier of the representation
Returns: a zero vector compliant with representationIdentifier

public void manipulate(Manipulator... manipulators)
Manipulates all the examples in the dataset according to the strategies defined by the given manipulators. … manipulator in the array.
Specified by: manipulate in interface Dataset
Parameters: manipulators - the manipulators that must be applied to all the examples in the dataset

public void save(String outputFilePath) throws FileNotFoundException, IOException
Saves the dataset to a file.
Parameters: outputFilePath - the file path
Throws: FileNotFoundException, IOException
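
A minimal sketch of applying several manipulators to every example, as manipulate(Manipulator... manipulators) describes. ManipulateSketch is hypothetical: java.util.function.Consumer stands in for KeLP's Manipulator, examples are mutable objects changed in place, and applying the manipulators in array order to each example is an assumption.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of dataset-wide manipulation; not KeLP's implementation. */
public class ManipulateSketch {

    /** Applies every manipulator, in array order (assumption), to every example in place. */
    @SafeVarargs
    public static <E> void manipulate(List<E> examples, Consumer<? super E>... manipulators) {
        for (E example : examples) {
            for (Consumer<? super E> manipulator : manipulators) {
                manipulator.accept(example);
            }
        }
    }

    public static void main(String[] args) {
        List<StringBuilder> examples = List.of(new StringBuilder("a"), new StringBuilder("b"));
        manipulate(examples, sb -> sb.append("1"), sb -> sb.append("2"));
        System.out.println(examples); // [a12, b12]
    }
}
```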

public boolean isConsistent()
Evaluates whether the examples included in this dataset are compatible with each other.

Copyright © 2018 Semantic Analytics Group @ Uniroma2. All rights reserved.