Let’s start with a very simple classification example based on a linear version of the Passive Aggressive algorithm (LinearPassiveAggressiveClassification). The full code for this example can be found in the GitHub repository kelp-full, specifically in the source file HelloLearning.java.
The dataset used here is the same as the one on the svmlight page; each example has only been modified to be readable by KeLP. In fact, a single row in KeLP must indicate what kind of vectors you are using, Sparse or Dense. The svmlight dataset contains sparse vectors, so if you open the train.dat and test.dat files you will notice that each vector is enclosed in BeginVector (|BV|) and EndVector (|EV|) tags.
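Based on the description above, a row in this format can be expected to look roughly as follows: the class label first, then the sparse vector (svmlight-style index:value pairs) enclosed in the |BV|/|EV| tags. The indices and values below are invented for illustration; check the actual train.dat and test.dat files for the exact layout.

```
+1 |BV| 6:0.0198 11:0.0339 29:0.1024 |EV|
```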
The classification task consists of classifying each example with respect to the “+1” and “-1” classes. The dataset is thus composed of examples of these two classes:
- Training set (2000 examples, 1000 of class “+1” (positive), and 1000 of class “-1” (negative))
- Test set (600 examples, 300 of class “+1” (positive), and 300 of class “-1” (negative))
Let’s start writing some Java code.
First of all, we need to load the datasets into memory and define the positive class of the classification problem.
// Read a dataset into a trainingSet variable
SimpleDataset trainingSet = new SimpleDataset();
trainingSet.populate("src/main/resources/hellolearning/train.klp");
// Read a dataset into a test variable
SimpleDataset testSet = new SimpleDataset();
testSet.populate("src/main/resources/hellolearning/test.klp");
// define the positive class
StringLabel positiveClass = new StringLabel("+1");
If you want, you can print some statistics about the datasets through some useful built-in methods.
// print some statistics
System.out.println("Training set statistics");
System.out.print("Examples number ");
System.out.println(trainingSet.getNumberOfExamples());
System.out.print("Positive examples ");
System.out.println(trainingSet.getNumberOfPositiveExamples(positiveClass));
System.out.print("Negative examples ");
System.out.println(trainingSet.getNumberOfNegativeExamples(positiveClass));

System.out.println("Test set statistics");
System.out.print("Examples number ");
System.out.println(testSet.getNumberOfExamples());
System.out.print("Positive examples ");
System.out.println(testSet.getNumberOfPositiveExamples(positiveClass));
System.out.print("Negative examples ");
System.out.println(testSet.getNumberOfNegativeExamples(positiveClass));
Then, instantiate a new Passive Aggressive algorithm and set some parameters on it.
// instantiate a passive aggressive algorithm
LinearPassiveAggressive passiveAggressiveAlgorithm = new LinearPassiveAggressive();
// use the first (and only here) representation
passiveAggressiveAlgorithm.setRepresentation("0");
// indicate to the learner what is the positive class
passiveAggressiveAlgorithm.setLabel(positiveClass);
// set an aggressiveness parameter
passiveAggressiveAlgorithm.setAggressiveness(0.01f);
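The aggressiveness parameter bounds how far a single training example can move the model. To see what it controls, here is a minimal, self-contained sketch of the standard PA-I update rule on plain double[] vectors. This is only an illustration of the algorithm, not KeLP's actual LinearPassiveAggressive implementation; the class and method names below are invented for the sketch.

```java
/**
 * Minimal sketch of the Passive Aggressive (PA-I) update rule.
 * On each example, the model moves just enough to satisfy a unit margin,
 * with the step size tau capped by the aggressiveness parameter C.
 */
public class PaSketch {

    // dot product of two dense vectors
    public static double dot(double[] a, double[] b) {
        double s = 0.0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // one PA-I step: w += tau * y * x, with tau = min(C, loss / ||x||^2)
    public static void update(double[] w, double[] x, int y, double c) {
        double loss = Math.max(0.0, 1.0 - y * dot(w, x)); // hinge loss
        if (loss == 0.0) return; // margin already satisfied: no update
        double tau = Math.min(c, loss / dot(x, x));
        for (int i = 0; i < w.length; i++) w[i] += tau * y * x[i];
    }

    public static void main(String[] args) {
        // a tiny linearly separable toy problem
        double[][] xs = { {1, 1}, {2, 1}, {-1, -1}, {-2, -0.5} };
        int[] ys = { 1, 1, -1, -1 };
        double[] w = new double[2];
        for (int epoch = 0; epoch < 10; epoch++)
            for (int i = 0; i < xs.length; i++)
                update(w, xs[i], ys[i], 0.01); // C = 0.01, as in the tutorial
        int correct = 0;
        for (int i = 0; i < xs.length; i++)
            if (Math.signum(dot(w, xs[i])) == ys[i]) correct++;
        System.out.println("correct=" + correct + "/" + xs.length);
    }
}
```

A small C (here 0.01) makes the learner cautious, taking many small steps; a large C lets a single noisy example shift the hyperplane aggressively.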
Learn a model on the trainingSet, obtaining a Classifier:
// learn and get the prediction function
Classifier f = passiveAggressiveAlgorithm.learn(trainingSet);
Finally, we classify each example in the test set and compute a performance measure (accuracy).
int correct = 0;
for (Example e : testSet.getExamples()) {
    // classify the current test example
    ClassificationOutput p = f.predict(e);
    if (p.getScore(positiveClass) > 0 && e.isExampleOf(positiveClass))
        correct++;
    else if (p.getScore(positiveClass) < 0 && !e.isExampleOf(positiveClass))
        correct++;
}
System.out.println("Accuracy: " + ((float) correct / (float) testSet.getNumberOfExamples()));
At the end of the training, the program in HelloLearning.java will output an accuracy of 97.16%.