To generate the input data for KeLP, we developed a dedicated project: kelp-input-generator.
This project relies on third-party software components, such as the Stanford Parser, and provides the functionality to extract KeLP data structures from text snippets. As a general-purpose machine learning platform, KeLP is not limited to Natural Language Processing tasks. However, for the moment, we do not provide feature extraction capabilities for other fields.
To keep the main KeLP project lightweight, kelp-input-generator is not included in kelp-full. If you want to use kelp-input-generator in your Maven project, you can include it by first adding the following Maven repositories to your pom.xml:
<repositories>
    <repository>
        <id>kelp_repo_snap</id>
        <name>KeLP Snapshots Repository</name>
        <releases>
            <enabled>false</enabled>
            <updatePolicy>always</updatePolicy>
            <checksumPolicy>warn</checksumPolicy>
        </releases>
        <snapshots>
            <enabled>true</enabled>
            <updatePolicy>always</updatePolicy>
            <checksumPolicy>fail</checksumPolicy>
        </snapshots>
        <url>http://sag.art.uniroma2.it:8081/artifactory/kelp-snapshot/</url>
    </repository>
    <repository>
        <id>kelp_repo_release</id>
        <name>KeLP Stable Repository</name>
        <releases>
            <enabled>true</enabled>
            <updatePolicy>always</updatePolicy>
            <checksumPolicy>warn</checksumPolicy>
        </releases>
        <snapshots>
            <enabled>false</enabled>
            <updatePolicy>always</updatePolicy>
            <checksumPolicy>fail</checksumPolicy>
        </snapshots>
        <url>http://sag.art.uniroma2.it:8081/artifactory/kelp-release/</url>
    </repository>
</repositories>
Then, the Maven dependency for the kelp-input-generator project is:
<dependencies>
    <dependency>
        <groupId>it.uniroma2.sag.kelp</groupId>
        <artifactId>kelp-input-generator</artifactId>
        <version>1.0.1-SNAPSHOT</version>
    </dependency>
</dependencies>
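For orientation, the repositories and the dependency above both sit as direct children of the project element in pom.xml. The following is only a minimal sketch of how the two pieces might fit together: the project coordinates (com.example, my-kelp-app, 0.0.1-SNAPSHOT) are hypothetical placeholders, and the repository entries are abbreviated on the assumption that Maven's default update and checksum policies are acceptable for your build.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <!-- Hypothetical coordinates: replace with those of your own project -->
    <groupId>com.example</groupId>
    <artifactId>my-kelp-app</artifactId>
    <version>0.0.1-SNAPSHOT</version>

    <repositories>
        <!-- Abbreviated entry; the update and checksum policies fall back to Maven defaults -->
        <repository>
            <id>kelp_repo_snap</id>
            <name>KeLP Snapshots Repository</name>
            <releases><enabled>false</enabled></releases>
            <snapshots><enabled>true</enabled></snapshots>
            <url>http://sag.art.uniroma2.it:8081/artifactory/kelp-snapshot/</url>
        </repository>
        <repository>
            <id>kelp_repo_release</id>
            <name>KeLP Stable Repository</name>
            <releases><enabled>true</enabled></releases>
            <snapshots><enabled>false</enabled></snapshots>
            <url>http://sag.art.uniroma2.it:8081/artifactory/kelp-release/</url>
        </repository>
    </repositories>

    <dependencies>
        <!-- A SNAPSHOT version, so it is resolved from the snapshot repository -->
        <dependency>
            <groupId>it.uniroma2.sag.kelp</groupId>
            <artifactId>kelp-input-generator</artifactId>
            <version>1.0.1-SNAPSHOT</version>
        </dependency>
    </dependencies>
</project>
```

With a file along these lines in place, running mvn dependency:resolve from the project directory is one way to check that the artifact can be downloaded from the configured repositories.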
Currently, kelp-input-generator makes it easy to generate TreeRepresentations from text snippets. In particular, it can extract the LOCT (Lexical Only Centered Tree), LCT (Lexical Centered Tree) and GRCT (Grammatical Relation Centered Tree) representations, which are tree views of a dependency graph, as introduced in (Croce et al., 2011).
In the future we plan to extend kelp-input-generator by adding the possibility to extract shallow and constituency tree representations, as well as SequenceRepresentations and DirectedGraphRepresentations.
References
Danilo Croce, Alessandro Moschitti, and Roberto Basili. Structured lexical similarity via convolution kernels on dependency trees. In Proceedings of EMNLP, Edinburgh, Scotland, UK, 2011.