Software and datasets
__NOTOC__
Some useful research results we have been working on. Here I do not list GitHub repositories that are not packaged as standalone software components.
= Software =
== nutIE ==
nutIE (codename) will be an end-to-end information extraction toolkit. It will consist of a self-contained, runnable web application (GUI) and a Scala library for programmatic access.
The tool currently supports data import and visualization, as well as model training and evaluation for the coreference resolution task.
The toolkit currently consists of two separate projects:
* Web-based management part: [https://bitbucket.org/szitnik/nutie-web nutIE Web]
* Backend with a REST API and a Scala library for programmatic use in third-party projects: [https://bitbucket.org/szitnik/nutie-core nutIE Core]
<gallery mode="packed-hover">
Image:NutIE.png|nutIE: End-to-end information extraction tool
Image:NutIE_01.png|nutIE: Arbitrary data browser
Image:NutIE_02.png|nutIE: Model training
Image:NutIE_03.png|nutIE: Coreference resolution visualization
</gallery>
== Lemmagen4J ==
I have rewritten Lemmagen v3.0 ([http://lemmatise.ijs.si/ http://lemmatise.ijs.si/]) from C# to Java. The Eclipse project is available here: [{{filepath:Lemmagen4J.zip}} Lemmagen4J.zip].
The Train and Test classes, together with the rest of the code, serve as documentation. To build a Slovene model, you can use the [{{filepath:Wfl-me-sl.zip}} Slovene part of the MULTEXT-EAST dataset].
You can read more about Lemmagen in the author's paper: [{{filepath:LemmagenPaper2010.pdf}} Lemmagen Paper, 2010].
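As a rough illustration of what rule-based lemmatization does, the sketch below rewrites a word by applying the longest matching suffix-replacement rule. It is not the Lemmagen4J API: the class <code>SuffixRuleSketch</code>, its methods and the two English rules are hypothetical, and the rules are written by hand here rather than learned from a word-form lexicon.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; NOT the Lemmagen4J API.
final class SuffixRuleSketch {
    // Maps a word-form suffix to the text that replaces it in the lemma.
    private final Map<String, String> rules = new LinkedHashMap<>();

    void addRule(String suffix, String replacement) {
        rules.put(suffix, replacement);
    }

    // Applies the rule with the longest matching suffix;
    // returns the word unchanged when no rule matches.
    String lemmatize(String word) {
        String bestSuffix = null;
        for (String suffix : rules.keySet()) {
            if (word.endsWith(suffix)
                    && (bestSuffix == null || suffix.length() > bestSuffix.length())) {
                bestSuffix = suffix;
            }
        }
        if (bestSuffix == null) {
            return word;
        }
        String stem = word.substring(0, word.length() - bestSuffix.length());
        return stem + rules.get(bestSuffix);
    }

    public static void main(String[] args) {
        SuffixRuleSketch sketch = new SuffixRuleSketch();
        sketch.addRule("ies", "y"); // hand-written example rules (assumption)
        sketch.addRule("s", "");
        System.out.println(sketch.lemmatize("studies"));
        System.out.println(sketch.lemmatize("cats"));
        System.out.println(sketch.lemmatize("dog"));
    }
}
```

Preferring the longest suffix lets a specific rule override a more general one, which is the kind of exception handling a trained lemmatizer has to capture.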
== Merging and matching framework ==
A framework for matching and merging data using semantics. It implements attribute resolution, collective entity resolution and redundancy elimination techniques with various metrics and approaches. Download the project along with the datasets here: [{{filepath:Merging.zip}} Data Merging framework, October 2011].
Read more: Žitnik S., Šubelj L., Lavbič D., Vasilecas O., Bajec M. (2013). '''General Context-Aware Data Matching and Merging Framework'''. Informatica, vol. 24, no. 1, pp. 119-152. [{{filepath:INFO902.pdf}} Article]
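To give a feel for metric-based matching, here is a minimal sketch of pairwise value matching with token-level Jaccard similarity and a threshold. It is only one possible metric and is not taken from the framework: the class <code>RecordMatchSketch</code>, its methods and the threshold value are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only; not code from the framework.
final class RecordMatchSketch {
    // Lower-cases a value and splits it into a set of whitespace-separated tokens.
    static Set<String> tokens(String value) {
        String normalized = value.toLowerCase(Locale.ROOT).trim();
        if (normalized.isEmpty()) {
            return new HashSet<>();
        }
        return new HashSet<>(Arrays.asList(normalized.split("\\s+")));
    }

    // Jaccard similarity: |intersection| / |union| of the two token sets.
    static double jaccard(String a, String b) {
        Set<String> ta = tokens(a);
        Set<String> tb = tokens(b);
        if (ta.isEmpty() && tb.isEmpty()) {
            return 1.0; // two empty values are treated as identical
        }
        Set<String> intersection = new HashSet<>(ta);
        intersection.retainAll(tb);
        Set<String> union = new HashSet<>(ta);
        union.addAll(tb);
        return (double) intersection.size() / union.size();
    }

    // Two values are considered the same entity when similarity reaches the threshold.
    static boolean sameEntity(String a, String b, double threshold) {
        return jaccard(a, b) >= threshold;
    }

    public static void main(String[] args) {
        double threshold = 0.6; // assumed value, chosen only for this example
        System.out.println(sameEntity("John Smith", "John A Smith", threshold));
        System.out.println(sameEntity("John Smith", "Jane Doe", threshold));
    }
}
```

A pairwise check like this looks at each pair of values in isolation. Collective entity resolution, as the name suggests, would also take related records into account.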
Revision as of 20:48, 5 August 2022