diff --git a/docs/api.md b/docs/api.md
index 4e295c8..59b9607 100644
--- a/docs/api.md
+++ b/docs/api.md
@@ -1,112 +1,165 @@
 # API
 
 ## /entitymentions GET
+### Parameters
+| Parameter | Type   | Description      |
+|-----------|--------|------------------|
+| `article` | STRING | The article path |
 
-The `/entitymentions` endpoint is a **GET** endpoint. When doing a **GET** request to the endpoint, a JSON Array is returned containing all the currently known entitymentions, their indexes and the file they originate from. The format of the JSON array is formatted as follows:
+The `/entitymentions` endpoint is a **GET** endpoint. A **GET** request to the endpoint returns a JSON object containing the currently known entity mentions for the given article, together with their indexes, type, label, IRI and the file they originate from. The JSON object is formatted as follows:
 
 ```JSON
-[
-    {
-        "name": "ENTITY MENTION",
-        "startIndex": INT,
-        "endIndex":INT,
-        "fileName":"FILENAME.EXTENSION"
-    },
-    {
-        "name": "ENTITY MENTION",
-        "startIndex": INT,
-        "endIndex":INT,
-        "fileName":"FILENAME.EXTENSION"
-    }
-]
+{
+    "fileName": STRING,
+    "language": STRING,
+    "sentences": [
+        {
+            "sentence": STRING,
+            "sentenceStartIndex": INT,
+            "sentenceEndIndex": INT,
+            "entityMentions": [
+                {
+                    "name": STRING,
+                    "type": STRING,
+                    "label": STRING,
+                    "startIndex": INT,
+                    "endIndex": INT,
+                    "iri": STRING
+                }
+            ]
+        }
+    ]
+}
 ```
 
 ### Example Output
 
-Here is an example of an output from the endpoint. For simplification, only a single file has been processed by the Entity Recognizer and Linker, and just a few of the found entity mentions is shown below:
+Here is an example of the output from `/entitymentions?article=test.txt`. For simplicity, only a single file has been processed by the Entity Recognizer and Linker:
 
 ```JSON
-[
-    {
-        "name": "Martin Kjærs",
-        "startIndex": 28,
-        "endIndex": 40,
-        "fileName": "Artikel.txt"
-    },
-    {
-        "name": "Region Nordjylland",
-        "startIndex": 100,
-        "endIndex": 118,
-        "fileName": "Artikel.txt"
-    },
-    {
-        "name": "Aalborg",
-        "startIndex": 285,
-        "endIndex": 292,
-        "fileName": "Artikel.txt"
-    }
-]
+{
+    "fileName": "test.txt",
+    "language": "en",
+    "sentences": [
+        {
+            "sentence": "Hi my name is marc",
+            "sentenceStartIndex": 0,
+            "sentenceEndIndex": 47,
+            "entityMentions": [
+                {
+                    "name": "marc",
+                    "type": "Entity",
+                    "label": "GPE",
+                    "startIndex": 14,
+                    "endIndex": 18,
+                    "iri": "knox-kb01.srv.aau.dk/marc"
+                }
+            ]
+        }
+    ]
+}
 ```
 
 
-## articlename/entities GET
+## /entitymentions/all GET
 
-The `articlename/entities` endpoint is a **GET** endpoint. The articlename in the url has to replaced with a name of an article including .txt. When doing a GET request to the endpoint, a JSON Array is returned containing the currently known entitymentions found in the given article name including their indexes. The format of the JSON array is formatted as follows:
+The `/entitymentions/all` endpoint is a **GET** endpoint. A **GET** request to the endpoint returns a JSON array containing all articles with their currently known entity mentions. The JSON array is formatted as follows:
 
 ```JSON
 [
     {
-        "name": "ENTITY MENTION",
-        "startIndex": INT,
-        "endIndex":INT,
-        "fileName":"FILENAME.EXTENSION"
-    },
-    {
-        "name": "ENTITY MENTION",
-        "startIndex": INT,
-        "endIndex":INT,
-        "fileName":"FILENAME.EXTENSION"
+        "fileName": STRING,
+        "language": STRING,
+        "sentences": [
+            {
+                "sentence": STRING,
+                "sentenceStartIndex": INT,
+                "sentenceEndIndex": INT,
+                "entityMentions": [
+                    {
+                        "name": STRING,
+                        "type": STRING,
+                        "label": STRING,
+                        "startIndex": INT,
+                        "endIndex": INT,
+                        "iri": STRING
+                    }
+                ]
+            }
+        ]
     }
 ]
 ```
 
 ### Example Output
 
-Here is an example of an output from the endpoint when getting for Artikel.txt/entities. For simplification, only a single file has been processed by the Entity Recognizer and Linker, and just a few of the found entity mentions is shown below:
+Here is an example of the output from the endpoint when getting all articles. For simplicity, only two files have been processed by the Entity Recognizer and Linker:
 
 ```JSON
 [
     {
-        "name": "Martin Kjærs",
-        "startIndex": 28,
-        "endIndex": 40,
-        "fileName": "Artikel.txt"
+        "fileName": "test.txt",
+        "language": "en",
+        "sentences": [
+            {
+                "sentence": "Hi my name is marc",
+                "sentenceStartIndex": 0,
+                "sentenceEndIndex": 47,
+                "entityMentions": [
+                    {
+                        "name": "marc",
+                        "type": "Entity",
+                        "label": "PERSON",
+                        "startIndex": 14,
+                        "endIndex": 18,
+                        "iri": "knox-kb01.srv.aau.dk/marc"
+                    }
+                ]
+            }
+        ]
     },
     {
-        "name": "Region Nordjylland",
-        "startIndex": 100,
-        "endIndex": 118,
-        "fileName": "Artikel.txt"
-    },
-    {
-        "name": "Aalborg",
-        "startIndex": 285,
-        "endIndex": 292,
-        "fileName": "Artikel.txt"
+        "fileName": "test2.txt",
+        "language": "en",
+        "sentences": [
+            {
+                "sentence": "Hi my name is joe",
+                "sentenceStartIndex": 0,
+                "sentenceEndIndex": 47,
+                "entityMentions": [
+                    {
+                        "name": "joe",
+                        "type": "Entity",
+                        "label": "PERSON",
+                        "startIndex": 14,
+                        "endIndex": 17,
+                        "iri": "knox-kb01.srv.aau.dk/joe"
+                    }
+                ]
+            }
+        ]
     }
 ]
 ```
 
 
-## detectlanguage POST
+## /detectlanguage POST
+This endpoint expects the request body to contain some input text and returns the language of that text. It uses the [langdetect](https://pypi.org/project/langdetect/) library.
 
-This endpoint will check the language in the given text.
-Send the text in the request body and it will return the language.
-The given text has to be longer than 4 characters.
-The function will return the lanugage in 2 charaters.
+> **_NOTE:_** The function will return the language as an [ISO 639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) code.
 
 ### Example
 
-Request body: The man was walking down the street
-Response: en
+Request body: "The man was walking down the street"\
+Response: en
+
+
+### Constraints
+- The given text has to be longer than 4 characters.
+
+### Supported languages
+`af, ar, bg, bn, ca, cs, cy, da, de, el, en, es, et, fa, fi, fr, gu, he,
+hi, hr, hu, id, it, ja, kn, ko, lt, lv, mk, ml, mr, ne, nl, no, pa, pl,
+pt, ro, ru, sk, sl, so, sq, sv, sw, ta, te, th, tl, tr, uk, ur, vi, zh-cn, zh-tw`
+> **_NOTE:_** See [List of ISO 639-1 codes](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) for more information.
 
diff --git a/docs/directorywatcher.md b/docs/directorywatcher.md
new file mode 100644
index 0000000..ef1cc1f
--- /dev/null
+++ b/docs/directorywatcher.md
@@ -0,0 +1,60 @@
+# [Directory Watcher](https://github.com/Knox-AAU/PreProcessingLayer_EntityRecognitionAndLinking/blob/main/lib/DirectoryWatcher.py)
+The pipeline starts when a new file is placed in a watched folder by pipeline part A. The Directory Watcher's responsibility is to call a callback function when a new file is created in the watched folder.
+
+## Features
+- [watchdog](https://pypi.org/project/watchdog/) for file events
+- Async callback support
+- [Threading](https://docs.python.org/3/library/threading.html)
+
+## Overview
+
+The `DirectoryWatcher` provides a simple way to monitor a specified directory for file creation events and execute asynchronous callbacks in response. It utilizes the [watchdog](https://pypi.org/project/watchdog/) library for filesystem monitoring and integrates with [asyncio](https://docs.python.org/3/library/asyncio.html) for handling asynchronous tasks. Furthermore, the `DirectoryWatcher` uses [threading](https://docs.python.org/3/library/threading.html).
+
+> **_NOTE:_** [Threading](https://docs.python.org/3/library/threading.html) is used so that watching the directory does not block the main thread.
+
+
+## Example usage
+```python
+# Importing
+from lib.DirectoryWatcher import DirectoryWatcher
+
+dirPath = "some/path/to/a/directory"
+
+# Setup
+async def newFileCreated(file_path: str):
+    print("New file created in " + file_path)
+
+
+dirWatcher = DirectoryWatcher(
+    directory=dirPath, async_callback=newFileCreated
+)
+
+# A FastAPI event function running on startup
+@app.on_event("startup")
+async def startEvent():
+    dirWatcher.start_watching()
+
+# A FastAPI event function running on shutdown
+@app.on_event("shutdown")
+def shutdown_event():
+    dirWatcher.stop_watching()
+```
+
+> **_NOTE:_** The FastAPI event functions are not required to use the `DirectoryWatcher`.
+
+
+## Methods
+```python
+def __init__(self, directory, async_callback):
+```
+### Parameters:
+- **directory** (str): A path to the directory you want to watch, e.g. `some/path/to/a/directory`
+- **async_callback** (function): An async callback function to be called when a new file is created in the **directory**. This function should accept a single parameter, which is the path of the created file.
+
+```python
+def start_watching(self) -> threading.Thread:
+```
+
+```python
+def stop_watching(self):
+```
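
Reviewer note on `docs/directorywatcher.md`: the pattern described in the Overview (a background thread that detects new files and runs an async callback) can be sketched with the standard library alone. The sketch below is not the project's `DirectoryWatcher`. It swaps `watchdog` events for simple polling with `os.listdir`, and the class name `PollingDirectoryWatcher`, the `interval` parameter and the `demo` helper are hypothetical names introduced only for illustration. It assumes the same constructor arguments and the same `start_watching`/`stop_watching` method names as the documented class.

```python
import asyncio
import os
import tempfile
import threading


class PollingDirectoryWatcher:
    """Hypothetical stand-in for DirectoryWatcher: polls instead of using watchdog."""

    def __init__(self, directory, async_callback, interval=0.05):
        self.directory = directory
        self.async_callback = async_callback
        self.interval = interval  # assumed polling period in seconds
        self._stop = threading.Event()
        self._thread = None
        self._seen = set()

    def _run(self):
        # Runs in the background thread so the caller's thread is never blocked.
        while not self._stop.is_set():
            current = set(os.listdir(self.directory))
            for name in sorted(current - self._seen):
                path = os.path.join(self.directory, name)
                # Run the coroutine to completion on a short-lived event loop
                # owned by this worker thread.
                asyncio.run(self.async_callback(path))
            self._seen = current
            self._stop.wait(self.interval)

    def start_watching(self) -> threading.Thread:
        # Snapshot existing files first so only files created later trigger the callback.
        self._seen = set(os.listdir(self.directory))
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        return self._thread

    def stop_watching(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()


def demo():
    """Create one file in a temporary directory and collect the reported file names."""
    created = []
    done = threading.Event()

    async def on_new_file(file_path: str):
        created.append(os.path.basename(file_path))
        done.set()

    with tempfile.TemporaryDirectory() as tmp:
        watcher = PollingDirectoryWatcher(directory=tmp, async_callback=on_new_file)
        watcher.start_watching()
        with open(os.path.join(tmp, "article.txt"), "w") as handle:
            handle.write("Hi my name is marc")
        done.wait(timeout=5)
        watcher.stop_watching()
    return created


if __name__ == "__main__":
    print(demo())
```

Running each callback with `asyncio.run` keeps the sketch self-contained. A watcher embedded in an application that already owns an event loop, such as the FastAPI example in the patch, would more likely hand the coroutine to that loop, for instance with `asyncio.run_coroutine_threadsafe`.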