Commit
Merge branch 'main' into 93-remove-directory-path-from-filename-prop-…
…in-json-output
Showing 2 changed files with 183 additions and 70 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,112 +1,165 @@ | ||
# API | ||
|
||
## /entitymentions <sup><span style="color:lightgreen">GET</span></sup> | ||
### Parameters | ||
| Parameter | Type | Description | | ||
|-----------|--------|------------------| | ||
| `article` | STRING | The article path | | ||
|
||
The `/entitymentions` endpoint is a <span style="color:lightgreen">**GET**</span> endpoint. A <span style="color:lightgreen">**GET**</span> request to the endpoint returns a JSON array containing all the currently known entity mentions, their indexes and the file they originate from. The JSON array is formatted as follows: | ||
The `/entitymentions` endpoint is a <span style="color:lightgreen">**GET**</span> endpoint. A <span style="color:lightgreen">**GET**</span> request to the endpoint returns a JSON object containing all the currently known entity mentions, their indexes, type, label, IRI and the file they originate from. The JSON object is formatted as follows: | ||
|
||
```JSON | ||
[ | ||
{ | ||
"name": "ENTITY MENTION", | ||
"startIndex": INT, | ||
"endIndex":INT, | ||
"fileName":"FILENAME.EXTENSION" | ||
}, | ||
{ | ||
"name": "ENTITY MENTION", | ||
"startIndex": INT, | ||
"endIndex":INT, | ||
"fileName":"FILENAME.EXTENSION" | ||
} | ||
] | ||
{ | ||
"fileName": STRING, | ||
"language": STRING, | ||
"sentences": [ | ||
{ | ||
"sentence": STRING, | ||
"sentenceStartIndex": INT, | ||
"sentenceEndIndex": INT, | ||
"entityMentions": [ | ||
{ | ||
"name": STRING, | ||
"type": STRING, | ||
"label": STRING, | ||
"startIndex": INT, | ||
"endIndex": INT, | ||
"iri": STRING | ||
} | ||
] | ||
} | ||
] | ||
} | ||
``` | ||
|
||
### Example Output | ||
|
||
Here is an example of an output from the endpoint. For simplification, only a single file has been processed by the Entity Recognizer and Linker, and just a few of the found entity mentions are shown below: | ||
Here is an example of an output from the endpoint `/entitymentions?article=test.txt`. For simplification, only a single file has been processed by the Entity Recognizer and Linker: | ||
|
||
```JSON | ||
[ | ||
{ | ||
"name": "Martin Kjærs", | ||
"startIndex": 28, | ||
"endIndex": 40, | ||
"fileName": "Artikel.txt" | ||
}, | ||
{ | ||
"name": "Region Nordjylland", | ||
"startIndex": 100, | ||
"endIndex": 118, | ||
"fileName": "Artikel.txt" | ||
}, | ||
{ | ||
"name": "Aalborg", | ||
"startIndex": 285, | ||
"endIndex": 292, | ||
"fileName": "Artikel.txt" | ||
} | ||
] | ||
{ | ||
"fileName": "test.txt", | ||
"language": "en", | ||
"sentences": [ | ||
{ | ||
"sentence": "Hi my name is marc", | ||
"sentenceStartIndex": 0, | ||
"sentenceEndIndex": 47, | ||
"entityMentions": [ | ||
{ | ||
"name": "marc", | ||
"type": "Entity", | ||
"label": "GPE", | ||
"startIndex": 14, | ||
"endIndex": 18, | ||
"iri": "knox-kb01.srv.aau.dk/marc" | ||
} | ||
] | ||
} | ||
] | ||
} | ||
``` | ||
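
As a hedged illustration of how a client might consume this response, the sketch below flattens the nested `sentences` / `entityMentions` structure into a list of tuples. The helper name `list_mentions` is hypothetical and not part of the service; the sample payload mirrors the example output above.

```python
import json

# Sample payload mirroring the documented example output.
SAMPLE = """
{
  "fileName": "test.txt",
  "language": "en",
  "sentences": [
    {
      "sentence": "Hi my name is marc",
      "sentenceStartIndex": 0,
      "sentenceEndIndex": 47,
      "entityMentions": [
        {
          "name": "marc",
          "type": "Entity",
          "label": "GPE",
          "startIndex": 14,
          "endIndex": 18,
          "iri": "knox-kb01.srv.aau.dk/marc"
        }
      ]
    }
  ]
}
"""


def list_mentions(article: dict) -> list[tuple[str, str, int, int, str]]:
    """Hypothetical helper: one (name, label, startIndex, endIndex, iri) tuple per mention."""
    mentions = []
    for sentence in article.get("sentences", []):
        for m in sentence.get("entityMentions", []):
            mentions.append(
                (m["name"], m["label"], m["startIndex"], m["endIndex"], m["iri"])
            )
    return mentions


article = json.loads(SAMPLE)
mentions = list_mentions(article)
print(mentions)
```

In this sample, `startIndex` and `endIndex` behave like a Python slice over the sentence text: `"Hi my name is marc"[14:18]` yields `"marc"`. Whether the indexes are always relative to the sentence rather than to the whole file is not stated here, so treat that as an assumption.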
|
||
|
||
## articlename/entities <sup><span style="color:lightgreen">GET</span></sup> | ||
## /entitymentions/all <sup><span style="color:lightgreen">GET</span></sup> | ||
|
||
The `articlename/entities` endpoint is a <span style="color:lightgreen">**GET**</span> endpoint. The articlename in the URL has to be replaced with the name of an article, including .txt. A GET request to the endpoint returns a JSON array containing the currently known entity mentions found in the given article, including their indexes. The JSON array is formatted as follows: | ||
The `/entitymentions/all` endpoint is a <span style="color:lightgreen">**GET**</span> endpoint. A <span style="color:lightgreen">**GET**</span> request to the endpoint returns a JSON array containing all articles with their currently known entity mentions. The JSON array is formatted as follows: | ||
|
||
```JSON | ||
[ | ||
{ | ||
"name": "ENTITY MENTION", | ||
"startIndex": INT, | ||
"endIndex":INT, | ||
"fileName":"FILENAME.EXTENSION" | ||
}, | ||
{ | ||
"name": "ENTITY MENTION", | ||
"startIndex": INT, | ||
"endIndex":INT, | ||
"fileName":"FILENAME.EXTENSION" | ||
"fileName": STRING, | ||
"language": STRING, | ||
"sentences": [ | ||
{ | ||
"sentence": STRING, | ||
"sentenceStartIndex": INT, | ||
"sentenceEndIndex": INT, | ||
"entityMentions": [ | ||
{ | ||
"name": STRING, | ||
"type": STRING, | ||
"label": STRING, | ||
"startIndex": INT, | ||
"endIndex": INT, | ||
"iri": STRING | ||
} | ||
] | ||
} | ||
] | ||
} | ||
] | ||
``` | ||
|
||
### Example Output | ||
|
||
Here is an example of an output from the endpoint when requesting <span style="color:lightgreen">Artikel.txt/entities</span>. For simplification, only a single file has been processed by the Entity Recognizer and Linker, and just a few of the found entity mentions are shown below: | ||
Here is an example of an output from the endpoint when getting all articles. For simplification, only two files have been processed by the Entity Recognizer and Linker: | ||
|
||
```JSON | ||
[ | ||
{ | ||
"name": "Martin Kjærs", | ||
"startIndex": 28, | ||
"endIndex": 40, | ||
"fileName": "Artikel.txt" | ||
"fileName": "test.txt", | ||
"language": "en", | ||
"sentences": [ | ||
{ | ||
"sentence": "Hi my name is marc", | ||
"sentenceStartIndex": 0, | ||
"sentenceEndIndex": 47, | ||
"entityMentions": [ | ||
{ | ||
"name": "marc", | ||
"type": "Entity", | ||
"label": "PERSON", | ||
"startIndex": 14, | ||
"endIndex": 18, | ||
"iri": "knox-kb01.srv.aau.dk/marc" | ||
} | ||
] | ||
} | ||
] | ||
}, | ||
{ | ||
"name": "Region Nordjylland", | ||
"startIndex": 100, | ||
"endIndex": 118, | ||
"fileName": "Artikel.txt" | ||
}, | ||
{ | ||
"name": "Aalborg", | ||
"startIndex": 285, | ||
"endIndex": 292, | ||
"fileName": "Artikel.txt" | ||
"fileName": "test2.txt", | ||
"language": "en", | ||
"sentences": [ | ||
{ | ||
"sentence": "Hi my name is joe", | ||
"sentenceStartIndex": 0, | ||
"sentenceEndIndex": 47, | ||
"entityMentions": [ | ||
{ | ||
"name": "Joe", | ||
"type": "Entity", | ||
"label": "PERSON", | ||
"startIndex": 14, | ||
"endIndex": 17, | ||
"iri": "knox-kb01.srv.aau.dk/joe" | ||
} | ||
] | ||
} | ||
] | ||
} | ||
] | ||
``` | ||
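
Because `/entitymentions/all` returns one object per article, a client will often want to index the result by file. The following is a minimal sketch under that assumption; `iris_by_file` is a hypothetical helper, and the payload is a trimmed copy of the example output above.

```python
from collections import defaultdict

# Trimmed copy of the documented example: a list with one object per article.
ARTICLES = [
    {
        "fileName": "test.txt",
        "language": "en",
        "sentences": [
            {
                "sentence": "Hi my name is marc",
                "entityMentions": [
                    {"name": "marc", "label": "PERSON",
                     "iri": "knox-kb01.srv.aau.dk/marc"}
                ],
            }
        ],
    },
    {
        "fileName": "test2.txt",
        "language": "en",
        "sentences": [
            {
                "sentence": "Hi my name is joe",
                "entityMentions": [
                    {"name": "Joe", "label": "PERSON",
                     "iri": "knox-kb01.srv.aau.dk/joe"}
                ],
            }
        ],
    },
]


def iris_by_file(articles: list[dict]) -> dict[str, list[str]]:
    """Hypothetical helper: map each fileName to the IRIs of its entity mentions."""
    result: dict[str, list[str]] = defaultdict(list)
    for article in articles:
        for sentence in article.get("sentences", []):
            for mention in sentence.get("entityMentions", []):
                result[article["fileName"]].append(mention["iri"])
    return dict(result)


index = iris_by_file(ARTICLES)
print(index)
```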
|
||
|
||
|
||
## detectlanguage <sup><span style="color:lightgreen">POST</span></sup> | ||
## /detectlanguage <sup><span style="color:orange">POST</span></sup> | ||
This endpoint expects the given request body to contain some input text and returns its language. It uses the [langdetect](https://pypi.org/project/langdetect/) library. | ||
|
||
This endpoint will check the language in the given text. | ||
Send the text in the request body and it will return the language. | ||
The given text has to be longer than 4 characters. | ||
The function will return the language as 2 characters. | ||
> **_NOTE:_** The function will return the language as an [ISO 639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) code. | ||
### Example | ||
<span style="color:lightgreen">Request body: </span> The man was walking down the street | ||
<span style="color:lightgreen">Response: </span> en | ||
<span style="color:orange">Request body: </span> "The man was walking down the street"\ | ||
<span style="color:orange">Response: </span> en | ||
|
||
|
||
### Constraints | ||
- The given text has to be longer than 4 characters. | ||
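
A client can enforce the length constraint before calling the endpoint. The sketch below only builds the request and does not send it. The base URL, the `text/plain` content type and the helper name `build_detect_request` are all assumptions, since this document does not specify the host or the expected body encoding.

```python
from urllib.request import Request

# Assumption: the host and port are not given in this document.
BASE_URL = "http://localhost:8000"


def build_detect_request(text: str, base_url: str = BASE_URL) -> Request:
    """Hypothetical helper: build a POST request for /detectlanguage."""
    # Documented constraint: the text has to be longer than 4 characters.
    if len(text) <= 4:
        raise ValueError("text must be longer than 4 characters")
    return Request(
        url=f"{base_url}/detectlanguage",
        data=text.encode("utf-8"),
        method="POST",
        # Assumption: the body is sent as plain text.
        headers={"Content-Type": "text/plain; charset=utf-8"},
    )


request = build_detect_request("The man was walking down the street")
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```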
|
||
### Supported languages | ||
`af, ar, bg, bn, ca, cs, cy, da, de, el, en, es, et, fa, fi, fr, gu, he, | ||
hi, hr, hu, id, it, ja, kn, ko, lt, lv, mk, ml, mr, ne, nl, no, pa, pl, | ||
pt, ro, ru, sk, sl, so, sq, sv, sw, ta, te, th, tl, tr, uk, ur, vi, zh-cn, zh-tw` | ||
|
||
> **_NOTE:_** See [List of ISO 639-1 codes](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) for more information. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,60 @@ | ||
# [Directory Watcher](https://github.com/Knox-AAU/PreProcessingLayer_EntityRecognitionAndLinking/blob/main/lib/DirectoryWatcher.py) | ||
The pipeline starts when a new file is placed in a watched folder by pipeline part A. The Directory Watcher's responsibility is to call a callback function when a new file is created in the watched folder. | ||
|
||
## Features | ||
- [watchdog](https://pypi.org/project/watchdog/) for file events | ||
- Async callback support | ||
- [Threading](https://docs.python.org/3/library/threading.html) | ||
|
||
## Overview | ||
|
||
The `DirectoryWatcher` provides a simple way to monitor a specified directory for file creation events and execute asynchronous callbacks in response. It uses the [watchdog](https://pypi.org/project/watchdog/) library for filesystem monitoring and integrates with [asyncio](https://docs.python.org/3/library/asyncio.html) for handling asynchronous tasks. Furthermore, the `DirectoryWatcher` uses [threading](https://docs.python.org/3/library/threading.html). | ||
|
||
> **_NOTE:_** [Threading](https://docs.python.org/3/library/threading.html) is used to avoid blocking the main thread's code from executing. | ||
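
To illustrate the general pattern of calling an async callback from a background thread, here is a standard-library-only sketch. It is not the actual `DirectoryWatcher` implementation: watchdog is left out, `watcher_thread` merely simulates file events, and all names in the block are hypothetical.

```python
import asyncio
import threading

created = []


async def on_created(file_path: str) -> None:
    # Stand-in for the async callback passed to the watcher.
    created.append(file_path)


def watcher_thread(loop: asyncio.AbstractEventLoop, paths: list[str]) -> None:
    # Stand-in for the filesystem observer: runs off the main thread and
    # schedules the async callback onto the event loop for each "new file".
    for path in paths:
        future = asyncio.run_coroutine_threadsafe(on_created(path), loop)
        future.result(timeout=5)  # wait until the callback has finished


async def main() -> None:
    loop = asyncio.get_running_loop()
    worker = threading.Thread(
        target=watcher_thread, args=(loop, ["a.txt", "b.txt"]), daemon=True
    )
    worker.start()
    # Join in an executor so the event loop stays free to run the callbacks.
    await loop.run_in_executor(None, worker.join)


asyncio.run(main())
print(created)
```

The key call is `asyncio.run_coroutine_threadsafe`, which is the documented way to submit a coroutine to an event loop from a different thread.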
|
||
## Example usage | ||
```python
# Importing
from fastapi import FastAPI

from lib.DirectoryWatcher import DirectoryWatcher

app = FastAPI()  # assumed here so the event decorators below resolve

dirPath = "some/path/to/a/directory"


# Setup: async callback invoked with the path of each newly created file
async def newFileCreated(file_path: str):
    print("New file created in " + file_path)


dirWatcher = DirectoryWatcher(
    directory=dirPath, async_callback=newFileCreated
)


# A FastAPI event function running on startup
@app.on_event("startup")
async def startEvent():
    dirWatcher.start_watching()


# A FastAPI event function running on shutdown
@app.on_event("shutdown")
def shutdown_event():
    dirWatcher.stop_watching()
```
|
||
> **_NOTE:_** The FastAPI event functions are not required to use the `DirectoryWatcher`. | ||
|
||
## Methods | ||
```python | ||
def __init__(self, directory, async_callback): | ||
``` | ||
### Parameters: | ||
- **directory** (str): A path to the directory you want to watch, e.g. `some/path/to/a/directory` | ||
- **async_callback** (function): An async callback function to be called when a new file is created in the **directory**. This function should accept a single parameter, which is the path of the created file. | ||
|
||
```python | ||
def start_watching(self) -> threading.Thread: | ||
``` | ||
|
||
```python | ||
def stop_watching(self): | ||
``` |