Commit

Merge branch 'main' into 93-remove-directory-path-from-filename-prop-…
…in-json-output
FredTheNoob authored Dec 8, 2023
2 parents 8f9f698 + 07cb9ba commit 6684264
Showing 2 changed files with 183 additions and 70 deletions.
193 changes: 123 additions & 70 deletions docs/api.md
@@ -1,112 +1,165 @@
# API

## /entitymentions <sup><span style="color:lightgreen">GET</span></sup>
### Parameters
| Parameter | Type | Description |
|-----------|--------|------------------|
| `article` | STRING | The article path |

The `/entitymentions` endpoint is a <span style="color:lightgreen">**GET**</span> endpoint. When doing a <span style="color:lightgreen">**GET**</span> request to the endpoint, a JSON Array is returned containing all the currently known entitymentions, their indexes and the file they originate from. The format of the JSON array is formatted as follows:
The `/entitymentions` endpoint is a <span style="color:lightgreen">**GET**</span> endpoint. A <span style="color:lightgreen">**GET**</span> request to the endpoint returns a JSON object containing the currently known entity mentions for the given article, together with their indexes, type, label, IRI and the file they originate from. The JSON object is formatted as follows:

```JSON
[
{
"name": "ENTITY MENTION",
"startIndex": INT,
"endIndex":INT,
"fileName":"FILENAME.EXTENSION"
},
{
"name": "ENTITY MENTION",
"startIndex": INT,
"endIndex":INT,
"fileName":"FILENAME.EXTENSION"
}
]
{
"fileName": STRING,
"language": STRING,
"sentences": [
{
"sentence": STRING,
"sentenceStartIndex": INT,
"sentenceEndIndex": INT,
"entityMentions": [
{
"name": STRING,
"type": STRING,
"label": STRING,
"startIndex": INT,
"endIndex": INT,
"iri": STRING
}
]
}
]
}
```

### Example Output

Here is an example of an output from the endpoint. For simplification, only a single file has been processed by the Entity Recognizer and Linker, and just a few of the found entity mentions is shown below:
Here is an example of the output from the endpoint `/entitymentions?article=test.txt`. For simplicity, only a single file has been processed by the Entity Recognizer and Linker:

```JSON
[
{
"name": "Martin Kjærs",
"startIndex": 28,
"endIndex": 40,
"fileName": "Artikel.txt"
},
{
"name": "Region Nordjylland",
"startIndex": 100,
"endIndex": 118,
"fileName": "Artikel.txt"
},
{
"name": "Aalborg",
"startIndex": 285,
"endIndex": 292,
"fileName": "Artikel.txt"
}
]
{
"fileName": "test.txt",
"language": "en",
"sentences": [
{
"sentence": "Hi my name is marc",
"sentenceStartIndex": 0,
"sentenceEndIndex": 47,
"entityMentions": [
{
"name": "marc",
"type": "Entity",
"label": "GPE",
"startIndex": 14,
"endIndex": 18,
"iri": "knox-kb01.srv.aau.dk/marc"
}
]
}
]
}
```
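
As a rough illustration of consuming this response, the sketch below builds the request URL and flattens the nested `sentences` / `entityMentions` structure into a list of tuples. The base URL `http://localhost:8000` and the helper names `build_entitymentions_url` and `flatten_mentions` are assumptions made for this example; they are not part of the documented API.

```python
import json
from urllib.parse import urlencode

# Hypothetical base URL: the documentation does not say where the service runs.
BASE_URL = "http://localhost:8000"


def build_entitymentions_url(article: str, base_url: str = BASE_URL) -> str:
    """Build the GET URL for /entitymentions with the `article` query parameter."""
    return f"{base_url}/entitymentions?{urlencode({'article': article})}"


def flatten_mentions(response: dict) -> list[tuple[str, str, int, int]]:
    """Return (name, label, startIndex, endIndex) for every mention in a response."""
    return [
        (m["name"], m["label"], m["startIndex"], m["endIndex"])
        for sentence in response["sentences"]
        for m in sentence["entityMentions"]
    ]


# The example response shown above, as a JSON string.
raw = """
{
  "fileName": "test.txt",
  "language": "en",
  "sentences": [
    {
      "sentence": "Hi my name is marc",
      "sentenceStartIndex": 0,
      "sentenceEndIndex": 47,
      "entityMentions": [
        {
          "name": "marc",
          "type": "Entity",
          "label": "GPE",
          "startIndex": 14,
          "endIndex": 18,
          "iri": "knox-kb01.srv.aau.dk/marc"
        }
      ]
    }
  ]
}
"""

url = build_entitymentions_url("test.txt")
mentions = flatten_mentions(json.loads(raw))
```

In a real client the JSON string would come from an HTTP GET against `url` rather than from a literal.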


## articlename/entities <sup><span style="color:lightgreen">GET</span></sup>
## /entitymentions/all <sup><span style="color:lightgreen">GET</span></sup>

The `articlename/entities` endpoint is a <span style="color:lightgreen">**GET**</span> endpoint. The articlename in the url has to replaced with a name of an article including .txt. When doing a GET request to the endpoint, a JSON Array is returned containing the currently known entitymentions found in the given article name including their indexes. The format of the JSON array is formatted as follows:
The `/entitymentions/all` endpoint is a <span style="color:lightgreen">**GET**</span> endpoint. A <span style="color:lightgreen">**GET**</span> request to the endpoint returns a JSON array containing all articles together with their currently known entity mentions. The JSON array is formatted as follows:

```JSON
[
{
"name": "ENTITY MENTION",
"startIndex": INT,
"endIndex":INT,
"fileName":"FILENAME.EXTENSION"
},
{
"name": "ENTITY MENTION",
"startIndex": INT,
"endIndex":INT,
"fileName":"FILENAME.EXTENSION"
"fileName": STRING,
"language": STRING,
"sentences": [
{
"sentence": STRING,
"sentenceStartIndex": INT,
"sentenceEndIndex": INT,
"entityMentions": [
{
"name": STRING,
"type": STRING,
"label": STRING,
"startIndex": INT,
"endIndex": INT,
"iri": STRING
}
]
}
]
}
]
```

### Example Output

Here is an example of an output from the endpoint when getting for <span style="color:lightgreen">Artikel.txt/entities</span>. For simplification, only a single file has been processed by the Entity Recognizer and Linker, and just a few of the found entity mentions is shown below:
Here is an example of the output from the endpoint when getting all articles. For simplicity, only two files have been processed by the Entity Recognizer and Linker:

```JSON
[
{
"name": "Martin Kjærs",
"startIndex": 28,
"endIndex": 40,
"fileName": "Artikel.txt"
"fileName": "test.txt",
"language": "en",
"sentences": [
{
"sentence": "Hi my name is marc",
"sentenceStartIndex": 0,
"sentenceEndIndex": 47,
"entityMentions": [
{
"name": "marc",
"type": "Entity",
"label": "PERSON",
"startIndex": 14,
"endIndex": 18,
"iri": "knox-kb01.srv.aau.dk/marc"
}
]
}
]
},
{
"name": "Region Nordjylland",
"startIndex": 100,
"endIndex": 118,
"fileName": "Artikel.txt"
},
{
"name": "Aalborg",
"startIndex": 285,
"endIndex": 292,
"fileName": "Artikel.txt"
"fileName": "test2.txt",
"language": "en",
"sentences": [
{
"sentence": "Hi my name is joe",
"sentenceStartIndex": 0,
"sentenceEndIndex": 47,
"entityMentions": [
{
"name": "Joe",
"type": "Entity",
"label": "PERSON",
"startIndex": 14,
"endIndex": 17,
"iri": "knox-kb01.srv.aau.dk/joe"
}
]
}
]
}
]
```
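
Because `/entitymentions/all` returns one object per article, a client will usually iterate over the array first and over `sentences` second. The following sketch, which assumes the response has already been parsed into Python objects, counts entity labels per file. The function name `count_labels_per_file` is hypothetical.

```python
from collections import Counter


def count_labels_per_file(articles: list[dict]) -> dict[str, Counter]:
    """Map each fileName to a Counter of the entity labels found in that file."""
    counts: dict[str, Counter] = {}
    for article in articles:
        counts[article["fileName"]] = Counter(
            mention["label"]
            for sentence in article["sentences"]
            for mention in sentence["entityMentions"]
        )
    return counts


# A trimmed version of the example response above (only the fields used here).
articles = [
    {
        "fileName": "test.txt",
        "sentences": [{"entityMentions": [{"name": "marc", "label": "PERSON"}]}],
    },
    {
        "fileName": "test2.txt",
        "sentences": [{"entityMentions": [{"name": "Joe", "label": "PERSON"}]}],
    },
]

label_counts = count_labels_per_file(articles)
```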



## detectlanguage <sup><span style="color:lightgreen">POST</span></sup>
## /detectlanguage <sup><span style="color:orange">POST</span></sup>
This endpoint expects the given request body to contain some input text and returns its language. It uses the [langdetect](https://pypi.org/project/langdetect/) library.

This endpoint will check the language in the given text.
Send the text in the request body and it will return the language.
The given text has to be longer than 4 characters.
The function will return the language in 2 characters.
> **_NOTE:_** The function will return the language as an [ISO 639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) code.
### Example
<span style="color:lightgreen">Request body: </span> The man was walking down the street
<span style="color:lightgreen">Response: </span> en
<span style="color:orange">Request body: </span> "The man was walking down the street"\
<span style="color:orange">Response: </span> en


### Constraints
- The given text has to be longer than 4 characters.

### Supported languages
`af, ar, bg, bn, ca, cs, cy, da, de, el, en, es, et, fa, fi, fr, gu, he,
hi, hr, hu, id, it, ja, kn, ko, lt, lv, mk, ml, mr, ne, nl, no, pa, pl,
pt, ro, ru, sk, sl, so, sq, sv, sw, ta, te, th, tl, tr, uk, ur, vi, zh-cn, zh-tw`

> **_NOTE:_** See [List of ISO 639-1 codes](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) for more information.
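
A client can check the length constraint before sending anything. The sketch below only builds the request object and does not contact a server. It assumes the text is sent as a plain UTF-8 body and that the service is reachable at the hypothetical base URL `http://localhost:8000`; neither detail is confirmed by this documentation, and `build_detectlanguage_request` is an illustrative name.

```python
import urllib.request

# Hypothetical base URL: the documentation does not say where the service runs.
BASE_URL = "http://localhost:8000"


def build_detectlanguage_request(
    text: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a POST request for /detectlanguage, enforcing the length constraint."""
    if len(text) <= 4:
        raise ValueError("The given text has to be longer than 4 characters.")
    return urllib.request.Request(
        url=f"{base_url}/detectlanguage",
        data=text.encode("utf-8"),  # assumption: plain-text request body
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


request = build_detectlanguage_request("The man was walking down the street")
```

Passing `request` to `urllib.request.urlopen` would send it; for the example above the documented response is `en`.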
60 changes: 60 additions & 0 deletions docs/directorywatcher.md
@@ -0,0 +1,60 @@
# [Directory Watcher](https://github.com/Knox-AAU/PreProcessingLayer_EntityRecognitionAndLinking/blob/main/lib/DirectoryWatcher.py)
The pipeline starts when a new file is placed in a watched folder by pipeline part A. The Directory Watcher's responsibility is to call a callback function when a new file is created in the watched folder.

## Features
- [watchdog](https://pypi.org/project/watchdog/) for file events
- Async callback support
- [Threading](https://docs.python.org/3/library/threading.html)

## Overview

The `DirectoryWatcher` provides a simple way to monitor a specified directory for file creation events and execute asynchronous callbacks in response. It uses the [watchdog](https://pypi.org/project/watchdog/) library for filesystem monitoring and integrates with [asyncio](https://docs.python.org/3/library/asyncio.html) for handling asynchronous tasks. Furthermore, the `DirectoryWatcher` uses [threading](https://docs.python.org/3/library/threading.html).

> **_NOTE:_** [Threading](https://docs.python.org/3/library/threading.html) is used to avoid blocking the main thread's code from executing.
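
To make the interplay between the background thread and the async callback concrete, here is a standard-library-only sketch. It is not the `DirectoryWatcher` implementation: it replaces watchdog with simple polling, and running each callback through `asyncio.run` inside the worker thread is an assumption about one possible design. The class name `PollingWatcher`, the `interval` parameter and the `demo` helper are all hypothetical.

```python
import asyncio
import os
import tempfile
import threading
from typing import Awaitable, Callable


class PollingWatcher:
    """Illustrative stand-in: polls a directory instead of using watchdog."""

    def __init__(
        self,
        directory: str,
        async_callback: Callable[[str], Awaitable[None]],
        interval: float = 0.1,
    ) -> None:
        self.directory = directory
        self.async_callback = async_callback
        self.interval = interval
        self._stop = threading.Event()
        self._seen: set[str] = set()
        self._thread: threading.Thread | None = None

    def _run(self) -> None:
        while not self._stop.is_set():
            current = set(os.listdir(self.directory))
            for name in sorted(current - self._seen):
                # Run the async callback to completion inside the worker thread.
                asyncio.run(self.async_callback(os.path.join(self.directory, name)))
            self._seen = current
            self._stop.wait(self.interval)

    def start_watching(self) -> threading.Thread:
        # Snapshot existing files first so that only new files trigger callbacks.
        self._seen = set(os.listdir(self.directory))
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()
        return self._thread

    def stop_watching(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()


def demo() -> list[str]:
    """Create one file in a temporary directory and return the names reported."""
    created: list[str] = []
    done = threading.Event()

    async def on_created(path: str) -> None:
        created.append(os.path.basename(path))
        done.set()

    with tempfile.TemporaryDirectory() as directory:
        watcher = PollingWatcher(directory, on_created, interval=0.05)
        watcher.start_watching()
        with open(os.path.join(directory, "article.txt"), "w") as handle:
            handle.write("Hi my name is marc")
        done.wait(timeout=5)
        watcher.stop_watching()
    return created
```

The caller's thread stays free after `start_watching()` returns, which is the point of the note above about not blocking the main thread.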

## Example usage
```python
# Importing
from fastapi import FastAPI

from lib.DirectoryWatcher import DirectoryWatcher

app = FastAPI()

dirPath = "some/path/to/a/directory"


# Setup: async callback invoked with the path of each newly created file
async def newFileCreated(file_path: str):
    print("New file created in " + file_path)


dirWatcher = DirectoryWatcher(
    directory=dirPath, async_callback=newFileCreated
)


# A FastAPI event function running on startup
@app.on_event("startup")
async def startEvent():
    dirWatcher.start_watching()


# A FastAPI event function running on shutdown
@app.on_event("shutdown")
def shutdown_event():
    dirWatcher.stop_watching()
```

> **_NOTE:_** The FastAPI event functions are not needed to use the `DirectoryWatcher`.

## Methods
```python
def __init__(self, directory, async_callback):
```
### Parameters:
- **directory** (str): A path to the directory you want to watch, e.g. `some/path/to/a/directory`
- **async_callback** (function): An async callback function to be called when a new file is created in the **directory**. This function should accept a single parameter, which is the path of the created file.

```python
def start_watching(self) -> threading.Thread:
```

```python
def stop_watching(self):
```
