Commit

Merge remote-tracking branch 'origin' into 86-send-output-to-group-cs…
…-endpoint

merge
FredTheNoob committed Dec 7, 2023
2 parents c8638f6 + 1bc3460 commit 4fcb3c9
Showing 2 changed files with 131 additions and 72 deletions.
193 changes: 123 additions & 70 deletions docs/api.md
@@ -1,112 +1,165 @@
# API

## /entitymentions <sup><span style="color:lightgreen">GET</span></sup>
### Parameters
| Parameter | Type | Description |
|-----------|--------|------------------|
| `article` | STRING | The article path |

The `/entitymentions` endpoint is a <span style="color:lightgreen">**GET**</span> endpoint. A <span style="color:lightgreen">**GET**</span> request to the endpoint returns a JSON array containing all the currently known entity mentions, their indexes and the file they originate from. The JSON array is formatted as follows:
The `/entitymentions` endpoint is a <span style="color:lightgreen">**GET**</span> endpoint. A <span style="color:lightgreen">**GET**</span> request to the endpoint returns a JSON object for the requested article, containing its currently known entity mentions with their indexes, type, label, IRI and the file they originate from. The JSON object is formatted as follows:

```JSON
[
{
"name": "ENTITY MENTION",
"startIndex": INT,
"endIndex":INT,
"fileName":"FILENAME.EXTENSION"
},
{
"name": "ENTITY MENTION",
"startIndex": INT,
"endIndex":INT,
"fileName":"FILENAME.EXTENSION"
}
]
{
"fileName": STRING,
"language": STRING,
"sentences": [
{
"sentence": STRING,
"sentenceStartIndex": INT,
"sentenceEndIndex": INT,
"entityMentions": [
{
"name": STRING,
"type": STRING,
"label": STRING,
"startIndex": INT,
"endIndex": INT,
"iri": STRING
}
]
}
]
}
```

### Example Output

Here is an example of an output from the endpoint. For simplification, only a single file has been processed by the Entity Recognizer and Linker, and just a few of the found entity mentions are shown below:
Here is an example of an output from the endpoint `/entitymentions?article=test.txt`. For simplification, only a single file has been processed by the Entity Recognizer and Linker:

```JSON
[
{
"name": "Martin Kjærs",
"startIndex": 28,
"endIndex": 40,
"fileName": "Artikel.txt"
},
{
"name": "Region Nordjylland",
"startIndex": 100,
"endIndex": 118,
"fileName": "Artikel.txt"
},
{
"name": "Aalborg",
"startIndex": 285,
"endIndex": 292,
"fileName": "Artikel.txt"
}
]
{
"fileName": "test.txt",
"language": "en",
"sentences": [
{
"sentence": "Hi my name is marc",
"sentenceStartIndex": 0,
"sentenceEndIndex": 47,
"entityMentions": [
{
"name": "marc",
"type": "Entity",
"label": "GPE",
"startIndex": 14,
"endIndex": 18,
"iri": "knox-kb01.srv.aau.dk/marc"
}
]
}
]
}
```
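
As a rough illustration of how a client might consume this response, the sketch below flattens the nested `sentences` / `entityMentions` structure into one row per mention. It works on the example payload above rather than on a live request, and the helper name `flatten_mentions` is hypothetical, not part of the API.

```python
import json

# Example payload copied from the documentation above.
EXAMPLE_RESPONSE = """
{
  "fileName": "test.txt",
  "language": "en",
  "sentences": [
    {
      "sentence": "Hi my name is marc",
      "sentenceStartIndex": 0,
      "sentenceEndIndex": 47,
      "entityMentions": [
        {
          "name": "marc",
          "type": "Entity",
          "label": "GPE",
          "startIndex": 14,
          "endIndex": 18,
          "iri": "knox-kb01.srv.aau.dk/marc"
        }
      ]
    }
  ]
}
"""


def flatten_mentions(article):
    """Return one flat dict per entity mention (hypothetical helper)."""
    rows = []
    for sentence in article.get("sentences", []):
        for mention in sentence.get("entityMentions", []):
            rows.append(
                {
                    "fileName": article["fileName"],
                    "sentence": sentence["sentence"],
                    "name": mention["name"],
                    "label": mention["label"],
                    "span": (mention["startIndex"], mention["endIndex"]),
                    "iri": mention["iri"],
                }
            )
    return rows


rows = flatten_mentions(json.loads(EXAMPLE_RESPONSE))
for row in rows:
    print(row["name"], row["label"], row["span"], row["iri"])
```

Using `.get(..., [])` for the two list fields is a defensive choice so that an article without sentences or mentions yields an empty result instead of raising.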


## articlename/entities <sup><span style="color:lightgreen">GET</span></sup>
## /entitymentions/all <sup><span style="color:lightgreen">GET</span></sup>

The `articlename/entities` endpoint is a <span style="color:lightgreen">**GET**</span> endpoint. The articlename in the URL has to be replaced with the name of an article, including .txt. A GET request to the endpoint returns a JSON array containing the currently known entity mentions found in the given article, including their indexes. The JSON array is formatted as follows:
The `/entitymentions/all` endpoint is a <span style="color:lightgreen">**GET**</span> endpoint. A <span style="color:lightgreen">**GET**</span> request to the endpoint returns a JSON array containing all articles with their currently known entity mentions. The JSON array is formatted as follows:

```JSON
[
{
"name": "ENTITY MENTION",
"startIndex": INT,
"endIndex":INT,
"fileName":"FILENAME.EXTENSION"
},
{
"name": "ENTITY MENTION",
"startIndex": INT,
"endIndex":INT,
"fileName":"FILENAME.EXTENSION"
"fileName": STRING,
"language": STRING,
"sentences": [
{
"sentence": STRING,
"sentenceStartIndex": INT,
"sentenceEndIndex": INT,
"entityMentions": [
{
"name": STRING,
"type": STRING,
"label": STRING,
"startIndex": INT,
"endIndex": INT,
"iri": STRING
}
]
}
]
}
]
```

### Example Output

Here is an example of an output from the endpoint when requesting <span style="color:lightgreen">Artikel.txt/entities</span>. For simplification, only a single file has been processed by the Entity Recognizer and Linker, and just a few of the found entity mentions are shown below:
Here is an example of an output from the endpoint when getting all articles. For simplification, only two files have been processed by the Entity Recognizer and Linker:

```JSON
[
{
"name": "Martin Kjærs",
"startIndex": 28,
"endIndex": 40,
"fileName": "Artikel.txt"
"fileName": "test.txt",
"language": "en",
"sentences": [
{
"sentence": "Hi my name is marc",
"sentenceStartIndex": 0,
"sentenceEndIndex": 47,
"entityMentions": [
{
"name": "marc",
"type": "Entity",
"label": "PERSON",
"startIndex": 14,
"endIndex": 18,
"iri": "knox-kb01.srv.aau.dk/marc"
}
]
}
]
},
{
"name": "Region Nordjylland",
"startIndex": 100,
"endIndex": 118,
"fileName": "Artikel.txt"
},
{
"name": "Aalborg",
"startIndex": 285,
"endIndex": 292,
"fileName": "Artikel.txt"
"fileName": "test2.txt",
"language": "en",
"sentences": [
{
"sentence": "Hi my name is joe",
"sentenceStartIndex": 0,
"sentenceEndIndex": 47,
"entityMentions": [
{
"name": "Joe",
"type": "Entity",
"label": "PERSON",
"startIndex": 14,
"endIndex": 17,
"iri": "knox-kb01.srv.aau.dk/joe"
}
]
}
]
}
]
```
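
Because `/entitymentions/all` returns a list of articles, a client will often want to index the result by file. The sketch below is one possible way to do that, collecting the IRIs found in each article. The data is a trimmed copy of the example output above, and `iris_by_file` is a hypothetical helper name, not something the API provides.

```python
from collections import defaultdict

# Trimmed copy of the example output above (index fields omitted for brevity).
ARTICLES = [
    {
        "fileName": "test.txt",
        "language": "en",
        "sentences": [
            {
                "sentence": "Hi my name is marc",
                "entityMentions": [
                    {"name": "marc", "label": "PERSON",
                     "iri": "knox-kb01.srv.aau.dk/marc"}
                ],
            }
        ],
    },
    {
        "fileName": "test2.txt",
        "language": "en",
        "sentences": [
            {
                "sentence": "Hi my name is joe",
                "entityMentions": [
                    {"name": "Joe", "label": "PERSON",
                     "iri": "knox-kb01.srv.aau.dk/joe"}
                ],
            }
        ],
    },
]


def iris_by_file(articles):
    """Map each fileName to the list of IRIs mentioned in that article."""
    result = defaultdict(list)
    for article in articles:
        for sentence in article["sentences"]:
            for mention in sentence["entityMentions"]:
                result[article["fileName"]].append(mention["iri"])
    return dict(result)


index = iris_by_file(ARTICLES)
print(index)
```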



## detectlanguage <sup><span style="color:lightgreen">POST</span></sup>
## /detectlanguage <sup><span style="color:orange">POST</span></sup>
This endpoint expects the given request body to contain some input text and returns its language. It uses the [langdetect](https://pypi.org/project/langdetect/) library.

This endpoint will check the language in the given text.
Send the text in the request body and it will return the language.
The given text has to be longer than 4 characters.
The function will return the language as a 2-character code.
> **_NOTE:_** The function will return the language as an [ISO 639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) code.
### Example
<span style="color:lightgreen">Request body: </span> The man was walking down the street
<span style="color:lightgreen">Response: </span> en
<span style="color:orange">Request body: </span> "The man was walking down the street"\
<span style="color:orange">Response: </span> en


### Constraints
- The given text has to be longer than 4 characters.

### Supported languages
`af, ar, bg, bn, ca, cs, cy, da, de, el, en, es, et, fa, fi, fr, gu, he,
hi, hr, hu, id, it, ja, kn, ko, lt, lv, mk, ml, mr, ne, nl, no, pa, pl,
pt, ro, ru, sk, sl, so, sq, sv, sw, ta, te, th, tl, tr, uk, ur, vi, zh-cn, zh-tw`

> **_NOTE:_** see [List of ISO 639-1 codes](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) for more information
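
A minimal sketch of how a client could prepare a call to `/detectlanguage` while respecting the length constraint above. Several things here are assumptions rather than documented behaviour: the base URL `http://localhost:8000`, the `text/plain` content type, and the helper name `build_detect_request`. The request is only built, not sent.

```python
import urllib.request

# Assumption: the service is reachable here; the docs do not state a base URL.
BASE_URL = "http://localhost:8000"


def build_detect_request(text, base_url=BASE_URL):
    """Build (but do not send) a POST request for /detectlanguage.

    Hypothetical helper: rejects input that violates the documented
    constraint that the text has to be longer than 4 characters.
    """
    if len(text) <= 4:
        raise ValueError("text has to be longer than 4 characters")
    return urllib.request.Request(
        url=f"{base_url}/detectlanguage",
        data=text.encode("utf-8"),
        # Assumption: the body is sent as plain UTF-8 text.
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


request = build_detect_request("The man was walking down the street")
print(request.get_method(), request.full_url)
```

Sending the prepared request would then be a matter of passing it to `urllib.request.urlopen` and decoding the response body, which should contain the ISO 639-1 code.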
10 changes: 8 additions & 2 deletions docs/pypi.md
@@ -35,7 +35,13 @@ Uploading requires no authentication as the repository is only available when on
When the domain is eventually up, the following twine command is also applicable

```BASH
twine upload --repository-url http://pypi.knox.cs.aau.dk --sign PACKAGENAME.whl
twine upload --repository-url http://pypi.knox.cs.aau.dk:443 --sign PACKAGENAME.whl
```

If you are working from another campus, like CREATE, the following command should also work:

```BASH
twine upload --repository-url http://knox-web01.srv.aau.dk:443 --sign PACKAGENAME.whl
```

## Installing through the repository
@@ -58,7 +64,7 @@ pip3 install --index-url http://localhost:8081/simple PACKAGE-NAME
If the domain is available simply replace the localhost:8081 with the domain:

```BASH
pip3 install --index-url http://pypi.knox.cs.aau.dk/simple PACKAGE-NAME
pip3 install --index-url http://knox-web01.srv.aau.dk:443/simple PACKAGE-NAME
```

## Creating a whl package from Spacy