This algorithm is part of VANTAGE6. A Docker build of this algorithm can be obtained from harbor.vantage6.ai/algorithms/dsummary
Algorithm inspired by the summary function in R. It reports the Min, Q1, Mean, Median, Q3, Max and number of NaN values per column from each node.
On top of the functionality provided by IKNL's version of this algorithm, it counts the NaNs in non-numeric columns. It also automatically detects which columns are present and whether they are numeric or categorical.
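To make the reported values concrete, here is a minimal sketch of a per-column summary using only the Python standard library. It is not the code that runs inside the algorithm image. The helper names (is_missing, is_numeric, summarize_column, summarize), the table layout (a dict mapping column names to lists) and the quartile rule (linear interpolation via statistics.quantiles with method="inclusive") are all assumptions made for illustration, and the real algorithm may detect types or compute quartiles differently.

```python
import math
import statistics


def is_missing(value):
    """Treat None, empty strings and float NaN as missing (assumption)."""
    if value is None or value == "":
        return True
    return isinstance(value, float) and math.isnan(value)


def is_numeric(values):
    """A column counts as numeric when every non-missing value is an int or float."""
    present = [v for v in values if not is_missing(v)]
    return bool(present) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in present
    )


def summarize_column(values):
    """Return the summary for one column (hypothetical helper)."""
    present = [v for v in values if not is_missing(v)]
    nan_count = len(values) - len(present)
    if not is_numeric(values):
        # Non-numeric columns: only the missing-value count is computed.
        return {"type": "categorical", "nan": nan_count}
    if len(present) < 2:
        # statistics.quantiles may reject a single data point.
        q1 = q3 = present[0]
    else:
        q1, _, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "type": "numeric",
        "min": min(present),
        "q1": q1,
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "q3": q3,
        "max": max(present),
        "nan": nan_count,
    }


def summarize(table):
    """Summarize every column of a dict mapping column name to a list of values."""
    return {name: summarize_column(values) for name, values in table.items()}
```

Calling summarize({"age": [31, 45, float("nan")], "sex": ["m", None, "f"]}) returns one dictionary per column: the full set of statistics for age and only the missing-value count for sex.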
This algorithm can also be used for SPARQL queries. Building it for that purpose is as simple as commenting out the standard Docker wrapper and uncommenting the SPARQL one. The query can then be supplied under kwargs['query'].
🚨 Categorical column with only one category
🚨 Min and Max for each column are reported
🚨 Column names are returned
✔️ If column names do not match, nothing else is reported
✔️ If the dataset has fewer than 10 rows, no statistical analysis is performed
✔️ Only the statistical results Min, Q1, Mean, Median, Q3, Max and number of NaN values per column are reported.
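The safeguards listed above can be sketched as two small checks. This is an illustration under stated assumptions, not the algorithm's own code: may_report, filter_report, MIN_ROWS and ALLOWED_KEYS are hypothetical names, and the sketch assumes column names are compared as a set against an expected list.

```python
MIN_ROWS = 10  # threshold stated in the list above

# Statistics that the list above allows to leave a node.
ALLOWED_KEYS = {"min", "q1", "mean", "median", "q3", "max", "nan"}


def may_report(expected_columns, actual_columns, n_rows):
    """Return (ok, reason). Hypothetical helper for illustration only."""
    # Assumption: names are compared as sets against an expected list.
    if set(expected_columns) != set(actual_columns):
        return False, "column names do not match"
    if n_rows < MIN_ROWS:
        return False, "fewer than 10 rows"
    return True, ""


def filter_report(column_stats):
    """Drop every key that is not one of the allowed summary statistics."""
    return {k: v for k, v in column_stats.items() if k in ALLOWED_KEYS}
```

A node would call may_report first and only compute and filter the statistics when it returns ok.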
import time
from pathlib import Path

from vantage6.client import Client

# Create, authenticate and set up the client
client = Client("http://127.0.0.1", 5000, "")
client.authenticate("frank@iknl.nl", "password")
client.setup_encryption(None)

# Define algorithm input
# include the columns you want to summarize,
# and specify if they are categorical ("category" or "c") or numeric ("numeric" or "n")
input_ = {
    "master": True,
    "method": "master",
    "args": [],
    "kwargs": {
        "query": "SELECT * WHERE {?s ?p ?o}"
    }
}

# Send the task to the central server
task = client.task.create(
    name="algo_testing-summary",
    image="harbor2.vantage6.ai/testing/summary:latest",
    input=input_,
    collaboration=1,
    organizations=[2],
    description=""
)

# Retrieve the results, polling every 3 seconds until the task is complete
print("Waiting for results")
task_id = task.get("id")
task_info = client.task.get(task_id)
while not task_info.get("complete"):
    task_info = client.task.get(task_id, include_results=True)
    print("Waiting for results")
    time.sleep(3)

print("Results are ready!")
result_info = client.result.get(task_info.get("results")[0].get("id"))
result = result_info["result"]
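The polling loop above waits indefinitely if the task never completes. A bounded variant could look like the sketch below. It assumes nothing about the vantage6 client beyond a callable that returns the task dictionary; wait_until_complete and its parameters are hypothetical names introduced here.

```python
import time


def wait_until_complete(fetch_task, interval=3.0, timeout=300.0, sleep=time.sleep):
    """Poll fetch_task() until the returned dict has a truthy "complete" key.

    fetch_task is a hypothetical zero-argument callable, for example
    lambda: client.task.get(task_id, include_results=True).
    """
    deadline = time.monotonic() + timeout
    while True:
        task_info = fetch_task()
        if task_info.get("complete"):
            return task_info
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task not complete after {timeout} seconds")
        sleep(interval)
```

The sleep parameter exists only so the wait can be replaced in tests; in normal use the default time.sleep is enough.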
You need to have Docker installed.
To build (assuming you are in the project directory):
docker build -t harbor.vantage6.ai/algorithms/summary .
To test or run the algorithm locally, the folder local is included in the repository. The following command mounts the files in that folder and sets the Docker environment variable DATABASE_URI.
docker run -e DATABASE_URI=/app/database.csv -v .\local\input.txt:/app/input.txt -v .\local\output.txt:/app/output.txt -v .\local\database.csv:/app/database.csv harbor.vantage6.ai/algorithms/summary