Datengeist is a Streamlit application for understanding unstructured data by visualizing its components. It works with .csv files and offers the following key features:
- Categorization of features
- Visualization of distributions
- Convenient handling of missing data
- Tools for feature comparison
To run Datengeist, install it with pip and start it:
$ pip install datengeist
$ datengeist start
Alternatively, create a virtual environment first, then install and run Datengeist inside it (recommended):
$ python3 -m venv datengeist_env
$ source datengeist_env/bin/activate
$ pip install datengeist
$ datengeist start
Sample the Dataset is where you load your data, draw a sample from it, and get a first overview.
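To illustrate what drawing a sample from a .csv file means, here is a minimal sketch using only the Python standard library. It is not Datengeist's own code: the `sample_rows` helper, the `CSV_TEXT` data, and the fixed seed are assumptions made for this example.

```python
import csv
import io
import random


def sample_rows(csv_text, k, seed=0):
    """Return the header and k randomly chosen data rows from a CSV string."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = list(reader)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return header, rng.sample(rows, min(k, len(rows)))


# Hypothetical data: 100 rows with an id and its square.
CSV_TEXT = "id,value\n" + "\n".join(f"{i},{i * i}" for i in range(100))

header, sample = sample_rows(CSV_TEXT, k=5)
print(header)
for row in sample:
    print(row)
```

Working on a sample keeps the first overview fast when the full file is large.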
General Info is where you assign your features to their corresponding categories and view the missing values in each feature.
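The kind of summary described here can be sketched in a few lines of standard-library Python. This is an illustration only, not Datengeist's implementation: the `summarize` helper, the two category names, the rule for deciding that a column is numerical, and the sample data are all assumptions.

```python
import csv
import io

# Hypothetical sample data; empty cells stand for missing values.
SAMPLE_CSV = """age,city,income
34,Berlin,52000
,Hamburg,61000
29,,48000
41,Berlin,
"""


def _is_number(value):
    """Return True if the string can be parsed as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def summarize(csv_text):
    """Return {column: (category, missing_percent)} for a CSV string."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows]
        present = [v for v in values if v != ""]
        # A column counts as numerical only if every present value parses as a float.
        is_numeric = bool(present) and all(_is_number(v) for v in present)
        category = "numerical" if is_numeric else "categorical"
        missing_percent = 100.0 * (len(values) - len(present)) / len(values)
        summary[column] = (category, missing_percent)
    return summary


summary = summarize(SAMPLE_CSV)
for column, (category, missing) in summary.items():
    print(f"{column}: {category}, {missing:.0f}% missing")
```

An automatic guess like this is only a starting point, since a numeric-looking column such as a postal code may really be categorical.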
Feature Info is where you inspect individual features more closely, including their distributions and the percentage of missing values.
Relate Features is where you view the correlations between your features and compare them using box plots.
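As a rough illustration of the two ideas on this page, the sketch below computes a Pearson correlation between two numerical features and the per-group median that a box plot would draw as its centre line. It uses only the standard library and made-up data; the `pearson` helper and all variable names are assumptions, and Datengeist may compute these quantities differently.

```python
import math
from statistics import median


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical numerical features.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 6, 8, 10]
r = pearson(hours, score)
print(f"correlation: {r:.2f}")

# Hypothetical numerical feature grouped by a categorical one,
# as a box plot would show it (one box per group).
income_by_city = {"Berlin": [52, 48, 50], "Hamburg": [61, 65]}
medians = {city: median(values) for city, values in income_by_city.items()}
print(medians)
```

A correlation near 1 or -1 indicates a strong linear relationship, while clearly different medians across groups suggest that the categorical feature is related to the numerical one.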
Datengeist is licensed under Apache 2.0.