KB expansion #133

Merged
merged 30 commits into from
Jul 24, 2022
4c271e8
Create images
divsan93 Jun 8, 2022
9ddda16
Delete images
divsan93 Jun 8, 2022
0e0021c
Adding the script for KG augmentation
divsan93 Jul 6, 2022
6700789
latest database file
divsan93 Jul 6, 2022
ef18b2c
Adding new images URL's
divsan93 Jul 6, 2022
aa84f29
Adding code for image search and updating the readme
divsan93 Jul 6, 2022
f1eaaa5
fixed NULL value for db
divsan93 Jul 12, 2022
b4a3e89
integration code
lamwassi Jul 13, 2022
567afff
removed print statements
lamwassi Jul 13, 2022
45a79ec
latest image search. Fixed False/True for docker_images
lamwassi Jul 14, 2022
f7766d5
changed Os to OS
lamwassi Jul 14, 2022
06a5652
Delete kg_utils/test directory
lamwassi Jul 14, 2022
307878e
Delete kg_utils/src directory
lamwassi Jul 14, 2022
32e26a5
Removed last column(Other operator)
lamwassi Jul 14, 2022
5c6445a
fixed columns
lamwassi Jul 14, 2022
17dee25
Removed 'Other Operators' from operator_images table
lamwassi Jul 19, 2022
700af55
Removed 'Other operator' column operator_images table
lamwassi Jul 19, 2022
0e9f345
added comments
lamwassi Jul 19, 2022
2d4029e
latest
lamwassi Jul 19, 2022
56814e3
Added test_csv_columns , test_filter_entity
lamwassi Jul 19, 2022
c41ce8f
latest search. Removed 'Other operator' from operator_images.csv
lamwassi Jul 19, 2022
8f7ca1f
updated README
lamwassi Jul 20, 2022
be064b8
Changed default OS to 576(Linux|*)
lamwassi Jul 21, 2022
2c18251
Adding test case for KG augmentation
divsan93 Jul 20, 2022
6647126
Creating a requirements.txt in the root folder
divsan93 Jul 21, 2022
00bcecd
latest
lamwassi Jul 21, 2022
7da7c91
Removed comments
lamwassi Jul 21, 2022
6dcd18b
Updated Readme
divsan93 Jul 21, 2022
c669294
[kgaug test] fixing the test output
divsan93 Jul 22, 2022
53b0756
Adding word2number package
divsan93 Jul 22, 2022
5 changes: 5 additions & 0 deletions .devcontainer/Dockerfile
Expand Up @@ -23,6 +23,11 @@ RUN groupadd --gid $USER_GID $USERNAME \
WORKDIR /app
RUN python -m pip install --upgrade pip setuptools wheel

COPY service/requirements.txt .

RUN pip --no-cache-dir install -U pip wheel && \
pip --no-cache-dir install -r requirements.txt

# Enable color terminal for docker exec bash
ENV TERM=xterm-256color

5 changes: 5 additions & 0 deletions .devcontainer/devcontainer.json
Expand Up @@ -9,6 +9,11 @@
"remoteEnv": {
"FLASK_ENV": "development"
},

"features": {
"docker-from-docker": "latest",
"podman-from-podman": "latest"
} ,
"extensions": [
"VisualStudioExptTeam.vscodeintellicode",
"alexkrechik.cucumberautocomplete",
5 changes: 3 additions & 2 deletions Dockerfile
Expand Up @@ -27,13 +27,14 @@ COPY ./service /app/service
COPY ./kg /app/kg
COPY ./config /app/config
COPY ./entity_standardizer /app/entity_standardizer
COPY ./requirements.txt /app/requirements.txt
RUN python -m pip install --upgrade pip wheel build setuptools; \
pip install -r entity_standardizer/requirements.txt; \
cd entity_standardizer; python -m build; pip install dist/entity_standardizer_tca-1.0-py3-none-any.whl; cd ..; \
pip install -r service/requirements.txt; \
pip install -r /app/requirements.txt; \
python benchmarks/generate_data.py; \
python benchmarks/run_models.py;

RUN chown -R 1001:0 ./

# Become a non-root user again
18 changes: 9 additions & 9 deletions README.md
Expand Up @@ -31,19 +31,19 @@ For OpenShift, TCA generates the following images.

## TCA Pipeline

<img width="1000" alt="Screen Shot 2021-07-29 at 4 10 10 PM" src="https://user-images.githubusercontent.com/8302569/127559151-bc9f3176-fcc4-4032-a0b7-ba1a29212b5b.png">
<img width="1000" alt="TCA Pipeline" src=https://github.com/konveyor/tackle-container-advisor/blob/main/images/tca_pipeline.png>

The pipeline ingests raw inputs from clients data and standardizes the data to generate named entities and versions. For standardizing or normalizing raw inputs we use a tf-idf similarity based approach. To find container images we represent images in terms of named entities as well. The normalized representation helps to match legacy applications with container images to suggest the best possible recommendations.

## Setting up your environment

Requires Python >= 3.6 environment. You cannot run this code without having a proper
Python environment first. We recommend that you follow the instructions
Requires Python >= 3.6 environment. You cannot run this code without having a proper
Python environment first. We recommend that you follow the instructions
in the [Developer's Guide](docs/development.md) before proceeding further.

## Running TCA as a service

There are 4 options for deploying TCA as a service.
There are 4 options for deploying TCA as a service.

1. Install the service requirements and start the service from command line.

Expand All @@ -52,12 +52,12 @@ Requires *gunicorn* standalone installation on your system.
bash setup.sh
gunicorn --workers=2 --threads=500 --timeout 300 service:app
OR
waitress-serve --listen=*:8000 service:app
waitress-serve --listen=*:8000 service:app
```

2. Running the service as a container.
2. Running the service as a container.

Using a bash script.
Using a bash script.
```
bash run.sh
```
Expand All @@ -82,7 +82,7 @@ bash deploy.sh

## Run a performance test for TCA service
A performance test measures the response time of TCA service under
various load conditions. Before running
various load conditions. Before running
performance test, update *config/test.ini* with the hostname
and port where TCA service has been deployed

Expand All @@ -101,7 +101,7 @@ Please perform the following steps.
version = <new_db>

3. Modify the *setup.sh* and *clean.sh* scripts to reflect the version accordingly.

version=<new_db>

4. Re-run *setup.sh* and then deploy the service.
16 changes: 16 additions & 0 deletions config/kg.ini
Expand Up @@ -24,3 +24,19 @@ inverted_operatorimageKG = inverted_operatorimageKG.json
inverted_compatibilityKG = inverted_compatibilityKG.json
compatibilityKG = compatibilityKG.json
compatibilityOSKG = compatibilityOSKG.json

[database]
database_path = /app/db/1.0.4.db

[quay]
max_pages = 2
page_increments = 1
find_path = find/repositories
quay_api = https://quay.io/api/v1/
top_popular_images = 5

[dockerhub]

top_relevant = 40

[operators]
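
As a rough illustration of how the sections added above could be consumed, the sketch below parses them with Python's standard `configparser`. The INI text mirrors the values in this diff; the `load_kg_config` helper name is hypothetical and is not taken from the TCA code base, and joining `quay_api` with `find_path` is only an assumption about how the two keys are meant to be combined.

```python
import configparser

# Values copied from the sections added to config/kg.ini in this diff.
KG_INI_SNIPPET = """
[database]
database_path = /app/db/1.0.4.db

[quay]
max_pages = 2
page_increments = 1
find_path = find/repositories
quay_api = https://quay.io/api/v1/
top_popular_images = 5

[dockerhub]
top_relevant = 40

[operators]
"""


def load_kg_config(text):
    """Parse INI text and return a ConfigParser (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser


config = load_kg_config(KG_INI_SNIPPET)
db_path = config.get("database", "database_path")
max_pages = config.getint("quay", "max_pages")
# Assumption: the search endpoint is the API root followed by find_path.
search_url = config.get("quay", "quay_api") + config.get("quay", "find_path")
top_relevant = config.getint("dockerhub", "top_relevant")
print(db_path, max_pages, search_url, top_relevant)
```

Numeric settings are read with `getint` so that a malformed value fails early instead of surfacing later as a string comparison bug.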
128 changes: 64 additions & 64 deletions db/1.0.3.sql

Large diffs are not rendered by default.

26,552 changes: 12,917 additions & 13,635 deletions db/1.0.4.sql

Large diffs are not rendered by default.

24 changes: 12 additions & 12 deletions db/README.md
Expand Up @@ -11,7 +11,7 @@

We represent the knowledge base in terms of a database. Below we provide an entity-relationship diagram.

<img width="800" alt="ER_DIAGRAM" src="https://user-images.githubusercontent.com/85893516/139103279-3020f7cd-6304-40e4-908b-f3349bf1b440.png">
<img width="800" alt="ER_DIAGRAM" src="https://github.com/konveyor/tackle-container-advisor/blob/main/images/ER_diagram.png">


### Setting up TCA's Knowledge Base
Expand All @@ -35,7 +35,7 @@ We represent the knowledge base in terms of a database. Below we provide an enti

##### This contains all the entity types present in our taxonomy. Under each entity type we define entities. For example, the OS entity type contains the Linux|RedHat Linux as an entity.

<img width="200" alt="Screen Shot 2021-06-08 at 1 48 30 PM" src="https://media.github.ibm.com/user/26986/files/4fbb7400-c861-11eb-837b-371341c7e0a5">
<img width="200" alt="Entity Types" src="https://github.com/konveyor/tackle-container-advisor/blob/main/images/entity_types.png">

##### A new entry can be added as

Expand All @@ -46,7 +46,7 @@ We represent the knowledge base in terms of a database. Below we provide an enti

##### This contains all the named entities along with their types and mappings to Wikidata or DBPedia. The scores are obtained based on an entity linking algorithm.

<img width="800" alt="Screen Shot 2021-06-08 at 1 50 10 PM" src="https://media.github.ibm.com/user/26986/files/5cd86300-c861-11eb-8515-b39be9d5480a">
<img width="800" alt="Entities" src="https://github.com/konveyor/tackle-container-advisor/blob/main/images/entities.png">

##### A new entry can be added as

Expand All @@ -65,7 +65,7 @@ We represent the knowledge base in terms of a database. Below we provide an enti

##### This contains mappings of raw mentions with their entities. Each entity could have multiple mentions. For example, Apache Tomcat can be called as Tomcat or Apache Tomcat.

<img width="500" alt="Screen Shot 2021-06-08 at 1 50 25 PM" src="https://media.github.ibm.com/user/26986/files/695cbb80-c861-11eb-9a01-b380305fa501">
<img width="500" alt="Entity Mentions" src="https://github.com/konveyor/tackle-container-advisor/blob/main/images/entity_mentions.png">

##### A new entry can be added as

Expand All @@ -75,7 +75,7 @@ We represent the knowledge base in terms of a database. Below we provide an enti

##### This contains mappings of entities based on their compatibilities. For example, a relation might exist between Linux|* and Apache Tomcat, which suggests that Apache Tomcat is compatible with different variants of Linux such as RedHat Linux, Ubuntu, CentOS and so on.

<img width="800" alt="Screen Shot 2021-07-09 at 2 21 12 PM" src="https://user-images.githubusercontent.com/8302569/125120916-280a3680-e0c1-11eb-9347-8d3c62820534.png">
<img width="800" alt="Entity Relations" src="https://github.com/konveyor/tackle-container-advisor/blob/main/images/operator_images.png">

##### A new entry can be added as

Expand All @@ -85,7 +85,7 @@ We represent the knowledge base in terms of a database. Below we provide an enti

##### This contains Docker specific base OS images. For example, RedHat Linux along with its mapping to a DockerHub image.

<img width="1000" alt="Screen Shot 2021-07-09 at 2 21 35 PM" src="https://user-images.githubusercontent.com/8302569/125120953-39534300-e0c1-11eb-927b-c5527c028886.png">
<img width="1000" alt="Docker Base OS Images" src="https://github.com/konveyor/tackle-container-advisor/blob/main/images/docker_baseos.png">

##### A new entry can be added as

Expand All @@ -96,7 +96,7 @@ We represent the knowledge base in terms of a database. Below we provide an enti

##### This contains OpenShift specific base OS images. For example, RedHat Linux along with its mapping to an OpenShift image.

<img width="1000" alt="Screen Shot 2021-07-09 at 2 21 52 PM" src="https://user-images.githubusercontent.com/8302569/125121014-4cfea980-e0c1-11eb-8bab-f20db5039dc6.png">
<img width="1000" alt="Openshift Base OS Images" src="https://github.com/konveyor/tackle-container-advisor/blob/main/images/OS_baseos.png">

##### A new entry can be added as

Expand All @@ -107,7 +107,7 @@ We represent the knowledge base in terms of a database. Below we provide an enti

##### This contains Docker specific images. For example, Apache Tomcat along with its mapping to a DockerHub image.

<img width="1000" alt="Screen Shot 2021-07-09 at 2 22 07 PM" src="https://user-images.githubusercontent.com/8302569/125121060-5c7df280-e0c1-11eb-9b3c-305efde20b2a.png">
<img width="1000" alt="Docker Images" src="https://github.com/konveyor/tackle-container-advisor/blob/main/images/docker_images.png">

##### A new entry can be added as

Expand All @@ -118,7 +118,7 @@ We represent the knowledge base in terms of a database. Below we provide an enti

##### This contains OpenShift specific images. For example, Apache Tomcat along with its mapping to an OpenShift image.

<img width="1000" alt="Screen Shot 2021-07-09 at 2 22 26 PM" src="https://user-images.githubusercontent.com/8302569/125121100-6acc0e80-e0c1-11eb-9193-19297a7e1c67.png">
<img width="1000" alt="Openshift Images" src="https://github.com/konveyor/tackle-container-advisor/blob/main/images/OS_images.png">

##### A new entry can be added as

Expand All @@ -127,15 +127,15 @@ We represent the knowledge base in terms of a database. Below we provide an enti
**9. entity versions**
##### This contains versions and licensing costs for all entities.

<img width="1000" alt="entity_versions" src="https://user-images.githubusercontent.com/85893516/139103250-90bba02d-0689-49f9-9436-e7ca2896ecaa.png">
<img width="1000" alt="entity_versions" src="https://github.com/konveyor/tackle-container-advisor/blob/main/images/entity_versions.png">

##### A new entry can be added as
INSERT INTO entity_versions (id, entity_id, version, release_date, end_date, cost) VALUES (?,?,?,?,?,?)
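
A minimal sketch of running this parameterized statement with Python's built-in `sqlite3` module follows. It uses an in-memory database, so no real TCA `.db` file is touched. The column names come from the statement above, but the column types and the sample row are assumptions made purely for illustration.

```python
import sqlite3

# In-memory database so the sketch never touches a real TCA .db file.
conn = sqlite3.connect(":memory:")

# Assumed schema: column names come from the INSERT statement above,
# but the column types are guesses made for this illustration.
conn.execute(
    """
    CREATE TABLE entity_versions (
        id INTEGER PRIMARY KEY,
        entity_id INTEGER,
        version TEXT,
        release_date TEXT,
        end_date TEXT,
        cost TEXT
    )
    """
)

# Made-up row, used only to exercise the statement.
row = (1, 42, "9.0", "2018-01-01", None, None)
conn.execute(
    "INSERT INTO entity_versions (id, entity_id, version, release_date, end_date, cost) "
    "VALUES (?,?,?,?,?,?)",
    row,
)
conn.commit()

stored = conn.execute("SELECT entity_id, version FROM entity_versions").fetchall()
print(stored)
```

Passing the values as a tuple lets `sqlite3` bind them to the `?` placeholders, which avoids building SQL by string concatenation.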

**10. docker environment variable**
##### This contains environment variables for all docker images.

<img width="1000" alt="Docker_env_var" src="https://user-images.githubusercontent.com/85893516/139103125-cef00aee-3add-4239-8547-f85dc5bb6aa4.png">
<img width="1000" alt="Docker_env_var" src="https://github.com/konveyor/tackle-container-advisor/blob/main/images/docker_env.png">


##### A new entry can be added as
Expand All @@ -145,7 +145,7 @@ We represent the knowledge base in terms of a database. Below we provide an enti
**11. operator images**
##### This contains operator specific images. For example, Postgresql along with its mapping to an operator image.

<img width="1000" alt="operators" src="https://user-images.githubusercontent.com/85893516/139103333-fed5a630-5083-4be7-b8a6-cf4361563f50.png">
<img width="1000" alt="operators" src="https://github.com/konveyor/tackle-container-advisor/blob/main/images/operator_images.png">

##### A new entry can be added as
INSERT INTO operator_images(container_name, OS, lang, lib, app, app_server, plugin, runlib, runtime, Operator_Correspondent_Image_URL, Operator_Repository, Other_Operators) VALUES(?,?,?,?,?,?,?,?,?,?,?,?)
109 changes: 102 additions & 7 deletions kg_utils/README.md
Expand Up @@ -2,23 +2,23 @@

Python scripts to generate JSON from Database

### Install Anaconda3
- Follow instructions to download and install Anaconda3
### Install Anaconda3
- Follow instructions to download and install Anaconda3

### Create conda virtual environment
# Requires python 3.8
conda create --name <env-name> python=3.8
conda activate <env-name>
### Clone TCA
### Clone TCA
git clone git@github.com:konveyor/tackle-container-advisor.git

### How to use
- ``cd tackle-container-advisor``
- ``pip3 install -r requirements.txt``
- ``cd kg_utils``
- ``cd kg_utils``
- ``db`` provides the input data
- From top level folder run ``python kg_utils/kg_utils.py`` and ``python kg_utils/generator.py``
- Outputs json are saved in: ``kg/``
- Outputs json are saved in: ``kg/``


### Generate documentation:
Expand All @@ -28,12 +28,107 @@ Python scripts to generate JSON from Database
- Setting up conf.py:
* Uncomment ``import os`` and ``import sys``
* Uncomment and Change path: ``sys.path.insert(0, os.path.abspath('..'))``

* In the ``# -- General configuration ---`` field, add ``extensions = ['sphinx.ext.autodoc']``

* In the ``# -- Options for HTML output ---`` field, add ``html_theme = 'sphinx_rtd_theme'``
- Setting up index.rst:
Add ``modules`` after line 11
- Run ``sphinx-apidoc -o . ..``
- Run ``make html``
- Documentation is located in ``/docs/_build/html/index.html``

## TCA_image_search

Allows users to search for relevant or exact container images in the DockerHub, Quay.io, and Artifacthub.io (Community Operators, OLM) registries.

## Search URLs:

- DockerHub: [DockerHub images](https://hub.docker.com/)

- RedHat OpenShift: [RedHat Quay](https://quay.io/search)

- RedHat OperatorHub: [Artifacthub.io](https://artifacthub.io/)

## Getting started

#### Prerequisite

- Use the [TCA_KG_Augmentation](https://github.ibm.com/tca-team/TCA_KG_Augmentation) repo to add new entities to the database by following the instructions in its README file. You can add a single entity or a batch of entities from a csv file to the entities table.
- Make sure you have [docker](https://docs.docker.com/engine/install/) installed locally.

- The database file at "db\{db_version}.db", where "db_version" is the latest TCA database version. It contains all the entities to search images for (see the "entity_name" table).

- In VS Code, open the folder in a container, which installs all dependencies needed to run the script.

#### Input data to the search engine.

Data are loaded from the entities table of the database. You may search for a single entity or for all entities in the database.

Sample entities table

![This is an image](https://github.com/divsan93/tackle-container-advisor/blob/kb_expansion/images/entities.png)

#### Running the script

```python kg_utils/search_images.py -e <entity_name(s)> -db db\{db_version}.db``` This loads entity(ies) from the entity_name table and searches images across all catalogues.

```
python kg_utils/search_images.py -h

usage: Search container images from dockerhub , Quay.io , and Artifacthub.io

optional arguments:

-h, --help show this help message and exit

-e ENTITY, --entity ENTITY Enter entity name(s) from the database. i.e :-e nginx,tomcat,ubuntu or -e all ( to search all entities). Also enclose entities with

double words in a quote. For example: -e 'ibm i',db2,'Apache Kafka'

-db DATABASE_PATH, --database_path DATABASE_PATH Path containing the latest tackle containerization advisor database.

Try $python -e <entity_names> -db <database_path"> or type $python src/search_images.py --help

```

Results are saved to the following files: ```kg_utils\image_search_kg\images.json```, ```kg_utils\image_search_kg\operator_images.csv```, ```kg_utils\image_search_kg\openshift_images.csv```, and ```kg_utils\image_search_kg\docker_images.csv```.
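
To make the `-e` option format concrete, here is a small sketch of how a comma-separated entity list could be parsed with `argparse`. It is not the actual `search_images.py` implementation: the `build_parser` and `split_entities` names are hypothetical, and only the two option names are taken from the help text above.

```python
import argparse


def build_parser():
    """Build a parser exposing the two options from the help text above."""
    parser = argparse.ArgumentParser(
        description="Search container images (illustrative sketch)"
    )
    parser.add_argument("-e", "--entity", required=True,
                        help="Comma-separated entity names, or 'all'")
    parser.add_argument("-db", "--database_path", required=True,
                        help="Path to the TCA database file")
    return parser


def split_entities(raw):
    """Split a comma-separated value into trimmed, non-empty entity names."""
    return [name.strip() for name in raw.split(",") if name.strip()]


# The shell removes the quotes around multi-word names before Python sees them,
# so "-e 'ibm i',db2,'Apache Kafka'" arrives as the single string below.
args = build_parser().parse_args(["-e", "ibm i,db2,Apache Kafka", "-db", "db/1.0.4.db"])
entities = split_entities(args.entity)
print(entities, args.database_path)
```

Splitting only on commas is what allows multi-word entities such as `ibm i` to survive as a single name once they have been quoted on the command line.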



### KG Augmentation

##### The TCA KG Augmentation script provides a semi-automatic way of ingesting data into the TCA Knowledge Base

##### The command line script (kg_aug.py) can be executed in 2 modes:

1. Interactive - This mode can be used for processing a single entry. It lets the user choose a table by listing all the tables in the TCA database. Based on the chosen table, the user can interactively enter values for the various fields. All the fields are listed along with the field type (e.g. integer, text, etc.). For fields that accept only certain values, the acceptable values are also displayed along with the field name. Some automatic checks are performed as the user enters a value for a particular field; if the value does not match the specified type or the list of acceptable values, the user is prompted to re-enter the value for that field.

2. Batch - Batch mode is used for processing multiple entries. The input is a csv file with multiple values that need to be entered into the database. The format to enter values in the csv is as follows: Table_name, value1, value2,..., value n

The id field for all the tables is auto-generated, so the user does not have to specify it. As every entry is processed, there is an automatic check for duplicate entries. If a duplicate entry is found, it is skipped and not inserted into the database. For every new entity that is inserted into the KG, the mentions are automatically generated from Wikidata.
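
The batch behaviour described above (auto-generated id, duplicates skipped) can be sketched as follows. This is not the code of `kg_aug.py`: the `insert_rows` helper and the two-column `entity_types` table are hypothetical, the database is in-memory, and the Wikidata mention generation is left out entirely.

```python
import csv
import io
import sqlite3


def insert_rows(conn, csv_text):
    """Insert 'table, value1, ..., valueN' rows, skipping exact duplicates."""
    inserted, skipped = 0, 0
    known_tables = {
        name for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table'"
        )
    }
    for record in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if not record:
            continue
        table, values = record[0].strip(), [v.strip() for v in record[1:]]
        # Only table names that exist in the database are interpolated into SQL.
        if table not in known_tables:
            raise ValueError(f"unknown table: {table}")
        # Every column except the auto-generated id, in declaration order.
        columns = [
            info[1] for info in conn.execute(f"PRAGMA table_info({table})")
            if info[1] != "id"
        ]
        if len(values) != len(columns):
            raise ValueError(f"{table} expects {len(columns)} values")
        where = " AND ".join(f"{col} = ?" for col in columns)
        duplicate = conn.execute(
            f"SELECT 1 FROM {table} WHERE {where}", values
        ).fetchone()
        if duplicate:
            skipped += 1
            continue
        placeholders = ", ".join("?" for _ in columns)
        conn.execute(
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})",
            values,
        )
        inserted += 1
    conn.commit()
    return inserted, skipped


conn = sqlite3.connect(":memory:")
# Hypothetical two-column table; the real TCA schema may differ.
conn.execute(
    "CREATE TABLE entity_types (id INTEGER PRIMARY KEY AUTOINCREMENT, entity_type_name TEXT)"
)
result = insert_rows(conn, "entity_types, OS\nentity_types, Lang\nentity_types, OS\n")
print(result)
```

The duplicate check here compares every non-id column for equality, which is one simple reading of "duplicate entry"; the real script may use a different rule.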

##### A sample csv file (input.csv) has been uploaded for reference - input.csv

### Running the script
##### To run the script, use one of the following commands, depending on the mode:

1. Interactive mode: python kg_aug.py -m interactive -d 1.0.4.db
2. Batch mode: python kg_aug.py -m batch -b input.csv -d 1.0.4.db

##### The -m option indicates the mode (interactive or batch), -b points to the csv file for batch processing, and -d specifies the database file.

#### Usage
```
usage: kg_aug.py [-h] -m MODE -d DB_FILE [-b BATCH_FILE] [-r DEL_FILE]

modify KB by adding or deleting entities

optional arguments:
-h, --help show this help message and exit
-m MODE mode: interactive or batch
-d DB_FILE database file (.db) path
-b BATCH_FILE batch file (.txt) path
-r DEL_FILE entities to delete/replace list file (.csv) path
```