Merge pull request #46 from GSA/harvest-db-update
Harvest db update
Jin-Sun-tts authored Mar 20, 2024
2 parents 8d5bd69 + eedec56 commit f60156a
Showing 33 changed files with 843 additions and 268 deletions.
37 changes: 19 additions & 18 deletions .github/workflows/commit.yml
@@ -74,25 +74,26 @@ jobs:
- name: deploy DHL
uses: cloud-gov/cg-cli-tools@main
with:
command: cf push datagov-harvesting-logic --vars-file vars.development.yml --strategy rolling --no-wait
cf_org: gsa-datagov
cf_space: ${{vars.ENVIRONMENT_NAME}}
cf_username: ${{secrets.CF_SERVICE_USER}}
cf_password: ${{secrets.CF_SERVICE_AUTH}}
- name: smoke test
uses: cloud-gov/cg-cli-tools@main
with:
command: cf run-task datagov-harvesting-logic -c "/home/vcap/app/scripts/smoke-test.py" --name smoke-test
cf_org: gsa-datagov
cf_space: ${{vars.ENVIRONMENT_NAME}}
cf_username: ${{secrets.CF_SERVICE_USER}}
cf_password: ${{secrets.CF_SERVICE_AUTH}}
- name: monitor task output
uses: cloud-gov/cg-cli-tools@main
with:
command: >
scripts/monitor-cf-logs.sh datagov-harvesting-logic smoke-test
command: cf push --vars-file vars.development.yml --strategy rolling --no-wait
cf_org: gsa-datagov
cf_space: ${{vars.ENVIRONMENT_NAME}}
cf_username: ${{secrets.CF_SERVICE_USER}}
cf_password: ${{secrets.CF_SERVICE_AUTH}}
# to-do
# - name: smoke test
# uses: cloud-gov/cg-cli-tools@main
# with:
# command: cf run-task harvesting-logic -c "/home/vcap/app/scripts/smoke-test.py" --name smoke-test
# cf_org: gsa-datagov
# cf_space: ${{vars.ENVIRONMENT_NAME}}
# cf_username: ${{secrets.CF_SERVICE_USER}}
# cf_password: ${{secrets.CF_SERVICE_AUTH}}
# - name: monitor task output
# uses: cloud-gov/cg-cli-tools@main
# with:
# command: >
# scripts/monitor-cf-logs.sh harvesting-logic smoke-test
# cf_org: gsa-datagov
# cf_space: ${{vars.ENVIRONMENT_NAME}}
# cf_username: ${{secrets.CF_SERVICE_USER}}
# cf_password: ${{secrets.CF_SERVICE_AUTH}}
70 changes: 36 additions & 34 deletions .github/workflows/deploy.yml
@@ -39,23 +39,24 @@
cf_space: ${{vars.ENVIRONMENT_NAME}}
cf_username: ${{secrets.CF_SERVICE_USER}}
cf_password: ${{secrets.CF_SERVICE_AUTH}}
- name: smoke test
uses: cloud-gov/cg-cli-tools@main
with:
command: cf run-task harvesting-logic -c "/home/vcap/app/scripts/smoke-test.py" --name smoke-test
cf_org: gsa-datagov
cf_space: ${{vars.ENVIRONMENT_NAME}}
cf_username: ${{secrets.CF_SERVICE_USER}}
cf_password: ${{secrets.CF_SERVICE_AUTH}}
- name: monitor task output
uses: cloud-gov/cg-cli-tools@main
with:
command: >
scripts/monitor-cf-logs.sh harvesting-logic smoke-test
cf_org: gsa-datagov
cf_space: ${{vars.ENVIRONMENT_NAME}}
cf_username: ${{secrets.CF_SERVICE_USER}}
cf_password: ${{secrets.CF_SERVICE_AUTH}}
# to-do
# - name: smoke test
# uses: cloud-gov/cg-cli-tools@main
# with:
# command: cf run-task harvesting-logic -c "/home/vcap/app/scripts/smoke-test.py" --name smoke-test
# cf_org: gsa-datagov
# cf_space: ${{vars.ENVIRONMENT_NAME}}
# cf_username: ${{secrets.CF_SERVICE_USER}}
# cf_password: ${{secrets.CF_SERVICE_AUTH}}
# - name: monitor task output
# uses: cloud-gov/cg-cli-tools@main
# with:
# command: >
# scripts/monitor-cf-logs.sh harvesting-logic smoke-test
# cf_org: gsa-datagov
# cf_space: ${{vars.ENVIRONMENT_NAME}}
# cf_username: ${{secrets.CF_SERVICE_USER}}
# cf_password: ${{secrets.CF_SERVICE_AUTH}}
- name: Create Issue if it fails 😢
if: ${{ failure() && github.ref == 'refs/heads/main' }}
uses: JasonEtco/create-an-issue@v2
@@ -96,23 +97,24 @@
cf_space: ${{vars.ENVIRONMENT_NAME}}
cf_username: ${{secrets.CF_SERVICE_USER}}
cf_password: ${{secrets.CF_SERVICE_AUTH}}
- name: smoke test
uses: cloud-gov/cg-cli-tools@main
with:
command: cf run-task harvesting-logic -c "/home/vcap/app/scripts/smoke-test.py" --name smoke-test
cf_org: gsa-datagov
cf_space: ${{vars.ENVIRONMENT_NAME}}
cf_username: ${{secrets.CF_SERVICE_USER}}
cf_password: ${{secrets.CF_SERVICE_AUTH}}
- name: monitor task output
uses: cloud-gov/cg-cli-tools@main
with:
command: >
scripts/monitor-cf-logs.sh harvesting-logic smoke-test
cf_org: gsa-datagov
cf_space: ${{vars.ENVIRONMENT_NAME}}
cf_username: ${{secrets.CF_SERVICE_USER}}
cf_password: ${{secrets.CF_SERVICE_AUTH}}
# to-do
# - name: smoke test
# uses: cloud-gov/cg-cli-tools@main
# with:
# command: cf run-task harvesting-logic -c "/home/vcap/app/scripts/smoke-test.py" --name smoke-test
# cf_org: gsa-datagov
# cf_space: ${{vars.ENVIRONMENT_NAME}}
# cf_username: ${{secrets.CF_SERVICE_USER}}
# cf_password: ${{secrets.CF_SERVICE_AUTH}}
# - name: monitor task output
# uses: cloud-gov/cg-cli-tools@main
# with:
# command: >
# scripts/monitor-cf-logs.sh harvesting-logic smoke-test
# cf_org: gsa-datagov
# cf_space: ${{vars.ENVIRONMENT_NAME}}
# cf_username: ${{secrets.CF_SERVICE_USER}}
# cf_password: ${{secrets.CF_SERVICE_AUTH}}
- name: Create Issue if it fails 😢
if: ${{ failure() && github.ref == 'refs/heads/main' }}
uses: JasonEtco/create-an-issue@v2
1 change: 1 addition & 0 deletions .gitignore
@@ -24,4 +24,5 @@ tmp/
# vscode debugger
.vscode/
.env
requirements.txt

4 changes: 3 additions & 1 deletion .profile
@@ -14,4 +14,6 @@ function vcap_get_service () {
export APP_NAME=$(echo $VCAP_APPLICATION | jq -r '.application_name')

export URI=$(vcap_get_service aws-rds .credentials.uri)
export DATABASE_URI=$(echo $URI | sed 's/postgres:\/\//postgresql:\/\//g')
export DATABASE_URI=$(echo $URI | sed 's/postgres:\/\//postgresql:\/\//g')

flask db upgrade
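
The `sed` call in `.profile` rewrites the URI scheme from `postgres://` to `postgresql://`, presumably because SQLAlchemy 1.4 and later no longer accept the `postgres` dialect name. Below is a minimal Python sketch of the same rewrite, in case the normalization ever needs to happen in application code. `normalize_database_uri` is a hypothetical helper, not part of this commit, and unlike the `sed ... g` expression it only rewrites the leading scheme.

```python
def normalize_database_uri(uri: str) -> str:
    """Rewrite a leading postgres:// scheme to postgresql://; leave other URIs unchanged."""
    old_scheme = "postgres://"
    if uri.startswith(old_scheme):
        return "postgresql://" + uri[len(old_scheme):]
    return uri


if __name__ == "__main__":
    # Example value only; real credentials come from VCAP_SERVICES on cloud.gov.
    print(normalize_database_uri("postgres://user:secret@db.example.com:5432/harvest"))
```

Restricting the rewrite to the prefix avoids touching a password or database name that happens to contain the string `postgres://`.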
5 changes: 4 additions & 1 deletion Dockerfile
@@ -8,4 +8,7 @@ RUN pip install --no-cache-dir -r requirements.txt

EXPOSE 8080

CMD ["python", "app.py"]
ENV FLASK_APP=run.py

# Run run.py when the container launches
CMD ["flask", "run", "--host=0.0.0.0", "--port=8080"]
16 changes: 15 additions & 1 deletion README.md
@@ -107,6 +107,14 @@ If you followed the instructions for `CKAN load testing` and `Harvester testing`

This will start the necessary services and execute the test.

3. When there are database DDL changes, use the following steps to generate migration scripts and update the database:

```bash
docker-compose up -d db
docker-compose run app flask db migrate -m "migration description"
docker-compose run app flask db upgrade
```

### Deployment to cloud.gov

#### Database Service Setup
@@ -143,4 +151,10 @@ Accessing the service can be done with service keys. They can be created with `c
```bash
poetry export -f requirements.txt --output requirements.txt --without-hashes
cf push --vars-file vars.development.yml
```
```
3. When there are database DDL changes, use the following command to update the database:
```bash
cf run-task harvesting-logic --command "flask db upgrade" --name database-upgrade
```
22 changes: 22 additions & 0 deletions app/__init__.py
@@ -0,0 +1,22 @@
from flask import Flask
from .models import db
from flask_migrate import Migrate
import os
from dotenv import load_dotenv

load_dotenv()

DATABASE_URI = os.getenv('DATABASE_URI')

def create_app():
    app = Flask(__name__)
    app.config['SQLALCHEMY_DATABASE_URI'] = DATABASE_URI
    db.init_app(app)

    # Initialize Flask-Migrate
    Migrate(app, db)

    from .routes import register_routes
    register_routes(app)

    return app
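
In `app/__init__.py`, `os.getenv('DATABASE_URI')` returns `None` when the variable is unset, so a missing setting is passed into the Flask config as `None`; how that surfaces depends on the Flask-SQLAlchemy version in use. One possible guard, sketched with the standard library only, is to fail early with an explicit message. `require_env` is a hypothetical helper name and is not part of this commit.

```python
import os
from typing import Mapping, Optional


def require_env(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    env = os.environ if environ is None else environ
    value = env.get(name)
    if not value:
        raise RuntimeError(f"required environment variable {name} is not set")
    return value


if __name__ == "__main__":
    # A plain dict stands in for os.environ so the example has no side effects.
    fake_env = {"DATABASE_URI": "postgresql://user:secret@localhost:5432/harvest"}
    print(require_env("DATABASE_URI", fake_env))
```

Accepting the mapping as a parameter keeps the helper easy to exercise in tests without modifying the real process environment.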
22 changes: 22 additions & 0 deletions app/flask-app-structure.txt
@@ -0,0 +1,22 @@
DATAGOV-HARVESTING-LOGIC
├── app/
│   ├── __init__.py
│   ├── models.py
│   ├── routes.py
│   ├── forms.py (to-do)
│   └── templates/
│       ├── index.html
│       ├── harvest_source_form.html (to-do)
│       └── xxx.html (to-do)
│   └── static/
│       └── styles.css (to-do)
├── migrations/
│   └── versions/
│   ├── alembic.ini
│   ├── env.py
│   └── script.py.mako
├── docker-compose.yml
├── Dockerfile
└── run.py
18 changes: 16 additions & 2 deletions harvester/database/interface.py → app/interface.py
@@ -1,6 +1,6 @@
from sqlalchemy import create_engine, inspect
from sqlalchemy.orm import sessionmaker, scoped_session
from harvester.database.models import HarvestSource, HarvestJob, HarvestError
from app.models import Organization, HarvestSource, HarvestJob, HarvestError
from . import DATABASE_URI

class HarvesterDBInterface:
@@ -19,13 +19,27 @@ def _to_dict(obj):
return {c.key: getattr(obj, c.key)
for c in inspect(obj).mapper.column_attrs}

def add_harvest_source(self, source_data):
def add_organization(self, org_data):
new_org = Organization(**org_data)
self.db.add(new_org)
self.db.commit()
self.db.refresh(new_org)
return new_org

def add_harvest_source(self, source_data, org_id):
source_data['organization_id'] = org_id
new_source = HarvestSource(**source_data)
self.db.add(new_source)
self.db.commit()
self.db.refresh(new_source)
return new_source

def get_all_organizations(self):
orgs = self.db.query(Organization).all()
orgs_data = [
HarvesterDBInterface._to_dict(org) for org in orgs]
return orgs_data

def get_all_harvest_sources(self):
harvest_sources = self.db.query(HarvestSource).all()
harvest_sources_data = [
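
The new `add_harvest_source` in this hunk assigns `source_data['organization_id'] = org_id`, which also modifies the dictionary the caller passed in. If that side effect is unwanted, one option is to build a new dictionary before constructing the model. The sketch below is illustrative only; `with_organization` is a hypothetical helper, and the sample values are made up.

```python
from typing import Any, Dict


def with_organization(source_data: Dict[str, Any], org_id: str) -> Dict[str, Any]:
    """Return a copy of source_data with organization_id set, leaving the input untouched."""
    return {**source_data, "organization_id": org_id}


if __name__ == "__main__":
    original = {"name": "example-source", "url": "https://example.com/data.json"}
    merged = with_organization(original, "org-123")
    print(merged["organization_id"])      # org-123
    print("organization_id" in original)  # False
```

The merged dictionary could then be unpacked into the model constructor in the same way the existing method unpacks `source_data`.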
82 changes: 82 additions & 0 deletions app/models.py
@@ -0,0 +1,82 @@
from flask_sqlalchemy import SQLAlchemy
from sqlalchemy.dialects.postgresql import UUID, ARRAY
from sqlalchemy.sql import text
from sqlalchemy import Enum
from sqlalchemy.schema import UniqueConstraint

db = SQLAlchemy()

class Base(db.Model):
    __abstract__ = True  # Indicates that this class should not be created as a table
    id = db.Column(UUID(as_uuid=True), primary_key=True,
                   server_default=text("gen_random_uuid()"))

class Organization(Base):
    __tablename__ = 'organization'

    name = db.Column(db.String(), nullable=False, index=True)
    logo = db.Column(db.String())

class HarvestSource(Base):
    __tablename__ = 'harvest_source'

    name = db.Column(db.String, nullable=False)
    notification_emails = db.Column(ARRAY(db.String))
    organization_id = db.Column(UUID(as_uuid=True),
                                db.ForeignKey('organization.id'),
                                nullable=False)
    frequency = db.Column(db.String, nullable=False)
    url = db.Column(db.String, nullable=False, unique=True)
    schema_type = db.Column(db.String, nullable=False)
    source_type = db.Column(db.String, nullable=False)
    jobs = db.relationship('HarvestJob', backref='source')

class HarvestJob(Base):
    __tablename__ = 'harvest_job'

    harvest_source_id = db.Column(UUID(as_uuid=True),
                                  db.ForeignKey('harvest_source.id'),
                                  nullable=False)
    status = db.Column(Enum('new', 'in_progress', 'complete', name='job_status'),
                       nullable=False,
                       index=True)
    date_created = db.Column(db.DateTime, index=True)
    date_finished = db.Column(db.DateTime)
    records_added = db.Column(db.Integer)
    records_updated = db.Column(db.Integer)
    records_deleted = db.Column(db.Integer)
    records_errored = db.Column(db.Integer)
    records_ignored = db.Column(db.Integer)
    errors = db.relationship('HarvestError', backref='job', lazy=True)

class HarvestError(Base):
    __tablename__ = 'harvest_error'

    harvest_job_id = db.Column(UUID(as_uuid=True),
                               db.ForeignKey('harvest_job.id'),
                               nullable=False)
    harvest_record_id = db.Column(db.String)
    # to-do
    # harvest_record_id = db.Column(UUID(as_uuid=True),
    #                               db.ForeignKey('harvest_record.id'),
    #                               nullable=True)
    date_created = db.Column(db.DateTime)
    type = db.Column(db.String)
    severity = db.Column(Enum('CRITICAL', 'ERROR', 'WARN', name='error_serverity'),
                         nullable=False,
                         index=True)
    message = db.Column(db.String)

class HarvestRecord(Base):
    __tablename__ = 'harvest_record'

    job_id = db.Column(UUID(as_uuid=True),
                       db.ForeignKey('harvest_job.id'),
                       nullable=False)
    identifier = db.Column(db.String(), nullable=False)
    ckan_id = db.Column(db.String(), nullable=False, index=True)
    type = db.Column(db.String(), nullable=False)
    source_metadata = db.Column(db.String(), nullable=True)
    __table_args__ = (
        UniqueConstraint('job_id', 'identifier', name='uix_job_id_identifier'),
    )
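
The `UniqueConstraint('job_id', 'identifier', ...)` on `HarvestRecord` means the same identifier may appear in different jobs but only once per job. The sketch below illustrates that behavior using the standard-library `sqlite3` module as a stand-in for PostgreSQL and SQLAlchemy, which the commit actually uses. The table is deliberately simplified (integer primary key, only the two constrained columns), and `insert_record` is a hypothetical helper.

```python
import sqlite3

# In-memory SQLite database; a simplified stand-in for the harvest_record table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE harvest_record (
        id INTEGER PRIMARY KEY,
        job_id TEXT NOT NULL,
        identifier TEXT NOT NULL,
        CONSTRAINT uix_job_id_identifier UNIQUE (job_id, identifier)
    )
    """
)


def insert_record(job_id: str, identifier: str) -> bool:
    """Insert a row; return False if the (job_id, identifier) pair already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO harvest_record (job_id, identifier) VALUES (?, ?)",
                (job_id, identifier),
            )
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    print(insert_record("job-1", "dataset-a"))  # True
    print(insert_record("job-1", "dataset-a"))  # False: duplicate within the same job
    print(insert_record("job-2", "dataset-a"))  # True: same identifier, different job
```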

1 comment on commit f60156a

@github-actions

Coverage

Coverage Report
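| File | Stmts | Miss | Cover |
| --- | --- | --- | --- |
| harvester/__init__.py | 5 | 0 | 100% |
| harvester/ckan_utils.py | 42 | 2 | 95% |
| harvester/exceptions.py | 42 | 0 | 100% |
| harvester/harvest.py | 425 | 65 | 85% |
| harvester/logger_config.py | 1 | 0 | 100% |
| harvester/utils.py | 35 | 2 | 94% |
| TOTAL | 550 | 69 | 87% |

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 28 | 0 💤 | 0 ❌ | 0 🔥 | 2.153s ⏱️ |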
