Migrate from MongoDB to Postgres #378
New schema
Now comes probably the biggest part of the work: converting the MongoDB schema to Postgres. I will try to keep things as simple as possible. Instead of creating complex tables for inputs and outputs, I'll just use a JSONB field. So it won't be too different from what we had in MongoDB; maybe we will end up with two tables. To compare data, I started the [...] Then, with [...]
This way it's easy to consult the old DB schema to assist with the migration.
Migration
Good reference: https://developer.okta.com/blog/2019/02/20/spring-boot-with-postgresql-flyway-jsonb
Force-pushed from 321f248 to 734b2c0
Unit Tests
With the exception of one, all unit tests passed. That doesn't mean it is almost working though :-) The one test that failed used to start a MongoDB database for the test. With PostgreSQL, the closest equivalent is running Docker containers for tests. Luckily there is the TestContainers project, which supports Spring Boot. I used it successfully today: it started the Docker containers with each test, then stopped them. Still trying to confirm that it created and executed the DDL for the tests (won't do that in production), and I also need to fix the one test that is failing because the SQL query it executes has some issues.
One last unit test failing. Almost there 🙏
Stashing work for today. Before, I knew where the problem was, but didn't know what the problem was. Now I know what the problem is, and am just trying to figure out a way to fix it :-)
We have at least two open problems to be fixed. The first one is in the unit tests that I committed now. The fields in Postgres are being correctly created as [...] However, the Java code appears to be converting the [...] Once that's fixed, I will have to uncomment a part of the JSON deserialization, since it was complaining about fields that are not annotated with the [...]
Troubleshooting notes
create table queued_workflow (id varchar(36) not null, cwltool_status jsonb, cwltool_version varchar(1000), message varchar(1000), temp_representation jsonb, workflow_list jsonb, primary key (id));
create table workflow (id varchar(36) not null, cwltool_version varchar(1000), doc varchar(1000), docker_link varchar(1000), inputs jsonb, label varchar(1000), last_commit varchar(255), license_link varchar(1000), outputs jsonb, retrieved_from jsonb, retrieved_on timestamp, ro_bundle_path varchar(1000), steps jsonb, visualisation_dot varchar(1000), primary key (id));
create index IDX14ahubfm3f1ynds84uhdx7ews on workflow (retrieved_on);
alter table workflow add constraint UK13qbig8o1om524ht4txhbbvhf unique (retrieved_from);
-- I rewrote it to simplify querying
insert into queued_workflow (cwltool_status, cwltool_version, message, temp_representation, workflow_list, id)
values ('{"status": "OK"}', 'v1.2', 'message?', '{"name": "buffy"}', '[{"name": "willow"}]', 1);
-- Works!
SELECT * FROM queued_workflow q
WHERE
cast(q.temp_representation -> 'name' as text) = '"buffy"';
-- In the query below, ?1 is bound as bytea; need to figure out a way to get it converted to jsonb in a Spring JPA native query
SELECT * FROM queued_workflow q
WHERE q.temp_representation -> 'name' = ?1;
-- To test the type of a column:
SELECT q.temp_representation, pg_typeof(q.temp_representation -> 'name') FROM queued_workflow q;
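A note on the quoted `'"buffy"'` literal in the notes above: the `->` operator returns `jsonb`, so casting its result to text keeps the JSON double quotes, whereas `->>` returns the value as unquoted text. If the cast form is kept, the value bound to the query has to be the JSON-encoded string. Below is a minimal sketch of that encoding in plain Java. The class and method names are hypothetical (not part of CWLViewer), and it assumes simple string values; real code would more likely let Jackson do the encoding.

```java
// Hypothetical helper, for illustration only: not part of CWLViewer.
final class JsonbTextLiteral {

    // Wraps a plain string in double quotes and escapes it per JSON rules,
    // so that it can be compared with cast(col -> 'key' as text).
    // Assumption: simple string values; exotic characters are not verified
    // here against the text form Postgres produces.
    static String quote(String value) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters use the \\uXXXX form.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        // prints "buffy" including the double quotes
        System.out.println(quote("buffy"));
    }
}
```

Switching the query to `->>` would avoid the quoting altogether, at the cost of comparing text rather than JSON values.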
Argh. CI will be broken for a little while, but I have finally understood what was wrong with the Spring repository when trying to query a JSONB type in Postgres. In case another dev needs it (from CWL or from another community/project), here it goes:
Once you have done that, don't touch anything. Commit (fix & squash later) and go enjoy your weekend! 🍻
At least this issue was enough for me to refresh my memory about Spring Data, Hibernate, JSON/Jackson, and get more familiar with the old Mongo structure. This last failing test must be fixed soon; then I will compare the 2 databases, and the final stage will commence, where I will have to start importing workflows, comparing, planning the migration, etc 👍
-Bruno
p.s. a good place to set a breakpoint is in [...]
Force-pushed from b4ad46d to 0a5a7d0
Great to see the progress!
Managed to update the docs and remaining parts of the code that referenced Mongo. Also updated the [...]
Finally, the application initialized successfully with the latest Spring, Jena Fuseki, and Postgres. I used Docker Compose and the updated [...]
docker-compose up --force-recreate
docker-compose down -v
I've marked the subtask for Docker as completed! Now I will work on manual tests. First test failed, but that's not too bad. At least now I already have an idea of what's broken, and even think I know how to fix it quickly 🙂
With the latest commit, we are now able to visualize workflows, and the data gets put into the Postgres DB 🙂 We are almost done now, I think!
Hmmm, the Flyway license doesn't seem very good. Might have to use the other migration library that is integrated with Spring Boot 😞
@@ -18,9 +18,6 @@ jobs:
ref: ${{ github.event.pull_request.head.ref }}
repository: ${{ github.event.pull_request.head.repo.full_name }}

- name: MongoDB in GitHub Actions
uses: supercharge/mongodb-github-action@1.3.0
Not needed anymore. We are using TestContainers in the QueuedWorkflowRepositoryTest. That's a Maven dependency that starts a container for that unit test only. We can control when the container is started, destroyed, etc. Later we can think about splitting the tests into groups/suites, so that mvn test doesn't need to start the container.
…ion with Docker/compose
Force-pushed from 8b812b4 to 4a5c13b
This pull request fixes 1 alert when merging 4a5c13b into 15d269f - view on LGTM.com fixed alerts:
Ready for review 🎉 Working on how to migrate the data now.
This pull request fixes 1 alert when merging 7426f6b into 15d269f - view on LGTM.com fixed alerts:
Woo-hoo! Thank you @kinow ! Were you able to do a dump/restore from view.commonwl.org? https://github.com/common-workflow-language/cwlviewer#dumprestore I would recommend doing this in two phases
That way you know which errors are probably due to these changes versus external resources being down.
Thanks @mr-c ! And woo-hoo indeed!!! 🥳
Not yet. I spoke with Ward & Peter, and they told me the dump/restore process can be quite lengthy... so I was looking at how hard it would be to just dump Mongo and import into PostgreSQL. But I'm not sure how this final part will be done. At least the code changes are done, I think. Just a matter of defining how to migrate to the new PostgreSQL DB 🚀 🛏️ now for meeting in ~6:30 hrs 👋
Please get good sleep @kinow ! I've done a dump/restore cycle before and yes, the restore takes a while. The point would be to stress-test the PR. You could do a full dump & partial restore as a sanity check. Maybe 1%?
I did a dump/load cycle until my local hard drive filled up. Almost reached 1%! Here's the new dump (the subset that successfully loaded). Going to load it into the codebase from this PR now
After loading up ~210 workflows, this seems to work!
Whew!!! Thanks for testing it 👏 🍾 🥳 🎂 How long did it take to load these workflows, BTW?
Closes #254
Description
Replacing MongoDB by Postgres in CWLViewer.
pom.xml dependencies: removing Mongo, adding Postgres + Hibernate (or data/jdbc)
Motivation and Context
Licensing issues with MongoDB, see linked issue.
How Has This Been Tested?
mvn spring-boot:run to trigger the Flyway migrations
Screenshots (if appropriate):
Types of changes
Manual tests
WorkflowRepository
- findByRetrievedFrom (tested by submitting the compile.cwl example workflow in the landing page)
- findByCommitAndPath (tested by accessing http://localhost:8080/git/767d700e602805112a4c953d166e570cddfa2605/workflows/compile/compile1.cwl?part=main&format=jsonld)
- findByCommit (tested by accessing http://localhost:8080/git/767d700e602805112a4c953d166e570cddfa2605/workflows/compile/compile1.cwl?format=raw)
- findAllByOrderByRetrievedOnDesc (tested by clicking the "Explore" top menu link)
- findByLabelContainingOrDocContainingIgnoreCase (tested by searching for "aaa" and "compile" in the search bar, after submitting the compile.cwl workflow)
QueuedWorkflowRepositoryImpl
- findByRetrievedFrom (tested in unit test QueuedWorkflowRepositoryTest.java)
- deleteByTempRepresentation_RetrievedFrom (tested in unit test QueuedWorkflowRepositoryTest.java)
QueuedWorkflowRepository
- deleteByTempRepresentation_RetrievedOnLessThanEqual (tested by adding a queued workflow manually, below, and changing the scheduler to temporarily run every minute in application.properties)
- findByTempRepresentation_RetrievedOnLessThanEqual (not used??)
To add a really old queued workflow:
Then query the table:
And wait for the log line to confirm, or use a debugger. The log line should look like:
Checklist: