Fix squonk job execution #1559
@tdudgeon says that the work needed depends largely on whether the shared filesystem between Fragalysis and Squonk is needed. @phraenquex says that the first step is to get job submission working again, and that conversations with SC have ruled out a shared file system between the Fragalysis stack and the Squonk stack used for job execution. @alanbchristie: a schematic of the architecture for Fragalysis and Squonk would help immensely |
Here are some references for the original design: -
- The original high-level design for Squonk Job Execution in Fragalysis
- A simplified low-level design document (Fragalysis/Squonk AC (937) LLD)
- A brief but practical document on how to use Job Execution on the b/e ReadTheDocs site
- An issue discussing the global Job Configuration

Other related issues include: -
- Fix Squonk access control API

job_request: https://fragalysis.xchem.diamond.ac.uk/api/job_request/

Views (in viewer.views)
Basic flow
Transferring Target files (to Squonk)

After a lot of "preamble" logic the JobFileTransferView: -
Uploading Job results (from Squonk)

The JobCallbackView receives notifications about completed Jobs. When a job is complete: -
These four functions retrieve the SD file (and JSON parameters) from Squonk and 'format' the content to better fit Fragalysis, then simply validate and process the file (using

Effort

If Squonk Job execution remains the same, as does the configuration, then the only thing broken will be the file transfer's ability to locate the files. We think the RHS upload will still work, but might need an adjustment. In summary: -
Maybe 2-3 days? |
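As a point of reference, the job_request endpoint listed above is a plain REST URL. The helper below builds list, detail, and filtered URLs for it; only the base URL is taken from the thread — the detail-path shape and the filter names are assumptions, not the actual API contract.

```python
from urllib.parse import urlencode

# Base endpoint taken from the comment above; everything else is hypothetical.
JOB_REQUEST_URL = "https://fragalysis.xchem.diamond.ac.uk/api/job_request/"


def job_request_url(job_request_id=None, **filters):
    """Build a URL for the job_request API.

    With no arguments this returns the list endpoint; with an id it
    returns a detail endpoint. Keyword filters become query parameters
    (the filter names are illustrative only).
    """
    url = JOB_REQUEST_URL
    if job_request_id is not None:
        url = f"{url}{job_request_id}/"
    if filters:
        url = f"{url}?{urlencode(filters)}"
    return url
```

For example, `job_request_url(42)` yields the detail URL for Job Request 42, which a client could then GET to poll the job's status.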
@alanbchristie asks if the f/e feature for starting Squonk jobs is still there. @boriskovar-m2ms says it's there but won't work with the new data format, and expects there to be several issues |
Before we start to tackle the broken Squonk/2 job execution I think it would be extremely valuable if we first used the existing fragalysis-stack-behaviour-tests repo to write a small number of behaviour (gherkin) CI tests (based on a clean deployment of the latest stack code). The repo is functional and is already able to create stacks and run tests against the Fragalysis Stack REST API. So let's use this repository to write some business-level tests that will help to document the feature and define its precise behaviour. Tests that will: -
While we're doing this, any work we do is clearly going to benefit XChem-Align automated (CI) testing, as we'll build the foundations of a test suite that could test all sorts of future target-loading behaviour (two birds, one stone?).
A target loader "behaviour" test

A typical (and real) behaviour test for target loading might look like this: -

Feature: Verify legitimate Public Targets can be loaded into the stack
This test ensures that 'well known' target data (generated by XChem-Align)
can be loaded into a new stack against the public proposal,
while also verifying the results. We rely on a clean stack
and a CAS-authenticated user.
Background: Start with a new (empty) stack
Given an empty behaviour stack tagged latest
Then the stack landing page should return http 200
Scenario Template: Load a public target
Given I can login to the behaviour stack
When I load the target <target> from <file> under TAS <tas>
Then the operation should return a status of CREATED
And the operation should return a Task ID
And the Task should complete within 10 minutes
And I should find target <target> in the stack against TAS <tas>
Examples:
| target | file | tas |
| A71EV2A | A71EV2A.zip | lb18145-1 |

A Squonk/2 job execution "behaviour" test

A typical (and real) behaviour test for Squonk/2 job execution might start out looking like this: -

Feature: Verify the stack can run Squonk/2 Jobs
Here we check that a properly configured stack can run Squonk/2 Jobs.
Tests typically load target data, run a Job and verify the results.
Background: Start with a new (empty) stack
Given an empty behaviour stack tagged latest
Then the stack landing page should return http 200
And the stack should support Squonk/2 Job execution
Scenario Template: Run a Squonk/2 Job
Given I can login to the behaviour stack
And I load the target <target> from <file> under TAS <tas>
And the target has been loaded successfully
When I run job <job> on the target <target>
Then the operation should return a status of CREATED
And the operation should return a Job Request ID
And the Job should complete within 10 minutes
And I should find the result loaded into the stack
Examples:
| target | file | tas | job |
| A71EV2A | A71EV2A.zip | lb18145-1 | Job XYZ | |
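Steps such as "the Task should complete within 10 minutes" and "the Job should complete within 10 minutes" typically reduce to a single generic polling helper behind the gherkin. A sketch of such a helper (the function, parameter names, and terminal-status strings are my own, not taken from the test repository); the status check, clock, and sleep are injected so the step can be unit-tested without a live stack:

```python
import time


def wait_for_completion(get_status, timeout_s=600, poll_s=10,
                        *, clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns a terminal status or we time out.

    get_status is any zero-argument callable returning a status string;
    the terminal states used here ('SUCCESS'/'FAILURE') are illustrative.
    """
    deadline = clock() + timeout_s
    while True:
        status = get_status()
        if status in ("SUCCESS", "FAILURE"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s}s")
        sleep(poll_s)
```

In the real step definitions, `get_status` would be a closure that GETs the Task (or Job Request) REST endpoint and extracts its status field.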
@alanbchristie clarifies that the behaviour testing has been implemented in another repository and is able to create a fresh stack on the development cluster for testing. @tdudgeon, please provide @alanbchristie with the two test data sets from XCA #1588 and help him write a minimal LHS upload test |
@alanbchristie has implemented daily automated testing of two XChem targets in the [fragalysis-stack-behaviour-tests](https://github.com/xchem/fragalysis-stack-behaviour-tests) repository. @alanbchristie says that @kaliif will be needed to help locate the files needed for job execution in the v2 schema. @alanbchristie says that getting jobs running via the API is easier to achieve than getting the frontend UI support for jobs resuscitated. The API used to launch jobs passes JSON which exposes specific files to the algorithms running in the job |
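To illustrate that last point, a job-launch POST body might look roughly like the sketch below. This is a guess at the shape only: the field names (`squonk_job_name`, `squonk_job_spec`, the `variables` block) and the file names are illustrative, not the real serializer contract.

```python
import json

# All field names and values here are illustrative, not the real API contract.
payload = {
    "target": 1,                      # Target record id
    "snapshot": 2,                    # Snapshot the job runs against
    "squonk_job_name": "fragmenstein-combine",
    "squonk_job_spec": json.dumps({
        "collection": "fragmenstein",
        "job": "fragmenstein-combine",
        "version": "1.0.0",
        # The variables block is where specific files are exposed
        # to the algorithm running in the job.
        "variables": {
            "fragments": ["A71EV2A-x0152_A_201.mol", "A71EV2A-x0310_A_147.mol"],
            "protein": "A71EV2A-x0152_A_201_apo-desolv.pdb",
        },
    }),
}

body = json.dumps(payload)
```

The point is simply that the JSON names concrete input files, which is why the file-layout change in the v2 schema matters to job execution.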
@alanbchristie says that @kaliif is not needed continuously for this ticket. @alanbchristie believes that job execution via the API will be ready by the hackathon |
API-based Job execution has been restored using minor code modifications on a development branch of the backend. Automated tests that create a stack, load suitable target data, and run the Job have been written. These tests now execute automatically, every day. As a basic reminder of the existing Job execution logic, I've produced a couple of block diagrams. One illustrates the file transfer step, and the other the job execution step: - |
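In outline, the two steps in those diagrams are: transfer the required files to Squonk, then launch the job and record a Job Request that the later callback resolves. A minimal sketch of that flow, with the stack-specific operations injected as callables (all names here are mine, not the backend's):

```python
def run_squonk_job(transfer_files, launch_job, files, job_spec):
    """Sketch of the two-step flow: file transfer, then job launch.

    transfer_files(files) -> a transfer id (or raises on failure)
    launch_job(transfer_id, job_spec) -> a job-request id
    """
    transfer_id = transfer_files(files)                  # step 1: move inputs to Squonk
    job_request_id = launch_job(transfer_id, job_spec)   # step 2: start the job
    return {"transfer": transfer_id, "job_request": job_request_id}
```

Keeping the launch dependent on a successful transfer id mirrors the fact that the broken piece today is the transfer's ability to locate the files, not the launch itself.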
One problem we have with the current Job is the content of the generated SD-File, which has the required molecule names written as long observation codes. What is the recommended fix for this? |
@alanbchristie, what are the input files to fragmenstein that produced an output with the long-codes in the SDF? I believe fragmenstein uses the molecule names from the input SD files. In case we need to map between the short and long observation codes, we can use the metadata.csv we serve in the downloads |
The files used in the test (for
So we either need to: -
a) manipulate the outgoing file (but remember this is just one example), or
b) add new information on the way out to Squonk, or
c) do a much more complicated search for the original SiteObservations for the

One of the outgoing MOLs: -
|
@alanbchristie in my opinion we should solve this by including a custom script to run before/after fragmenstein to rename the molecules / properties using the |
@alanbchristie and @mwinokan agree that we should rename files sent to jobs so that they most resemble what is observed in the download, i.e. the .mols are named after the observation short code. Then a small change would be needed for Fragmenstein to use the filename as the fragment identifier rather than the _Name property in the MOL file. Further discussion at the hackathon |
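A renaming pass of the kind discussed could be as simple as rewriting the title line of each record in the SD file using a long-code → short-code map derived from metadata.csv. A sketch using plain text processing rather than RDKit; the CSV column names ('Long code', 'Code') are assumptions about the download format, and SD properties containing codes would need a similar pass:

```python
import csv
import io


def load_code_map(metadata_csv_text):
    """Build a long-code -> short-code map from metadata.csv content.

    Assumes columns named 'Long code' and 'Code'; the real column
    names in the download may differ.
    """
    reader = csv.DictReader(io.StringIO(metadata_csv_text))
    return {row["Long code"]: row["Code"] for row in reader}


def rename_sdf_titles(sdf_text, code_map):
    """Rewrite each record's title line (the molecule name) in an SD file.

    SD records are separated by '$$$$'; the first non-empty line of each
    record is the molecule name that tools like fragmenstein pick up.
    """
    out_records = []
    for record in sdf_text.split("$$$$"):
        lines = record.split("\n")
        for i, line in enumerate(lines):
            if line.strip():                    # first non-empty line = title
                if line in code_map:
                    lines[i] = code_map[line]
                break
        out_records.append("\n".join(lines))
    return "$$$$".join(out_records)
```

Run before the job, this makes the names sent to Squonk match the observation short codes; run after, it could map job output back the other way with the inverted map.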
Option 1: Unify the long-code spec across XCA & Fragalysis

In order to have compatibility between the long-code generated by the loader and the XCA observation identifier, the "v" needs to be added to the XCA code:

XCA code (now):

Then:
See further discussion in #1634

Option 2: store the XCA long identifier as a database field

Store the XCA code identifier as it is now in the database, e.g.

Then:
|
@boriskovar-m2ms needs to fix some last frontend bugs then this can go to staging |
@tdudgeon and @alanbchristie please give us an estimate for the amount of work needed to update the deployments to use the new data model and resuscitate the algorithms