Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

H5wasm provider #1031

Closed
wants to merge 21 commits into from
Closed

Conversation

bmaranville
Copy link
Contributor

This creates a new provider that reads the HDF5 file directly in the browser, using hdf5 libraries compiled to WebAssembly (h5wasm)

Currently it is set up to receive a URL for an HDF5 file that is available from a web server;

It is also straightforward to add another option for the user to directly "upload" a file from their computer (it only gets loaded into the browser context, not actually uploaded to a server), or to enable "drag-and-drop" so that users could drag a file from their filesystem onto the page and have it upload and process/view the file.

It is not in a finished state, and there is still some cleanup to do. I thought it might be interesting to the developers, though.

Copy link
Contributor

@axelboc axelboc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, thanks for this! Support for reading raw HDF5 files in the browser has been a long-awaited feature.

The main thing I'd like to discuss first is the scope of the provider. Here, you've implemented something that is tied to your back-end: you have an endpoint that serves raw HDF5 files and another to download specific datasets as CSV, NPY and TIFF.

The way I see it, H5WasmProvider should be more generic than this and not tied to a back-end solution. Maybe it should just accept an array buffer and that's it? The provider for your back-end could then use H5WasmProvider internally and focus on fetching the raw HDF5 file as an array buffer. This solution would also allow us to implement a demo for opening local files without having to integrate any of the file-picking logic in the provider itself (since this is an application-level problem).

When it comes to exporting datasets as CSV/NPY/TIFF, the way I see it, H5WasmProvider could accept an optional getExportURL prop.

Is your back-end open source and can we install it on our demo server? If so, then no problem with adding a specific provider for it in H5Web (assuming the wasm part is abstracted as explained above). Also, does it have a more specific name that we could use to name the provider?

Comment on lines +9 to +16
esbuild: {
target: 'es2020',
sourcemap: true,
},
build: {
target: 'es2020',
sourcemap: true,
},
Copy link
Contributor

@axelboc axelboc Mar 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this relates to h5wasm requiring BigInt support? A few thoughts/questions:

  • It may be worth adding a comment that mentions this, with a link to https://esbuild.github.io/content-types/#javascript.
  • What's the reasoning for the esbuild config object? It seems to be working fine without it. 🤷
  • Good call adding sourcemap: true 👍

This config is probably needed in @h5web/app as well, isn't it? That being said, I'm a bit worried about what this all means with regard to supporting older browsers. Perhaps a better approach would be to make h5wasm an optional peer dependency in @h5web/app so that it is up to the consumer application (e.g. the demo) to install it and configure the build to support BigInt. We could also consider importing H5WasmProvider dynamically with React.lazy so it ends up in a separate chunk, and try to configure Vite's build.target only for this chunk. Not sure it's possible, but it would allow the demos that use the other providers to remain compatible with older browsers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of this was me getting familiar with the build environment you are using. I was under the impression one of those sections was used for the preview mode, and the other during build of production assets.

You are correct that this is related to BigInt support. BigInt is being used for pointers being passed to the WASM code. These could be wrapped into plain numbers since currently wasm in browsers only supports 32-bit addresses, and 32-bit ints fit into a plain javascript number, though that would require a bit of work in the h5wasm code. BigInt is pretty well supported across browsers and platforms at this time.

Lazy loading sounds nice - I am new to React and Vite so I wouldn't know how to set that up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can take a look at that, no worries ;)

<>
{JSON.stringify(value, (key, value) =>
typeof value === 'bigint' ? Number(value) : value
)}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a big int somewhere in the test file you could point me to for testing? It may be worth adding bit int support in a separate PR, as this probably has an impact in other places (AttributeInfo, ScalarVis, etc.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is! I was surprised. For some reason we are writing the 'signal' attribute of 'entry/data/areaDetector' as a 64-bit integer, which is translated by h5wasm into a BigInt (arrays are transformed to a BigInt64Array)

Why we're doing that, I don't know. But that is a separate question.

@@ -22,6 +22,7 @@
},
"dependencies": {
"@h5web/app": "workspace:*",
"h5wasm": "0.3.0",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My other comment may change this, but if h5wasm remains a direct dependency of @h5web/app, I don't think you need to install it here.

@bmaranville
Copy link
Contributor Author

bmaranville commented Mar 29, 2022

Hey, thanks for this! Support for reading raw HDF5 files in the browser has been a long-awaited feature.

The main thing I'd like to discuss first is the scope of the provider. Here, you've implemented something that is tied to your back-end: you have an endpoint that serves raw HDF5 files and another to download specific datasets as CSV, NPY and TIFF.

The way I see it, H5WasmProvider should be more generic than this and not tied to a back-end solution. Maybe it should just accept an array buffer and that's it? The provider for your back-end could then use H5WasmProvider internally and focus on fetching the raw HDF5 file as an array buffer. This solution would also allow us to implement a demo for opening local files without having to integrate any of the file-picking logic in the provider itself (since this is an application-level problem).

When it comes to exporting datasets as CSV/NPY/TIFF, the way I see it, H5WasmProvider could accept an optional getExportURL prop.

Is your back-end open source and can we install it on our demo server? If so, then no problem with adding a specific provider for it in H5Web (assuming the wasm part is abstracted as explained above). Also, does it have a more specific name that we could use to name the provider?

It is probably not clear from the way it is written, but the "backend" is just an http server. Any hdf5 file that is available on the web can be loaded with this provider, as is. I agree about having another route to load ArrayBuffer objects directly, in order to support the local-file-upload workflow.

The "export" functions were copied from the h5grove provider and do not actually represent any implemented functionality on our server. I have separate logic for creating downloads in the browser, but I have not added it yet.

@bmaranville
Copy link
Contributor Author

The React component hierarchy used is complex enough that I can't quickly figure out how to add something like an upload button to the demo. If someone could help with that, I could quickly wire it up so that it loads the uploaded file into an arraybuffer and displays it.

Copy link
Member

@loichuder loichuder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey 👋 ! Thank you so much for starting this, this will be very helpful 🙂 !

Following the discussion with @axelboc, I don't expect the "file-upload-into-buffer" provider to be part of this PR. We can work on this together on a later time.

This being said, it would be good to be able to distinguish the provider in this PR from the future one. This is why I suggested at several places to add Http in front of h5wasm to convey the fact that the file is served in Http.

I also think that api could be reworked a bit to remove typecasts (and be closer to the actual API from h5wasm in fact). If you wish, I can help with this part.

apps/demo/.env Outdated
@@ -7,6 +7,9 @@
VITE_H5GROVE_URL=
VITE_H5GROVE_FALLBACK_FILEPATH="water_224.h5"

# h5wasm backend parameters
VITE_H5WASM_FALLBACK_FILEPATH="https://ncnr.nist.gov/pub/ncnrdata/ngbsans/202009/nonims294/data/sans114141.nxs.ngb"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
VITE_H5WASM_FALLBACK_FILEPATH="https://ncnr.nist.gov/pub/ncnrdata/ngbsans/202009/nonims294/data/sans114141.nxs.ngb"
VITE_HTTPH5WASM_FALLBACK_FILEPATH="https://ncnr.nist.gov/pub/ncnrdata/ngbsans/202009/nonims294/data/sans114141.nxs.ngb"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - but it is true that the file-upload-into-buffer provider will be identical to this one after the initial file fetch.

@@ -6,6 +6,7 @@ import {
} from 'react-router-dom';

import H5GroveApp from './H5GroveApp';
import H5WasmApp from './H5WasmApp';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
import H5WasmApp from './H5WasmApp';
import HttpH5WasmApp from './HttpH5WasmApp';

@@ -16,6 +17,7 @@ function DemoApp() {
<Route path="/" element={<H5GroveApp />} />
<Route path="mock" element={<MockApp />} />
<Route path="hsds" element={<HsdsApp />} />
<Route path="h5wasm" element={<H5WasmApp />} />
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<Route path="h5wasm" element={<H5WasmApp />} />
<Route path="httph5wasm" element={<HttpH5WasmApp />} />

@@ -3,6 +3,7 @@ export { default as App } from './App';
export { default as MockProvider } from './providers/mock/MockProvider';
export { default as HsdsProvider } from './providers/hsds/HsdsProvider';
export { default as H5GroveProvider } from './providers/h5grove/H5GroveProvider';
export { default as H5WasmProvider } from './providers/h5wasm/H5WasmProvider';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
export { default as H5WasmProvider } from './providers/h5wasm/H5WasmProvider';
export { default as HttpH5WasmProvider } from './providers/h5wasm/HttpH5WasmProvider';

children: ReactNode;
}

function H5WasmProvider(props: Props) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
function H5WasmProvider(props: Props) {
function HttpH5WasmProvider(props: Props) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would even go for just HttpProvider -- as in a provider for a static HTTP server. Same everywhere else. The fact that it uses h5wasm internally (or in a future PR, the lower-level H5WasmProvider) is an implementation detail.

}

return `${baseURL as string}/data/?${searchParams.toString()}`;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the HTTP server does not provide an endpoint to serve other formats, you can remove this function. It is fine for me to not deal with other export formats now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, agreed. If the server provides an endpoint, then it is no longer a simple http server, so serving raw HDF5 files instead of using H5Grove or the like makes a bit less sense.

): Promise<H5WasmDataResponse> {
const { dataset } = params;
const file = await this.file_promise;
const dataset_obj = file.get(dataset.path) as h5wasm_Dataset;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could remove such type casts by using the same scheme as in other providers.

This means having fetch... functions of the API that are very low-level function that only fetch the relevant object (data, entity, attr) and return a Promise<...Response> with the interface ...Response describing the response from the endpoint.

For example. in this case, we could imagine having

async function fetchDataset(path: string): Promise<H5WasmDataset> {
    const file = await this.file_promise;
    return file.get(dataset.path)
}

instead of using directly file.get.

The common get... functions would then make use of process... functions to convert H5Wasm objects into common Entity/Value/... objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - that sounds fine. I removed the process... functions because most of them are trivial in this case, but keeping a structure that is similar to other providers is important.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, if the process... is trivial and can be removed, that is even better. , Same for ...Response types if you can use directly h5wasm classes.

To me, the important part is that we make clear in the api what is the format/shape of the objects returned by h5wasm. In other words, have the return type of fetch... functions match the response. Then, the TypeScript processing will flow naturally.

| EntityKind.Group
| 'external_link'
| 'soft_link'
| 'other';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the dumb question but does h5wasm support links ? Can you get a link object though file.get ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment soft links are directly resolved and not returned as "links" from h5wasm. External links don't mean much in the context of a URL-based file, since URLs are not supported by HDF5 External Links, and they are not handled by h5wasm.
If it would help with compatibility I could certainly add special handling for these - though I don't see the benefit of identifying soft links instead of just resolving them.

Copy link
Contributor

@axelboc axelboc Mar 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In H5Grove, soft links are also resolved by default, so the front-end doesn't know they are soft links. We return a soft_link entity only when H5Grove encounters a broken soft link that it cannot resolve. Do you know how h5wasm behaves in such case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure it just breaks! I can add some handling for that case.

Copy link
Contributor

@axelboc axelboc Mar 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No rush, it can break for now, and we can keep the soft links out of the code until it no longer does, no worries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file.get(broken_soft_link) currently returns "undefined" from h5wasm.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just updated h5wasm so file.get(path) returns an object

{ type: "BrokenSoftLink", target: "/not/an/existing/object" }

for dangling soft links, and this for external links:

{ type: "ExternalLink", filename: "external_filename", obj_path: "/path/in/external/file" }

(it is not yet published to npm)

dtype: string;
shape: number[];
attributes: H5WasmAttribute[];
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understood from api, the H5WasmDatasetReponse also includes the value ? Same for H5WasmAttribute ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - this is part of the cleanup I didn't do before posting. The H5WasmDatasetResponse is not actually used in the api. I realized that I didn't need any of the ...Response classes, since the Entity could be directly constructed in getEntity. (and there is no external service to query for a response). You could construct "responses" from the h5wasm api calls and then process them, but that seemed to add nothing to the logic since it is pretty simple.

response: H5WasmEntityResponse
): response is H5WasmGroupResponse {
return response.type === EntityKind.Group;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fear this will not work as EntityKind.Group is group (lowercase) while the type fields from h5wasm return capitalized strings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy to change the constants in h5wasm to be lower case, though I'm not sure why e.g. EntityKind.Group is not "Group" since they refer to class or interface names.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be traced back to HSDS. We also use lower-case in H5Grove. At this point it's just a convention, I guess. No need to change it in h5wasm, though -- a simple response.type.toLowerCase() here would suffice.

@bmaranville
Copy link
Contributor Author

The demo is working with dangling soft links and external links... I just tried it with a test file I'm hosting on localhost.

The single input property of H5WasmProvider is altered to be either a File object (result of choosing in HTML file input element) or a URL representing a
remote file to be loaded over http(s).  Some cleanup is required if the H5WasmProvider is re-rendered, and this is done in the useEffect callback.
@bmaranville
Copy link
Contributor Author

bmaranville commented Apr 6, 2022

I made some further refinements and implemented the slicing (selection) functionality. See operating demo at
https://ncnr.nist.gov/ncnrdata/view/panosc-hdf-viewer.html?pathlist=ncnrdata/ngbsans/201708/nonims152/data
(clicking on a file in the left tree loads it into the viewer in the right panel using the public URL for the file and an HTTPS fetch)

Note that the "upload File" button at the bottom of the demo is also functional, allowing users to upload their own HDF5 file for viewing.

@axelboc
Copy link
Contributor

axelboc commented Apr 7, 2022

Thanks, and cool demo! 🤩

I'm not sure about the class-based component, but we can take care of that. Loïc has a PR in progress on top of your branch so we'll try to finish it and merge it first and then reassess.

@axelboc
Copy link
Contributor

axelboc commented Apr 7, 2022

Hey, just noticed a small CSS issue your demo: you seem to have a stylesheet called library.css, but the styles inside it are in the wrong order. The global styles (normalize.css, react-reflex, btnClean and the like, etc.) are all at the bottom of the file, after the app and lib styles, when they should come first. The exact order in the demo is as follows: normalize.css, react-reflex, @h5web/shared (btnClean, etc.), @h5web/lib (toolbar, visualizations), @h5web/app (custom CSS properties, axis mapper, breadcrumbs, vis manager, etc.)

This leads to some subtle bugs, like the bottom border of the visualization tabs to disappear:

image

@bmaranville
Copy link
Contributor Author

bmaranville commented Apr 7, 2022

@axelboc I agree that there is a CSS issue. I am building as a library and I think this is the reason for the problem - the bundler is handling the CSS files differently in this case, for some reason. I couldn't figure out how to get it to output the same CSS as when bundled as an app. (it made no difference if cssCodeSplit was enabled or not - that was an attempt to address the issue)

from vite.config.js:

build: {
    target: 'es2020',
    sourcemap: true,
    cssCodeSplit: true,
    lib: {
      entry: path.resolve(__dirname, 'src/library.tsx'),
      name: 'NCNRBrowser',
      fileName: (format) => `NCNRBrowser.${format}.js`,
    },
  },
...

@bmaranville
Copy link
Contributor Author

Thanks, and cool demo! star_struck

I'm not sure about the class-based component, but we can take care of that. Loïc has a PR in progress on top of your branch so we'll try to finish it and merge it first and then reassess.

Please note that there is non-trivial state to be managed in the component, and the handling of that state is different in the most recent class-based component compared to the original code in the PR. When re-rendering with a new source, the old H5WasmFile object has to be "closed" in order to free memory, and the old backing file in the emscripten virtual filesystem has to be unlinked before a new file is created from the source.

If the component is only ever rendered once you don't need to manage this state, but if you want to reuse the component and change the source, these steps are required.

@axelboc
Copy link
Contributor

axelboc commented Apr 7, 2022

I'm thinking we could add a close method to the API class, and then do something like this in the provider:

const api = useMemo(
  () => new H5WasmApi(url, filepath, axiosParams),
  [filepath, url, axiosParams]
);

useEffect(() ==> {
  return () => {
    api.close();
  }
}, [api]);

return <Provider api={api}>{children}</Provider>

@axelboc
Copy link
Contributor

axelboc commented Apr 7, 2022

I am building as a library and I think this is the reason for the problem - the bundler is handling the CSS files differently in this case, for some reason.

How do you import the styles in your app? From the JS code or from CSS? I find that CSS imports are often more predictable.

@bmaranville
Copy link
Contributor Author

The app is modeled directly on the demo app. I imported the styles exactly the same way: styles.css is a copy of the one in apps/demo/src, and it is imported at the top of library.tsx as

import './styles.css';

@bmaranville
Copy link
Contributor Author

bmaranville commented Apr 7, 2022

The real problem with adding state-management with useState, useEffect and useMemo to a functional component (I did go down that road) is the error handling for the async part (I should have mentioned this in my earlier comment). When fetching the source asynchronously, you have to catch the errors and rethrow them in the synchronous rendering pipeline. Otherwise you cannot use ErrorBoundary and the app crashes at the first uncaught error. I'm sure this could be done in a functional component but it requires multiple useEffect and useState calls and it was starting to be hard to read or understand just by looking at it.

@axelboc
Copy link
Contributor

axelboc commented Apr 7, 2022

With the code I posted above, the asynchronicity would occur in the async methods of h5wasm-api.ts (like you had done initially with this.filePromise), and since those async methods are called during the rendering pipeline (via react-suspense-fetch), any errors would be correctly handled by the error boundaries.

Only the errors thrown in the close method I suggest would remain problematic, as the method would be called from the clean-up function of a useEffect. That being said, I don't think these errors need to be surfaced to the user, so we could just log them silently.

Let's keep this PR in its current state for now. Loïc and I will start integrating the new providers bit by bit in separate branches/PRs, taking into account everything we've discussed. We'll keep you updated.


Arf, Vite in library mode does deal with CSS in strange ways 😞.

We've had to work around it in our packages by adding a separate build config and entry point specifically for CSS. For instance in @h5web/app: https://github.com/silx-kit/h5web/blob/main/packages/app/vite.styles.config.js and https://github.com/silx-kit/h5web/blob/main/packages/app/src/styles.ts. We then have to concatenate the app and lib styles in an NPM script: https://github.com/silx-kit/h5web/blob/main/packages/app/package.json#L33. It's pretty hacky, but at least it's predictable. Worse comes to worst, if a single JS entrypoint still doesn't work, you could have one JS entry point per CSS import 😂

@bmaranville
Copy link
Contributor Author

Yes, I saw the Suspense wrappers in the Explorer logic, and that makes sense.

For the CSS I will look over what you did with vite.styles.config.js and try to replicate. Thanks for the pointer!

@loichuder
Copy link
Member

Closing this as a basic version of the provider was implemented in #1080

Let's track the improvements to follow in #1086

@loichuder loichuder closed this Apr 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants