Recommended way of loading Arrow data #929

Closed
nevi-me opened this issue Feb 18, 2020 · 3 comments
Labels
JS question Questions about use, potential features, or improvements

Comments

@nevi-me

nevi-me commented Feb 18, 2020

Support Question

I have an endpoint that receives a stream of Arrow data (coming from Rust). The data is streamed via gRPC as record batches, which I then join when the stream completes, and generate an Arrow table with JavaScript.

I'm confident that Rust generates the correct data, as I can read it correctly in JS, but I'm struggling to figure out how to load this data into the viewer.

I have the snippet below:

import * as Arrow from "@apache-arrow/esnext-umd/Arrow";
import * as viewer from "@finos/perspective-viewer";

// ... getting a stream of bytes and joining them
const data: Uint8Array[] = streamData; // already collected

// create JS table
let table = Arrow.Table.from(data); // this works as I can read the schema and the data

// get the viewer
let pV: viewer.HTMLPerspectiveViewerElement = components.perspectiveViewer.nativeElement;

// option 1: loading arrow data from the table
pV.load(table);

// option 2: loading the bytes
pV.load(data);

Option 1

This fails with the error below, I presume because I'm actually passing a JavaScript object, which might be unsupported:

perspective.wasm.worker.js:7 Uncaught Could not determine data format for {"_nullCount":-1,"_type":{"children":[ ...

Option 2

I would have expected this to work, as I'm passing raw data which should be compatible with cpp/wasm. However, I get out-of-memory errors (OOMs) even when passing a very small table.


Is there a way that I can pass this data without converting it to CSV or the like? I can't stream a table through via websockets.


Similar issue(s)

This is similar to #601, but after looking at what RandomFractals was doing for the vscode-preview, I couldn't find a workaround.

Thanks

@timkpaine added the "JS" and "question" labels Feb 18, 2020
@nevi-me
Author

nevi-me commented Feb 18, 2020

I found a solution after @sc1f pointed out that I need to pass my data as an ArrayBuffer.

My solution looks like:

let data: Uint8Array[] = [all, my, data, as, arrays, of, record, batches];
let table = Arrow.Table.from(data); // arrow seems to be flexible in what it takes

// count the length of the chunked data
let length = 0;
data.forEach(d => length += d.length);
// create a new array (couldn't figure out how to combine the other arrays without creating a new one)
let buffer = new Uint8Array(length);
let offset = 0;
data.forEach(d => {
  // append the data
  buffer.set(d, offset);
  offset = offset + d.length;
});

// finally load the data from an array buffer
pV.load(<any>buffer.buffer);

Interestingly, I had to cast my buffer to <any> because the load() function expects a string or object[]. I think if the typings had an ArrayBuffer I could have self-discovered what I needed to do.
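
The concatenation step above can be wrapped in a small reusable helper. This is only a sketch: `concatChunks` is a hypothetical name, not part of Perspective or Arrow, and it assumes the chunks together form one valid Arrow byte stream, as in the snippet above.

```typescript
// Hypothetical helper: joins byte chunks into one contiguous ArrayBuffer.
function concatChunks(chunks: Uint8Array[]): ArrayBuffer {
  // Total number of bytes across all chunks.
  const total = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);

  // Allocate the destination once and copy each chunk in at its offset.
  const out = new ArrayBuffer(total);
  const view = new Uint8Array(out);
  let offset = 0;
  for (const chunk of chunks) {
    view.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}
```

The result could then be passed where the snippet above passes `buffer.buffer`. Copying through `set` means a chunk that is a view into a larger buffer contributes only the bytes it covers.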

@nevi-me nevi-me closed this as completed Feb 18, 2020
@timkpaine
Member

@nevi-me since the project is not written in TypeScript, the typings aren't always to be trusted :-)

@sc1f
Contributor

sc1f commented Feb 18, 2020

@nevi-me you should also be able to call update on the viewer instance after you've loaded the first batch:

let data: Uint8Array[] = [all, my, data, as, arrays, of, record, batches];
let loaded: boolean = false;

data.forEach(d => {
  if (loaded) {
    pV.update(<any>d.buffer);
  } else {
    pV.load(<any>d.buffer);
    loaded = true;
  }
});
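
A slightly more defensive sketch of the same load-then-update idea follows. `ArrowSink`, `toExactBuffer`, and `loadIncrementally` are hypothetical names introduced for illustration; the interface only stands in for the two viewer methods used above, and the real typings may differ. It also assumes each chunk can be loaded on its own, as the snippet above suggests. One detail worth noting: `d.buffer` returns the whole underlying ArrayBuffer, which can be larger than the view if the `Uint8Array` was created as a slice of a bigger buffer, so the sketch copies exactly the bytes each view covers.

```typescript
// Hypothetical minimal shape of the viewer methods used here.
interface ArrowSink {
  load(data: ArrayBuffer): void;
  update(data: ArrayBuffer): void;
}

// Copies exactly the bytes a view covers into a fresh ArrayBuffer.
function toExactBuffer(chunk: Uint8Array): ArrayBuffer {
  const out = new ArrayBuffer(chunk.byteLength);
  new Uint8Array(out).set(chunk);
  return out;
}

// Loads the first chunk, then sends every later chunk as an update.
function loadIncrementally(sink: ArrowSink, chunks: Uint8Array[]): void {
  chunks.forEach((chunk, index) => {
    const bytes = toExactBuffer(chunk);
    if (index === 0) {
      sink.load(bytes);
    } else {
      sink.update(bytes);
    }
  });
}
```

With an interface like this, the `<any>` cast on each argument could be replaced by a single cast of the viewer, for example `loadIncrementally(pV as unknown as ArrowSink, data)`. If the real methods return promises, the calls would need to be awaited in order.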
