The record in basic datasets needs more flexibility #1594

Closed
WHALEEYE opened this issue Apr 22, 2022 · 4 comments
Labels
enhancement New feature or request

Comments

@WHALEEYE
Contributor

WHALEEYE commented Apr 22, 2022

Description

I'm not a user but a developer who recently contributed to DJL. While implementing a dataset, I found that the design of Record can't support every kind of record a dataset may need. For example, as I mentioned in #1554, the best implementation of SQuAD would return a record with:

  • data: an NDList containing three NDArrays
  • label: two groups of NDArrays

If I understand correctly, the label is hard to implement because an NDList can only contain NDArrays, and the NDArrays inside it cannot be grouped in any simple way. I think more flexibility may be needed here.

I have come up with several ideas:

  1. Change NDList to allow it to contain NDLists.

    This is not very feasible, since all the methods in NDList work on NDArrays, and it could require a redesign of NDList that makes it more complicated.

  2. Keep the current structure but allow NDList to group the NDArrays inside it, much like naming the NDArrays.

    This is more realistic, since it only adds a logical grouping to the NDArrays and leaves the current structure unchanged.

Will this change the current api? How?

For the second idea, no.

Who will benefit from this enhancement?

The developers trying to add datasets with complicated (one-to-many mapping) records.

WHALEEYE added the enhancement label Apr 22, 2022
@zachgk
Contributor

zachgk commented May 24, 2022

I'm not entirely sure what you are thinking for idea 2 @WHALEEYE. Can you share some pseudo-code for maybe creating or accessing a group?

@WHALEEYE
Contributor Author

NDArray has a method getName() that returns the name of the array, and NDList has a method get(String name) that returns the array with the specified name.
So I'm thinking that each array could have another attribute called groupName, and we could add a new method getGroup(String groupName) to NDList that returns a List (or NDList) containing all the NDArrays in the specified group.
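
A minimal sketch of how such a lookup might behave, written against small stand-in classes rather than DJL's real NDArray and NDList. Everything here is hypothetical: `GroupedArray`, `GroupedList`, the `groupName` field, `getGroup`, and the group names "start" and "end" are illustrative assumptions, not existing DJL API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT DJL's NDArray and NDList.
final class GroupedListSketch {

    /** A named array carrying the proposed extra attribute, groupName. */
    static final class GroupedArray {
        final String name;
        final String groupName; // proposed attribute; null means "not in any group"
        final float[] data;

        GroupedArray(String name, String groupName, float[] data) {
            this.name = name;
            this.groupName = groupName;
            this.data = data;
        }
    }

    /** The list itself stays flat; a group is only a filtered view over it. */
    static final class GroupedList {
        private final List<GroupedArray> arrays = new ArrayList<>();

        void add(GroupedArray array) {
            arrays.add(array);
        }

        int size() {
            return arrays.size();
        }

        /** Returns every array whose groupName equals the argument, in insertion order. */
        List<GroupedArray> getGroup(String groupName) {
            List<GroupedArray> result = new ArrayList<>();
            for (GroupedArray array : arrays) {
                if (groupName.equals(array.groupName)) {
                    result.add(array);
                }
            }
            return result;
        }
    }

    /** Builds a label made of two groups of arrays, stored in one flat list. */
    static GroupedList twoGroupLabel() {
        GroupedList label = new GroupedList();
        label.add(new GroupedArray("answer0", "start", new float[] {3f}));
        label.add(new GroupedArray("answer1", "start", new float[] {7f}));
        label.add(new GroupedArray("answer0", "end", new float[] {5f}));
        label.add(new GroupedArray("answer1", "end", new float[] {9f}));
        return label;
    }

    public static void main(String[] args) {
        GroupedList label = twoGroupLabel();
        for (GroupedArray array : label.getGroup("start")) {
            System.out.println(array.groupName + "/" + array.name);
        }
    }
}
```

The point of the sketch is that the container keeps a single flat sequence, so existing index-based and name-based access would be unaffected, and grouping is only an additional query.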

@frankfliu
Contributor

@WHALEEYE
We intentionally keep NDArray names flat for simplicity.

To support complex NDArray structures (e.g. IValue in PyTorch), you can follow this naming convention:

  1. "group.name1", "group.name2" ... -> Map of IValue
  2. "name[]", "name[]" ... -> List of IValue
  3. "name()", "name()" ... -> Tuple of IValue

See: https://github.com/deepjavalibrary/djl/blob/master/engines/pytorch/pytorch-engine/src/main/java/ai/djl/pytorch/jni/IValueUtils.java#L76
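
To make the convention concrete, here is a rough sketch of how flat names could be folded back into logical groups. It works on plain strings and is not the code in IValueUtils; the class `NamingConventionSketch`, the `Kind` enum, the `group` method, and the example names are all hypothetical, and the exact matching rules in the linked source may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: folds flat names into groups by the convention above.
// All names in this class are hypothetical; this is not DJL code.
final class NamingConventionSketch {

    enum Kind { MAP, LIST, TUPLE, SINGLE }

    /** One logical group recovered from a flat sequence of names. */
    static final class Group {
        final Kind kind;
        final String key; // group name, or the full name for SINGLE
        final List<String> members = new ArrayList<>(); // map keys, or the original names

        Group(Kind kind, String key) {
            this.kind = kind;
            this.key = key;
        }
    }

    /** Groups names in first-seen order; plain or null names stay standalone. */
    static List<Group> group(List<String> names) {
        Map<String, Group> seen = new LinkedHashMap<>();
        List<Group> groups = new ArrayList<>();
        for (String name : names) {
            Kind kind;
            String key;
            String member;
            int dot = name == null ? -1 : name.indexOf('.');
            if (dot > 0) {
                // "group.name" -> one entry of a map called "group"
                kind = Kind.MAP;
                key = name.substring(0, dot);
                member = name.substring(dot + 1);
            } else if (name != null && name.length() > 2 && name.endsWith("[]")) {
                // "name[]" -> one element of a list called "name"
                kind = Kind.LIST;
                key = name.substring(0, name.length() - 2);
                member = name;
            } else if (name != null && name.length() > 2 && name.endsWith("()")) {
                // "name()" -> one element of a tuple called "name"
                kind = Kind.TUPLE;
                key = name.substring(0, name.length() - 2);
                member = name;
            } else {
                Group single = new Group(Kind.SINGLE, name);
                single.members.add(name);
                groups.add(single);
                continue;
            }
            String lookup = kind + ":" + key;
            Group g = seen.get(lookup);
            if (g == null) {
                g = new Group(kind, key);
                seen.put(lookup, g);
                groups.add(g);
            }
            g.members.add(member);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "input_ids", "start.0", "start.1", "spans[]", "spans[]", "pair()", "pair()");
        for (Group g : group(names)) {
            System.out.println(g.kind + " " + g.key + " " + g.members);
        }
    }
}
```

With this kind of scheme the list of arrays stays flat, and a two-group label could be expressed purely through names such as "start.0" and "end.0" without any change to the container type.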

@WHALEEYE
Contributor Author

I see, this makes sense. Thank you for answering!

3 participants