
New design of the transformer API to support causal and masked pre-training approach #1008

Closed
wants to merge 7 commits

Conversation

@sararb (Contributor) commented Feb 28, 2023

This is a placeholder to support the Transformer-API for the GTC tutorial 2023. This branch is rebased with release-23.02.

For the latest work intended to be merged with the main branch, please refer to #1022

@sararb added labels on Feb 28, 2023: bug (Something isn't working), enhancement (New feature or request), area/api, size/L, P0, breaking (Breaking change), area/session-based
@sararb added this to the Merlin 23.02 milestone on Feb 28, 2023
@sararb self-assigned this on Feb 28, 2023
@gabrielspmoreira (Member) left a comment

Sara, does our new Transformer API already support dense tensors for inputs and targets instead of RaggedTensor?
The dataloader provides dense tensors for sequential features in some cases (as summarized in this ADR):

  • In the current dataloader API, if value_count.max is not None and is_ragged == False
  • In the future dataloader API, if is_ragged == False
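For context, a minimal TensorFlow sketch (not Merlin-specific; the values are toy assumptions) of the dense-vs-ragged distinction the bullets above describe: a `tf.RaggedTensor` preserves per-session lengths, while a dense batch pads every sequence to the longest one.

```python
import tensorflow as tf

# Two sessions of different lengths. A ragged batch keeps the true
# lengths; a dense batch pads to the longest sequence (here, length 4).
ragged = tf.ragged.constant([[1, 2, 3], [4, 5, 6, 7]])
dense = ragged.to_tensor(default_value=0)  # shape (2, 4), padded with 0

# Given a known padding value, the dense tensor can be converted back:
restored = tf.RaggedTensor.from_tensor(dense, padding=0)
```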

# Losses do not support RaggedVariantTensor on GPU:
prediction = prediction.flat_values
if isinstance(target, tf.RaggedTensor):
    target = target.flat_values
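The snippet above in isolation, with hypothetical ragged inputs (the variable names mirror the PR; the toy values and the choice of loss are assumptions, not the actual model output):

```python
import tensorflow as tf

# Toy ragged predictions/targets over two variable-length sessions.
prediction = tf.ragged.constant([[0.1, 0.9], [0.4, 0.2, 0.7]])
target = tf.ragged.constant([[0.0, 1.0], [1.0, 0.0, 1.0]])

# Losses do not support RaggedVariantTensor on GPU, so drop the ragged
# batch structure and operate on the flat 1-D values instead:
prediction = prediction.flat_values  # shape (5,)
if isinstance(target, tf.RaggedTensor):
    target = target.flat_values      # shape (5,)

loss = tf.keras.losses.binary_crossentropy(target, prediction)
```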
A second review comment (from a Member):

As you are flattening the values here to 1-D, is there a way to reshape the losses output back into a RaggedTensor? Otherwise the 1-D loss will not match the sample weights, which can be either 1-D or 2-D (ragged).
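One possible answer, sketched with `tf.RaggedTensor.with_flat_values` (the squared-value loss here is a hypothetical stand-in for whatever per-element loss the model actually computes):

```python
import tensorflow as tf

# Ragged predictions over two sessions of lengths 2 and 3 (toy values).
prediction = tf.ragged.constant([[0.2, 0.8], [0.5, 0.1, 0.9]])

# Stand-in for a per-element loss computed on the flattened 1-D values.
flat_losses = tf.square(prediction.flat_values)

# Re-attach the original row partitioning so the losses become 2-D
# (ragged) again and line up with 2-D ragged sample weights.
ragged_losses = prediction.with_flat_values(flat_losses)

sample_weights = tf.ragged.constant([[1.0, 0.5], [1.0, 1.0, 0.2]])
weighted = ragged_losses * sample_weights
```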

@rnyak rnyak modified the milestones: Merlin 23.02, Merlin 23.03 Mar 1, 2023

@sararb force-pushed the tf/transformer-api branch from a02d8d1 to 20a40d7 on March 1, 2023 at 22:17
@sararb (Contributor, Author) commented Mar 1, 2023

> Sara, does our new Transformer API already support dense tensors for inputs and targets instead of RaggedTensor? The dataloader provides dense tensors for sequential features in some cases (as summarized in this ADR):
>
>   • In the current dataloader API, if value_count.max is not None and is_ragged == False
>   • In the future dataloader API, if is_ragged == False

Thank you for checking out the PR! This PR only addresses how to support the different masking approaches in the TransformerBlock, but we still need to work on extending the SequenceTransforms to support dense tensors as inputs (as mentioned in this ADR).
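For readers outside the PR, an illustrative TensorFlow sketch (not the Merlin Models API) of the two pre-training schemes the title refers to: causal masking restricts attention to past positions, while masked pre-training hides random positions and predicts them.

```python
import tensorflow as tf

seq_len = 4

# Causal LM: a lower-triangular attention mask, so each position attends
# only to itself and earlier positions.
causal_mask = tf.linalg.band_part(tf.ones((seq_len, seq_len)), -1, 0)

# Masked LM: randomly choose positions to hide; the model is trained to
# predict the items at the masked (True) positions.
mask_prob = 0.3
mlm_positions = tf.random.uniform((seq_len,)) < mask_prob
```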

@sararb force-pushed the tf/transformer-api branch from 5b5ef80 to a92bdc2 on March 7, 2023 at 23:00
@sararb (Contributor, Author) commented Mar 23, 2023

Closing, as this was a placeholder for the tutorial image.

@sararb sararb closed this Mar 23, 2023