epic: Reviews of Ichigo paper #99

Open
tikikun opened this issue Oct 24, 2024 · 9 comments
Assignees
Labels
type: epic A major feature or initiative

Comments

@tikikun
Collaborator

tikikun commented Oct 24, 2024

review it!

@tikikun tikikun added the type: epic A major feature or initiative label Oct 24, 2024
@tikikun
Collaborator Author

tikikun commented Oct 24, 2024

[Two screenshots attached: "Screenshot 2024-10-24 at 22 17 11" and "Screenshot 2024-10-24 at 22 17 21"]

It's important to cite SpiritLM from Meta to make our paper more robust, but we really do have similar results.

@tikikun
Collaborator Author

tikikun commented Oct 24, 2024

[Screenshot attached: "Screenshot 2024-10-24 at 22 22 22"]

We need a similar analysis, but using our own alignment method based on transcription prompting. We probably need to refer to the Qwen-Audio technique as well.

@tikikun
Collaborator Author

tikikun commented Oct 24, 2024

@hahuyhoang411 Also remember to add the visualization for non-speech patterns.

@tikikun
Collaborator Author

tikikun commented Oct 24, 2024

Intent classification also needs attention.

@tikikun
Collaborator Author

tikikun commented Nov 21, 2024

@hahuyhoang411 should review

@hahuyhoang411
Contributor

I was discussing this with @tuanlda78202, and he pointed out a few missing key numbers. I will add them in.

@tuanlda78202 tuanlda78202 self-assigned this Nov 22, 2024
@tuanlda78202
Contributor

Review of Ichigo Paper

  1. Figure Inference Pipeline: The main figure should state that it shows the inference pipeline; as drawn, it may be confused with the training pipeline. It is recommended to have a separate figure for each stage, clearly illustrating the Whisper Encoder (continuous embeddings) and the VQ (codebooks) as separate components. Together they could be labeled as the Quantizer Whisper Encoder.

  2. Pre-training Dataset: The paper lacks clarity regarding the type of data used for pre-training (only semantic tokens). It is unclear why only ASR data is used, since in theory any sound data could be used. Additionally, the paper does not specify how the pre-training data is processed and saved (stored on disk) or how many tokens are used in this stage. A clearer explanation would help, specifically: take raw sound data, generate semantic tokens with the Quantizer Whisper Encoder, and save them to disk for efficient training in all stages.

  3. Post-training: The post-training stage should be clearly defined right after the header, with the tasks named explicitly: T2T (text-to-text), S2T (speech-to-text), and ST2T (transcription). This will prevent confusion among readers.

  4. Reader Questions: Readers may question why our paper does not include fine-tuning on the S2S task. They may assume that since we have a semantic token encoder, we only need a decoder to generate sound.

  5. Transcribe Task: The transcribe task should be explicitly defined to aid understanding. It should be written as S+T -> T, where the input consists of an instruction (text tokens) plus audio (semantic tokens) and the output is text tokens.

  6. Noise Part: Readers may be confused about the pipeline and the number of semantic tokens produced. It is unclear why 512 semantic tokens result in 513^n. Additionally, the generation of refusal answers by LLMs is not explicitly mentioned. It would help to clarify this with a pipeline figure: random noise -> Ichigo -> Text (refusal answers generated by LLMs).
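
To make points 5 and 6 concrete, here is a minimal sketch of what the two sample types could look like. It is only an illustration under assumptions, not the paper's actual data format: the `<|sound_start|>`, `<|sound_NNNN|>`, and `<|sound_end|>` token strings, the `Sample` class, and all function names are hypothetical, and the instruction is kept as a single string rather than real text tokens. The only number taken from the review is the 512 semantic tokens.

```python
import random
from dataclasses import dataclass

# 512 semantic tokens, as mentioned in point 6 of the review.
CODEBOOK_SIZE = 512


@dataclass
class Sample:
    task: str                # hypothetical task label, e.g. "ST2T" or "noise"
    input_tokens: list[str]  # model input: optional instruction + audio tokens
    target_text: str         # expected text output


def sound_token(index: int) -> str:
    """Render one codebook index as a (hypothetical) semantic token string."""
    if not 0 <= index < CODEBOOK_SIZE:
        raise ValueError(f"index {index} is outside the codebook")
    return f"<|sound_{index:04d}|>"


def wrap_audio(indices: list[int]) -> list[str]:
    """Surround semantic tokens with (hypothetical) start/end markers."""
    return ["<|sound_start|>", *(sound_token(i) for i in indices), "<|sound_end|>"]


def transcribe_sample(instruction: str, indices: list[int], transcript: str) -> Sample:
    """S+T -> T: instruction (text) plus audio (semantic tokens) in, text out."""
    return Sample("ST2T", [instruction, *wrap_audio(indices)], transcript)


def noise_sample(length: int, refusal: str, rng: random.Random) -> Sample:
    """Random semantic tokens in, an LLM-written refusal answer out."""
    indices = [rng.randrange(CODEBOOK_SIZE) for _ in range(length)]
    return Sample("noise", wrap_audio(indices), refusal)
```

A figure in the paper could mirror these two functions directly: one panel for instruction + audio -> transcript, and one for random noise -> Ichigo -> refusal text.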

cc @tikikun @hahuyhoang411

@hiento09 hiento09 added this to Menlo Nov 22, 2024
@github-project-automation github-project-automation bot moved this to Investigating in Menlo Nov 22, 2024
@tikikun
Collaborator Author

tikikun commented Nov 25, 2024

#132

@hahuyhoang411

@bachvudinh bachvudinh added this to the Ichigo v0.3 milestone Nov 25, 2024
@dan-menlo dan-menlo modified the milestones: Ichigo v0.5, Ichigo v0.4 Nov 27, 2024
@hahuyhoang411
Contributor

Ongoing review

@hahuyhoang411 hahuyhoang411 removed this from the Ichigo v0.4 milestone Dec 2, 2024
Projects
Status: Investigating

5 participants