Add speech beta samples #1151
…rd level confidence
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
📝 Please visit https://cla.developers.google.com/ to sign.
Once you've signed (or fixed any issues), please reply here.
What to do if you already signed the CLA
Individual signers
Corporate signers
A couple of small things here and there; otherwise it looks good. Thanks for updating the code.
speech/cloud-client/pom.xml
Outdated
@@ -40,7 +40,7 @@
 <dependency>
 <groupId>com.google.cloud</groupId>
 <artifactId>google-cloud-speech</artifactId>
-<version>0.52.0-alpha</version>
+<version>0.52.1-alpha-SNAPSHOT</version>
Update lib version when released.
.setSampleRateHertz(8000)
.setEnableSpeakerDiarization(true)
.setDiarizationSpeakerCount(2)
.setEnableAutomaticPunctuation(true)
Let's remove the automatic punctuation for this sample, unless it's needed.
+1 Especially in a sample, it should only have the minimum needed to get the feature you're demonstrating working.
.setEnableSpeakerDiarization(true)
.setDiarizationSpeakerCount(2)
.setEnableAutomaticPunctuation(true)
.setModel("phone_call")
Are models required here? If not, remove.
"Speaker Tag : %s \n",
alternative.getWords((alternative.getWordsCount() - 1)).getSpeakerTag());
System.out.format(
"Word: %s\n\n", alternative.getWords((alternative.getWordsCount() - 1)).getWord());
Lines 838-841:
Is this printing out the last speaker and their last word?
Is it possible to do something like the following? (I'm not sure what the results look like.)
Speaker Tag ###: Hey, how are you?
Speaker Tag ***: I'm doing good.
Thoughts?
It is printing out the last speaker and the last word. The words array contains the entire transcript up until that point.
The "Speaker Tag ###: Hey, how are you?" format definitely makes more sense. Will switch it out.
Can you add this as a comment? That is, the explanation of why you're getting the last word of the alternative instead of, say, all the words or the first word.
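The per-speaker output format suggested in this thread could be sketched as follows. This is only an illustration, not the code from the PR: `TaggedWord` is a hypothetical stand-in for the API's per-word result (the values the sample reads via `getWord()` and `getSpeakerTag()`), and the speaker tags 1 and 2 are made-up example data. It assumes the word list is in spoken order and groups consecutive words that share a speaker tag into one line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SpeakerTurns {

  /** Hypothetical stand-in for a per-word result: the word text plus its speaker tag. */
  static final class TaggedWord {
    final String word;
    final int speakerTag;

    TaggedWord(String word, int speakerTag) {
      this.word = word;
      this.speakerTag = speakerTag;
    }
  }

  /** Joins consecutive words with the same speaker tag into one "Speaker Tag N: ..." line. */
  static List<String> toTurns(List<TaggedWord> words) {
    List<String> turns = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    int currentTag = -1;
    for (TaggedWord w : words) {
      // A change of speaker closes the turn collected so far.
      if (current.length() > 0 && w.speakerTag != currentTag) {
        turns.add(String.format("Speaker Tag %d: %s", currentTag, current));
        current.setLength(0);
      }
      if (current.length() > 0) {
        current.append(' ');
      }
      current.append(w.word);
      currentTag = w.speakerTag;
    }
    // Flush the final turn, if any words were seen.
    if (current.length() > 0) {
      turns.add(String.format("Speaker Tag %d: %s", currentTag, current));
    }
    return turns;
  }

  public static void main(String[] args) {
    List<TaggedWord> words = Arrays.asList(
        new TaggedWord("Hey,", 1), new TaggedWord("how", 1),
        new TaggedWord("are", 1), new TaggedWord("you?", 1),
        new TaggedWord("I'm", 2), new TaggedWord("doing", 2),
        new TaggedWord("good.", 2));
    // Prints:
    // Speaker Tag 1: Hey, how are you?
    // Speaker Tag 2: I'm doing good.
    for (String turn : toTurns(words)) {
      System.out.println(turn);
    }
  }
}
```

In the real sample the input would presumably come from the word list of the final result, since, as noted above, that list holds the entire transcript up to that point.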
.setEnableSpeakerDiarization(true)
.setDiarizationSpeakerCount(2)
.setEnableAutomaticPunctuation(true)
.setModel("phone_call")
Same as above: remove setEnableAutomaticPunctuation and setModel (if possible).
.setSampleRateHertz(44100)
.setAudioChannelCount(2)
.setEnableSeparateRecognitionPerChannel(true)
.setEnableAutomaticPunctuation(true)
Remove setEnableAutomaticPunctuation (if possible)
RecognitionAudio recognitionAudio =
RecognitionAudio.newBuilder().setContent(ByteString.copyFrom(content)).build();

// Configure request to enable enhanced models
Update comment
// Configure request to enable multiple channels
.setSampleRateHertz(44100)
.setAudioChannelCount(2)
.setEnableSeparateRecognitionPerChannel(true)
.setEnableAutomaticPunctuation(true)
Remove
try (SpeechClient speechClient = SpeechClient.create()) {
RecognitionAudio recognitionAudio =
RecognitionAudio.newBuilder().setContent(ByteString.copyFrom(content)).build();
// Configure request to enable multiple channels
Update comment
public static void transcribeWordLevelConfidenceGcs(String gcsUri) throws Exception {
try (SpeechClient speechClient = SpeechClient.create()) {

// Configure request to enable multiple channels
Update comment
StreamingRecognitionConfig config = StreamingRecognitionConfig.newBuilder()
.setConfig(recConfig)
.build();
RecognitionConfig recConfig =
It'd be helpful if you could make these formatting changes in a separate PR, so that reviewers don't have to go through a giant diff of unrelated changes.
Sorry, given my access issues, I'm not sure how it may affect creating a separate PR. Putting everything together in one for now.
All changes done. Please let me know if this is good to merge.
2 small things
System.out.format("Transcript : %s\n", alternative.getTranscript());
// The words array contains the entire transcript up until that point.
//Referencing the last spoken word to get the associated Speaker tag
System.out.format("Speaker Tag %s:%s\n",
Add a space between %s:%s --> %s: %s
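As a small illustration of the requested spacing, the sketch below formats one line with the `%s: %s` pattern. The class and method names are hypothetical, and the tag and word values are made-up example data rather than output from the sample.

```java
final class SpeakerTagFormat {

  /** Formats one output line using the spaced pattern requested in the review. */
  static String line(int speakerTag, String word) {
    return String.format("Speaker Tag %s: %s\n", speakerTag, word);
  }

  public static void main(String[] args) {
    // Prints: Speaker Tag 2: good.
    System.out.print(line(2, "good."));
  }
}
```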
RecognitionAudio recognitionAudio =
RecognitionAudio.newBuilder().setContent(ByteString.copyFrom(content)).build();

// Configure request to enable enhanced models
// Configure request to enable multiple channels
I signed it!
All changes have been made.
Checking if this triggers the CLA bot
@nnegrey - Updated the client library too. Please let me know if this is good to merge.
A Googler has manually verified that the CLAs look good. (Googler, please make sure the reason for overriding the CLA status is clearly documented in these comments.)
Per an email from OSPO, nirupa-kumar has a signed CLA.
LGTM
* Speech beta samples: Diarization, Multi-channel, Multi-language and Word level confidence
* Update client library
* Updates after review
* Updates after review : Please let this be the last one :)
* Update to released client library
* Update to Inc.
* Update to Inc.
* Update to reference bucket for files
### Migrating samples from [googleapis/java-speech](https://togithub.com/googleapis/java-speech/tree/main/samples) into [java-docs-samples/speech](https://togithub.com/GoogleCloudPlatform/java-docs-samples)
The migrated commit list includes this pull request: samples: Add speech beta samples (#1151)
Diarization, Multi-channel, Multi-language and Word level confidence
@nnegrey Please review