A client to Cloud Speech-to-Text API
The interfaces provided are listed below, along with usage samples.
SpeechClient
Service Description: Service that implements Google Cloud Speech API.
Sample for SpeechClient:
 try (SpeechClient speechClient = SpeechClient.create()) {
   RecognitionConfig config = RecognitionConfig.newBuilder().build();
   RecognitionAudio audio = RecognitionAudio.newBuilder().build();
   RecognizeResponse response = speechClient.recognize(config, audio);
 }
 Classes
CustomClass
A set of words or phrases that represents a common concept likely to appear in your audio, for example a list of passenger ship names. CustomClass items can be substituted into placeholders that you set in PhraseSet phrases.
 Protobuf type google.cloud.speech.v1.CustomClass
CustomClass.Builder
A set of words or phrases that represents a common concept likely to appear in your audio, for example a list of passenger ship names. CustomClass items can be substituted into placeholders that you set in PhraseSet phrases.
 Protobuf type google.cloud.speech.v1.CustomClass
CustomClass.ClassItem
An item of the class.
 Protobuf type google.cloud.speech.v1.CustomClass.ClassItem
CustomClass.ClassItem.Builder
An item of the class.
 Protobuf type google.cloud.speech.v1.CustomClass.ClassItem
LongRunningRecognizeMetadata
 Describes the progress of a long-running LongRunningRecognize call. It is
 included in the metadata field of the Operation returned by the
 GetOperation call of the google::longrunning::Operations service.
 Protobuf type google.cloud.speech.v1.LongRunningRecognizeMetadata
LongRunningRecognizeMetadata.Builder
 Describes the progress of a long-running LongRunningRecognize call. It is
 included in the metadata field of the Operation returned by the
 GetOperation call of the google::longrunning::Operations service.
 Protobuf type google.cloud.speech.v1.LongRunningRecognizeMetadata
LongRunningRecognizeRequest
 The top-level message sent by the client for the LongRunningRecognize
 method.
 Protobuf type google.cloud.speech.v1.LongRunningRecognizeRequest
LongRunningRecognizeRequest.Builder
 The top-level message sent by the client for the LongRunningRecognize
 method.
 Protobuf type google.cloud.speech.v1.LongRunningRecognizeRequest
LongRunningRecognizeResponse
 The only message returned to the client by the LongRunningRecognize method.
 It contains the result as zero or more sequential SpeechRecognitionResult
 messages. It is included in the result.response field of the Operation
 returned by the GetOperation call of the google::longrunning::Operations
 service.
 Protobuf type google.cloud.speech.v1.LongRunningRecognizeResponse
LongRunningRecognizeResponse.Builder
 The only message returned to the client by the LongRunningRecognize method.
 It contains the result as zero or more sequential SpeechRecognitionResult
 messages. It is included in the result.response field of the Operation
 returned by the GetOperation call of the google::longrunning::Operations
 service.
 Protobuf type google.cloud.speech.v1.LongRunningRecognizeResponse
PhraseSet
Provides "hints" to the speech recognizer to favor specific words and phrases in the results.
 Protobuf type google.cloud.speech.v1.PhraseSet
PhraseSet.Builder
Provides "hints" to the speech recognizer to favor specific words and phrases in the results.
 Protobuf type google.cloud.speech.v1.PhraseSet
PhraseSet.Phrase
 A phrase containing words and phrase "hints" so that
 the speech recognizer is more likely to recognize them. This can be used
 to improve the accuracy for specific words and phrases, for example, if
 specific commands are typically spoken by the user. This can also be used
 to add additional words to the vocabulary of the recognizer. See
 usage limits.
 List items can also include pre-built or custom classes containing groups
 of words that represent common concepts that occur in natural language. For
 example, rather than providing a phrase hint for every month of the
 year (e.g. "i was born in january", "i was born in february", ...), using the
 pre-built $MONTH class improves the likelihood of correctly transcribing
 audio that includes months (e.g. "i was born in $month").
 To refer to pre-built classes, use the class's symbol prepended with $,
 e.g. $MONTH. To refer to custom classes that were defined inline in the
 request, set the class's custom_class_id to a string unique to all class
 resources and inline classes. Then use the class's id wrapped in ${...},
 e.g. "${my-months}". To refer to custom class resources, use the class's
 id wrapped in ${} (e.g. ${my-months}).
 Speech-to-Text supports three locations: global, us (US North America),
 and eu (Europe). If you are calling the speech.googleapis.com
 endpoint, use the global location. To specify a region, use a
 regional endpoint with the matching us or
 eu location value.
 Protobuf type google.cloud.speech.v1.PhraseSet.Phrase
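The two reference forms described above differ only in how the identifier is wrapped. As a minimal sketch, the helper below builds both kinds of token as plain strings. The PhraseTokens class and its method names are hypothetical and are not part of the client library; the resulting strings would still need to be placed in a phrase value by the caller.

```java
// Hypothetical helper (not part of google-cloud-speech): builds the class
// reference tokens described in the PhraseSet.Phrase documentation.
final class PhraseTokens {
  private PhraseTokens() {}

  // Pre-built classes are referenced by their symbol prefixed with '$'.
  static String prebuilt(String symbol) {
    return "$" + symbol;
  }

  // Custom classes are referenced by their id wrapped in "${" and "}".
  static String custom(String customClassId) {
    return "${" + customClassId + "}";
  }

  public static void main(String[] args) {
    // One phrase with a class token replaces twelve month-specific phrases.
    System.out.println("i was born in " + prebuilt("MONTH"));
    System.out.println("i was born in " + custom("my-months"));
  }
}
```

Keeping the wrapping in one place avoids mixing up the `$SYMBOL` and `${id}` forms when phrases are assembled from configuration.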
PhraseSet.Phrase.Builder
 A phrase containing words and phrase "hints" so that
 the speech recognizer is more likely to recognize them. This can be used
 to improve the accuracy for specific words and phrases, for example, if
 specific commands are typically spoken by the user. This can also be used
 to add additional words to the vocabulary of the recognizer. See
 usage limits.
 List items can also include pre-built or custom classes containing groups
 of words that represent common concepts that occur in natural language. For
 example, rather than providing a phrase hint for every month of the
 year (e.g. "i was born in january", "i was born in february", ...), using the
 pre-built $MONTH class improves the likelihood of correctly transcribing
 audio that includes months (e.g. "i was born in $month").
 To refer to pre-built classes, use the class's symbol prepended with $,
 e.g. $MONTH. To refer to custom classes that were defined inline in the
 request, set the class's custom_class_id to a string unique to all class
 resources and inline classes. Then use the class's id wrapped in ${...},
 e.g. "${my-months}". To refer to custom class resources, use the class's
 id wrapped in ${} (e.g. ${my-months}).
 Speech-to-Text supports three locations: global, us (US North America),
 and eu (Europe). If you are calling the speech.googleapis.com
 endpoint, use the global location. To specify a region, use a
 regional endpoint with the matching us or
 eu location value.
 Protobuf type google.cloud.speech.v1.PhraseSet.Phrase
RecognitionAudio
 Contains audio data in the encoding specified in the RecognitionConfig.
 Either content or uri must be supplied. Supplying both or neither
 returns google.rpc.Code.INVALID_ARGUMENT. See
 content limits.
 Protobuf type google.cloud.speech.v1.RecognitionAudio
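The "exactly one of content or uri" rule can be checked on the client before a request is sent. The sketch below is a hypothetical pre-check written in plain Java; AudioSourceCheck is not a library class, and treating an empty array or empty string as "not supplied" is an assumption made here. The INVALID_ARGUMENT error itself is produced by the service, not by this code.

```java
// Hypothetical client-side pre-check (not part of google-cloud-speech).
final class AudioSourceCheck {
  private AudioSourceCheck() {}

  // Returns true only when exactly one audio source is supplied.
  // Assumption: null or empty values count as "not supplied".
  static boolean hasExactlyOneSource(byte[] content, String uri) {
    boolean hasContent = content != null && content.length > 0;
    boolean hasUri = uri != null && !uri.isEmpty();
    return hasContent ^ hasUri; // XOR: both or neither is rejected
  }

  public static void main(String[] args) {
    byte[] inlineAudio = {1, 2, 3};
    // The bucket path below is a made-up placeholder.
    System.out.println(hasExactlyOneSource(inlineAudio, null));
    System.out.println(hasExactlyOneSource(null, "gs://example-bucket/audio.flac"));
    System.out.println(hasExactlyOneSource(inlineAudio, "gs://example-bucket/audio.flac"));
    System.out.println(hasExactlyOneSource(null, null));
  }
}
```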
RecognitionAudio.Builder
 Contains audio data in the encoding specified in the RecognitionConfig.
 Either content or uri must be supplied. Supplying both or neither
 returns google.rpc.Code.INVALID_ARGUMENT. See
 content limits.
 Protobuf type google.cloud.speech.v1.RecognitionAudio
RecognitionConfig
Provides information to the recognizer that specifies how to process the request.
 Protobuf type google.cloud.speech.v1.RecognitionConfig
RecognitionConfig.Builder
Provides information to the recognizer that specifies how to process the request.
 Protobuf type google.cloud.speech.v1.RecognitionConfig
RecognitionMetadata
Description of audio data to be recognized.
 Protobuf type google.cloud.speech.v1.RecognitionMetadata
RecognitionMetadata.Builder
Description of audio data to be recognized.
 Protobuf type google.cloud.speech.v1.RecognitionMetadata
RecognizeRequest
 The top-level message sent by the client for the Recognize method.
 Protobuf type google.cloud.speech.v1.RecognizeRequest
RecognizeRequest.Builder
 The top-level message sent by the client for the Recognize method.
 Protobuf type google.cloud.speech.v1.RecognizeRequest
RecognizeResponse
 The only message returned to the client by the Recognize method. It
 contains the result as zero or more sequential SpeechRecognitionResult
 messages.
 Protobuf type google.cloud.speech.v1.RecognizeResponse
RecognizeResponse.Builder
 The only message returned to the client by the Recognize method. It
 contains the result as zero or more sequential SpeechRecognitionResult
 messages.
 Protobuf type google.cloud.speech.v1.RecognizeResponse
SpeakerDiarizationConfig
Config to enable speaker diarization.
 Protobuf type google.cloud.speech.v1.SpeakerDiarizationConfig
SpeakerDiarizationConfig.Builder
Config to enable speaker diarization.
 Protobuf type google.cloud.speech.v1.SpeakerDiarizationConfig
SpeechAdaptation
Speech adaptation configuration.
 Protobuf type google.cloud.speech.v1.SpeechAdaptation
SpeechAdaptation.Builder
Speech adaptation configuration.
 Protobuf type google.cloud.speech.v1.SpeechAdaptation
SpeechClient
Service Description: Service that implements Google Cloud Speech API.
This class provides the ability to make remote calls to the backing service through method calls that map to API methods. Sample code to get started:
 try (SpeechClient speechClient = SpeechClient.create()) {
   RecognitionConfig config = RecognitionConfig.newBuilder().build();
   RecognitionAudio audio = RecognitionAudio.newBuilder().build();
   RecognizeResponse response = speechClient.recognize(config, audio);
 }
 
Note: close() needs to be called on the SpeechClient object to clean up resources such as threads. In the example above, try-with-resources is used, which automatically calls close().
The surface of this class includes several types of Java methods for each of the API's methods:
- A "flattened" method. With this type of method, the fields of the request type have been converted into function parameters. It may be the case that not all fields are available as parameters, and not every API method will have a flattened method entry point.
- A "request object" method. This type of method only takes one parameter, a request object, which must be constructed before the call. Not every API method will have a request object method.
- A "callable" method. This type of method takes no parameters and returns an immutable API callable object, which can be used to initiate calls to the service.
See the individual methods for example code.
Many parameters require resource names to be formatted in a particular way. To assist with these names, this class includes a format method for each type of name, and additionally a parse method to extract the individual identifiers contained within names that are returned.
This class can be customized by passing in a custom instance of SpeechSettings to create(). For example:
To customize credentials:
 SpeechSettings speechSettings =
     SpeechSettings.newBuilder()
         .setCredentialsProvider(FixedCredentialsProvider.create(myCredentials))
         .build();
 SpeechClient speechClient = SpeechClient.create(speechSettings);
 
To customize the endpoint:
 SpeechSettings speechSettings = SpeechSettings.newBuilder().setEndpoint(myEndpoint).build();
 SpeechClient speechClient = SpeechClient.create(speechSettings);
 
Please refer to the GitHub repository's samples for more quickstart code snippets.
SpeechContext
Provides "hints" to the speech recognizer to favor specific words and phrases in the results.
 Protobuf type google.cloud.speech.v1.SpeechContext
SpeechContext.Builder
Provides "hints" to the speech recognizer to favor specific words and phrases in the results.
 Protobuf type google.cloud.speech.v1.SpeechContext
SpeechGrpc
Service that implements Google Cloud Speech API.
SpeechGrpc.SpeechBlockingStub
Service that implements Google Cloud Speech API.
SpeechGrpc.SpeechFutureStub
Service that implements Google Cloud Speech API.
SpeechGrpc.SpeechImplBase
Service that implements Google Cloud Speech API.
SpeechGrpc.SpeechStub
Service that implements Google Cloud Speech API.
SpeechProto
SpeechRecognitionAlternative
Alternative hypotheses (a.k.a. n-best list).
 Protobuf type google.cloud.speech.v1.SpeechRecognitionAlternative
SpeechRecognitionAlternative.Builder
Alternative hypotheses (a.k.a. n-best list).
 Protobuf type google.cloud.speech.v1.SpeechRecognitionAlternative
SpeechRecognitionResult
A speech recognition result corresponding to a portion of the audio.
 Protobuf type google.cloud.speech.v1.SpeechRecognitionResult
SpeechRecognitionResult.Builder
A speech recognition result corresponding to a portion of the audio.
 Protobuf type google.cloud.speech.v1.SpeechRecognitionResult
SpeechResourceProto
SpeechSettings
Settings class to configure an instance of SpeechClient.
The default instance has everything set to sensible defaults:
- The default service address (speech.googleapis.com) and default port (443) are used.
- Credentials are acquired automatically through Application Default Credentials.
- Retries are configured for idempotent methods but not for non-idempotent methods.
The builder of this class is recursive, so contained classes are themselves builders. When build() is called, the tree of builders is called to create the complete settings object.
For example, to set the total timeout of recognize to 30 seconds:
 SpeechSettings.Builder speechSettingsBuilder = SpeechSettings.newBuilder();
 speechSettingsBuilder
     .recognizeSettings()
     .setRetrySettings(
         speechSettingsBuilder
             .recognizeSettings()
             .getRetrySettings()
             .toBuilder()
             .setTotalTimeout(Duration.ofSeconds(30))
             .build());
 SpeechSettings speechSettings = speechSettingsBuilder.build();
 SpeechSettings.Builder
Builder for SpeechSettings.
StreamingRecognitionConfig
Provides information to the recognizer that specifies how to process the request.
 Protobuf type google.cloud.speech.v1.StreamingRecognitionConfig
StreamingRecognitionConfig.Builder
Provides information to the recognizer that specifies how to process the request.
 Protobuf type google.cloud.speech.v1.StreamingRecognitionConfig
StreamingRecognitionResult
A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.
 Protobuf type google.cloud.speech.v1.StreamingRecognitionResult
StreamingRecognitionResult.Builder
A streaming speech recognition result corresponding to a portion of the audio that is currently being processed.
 Protobuf type google.cloud.speech.v1.StreamingRecognitionResult
StreamingRecognizeRequest
 The top-level message sent by the client for the StreamingRecognize method.
 Multiple StreamingRecognizeRequest messages are sent. The first message
 must contain a streaming_config message and must not contain
 audio_content. All subsequent messages must contain audio_content and
 must not contain a streaming_config message.
 Protobuf type google.cloud.speech.v1.StreamingRecognizeRequest
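The ordering rule above (configuration first, audio only afterwards) can be expressed as a small validator. This is a sketch under stated assumptions: the Request class is a hypothetical stand-in that records only which field a message carries, not the real StreamingRecognizeRequest, and rejecting an empty sequence is a choice made here because such a sequence has no first message.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical validator for the message order of a streaming call.
final class StreamingOrderCheck {
  private StreamingOrderCheck() {}

  // Stand-in for StreamingRecognizeRequest: tracks only which field is set.
  static final class Request {
    final boolean hasStreamingConfig;
    final boolean hasAudioContent;

    Request(boolean hasStreamingConfig, boolean hasAudioContent) {
      this.hasStreamingConfig = hasStreamingConfig;
      this.hasAudioContent = hasAudioContent;
    }

    static Request config() {
      return new Request(true, false);
    }

    static Request audio() {
      return new Request(false, true);
    }
  }

  // First message: streaming_config only. Every later message: audio_content only.
  static boolean isValidSequence(List<Request> requests) {
    if (requests.isEmpty()) {
      return false; // assumption: a stream without a config message is invalid
    }
    for (int i = 0; i < requests.size(); i++) {
      Request r = requests.get(i);
      boolean expectConfig = (i == 0);
      if (r.hasStreamingConfig != expectConfig || r.hasAudioContent == expectConfig) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<Request> ok = Arrays.asList(Request.config(), Request.audio(), Request.audio());
    List<Request> missingConfig = Arrays.asList(Request.audio(), Request.audio());
    System.out.println(isValidSequence(ok));
    System.out.println(isValidSequence(missingConfig));
  }
}
```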
StreamingRecognizeRequest.Builder
 The top-level message sent by the client for the StreamingRecognize method.
 Multiple StreamingRecognizeRequest messages are sent. The first message
 must contain a streaming_config message and must not contain
 audio_content. All subsequent messages must contain audio_content and
 must not contain a streaming_config message.
 Protobuf type google.cloud.speech.v1.StreamingRecognizeRequest
StreamingRecognizeResponse
 StreamingRecognizeResponse is the only message returned to the client by
 StreamingRecognize. A series of zero or more StreamingRecognizeResponse
 messages are streamed back to the client. If there is no recognizable
 audio, and single_utterance is set to false, then no messages are streamed
 back to the client.
 Here's an example of a series of StreamingRecognizeResponses that might be
 returned while processing audio:
1. results { alternatives { transcript: "tube" } stability: 0.01 }
2. results { alternatives { transcript: "to be a" } stability: 0.01 }
3. results { alternatives { transcript: "to be" } stability: 0.9 } results { alternatives { transcript: " or not to be" } stability: 0.01 }
4. results { alternatives { transcript: "to be or not to be" confidence: 0.92 } alternatives { transcript: "to bee or not to bee" } is_final: true }
5. results { alternatives { transcript: " that's" } stability: 0.01 }
6. results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 }
7. results { alternatives { transcript: " that is the question" confidence: 0.98 } alternatives { transcript: " that was the question" } is_final: true }
Notes:
- Only two of the above responses, #4 and #7, contain final results; they are indicated by is_final: true. Concatenating these together generates the full transcript: "to be or not to be that is the question".
- The others contain interim results. #3 and #6 contain two interim results: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high stability results.
- The specific stability and confidence values shown above are only for illustrative purposes. Actual values may vary.
- In each response, only one of these fields will be set: error, speech_event_type, or one or more (repeated) results.
 Protobuf type google.cloud.speech.v1.StreamingRecognizeResponse
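The note about concatenating final results can be sketched in plain Java. The types below are hypothetical stand-ins: each response is modeled as a list of Result objects holding only the first alternative's transcript and the is_final flag. With the real client, the same values would be read from the generated message classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: assemble the full transcript from final results only.
final class TranscriptAssembler {
  private TranscriptAssembler() {}

  // Stand-in for one `results` entry: first alternative plus is_final.
  static final class Result {
    final String topTranscript;
    final boolean isFinal;

    Result(String topTranscript, boolean isFinal) {
      this.topTranscript = topTranscript;
      this.isFinal = isFinal;
    }
  }

  // Appends the top transcript of every final result, in arrival order.
  static String fullTranscript(List<List<Result>> responses) {
    StringBuilder transcript = new StringBuilder();
    for (List<Result> response : responses) {
      for (Result result : response) {
        if (result.isFinal) {
          transcript.append(result.topTranscript);
        }
      }
    }
    return transcript.toString();
  }

  // The seven example responses from the description above.
  static List<List<Result>> sampleResponses() {
    return Arrays.asList(
        Arrays.asList(new Result("tube", false)),
        Arrays.asList(new Result("to be a", false)),
        Arrays.asList(new Result("to be", false), new Result(" or not to be", false)),
        Arrays.asList(new Result("to be or not to be", true)),
        Arrays.asList(new Result(" that's", false)),
        Arrays.asList(new Result(" that is", false), new Result(" the question", false)),
        Arrays.asList(new Result(" that is the question", true)));
  }

  public static void main(String[] args) {
    // Prints: to be or not to be that is the question
    System.out.println(fullTranscript(sampleResponses()));
  }
}
```

Interim results are skipped entirely here; a UI that wants live feedback could display them separately and replace them when the matching final result arrives.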
StreamingRecognizeResponse.Builder
 StreamingRecognizeResponse is the only message returned to the client by
 StreamingRecognize. A series of zero or more StreamingRecognizeResponse
 messages are streamed back to the client. If there is no recognizable
 audio, and single_utterance is set to false, then no messages are streamed
 back to the client.
 Here's an example of a series of StreamingRecognizeResponses that might be
 returned while processing audio:
1. results { alternatives { transcript: "tube" } stability: 0.01 }
2. results { alternatives { transcript: "to be a" } stability: 0.01 }
3. results { alternatives { transcript: "to be" } stability: 0.9 } results { alternatives { transcript: " or not to be" } stability: 0.01 }
4. results { alternatives { transcript: "to be or not to be" confidence: 0.92 } alternatives { transcript: "to bee or not to bee" } is_final: true }
5. results { alternatives { transcript: " that's" } stability: 0.01 }
6. results { alternatives { transcript: " that is" } stability: 0.9 } results { alternatives { transcript: " the question" } stability: 0.01 }
7. results { alternatives { transcript: " that is the question" confidence: 0.98 } alternatives { transcript: " that was the question" } is_final: true }
Notes:
- Only two of the above responses, #4 and #7, contain final results; they are indicated by is_final: true. Concatenating these together generates the full transcript: "to be or not to be that is the question".
- The others contain interim results. #3 and #6 contain two interim results: the first portion has a high stability and is less likely to change; the second portion has a low stability and is very likely to change. A UI designer might choose to show only high stability results.
- The specific stability and confidence values shown above are only for illustrative purposes. Actual values may vary.
- In each response, only one of these fields will be set: error, speech_event_type, or one or more (repeated) results.
 Protobuf type google.cloud.speech.v1.StreamingRecognizeResponse
TranscriptOutputConfig
Specifies an optional destination for the recognition results.
 Protobuf type google.cloud.speech.v1.TranscriptOutputConfig
TranscriptOutputConfig.Builder
Specifies an optional destination for the recognition results.
 Protobuf type google.cloud.speech.v1.TranscriptOutputConfig
WordInfo
Word-specific information for recognized words.
 Protobuf type google.cloud.speech.v1.WordInfo
WordInfo.Builder
Word-specific information for recognized words.
 Protobuf type google.cloud.speech.v1.WordInfo
Interfaces
CustomClass.ClassItemOrBuilder
CustomClassOrBuilder
LongRunningRecognizeMetadataOrBuilder
LongRunningRecognizeRequestOrBuilder
LongRunningRecognizeResponseOrBuilder
PhraseSet.PhraseOrBuilder
PhraseSetOrBuilder
RecognitionAudioOrBuilder
RecognitionConfigOrBuilder
RecognitionMetadataOrBuilder
RecognizeRequestOrBuilder
RecognizeResponseOrBuilder
SpeakerDiarizationConfigOrBuilder
SpeechAdaptationOrBuilder
SpeechContextOrBuilder
SpeechRecognitionAlternativeOrBuilder
SpeechRecognitionResultOrBuilder
StreamingRecognitionConfigOrBuilder
StreamingRecognitionResultOrBuilder
StreamingRecognizeRequestOrBuilder
StreamingRecognizeResponseOrBuilder
TranscriptOutputConfigOrBuilder
WordInfoOrBuilder
Enums
RecognitionAudio.AudioSourceCase
RecognitionConfig.AudioEncoding
 The encoding of the audio data sent in the request.
 All encodings support only 1 channel (mono) audio, unless the
 audio_channel_count and enable_separate_recognition_per_channel fields
 are set.
 For best results, the audio source should be captured and transmitted using
 a lossless encoding (FLAC or LINEAR16). The accuracy of the speech
 recognition can be reduced if lossy codecs are used to capture or transmit
 audio, particularly if background noise is present. Lossy codecs include
 MULAW, AMR, AMR_WB, OGG_OPUS, SPEEX_WITH_HEADER_BYTE, MP3,
 and WEBM_OPUS.
 The FLAC and WAV audio file formats include a header that describes the
 included audio content. You can request recognition for WAV files that
 contain either LINEAR16 or MULAW encoded audio.
 If you send audio in the FLAC or WAV file format in
 your request, you do not need to specify an AudioEncoding; the audio
 encoding format is determined from the file header. If you specify
 an AudioEncoding when you send FLAC or WAV audio, the
 encoding configuration must match the encoding described in the audio
 header; otherwise the request returns a
 google.rpc.Code.INVALID_ARGUMENT error code.
 Protobuf enum google.cloud.speech.v1.RecognitionConfig.AudioEncoding
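Because FLAC and WAV files describe themselves in a header, a client can inspect the first bytes to decide whether an explicit AudioEncoding is needed. The sketch below relies on the standard file signatures: a FLAC stream starts with the ASCII bytes "fLaC", and a WAV file starts with "RIFF" at offset 0 and "WAVE" at offset 8. AudioHeaderSniffer is a hypothetical helper, and whether the service applies the same detection rule is an assumption.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: detects FLAC and WAV containers by their magic bytes.
final class AudioHeaderSniffer {
  private AudioHeaderSniffer() {}

  // FLAC streams begin with the four ASCII bytes "fLaC".
  static boolean isFlac(byte[] data) {
    return matchesAt(data, 0, "fLaC");
  }

  // WAV files are RIFF containers whose form type at offset 8 is "WAVE".
  static boolean isWav(byte[] data) {
    return matchesAt(data, 0, "RIFF") && matchesAt(data, 8, "WAVE");
  }

  // True when the file header describes the encoding, so AudioEncoding
  // may be left unset according to the description above.
  static boolean canOmitEncoding(byte[] data) {
    return isFlac(data) || isWav(data);
  }

  private static boolean matchesAt(byte[] data, int offset, String magic) {
    byte[] expected = magic.getBytes(StandardCharsets.US_ASCII);
    if (data == null || data.length < offset + expected.length) {
      return false;
    }
    for (int i = 0; i < expected.length; i++) {
      if (data[offset + i] != expected[i]) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    byte[] flacStart = "fLaC".getBytes(StandardCharsets.US_ASCII);
    byte[] oggStart = "OggS".getBytes(StandardCharsets.US_ASCII);
    System.out.println(canOmitEncoding(flacStart));
    System.out.println(canOmitEncoding(oggStart));
  }
}
```

Headerless audio such as raw LINEAR16 samples does not match either signature, so the encoding must be specified explicitly in that case.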
RecognitionMetadata.InteractionType
Use case categories that the audio recognition request can be described by.
 Protobuf enum google.cloud.speech.v1.RecognitionMetadata.InteractionType
RecognitionMetadata.MicrophoneDistance
Enumerates the types of capture settings describing an audio file.
 Protobuf enum google.cloud.speech.v1.RecognitionMetadata.MicrophoneDistance
RecognitionMetadata.OriginalMediaType
The original media the speech was recorded on.
 Protobuf enum google.cloud.speech.v1.RecognitionMetadata.OriginalMediaType
RecognitionMetadata.RecordingDeviceType
The type of device the speech was recorded with.
 Protobuf enum google.cloud.speech.v1.RecognitionMetadata.RecordingDeviceType
StreamingRecognizeRequest.StreamingRequestCase
StreamingRecognizeResponse.SpeechEventType
Indicates the type of speech event.
 Protobuf enum google.cloud.speech.v1.StreamingRecognizeResponse.SpeechEventType