The natural language speech audio to be processed.
A single request can contain up to 1 minute of speech audio data.
The [transcribed
text][google.cloud.dialogflow.cx.v3.QueryResult.transcript] cannot contain
more than 256 bytes.
For non-streaming audio detect intent, both config and audio must be
provided.
For streaming audio detect intent, config must be provided in
the first request and audio must be provided in all following requests.
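
The two request shapes described above can be sketched as follows. This is a minimal illustration, not the client library's API: plain dicts stand in for the request messages, and the helper names (`audio_seconds`, `non_streaming_request`, `streaming_requests`) are hypothetical. The duration check assumes raw 16-bit mono linear PCM and is applied only in the non-streaming helper.

```python
from typing import Iterator

BYTES_PER_SAMPLE = 2  # assumption: 16-bit linear PCM, mono
MAX_SECONDS = 60      # from the text: up to 1 minute of audio per request


def audio_seconds(audio: bytes, sample_rate_hertz: int) -> float:
    """Duration of raw 16-bit mono PCM audio, in seconds."""
    return len(audio) / (BYTES_PER_SAMPLE * sample_rate_hertz)


def non_streaming_request(config: dict, audio: bytes) -> dict:
    """Non-streaming: config and audio are sent together in one request."""
    if audio_seconds(audio, config["sample_rate_hertz"]) > MAX_SECONDS:
        raise ValueError("audio exceeds 1 minute")
    return {"config": config, "audio": audio}


def streaming_requests(
    config: dict, audio: bytes, chunk_size: int = 4096
) -> Iterator[dict]:
    """Streaming: config only in the first request, audio in all later ones."""
    yield {"config": config}
    for start in range(0, len(audio), chunk_size):
        yield {"audio": audio[start:start + chunk_size]}
```

In the streaming case the first yielded item carries no audio and every later item carries no config, which mirrors the ordering rule stated above. The chunk size of 4096 bytes is an arbitrary illustrative default.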
Last updated 2025-08-07 UTC.