Reference documentation and code samples for the Google Cloud Gke Recommender V1 Client class PerformanceStats.
Performance statistics for a model deployment.
Generated from protobuf message google.cloud.gkerecommender.v1.PerformanceStats
Namespace
Google \ Cloud \ GkeRecommender \ V1
Methods
__construct
Constructor.
| Parameters | |
|---|---|
| Name | Description |
| data | array. Optional. Data for populating the Message object. |
| ↳ queries_per_second | float. Output only. The number of queries per second. Note: This metric can vary widely based on context length and may not be a reliable measure of LLM throughput. |
| ↳ output_tokens_per_second | int. Output only. The number of output tokens per second. This is the throughput measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds. |
| ↳ ntpot_milliseconds | int. Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds. This is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens. |
| ↳ ttft_milliseconds | int. Output only. The Time To First Token (TTFT) in milliseconds. This is the time it takes to generate the first token for a request. |
| ↳ cost | array<Cost>. Output only. The cost of running the model deployment. |
getQueriesPerSecond
Output only. The number of queries per second.
Note: This metric can vary widely based on context length and may not be a reliable measure of LLM throughput.
| Returns | |
|---|---|
| Type | Description |
| float | |
setQueriesPerSecond
Output only. The number of queries per second.
Note: This metric can vary widely based on context length and may not be a reliable measure of LLM throughput.
| Parameter | |
|---|---|
| Name | Description |
| var | float |
| Returns | |
|---|---|
| Type | Description |
| $this | |
getOutputTokensPerSecond
Output only. The number of output tokens per second. This is the throughput measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds.
| Returns | |
|---|---|
| Type | Description |
| int | |
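The throughput ratio described for this field can be sketched in plain PHP. The function name `outputTokensPerSecond` and its parameters are hypothetical and not part of the client library. Rounding to the nearest integer is an assumption made here because the field is typed int; how the service rounds is not stated in this reference.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: total output tokens generated by the server
 * divided by elapsed time in seconds, rounded to the nearest integer
 * (the rounding mode is an assumption).
 */
function outputTokensPerSecond(int $totalOutputTokens, float $elapsedSeconds): int
{
    if ($elapsedSeconds <= 0.0) {
        throw new InvalidArgumentException('elapsedSeconds must be positive');
    }
    return (int) round($totalOutputTokens / $elapsedSeconds);
}

// 12,000 output tokens generated over 8 seconds.
echo outputTokensPerSecond(12000, 8.0), PHP_EOL; // prints 1500
```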
setOutputTokensPerSecond
Output only. The number of output tokens per second. This is the throughput measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds.
| Parameter | |
|---|---|
| Name | Description |
| var | int |
| Returns | |
|---|---|
| Type | Description |
| $this | |
getNtpotMilliseconds
Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds.
This is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens.
| Returns | |
|---|---|
| Type | Description |
| int | |
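The NTPOT ratio described for this field can be sketched the same way. The function name `ntpotMilliseconds` and its parameters are hypothetical and not part of the client library. The sketch assumes the request latency is already expressed in milliseconds and rounds to the nearest integer because the field is typed int.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: request latency (assumed to be in milliseconds)
 * divided by the number of output tokens, rounded to the nearest
 * integer (the rounding mode is an assumption).
 */
function ntpotMilliseconds(float $requestLatencyMs, int $totalOutputTokens): int
{
    if ($totalOutputTokens <= 0) {
        throw new InvalidArgumentException('totalOutputTokens must be positive');
    }
    return (int) round($requestLatencyMs / $totalOutputTokens);
}

// A request that took 2,400 ms and produced 120 output tokens.
echo ntpotMilliseconds(2400.0, 120), PHP_EOL; // prints 20
```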
setNtpotMilliseconds
Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds.
This is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens.
| Parameter | |
|---|---|
| Name | Description |
| var | int |
| Returns | |
|---|---|
| Type | Description |
| $this | |
getTtftMilliseconds
Output only. The Time To First Token (TTFT) in milliseconds. This is the time it takes to generate the first token for a request.
| Returns | |
|---|---|
| Type | Description |
| int | |
setTtftMilliseconds
Output only. The Time To First Token (TTFT) in milliseconds. This is the time it takes to generate the first token for a request.
| Parameter | |
|---|---|
| Name | Description |
| var | int |
| Returns | |
|---|---|
| Type | Description |
| $this | |
getCost
Output only. The cost of running the model deployment.
| Returns | |
|---|---|
| Type | Description |
| Google\Protobuf\Internal\RepeatedField | |
setCost
Output only. The cost of running the model deployment.
| Parameter | |
|---|---|
| Name | Description |
| var | array<Cost> |
| Returns | |
|---|---|
| Type | Description |
| $this | |
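As a minimal sketch of reading the getters documented on this page, the helper below formats a one-line summary. `summarizePerformanceStats` is a hypothetical name, not part of the client library. It accepts any object that exposes these getters, and the anonymous class is a stand-in with made-up values so the example runs without the library installed.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: builds a one-line summary from any object that
 * exposes the getters documented on this page.
 */
function summarizePerformanceStats(object $stats): string
{
    return sprintf(
        'qps=%.2f, output_tokens/s=%d, ntpot=%dms, ttft=%dms, cost_entries=%d',
        $stats->getQueriesPerSecond(),
        $stats->getOutputTokensPerSecond(),
        $stats->getNtpotMilliseconds(),
        $stats->getTtftMilliseconds(),
        // count() accepts arrays and Countable objects such as RepeatedField.
        count($stats->getCost())
    );
}

// Stand-in with illustrative values; not a real PerformanceStats instance.
$stub = new class {
    public function getQueriesPerSecond(): float { return 3.5; }
    public function getOutputTokensPerSecond(): int { return 1500; }
    public function getNtpotMilliseconds(): int { return 20; }
    public function getTtftMilliseconds(): int { return 180; }
    public function getCost(): array { return []; }
};

echo summarizePerformanceStats($stub), PHP_EOL;
// prints qps=3.50, output_tokens/s=1500, ntpot=20ms, ttft=180ms, cost_entries=0
```

With the client library installed, the same function should accept a `Google\Cloud\GkeRecommender\V1\PerformanceStats` instance in place of the stub, since it relies only on the getter names listed above.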