Google Cloud Gke Recommender V1 Client - Class PerformanceStats (0.2.0)

Reference documentation and code samples for the Google Cloud Gke Recommender V1 Client class PerformanceStats.

Performance statistics for a model deployment.

Generated from protobuf message google.cloud.gkerecommender.v1.PerformanceStats

Namespace

`Google\Cloud\GkeRecommender\V1`

Methods

__construct

Constructor.

Parameters
Name	Description
`data`	`array` Optional. Data for populating the Message object.
`↳ queries_per_second`	`float` Output only. The number of queries per second. Note: This metric can vary widely based on context length and may not be a reliable measure of LLM throughput.
`↳ output_tokens_per_second`	`int` Output only. The number of output tokens per second. This is the throughput measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds.
`↳ ntpot_milliseconds`	`int` Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds. This is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens.
`↳ ttft_milliseconds`	`int` Output only. The Time To First Token (TTFT) in milliseconds. This is the time it takes to generate the first token for a request.
`↳ cost`	`array<Cost>` Output only. The cost of running the model deployment.
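
As a rough illustration of the constructor and accessors listed on this page, the sketch below populates a `PerformanceStats` message from an array and reads the values back. It assumes the client library is installed with Composer and that the autoloader is at `vendor/autoload.php`. The numeric values are invented for the example. Because every field is output only, real values would normally come from the service rather than from caller code, so building the message by hand is mostly useful in tests.

```php
<?php
// Assumption: the client library was installed with Composer and the
// autoloader is available at vendor/autoload.php next to this script.
require __DIR__ . '/vendor/autoload.php';

use Google\Cloud\GkeRecommender\V1\PerformanceStats;

// Illustrative values only; they are not taken from any real deployment.
$stats = new PerformanceStats([
    'queries_per_second' => 2.5,
    'output_tokens_per_second' => 1500,
    'ntpot_milliseconds' => 20,
    'ttft_milliseconds' => 120,
]);

// Each setter returns $this, so calls can be chained.
$stats->setTtftMilliseconds(110)->setNtpotMilliseconds(18);

printf(
    "QPS: %.1f, tokens/s: %d, NTPOT: %d ms, TTFT: %d ms\n",
    $stats->getQueriesPerSecond(),
    $stats->getOutputTokensPerSecond(),
    $stats->getNtpotMilliseconds(),
    $stats->getTtftMilliseconds()
);
```

The `cost` key is left out here because the fields of `Cost` are not described on this page.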

getQueriesPerSecond

Output only. The number of queries per second.

Note: This metric can vary widely based on context length and may not be a reliable measure of LLM throughput.

Returns
Type	Description
`float`

setQueriesPerSecond

Output only. The number of queries per second.

Note: This metric can vary widely based on context length and may not be a reliable measure of LLM throughput.

Parameter
Name	Description
`var`	`float`

Returns
Type	Description
`$this`

getOutputTokensPerSecond

Output only. The number of output tokens per second. This is the throughput measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds.

Returns
Type	Description
`int`
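
The throughput formula above can be sketched in plain PHP. The helper `outputTokensPerSecond()` and its parameter names are hypothetical and are not part of the client library. Rounding to the nearest integer is an assumption made only because the field is typed as `int`; the source does not state how the service rounds.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the client library): computes
 * total_output_tokens_generated_by_server / elapsed_time_in_seconds.
 */
function outputTokensPerSecond(int $totalOutputTokens, float $elapsedSeconds): int
{
    if ($elapsedSeconds <= 0.0) {
        throw new InvalidArgumentException('Elapsed time must be positive.');
    }

    // Assumption: round to the nearest integer to match the int field type.
    return (int) round($totalOutputTokens / $elapsedSeconds);
}

// 12,000 output tokens generated over 8 seconds.
echo outputTokensPerSecond(12000, 8.0), PHP_EOL; // prints 1500
```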

setOutputTokensPerSecond

Output only. The number of output tokens per second. This is the throughput measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds.

Parameter
Name	Description
`var`	`int`

Returns
Type	Description
`$this`

getNtpotMilliseconds

Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds.

This is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens.

Returns
Type	Description
`int`
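
A minimal sketch of the NTPOT formula above, in plain PHP. The helper `ntpotMilliseconds()` and its parameter names are hypothetical and are not part of the client library. It assumes the request latency is already expressed in milliseconds, and it rounds to the nearest integer only because the field is typed as `int`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of the client library): computes
 * request_latency / total_output_tokens, with latency in milliseconds.
 */
function ntpotMilliseconds(float $requestLatencyMs, int $totalOutputTokens): int
{
    if ($totalOutputTokens <= 0) {
        throw new InvalidArgumentException('Output token count must be positive.');
    }

    // Assumption: round to the nearest integer to match the int field type.
    return (int) round($requestLatencyMs / $totalOutputTokens);
}

// A request that took 4,000 ms end to end and produced 200 output tokens.
echo ntpotMilliseconds(4000.0, 200), PHP_EOL; // prints 20
```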

setNtpotMilliseconds

Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds.

This is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens.

Parameter
Name	Description
`var`	`int`

Returns
Type	Description
`$this`

getTtftMilliseconds

Output only. The Time To First Token (TTFT) in milliseconds. This is the time it takes to generate the first token for a request.

Returns
Type	Description
`int`

setTtftMilliseconds

Output only. The Time To First Token (TTFT) in milliseconds. This is the time it takes to generate the first token for a request.

Parameter
Name	Description
`var`	`int`

Returns
Type	Description
`$this`

getCost

Output only. The cost of running the model deployment.

Returns
Type	Description
`Google\Protobuf\RepeatedField<Cost>`

setCost

Output only. The cost of running the model deployment.

Parameter
Name	Description
`var`	`array<Cost>`

Returns
Type	Description
`$this`