Google Cloud Gke Recommender V1 Client - Class PerformanceStats (0.1.0)

Reference documentation and code samples for the Google Cloud Gke Recommender V1 Client class PerformanceStats.

Performance statistics for a model deployment.

Generated from protobuf message google.cloud.gkerecommender.v1.PerformanceStats

Namespace

Google \ Cloud \ GkeRecommender \ V1

Methods

__construct

Constructor.

Parameters
Name Description
data array

Optional. Data for populating the Message object.

↳ queries_per_second float

Output only. The number of queries per second. Note: This metric can vary widely based on context length and may not be a reliable measure of LLM throughput.

↳ output_tokens_per_second int

Output only. The number of output tokens per second. This is the throughput measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds.

↳ ntpot_milliseconds int

Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds. This is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens.

↳ ttft_milliseconds int

Output only. The Time To First Token (TTFT) in milliseconds. This is the time it takes to generate the first token for a request.

↳ cost array<Cost>

Output only. The cost of running the model deployment.
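The constructor keys above map one-to-one onto the getters documented below. The following is a minimal sketch rather than an official sample: it assumes the client library is installed through Composer and that the autoloader lives at `vendor/autoload.php` (a project-layout assumption), and all values are made up. Because every field is output only, the service normally populates these values; building the object by hand is mainly useful for tests or fixtures.

```php
<?php

declare(strict_types=1);

// Assumption: Composer autoloader at this path, with the client library installed.
require __DIR__ . '/vendor/autoload.php';

use Google\Cloud\GkeRecommender\V1\PerformanceStats;

// Keys are the field names from the constructor table; values are illustrative only.
$stats = new PerformanceStats([
    'queries_per_second' => 2.5,
    'output_tokens_per_second' => 1500,
    'ntpot_milliseconds' => 20,
    'ttft_milliseconds' => 180,
]);

// Read the values back through the generated getters.
printf("Queries per second:       %.2f\n", $stats->getQueriesPerSecond());
printf("Output tokens per second: %d\n", $stats->getOutputTokensPerSecond());
printf("NTPOT (ms):               %d\n", $stats->getNtpotMilliseconds());
printf("TTFT (ms):                %d\n", $stats->getTtftMilliseconds());
```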

getQueriesPerSecond

Output only. The number of queries per second.

Note: This metric can vary widely based on context length and may not be a reliable measure of LLM throughput.

Returns
Type Description
float
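The note about variability can be made concrete with a small calculation. The sketch below is not part of the client library. It assumes a hypothetical server that sustains a fixed output-token throughput, and it uses output tokens per request as a simplified stand-in for the context-length effect the note describes. The function name and all numbers are illustrative assumptions.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the client library): estimates queries
 * per second for a server with a fixed output-token throughput when every
 * request produces the same number of output tokens.
 */
function estimateQueriesPerSecond(int $outputTokensPerSecond, int $outputTokensPerRequest): float
{
    if ($outputTokensPerRequest <= 0) {
        throw new InvalidArgumentException('outputTokensPerRequest must be positive.');
    }

    return $outputTokensPerSecond / $outputTokensPerRequest;
}

// Same token throughput, different response lengths (made-up numbers).
$shortResponses = estimateQueriesPerSecond(1000, 100);
$longResponses = estimateQueriesPerSecond(1000, 1000);

printf("short responses: %.1f QPS\n", $shortResponses);
printf("long responses:  %.1f QPS\n", $longResponses);
```

Under these assumptions the two workloads differ tenfold in queries per second even though the server generates tokens at the same rate, which is why token-based metrics are often the safer basis for comparing deployments.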

setQueriesPerSecond

Output only. The number of queries per second.

Note: This metric can vary widely based on context length and may not be a reliable measure of LLM throughput.

Parameter
Name Description
var float

Returns
Type Description
$this

getOutputTokensPerSecond

Output only. The number of output tokens per second. This is the throughput measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds.

Returns
Type Description
int
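As a worked instance of the throughput formula above, the sketch below divides a total output-token count by the elapsed time in seconds. The helper name and the numbers are hypothetical. The helper returns a float, whereas the field itself is an int, and how the service rounds the value is not stated here.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the client library): throughput as
 * total output tokens generated divided by elapsed time in seconds.
 */
function computeOutputTokensPerSecond(int $totalOutputTokens, float $elapsedSeconds): float
{
    if ($elapsedSeconds <= 0.0) {
        throw new InvalidArgumentException('elapsedSeconds must be positive.');
    }

    return $totalOutputTokens / $elapsedSeconds;
}

// 12,000 output tokens generated over 8 seconds (made-up numbers).
$throughput = computeOutputTokensPerSecond(12000, 8.0);

printf("%.1f output tokens per second\n", $throughput);
```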

setOutputTokensPerSecond

Output only. The number of output tokens per second. This is the throughput measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds.

Parameter
Name Description
var int

Returns
Type Description
$this

getNtpotMilliseconds

Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds.

This is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens.

Returns
Type Description
int
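The NTPOT definition above can be checked with a short calculation. This is a sketch with a hypothetical helper name and made-up numbers. It assumes the request latency is already expressed in milliseconds and returns a float, while the field itself is an int.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the client library): request latency
 * divided by the number of output tokens, in milliseconds per token.
 */
function computeNtpotMilliseconds(float $requestLatencyMs, int $totalOutputTokens): float
{
    if ($totalOutputTokens <= 0) {
        throw new InvalidArgumentException('totalOutputTokens must be positive.');
    }

    return $requestLatencyMs / $totalOutputTokens;
}

// A request that took 2,400 ms and produced 120 output tokens (made-up numbers).
$ntpot = computeNtpotMilliseconds(2400.0, 120);

printf("NTPOT: %.1f ms per output token\n", $ntpot);
```

Because the latency is divided by the token count, two requests with very different response lengths can be compared on the same per-token scale.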

setNtpotMilliseconds

Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds.

This is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens.

Parameter
Name Description
var int

Returns
Type Description
$this

getTtftMilliseconds

Output only. The Time To First Token (TTFT) in milliseconds. This is the time it takes to generate the first token for a request.

Returns
Type Description
int

setTtftMilliseconds

Output only. The Time To First Token (TTFT) in milliseconds. This is the time it takes to generate the first token for a request.

Parameter
Name Description
var int

Returns
Type Description
$this

getCost

Output only. The cost of running the model deployment.

Returns
Type Description
Google\Protobuf\Internal\RepeatedField

setCost

Output only. The cost of running the model deployment.

Parameter
Name Description
var array<Cost>

Returns
Type Description
$this