GKE Recommender V1 API - Class Google::Cloud::GkeRecommender::V1::PerformanceStats (v0.1.0)

Reference documentation and code samples for the GKE Recommender V1 API class Google::Cloud::GkeRecommender::V1::PerformanceStats.

Performance statistics for a model deployment.

Inherits

  • Object

Extended By

  • Google::Protobuf::MessageExts::ClassMethods

Includes

  • Google::Protobuf::MessageExts

Methods

#cost

def cost() -> ::Array<::Google::Cloud::GkeRecommender::V1::Cost>
Returns
  • (::Array<::Google::Cloud::GkeRecommender::V1::Cost>)

#ntpot_milliseconds

def ntpot_milliseconds() -> ::Integer
Returns
  • (::Integer) — Output only. The Normalized Time Per Output Token (NTPOT) in milliseconds. This is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens.
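The normalization described above can be sketched in plain Ruby. The helper name `ntpot_ms` and its arguments are hypothetical and not part of the client library. This page does not say how the service rounds the value, so the rounding shown here is an assumption made only to match the ::Integer return type.

```ruby
# Hypothetical helper: not part of the google-cloud-gke_recommender-v1 gem.
# Computes request_latency / total_output_tokens, with latency in milliseconds.
def ntpot_ms(request_latency_ms, total_output_tokens)
  raise ArgumentError, "total_output_tokens must be positive" unless total_output_tokens.positive?

  # Assumption: round to the nearest Integer to mirror the ::Integer return type.
  (request_latency_ms.to_f / total_output_tokens).round
end

# Illustrative numbers only: a 2400 ms request that produced 120 output tokens.
puts ntpot_ms(2_400, 120)
```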

#output_tokens_per_second

def output_tokens_per_second() -> ::Integer
Returns
  • (::Integer) — Output only. The number of output tokens per second. This is the throughput measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds.
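As a rough illustration of the throughput formula above, the following sketch divides a token count by an elapsed time. The helper name `tokens_per_second` is hypothetical, and rounding to an Integer is an assumption, since the page states only the return type.

```ruby
# Hypothetical helper: not part of the google-cloud-gke_recommender-v1 gem.
# Computes total_output_tokens_generated_by_server / elapsed_time_in_seconds.
def tokens_per_second(total_output_tokens, elapsed_seconds)
  raise ArgumentError, "elapsed_seconds must be positive" unless elapsed_seconds.positive?

  # Assumption: round to the nearest Integer to mirror the ::Integer return type.
  (total_output_tokens.to_f / elapsed_seconds).round
end

# Illustrative numbers only: 6000 tokens generated over 4 seconds.
puts tokens_per_second(6_000, 4.0)
```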

#queries_per_second

def queries_per_second() -> ::Float
Returns
  • (::Float) — Output only. The number of queries per second. Note: This metric can vary widely based on context length and may not be a reliable measure of LLM throughput.

#ttft_milliseconds

def ttft_milliseconds() -> ::Integer
Returns
  • (::Integer) — Output only. The Time To First Token (TTFT) in milliseconds. This is the time it takes to generate the first token for a request.
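The accessors documented on this page can be read together to summarize a deployment's latency and throughput. The sketch below uses a stand-in Struct (`StatsStandIn`, a hypothetical name) so that it runs without the gem installed. Only the accessor names come from this page; the `summarize` helper and the sample values are illustrative assumptions.

```ruby
# Stand-in for Google::Cloud::GkeRecommender::V1::PerformanceStats so the
# sketch runs without the gem; only the accessor names are taken from this page.
StatsStandIn = Struct.new(
  :queries_per_second, :output_tokens_per_second,
  :ntpot_milliseconds, :ttft_milliseconds,
  keyword_init: true
)

# Hypothetical helper that formats the latency and throughput fields.
def summarize(stats)
  format(
    "TTFT %d ms, NTPOT %d ms/token, %d tokens/s, %.2f QPS",
    stats.ttft_milliseconds, stats.ntpot_milliseconds,
    stats.output_tokens_per_second, stats.queries_per_second
  )
end

# Illustrative values, not real measurements.
sample = StatsStandIn.new(
  queries_per_second: 1.5, output_tokens_per_second: 1500,
  ntpot_milliseconds: 20, ttft_milliseconds: 350
)

puts summarize(sample)
```

Because any object that responds to these four accessors works with `summarize`, a real PerformanceStats message returned by the API should be usable in place of the stand-in, although that has not been verified here.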