Reference documentation and code samples for the GKE Recommender V1 API class Google::Cloud::GkeRecommender::V1::PerformanceRange.
Performance range for a model deployment.
Inherits
- Object
Extended By
- Google::Protobuf::MessageExts::ClassMethods
Includes
- Google::Protobuf::MessageExts
Methods
#ntpot_range
def ntpot_range() -> ::Google::Cloud::GkeRecommender::V1::MillisecondRange
Returns
- (::Google::Cloud::GkeRecommender::V1::MillisecondRange) — Output only. The range of NTPOT (Normalized Time Per Output Token) in milliseconds. NTPOT is the request latency normalized by the number of output tokens, measured as request_latency / total_output_tokens.
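As a worked example of the NTPOT formula above, the following is a minimal sketch. The helper name ntpot_ms and the sample numbers are hypothetical and are not part of the GKE Recommender client library.

```ruby
# Hypothetical helper, not part of the client library.
# Computes NTPOT in milliseconds as request_latency / total_output_tokens.
def ntpot_ms(request_latency_ms, total_output_tokens)
  unless total_output_tokens.positive?
    raise ArgumentError, "total_output_tokens must be positive"
  end

  request_latency_ms.to_f / total_output_tokens
end

# Illustrative numbers only: a 2,400 ms request that produced 120 output tokens.
puts ntpot_ms(2_400, 120) # prints 20.0
```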
#throughput_output_range
def throughput_output_range() -> ::Google::Cloud::GkeRecommender::V1::TokensPerSecondRange
Returns
- (::Google::Cloud::GkeRecommender::V1::TokensPerSecondRange) — Output only. The range of throughput in output tokens per second. This is measured as total_output_tokens_generated_by_server / elapsed_time_in_seconds.
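The throughput formula above can be sketched the same way. The helper name output_throughput_tps and the sample numbers are hypothetical and serve only to illustrate the arithmetic.

```ruby
# Hypothetical helper, not part of the client library.
# Computes output throughput in tokens per second as
# total_output_tokens_generated_by_server / elapsed_time_in_seconds.
def output_throughput_tps(total_output_tokens, elapsed_seconds)
  unless elapsed_seconds.positive?
    raise ArgumentError, "elapsed_seconds must be positive"
  end

  total_output_tokens.to_f / elapsed_seconds
end

# Illustrative numbers only: 6,000 output tokens generated over 4 seconds.
puts output_throughput_tps(6_000, 4) # prints 1500.0
```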
#ttft_range
def ttft_range() -> ::Google::Cloud::GkeRecommender::V1::MillisecondRange
Returns
- (::Google::Cloud::GkeRecommender::V1::MillisecondRange) — Output only. The range of TTFT (Time To First Token) in milliseconds. TTFT is the time it takes to generate the first token for a request.
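One possible use of the three accessors is to check whether a latency or throughput target falls inside the reported ranges. The sketch below uses Struct stand-ins so it runs without the gem. It assumes that the range messages expose numeric min and max fields, which should be verified against the MillisecondRange and TokensPerSecondRange reference pages. The stub class names, the within? helper, and all values are hypothetical.

```ruby
# Stand-ins for the real message classes so the sketch runs without the gem.
# ASSUMPTION: the real range messages expose numeric `min` and `max` fields.
RangeStub = Struct.new(:min, :max, keyword_init: true)
PerformanceRangeStub = Struct.new(
  :ntpot_range, :throughput_output_range, :ttft_range,
  keyword_init: true
)

# Hypothetical helper: true when value lies within [range.min, range.max].
def within?(range, value)
  value.between?(range.min, range.max)
end

# Illustrative values only; real values are returned by the service.
perf = PerformanceRangeStub.new(
  ntpot_range: RangeStub.new(min: 10, max: 50),
  throughput_output_range: RangeStub.new(min: 500, max: 2_000),
  ttft_range: RangeStub.new(min: 200, max: 800)
)

puts within?(perf.ttft_range, 500)               # prints true
puts within?(perf.ntpot_range, 75)               # prints false
puts within?(perf.throughput_output_range, 500)  # prints true
```

With a real PerformanceRange object, the same helper would apply unchanged as long as the min and max assumption holds.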