Class GkeInferenceQuickstartGrpc.GkeInferenceQuickstartBlockingStub (0.1.0)
public static final class GkeInferenceQuickstartGrpc.GkeInferenceQuickstartBlockingStub extends AbstractBlockingStub<GkeInferenceQuickstartGrpc.GkeInferenceQuickstartBlockingStub>
A stub that allows clients to make limited synchronous RPC calls to the GkeInferenceQuickstart service.
The GKE Inference Quickstart (GIQ) service provides profiles with performance
metrics for popular models and model servers across multiple accelerators.
These profiles help generate optimized, best-practice configurations for
running inference on GKE.
Inheritance
java.lang.Object >
io.grpc.stub.AbstractStub >
io.grpc.stub.AbstractBlockingStub >
GkeInferenceQuickstartGrpc.GkeInferenceQuickstartBlockingStub
Inherited Members
io.grpc.stub.AbstractBlockingStub.<T>newStub(io.grpc.stub.AbstractStub.StubFactory<T>,io.grpc.Channel)
io.grpc.stub.AbstractBlockingStub.<T>newStub(io.grpc.stub.AbstractStub.StubFactory<T>,io.grpc.Channel,io.grpc.CallOptions)
io.grpc.stub.AbstractStub.<T>withOption(io.grpc.CallOptions.Key<T>,T)
io.grpc.stub.AbstractStub.build(io.grpc.Channel,io.grpc.CallOptions)
io.grpc.stub.AbstractStub.getCallOptions()
io.grpc.stub.AbstractStub.getChannel()
io.grpc.stub.AbstractStub.withCallCredentials(io.grpc.CallCredentials)
io.grpc.stub.AbstractStub.withChannel(io.grpc.Channel)
io.grpc.stub.AbstractStub.withCompression(java.lang.String)
io.grpc.stub.AbstractStub.withDeadline(io.grpc.Deadline)
io.grpc.stub.AbstractStub.withDeadlineAfter(java.time.Duration)
io.grpc.stub.AbstractStub.withDeadlineAfter(long,java.util.concurrent.TimeUnit)
io.grpc.stub.AbstractStub.withExecutor(java.util.concurrent.Executor)
io.grpc.stub.AbstractStub.withInterceptors(io.grpc.ClientInterceptor...)
io.grpc.stub.AbstractStub.withMaxInboundMessageSize(int)
io.grpc.stub.AbstractStub.withMaxOutboundMessageSize(int)
io.grpc.stub.AbstractStub.withOnReadyThreshold(int)
io.grpc.stub.AbstractStub.withWaitForReady()
Methods
build(Channel channel, CallOptions callOptions)
protected GkeInferenceQuickstartGrpc.GkeInferenceQuickstartBlockingStub build(Channel channel, CallOptions callOptions)
Parameters
| Name | Type |
|---|---|
| channel | io.grpc.Channel |
| callOptions | io.grpc.CallOptions |
Overrides
io.grpc.stub.AbstractStub.build(io.grpc.Channel,io.grpc.CallOptions)
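The build override is what lets the inherited with* methods (withDeadlineAfter, withCompression, and so on) return configured copies: each one calls build with the same channel and a new CallOptions, leaving the original stub unchanged. The following is a minimal, self-contained sketch of that copy-on-configure pattern. MiniCallOptions, MiniAbstractStub, MiniBlockingStub, and withDeadlineAfterMillis are hypothetical stand-ins, not gRPC classes; a channel target string replaces io.grpc.Channel, and a millisecond timeout replaces a real Deadline.

```java
import java.util.Objects;

// Hypothetical stand-in for io.grpc.CallOptions: an immutable bag of per-call settings.
final class MiniCallOptions {
    final long timeoutMillis; // 0 means "no deadline" in this sketch
    final String compressor;  // null means "no compression" in this sketch

    MiniCallOptions(long timeoutMillis, String compressor) {
        this.timeoutMillis = timeoutMillis;
        this.compressor = compressor;
    }

    // Each setter returns a new options object instead of mutating this one.
    MiniCallOptions withTimeoutMillis(long t) { return new MiniCallOptions(t, compressor); }
    MiniCallOptions withCompressor(String c) { return new MiniCallOptions(timeoutMillis, c); }
}

// Hypothetical base class mirroring how AbstractStub delegates to build().
abstract class MiniAbstractStub<S extends MiniAbstractStub<S>> {
    private final String channel; // stands in for io.grpc.Channel
    private final MiniCallOptions options;

    protected MiniAbstractStub(String channel, MiniCallOptions options) {
        this.channel = Objects.requireNonNull(channel);
        this.options = Objects.requireNonNull(options);
    }

    // Subclasses return a new instance of their own type; this is the hook build() provides.
    protected abstract S build(String channel, MiniCallOptions options);

    String getChannel() { return channel; }
    MiniCallOptions getCallOptions() { return options; }

    // Every with* method produces a configured copy that shares the same channel.
    S withDeadlineAfterMillis(long millis) { return build(channel, options.withTimeoutMillis(millis)); }
    S withCompression(String name) { return build(channel, options.withCompressor(name)); }
}

// Concrete stub: its only job is to construct copies of itself.
final class MiniBlockingStub extends MiniAbstractStub<MiniBlockingStub> {
    MiniBlockingStub(String channel, MiniCallOptions options) { super(channel, options); }

    @Override
    protected MiniBlockingStub build(String channel, MiniCallOptions options) {
        return new MiniBlockingStub(channel, options);
    }
}
```

With the generated class itself, this is the behavior you rely on when writing something like GkeInferenceQuickstartGrpc.newBlockingStub(channel).withDeadlineAfter(10, TimeUnit.SECONDS): the result is a new stub, and the stub it was derived from keeps its original call options.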
fetchBenchmarkingData(FetchBenchmarkingDataRequest request)
public FetchBenchmarkingDataResponse fetchBenchmarkingData(FetchBenchmarkingDataRequest request)
Fetches all of the benchmarking data available for a profile. Benchmarking
data includes all of the performance metrics available for a given model
server setup on a given instance type.
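To make "all of the performance metrics for a given model server setup on a given instance type" concrete, here is a minimal sketch that keeps every recorded point matching one setup and one instance type. It assumes a hypothetical, simplified BenchmarkPoint row (model, model server, server version, instance type, plus two example metrics); the real FetchBenchmarkingDataResponse message has its own structure, which is not described on this page.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical, simplified row: one measured operating point for one setup.
record BenchmarkPoint(String model, String modelServer, String modelServerVersion,
                      String instanceType, double requestsPerSecond, double outputTokensPerSecond) {}

final class BenchmarkLookup {
    // Returns every point recorded for the exact setup and instance type, preserving input order.
    // Nothing is aggregated: all matching points come back, not a single summary value.
    static List<BenchmarkPoint> fetchAll(List<BenchmarkPoint> rows, String model, String modelServer,
                                         String modelServerVersion, String instanceType) {
        return rows.stream()
                .filter(p -> p.model().equals(model))
                .filter(p -> p.modelServer().equals(modelServer))
                .filter(p -> p.modelServerVersion().equals(modelServerVersion))
                .filter(p -> p.instanceType().equals(instanceType))
                .collect(Collectors.toList());
    }
}
```

The model, server version, and machine-type strings you pass in are up to the caller; any values used to exercise this sketch are illustrative only.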
fetchModelServerVersions(FetchModelServerVersionsRequest request)
public FetchModelServerVersionsResponse fetchModelServerVersions(FetchModelServerVersionsRequest request)
Fetches available model server versions. Open-source servers use their own
versioning schemes (for example, vllm uses semver, such as v1.0.0).
Some model servers use different versioning schemes depending on the
accelerator: vllm, for example, uses semver on GPUs but returns nightly
build tags on TPUs. When multiple schemes are present, all available
versions are returned.
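Because one response can mix semver strings with other tags, client code that sorts versions should not assume every entry is semver. The following minimal sketch separates MAJOR.MINOR.PATCH strings (with an optional leading "v") from everything else, orders the semver entries numerically from newest to oldest, and keeps all other tags in their original order. The nightly tag format used to exercise it is a hypothetical placeholder, and pre-release or build suffixes (such as 1.0.0-rc1) are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ServerVersions {
    // Core semver only: MAJOR.MINOR.PATCH with an optional leading "v".
    private static final Pattern SEMVER = Pattern.compile("^v?(\\d+)\\.(\\d+)\\.(\\d+)$");

    static boolean isSemver(String version) {
        return SEMVER.matcher(version).matches();
    }

    // Numeric comparison, so v0.10.0 sorts after v0.9.0 (plain string order would not).
    static int compareSemver(String a, String b) {
        Matcher ma = SEMVER.matcher(a);
        Matcher mb = SEMVER.matcher(b);
        if (!ma.matches() || !mb.matches()) {
            throw new IllegalArgumentException("not semver: " + a + ", " + b);
        }
        for (int i = 1; i <= 3; i++) {
            int cmp = Long.compare(Long.parseLong(ma.group(i)), Long.parseLong(mb.group(i)));
            if (cmp != 0) {
                return cmp;
            }
        }
        return 0;
    }

    // Keeps every version: semver entries first (newest first), then other tags in input order.
    static List<String> order(List<String> versions) {
        List<String> semver = new ArrayList<>();
        List<String> other = new ArrayList<>();
        for (String v : versions) {
            (isSemver(v) ? semver : other).add(v);
        }
        semver.sort((x, y) -> compareSemver(y, x)); // reversed arguments: descending order
        semver.addAll(other);
        return semver;
    }
}
```

Putting non-semver tags last, rather than dropping them, matches the documented behavior that all versions are returned when schemes differ; whether to prefer semver releases is a client-side choice.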
fetchModelServers(FetchModelServersRequest request)
public FetchModelServersResponse fetchModelServers(FetchModelServersRequest request)
Fetches available model servers. Open-source model servers use simplified,
lowercase names (e.g., vllm).
fetchModels(FetchModelsRequest request)
public FetchModelsResponse fetchModels(FetchModelsRequest request)
Fetches available models. Open-source models follow the Hugging Face Hub
owner/model_name format.
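Since open-source model IDs follow the owner/model_name convention, a client can split a returned ID into its two parts. The following is a minimal sketch of that parsing. ModelId is a hypothetical helper, not part of the client library, and the strict "exactly one slash, both parts non-empty" rule is an assumption of this sketch.

```java
// Hypothetical helper: splits a Hugging Face Hub style ID into owner and model name.
record ModelId(String owner, String name) {

    // Requires exactly one '/' with non-empty text on both sides.
    static ModelId parse(String id) {
        int slash = id.indexOf('/');
        boolean badShape = slash <= 0                      // missing slash or empty owner
                || slash == id.length() - 1                // empty model name
                || id.indexOf('/', slash + 1) >= 0;        // more than one slash
        if (badShape) {
            throw new IllegalArgumentException("expected owner/model_name, got: " + id);
        }
        return new ModelId(id.substring(0, slash), id.substring(slash + 1));
    }

    // Reassembles the canonical owner/model_name form.
    String id() {
        return owner + "/" + name;
    }
}
```

Rejecting malformed IDs early, instead of passing them through, makes typos surface before they reach a request.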
fetchProfiles(FetchProfilesRequest request)
public FetchProfilesResponse fetchProfiles(FetchProfilesRequest request)
Fetches available profiles. A profile contains performance metrics and
cost information for a specific model server setup. Profiles can be
filtered by parameters. If no filters are provided, all profiles are
returned.
Profiles display a single value per performance metric based on the
provided performance requirements. If no requirements are given, the
metrics represent the inflection point. For details, see "Run best practice
inference with GKE Inference Quickstart recipes".
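The filtering rule above (each filter narrows the result, and no filters means every profile is returned) can be sketched with optional fields, where an empty value matches anything. Profile, ProfileFilter, and ProfileSearch are hypothetical, simplified types. In the real API the filters are fields of FetchProfilesRequest, whose exact names are not listed on this page. The sketch covers only the filtering, not how a single metric value is chosen from the performance requirements.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical, simplified profile record.
record Profile(String model, String modelServer, String acceleratorType) {}

// Hypothetical filter: an empty Optional means "do not filter on this field".
record ProfileFilter(Optional<String> model, Optional<String> modelServer, Optional<String> acceleratorType) {

    // No filters at all: every profile matches.
    static ProfileFilter none() {
        return new ProfileFilter(Optional.empty(), Optional.empty(), Optional.empty());
    }

    // A profile matches when every filter that is set equals the profile's value.
    boolean matches(Profile p) {
        return model.map(p.model()::equals).orElse(true)
                && modelServer.map(p.modelServer()::equals).orElse(true)
                && acceleratorType.map(p.acceleratorType()::equals).orElse(true);
    }
}

final class ProfileSearch {
    // Returns the matching profiles in input order.
    static List<Profile> fetch(List<Profile> all, ProfileFilter filter) {
        return all.stream().filter(filter::matches).collect(Collectors.toList());
    }
}
```

Treating an unset filter as "match all", rather than as "match empty string", is what makes the no-filter case return the full list.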
generateOptimizedManifest(GenerateOptimizedManifestRequest request)
public GenerateOptimizedManifestResponse generateOptimizedManifest(GenerateOptimizedManifestRequest request)
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-17 UTC.