An inference is the output of a trained machine learning model. This page provides an overview of the workflow for getting inferences from your models on Vertex AI.
Vertex AI offers two methods for getting inferences:
- Online inferences are synchronous requests made to a model that is deployed to an Endpoint. Therefore, before sending a request, you must first deploy the Model resource to an endpoint. Deployment associates compute resources with the model so that it can serve online inferences with low latency. Use online inferences when you are making requests in response to application input or in situations that require timely inference (see the sketch after this list).
- Batch inferences are asynchronous requests made to a model that isn't deployed to an endpoint. You send the request (as a BatchPredictionJob resource) directly to the Model resource. Use batch inferences when you don't require an immediate response and want to process accumulated data with a single request (also shown in the sketch after this list).
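As a rough illustration, here is what the two paths look like with the Vertex AI SDK for Python (`google-cloud-aiplatform`). The project ID, model ID, machine type, input payload, and Cloud Storage URIs below are placeholder assumptions, not values from this page:

```python
from google.cloud import aiplatform

# Placeholder project, region, and model ID -- substitute your own values.
aiplatform.init(project="my-project", location="us-central1")
model = aiplatform.Model("1234567890")  # a Model resource in Model Registry

# Online inference: deploy the Model to an Endpoint, then send
# synchronous requests to it.
endpoint = model.deploy(machine_type="n1-standard-4")
response = endpoint.predict(instances=[{"feature_a": 1.0}])  # placeholder payload
print(response.predictions)

# Batch inference: no deployment needed; submit a BatchPredictionJob
# directly against the Model resource.
batch_job = model.batch_predict(
    job_display_name="example-batch-job",
    gcs_source="gs://my-bucket/input.jsonl",       # placeholder input
    gcs_destination_prefix="gs://my-bucket/out/",  # placeholder output
    machine_type="n1-standard-4",
)
batch_job.wait()  # the job runs asynchronously; wait() blocks until it finishes
```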
Get inferences from custom trained models
To get inferences, you must first import your model. After it's imported, it becomes a Model resource that is visible in Vertex AI Model Registry.
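A minimal import sketch with the Vertex AI SDK for Python; the display name, artifact location, and serving container below are placeholder assumptions:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Importing (uploading) creates a Model resource in Vertex AI Model Registry.
model = aiplatform.Model.upload(
    display_name="my-custom-model",                  # placeholder name
    artifact_uri="gs://my-bucket/model-artifacts/",  # placeholder artifact path
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/sklearn-cpu.1-0:latest"
    ),  # a prebuilt serving container; pick one matching your framework
)
print(model.resource_name)
```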
Then, read the online inference and batch inference documentation to learn how to get inferences from your imported model.
Get inferences from AutoML models
Unlike custom trained models, AutoML models are automatically imported into the Vertex AI Model Registry after training.
Otherwise, the workflow for AutoML models is similar, though it varies slightly based on your data type and model objective. The documentation for getting AutoML inferences lives alongside the rest of the AutoML documentation; the links follow, and a short SDK sketch appears at the end of this section.
Image
Learn how to get inferences from the following types of image AutoML models:
Tabular
Learn how to get inferences from the following types of tabular AutoML models:
- Tabular classification and regression models
- Tabular forecasting models (batch inferences only)
Text
Learn how to get inferences from the following types of text AutoML models:
Video
Learn how to get inferences from the following types of video AutoML models:
- Video action recognition models (batch inferences only)
- Video classification models (batch inferences only)
- Video object tracking models (batch inferences only)
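Regardless of data type, a trained AutoML model already appears in Model Registry, so you can retrieve it with the SDK and serve it like any other Model resource. A minimal sketch, assuming a placeholder display name and Cloud Storage paths; it uses a batch job because the video objectives above support batch inferences only:

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# AutoML models are registered automatically; look one up by display name.
models = aiplatform.Model.list(filter='display_name="my-automl-model"')
automl_model = models[0]  # assumes exactly one match

# Video AutoML objectives are batch-only, so submit a BatchPredictionJob
# rather than deploying the model to an endpoint.
batch_job = automl_model.batch_predict(
    job_display_name="automl-video-batch",
    gcs_source="gs://my-bucket/video-requests.jsonl",  # placeholder input
    gcs_destination_prefix="gs://my-bucket/out/",      # placeholder output
)
```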
Get inferences from BigQuery ML models
There are two ways to get inferences from BigQuery ML models:
- You can request batch inferences directly from the model in BigQuery ML (see the sketch after this list).
- You can register the models with the Model Registry, without exporting them from BigQuery ML.
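A minimal sketch of the first path, using the BigQuery client library for Python; the project, dataset, model, and table names are placeholders. The `model_registry` option shown in the trailing comment is how a BigQuery ML model can be registered with Model Registry at creation time:

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # placeholder project

# Batch inference directly in BigQuery ML: ML.PREDICT reads an input
# table and returns one row of predictions per input row.
sql = """
SELECT *
FROM ML.PREDICT(
  MODEL `my_dataset.my_model`,
  TABLE `my_dataset.input_table`)
"""
for row in client.query(sql).result():
    print(dict(row))

# To register the model with Vertex AI Model Registry instead of
# exporting it, create it with the model_registry option, e.g.:
#   CREATE OR REPLACE MODEL `my_dataset.my_model`
#     OPTIONS (model_type = 'logistic_reg',
#              model_registry = 'vertex_ai') AS ...
```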