Serving TensorFlow models with custom output signatures using TF Serving


There are easy-to-follow, well-written tutorials on serving TensorFlow models in production with the TensorFlow Serving framework. These tutorials typically deal with machine learning models that return a single value from a predict() or classify() call. Serving models that return multiple values, such as a dictionary, requires some extra configuration in TF Serving, and I found only scarce documentation on how to set it up.

The purpose of this article is to explain some basic concepts of TensorFlow models and the TensorFlow Serving framework in simple language, and to give a hands-on introduction to serving models that return multiple output values.

A few concepts for understanding TensorFlow Serving:

Model Signature:

A model signature is a JSON-like serialized description (stored as a Protocol Buffer message, Google's own serialization format) that tells an external agent how to call a TensorFlow model and what to expect in return.

The external agent here is the TensorFlow Serving framework, which wraps TensorFlow model artefacts and has no idea by itself how to call them. Model signatures determine how this can be done. This is, by the way, not a TensorFlow-specific concept.

Model Artefacts: Saved Models

Initially you write Python code using tensor objects as implemented by the TensorFlow library, which ultimately represents a graph of connected nodes. However, TF Serving is not going to execute your model as Python code; it calls a serialized abstraction of it. That is to say, it does not require a Python interpreter in the backend.

This abstraction is built from what TF calls concrete functions: graph versions of your Python functions, traced for specific input types. To obtain it, you first serialize the model written in Python with the SavedModel API (tf.saved_model.save). This saves your model in a folder with a precise structure.

Still, a serialized model (aka a saved_model) is not of much use to the TensorFlow Serving framework on its own, as TF Serving still has no idea how to call that model and what to expect in return. Therefore, we need model signatures. This matters in particular when our model subclasses tf.Module, because tf.Module does not automatically generate model signatures the way other classes such as keras.Layer or keras.Model do.
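To see what a signature actually declares, here is a minimal sketch that traces a small function and prints its declared inputs and outputs. The function name scale and its output keys are made up for illustration; get_concrete_function, structured_input_signature and structured_outputs are assumed to behave as in the TensorFlow 2.x tf.function API.

```python
import tensorflow as tf

# Hypothetical toy function; the names are illustrative only.
@tf.function(input_signature=[tf.TensorSpec([], tf.float32, name="x")])
def scale(x):
    return {"doubled": 2.0 * x, "tripled": 3.0 * x}

# Tracing with the declared input_signature yields a concrete function.
concrete_fn = scale.get_concrete_function()

# What a caller must send, and what it gets back.
print(concrete_fn.structured_input_signature)
print(concrete_fn.structured_outputs)

output_keys = sorted(concrete_fn.structured_outputs.keys())
result = concrete_fn(tf.constant(2.0))
```

The dictionary keys returned by the Python function become the named outputs, which is exactly the information a serving framework needs.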

How to interact with models

One way is, of course, to run your model with the Python interpreter. This does not require model signatures. Another option is to fire up the TensorFlow Serving Docker container and make requests to your models. Alternatively, the easiest way to start investigating your models is the saved_model_cli utility, which can show and run models without starting the Docker container each time. With its run command you can pass input arguments that are fed directly to your model's method. How does saved_model_cli know how to interact with your model? You got it: this was specified in the model signature.
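A signature can also be exercised from Python after reloading the serialized model, which is a quick sanity check before involving Docker. The sketch below is only an illustration: ToyModule and its output keys are hypothetical, and the model is written to a temporary directory rather than to the folder used later in this article.

```python
import tempfile
import tensorflow as tf

class ToyModule(tf.Module):  # hypothetical module, for illustration only
    @tf.function(input_signature=[tf.TensorSpec([], tf.float32)])
    def __call__(self, x):
        return {"doubled": 2.0 * x, "tripled": 3.0 * x}

module = ToyModule()
export_dir = tempfile.mkdtemp()
tf.saved_model.save(
    module,
    export_dir,
    signatures={"serving_default": module.__call__.get_concrete_function()},
)

# Reload the serialized model and look up the signature by its key.
loaded = tf.saved_model.load(export_dir)
serving_fn = loaded.signatures["serving_default"]

# Signature functions take keyword arguments named after the inputs.
outputs = serving_fn(x=tf.constant(12.0))
values = {key: float(tensor.numpy()) for key, tensor in outputs.items()}
print(values)
```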

tf.keras.layers.Layer, tf.keras.Model, tf.Module?

tf.keras.layers.Layer and tf.keras.Model are subclasses of tf.Module with rather tailored and specific purposes, like building neural networks. Therefore these objects generate their model signatures automatically, without you needing to worry about it. On the other hand, what you can serve with these classes is less flexible. The more general class to start from is tf.Module; however, the flexibility comes at a price, which is the need to generate model signatures yourself for the models you create.

Layer and Model classes have their own call method.


The model.config file is necessary for TF Serving to know which models are available for serving. It is just a friendly JSON-like file listing all models. This makes it possible for TensorFlow Serving to prepare the endpoints for serving your models.

  name: "my_super_model"
  base_path: "/models/FraudulentUsersDetection/Australia"
  model_platform: "tensorflow"

It contains the name of the model and where it is located.
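When several models have to be listed, the file can be generated instead of typed by hand. The helper below is a sketch under my own assumptions: render_model_config is a hypothetical function, and the second model name and path are invented. It emits one config entry per model inside a single model_config_list block.

```python
def render_model_config(models):
    """Render a model_config_list in protobuf text format.

    `models` maps a model name to its base path inside the container.
    """
    entries = []
    for name, base_path in models.items():
        entries.append(
            "  config {\n"
            f'    name: "{name}"\n'
            f'    base_path: "{base_path}"\n'
            '    model_platform: "tensorflow"\n'
            "  }\n"
        )
    return "model_config_list {\n" + "".join(entries) + "}\n"

config_text = render_model_config({
    "my_super_model": "/models/FraudulentUsersDetection/Australia",
    # Hypothetical second model, only here to show a multi-model file.
    "my_other_model": "/models/FraudulentUsersDetection/NewZealand",
})
print(config_text)
```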

# start with an object that inherits from a Module.
import tensorflow as tf
class CustomModuleWithOutputName(tf.Module):
  def __init__(self):
    super(CustomModuleWithOutputName, self).__init__(name='Australia')
    self.v = tf.Variable(1.)

  @tf.function(input_signature=[tf.TensorSpec([], tf.float32)])
  def __call__(self, x):
    #return a dictionary of stuff
    return {'kangourus': x * self.v , "koalas" : 120 * x }

#create an instance
module_output = CustomModuleWithOutputName()
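#generate model signature
# input_signature is already set on __call__, so no arguments are needed here
call_output = module_output.__call__.get_concrete_function()
# path layout: <models dir>/<class name>/<module name>/<version>/
module_output_path = "./my_models/" + module_output.__class__.__name__ + "/" + module_output.name + "/0/"
print(module_output_path)
# serialize the module together with its serving signature
tf.saved_model.save(module_output, module_output_path, signatures={'serving_default': call_output})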

WARNING:tensorflow:From /Users/sonat/Documents/repos/surge-models/.venv/lib/python3.7/site-packages/tensorflow_core/python/ops/ calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
INFO:tensorflow:Assets written to: ./my_models/CustomModuleWithOutputName/Australia/0/assets


At this stage we can already peek inside our model with saved_model_cli. The following command prints the signature of the serialized model. You can see that the dict returned by the __call__ function is nicely reflected in the list of outputs of the model signature below.

!saved_model_cli show --dir ./my_models/CustomModuleWithOutputName/Australia/0/ --tag_set serve --signature_def serving_default
The given SavedModel SignatureDef contains the following input(s):
  inputs['x'] tensor_info:
      dtype: DT_FLOAT
      shape: ()
      name: serving_default_x:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['kangourus'] tensor_info:
      dtype: DT_FLOAT
      shape: ()
      name: StatefulPartitionedCall:0
  outputs['koalas'] tensor_info:
      dtype: DT_FLOAT
      shape: ()
      name: StatefulPartitionedCall:1
Method name is: tensorflow/serving/predict

We can even run this model from the command line, again using saved_model_cli:

!saved_model_cli run --dir ./my_models/CustomModuleWithOutputName/Australia/0/ --tag_set serve --signature_def serving_default --input_exprs='x=12.0'
2020-10-20 10:32:26.692194: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-10-20 10:32:26.704802: I tensorflow/compiler/xla/service/] XLA service 0x7ff1a233b6f0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-10-20 10:32:26.704822: I tensorflow/compiler/xla/service/]   StreamExecutor device (0): Host, Default Version
WARNING:tensorflow:From /Users/sonat/Documents/repos/surge-models/.venv/lib/python3.7/site-packages/tensorflow_core/python/tools/ load (from tensorflow.python.saved_model.loader_impl) is deprecated and will be removed in a future version.
Instructions for updating:
This function will only be available through the v1 compatibility library as tf.compat.v1.saved_model.loader.load or tf.compat.v1.saved_model.load. There will be a new function for importing SavedModels in Tensorflow 2.0.
Result for output key kangourus:
Result for output key koalas:


Remember, we still need to create the model.config file. Let's do this simply by saving a string; there are more advanced ways of doing this using protocol buffers.

s = '''model_config_list {config {
  name: "Australia"
  base_path: "/models/CustomModuleWithOutputName/Australia"
  model_platform: "tensorflow"
}}'''

# write the config next to the models so it gets mounted into the container
with open("./my_models/model.config", 'w+') as f:
    f.write(s)
!cat ./my_models/model.config
model_config_list {config {
  name: "Australia"
  base_path: "/models/CustomModuleWithOutputName/Australia"
  model_platform: "tensorflow"
}}

Fire Up the Tensorflow Serving Docker

So let’s have a look at the folder which contains our models so far.

!tree my_models/
├── CustomModuleWithOutputName
│   └── Australia
│       └── 0
│           ├── assets
│           ├── saved_model.pb
│           └── variables
│               ├──
│               └── variables.index
└── model.config

5 directories, 4 files
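TF Serving looks for numbered version folders under each base_path, so it can be worth checking the layout before starting the container. Here is a small standard-library sketch; list_servable_versions is a hypothetical helper of mine, and the example builds a throwaway directory that mimics the tree above instead of touching the real models folder.

```python
import tempfile
from pathlib import Path

def list_servable_versions(base_path):
    """Return the numeric version directories that contain a saved_model.pb."""
    versions = []
    for child in Path(base_path).iterdir():
        has_model = (child / "saved_model.pb").is_file()
        if child.is_dir() and child.name.isdigit() and has_model:
            versions.append(int(child.name))
    return sorted(versions)

# Build a throwaway layout that mimics the tree above.
root = Path(tempfile.mkdtemp())
base = root / "CustomModuleWithOutputName" / "Australia"
(base / "0" / "variables").mkdir(parents=True)
(base / "0" / "saved_model.pb").touch()
(base / "notes").mkdir()  # ignored: not a numeric version directory

found_versions = list_servable_versions(base)
print(found_versions)
```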

To serve this model, we will use the Docker image, which you have to pull with

docker pull tensorflow/serving

and we have to run this as follows:

# docker run --rm -p 8501:8501 --mount type=bind,
#                                source=`pwd`/my_models/,
#                                target=/models/ 
#                               -t tensorflow/serving 
#                               --model_config_file=/models/model.config

What is what in the command above?

In short, source is the directory where we saved our models, on local or remote storage. This directory should contain at least one model like the one we created above. target is the directory in the container where the models are stored. model_config_file is self-explanatory, but you must ensure that the file is present in the source directory so that it is mapped to the target directory in the container.

# kill all running containers if needed.
# !docker stop $(docker ps -aq)
# let's run the following in the detached mode with -d.
!docker run --rm -d -p 8501:8501 --mount type=bind,source=`pwd`/my_models/,target=/models/ -t tensorflow/serving --model_config_file=/models/model.config    
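If you prefer to launch the container from a script rather than a notebook cell, the same command can be assembled as an argument list. This is only a sketch: build_serving_command is a hypothetical helper, it reuses the flags shown above, and it only prints the command instead of running it.

```python
import shlex
from pathlib import Path

def build_serving_command(models_dir, port=8501, detached=True):
    """Assemble the docker run command for TF Serving as a list of arguments."""
    source = Path(models_dir).resolve()  # bind mounts need an absolute path
    command = ["docker", "run", "--rm"]
    if detached:
        command.append("-d")
    command += [
        "-p", f"{port}:8501",
        "--mount", f"type=bind,source={source},target=/models/",
        "-t", "tensorflow/serving",
        "--model_config_file=/models/model.config",
    ]
    return command

serving_command = build_serving_command("./my_models")
print(shlex.join(serving_command))
```

The resulting list can be handed to subprocess.run once Docker is available on the machine.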

Make a request

!curl -d '{"instances": [[1.0]]}' -X POST http://localhost:8501/v1/models/Australia:predict
{
    "predictions": [
        {
            "kangourus": [1.0],
            "koalas": [120.0]
        }
    ]
}

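The same request can be prepared from Python with the standard library alone. The sketch below uses two hypothetical helpers, build_predict_request and parse_predictions. Sending the request is left commented out because it needs the running container, so the parsing step is shown on a sample body copied from the response above.

```python
import json
import urllib.request

def build_predict_request(model_name, instances, host="http://localhost:8501"):
    """Build a POST request for the TF Serving REST predict endpoint."""
    url = f"{host}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )

def parse_predictions(response_body):
    """Extract the list of per-instance outputs from a predict response."""
    return json.loads(response_body)["predictions"]

request = build_predict_request("Australia", [[1.0]])
# Sending it requires the container to be running:
#     with urllib.request.urlopen(request) as response:
#         predictions = parse_predictions(response.read())

sample_body = '{"predictions": [{"kangourus": [1.0], "koalas": [120.0]}]}'
predictions = parse_predictions(sample_body)
print(predictions[0]["koalas"])
```

Each entry of predictions is a dictionary keyed by the output names declared in the model signature.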
We are done 🤟. I hope this helps you get into the business of TensorFlow model serving faster. Now you just need to train your deep neural network to save the planet without generating too much CO2.