Endri Veizaj

Serving Dog Breed Classification model with Seldon-Core, TensorFlow Serving and Streamlit




In a modern Machine Learning workflow, after figuring out the best performing model, the next step is to bring it to the users. At Data Max we believe:

No users, no party!

In this post, we are going to write about how to bring a transfer learning model into production using TensorFlow, Kubeflow, Seldon-Core, TensorFlow Serving, and Streamlit.


For data scientists and ML engineers, putting a model into production used to be challenging. Nowadays, however, a lot of tooling has been developed around it.

This post is a step-by-step guide on how to train and deploy a machine learning model with Docker and then Kubernetes.


Problems to be tackled:

  • Training models at scale

  • Serving the models

  • Creating a web UI for our models


We will start by briefly describing the tools we have used for this post.


What is Helm?


Helm is a package manager for Kubernetes. It deploys packaged applications to Kubernetes and structures them into charts. In our case, it will help us deploy all of our needed components using Helm charts.


What is a Helm chart?


A Helm chart is simply a set of Kubernetes YAML manifests combined into a single package that can be installed in your Kubernetes cluster. A single chart might be used to deploy an application into your Kubernetes cluster.


What is TensorFlow?


TensorFlow is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks.


What is Seldon-Core?


Seldon-Core is used to serve models built in any open-source or commercial model-building framework. You can make use of powerful Kubernetes features like custom resource definitions to manage model graphs, and then connect your continuous integration and deployment (CI/CD) tools to scale and update your deployment. It also provides out-of-the-box model monitoring and performance monitoring capabilities.

What is TensorFlow Serving?

TensorFlow Serving is a flexible, high-performance model serving system. It is designed for production environments. TensorFlow Serving makes it easy to deploy new algorithms and experiments while keeping the same server architecture and APIs.


What is Streamlit?


Streamlit is an open-source app framework for the Python language. It helps create a web user interface for data science and machine learning apps in a short time. Streamlit allows you to write an app the same way you write Python code, and it makes the interactive loop of coding and viewing results in the web app seamless.


What is Kubeflow?


Kubeflow is a platform for data scientists who want to build and experiment with ML pipelines. Kubeflow is also for ML engineers and operational teams who want to deploy ML systems to various environments for development, testing, and production-level serving.


Create a CNN (ResNet-50) to Classify Dog Breeds (using Transfer Learning)


To reduce training time without sacrificing accuracy, we train a CNN using Transfer Learning. Transfer Learning takes a network that was pre-trained on a large dataset and fine-tunes it with new classification layers. In our case, we use a pre-trained ResNet50, chop off the final dense part of the model, and add a fully connected layer with the output that we want (a 133-node classifier, one node per dog breed). After the model is trained, we save both the model and the labels.
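Below is a minimal sketch of this transfer-learning setup, assuming 224x224 RGB inputs and 133 breed classes; the exact architecture and training details of the project may differ.

```python
# Minimal transfer-learning sketch with a pre-trained ResNet50 (assumed setup).
import tensorflow as tf

base = tf.keras.applications.ResNet50(
    weights="imagenet",          # start from ImageNet pre-trained weights
    include_top=False,           # drop the original dense classification head
    input_shape=(224, 224, 3),
)
base.trainable = False           # freeze the pre-trained convolutional features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(133, activation="softmax"),  # one node per dog breed
])

model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=5)

# Export as a SavedModel so that TensorFlow Serving (and Seldon-Core) can load it
model.save("models/dog_breed/1")
```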


Developing with Docker Compose


Docker Compose is a tool that was developed to help define and share multi-container applications. With Docker Compose, we can create a YAML file to define the services and, with a single command, spin everything up or tear it all down.


First, we need to define our app's environment with a Dockerfile so it can be reproduced anywhere. For our first Docker Compose file, we need two Dockerfiles, one for Seldon-Core and one for the Streamlit application. 


We have also created a second Docker Compose file which replaces Seldon-Core with TensorFlow Serving as it provides out-of-the-box integration with TensorFlow models.


Dockerfile for Seldon-Core:

For Seldon-Core, the FROM instruction specifies the parent image from which we are building; in our case, a Python image. We copy the dependencies listed in the requirements.txt file and install them using the COPY and RUN instructions. We also need to copy our local application files. The EXPOSE instruction informs Docker that the container listens on the specified network ports at runtime. Finally, we need to keep the Seldon service up and running, which is done with the CMD instruction.
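A minimal, hypothetical version of this Dockerfile could look like the following; the model class name (DogBreedClassifier) and file names are assumptions, and the exact port and flags depend on the Seldon-Core version.

```dockerfile
# Hypothetical Dockerfile for the Seldon-Core wrapper (a sketch, not the project's exact file)
FROM python:3.8

WORKDIR /app

# Install the Python dependencies, including the seldon-core package
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy the local application files (model wrapper class, labels, etc.)
COPY . .

# Port the Seldon Python wrapper listens on (may differ between Seldon-Core versions)
EXPOSE 9000

# Keep the Seldon microservice up and running, serving the model class
CMD ["seldon-core-microservice", "DogBreedClassifier", "--service-type", "MODEL"]
```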


Dockerfile for Streamlit:

For Streamlit, Python is also used as the parent image. The EXPOSE instruction informs Docker that the container listens on port 8502. We copy the dependencies listed in the requirements-streamlit.txt file and install them using the COPY and RUN instructions. The CMD instruction runs the app on port 8502. We use port 8502 because Streamlit's default port, 8501, is already used by TensorFlow Serving.
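A hypothetical sketch of the Streamlit Dockerfile, assuming the app's entry point is a file called app.py:

```dockerfile
# Hypothetical Dockerfile for the Streamlit UI (a sketch, not the project's exact file)
FROM python:3.8

EXPOSE 8502

WORKDIR /app

# Install the Streamlit app dependencies
COPY requirements-streamlit.txt .
RUN pip install -r requirements-streamlit.txt

# Copy the local application files
COPY . .

# Run the app on 8502, since Streamlit's default port 8501 is taken by TensorFlow Serving
CMD ["streamlit", "run", "app.py", "--server.port", "8502"]
```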


docker-compose-seldon.yaml is the Docker Compose file used to deploy the Seldon-Core and Streamlit images built from the Dockerfiles mentioned above.


docker-compose-tfserve.yaml is the Docker Compose file used to deploy the TensorFlow Serving and Streamlit images. One of the easiest ways to get started with TensorFlow Serving is with Docker. In this Docker Compose file, we deploy the tensorflow/serving image from Docker Hub together with the Streamlit image, which is built using its Dockerfile.
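For illustration, a docker-compose-tfserve.yaml of this kind might look as follows; the model name, paths, and Dockerfile names are assumptions:

```yaml
# Hypothetical docker-compose-tfserve.yaml (a sketch, not the project's exact file)
version: "3"
services:
  tfserve:
    image: tensorflow/serving
    environment:
      - MODEL_NAME=dog_breed            # TensorFlow Serving loads /models/<MODEL_NAME>
    volumes:
      - ./models/dog_breed:/models/dog_breed   # SavedModel exported during training
    ports:
      - "8501:8501"                     # TensorFlow Serving REST API
  streamlit:
    build:
      context: .
      dockerfile: Dockerfile.streamlit  # the Streamlit Dockerfile described above
    ports:
      - "8502:8502"
    depends_on:
      - tfserve
```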


The project structure with Docker Compose:


Docker Compose using Seldon-Core as a model server (Right diagram)

  1. The model is trained and saved locally.

  2. Docker Compose commands start and run the entire app in an isolated environment.

  3. Users can access the Streamlit UI at http://localhost:8502 and upload a photo.

  4. Streamlit then sends the request to Seldon-Core.

  5. Seldon-Core will predict the dog breed and send the response back to Streamlit.


Docker Compose using TensorFlow Serving as a model server (Left diagram)

  1. The model is trained and saved locally.

  2. Docker Compose commands start and run the entire app in an isolated environment.

  3. Users can access the Streamlit UI at http://localhost:8502 and upload a photo.

  4. Streamlit then sends the request to TensorFlow Serving.

  5. TensorFlow Serving will predict the dog breed and send the response back to Streamlit (a sketch of this request/response exchange is shown below).
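The snippet below sketches how the Streamlit app could send such a request; the URLs, model name, and preprocessing are assumptions for illustration (TensorFlow Serving expects an "instances" payload, while Seldon-Core's REST API expects a "data"/"ndarray" payload).

```python
# Hypothetical Streamlit-side request to the model servers (a sketch, not the project's exact code)
import os

import numpy as np
import requests
import streamlit as st
from PIL import Image

TFSERVE_URL = os.getenv("TFSERVE_URL", "http://tfserve:8501/v1/models/dog_breed:predict")
SELDON_URL = os.getenv("SELDON_URL", "http://seldon:9000/api/v1.0/predictions")

st.title("Dog Breed Classifier")
uploaded = st.file_uploader("Upload a dog photo", type=["jpg", "jpeg", "png"])

if uploaded is not None and st.button("Predict"):
    # Resize to the assumed ResNet50 input size and scale pixel values to [0, 1]
    image = Image.open(uploaded).convert("RGB").resize((224, 224))
    batch = (np.asarray(image, dtype=np.float32) / 255.0)[np.newaxis, ...]

    # TensorFlow Serving REST API: {"instances": [...]}
    response = requests.post(TFSERVE_URL, json={"instances": batch.tolist()})
    # Seldon-Core REST API alternative: {"data": {"ndarray": [...]}}
    # response = requests.post(SELDON_URL, json={"data": {"ndarray": batch.tolist()}})

    st.write(response.json())
```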



Developing in Kubernetes


Kubernetes, also known as 'K8s', is a container orchestration framework. It can run in almost any environment, be it public cloud, private cloud, or on-premises. We will be using minikube to create a local Kubernetes cluster.


First off, we start the minikube cluster, then build the previously mentioned Docker images and load them into the minikube registry.
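Typical commands for this step look roughly like the following (the image names and Dockerfile names are assumptions):

```sh
# Start a local Kubernetes cluster
minikube start

# Build the images described above...
docker build -t seldon-dog-breed:latest -f Dockerfile.seldon .
docker build -t streamlit-dog-breed:latest -f Dockerfile.streamlit .

# ...and load them into the minikube registry so the cluster can use them
minikube image load seldon-dog-breed:latest
minikube image load streamlit-dog-breed:latest
```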


In the Docker Compose part, we didn't need a Kubeflow pipeline because the model was trained and saved locally. With Kubernetes, however, we use Kubeflow as the pipeline orchestrator. Through a Kubeflow pipeline, we train the model and save it inside the minikube cluster, so all the deployments will be able to access the model.
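As a rough illustration, a Kubeflow pipeline for the training step could be defined with the KFP v1 SDK along these lines; the training image, script, and output path are assumptions, not the project's exact pipeline:

```python
# Hypothetical Kubeflow pipeline sketch (KFP v1 SDK)
import kfp
from kfp import dsl


@dsl.pipeline(
    name="dog-breed-training",
    description="Train the ResNet50 transfer-learning model and save it inside the cluster",
)
def train_pipeline():
    dsl.ContainerOp(
        name="train-dog-breed-model",
        image="dog-breed-train:latest",                 # hypothetical training image
        command=["python", "train.py"],
        arguments=["--output-dir", "/mnt/models/dog_breed"],
    )


if __name__ == "__main__":
    # Submit a run to the Kubeflow Pipelines instance running in the cluster
    kfp.Client().create_run_from_pipeline_func(train_pipeline, arguments={})
```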




To create all K8s resources, we use Helm charts, which are collections of files that describe a related set of K8s resources. In these charts, we define the replicas, the image for each deployment, the environment variables that make it possible to choose the model deployment method, and the volumes. Volumes specify the location of the model and make a copy of it available inside the pod.


helmfile-seldon.yaml is the helmfile that deploys Seldon-Core, Emissary-ingress, and Streamlit.

helmfile-tfserve.yaml is the helmfile that deploys TensorFlow Serving and Streamlit.
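For illustration, a helmfile of this kind might look as follows (chart paths and release names are assumptions):

```yaml
# Hypothetical helmfile-tfserve.yaml (a sketch, not the project's exact file)
releases:
  - name: tfserve
    chart: ./charts/tfserve
    values:
      - ./charts/tfserve/values.yaml
  - name: streamlit
    chart: ./charts/streamlit
    values:
      - ./charts/streamlit/values.yaml
```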


Helm chart example of Streamlit:


- values.yaml : In this file, we specify the image that we are using, the volumes, and the environment variables (a minimal sketch is shown after this list).


- Chart.yaml : Description of the charts.


- templates folder: The manifest file for the deployment and service.
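As an illustration, a minimal values.yaml for the Streamlit chart could look roughly like this; the image name, environment variable, and volume path are assumptions, not the project's exact values:

```yaml
# Hypothetical values.yaml for the Streamlit chart (a sketch)
replicaCount: 1

image:
  repository: streamlit-dog-breed
  tag: latest
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 8502

env:
  # Lets us switch between the Seldon-Core and TensorFlow Serving backends
  MODEL_SERVER_URL: http://tfserve:8501/v1/models/dog_breed:predict

volumes:
  - name: model-volume
    hostPath:
      path: /data/models   # location of the trained model inside minikube
```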




For TensorFlow Serving and Seldon-Core, we use Emissary-ingress (formerly known as Ambassador) in Kubernetes. Emissary-ingress is an open-source ingress controller and API gateway. Since Seldon-Core only supports Ambassador v1, we deploy Ambassador v1 for Seldon-Core, while for TensorFlow Serving we use the latest version, 3.0 at the time of this writing. Emissary-ingress provides us with a standard architecture for managing the flow of ingress traffic to the services in Kubernetes.
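For example, routing the /streamlit/ prefix to the Streamlit service with a v3 Emissary-ingress Mapping could look like this (resource and service names are assumptions):

```yaml
# Hypothetical Emissary-ingress Mapping for the Streamlit service (a sketch)
apiVersion: getambassador.io/v3alpha1
kind: Mapping
metadata:
  name: streamlit-mapping
spec:
  hostname: "*"
  prefix: /streamlit/
  service: streamlit:8502
```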



The project structure in Kubernetes:


Kubernetes using Seldon-Core as a model server (Right diagram)

  1. The model is trained using Kubeflow and saved in minikube.

  2. Helm charts create K8s resources for the entire application.

  3. Users can access the Streamlit UI at http://localhost:8080/streamlit/ and upload a photo.

  4. Streamlit then sends the request to Seldon-Core.

  5. Seldon-Core will predict the dog breed and send the response back to Streamlit.


Kubernetes using TensorFlow Serving as a model server (Left diagram)

  1. The model is trained using Kubeflow and saved in minikube.

  2. Helm charts create K8s resources for the entire application.

  3. Users can access the Streamlit UI at http://localhost:8080/streamlit/ and upload a photo.

  4. Streamlit then sends the request to TensorFlow Serving.

  5. TensorFlow Serving will predict the dog breed and send the response back to Streamlit.





After everything has run successfully, open the Streamlit UI, go to the Predict tab, upload a photo, and then click the Predict button. Enjoy predicting!


