Implementing FaaS in Kubernetes Using Kubeless
Learn about Function as a Service (FaaS), a new architectural pattern, and how to implement it in Kubernetes.
What is The FaaS Architecture?
Function as a Service (FaaS) is a relatively new architectural pattern. It came into existence when major cloud providers like AWS started offering products such as AWS Lambda, followed by Azure Functions (Microsoft Azure) and Cloud Functions (Google Cloud). The idea behind those products is that sometimes you may not need a service in the “always-on” mode. Rather, you want a “one-off” type of service: it is activated only when a request arrives and then “dies.” If a new request needs fulfillment, a new instance of the service is launched, and so on.
To help you better understand when the FaaS model is suitable, consider the following use cases:
ETL (Extract, Transform, and Load): assume that part of your application is responsible for consuming data from a message queuing system, a social media platform, or even a traditional FTP site. Once the data is obtained, it’s processed and finally saved to a database. The problem is that this data arrives at random intervals (e.g., let’s say your API is retrieving any tweets containing the #globalwarming hashtag). If you deploy a compute instance that runs 24/7 waiting for new tweets, you may incur unnecessary costs, and your infrastructure bill only grows if the data processing needs considerable CPU and memory. A much better approach would be to use FaaS. With this model, as soon as new data arrives from the API, a new function is launched (ideally in a container) to do the required processing. When execution is complete, the container is terminated, releasing any resources it was utilizing. The use of FaaS in ETL systems can be depicted in the following diagram:
Two-factor authentication: you have an awesome web application that receives tons of visitors every day, and because of its increased popularity, you decide to tighten security to ward off hacking attempts. Thus, you implement a two-factor authentication system in which users enter their passwords and then a one-time password (OTP) code that gets sent to their registered cell phones. The problem is that having the web application binary send the SMS is a blocking operation that not only adds latency before the page is displayed to the end-user but also increases the load on the web application server, especially during busy/peak times. A possible solution is to delegate sending the SMS to a function. Once triggered, the function is executed in another container to fulfill the request (i.e., sending the SMS). Having verified the username and password, the web application immediately displays a second page that asks the user to enter the OTP code sent to their cell phone. The following diagram helps explain the advantage of using FaaS in such a scenario:
Serverless programming: some applications of FaaS include building whole web applications without having to manage any servers. For example, a web application made up of static content (HTML, JavaScript, images, CSS, and so on), dynamic content generated by an application server (PHP, Go, Ruby, Python, etc.), and a backend database can be architected using FaaS and serverless programming as follows:
- The request arrives at the web server. Cloud providers like AWS offer managed static-content hosting, for example, AWS S3, so any static files are served directly from the bucket.
- When dynamic content is requested, the static HTML page makes a request (typically through AJAX) to the API gateway, which is the interface between the application and the cloud function.
- The cloud function gets triggered, fulfills the request, and returns the result before it is terminated. Behind the scenes, the cloud provider launches a container in which the function is executed. Once the execution is done, the container dies. We can depict serverless hosting, using AWS as an example, in the following diagram:
What is Kubeless?
Now that you have an understanding of FaaS, and why and when it should be used, it’s time to do a quick hands-on exercise to demonstrate how we can use this model in Kubernetes. Among the well-known tools that make this possible is Kubeless. Kubeless can be thought of as an add-on to Kubernetes: it creates a Custom Resource (and a Controller that handles it), and it also offers a handy command-line tool that lets you easily issue commands. Through the rest of this article, we are going to install Kubeless in our cluster and use it to implement a very simple, minimalistic FaaS model for a web application.
Kubeless Installation
Installation is pretty simple. It only involves issuing three commands against your running Kubernetes cluster:
$ export RELEASE=$(curl -s https://api.github.com/repos/kubeless/kubeless/releases/latest | grep tag_name | cut -d '"' -f 4)
$ kubectl create ns kubeless
$ kubectl create -f https://github.com/kubeless/kubeless/releases/download/$RELEASE/kubeless-$RELEASE.yaml
The above commands create a namespace and deploy the necessary components for kubeless to work. If you’re planning to use kubeless extensively, we strongly recommend that you also install the command-line tool that they provide:
export OS=$(uname -s| tr '[:upper:]' '[:lower:]')
curl -OL https://github.com/kubeless/kubeless/releases/download/$RELEASE/kubeless_$OS-amd64.zip && \
unzip kubeless_$OS-amd64.zip && \
sudo mv bundles/kubeless_$OS-amd64/kubeless /usr/local/bin/
The above commands install the kubeless CLI tool on your Linux/macOS system. If you’re running Windows, you may want to refer to the documentation for specific steps.
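Before going further, it’s worth a quick sanity check that the installation succeeded. The commands below are a minimal sketch that assumes the controller was deployed to the kubeless namespace as shown above; the exact pod names will differ in your cluster, and the version subcommand assumes a reasonably recent CLI release:

$ kubectl get pods -n kubeless          # the Kubeless controller pod(s) should be Running
$ kubectl get crd functions.kubeless.io # the Function Custom Resource Definition should be registered
$ kubeless version                      # confirms the CLI is installed and on your PATH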
LAB: Implementing a Web Scraper In Kubernetes Using FaaS
We’re going to write a very simple web application that accepts a URL from the user and then scrapes the website for important data. Eventually, the data is sanitized and saved to a backend database. Using FaaS, we can launch a function to scrape the data asynchronously so that it does not block the UI for the user. The model can be extended further to call another function that notifies the user that their data has been scraped successfully and is ready for download (perhaps in the form of CSV or PDF).
Our workflow can be detailed as follows:
- The application receives a URL through an AJAX POST request.
- Instead of handling the task by itself and blocking the UI, the web application dispatches a function (through a Kubernetes Service that’s created by Kubeless) to scrape the web page for data.
- Eventually, data can be saved to permanent storage (MySQL, Redis, etc.).
We can draw a simple diagram that describes the application’s workflow as follows:
Now that you know how the system looks, let’s move on to actually building it.
The website we’re using is https://webscraper.io/test-sites/e-commerce/allinone. It’s a website created specifically for testing web scraping tools. Now, let’s go through the lab steps.
Creating The Kubeless Function
You can use a number of runtime environments for the function, which means that you can code the function in your favorite programming language. In this lab, we’re using Python. Our Kubeless function lives in the scraper.py file (the name doesn't matter). The contents of the file are as follows:
from bs4 import BeautifulSoup
import requests
import json

def main(event, context):
    # The URL to scrape arrives in the JSON body of the request
    url = event['data']['url']
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    # Collect the description text of every product card on the page
    items = [x.find("p", {"class": "description"}).text
             for x in soup.find_all("div", {"class": ["col-sm-4", "col-lg-4", "col-md-4"]})]
    return json.dumps(items)
The script simply scrapes the page and returns the descriptions of the items that are displayed for sale. The first thing we need to notice here is the structure of the file:
- The code must exist in a function (the name doesn’t matter) that accepts two arguments: event and context. The event object contains the data passed to the function, while context contains any other metadata related to the function.
- You may use any number of other functions in the file. However, you’ll need to select the bootstrap function that starts code execution (more on that later when we deploy our function).
If you’ve used Python before, you may notice that we are using a couple of non-standard libraries, namely: requests and bs4 (BeautifulSoup). This brings up an important question:
What if the code requires dependencies?
More often than not, you’ll need to install third-party libraries to carry out complex tasks. In our lab, we’re using bs4 for web scraping and requests to easily fetch the web page data. To allow the Kubeless function to use external libraries, you must do two things:
- Install the libraries into the same directory where your function file exists. This can be done easily in Python using pip: pip3 install -t . bs4 requests
- Package the whole directory that contains the function file and the external libraries in a zip file. So, assuming that you’re inside the directory, zip -r9 ~/package.zip * zips all the contents to a file named package.zip in your home directory (the full sequence is sketched below).
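To make these steps concrete, here is a minimal sketch of the packaging workflow, assuming the function lives in scraper.py inside a directory called scraper (the directory name is only illustrative):

cd scraper                      # directory containing scraper.py
pip3 install -t . bs4 requests  # vendor the dependencies next to the function file
zip -r9 ~/package.zip *         # bundle the function and its libraries into one archive
shasum -a 256 ~/package.zip     # checksum used later in the Function manifest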
Now, you can simply deploy the zip file as the function file. However, etcd - by design - cannot store objects that are more than 1.5 MB in size. This means that we cannot store our function in the Kubernetes database if the zip file is large (which is often the case). The solution for this issue is to store the file on a file server and provide a URL to the file location. In our lab, we used GitHub for this purpose but you can use any HTTP server as long as it’s reachable from the pod. Now, let’s see how we can deploy our function.
Deploying The Kubeless Function
As is the case with other Kubernetes objects, you can deploy Kubeless functions either imperatively using the kubeless tool, or declaratively using a YAML file. In this lab, we are using the declarative way since we can put the YAML file under version control. Our file, function.yml looks as follows:
apiVersion: kubeless.io/v1beta1
kind: Function
metadata:
  name: scraper
  namespace: default
  labels:
    created-by: kubeless
    function: scraper
spec:
  runtime: python3.7
  timeout: "180"
  handler: scraper.main
  deps: ""
  checksum: sha256:8c1136a7ecf95aef19c7565b9acb3977645fe98d1f877dd2397aa6455673805e
  function-content-type: url+zip
  function: https://github.com/MagalixCorp/k8s-faas/blob/master/packagae.zip?raw=true
Let’s have a look at the important parts of this definition:
- The file creates an object of type Function. This is a Custom Resource that was created for us when we installed Kubeless.
- The spec part contains a number of interesting features:
- The runtime: the environment in which the function will operate. Technically, this defines the image that the function’s container will use. In this lab, we’re using Python 3.7. Kubeless supports a number of runtimes, such as Go, Python, Ruby, and Java, among others. You can have a look at the full list here: https://kubeless.io/docs/runtimes/
- The timeout specifies the maximum amount of time (in seconds) the function is allowed to run. If the execution time exceeds this number, the function gets terminated prematurely. This comes in handy if the code runs into issues (for example, infinite loops) and is wasting precious system resources.
- The handler is where you tell Kubeless how to execute the function. It contains the file name followed by the function’s name separated by a dot. So, if you have many files in the zip file and many functions in the code file, this is where you instruct Kubeless as to how to find the entry point for the program.
- The checksum is a SHA256 hash string for the function’s file. It ensures that the file that was downloaded is indeed the file that should get executed and that it hasn’t been tampered with. If you enter an incorrect hash value, the pod won’t start. So, how do you calculate the hash string for the file? Simply execute the following command: shasum -a 256 ~/package.zip. Notice that any changes to the zip file contents will require you to recalculate and update the hash value.
- The function-content-type is url+zip to indicate that we need to use a compressed file that should be downloaded from a remote location. Other possible values include text and base64 (you can optionally add +zip to any of the formats to indicate that the file is compressed).
- The function: the source file of the function.
Now, we can apply this definition the same way we do with any other Kubernetes object:
kubectl apply -f function.yml
Once deployed, you will notice that we have a new pod created for us as well as a service, both with the name scraper:
$ kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
scraper-dfc478fd6-n8w6c   1/1     Running   0          3h37m
$ kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
kubernetes   ClusterIP   10.12.0.1    <none>        443/TCP    11h
scraper      ClusterIP   10.12.5.46   <none>        8080/TCP   9h
Notice that the Service Type is ClusterIP, which means that it can only be called from another pod inside the cluster. Kubeless, however, supports adding an Ingress to the Service so that it’s accessible from the outside world. Please refer to the Kubeless documentation for more information on how to do this.
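As a rough sketch of that approach, the kubeless CLI can create an HTTP trigger that exposes the function through an Ingress object. The hostname below is only a placeholder, and the command assumes an Ingress controller is already running in the cluster:

$ kubeless trigger http create scraper-http --function-name scraper --hostname scraper.example.com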
Before going any further, we need to ensure that our function is working. Since it can only be called from inside the cluster, we have two options:
- Use a tool like kubectl proxy to access the cluster’s internal network and issue a raw HTTP POST request to the Service through a tool like cURL.
- Use the kubeless command-line tool (see the example below).
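The following is a minimal sketch of both options. The service port name and the exact proxy URL may vary slightly between Kubeless versions, so treat them as assumptions rather than exact values:

# Option 1: reach the Service through kubectl proxy and issue a raw HTTP POST request
$ kubectl proxy --port 8001 &
$ curl -L --data '{"url": "https://webscraper.io/test-sites/e-commerce/allinone"}' \
    --header "Content-Type: application/json" \
    http://localhost:8001/api/v1/namespaces/default/services/scraper:http-function-port/proxy/

# Option 2: let the kubeless CLI wrap the same HTTP POST for you
$ kubeless function call scraper --data '{"url": "https://webscraper.io/test-sites/e-commerce/allinone"}'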
The data is passed to the function in JSON format. You can access the data sent to the function by examining the event['data'] object. So, now that we’re confident that our function responds to HTTP POST requests and returns the expected response, let’s add the wrapper web application.
Building The Web Application And Wrapping Up
The web application we’re using is a slightly modified version of the one we used in our sidecar pattern article. It basically has the following components:
- An HTML page served by a web server (Nginx). The page contains an input box and a submit button that triggers an AJAX request to the backend Flask application when clicked. The user is supposed to enter the URL to be scraped in the box and click “submit.”
- The backend application is running Python Flask and Gunicorn. Once it receives the URL from the frontend page, it establishes a POST request to the scraper service which, in turn, triggers the Kubeless function.
The complete working project can be found at our GitHub repo https://github.com/MagalixCorp/k8s-faas. Now, let’s go through the application files:
The frontend (web)
- index.html is where the user submits the URL.
- script.js: contains the AJAX logic that sends the data to the backend API.
- default.conf: the Nginx configuration that routes requests to /api/ to the backend service.
- Dockerfile to build the web image.
The backend (app)
- mainapp.py receives the request from the frontend and relays it as a POST request to the scraper service (a minimal sketch of this relay logic follows this list).
- wswgi.py activates the Gunicorn service that acts as a wrapper for Flask.
- Dockerfile to build the app image.
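For illustration, here is a minimal sketch of what the relay logic in mainapp.py could look like. The route name and port are assumptions rather than a verbatim copy of the repository code; the point is simply to show the backend forwarding the submitted URL to the scraper Service created by Kubeless:

from flask import Flask, request
import requests

app = Flask(__name__)

# The Service created by Kubeless for our function (default namespace, port 8080)
SCRAPER_URL = "http://scraper.default.svc.cluster.local:8080"

@app.route("/scrape", methods=["POST"])  # assumed route; the repo may use a different path
def scrape():
    # Grab the URL submitted by the frontend AJAX call
    url = request.get_json().get("url")
    # Relay it to the Kubeless function as a JSON POST request
    resp = requests.post(SCRAPER_URL, json={"url": url})
    # The function returns a JSON-encoded list of scraped items
    return resp.text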
Kubernetes deployment (deploy.yml)
The file contains a Service of type NodePort and a Deployment. The Deployment creates a pod that hosts two containers: the app and the web. Notice the sidecar pattern used here, where Nginx relays backend requests to the API server by calling localhost, as if both services were running on the same host.
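As a rough sketch of that layout (not the exact contents of deploy.yml in the repository), the manifest could look something like the following. The names, image references, and container ports are assumptions; the NodePort value matches the one used in the next section:

apiVersion: v1
kind: Service
metadata:
  name: scraper-frontend            # assumed name
spec:
  type: NodePort
  selector:
    app: scraper-frontend
  ports:
    - port: 80
      targetPort: 80
      nodePort: 32001               # the port used to reach the app in the next section
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scraper-frontend
spec:
  replicas: 1
  selector:
    matchLabels:
      app: scraper-frontend
  template:
    metadata:
      labels:
        app: scraper-frontend
    spec:
      containers:
        - name: web                        # Nginx: serves the static page and proxies /api/ to localhost
          image: magalixcorp/scraper-web   # assumed image name
          ports:
            - containerPort: 80
        - name: app                        # Flask/Gunicorn backend listening on localhost
          image: magalixcorp/scraper-app   # assumed image name
          ports:
            - containerPort: 8000          # assumed Gunicorn port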
Running The Lab
The final step here is to test our work. Navigate to any of your cluster’s node IP addresses on port 32001. Enter https://webscraper.io/test-sites/e-commerce/allinone in the URL box and click submit.
Because our function executes the task so quickly, we can see the results it returns by inspecting the network tab of our browser’s developer tools. In a real-world scenario, the user would just receive a nice “Thank you” message while the function starts its execution journey asynchronously. When execution is done, there should be some sort of alerting logic to notify clients that their data is ready. The function can store the results in permanent storage where the user can view them via another page.
TL;DR
- Through the FaaS (Function as a Service) pattern, you can execute code on an “on-demand” basis. For example, you can have the client-facing part of a web application perform validation and enforce business-logic rules while handing the heavy lifting to functions.
- All major cloud providers have FaaS already implemented through an offering (for example, AWS Lambda functions).
- You can implement your own FaaS logic in your Kubernetes cluster by using tools like Kubeless.
- Kubeless works by creating a function in one of the supported runtime environments and deploying it to the cluster.
- The function receives the invocation request through a POST request sent to the Service dedicated to the function’s pod.
- Once received, the function is triggered. You can optionally pass data to the function in JSON format.