What is Zipkin and how does it work?

Jan 28, 2022

Zipkin is a distributed tracing project that originated at Twitter in 2010 and is based on Google's Dapper paper. Observing a system from different angles is critical when troubleshooting, especially when the system is complex and distributed.

This post introduces Zipkin and shows how to install it on RHEL, enable it in your application, and configure Elasticsearch for storage.

Zipkin helps gather the timing data needed to troubleshoot latency problems in service architectures. Its features include both the collection and the lookup of this data. It also helps you find out exactly where a request to the application spends most of its time. Whether it's an internal call inside the code or an internal or external API call to another service, you can instrument the system to share a context. Microservices usually share context by correlating requests with a unique ID.
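To make the idea of a shared context concrete, here is a minimal shell sketch of how a caller might mint correlation IDs and attach them to an outgoing request. It assumes the B3 header names used by Zipkin-compatible tracers (X-B3-TraceId, X-B3-SpanId, X-B3-ParentSpanId, X-B3-Sampled). The helper names and the downstream URL in the comment are hypothetical, and in practice an instrumentation library does this work for you.

```shell
# Sketch only: function names are illustrative, not part of Zipkin.

# Print n random bytes as lowercase hex (2 hex characters per byte).
rand_hex() {
  od -An -N"$1" -tx1 /dev/urandom | tr -d ' \n'
}

# B3 trace IDs are 128-bit (or 64-bit) and span IDs are 64-bit, hex-encoded.
new_trace_id() { rand_hex 16; }
new_span_id()  { rand_hex 8; }

# Emit the headers a caller would attach to an outgoing request.
b3_headers() {
  printf 'X-B3-TraceId: %s\n' "$1"
  printf 'X-B3-SpanId: %s\n' "$2"
  printf 'X-B3-Sampled: 1\n'
}

trace_id=$(new_trace_id)
span_id=$(new_span_id)
b3_headers "$trace_id" "$span_id"

# A downstream call keeps the trace ID and gets a fresh span ID, for example:
#   curl -H "X-B3-TraceId: $trace_id" -H "X-B3-SpanId: $(new_span_id)" \
#        -H "X-B3-ParentSpanId: $span_id" -H "X-B3-Sampled: 1" \
#        http://localhost:8080/foo
```

Because every hop forwards the same trace ID, Zipkin can later stitch the spans from different services into one trace.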

Here’s an example sequence of HTTP tracing where the user code calls the resource /foo. This results in a single span, sent asynchronously to Zipkin after the user code receives the HTTP response.

Trace instrumentation reports spans asynchronously so that delays or failures in the tracing system cannot delay or break user code.
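The reporting step can be sketched by hand to show what the instrumentation sends. The snippet below builds one span in Zipkin's v2 JSON format and hands it to the server's POST /api/v2/spans endpoint from a background subshell. The helper names, the service name user-code, and the fixed example IDs are assumptions made for illustration. Real tracers batch spans and use generated IDs.

```shell
# Sketch only: helper names and the service name are illustrative.
ZIPKIN_URL="${ZIPKIN_URL:-http://localhost:9411}"

# Build a one-element JSON array in the Zipkin v2 span format.
# Args: trace_id span_id name start_micros duration_micros service_name
build_span_json() {
  printf '[{"traceId":"%s","id":"%s","name":"%s","kind":"CLIENT","timestamp":%s,"duration":%s,"localEndpoint":{"serviceName":"%s"}}]' \
    "$1" "$2" "$3" "$4" "$5" "$6"
}

# POST the span from a background subshell so the caller never waits on
# Zipkin, and swallow errors so a tracing outage cannot break the caller.
report_span_async() {
  ( curl -s -o /dev/null --max-time 2 -X POST "$ZIPKIN_URL/api/v2/spans" \
      -H 'Content-Type: application/json' -d "$1" || true ) &
}

# Example values: timestamps and durations are in microseconds.
start_micros=$(( $(date +%s) * 1000000 ))
span=$(build_span_json 463ac35c9f6413ad48485a3953bb6124 a2fb4a1d1a96d312 \
  "get /foo" "$start_micros" 25000 user-code)
echo "$span"

# After the HTTP response has been handled, hand the span off:
#   report_span_async "$span"
```

The short timeout and the ignored exit status are the point of the design: the user code's outcome never depends on whether Zipkin was reachable.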

To install Zipkin on RHEL:

We'll run the Zipkin tracing system using either of the following two options:

Using Java (jar file)
Running in Docker Container

Install Zipkin Using Docker

Prerequisite: Docker.
Once Docker is installed, run the following command to start Zipkin:
docker run -d -p 9411:9411 openzipkin/zipkin

Installing Zipkin Using Java

Install Java by running:

sudo yum -y install epel-release

Run the next two commands only if Java is not already installed:

sudo yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel jq vim
sudo alternatives --config java

To check that Java has been installed, run the following command:

java -version

After installing the prerequisites, download the latest release of Zipkin with the quickstart script:

curl -sSL https://zipkin.io/quickstart.sh | bash -s

Run the downloaded jar to start Zipkin:

java -jar zipkin.jar

Configure systemd:
A Zipkin process started with the java -jar command will not survive a system reboot. If your system supports systemd, you can create a service to manage the application.
First, move the jar file to a directory under /opt:

sudo mkdir /opt/zipkin
sudo mv zipkin.jar /opt/zipkin
ls /opt/zipkin

Next, create a system group and user for Zipkin, and give them ownership of the directory:

sudo groupadd -r zipkin
sudo useradd -r -s /bin/false -g zipkin zipkin
sudo chown -R zipkin:zipkin /opt/zipkin

Then create a systemd service file.

sudo vim /etc/systemd/system/zipkin.service

Paste the following Zipkin service definition into the file:

 [Unit]
 Description=Manage Java service
 Documentation=https://zipkin.io/
 [Service]
 WorkingDirectory=/opt/zipkin
 ExecStart=/usr/bin/java -jar zipkin.jar
 User=zipkin
 Group=zipkin
 Type=simple
 Restart=on-failure
 RestartSec=10
 [Install]
 WantedBy=multi-user.target

Next, reload the systemd daemon so the new unit file takes effect.

sudo systemctl daemon-reload

Then enable the service so it starts at boot, and start it now:

sudo systemctl enable zipkin.service
sudo systemctl start zipkin.service

Check the status by typing:

sudo systemctl status zipkin.service
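systemd reports the process as active as soon as the JVM launches, which can be a few seconds before Zipkin accepts requests. As a hedged sketch, the helper below polls the server's /health endpoint until it responds. The function name wait_for_zipkin and its default retry values are assumptions made for illustration.

```shell
# Sketch only: wait_for_zipkin is a hypothetical helper, not a Zipkin command.
# Polls the /health endpoint until it answers or the attempts run out.
# Args: base_url [max_attempts] [sleep_seconds]
wait_for_zipkin() {
  url="$1/health"
  attempts="${2:-30}"
  delay="${3:-2}"
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl return non-zero on HTTP errors such as 503.
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      echo "Zipkin is up after $i attempt(s)"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "Zipkin did not become healthy at $url" >&2
  return 1
}

# Usage, after starting the service:
#   wait_for_zipkin http://localhost:9411 30 2
```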

To enable Zipkin in the microservice:

The first change is in build.gradle, where we add the Spring Cloud dependencies for both Sleuth and Zipkin.

implementation 'org.springframework.cloud:spring-cloud-sleuth-zipkin'
implementation 'org.springframework.cloud:spring-cloud-starter-sleuth'

The second change is to add the Zipkin URL to the application's configuration so that Spring publishes data to Zipkin. The line below uses application.properties syntax; in application.yml, nest the same keys.

spring.zipkin.baseUrl=http://localhost:9411/

Every service that needs distributed tracing requires these two changes.

Zipkin provides a web interface for viewing traces by service, time, and annotations. Browse to http://localhost:9411/zipkin/ to access the Zipkin web UI and find traces.

In a trace, you can see how much time a request spent in each instrumented call, whether that is an internal call inside the code or an API call to another service. For example, you might find that the problem lies in the call to one microservice and focus on reducing latency in that service first. You can also use traces to understand the workflow of a request. What if you're calling a dependency more than once? With Zipkin, it's easy to spot those types of issues.
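The same lookup is available from the command line through the server's v2 HTTP API. The sketch below builds a GET /api/v2/traces query and ranks the returned spans by duration. The helper names and the service name my-service are hypothetical, and jq is assumed to be installed.

```shell
# Sketch only: helper names are illustrative; jq is assumed to be installed.
ZIPKIN_URL="${ZIPKIN_URL:-http://localhost:9411}"

# Build a query URL for Zipkin's v2 trace search.
# Args: service_name [limit] [min_duration_micros]
traces_url() {
  url="$ZIPKIN_URL/api/v2/traces?serviceName=$1&limit=${2:-10}"
  if [ -n "${3:-}" ]; then
    url="$url&minDuration=$3"
  fi
  printf '%s\n' "$url"
}

# Read the search result (an array of traces, each an array of spans) on
# stdin and print the slowest spans as: duration_micros service span_name
slowest_spans() {
  jq -r '.[][] | "\(.duration // 0) \(.localEndpoint.serviceName // "unknown") \(.name // "unknown")"' \
    | sort -rn | head -n "${1:-5}"
}

traces_url my-service 10 100000

# Usage against a running server:
#   curl -s "$(traces_url my-service 10 100000)" | slowest_spans 5
```

Filtering with minDuration (in microseconds) keeps the result set small when you only care about slow requests.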

To check that the Zipkin service is working on the RHEL host:
curl localhost:9411/zipkin/
You should receive output similar to the following:

<!doctype html><html><head><base href="/zipkin/"><meta charset="utf-8"/><link rel="icon" href="./favicon.ico"/><link href="./static/css/2.287eba14.chunk.css" rel="stylesheet"><link href="./static/css/main.dc336df7.chunk.css" rel="stylesheet"></head><body><noscript>You need to enable JavaScript to run this app.</noscript><div id="root"></div><script>!function(e){function r(r){for(var n,i,l=r[0],p=r[1],f=r[2],c=0,s=[];c<l.length;c++)i=l[c],Object.prototype.hasOwnProperty.call(o,i)&&o[i]&&s.push(o[i][0]),o[i]=0;for(n in p)Object.prototype.hasOwnProperty.call(p,n)&&(e[n]=p[n]);for(a&&a(r);s.length;)s.shift()();return u.push.apply(u,f||[]),t()}function t(){for(var e,r=0;r<u.length;r++){for(var t=u[r],n=!0,l=1;l<t.length;l++){var p=t[l];0!==o[p]&&(n=!1)}n&&(u.splice(r--,1),e=i(i.s=t[0]))}return e}var n={},o={1:0},u=[];function i(r){if(n[r])return n[r].exports;var t=n[r]={i:r,l:!1,exports:{}};return e[r].call(t.exports,t,t.exports,i),t.l=!0,t.exports}i.m=e,i.c=n,i.d=function(e,r,t){i.o(e,r)||Object.defineProperty(e,r,{enumerable:!0,get:t})},i.r=function(e){"undefined"!=typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(e,"__esModule",{value:!0})},i.t=function(e,r){if(1&r&&(e=i(e)),8&r)return e;if(4&r&&"object"==typeof e&&e&&e.__esModule)return e;var t=Object.create(null);if(i.r(t),Object.defineProperty(t,"default",{enumerable:!0,value:e}),2&r&&"string"!=typeof e)for(var n in e)i.d(t,n,function(r){return e[r]}.bind(null,n));return t},i.n=function(e){var r=e&&e.__esModule?function(){return e.default}:function(){return e};return i.d(r,"a",r),r},i.o=function(e,r){return Object.prototype.hasOwnProperty.call(e,r)},i.p="./";var l=this["webpackJsonpzipkin-lens"]=this["webpackJsonpzipkin-lens"]||[],p=l.push.bind(l);l.push=r,l=l.slice();for(var f=0;f<l.length;f++)r(l[f]);var a=p;t()}([])</script><script src="./static/js/2.5dabcfd5.chunk.js"></script><script 
src="./static/js/main.1f5ed6e2.chunk.js"></script></body></html>

To check the UI from your local machine, you can open an SSH tunnel to the server:
ssh -L 9411:localhost:9411 raj@34.69.77.64

Then open http://localhost:9411/zipkin/ in your browser. You should see the Zipkin UI.

Configure Elasticsearch as the storage type:

The Zipkin server bundles extensions for span collection and storage. Spans can be collected over HTTP, Kafka, or RabbitMQ transports. By default they are kept in memory, which does not survive a restart, so for long-term retention of trace data you can store them in Elasticsearch.
You can specify the storage configuration when starting the Zipkin jar:

java -DSTORAGE_TYPE=elasticsearch -DES_HOSTS=http://127.0.0.1:9200 -jar zipkin.jar

Alternatively, when running Zipkin in Docker, you can put the following configuration in an env file:

STORAGE_TYPE=elasticsearch
ES_HOSTS=elastic:9200

Then pass the env file to docker run:

docker run -d --env-file=/home/zipkin.env -p 9411:9411 openzipkin/zipkin
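Once Zipkin has received a few spans, you can confirm that they are reaching Elasticsearch by listing the indices Zipkin created. This sketch assumes Zipkin's default index prefix, zipkin, and uses Elasticsearch's _cat/indices API. The helper name es_indices_url and the ES_HOST variable are hypothetical and separate from Zipkin's own ES_HOSTS setting.

```shell
# Sketch only: es_indices_url and ES_HOST are illustrative names.
ES_HOST="${ES_HOST:-http://127.0.0.1:9200}"

# URL that lists indices whose names start with the given prefix,
# using Elasticsearch's _cat/indices API (?v adds a header row).
es_indices_url() {
  printf '%s/_cat/indices/%s*?v\n' "$ES_HOST" "${1:-zipkin}"
}

es_indices_url zipkin

# Usage once Zipkin has stored some spans:
#   curl -s "$(es_indices_url zipkin)"
```

An empty listing usually means either that no spans have been reported yet or that Zipkin is still using its default in-memory storage.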

Conclusion:

Enterprises are increasingly adopting microservice architectures. They are developing and deploying more microservices every day. Often, these services are deployed into separate runtime containers and managed by different teams and organizations. Large enterprises can have tens of thousands of microservices. Visibility into the health and performance of the diverse service topology is extremely important for them to be able to quickly determine the root cause of issues, as well as increase overall reliability and efficiency. The ability to sample requests, Zipkin’s instrumentation libraries and native support for Elasticsearch storage are the key reasons we utilize Zipkin to find latency issues with our services.

Credits: Nidhi Mittal