How to gracefully delete a pod in Istio Service Mesh
Istio service mesh makes use of Envoy proxy as a pod sidecar to intercept all the traffic going to the application. This provides Istio users with capabilities such as traffic management, security policies enforcement and observability. At deletion time, it is needed to take some measures to have the best possible termination process.
When deleting a Pod, the kubelet sends a TERM signal to each container in the pod, specifically to PID 1, allowing each container some time to gracefully finish its processes. If this graceful termination exceeds the time defined in terminationGracePeriodSeconds, which is 30 seconds by default, the kubelet gives 2 extra seconds and then sends the KILL signal, abruptly finishing PID 1.
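To make the default explicit, every Pod behaves as if it declared the following (the pod and image names here are illustrative, not from this article):

apiVersion: v1
kind: Pod
metadata:
  name: example
spec:
  # kubelet waits this long after sending TERM before escalating to KILL
  terminationGracePeriodSeconds: 30
  containers:
    - name: app
      image: example-image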
At deletion time, a problem will arise if your sidecar is not properly configured and synchronized with your app container. As the sidecar is the gateway for all outbound and inbound traffic to your application, if it is killed before your application has responded to all its ongoing requests, you'll probably get 5xx responses and increased latency for those requests due to retries. With Istio and Kubernetes you have handy tools to avoid this scenario. It is a matter of harmonizing the whole termination sequence.
Understand your specific use case
First of all, get to know your needs. Your application has specific response times and a specific draining process. Make sure you have these numbers clear and configure your preStop lifecycle hook accordingly.
We will assume here that your app is a database which takes a maximum of 40 seconds to respond to its longest request. Since the kubelet only sends the TERM signal once the preStop hook has completed, a 40-second sleep gives the database time to finish its in-flight requests before being signaled. So your preStop should look like this:
lifecycle:
  preStop:
    exec:
      command: ["sleep", "40"]
Kubernetes default is not enough
As terminationGracePeriodSeconds is 30 seconds by default, a pod like our database, requiring a maximum of 40 seconds to complete a request, will give us problems. You should set your deployment's .spec.template.spec.terminationGracePeriodSeconds to 40. Remember the kubelet gives you 2 extra seconds on top of this time.
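That is a one-line change in the same pod template, sketched here:

spec:
  template:
    spec:
      terminationGracePeriodSeconds: 40  # up from the 30s default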
Envoy should hold its breath
Extending the kubelet's patience to 40 seconds won't do it alone. We also need to keep the sidecar alive during those 40 seconds, while making sure it stops accepting new requests and lets pending ones complete.
Istio has a config for this exact behavior called terminationDrainDuration, which is:

The amount of time allowed for connections to complete on proxy shutdown. On receiving SIGTERM or SIGINT, istio-agent tells the active Envoy to start draining, preventing any new connections and allowing existing connections to complete. It then sleeps for the termination_drain_duration and then kills any remaining active Envoy processes. If not set, a default of 5s will be applied. Source: Istio docs
It can be set at the meshConfig level, which is the configmap in the istio-system namespace that defines settings for your service mesh as a whole, or defined at the pod level using an annotation.
Annotation way:
template:
  metadata:
    annotations:
      proxy.istio.io/config: |
        terminationDrainDuration: 40s
Or the meshConfig way:
data:
  mesh: |-
    accessLogFile: /dev/stdout
    defaultConfig:
      discoveryAddress: istiod.istio-system.svc:15012
      proxyMetadata: {}
      tracing:
        zipkin:
          address: zipkin.istio-system:9411
      terminationDrainDuration: 40s
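For clarity, that data block lives in the mesh-wide ConfigMap mentioned above; a hedged sketch, assuming the default name from a standard Istio installation:

apiVersion: v1
kind: ConfigMap
metadata:
  name: istio  # assumption: default ConfigMap name in a standard installation
  namespace: istio-system
data:
  mesh: |-
    defaultConfig:
      terminationDrainDuration: 40s

Note that the per-pod annotation overrides the mesh-wide default, so you can keep the 5s default globally and raise it only for slow-draining workloads.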
More info about customizing injection can be found in the Istio docs.
Dancing is about coordination
Kubernetes can’t assure us of the order in which it sends the SIGTERM to each one of the containers inside a pod, so we must play the music ourselves. We do this by also setting a preStop lifecycle hook on the sidecar, so both the database and the sidecar start counting at the exact same time.
Note: The containers in the Pod receive the TERM signal at different times and in an arbitrary order. If the order of shutdowns matters, consider using a preStop hook to synchronize. Source: Kubernetes docs.
As we already have a preStop in the database container, all that is left is to add one to the sidecar, by declaring a new container entry in the database deployment like this:
spec:
  containers:
    - name: mydatabase
      image: database-image
    - name: istio-proxy # New lines start here
      image: auto
      lifecycle:
        preStop:
          exec:
            command: ["kill", "-SIGTERM", "1"]
This sends the SIGTERM signal to the sidecar's PID 1 (where istio-agent listens and starts draining Envoy) at the exact same time the 40-second counter in the database starts, making sure the edge cases are covered.
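Putting all the pieces together, the relevant parts of the deployment look roughly like this (image names are illustrative):

spec:
  template:
    metadata:
      annotations:
        proxy.istio.io/config: |
          terminationDrainDuration: 40s   # keep Envoy draining for the full window
    spec:
      terminationGracePeriodSeconds: 40   # kubelet waits 40s before SIGKILL
      containers:
        - name: mydatabase
          image: database-image
          lifecycle:
            preStop:
              exec:
                command: ["sleep", "40"]  # delay the app's SIGTERM by 40s
        - name: istio-proxy
          image: auto
          lifecycle:
            preStop:
              exec:
                command: ["kill", "-SIGTERM", "1"]  # start Envoy's drain immediately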
Once the termination grace period is over, any living process in the pod is killed. From Kubernetes 1.27 onward, the pod's termination status will be Failed if some process had to go the hard way.
Are you curious whether a pod in Terminating state is still present in the service's endpoints? Check this interesting story about a little test I did.