Chaos engineering: a pod-deletion experiment on k8s with Chaos Toolkit

What is the chaos toolkit?

Today, let's try out Chaos Toolkit, an open source tool for chaos engineering. Its goal is to provide a free, open, community-driven toolset and API.

Official source code link: https://github.com/chaostoolkit/chaostoolkit

To understand this tool, you need to know the key points of the principles of chaos engineering.

Remember the first of them: build a hypothesis around steady-state behavior.

Before running this tool, let's take a look at its architecture.

In brief, Chaos Toolkit operates on your system under test through drivers.
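
To make the driver idea concrete, here is a minimal Python sketch of how an activity that names a module and a function could be resolved and called. It only illustrates the concept and is not Chaos Toolkit's actual implementation: the `run_activity` helper is hypothetical, and the standard-library `json.dumps` stands in for a real driver function such as one shipped by `chaostoolkit-kubernetes`.

```python
import importlib
from typing import Any, Dict


def run_activity(provider: Dict[str, Any]) -> Any:
    """Import the module named by the provider and call its function.

    Hypothetical helper for illustration; not part of Chaos Toolkit.
    """
    module = importlib.import_module(provider["module"])
    func = getattr(module, provider["func"])
    return func(**provider.get("arguments", {}))


# A stand-in "driver": the json module plays the role that an extension
# package would play in a real experiment.
provider = {
    "type": "python",
    "module": "json",
    "func": "dumps",
    "arguments": {"obj": {"pods": 3}, "sort_keys": True},
}

result = run_activity(provider)
print(result)
```

The point of the design is that the core tool stays small: anything that can be imported and called can act as a driver.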

Its features fall into the following parts:


Experiment preparation

Now let's set up the tools and try them out.

Environment:

  1. CentOS7.8
  2. k8s 1.19.5
  3. Example application

Installing Python 3

sudo yum install python3 python3-venv

Installing pipenv

pip3 install pipenv

Install Chaos Toolkit together with its k8s extension and reporting module

pip3 install -U chaostoolkit
pip3 install -U chaostoolkit-kubernetes
pip3 install -U chaostoolkit-reporting

If you need to operate other platforms, you can also install the corresponding extensions.

Create virtual environment

python3 -m venv .bundler
source .bundler/bin/activate

To avoid affecting other environments, we use a Python virtual environment here.
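
A quick way to confirm which environment is active is to ask the interpreter itself. The sketch below relies only on the standard library; the `in_virtualenv` helper name is made up for this example. Keep in mind that packages installed with pip3 go to whichever interpreter pip3 belongs to, so it is worth running this check before installing anything.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```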

Note: the installation above was performed on the k8s master machine. If you install on a machine outside the cluster, configure the corresponding k8s context. Please refer to: https://chaostoolkit.org/drivers/kubernetes/
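
When working from a machine outside the cluster, it helps to see which kubeconfig file would be picked up. The sketch below follows the kubectl convention (the `KUBECONFIG` environment variable first, then `~/.kube/config`). Whether the Chaos Toolkit driver follows exactly the same lookup is an assumption here, so check the driver documentation linked above; `kubeconfig_candidates` is a hypothetical helper.

```python
import os
from pathlib import Path
from typing import List, Mapping


def kubeconfig_candidates(env: Mapping[str, str]) -> List[Path]:
    """Return kubeconfig paths in the order kubectl would consider them."""
    raw = env.get("KUBECONFIG", "")
    if raw:
        # KUBECONFIG may hold several paths joined by the OS path separator.
        return [Path(p).expanduser() for p in raw.split(os.pathsep) if p]
    return [Path("~/.kube/config").expanduser()]


candidates = kubeconfig_candidates(os.environ)
for path in candidates:
    print(path, "(found)" if path.is_file() else "(missing)")
```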

Experimental practice

chaos discover: exploring what is available

First, run the discover command. Chaos Toolkit generates a discovery.json file based on the contents of ~/.kube/config. This file lists all the operations that can be performed on k8s. A successful run looks like this:

(.bundler) [root@s5 chaostoolkit_scenarios]# chaos discover chaostoolkit-kubernetes
[2021-06-23 12:18:07 INFO] Attempting to download and install package 'chaostoolkit-kubernetes'
[2021-06-23 12:18:08 INFO] Package downloaded and installed in current environment
[2021-06-23 12:18:09 INFO] Discovering capabilities from chaostoolkit-kubernetes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.actions
[2021-06-23 12:18:09 INFO] Searching for probes in chaosk8s.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.deployment.actions
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.deployment.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.node.actions
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.node.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.pod.actions
[2021-06-23 12:18:09 INFO] Searching for probes in chaosk8s.pod.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.replicaset.actions
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.service.actions
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.service.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.statefulset.actions
[2021-06-23 12:18:09 INFO] Searching for probes in chaosk8s.statefulset.probes
[2021-06-23 12:18:09 INFO] Searching for actions in chaosk8s.crd.actions
[2021-06-23 12:18:09 INFO] Searching for probes in chaosk8s.crd.probes
[2021-06-23 12:18:09 INFO] Discovery outcome saved in ./discovery.json
(.bundler) [root@s5 chaostoolkit_scenarios]#
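
The generated file can be inspected with a few lines of Python. This is a sketch under assumptions: it expects a top-level `activities` list whose entries carry `type`, `name` and `mod` keys, so verify those key names against your own discovery.json. The sample data is a hypothetical excerpt built from names that appear in the log above.

```python
import json
from collections import defaultdict
from typing import Dict, List


def group_activities(discovery: dict) -> Dict[str, List[str]]:
    """Group discovered activity names by type ("action" or "probe")."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for activity in discovery.get("activities", []):
        grouped[activity.get("type", "unknown")].append(activity["name"])
    return {kind: sorted(names) for kind, names in grouped.items()}


# Hypothetical excerpt; in practice load the real file instead:
#     with open("discovery.json") as f:
#         discovery = json.load(f)
sample = json.loads("""
{
  "activities": [
    {"type": "action", "name": "delete_pods", "mod": "chaosk8s.pod.actions"},
    {"type": "probe", "name": "count_pods", "mod": "chaosk8s.pod.probes"},
    {"type": "probe", "name": "pods_in_phase", "mod": "chaosk8s.pod.probes"}
  ]
}
""")

grouped = group_activities(sample)
for kind, names in grouped.items():
    print(kind, names)
```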

chaos init: generating an experiment

Run the init command and follow the prompts to create a chaos experiment.

(.bundler) [root@s5 chaostoolkit_scenarios]# chaos init
You are about to create an experiment.
This wizard will walk you through each step so that you can build
the best experiment for your needs.

An experiment is made up of three elements:
- a steady-state hypothesis [OPTIONAL]
- an experimental method
- a set of rollback activities [OPTIONAL]

Only the method is required. Also your experiment will
not run unless you define at least one activity (probe or action)
within it
Experiment's title: E2 # The name of the experiment

A steady state hypothesis defines what 'normality' looks like in your system
The steady state hypothesis is a collection of conditions that are used,
at the beginning of an experiment, to decide if the system is in a recognised
'normal' state. The steady state conditions are then used again when your experiment
 is complete to detect where your system may have deviated in an interesting,
weakness-detecting way

Initially you may not know what your steady state hypothesis is
and so instead you might create an experiment without one
This is why the stead state hypothesis is optional.
Do you want to define a steady state hypothesis now? [y/N]: y # Create a steady-state hypothesis. This is a key concept in chaos engineering, yet most other chaos tools do not have this step
Hypothesis's title: H2

You may now define probes that will determine
the steady-state of your system.
Add an activity
1) all_microservices_healthy
2) deployment_is_fully_available
3) deployment_is_not_fully_available
4) microservice_available_and_healthy
5) microservice_is_not_available
6) read_microservices_logs
7) service_endpoint_is_initialized
8) count_pods
9) pod_is_not_available
10) pods_in_conditions
11) pods_in_phase
12) pods_not_in_phase
13) read_pod_logs
14) statefulset_fully_available
15) statefulset_not_fully_available
16) get_cluster_custom_object
17) get_custom_object
18) list_cluster_custom_objects
19) list_custom_objects
Activity (0 to escape): 1 # Choose the probe that checks the steady-state hypothesis. In short, it defines the expected result

!!!DEPRECATED!!!
Do you want to use this probe? [y/N]: y # Confirm the probe selected above

A steady-state probe requires a tolerance value, within which
your system is in a reognised `normal` state.

What is the tolerance for this probe?: normal

You now need to fill the arguments for this activity. Default
values will be shown between brackets. You may simply press return
to use it or not set any value.
Argument's value for 'ns' [default]: chaosnamespace # Enter the k8s namespace to operate on
Do you want to select another activity? [y/N]: y # Add another activity
Add an activity
1) all_microservices_healthy
2) deployment_is_fully_available
3) deployment_is_not_fully_available
4) microservice_available_and_healthy
5) microservice_is_not_available
6) read_microservices_logs
7) service_endpoint_is_initialized
8) count_pods
9) pod_is_not_available
10) pods_in_conditions
11) pods_in_phase
12) pods_not_in_phase
13) read_pod_logs
14) statefulset_fully_available
15) statefulset_not_fully_available
16) get_cluster_custom_object
17) get_custom_object
18) list_cluster_custom_objects
19) list_custom_objects
Activity (0 to escape): 1 # Select the specific activity

!!!DEPRECATED!!!
Do you want to use this probe? [y/N]: y # Confirm the activity selected above

You now need to fill the arguments for this activity. Default
values will be shown between brackets. You may simply press return
to use it or not set any value.
Argument's value for 'ns' [default]:
Do you want to select another activity? [y/N]: N # No further activities are added here

An experiment's method contains actions and probes. Actions
vary real-world events in your system to determine if your
steady-state hypothesis is maintained when those events occur.

An experimental method can also contain probes to gather additional
information about your system as your method is executed.
Do you want to define an experimental method? [y/N]: y # Define the experimental method

Add an activity
1) kill_microservice
2) remove_service_endpoint
3) scale_microservice
4) start_microservice
5) all_microservices_healthy
6) deployment_is_fully_available
7) deployment_is_not_fully_available
8) microservice_available_and_healthy
9) microservice_is_not_available
10) read_microservices_logs
11) service_endpoint_is_initialized
12) create_deployment
13) delete_deployment
14) scale_deployment
15) deployment_available_and_healthy
16) deployment_fully_available
17) deployment_not_fully_available
18) cordon_node
19) create_node
20) delete_nodes
21) drain_nodes
22) uncordon_node
23) get_nodes
24) delete_pods
25) exec_in_pods
26) terminate_pods
27) count_pods
28) pod_is_not_available
29) pods_in_conditions
30) pods_in_phase
31) pods_not_in_phase
32) read_pod_logs
33) delete_replica_set
34) create_service_endpoint
35) delete_service
36) service_is_initialized
37) create_statefulset
38) remove_statefulset
39) scale_statefulset
40) statefulset_fully_available
41) statefulset_not_fully_available
42) create_cluster_custom_object
43) create_custom_object
44) delete_cluster_custom_object
45) delete_custom_object
46) patch_cluster_custom_object
47) patch_custom_object
48) replace_cluster_custom_object
49) replace_custom_object
50) get_cluster_custom_object
51) get_custom_object
52) list_cluster_custom_objects
53) list_custom_objects
Activity (0 to escape): 24 # Here I choose activity 24: delete pods

!!!DEPRECATED!!!
Do you want to use this action? [y/N]: y # Confirm selection

You now need to fill the arguments for this activity. Default
values will be shown between brackets. You may simply press return
to use it or not set any value.

Argument's value for 'name': DeleteRedisPOD # Name this method
Argument's value for 'ns' [default]: chaosnamespace # Determine the k8s namespace to operate on
Argument's value for 'label_selector' [name in ({name})]: app=redis # Enter the label selector that identifies the target pods
Do you want to select another activity? [y/N]: N # No further actions are added here

An experiment may optionally define a set of remedial actions
that are used to rollback the system to a given state.

Do you want to add some rollbacks now? [y/N]: N # No rollback is needed here: k8s automatically recreates the deleted redis pods

Experiment created and saved in './experiment.json' # The experiment file is generated

(.bundler) [root@s5 chaostoolkit_scenarios]#
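
For orientation, the answers given to the wizard map onto an experiment file with roughly the shape sketched below, built here as a Python dictionary and printed as JSON. This is a reconstruction from the session, not a copy of the generated file, so treat details as assumptions: the `description` placeholder and the empty `rollbacks` list are guesses, and the steady-state probe is listed once although the session added it twice. Compare it with your own experiment.json.

```python
import json

experiment = {
    "version": "1.0.0",
    "title": "E2",
    "description": "N/A",  # assumed placeholder; the session shows no description prompt
    "steady-state-hypothesis": {
        "title": "H2",
        "probes": [
            {
                "type": "probe",
                "name": "all_microservices_healthy",
                "tolerance": "normal",  # kept exactly as typed in the session
                "provider": {
                    "type": "python",
                    "module": "chaosk8s.probes",
                    "func": "all_microservices_healthy",
                    "arguments": {"ns": "chaosnamespace"},
                },
            }
        ],
    },
    "method": [
        {
            "type": "action",
            "name": "delete_pods",
            "provider": {
                "type": "python",
                "module": "chaosk8s.pod.actions",
                "func": "delete_pods",
                "arguments": {
                    "name": "DeleteRedisPOD",
                    "ns": "chaosnamespace",
                    "label_selector": "app=redis",
                },
            },
        }
    ],
    "rollbacks": [],  # none were declared in the session
}

text = json.dumps(experiment, indent=2)
print(text)
```

Each activity names a Python module and function, which is how the driver extension gets called when the experiment runs.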

chaos run: executing the experiment

(.bundler) [root@s5 chaostoolkit_scenarios]# chaos run experiment.json
[2021-06-28 23:03:23 INFO] Validating the experiment's syntax
[2021-06-28 23:03:24 INFO] Experiment looks valid
[2021-06-28 23:03:24 INFO] Running experiment: E2
[2021-06-28 23:03:24 INFO] Steady-state strategy: default
[2021-06-28 23:03:24 INFO] Rollbacks strategy: default
[2021-06-28 23:03:24 INFO] Steady state hypothesis: H2
[2021-06-28 23:03:24 INFO] Probe: all_microservices_healthy
[2021-06-28 23:03:24 WARNING] all_microservices_healthy function is DEPRECATED and will be removed in the next         releases, please use all_pods_healthy instead
[2021-06-28 23:03:24 INFO] Steady state hypothesis is met!
[2021-06-28 23:03:24 INFO] Playing your experiment's method now...
[2021-06-28 23:03:24 INFO] Action: delete_pods
[2021-06-28 23:03:24 INFO] Steady state hypothesis: H2
[2021-06-28 23:03:24 INFO] Probe: all_microservices_healthy
[2021-06-28 23:03:24 WARNING] all_microservices_healthy function is DEPRECATED and will be removed in the next         releases, please use all_pods_healthy instead
[2021-06-28 23:03:24 INFO] Steady state hypothesis is met!
[2021-06-28 23:03:24 INFO] Let's rollback...
[2021-06-28 23:03:24 INFO] No declared rollbacks, let's move on.
[2021-06-28 23:03:24 INFO] Experiment ended with status: completed
(.bundler) [root@s5 chaostoolkit_scenarios]#
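
Besides the console log, a run also leaves a journal file (journal.json by default) that can be checked programmatically. The sketch below summarizes such a journal. The key names (`status`, `deviated`, `steady_states`, `run`) reflect my understanding of the journal layout and should be verified against a real file; the sample journal is hypothetical and only mirrors the run shown above.

```python
import json


def summarize_journal(journal: dict) -> dict:
    """Extract the headline fields from a run journal."""
    steady = journal.get("steady_states") or {}
    return {
        "status": journal.get("status"),
        "deviated": journal.get("deviated", False),
        "steady_before": (steady.get("before") or {}).get("steady_state_met"),
        "steady_after": (steady.get("after") or {}).get("steady_state_met"),
        "activities": [
            (run["activity"]["name"], run.get("status"))
            for run in journal.get("run", [])
        ],
    }


# Hypothetical journal; in practice load the real file instead:
#     with open("journal.json") as f:
#         journal = json.load(f)
sample_journal = {
    "status": "completed",
    "deviated": False,
    "steady_states": {
        "before": {"steady_state_met": True},
        "after": {"steady_state_met": True},
    },
    "run": [{"activity": {"name": "delete_pods"}, "status": "succeeded"}],
}

summary = summarize_journal(sample_journal)
print(json.dumps(summary, indent=2))
```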

Checking the results

Before running the experiment:

[root@s5 ~]# kubectl get pods -n chaosnamespace -o wide
NAME                                   READY   STATUS    RESTARTS   AGE     IP               NODE   NOMINATED NODE   READINESS GATES

...........................
redis-master-b96c9795b-nqzmr           1/1     Running   0          3d9h    10.100.220.84    s6     <none>           <none>
redis-slave-6b8d456947-6r42k           1/1     Running   0          3d9h    10.100.220.86    s6     <none>           <none>
redis-slave-6b8d456947-z55m5           1/1     Running   0          3d9h    10.100.53.206    s7     <none>           <none>


After running the experiment:

[root@s5 ~]# kubectl get pods -n chaosnamespace -o wide
NAME                                   READY   STATUS              RESTARTS   AGE     IP               NODE   NOMINATED NODE   READINESS GATES

...............................

redis-master-b96c9795b-92rc6           0/1     ContainerCreating   0          3s      <none>           s6     <none>           <none>
redis-master-b96c9795b-nqzmr           0/1     Terminating         0          3d9h    10.100.220.84    s6     <none>           <none>
redis-slave-6b8d456947-5m2xt           0/1     ContainerCreating   0          2s      <none>           s6     <none>           <none>
redis-slave-6b8d456947-6r42k           1/1     Terminating         0          3d9h    10.100.220.86    s6     <none>           <none>
redis-slave-6b8d456947-fj4xc           0/1     ContainerCreating   0          3s      <none>           s7     <none>           <none>
redis-slave-6b8d456947-z55m5           1/1     Terminating         0          3d9h    10.100.53.206    s7     <none>           <none>


After the pods have fully started:

[root@s5 ~]# kubectl get pods -n chaosnamespace -o wide
NAME                                   READY   STATUS    RESTARTS   AGE     IP               NODE   NOMINATED NODE   READINESS GATES

.......................
redis-master-b96c9795b-92rc6           1/1     Running   0          5m43s   10.100.220.89    s6     <none>           <none>
redis-slave-6b8d456947-5m2xt           1/1     Running   0          5m42s   10.100.220.90    s6     <none>           <none>
redis-slave-6b8d456947-fj4xc           1/1     Running   0          5m43s   10.100.53.211    s7     <none>           <none>
[root@s5 ~]#

The results above show that the experiment ran successfully: the redis pods were killed and k8s recreated them.
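
The same check can be automated instead of read by eye. The sketch below parses the text printed by `kubectl get pods` and reports pods that are not fully running. `pods_not_ready` is a hypothetical helper, and the sample text is shortened from the listings above (only the first columns are kept).

```python
from typing import List, Tuple


def pods_not_ready(kubectl_output: str) -> List[Tuple[str, str, str]]:
    """Return (name, ready, status) for pods that are not fully running."""
    problems = []
    for line in kubectl_output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and elision markers.
        if len(fields) < 3 or fields[0] == "NAME" or "/" not in fields[1]:
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, _, total = ready.partition("/")
        if status != "Running" or ready_count != total:
            problems.append((name, ready, status))
    return problems


during = """\
NAME                                   READY   STATUS              RESTARTS   AGE
redis-master-b96c9795b-92rc6           0/1     ContainerCreating   0          3s
redis-master-b96c9795b-nqzmr           0/1     Terminating         0          3d9h
"""
after = """\
NAME                                   READY   STATUS    RESTARTS   AGE
redis-master-b96c9795b-92rc6           1/1     Running   0          5m43s
redis-slave-6b8d456947-5m2xt           1/1     Running   0          5m42s
"""

print(pods_not_ready(during))
print(pods_not_ready(after))
```

In a real check the text would come from running kubectl, for example through `subprocess.run`, and the function could be polled until it returns an empty list.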

Summary

That's all for today's experiment. You can create other experiments by following the same steps.

Tags: Python DevOps

Posted on Tue, 28 Sep 2021 00:47:08 -0400 by PeeJay