Availability: the proportion of time a system is in a functioning condition.
Robustness: the degree to which a system or component can function correctly in the presence of invalid inputs or stressful environmental conditions.
Reliability: the ability (probability) of a system or component to function under stated conditions for a specified period of time.
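As a rough illustration of the availability definition above, the sketch below computes availability as uptime divided by total observed time. The function name and the example figures are hypothetical and are not taken from any OPNFV tool.

```python
def availability(uptime_s: float, downtime_s: float) -> float:
    """Return the fraction of observed time the system was functioning."""
    total_s = uptime_s + downtime_s
    if total_s <= 0:
        raise ValueError("observed time must be positive")
    return uptime_s / total_s


# Hypothetical example: 30 days of observation with 43.2 minutes of downtime.
observed_s = 30 * 24 * 3600
downtime_s = 43.2 * 60
print(f"availability: {availability(observed_s - downtime_s, downtime_s):.4%}")
```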
For the Danube release, the OPNFV testing group decided to extend the OPNFV testing scope by introducing basic stress test cases. Principles, requirements, and test cases are discussed in Etherpad. The Bottlenecks and Yardstick projects jointly implement some of the test cases, with Bottlenecks acting as the load manager that calls Yardstick to run the tests. These tests, which are not Danube-specific, aim to find the system's breaking points and to give users a level of confidence in the system.
Stress testing is a software testing activity that determines the robustness of software by testing beyond the limits of normal operation. Stress testing is particularly important for "mission critical" software, but is used for all types of software. Stress tests commonly put a greater emphasis on robustness, availability, and error handling under a heavy load, than on what would be considered correct behavior under normal circumstances.
Generally, stress testing tries to break the system in either a positive or a negative way. It tests for heavy loads and breaking points, i.e., the limits or bottlenecks of the system. However, stress testing does not break the system for pleasure: by providing a PASS or FAIL result, it gives users a level of confidence in the system.
It also allows observation of how the system reacts to failures. An additional purpose behind this madness is to make sure that the system fails and recovers gracefully, a quality known as recoverability. The following questions can be asked to examine its recoverability:
- What crashes first, how, and why?
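The idea of stepping up the load until the system breaks can be sketched as follows. This is a minimal illustration, not the Bottlenecks or Yardstick implementation: `find_breaking_point`, `fake_system`, and the `required_load` threshold used for the PASS/FAIL verdict are all hypothetical names and assumptions.

```python
from typing import Callable, Iterable, Optional


def find_breaking_point(
    run_at_load: Callable[[int], bool],
    loads: Iterable[int],
) -> Optional[int]:
    """Return the first load level at which run_at_load reports failure.

    Returns None if the system survives every load level tried.
    """
    for load in loads:
        if not run_at_load(load):
            return load
    return None


def fake_system(load: int) -> bool:
    """Hypothetical stand-in for a real test run: fails above 1000 units."""
    return load <= 1000


# Assumption: the test passes if the system does not break at or below
# the load it is required to sustain.
required_load = 800
breaking_point = find_breaking_point(fake_system, range(200, 2001, 200))
passed = breaking_point is None or breaking_point > required_load
print("PASS" if passed else "FAIL", "breaking point:", breaking_point)
```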
- Does it save its state or does it crash suddenly?
- Does it just hang and freeze or does it fail gracefully?
- Could the system/component recover smoothly?
- On restart, is it able to recover from the last good state?
- Does it print out meaningful error messages to the user, or does it merely display incomprehensible hex codes?
- Is the security of the system compromised because of unexpected failures?
- The list goes on.
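Some of these questions can be checked automatically. The sketch below polls a health probe after a failure and reports how long recovery took, or that it did not happen within a timeout. The helper `wait_for_recovery` and the `FakeProbe` class are hypothetical; a real probe would query the system under test.

```python
import time
from typing import Callable, Optional


def wait_for_recovery(
    is_healthy: Callable[[], bool],
    timeout_s: float,
    poll_interval_s: float = 1.0,
) -> Optional[float]:
    """Poll is_healthy until it returns True or timeout_s elapses.

    Returns the elapsed seconds until recovery, or None on timeout.
    """
    start = time.monotonic()
    while True:
        if is_healthy():
            return time.monotonic() - start
        if time.monotonic() - start >= timeout_s:
            return None
        time.sleep(poll_interval_s)


class FakeProbe:
    """Hypothetical health probe that reports healthy from the Nth call on."""

    def __init__(self, healthy_after_calls: int) -> None:
        self.calls = 0
        self.healthy_after_calls = healthy_after_calls

    def __call__(self) -> bool:
        self.calls += 1
        return self.calls >= self.healthy_after_calls


probe = FakeProbe(healthy_after_calls=3)
elapsed = wait_for_recovery(probe, timeout_s=1.0, poll_interval_s=0.01)
print("recovered" if elapsed is not None else "did not recover")
```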
Stress Testing for OPNFV
For OPNFV, stress testing can find the breaking points of deployments and feed back bugs, thereby accelerating the maturity of the platform and raising the level of confidence in it.
More can be found in the Euphrates testing needs in Testperf, where Morgan proposed a long-duration POD for stable release qualification.
Test Cases in Discussion
The first stress test cases were discussed in the Etherpad during Danube. These test cases, listed below, focus on data-plane traffic and life-cycle event testing.
For a virtual or bare metal POD:
- TC1 – Determine baseline for throughput
- TC2 – Determine baseline for CPU limit
For VM pairs/stacks:
- TC3 – Perform life-cycle events for ping
- TC4 – Perform life-cycle events for throughput
- TC5 – Perform life-cycle events for CPU limit
These test cases are illustrated below to better understand the testing mechanism behind them.
During the Euphrates release, more stress tests are planned (DRAFT, DETAILS ARE UNDER DISCUSSION):
B. StorPerf & VSPerf
- Scale out until maximum throughput reached
- Test of different Nova schedulers
- Run VSPERF and record numbers
- Run StorPerf and record numbers
- Run both at the same time and compare numbers.
Implementation of Stress Testing
For Danube, two test cases have been implemented and merged into the OPNFV CI pipeline.
(TC1) Baseline testing for throughput: measures the system throughput under increasing traffic stress for a single user (a virtual or bare metal POD).
(TC3) Life-cycle event for ping: by concurrently increasing the number of life-cycle events for multiple users (VM pairs/stacks), it measures the stability of the system under a large number of concurrent requests/traffic.
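The TC3 pattern of raising the number of concurrent life-cycle events can be sketched with a thread pool. This is only an illustration under stated assumptions, not the code merged into the CI pipeline: `fake_life_cycle_event` is a hypothetical stand-in for a real life-cycle event and ping check, and the stack counts are example values.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def run_concurrent_events(event: Callable[[int], bool], num_stacks: int) -> int:
    """Run event once per stack, all concurrently; return the success count."""
    with ThreadPoolExecutor(max_workers=num_stacks) as pool:
        results = list(pool.map(event, range(num_stacks)))
    return sum(results)


def ramp(event: Callable[[int], bool], stack_counts: Iterable[int]) -> Dict[int, int]:
    """Map each stack count to the number of events that succeeded."""
    return {n: run_concurrent_events(event, n) for n in stack_counts}


def fake_life_cycle_event(stack_index: int) -> bool:
    """Hypothetical stand-in for one life-cycle event plus a ping check."""
    time.sleep(0.01)  # simulate work
    return True


for num_stacks, succeeded in ramp(fake_life_cycle_event, [12, 16, 20]).items():
    print(f"{num_stacks} stacks: {succeeded}/{num_stacks} succeeded")
```

A real run would replace the stand-in with calls into the test framework and would typically stop ramping once the success count drops below the stack count.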
(TC1) Baseline testing for throughput
(TC3) Life-cycle event for ping
| Num of Stack: 12 | | | Num of Stack: 16 | | | Num of Stack: 20 | | |
| demo0-a739cc95 | 228 secs | 28 secs | demo0-51a3c47e | 271 secs | 25 secs | demo0-b5436968 | 305 secs | 42 secs |
| demo0-140007f4 | 303 secs | 29 secs | demo0-c32d6136 | 251 secs | 31 secs | demo0-8ef0cf99 | 362 secs | 33 secs |
| demo0-31c3bd47 | 419 secs | 25 secs | demo0-f279f41e | 238 secs | 28 secs | demo0-f83d563a | 225 secs | 39 secs |
| demo0-28410038 | 279 secs | 24 secs | demo0-809f9803 | 129 secs | 39 secs | demo0-c8893740 | 209 secs | 25 secs |
| demo0-3a13f557 | 443 secs | 27 secs | demo0-1c277b6e | 316 secs | 20 secs | demo0-f7e7c460 | 299 secs | 36 secs |
| demo0-ac6dbe45 | 418 secs | 27 secs | demo0-9335d944 | 245 secs | 24 secs | demo0-3d969441 | 204 secs | 28 secs |
| demo0-6c7624aa | 271 secs | 21 secs | demo0-cb969ba3 | 138 secs | 35 secs | demo0-754d034c | 362 secs | 22 secs |
| demo0-30e4a687 | 399 secs | 20 secs | demo0-a5d1337c | 146 secs | 29 secs | demo0-19fa74f0 | 170 secs | 39 secs |
| demo0-7b9c6cad | 135 secs | 34 secs | demo0-82b6ef21 | 220 secs | 20 secs | demo0-f1dee3f4 | 343 secs | 22 secs |
| demo0-824d4098 | 136 secs | 32 secs | demo0-e0e32c78 | 179 secs | 22 secs | demo0-a369add3 | 307 secs | 45 secs |
| demo0-530f94fe | 290 secs | 29 secs | demo0-f70d81b1 | 112 secs | 44 secs | demo0-f5563f25 | 299 secs | 42 secs |
| demo0-b2a2942d | 362 secs | 22 secs | demo0-c5dbd4f1 | 113 secs | 44 secs | demo0-c92bd889 | 321 secs | 40 secs |
| | | | demo0-4522322b | 279 secs | 33 secs | demo0-99f0665c | 205 secs | 28 secs |
| | | | demo0-1665df83 | 268 secs | 27 secs | demo0-ac007ea2 | 311 secs | 25 secs |
| | | | demo0-65e974bf | 127 secs | 26 secs | demo0-682e54a0 | 345 secs | 38 secs |
| | | | demo0-869c4cbc | 320 secs | 29 secs | demo0-c98826b1 | 277 secs | 20 secs |
| | | | | | | demo0-b2586df1 | 198 secs | 35 secs |
| | | | | | | demo0-e617af13 | 299 secs | 31 secs |
| | | | | | | demo0-d180f8b2 | 281 secs | 22 secs |
| | | | | | | demo0-ff700296 | 326 secs | 24 secs |
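To compare runs, the per-stack timings can be reduced to a few summary figures. The sketch below does this for the 12-stack results in the table above. The table does not label its two timing columns, so they are simply called "first" and "second" here, and the helper `summarize` is a hypothetical name.

```python
from statistics import mean
from typing import Dict, List

# (first time, second time) in seconds for the 12-stack run,
# copied from the table above.
twelve_stack_times = [
    (228, 28), (303, 29), (419, 25), (279, 24), (443, 27), (418, 27),
    (271, 21), (399, 20), (135, 34), (136, 32), (290, 29), (362, 22),
]


def summarize(values: List[int]) -> Dict[str, float]:
    """Return the minimum, mean (one decimal place), and maximum of values."""
    return {"min": min(values), "mean": round(mean(values), 1), "max": max(values)}


first = [a for a, _ in twelve_stack_times]
second = [b for _, b in twelve_stack_times]
print("first column: ", summarize(first))
print("second column:", summarize(second))
```

The same reduction can be applied to the 16-stack and 20-stack columns to see how the timings change as the number of concurrent stacks grows.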
Upstream Stress Testing Projects
Upstream stress testing projects deal with different systems/components
OPNFV Paris Plugfest (#3)
OPNFV Summit 2017 - Main Summit Talk
- Konstantin Benz, VM Reliability Tester: A tool for measuring cloud reliability of OpenStack virtual machines using Python, 2015