stabilize e2e test case sandbox-basic #962

Open: wants to merge 2 commits into base: main

LiboYu2 (Collaborator) commented Oct 17, 2024

Increased the timeout for steps that frequently fail.
Added a delay to let the cluster stabilize before moving on to the unsandboxing step.
I ran the test cases locally five times in a row and they all passed.

LiboYu2 changed the title from "VER-95582 stabilize e2e test case sandbox-basic" to "stabilize e2e test case sandbox-basic" on Oct 17, 2024
@@ -13,6 +13,7 @@

apiVersion: kuttl.dev/v1beta1
kind: TestSuite
kindNodeCache: true

LiboYu2 (Collaborator Author):

This lets the downloaded images be cached on the kind nodes, which speeds up test execution.

@@ -15,3 +15,4 @@ apiVersion: kuttl.dev/v1beta1
kind: TestStep
commands:
- command: bash -c "../../../scripts/wait-for-verticadb-steady-state.sh -n verticadb-operator -t 360 $NAMESPACE"
- command: sleep 120

LiboYu2 (Collaborator Author):

Added this delay so the cluster can stabilize before step 60 starts.

roypaulin (Collaborator) commented Oct 17, 2024:

You don't need to add this sleep; it only makes the test take longer to complete. If this step frequently fails because the timeout is too short, just increase the timeout in the script call above.

LiboYu2 (Collaborator Author):

This is different from the timeout. The timeout is the maximum time the whole test step may take to finish. The sleep gives the cluster time to stabilize its state before the next step starts. It makes the test run longer, but it makes the test pass.

Collaborator:

The script on line 17 already waits for the operator to reach a steady state (which gives the cluster time to stabilize before the next step), meaning there are no errors and no reconcile work in progress. There is no benefit in adding another wait afterward. If this step fails because it needs more time, increase the timeout passed as an argument to the script.
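
A minimal sketch of what "steady" means for this check, assuming the operator log lines arrive on stdin. The success pattern is copied from the script excerpt quoted later in this thread; the function name `is_steady` and its single `filter` argument are hypothetical stand-ins for the script's `$LOG_CMD` and `*_FILTER` variables. Because of `tail -1`, only the most recent matching reconcile result counts.

```shell
#!/usr/bin/env bash
# Sketch of the steady-state test, assuming operator log lines on stdin.
# is_steady and its filter argument are hypothetical; the real script builds
# $LOG_CMD and several *_FILTER variables instead.
is_steady() {
  local filter="$1"
  # Keep only lines for the resource of interest, take the most recent one,
  # and succeed only if that reconcile neither requeued nor returned an error.
  grep -F -- "$filter" \
    | tail -1 \
    | grep -F -q -- '"result": {"Requeue":false,"RequeueAfter":0}, "err": null'
}
```

An idle result followed by a newer requeue therefore does not count as steady; the loop in the script keeps polling until the latest result is idle.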

LiboYu2 (Collaborator Author):

This is from wait-for-verticadb-steady-state.sh:

timeout $TIMEOUT bash -c -- "while ! $LOG_CMD | \
    grep $WEBHOOK_FILTER | \
    grep $DEPRECATION_FILTER | \
    grep $VDB_FILTER | \
    tail -1 | grep --quiet '\"result\": {\"Requeue\":false,\"RequeueAfter\":0}, \"err\": null'; do sleep 1; done" &
pid=$!
wait $pid

From "man timeout":
timeout - run a command with a time limit

If the script runs longer than $TIMEOUT, an error is reported.
If it finishes within $TIMEOUT, the next step starts right away.
What I want is a delay between the two steps so that the later step is not affected by the previous one. Increasing the timeout will not achieve that.
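
The `timeout` behaviour described above can be checked with a small experiment. This is a sketch, not the project's script: `wait_for_steady` is a hypothetical helper that polls a state file (standing in for the operator log) once per second, capped by GNU `timeout`. When the condition already holds it returns almost immediately, however large the cap; when the cap expires, `timeout` exits with status 124.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a file until it contains "steady", for at most
# max_wait seconds. GNU timeout kills the loop and exits 124 if time runs out.
wait_for_steady() {
  local max_wait="$1" state_file="$2"
  # The file path is passed as $1 of the inner shell to avoid quoting issues.
  timeout "$max_wait" bash -c \
    'while ! grep --quiet steady "$1"; do sleep 1; done' _ "$state_file"
}
```

Raising `max_wait` only moves the failure point; it never lengthens a run in which the condition is already met. A fixed pause between steps would need an explicit `sleep`.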

cchen-vertica (Collaborator):

As discussed, we should use an environment variable for all extended timeout values (900), and try to print all the events on vdb.
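
One possible reading of the environment-variable suggestion, sketched under assumptions: `EXTENDED_TIMEOUT` is a hypothetical variable name (the thread names none), and `steady_state_cmd` is a hypothetical helper that only assembles the script invocation from the test step above, so the timeout value lives in one place. The `${VAR:-default}` expansion keeps 900 seconds as the fallback when the variable is unset.

```shell
#!/usr/bin/env bash
# Hypothetical: one knob for every extended timeout, defaulting to 900 seconds.
steady_state_cmd() {
  local namespace="$1"
  # Read the variable at call time so later overrides take effect.
  local timeout_s="${EXTENDED_TIMEOUT:-900}"
  # Same script invocation as the kuttl step, with the timeout from the environment.
  printf '%s\n' "../../../scripts/wait-for-verticadb-steady-state.sh -n verticadb-operator -t ${timeout_s} ${namespace}"
}
```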

roypaulin (Collaborator):

Quoting: "As discussed, we should use an environment variable for all extended timeout values (900), and try to print all the events on vdb."

What do you mean by "printing all the events on vdb"?

1. Removed the 2-minute sleep added previously.
2. Increased the low disk space value to avoid the low disk volume event.
3. Bumped the spam filter threshold from 25 to 100 so that the kuttl framework receives all the events.