Concurrent Scale Up Down workloads files and docs #89
base: master
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,102 @@ | ||
# Concurrent Scale Up Down Workload | ||
|
||
The Concurrent Scale Up Down workload playbook is `workloads/concurrent-scale-up-down.yml` and will run the Concurrent Scale Up Down workload on your cluster. | ||
|
||
The Concurrent Scale Up Down workload tests scaling the quickstart apps up and then scaling them back down on OpenShift. | ||
|
||
Requirements: | ||
* kubeadmin password and API URL of the cluster | ||
|
||
Running from CLI: | ||
|
||
```sh | ||
$ cp workloads/inventory.example inventory | ||
$ # Add orchestration host to inventory | ||
$ # Edit vars in workloads/vars/concurrent-scale-up-down.yml or define Environment vars (See below) | ||
$ time ansible-playbook -vv -i inventory workloads/concurrent-scale-up-down.yml | ||
``` | ||
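Two of the variables documented below have no default, so it may help to check them before starting the playbook. The following is a minimal sketch, not part of this PR: `require_vars` is a hypothetical helper, and the URL and password are placeholders.

```sh
# Hypothetical helper: fail early if a required variable is unset or empty.
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Placeholder values, not a real endpoint or credential.
export CONCURRENT_SCALEUP_API_URL="https://api.example.com:6443"
export CONCURRENT_SCALEUP_KUBEADMIN_PASSWD="changeme"

require_vars CONCURRENT_SCALEUP_API_URL CONCURRENT_SCALEUP_KUBEADMIN_PASSWD \
  && echo "required variables are set"
# Next step would be the ansible-playbook command shown above.
```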
|
||
## Environment variables | ||
|
||
### PUBLIC_KEY | ||
Default: `~/.ssh/id_rsa.pub` | ||
Public ssh key file for Ansible. | ||
|
||
### PRIVATE_KEY | ||
Default: `~/.ssh/id_rsa` | ||
Private ssh key file for Ansible. | ||
|
||
### ORCHESTRATION_USER | ||
Default: `root` | ||
User for Ansible to log in as. Must authenticate with PUBLIC_KEY/PRIVATE_KEY. | ||
|
||
### WORKLOAD_IMAGE | ||
Default: `quay.io/openshift-scale/scale-ci-workload` | ||
Container image that runs the workload script. | ||
|
||
### WORKLOAD_JOB_NODE_SELECTOR | ||
Default: `false` | ||
Enables/disables the node selector that places the workload job on the `workload` node. | ||
|
||
### WORKLOAD_JOB_TAINT | ||
Default: `false` | ||
Enables/disables the toleration on the workload job to permit the `workload` taint. | ||
|
||
### WORKLOAD_JOB_PRIVILEGED | ||
Default: `true` | ||
Enables/disables running the workload pod as privileged. | ||
|
||
### KUBECONFIG_FILE | ||
Default: `~/.kube/config` | ||
Location of kubeconfig on orchestration host. | ||
|
||
### PBENCH_INSTRUMENTATION | ||
Default: `false` | ||
Enables/disables running the workload wrapped by pbench-user-benchmark. When enabled, pbench agents can then be enabled (`ENABLE_PBENCH_AGENTS`) for further instrumentation data and pbench-copy-results can be enabled (`ENABLE_PBENCH_COPY`) to export captured data for further analysis. | ||
|
||
### ENABLE_PBENCH_AGENTS | ||
Default: `false` | ||
Enables/disables the collection of pbench data on the pbench agent Pods. These Pods are deployed by the tooling playbook. | ||
|
||
### ENABLE_PBENCH_COPY | ||
Default: `false` | ||
Enables/disables the copying of pbench data to a remote results server for further analysis. | ||
|
||
### PBENCH_SSH_PRIVATE_KEY_FILE | ||
Default: `~/.ssh/id_rsa` | ||
Location of ssh private key to authenticate to the pbench results server. | ||
|
||
### PBENCH_SSH_PUBLIC_KEY_FILE | ||
Default: `~/.ssh/id_rsa.pub` | ||
Location of the ssh public key to authenticate to the pbench results server. | ||
|
||
### PBENCH_SERVER | ||
Default: There is no public default. | ||
DNS address of the pbench results server. | ||
|
||
### SCALE_CI_RESULTS_TOKEN | ||
Default: There is no public default. | ||
Reserved for future use by pbench and the Prometheus scraper to place results into the git repo that holds results data. | ||
|
||
### JOB_COMPLETION_POLL_ATTEMPTS | ||
Default: `3600` | ||
Number of times Ansible polls to check whether the workload job has completed. Polls are 10s apart, plus some additional time for each polling action depending on the orchestration host setup. | ||
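
As a rough guide, the attempt count multiplied by the 10s delay gives a lower bound on how long Ansible waits before giving up. With the default of 3600 attempts that is 36000 seconds, or 10 hours. A small sketch of the arithmetic (`max_wait_seconds` is a hypothetical helper):

```sh
# Lower bound on total polling time: attempts * delay (seconds).
max_wait_seconds() {
  echo $(( $1 * $2 ))
}

attempts="${JOB_COMPLETION_POLL_ATTEMPTS:-3600}"
delay=10  # delay between polls used by the playbook

total=$(max_wait_seconds "$attempts" "$delay")
echo "polling gives up after at least ${total}s ($(( total / 3600 ))h)"
```

The real wall-clock time is somewhat longer, since each `oc` call also takes time.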
|
||
### CONCURRENT_SCALEUP_API_URL | ||
Default: There is no default value. | ||
The API server URL and port. You can obtain it by grepping for `server` in the kubeconfig file. | ||
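
One possible way to pull the value out of a kubeconfig is sketched below. `api_url_from_kubeconfig` is a hypothetical helper, and it assumes the first `server:` entry in the file is the cluster you want to test.

```sh
# Print the URL from the first "server:" line of the given kubeconfig.
api_url_from_kubeconfig() {
  grep 'server:' "$1" | head -n 1 | awk '{print $2}'
}

# Example (path is the KUBECONFIG_FILE default from this document):
#   export CONCURRENT_SCALEUP_API_URL="$(api_url_from_kubeconfig ~/.kube/config)"
```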
|
||
### CONCURRENT_SCALEUP_KUBEADMIN_PASSWD | ||
Default: There is no default value. | ||
The kubeadmin password from the OpenShift installation. | ||
|
||
### CONCURRENT_SCALEUP_TEST_PREFIX | ||
Default: `concurrent-scale-up-down` | ||
The pbench results directory name used when pbench is enabled. | ||
|
||
|
||
## Smoke test variables | ||
|
||
``` | ||
CONCURRENT_SCALEUP_TEST_PREFIX=concurrent-scale-up-down | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,131 @@ | ||
--- | ||
# | ||
# Runs concurrent-scale-up-down on OpenShift 4.x cluster | ||
# | ||
|
||
- name: Runs concurrent-scale-up-down on a RHCOS OpenShift cluster | ||
hosts: orchestration | ||
gather_facts: true | ||
remote_user: "{{orchestration_user}}" | ||
vars_files: | ||
- vars/concurrent-scale-up-down.yml | ||
vars: | ||
workload_job: "concurrent-scale-up-down" | ||
tasks: | ||
- name: Create scale-ci-tooling directory | ||
file: | ||
path: "{{ansible_user_dir}}/scale-ci-tooling" | ||
state: directory | ||
|
||
- name: Copy workload files | ||
copy: | ||
src: "{{item.src}}" | ||
dest: "{{item.dest}}" | ||
with_items: | ||
- src: scale-ci-tooling-ns.yml | ||
dest: "{{ansible_user_dir}}/scale-ci-tooling/scale-ci-tooling-ns.yml" | ||
- src: workload-concurrent-scale-up-down-script-cm.yml | ||
dest: "{{ansible_user_dir}}/scale-ci-tooling/workload-concurrent-scale-up-down-script-cm.yml" | ||
|
||
- name: Slurp kubeconfig file | ||
slurp: | ||
src: "{{kubeconfig_file}}" | ||
register: kubeconfig_file_slurp | ||
|
||
- name: Slurp ssh private key file | ||
slurp: | ||
src: "{{pbench_ssh_private_key_file}}" | ||
register: pbench_ssh_private_key_file_slurp | ||
|
||
- name: Slurp ssh public key file | ||
slurp: | ||
src: "{{pbench_ssh_public_key_file}}" | ||
register: pbench_ssh_public_key_file_slurp | ||
|
||
- name: Template workload templates | ||
template: | ||
src: "{{item.src}}" | ||
dest: "{{item.dest}}" | ||
with_items: | ||
- src: pbench-cm.yml.j2 | ||
dest: "{{ansible_user_dir}}/scale-ci-tooling/pbench-cm.yml" | ||
- src: pbench-ssh-secret.yml.j2 | ||
dest: "{{ansible_user_dir}}/scale-ci-tooling/pbench-ssh-secret.yml" | ||
- src: kubeconfig-secret.yml.j2 | ||
dest: "{{ansible_user_dir}}/scale-ci-tooling/kubeconfig-secret.yml" | ||
- src: workload-job.yml.j2 | ||
dest: "{{ansible_user_dir}}/scale-ci-tooling/workload-job.yml" | ||
- src: workload-env.yml.j2 | ||
dest: "{{ansible_user_dir}}/scale-ci-tooling/workload-concurrent-scale-up-down-env.yml" | ||
|
||
- name: Check if scale-ci-tooling namespace exists | ||
shell: | | ||
oc get project scale-ci-tooling | ||
ignore_errors: true | ||
changed_when: false | ||
register: scale_ci_tooling_ns_exists | ||
|
||
- name: Ensure any stale scale-ci-concurrent-scale-up-down job is deleted | ||
shell: | | ||
oc delete job scale-ci-concurrent-scale-up-down -n scale-ci-tooling | ||
register: scale_ci_tooling_project | ||
failed_when: scale_ci_tooling_project.rc == 0 | ||
until: scale_ci_tooling_project.rc == 1 | ||
retries: 60 | ||
delay: 1 | ||
when: scale_ci_tooling_ns_exists.rc == 0 | ||
|
||
- name: Block for non-existing tooling namespace | ||
block: | ||
- name: Create tooling namespace | ||
shell: | | ||
oc create -f {{ansible_user_dir}}/scale-ci-tooling/scale-ci-tooling-ns.yml | ||
|
||
- name: Create tooling service account | ||
shell: | | ||
oc create serviceaccount useroot -n scale-ci-tooling | ||
oc adm policy add-scc-to-user privileged -z useroot -n scale-ci-tooling | ||
when: enable_pbench_agents|bool | ||
when: scale_ci_tooling_ns_exists.rc != 0 | ||
|
||
- name: Create/replace kubeconfig secret | ||
shell: | | ||
oc replace --force -n scale-ci-tooling -f "{{ansible_user_dir}}/scale-ci-tooling/kubeconfig-secret.yml" | ||
|
||
- name: Create/replace the pbench configmap | ||
shell: | | ||
oc replace --force -n scale-ci-tooling -f "{{ansible_user_dir}}/scale-ci-tooling/pbench-cm.yml" | ||
|
||
- name: Create/replace pbench ssh secret | ||
shell: | | ||
oc replace --force -n scale-ci-tooling -f "{{ansible_user_dir}}/scale-ci-tooling/pbench-ssh-secret.yml" | ||
|
||
- name: Create/replace workload script configmap | ||
shell: | | ||
oc replace --force -n scale-ci-tooling -f "{{ansible_user_dir}}/scale-ci-tooling/workload-concurrent-scale-up-down-script-cm.yml" | ||
|
||
- name: Create/replace workload script environment configmap | ||
shell: | | ||
oc replace --force -n scale-ci-tooling -f "{{ansible_user_dir}}/scale-ci-tooling/workload-concurrent-scale-up-down-env.yml" | ||
|
||
- name: Create/replace workload job that runs workload script | ||
shell: | | ||
oc replace --force -n scale-ci-tooling -f "{{ansible_user_dir}}/scale-ci-tooling/workload-job.yml" | ||
|
||
- name: Poll until job pod is running | ||
shell: | | ||
oc get pods --selector=job-name=scale-ci-concurrent-scale-up-down -n scale-ci-tooling -o json | ||
register: pod_json | ||
retries: 60 | ||
delay: 2 | ||
until: pod_json.stdout | from_json | json_query('items[0].status.phase==`Running`') | ||
|
||
- name: Poll until job is complete | ||
shell: | | ||
oc get job scale-ci-concurrent-scale-up-down -n scale-ci-tooling -o json | ||
register: job_json | ||
retries: "{{job_completion_poll_attempts}}" | ||
delay: 10 | ||
until: job_json.stdout | from_json | json_query('status.succeeded==`1` || status.failed==`1`') | ||
failed_when: job_json.stdout | from_json | json_query('status.succeeded==`1`') == false | ||
when: job_completion_poll_attempts|int > 0 |
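
For readers less familiar with Ansible's `retries`/`delay`/`until` keywords, the two polling tasks in this playbook behave roughly like the shell loop below. This is a simplified sketch: `poll_until` is a hypothetical helper, and Ansible's exact attempt counting may differ by one depending on the version.

```sh
# poll_until RETRIES DELAY COMMAND...
# Rerun COMMAND until it succeeds or RETRIES attempts have been used.
poll_until() {
  retries=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$retries" ]; do
    if "$@"; then
      return 0
    fi
    attempt=$(( attempt + 1 ))
    sleep "$delay"
  done
  return 1
}

# Stand-in check that succeeds immediately; a real check would wrap
# the `oc get ... -o json` query used by the task.
poll_until 3 0 true && echo "condition met"
```

Under this reading, the "job pod is running" task corresponds to `poll_until 60 2 ...` and the "job is complete" task to `poll_until "$JOB_COMPLETION_POLL_ATTEMPTS" 10 ...`.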
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,107 @@ | ||
apiVersion: v1 | ||
kind: ConfigMap | ||
metadata: | ||
name: scale-ci-workload-script | ||
data: | ||
run.sh: | | ||
#!/bin/sh | ||
set -eo pipefail | ||
workload_log() { echo "$(date -u) $@" >&2; } | ||
export -f workload_log | ||
workload_log "Configuring pbench for Concurrent Scale Up Down" | ||
mkdir -p /var/lib/pbench-agent/tools-default/ | ||
echo "${USER_NAME:-default}:x:$(id -u):0:${USER_NAME:-default} user:${HOME}:/sbin/nologin" >> /etc/passwd | ||
if [ "${ENABLE_PBENCH_AGENTS}" = true ]; then | ||
echo "" > /var/lib/pbench-agent/tools-default/disk | ||
echo "" > /var/lib/pbench-agent/tools-default/iostat | ||
echo "workload" > /var/lib/pbench-agent/tools-default/label | ||
echo "" > /var/lib/pbench-agent/tools-default/mpstat | ||
echo "" > /var/lib/pbench-agent/tools-default/oc | ||
echo "" > /var/lib/pbench-agent/tools-default/perf | ||
echo "" > /var/lib/pbench-agent/tools-default/pidstat | ||
echo "" > /var/lib/pbench-agent/tools-default/sar | ||
master_nodes=`oc get nodes -l pbench_agent=true,node-role.kubernetes.io/master= --no-headers | awk '{print $1}'` | ||
for node in $master_nodes; do | ||
echo "master" > /var/lib/pbench-agent/tools-default/remote@$node | ||
done | ||
infra_nodes=`oc get nodes -l pbench_agent=true,node-role.kubernetes.io/infra= --no-headers | awk '{print $1}'` | ||
for node in $infra_nodes; do | ||
echo "infra" > /var/lib/pbench-agent/tools-default/remote@$node | ||
done | ||
worker_nodes=`oc get nodes -l pbench_agent=true,node-role.kubernetes.io/worker= --no-headers | awk '{print $1}'` | ||
for node in $worker_nodes; do | ||
echo "worker" > /var/lib/pbench-agent/tools-default/remote@$node | ||
done | ||
fi | ||
source /opt/pbench-agent/profile | ||
workload_log "Done configuring pbench Concurrent Scale Up Down" | ||
|
||
workload_log "Running Concurrent Scale Up Down workload" | ||
if [ "${PBENCH_INSTRUMENTATION}" = "true" ]; then | ||
pbench-user-benchmark -- sh /root/workload/workload.sh | ||
result_dir="/var/lib/pbench-agent/$(ls -t /var/lib/pbench-agent/ | grep "pbench-user" | head -1)"/1/sample1 | ||
if [ "${ENABLE_PBENCH_COPY}" = "true" ]; then | ||
pbench-copy-results --prefix ${CONCURRENT_SCALEUP_TEST_PREFIX} | ||
fi | ||
else | ||
sh /root/workload/workload.sh | ||
result_dir=/tmp | ||
fi | ||
|
||
workload_log "Completed Concurrent Scale Up Down workload run" | ||
|
||
workload_log "Checking Test Results" | ||
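# exit_code is only set inside workload.sh, so read it back from exit.json | ||
workload_log "Checking ansible-playbook scale_up_complete.yaml execution exit code : $(jq '.exit_code' ${result_dir}/exit.json)" | ||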
|
||
if [ "$(jq '.exit_code==0' ${result_dir}/exit.json)" = "false" ]; then | ||
workload_log "Concurrent Scale Up Down Failure" | ||
workload_log "Test Analysis: Failed" | ||
exit 1 | ||
fi | ||
# TODO: Check pbench-agent collected metrics for Pass/Fail | ||
# TODO: Check prometheus collected metrics for Pass/Fail | ||
workload_log "Test Analysis: Passed" | ||
|
||
workload.sh: | | ||
#!/bin/sh | ||
set -o pipefail | ||
|
||
result_dir=/tmp | ||
if [ "${PBENCH_INSTRUMENTATION}" = "true" ]; then | ||
result_dir=${benchmark_results_dir} | ||
fi | ||
|
||
# Save a local copy of the read-only ~/.kube/config | ||
cp ${KUBECONFIG} /tmp/kube_config | ||
ls -ltr /tmp | ||
export KUBECONFIG=/tmp/kube_config | ||
echo ${KUBECONFIG} | ||
|
||
# git clone svt repo in /root | ||
cd /root | ||
git clone https://github.com/openshift/svt.git | ||
cd svt | ||
git status | ||
cd /root/svt/openshift_performance/ci/content | ||
ls -ltr | ||
|
||
start_time=$(date +%s) | ||
my_time=$(date +%Y-%m-%d-%H%M) | ||
|
||
ansible-playbook -vv -e api_url=${CONCURRENT_SCALEUP_API_URL} -e login_username="kubeadmin" -e login_passwd=${CONCURRENT_SCALEUP_KUBEADMIN_PASSWD} scale_up_complete.yaml 2>&1 | tee /tmp/output_scale_up_complete-${my_time}.log | ||
Review comment: Think we need to parameterize the login username as well?
Reply: @wabouhamad Suggest updating the PR to use my new changes, which take advantage of the existing KUBECONFIG and avoid the need for login.
||
|
||
exit_code=$? | ||
end_time=$(date +%s) | ||
duration=$((end_time-start_time)) | ||
workload_log "Test duration was: ${duration}" | ||
|
||
workload_log "/tmp/cluster_loader.err : $(cat /tmp/cluster_loader.err)" | ||
workload_log "/tmp/cluster_loader.out : $(cat /tmp/cluster_loader.out)" | ||
workload_log "/tmp/check_app.err : $(cat /tmp/check_app.err)" | ||
workload_log "/tmp/check_app.out : $(cat /tmp/check_app.out)" | ||
workload_log "/tmp/scale_test.err : $(cat /tmp/scale_test.err)" | ||
workload_log "/tmp/scale_test.out : $(cat /tmp/scale_test.out)" | ||
|
||
workload_log "Writing ansible-playbook scale_up_complete.yaml execution exit code : ${exit_code}" | ||
jq -n '. | ."exit_code"='${exit_code}' | ."duration"='${duration}'' > "${result_dir}/exit.json" | ||
workload_log "Finished workload script" |
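
Note for reviewers: `workload.sh` reads `$?` right after a pipeline that ends in `tee`, so the captured exit code depends on `set -o pipefail` being in effect. The sketch below illustrates the difference, with `false` standing in for a failing `ansible-playbook` run. It assumes `bash` is available, which the scripts' use of `pipefail` and `export -f` already implies.

```sh
# Without pipefail the pipeline reports the status of the last command (tee).
without=$(bash -c 'false | tee /dev/null; echo $?')

# With pipefail the pipeline reports the failing command's status instead.
with=$(bash -c 'set -o pipefail; false | tee /dev/null; echo $?')

echo "without pipefail: ${without}, with pipefail: ${with}"
# prints "without pipefail: 0, with pipefail: 1"
```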
Review comment: How many nodes do we need at the minimum to run the workload?