[POC] Developer Documentation Revamp using mkdocs #14179

Closed · wants to merge 10 commits
39 changes: 39 additions & 0 deletions .github/workflows/dev_docs.yml
@@ -0,0 +1,39 @@
name: Development Docs

on:
pull_request:
push:

jobs:
mkdocs:
runs-on: ubuntu-latest
name: mkdocs
steps:
- uses: actions/checkout@v2

- uses: actions/setup-python@v2
with:
python-version: "3.8"

- name: Install Dependencies
run: |
pip install -r requirements/requirements_devdocs.txt


- name: Lint the files
run: |
cd docs
errors=0
for file in `find -type f -name '*.md'`; do
./lint_awx_doc.py $file || errors=$((errors+1))
done
if [[ $errors -ne 0 ]]; then
echo "Encountered $errors errors"
exit 1
fi

- name: Build the site
run: |
mkdocs build --clean

# TODO: Deploy the site somewhere (gh pages?)
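The `lint_awx_doc.py` script invoked by the lint step is part of this PR, and its actual checks are not shown here. As an illustrative sketch only, assuming the rule being enforced is that every doc carries a YAML front-matter block with a `tags:` key (as the migrated docs below do), such a checker could look like:

```python
#!/usr/bin/env python3
"""Hypothetical sketch of a per-file doc linter, not the actual
lint_awx_doc.py from this PR. Exits nonzero if the markdown file
lacks front matter or the front matter has no `tags:` key."""
import re
import sys

def lint(path):
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # Front matter is delimited by '---' lines at the top of the file.
    match = re.match(r"^---\n(.*?)\n---\n", text, re.DOTALL)
    if not match:
        print(f"{path}: missing front matter")
        return False
    if not re.search(r"^tags:", match.group(1), re.MULTILINE):
        print(f"{path}: front matter has no 'tags:' key")
        return False
    return True

if __name__ == "__main__":
    sys.exit(0 if lint(sys.argv[1]) else 1)
```

The workflow's shell loop then simply counts nonzero exits across all `*.md` files and fails the job if any were seen.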
Member commented:
Reading up on the limitations of this... GitHub Pages seems to want an index.html at the root level. Since this would be an artifact of the build process, we probably don't want to keep that in source ourselves. Then the URL would be somewhat limited, as we could have maybe http://ansible.github.io/awx, which would maybe be okay, maybe not.

Given what other projects are doing...

I would just assume we go with readthedocs.io

3 changes: 3 additions & 0 deletions .gitignore
@@ -165,3 +165,6 @@ use_dev_supervisor.txt

awx/ui_next/src
awx/ui_next/build

# Docs build
/site/
8 changes: 8 additions & 0 deletions docs/_tags.md
@@ -0,0 +1,8 @@
---
title: Index
---

For convenience, the AWX developer documentation is categorized into tags.
You can find a list of tags, and the documents associated with them, here.

[TAGS]
124 changes: 64 additions & 60 deletions docs/clustering.md → docs/architecture/clustering.md
@@ -1,53 +1,56 @@
---
tags:
- architecture
- internals
---
# AWX Clustering Overview

AWX supports multi-node configurations. Here is an example configuration with two control plane nodes.

```
┌───────────────────────────┐
│ Load-balancer │
│ (configured separately) │
└───┬───────────────────┬───┘
│ round robin API │
▼ requests ▼

AWX Control AWX Control
Node 1 Node 2
┌──────────────┐ ┌──────────────┐
│ │ │ │
│ ┌──────────┐ │ │ ┌──────────┐ │
│ │ awx-task │ │ │ │ awx-task │ │
│ ├──────────┤ │ │ ├──────────┤ │
│ │ awx-ee │ │ │ │ awx-ee │ │
│ ├──────────┤ │ │ ├──────────┤ │
│ │ awx-web │ │ │ │ awx-web │ │
│ ├──────────┤ │ │ ├──────────┤ │
│ │ redis │ │ │ │ redis │ │
│ └──────────┘ │ │ └──────────┘ │
│ │ │ │
└──────────────┴─────┬─────┴──────────────┘
┌─────▼─────┐
│ Postgres │
│ database │
└───────────┘
```

```mermaid
flowchart TB
lb("Load-balancer (configured separately)")

subgraph node1[AWX Control Node 1]
t1("awx-task")
e1("awx-ee")
w1("awx-web")
r1("redis")
end

subgraph node2[AWX Control Node 2]
t2("awx-task")
e2("awx-ee")
w2("awx-web")
r2("redis")
end

subgraph Postgres
p("database")
end

lb --> node1
lb --> node2
node1 --> p
node2 --> p
```

There are two main deployment types: virtual machines (VM) and Kubernetes (K8S). Ansible Automation Platform (AAP) can be installed via VM or K8S deployments. The upstream AWX project can only be installed via a K8S deployment. Either deployment type supports cluster scaling.

- Control plane nodes run a number of background services that are managed by supervisord
- dispatcher
- wsbroadcast
- callback receiver
- receptor (*managed under systemd)
- redis (*managed under systemd)
- uwsgi
- daphne
- rsyslog
- For K8S deployments, these background processes are containerized
- `awx-ee`: receptor
- `awx-web`: uwsgi, daphne, wsbroadcast, rsyslog
- `awx-task`: dispatcher, callback receiver
- `redis`: redis
- Each control node is monolithic and contains all the necessary components for handling API requests and running jobs.
- A load balancer in front of the cluster can handle incoming web requests and send them to control nodes based on load-balancing rules (e.g. round robin)
- All control nodes in the cluster interact with a single, shared Postgres database
@@ -61,10 +64,11 @@ For K8s deployments, scaling up is handled by changing the number of replicas in
After scaling up, the new control plane node is registered in the database as a new `Instance`.

Instance types:

- `hybrid` (AAP only) - control plane node that can also run jobs
- `control` - control plane node that cannot run jobs
- `execution` - not a control node, this instance can only run jobs
- `hop` (AAP only) - not a control node, this instance serves to route traffic from control nodes to execution nodes

Note that hybrid (AAP only) and control nodes are identical other than the `type` indicated in the database. `control`-type nodes still have all the machinery to run jobs, but are disabled through the API. The reason is that users may wish to provision control nodes with fewer hardware resources, and have a separate fleet of nodes to run jobs (i.e. execution nodes).
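The four instance types differ in just two capabilities: whether a node participates in the control plane and whether it runs jobs. A tiny helper (purely illustrative, not part of AWX) makes that mapping explicit:

```python
def node_capabilities(node_type):
    """Map an AWX instance node_type to its capabilities, as described above.

    Hypothetical helper for illustration only; 'hybrid' and 'hop' are
    AAP-only types. A hop node neither controls nor runs jobs; it only
    routes traffic from control nodes to execution nodes.
    """
    caps = {
        "hybrid": {"control": True, "run_jobs": True},
        "control": {"control": True, "run_jobs": False},
        "execution": {"control": False, "run_jobs": True},
        "hop": {"control": False, "run_jobs": False},
    }
    return caps[node_type]
```

Note that per the paragraph above, a `control` node reports `run_jobs: False` only because jobs are disabled through the API, not because the machinery is absent.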

@@ -137,27 +141,27 @@ Node health is determined by the `cluster_node_heartbeat`. This is a periodic task:

1. Get a list of instances registered to the database.
2. `inspect_execution_nodes` looks at each execution node
1. get a DB advisory lock so that only a single control plane node runs this inspection at a given time.
2. set `last_seen` based on Receptor's own heartbeat system
- Each node on the Receptor mesh sends advertisements out to other nodes. The `Time` field in this payload can be used to set `last_seen`
3. use `receptorctl status` to gather node information advertised on the Receptor mesh
4. run `execution_node_health_check`
- This is an async task submitted to the dispatcher and attempts to run `ansible-runner --worker-info` against that node
- This command will return important information about the node's hardware resources like CPU cores, total memory, and ansible-runner version
- This information will be used to calculate capacity for that instance
3. Determine if other nodes are lost based on the `last_seen` value determined in step 2
1. `grace_period = settings.CLUSTER_NODE_HEARTBEAT_PERIOD * settings.CLUSTER_NODE_MISSED_HEARTBEAT_TOLERANCE`
2. if `last_seen` is before this grace period, mark instance as lost
4. Determine if *this* node is lost and run `local_health_check`
1. call `get_cpu_count` and `get_mem_in_bytes` directly from ansible-runner, which is what `ansible-runner --worker-info` calls under the hood
5. If *this* instance was not found in the database, register it
6. Compare *this* node's ansible-runner version with that of other instances
1. if this version is older, call `stop_local_services` which shuts down itself
7. For other instances marked as lost (step 3)
1. reap running, pending, and waiting jobs on that instance (mark them as failed)
2. delete instance from DB instance list
8. `cluster_node_heartbeat` is called from the dispatcher, and the dispatcher parent process passes `worker_tasks` data to this method
1. reap local jobs that are not active (that is, no dispatcher worker is actively processing it)
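The grace-period check in step 3 amounts to the following sketch. The setting values shown are example numbers, and this is a simplified illustration, not the actual AWX code:

```python
from datetime import datetime, timedelta, timezone

# Example values; the real ones come from Django settings
# (settings.CLUSTER_NODE_HEARTBEAT_PERIOD and
#  settings.CLUSTER_NODE_MISSED_HEARTBEAT_TOLERANCE).
CLUSTER_NODE_HEARTBEAT_PERIOD = 60          # seconds between heartbeats
CLUSTER_NODE_MISSED_HEARTBEAT_TOLERANCE = 2  # heartbeats a node may miss

def is_lost(last_seen, now=None):
    """An instance is lost if last_seen is older than the grace period."""
    now = now or datetime.now(timezone.utc)
    grace_period = timedelta(
        seconds=CLUSTER_NODE_HEARTBEAT_PERIOD
        * CLUSTER_NODE_MISSED_HEARTBEAT_TOLERANCE
    )
    return last_seen < now - grace_period
```

With these example values a node is marked lost once it has been silent for more than 120 seconds, after which its jobs are reaped and it is removed from the instance list as described in step 7.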

## Instance groups

@@ -1,3 +1,7 @@
---
tags:
- architecture
---
# Execution Environments

All jobs use container isolation for environment consistency and security.
9 changes: 7 additions & 2 deletions docs/websockets.md → docs/architecture/websockets.md
@@ -1,3 +1,8 @@
---
tags:
- architecture
- internals
---
# Websockets

AWX uses websockets to update the UI in realtime as events happen within the
@@ -50,7 +55,7 @@ The notable modules for this component are:

## Before the web/task split

<img src="img/websockets-old.png">
![Old Design](../img/websockets-old.png)

Consider a Kubernetes deployment of AWX. Before the web task split, each pod had
a web container, a task container, and a redis container (and possibly others,
@@ -73,7 +78,7 @@ own pod-local Redis for django-channels to process it.

## Current Implementation

<img src="img/websockets-new.png">
![New Design](../img/websockets-new.png)

In the post web/task split world, web and task containers live in entirely
independent pods, each with their own Redis. The former `wsbroadcast` has been
10 changes: 8 additions & 2 deletions docs/tower_configuration.md → docs/configuration.md
@@ -1,8 +1,14 @@
---
hide:
- toc
tags:
- development
---
AWX configuration gives users the ability to adjust multiple runtime parameters of AWX, enabling much more fine-grained control over AWX runs.

## Usage manual

#### To Use:
#### To Use
The REST endpoint for CRUD operations against AWX configurations can be found at `/api/v2/settings/`. A GET to that endpoint will return a list of available AWX configuration categories and their URLs, such as `"system": "/api/v2/settings/system/"`. The URL given for each category is the endpoint for CRUD operations against individual settings under that category.

Here is a typical AWX configuration category GET response:
@@ -29,7 +35,7 @@ X-API-Time: 0.026s

The returned body is a JSON object of key-value pairs, where the key is the name of the AWX configuration setting and the value is the value of that setting. To update the settings, simply update setting values and PUT/PATCH to the same endpoint.
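That read-modify-write flow can be sketched with any HTTP client. In this sketch the host and auth header are placeholders for your deployment, and the helpers are illustrative rather than AWX client code:

```python
import json
import urllib.request

def settings_url(awx_url, category):
    """Endpoint for CRUD against one settings category, e.g. 'system'."""
    return f"{awx_url}/api/v2/settings/{category}/"

def patch_settings(awx_url, category, changes, auth_header):
    """PATCH a subset of settings in a category.

    Only the keys present in `changes` are updated; unlisted settings
    keep their current values. Host and credentials are placeholders.
    """
    req = urllib.request.Request(
        settings_url(awx_url, category),
        data=json.dumps(changes).encode(),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": auth_header,
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

A PUT to the same URL works analogously but replaces the whole category, so PATCH is the safer choice when changing a single setting.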

#### To Develop:
#### To Develop
Each Django app in AWX should have a `conf.py` file where related settings get registered. Below is the general format for `conf.py`:

```python
60 changes: 0 additions & 60 deletions docs/container_groups.md

This file was deleted.

42 changes: 0 additions & 42 deletions docs/container_groups/service-account.yml

This file was deleted.

1 change: 0 additions & 1 deletion docs/credentials/README.md

This file was deleted.

4 changes: 4 additions & 0 deletions docs/credentials/credential_plugins.md
@@ -1,3 +1,7 @@
---
tags:
- credentials
---
Credential Plugins
==================

4 changes: 4 additions & 0 deletions docs/credentials/custom_credential_types.md
@@ -1,3 +1,7 @@
---
tags:
- credentials
---
Custom Credential Types Overview
================================
