Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add feature auto-heal action #120

Open
wants to merge 3 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
48 changes: 48 additions & 0 deletions .github/workflows/docker-publish.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
#
name: Create and publish a Docker image

# Configures this workflow to run every time a change is pushed to the branch called `release`.
on:
push:
branches: ['main']

# Defines two custom environment variables for the workflow. These are used for the Container registry domain, and a name for the Docker image that this workflow builds.
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}

# There is a single job in this workflow. It's configured to run on the latest available version of Ubuntu.
jobs:
build-and-push-image:
runs-on: ubuntu-latest
# Sets the permissions granted to the `GITHUB_TOKEN` for the actions in this job.
permissions:
contents: read
packages: write
#
steps:
- name: Checkout repository
uses: actions/checkout@v4
# Uses the `docker/login-action` action to log in to the Container registry registry using the account and password that will publish the packages. Once published, the packages are scoped to the account defined here.
- name: Log in to the Container registry
uses: docker/login-action@65b78e6e13532edd9afa3aa52ac7964289d1a9c1
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
# This step uses [docker/metadata-action](https://github.com/docker/metadata-action#about) to extract tags and labels that will be applied to the specified image. The `id` "meta" allows the output of this step to be referenced in a subsequent step. The `images` value provides the base name for the tags and labels.
- name: Extract metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
# This step uses the `docker/build-push-action` action to build the image, based on your repository's `Dockerfile`. If the build succeeds, it pushes the image to GitHub Packages.
# It uses the `context` parameter to define the build's context as the set of files located in the specified path. For more information, see "[Usage](https://github.com/docker/build-push-action#usage)" in the README of the `docker/build-push-action` repository.
# It uses the `tags` and `labels` parameters to tag and label the image with the output from the "meta" step.
- name: Build and push Docker image
uses: docker/build-push-action@f2a1d5e99d037542a71f64918e516c093c6f3fc4
with:
context: .
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
50 changes: 0 additions & 50 deletions .github/workflows/github-build.yml

This file was deleted.

5 changes: 3 additions & 2 deletions Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -15,12 +15,13 @@ ENV AUTOHEAL_CONTAINER_LABEL=autoheal \
WEBHOOK_URL="" \
WEBHOOK_JSON_KEY="content" \
APPRISE_URL="" \
POST_RESTART_SCRIPT=""
POST_ACTION_SCRIPT=""\
AUTOHEAL_ACTION="RESTART"

COPY docker-entrypoint /

HEALTHCHECK --interval=5s CMD pgrep -f autoheal || exit 1

ENTRYPOINT ["/docker-entrypoint"]

CMD ["autoheal"]
CMD ["autoheal"]
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -58,6 +58,7 @@ AUTOHEAL_CONTAINER_LABEL=autoheal
AUTOHEAL_INTERVAL=5 # check every 5 seconds
AUTOHEAL_START_PERIOD=0 # wait 0 seconds before first health check
AUTOHEAL_DEFAULT_STOP_TIMEOUT=10 # Docker waits max 10 seconds (the Docker default) for a container to stop before killing during restarts (container overridable via label, see below)
AUTOHEAL_ACTION=RESTART # Action to process unhealthy containers ('RESTART' or 'STOP')
DOCKER_SOCK=/var/run/docker.sock # Unix socket for curl requests to Docker API
CURL_TIMEOUT=30 # --max-time seconds for curl requests to Docker API
WEBHOOK_URL="" # post message to the webhook if a container was restarted (or restart failed)
Expand Down
61 changes: 47 additions & 14 deletions docker-entrypoint
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@ AUTOHEAL_CONTAINER_LABEL=${AUTOHEAL_CONTAINER_LABEL:-autoheal}
AUTOHEAL_START_PERIOD=${AUTOHEAL_START_PERIOD:-0}
AUTOHEAL_INTERVAL=${AUTOHEAL_INTERVAL:-5}
AUTOHEAL_DEFAULT_STOP_TIMEOUT=${AUTOHEAL_DEFAULT_STOP_TIMEOUT:-10}
AUTOHEAL_ACTION=${AUTOHEAL_ACTION:-RESTART}

docker_curl() {
curl --max-time "${CURL_TIMEOUT}" --no-buffer -s \
Expand Down Expand Up @@ -61,6 +62,46 @@ restart_container() {
docker_curl -f -X POST "${HTTP_ENDPOINT}/containers/${container_id}/restart?t=${timeout}"
}

remove_container(){
local container_id="$1"
local timeout="$2"

docker_curl -f -X POST "${HTTP_ENDPOINT}/containers/${container_id}/stop?t=${timeout}"
docker_curl -f -X DEL "${HTTP_ENDPOINT}/containers/${container_id}?force=true"
sleep 10s
}

manage_unhealthy_container(){

if [ "$AUTOHEAL_ACTION" = "STOP" ]
then

echo "$DATE Container $CONTAINER_NAME (${CONTAINER_SHORT_ID}) found to be unhealthy - Stopping container now"
if ! remove_container "$CONTAINER_ID" "$TIMEOUT"
then
echo "$DATE Stopping container $CONTAINER_SHORT_ID failed" >&2
notify_webhook "Container ${CONTAINER_NAME:1} (${CONTAINER_SHORT_ID}) found to be unhealthy. Failed to stop the container!" &
else
notify_webhook "Container ${CONTAINER_NAME:1} (${CONTAINER_SHORT_ID}) found to be unhealthy. Successfully stopped the container!" &
fi
notify_post_action_script "$CONTAINER_NAME" "$CONTAINER_SHORT_ID" "$CONTAINER_STATE" &

else

echo "$DATE Container $CONTAINER_NAME (${CONTAINER_SHORT_ID}) found to be unhealthy - Restarting container now with ${TIMEOUT}s timeout"
if ! restart_container "$CONTAINER_ID" "$TIMEOUT"
then
echo "$DATE Restarting container $CONTAINER_SHORT_ID failed" >&2
notify_webhook "Container ${CONTAINER_NAME:1} (${CONTAINER_SHORT_ID}) found to be unhealthy. Failed to restart the container!" &
else
notify_webhook "Container ${CONTAINER_NAME:1} (${CONTAINER_SHORT_ID}) found to be unhealthy. Successfully restarted the container!" &
fi
notify_post_action_script "$CONTAINER_NAME" "$CONTAINER_SHORT_ID" "$CONTAINER_STATE" "$TIMEOUT" &

fi

}

notify_webhook() {
local text="$@"

Expand All @@ -77,11 +118,11 @@ notify_webhook() {
fi
}

notify_post_restart_script() {
if [ -n "$POST_RESTART_SCRIPT" ]
notify_post_action_script() {
if [ -n "$POST_ACTION_SCRIPT" ]
then
# execute post restart script as background process to prevent healer from blocking
$POST_RESTART_SCRIPT "$@" &
$POST_ACTION_SCRIPT "$@" &
fi
}

Expand Down Expand Up @@ -141,20 +182,12 @@ then

if [ "$CONTAINER_NAME" = "null" ]
then
echo "$DATE Container name of (${CONTAINER_SHORT_ID}) is null, which implies container does not exist - don't restart" >&2
echo "$DATE Container name of (${CONTAINER_SHORT_ID}) is null, which implies container does not exist - no action" >&2
elif [ "$CONTAINER_STATE" = "restarting" ]
then
echo "$DATE Container $CONTAINER_NAME (${CONTAINER_SHORT_ID}) found to be restarting - don't restart"
echo "$DATE Container $CONTAINER_NAME (${CONTAINER_SHORT_ID}) found to be restarting - no action"
else
echo "$DATE Container $CONTAINER_NAME (${CONTAINER_SHORT_ID}) found to be unhealthy - Restarting container now with ${TIMEOUT}s timeout"
if ! restart_container "$CONTAINER_ID" "$TIMEOUT"
then
echo "$DATE Restarting container $CONTAINER_SHORT_ID failed" >&2
notify_webhook "Container ${CONTAINER_NAME:1} (${CONTAINER_SHORT_ID}) found to be unhealthy. Failed to restart the container!" &
else
notify_webhook "Container ${CONTAINER_NAME:1} (${CONTAINER_SHORT_ID}) found to be unhealthy. Successfully restarted the container!" &
fi
notify_post_restart_script "$CONTAINER_NAME" "$CONTAINER_SHORT_ID" "$CONTAINER_STATE" "$TIMEOUT" &
manage_unhealthy_container
fi
done
sleep "$AUTOHEAL_INTERVAL" &
Expand Down