clusteradm join fails with "Error: unexpected watch event received" #334

Open
nirs opened this issue May 1, 2023 · 3 comments

@nirs
Contributor

nirs commented May 1, 2023

We get this random error from time to time:

         command: ('clusteradm', 'join', '--hub-token', 'eyJhbGciOiJSUzI1NiIsImtpZCI6ImZ3WWtHcUM1S19IVnNrNXNsWDByY0NIdXNuVjhadXkwUGR1MXgzaEJuRTAifQ.eyJhdWQiOlsiaHR0cHM6Ly9rdWJlcm5ldGVzLmRlZmF1bHQuc3ZjLmNsdXN0ZXIubG9jYWwiXSwiZXhwIjoxNjgyOTU3MzQzLCJpYXQiOjE2ODI5NTM3NDMsImlzcyI6Imh0dHBzOi8va3ViZXJuZXRlcy5kZWZhdWx0LnN2Yy5jbHVzdGVyLmxvY2FsIiwia3ViZXJuZXRlcy5pbyI6eyJuYW1lc3BhY2UiOiJvcGVuLWNsdXN0ZXItbWFuYWdlbWVudCIsInNlcnZpY2VhY2NvdW50Ijp7Im5hbWUiOiJjbHVzdGVyLWJvb3RzdHJhcCIsInVpZCI6IjlkN2E3N2RhLTIzYWItNDcxMy1iYmFiLTdjOGE1NTVlMDQ3MyJ9fSwibmJmIjoxNjgyOTUzNzQzLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6b3Blbi1jbHVzdGVyLW1hbmFnZW1lbnQ6Y2x1c3Rlci1ib290c3RyYXAifQ.BglZLoyyvGKP8vUy7zV4y-SplMroRm0rLMnYUlr1VaZncOJOvAN_oAQfnFAnXtBJ558cjILjqpEZ8cXem0eXNXFY5inq38ArB6Out8jVrZXRvUR62NiBEkJYilT44-4C9Hmao2YxYPIgS461nH_sxHcVoRoaBQExAZcUepQQh7-punljPkK9Igz9cskkBpFY40VwD25Jtgxnt9pl2G7LDSiliAkU-C9dTDpA2Z7Nhm4UG7sHK9-d8d7RkNjw2LOzUwfW-5EXMaTCpjAFcvBqnH_0Hiv3dR5-u4Eb8HcLQogyFnTFnEJP8ZfvddS1YXkaq7S9dO04L3uAoikcaG1ZRQ', '--hub-apiserver', 'https://192.168.122.211:8443', '--cluster-name', 'dr2', '--wait', '--context', 'dr2')
         exitcode: 1
         error:
            Error: unexpected watch event received

I think it only happens on a less reliable network: a laptop running minikube clusters,
connected via a phone's cellular network.

Issues:

  • clusteradm should not fail because of unexpected events; it should log them and continue
    to wait for the expected events until the operation times out.
  • clusteradm must include the unexpected event details in the error message.
@mikeshng
Member

mikeshng commented May 2, 2023

This is probably related to the --wait flag because it needs to maintain that network connection to keep monitoring the progress.

To work around this issue for now, you can remove the --wait flag. The join command will then run asynchronously and should return successfully right away.

CC @ycyaoxdu

@nirs
Contributor Author

nirs commented May 2, 2023

@mikeshng How can we wait for completion without --wait? Do we have a condition or status we can wait for?

@mikeshng
Member

mikeshng commented May 2, 2023

@nirs The join command code appears to wait for the klusterlet registration agent pod to be running:
https://github.com/open-cluster-management-io/clusteradm/blob/main/pkg/cmd/join/exec.go#L398-L443
I think this is because the join command runs against the managed cluster context.

For a watch that is outside of the clusteradm join command, maybe it's better to:

  • run the join command on the managed cluster
  • watch for the registration agent pod running status on the managed cluster
  • switch context to the hub cluster
  • watch for the creation of the ManagedCluster resource on the hub cluster: kubectl get managedcluster cluster-name-here
  • watch for the creation of the CSR on the hub cluster: kubectl get csr -l open-cluster-management.io/cluster-name=cluster-name-here

When both the ManagedCluster and CSR resources are created on the hub cluster, that's when you know the join command ran successfully.
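
The hub-side part of the steps above could be scripted roughly as follows. This is only a sketch, not something taken from clusteradm itself: the helper names (`wait_for`, `csr_exists`, `join_async`) and the `WAIT_TIMEOUT` and `POLL_INTERVAL` variables are made up for illustration, and polling with a deadline is used instead of a long-lived watch so that a dropped connection just costs one retry. The registration agent pod check is left out because the thread does not give the pod's namespace or labels.

```shell
#!/bin/sh
# Sketch: join without --wait, then poll the hub until the ManagedCluster
# and the CSR exist. All helper names and variables here are hypothetical.

# Run a command repeatedly until it succeeds or the timeout (seconds) expires.
wait_for() {
    local timeout="$1"; shift
    local deadline=$(( $(date +%s) + timeout ))
    until "$@" >/dev/null 2>&1; do
        if [ "$(date +%s)" -ge "$deadline" ]; then
            echo "timed out waiting for: $*" >&2
            return 1
        fi
        sleep "${POLL_INTERVAL:-2}"
    done
}

# "kubectl get -l" exits 0 even when nothing matches, so test for output.
csr_exists() {
    [ -n "$(kubectl --context "$1" get csr \
        -l "open-cluster-management.io/cluster-name=$2" -o name 2>/dev/null)" ]
}

# Usage: join_async HUB_CONTEXT MANAGED_CONTEXT CLUSTER_NAME HUB_TOKEN HUB_APISERVER
join_async() {
    local hub_ctx="$1" managed_ctx="$2" name="$3" token="$4" apiserver="$5"
    local timeout="${WAIT_TIMEOUT:-300}"

    # Run the join on the managed cluster without --wait.
    clusteradm join --hub-token "$token" --hub-apiserver "$apiserver" \
        --cluster-name "$name" --context "$managed_ctx" || return 1

    # Switch to the hub context and wait for both resources to appear.
    wait_for "$timeout" kubectl --context "$hub_ctx" get managedcluster "$name" || return 1
    wait_for "$timeout" csr_exists "$hub_ctx" "$name"
}
```

With the values from the original report this would be called as something like `join_async hub dr2 dr2 "$token" https://192.168.122.211:8443`, where `hub` is an assumed name for the hub cluster's kubeconfig context.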

mresvanis added a commit to mresvanis/kernel-module-management that referenced this issue May 9, 2023
This change updates the CI e2e-hub test to wait for the `clusteradm join`
command to succeed asynchronously. Waiting for the latter synchronously
sometimes leads to errors like the following:

```log
+ clusteradm join --hub-token *** --hub-apiserver \
    https://192.168.49.2:8443 --wait --cluster-name minikube
  Error: no matches for kind "Klusterlet" in version "operator.open-cluster-management.io/v1"
```

It also removes the unneeded `curl` installation and the
`apt-get upgrade` that could lead to more reliability issues on the CI
e2e steps.

open-cluster-management-io/clusteradm#334 (comment)

Signed-off-by: Michail Resvanis <mresvani@redhat.com>
qbarrand pushed a commit to kubernetes-sigs/kernel-module-management that referenced this issue May 10, 2023