node 259 can't provision deployments #2496

Open
Omarabdul3ziz opened this issue Nov 20, 2024 · 4 comments

Labels: type_bug (Something isn't working)

Comments

@Omarabdul3ziz
Contributor

Describe the bug

Deploying a zdb workload fails with a timeout after about 10 minutes.

To Reproduce

Try to deploy a zdb workload on node 259 on devnet.

Screenshots

TimeoutError: TimeoutError: Deployment with contract_id: 172829 failed to be ready after 10 minutes.
Omarabdul3ziz self-assigned this Nov 20, 2024
Omarabdul3ziz added the type_bug (Something isn't working) label Nov 20, 2024
Omarabdul3ziz added this to the 3.13 milestone Nov 20, 2024
@Omarabdul3ziz
Contributor Author

I tried to reproduce this on a virtual node, and here is what happened:

  • the deployment appears to have succeeded: I got the Mycelium (myc) IP and managed to connect to it
  • however, the node logs show this:
    [screenshot: node log output]
    even though the file already exists:
    [screenshot: the file present on the node]

But when I tried to deploy on node 259, the workload never became ready, even after about 15 minutes.
The socket file also seems to exist, and the zdb instance is running:
[screenshot: socket file and running zdb instance]
But I couldn't tell what is happening there because the debug logs are missing.

I will try changing the log level there to figure out what is going on.

@Omarabdul3ziz
Contributor Author

Update

  • It is not a zdb issue; it is a node issue.
    Not only zdb deployments fail: deploying networks/VMs also fails after waiting about 10 minutes. The contract is created on the chain, but the node never starts the deployment.
  • It is not a global zos4 issue.
    Other devnet zos4 nodes (249 and my local node) work fine; only this node doesn't.
  • It is also not specific to one client; it happens with both the Go and TS clients.

I don't see anything more we can debug in the node's current state.

Debug = "zos-debug"

This is a kernel flag that enables debug logs, but setting it requires rebooting the node.
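As a rough illustration of why the flag needs a reboot: kernel flags live on the kernel command line, which is fixed at boot and can be read on Linux from /proc/cmdline. The Go sketch below assumes zos-debug is passed either as a bare token or as a key=value pair; hasKernelFlag is a hypothetical helper, not the parser zos actually uses.

```go
package main

import (
	"fmt"
	"strings"
)

// hasKernelFlag reports whether flag appears on a kernel command line,
// either as a bare token ("zos-debug") or as a key=value pair ("zos-debug=1").
// Assumption: this mirrors how the flag is passed; zos's own parser may differ.
func hasKernelFlag(cmdline, flag string) bool {
	for _, field := range strings.Fields(cmdline) {
		// Compare only the key part so "zos-debug=1" matches but "zos-debugger" does not.
		key, _, _ := strings.Cut(field, "=")
		if key == flag {
			return true
		}
	}
	return false
}

func main() {
	const debugFlag = "zos-debug" // value of the Debug constant quoted in this comment
	before := "console=ttyS0 quiet"
	after := before + " " + debugFlag
	fmt.Println(hasKernelFlag(before, debugFlag), hasKernelFlag(after, debugFlag))
}
```

Since the command line is only read at boot, adding the flag to a running node has no effect until it restarts.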

Omarabdul3ziz changed the title from "zdb failed to start on zos4 node" to "node 259 can't provision deployments" Nov 21, 2024
@Omarabdul3ziz
Contributor Author

I just noticed this in the contracts list:
[screenshot: contracts list]

The same code created deployments of different types:

  • vm on node 259, which fails
  • vm-light on node 249, which succeeded

So either:

  • the node reports itself as zos4 but is not, or
  • both clients sent zos3 workload types to the node even though it is zos4

The node appears to be running zos4 correctly:
[screenshot: node running zos4]
and it also reports the light features: https://gridproxy.dev.grid.tf/nodes/259
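One way to picture the two hypotheses is a client that reads the node's advertised features and picks vm-light or vm accordingly. The Go sketch below assumes the gridproxy node JSON carries a features array and that a zmachine-light entry marks light support. Both the field name and the feature string are assumptions, and vmWorkloadType is hypothetical, not the Go or TS client's actual logic.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// nodeInfo holds only the field this sketch needs. The "features" key is an
// assumption, not a confirmed gridproxy schema.
type nodeInfo struct {
	Features []string `json:"features"`
}

// vmWorkloadType returns "vm-light" when the node advertises the (assumed)
// "zmachine-light" feature and "vm" otherwise.
func vmWorkloadType(nodeJSON []byte) (string, error) {
	var n nodeInfo
	if err := json.Unmarshal(nodeJSON, &n); err != nil {
		return "", err
	}
	for _, f := range n.Features {
		if f == "zmachine-light" {
			return "vm-light", nil
		}
	}
	return "vm", nil
}

func main() {
	light := []byte(`{"features": ["zmachine-light", "network-light"]}`)
	full := []byte(`{"features": ["zmachine", "network"]}`)
	for _, doc := range [][]byte{light, full} {
		wt, err := vmWorkloadType(doc)
		if err != nil {
			panic(err)
		}
		fmt.Println(wt)
	}
}
```

Under this model, a client that skipped the feature check or used stale node data would send vm to a zos4 node, which would match the second hypothesis.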

more debugging next week :D

@Eslam-Nawara
Contributor

I tried to deploy on the same node and hit the same problem: it couldn't complete any deployment. It always timed out, with no log showing the cause.

After restarting provisiond, I was able to successfully deploy a VM of type vm-light with a network-light network on the node, and it is accessible through Mycelium:

[screenshot: vm-light deployment reachable over Mycelium]

I think this means the client sent a valid deployment with valid metadata, but I'm still not sure what caused the earlier deployment to be of type vm instead of vm-light.

Maybe an old version of the client was used, or a client other than the Go client.

I agree with Omar that we need to reboot the node with the debug flag to get more information about why the node can't provision.

