Service start timeout inside container servercore:ltsc2019 since v0.17.0 #946

Closed
gillg opened this issue Feb 21, 2022 · 24 comments

@gillg
Contributor

gillg commented Feb 21, 2022

Hello,

While testing an upgrade, I discovered that versions >= 0.17.0 can no longer start as a service inside a container.
Launched from the CLI it works, but as a service it fails.

I strongly suspect #863: although IsWindowsService is the recommended practice, it seems to have been rewritten fairly recently in the Golang codebase because of bugs.

So a Golang update might solve the issue; otherwise, a workaround to "force" service mode should be considered.
The OpenTelemetry Collector project has a similar trick: https://github.com/open-telemetry/opentelemetry-collector/blob/7ed3f75ef84d9e9d11b175a0859060f765faca0b/docs/troubleshooting.md#startup-failing-in-windows-docker-containers which is used here: https://github.com/open-telemetry/opentelemetry-collector/blob/4439e9b49c4de55bdc050ee4928b5b0c79c317cb/cmd/builder/internal/builder/templates/main_windows.go.tmpl#L32

@breed808
Contributor

I strongly suspect #863: although IsWindowsService is the recommended practice, it seems to have been rewritten fairly recently in the Golang codebase because of bugs.

Do you know which Golang release(s) contains these re-writes? I think it would be worth testing newer Golang versions to see if this is fixed. If not, checking an environment variable similar to the opentelemetry-collector link you provided would be an option.

@gillg
Contributor Author

gillg commented Mar 12, 2022

I would say, why not test the latest Golang release? I found the commits some time ago, but they are probably spread across several releases.
Any concerns about using the latest Golang version?

@breed808
Contributor

breed808 commented Mar 12, 2022

I don't have a problem with that, but it should be tested against this issue.

I'm not able to run the container with my current setup, would you be OK to build the image using the latest Golang version and test?

@gillg
Contributor Author

gillg commented Mar 23, 2022

@breed808 I can test it, but I haven't managed to build it so far. Are there any prerequisites to install before running the makefile?

@breed808
Contributor

breed808 commented Apr 6, 2022

You'll need promu installed to build the executable via the makefile. Building the image will also require Docker or a substitute like Podman.

@LoSunny

LoSunny commented May 8, 2022

I've tried building it with the latest Go version, v1.18.1, but I am still unable to run it inside Docker.
Here is the repo: https://github.com/LoSunny/windows_exporter
I've modified the build script in GitHub Actions, as those actions won't complete with the default settings.
The build artifacts can be downloaded here: https://github.com/LoSunny/windows_exporter/releases/tag/vtest
The error is the same as in issue #962 mentioned above.

@breed808
Contributor

I've tried building it with the latest Go version, v1.18.1, but I am still unable to run it inside Docker.

To clarify, is the exporter unable to run as a service or via a CLI command?

@hpoznanski

hpoznanski commented May 25, 2022

via cli:
C:>windows_exporter.exe
time="2022-05-25T15:26:21Z" level=fatal msg="CreateObject SWbemLocator error: Invalid class string" source="exporter.go:254"

I tried running versions down to 0.14 in a powershell container (lts-nanoserver-1809), and unfortunately I get the same error in every case. Running from the CLI: .\windows_exporter.exe

@gillg
Contributor Author

gillg commented Sep 21, 2022

Hello,

Sorry for the long delay...
Using v0.19.0, I can launch it from the CLI without any problem.

PS C:\ChocoTests> & '.\windows_exporter_amd64.exe'
time="2022-09-21T09:59:59+02:00" level=warning msg="No where-clause specified for service collector. This will generate a very large number of metrics!" source="service.go:48"
time="2022-09-21T09:59:59+02:00" level=info msg="Running as User Manager\\ContainerAdministrator" source="exporter.go:355"
time="2022-09-21T09:59:59+02:00" level=warning msg="Running as a preconfigured Windows Container user. This may mean you do not have Windows HostProcess containers configured correctly and some functionality will not work as expected." source="exporter.go:357"
time="2022-09-21T09:59:59+02:00" level=info msg="Enabled collectors: logical_disk, net, os, service, system, textfile, cpu, cs" source="exporter.go:360" 
time="2022-09-21T09:59:59+02:00" level=info msg="Starting windows_exporter (version=0.19.0, branch=heads/tags/v0.19.0, revision=752d467b123798309c5a57c8b7d47267f2f46565)" source="exporter.go:412"
time="2022-09-21T09:59:59+02:00" level=info msg="Build context (go=go1.18.3, user=runneradmin@fv-az282-285, date=20220723-09:43:37)" source="exporter.go:413"
time="2022-09-21T09:59:59+02:00" level=info msg="Starting server on :9182" source="exporter.go:416"
time="2022-09-21T09:59:59+02:00" level=info msg="TLS is disabled." source="gokit_adapter.go:38"

But if I start the installed service, the process seems to crash at startup:

The Windows_Exporter service failed to start due to the following error: %%1053
A timeout was reached (30000 milliseconds) while waiting for the Windows_Exporter service to connect.

@hpoznanski Using a "nanoserver" image is probably not a good idea, and in my experience Windows version 1809 is far from perfect.
The good containers from Microsoft start at ltsc2019.

@breed808
Contributor

@gillg is the container and/or node under load when starting as a service? While unlikely, it could be related to the timeout issue in #551.

@gillg
Contributor Author

gillg commented Sep 24, 2022

@breed808 Not at all. I just manually start a vanilla container, then download and run windows_exporter in it.
From the CLI there is no issue; as a service, the service times out (because the exporter never really starts).

@breed808
Contributor

breed808 commented Oct 2, 2022

Thanks for the info. We may need to document the issue with running the service in a container; I don't use Windows containers so I wouldn't be able to debug this issue.

@gillg
Contributor Author

gillg commented Oct 18, 2022

Ah! I thought it was related to the Golang framework itself, but while working on something else I just discovered that it is part of the "x/sys" package: https://pkg.go.dev/golang.org/x/sys@v0.1.0/windows/svc
So, because we are targeting a 0.0.0-snapshot version, we should bump the dependency to v0.1.0!

I bet it will solve the issue! :)

@gillg
Contributor Author

gillg commented Oct 18, 2022

@breed808 I can finally take some time to look deeper. So far my tests are not conclusive, but at least I'm able to build and launch windows_exporter in a container!
I'll keep you posted.

@gillg
Contributor Author

gillg commented Oct 18, 2022

OK... I thought IsWindowsService() was used, but I just discovered I was completely off track!
It was removed entirely by #1046.
@jammiemil, any thoughts on this issue in relation to your change?

To summarize: if you launch windows_exporter inside a container from the CLI it works, but if you launch it as a service it crashes, times out, or never starts (hard to say precisely).

EDIT: my mistake, your PR was never merged, but you do have a commit on master: a5f22eb

@jammiemil
Contributor

OK... I thought IsWindowsService() was used, but I just discovered I was completely off track!
It was removed entirely by #1046.
@jammiemil, any thoughts on this issue in relation to your change?

To summarize: if you launch windows_exporter inside a container from the CLI it works, but if you launch it as a service it crashes, times out, or never starts (hard to say precisely).

EDIT: my mistake, your PR was never merged, but you do have a commit on master: a5f22eb

Have you tried v0.20? It should include my change, which attempts to work around the issue you seem to be describing by 'starting' the Windows service as early as possible rather than waiting for all the dependencies to load.

On a 'typical' Windows server this was happening because there were not enough resources (CPU) to load the dependencies within the 30s timeout for a Windows service, so you may have been hitting a similar issue in containers?

@jammiemil
Contributor

#551 contains a lot of the background on this.

@gillg
Contributor Author

gillg commented Oct 18, 2022

Awesome @jammiemil, I also face that issue on regular hosts!
Also, when I stop the service, sometimes it still seems to be running, probably because it was detected as crashed and relaunched instead of simply stopped.
I'm currently working on the master branch, and I tried to bump /x/sys to 0.1.0 instead of 0.0.0-snapshot, but nothing has improved at this stage... So I'm effectively on v0.20+.

Moreover, I have the feeling we never enter the init() function of the initiate package, because I don't see any log like Checking if We are a service

@jammiemil
Contributor

Yeah, I saw that happen occasionally even with the rejigged init that tries to start ASAP. The workaround I put in place stops a good chunk of the startup failures, but they can still happen, because under certain conditions the underlying Golang subroutines can take more than 30 seconds to start. As far as I can tell there is pretty much nothing you can do about that in any particular codebase, but I will admit my Golang abilities are limited, so I'm very much open to a more robust solution. I know the people working on Grafana Agent are trying to come up with something more robust than my 'fudge' to resolve the same issue in that repo.

@jammiemil
Contributor

Slight correction to my previous comment: the delay is either in the underlying Golang subroutines OR in the remaining dependencies like sys.

@gillg
Contributor Author

gillg commented Oct 19, 2022

So! Thanks a lot @jammiemil for cross-referencing the information here!
That was useful for me, but not for the bug itself ^^
BUT! I found the solution. Golang has some issues in the Windows world, and this is not the first one I have run into.
The function /x/sys/svc.IsAnInteractiveSession() is officially deprecated because it does not work well (for example, an interactive CLI inside a container is detected as non-interactive). But the "correct" function, IsWindowsService(), does not seem to report a service as a service when it runs inside a container (so windows_exporter fails as a service).

I finally reverted the initiator to use IsAnInteractiveSession() and reused the logic implemented in the OpenTelemetry Collector, so that interactive mode can be forced when needed through an env var, NO_WINDOWS_SERVICE.
I would like to dig deeper to find the root cause in the Golang function, but it does not seem easy to test...
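
As a rough illustration of the decision described above, here is a minimal sketch, not the actual windows_exporter change. It assumes the same convention as the OpenTelemetry Collector troubleshooting note linked earlier: NO_WINDOWS_SERVICE set to any value other than 0 forces interactive (foreground) mode. `useInteractiveMode` is a hypothetical helper name, and the detected session value is hard-coded so the sketch runs on any platform; on Windows it would come from IsAnInteractiveSession() in golang.org/x/sys/windows/svc.

```go
package main

import (
	"fmt"
	"os"
)

// useInteractiveMode (hypothetical helper) decides whether to run in the
// foreground instead of running as a Windows service.
// envValue and envSet describe the NO_WINDOWS_SERVICE variable;
// interactive is whatever the session detection reported.
func useInteractiveMode(envValue string, envSet bool, interactive bool) bool {
	// Assumed convention: any value other than "0" forces interactive mode.
	if envSet && envValue != "0" {
		return true
	}
	// Otherwise fall back to the detected session type.
	return interactive
}

func main() {
	value, set := os.LookupEnv("NO_WINDOWS_SERVICE")

	// Assumption: on Windows this value would be the result of
	// svc.IsAnInteractiveSession(); it is hard-coded here so the
	// sketch stays runnable on every platform.
	detectedInteractive := false

	if useInteractiveMode(value, set, detectedInteractive) {
		fmt.Println("running in the foreground (interactive mode)")
	} else {
		fmt.Println("running as a Windows service")
	}
}
```

Keeping the decision in a pure function makes it testable without a Windows host, which matters here because the detection itself is the part that misbehaves in containers.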

@gillg
Contributor Author

gillg commented Oct 19, 2022

I created an issue on the x/sys lib to track that error: golang/go#56335
The fix seems simple, but it needs an external eye.


This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

@dansimov04012022

Hi all.

Any idea how I can mitigate this issue? :)

Many thanks in advance.
