Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

x509pop server plugin support for servers trust bundle #5572

Merged
merged 56 commits into from
Dec 16, 2024

Conversation

kfox1111
Copy link
Contributor

Enables the x509pop node attestor server plugin to be configured to use the SPIRE Servers own trust bundle.

Enables the x509pop node attestor server plugin to be
configured to use the SPIRE Servers own trust bundle.

Signed-off-by: Kevin Fox <[email protected]>
Signed-off-by: Kevin Fox <[email protected]>
Copy link
Member

@azdagron azdagron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the bundle, without any sort of restriction, seems scary. Doesn't this imply that any workload can just turn around and attest as a node?

pkg/server/plugin/nodeattestor/x509pop/x509pop.go Outdated Show resolved Hide resolved
}
go func() {
ctx := context.Background()
sleep := 1 * time.Second
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than kick off a goroutine that may or may not be run after the IdentityProvider host service is ready, it would be better to lazily initialize this in the first call to Attest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@kfox1111
Copy link
Contributor Author

Using the bundle, without any sort of restriction, seems scary. Doesn't this imply that any workload can just turn around and attest as a node?

Yeah, that I think can be handled like:

    NodeAttestor "x509pop" {
      plugin_data {
        spire_trust_bundle = true
        # Only allow spiffe identifiers of the form spiffe://<trustDomain>/k8s-spire-agent-helper/<hostname>
        # Generate identifiers that look like spiffe://<trustDomain>/x509pop/k8s-spire-agent/<hostname>
        agent_path_template = "{{ ${d}p := printf \"spiffe://%s/k8s-spire-agent-helper/\" .TrustBundle }}{{ ${d}s := printf \"%s\" (index .Certificate.URIs 0) }}{{ if gt (len ${d}s) (len ${d}p) }}{{ ${d}ps := slice ${d}s 0 (len ${d}p)
 }}{{ if eq ${d}ps ${d}p }}{{ printf \"/x509pop/k8s-spire-agent/%s\" (slice ${d}s (len ${d}p)) }}{{ end }}{{ end }}"
      }
    }

if it agent_path renders out to "", then it should be disallowed.

I did hit:

Trying to get it to work though.

@azdagron azdagron self-assigned this Oct 15, 2024
@azdagron
Copy link
Member

Hey @kfox1111, do you have an arch diagram or something that spells out the use case clearly that you can share (either here or privately in slack with the maintainer group)? Before we consider taking this we'd like to see if there are alternatives.

@kfox1111
Copy link
Contributor Author

kfox1111 commented Oct 18, 2024

Here's the general idea. Got the host -> k8s part working. Still working on the k8s -> vm part.

With this configuration, the spire-agent on the host managed by systemd is the first to start, and establishes the trust chain used by all other parts of the node. The trust never leaves the node. The SPIRE server then really is the bottom turtle in this setup. 🐢

x509pop

@kfox1111
Copy link
Contributor Author

Related to: #5206

@kfox1111
Copy link
Contributor Author

Used with the sprig pr, this works:

    agent_path_template = "{{ $$p := printf \"spiffe://%s/k8s-spire-agent-helper/\" .TrustDomain }}{{ $$s := printf \"%s\" (index .Certificate.URIs 0) }}{{ if hasPrefix $$p $$s }}{{ printf \"/x509pop/k8s-spire-agent/%s\" (trimPrefix $$p $$s) }}{{ else }}{{ fail \"Invalid prefix\" }}{{ end }}"

@kfox1111
Copy link
Contributor Author

kfox1111 commented Oct 19, 2024

Updated diagram

@sorindumitru
Copy link
Contributor

Here's the general idea. Got the host -> k8s part working. Still working on the k8s -> vm part.

With this configuration, the spire-agent on the host managed by systemd is the first to start, and establishes the trust chain used by all other parts of the node. The trust never leaves the node. The SPIRE server then really is the bottom turtle in this setup. 🐢

x509pop

I've had some thoughts about a similar (or maybe the same, I have yet to think it through fully) architecture, chaining trust all the way to the hardware (which would likely be tpms too). Just a +1 that this kind of thing is something that other users might want to do.

My thinking here is that for something like this we might want a separate node attestor (maybe x509svidpop?) because the differences between x509 and x509-SVID matter here. For example you might want to limit which SPIFFE-IDs you allow in some way (e.g. a list of SPIFFE-IDs or at least some kind of validation of the SPIFFE-ID format) as well as extracting some information for the SPIFFE ID to make available to the template.

Another thing that would be useful to me would be to allow having different trust domains, e.g. the physical nodes using TPMs could be a different trust domain than the ones on k8s. In large corporations it's possible that different groups manage those different infrastructures so they may be different trust domains.

@kfox1111
Copy link
Contributor Author

@sorindumitru I think we're pretty much on the same page.

We could make a new x509svidpop plugin but it would end up being almost the same code as the x509pop one. (The patch here is pretty small). So a lot more stuff to maintain.

the svid use case really isn't that different from the normal x509pop one. Either way you probably want some way to filter out which certs are allowed to a subset of all possible ones. That can be done via agent_path_template as done in this pr without any special code in the plugin.

100% agree on the future ability to use some other spire trustDomain too. Was going to propose that in a different patch. The primary use case for me being:

k8s multitenant cluster. Secured with spire. Call this, the resource provider cluster.

Tenant uses resource provider cluster with kubevirt to launch a bunch of vm's for their tenant. Inside, they launch their own spire server, and workloads (maybe even their own k8s cluster in the vms). It would greatly simplify management if they could use x509 certs from the resource provider cluster to node attest to their own spire-server.

That again, would basically be this same x509pop plugin as described here, but getting its trustBundle from disk and keep it refreshed (existing functionality extended to refresh from disk), or get it over vsock (not sure which is best yet)

@sorindumitru
Copy link
Contributor

I agree it's going to be very similar to x509pop, but dealing with that is an implementation detail, I'm sure some parts can be shared. I just don't think that the template language is a good way of dealing with those differences.

@azdagron azdagron removed their assignment Nov 19, 2024
doc/plugin_server_nodeattestor_x509pop.md Outdated Show resolved Hide resolved
doc/plugin_server_nodeattestor_x509pop.md Outdated Show resolved Hide resolved
Comment on lines +72 to +73
if hclConfig.CABundlePath != "" && len(hclConfig.CABundlePaths) > 0 {
status.ReportError("only one of ca_bundle_path or ca_bundle_paths can be configured, not both")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is not the same error happens in line 80?
may we return here instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.... The check here checks if both are configured. The check at 80 ensures at least one is configured. Both are needed I think.

I was thinking status.ReportError was fatal. Is it not? Maybe all the ReportError's here then need retruns?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmm, it is like if both are configured, you must never get the second one right?
status.Report is not a fatal, but just adding a list of errors, than then are used in configure or verify,
but this approach has a good win than you can get the complete list of errors to resolve all together,
but it makes very hard to follow and see dependency errors...
In any case we are fine for now

pkg/server/plugin/nodeattestor/x509pop/x509pop.go Outdated Show resolved Hide resolved
if len(trustBundles) > 0 {
return util.NewCertPool(trustBundles...), nil
}
return nil, nil
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this happens we'll reject attestations, but it is going to be hard to understand why it is happening (basically we are not going to have a trusted bundle)
May we at least return wartning or error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we log a warning about not currently finding a trust bundle? Its nor fatal as it might show up at some point later?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeap, I agree but this attestation will fail and mention a verification error
so a warning may help users to understand than that the that attestation failure is cuased by the attestor being unabled to get the trust bundle.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a warning.

@@ -158,7 +160,9 @@ func (s *Suite) TestAttestFailure() {

s.T().Run("not configured", func(t *testing.T) {
attestor := new(nodeattestor.V1)
plugintest.Load(t, BuiltIn(), attestor)
plugintest.Load(t, BuiltIn(), attestor,
Copy link
Collaborator

@MarcosDY MarcosDY Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add test case where no trustedbundle is returned?
and how about defaults?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Struggling a bit with how to write the tests....

Seems like I need to create a new cert in ./test/fixture/nodeattestor/x509pop/generate.go with the url based spiffe id to test with? There are some magic numbers in that file. Are they just arbitrary, or are there some reason for the specific numbers chosen?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading through it some more, it looks like they just have patterns in it. so rather arbitrary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I have tests for no trust bundle, along with a valid svid, an invalid svid, and a no svid.

@MarcosDY MarcosDY added this to the 1.11.2 milestone Dec 16, 2024
@MarcosDY MarcosDY merged commit befc54f into spiffe:main Dec 16, 2024
35 checks passed
@kfox1111 kfox1111 deleted the x509pop-trustbundle branch December 16, 2024 13:12
@kfox1111 kfox1111 restored the x509pop-trustbundle branch December 16, 2024 13:12
@kfox1111 kfox1111 deleted the x509pop-trustbundle branch December 16, 2024 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants