Documentation for Log actions #1777

Closed
wants to merge 1 commit into from

Conversation

frozenprocess
Contributor

Product Version(s):

Issue:
Log actions don’t have a clear example.
eBPF log actions are now supported.

@frozenprocess frozenprocess requested a review from a team as a code owner November 20, 2024 01:12

netlify bot commented Nov 20, 2024

Deploy Preview for calico-docs-preview-next ready!

Name Link
🔨 Latest commit 3d8fd8d
🔍 Latest deploy log https://app.netlify.com/sites/calico-docs-preview-next/deploys/6765f2be88a8ad000816ff6e
😎 Deploy Preview https://deploy-preview-1777--calico-docs-preview-next.netlify.app

netlify bot commented Nov 20, 2024

Deploy Preview for tigera failed.

Built without sensitive environment variables

Name Link
🔨 Latest commit 3d8fd8d
🔍 Latest deploy log https://app.netlify.com/sites/tigera/deploys/6765f2be8b7c0c00081474b1

Contributor

@tomastigera tomastigera left a comment

ohhh I forgot to click to submit my earlier feedback :doh:

# Use Log action to debug policies

## Big picture
Calico Log rules are powerful resources that can be implemented in an environment to log the matching traffic, allowing you to determine how your policies are behaving.
Contributor

Log rules are powerful resources that can be implemented in an environment

what do you mean by that? It is enough to say that you can use it to see how other rules are behaving.

calico/network-policy/policy-rules/log-rules.mdx (6 outdated review threads, resolved)
Member

@fasaxc fasaxc left a comment

LGTM. Please can you cross-link it with the bpf troubleshooting page/next steps links on the BPF install page

@frozenprocess frozenprocess force-pushed the log-actions branch 3 times, most recently from 3e82feb to 0153fbd Compare November 22, 2024 23:52
- Remove the not supported `log` from eBPF pages
- Add an explanation how and when `log` action should be used
- bpf log format svg
@frozenprocess
Contributor Author

Hey, any updates on this?

@ctauchen
Collaborator

@frozenprocess Yes, I've spent some time with this but was planning to get notes to you Monday when you're back from holiday ...

@frozenprocess
Contributor Author

@ctauchen if changes are not that severe let's merge this and I'll raise another PR to address them.

Collaborator

@ctauchen ctauchen left a comment

Thanks for this! We badly needed information about log rules. Here's a few things to get you started.

The main things for this review are high-level structural items. If you have any questions, let me know. I've commented only on the vNext log-rules.mdx.

Editing prose in GitHub is far simpler when the source is written with only one sentence per line. Usually doable with a quick regex.

---
# Use Log action to debug policies

## Big picture
Collaborator

We don't use the old template anymore. Based on what I'm seeing, you should aim for a structure like this:

# title

quick summary about this stuff

## About log actions

Here you say what they are, how they work, and who should use them.

## Prerequisites

## Do this procedure

## Do that procedure

## Additional resources

Contributor Author

@frozenprocess frozenprocess Dec 18, 2024

Is there a template file or an example that we should follow which I missed?

Collaborator

There isn't, but there should be. I really should publicize this better.

$[prodname] has a unique `Log` action that provides traffic observability and logging, which is missing in the standard Kubernetes Network Policy. This unique action can be used by security teams and admins to troubleshoot their policies and make sure that their cluster security posture is doing what it's expected to do.


## Requirements
Collaborator

Let's scrap the tab structure. Something more direct, like this:

## Prerequisites
* Your cluster is not a containerized Kubernetes environment, such as Kind.
* If you're running $[prodname] with the eBPF dataplane:
   * You installed $[prodname] 3.29 or later.
   * Your cluster node or VM host is running on a Linux distribution based on the Linux kernel 5.16 or later.
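
The kernel prerequisite in the list above can be checked on a node before relying on eBPF policy logs. The following is a minimal sketch and not part of the PR: `kernel_at_least` is a hypothetical helper name, and it assumes GNU `sort -V` is available. Because the required capability may have been backported by a distro (as the quoted doc text later in this thread notes), a failing check is a hint rather than proof.

```bash
#!/bin/sh
# Hypothetical helper: succeed if the kernel version is at least the required one.
# Usage: kernel_at_least REQUIRED [CURRENT]; CURRENT defaults to `uname -r`.
kernel_at_least() {
  required="$1"
  current="${2:-$(uname -r)}"
  # Drop distro suffixes such as "-91-generic" before comparing.
  current="${current%%-*}"
  # With version sort, the required version must come first (or be equal).
  lowest="$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)"
  [ "$lowest" = "$required" ]
}

if kernel_at_least 5.16; then
  echo "kernel $(uname -r): meets the 5.16 prerequisite"
else
  echo "kernel $(uname -r): older than 5.16, logs may show only the verdict"
fi
```

Passing the version as a second argument (for example `kernel_at_least 5.16 5.15.0-91-generic`) makes it possible to check a value collected from another node.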

Contributor Author

@frozenprocess frozenprocess Dec 18, 2024

Are we talking about removing tabs from the whole document or just this section?

Collaborator

From the whole doc.

Contributor Author

I believe tabs are essential given that logs are different in each dataplane and we are trying to make eBPF more visible in the docs.

Collaborator

I see what you mean. You're right that we need to clearly present the differences between the two dataplanes. But I don't see a good reason to use tabs in this case. The information can be mixed (as in the case of the prerequisites) or organized with simpler methods, like paragraphs and headings.

- $[prodname] 3.29 and above
- Linux Kernel 5.16 and above

eBPF policy logs depend on eBPF printing capabilities introduced in Linux Kernel 5.16, although they may have been backported to your distro's kernel. If you encounter a scenario where logs only display the traffic verdict (e.g., ALLOWED or DENIED) without detailed information about IP packets, it indicates that your system is running on an older kernel version lacking the necessary capabilities.
Collaborator

Interesting, but unnecessary. We can go with just the version number.

Contributor Author

I'm a bit confused. Are we talking about the kernel or the Calico version number here?

Collaborator

Sorry, grabbed too many lines. Just meant line 23, which explains the prerequisite on line 21.

Contributor Author

Lines 21 and 23 cover two different contexts. Line 21 states what you should run. Line 23 explains what happens, and why, if you are running that version and logging still doesn't work (keep in mind that distros can ship different kernel builds).
Line 23 also says which kernel function is invoked to make all of this possible.

Comment on lines +33 to +40
:::

The $[prodname] Log action creates a LOG rule in iptables. Any logs matching this rule are recorded by the kernel's logging service, usually through syslog.

The following suffix is an example of an iptables LOG rule programmed by $[prodname]:
```bash
-j LOG --log-prefix "calico-packet: " --log-level 5
```
Collaborator

Again, interesting, but not necessary for the user's main goal here.

</Tabs>

## Concepts
A $[prodname] policy with the action `Log` serves as a diagnostic tool that captures and logs network traffic information based on the rules and criteria specified. By leveraging the `Log` action, administrators gain insight into how traffic is evaluated by their policies, enabling them to debug, refine, and validate their network security posture.
Collaborator

Renamed, moved up as indicated above.

For the 'about' section, I think it would be good to back up a bit and explain the basics of what's going on here. Specifically, explain how 'Log' works with 'Allow' and 'Deny', where it goes in the yaml, how it's part of .spec.ingress or .spec.egress.

I think it's important to explain that Allow or Deny rules give no direct feedback. You can test whether the rules work as you like, but you can't see what's actually happening.

Adding a log rule with the same configuration will show you detailed information about the actual traffic.
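
To illustrate the point above about where `Log` sits relative to `Allow` and `Deny`, here is a minimal sketch that is not taken from the PR. It writes a Calico `NetworkPolicy` whose `.spec.ingress` list has a `Log` rule followed by an `Allow` rule with the same match, so matching traffic is logged and then evaluated by the next rule. The policy name, the `demo` namespace, the `app == 'web'` selector, and port 80 are all hypothetical.

```bash
#!/bin/sh
# Write an illustrative manifest; every name and value in it is a placeholder.
cat > log-then-allow.yaml <<'EOF'
apiVersion: projectcalico.org/v3
kind: NetworkPolicy
metadata:
  name: debug-log-ingress
  namespace: demo
spec:
  selector: app == 'web'
  types:
    - Ingress
  ingress:
    # Rules are evaluated in order: Log records the packet and processing
    # continues with the next rule, which decides the verdict.
    - action: Log
      protocol: TCP
      destination:
        ports:
          - 80
    - action: Allow
      protocol: TCP
      destination:
        ports:
          - 80
EOF

# Apply it with whichever tool manages projectcalico.org/v3 resources in your
# cluster, for example:
#   kubectl apply -f log-then-allow.yaml
echo "wrote log-then-allow.yaml"
```

Keeping the `Log` rule directly above the rule it mirrors makes it easy to remove later without changing the policy's verdicts.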

Collaborator

Would there also be some use in explaining how to use log rules to test policy without implementing it? A DIY staged policy for open source?

Comment on lines +124 to +128
:::caution
For a complete and efficient observability solution, consider exploring the use of [Flow Logs](/calico-enterprise/latest/visibility/visualize-traffic) available in Calico Enterprise and Calico Cloud.

Using log rules is not the most effective way to achieve comprehensive observability. While network observability can be achieved by utilizing log actions, this approach may introduce significant performance overhead.
:::
Collaborator

You can add links like this to Additional resources at the bottom of the doc.

Contributor Author

Add the link to the "Additional resources" and keep this one too?

Collaborator

Just additional resources.

Contributor Author

So the idea here is that if someone reads "Flow Logs", which is a Calico Enterprise feature, they can immediately click the link and go to the corresponding page to learn more.
Removing the hyperlink from that sentence and moving it to Additional resources only would make the topic less visible. While reading the sentence, the user has no indication that a link for it exists at the bottom of the page, and by the time they get there they might not even remember what flow logs were ...

Collaborator

We can add these sorts of things to additional resources, where they are available for folks to check out if they're interested. But without unnecessarily distracting them from their main job: understanding log actions and how to use them.

I see that I wasn't entirely clear in my earlier note. It's not just the link. This admonition needs to go away. Best to condense to 1-2 sentences and add to additional resources.

- action: Log
```

### Logging in a namespace
Collaborator

Suggested change
- ### Logging in a namespace
+ ### Log all traffic for a namespace

Contributor Author

Nice catch thanks!

By strategically placing log actions at the bottom of your policy tiers, you can capture unhandled traffic and identify gaps in your policy configuration. Analyze the logged traffic to gather insights about source, destination, and protocol, and use this data to create targeted policies. Once implemented, validate by checking if the corresponding traffic is no longer logged, indicating the new policies are effective.


### Logging an unprotected cluster
Collaborator

Suggested change
- ### Logging an unprotected cluster
+ ### Log all traffic for an unprotected cluster

Contributor Author

Nice catch thanks!

- action: Log
```

### Logging all traffic in a cluster
Collaborator

Log traffic for non-namespaced resources, hosts, and VMs

Contributor Author

Love it!

Comment on lines +190 to +197
By default, $[prodname] policies apply only to namespaced resources (e.g., Pods, ServiceAccounts). To achieve total protection, enable hostEndpoint support to include non-namespaced resources and host nodes.

To log all traffic involving host endpoints, enable auto-creation of host endpoints:
```bash
kubectl patch kubecontrollersconfiguration default --type=merge --patch='{"spec": {"controllers": {"node": {"hostEndpoint": {"autoCreate": "Enabled"}}}}}'
```

Please ensure you disable policy logging by removing the Log policies once you have finalized your environment security, as it can impact cluster performance.
Collaborator

This seems very important. You should make the full sequence clear from the beginning.

I wonder whether we should have a section somewhere about lifecycle for log actions. Is it best to set and forget? Only run for a set period? What about the time range, data allowances, and so on? Even if this isn't the place for all that detail, it may be worth mentioning and linking from Additional resources.

Contributor Author

As we discussed in the 1-on-1, a new user should not try to enable host endpoints right off the bat, since doing so may break their cluster if they don't know what they are doing. Moving this step any higher would create a lot of support cases along the lines of "I enabled logs and it broke my cluster."
In my opinion, the flow of discovery should stay as it is written on the page.

Collaborator

I'm not suggesting a change to the actual workflow. What I'm saying is that it would be better for users if you first explain what the workflow is, and then show them how to do it.

In particular, I'm interested in the idea that you suggest here at the end. It seems that, at least in some cases, log rules are best used on a temporary basis. You set them, figure something out, and then remove them. That's an idea that would be good to present up at the top.
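
The temporary lifecycle described above (set the log rules, investigate, then remove them) could be wrapped in a small script. This is a minimal sketch under assumptions: `with_temporary_log_policy` is a hypothetical helper, `log-policy.yaml` is a placeholder path, and the `KUBECTL` variable exists only so the command can be swapped out, here with `echo` for a dry run.

```bash
#!/bin/sh
# Hypothetical helper: apply a log policy, run a command, then always remove the policy.
# Usage: with_temporary_log_policy MANIFEST COMMAND [ARGS...]
with_temporary_log_policy() {
  manifest="$1"
  shift
  "${KUBECTL:-kubectl}" apply -f "$manifest" || return 1
  # Run the investigation step while the Log rules are active.
  "$@"
  status=$?
  # Clean up even if the investigation step failed.
  "${KUBECTL:-kubectl}" delete -f "$manifest"
  return "$status"
}

# Dry run: print the kubectl invocations instead of executing them.
KUBECTL=echo
with_temporary_log_policy log-policy.yaml echo "inspect logs here"
```

Tying the cleanup to the same function call is one way to make sure the performance cost of logging does not outlive the debugging session.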

<Tabs groupId='log-rules'>
<TabItem label="eBPF" value="eBPF">

eBPF policy logs are sent to the trace pipe and can be viewed by using the following command:
Collaborator

Seems like this might be good to be broken out as a procedure, no?

View your logs, find your logs, retrieve logs, etc.

If there are different commands per dataplane, then duplicate the procedure section:

Find your logs (eBPF)
Find your logs (iptables)
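
The two per-dataplane procedures suggested above might boil down to commands like the ones below. This is a sketch, not the commands from the PR (the diff excerpt in this thread ends before the command is shown). `log_command_for` is a hypothetical helper, the trace pipe path assumes debugfs is mounted at `/sys/kernel/debug`, and the iptables branch assumes the kernel log is readable with `dmesg` and uses the `calico-packet: ` prefix quoted earlier in this thread.

```bash
#!/bin/sh
# Hypothetical helper: print (not run) a command that shows policy log output on a node.
log_command_for() {
  case "$1" in
    ebpf)
      # eBPF policy logs go to the kernel trace pipe; reading it needs root.
      echo "cat /sys/kernel/debug/tracing/trace_pipe"
      ;;
    iptables)
      # The iptables LOG target writes to the kernel log with the rule's prefix.
      echo "dmesg | grep 'calico-packet: '"
      ;;
    *)
      echo "unknown dataplane: $1" >&2
      return 1
      ;;
  esac
}

log_command_for ebpf
log_command_for iptables
```

The helper only prints the commands because both need to be run as root on the node that handles the traffic.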

Collaborator

Related to this, all the information about the logs themselves should have its own section up with the other conceptual information. Log format reference, or something like that.

@ctauchen
Collaborator

Reopening in #1827

@ctauchen ctauchen closed this Dec 31, 2024