sys/gen: generated files are too big #5542

Open
tarasmadan opened this issue Nov 26, 2024 · 5 comments

@tarasmadan
Collaborator

tarasmadan commented Nov 26, 2024

Is your feature request related to a problem? Please describe.
We currently generate fewer than 10 Go files, ~8.8MB each.

Files of this size are a problem for some analysis tools.

Describe the solution you'd like

  1. Generate smaller files?
  2. Generate another machine-readable format?
  3. syz-sysgen takes only 7 seconds to generate the *.go files for all platforms on my machine. Do we need to generate them at all?
    • Reading the descriptions directly from disk/DB would enable live modification of the descriptions. How to validate their correctness and how to store them long-term are separate questions.
@dvyukov
Collaborator

dvyukov commented Nov 27, 2024

Things will get worse after #5545.

@a-nogikh
Collaborator

  1. go:embed the descriptions and parse them at the start of the syzkaller binary.

@tarasmadan
Collaborator Author

Thanks. I've learned about embed.FS.

@a-nogikh
Copy link
Collaborator

a-nogikh commented Nov 28, 2024

FTR go:embed keeps data uncompressed, but it will likely not be a huge problem for us.

$ find sys/linux -type f \( -iname "*.txt" -o -iname "*.const" \) -exec du -ch {} +
< ... >
3.6M    total

But if we also decide to include the seeds, that's already going to be troublesome:

$ du -sh sys/linux/test/
46M     sys/linux/test/

That being said, we will likely still need to parse the descriptions at compile time, because we auto-generate parts of the syz-executor code in syz-sysgen:

writeFile(filepath.Join(*outDir, "executor", "defs.h"), buf.Bytes())
buf.Reset()
if err := syscallsTempl.Execute(buf, data); err != nil {
	tool.Failf("failed to execute syscalls template: %v", err)
}
writeFile(filepath.Join(*outDir, "executor", "syscalls.h"), buf.Bytes())

var defsTempl = template.Must(template.New("").Parse(`// AUTOGENERATED FILE
struct call_attrs_t { {{range $attr := $.CallAttrs}}
uint64_t {{$attr}};{{end}}
};
struct call_props_t { {{range $attr := $.CallProps}}
{{$attr.Type}} {{$attr.Name}};{{end}}
};
#define read_call_props_t(var, reader) { \{{range $attr := $.CallProps}}
(var).{{$attr.Name}} = ({{$attr.Type}})(reader); \{{end}}
}

@dvyukov
Collaborator

dvyukov commented Dec 2, 2024

I would go with go:embed for descriptions+consts for now. The executor still needs generated files, and seeds can keep being handled the way they are now.

Theoretically, we could add a generate step that produces a compressed archive and then embed that archive. But I would start without it. syz-manager is quite big already, and that's not causing problems.

To avoid compiling the descriptions on start, we could serialize what's currently in the generated files as JSON and then simply deserialize it on start. All generated structures use only public fields of prog types, so it should all work out of the box. Instead of having the compiler create these global vars, we deserialize them from JSON and call prog.RegisterTarget.
