-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use salsa accumulators for diagnostics #14760
base: main
Are you sure you want to change the base?
Conversation
7db3778
to
1f6247c
Compare
CodSpeed Performance ReportMerging #14760 will degrade performances by 10.43%Comparing Summary
Benchmarks breakdown
|
|
1d26d75
to
040f4d9
Compare
5cbbfc2
to
e0f43ee
Compare
e0f43ee
to
35f3815
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The use of report_
everywhere makes me feel like I'm reading the pyright codebase haha. But in most cases it seems like it probably is the best verb to use. Still, there's a couple of places where I think we could use some better names:
pub struct CompileDiagnostic(std::sync::Arc<dyn Diagnostic>); | ||
|
||
impl CompileDiagnostic { | ||
pub fn report<T>(db: &dyn Db, diagnostic: T) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is report
the best verb here? It feels to me like we're adding a diagnostic to be reported later, rather than reporting it to the user immediately. Maybe this could be called CompileDiagnostic::add()
?
format_args!("Name `{id}` used when not defined"), | ||
); | ||
/// Reports a diagnostic for the given node and file if diagnostic reporting is enabled for this file. | ||
pub(crate) fn report_type_diagnostic( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
similarly here, add_type_diagnostic
would be a more natural name for me
Hmm, I'm looking into suppressions right now, specifically how we'd recognize unused suppression comments. Another alternative to emitting the diagnostic is to emit a |
diagnostic | ||
})); | ||
let mut diagnostics = check_types::accumulated::<CompileDiagnostic>(db, test_file.file); | ||
// Filter out diagnostics that are not related to the current file. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only tangentially related: do we have a feeling for how many unrelated-file diagnostics we generate typically? If this would be a large fraction of all diagnostics, we should probably detect that earlier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be any number of diagnostics and salsa doesn't provide a way to detect this earlier.
Salsa accumulators work by traversing the entire dependency tree of a query. In our case, this is check_file
. check_file
can branch out into arbitrary files when resolving imports (and checking those files in turn). The only way for us to truncate this earlier is by not using accumulators because salsa doesn't know about the file boundaries.
I don't think this is very terrible because the CLI uses check_workspace
to get all diagnostics. The LSP uses check_file
today but I don't think it should. The wasm playground uses check_file
... this is probably fine
use crate::Db; | ||
|
||
type AnnotationParseResult = Result<Parsed<ModExpression>, TypeCheckDiagnostics>; | ||
type AnnotationParseResult = Result<Parsed<ModExpression>, ()>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: Something like
type AnnotationParseResult = Result<Parsed<ModExpression>, ()>; | |
struct StringAnnotationParseError; | |
type AnnotationParseResult = Result<Parsed<ModExpression>, StringAnnotationParseError>; |
would maybe make match
es at the call sites of this function a bit easier to understand
diagnostics.sort_unstable_by(|a, b| { | ||
let a_file = a.file(); | ||
let b_file = b.file(); | ||
|
||
a_file.cmp(&b_file).then_with(|| { | ||
a_file | ||
.path(db) | ||
.as_str() | ||
.cmp(b_file.path(db).as_str()) | ||
.then_with(|| { | ||
a.range() | ||
.unwrap_or_default() | ||
.start() | ||
.cmp(&b.range().unwrap_or_default().start()) | ||
}) | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few questions:
- Do we really want an unstable sort here? Two different diagnostics could compare equal according to the ordering defined here if they refer to the same range, or even just the same range start.
- Why do we order by file and then by the path of the file? Can multiple files refer to the same path?
- If performance is not a major concern here, we could potentially do something like
diagnostics.sort_unstable_by_key(|diagnostic| { let file = diagnostic.file(); let path = file.path(db).as_str(); let range_start = diagnostic.range().unwrap_or_default().start(); (file, path, range_start) });
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we order by file and then by the path of the file? Can multiple files refer to the same path?
Oh, that's just wrong. The idea was to short-cut if the files are identical.
Do we really want an unstable sort here? Two different diagnostics could compare equal according to the ordering defined here if they refer to the same range, or even just the same range start.
I don't think it's important for us to preserve the original order if there are multiple diagnostics at the same line, for as long as the order is deterministic (Running Red Knot multiple times should give you the same result). We should probably include the rule code as well as disambiguator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for as long as the order is deterministic
Exactly, that was my concern.
We should probably include the rule code as well as disambiguator
👍
## Summary I'm currently on the fence about landing the #14760 PR because it's unclear how we'd support tracking used and unused suppression comments in a performant way: * Salsa adds an "untracked" dependency to every query reading accumulated values. This has the effect that the query re-runs on every revision. For example, a possible future query `unused_suppression_comments(db, file)` would re-run on every incremental change and for every file. I don't expect the operation itself to be expensive, but it all adds up in a project with 100k+ files * Salsa collects the accumulated values by traversing the entire query dependency graph. It can skip over sub-graphs if it is known that they contain no accumulated values. This makes accumulators a great tool for when they are rare; diagnostics are a good example. Unfortunately, suppressions are more common, and they often appear in many different files, making the "skip over subgraphs" optimization less effective. Because of that, I want to wait to adopt salsa accumulators for type check diagnostics (we could start using them for other diagnostics) until we have very specific reasons that justify regressing incremental check performance. This PR does a "small" refactor that brings us closer to what I have in #14760 but without using accumulators. To emit a diagnostic, a method needs: * Access to the db * Access to the currently checked file This PR introduces a new `InferContext` that holds on to the db, the current file, and the reported diagnostics. It replaces the `TypeCheckDiagnosticsBuilder`. We pass the `InferContext` instead of the `db` to methods that *might* emit diagnostics. This simplifies some of the `Outcome` methods, which can now be called with a context instead of a `db` and the diagnostics builder. Having the `db` and the file on a single type like this would also be useful when using accumulators. This PR doesn't solve the issue that the `Outcome` types feel somewhat complicated nor that it can be annoying when you need to report a `Diagnostic,` but you don't have access to an `InferContext` (or the file). However, I also believe that accumulators won't solve these problems because: * Even with accumulators, it's necessary to have a reference to the file that's being checked. The struggle would be to get a reference to that file rather than getting a reference to `InferContext`. * Users of the `HasTy` trait (e.g., a linter) don't want to bother getting the `File` when calling `Type::return_ty` because they aren't interested in the created diagnostics. They just want to know what calling the current expression would return (and if it even is a callable). This is what the different methods of `Outcome` enable today. I can ask for the return type without needing extra data that's only relevant for emitting a diagnostic. A shortcoming of this approach is that it is now a bit confusing when to pass `db` and when an `InferContext`. An option is that we'd make the `file` on `InferContext` optional (it won't collect any diagnostics if `None`) and change all methods on `Type` to take `InferContext` as the first argument instead of a `db`. I'm interested in your opinion on this. Accumulators are definitely harder to use incorrectly because they remove the need to merge the diagnostics explicitly and there's no risk that we accidentally merge the diagnostics twice, resulting in duplicated diagnostics. I still value performance more over making our life slightly easier.
Summary
This is a reimplementation of #14116 now that Salsa salsa accumulators are cheaper (close to "free" for queries without diagnostics).
The main benefit of salsa accumulators is that they're an easy way to emit a diagnostic from anywhere inside a salsa query, and salsa takes care to deduplicate diagnostics if the same query is called multiple times.
Performance
This still significantly regresses the incremental performance. I created another salsa PR to reduce the regression to 9%. I don't think we can do much better except reducing the number of inputs by using coarse-grained salsa dependencies because:
Considering this, I'm still leaning towards migrating to salsa accumulators because of what they unlock: We can now emit diagnostics from any part of the code. This includes the module resolver (e.g. emit a warning if we find an invalid
pth
file?)Other changes
This PR upgrades Salsa to a version with "cheap" accumulators. The new salsa version now requires that
Db
structs implementClone
for its parallel db support (which doesn't seem to work yet).Test Plan
cargo test