
feat: persistent lists #11767

Draft · wants to merge 9 commits into base: master

Conversation

@ConnorBaker (Contributor) commented Oct 30, 2024

Important

This PR is a work in progress. I may force-push.

Experimental work on switching to persistent lists via immer (thanks for the library suggestion @Mic92!).

Current blockers:

  • Various places use collections from std without configuring them to use the GC.
    • Resolved
  • A printing test fails (see TODOs in src/libexpr/print-ambiguous.cc), but only when run through nix-instantiate --eval; nix eval returns the correct result. That's odd and makes me think the failing test is symptomatic of a larger problem I introduced.
    • Resolved; the fix involved a check condition on printAmbiguous
  • I'm unable to evaluate my own system closure because an uninitialized value sneaks in. To help debug this, I've added a printError to Value::type (see src/libexpr/value.hh), which is what was throwing the error. I've also added an assert to EvalState::forceValue (see src/libexpr/eval-inline.hh) to ensure an error is thrown if an uninitialized value is forced.
    • Resolved, though I had to remove all uses of transient
  • Reasonably sure I'm "holding it wrong" with respect to allocation of ValueList through the new keyword.
  • ValueList::push_back uses std::move on the argument; I don't know enough C++ to know what that means if the argument is used afterwards (see the sketch below this list).
  • Should re-add the small-list optimization, though it might be worth investigating whether it inhibits structural sharing.
  • I removed tests for the nix_api because I don't know how (or with what) to replace them.
  • NIX_SHOW_STATS no longer shows helpful things for lists besides concatenations
    • TODO: Since the (++) operator no longer uses concatLists, it's not updating the number of list concatenations tracked by EvalState.
    • If we're using persistent structures with sharing, how do we track or communicate the number of values allocated/processed?
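
To illustrate the push_back / std::move point above, here's a minimal standalone sketch using immer's default memory policy (not the actual code in this PR):

#include <immer/flex_vector.hpp>
#include <utility>

int main()
{
    immer::flex_vector<int> xs;

    // push_back is non-destructive: it returns a new vector that shares
    // structure with xs, and xs itself is unchanged.
    auto ys = xs.push_back(1);

    // std::move only matters for element types that are expensive to copy.
    // A moved-from object is left in a valid but unspecified state; for the
    // Value * elements Nix stores, a "move" is just a pointer copy, so the
    // argument can still be used afterwards.
    int n = 2;
    auto zs = ys.push_back(std::move(n));

    return (xs.size() == 0 && ys.size() == 1 && zs.size() == 2) ? 0 : 1;
}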

Motivation

As far as I can tell, lists in Nix are structures holding a raw C array and a size. Most list operations involve allocating an array large enough to hold the result and then memcpy-ing the components over.

This PR looks at using persistent lists via immer: my hope is that by leveraging techniques from functional programming, we can reduce the amount of allocation and copying we do and improve the performance of the interpreter.
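
Roughly, the difference I'm after looks like this (an illustrative sketch only; concatArrays and concatPersistent are made-up names, and the array version is a simplification rather than the actual concatLists code):

#include <cstddef>
#include <cstring>
#include <immer/flex_vector.hpp>

struct Value;

// Today (simplified): concatenation allocates a fresh array big enough for
// both inputs and copies every element pointer over. (In Nix the allocation
// would go through the GC rather than new[].)
Value ** concatArrays(Value ** a, std::size_t na, Value ** b, std::size_t nb)
{
    auto out = new Value *[na + nb];
    std::memcpy(out, a, na * sizeof(Value *));
    std::memcpy(out + na, b, nb * sizeof(Value *));
    return out;
}

// With a persistent vector, concatenation returns a new list that shares
// structure with both inputs instead of copying their elements.
immer::flex_vector<Value *> concatPersistent(const immer::flex_vector<Value *> & a,
                                             const immer::flex_vector<Value *> & b)
{
    return a + b; // O(log n), no per-element copying
}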

Context

TODO

Early numbers

Setup

$ nix build github:NixOS/nix/12e31ab77d188cfa88337a308430b4c90b311d8e --out-link nix-before
$ nix build github:NixOS/nix/2ee7758cf160b454cdb8534a02a614895e962bac --out-link nix-after
$ curl -L "github.com/NixOS/nixpkgs/archive/91bc7dadf7bf0bf3fb9203e611bd856f40fc2b66.tar.gz" | tar -xz

Run on a NixOS system with an Intel i9-13900K.

The current implementation is about 1.13-1.14x faster than the one in this PR. Put another way, the changes in this PR slow down eval by around 12% (using the system evaluation times: (3.339 - 2.924) / 3.339 ≈ 0.124).

NOTE: I verified outputs from both versions are the same.

System evaluation

$ hyperfine --warmup 2 --min-runs 20 --export-markdown system-eval-bench.md --parameter-list nix-version nix-before,nix-after './{nix-version}/bin/nix eval --no-eval-cache --read-only github:ConnorBaker/nixos-configs/39f01b05c37494627242162863b4725e38600b50#nixosConfigurations.nixos-desktop.config.system.build.toplevel'
Benchmark 1: ./nix-before/bin/nix eval --no-eval-cache --read-only github:ConnorBaker/nixos-configs/39f01b05c37494627242162863b4725e38600b50#nixosConfigurations.nixos-desktop.config.system.build.toplevel
  Time (mean ± σ):      2.924 s ±  0.026 s    [User: 2.599 s, System: 0.314 s]
  Range (min … max):    2.906 s …  3.030 s    20 runs
 
  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.
 
Benchmark 2: ./nix-after/bin/nix eval --no-eval-cache --read-only github:ConnorBaker/nixos-configs/39f01b05c37494627242162863b4725e38600b50#nixosConfigurations.nixos-desktop.config.system.build.toplevel
  Time (mean ± σ):      3.339 s ±  0.024 s    [User: 2.980 s, System: 0.346 s]
  Range (min … max):    3.318 s …  3.397 s    20 runs
 
Summary
  ./nix-before/bin/nix eval --no-eval-cache --read-only github:ConnorBaker/nixos-configs/39f01b05c37494627242162863b4725e38600b50#nixosConfigurations.nixos-desktop.config.system.build.toplevel ran
    1.14 ± 0.01 times faster than ./nix-after/bin/nix eval --no-eval-cache --read-only github:ConnorBaker/nixos-configs/39f01b05c37494627242162863b4725e38600b50#nixosConfigurations.nixos-desktop.config.system.build.toplevel
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| ./nix-before/bin/nix eval --no-eval-cache --read-only github:ConnorBaker/nixos-configs/39f01b05c37494627242162863b4725e38600b50#nixosConfigurations.nixos-desktop.config.system.build.toplevel | 2.924 ± 0.026 | 2.906 | 3.030 | 1.00 |
| ./nix-after/bin/nix eval --no-eval-cache --read-only github:ConnorBaker/nixos-configs/39f01b05c37494627242162863b4725e38600b50#nixosConfigurations.nixos-desktop.config.system.build.toplevel | 3.339 ± 0.024 | 3.318 | 3.397 | 1.14 ± 0.01 |
$ NIX_SHOW_STATS_PATH="$PWD/system-eval-bench-before.json" NIX_SHOW_STATS=1 ./nix-before/bin/nix eval --no-eval-cache --read-only github:ConnorBaker/nixos-configs/39f01b05c37494627242162863b4725e38600b50#nixosConfigurations.nixos-desktop.config.system.build.toplevel
trace: Obsolete option `hardware.opengl.extraPackages' is used. It was renamed to `hardware.graphics.extraPackages'.
trace: Obsolete option `hardware.opengl.package' is used. It was renamed to `hardware.graphics.package'.
«derivation /nix/store/21amxwj96mzly75kci5ii6kgblyfp971-nixos-system-nixos-desktop-24.11.20241014.a3c0b3b.drv»
{
  "cpuTime": 2.796797037124634,
  "envs": {
    "bytes": 147284032,
    "elements": 10772297,
    "number": 7638207
  },
  "gc": {
    "cycles": 5,
    "heapSize": 1024122880,
    "totalBytes": 1482768144
  },
  "list": {
    "bytes": 20914368,
    "concats": 390624,
    "elements": 2614296
  },
  "nrAvoided": 8090856,
  "nrExprs": 1469182,
  "nrFunctionCalls": 6649159,
  "nrLookups": 3978975,
  "nrOpUpdateValuesCopied": 31774690,
  "nrOpUpdates": 607131,
  "nrPrimOpCalls": 3261148,
  "nrThunks": 11004466,
  "sets": {
    "bytes": 679170688,
    "elements": 40539768,
    "number": 1908400
  },
  "sizes": {
    "Attr": 16,
    "Bindings": 16,
    "Env": 8,
    "Value": 24
  },
  "symbols": {
    "bytes": 1016494,
    "number": 85193
  },
  "time": {
    "cpu": 2.796797037124634,
    "gc": 0.973,
    "gcFraction": 0.3478979658103235
  },
  "values": {
    "bytes": 412291368,
    "number": 17178807
  }
}
$ NIX_SHOW_STATS_PATH="$PWD/system-eval-bench-after.json" NIX_SHOW_STATS=1 ./nix-after/bin/nix eval --no-eval-cache --read-only github:ConnorBaker/nixos-configs/39f01b05c37494627242162863b4725e38600b50#nixosConfigurations.nixos-desktop.config.system.build.toplevel
trace: Obsolete option `hardware.opengl.extraPackages' is used. It was renamed to `hardware.graphics.extraPackages'.
trace: Obsolete option `hardware.opengl.package' is used. It was renamed to `hardware.graphics.package'.
«derivation /nix/store/21amxwj96mzly75kci5ii6kgblyfp971-nixos-system-nixos-desktop-24.11.20241014.a3c0b3b.drv»
{
  "cpuTime": 3.241765022277832,
  "envs": {
    "bytes": 147284032,
    "elements": 10772297,
    "number": 7638207
  },
  "gc": {
    "cycles": 6,
    "heapSize": 1141563392,
    "totalBytes": 1912521536
  },
  "list": {
    "bytes": 0,
    "concats": 103533,
    "elements": 0
  },
  "nrAvoided": 8090856,
  "nrExprs": 1469182,
  "nrFunctionCalls": 6649159,
  "nrLookups": 3978975,
  "nrOpUpdateValuesCopied": 31774690,
  "nrOpUpdates": 607131,
  "nrPrimOpCalls": 3261148,
  "nrThunks": 11004466,
  "sets": {
    "bytes": 679170688,
    "elements": 40539768,
    "number": 1908400
  },
  "sizes": {
    "Attr": 16,
    "Bindings": 16,
    "Env": 8,
    "Value": 24
  },
  "symbols": {
    "bytes": 1016494,
    "number": 85193
  },
  "time": {
    "cpu": 3.241765022277832,
    "gc": 1.223,
    "gcFraction": 0.37726361768832245
  },
  "values": {
    "bytes": 426071736,
    "number": 17752989
  }
}

List-concatenation-heavy evaluation

$ hyperfine --warmup 2 --min-runs 20 --export-markdown list-concat-eval-bench.md --parameter-list nix-version nix-before,nix-after './{nix-version}/bin/nix eval --no-eval-cache --read-only --json --file ./nixpkgs-91bc7dadf7bf0bf3fb9203e611bd856f40fc2b66/pkgs/top-level/release-attrpaths-superset.nix names'
Benchmark 1: ./nix-before/bin/nix eval --no-eval-cache --read-only --json --file ./nixpkgs-91bc7dadf7bf0bf3fb9203e611bd856f40fc2b66/pkgs/top-level/release-attrpaths-superset.nix names
  Time (mean ± σ):     15.560 s ±  0.041 s    [User: 13.723 s, System: 1.798 s]
  Range (min … max):   15.492 s … 15.642 s    20 runs
 
Benchmark 2: ./nix-after/bin/nix eval --no-eval-cache --read-only --json --file ./nixpkgs-91bc7dadf7bf0bf3fb9203e611bd856f40fc2b66/pkgs/top-level/release-attrpaths-superset.nix names
  Time (mean ± σ):     17.644 s ±  0.077 s    [User: 15.699 s, System: 1.902 s]
  Range (min … max):   17.537 s … 17.860 s    20 runs
 
Summary
  ./nix-before/bin/nix eval --no-eval-cache --read-only --json --file ./nixpkgs-91bc7dadf7bf0bf3fb9203e611bd856f40fc2b66/pkgs/top-level/release-attrpaths-superset.nix names ran
    1.13 ± 0.01 times faster than ./nix-after/bin/nix eval --no-eval-cache --read-only --json --file ./nixpkgs-91bc7dadf7bf0bf3fb9203e611bd856f40fc2b66/pkgs/top-level/release-attrpaths-superset.nix names
| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| ./nix-before/bin/nix eval --no-eval-cache --read-only --json --file ./nixpkgs-91bc7dadf7bf0bf3fb9203e611bd856f40fc2b66/pkgs/top-level/release-attrpaths-superset.nix names | 15.560 ± 0.041 | 15.492 | 15.642 | 1.00 |
| ./nix-after/bin/nix eval --no-eval-cache --read-only --json --file ./nixpkgs-91bc7dadf7bf0bf3fb9203e611bd856f40fc2b66/pkgs/top-level/release-attrpaths-superset.nix names | 17.644 ± 0.077 | 17.537 | 17.860 | 1.13 ± 0.01 |
$ NIX_SHOW_STATS_PATH="$PWD/list-concat-eval-bench-before.json" NIX_SHOW_STATS=1 ./nix-before/bin/nix eval --no-eval-cache --read-only --json --file ./nixpkgs-91bc7dadf7bf0bf3fb9203e611bd856f40fc2b66/pkgs/top-level/release-attrpaths-superset.nix names &> /dev/null
{
  "cpuTime": 15.477017402648926,
  "envs": {
    "bytes": 1025735272,
    "elements": 83935767,
    "number": 44281142
  },
  "gc": {
    "cycles": 10,
    "heapSize": 7333933056,
    "totalBytes": 10287749840
  },
  "list": {
    "bytes": 131652904,
    "concats": 1976668,
    "elements": 16456613
  },
  "nrAvoided": 47813368,
  "nrExprs": 3631225,
  "nrFunctionCalls": 38051112,
  "nrLookups": 18372127,
  "nrOpUpdateValuesCopied": 207325193,
  "nrOpUpdates": 10128855,
  "nrPrimOpCalls": 14286327,
  "nrThunks": 72867737,
  "sets": {
    "bytes": 4695210432,
    "elements": 276587147,
    "number": 16863505
  },
  "sizes": {
    "Attr": 16,
    "Bindings": 16,
    "Env": 8,
    "Value": 24
  },
  "symbols": {
    "bytes": 1903646,
    "number": 157132
  },
  "time": {
    "cpu": 15.477017402648926,
    "gc": 7.973,
    "gcFraction": 0.5151509359054803
  },
  "values": {
    "bytes": 2496420984,
    "number": 104017541
  }
}
$ NIX_SHOW_STATS_PATH="$PWD/list-concat-eval-bench-after.json" NIX_SHOW_STATS=1 ./nix-after/bin/nix eval --no-eval-cache --read-only --json --file ./nixpkgs-91bc7dadf7bf0bf3fb9203e611bd856f40fc2b66/pkgs/top-level/release-attrpaths-superset.nix names &> /dev/null
{
  "cpuTime": 18.587953567504883,
  "envs": {
    "bytes": 1025735272,
    "elements": 83935767,
    "number": 44281142
  },
  "gc": {
    "cycles": 12,
    "heapSize": 7954685952,
    "totalBytes": 12852417616
  },
  "list": {
    "bytes": 0,
    "concats": 247488,
    "elements": 0
  },
  "nrAvoided": 47813368,
  "nrExprs": 3631225,
  "nrFunctionCalls": 38051112,
  "nrLookups": 18372127,
  "nrOpUpdateValuesCopied": 207325193,
  "nrOpUpdates": 10128855,
  "nrPrimOpCalls": 14286327,
  "nrThunks": 72867737,
  "sets": {
    "bytes": 4695210432,
    "elements": 276587147,
    "number": 16863505
  },
  "sizes": {
    "Attr": 16,
    "Bindings": 16,
    "Env": 8,
    "Value": 24
  },
  "symbols": {
    "bytes": 1903646,
    "number": 157132
  },
  "time": {
    "cpu": 18.587953567504883,
    "gc": 8.347,
    "gcFraction": 0.4490542743011835
  },
  "values": {
    "bytes": 2579421624,
    "number": 107475901
  }
}

Priorities and Process

Add 👍 to pull requests you find important.

The Nix maintainer team uses a GitHub project board to schedule and track reviews.

@roberth (Member) commented Oct 30, 2024

  • I'm unable to evaluate my own system closure because an uninitialized value sneaks in.

Memory management bugs are tricky to debug, because their effects tend to be discovered only later on, and in rather uninformative ways.

The warnings in https://sinusoid.es/immer/memory.html#garbage-collected-heap hold true, but I wouldn't be surprised if another property is involved in our crash here.

Does immer require interior pointers to be enabled? If that's not documented, perhaps just try changing our call to GC_set_all_interior_pointers.

Does this work if GC is disabled? If so, we might be dealing with a missing GC root. If it also fails with GC disabled, we might be looking at some sort of memory corruption, such as the lack of initialization you suggested.
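
Concretely, the two experiments I have in mind look something like this (just a sketch; note that GC_set_all_interior_pointers has to be called before the collector is initialized):

#include <gc/gc.h>

int main()
{
    // Experiment 1: let bdwgc recognize pointers into the middle of objects,
    // in case immer only keeps interior pointers to its nodes.
    GC_set_all_interior_pointers(1);
    GC_INIT();

    // ... run the evaluator as usual ...

    // Experiment 2: disable collection entirely, e.g. by running the
    // unmodified binary with GC_DONT_GC=1 in the environment (or by calling
    // GC_disable() here). If the crash disappears, a missing GC root is likely.
    return 0;
}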

@roberth (Member) left a review


Things that jumped out browsing through the code (not a review)

@@ -133,34 +146,6 @@ class ExternalValueBase
std::ostream & operator << (std::ostream & str, const ExternalValueBase & v);


class ListBuilder

We'll want to keep this around for the existing C API.
It should be easy to implement with immer types, I suppose.
Whether we continue to use ListBuilder internally is another question, and I could see "no" as a likely answer to it.
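
Something along these lines might do (a rough sketch only; ImmerListBuilder and its interface are illustrative, not the existing ListBuilder or C API signatures):

#include <immer/flex_vector.hpp>
#include <immer/flex_vector_transient.hpp>

struct Value;

// Sketch of a builder that accumulates into a transient and hands back a
// persistent list at the end.
class ImmerListBuilder
{
    immer::flex_vector_transient<Value *> elems;
public:
    void push_back(Value * v) { elems.push_back(v); }
    immer::flex_vector<Value *> finish() { return elems.persistent(); }
};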

Comment on lines -30 to -31
tList1,
tList2,

These two are an optimization that lets the evaluator produce small lists without any allocations besides e.g. the thunk that already existed. I think we'll want to keep tList1 and tList2 for this purpose.

// declare a memory policy for using a tracing garbage collector
using gc_policy = immer::memory_policy<immer::heap_policy<immer::gc_heap>,
immer::no_refcount_policy,
immer::default_lock_policy,

I wonder what the default lock policy does.
I suppose clever locking might save allocations in a few cases, but I'm inclined to think that just allocating all the time is more efficient: purely functional operations are non-destructive, so we tend to keep the "old version" around, and our scopes/closures (Env) tend to stick around fairly often in practice; and when they don't, we don't really know that locally.

Reply:

The locking is not used for allocation actually, but it's used by immer::atom so no need to worry about it :)

@Mic92 (Member) commented Oct 30, 2024

Thanks for doing this experiment. To see whether this is going in the right direction, maybe you could run some micro-benchmarks before having to fix all the builtins etc.? Then we could already get a performance preview.

@Mic92 (Member) commented Oct 30, 2024

also cc @arximboldi

@ConnorBaker (Contributor, author) commented Oct 30, 2024

I can eval my NixOS config!

I tried re-adding uses of transient, but ran into the same errors in Value::type where internalType has some garbage value. However, running with GC_DONT_GC=1 allowed the calls to transient to work! (Unfortunately, setting GC_set_all_interior_pointers(1) did not.) @roberth do you have anything you'd recommend reading in terms of GC roots? Sadly, I'm largely unaware of how modern GC works.

@ConnorBaker (Contributor, author) commented:

Rebased to fix merge conflicts.

@roberth (Member) commented Oct 30, 2024

do have anything you'd recommend reading

No reading list, unfortunately, except perhaps the bdwgc docs, which are mostly in its headers; that's reference documentation, not really learning material.

The symptoms you describe suggest that something is not getting marked and then collected by accident, or in other words, some heap objects aren't reachable from the garbage collection roots such as the (main) stack or C++ heap objects allocated with traceable_allocator.
We often create the latter with allocRootValue.
For STL collections, we generally just pass the allocator along in the template arguments.
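
For example (a sketch; the container and the RootedValueList name are just illustrative):

#include <gc/gc_allocator.h> // provides traceable_allocator
#include <list>

struct Value;

// The backing storage of this container is allocated with
// traceable_allocator, i.e. uncollectable and scanned by bdwgc, so the
// Value pointers stored in it keep their targets alive.
using RootedValueList = std::list<Value *, traceable_allocator<Value *>>;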

@Mic92 (Member) commented Oct 31, 2024

Adding NIX_SHOW_STATS_PATH=/tmp/trace NIX_SHOW_STATS=1 to the benchmark would also let us gather allocation statistics. Even if we don't improve CPU time, it would be interesting to see whether we at least save memory; after that, it might be worth investigating whether we can regain those 12%.

@ConnorBaker (Contributor, author) commented:

@Mic92 I added stats to the numbers section.

@roberth I know that with Haskell, one can use allocation stats provided by GHC's RTS as a weak proxy for performance -- any idea if that's the case with Nix as well? My understanding was that a non-trivial amount of time is spent in the garbage collector.

@ConnorBaker (Contributor, author) commented:

@arximboldi I'm having trouble with transient data structures -- do you have any insight?

In eval.cc I'm writing this:

void ExprList::eval(EvalState & state, Env & env, Value & v)
{
    auto list = state.allocList();
    for (auto & i : elems)
        *list = list->push_back(i->maybeThunk(state, env));
    v.mkList(list);
}

I've tried to rewrite it to use transient:

void ExprList::eval(EvalState & state, Env & env, Value & v)
{
    auto list = state.allocList();
    auto transient = list->transient();
    for (auto & i : elems) {
        transient.push_back(i->maybeThunk(state, env));
    }
    *list = transient.persistent();
    v.mkList(list);
}

Unfortunately, when I do that I get an assertion failure when calling transient.push_back!

nix: /nix/store/gkghbi5d6849mwsbcdhnqljz2xnjvnis-immer-0.8.1-unstable-2024-09-18/include/immer/transience/gc_transience_policy.hpp:95: ownee &immer::gc_transience_policy::apply<immer::heap_policy<immer::gc_heap>>::type::ownee::operator=(edit) [HeapPolicy = immer::heap_policy<immer::gc_heap>]: Assertion `e != noone' failed.

I believe something is being garbage collected when it shouldn't be, and I verified this by checking that running with GC_DONT_GC=1 succeeds.

I know that in value.hh, we have typedefs for STL containers to make sure that we're using containers which allocate with traceable_allocator:

typedef std::vector<Value *, traceable_allocator<Value *>> ValueVector;
typedef std::unordered_map<Symbol, Value *, std::hash<Symbol>, std::equal_to<Symbol>, traceable_allocator<std::pair<const Symbol, Value *>>> ValueMap;
typedef std::map<Symbol, ValueVector, std::less<Symbol>, traceable_allocator<std::pair<const Symbol, ValueVector>>> ValueVectorMap;

Do I need to do something similar for flex_vector_transient?
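
Something like this is what I had in mind (just a sketch reusing the gc_policy shape from earlier in this PR; it doesn't address the transience/ownership assertion, and I haven't verified these are the right template parameters):

#include <immer/flex_vector.hpp>
#include <immer/flex_vector_transient.hpp>
#include <immer/heap/gc_heap.hpp>
#include <immer/heap/heap_policy.hpp>
#include <immer/memory_policy.hpp>
#include <immer/refcount/no_refcount_policy.hpp>

struct Value; // forward declaration for the sketch

// Memory policy backed by the Boehm GC heap (same shape as the gc_policy above).
using gc_policy = immer::memory_policy<immer::heap_policy<immer::gc_heap>,
                                       immer::no_refcount_policy,
                                       immer::default_lock_policy>;

// Persistent list of Value pointers and its transient counterpart, both
// allocating their nodes through the GC heap.
using ValueList = immer::flex_vector<Value *, gc_policy>;
using ValueListTransient = immer::flex_vector_transient<Value *, gc_policy>;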

Do you have any ideas on what I can do to troubleshoot or debug this? From the error, I'm not entirely sure whether it's the Value that's being collected or the nodes belonging to the transient.

@roberth (Member) commented Oct 31, 2024

one can use allocation stats provided by GHC's RTS as a weak proxy for performance

To my understanding, this is a consequence of widespread use of immutable data structures.
GHC will optimize away some allocations, which Nix does not, so by that token Nix heap allocations may be a better proxy for performance than GHC's allocations.

My understanding was that a non-trivial amount of time is spent in the garbage collector.

Recent versions of Nix will tell you how much time is spent in GC. Evaluating my NixOS config spends 10% of its time in GC, but an individual package might only spend 1.5% in GC. (Anecdata.)

@Mic92 (Member) commented Nov 1, 2024

So currently we are actually increasing memory usage? I wonder what happens if we remove the 1- and 2-element list optimization from the current implementation. Maybe the comparison would then be fairer?

@roberth (Member) commented Nov 1, 2024

Arrays are pretty compact, fwiw.

Maybe this optimization would pay off if we had constant folding? IIRC we haven't previously done constant folding because it didn't pay off, but perhaps that was because of all the copying.
Don't know, but might be worth a shot.

Another possibility is that the data structure can't be used efficiently because it has to assume that multiple references to it exist, even though those references might all come from garbage.
And that's in the best case; #8285 causes reachable references to linger.

@arximboldi commented:

Hi @ConnorBaker !

Thanks a lot for this contribution. I am a fan of Nix and use it in all my systems, so this gets me very excited! 🎉 😄

Taking a look at the code, these are the two things I can see that could be improved:

  1. Use of transients. I see that you are trying that already.

  2. I see a few places where access into the data structure happens by index. It seems we may even be iterating by index in some cases, depending on how some of the interfaces are used. The preferred ways of iterating over immer data structures are:
    a) immer::for_each, definitely the fastest way!
    b) Where that's not possible, use iterators, which are still significantly faster than random access (see the sketch below).
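
For instance, a rough sketch of the three iteration styles on a toy vector:

#include <cstddef>
#include <cstdio>
#include <immer/algorithm.hpp>
#include <immer/flex_vector.hpp>

int main()
{
    immer::flex_vector<int> v = {1, 2, 3, 4};

    // Fastest: immer::for_each visits the elements chunk by chunk.
    immer::for_each(v, [](int x) { std::printf("%d\n", x); });

    // Next best: iterators.
    for (int x : v)
        std::printf("%d\n", x);

    // Slowest: operator[] has to descend the tree on every access.
    for (std::size_t i = 0; i < v.size(); ++i)
        std::printf("%d\n", v[i]);
}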

As to why those crashes are happening... you mention that the values are being inappropriately GC'ed. You are using the gc_heap-based gc_policy, which assumes that you're using Boehm's GC (a conservative tracing GC). I suspect you probably need to provide a custom heap policy that properly interacts with whatever GC Nix is using? Are there any docs/resources on Nix's GC I could take a look at?

@arximboldi commented:

Ok, just took a look and it seems we are indeed using Boehm's GC here. And the implementation of traceable_allocator just forwards to GC_MALLOC_UNCOLLECTABLE... That doesn't quite seem right for this particular use case, unless we combine it with reference counting.

I see instances of something that resembles refcounting in the code, though (nix_gc_incref, etc.). That could be integrated via the memory policy.

I think it'd be nice to hear from someone with a bit more knowledge about how the lifetimes of C++ values are managed in Nix and how it all interacts with the GC. It's unclear to me which allocations should be made with Boehm's GC_MALLOC.

Also, what are nix_gc_incref/decref? From looking at the implementation, I suspect they're for FFI, in order to keep objects alive outside of what Boehm's GC can see?

@arximboldi commented:

Looking at it again... I wouldn't fully rule out a bug in immer's gc_transience_policy, which is less battle-tested than the refcounting version. I will try to find some time this week to investigate...

@nixos-discourse commented:

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/is-it-possible-for-us-to-remove-builtins-functionargs/51960/4

@ConnorBaker (Contributor, author) commented:

I’ve stalled a bit here in terms of will to make progress.

I’ve been thinking about implementations for the data structures we use for lists and attribute sets, and though this is naive, I would think the best implementation would make the most common operations the cheapest to perform.

Seeing as how lists are implemented as arrays of pointers to values instead of values themselves, I don’t know that the overhead involved in persistent data structures is worth what we can get back in sharing.

Given lazy evaluation, I don't know that it would make sense to have lists contain actual values rather than pointers to them; for most of our usage of the language, would cache locality even matter?

Attribute sets are vectors of Attr, but I remember that at some point they were two arrays: one for names and one for values. Given that, outside of attribute-set merging, common patterns of use involve iterating over the names or values of attribute sets, would it make sense for the implementation to make such operations cheap? As I understand it, getting the names or values currently involves iterating over the vector of Attr and storing the desired components in a new collection.

Not related to lists or attribute sets: strings! I understand the SymbolTable is doing something like string interning? Is there a better data structure we could use than a vector? @tomberek, in #9034 you mentioned using the symbol table for all strings; are you aware of any existing library we could use for more efficient string implementations or operations, or should we look into implementing our own?
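
For reference, by "string interning" I mean something like this generic sketch (not Nix's actual SymbolTable; Interner is a made-up name):

#include <cstdio>
#include <string>
#include <unordered_set>

// Each distinct string is stored once; equal strings share one canonical
// object, so equality checks can become pointer comparisons.
class Interner
{
    std::unordered_set<std::string> pool;
public:
    const std::string * intern(const std::string & s)
    {
        // References to unordered_set elements stay valid across insertions.
        return &*pool.insert(s).first;
    }
};

int main()
{
    Interner t;
    auto a = t.intern("hello");
    auto b = t.intern("hello");
    std::printf("%s\n", a == b ? "same object" : "different objects"); // same object
}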
