Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use MemoryIO. #44

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Use MemoryIO. #44

wants to merge 2 commits into from

Conversation

mirek
Copy link

@mirek mirek commented May 28, 2016

No description provided.

@asterite
Copy link
Contributor

Can you benchmark the difference? Creating a MemoryIO allocates memory so this should result in a slowdown. The idea is good, but I wouldn't merge this if it makes things slower (speed is more important)

@mirek
Copy link
Author

mirek commented May 28, 2016

Ok, let me think a bit. This is not final version, just first PR of something like this:

  • remove manual swaps in favor of IO provided functions (also makes code correct compiled on big-endian machines)
  • add support for 1d arrays
  • update decoding to one of...
    1. maybe move decoding into #from_io (probably not, it feels like too much unnecessary OO) or...
    2. simply individual functions on single Decoder class (probably yes), ie. Decoder#decode_jsonb etc. It's easy to reuse decoding logic this way and the code is very compact. Extending decoder with custom types will have to go through map of oid -> proc maybe.
  • read the whole frame into MemoryIO and use that for decoder
  • possibly use pool for MemoryIO buffers

...what do you think?

@mirek
Copy link
Author

mirek commented May 28, 2016

...and I thought to do each step as PR to stay in comprehensible size for each PR, otherwise it's going to be a bit big (but I can do it in single one as well).

@mirek
Copy link
Author

mirek commented May 28, 2016

Just a quick two thoughts:

  1. this introduction of MemoryIO doesn't overwrite anything, it's just a fallback. For example String is decoded using Slice(UInt8). Default implementation in Decoder just falls back to the IO-based so there's no performance penalty if the type decoding wants to work on Slice directly (I've messed up UUID decoding performance wise, that's true, I'll change it back to previous implementation which is much faster).
  2. the only reason to use MemoryIO is to have nice advancing interface for reading. Ideally crystal would provide in standard library something like Bytes (or Buffer) struct, lightweight, similar to Slice but not generic (there's no point because it works on bytes) with reader/writer offsets in internal state and methods for encoding/decoding in specified endianness (endianness could be specialised at the type level, ie. NetworkEndian::Buffer).

By the way, do you know if it's feasible to allow struct inheritance with restriction that it can't change width? Ie. just adding methods or it's not possible?

@mirek
Copy link
Author

mirek commented May 28, 2016

@asterite btw. this code is using Slice(UInt8)-constructor-variant of MemoryIO, which doesn't do malloc so it shouldn't be slower, there are no extra allocations.

@asterite
Copy link
Contributor

I mean, creating the MemoryIO is what allocates memory. Maybe it's insignificant, though, but I don't know.

@mirek
Copy link
Author

mirek commented May 29, 2016

@asterite ok, you're right, it is slower.

# ./bench/decoding.cr
# crystal run --release bench/decoding.cr 

require "benchmark"
require "../src/pg"

DB_URL = ENV["DATABASE_URL"]? || "postgres:///"
db = PG.connect(DB_URL)

Benchmark.ips do |x|
  x.report("int series") { db.exec("select * from generate_series(1, 1000);").rows }
end

# int series   1.78k (± 2.53%) will:master 
# int series   1.54k (± 2.23%) mirek:memory-io

Let me fix that.

@will
Copy link
Owner

will commented May 31, 2016

Thanks for the patch. I'm on a couple of cross country fights tomorrow (actually to go speak about crystal), so I should have enough down time to check out this in detail then.

However, in response to the endianness, I believe the current implementation does work on regardless of system endianness https://commandcenter.blogspot.se/2012/04/byte-order-fallacy.html

@will
Copy link
Owner

will commented Jun 1, 2016

I updated my previous benchmark of various byte swapping methods https://gist.github.com/will/c348bf90a7cbd7e93fd0 to include going through the NetworkEndian interface, and it's quite a bit slower.

$ crystal run --release byte_swap.cr
32bit
==========
RES = 67305985
true

    ntohl  42.64M (± 6.40%)  1.09× slower
   intrin  43.25M (± 4.54%)  1.08× slower
handshift  46.57M (± 3.72%)       fastest
 handswap  42.42M (± 4.45%)  1.10× slower
 memoryIo    9.8M (± 9.81%)  4.75× slower

64 bit
==========
RES64 = 578437695752307201
true

    ntohl64  46.17M (±20.54%)  1.21× slower
   intrin64  54.08M (± 9.65%)  1.03× slower
handshift64  53.88M (± 4.56%)  1.04× slower
 handswap64  55.81M (± 5.68%)       fastest
 memoryIo64  10.83M (±21.91%)  5.15× slower

I think probably part of that is because this creates a MemoryIO and the memory and gc pressure there, but also the NetworkEndian code https://github.com/crystal-lang/crystal/blob/master/src/io/byte_format.cr#L57-L68 is also making some objects it seems, and even if that's all structs, doesn't look as llvm optimizable as the other methods. I haven't looked into the generated assembly for the NetworkEndian code to verify though.

Other things on your list

  • we do need support the for the array types
  • individual functions could work, and maybe captured as procs, but we'd need to benchmark that

and I have been considering on the protocol side of the driver taking manual control over the buffering and byte swapping to get more speed out, but started with the stdlib just to get things going. I think maybe though MemoryIO is one abstraction too high for this project. The safety is good for general programs, but by being careful and deliberate with the memory access, things are a lot faster by avoiding the extra objects and bookkeeping.

Again, I appreciate you looking into this, thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants