
Add support for deserializing multiple DB::Serializable entities in a single row #137

Open
jgaskins opened this issue Oct 8, 2020 · 5 comments


@jgaskins
Contributor

jgaskins commented Oct 8, 2020

I posted a while back on the forum asking whether something like the following was possible:

db.query_all <<-SQL, app_id, as: {App, User}
  SELECT apps.*, users.* -- I specify the columns explicitly but it would just be noise here
  FROM apps
    JOIN app_access ON app_access.app_id = apps.id
    JOIN users ON app_access.user_id = users.id
  WHERE apps.id = $1
SQL

… where both App and User include DB::Serializable. It isn't currently possible, and it looks like that's because the initialize method generated by the mixin encodes the assumption that the including class owns the full row.
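For reference, a minimal pair of classes like the ones above might look like this (a sketch only; the column sets here are illustrative, not my actual schema):

```crystal
require "db"

class App
  include DB::Serializable

  property id : Int64
  property name : String
end

class User
  include DB::Serializable

  property id : Int64
  property email : String
end
```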

I've gotten a lot of mileage out of DB::Serializable, and the only thing I can't do well with it (for my purposes, at least) is chop up a row that results from joining two relations. But I think it might be possible.

I'm thinking along the lines of how the types in as: {String, Int32} don't each consume the full row; each consumes only the column it needs. Similarly, we could probably make DB::Serializable classes consume columns only until they're fully initialized.

I'm going to try to put together a proof of concept for this, but I figured I'd post here in case anyone has ideas that might help. There will definitely be some caveats. For example, the classes must appear in the tuple in the same order as their column segments appear in the row, so whatever executes the query will need to match the classes to the SELECT clause. That may require some knowledge of how the query was generated, if it wasn't handwritten like my query above.

@bcardiff
Member

bcardiff commented Oct 8, 2020

I like the idea. A core principle of DB::Serializable is that the order of columns does not matter; the keys are used.

Maybe a way to go is to allow DB::Serializable to be constructed from named tuples. Getting the named tuples from the result set via as: {name: String, age: Int32} could state the expected order of fields.
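Something like this, as a hypothetical API just to illustrate the shape (the named-tuple form of as: and the generated constructor are both assumptions, not existing crystal-db features):

```crystal
# Hypothetical: a named-tuple `as:` would fix both the keys and their order,
# and the entity could then be built by splatting that tuple.
row = db.query_one "SELECT name, age FROM users LIMIT 1", as: {name: String, age: Int32}
user = User.new(**row) # assumes a generated constructor accepting these keys
```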

This is also related to how entities can be composed or embedded within each other, e.g. defining Money as an Amount + Currency pair and being able to use those values within another mapping.

@jgaskins
Contributor Author

jgaskins commented Oct 8, 2020

A core principle of DB::Serializable is that the order of columns does not matter; the keys are used.

Indeed, and that makes sense when it's just one entity being deserialized, but let's say I have a query like this:

SELECT u.*, g.*
FROM users u
  JOIN group_memberships gm ON gm.user_id = u.id
  JOIN groups g ON gm.group_id = g.id

The columns we get back are:

rs.@fields.not_nil!.map(&.name)
# ["id",
#  "email",
#  "name",
#  "created_at",
#  "updated_at",
#  "id",
#  "name",
#  "active",
#  "created_at",
#  "updated_at"]

Since the id, name, created_at, and updated_at keys would all be duplicated, relying on column names alone wouldn't work. Even the NamedTuple idea would break on duplicate keys. 😕 I think it might have to rely on the order of the columns when there are multiple complex entities in play. Columns could be out of order within a single entity (and probably would be most of the time), but if I'm understanding correctly, each entity's segment of the row would have to appear in the same order for this to work.
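To make the NamedTuple collision concrete: a NamedTuple literal rejects duplicate keys at compile time, so there is no way to represent both entities' id columns in one tuple. (Sketch only; the values are made up.)

```crystal
# Does not compile: a NamedTuple cannot carry the same key twice,
# so one tuple cannot hold both the user's and the group's "id" column.
#
#   {id: 1, name: "a", id: 2, name: "b"} # Error: duplicated key: id
```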

This is also related to how entities can be composed or embedded within each other. Eg: defining Money as an Amount+Currency and being able to use those values within another mapping.

This sounds really cool. I'm curious about the feasibility (for example, a dump/restore might reorder columns in some DBs), but the idea is really neat. 🙂

@jgaskins
Contributor Author

jgaskins commented Oct 8, 2020

What I mean is that the query in my previous post will always be in this structure:

id email name created_at updated_at id name active created_at updated_at

The users.* and groups.* expansions may reorder the columns within each source relation due to storage logistics on disk (such as after a dump/restore):

created_at email id name updated_at active created_at id name updated_at

but it should always be:

user columns group columns

So if we pass as: {User, Group}, we should be able to make this work.

@jgaskins
Contributor Author

jgaskins commented Oct 9, 2020

Okay, I've got a proof of concept and I'd love to get some thoughts on it. Here's what I ran:

pp db.query_all(<<-SQL, as: {User, Group})
  SELECT u.*, g.*
  FROM users u
  JOIN group_memberships gm ON gm.user_id = u.id
  JOIN groups g ON gm.group_id = g.id
SQL
# [{User(
#    @created_at=2020-10-07 03:46:09.074747000 UTC,
#    @email="[email protected]",
#    @id=UUID(2caa48fa-6be8-441f-9e0a-adf92dcdfe5d),
#    @name="A different user",
#    @updated_at=2020-10-07 03:46:09.074747000 UTC),
#   Group(
#    @active=true,
#    @created_at=2020-10-07 03:46:09.172841000 UTC,
#    @id=UUID(3e29edc9-2e8c-4fcd-976f-6dcf90f1673c),
#    @name="My Group",
#    @updated_at=2020-10-07 03:46:09.172841000 UTC)},
#  {User(
#    @created_at=2020-10-07 03:46:09.179586000 UTC,
#    @email="[email protected]",
#    @id=UUID(72e988d1-9841-4dd3-8d40-02a2f9967933),
#    @name="Jamie",
#    @updated_at=2020-10-07 03:46:09.179586000 UTC),
#   Group(
#    @active=true,
#    @created_at=2020-10-07 03:46:09.179824000 UTC,
#    @id=UUID(4272dfe0-f406-49d4-9489-08ca6c0247c1),
#    @name="Members",
#    @updated_at=2020-10-07 03:46:09.179824000 UTC)},
#  {User(
#    @created_at=2020-10-07 03:46:09.179586000 UTC,
#    @email="[email protected]",
#    @id=UUID(72e988d1-9841-4dd3-8d40-02a2f9967933),
#    @name="Jamie",
#    @updated_at=2020-10-07 03:46:09.179586000 UTC),
#   Group(
#    @active=true,
#    @created_at=2020-10-07 03:46:09.180032000 UTC,
#    @id=UUID(0c53a235-40b1-41ac-85c6-ceb3f5fe4d88),
#    @name="Admins",
#    @updated_at=2020-10-07 03:46:09.180032000 UTC)}]

Here is the patch I'm working with:

-        rs.each_column do |col_name|
+        rs.each_column_from_last do |col_name| # I don't know what to call this
           case col_name
             {% for name, value in properties %}
               when {{value[:key]}}
                 %found{name} = true
                 %var{name} =
                   {% if value[:converter] %}
                     {{value[:converter]}}.from_rs(rs)
                   {% elsif value[:nilable] || value[:default] != nil %}
                     rs.read(::Union({{value[:type]}} | Nil))
                   {% else %}
                     rs.read({{value[:type]}})
                   {% end %}
 
+                 {% for name_for_check, __value in properties %}
+                   next unless %found{name_for_check}
+                 {% end %}
+                 break
             {% end %}
           else
             rs.read # Advance set, but discard result
             on_unknown_db_column(col_name)
           end
         end

And a monkeypatch in will/crystal-pg to provide the new method:

class PG::ResultSet
  def each_column_from_last
    (@column_index...column_count).each do |i|
      yield column_name(i)
    end
  end
end

It looks like it'd be pretty much the same in crystal-lang/crystal-mysql other than the namespace, including the ivar name. But maybe each_column shouldn't start from 0 if it's called multiple times. It doesn't look like we could actually resume reading from the beginning of the row on subsequent calls, so each_column starting from 0 on a second call wouldn't work as intended. I recognize this method probably wasn't originally designed to be re-entrant, though. 🙂

@jgaskins
Contributor Author

jgaskins commented Oct 9, 2020

Another caveat I just thought about is that, as written, this only supports INNER JOINs. Support for any type of OUTER JOIN would be more complicated. In the following query:

SELECT u.*, g.*
FROM users u
LEFT JOIN group_memberships gm ON gm.user_id = u.id
LEFT JOIN groups g ON gm.group_id = g.id

You'd have to specify as: {User, Group?} because all of the columns in g.* might be NULL. String | Nil only requires looking at a single column, but we'd have to look at every column for a User to detect whether User | Nil was indeed nil.
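One way that nil check could look, as a sketch only (read_segment and from_values are hypothetical helpers, not part of crystal-db or the patch above):

```crystal
# Hypothetical sketch: read this entity's column segment first, and only
# construct the entity if at least one column was non-NULL.
def self.from_rs_or_nil(rs) : self?
  values = read_segment(rs)         # hypothetical: read this entity's columns
  return nil if values.all?(&.nil?) # every column NULL => OUTER JOIN produced no row
  from_values(values)               # hypothetical: build the entity from the values
end
```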

jgaskins added a commit to jgaskins/interro that referenced this issue Oct 10, 2020