
Add support for deserializing multiple DB::Serializable entities in a single row #137

Open
jgaskins opened this issue Oct 8, 2020 · 5 comments


@jgaskins
Contributor

jgaskins commented Oct 8, 2020

I posted a while back on the forum asking whether something like the following was possible:

db.query_all <<-SQL, app_id, as: {App, User}
  SELECT apps.*, users.* -- I specify the columns explicitly but it would just be noise here
  FROM apps
    JOIN app_access ON app_access.app_id = apps.id
    JOIN users ON app_access.user_id = users.id
  WHERE apps.id = $1
SQL

… where both App and User include DB::Serializable. It isn't currently possible, and it looks like that's because the initialize method generated by the mixin encodes the assumption that the including class owns the full row.
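For reference, a minimal pair of classes like the ones above might look like this (a sketch only; the column sets here are illustrative, not my actual schema):

```crystal
require "db"

class App
  include DB::Serializable

  property id : Int64
  property name : String
end

class User
  include DB::Serializable

  property id : Int64
  property email : String
end
```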

I've gotten a lot of mileage out of DB::Serializable, and the only thing I can't do well with it (for my purposes, at least) is chop up a row that results from joining two relations. But I think it might be possible.

I'm thinking along the lines of how the types in as: {String, Int32} don't each consume the full row; each consumes only the column it needs. Similarly, we could probably make DB::Serializable classes consume columns only until they're fully initialized.

I'm going to try to put together a proof of concept for this, but I figured I'd post here in case anyone has ideas that might help. There will definitely be some caveats. For example, the classes must appear in the tuple in the same order as their column segments appear in the row, so whatever executes the query will need to match the classes to the SELECT clause. That may require some knowledge of how the query was generated, if it wasn't handwritten like my query above.

@bcardiff
Member

bcardiff commented Oct 8, 2020

I like the idea. A core principle of DB::Serializable is that the order of columns does not matter; the keys are used.

Maybe a way to go is to allow DB::Serializable to be constructed from named tuples. Getting the named tuples from the result set via as: {name: String, age: Int32} could state the expected order of fields.
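Something like this, as a hypothetical API just to illustrate the shape (the named-tuple form of as: and the generated constructor are both assumptions, not existing crystal-db features):

```crystal
# Hypothetical: a named-tuple `as:` would fix both the keys and their order,
# and the entity could then be built by splatting that tuple.
row = db.query_one "SELECT name, age FROM users LIMIT 1", as: {name: String, age: Int32}
user = User.new(**row) # assumes a generated constructor accepting these keys
```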

This is also related to how entities can be composed or embedded within each other, e.g. defining Money as an Amount + Currency pair and being able to use those values within another mapping.

@jgaskins
Contributor Author

jgaskins commented Oct 8, 2020

A core principle of DB::Serializable is that the order of columns does not matter; the keys are used.

Indeed, and that makes sense when it's just one entity being deserialized, but let's say I have a query like this:

SELECT u.*, g.*
FROM users u
  JOIN group_memberships gm ON gm.user_id = u.id
  JOIN groups g ON gm.group_id = g.id

The columns we get back are:

rs.@fields.not_nil!.map(&.name)
# ["id",
#  "email",
#  "name",
#  "created_at",
#  "updated_at",
#  "id",
#  "name",
#  "active",
#  "created_at",
#  "updated_at"]

Since the id, name, created_at, and updated_at keys would all be duplicated, relying on column names alone wouldn't work. Even the NamedTuple idea would break on duplicate keys. 😕 I think it might have to rely on the order of the columns when there are multiple complex entities in play. Columns could be out of order within a single entity (and probably would be most of the time), but if I'm understanding correctly, each entity's segment of the row would have to appear in the same order for this to work.
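To make the NamedTuple collision concrete: a NamedTuple literal rejects duplicate keys at compile time, so there is no way to represent both entities' id columns in one tuple. (Sketch only; the values are made up.)

```crystal
# Does not compile: a NamedTuple cannot carry the same key twice,
# so one tuple cannot hold both the user's and the group's "id" column.
#
#   {id: 1, name: "a", id: 2, name: "b"} # Error: duplicated key: id
```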

This is also related to how entities can be composed or embedded within each other. Eg: defining Money as an Amount+Currency and being able to use those values within another mapping.

This sounds really cool. I'm curious about the feasibility (for example, a dump/restore might reorder columns in some DBs), but the idea is really neat. 🙂

@jgaskins
Contributor Author

jgaskins commented Oct 8, 2020

What I mean is that the query in my previous post will always be in this structure:

id email name created_at updated_at id name active created_at updated_at

The users.* and groups.* expansions may reorder the columns within each source relation due to storage logistics on disk (such as after a dump/restore):

created_at email id name updated_at active created_at id name updated_at

but it should always be:

user columns group columns

So if we pass as: {User, Group}, we should be able to make this work.

@jgaskins
Contributor Author

jgaskins commented Oct 9, 2020

Okay, I've got a proof of concept and I'd love to get some thoughts on it. Here's what I ran:

pp db.query_all(<<-SQL, as: {User, Group})
  SELECT u.*, g.*
  FROM users u
  JOIN group_memberships gm ON gm.user_id = u.id
  JOIN groups g ON gm.group_id = g.id
SQL
# [{User(
#    @created_at=2020-10-07 03:46:09.074747000 UTC,
#    @email="[email protected]",
#    @id=UUID(2caa48fa-6be8-441f-9e0a-adf92dcdfe5d),
#    @name="A different user",
#    @updated_at=2020-10-07 03:46:09.074747000 UTC),
#   Group(
#    @active=true,
#    @created_at=2020-10-07 03:46:09.172841000 UTC,
#    @id=UUID(3e29edc9-2e8c-4fcd-976f-6dcf90f1673c),
#    @name="My Group",
#    @updated_at=2020-10-07 03:46:09.172841000 UTC)},
#  {User(
#    @created_at=2020-10-07 03:46:09.179586000 UTC,
#    @email="[email protected]",
#    @id=UUID(72e988d1-9841-4dd3-8d40-02a2f9967933),
#    @name="Jamie",
#    @updated_at=2020-10-07 03:46:09.179586000 UTC),
#   Group(
#    @active=true,
#    @created_at=2020-10-07 03:46:09.179824000 UTC,
#    @id=UUID(4272dfe0-f406-49d4-9489-08ca6c0247c1),
#    @name="Members",
#    @updated_at=2020-10-07 03:46:09.179824000 UTC)},
#  {User(
#    @created_at=2020-10-07 03:46:09.179586000 UTC,
#    @email="[email protected]",
#    @id=UUID(72e988d1-9841-4dd3-8d40-02a2f9967933),
#    @name="Jamie",
#    @updated_at=2020-10-07 03:46:09.179586000 UTC),
#   Group(
#    @active=true,
#    @created_at=2020-10-07 03:46:09.180032000 UTC,
#    @id=UUID(0c53a235-40b1-41ac-85c6-ceb3f5fe4d88),
#    @name="Admins",
#    @updated_at=2020-10-07 03:46:09.180032000 UTC)}]

Here is the patch I'm working with:

-        rs.each_column do |col_name|
+        rs.each_column_from_last do |col_name| # I don't know what to call this
           case col_name
             {% for name, value in properties %}
               when {{value[:key]}}
                 %found{name} = true
                 %var{name} =
                   {% if value[:converter] %}
                     {{value[:converter]}}.from_rs(rs)
                   {% elsif value[:nilable] || value[:default] != nil %}
                     rs.read(::Union({{value[:type]}} | Nil))
                   {% else %}
                     rs.read({{value[:type]}})
                   {% end %}
 
+                 {% for name_for_check, __value in properties %}
+                   next unless %found{name_for_check}
+                 {% end %}
+                 break
             {% end %}
           else
             rs.read # Advance set, but discard result
             on_unknown_db_column(col_name)
           end
         end

And a monkeypatch in will/crystal-pg to provide the new method:

class PG::ResultSet
  def each_column_from_last
    (@column_index...column_count).each do |i|
      yield column_name(i)
    end
  end
end

It looks like it'd be pretty much the same in crystal-lang/crystal-mysql other than the namespace, including the ivar name. But maybe each_column shouldn't start from 0 if it's called multiple times. It doesn't look like we could actually resume reading from the beginning of the row on subsequent calls, so each_column starting from 0 on a second call wouldn't work as intended. I recognize this method probably wasn't originally designed to be re-entrant, though. 🙂

@jgaskins
Contributor Author

jgaskins commented Oct 9, 2020

Another caveat I just thought about is that, as written, this only supports INNER JOINs. Support for any type of OUTER JOIN would be more complicated. In the following query:

SELECT u.*, g.*
FROM users u
LEFT JOIN group_memberships gm ON gm.user_id = u.id
LEFT JOIN groups g ON gm.group_id = g.id

You'd have to specify as: {User, Group?} because all of the columns in g.* might be NULL. String | Nil only requires looking at a single column, but we'd have to look at every column for a User to detect whether User | Nil was indeed nil.
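One way that nil check could look, as a sketch only (read_segment and from_values are hypothetical helpers, not part of crystal-db or the patch above):

```crystal
# Hypothetical sketch: read this entity's column segment first, and only
# construct the entity if at least one column was non-NULL.
def self.from_rs_or_nil(rs) : self?
  values = read_segment(rs)         # hypothetical: read this entity's columns
  return nil if values.all?(&.nil?) # every column NULL => OUTER JOIN produced no row
  from_values(values)               # hypothetical: build the entity from the values
end
```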

jgaskins added a commit to jgaskins/interro that referenced this issue Oct 10, 2020