Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forward pagination with nodes, edges, totalCount and pageInfo #2

Closed
simonw opened this issue Aug 2, 2020 · 26 comments
Closed

Forward pagination with nodes, edges, totalCount and pageInfo #2

simonw opened this issue Aug 2, 2020 · 26 comments
Labels
enhancement New feature or request

Comments

@simonw
Copy link
Owner

simonw commented Aug 2, 2020

https://graphql.org/learn/pagination/

@simonw simonw added the enhancement New feature or request label Aug 2, 2020
@simonw simonw added this to the First non-alpha release milestone Aug 2, 2020
@simonw
Copy link
Owner Author

simonw commented Aug 2, 2020

It looks like this is the community standard to follow: https://relay.dev/graphql/connections.htm

@simonw
Copy link
Owner Author

simonw commented Aug 3, 2020

I'm going to start by imitating the GitHub API, but implementing only nodes and not edges.

As far as I can tell edges are a GraphQL convention for when the relationship between two things might itself have additional properties - the date it was created, for example.

I don't need that yet, and it looks like I can implement nodes and then implement edges later if I need them without needing to build them in from the start.

Here's a sample GitHub GraphQL API query: https://developer.github.com/v4/explorer/

query { 
  viewer { 
    repositories(first:5) {
      __typename
       nodes {
        name
      }
      totalCount
      pageInfo {
        endCursor
        startCursor
      }
    }
  }
}

Which produces this output:

{
  "data": {
    "viewer": {
      "repositories": {
        "__typename": "RepositoryConnection",
        "nodes": [
          {
            "name": "django-debug-toolbar"
          },
          {
            "name": "simonw.github.com"
          },
          {
            "name": "ratelimitcache"
          },
          {
            "name": "api-explorer"
          },
          {
            "name": "python-guardianapi"
          }
        ],
        "totalCount": 385,
        "pageInfo": {
          "endCursor": "Y3Vyc29yOnYyOpHOAAIfBA==",
          "startCursor": "Y3Vyc29yOnYyOpHN1ZQ="
        }
      }
    }
  }
}

Here's a query that also shows their support for edges:

{
  viewer {
    repositories(first: 5, after: "Y3Vyc29yOnYyOpHOAAGSAg==") {
      __typename
      nodes {
        id
        name
      }
      edges {
        cursor
        node {
          id
          name
        }
      }
      totalCount
      pageInfo {
        endCursor
        startCursor
        hasNextPage
        hasPreviousPage
      }
    }
  }
}

@simonw simonw changed the title Support pagination (maybe with nodes and edges) Support pagination with nodes Aug 3, 2020
@simonw simonw changed the title Support pagination with nodes Support pagination with nodes and pageInfo Aug 3, 2020
@simonw
Copy link
Owner Author

simonw commented Aug 3, 2020

I think I can only do endCursor though - Datasette doesn't currently support the startCursor concept.

Urgh cursors will be hard because that cursor logic is embedded deep in the tangle of the Datasette TableView class.

@simonw
Copy link
Owner Author

simonw commented Aug 3, 2020

Maybe I do need to dispatch these calls to the TableView class.

@simonw
Copy link
Owner Author

simonw commented Aug 3, 2020

I'm going to try that. I can create a fake datasette.utils.asgi.Request using the Request.fake() method, with this signature:

    def fake(cls, path_with_query_string, method="GET", scheme="http"):

Then I can create a datasette.views.table.TableView instant (passing datasette to the constructor)and call its.data()` method which has this signature:

    async def data(
        self,
        request,
        database,
        hash,
        table,
        default_labels=False,
        _next=None,
        _size=None,
    ):

That method ends with this return statement - which includes the stuff that I need:

        return (
            {
                "database": database,
                "table": table,
                "is_view": is_view,
                "human_description_en": human_description_en,
                "rows": rows[:page_size],
                "truncated": results.truncated,
                "filtered_table_rows_count": filtered_table_rows_count,
                "expanded_columns": expanded_columns,
                "expandable_columns": expandable_columns,
                "columns": columns,
                "primary_keys": pks,
                "units": units,
                "query": {"sql": sql, "params": params},
                "facet_results": facet_results,
                "suggested_facets": suggested_facets,
                "next": next_value and str(next_value) or None,
                "next_url": next_url,
                "private": private,
                "allow_execute_sql": await self.ds.permission_allowed(
                    request.actor, "execute-sql", database, default=True
                ),
            },
            extra_template,
            (
                "table-{}-{}.html".format(to_css_class(database), to_css_class(table)),
                "table.html",
            ),
        )

Using Datasette internals like this is a bit risky, since future changes to Datasette could easily break this plugin. This should encourage me to finally refactor the TableView to be cleaner and easier to use from other plugins though.

@simonw
Copy link
Owner Author

simonw commented Aug 3, 2020

It looks like the Relay specification really wants you to use edges:

I'm going to stick with my nodes plan, on the basis that I can always add edges later without breaking backwards compatibility.

Adapted from the example on https://relay.dev/graphql/connections.htm I'm going to go with:

    friends(first: 10, after: "opaqueCursor") {
      nodes {
        id
        name
      }
      pageInfo {
        hasNextPage
        cursor
      }
    }

The current Relay PageInfo spec asks for the following fields:

  • hasPreviousPage
  • hasNextPage
  • startCursor
  • endCursor

The spec says:

Relay Legacy did not define startCursor and endCursor, and relied on selecting the cursor of each edge

I'm going to just return hasNextPage and endCursor for the moment - which will have the alias of cursor.

@simonw
Copy link
Owner Author

simonw commented Aug 3, 2020

https://jeffersonheard.github.io/python/graphql/2018/12/08/graphene-python.html is really useful. Helped me understand that even without additional properties the "edge" concept is useful because you can have a cursor on each edge - at which point the startCursor and endCursor are just the first and last of those values for the current page.

@simonw
Copy link
Owner Author

simonw commented Aug 3, 2020

https://github.com/graphql-python/graphene-sqlalchemy/blob/master/examples/flask_sqlalchemy/app.py is an example app using graphene-sqlalchemy that implements edges/node pagination.

It all comes down to subclasses of graphene.relay.Connection here https://github.com/graphql-python/graphene/blob/d085c8852be40552428f74fdf891b20e54674540/graphene/relay/connection.py#L59

@simonw
Copy link
Owner Author

simonw commented Aug 4, 2020

I'm going to support both edges AND nodes - where edges just means nodes with an additional cursor option.

@simonw
Copy link
Owner Author

simonw commented Aug 4, 2020

This may be the kick I need to get Datasette pagination to work in reverse too.

@simonw
Copy link
Owner Author

simonw commented Aug 4, 2020

For the non-alpha release I'm going to support forwards pagination only - after: and endCursor and hasNextPage. I can support startCursor too at this point even though it's not useful for anything yet.

I'll split hasPreviousPage and before: into a separate ticket.

@simonw simonw changed the title Support pagination with nodes and pageInfo Forward pagination with nodes, edges and pageInfo Aug 4, 2020
@simonw
Copy link
Owner Author

simonw commented Aug 4, 2020

How easy is it for me to generate cursors for individual records? In Datasette core the next parameter is an encoded version of the row's sortable column AND the primary key as a tie-breaker.

@simonw
Copy link
Owner Author

simonw commented Aug 4, 2020

Here's the Datasette core logic that figures out the next_value token: https://github.com/simonw/datasette/blob/7ca8c0521ac1ea48a3cd8d0fe9275d1316e54b43/datasette/views/table.py#L709-L722

            if is_view:
                next_value = int(_next or 0) + page_size
            else:
                next_value = path_from_row_pks(rows[-2], pks, use_rowid)
            # If there's a sort or sort_desc, add that value as a prefix
            if (sort or sort_desc) and not is_view:
                prefix = rows[-2][sort or sort_desc]
                if isinstance(prefix, dict) and "value" in prefix:
                    prefix = prefix["value"]
                if prefix is None:
                    prefix = "$null"
                else:
                    prefix = urllib.parse.quote_plus(str(prefix))
                next_value = "{},{}".format(prefix, next_value)

@simonw
Copy link
Owner Author

simonw commented Aug 4, 2020

path_from_row_pks(rows[-2], pks, use_rowid) reflects that in Datasette's pagination scheme you pull back 21 records for a page of 20, so that the existence of record 21 lets you know that there's at least one more page. That's why it's rows[-2].

@simonw
Copy link
Owner Author

simonw commented Aug 4, 2020

As a reminder, here's a pagination example from regular Datasette:

https://fivethirtyeight.datasettes.com/fivethirtyeight/twitter-ratio%2Fsenators?_next=8556%2C121799&_sort_desc=replies

Here' the token is 8556%2C121799 which decodes to 8556,121799 - because it needs values with 8556 or more replies and a primary key higher than 121799.

@simonw
Copy link
Owner Author

simonw commented Aug 4, 2020

Datasette encodes next tokens using URL encoding. The GraphQL standard seems to be base64 instead, so I'll use that here.

@simonw
Copy link
Owner Author

simonw commented Aug 4, 2020

I'm going to set the default page size to 100 (if no first: is specified) - and have 100 as the maximum too.

@simonw
Copy link
Owner Author

simonw commented Aug 4, 2020

I'm going to copy code over that I need from datasette.utils rather than importing it directly, because those are undocumented APIs and I don't want to couple the two projects together.

I may well extract duplicated code into a documented API (or a separate package entirely, maybe sqlite-utils) in the future.

@simonw
Copy link
Owner Author

simonw commented Aug 4, 2020

This looks like the most relevant code from datasette.utils: https://github.com/simonw/datasette/blob/7ca8c0521ac1ea48a3cd8d0fe9275d1316e54b43/datasette/utils/__init__.py#L54-L102

_boring_keyword_re = re.compile(r"^[a-zA-Z_][a-zA-Z0-9_]*$")


def escape_sqlite(s):
    if _boring_keyword_re.match(s) and (s.lower() not in reserved_words):
        return s
    else:
        return "[{}]".format(s)


def urlsafe_components(token):
    "Splits token on commas and URL decodes each component"
    return [urllib.parse.unquote_plus(b) for b in token.split(",")]


def path_from_row_pks(row, pks, use_rowid, quote=True):
    """ Generate an optionally URL-quoted unique identifier
        for a row from its primary keys."""
    if use_rowid:
        bits = [row["rowid"]]
    else:
        bits = [
            row[pk]["value"] if isinstance(row[pk], dict) else row[pk] for pk in pks
        ]
    if quote:
        bits = [urllib.parse.quote_plus(str(bit)) for bit in bits]
    else:
        bits = [str(bit) for bit in bits]

    return ",".join(bits)


def compound_keys_after_sql(pks, start_index=0):
    # Implementation of keyset pagination
    # See https://github.com/simonw/datasette/issues/190
    # For pk1/pk2/pk3 returns:
    #
    # ([pk1] > :p0)
    #   or
    # ([pk1] = :p0 and [pk2] > :p1)
    #   or
    # ([pk1] = :p0 and [pk2] = :p1 and [pk3] > :p2)
    or_clauses = []
    pks_left = pks[:]
    while pks_left:
        and_clauses = []
        last = pks_left[-1]
        rest = pks_left[:-1]
        and_clauses = [
            "{} = :p{}".format(escape_sqlite(pk), (i + start_index))
            for i, pk in enumerate(rest)
        ]
        and_clauses.append(
            "{} > :p{}".format(escape_sqlite(last), (len(rest) + start_index))
        )
        or_clauses.append("({})".format(" and ".join(and_clauses)))
        pks_left.pop()
    or_clauses.reverse()
    return "({})".format("\n  or\n".join(or_clauses))

@simonw
Copy link
Owner Author

simonw commented Aug 4, 2020

This was really helpful in reminding me how to do this: simonw/datasette#190

@simonw
Copy link
Owner Author

simonw commented Aug 4, 2020

On closer reading of https://relay.dev/graphql/connections.htm#sec-undefined.PageInfo.Fields it looks like hasPreviousPage is only expected to be populated if you're already paginating backwards, using before:.

There's some further context from GraphQL co-creator Dan Schafer in this comment: graphql/graphql-relay-js#58 (comment)

We went with this approach because it's easy to envision a backend system (and we had some at Facebook) where it was be prohibitively expensive to compute hasPreviousPage when paginating forward (or vice versa). The cursors that we were using for pagination made it very efficient to skip straight to the item in question, but trying to traverse backwards to see if there was anything before it would have been costly. Hence, we didn't assign meaning to hasPreviousPage, because we had no way to provide an accurate value there.

Your feedback is spot on, though; this definitely messes with bidirectional pagination. One way forward would be to modify the Relay specification and allow, though not require, hasPreviousPage to have the expected meaning. Connections where hasPreviousPage is easy to compute can then include it, where cases where it is expensive can continue returning false. Clients can then enable the bidirectional pagination behavior you're looking to add only if they ever see hasPreviousPage be true when paginating forward.

It looks like they went ahead and made that change: https://relay.dev/graphql/connections.htm#sec-undefined.PageInfo.Fields specifically says about hasPreviousPage:

If the server can efficiently determine that elements exist prior to after, return true.

So I only need it to actually work when I implement before: pagination in #17. I can have hasPreviousPage exist and return false in the initial implementation.

@simonw simonw changed the title Forward pagination with nodes, edges and pageInfo Forward pagination with nodes, edges, totalCount and pageInfo Aug 5, 2020
@simonw
Copy link
Owner Author

simonw commented Aug 5, 2020

Writing the code for this is tricky. I'm going to build a one-off implementation for just a single table first, then figure out how to generalize it to every table.

simonw added a commit that referenced this issue Aug 5, 2020
simonw added a commit that referenced this issue Aug 5, 2020
This query now returns the correct-ish results:

{
  repos(filters: [stargazers__gt=1]) {
    totalCount
    nodes {
      id
      full_name
    }
    edges {
      cursor
      node {
        id
        full_name
      }
    }
  }
}
simonw added a commit that referenced this issue Aug 5, 2020
@simonw
Copy link
Owner Author

simonw commented Aug 5, 2020

It looks to me like most of the work will happen in the resolve method for the table:

async def resolve_repos(root, info, filters=None, first=None, after=None):
if first is None:
first = 10
table_name = "repos"
print("filters=", filters, "first=", first)
where_clause = ""
params = {}
if filters:
pairs = [f.split("=", 1) for f in filters]
print(" pairs = ", pairs)
filter_obj = Filters(pairs)
where_clause_bits, params = filter_obj.build_where_clauses(table_name)
print(" where_clause_bits={}, params={}".format(where_clause_bits, params))
where_clause = " where " + " and ".join(where_clause_bits)
print("select * from [{}]{}".format(table_name, where_clause), params)
results = await db.execute(
"select * from [{}]{} limit {}".format(table_name, where_clause, first),
params,
)
print("len", len(results.rows))
return [dict(row) for row in results.rows]

With a little bit of extra code on the class for that table - but really just to pull out the right bits into the edges / nodes / pageInfo fields:

class Repos(graphene.ObjectType):
totalCount = graphene.Int()
pageInfo = graphene.Field(PageInfo)
nodes = graphene.List(Repo)
edges = graphene.List(RepoEdge)
def resolve_totalCount(parent, info):
return len(parent)
def resolve_nodes(parent, info):
return parent
def resolve_edges(parent, info):
return [
{"cursor": path_from_row_pks(row, ["id"], use_rowid=False), "node": row}
for row in parent
]
def resolve_pageInfo(parent, info):
last_row = parent[-1]
return {
"endCursor": path_from_row_pks(last_row, ["id"], use_rowid=False),
"hasNextPage": False,
}

As such, I think the async def resolve_table(...) method should return an object which represents the query that was built up using the filters and after and first parameters. That object only executes the count(*) query if the resolve_totalCount() method asks for it. It executes the select * from ... query just once the first time that data is needed by something.

That way if you ask for both edges and nodes the select * from... only gets executed once.

@simonw
Copy link
Owner Author

simonw commented Aug 5, 2020

This mechanism will also handle the "ask for page size + 1 so we can tell if there's a next page" logic.

@simonw
Copy link
Owner Author

simonw commented Aug 5, 2020

I'm going to have a quick go at reusing the TableView class directly, just to see how feasible that approach would be (and to save me having to re-implement pagination) https://github.com/simonw/datasette/blob/7ca8c0521ac1ea48a3cd8d0fe9275d1316e54b43/datasette/views/table.py#L242

I could have the resolve_repos() method execute that entire thing and return the resulting Datasette JSON - which the other resolve methods could then pick bits off of and return them.

@simonw
Copy link
Owner Author

simonw commented Aug 6, 2020

Needs documentation, then I can merge it.

simonw added a commit that referenced this issue Aug 6, 2020
@simonw simonw closed this as completed in d8691f3 Aug 6, 2020
simonw added a commit that referenced this issue Aug 6, 2020
Also fixed first link to live demo. Refs #2
simonw added a commit that referenced this issue Aug 6, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant