# Overall plans for the project #150
Oh, I guess we also need to rename the repo, which is a bit annoying. I think we'll need to break the GitHub metadata that says that this is a "fork" of urllib3 upstream, because we're splitting off to become an independent project, and if we don't do that then GitHub will do things like assume that PRs intended for us should instead be directed back to upstream urllib3.

Regarding orgs: it just occurred to me that the …
Oh, look what @pquentin found: …

So I guess we'll do that, maybe after we get confirmation on ownership of our new name?
Thank you for doing this. I wholeheartedly agree with 1/ the vision, 2/ the governance with you as BDFL, and 3/ the next steps. But my agreement is also loosely held, in the sense that if someone feels strongly about some specific aspect I'd be happy to discuss to get consensus.
I'd want you as BDFL and for the project to live under python-trio for the foreseeable future. I agree with everything else here in terms of direction. :)
I think this means we all need to make new forks. |
Hey folks,

So, we've poured a huge amount of work into getting a "high level" API squared away with …. At least from my POV, what would make a huge amount of sense would be for …:

```python
class AbstractAsyncHTTPTransport(ABC):
    async def request(
        self,
        verb: bytes,
        url: bytes,
        headers: List[Tuple[bytes, bytes]],
        body: AsyncIterator[bytes],
    ):
        ...
        return (status_code, status_message, headers, response_body)
```

No redirect handling. No cookie cleverness. No fancy HTTP retries. No multipart handling. No authentication or ….

If …, personally I think it'd probably be a good thing to have the low-level networking and the high-level model abstractions strictly separated into independent nicely-scoped packages, since it helps keep different concerns strictly and clearly isolated.

Some finer details: I think you'd also want a …. We do still have an open question on ….
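To make the proposed interface concrete, here's a self-contained sketch of what implementing and driving such a transport might look like. Everything beyond the `request` signature above is my own invention for illustration (the `EchoTransport` test double, the `chunks` helper); it is not an actual hip or httpx API:

```python
import asyncio
from abc import ABC, abstractmethod
from typing import AsyncIterator, List, Tuple


class AbstractAsyncHTTPTransport(ABC):
    @abstractmethod
    async def request(
        self,
        verb: bytes,
        url: bytes,
        headers: List[Tuple[bytes, bytes]],
        body: AsyncIterator[bytes],
    ) -> Tuple[int, bytes, List[Tuple[bytes, bytes]], bytes]:
        ...


class EchoTransport(AbstractAsyncHTTPTransport):
    """Test double: echoes the request body back as the response body."""

    async def request(self, verb, url, headers, body):
        # Drain the streamed request body.
        payload = b"".join([chunk async for chunk in body])
        resp_headers = [(b"content-length", str(len(payload)).encode("ascii"))]
        return (200, b"OK", resp_headers, payload)


async def chunks(*parts: bytes) -> AsyncIterator[bytes]:
    """Turn a few byte strings into an async body iterator."""
    for part in parts:
        yield part


async def main():
    transport = EchoTransport()
    status, reason, headers, resp_body = await transport.request(
        b"POST",
        b"https://example.com/upload",
        [(b"host", b"example.com")],
        chunks(b"hello, ", b"world"),
    )
    print(status, resp_body)  # 200 b'hello, world'


asyncio.run(main())
```

The point of the interface being this small is that a high-level client can swap in a fake transport like `EchoTransport` for testing, and the real transport never needs to know about redirects, cookies, or retries.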
@tomchristie Yeah, we definitely need to figure out more about how hip/httpx relate. For me the biggest difference is the visions; as you wrote in encode/httpx#522:

> …
OTOH, for hip, like I wrote above, the goal is explicitly to "meet all possible use cases" and be "one size fits all" :-). So that's a pretty clear difference in focus, and I think it's been somewhat in the background of a lot of conversations we've had, and it's great that we've all managed to articulate it clearly now.

We could talk about the motivation for our approach; for me it's closely related to what @pquentin wrote here:

> …
For me, the number one challenge for building a production quality HTTP library is figuring out how to handle all these edge cases that we don't even know about. This motivates us building on top of urllib3 (versus httpx's start-from-scratch strategy), but it also motivates the "one http client for everyone" philosophy, because that way we only have to address each weirdo edge case once, instead of every project having to rediscover them independently.

Of course the downside is that we inherit all of urllib3's technical debt, but by the time httpx is as mature as urllib3 I'm sure it will have plenty of technical debt too, because HTTP is just intrinsically messy :-) And technical debt is something we can tackle incrementally on our own schedule, instead of being stuck in reactive mode waiting for reports to come in.

I think the different visions also lead to different short-term priorities: we've certainly thought some about HTTP/2, but our primary focus for now is 100% on HTTP/1.1, because that's what we need for an MVP to replace the old libraries. OTOH, httpx has focused more on the shiny new features that older libraries lack. Both approaches make sense given the respective visions, but they're different.

So I have absolutely no animosity towards httpx; it just seems focused on solving a different problem than the one I'm interested in. If we can find ways to collaborate though, then that would be excellent; there's no point in reinventing the wheel. So let's think a bit about what that might look like.

Like you say, the high-level interface is probably where we have the most to learn from httpx. One of my concerns before was about httpx's layered design, with redirects/cookies/etc. handled as separate concerns through the "middleware"/dispatcher mechanism. As a design, it's super elegant! But HTTP is not very elegant :-).
And in particular, when I sat down to study requests, my conclusion was that the core "high-level" features are redirects, cookies, retries, and auth handling – and that these are all tightly coupled in a way that doesn't fit well with a layered architecture like this. (In fact every CVE in requests's history has been because of trying to de-couple redirects/auth.) So my feeling has been that the "high level" concerns are best handled as a single piece of tightly-integrated code. But it sounds like you're going in the same direction, so I guess that's no longer an issue?

I don't think I agree about separating the high-level and low-level stuff into separate packages; but that doesn't actually affect too much. Regardless of whether hip starts growing higher-level features, we'll still have the lower-level networking stuff that httpx can use if that makes sense. And if hip does start growing its own higher-level features, we'll obviously look at httpx when doing that. So either way, I don't think any work will be wasted.

I do think that simply adopting httpx might be difficult, because I think hip's approach to handling sync+async has a pretty fundamental effect on API design, so we'll need to think those issues through while adopting httpx's ideas. (And fwiw delaying …)
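As a concrete illustration of that redirect/auth coupling: a redirect follower has to know which headers carry credentials, because forwarding them across origins is exactly the bug class behind those CVEs. A minimal sketch, with hypothetical names of my own (this is not requests' or hip's actual code):

```python
from urllib.parse import urlsplit

# Headers that must not leak to a different origin on redirect.
SENSITIVE = ("authorization", "cookie", "proxy-authorization")


def headers_for_redirect(headers: dict, old_url: str, new_url: str) -> dict:
    """Copy request headers for a redirect, dropping credentials when the
    redirect crosses to a different origin."""
    old, new = urlsplit(old_url), urlsplit(new_url)
    same_origin = (old.scheme, old.hostname, old.port) == (
        new.scheme,
        new.hostname,
        new.port,
    )
    if same_origin:
        return dict(headers)
    return {k: v for k, v in headers.items() if k.lower() not in SENSITIVE}


hdrs = {"Authorization": "Bearer s3cr3t", "Accept": "application/json"}

# Same origin: credentials survive.
print(headers_for_redirect(hdrs, "https://api.example.com/a", "https://api.example.com/b"))
# Cross-origin: Authorization is stripped, Accept survives.
print(headers_for_redirect(hdrs, "https://api.example.com/a", "https://evil.example.net/"))
```

The catch is that this logic lives in the redirect layer but is fundamentally about auth, which is why a strictly layered middleware split tends to get it wrong.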
I think urllib3's idea for handling this is pretty elegant: it makes stuff like ….
From my side I probably don't think that we've got any fundamental differences in goals, and any failure to collaborate between our two sets of work would probably be down to a failure to communicate, rather than due to any technical blockers. My opening salvo in #522 isn't actually where I ended up, and was triggered by us having an underlying approach to async+sync that wasn't working for the project, which we've now removed with 0.8. There's a bunch of places where I think we're on either the same, or similar, pages: …
Sure, so let me frame that differently... We could trivially drop everything in ….
That's smart. Worth noting that it is actually functionally identical behavior to "verify/cert should be per-connection-pool settings, not per-request settings": if they're not sharing the same cache keys, then they are essentially just different connection pools.
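That equivalence can be sketched directly: if the TLS settings are part of the pool lookup key, then per-request verify/cert values simply select (or lazily create) a different pool. This is a toy illustration in the spirit of urllib3's pool manager, not its actual implementation, and all names here are my own:

```python
from typing import Dict, NamedTuple, Optional


class PoolKey(NamedTuple):
    """Everything that determines whether two requests may share a connection."""
    scheme: str
    host: str
    port: int
    verify: bool
    cert: Optional[str]


class PoolManager:
    def __init__(self) -> None:
        self._pools: Dict[PoolKey, object] = {}

    def pool_for(self, scheme, host, port, verify=True, cert=None):
        # TLS settings are part of the key, so requests with different
        # verify/cert values can never share a connection.
        key = PoolKey(scheme, host, port, verify, cert)
        if key not in self._pools:
            self._pools[key] = object()  # stand-in for a real connection pool
        return self._pools[key]


pm = PoolManager()
a = pm.pool_for("https", "example.com", 443)
b = pm.pool_for("https", "example.com", 443, verify=False)
c = pm.pool_for("https", "example.com", 443)
print(a is c, a is b)  # same settings share a pool; different verify does not
```

Which is exactly the point above: "per-request verify" and "per-pool verify with TLS settings in the cache key" are the same design wearing different clothes.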
Now that we're transitioning out of "stealth mode" and have some new contributors joining, we should probably make sure we're all on the same page about goals, and start making some plans. Here's some notes to kick that off; discussion is very much welcome.
## Goal/overall vision
For me, the goal is to create a Python HTTP client library that works for everyone. Like, nothing against other client libraries, we're not enemies or anything, but we want to eventually reach a point where our quality and features are so great that the maintainers of urllib3/requests/aiohttp/asks/treq/hyper/etc. are all comfortable recommending their users switch to our thing, or rebasing their libraries on top of ours. Of course that will take a lot of work, but it seems like in the long run it will take less work than everyone duplicating effort on maintaining all these HTTP stacks in parallel. Maintaining a production-quality HTTP client is hard.
## Scope
We originally started out with a fairly narrow goal of "replace urllib3", possibly even keeping the name (as "urllib3 v2"). Over time, my feelings on this have shifted, due to a few factors:
1. It turns out that urllib3's public API exposes a lot of internal details, which is a major challenge for maintenance, because it's difficult to change things without accidentally breaking backwards compatibility. (This is a common problem in libraries that grew organically and then got really popular – numpy has similar problems.) And in particular, it's not actually possible to switch from httplib → h11 without breaking the public API. And that's a precondition to pretty much everything else we want to do. So, keeping the urllib3 name and full public API simply isn't possible.
2. As we've gotten more familiar with urllib3's internals, I've found more places that I think could benefit from some cleanup :-)
3. The more I studied the requests/urllib3 split, the more it seemed to me that the separation doesn't make a lot of technical sense, at least in retrospect. The "high level" features that requests adds are not terribly complicated, they're pretty much always useful to have available, and they can be done in a way that adds little or no overhead/complexity if you don't need them.
So... there's a tricky balance here, because we don't want to go wild trying to fix every frustration we've ever had with any HTTP library. We need to keep our schedule manageable and we want to make it easy for existing code to migrate to our new library. But I do think we should:
1. take this opportunity to review and pare down the public API and make sure that we're not exposing more internal details than we want to, and
2. increase our scope to target roughly requests+urllib3's feature set, rather than just urllib3 alone.
## Decision-making (aka "governance")
So far we've been doing this all in the python-trio org, which I guess de facto makes me BDFL. I'm not particularly attached to that, but I do have some of the relevant skills and it's a simple solution to keeping things organized, so I'm happy to continue in that role for now if that's what we want. Keeping stuff in the python-trio org also lets us lean on the shared org infrastructure (e.g. code of conduct, contribution guidelines, automatic invitation bot, chat, forum, etc.). It might also make sense to switch to some other org like python-http or similar, but as of today that's more of an aspirational placeholder than a functioning organization. Or there's python-hyper, but that's effectively moribund. So I don't really know of a better place than python-trio to put stuff.
To be explicit though: even if it stays in python-trio for now, that doesn't mean that trio has some special status; we'll still have first-class support for asyncio etc., and the goal is to create something that belongs to the Python community as a whole.
## Next steps
So far we've been trying to stay close to upstream urllib3 to make it easy to merge changes. But I think we've taken that about as far as it can go, so it's time to pull the trigger and start diverging for real. So some things we want to start doing soon: …

That kind of needs to happen in that order, I guess. There are also some next steps that are more parallelizable: …
Again this is just a brain dump to prompt discussion, not the final word on anything, but hopefully it's a good start.
What do you all think? Any additions, concerns you want to raise, tweaks you want to make?