Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add blacklist matching by name alias #4

Open
Hmiku8338 opened this issue Apr 16, 2022 · 8 comments
Open

Add blacklist matching by name alias #4

Hmiku8338 opened this issue Apr 16, 2022 · 8 comments
Labels
enhancement New feature or request

Comments

@Hmiku8338
Copy link
Collaborator

Currently, blacklist is applied to posts in a naive way -- assuming the user inputted all the tags correctly. However, it does not take the tag aliases into account. The reason we haven't implemented it with name aliases to begin with is speed: e621 does not give us a fast way to query a multitude of name aliases (for example, with a single request). So if a blacklist contains 15 tags, it will require us to do 15 web requests. So even if we cache the correct blacklist, it will still take quite a bit of time to filter any number of initial posts.

So the request is simple: Implement some fancy way of doing it quickly :)

@Hmiku8338 Hmiku8338 added the enhancement New feature or request label Apr 16, 2022
@Deer-Spangle
Copy link

What would even be the way to speed that up?
Feels like there's two options:

  • Could cache the whole name alias mapping, and check updates?
  • Could add/request/PR a batch-get endpoint to e621 API

@Hmiku8338
Copy link
Collaborator Author

Your first suggestion would work on a long-running web server but not in a library due to the sheer number of aliases. Each installation of the library would require us to make a ton of requests. We could, however, host a the mapping elsewhere, update it as often as we want, and load it with a single request. However, it still depends on the size of the mapping. If it's 10-20 MB, it would be a bit too large.

About your second point: e621 might already provide such functionality, we just need to look in the right place or ask the right people because the official documentation doesn't list anything like that.

@Deer-Spangle
Copy link

Yeaah, doing it in the library would probably be a bit much. There's 75 per page on their listing endpoint (listing endpoint doesn't seem to be in this lib btw?)
And there's 697 pages. So that would be 700 requests to load them all up, and 52k tag aliases to store, about 15MB of raw JSON, ouch.
And then you need to be checking the search endpoint by update time to keep the list up to date...

Maybe doable by a higher level application, but yeah. Too much for the library.

Yeah, the e621 docs are not great. I'm surprised you got so much API coverage in this lib though, wow! It totally tops the rankings in how much of their API is covered, loads of undocumented bits, thank you!

I could go file an issue on their repo asking about a batch get endpoint, if you want? Or a wildcard or tag list for the search endpoint I guess

@Hmiku8338
Copy link
Collaborator Author

I think all of our endpoints support listing. Here's the interface of TagAliases:
image

Yeah, that would be cool if you created the issue. But be sure to ask whether there exists such a method already.

@Deer-Spangle
Copy link

Yeah, I saw the model has get and search. I guess list is just search with no parameters defined, I didn't consider that!

Looking in e621ng, they have a method to convert a blacklist to an aliased query but there's no endpoint for doing that.
The endpoints are in the TagAliasesController and I can see there that only index and show are publicly available.
(For admins, there's an update, delete and approve endpoint also. Also an edit endpoint that doesn't seem functional?)

I'm not too sure what the endpoint would be for such a thing, I mean, doing one that just calls TagAlias.to_aliased_query(param['query']) would be stellar, but I'm a little unsure what the endpoint would look like, and how to wire that in.. I'll go make a ticket there!
(I've seen endpoints using a colon-verb kinda syntax in other APIs in the past, (e.g. /tag_aliases.json:batchGet?ids=1,2,3) not sure whether it's good practice though. That would mean an endpoint something like /tag_aliases.json:convertQuery?query=-deer or whatever

@Hmiku8338
Copy link
Collaborator Author

Problem solved :)

One of us will need to open and merge the pull request to add this into e621-py

@Deer-Spangle
Copy link

Yeah! I guess every endpoint already has batch get, that's handy!

Does the library need updating in a deeper manner, to support passing a list of IDs to all endpoints? Or does it just need some type hint updates?

@Hmiku8338
Copy link
Collaborator Author

Hmiku8338 commented Jun 9, 2022

I'm not sure whether we'll need a lot of changes (because I'm not very well informed about the interface of batch get). But I'm pretty sure that the changes will be limited to concrete endpoint classes and the abstract class will not need to change so no deep changes.

If you want to do it yourself, I can review your code. If not, I can implement it later when I have free time (which I don't have a lot of lately)

If you have any questions -- feel free to ask.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants