Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iframe-based content viewer #716

Merged
merged 42 commits into from
Sep 22, 2022
Merged

Iframe-based content viewer #716

merged 42 commits into from
Sep 22, 2022

Conversation

veloman-yunkan
Copy link
Collaborator

@veloman-yunkan veloman-yunkan commented Feb 20, 2022

Will fix #394
Will fix #671

  • Added /content endpoint. Book content is served both under /content/<book>/<path>/<to>/<entry> and (for backward compatibility) under /<book>/<path>/<to>/<entry>.
  • Added a new iframe-based content viewer at '/viewer'
    • /viewer#<book>/<path>/<to>/<entry> displays in the iframe the content of <path>/<to>/<entry> of the book <book>.
    • /viewer#search?<search_query> displays in the iframe the search results for <search_query>. Enhanced book filtering in <search_query> is not supported (a single book must be specified via the content query parameter).
    • Clicking links in the iframe updates the URL in the address bar (except when the link leads to an external resource).
    • Manually editing the URL in the address bar updates the content in the iframe.
    • History maintenance works.
  • Removed support for taskbar injection in the backend.
  • PR history contains some hacks that were introduced at the prototype stage when the '/raw' endpoint was used. The hacks have not been reverted.

Questions:

  • How should we handle following an external link in the iframe?
  • Is /viewer#search?<search_query> an acceptable URL scheme for search URLs? A subtle problem with it is that it assumes that a book cannot be named search. An alternative is /viewer/search#<search_query> (or maybe you have other suggestions).

@codecov
Copy link

codecov bot commented Feb 20, 2022

Codecov Report

Base: 70.98% // Head: 70.51% // Decreases project coverage by -0.46% ⚠️

Coverage data is based on head (06af1a6) compared to base (4b6c645).
Patch coverage: 93.75% of modified lines in pull request are covered.

❗ Current head 06af1a6 differs from pull request most recent head 0a0f52f. Consider uploading reports for the commit 0a0f52f to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #716      +/-   ##
==========================================
- Coverage   70.98%   70.51%   -0.47%     
==========================================
  Files          53       53              
  Lines        3736     3653      -83     
  Branches     2078     2029      -49     
==========================================
- Hits         2652     2576      -76     
+ Misses       1082     1075       -7     
  Partials        2        2              
Impacted Files Coverage Δ
src/server/internalServer.h 89.47% <ø> (ø)
src/server/response.h 100.00% <ø> (ø)
src/tools/regexTools.cpp 100.00% <ø> (ø)
src/server/internalServer.cpp 87.94% <91.30%> (-0.11%) ⬇️
src/server/internalServer_catalog_v2.cpp 93.02% <100.00%> (+0.16%) ⬆️
src/server/response.cpp 93.65% <100.00%> (+1.59%) ⬆️
include/server.h 100.00% <0.00%> (ø)
src/server/i18n.h 100.00% <0.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@veloman-yunkan veloman-yunkan force-pushed the iframe_based_content_viewer branch 2 times, most recently from f7287a7 to 693cb82 Compare March 21, 2022 15:24
@veloman-yunkan
Copy link
Collaborator Author

@kelson42 A prototype version of the iframe-based viewer is ready. I invite you to play with it by building kiwix-serve with this PR included, running it and opening the /viewer endpoint in your browser. If you would like to disable the new viewer you will have to clear cookies.

Known issues:

  • Some glitches in history tracking
  • Errors inside the viewer come with their own taskbar (two kiwix taskbars are displayed)
  • Links inside the viewer point to the '/raw' versions of the resources (a very minor issue, IMO)
  • Opening of external links was not addressed at all

@kelson42
Copy link
Collaborator

@veloman-yunkan Great, I will have a look.

@kelson42
Copy link
Collaborator

@veloman-yunkan I have tested and it looks promising. I have not much to say beside the known issues you have listed:

  • The fulltext results links should be modified (to use the viewer)
  • Page for SW based ZIM files using "Try Loading HTTPS URL" should be adapted
  • 404 page is not implemented at all
  • Does the redirects works fine?

@rgaudin @mgautierfr Would be good you could give a try as well but to me it looks promising and the basics seem to work fine (with Wikipedia and LowTechMagazin SW file). We should probably implement what is missing to start to check that everything works fine.

@veloman-yunkan
Copy link
Collaborator Author

@kelson42

@veloman-yunkan I have tested and it looks promising. I have not much to say beside the known issues you have listed:

  • The fulltext results links should be modified (to use the viewer)

This was done. I simply slightly messed up when publishing the viewer for public testing. The newly pushed fixup commit should make it work.

  • Page for SW based ZIM files using "Try Loading HTTPS URL" should be adapted

What is a SW-based ZIM file? I have never dealt with such files before.

  • 404 page is not implemented at all

Can you please try again with the mentioned fix? If you still have the same issue, please provide more details about it.

  • Does the redirects works fine?

That was another known issue that I forgot to mention.

@kelson42
Copy link
Collaborator

Regarding SW based ZIM files. These ZIM files are made by openzim/warc2zim. All the HTML is played by a page using Service-Workers (SW), therefore they work slightly differently from traditional ZIM. Many of them are available at https://download.kiwix.org/zim/zimit/. SW has a lot of security contstraints and basically works only through HTTPS or with localhost. Looking twice to the problem, it seems that the string is in the ZIM itself and actually this is not a problem for this PR.

@kelson42
Copy link
Collaborator

404 somehow works now but:

  • Double taskbar
  • The URL given in the error message is not the one of the URL bar (I understand why, but might be disturbing... not sure)
    image

I think you are on the good way and I could recommend to fix the last glitches so we can start a complete review. I would also recommend to make this feature as an cmd line option (so not based on a cookie) if this is not too complex to deal legacy and new taskbars together. Otherwise, we will probably have to take the risk to just move on with this new iframe based taskbar.

@veloman-yunkan
Copy link
Collaborator Author

I would also recommend to make this feature as an cmd line option (so not based on a cookie) if this is not too complex to deal legacy and new taskbars together. Otherwise, we will probably have to take the risk to just move on with this new iframe based taskbar.

@kelson42 What is the preferred way of deploying the new taskbar? The two extreme scenarios are:

  1. Keep for a while the old and new taskbars functional in the same running instance of kiwix-serve so that users have the option to switch to the old version in case of any issues in the new version. When all discovered issues are fixed (and the rate of new bug reports vanishes) the old taskbar is obsoleted and disabled. Any traces of the old implementation are gradually eliminated from the code.

    Pros:

    • This approach can be done in multiple small PRs.
    • The new taskbar can be rolled out gradually (randomly enabled for more and more users with the freedom to opt out). Entire user-base of kiwix will participate in the testing.
    • Safer

    Cons:

    • More work.
  2. Develop the new taskbar in the cleanest way in a branch (one big PR). Any code supporting the old taskbar will be mercilessly deleted/replaced when it poses barriers to the new implementation. Testing will need to be performed by dedicated testers. When the quality of the new implementation is assured, the PR is merged and the old taskbar is gone once and for all.

    Pros(?):

    • Likely, less total work (but in one big chunk). However this will strongly require that this course of action is carried out quickly, so that we avoid any potential extra work of addressing conflicts with other changes merged into master.
    • All-or-nothing result (if at some point a conceptual problem is discovered that will make us drop the idea completely, the code in master won't be polluted with attempts of implementing the wrong solution)

    Cons:

    • Higher risk (less testers, one big change)

Your recommendation (of enabling the new mode via a command line option of kiwix-serve) is between these two edge cases. What are the justifications for such a compromise?

@kelson42
Copy link
Collaborator

kelson42 commented Mar 23, 2022

@veloman-yunkan To me, if the overall overhead of #1 is not over one day, then #1 is probably preferable. Otherwise #2.

@veloman-yunkan
Copy link
Collaborator Author

@veloman-yunkan To me, if the overall overhead of #1 is not over one day, then #1 is probably preferable. Otherwise #2.

Can you please clarify what you mean by overhead of one day? 24 man*hours?

@kelson42
Copy link
Collaborator

@veloman-yunkan To me, if the overall overhead of #1 is not over one day, then #1 is probably preferable. Otherwise #2.

Can you please clarify what you mean by overhead of one day? 24 man*hours?

No, around 8 hours.

@veloman-yunkan
Copy link
Collaborator Author

@veloman-yunkan To me, if the overall overhead of #1 is not over one day, then #1 is probably preferable. Otherwise #2.

Can you please clarify what you mean by overhead of one day? 24 man*hours?

No, around 8 hours.

That probably means that we will have to go with option 2. But I think that we better have @mgautierfr's input too.

@veloman-yunkan
Copy link
Collaborator Author

veloman-yunkan commented Mar 23, 2022

Here are some of my thoughts regarding the final result of switching to an iframe-based viewer:

Book resources accessed through the /{book}/{path}/{to}/{item} URLs should be returned unmodified (as if served by the /raw endpoint). The /raw/{book}/content endpoint becomes redundant unless we anticipate that we may still have to apply some transformation to the resources served from the shorter/public URLs). Maybe we can take advantage of this major change and consider introducing a /content endpoint. Current hierarchy of kiwix-serve's URLs places all content/books at the top level, which means that one cannot have books with any of the following names:
- catalog
- catch
- random
- raw
- search
- skin
- suggest

Though that is a minor issue but solving it will give us more freedom when in need of adding new endpoints.

@mgautierfr
Copy link
Member

Lot of things here.

There is this PR #639 which remove jquery from the taskbar. I haven't finish it because of time and I'm not sure of some js/css feature implemented.
It would be nice to have this two PR compatibles. (As we somehow rewrite here the frontend using iframe instead of inserting the taskbar, I think my questions in #639 may be far more simpler)

About the plan, I'm in favor of .2, we would need better testing (and there is somehow already conflict between open PRs) but it would greatly simplify the code. We have a lot of code everywhere to serve "raw" content or not depending of the context, changing path and so. Having only one version would simplify this : never insert taskbar (remove the code), (almost) always return raw content... (And I'm speaking of simplification compared from what we already have, not what we would have if we go for .1)

Another point about the search.
This PR for the /search?content=foo&pattern=stuff to /viewer?search&content=foo&pattern=stuff
There is no technical reason to use a iframe as we don't render zim content here (and the main purpose of iframe is to separe chrome from content).
So we can keep the initial endpoint. (And if we go with .2 it should not be to complex to implement).
We may (but in another PR) make the search page a "full feature" page which js doing request to the backend and using the xml output (once #639 is finished and merge) to render the search result.

Current hierarchy of kiwix-serve's URLs places all content/books at the top level, which means that one cannot have books with any [....]

Totally agree with you. We should have a better "REST" API and stop serving content using the /<bookName> endpoint.

src/server/internalServer.cpp Outdated Show resolved Hide resolved
src/server/internalServer.cpp Outdated Show resolved Hide resolved
kiwix_serve_customized_resources Outdated Show resolved Hide resolved
static/skin/viewer.html Outdated Show resolved Hide resolved
Comment on lines 922 to 1133
auto entry = getEntryFromPath(*archive, itemPath);
if (entry.isRedirect() || itemPath.empty()) {
return build_redirect(bookName, entry.getItem(true), /*raw=*/true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I disagree with this. /raw returns the raw content and if the requested path is not present, return nothing.
The empty path (`` or /) is a valid path in a zim file. Returning something different if we request it and it is not present break the "raw" feature.
(But it opens the room for the `/content` endpoint your speak about in your comment)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change can be undone, provided that the support for old taskbar is dropped.

static/skin/viewer.html Outdated Show resolved Hide resolved
@@ -630,7 +630,9 @@ std::unique_ptr<Response> InternalServer::handle_search(const RequestContext& re
renderer.setProtocolPrefix(m_root + "/");
renderer.setSearchProtocolPrefix(m_root + "/search?");
renderer.setPageLength(pageLength);
auto response = ContentResponse::build(*this, renderer.getHtml(), "text/html; charset=utf-8");
const bool rawMode = rawModeWasRequested(request);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #731 I introduce the format=xml for opensearch results. (without being sure that format is the right name).
I wonder if there is something to define in common or not.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we drop the old taskbar, I would prefer to undo the ?mode=raw hacks for the /random and /search endpoints

@veloman-yunkan
Copy link
Collaborator Author

About the plan, I'm in favor of .2, we would need better testing (and there is somehow already conflict between open PRs) but it would greatly simplify the code. We have a lot of code everywhere to serve "raw" content or not depending of the context, changing path and so. Having only one version would simplify this : never insert taskbar (remove the code), (almost) always return raw content... (And I'm speaking of simplification compared from what we already have, not what we would have if we go for .1)

To justify taking the approach no. 2, we must properly prioritize the work on the iframe-based taskbar. For example, in the refactoring portion of #679 (currently spun off as #732) a noticeable effort has been put into handling the taskbar. Now it is quite disappointing to acknowledge that all that effort was kind of in vain (yet it shouldn't prevent us from doing the right thing).

@kelson42 @mgautierfr Can we come up with a short-term plan for our work in libkiwix?

@kelson42
Copy link
Collaborator

@veloman-yunkan @mgautierfr It is indeed really unfortunate if work is done and then thrown away. We don't have the resources here to allow us this kind of waste. I'm really thankful to any initiative to avoid this or improve the overall code througput. But, I have no strong opinion or plan in which orders things/tickets should be implemented within a milestone. To a large extend I trust you (the developers). I also trust you to talk to each other and find the optimal path to do things.

On my side, when I see a few tickets have been achieved and the current DOING pipe is a bit empty, I use to bring your attention on this or that ticket. Usually these tickets are urgent (incident) tickets or tickets in the current milestone. @veloman-yunkan Beside this, your work since a year is strongly tight to the "Better Library" project https://github.com/orgs/kiwix/projects/8.

On Friday, we had to release 10.1.0 quickly because of the XSS regression, but I have taken all what was left in the milestone and created a new https://github.com/kiwix/libkiwix/milestone/5 10.2.0 milestone. These are the tickets which should be fixed in priority. Obviously I always have an open hear to your inputs if something could/should be changed in the milestone.

@mgautierfr
Copy link
Member

Indeed it is a pity that unnecessary work has been done. When you start working on internationalization, I did plan that you would change so much things.
The switch to a iframe is issue open for a long time. I know about it but it is sadly normal that you miss it when you implement internationalization.

I see three solutions to avoid (or minimize) such problem:

  • Spending more time (mainly @kelson42 and me) on task specification, the scope of those modification and issue that can be related.
  • Developers (@veloman-yunkan, me and others) raising potential issue the sooner. Some of them are obvious to identify, some other less. Changing a lot of files and the global design of a system is a such of potential issue (I'm not saying that such modifications are not welcomed, I like what you've done in New way of building 404 error HTML responses #732)
  • We should have a public roadmap describing what we want to do and in which order. We could all refer to it when implementing something to check if we will make some change in the future in conflict with what we are doing. We would still have to discuss what to do but it would help us for the two first points. This roadmap could be in the git repository, it would allow everyone to see the changes and comment on them.

@stale
Copy link

stale bot commented Apr 16, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be now be reviewed manually. Thank you for your contributions.

@stale stale bot added the stale label Apr 16, 2022
@kelson42
Copy link
Collaborator

kelson42 commented May 4, 2022

@veloman-yunkan If I follow things properly, it’s time now to rebase and complete this PR right?

@stale stale bot removed the stale label May 4, 2022
@veloman-yunkan
Copy link
Collaborator Author

@veloman-yunkan If I follow things properly, it’s time now to rebase and complete this PR right?

If we go with plan 2, we must wait till there are no concurrent big changes going on in libkiwix at the same time. The search enhancement (#729 and friends) is one such big chunk of work. Let's compile a full list of such planned changes and decide an optimal implementation order for them.

@kelson42
Copy link
Collaborator

@veloman-yunkan If I follow things properly, it’s time now to rebase and complete this PR right?

If we go with plan 2, we must wait till there are no concurrent big changes going on in libkiwix at the same time. The search enhancement (#729 and friends) is one such big chunk of work. Let's compile a full list of such planned changes and decide an optimal implementation order for them.

#729 and friends are part of 10.2.0 milestone (to be released ASAP) and what we talk here is part of next 10.3.0. I can only recommend to have a look to the whole 10.3.0 milestone to make you an opinion. To me, and AFAIK, bringing the iframe is a fondamental work to do to allow a proper cache system. Therefore things should be done in this order. What runs as well, is the removal of JQuery which is almost over, you have here to agree with @mgautierfr when to bring this. All the rest of what is in 10.3.0 milestone is less prioritary and could/might be done later. #479 and #754 might need an early architecture discussion to avoid later bad suprises.

@stale
Copy link

stale bot commented Jun 12, 2022

This pull request has been automatically marked as stale because it has not had recent activity. It will be now be reviewed manually. Thank you for your contributions.

@stale stale bot added the stale label Jun 12, 2022
Made the viewer respect the `--blockexternal` and `--nolibrarybutton`
options of `kiwix-serve`. Those options are passed to the viewer
via the dynamically generated resource `/viewer_settings.js`.
`--nosearchbar` option of `kiwix-serve` (despite its misleading name)
was used to disable the entire taskbar. This commit accounts for the
existence of that option only partially:

1. Links to books on the welcome/library page are affected - by default
   books are displayed in the viewer, but in a kiwix-serve instance run
   with --nosearchbar books are loaded in the top window.

2. The `/viewer` endpoint is enabled unconditionally, so if anyone
   enters the viewer URL in the address bar they will see books in the
   viewer.
Auto hiding of the toolbars on narrow screens works only for the first
page loaded in the viewer. Navigating to other pages interferes with
autohiding as follows:

- If the toolbar was hidden, it stays hidden.

- If the toolbar was not hidden, it loses the ability to autohide.
If `kiwix-serve` is run with the `--nosearchbar` option the toolbar is
disabled (hidden) in its viewer.

Note however that certain actions performed by the viewer merely with
the purpose of keeping the toolbar up-to-date are still carried out.
A rebase invalidated the cacheids in the previous commits of the
iframe_based_content_viewer branch. This commit fixes only the current
state leaving the history with wrong cacheids - this can be an issue
for `git bisect` being executed on a commit range overlapping with
the iframe_based_content_viewer branch.
@veloman-yunkan veloman-yunkan force-pushed the iframe_based_content_viewer branch from a2b0cc0 to 06af1a6 Compare September 21, 2022 12:58
@veloman-yunkan
Copy link
Collaborator Author

It seems that now prependToFirstOccurence and appendToFirstOccurence are not used anymore. We could remove them. (And as prependToFirstOccurence` is not explicitly tested, it would increase our coverage).

handle_viewer_settings seems not tested. It would be nice to have a unittest fetching viewer_settings.js and check that values are correct.

@mgautierfr Done. I also rebased on recent master. You only have to look on the last four commits. Please tell me if the issue with out-of-date cacheids in the old commits has to be fixed.

Copy link
Member

@mgautierfr mgautierfr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only a small change.

Please tell me if the issue with out-of-date cacheids in the old commits has to be fixed.

I don't think we need to fix the out-of-date cacheids.
It will break automatic git-bisect but manual bisect will be easily fixable. Let's move on.

test/server_testing_tools.h Outdated Show resolved Hide resolved
test/server_testing_tools.h Outdated Show resolved Hide resolved
With taskbar no longer being injected into the responses, it doesn't make any
sense testing the search on different flavours of the server.
@veloman-yunkan
Copy link
Collaborator Author

I think that before merging the code has to go through independent testing by some users.

@mgautierfr
Copy link
Member

I think that before merging the code has to go through independent testing by some users.

I think we can merge now. With master we have nightlies user can test with. And if there is bug, it can be fixed by another (smaller) PR instead of adding to this long PR. Of course we need to test master before doing the release.

@rgaudin
Copy link
Member

rgaudin commented Sep 21, 2022

I am also in favor of merging now and using our regular process. It will make this larger testing a lot easier IMO.

@mgautierfr
Copy link
Member

Time to merge this huge PR 🎉

@mgautierfr mgautierfr merged commit 3a75fac into master Sep 22, 2022
@mgautierfr mgautierfr deleted the iframe_based_content_viewer branch September 22, 2022 07:28
@kelson42
Copy link
Collaborator

@veloman-yunkan @mgautierfr The tickets corresponding to the work left, questionning, regressions have not been opne. please to that immediatly before we forget.

@rgaudin
Copy link
Member

rgaudin commented Sep 22, 2022

Should be deployed on dev with nightly tomorrow ; will announce on slack once it is

@veloman-yunkan
Copy link
Collaborator Author

@veloman-yunkan @mgautierfr The tickets corresponding to the work left, questionning, regressions have not been opne. please to that immediatly before we forget.

@kelson42 The only issue that I was aware about is now filed as #820.

mgautierfr added a commit that referenced this pull request Nov 30, 2022
* [API Break] Remove wrapper around libzim (@mgautierfr #789)
* Allow kiwix-serve to use custom resource files (@veloman-yunkan #779)
* Properly handle searchProtocolPrefix when rendering search result (@veloman-yunkan #823)
* Prevent search on multi language content (@veloman-yunkan #838)
* Use new `zim::Archive::getMediaCount` from libzim (@mgautierfr #836)
* Catalog:
 - Include tags in free text catalog search (@veloman-yunkan #802)
 - Illustration's url is based on book's uuid (@veloman-yunkan #804)
 - Cleanup of the opds-dumper (@veloman-yunkan #829)
 - Allow filtering of catalog content using multiple languages (@veloman-yunkan #841)
 - Make opds-dumper respect the namemapper (@mgautierfr #837)
* Server:
 - Correctly handle `\` in suggestion json generation (@veloman-yunkan #843)
 - Better http caching (@veloman-yunkan #833)
 - Make `/suggest` endpoint thread-safe (@veloman-yunkan #834)
 - Better redirection of main page (@veloman-yunkan #827)
 - Remove jquery (@mgautierfr @juuz0 #796)
 - Better Viewer of zim content :
   . Introduce `/content` endpoints (@veloman-yunkan #806)
   . Switch to iframe based content viewer (@veloman-yunkan #716)
 - Optimised design of the welcome page:
   . Alignement (@juuz0 @kelson42 #786)
   . Exit download modal on pressing escape key (@Juzz0 #800)
   . Add favicon for different devices (@Juzz0 #805)
   . Fix auto hidding of the toolbar (@veloman-yunkan #821)
   . Allow user to filter books by tags in the front page (@juuz0 #711)
* CI :
  - Trigger CI on pull_request (@kelson42 #791)
  - Drop Ubuntu Impish packaging (@legoktm #825)
  - Add Ubuntu Kinetic packaging (@legoktm #801)
* Testing:
  - Test ICULanguageInfo (@veloman-yunkan #795)
  - Introduce fake `test` language to test i18n (@veloman-yunkan #848)
* Fix documentation (@kelson42 #816)
* Udpate translation (#787 #839 #847)
@mgautierfr mgautierfr mentioned this pull request Nov 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't introduce taskbar for non-front articles Use iframe instead of introduce taskbar.
4 participants