[import] 404 assets are downloaded #200

Open
kptdobe opened this issue Jun 7, 2023 · 1 comment
Labels
bug Something isn't working

Comments


kptdobe commented Jun 7, 2023

With the multi-output version of the import, a transform can reference assets to download. If a referenced asset returns a 404, the importer still downloads it, which results in a corrupted file.

Input page: https://www.xeljanz.com/psa
Import code:

export default {
  transform: ({
    document, url, html, params,
  }) => {
    const main = document.body;

    // Path of the main document: strip ".html" and any trailing slash.
    const path = WebImporter.FileUtils.sanitizePath(new URL(url).pathname.replace(/\.html$/, '').replace(/\/$/, ''));

    const results = [{
      path,
      element: main,
    }];

    // Add one extra output per <video> source so the asset is downloaded too.
    main.querySelectorAll('video').forEach((video) => {
      const source = video.querySelector('source');
      if (source && source.src) {
        // Resolve relative sources against the page URL.
        const u = new URL(source.src, url);
        const newPath = WebImporter.FileUtils.sanitizePath(u.pathname);
        results.push({
          path: newPath,
          from: u.toString(),
        });
      }
    });

    return results;
  },
};

This downloads some MP4 files, but most of them do not exist on the server. The status is also not up to date.
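One possible way to avoid queuing missing assets is to probe each URL before pushing it to `results`. The following is only a sketch: `filterReachableAssets` is a hypothetical helper, not part of `WebImporter`, and it assumes a `fetch`-compatible function is available where it runs. Whether `transform` may be asynchronous, or where the importer itself should perform this check, is not confirmed by this issue.

```javascript
// Hypothetical helper (not a WebImporter API): keeps only the asset URLs
// that answer with a 2xx status. `fetchFn` is injectable so the check can
// be exercised without a network connection.
async function filterReachableAssets(urls, fetchFn = fetch) {
  const checks = await Promise.all(urls.map(async (assetUrl) => {
    try {
      // HEAD avoids transferring the body. Some servers reject HEAD;
      // a ranged GET would be an alternative in that case.
      const response = await fetchFn(assetUrl, { method: 'HEAD' });
      return response.ok ? assetUrl : null;
    } catch (e) {
      // Network errors are treated like missing assets.
      return null;
    }
  }));
  return checks.filter((assetUrl) => assetUrl !== null);
}
```

The helper returns the reachable URLs in their original order, so the caller can build one `{ path, from }` entry per URL that survives the check.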

kptdobe added the bug label Jun 7, 2023
@catalan-adobe

Investigation

Issue discussed and investigated with @kptdobe: it seems to have been an intermittent issue with the remote server. Looking at it again today, I could not reproduce it on my machine, and @kptdobe, after some cleanup on their machine, could not reproduce it either.

Extra Step

The investigation exposed a potential issue with video content served with HTTP 206 (Partial Content), hinting that the helix-cli proxy server might cache such videos only partially.

Since https://www.xeljanz.com/psa contains only small videos, we tested the 206 case on a custom web page served by a custom application that serves a large video (~150 MB) using 206 Partial Content (Go server borrowed here).
When importing this custom page, the proxy cached the full 150 MB video, and the full video was downloaded via the "multi output" part of the import script from the description.
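For anyone repeating this kind of check by hand, a single response can be tested for completeness by comparing its `Content-Range` header with the total size it announces. The sketch below is illustrative only: `parseContentRange` and `coversWholeResource` are hypothetical names, not part of helix-cli, and the code handles only the `bytes start-end/total` form of the header.

```javascript
// Parses a Content-Range value such as "bytes 0-1023/157286400".
// Returns null for other forms (for example an unknown total, "*").
function parseContentRange(value) {
  const match = /^bytes (\d+)-(\d+)\/(\d+)$/.exec(String(value).trim());
  if (!match) return null;
  const [start, end, total] = match.slice(1).map(Number);
  if (start > end || end >= total) return null;
  return { start, end, total };
}

// True when one response carries the whole resource: either a plain 200,
// or a 206 whose range spans byte 0 through the last byte.
function coversWholeResource(status, contentRange) {
  if (status === 200) return true;
  if (status !== 206) return false;
  const range = parseContentRange(contentRange);
  return range !== null && range.start === 0 && range.end === range.total - 1;
}
```

For example, `coversWholeResource(206, 'bytes 0-1023/157286400')` returns `false`, while `coversWholeResource(206, 'bytes 0-157286399/157286400')` returns `true`.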

Conclusion

No issue to work on
