
[Fix] #921 #925 Manual scraping integration fixed. #924

Open
wants to merge 3 commits into
base: main

Conversation

arao commented Nov 30, 2024

Pull Request Check List

Resolves: #921

  • Added tests for changed code.
  • Updated documentation for changed code.

Description:

Following the recent fix to the debrid instant-availability API, the manual scraping endpoint logic required modification to be compatible with the updated models. This commit addresses that gap while ensuring no model change is introduced, which avoids changes on the UI side.

In the longer term, we can evaluate whether the model change needs to be propagated to the UI as well.

[Testing]

The following testing approach was used:

  • Set up riven and riven-frontend in a Docker environment.
  • After correcting the issue, verified that both manual scraping and requested scraping work as expected.
  • Ensured symlinks are generated and data is indexed by the updater. Currently validated with Plex only.

Summary by CodeRabbit

  • New Features

    • Introduced a configurable environment variable PORT, defaulting to 8080, for enhanced flexibility in application execution.
  • Bug Fixes

    • Improved logging during the torrent filtering process, providing better feedback on ignored torrents and error handling for invalid torrents.
  • Documentation

    • Enhanced logging statements to assist in debugging and monitoring the scraper's performance.

Contributor

coderabbitai bot commented Nov 30, 2024

Walkthrough

The changes in this pull request involve updates to the entrypoint.sh script and the ScraperRequestHandler class in shared.py. The entrypoint.sh script now includes a new environment variable PORT, defaulting to 8080, which allows for configurable port settings during application execution. In shared.py, the _parse_results function has been enhanced with additional logging for improved visibility into the torrent filtering process and error handling, although the core functionality remains unchanged.

Changes

File — Change Summary
entrypoint.sh — Added environment variable PORT with default value 8080 for a configurable application port.
src/program/services/scrapers/shared.py — Enhanced logging in _parse_results for better feedback on torrent filtering and error handling.

Assessment against linked issues

Objective — Addressed — Explanation
The application should not throw a TypeError when initiating a manual scrape from the frontend. (#921) — No — The changes do not address the specific TypeError issue.

Suggested reviewers

  • davidemarcoli
  • Gaisberg
  • iPromKnight

Poem

In the burrows deep, where the code does leap,
A new port's set, for the app to keep.
With logs that sing, of torrents they bring,
Debugging made easy, let the data swing!
Hops of joy, for fixes we deploy! 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 6eb24f3 and 90891b6.

📒 Files selected for processing (2)
  • entrypoint.sh (2 hunks)
  • src/program/services/scrapers/shared.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • entrypoint.sh
  • src/program/services/scrapers/shared.py


Contributor

coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
src/program/services/downloaders/__init__.py (1)

155-155: Update docstring to reflect new return type

The get_instant_availability method's docstring currently states "Check if the torrent is cached," but it now returns an Optional[TorrentContainer]. Please update the docstring to reflect that it may return None when the torrent is not available.
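For illustration, a minimal sketch of what the updated docstring could look like, assuming the Optional[TorrentContainer] return type described here and the item_type parameter implied by the linked issue (#921); the exact signature in the codebase may differ:

from typing import Optional

class Downloader:  # the real class lives in src/program/services/downloaders/__init__.py
    def get_instant_availability(self, infohash: str, item_type: str) -> Optional["TorrentContainer"]:
        """Check whether the torrent is cached on the debrid service.

        Returns a TorrentContainer describing the cached files, or None
        when the torrent is not available.
        """
        ...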

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d2a6b5b and 2dd178e.

📒 Files selected for processing (2)
  • src/program/services/downloaders/__init__.py (1 hunks)
  • src/routers/secure/scrape.py (5 hunks)
🔇 Additional comments (2)
src/routers/secure/scrape.py (2)

222-225: Good handling of optional return value from get_instant_availability

You've correctly updated the code to handle the Optional[TorrentContainer] returned by get_instant_availability. Checking for None before setting stream.is_cached ensures robust error handling.
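As a rough sketch (variable and attribute names are assumptions, not the exact code in scrape.py), the pattern described above amounts to:

container = downloader.get_instant_availability(stream.infohash, item.type)  # may return None
if container is not None:
    stream.is_cached = bool(container.files)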


285-285: Verify the structure of containers

At line 285, containers is assigned as a list containing a single dictionary:

containers = [{file.file_id: file for file in torrent_container.files}]

Please ensure that this structure matches the expected format for the client consuming this data. If the client expects a list of container dictionaries, this is acceptable. However, if only one container is expected without nesting in a list, consider adjusting the assignment to:

containers = {file.file_id: file for file in torrent_container.files}
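If the client can tolerate either shape, a small normalization step on the consuming side is another option (hypothetical helper, not part of this PR):

# Accept either a bare mapping or a single-element list of mappings
files_by_id = containers[0] if isinstance(containers, list) else containers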

src/routers/secure/scrape.py: review thread (outdated, resolved)
Contributor

coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
entrypoint.sh (2)

6-6: Add port number validation

While setting a default port is good, it's recommended to validate that the PORT value is within the valid range (1-65535) to prevent runtime issues.

-PORT=${PORT:-8080}
+PORT=${PORT:-8080}
+if ! echo "$PORT" | grep -qE '^([1-9][0-9]{0,3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])$'; then
+    echo "PORT is not a valid port number (1-65535). Exiting..."
+    exit 1
+fi

88-90: Improve non-root execution and debug setup

Several improvements can be made to the non-root execution:

  1. debugpy should be installed during image build, not at runtime
  2. Command execution could be improved for better error handling
-        poetry add debugpy 
-        exec su -m $USERNAME -c "cd /riven/src && poetry run python3 -m debugpy --listen 0.0.0.0:5678 main.py --port $PORT"
+        exec su -m $USERNAME -c "cd /riven/src && set -e && poetry run python3 -m debugpy --listen 127.0.0.1:5678 main.py --port $PORT"
-        su -m "$USERNAME" -c "cd /riven/src && poetry run python3 main.py --port $PORT"
+        exec su -m "$USERNAME" -c "cd /riven/src && set -e && poetry run python3 main.py --port $PORT"

Also, consider adding debugpy to your pyproject.toml as a development dependency instead of installing it at runtime.
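For example, with Poetry 1.2 or later, poetry add --group dev debugpy would record it as a development dependency in pyproject.toml instead of installing it at container start (assuming the project manages dev dependencies through a Poetry dependency group).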

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 2dd178e and 81bacd5.

📒 Files selected for processing (1)
  • entrypoint.sh (2 hunks)
🔇 Additional comments (1)
entrypoint.sh (1)

81-90: Verify port configuration alignment with manual scraping functionality

The addition of port configuration seems unrelated to the PR's primary objective of fixing manual scraping integration. Let's verify if this change is intentional and necessary.

✅ Verification successful

Port configuration changes are intentional and necessary

The port-related changes are a valid addition based on the following findings:

  • A dedicated commit (81bacd5) addresses the need for configurable ports to support running multiple Riven instances in the same space
  • The changes are properly implemented in both CLI arguments (cli.py) and the main application (see the sketch after this list)
  • The feature is documented in the PR description and release notes
  • The default port is properly configured (8080) with command-line override capability
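As an illustration of the CLI side, a --port argument with the 8080 default might be wired roughly like this (a sketch only; the actual cli.py may differ):

import argparse

parser = argparse.ArgumentParser(description="Riven backend")
parser.add_argument("--port", type=int, default=8080, help="Port the application listens on")
args = parser.parse_args()
print(f"Starting on port {args.port}")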
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if port-related changes are mentioned in the issue or PR
gh issue view 921 | grep -i "port"
gh pr view 924 | grep -i "port"

# Check if there are any other port-related changes in the codebase
rg -l "port.*=.*8080" --type py

Length of output: 156


Script:

#!/bin/bash
# Check for port-related configurations and usage in Python files
rg -l "port" --type py

# Check for any port-related command line arguments
rg -A 3 "add_argument.*port" --type py

# Look for default port configurations
rg "PORT.*=.*\d+" --type py

# Check main.py specifically for port usage
cat src/main.py

Length of output: 6380


Script:

#!/bin/bash
# Check the CLI argument handling for port
cat src/program/utils/cli.py

# Check the PR description and commits for port-related changes
gh pr view 924 --json body,commits

# Check for any recent changes to port-related configurations
git diff HEAD~5 -- src/main.py src/program/utils/cli.py entrypoint.sh

Length of output: 5212

entrypoint.sh: two review threads (resolved)
Contributor

coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (5)
src/program/services/scrapers/shared.py (5)

93-94: Enhance year mismatch logging for better debugging

Consider including the actual year values in the log message to make debugging easier.

-    logger.debug(f"Ignoring torrent {torrent.infohash} due to not in time year range, {item.log_string}")
+    logger.debug(f"Ignoring torrent {torrent.infohash} due to year mismatch (expected: {item.aired_at.year}, got: {torrent.data.year}), {item.log_string}")

102-103: Fix grammatical error in log message

There's a minor grammatical error in the log message.

-    logger.debug(f"Ignoring torrent {torrent.infohash} due mismatch in season size, required {len(needed_seasons)} current {len(torrent.data.seasons)}, {item.log_string}")
+    logger.debug(f"Ignoring torrent {torrent.infohash} due to mismatch in season size, required: {len(needed_seasons)}, current: {len(torrent.data.seasons)}, {item.log_string}")

109-111: Include specific season information in log message

Consider including which season is missing in the log message for better debugging context.

-    logger.debug(f"Ignoring torrent {torrent.infohash} due to missing required season, {item.log_string}")
+    logger.debug(f"Ignoring torrent {torrent.infohash} due to missing required season (needed: {needed_seasons}, found: {list(torrent.data.seasons)}), {item.log_string}")

127-128: Include specific episode and season numbers in log message

Consider including the expected and actual episode/season numbers in the log message for better debugging context.

-    logger.debug(f"Ignoring torrent {torrent.infohash} due to mismatch in season and episode numbers, {item.log_string}")
+    logger.debug(f"Ignoring torrent {torrent.infohash} due to mismatch in season/episode numbers (expected S{item.parent.number}E{item.number}, found seasons: {list(torrent.data.seasons)}, episodes: {list(torrent.data.episodes)}), {item.log_string}")

130-131: Include item type in category mismatch log message

Consider including the item type in the log message for better debugging context.

-    logger.debug(f"Skipping torrent {infohash} as it does not fit to any category. {item.log_string}")
+    logger.debug(f"Skipping torrent {infohash} as item type '{item.type}' does not fit any category. {item.log_string}")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 81bacd5 and 6eb24f3.

📒 Files selected for processing (1)
  • src/program/services/scrapers/shared.py (3 hunks)
🔇 Additional comments (2)
src/program/services/scrapers/shared.py (2)

135-135: LGTM! Enhanced error logging

The improved error logging for invalid torrents and garbage torrents will help with debugging issues. The inclusion of the specific error messages is particularly helpful.

Also applies to: 143-143


Line range hint 1-156: Verify location of item_type fix

While these logging enhancements are valuable, I notice that the fix for the missing item_type argument mentioned in the PR objectives (#921) is not present in this file.

src/program/services/scrapers/shared.py: review thread (outdated, resolved)
arao changed the title from "[Fix]#921 Manual scrapping integration fixed." to "[Fix]#921 #925 Manual scrapping integration fixed." on Dec 1, 2024
@@ -290,7 +292,7 @@ def get_info_hash(magnet: str) -> str:
"message": "Started manual scraping session",
"session_id": session.id,
"torrent_id": torrent_id,
"torrent_info": torrent_info,
"torrent_info": torrent_info.dict(),
Member

I fixed the API issues in PR #915

just waiting for it to get merged

Author

I see that's in draft, with multiple comments from the reviewer. Is there a timeframe you are looking at for the merge?

Member

Sometime today #915 should be merged. Then we can get this one in as well. Let's remove the API changes here.

Author

Let's wait for it to be raised; I will rebase on the updated mainline.

Member

updated mainline, if you want to rebase and tweak 👌

@@ -140,6 +153,7 @@ def _parse_results(item: MediaItem, results: Dict[str, str], log_msg: bool = Tru
torrents_dict[torrent.infohash] = Stream(torrent)
logger.log("SCRAPER", f"Kept {len(torrents_dict)} streams for {item.log_string} after processing bucket limit")
return torrents_dict
logger.debug(f"No valid torrent remains after filtering for {item.log_string}: {raw_title}")
Member

We're given a list of streams to parse here in this function, so this line would only read out the last stream that gets parsed from the list. This line can be removed as we handle individual stream logging throughout this function already.

Author

Just an additional explicit debug checkpoint never hurts; updated to remove raw_title.

arao added 3 commits December 7, 2024 18:39
…ariable

This is useful if multiple instances of riven are required to run in the same space.
One such use case is to have a separate instance for each torrent profile. In the
current setup, multiple profiles cannot be specified.

[Testing]
- Changes tested by running two separate instances in the same space on different ports and in different folders
This additional information is useful for debugging issues with ranking
and torrent profiles, where, if nothing survives filtering for a config, it is
very hard to debug which setting caused everything to be filtered out.

[TESTING]

Changes tested by running the stack in a local environment with Docker.
@arao arao reopened this Dec 7, 2024
Author

arao commented Dec 7, 2024

Updated the changes; please review and merge. I removed the API fix commit, so only the container port and debug-log changes remain in this PR.

Development

Successfully merging this pull request may close these issues.

Error: Downloader.get_instant_availability() missing 1 required positional argument: 'item_type'