
[Fix] #921 #925 Manual scraping integration fixed. #924

Open
wants to merge 3 commits into
base: main

Conversation

arao commented Nov 30, 2024

Pull Request Check List

Resolves: #921

  • Added tests for changed code.
  • Updated documentation for changed code.

Description:

Following the recent fix to the debrid instant-availability API, the manual scraping endpoint logic required modification to be compatible with the updated models. This commit addresses that gap while ensuring no model change is introduced, which avoids changes on the UI side.

In the longer term, we can evaluate whether the model change needs to be propagated to the UI as well.

[Testing]

The following testing approach was used:

  • Set up riven and riven-frontend in a Docker environment.
  • After correcting the issue, verified that both manual scraping and requested scraping work as expected.
  • Ensured symlinks are generated and data is indexed by the updater. Currently validated with Plex only.

Summary by CodeRabbit

  • New Features

    • Introduced a configurable environment variable PORT, defaulting to 8080, for enhanced flexibility in application execution.
  • Bug Fixes

    • Improved logging during the torrent filtering process, providing better feedback on ignored torrents and error handling for invalid torrents.
  • Documentation

    • Enhanced logging statements to assist in debugging and monitoring the scraper's performance.

Contributor

coderabbitai bot commented Nov 30, 2024

Walkthrough

The changes in this pull request involve updates to the entrypoint.sh script and the ScraperRequestHandler class in shared.py. The entrypoint.sh script now includes a new environment variable PORT, defaulting to 8080, which allows for configurable port settings during application execution. In shared.py, the _parse_results function has been enhanced with additional logging for improved visibility into the torrent filtering process and error handling, although the core functionality remains unchanged.

Changes

File — Change Summary
entrypoint.sh — Added environment variable PORT with default value 8080 for a configurable application port.
src/program/services/scrapers/shared.py — Enhanced logging in _parse_results for better feedback on torrent filtering and error handling.

Assessment against linked issues

Objective — Addressed — Explanation
The application should not throw a TypeError when initiating a manual scrape from the frontend. (#921) — No — The changes do not address the specific TypeError issue.

Suggested reviewers

  • davidemarcoli
  • Gaisberg
  • iPromKnight

Poem

In the burrows deep, where the code does leap,
A new port's set, for the app to keep.
With logs that sing, of torrents they bring,
Debugging made easy, let the data swing!
Hops of joy, for fixes we deploy! 🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 6eb24f3 and 90891b6.

📒 Files selected for processing (2)
  • entrypoint.sh (2 hunks)
  • src/program/services/scrapers/shared.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • entrypoint.sh
  • src/program/services/scrapers/shared.py


Contributor

coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
src/program/services/downloaders/__init__.py (1)

155-155: Update docstring to reflect new return type

The get_instant_availability method's docstring currently states "Check if the torrent is cached," but it now returns an Optional[TorrentContainer]. Please update the docstring to reflect that it may return None when the torrent is not available.
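For illustration, a minimal sketch of what the updated docstring could look like, assuming the Optional[TorrentContainer] return type described here and the item_type parameter implied by the linked issue (#921); the exact signature in the codebase may differ:

from typing import Optional

class Downloader:  # the real class lives in src/program/services/downloaders/__init__.py
    def get_instant_availability(self, infohash: str, item_type: str) -> Optional["TorrentContainer"]:
        """Check whether the torrent is cached on the debrid service.

        Returns a TorrentContainer describing the cached files, or None
        when the torrent is not available.
        """
        ...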

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d2a6b5b and 2dd178e.

📒 Files selected for processing (2)
  • src/program/services/downloaders/__init__.py (1 hunks)
  • src/routers/secure/scrape.py (5 hunks)
🔇 Additional comments (2)
src/routers/secure/scrape.py (2)

222-225: Good handling of optional return value from get_instant_availability

You've correctly updated the code to handle the Optional[TorrentContainer] returned by get_instant_availability. Checking for None before setting stream.is_cached ensures robust error handling.
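As a rough sketch (variable and attribute names are assumptions, not the exact code in scrape.py), the pattern described above amounts to:

container = downloader.get_instant_availability(stream.infohash, item.type)  # may return None
if container is not None:
    stream.is_cached = bool(container.files)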


285-285: Verify the structure of containers

At line 285, containers is assigned as a list containing a single dictionary:

containers = [{file.file_id: file for file in torrent_container.files}]

Please ensure that this structure matches the expected format for the client consuming this data. If the client expects a list of container dictionaries, this is acceptable. However, if only one container is expected without nesting in a list, consider adjusting the assignment to:

containers = {file.file_id: file for file in torrent_container.files}
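If the client can tolerate either shape, a small normalization step on the consuming side is another option (hypothetical helper, not part of this PR):

# Accept either a bare mapping or a single-element list of mappings
files_by_id = containers[0] if isinstance(containers, list) else containers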

src/routers/secure/scrape.py: review thread (outdated, resolved)
Contributor

coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (2)
entrypoint.sh (2)

6-6: Add port number validation

While setting a default port is good, it's recommended to validate that the PORT value is within the valid range (1-65535) to prevent runtime issues.

-PORT=${PORT:-8080}
+PORT=${PORT:-8080}
+if ! echo "$PORT" | grep -qE '^([1-9][0-9]{0,3}|[1-5][0-9]{4}|6[0-4][0-9]{3}|65[0-4][0-9]{2}|655[0-2][0-9]|6553[0-5])$'; then
+    echo "PORT is not a valid port number (1-65535). Exiting..."
+    exit 1
+fi

88-90: Improve non-root execution and debug setup

Several improvements can be made to the non-root execution:

  1. debugpy should be installed during image build, not at runtime
  2. Command execution could be improved for better error handling
-        poetry add debugpy 
-        exec su -m $USERNAME -c "cd /riven/src && poetry run python3 -m debugpy --listen 0.0.0.0:5678 main.py --port $PORT"
+        exec su -m $USERNAME -c "cd /riven/src && set -e && poetry run python3 -m debugpy --listen 127.0.0.1:5678 main.py --port $PORT"
-        su -m "$USERNAME" -c "cd /riven/src && poetry run python3 main.py --port $PORT"
+        exec su -m "$USERNAME" -c "cd /riven/src && set -e && poetry run python3 main.py --port $PORT"

Also, consider adding debugpy to your pyproject.toml as a development dependency instead of installing it at runtime.
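For example, with Poetry 1.2 or later, poetry add --group dev debugpy would record it as a development dependency in pyproject.toml instead of installing it at container start (assuming the project manages dev dependencies through a Poetry dependency group).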

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 2dd178e and 81bacd5.

📒 Files selected for processing (1)
  • entrypoint.sh (2 hunks)
🔇 Additional comments (1)
entrypoint.sh (1)

81-90: Verify port configuration alignment with manual scraping functionality

The addition of port configuration seems unrelated to the PR's primary objective of fixing manual scraping integration. Let's verify if this change is intentional and necessary.

✅ Verification successful

Port configuration changes are intentional and necessary

The port-related changes are a valid addition based on the following findings:

  • A dedicated commit (81bacd5) addresses the need for configurable ports to support running multiple Riven instances in the same space
  • The changes are properly implemented in both CLI arguments (cli.py) and the main application (see the sketch after this list)
  • The feature is documented in the PR description and release notes
  • The default port is properly configured (8080) with command-line override capability
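As an illustration of the CLI side, a --port argument with the 8080 default might be wired roughly like this (a sketch only; the actual cli.py may differ):

import argparse

parser = argparse.ArgumentParser(description="Riven backend")
parser.add_argument("--port", type=int, default=8080, help="Port the application listens on")
args = parser.parse_args()
print(f"Starting on port {args.port}")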
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if port-related changes are mentioned in the issue or PR
gh issue view 921 | grep -i "port"
gh pr view 924 | grep -i "port"

# Check if there are any other port-related changes in the codebase
rg -l "port.*=.*8080" --type py

Length of output: 156


Script:

#!/bin/bash
# Check for port-related configurations and usage in Python files
rg -l "port" --type py

# Check for any port-related command line arguments
rg -A 3 "add_argument.*port" --type py

# Look for default port configurations
rg "PORT.*=.*\d+" --type py

# Check main.py specifically for port usage
cat src/main.py

Length of output: 6380


Script:

#!/bin/bash
# Check the CLI argument handling for port
cat src/program/utils/cli.py

# Check the PR description and commits for port-related changes
gh pr view 924 --json body,commits

# Check for any recent changes to port-related configurations
git diff HEAD~5 -- src/main.py src/program/utils/cli.py entrypoint.sh

Length of output: 5212

entrypoint.sh: two review threads (resolved)
Contributor

coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (5)
src/program/services/scrapers/shared.py (5)

93-94: Enhance year mismatch logging for better debugging

Consider including the actual year values in the log message to make debugging easier.

-    logger.debug(f"Ignoring torrent {torrent.infohash} due to not in time year range, {item.log_string}")
+    logger.debug(f"Ignoring torrent {torrent.infohash} due to year mismatch (expected: {item.aired_at.year}, got: {torrent.data.year}), {item.log_string}")

102-103: Fix grammatical error in log message

There's a minor grammatical error in the log message.

-    logger.debug(f"Ignoring torrent {torrent.infohash} due mismatch in season size, required {len(needed_seasons)} current {len(torrent.data.seasons)}, {item.log_string}")
+    logger.debug(f"Ignoring torrent {torrent.infohash} due to mismatch in season size, required: {len(needed_seasons)}, current: {len(torrent.data.seasons)}, {item.log_string}")

109-111: Include specific season information in log message

Consider including which season is missing in the log message for better debugging context.

-    logger.debug(f"Ignoring torrent {torrent.infohash} due to missing required season, {item.log_string}")
+    logger.debug(f"Ignoring torrent {torrent.infohash} due to missing required season (needed: {needed_seasons}, found: {list(torrent.data.seasons)}), {item.log_string}")

127-128: Include specific episode and season numbers in log message

Consider including the expected and actual episode/season numbers in the log message for better debugging context.

-    logger.debug(f"Ignoring torrent {torrent.infohash} due to mismatch in season and episode numbers, {item.log_string}")
+    logger.debug(f"Ignoring torrent {torrent.infohash} due to mismatch in season/episode numbers (expected S{item.parent.number}E{item.number}, found seasons: {list(torrent.data.seasons)}, episodes: {list(torrent.data.episodes)}), {item.log_string}")

130-131: Include item type in category mismatch log message

Consider including the item type in the log message for better debugging context.

-    logger.debug(f"Skipping torrent {infohash} as it does not fit to any category. {item.log_string}")
+    logger.debug(f"Skipping torrent {infohash} as item type '{item.type}' does not fit any category. {item.log_string}")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 81bacd5 and 6eb24f3.

📒 Files selected for processing (1)
  • src/program/services/scrapers/shared.py (3 hunks)
🔇 Additional comments (2)
src/program/services/scrapers/shared.py (2)

135-135: LGTM! Enhanced error logging

The improved error logging for invalid torrents and garbage torrents will help with debugging issues. The inclusion of the specific error messages is particularly helpful.

Also applies to: 143-143


Line range hint 1-156: Verify location of item_type fix

While these logging enhancements are valuable, I notice that the fix for the missing item_type argument mentioned in the PR objectives (#921) is not present in this file.

src/program/services/scrapers/shared.py: review thread (outdated, resolved)
arao changed the title from "[Fix]#921 Manual scrapping integration fixed." to "[Fix]#921 #925 Manual scrapping integration fixed." on Dec 1, 2024
@@ -290,7 +292,7 @@ def get_info_hash(magnet: str) -> str:
"message": "Started manual scraping session",
"session_id": session.id,
"torrent_id": torrent_id,
"torrent_info": torrent_info,
"torrent_info": torrent_info.dict(),
Member

I fixed the API issues in PR #915

just waiting for it to get merged

Author

I see that's in draft, with multiple comments from the reviewer. Is there a timeframe you are looking at for the merge?

Member

Sometime today #915 should be merged. Then we can get this one in as well. Let's remove the API changes here.

Author

Let's wait for it to be raised; I will rebase on the updated mainline.

Member

updated mainline, if you want to rebase and tweak 👌

@@ -140,6 +153,7 @@ def _parse_results(item: MediaItem, results: Dict[str, str], log_msg: bool = Tru
torrents_dict[torrent.infohash] = Stream(torrent)
logger.log("SCRAPER", f"Kept {len(torrents_dict)} streams for {item.log_string} after processing bucket limit")
return torrents_dict
logger.debug(f"No valid torrent remains after filtering for {item.log_string}: {raw_title}")
Member

We're given a list of streams to parse here in this function, so this line would only read out the last stream that gets parsed from the list. This line can be removed as we handle individual stream logging throughout this function already.

Author

Just an additional explicit debug checkpoint never hurts; updated to remove raw_title.

arao added 3 commits December 7, 2024 18:39
…ariable

This is useful if multiple instances of riven are required to run in the same space.
One such use case is to have a separate instance for each torrent profile. In the
current setup, multiple profiles cannot be specified.

[Testing]
- Changes tested by running two separate instances in the same space on different ports and in different folders
This additional information is useful for debugging issues with ranking
and torrent profiles, where, if nothing survives filtering for a config, it is
very hard to debug which setting caused everything to be filtered out.

[TESTING]

Changes tested by running the stack in a local environment with Docker.
@arao arao reopened this Dec 7, 2024
Author

arao commented Dec 7, 2024

Updated the changes; please review and merge. I removed the API fix commit, so only the container port and debug-log changes remain in this PR.

Development

Successfully merging this pull request may close these issues.

Error: Downloader.get_instant_availability() missing 1 required positional argument: 'item_type'