
Feat: transitFeedSyncProcessing implementation #819

Merged: 21 commits, Nov 28, 2024

@AlfredNwolisa AlfredNwolisa commented Nov 12, 2024

Summary:

This pull request implements feed sync processing: it consumes Pub/Sub messages and keeps feed data consistent in the database. It includes the necessary configuration files, tests, and documentation.

Key implementations include:

  • Created FeedProcessor class with comprehensive database interaction capabilities
  • Implemented idempotent feed processing logic that handles both new and existing feeds
  • Added support for feed URL change detection and deprecation workflow
  • Integrated with Google Cloud Pub/Sub for dataset batch processing
  • Implemented stable ID generation for feed tracking across updates
  • Added comprehensive logging and error handling at all processing stages
  • Implemented database transaction management with rollback support

Added support for:

  • Authentication type handling for feeds
  • External ID mapping and management
  • Feed redirection tracking
  • URL duplication checking
  • Feed status management (active/deprecated)
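As an illustration of the URL duplication check, here is a minimal Python sketch. It assumes that two URLs count as duplicates when they differ only in scheme/host case or a trailing slash on the path; `normalize_url` and `is_duplicate_url` are hypothetical names, not functions from this PR.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Normalize a feed URL so trivially different spellings compare equal.

    Assumption (hypothetical rule): scheme and host are case-insensitive,
    a trailing slash on the path is not significant, and fragments are dropped.
    """
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def is_duplicate_url(url: str, existing_urls: list[str]) -> bool:
    """Return True if `url` matches any already-known feed URL after normalization."""
    target = normalize_url(url)
    return any(normalize_url(u) == target for u in existing_urls)
```

A real implementation might instead query the database for matching URLs; the normalization rule is the part most likely to differ.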

This pull request addresses the functionality described in issue https://github.com/MobilityData/product-tasks/issues/102, which is part of the https://github.com/MobilityData/product-tasks/issues/95 epic.

Expected behavior:

Feed Processing Flow:

  1. The Cloud Function receives a Pub/Sub event containing feed information:
     • Decodes the base64-encoded message
     • Validates and parses the payload into a FeedPayload object
  2. For each feed processing request:
     • Checks whether the feed exists, using its external ID and source
     • Checks the feed URL for duplicates across the system
  3. New feed processing, if the feed doesn't exist:
     • Generates a new UUID and stable ID
     • Creates a feed record with active status
     • Creates an external ID mapping
     • Publishes to the dataset batch topic if the feed requires no authentication
  4. Feed update processing, if the feed exists with a different URL:
     • Creates a new feed record with the updated URL
     • Deprecates the old feed record
     • Updates the external ID mapping
     • Creates a redirect mapping from the old feed ID to the new one
     • Publishes the update to the dataset batch topic if the feed requires no authentication
  5. Database transaction handling:
     • Commits successful operations
     • Rolls back on any error
     • Keeps data consistent across all operations
  6. Error handling:
     • Provides detailed logging at each step
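The flow above can be sketched in Python as follows. This is a simplified, hypothetical model, not the PR's code: an in-memory store replaces the database, the `FeedPayload` fields (`external_id`, `source`, `feed_url`, `auth_required`) are assumed, `publish` stands in for the dataset batch topic publisher, and stable ID generation and the URL duplicate check are omitted.

```python
import base64
import json
import uuid
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class FeedPayload:
    # Hypothetical fields; the real FeedPayload in this PR may differ.
    external_id: str
    source: str
    feed_url: str
    auth_required: bool = False


@dataclass
class FeedRecord:
    id: str
    producer_url: str
    status: str = "active"  # "active" or "deprecated"


@dataclass
class InMemoryStore:
    """Stand-in for the database tables the processor writes to."""
    feeds: dict = field(default_factory=dict)         # feed id -> FeedRecord
    external_ids: dict = field(default_factory=dict)  # (source, external_id) -> feed id
    redirects: list = field(default_factory=list)     # (old feed id, new feed id) pairs


def decode_payload(message_data: str) -> FeedPayload:
    """Decode base64-encoded Pub/Sub message data and parse the JSON into a FeedPayload."""
    raw = json.loads(base64.b64decode(message_data).decode("utf-8"))
    return FeedPayload(**raw)


def process_feed(payload: FeedPayload, store: InMemoryStore,
                 publish: Callable[[FeedRecord], None]) -> str:
    key = (payload.source, payload.external_id)
    existing = store.feeds.get(store.external_ids.get(key))

    # Same feed, same URL: nothing to do, so reprocessing a message is a no-op.
    if existing is not None and existing.producer_url == payload.feed_url:
        return "unchanged"

    new_record = FeedRecord(id=str(uuid.uuid4()), producer_url=payload.feed_url)
    store.feeds[new_record.id] = new_record
    store.external_ids[key] = new_record.id  # create or repoint the external ID mapping

    if existing is None:
        action = "created"
    else:
        # URL changed: deprecate the old record and redirect it to the new one.
        existing.status = "deprecated"
        store.redirects.append((existing.id, new_record.id))
        action = "updated"

    # Only feeds that need no authentication go to the dataset batch topic.
    if not payload.auth_required:
        publish(new_record)
    return action
```

Because an unchanged URL returns early, replaying the same Pub/Sub message does not create duplicate records, which is what makes this version of the processing idempotent.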

Testing tips:

Provide tips, procedures and sample files on how to test the feature.
Testers are invited to follow the tips AND to try anything they deem relevant outside the bounds of the testing tips.

Please make sure these boxes are checked before submitting your pull request - thanks!

  • Run the unit tests with ./scripts/api-tests.sh to make sure you didn't break anything
  • Add or update any needed documentation to the repo
  • Format the title like "feat: [new feature short description]". Title must follow the Conventional Commit Specification (https://www.conventionalcommits.org/en/v1.0.0/).
  • Linked all relevant issues
  • Include screenshot(s) showing how this pull request works and fixes the issue(s)

This commit:
- Implements feed sync processing for Pub/Sub messages
- Ensures database consistency during sync operations
- Adds configuration files for feed sync settings
- Includes comprehensive test coverage
- Documents sync process and configuration options
@AlfredNwolisa AlfredNwolisa self-assigned this Nov 12, 2024
@AlfredNwolisa AlfredNwolisa changed the title from "feat: Add Transitland feed sync processor" to "Feat: transitFeedSyncProcessing implementation" Nov 12, 2024
Replaced raw SQL queries with SQLAlchemy ORM models for handling database operations in feed processing. Enhanced test coverage and updated mock configurations to align with the new ORM-based approach.
@AlfredNwolisa AlfredNwolisa marked this pull request as ready for review November 18, 2024 20:06
@davidgamez davidgamez requested a review from cka-y November 18, 2024 20:15
functions-python/feed_sync_process_transitland/src/main.py (outdated, resolved)
logger.error(error_msg)
if "payload" in locals():
    self.session.rollback()
    logger.debug("Database transaction rolled back due to error")
@cka-y cka-y Nov 19, 2024

We should change this to error-level logging, as it is critical.
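The suggestion could be applied with a small context manager like the sketch below, which commits on success and, on any exception, rolls back, logs at error level, and re-raises. The `transaction` helper and the logger name are hypothetical; the sketch only assumes a session object with `commit()` and `rollback()`, as SQLAlchemy's `Session` provides.

```python
import logging
from contextlib import contextmanager

logger = logging.getLogger("feed_sync_process")  # hypothetical logger name


@contextmanager
def transaction(session):
    """Commit on success; on any exception, roll back, log at ERROR level, and re-raise.

    `session` is assumed to expose commit() and rollback(), as a SQLAlchemy Session does.
    """
    try:
        yield session
        session.commit()  # a failing commit also falls through to the rollback below
    except Exception:
        session.rollback()
        # ERROR rather than DEBUG: a rollback means the feed changes were not persisted.
        logger.error("Database transaction rolled back due to error", exc_info=True)
        raise
```

Wrapping each feed's database work in `with transaction(session):` keeps the rollback and its log line in one place instead of guarding them with a check such as `"payload" in locals()`.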

cka-y commented Nov 19, 2024

New feeds and feed updates should have operational_status="wip" so they can be manually validated before becoming public.
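A hypothetical sketch of that rule: the model below defaults `operational_status` to `"wip"`, so both brand-new records and the replacement record created on a URL change start out unpublished. Only `status` (active/deprecated) and `operational_status="wip"` come from this thread; the `Feed` class and both helper functions are illustrative.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Feed:
    # Hypothetical model; only the two status fields come from the review discussion.
    producer_url: str
    status: str = "active"            # "active" or "deprecated"
    operational_status: str = "wip"   # stays "wip" until manually validated


def create_updated_feed(old: Feed, new_url: str) -> tuple[Feed, Feed]:
    """Return (deprecated old record, new record) for a URL change; the new one starts as wip."""
    return replace(old, status="deprecated"), Feed(producer_url=new_url)


def public_feeds(feeds: list[Feed]) -> list[Feed]:
    """Hypothetical filter: expose only active feeds that are no longer wip."""
    return [f for f in feeds if f.status == "active" and f.operational_status != "wip"]
```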

AlfredNwolisa and others added 3 commits November 19, 2024 12:40
Replaced custom logger setup with unified Logger class. Improved error handling and rollback in database transactions. Added location support and refined feed ID management. Updated test cases to reflect these changes.
Replaced direct logger calls with a unified log_message function to support both local and GCP logging. Refactored the test cases to mock enhanced logging and implemented new test scenarios to cover additional edge cases, ensuring robustness in feed processing.
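A rough idea of what such a `log_message` helper might look like, assuming GCP is detected through the `K_SERVICE` environment variable and that Cloud Logging's structured logging (one JSON object per stdout line, with a `severity` field) is used there. The function names and signature are guesses, not the ones in this PR.

```python
import json
import logging
import os
import sys

_local_logger = logging.getLogger("feed_sync_process")  # hypothetical logger name


def running_on_gcp() -> bool:
    # Assumption: K_SERVICE is set by the Cloud Run / Cloud Functions runtime.
    return "K_SERVICE" in os.environ


def log_message(level: str, message: str) -> None:
    """Hypothetical unified logger: structured JSON on GCP, stdlib logging locally.

    `level` is a Python logging level name such as "INFO" or "ERROR".
    """
    if running_on_gcp():
        # Cloud Logging parses each JSON line on stdout and reads its "severity" field.
        print(json.dumps({"severity": level, "message": message}), file=sys.stdout, flush=True)
    else:
        _local_logger.log(getattr(logging, level), message)
```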
@AlfredNwolisa AlfredNwolisa requested a review from cka-y November 27, 2024 15:53
@cka-y cka-y left a comment

LGTM 💪
Integration tests seem to be failing intermittently (not related to the changes in this PR)

@AlfredNwolisa AlfredNwolisa merged commit a18227e into main Nov 28, 2024
2 of 3 checks passed
@AlfredNwolisa AlfredNwolisa deleted the Tlnd_feed_sync_process branch November 28, 2024 16:12

4 participants