Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Potomac Appalachian Trail Club #230

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

Frankie-Figz
Copy link
Contributor

Worked on the change requested to Minh from his corresponding PR.

Converted the object he returned to the dictionary needed
Fixed the URL being returned
Added cost as zero

@csmcallister csmcallister self-assigned this Mar 5, 2020
@csmcallister csmcallister added enhancement New feature or request new event source related to adding a new event source labels Mar 5, 2020
events/patc.py Outdated Show resolved Hide resolved
events/patc.py Outdated Show resolved Hide resolved
events/patc.py Outdated Show resolved Hide resolved
@csmcallister
Copy link
Collaborator

csmcallister commented Mar 14, 2020

Just checked out the updates.

  • None of the event descriptions or costs are being pulled. All of the event descriptions are "See event website" and all of the costs are empty strings, which occurs because they are searching "See event website" for a dollar amount, which will always return an empty string.
  • The venues are still hardcoded as "Please refer to website" instead of the name of the park/trail.
  • The end_time should be an empty string when all_day_event is True.
  • I didn't notice this before, but the module's logger isn't configured properly. You need to from .utils.log import get_logger and then logger = get_logger(os.path.basename(__file__)) (make sure you also import os). This also made me realize that there's no error logging in this module. At the very least, we should catch exceptions with the requests and log appropriately. Right now, we res.raise_for_status(), which will actually raise an error whenever the status code != 200. That'll derail the entire module if even one request fails. Better to catch and log the error and then continue trying to scrape.

@Frankie-Figz
Copy link
Contributor Author

Frankie-Figz commented Mar 14, 2020 via email

@csmcallister
Copy link
Collaborator

@Frankie-Figz So the only outstanding issue is the lack of Venue Names. Thing is, that's a pretty big issue that we absolutely have to resolve afaik, especially if we ever want to upgrade to getting latitude and longitude.

I think it might be possible to parse either the event names or the descriptions in order to get the venue name, but that'll probably be a lot of work and very fickle. There's also named-entity recognition in NLP packages like nltk, but we'd need almost certainly need to download the large assets (i.e. datasets and dictionaries) that those methods rely on - and that would ballon the deployment size and likely exceed AWS Lambda's size ceiling. Doesn't mean this approach isn't worth researching though.

Because of these limitations, let me know if you want to try creating a method to extract the venue names or if you want to scrap the scraper (<-- pun intended there 🤡 ).

@Frankie-Figz
Copy link
Contributor Author

I ain't scared of no ghost! I'll figure it out

@Frankie-Figz
Copy link
Contributor Author

@csmcallister I think that from the event name we could extract the venue name. I don't know why I missed that obviously clear signal. The only caveat would be to parse away everything that was NOT venue name related; I'll try to implement something for this in the next commit.

@csmcallister
Copy link
Collaborator

@Frankie-Figz did you still want to implement this one?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request new event source related to adding a new event source
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants