-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Potomac Appalachian Trail Club #230
base: master
Are you sure you want to change the base?
Conversation
Just checked out the updates.
|
Weird.... I'll take a l99k.
…On Sat, Mar 14, 2020, 4:51 PM Scott McAllister ***@***.***> wrote:
Just checked out the updates.
- None of the event descriptions or costs are being pulled. All of the
event descriptions are "See event website" and all of the costs are
empty strings, which occurs because they are searching "See event
website" for a dollar amount, which will always return an empty string.
- The venues are still hardcoded as "Please refer to website" instead
of the name of the park/trail.
- The end_time should be an empty string when all_day_event is True.
- I didn't notice this before, but the module's logger isn't
configured properly. You need to from .utils.log import get_logger and
then logger = get_logger(os.path.basename(__file__)) (make sure you
also import os). This also made me realize that there's no error
logging in this module. At the very least, we should catch exceptions with
the requests and log appropriately. Right now, we
res.raise_for_status(), which will actually raise an error whenever
the status code != 200. That'll derail the entire module if even one
request fails. Better to catch and log the error and then continue trying
to scrape.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#230 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AJQJN6S5HDNVCQ6ILKVGY4DRHPU5JANCNFSM4LB6RBEA>
.
|
@Frankie-Figz So the only outstanding issue is the lack of Venue Names. Thing is, that's a pretty big issue that we absolutely have to resolve afaik, especially if we ever want to upgrade to getting latitude and longitude. I think it might be possible to parse either the event names or the descriptions in order to get the venue name, but that'll probably be a lot of work and very fickle. There's also named-entity recognition in NLP packages like nltk, but we'd need almost certainly need to download the large assets (i.e. datasets and dictionaries) that those methods rely on - and that would ballon the deployment size and likely exceed AWS Lambda's size ceiling. Doesn't mean this approach isn't worth researching though. Because of these limitations, let me know if you want to try creating a method to extract the venue names or if you want to scrap the scraper (<-- pun intended there 🤡 ). |
I ain't scared of no ghost! I'll figure it out |
@csmcallister I think that from the event name we could extract the venue name. I don't know why I missed that obviously clear signal. The only caveat would be to parse away everything that was NOT venue name related; I'll try to implement something for this in the next commit. |
@Frankie-Figz did you still want to implement this one? |
Worked on the change requested to Minh from his corresponding PR.