Replies: 18 comments
-
I have the same problem. How do you decode the link?
-
I found this, I think through Stack Overflow.
-
Same problem here! The code below returns "Google News" as the title:
from gnews import GNews
google_news = GNews()
json_resp = google_news.get_news('Pakistan')
# get_full_article returns a newspaper3k article instance, so all newspaper3k attributes are available on it
article = google_news.get_full_article(json_resp[0]['url'])
article.title
# Google News
-
I also have the same problem with the articles. I tried running the base64 decoder mentioned here, but it gives me what looks like a random series of characters. I wonder whether Google changed the article link length and that is breaking the decoding. Either way, this looks like a decoding issue that falls back to no output and a title of "Google News". Example:
Output: When it should link to:
-
Having the same encoding issue :(
-
@vincenzon Thanks for this. Can you please create a PR for this patch?
-
The fix from @vincenzon initially worked for a couple of days, but now I'm getting the same output as @Isaaq-Khader for links.
-
You can try the decoder function from https://gist.github.com/huksley/bc3cb046157a99cd9d1517b32f91a99e?permalink_comment_id=5132769#gistcomment-5132769. It works for me.
-
That worked for me! I have it in my code now and it allows me to fetch the articles as before. Hopefully, this is a nice, permanent fix. Thank you guys for sharing :)
-
Great! Can someone make a pull request for this issue?
-
I thought Google had blocked the base64 approach, so I solved it another way: I get the original URL from Selenium's current_url.
-
Seems like this is happening again.
-
New solution available here for the decoding: I tested it and it seems to solve the issue.
-
How do you resolve the 429 timeout issues when decoding the URLs?
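One common way to cope with HTTP 429 responses is to retry with exponential backoff. The sketch below is only an illustration under stated assumptions: `RateLimitedError` and `call_with_backoff` are hypothetical names, not part of GNews or any decoder mentioned in this thread, and the decoding call itself is passed in as a plain function so that the retry logic stays independent of whichever decoder you use.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical error that a decoder wrapper raises on an HTTP 429 response."""


def call_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying with exponential backoff while it is rate limited."""
    for attempt in range(max_attempts):
        try:
            return func()
        except RateLimitedError:
            # Give up after the last allowed attempt.
            if attempt == max_attempts - 1:
                raise
            # Wait base_delay * 2^attempt plus random jitter, so that
            # parallel workers do not all retry at the same moment.
            delay = base_delay * 2 ** attempt + random.uniform(0, base_delay)
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

To use it, wrap your own decoding call so that it raises `RateLimitedError` when the response status is 429, then pass it in as `call_with_backoff(lambda: my_decode(url))`, where `my_decode` stands for whatever decoder you have. This does not remove the rate limit, it only spaces the requests out.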
-
In case anyone else comes across this issue, I found @SSujitX's solution to work like a charm. Although it is slower due to the rate limiting, it's a great way to get news retrieval working again. Perhaps the package could be integrated into GNews to allow others to get their articles?
-
@Isaaq-Khader I love you man. Thank you for sharing @SSujitX's solution. I have been searching all over the internet.
-
I have an automated process that searches for and downloads articles every few hours. As of July 19th, 2024, it stopped getting the article text. I traced some examples and the problem appears to be here:
https://github.com/ranahaani/GNews/blob/a322163a40a0db2294b68ab50b1a6243fb69d2d4/gnews/utils/utils.py#L25C15-L25C62
The Google News URL is supposed to be dereferenced to the original source URL, but that is not happening. If I manually decode the Google URL to the original source URL, things work as expected.
I'm unsure whether a change on the Google side or on my side broke this. For now I am inserting a base64 decode of the Google link into my processing pipeline. If there is a cleaner or more permanent fix, I'd like to hear it.
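For illustration, a base64 decode step of the kind described above might look roughly like the sketch below. It assumes the older link format, in which the last path segment of the Google News link is a URL-safe base64 token whose decoded bytes contain the source URL in plain text. The function name `decode_legacy_google_news_url` is made up for this example, and the approach returns `None` for tokens that do not embed the URL this way.

```python
import base64
import re
from typing import Optional
from urllib.parse import urlparse


def decode_legacy_google_news_url(source_url: str) -> Optional[str]:
    """Try to recover the source URL embedded in an old-style Google News link."""
    parsed = urlparse(source_url)
    parts = parsed.path.split("/")
    # Only handle links shaped like news.google.com/.../articles/<token>.
    if parsed.hostname != "news.google.com" or len(parts) < 2:
        return None
    if parts[-2] != "articles":
        return None
    token = parts[-1]
    # The token is usually unpadded, so restore the base64 padding first.
    padded = token + "=" * (-len(token) % 4)
    try:
        raw = base64.urlsafe_b64decode(padded)
    except ValueError:
        return None
    # Assumption: the URL sits in the decoded bytes as printable ASCII,
    # surrounded by non-printable framing bytes.
    match = re.search(rb"https?://[\x21-\x7e]+", raw)
    return match.group(0).decode("ascii") if match else None
```

Returning `None` instead of raising lets the pipeline fall back to another method when a link does not decode.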