Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

raw images first #76

Open
wants to merge 3 commits into
base: master
Choose a base branch
from
Open

raw images first #76

wants to merge 3 commits into from

Conversation

liyiecho
Copy link

@liyiecho liyiecho commented Mar 19, 2018

Fixes: #65

Copy link
Owner

@dixudx dixudx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the redundant binary file.

@@ -68,8 +68,27 @@ def run(self):
def download(self, medium_type, post, target_folder):
try:
medium_url = self._handle_medium_url(medium_type, post)
medium_url_bak = medium_url
medium_url =re.sub(u'[^/]*media.tumblr.com', u'data.tumblr.com', medium_url)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why changing to this?

@@ -68,8 +68,27 @@ def run(self):
def download(self, medium_type, post, target_folder):
try:
medium_url = self._handle_medium_url(medium_type, post)
medium_url_bak = medium_url
medium_url =re.sub(u'[^/]*media.tumblr.com', u'data.tumblr.com', medium_url)
if (b'_100.' in medium_url):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't like this exhaustive way. Hard coded is not a good choice.

Why not splitting the string and replacing with raw instead?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And you should not change at here. Only photos/images are applicable with raw.

Method _download(**) is the right place.

@@ -68,8 +68,23 @@ def run(self):
def download(self, medium_type, post, target_folder):
try:
medium_url = self._handle_medium_url(medium_type, post)
if medium_url is not None:
self._download(medium_type, medium_url, target_folder)
#print("medium url is %s", medium_url)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove this comment line.

self._download(medium_type, medium_url, target_folder, resp_raw)
elif medium_type == "photo":
medium_url_bak = medium_url
medium_url_dot = medium_url.split('.')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The url parsing here seems complex and error-prone.

Below part is a better way. WDYT?

    def download(self, medium_type, post, target_folder):
        try:
            medium_url = self._handle_medium_url(medium_type, post)
            if medium_url is not None:
                if medium_type == "photo":
                    try:
                        # try to download raw image
                        medium_url_raw = medium_url.replace("68.media.tumblr.com", "data.tumblr.com")
                        raw_matched = self.hd_photo_regex.match(medium_url_raw)
                        if raw_matched is not None:
                            replace_raw = raw_matched.groups()[0]
                            replace_raw = replace_raw.replace(raw_matched.groups()[1], "raw")
                            medium_url_raw = medium_url_raw.replace(raw_matched.groups()[0], replace_raw)
                            self._download(medium_type, medium_url_raw, target_folder)
                            return
                    except:
                        pass
                self._download(medium_type, medium_url, target_folder)
        except TypeError:
            pass

    # can register differnet regex match rules
    def _register_regex_match_rules(self):
        # will iterate all the rules
        # the first matched result will be returned
        self.regex_rules = [video_hd_match(), video_default_match()]
        self.hd_photo_regex = re.compile(r".*(tumblr_\w+_(\d+))", re.IGNORECASE)

@liyiecho
Copy link
Author

medium_url_raw = medium_url.replace("68.media.tumblr.com", "data.tumblr.com")
It doesn't always 68.media.tumblr.com

@dixudx
Copy link
Owner

dixudx commented Mar 19, 2018

It doesn't always 68.media.tumblr.com

@liyiecho So just use regex to match and replace it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants