[metascraper-amazon] Image selector matches incorrect image #50
Yeah, of course, just add the right rule here: Can you specify the URL so we can create a unit test?
Hey @agchou, I think you created your own package to support this new custom rule. Can you share it with us? I want to improve this in the
Happy to accept improvements over
Yeah @Kikobeats, I just get a tiny 1-pixel image every time. How do we go about fixing this? Can the rule be overridden?
@andyk2177 we need to add a specific rule to cover that case. Please share the URL that is causing this behavior. We can also add a guard to ignore images smaller than N pixels.
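A minimal sketch of the "ignore images smaller than N pixels" guard, assuming each candidate image already comes with its width and height (for example, read from the element's attributes). The threshold `MIN_SIDE_PX`, the helpers `isLargeEnough` and `pickImage`, and the candidate shape are all hypothetical, not part of metascraper-amazon.

```javascript
// Hypothetical threshold: reject anything narrower or shorter than this.
const MIN_SIDE_PX = 50;

// Returns true when a candidate is large enough to be a real product image.
// Candidates with unknown dimensions are kept, since we cannot judge them.
function isLargeEnough({ width, height }, minSide = MIN_SIDE_PX) {
  if (width == null || height == null) return true;
  return width >= minSide && height >= minSide;
}

// Picks the first candidate URL that passes the size guard, or undefined.
function pickImage(candidates, minSide = MIN_SIDE_PX) {
  const match = candidates.find(c => c.url && isLargeEnough(c, minSide));
  return match ? match.url : undefined;
}

// Usage: the 1x1 placeholder pixel is skipped in favor of the real image.
const candidates = [
  { url: 'https://example.com/pixel.gif', width: 1, height: 1 },
  { url: 'https://example.com/product.jpg', width: 500, height: 500 },
];
console.log(pickImage(candidates)); // https://example.com/product.jpg
```

Keeping candidates with unknown dimensions is a judgment call; a stricter variant could reject them instead.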
Well, it seems to be any Amazon link for me, but here is an example: https://www.amazon.com/JNH-Lifestyles-Canadian-Hemlock-Infrared/dp/B00F2Y5B6W?tag=profiledotim-20
I think we might just need a more specific class name to grab?
@andyk2177 yes, you're right. The problem is that Amazon has many different product page layouts, so we need to set up the rules in a way that maximizes the chance of getting the proper image. Can you make a PR? All you need to do is add the specific image selector here.
OK, sure. But why are there two selectors? Which one takes priority? For example, with my URL above I get this image back, but the page does have a
The best way to determine that is to add a test for every link and make sure the output is what you expect.
I'm getting a "Robot Check" page for every Amazon product link I've tried. Is anyone else seeing this? Example URL: https://www.amazon.com/dp/B07SY4C5QF/ref=cm_sw_r_tw_apa_i_2qJLDbGGS3H0Q
@bobber205 it's probably because your
What kind of data are you interested in? It looks like almost all the data is there when using the Microlink API: https://api.microlink.io/?url=https%3A%2F%2Fwww.amazon.com%2Fdp%2FB07SY4C5QF
Good advice on setting the user agent! I've set it to
That's what Google says is the latest user agent for Chrome. I don't see "Robot Check" anymore, but for the image I get https://fls-na.amazon.com/1/batch/1/OP/ATVPDKIKX0DER:144-1080801-5689911:ADD1XNCC3BW7K9PR531T$uedata=s:%2Fdp%2FB07SY4C5QF%2Fref%3Dcm_sw_r_tw_apa_i_2qJLDbGGS3H0Q%2Fuedata%2Fnvp%2Funsticky%2F144-1080801-5689911%2FNoPageType%2Fntpoffrw%3Fstaticb%26id%3DADD1XNCC3BW7K9PR531T%26pty%3DDetail%26spty%3DGlance%26pti%3DB0798MSV1F:1000 (a large black image). :(
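The URL returned above does not look like a product image: it points at `fls-na.amazon.com/1/batch/...` with a `uedata` parameter, which appears to be a tracking or telemetry request. A minimal sketch of a filter that rejects such URLs before accepting an image is below. The host pattern `BEACON_HOST` and the helper `isCandidateImageUrl` are hypothetical, and the pattern is a heuristic based only on this one example, not a documented Amazon convention.

```javascript
// Heuristic, based only on the URL seen in this thread: Amazon hosts named
// like "fls-na.amazon.com" appear to serve tracking beacons, not images.
const BEACON_HOST = /^fls-[a-z]+\.amazon\.[a-z.]+$/i;

// Returns true if `value` is an absolute http(s) URL that is not a beacon.
function isCandidateImageUrl(value) {
  let url;
  try {
    url = new URL(value);
  } catch {
    return false; // not an absolute URL
  }
  if (url.protocol !== 'https:' && url.protocol !== 'http:') return false;
  return !BEACON_HOST.test(url.hostname);
}

console.log(isCandidateImageUrl('https://fls-na.amazon.com/1/batch/1/OP/ATVPDKIKX0DER')); // false
console.log(isCandidateImageUrl('https://example.com/images/product.jpg')); // true
```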
@Kikobeats I'm mostly looking for the image. The rest comes through great once I set the user agent.
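For reference, a sketch of fetching the page HTML with a browser-like `User-Agent` header before passing it to metascraper. It assumes Node 18+ (global `fetch`). `USER_AGENT` is a placeholder because the exact string used above isn't shown, and `buildRequestOptions` and `fetchHtml` are hypothetical helpers. Amazon may still serve a Robot Check page regardless of the header.

```javascript
// Placeholder: substitute a current desktop browser user-agent string.
const USER_AGENT = 'Mozilla/5.0 (placeholder; replace with a real browser UA)';

// Builds fetch options that send a browser-like User-Agent header.
function buildRequestOptions(userAgent = USER_AGENT) {
  if (!userAgent) throw new Error('a user agent is required');
  return { headers: { 'user-agent': userAgent }, redirect: 'follow' };
}

// Fetches the raw HTML for a URL (requires Node 18+ for global fetch).
async function fetchHtml(url, userAgent) {
  const response = await fetch(url, buildRequestOptions(userAgent));
  if (!response.ok) throw new Error(`HTTP ${response.status} for ${url}`);
  return response.text();
}

// Usage (not run here, since it hits the network):
// const html = await fetchHtml('https://www.amazon.com/dp/B07SY4C5QF');
// const metadata = await metascraper({ html, url: 'https://www.amazon.com/dp/B07SY4C5QF' });
```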
Hey! I'm getting everything I need except the product image from this Amazon URL. I looked at the selectors metascraper uses; those elements exist in the HTML but seem to be empty. The image that should be extracted has no class or id; it sits inside a div with the id "digitalMusicProductImage_feature_div". Example URL: https://www.amazon.de/Vienna-Bolling-Project-»Classic-Jazz«/dp/B003604LHE Is there anything we can do about this, @Kikobeats? Thanks!
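One way to cover this layout would be an extra rule that targets the `<img>` inside `#digitalMusicProductImage_feature_div`, written in the same `$ => $(selector).attr('src')` style as the metascraper-amazon rules (the package's `wrapUrl` wrapper is omitted). So the snippet runs on its own, `fakeCheerio` is a tiny hypothetical stand-in for cheerio that only recognizes exact selector strings; in the real package the rule would receive cheerio's `$` instead. The rule name `digitalMusicImage` is also hypothetical.

```javascript
// Candidate rule for the digital-music layout: the product image has no
// class or id, so select it through its container div.
const digitalMusicImage = $ => $('#digitalMusicProductImage_feature_div img').attr('src');

// Hypothetical stand-in for cheerio's `$`: maps exact selector strings to the
// attributes of the first matching element; unknown selectors match nothing.
function fakeCheerio(elementsBySelector) {
  return selector => ({
    attr: name => (elementsBySelector[selector] || {})[name],
  });
}

// Usage with a made-up page fragment.
const $ = fakeCheerio({
  '#digitalMusicProductImage_feature_div img': { src: 'https://example.com/album-cover.jpg' },
});
console.log(digitalMusicImage($)); // https://example.com/album-cover.jpg
```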
@pdesmarais perhaps https://microlink.io/docs/mql/getting-started/overview can help us out here?
@pdesmarais Have you tried setting the useProxy init variable to true?
@bobber205 where do you set that? I don't see it in the docs.
Ah, I was confusing this with the opengraph paid product. Sorry :(
I'm running into issues with the image value not being the main image for metascraper-amazon. There are actually multiple elements with the .a-dynamic-image class on the page, as seen in the attached photo. Can we create some rules with an explicit priority, like the following, where the second rule is used only when the first finds nothing?
wrapUrl($ => $('#landingImage').attr('src'))
wrapUrl($ => $('.a-dynamic-image').first().attr('src'))
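A sketch of the proposed priority order: try `#landingImage` first and fall back to the first `.a-dynamic-image` only when it is missing. The package's `wrapUrl` wrapper is omitted, and `fakeCheerio` is a small hypothetical stand-in for cheerio (exact selector strings mapped to lists of matching elements) so the example runs without dependencies. `resolveImage` is an illustrative helper, not metascraper's own rule runner, which as far as we know also takes the first rule that yields a value.

```javascript
// Hypothetical stand-in for cheerio's `$`: each selector maps to an array of
// matching elements' attributes. attr() reads the first match, like cheerio.
function fakeCheerio(elementsBySelector) {
  const select = matches => ({
    attr: name => (matches[0] ? matches[0][name] : undefined),
    first: () => select(matches.slice(0, 1)),
  });
  return selector => select(elementsBySelector[selector] || []);
}

// Proposed rules, most specific first.
const imageRules = [
  $ => $('#landingImage').attr('src'),
  $ => $('.a-dynamic-image').first().attr('src'),
];

// Returns the value from the first rule that produces one.
function resolveImage($) {
  for (const rule of imageRules) {
    const value = rule($);
    if (value) return value;
  }
  return undefined;
}

// Page with several .a-dynamic-image elements: #landingImage still wins.
const productPage = fakeCheerio({
  '#landingImage': [{ src: 'https://example.com/main.jpg' }],
  '.a-dynamic-image': [{ src: 'https://example.com/thumb.jpg' }, { src: 'https://example.com/other.jpg' }],
});
console.log(resolveImage(productPage)); // https://example.com/main.jpg
```

On a layout without `#landingImage`, the same rules fall through to the first `.a-dynamic-image`.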