Full HTML website #9

alexandruvesa · 2023-09-11T01:10:30Z

Hello!
First of all thanks for sharing the code with us!

Do you have any idea how to extend your code to process a whole website?
For example, to extract the content of a website that has ~107000 tokens.

Thanks,
Alex

GianfrancoCorrea · 2023-09-20T14:42:21Z

Hi @mediflux95!
I have thought about this many times. There are 2 main steps:
1- The first input takes an example of the HTML to extract from the website, for example an item from an Amazon store. The GPT bot creates an expected output format, which is a JSON with the relevant values of the example item, and then generates the scraping code.
2- The second input takes the whole HTML code and tests the generated scraping code against it.

The first input is hard to process automatically because of the number of tokens, but we can replace the second input with a field for pasting the URL, so you can run the code without copying and pasting the whole HTML.
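As a rough illustration of the URL idea, here is a minimal sketch that downloads a page's HTML with Python's standard library so it could be handed to the generated scraper. The function name `fetch_html` and the User-Agent string are made up for this example and are not part of the project.

```python
from urllib.request import Request, urlopen


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML as text.

    Hypothetical helper: it only sees the server-rendered HTML,
    so content added later by JavaScript will be missing.
    """
    # Some sites reject requests without a browser-like User-Agent.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0 (compatible; example-scraper)"})
    with urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned string would take the place of the pasted HTML in the second input.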

One problem we may face is pages with client-side rendering. I know there are some paid services for this, but I think packages like Selenium can also help.
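For client-side rendered pages, something along these lines might work with Selenium. This is only a sketch: it assumes the `selenium` package (version 4 or later) and a local Chrome installation, and `fetch_rendered_html` is a hypothetical name. Waiting for `document.readyState` is a simple heuristic and may not be enough for pages that keep loading data afterwards.

```python
def fetch_rendered_html(url: str, timeout: float = 15.0) -> str:
    """Return the page HTML after the browser has executed its JavaScript.

    Hypothetical helper; requires `pip install selenium` and Chrome.
    """
    # Imported here so the module still loads when selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--headless=new")  # assumes a recent Chrome; no visible window
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until the browser reports that the document has finished loading.
        WebDriverWait(driver, timeout).until(
            lambda d: d.execute_script("return document.readyState") == "complete"
        )
        return driver.page_source
    finally:
        driver.quit()  # always release the browser process
```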

Anyway, I would like to run the whole process from just the URL, but I haven't figured out yet how to handle the first step.
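One possible direction for the token problem, offered only as a sketch and not something the project does today: strip the parts of the HTML that rarely matter for scraping (scripts, styles, comments) before using it as input. The names `slim_html` and `estimate_tokens` are hypothetical, and the "about 4 characters per token" figure is a rough rule of thumb, not an exact count.

```python
import re
from html.parser import HTMLParser

# Tags whose content is usually irrelevant to the scraped data (an assumption).
SKIPPED_TAGS = {"script", "style", "noscript", "svg"}


class _Slimmer(HTMLParser):
    """Re-emits the HTML while dropping skipped tags and comments."""

    def __init__(self):
        super().__init__(convert_charrefs=False)  # keep entities such as &amp; as written
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags never open a skipped region.
        if self.skip_depth == 0 and tag not in SKIPPED_TAGS:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Collapse whitespace runs; they cost tokens but carry no data.
            self.parts.append(re.sub(r"\s+", " ", data))

    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.parts.append(f"&#{name};")


def slim_html(html: str) -> str:
    """Return the HTML without scripts, styles, SVG, or comments."""
    parser = _Slimmer()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def estimate_tokens(text: str) -> int:
    """Very rough estimate, assuming about 4 characters per token."""
    return len(text) // 4
```

Comparing `estimate_tokens(html)` with `estimate_tokens(slim_html(html))` would show whether a page shrinks enough to fit, or whether only a sample of it can be sent.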

Feel free to propose ideas to implement it if you are interested.

Regards,
Gian
