Hi @mediflux95!
I've thought about this many times. There are two main steps:
1- The first input takes an example of the HTML to extract from the website, for example a single item from an Amazon store. From it, the GPT bot creates an expected output format, a JSON with the relevant values of the example item, and then generates the scraping code.
2- The second input takes the whole HTML code and tests the generated scraping code against it.
The first input is hard to process automatically because of the number of tokens, but we can replace the second input with a field to paste the URL, so you can run the code without copying and pasting the whole HTML code.
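As a rough illustration of the URL idea, here is a minimal sketch, not the project's actual code. It assumes the generated scraping code can be called as a function that takes an HTML string. The names `fetch_html` and `run_scraper` are made up for this example, and only the Python standard library is used. Note that it only returns the HTML the server sends, so it does not handle client-side rendering.

```python
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10.0):
    """Download a page and return its HTML as text (hypothetical helper)."""
    # Some sites reject requests without a browser-like User-Agent header.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def run_scraper(url, scraper, fetch=fetch_html):
    """Fetch the page at `url` and pass its HTML to the generated scraper.

    `scraper` stands in for the generated scraping code: any callable that
    takes an HTML string and returns the extracted data. `fetch` can be
    swapped out, e.g. for a browser-based fetcher or a stub in tests.
    """
    html = fetch(url)
    return scraper(html)
```

With this shape, the second input becomes something like `run_scraper(url, generated_scraper)` instead of pasting the HTML by hand, and the `fetch` argument leaves room to plug in a different way of getting the page later.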
The problem we may face is pages with client-side rendering. I know there are some paid services, but there are also packages like Selenium that I think can help with this.
Anyway, I would like to run the whole process from just the URL, but I haven't figured out yet how to handle the first step.
Feel free to propose ideas to implement it if you are interested.
Hello!
First of all thanks for sharing the code with us!
Do you have any idea how to extend your code to process a whole website?
For example, to extract the content of a website that has ~107000 tokens.
Thanks,
Alex