updated tests and docs for html query type #753

Merged: 4 commits, Feb 5, 2024
`docs/sources/html.md`: 39 changes (22 additions, 17 deletions)

# Visualizing HTML data

{{< admonition type="caution" >}}
Use the HTML query type only to retrieve data from legacy systems for which no alternative API exists. Where possible, we strongly recommend using other query types such as JSON, CSV, or XML instead.
{{< /admonition >}}

![image](https://user-images.githubusercontent.com/153843/92399290-faabcf80-f121-11ea-9261-b06c708e81c0.png#center)

In the following example, we retrieve data from [this sample HTML page](https://github.com/grafana/grafana-infinity-datasource/blob/main/testdata/users.html).

Open the page in a browser, right-click the first element of the array you want to display, and inspect it. Then copy its selector and use it as your root (rows) selector.
In the query editor, fill in the following query details:

![image](https://user-images.githubusercontent.com/153843/92396876-ac94cd00-f11d-11ea-850d-f1754f980fc7.png#center)
1. Select **HTML** as the query type.
2. Select **Default** (frontend) as the parser.
3. Select **URL** as the source.
4. Select **GET** as the HTTP method.
5. Enter `https://github.com/grafana/grafana-infinity-datasource/blob/main/testdata/users.html` in the URL field of the query.
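
The five setup steps above correspond to fields of the saved query. The following minimal Python sketch shows them as a dictionary, assuming the field names used in the query JSON embedded in the play.grafana example link for this page (`type`, `source`, `format`, `url`, `url_options`); it illustrates that one saved query and is not a complete schema. That example query sets no parser field, so the sketch leaves it out as well.

```python
import json

# Field names copied from the query JSON in the play.grafana example link.
# A sketch of one saved query, not the datasource's full schema.
query = {
    "refId": "A",
    "type": "html",    # step 1: HTML query type
    "source": "url",   # step 3: URL as the source
    "format": "table",
    "url": "https://github.com/grafana/grafana-infinity-datasource/blob/main/testdata/users.html",  # step 5
    "url_options": {"method": "GET", "data": ""},  # step 4: GET method, empty request body
}
# Step 2 (Default/frontend parser) has no field here: the example query sets none.

print(json.dumps(query, indent=2))
```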

Once the above initial setup is done, configure the selectors. Individual properties of each row become the columns of the table, as shown in the example image, and you can select any element within the row context.

![image](https://user-images.githubusercontent.com/153843/92382094-f4a6f600-f103-11ea-8035-e1bbd9157629.png#center)
1. In the root selector field, enter a selector that returns an array of symmetrical elements, such as rows in a table or repeating `div` elements with the same structure. In our case, we enter the CSS selector `table:nth-child(1) tbody tr` as the root selector. Alternatively, if your HTML content has only one table, you can use `tr`. If the table has a unique selector such as an `id`, use that instead.
2. From the HTML structure, we know that each row contains several `td` elements, each representing one property of the user, so we need a selector that uniquely identifies the `td` element for each property.
3. Add a column and enter `td:nth-child(1)` in the selector field. Enter `Name` in the **as/alias** field. Leave the type as string.
4. Add another column and enter `td:nth-child(2)` in the selector field. Enter `Age` in the **as/alias** field. Because this value is a number, change the field type to number.
5. Add another column and enter `td:nth-child(3)` in the selector field. Enter `Country` in the **as/alias** field.
6. Likewise, add any other columns you need.
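
To make the relationship between the root selector and the column selectors concrete, here is a minimal Python sketch using only the standard-library `html.parser`. It is not how the datasource evaluates CSS selectors. It assumes a single table, approximates the root selector with "every `tr` inside `tbody`", treats `td:nth-child(n)` as "the n-th cell of the row" (valid only when every child of the row is a `td`), and approximates the number type with `float`. `SAMPLE_HTML`, `RowCollector`, `COLUMNS`, and `to_table` are hypothetical names, and the markup is invented rather than copied from `users.html`.

```python
from html.parser import HTMLParser

# Hypothetical sample markup; the real testdata/users.html may differ.
SAMPLE_HTML = """
<table>
  <thead><tr><th>Name</th><th>Age</th><th>Country</th></tr></thead>
  <tbody>
    <tr><td>Alice</td><td>30</td><td>Germany</td></tr>
    <tr><td>Bob</td><td>25</td><td>Canada</td></tr>
  </tbody>
</table>
"""


class RowCollector(HTMLParser):
    """Collects the text of each <td> in every <tr> inside <tbody>,
    roughly what a root selector like `tbody tr` picks out."""

    def __init__(self):
        super().__init__()
        self.in_tbody = False
        self.in_cell = False
        self.rows = []  # one list of cell texts per row

    def handle_starttag(self, tag, attrs):
        if tag == "tbody":
            self.in_tbody = True
        elif tag == "tr" and self.in_tbody:
            self.rows.append([])  # each matched row is one "root" element
        elif tag == "td" and self.in_tbody and self.rows:
            self.in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "tbody":
            self.in_tbody = False
        elif tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell:
            self.rows[-1][-1] += data


# Column definitions mirroring steps 3-5: (alias, 1-based td position, type).
COLUMNS = [("Name", 1, str), ("Age", 2, float), ("Country", 3, str)]


def to_table(html):
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    table = []
    for cells in parser.rows:
        record = {}
        for alias, position, cast in COLUMNS:
            # nth-child is 1-based; Python lists are 0-based.
            record[alias] = cast(cells[position - 1].strip())
        table.append(record)
    return table


print(to_table(SAMPLE_HTML))
# [{'Name': 'Alice', 'Age': 30.0, 'Country': 'Germany'}, {'Name': 'Bob', 'Age': 25.0, 'Country': 'Canada'}]
```

The header row is skipped because it sits in `thead`, which mirrors why the example root selector includes `tbody`.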

![image](https://user-images.githubusercontent.com/153843/92747321-fbd83900-f37b-11ea-8be9-9366386dc6e2.png#center)

Examples:

- `h4` --> h4 element will be selected
- `.team__title` --> Element with the class `team__title` will be selected
- `td:nth-child(4)` --> The `td` element that is the 4th child within the row context will be selected. This is useful when your element has no `id` and its class names are not unique.

An example of the above query is available on the [play.grafana](https://play.grafana.org/explore?schemaVersion=1&panes=%7B%22s9j%22:%7B%22datasource%22:%22infinity-universal%22,%22queries%22:%5B%7B%22refId%22:%22A%22,%22datasource%22:%7B%22type%22:%22yesoreyeram-infinity-datasource%22,%22uid%22:%22infinity-universal%22%7D,%22type%22:%22html%22,%22source%22:%22url%22,%22format%22:%22table%22,%22url%22:%22https:%2F%2Fgithub.com%2Fgrafana%2Fgrafana-infinity-datasource%2Fblob%2Fmain%2Ftestdata%2Fusers.html%22,%22url_options%22:%7B%22method%22:%22GET%22,%22data%22:%22%22%7D,%22root_selector%22:%22table:nth-child%281%29%20tbody%20tr%22,%22columns%22:%5B%7B%22text%22:%22Name%22,%22selector%22:%22td:nth-child%281%29%22,%22type%22:%22string%22%7D,%7B%22text%22:%22Age%22,%22selector%22:%22td:nth-child%282%29%22,%22type%22:%22number%22%7D,%7B%22text%22:%22Country%22,%22selector%22:%22td:nth-child%283%29%22,%22type%22:%22string%22%7D,%7B%22text%22:%22Occupation%22,%22selector%22:%22td:nth-child%284%29%22,%22type%22:%22string%22%7D,%7B%22text%22:%22Salary%22,%22selector%22:%22td:nth-child%285%29%22,%22type%22:%22number%22%7D%5D,%22filters%22:%5B%5D,%22global_query_id%22:%22%22%7D%5D,%22range%22:%7B%22from%22:%22now-6h%22,%22to%22:%22now%22%7D%7D%7D&orgId=1) site for reference.
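
The three selector forms in the list behave differently: a type selector such as `h4` matches by tag name, a class selector such as `.team__title` matches by class, and `td:nth-child(n)` matches a `td` that is the n-th (1-based) element child of its parent. The hypothetical Python sketch below (`Element`, `matches`, and `select` are illustrative names, and the team-card and row contents are invented) handles only these three forms against a flat list of children standing in for the row context; it is not the datasource's selector engine.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    tag: str
    classes: list = field(default_factory=list)
    text: str = ""


def matches(selector: str, element: Element, position: int) -> bool:
    """Match one simple selector against an element that is the
    `position`-th (1-based) element child of its parent.
    Supports only `tag`, `.class`, and `tag:nth-child(n)`."""
    if selector.startswith("."):
        return selector[1:] in element.classes
    if ":nth-child(" in selector:
        tag, _, rest = selector.partition(":nth-child(")
        n = int(rest.rstrip(")"))
        # Both must hold: the right tag AND the right position.
        return element.tag == tag and position == n
    return element.tag == selector


def select(selector: str, children: list) -> list:
    """Return the text of every child in the row context that matches."""
    return [el.text for pos, el in enumerate(children, start=1)
            if matches(selector, el, pos)]


# Invented row contexts for illustration.
team_card = [Element("h4", ["team__title"], "Jane Doe"),
             Element("p", ["team__role"], "Engineer")]
table_row = [Element("td", text="Alice"), Element("td", text="30"),
             Element("td", text="Germany"), Element("td", text="Engineer")]

print(select("h4", team_card))               # ['Jane Doe']
print(select(".team__title", team_card))     # ['Jane Doe']
print(select("td:nth-child(4)", table_row))  # ['Engineer']
```

Note that `td:nth-child(1)` matches nothing in `team_card`: its first child is an `h4`, not a `td`.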

## Limitations

- To scrape content that is loaded via AJAX, use the [JSON type](/docs/json) in the query instead.
- Only symmetrical data can be queried. (For example, a `table` with `colspan` or `rowspan` will break the query.)
- Only the text of elements can be queried. Retrieving HTML attributes is not supported.
- If you prefer to use the **backend** parser for the HTML query type, be aware that it is experimental and subject to breaking changes. Also, only HTML pages that are compatible with XML syntax can be used with the backend parser.
- Websites may block you or your IP address if you scrape at a high frequency or refresh rate. Be sensible and responsible when setting your refresh limits.
- Caching is not implemented, so be aware of the rate limits.
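
The `colspan` limitation follows from positional column selectors: a merged cell shifts every later cell one position to the left, so `td:nth-child(n)` returns the wrong value or nothing. The Python sketch below demonstrates this with the standard-library `xml.etree.ElementTree`, which works here only because the invented sample table is well-formed XML. `nth_child_text` is a hypothetical helper that mimics `td:nth-child(n)`, assuming every child of the row is a `td`.

```python
import xml.etree.ElementTree as ET

# Invented table: in the second row, one cell spans the Name and Age columns.
HTML = """<table>
  <tr><td>Alice</td><td>30</td><td>Germany</td></tr>
  <tr><td colspan="2">Bob</td><td>Canada</td></tr>
</table>"""


def nth_child_text(row, n):
    """Text of the n-th (1-based) child of a row, like td:nth-child(n); None if absent."""
    children = list(row)
    return children[n - 1].text if n <= len(children) else None


ages = []
countries = []
for row in ET.fromstring(HTML).iter("tr"):
    ages.append(nth_child_text(row, 2))       # "Age" column: td:nth-child(2)
    countries.append(nth_child_text(row, 3))  # "Country" column: td:nth-child(3)

print(ages)       # ['30', 'Canada']  ("Canada" lands in the Age column)
print(countries)  # ['Germany', None] (Bob's country is lost)
```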
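
Before switching to the backend parser, one rough way to check whether a page is "compatible with XML syntax" is to try parsing it as XML. The sketch below defines a hypothetical helper, `parses_as_xml`, on top of the standard-library `xml.etree.ElementTree`. It assumes that phrase means well-formed XML; the backend parser's actual requirements may differ.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(markup: str) -> bool:
    """Return True if the markup is well-formed XML.
    A rough pre-check only, not the backend parser's actual logic."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


well_formed = "<html><body><table><tr><td>Alice</td></tr></table><br/></body></html>"
unclosed_tag = "<html><body><table><tr><td>Alice</td></tr></table><br></body></html>"
html_entity = "<p>Alice&nbsp;Smith</p>"

print(parses_as_xml(well_formed))   # True
print(parses_as_xml(unclosed_tag))  # False: <br> is never closed
print(parses_as_xml(html_entity))   # False: &nbsp; is not a predefined XML entity
```

Typical HTML habits such as unclosed void elements and named HTML entities are the usual reasons a page fails this check.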