HTMLmetadata

Extract metadata from html pages using Open Graph metadata, HTML metadata, and a series of fallbacks

Inspired in https://metascraper.js.org

Install

pip install htmlmetadata

Use

You can use it by calling the module directly.

python -m htmlmetadata http://schema.org/docs/about.html                                                                            
{
  "request": {
    "url": "http://schema.org/docs/about.html"
  },
  "summary": {
    "description": "Schema.org is a set of extensible schemas that enables webmasters to embed\n    structured data on their web pages for use by search engines and other applications.",
    "title": "about page - schema.org",
    "language": "en"
  }
}

Or use it directly in your code.

from htmlmetadata import extract_metadata

data = extract_metadata("http://schema.org/docs/about.html")

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
.github/workflows		.github/workflows
htmlmetadata		htmlmetadata
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
setup.cfg		setup.cfg
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

HTMLmetadata

Install

Use

About

Releases 2

Packages

Contributors 3

Languages

License

mariocesar/htmlmetadata

Folders and files

Latest commit

History

Repository files navigation

HTMLmetadata

Install

Use

About

Topics

Resources

License

Stars

Watchers

Forks

Releases 2

Packages 0

Contributors 3

Languages

Packages