
ArchiveTeam/urls-grab


urls-grab

More information about the archiving project can be found on the ArchiveTeam wiki: URLs

Setup instructions

General instructions

Data integrity is very important in Archive Team projects. Please note the following important rules:

We strongly encourage you to join the IRC channel associated with this project in order to be informed about project updates and other important announcements, as well as to be reachable in the event of an issue. The Archive Team Wiki has more information about IRC. We can be found at hackint IRC #//.

If you have any questions or issues during setup, please review the wiki pages or contact us on IRC for troubleshooting information.

Running the project

Archive Team Warrior (recommended for most users)

This and other archiving projects can easily be run using the Archive Team Warrior virtual machine. Follow the instructions on the Archive Team wiki for installing the Warrior, and from the web interface running at http://localhost:8001/, enter the nickname that you want to be shown as on the tracker. There is no registration, just pick a nickname you like. Then, select the URLs project in the Warrior interface.

Project-specific Docker container (for more advanced users)

Alternatively, more advanced users can run projects using Docker. While Warrior users can switch between projects using a web interface, Docker containers are specific to each project. However, the Warrior supports a maximum of 6 concurrent items, whereas a Docker container supports a maximum of 20. The instructions below are a short overview. For more information and detailed explanations of the commands, follow the Docker instructions on the Archive Team wiki.

It is advised to use Watchtower to automatically update the project container:

docker run -d --name watchtower --restart=unless-stopped -v /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --label-enable --cleanup --interval 3600 --include-restarting

after which the project container can be run:

docker run -d --name archiveteam --label=com.centurylinklabs.watchtower.enable=true --log-driver json-file --log-opt max-size=50m --restart=unless-stopped atdr.meo.ws/archiveteam/urls-grab --concurrent 1 YOURNICKHERE

Be sure to replace YOURNICKHERE with the nickname that you want to be shown as on the tracker. There is no registration, just pick a nickname you like.
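
If you want to raise the concurrency or script the launch, a small dry-run helper can catch mistakes before anything starts. The sketch below is not part of the project: the function name `build_urls_grab_cmd` is hypothetical, and the 1 to 20 range is an assumption taken from the Docker limit mentioned above. It only prints the command and never runs Docker itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of urls-grab): print, but do not run, the
# project container command for a given nickname and concurrency.
# Assumption: valid concurrency is 1-20, per the Docker limit stated above.
build_urls_grab_cmd() {
    nick="$1"
    concurrent="${2:-1}"   # default to 1, as in the command above

    if [ -z "$nick" ]; then
        echo "error: a nickname is required" >&2
        return 1
    fi

    # Reject anything that is not a plain non-negative integer.
    case "$concurrent" in
        ''|*[!0-9]*)
            echo "error: concurrency must be a whole number" >&2
            return 1
            ;;
    esac

    if [ "$concurrent" -lt 1 ] || [ "$concurrent" -gt 20 ]; then
        echo "error: concurrency must be between 1 and 20" >&2
        return 1
    fi

    # Same flags as the documented command; only the last two values vary.
    echo "docker run -d --name archiveteam" \
        "--label=com.centurylinklabs.watchtower.enable=true" \
        "--log-driver json-file --log-opt max-size=50m" \
        "--restart=unless-stopped" \
        "atdr.meo.ws/archiveteam/urls-grab" \
        "--concurrent $concurrent $nick"
}
```

For example, `build_urls_grab_cmd mynick 4` prints the full command with `--concurrent 4 mynick` at the end, which you can review and then paste into your terminal. The helper does not check the nickname for spaces or special characters, so choose a simple one.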

Supporting Archive Team

Behind the scenes, Archive Team runs infrastructure to operate the projects and process the data. If you would like to help with the costs of this infrastructure, a donation on our Open Collective would be very welcome.

Issues in the code

If you notice a bug and want to file a bug report, please use the GitHub issues tracker.

Are you a developer? Help write code for us! Look at our developer documentation for details.

Other problems

Have an issue not listed here? Join us on IRC and ask! We can be found at hackint IRC #//.

About

Archiving URLs (outlinks) from a variety of sources.
