Services/Shavar
Shavar, the mighty!
Contrary to popular belief, Shavar is not the name of a mini-boss in the latest World of Warcraft expansion.
It is Mozilla's service that speaks a wire protocol designed for dynamic updates of simple lists of URLs. Originally designed for phishing protection, the protocol was co-opted so that the tracking protection project could publish larger data sets without incurring large bandwidth usage for mobile clients.
This page is intended for client-side developers and others who need to interact with the service at a programmatic level.
List names and types
Names
In shavar, a list's name must have the following structure:
<custom identifier>-<list type>-<list format>
- custom identifier
- a short string identifying the purpose of the list. This can be pretty much any sequence of lowercase ASCII letters.
- list type
- a string which determines how this list will be treated in the client. "track" and "trackwhite" are currently in use for the tracking protection data while "block" is used for the plugin stability blocklist. Talk to a Safe Browsing module peer if you'd like to create a new list type.
- list format
- one of two formats at the time of writing: shavar and sha256. More on the formats below.
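As a rough illustration of the naming structure above, the following Python sketch splits a list name into its three parts. The function name `parse_list_name`, the `ListName` type, and the example name `example-track-shavar` are hypothetical and not part of shavar itself. The sketch also assumes that each part contains no hyphens and that only the two formats named on this page are valid.

```python
import re
from typing import NamedTuple

# The two list formats named on this page; extend this set if more are added.
KNOWN_FORMATS = {"shavar", "sha256"}

# Assumption: identifier and list type are lowercase ASCII letters only,
# and the three parts are separated by single hyphens.
_NAME_RE = re.compile(r"^([a-z]+)-([a-z]+)-([a-z0-9]+)$")


class ListName(NamedTuple):
    identifier: str
    list_type: str
    list_format: str


def parse_list_name(name: str) -> ListName:
    """Split '<custom identifier>-<list type>-<list format>' into its parts."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a three-part list name: {name!r}")
    identifier, list_type, list_format = match.groups()
    if list_format not in KNOWN_FORMATS:
        raise ValueError(f"unknown list format: {list_format!r}")
    return ListName(identifier, list_type, list_format)
```

For example, `parse_list_name("example-track-shavar")` yields a tuple whose `list_type` is `"track"` and whose `list_format` is `"shavar"`.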
List formats
The two list formats currently supported are named shavar and sha256. Both publish hashes of the actual data sets, but they do so in slightly different ways. shavar lists use the hash prefix style of publication described in the Safe Browsing protocol specification, publishing only the first 4 bytes of each hash, while sha256 lists publish the entire hash (all 32 bytes). As a result, a client using a list published in the shavar format has to "phone home" to the service to fetch an entire hash.
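The difference between the two formats can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and the sketch hashes its input string as-is, whereas the Safe Browsing protocol specification defines canonicalization steps for URLs that are omitted here.

```python
import hashlib

# Number of leading hash bytes published per entry in a shavar-format list.
PREFIX_LENGTH = 4


def full_hash(expression: str) -> bytes:
    """Return the complete 32-byte SHA-256 digest (what a sha256 list publishes)."""
    return hashlib.sha256(expression.encode("utf-8")).digest()


def hash_prefix(expression: str) -> bytes:
    """Return the first 4 bytes of the digest (what a shavar list publishes)."""
    return full_hash(expression)[:PREFIX_LENGTH]


def needs_full_hash_lookup(expression: str, published_prefixes: set) -> bool:
    """With a shavar list, a local prefix match is only a candidate match;
    the client must still ask the service for the entire hash."""
    return hash_prefix(expression) in published_prefixes
```

With a sha256 list the client already holds the 32-byte value, so it can compare against `full_hash(...)` locally without contacting the service.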
How to publish a new data set via the Safe Browsing protocol at Mozilla
The shavar service requires that the data to be published be accessible via a git repository. These are the basics of setting up a new repository.
The repository
1. Create a new GitHub repository for your list. Best practice is to leave it completely empty. Make a note of the SSH URL for the repository.
2. Grab a copy of shavar, which contains the script used to populate an empty repository
git clone https://github.com/mozilla-services/shavar
cd shavar
3. Create a virtual environment so we don't modify things on your machine permanently
virtualenv .
. bin/activate
4. Download all the necessary dependencies
python setup.py develop
5. Run the script that will create the skeleton of a new list's repository. Chances are very good that you have no need to deviate from the defaults.
python scripts/mknewlist <name of the new list> \
    [shavar or sha256 (shavar by default)] \
    [organizational identifier prefix ("moz" by default)] \
    [-d path for the local working copy of the repository (data/<list name>)]
6. You can now stop using the virtual environment
deactivate
7. Populate the repository and push it to the master copy
cd data/<list name>
git remote add origin <ssh URL for the repository from step 1>
git add <list name>.txt
git commit -m 'Initial data commit'
git push origin master
By default, the input file is named <list name>.txt and is expected to contain one URL per line. Populate this file as desired. If you prefer another filename, update publish.ini.
8. If you chose to create the list's local repository somewhere outside of the shavar directory tree, you can now delete the entire shavar repository.
9. Open a new bug in Bugzilla under Mozilla Services -> Operations requesting that the new list repository be added to the publishing schedule. Make certain to include the list name and the URL for the new list's repository.
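Because the input file from step 7 is expected to hold one URL per line, it can be worth checking the file before committing it. The Python sketch below is one possible check, not part of the shavar tooling: the function name `find_problem_lines` is hypothetical, and treating blank lines, extra tokens, and duplicates as problems is an assumption rather than a documented rule of the publisher.

```python
from typing import Iterable, List, Tuple


def find_problem_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, problem) pairs for lines that do not look like
    exactly one entry per line. Line numbers start at 1."""
    problems = []
    seen = set()
    for number, raw in enumerate(lines, start=1):
        entry = raw.strip()
        if not entry:
            problems.append((number, "blank line"))
        elif len(entry.split()) > 1:
            # Whitespace inside the line suggests two entries were merged.
            problems.append((number, "more than one token on the line"))
        elif entry in seen:
            problems.append((number, "duplicate entry"))
        else:
            seen.add(entry)
    return problems
```

The function accepts any iterable of strings, so an open file handle for `<list name>.txt` can be passed to it directly; an empty result means no problems were found under the assumptions above.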
Historic details
Shavar actually implements the Safe Browsing API/wire protocol as developed by Mozilla and Google some years ago.