Services/Shavar

From MozillaWiki

Shavar, the mighty!

Contrary to popular belief, Shavar is not the name of a mini-boss in the latest World of Warcraft expansion.

It is Mozilla's service that speaks a wire protocol designed for dynamic updates of simple lists of URLs. Originally designed for phishing protection, the protocol was co-opted so that the tracking protection project could publish larger data sets without incurring heavy bandwidth usage on mobile clients.

This page is intended for client-side developers and others who need to interact with the service at a programmatic level.

List names and types

Names

In shavar, a list's name must follow a particular structure:

   <custom identifier>-<list type>-<list format>
custom identifier
a short string identifying the purpose of the list. This can be pretty much any sequence of lowercase ASCII letters.
list type
a string which determines how this list will be treated in the client. "track" and "trackwhite" are currently in use for the tracking protection data while "block" is used for the plugin stability blocklist. Talk to a Safe Browsing module peer if you'd like to create a new list type.
list format
one of two formats at the time of writing: shavar and sha256. More on the formats below.
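
The naming structure above can be checked mechanically. The following is a minimal sketch, not code from the shavar repository: `parse_list_name`, `ListName` and `KNOWN_FORMATS` are hypothetical names, and the sketch assumes that the list type, like the custom identifier, consists only of lowercase ASCII letters.

```python
import re
from typing import NamedTuple

# The two formats named in the text; treated here as the complete set.
KNOWN_FORMATS = {"shavar", "sha256"}

# Assumption: identifier and list type are runs of lowercase ASCII letters;
# the format may also contain digits (as in "sha256").
_NAME_RE = re.compile(r"([a-z]+)-([a-z]+)-([a-z0-9]+)")


class ListName(NamedTuple):
    identifier: str
    list_type: str
    list_format: str


def parse_list_name(name: str) -> ListName:
    """Split '<custom identifier>-<list type>-<list format>' into its parts."""
    match = _NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"not a three-part list name: {name!r}")
    parsed = ListName(*match.groups())
    if parsed.list_format not in KNOWN_FORMATS:
        raise ValueError(f"unknown list format: {parsed.list_format!r}")
    return parsed
```

For a hypothetical list called `example-track-shavar`, `parse_list_name` yields the identifier `example`, the list type `track` and the list format `shavar`. The sketch deliberately does not validate the list type against a fixed set, since new types can be agreed with a Safe Browsing module peer.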

List formats

The two list formats currently supported are named shavar and sha256. Both publish hashes of the actual data set rather than the data itself, but they do so in slightly different ways. shavar lists use the hash-prefix style of publication described in the Safe Browsing protocol specification, publishing only the first 4 bytes of each hash, while sha256 lists publish the entire hash (all 32 bytes). As a result, a client that finds a matching prefix in a shavar-format list has to "phone home" to the service to fetch the entire hash.
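
To make the size difference concrete, here is a minimal sketch of the two publication styles using Python's standard `hashlib`. The helper names are hypothetical, and the sketch hashes the entry string as given: it assumes the entry has already been canonicalized as the Safe Browsing specification requires, and it skips that step entirely.

```python
import hashlib
from typing import Collection

PREFIX_LENGTH = 4  # bytes published per entry in a shavar-format list


def full_hash(entry: str) -> bytes:
    """Return the 32-byte SHA-256 digest, as published in a sha256-format list."""
    return hashlib.sha256(entry.encode("utf-8")).digest()


def hash_prefix(entry: str) -> bytes:
    """Return the first 4 bytes of the digest, as published in a shavar-format list."""
    return full_hash(entry)[:PREFIX_LENGTH]


def needs_full_hash_lookup(entry: str, published_prefixes: Collection[bytes]) -> bool:
    """True when the entry's prefix is in the list, so the client must ask
    the service for the full hash before it can confirm the match."""
    return hash_prefix(entry) in published_prefixes
```

A prefix match is only a candidate match: different entries can share the same 4-byte prefix, which is why a client using a shavar-format list confirms against the full hash, while a client using a sha256-format list can compare all 32 bytes locally.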

How to publish a new data set via the Safe Browsing protocol at Mozilla

The shavar service requires that any data to be published is accessible through a git repository. These are the basics of setting up a new repository.

The repository

1. Create a new GitHub repository for your list. Best practice is to leave it completely empty. Make a note of the SSH URL for the repository.

2. Grab a copy of shavar, which contains the script used to populate an empty repository

    git clone https://github.com/mozilla-services/shavar
    cd shavar

3. Create a virtual environment so we don't modify things on your machine permanently

   virtualenv .
   . bin/activate

4. Download all the necessary dependencies

   python setup.py develop

5. Run the script that will create the skeleton of a new list's repository. Chances are very good that you have no need to deviate from the defaults.

   python scripts/mknewlist <name of the new list> \
       [shavar or sha256 (shavar by default)] \
       [organizational identifier prefix ("moz" by default)] \
       [-d path for the local working copy of the repository (data/<list name>)]

6. You can now stop using the virtual environment

   deactivate

7. Populate the repository and push it to the master copy

   cd data/<list name>
   git remote add origin <ssh URL for the repository from step 1>
   git add <list name>.txt
   git commit -m 'Initial data commit'
   git push origin master

By default, the input file name is <list name>.txt and is expected to contain one URL per line in the file. Populate this file as desired. If another filename is preferred, update publish.ini.

8. If you chose to create the list's local repository somewhere outside of the shavar directory tree, you can now delete the entire shavar repository.

9. Open a new bug in Bugzilla under Mozilla Services -> Operations requesting that the new list repository be added to the publishing schedule. Make certain to include the list name and the URL for the new list's repository.
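
The input file described in step 7 is plain text with one URL per line. As a rough illustration of that format, the sketch below renders a collection of entries into file contents. `render_list_file` is a hypothetical helper, not part of shavar. Dropping blank entries, removing duplicates and sorting are choices made for this example only, on the assumption that stable output keeps git diffs small.

```python
from typing import Iterable


def render_list_file(entries: Iterable[str]) -> str:
    """Render entries as file contents with exactly one entry per line."""
    # Strip surrounding whitespace and collapse duplicates.
    cleaned = {entry.strip() for entry in entries}
    cleaned.discard("")
    for entry in cleaned:
        # An embedded space or newline would break the one-per-line format.
        if any(ch.isspace() for ch in entry):
            raise ValueError(f"entry contains whitespace: {entry!r}")
    # Sorted output keeps successive commits easy to compare.
    return "".join(f"{entry}\n" for entry in sorted(cleaned))
```

The returned string can be written to `<list name>.txt` in the list's working copy before running `git add`.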

Historic details

Shavar actually implements the Safe Browsing API/wire protocol as developed by Mozilla and Google some years ago.