MDN/Projects/Completed/GitHub Integration

From MozillaWiki

What do we want?

This feature would enable product teams to maintain documentation in GitHub and publish it to MDN. Users would read the docs on MDN, while maintainers would add and edit them in GitHub. The versions of the docs on MDN would be read-only.

Specifically, we'd want it to support the following things:

  • keep the documentation source for a product in GitHub, but publish the docs on MDN.
  • automate the publication of the docs to MDN when the source is changed.
  • publish the most recent build of the docs as well as the set of docs applicable to each formal release of the product.
  • indicate to users which version of the docs is applicable to the most recent release of the product.
  • enable users to download a zip of a version of the docs, for use offline.

Why?

  • edit-review workflow: GitHub has very nice tool support for creating, reviewing, and revising content. Changes can be reviewed and corrected before going live, so you avoid those "This article is in need of a technical review." banners.
  • editorial control: it's possible to control who gets to edit a page, and to make sure edits go through a particular channel. This means spam isn't a problem, and that it's possible to make sure doc updates are made consistently.
  • writers don't need to use MDN's editor. The MDN editor is fine, but when I use it I often find that the WYSIWYG editor doesn't quite give me the control I want, and I end up hand-writing HTML, which is not a nice thing to have to do.
  • the documentation is versioned alongside the code. This makes it simpler for versions of the documentation to track versions of the code. It's easier to say: this specific revision of the docs is applicable to this specific version of the product.
  • it's relatively easy to generate documentation from the source. This might be full-on in-source documentation, or just something like reading metadata structures out of the code (for example, the Add-on SDK docs read API stability and mobile support from the code that implements the APIs).

How will we measure the value?

What is the expected value and impact and how will we measure it?

  • Increase MDN reach?
    • More docs on MDN?
    • New audience coming to MDN?
  • Improve documentation quality?
    • Readers bounce around less?
  • What else?

How could we do it?

This requires a service running on MDN to listen for changes in the repo and update the MDN site when relevant changes happen. Some of this code can be generic, and some would have to be product-specific. I'd expect the product-specific code to live in the same repo as the product.

The generic service would need to know the name of each product and the name of the repo containing its documentation.
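The generic service's configuration could be as small as a table from repo to product. A minimal sketch, assuming the service identifies the sending repo from the `repository.full_name` field of GitHub's push payload; the `Product` class, the `PRODUCTS` table and `product_for_payload` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Product:
    """One product whose docs are published to MDN (hypothetical schema)."""
    name: str  # path component used on MDN, e.g. "addon-sdk"
    repo: str  # GitHub "owner/name" of the repo holding the doc sources


# Hypothetical registry, keyed by repo; it might really live in MDN's settings.
PRODUCTS: Dict[str, Product] = {
    p.repo: p
    for p in [Product(name="addon-sdk", repo="mozilla/addon-sdk")]
}


def product_for_payload(payload: dict) -> Optional[Product]:
    """Map a push payload to a registered product, or None if unknown."""
    full_name = payload.get("repository", {}).get("full_name")
    return PRODUCTS.get(full_name)
```

Pushes from repos that aren't registered simply return `None`, so the service can ignore them.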

URL Structure

On MDN, each product has a URL structure like this:

<base_url> + <product_name> + development/
<base_url> + <product_name> + latest/
<base_url> + <product_name> + version_N/
<base_url> + <product_name> + version_N-1/
<base_url> + <product_name> + version_N-2/
<base_url> + <product_name> + version_N-.../

<base_url> is defined by MDN, while <product_name> and the version syntax are defined by each product.

For example:

/en-US/docs/Mozilla/addon-sdk/development/index.html
/en-US/docs/Mozilla/addon-sdk/latest/index.html
/en-US/docs/Mozilla/addon-sdk/1.14/index.html
/en-US/docs/Mozilla/addon-sdk/1.13/index.html
  • The subtree under "development" is for the latest, bleeding-edge, revision of the docs, and usually isn't part of a formal release.
  • Each subtree under a version identifier is the version of the docs for that released version of the product.
  • The subtree under "latest" is an alias pointing to the most recent released version: so in the example above it would point to 1.14.
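Serving a request then means mapping the version component of the URL to a stored subtree. A minimal sketch, assuming `<base_url>` is `/en-US/docs/Mozilla/` as in the examples, and that the service keeps the release identifiers in publication order, so the last one is what "latest" points to. `doc_url` and `resolve_version` are hypothetical names; as in the example URLs, a `/` separates the product name from the version.

```python
from typing import List, Optional

BASE_URL = "/en-US/docs/Mozilla/"  # assumed value of <base_url>


def doc_url(product: str, version: str, page: str = "index.html") -> str:
    """Build <base_url> + <product_name> + <version>/ + page."""
    return f"{BASE_URL}{product}/{version}/{page}"


def resolve_version(requested: str, releases: List[str]) -> Optional[str]:
    """Return the stored subtree a version component should be served from.

    "development" and known release identifiers map to themselves; "latest"
    is an alias for the most recent release (the last entry of `releases`).
    None means there is nothing to serve (e.g. a 404).
    """
    if requested == "latest":
        return releases[-1] if releases else None
    if requested == "development" or requested in releases:
        return requested
    return None
```

Keeping "latest" as an alias resolved at request time, rather than as a copy, means a new release only has to update one value.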

Post-receive hook

The product's GitHub repo includes a post-receive hook that posts to an MDN URL. The service running at that URL looks at the JSON data about the commit to decide if:

  1. it changed any docs
  2. it represented a new release of the product, and if so, what the version identifier for this release is


The way to answer both these questions is probably product-specific, as it depends on how products organize docs and code, and on how they tag releases. So the generic service would have to ask a product-specific piece of code these questions.
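As one example of what the product-specific answers might look like, here is a sketch for a product that (by assumption) keeps its doc sources under a `doc/` directory and marks each release by pushing a git tag named after the version. It reads the `ref` field and the per-commit `added`/`modified`/`removed` file lists of GitHub's push payload; `DOCS_PREFIX`, `TAG_PREFIX`, `PushAnalysis` and `analyze_push` are hypothetical.

```python
from typing import NamedTuple, Optional

DOCS_PREFIX = "doc/"       # hypothetical: where this product keeps its doc sources
TAG_PREFIX = "refs/tags/"  # assumption: each release is pushed as a git tag


class PushAnalysis(NamedTuple):
    changed_docs: bool
    release: Optional[str]  # version identifier, or None if not a release


def analyze_push(payload: dict) -> PushAnalysis:
    """Answer: did this push touch the docs, and is it a new release?"""
    changed_docs = any(
        path.startswith(DOCS_PREFIX)
        for commit in payload.get("commits", [])
        for key in ("added", "modified", "removed")
        for path in commit.get(key, [])
    )
    ref = payload.get("ref", "")
    release = ref[len(TAG_PREFIX):] if ref.startswith(TAG_PREFIX) else None
    return PushAnalysis(changed_docs, release)
```

A product that tags releases differently (say `v1.14`, or release branches) would only need to change this function, not the generic service.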

If either is true, the service:

  • clones the repo
  • calls a function in the repo that generates the docs
  • copies the generated docs to "<base_url> + <product_name> + development/", replacing what was there before


If this commit also represents a new release, the service creates a new directory for the release like "<base_url> + <product_name> + <version_identifier>", copies the generated docs there as well, and updates the alias at "<base_url> + <product_name> + latest/" to point to this release.
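The publication steps above can be sketched against an in-memory stand-in for MDN's storage. This assumes the caller has already cloned the repo (`checkout` is the path to that clone) and that the product's doc generator returns a mapping from page path to HTML; `ProductSite`, `publish` and `build_docs` are hypothetical. The "latest" alias is modelled as a field naming the release it points to, rather than as another copy of the docs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional

Docs = Dict[str, str]  # relative page path -> generated HTML
BuildFn = Callable[[str, Optional[str]], Docs]


@dataclass
class ProductSite:
    """In-memory stand-in for one product's subtree under <base_url>."""
    trees: Dict[str, Docs] = field(default_factory=dict)
    latest: Optional[str] = None  # release the "latest/" alias points to


def publish(site: ProductSite, build_docs: BuildFn, checkout: str,
            changed_docs: bool, release: Optional[str]) -> bool:
    """Apply the publication steps; return False if nothing was published."""
    if not changed_docs and release is None:
        return False
    docs = build_docs(checkout, release)
    # Replace whatever was previously under development/.
    site.trees["development"] = dict(docs)
    if release is not None:
        # New directory for the release, then repoint the alias.
        site.trees[release] = dict(docs)
        site.latest = release
    return True
```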

Obsoleting old releases

It would be great if users visiting old versions of the docs were non-intrusively informed of this fact and offered a link to the latest version. The SDK docs do this at the moment (for example: https://addons.mozilla.org/en-US/developers/docs/sdk/1.10/packages/addon-kit/page-mod.html), but I have to add this when I generate the docs.

Perhaps the server could look at the version component of the URI and insert a "this page is obsolete" notice if the version != the current value of the "latest" alias?
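This check is cheap to do at serve time. A sketch, assuming the layout from the URL Structure section with a base URL of `/en-US/docs/Mozilla/`; it also exempts `development/`, since bleeding-edge docs aren't obsolete. The function names and the notice markup are hypothetical.

```python
import html
from typing import Optional, Tuple

BASE_URL = "/en-US/docs/Mozilla/"  # assumed value of <base_url>


def split_doc_path(path: str, product: str) -> Optional[Tuple[str, str]]:
    """Split <base_url><product>/<version>/<page> into (version, page)."""
    prefix = f"{BASE_URL}{product}/"
    if not path.startswith(prefix):
        return None
    version, _, page = path[len(prefix):].partition("/")
    return version, page


def add_obsolete_notice(path: str, product: str, page_html: str,
                        latest: str) -> str:
    """Prepend an "obsolete" notice unless the page belongs to a current tree."""
    parts = split_doc_path(path, product)
    if parts is None:
        return page_html
    version, page = parts
    if version in ("latest", "development", latest):
        return page_html
    # Link to the same page under the latest/ alias.
    latest_url = f"{BASE_URL}{product}/latest/{page}"
    notice = (
        f'<div class="obsolete-notice">This page documents version '
        f"{html.escape(version)}, which is obsolete. "
        f'<a href="{html.escape(latest_url)}">See the latest version.</a></div>'
    )
    return notice + page_html
```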

Supporting downloads

It would be great to enable users to download a zipped version of the docs for a given release. I'm not quite sure how this would be done. Perhaps the product-specific docs code could embed a relative link to the docs in the pages it generates (for example, it could be at <base_url> + <product_name> + downloads/ + <version_identifier>.zip) and the generic code could handle zipping up the docs and copying them under there?
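The zipping half is straightforward with Python's standard `zipfile` module. A sketch, assuming the generated docs for a release are available as a mapping from relative page path to HTML, and that the archive is stored at `<product_name>/downloads/<version_identifier>.zip` as suggested above; `zip_docs` and `download_path` are hypothetical names.

```python
import io
import zipfile
from typing import Dict

Docs = Dict[str, str]  # relative page path -> generated HTML


def zip_docs(docs: Docs, version: str) -> bytes:
    """Return a zip archive of one release's generated docs.

    Entries go under a top-level "<version>/" folder, so unzipping yields
    a single self-contained directory for offline use.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for page, content in sorted(docs.items()):
            zf.writestr(f"{version}/{page}", content)
    return buf.getvalue()


def download_path(product: str, version: str) -> str:
    """Relative location of the archive; MDN would prepend <base_url>."""
    return f"{product}/downloads/{version}.zip"
```

Because the archive keeps the same relative layout as the published tree, pages that link to each other with relative links should keep working offline.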

Product-specific versus generic code

I'm not sure where we should draw the line between product-specific and generic code. In this proposal, I think the product-specific code implements two interfaces:

  • one that takes the JSON payload of the post-receive hook, and tells the generic code whether this commit changed the docs and whether it represents a new release, and if so, what the release identifier is
  • one that takes an optional release identifier and a path to a copy of the repo, and builds the docs from that repo


For the second interface, the code could be taken from the repo itself. For the first, it can't, unless the generic service is happy to clone the repo on every commit (which might be OK, depending on how large and active the repo is).
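The two interfaces could be written down as structural types, so the generic code never needs to know anything else about a product. A sketch using Python's `typing.Protocol`; `PushAnalyzer`, `DocsBuilder`, `PushAnalysis` and `handle_push` are hypothetical names, and cloning the repo is left to the caller.

```python
from typing import Dict, NamedTuple, Optional, Protocol

Docs = Dict[str, str]  # relative page path -> generated HTML


class PushAnalysis(NamedTuple):
    changed_docs: bool
    release: Optional[str]  # release identifier, if this push is a release


class PushAnalyzer(Protocol):
    """Interface 1: inspect the post-receive hook's JSON payload."""
    def analyze(self, payload: dict) -> PushAnalysis: ...


class DocsBuilder(Protocol):
    """Interface 2: build the docs from a local copy of the repo."""
    def build(self, checkout: str, release: Optional[str] = None) -> Docs: ...


def handle_push(payload: dict, analyzer: PushAnalyzer, builder: DocsBuilder,
                checkout: str) -> Optional[Docs]:
    """Generic code: talks to the product only through the two interfaces."""
    result = analyzer.analyze(payload)
    if not result.changed_docs and result.release is None:
        return None
    return builder.build(checkout, result.release)
```

Since `Protocol` is structural, a product's classes don't need to inherit from anything; they only need methods with these shapes.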

Docs as a Service

We could implement Docs-as-a-Service from MDN. There are a number of existing projects and platforms that we could re-use roughly as follows:

  1. Create an MDN theme/layout for the platform
  2. Stand up our own server process for the platform
    1. Generates docs
    2. Publishes them under MDN
    3. Indexes them to our search index
  3. Expose our server process as a GitHub post-receive hook

When the platform is set up, GitHub projects can add MDN as a post-receive hook.
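For step 2.3, the essential operation is adding each generated page to an index keyed by the words it contains. The sketch below is a toy inverted index standing in for whatever search backend MDN actually uses (an assumption; nothing here reflects that backend's API), with crude tag stripping and no ranking. `tokenize` and `SearchIndex` are hypothetical names.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

Docs = Dict[str, str]  # published URL -> page HTML


def tokenize(text: str) -> List[str]:
    # Crudely drop tags, lower-case, and split into alphanumeric words.
    return re.findall(r"[a-z0-9]+", re.sub(r"<[^>]+>", " ", text).lower())


class SearchIndex:
    """Toy inverted index: word -> set of URLs containing it."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add_docs(self, docs: Docs) -> None:
        for url, text in docs.items():
            for word in tokenize(text):
                self.postings[word].add(url)

    def search(self, query: str) -> List[str]:
        """URLs containing every query word, sorted for stable output."""
        words = tokenize(query)
        if not words:
            return []
        hits = set.intersection(*(self.postings.get(w, set()) for w in words))
        return sorted(hits)
```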

Readthedocs.org

Read the Docs is an open-source service built on the Sphinx documentation platform. It supports docs written in reStructuredText, and integrates with GitHub by way of post-receive webhooks.

In short, it's almost exactly what we need, and it's already being put to good use by a lot of projects at Mozilla. There are a few things that MDN would need to add, though:

  1. Support for formats other than reStructuredText, so we can "be where the writers are" rather than expect everyone to use reST. Markdown is the most obvious target here.
  2. We need some glue code & theme work to integrate with MDN in general.
  3. Docs managed by this system should feed into the results of the in-house search feature we're developing right now.
  4. It might be interesting to make KumaScript available as an option for these docs.


I think it's an open question whether we accomplish this by extending readthedocs.org or by rolling our own thing. I'd much prefer us to use readthedocs.org, and to give any improvements back. But from a brief look at readthedocs.org, it seems to depend very heavily on the docs using Sphinx/reStructuredText. So if it's important to us to support other formats (which it probably is) and other ways of generating a docset from a pile of source files, then it might be a big change.
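One way to avoid hard-wiring reStructuredText is to dispatch on the source file's extension, so a Markdown converter (or any other) can be registered without touching the rest of the pipeline. A minimal sketch; the registry and function names are hypothetical, and only a plain-text fallback is registered, because real converters would come from third-party packages such as docutils or a Markdown library.

```python
import html
import os
from typing import Callable, Dict

Converter = Callable[[str], str]  # source text -> HTML fragment

# Hypothetical registry: file extension -> converter.
CONVERTERS: Dict[str, Converter] = {}


def register(*extensions: str) -> Callable[[Converter], Converter]:
    """Decorator that registers a converter for the given extensions."""
    def deco(fn: Converter) -> Converter:
        for ext in extensions:
            CONVERTERS[ext] = fn
        return fn
    return deco


@register(".txt")
def plain_text(source: str) -> str:
    # Fallback: show the source verbatim, escaped.
    return f"<pre>{html.escape(source)}</pre>"


def convert(filename: str, source: str) -> str:
    """Pick a converter by (case-insensitive) file extension."""
    ext = os.path.splitext(filename)[1].lower()
    converter = CONVERTERS.get(ext)
    if converter is None:
        raise ValueError(f"no converter registered for {ext!r}")
    return converter(source)
```

Adding Markdown support would then be one more decorated function, e.g. registered for `.md`.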

Jekyll

Jekyll is another static site generator: it takes a directory of Markdown (or Textile) content and Liquid templates and renders them into a static website. It's the engine behind GitHub Pages.

Docco

Docco generates documentation from "Literate Code" and is starting to catch on in the JavaScript community.

Potential test subject: the Emscripten Wiki

Emscripten, Mozilla's LLVM-bytecode-to-JavaScript compiler, currently has its most complete documentation on the Emscripten Wiki at https://github.com/kripken/emscripten/wiki. This project already has a dedicated community of contributors, although the site currently doesn't look very professional and could do with some editing and structural improvements.

The MDN team and Emscripten engineers would like to improve on this situation. The former group would obviously prefer the content to be just ported over to an MDN zone, for the MDN editors to have their wicked way with. The latter group have some reservations:

  • The wiki on GitHub already has a dedicated community, who might get confused if it is suddenly uprooted and moved.
  • A lot of engineers prefer to write Markdown and host on GitHub, rather than use a WYSIWYG tool like MDN's.
  • They want Emscripten to stand on its own as an independent project, and not be seen as Mozilla-biased, which is a potential danger of putting it on MDN.
  • We could put some content on a dedicated Emscripten site and link to MDN for the rest, but users might find it odd to start on the main Emscripten site and then be sent to a different one.

Whatever happens, we need to put some content about Emscripten up on MDN anyway (probably a basic intro/hello world, written in a different style so as not to duplicate the main site too much). But it would be really amazing to be able to transclude some of the content from the Emscripten Wiki directly onto MDN, cutting out needless duplication and letting the Emscripten contributors write the docs their way and keep their separate site.
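A GitHub wiki is itself a git repository of Markdown pages, so transcluding part of one onto MDN largely comes down to pulling a single section out of a page. A minimal sketch, assuming ATX-style (`#`) headings and that the page text has already been fetched; `extract_section` is a hypothetical name, and lines inside fenced code blocks are skipped so that shell comments aren't mistaken for headings.

```python
import re
from typing import Optional

# ATX heading: 1-6 '#', text, optional closing '#'s preceded by whitespace.
HEADING = re.compile(r"^(#{1,6})\s+(.*?)(?:\s+#+)?\s*$")
FENCE = "`" * 3  # Markdown code-fence marker


def extract_section(markdown: str, title: str) -> Optional[str]:
    """Return the text under heading `title`, up to the next heading of the
    same or a higher level, or None if no such heading exists."""
    lines = markdown.splitlines()
    start = level = None
    in_fence = False
    for i, line in enumerate(lines):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
            continue
        if in_fence:
            continue
        m = HEADING.match(line)
        if not m:
            continue
        if start is None:
            if m.group(2) == title:
                start, level = i + 1, len(m.group(1))
        elif len(m.group(1)) <= level:
            return "\n".join(lines[start:i]).strip()
    if start is None:
        return None
    return "\n".join(lines[start:]).strip()
```

The extracted Markdown would still need converting to HTML before it could be shown in an MDN page.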

We have already created a sample Emscripten zone at https://developer.mozilla.org/en-US/docs/Emscripten, to add test content to as we discuss this project, and the best way to proceed.


References

https://etherpad.mozilla.org/github-mdn

https://wiki.mozilla.org/MDN/Development/GitBackend

  • This doesn't necessarily have to be an either-or choice between git and wiki. We could move the wiki to git and keep the current editor on MDN, while opening the content up to alternative editing styles via direct git access.