Test Pilot/Wayback Machine
Overview
The Internet Archive is interested in promoting the development, and supporting the operation, of browser functionality that shows archived versions of web pages that are no longer available or otherwise return an error, e.g. a 404.
Source Code: https://github.com/internetarchive/FirefoxNoMore404s
Contact: John Gruen
Resources
- Firefox alternative, sends to archive page if archive.org has the 404
- GitHub (XPI available here): https://github.com/internetarchive/FirefoxNoMore404s
- URLs that return a 404: https://docs.google.com/spreadsheets/d/1b69qmZYGL0gEYgh0g_7iconAzihTylKB4oPd-X71qhY/edit#gid=1339862914
- slide-down alternative; shows site's 404 page, slides down archive.org info if we have the 404
- movie of slide-down alternative in action: http://www.pbm.com/~lindahl/NoMore404Example.mov or https://www.youtube.com/watch?v=demZ1poPMLY
- GitHub: https://github.com/adam-miller/ChromeNoMore404s
- 2016-03-09 Firefox NoMore404s
Documentation
Browser-based 404 handling - an overview
Link rot (https://en.wikipedia.org/wiki/Link_rot) happens. Let's help deliver a more reliable web experience.
The problem: web sites come, and many go. When this happens, unsuspecting web surfers can encounter the equivalent of a dead-end sign with few or no suggested next steps.
A solution: Detect and intercept 404 conditions. Add functionality to the Firefox browser that checks whether content from the requested URL has been archived and is available via the Wayback Machine and, if it is, offers to render it.
The audience: Nearly everyone who browses the web encounters 404s from time to time.
The Internet Archive, a San Francisco-based non-profit, has been gathering, preserving, and presenting web captures for the past 19 years. As part of the Internet Archive's "No More 404s" project (https://blog.archive.org/2013/10/25/fixing-broken-links/) we propose to partner with Mozilla to offer users a clear and simple opportunity to see versions of otherwise unavailable pages they have requested, as they were captured from the live, public web.
Options we may want to consider include:
- User option of turning the feature on or off, e.g. via Preferences
- Option of prompting users before taking them to archived versions of pages
- Support for various 404 and broken link use cases
- Option of letting users browse multiple versions of archived pages
Mark Graham - Feb 18, 2016
No More 404s - Browser Support
The Internet Archive is interested in promoting the development, and supporting the operation, of browser functionality that shows archived versions of web pages that are no longer available or otherwise return an error, e.g. a 404.
We are now in conversation with Mozilla, and other web browser providers, about jointly developing, testing and learning about this capability. In our perfect world this functionality would be built natively into browsers, but at this point we are building plug-ins and extensions that demonstrate possible user experiences and can be used to gather real-world test experience.
The Internet Archive understands that providing the back-end services required to present alternative/archived pages, in place of broken pages that users request via the popular browsers, is a non-trivial proposition and could easily generate thousands of requests per second. While we do not have the capacity to support this level of traffic today, we are committed to adding the resources required to meet the demand if and when it is required.
To date we have built a Chrome plug-in that detects and intercepts 404 conditions and checks whether a copy of the requested page is available in the Wayback Machine. If the page is available, the user is presented with the option of viewing that backed-up version. There are various options for how this interaction can be managed, including auto-directing users to backed-up versions of 404 pages. We have shared this code with a small number of people and would be happy to expand the user community to get feedback from real users about how we might improve the service, as well as to help get more potential collaborators and supporters engaged.
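The check described above can be sketched in a few lines. This is a minimal illustration, not the plug-in's actual code: it assumes the lookup goes to the Internet Archive's public Wayback Availability API and that the response uses that API's `archived_snapshots.closest` shape. The function names and the "404 only" policy are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint: the Internet Archive's public Wayback Availability API.
# The source does not say which service the plug-in actually calls.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def should_check_archive(status_code: int) -> bool:
    """Hypothetical policy: only plain 404 responses trigger a lookup."""
    return status_code == 404


def availability_query(requested_url: str) -> str:
    """Build the lookup URL for a page the browser failed to load."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": requested_url})


def closest_snapshot(response_body: str) -> Optional[str]:
    """Return the archived copy's URL, or None if nothing usable is archived."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest.get("url")
```

A caller would fetch `availability_query(url)` only when `should_check_archive(status)` is true, then offer the user the link returned by `closest_snapshot`. An empty `archived_snapshots` object means there is nothing to offer.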
While the simple case of a requested page returning a 404 is common, there are a number of edge cases in which doing the "right" thing is not so straightforward. These include, but are not limited to:
- backed up versions of URLs that have changed ownership, or use, over time
- requests that return a valid page (with a result code of 200) but present a "page not found" or other site-defined error message (in effect a "soft 404")
- redirects to redirects to 404 results (we will want to direct people to the version from the Wayback Machine that is from the 1st page in the chain, not the last)
- redirects to valid pages from the same host but not what the users expected (e.g. to the homepage of a blog as opposed to a specific blog post)
- requests that fail as a result of a DNS or other network breakdown (differentiating between transient and long-lived DNS or other networking failures)
- certain special cases (e.g. geocities.com) where we can expect the user will want to see the old version of a page as opposed to the otherwise valid current version
- pages that contain embedded resources that cannot be presented (which may or may not be available or "important" enough to offer alternatives to)
- pages that have changed ownership where some of the phases of ownership have involved domain parking
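To make one of these edge cases concrete, here is a rough sketch of how a "soft 404" might be flagged. It is only a heuristic under stated assumptions: the phrase list is illustrative and not drawn from the project, and the function name is hypothetical.

```python
import re

# Illustrative phrases only; a real list would need tuning against real pages.
SOFT_404_PATTERN = re.compile(
    r"page (was )?not found|404 error|no longer (available|exists)",
    re.IGNORECASE,
)


def looks_like_soft_404(status_code: int, page_text: str) -> bool:
    """Flag 200 responses whose visible text reads like an error page."""
    if status_code != 200:
        return False  # a real error status is not a *soft* 404
    return SOFT_404_PATTERN.search(page_text) is not None
```

Text matching like this will produce false positives, for example on an article that discusses 404 errors, which is one reason the minimum viable extension sets most edge cases aside.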
A fair amount of work remains to be done to handle these conditions, and some of this will require close collaboration with browser developers to implement the functionality and interactions we have in mind. At this point we are focused on a minimum viable extension that supports:
- Actual 404s (ignoring most of the various edge cases shown above)
- Offering users the option of seeing versions of pages from the Wayback Machine, if available
- Display of Wayback Machine versions of pages from the 1st URL in a redirect chain
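The redirect-chain rule in the list above can be sketched as follows. This assumes the browser exposes the chain as an ordered list of hops; the `Hop` type and function name are hypothetical, introduced only for illustration.

```python
from typing import List, NamedTuple, Optional


class Hop(NamedTuple):
    """One request in a redirect chain (hypothetical representation)."""
    url: str
    status: int


def wayback_lookup_target(chain: List[Hop]) -> Optional[str]:
    """Return the URL to look up in the archive, or None if no lookup is needed."""
    if not chain:
        return None
    if chain[-1].status != 404:
        return None  # the chain ended somewhere other than an actual 404
    # Use the URL the user originally asked for, not the final dead end.
    return chain[0].url
```

For a chain such as `/post` (301), then `/moved` (302), then `/gone` (404), the lookup goes to `/post`, since that is the page the user was trying to reach.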
We can add more features as we get feedback from users, gain experience with how people are using the service, and as we learn more about real-world causes for people not getting the web content they are requesting and/or expecting.
We are especially interested in learning the following:
- URLs that people enter that return a 404, 503 and other defined conditions, regardless of whether the user elects to request a backed-up version from the Wayback Machine
- End users' comments, bug reports and suggestions
- Counts/time for 404s, 503s, and other defined conditions
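As a rough illustration of the "counts over time" item, the sketch below tallies error responses per day and status code. The event format (a timestamp paired with a status code) and the daily bucket are assumptions made for the example; the source does not specify how such data would be collected or aggregated.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, Tuple


def tally_by_day(events: Iterable[Tuple[datetime, int]]) -> Counter:
    """Count error responses per (ISO date, status code) pair."""
    counts: Counter = Counter()
    for when, status in events:
        if status >= 400:  # ignore successful responses and redirects
            counts[(when.date().isoformat(), status)] += 1
    return counts
```

This version covers only the counts and stores no URLs, so the first item in the list would need separate handling.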
Meeting Notes
2016-03-30
Comments from the add-on review team:
- not using https for the web.archive.org ping
- banner is constructed using innerHTML rather than createElement, etc (script insertion, malformed html risks).
- doesn't disable itself in private browsing mode
- general concern over sending https URLs to a 3rd party, but that's kind of the point, so that could be handled in messaging
Repository for code?
https://github.com/internetarchive/ChromeNoMore404s
2016-03-24
AA NOTES:
- pages that users try to access, and their means of access: usually not search results but clicking on a link (e.g. a news article). Examples: http://www.pbm.com/~lindahl/404-examples.html
Sharon asks: commerce ramifications?
Next steps:
- [todo:greg] get some good example pages up (hopefully more popular) + some user stories on paper.
- [todo:?] get a 2-person pilot going
2016-03-14 [elvin, marshall, javaun, jgruen, ckprice]
- [marshall] give usertesting.com a look
- [jgruen] work with sharon later this week to draft a script
- no PII will be released to internet archive
- [elvin/marshall] no big flags for usertesting
- [ckprice] update the dudes
2016-03-07 [jgruen, ckprice, greg, mark]
- [todo:ckprice] Send out questions to archive
- [jgruen] ux mocks
  - full page error: http://cl.ly/0Q2D3O0H3W1Q (not a fan)
  - modal implementation: http://cl.ly/2i09352Q0k2B (looks better)
  - alert at the top (e.g. heartbeat): http://cl.ly/0z0s08002i47 (leaning toward this one)
  - another example (except the top): http://cl.ly/222P0R1P3O0I
- [mark] play around with the words. "old" may not be right.
- [jgruen] could we co-brand?
- [ckprice] 404 page is the site's 404 content
- [greg] this is pretty much exactly like the Chrome ext at the top [ http://cl.ly/2o3s0m2K2r1D ].
- whimsical/fun: http://cl.ly/3D2M242e100s (would require branding)
Next steps:
- [greg] port the Chrome ext to Firefox add-on + put in GH repo.
- [jgruen] connect with Amelia
- [ckprice] reach out for questions about branding.
- [ckprice] circulate comments from Elvin for feedback
2016-03-07 [javaun, ellee, ckprice]
- In addons.mozilla.org?
- what is up with the URL [ http://cl.ly/1m2k0e1E2E1O ] can we just show it locally to the user?
- UX notes
  - WM logo? would have to talk about trademark/usage
  - top bar assuming it is WIP
- API SLA? traffic limits, etc.
2016-02-26
(john, cory, wil, javaun, nick, mark, greg)
- [todo:greg] 404 video, links to repos (see top)
Next steps:
1. [team] consolidate & get a nod from MC on Tuesday
2. [jgruen] sketch out some UX
3. [archive] get a viable version for usertesting.com (hopefully minor tweaks).
4. [jgruen] test script + launch test.
2016-02-24
(john, cory, wil, javaun, nick)
- [nick] can we get this document down a bit, to present in a succinct way to MC
- [jgruen] use pieces of survey as content for meeting [ https://docs.google.com/a/mozilla.com/forms/d/1ik_XwGc_5knDDKEmPRu8xK7qzHc-LBlZG5SmSNExod4/viewform ]
- [nick] needs UX work
- [team:todo] read document, add comments before Thursday EOD. keep in mind presentation to MC.
- goal: get something ready to show MC next week.
  - don't need updated visuals
  - demo the add-on (if time) - caveat obvious UX work required.