Security/Reviews/ReaderMode

From MozillaWiki
Jump to: navigation, search
Please use "Edit with form" above to edit this page.

Item Reviewed

Reader Mode
Target Reader mode; Creates an easily readable version of a page. See https://bugzilla.mozilla.org/show_bug.cgi?id=750712 for more information.

https://wiki.mozilla.org/Fennec/NativeUI/UserExperience/ReaderMode

http://lucasr.org/2012/06/21/reader-mode-in-firefox-mobile/


Introduce the Feature

Goal of Feature, what is trying to be achieved (problem solved, use cases, etc)

  • Extract content from web pages, show in cleaner, more readable layout

What solutions/approaches were considered other than the proposed solution?

`

Why was this solution chosen?

  • All parsing done on client side

2 main components: 1) parser 2) The UI Workflow:

  • access page
  • page view event from gecko
  • spawn a chrome worker which consumes a string (the HTML from the page) (serialized using the XML serializer)
  • This worker has a super simplfied DOM parser (we don't have access to the DOM from the worker) - creates a DOM-ish tree. Runs the readibility alg and returns the title, article content, etc.
  • After this - at this point we work out if the page is readible or not - if it is, the icon appears
    • When clicked, the about:reader page shows
    • This takes the result from the chrome worker and sends it to a non-priv about page.
    • (there's now also some sanitizing on the content - nsiparserutils?)
      • Currently, the sanitizer is blacklisting - people will try to fool the parser

Long term plan is to have the content in an iFrame but disable script in the page. We can't use the DOM parser because:

  • Threading constraints (single threaded)

We try to not parse as early as possible - we want to avoid parsing whenever we can Solution chosen out of neccessity When things are added to the reading list, their content is stored in an indexedDB (but NOT in places DB) for about:reader (not from the domain of the page/content)

  • Put in there from chrome code - caches, then adds bookmark
    • Tracked by click on unprivileged page (then priviliged code from a callback)

Any security threats already considered in the design and why?

Security issues have been identified with (earlier versions) of this - see https://bugzilla.mozilla.org/show_bug.cgi?id=778582 - following the issue identified here, various changes have been made to tighten things up (including removal of chrome privs from the about:reader page). https://bugzilla.mozilla.org/show_bug.cgi?id=785992 contains a patch that sanitizes the content being loaded into reader mode. it uses ParserUtils::ParseFragment : + let parserUtils = Cc["@mozilla.org/parserutils;1"].getService(Ci.nsIParserUtils); + let contentFragment = parserUtils.parseFragment(article.content, Ci.nsIParserUtils.SanitizerDropForms, + false, articleUri, this._contentElement); this disables scripts on the document via its script loader

Threat Brainstorming

  • script injection - the sanitizer attempts to stop this
    • could use CSP once we support CSP in <meta> ?
      • Potential issues with the reader chrome if we do this
    • could use iframe sandbox ?
  • runs in an unprivileged about: page
    • lucasr has tried accessing various chrome things from this page to make sure it can't (mgoodwin will also try this)
    • not associated with domain of original article (awesome !)
  • Sanitizer attacks
    • Currently the sanitizer blacklists
    • people will try to fool the parser - can we instead try to whitelist allowed tags / attributes
  • Property "SecReview solution chosen" (as page type) with input value "* All parsing done on client side

    2 main components: 1) parser 2) The UI Workflow:

    • access page
    • page view event from gecko
    • spawn a chrome worker which consumes a string (the HTML from the page) (serialized using the XML serializer)
    • This worker has a super simplfied DOM parser (we don't have access to the DOM from the worker) - creates a DOM-ish tree. Runs the readibility alg and returns the title, article content, etc.
    • After this - at this point we work out if the page is readible or not - if it is, the icon appears
      • When clicked, the about:reader page shows
      • This takes the result from the chrome worker and sends it to a non-priv about page.
      • (there's now also some sanitizing on the content - nsiparserutils?)
        • Currently, the sanitizer is blacklisting - people will try to fool the parser

    Long term plan is to have the content in an iFrame but disable script in the page. We can't use the DOM parser because:

    • Threading constraints (single threaded)

    We try to not parse as early as possible - we want to avoid parsing whenever we can Solution chosen out of neccessity When things are added to the reading list, their content is stored in an indexedDB (but NOT in places DB) for about:reader (not from the domain of the page/content)

    • Put in there from chrome code - caches, then adds bookmark
      • Tracked by click on unprivileged page (then priviliged code from a callback)" contains invalid characters or is incomplete and therefore can cause unexpected results during a query or annotation process.
      • Property "SecReview threats considered" (as page type) with input value "Security issues have been identified with (earlier versions) of this - see https://bugzilla.mozilla.org/show_bug.cgi?id=778582 - following the issue identified here, various changes have been made to tighten things up (including removal of chrome privs from the about:reader page).

    https://bugzilla.mozilla.org/show_bug.cgi?id=785992 contains a patch that sanitizes the content being loaded into reader mode. it uses ParserUtils::ParseFragment : + let parserUtils = Cc["@mozilla.org/parserutils;1"].getService(Ci.nsIParserUtils); + let contentFragment = parserUtils.parseFragment(article.content, Ci.nsIParserUtils.SanitizerDropForms, + false, articleUri, this._contentElement);

    this disables scripts on the document via its script loader" contains invalid characters or is incomplete and therefore can cause unexpected results during a query or annotation process.
  • Property "SecReview threat brainstorming" (as page type) with input value "* script injection - the sanitizer attempts to stop this
      • could use CSP once we support CSP in  ?
        • Potential issues with the reader chrome if we do this
      • could use iframe sandbox ?
    • runs in an unprivileged about: page
      • lucasr has tried accessing various chrome things from this page to make sure it can't (mgoodwin will also try this)
      • not associated with domain of original article (awesome !)
    • Sanitizer attacks
      • Currently the sanitizer blacklists
      • people will try to fool the parser - can we instead try to whitelist allowed tags / attributes" contains invalid characters or is incomplete and therefore can cause unexpected results during a query or annotation process.

Action Items

Action Item Status In Progress
Release Target `
Action Items
* Who :: What :: By when (Keep in mind all these things will be bugs that block the review bug, that blocks the feature bug)

mgoodwin::perform more testing on accessing chrome stuff from the about:reader page::w/c24/09/2012 (no bug required for this) lucasr::change parser filtering from blacklist style to whitelist style - include URL scheme whitelisting (http, https, FTP) (bug to be filed)::

Full Query
ID Summary Priority Status
794958 Use whitelist element and attribute filtering for reader mode P5 RESOLVED

1 Total; 0 Open (0%); 1 Resolved (100%); 0 Verified (0%);

The given value "* Who :: What :: By when (Keep in mind all these things will be bugs that block the review bug, that blocks the feature bug)

mgoodwin::perform more testing on accessing chrome stuff from the about:reader page::w/c24/09/2012 (no bug required for this) lucasr::change parser filtering from blacklist style to whitelist style - include URL scheme whitelisting (http, https, FTP) (bug to be filed)::

Full Query
ID Summary Priority Status
794958 Use whitelist element and attribute filtering for reader mode P5 RESOLVED

1 Total; 0 Open (0%); 1 Resolved (100%); 0 Verified (0%);

" contains strip markers and therefore it cannot be parsed sufficiently.