Security/Reviews/IDN

From MozillaWiki

Item Reviewed

Implement new IDN Unicode display algorithm
Target: See https://wiki.mozilla.org/IDN_Display_Algorithm#Background for details. Let us know if you need more.


Introduce the Feature

Goal of Feature, what is trying to be achieved (problem solved, use cases, etc)

See: https://wiki.mozilla.org/IDN_Display_Algorithm#Background for details. Let us know if you need more.

What solutions/approaches were considered other than the proposed solution?

See: http://www.chromium.org/developers/design-documents/idn-in-google-chrome for a summary of what other browsers do. We did not consider those solutions acceptable - see below.

Why was this solution chosen?

See: https://wiki.mozilla.org/IDN_Display_Algorithm#Background Our primary goals were to:

a) protect users from spoofing attacks

b) treat all languages and scripts equally, in line with Mozilla's principles of inclusiveness

A consequence of b) is that we needed a solution such that if an IDN works in one copy of Firefox, it works in all of them. Anything else injects an intolerable element of uncertainty for domain owners, who cannot know how many of their customers see the correct form of the domain and how many see gobbledygook. That would have a significant chilling effect on the uptake of IDNs. This means that making name display depend on browser language or OS language, or otherwise showing UI warnings in some but not all circumstances, is not acceptable.

Another consequence of b) is that saying that some scripts are just not allowed in IDNs, as Safari does, is also not acceptable.

One could argue that Latin has somewhat of a privilege in the resulting solution, but there are unavoidable historical reasons for that. Cyrillic and Greek are also somewhat disadvantaged as they cannot be mixed with Latin - but all-Cyrillic or all-Greek IDNs work everywhere, which is very important for those script communities, and something that IE, Chrome and Safari have all failed to achieve. (In a rather culturally imperialistic move, Safari simply disables IDNs for those scripts entirely by default.)
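The mixing restriction described above can be illustrated with a minimal sketch. It only covers the single rule stated here (Cyrillic and Greek may not be mixed with Latin inside one label) and is not the Firefox implementation. The function names are hypothetical, and because Python's standard library does not expose the Unicode Script property, the first word of each character's Unicode name is used as a rough stand-in for its script.

```python
import unicodedata

def scripts_in_label(label):
    """Return a rough set of script names used by the letters of a label.

    Assumption: the first word of the Unicode character name ("LATIN",
    "CYRILLIC", "GREEK", ...) approximates the script. A real implementation
    would use the Unicode Script property instead.
    """
    scripts = set()
    for ch in label:
        if ch.isdigit() or ch == "-":
            continue  # digits and hyphen are shared by all scripts
        name = unicodedata.name(ch, "")
        scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts

def mixes_latin_with_cyrillic_or_greek(label):
    """True if the label combines Latin with Cyrillic or Greek letters."""
    found = scripts_in_label(label)
    return "LATIN" in found and bool(found & {"CYRILLIC", "GREEK"})

# "p\u0430ypal" contains CYRILLIC SMALL LETTER A in place of Latin "a".
spoof = "p\u0430ypal"
all_cyrillic = "\u0441\u0430\u0445\u0430\u0440"
print(mixes_latin_with_cyrillic_or_greek(spoof))         # mixed label
print(mixes_latin_with_cyrillic_or_greek(all_cyrillic))  # single-script label
```

Under this rule the mixed label would be shown in its punycode form, while the all-Cyrillic label would be shown as Unicode.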

Any security threats already considered in the design and why?

The design considers the same security threats as previous implementations, informed by eight years of additional experience.

It knowingly relaxes the restrictions in a way which may permit whole-script homographs - e.g. caxap.ru in all-Latin, and сахар.ru in all-Cyrillic. This is considered OK because other browsers such as Chrome and IE have implemented a form of script-mixing restrictions, and there have not been reports of problems with whole-script homographs. (Many registrars ban the practice of registering two homographic domains to different entities.) Also, there is no implementable programmatic way of detecting such possible problems at domain resolution time - they have to be detected at domain registration time, when the registrar is not under millisecond time pressure and has access to a database of existing registrations.
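As an illustration of why this check belongs at registration time, here is a hedged sketch of how a registry could compare a new label against its existing registrations. The confusables table is a tiny hypothetical sample (a real registry would use a much fuller table), and `skeleton` and `conflicting_registrations` are illustrative names, not an existing API.

```python
# Hypothetical, deliberately tiny table mapping Cyrillic letters to the Latin
# letters they resemble. This is an assumption made for illustration only.
CONFUSABLE_TO_LATIN = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
}

def skeleton(label):
    """Map every character to its look-alike so homographs compare equal."""
    return "".join(CONFUSABLE_TO_LATIN.get(ch, ch) for ch in label.lower())

def conflicting_registrations(candidate, registered):
    """Return already-registered labels that look the same as the candidate."""
    target = skeleton(candidate)
    return [name for name in registered
            if name != candidate and skeleton(name) == target]

# Stand-in for the registry's database of existing registrations.
existing = ["caxap", "example"]
cyrillic_label = "\u0441\u0430\u0445\u0430\u0440"  # all-Cyrillic look-alike
print(conflicting_registrations(cyrillic_label, existing))
```

The lookup needs the full list of existing names, which a registry has and a browser resolving a single name does not.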

The above relaxation is considered less bad than trying to maintain the whitelist in the face of 1000+ new TLDs, which would lead to IDN not working in lots of places where it should due to lack of manpower.

There is a political dimension to this; if we encounter problems e.g. with whole-script homographs, we will need to place the blame where it belongs - on the registries which are letting their customers attack each other.

Threat Brainstorming

  • Does the IDN display algorithm ever allow Unicode combining marks?
    • Two ways of representing the same thing? We convert the name to the canonicalized version before loading it, to prevent this problem.
    • Messes made with multiple combining marks, like https://twitter.com/glitchr_ or http://sbp.so/supercombiner
    • Combining with the dot or slash outside the segment, as in Michal Zalewski's attack
  • Do we turn off IDN display when it would cause us to draw replacement characters? (For example, when a Linux user has no relevant fonts.)
    • No.

  • Can we require some form of registry transparency (like certificate transparency), so we (or independent security researchers) can notice if some registrar incorrectly allows homograph domains? (Maybe only in the case where a registrar allows both Latin and Cherokee, or Latin and Cyrillic.)
    • You can request a full list of .com domains. It requires some paperwork.
  • We speculated that Verisign might have checked whether existing homograph domains have the same owner. But what if that "owner" is MarkMonitor, and they turn out to be different customers of MarkMonitor?
  • What about labels that aren't under control of a registry, such as *.blogspot.com?
    • That's relevant to whitelisted TLDs, not the algorithm as a whole. But this is why I'd like to kill the whitelist eventually, or apply the algorithm to the non-registered labels.
  • Will IDN domain info show up in the SSL dialog? Can we give a 'score' to an IDN name and maybe use that to turn a domain from green to yellow in that dialog? Oh, we don't use colors, I see :-)
  • Do we need to warn users if they *enter* a mixed-script domain name, or is changing the URL bar display to punycode sufficient? I'm thinking of an attack where you paste a fake "paypal.com", see that it looks good in your URL bar before pressing enter, and then press enter. And then maybe the attack site redirects to "paypaj.com" so it doesn't look really weird.
  • How are RTL scripts handled? http://tools.ietf.org/html/rfc5893
  • Can we prevent people putting a lock item in the location bar? http://stackoverflow.com/questions/1384380/is-there-a-unicode-glyph-that-looks-like-a-key-icon/5859837#5859837
    • It would not be a valid script character, so it is not allowed in domain names.
  • Are plugins vulnerable - e.g. Flash? Does the NPAPI ....
    • Plugins are a world of hurt, but mostly don't display domains. There are APIs where the plugin requests that the browser load things for it, in which case we'd use our own domain-name conversion/canonicalization code.
      • Flash - camera/mic request dialog: "domain foo.com would like to access cam/mic". This is the case I worry about.
        • Worst case they show the raw domain name? Yes.
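The combining-mark questions in the list above can be made concrete with a small sketch. It assumes NFC as the canonical form (the notes only say the name is converted to "the canonicalized version", without naming a normalization form), and the helper names are hypothetical.

```python
import unicodedata

def canonicalize(label):
    """Normalize a label so equivalent spellings compare equal.

    Assumption: NFC is used here for illustration; the review notes do not
    say which normalization form the browser applies.
    """
    return unicodedata.normalize("NFC", label)

def longest_combining_run(label):
    """Length of the longest run of consecutive combining marks in a label."""
    longest = run = 0
    for ch in label:
        # Unicode general categories Mn, Mc and Me are the mark categories.
        if unicodedata.category(ch).startswith("M"):
            run += 1
            longest = max(longest, run)
        else:
            run = 0
    return longest

precomposed = "caf\u00e9"   # "e with acute" as a single code point
decomposed = "cafe\u0301"   # "e" followed by COMBINING ACUTE ACCENT
print(precomposed == decomposed)                              # raw strings differ
print(canonicalize(precomposed) == canonicalize(decomposed))  # equal once normalized
print(longest_combining_run("e\u0301\u0301\u0301"))           # stacked marks
```

A long run of marks is the kind of input behind the "supercombiner" style of abuse. A display policy could treat such labels as suspicious, though the notes do not record a decision on that.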

Action Items

Action Item Status: In Progress
Release Target:
ID      Summary                                                                  Priority  Status
722299  Implement new IDN Unicode display algorithm                              --        RESOLVED
843689  Eliminate the IDN TLD whitelist                                          --        RESOLVED
843739  back out bug 770877 (take .com, .net, and .name off the IDN whitelist)   --        RESOLVED

3 Total; 0 Open (0%); 3 Resolved (100%); 0 Verified (0%)

  • bug 843689 -- eliminate whitelisting support :: to be resolved no more than two releases after new IDN code lands
  • bug 843739 -- back out .com whitelisting :: ASAP, but would be good to implement 722299 (the feature under review) at the same time