Unicode domain names issue (Encrypting a "fake" domain name)

Re: Unicode domain names issue (Encrypting a "fake" domain name)

Gervase Markham
On 24/04/17 11:53, L. David Baron wrote:
> This makes me wonder:  could we become more suspicious (in terms of
> UI indications) of sites where the script changes between different
> parts of the hostname (or eTLD+1), i.e., move towards expecting that
> non-Latin domain names will be using a non-Latin TLD?

That ends up basically being "no .com for _you_, suspicious-looking
non-Latin script". It's another way of treating some scripts as second
class. Admittedly, it's not the worst way of doing so, and a very
measured approach to this (basically, a TLD _black_list for TLDs which
are actively allowing their customers to attack each other) isn't a
totally terrible idea. The trouble is the collateral damage - those
companies and businesses who are happily using <some Cyrillic
string>.com as their domain name and now find it appears as gibberish in
major browsers after they've spent years building their brand, just
because the letters in their name happen all to have Latin homographs.

Gerv

_______________________________________________
dev-security mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-security

Re: Unicode domain names issue (Encrypting a "fake" domain name)

L. David Baron
On Tuesday 2017-04-25 10:19 +0100, Gervase Markham wrote:

> On 24/04/17 11:53, L. David Baron wrote:
> > This makes me wonder:  could we become more suspicious (in terms of
> > UI indications) of sites where the script changes between different
> > parts of the hostname (or eTLD+1), i.e., move towards expecting that
> > non-Latin domain names will be using a non-Latin TLD?
>
> That ends up basically being "no .com for _you_, suspicious-looking
> non-Latin script". It's another way of treating some scripts as second
> class. Admittedly, it's not the worst way of doing so, and a very
> measured approach to this (basically, a TLD _black_list for TLDs which
> are actively allowing their customers to attack each other) isn't a
> totally terrible idea. The trouble is the collateral damage - those
> companies and businesses who are happily using <some Cyrillic
> string>.com as their domain name and now find it appears as gibberish in
> major browsers after they've spent years building their brand, just
> because the letters in their name happen all to have Latin homographs.
Couldn't it be done in a pretty limited way?  For example, we could
use the punycode representation if:

 * the component before the eTLD consists entirely of characters
   that are homographs for characters in a single other script, and

 * the component before the eTLD is in a different script from the
   eTLD.

If there are some legitimate sites that this would catch, maybe we
could then whitelist them?

(I'm assuming we already require each component to be
single-script.)
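
The two conditions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not what any browser ships: the `CYRILLIC_TO_LATIN` table is a hypothetical, deliberately tiny subset, `script_of` guesses the script from the Unicode character name (the standard library has no script property), and only the Cyrillic-label-under-Latin-TLD direction is handled.

```python
import unicodedata

# Hypothetical, incomplete table: Cyrillic letters that look like Latin ones.
CYRILLIC_TO_LATIN = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u0441": "c",  # CYRILLIC SMALL LETTER ES
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0440": "p",  # CYRILLIC SMALL LETTER ER
    "\u0445": "x",  # CYRILLIC SMALL LETTER HA
    "\u0443": "y",  # CYRILLIC SMALL LETTER U
}


def script_of(ch):
    """Rough script guess based on the Unicode character name (heuristic)."""
    name = unicodedata.name(ch, "")
    if name.startswith("CYRILLIC"):
        return "Cyrillic"
    if name.startswith("LATIN"):
        return "Latin"
    return "Other"


def label_script(label):
    """Return the single script of the letters in a label, or None if mixed/empty."""
    scripts = {script_of(c) for c in label if c.isalpha()}
    return scripts.pop() if len(scripts) == 1 else None


def should_show_punycode(label, tld):
    """Apply both conditions: different script from the TLD, all letters homographs."""
    label_s, tld_s = label_script(label), label_script(tld)
    if label_s is None or tld_s is None or label_s == tld_s:
        return False
    if label_s == "Cyrillic" and tld_s == "Latin":
        return all(c in CYRILLIC_TO_LATIN for c in label if c.isalpha())
    return False
```

With this sketch, a label made only of the listed lookalikes under `com` is flagged, while a Cyrillic word containing any non-lookalike letter, or any label under a Cyrillic TLD, is left alone.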

-David

--
𝄞   L. David Baron                         http://dbaron.org/   𝄂
𝄒   Mozilla                          https://www.mozilla.org/   𝄂
             Before I built a wall I'd ask to know
             What I was walling in or walling out,
             And to whom I was like to give offense.
               - Robert Frost, Mending Wall (1914)


Re: Unicode domain names issue (Encrypting a "fake" domain name)

Gervase Markham
In reply to this post by Gervase Markham
On 25/04/17 11:28, L. David Baron wrote:
>  * the component before the eTLD consists entirely of characters
>    that are homographs for characters in a single other script, and

(I assume you mean s/eTLD/TLD/ in each case.)

>  * the component before the eTLD is in a different script from the
>    eTLD.

AIUI this is what Chrome did, for Cyrillic only, and they said it
affected 2,800 sites in .com. I don't know if they did more analysis for
other TLDs - .ru, I suspect, would have a large number, and there would
be more if we extended to all possible homographs across all scripts. A
whitelist might solve that, but of course that would grandfather in
existing examples and not allow for businesses not yet existing or not
yet on the net.

One guiding principle I have found useful here is "what if the Internet
were invented by the Russians, and Latin was the script late to the
party?". I am trying to avoid doing anything to Cyrillic that I would
think were unfair were it done to Latin if the boot were on the other foot.

The trouble with Cyrillic in particular is that there are quite a few
clashing letters:
https://en.wikipedia.org/wiki/IDN_homograph_attack#Cyrillic
In Russian, you have a, c, e, o, p, x and y. Add in numbers, and you
have 3, 4 and 6. Cyrillic non-Russian languages add i, j and s, and if
you go rare/archaic (which may or may not be supported in the font
and/or noticeably different) you can add d, h, l and v. And that's just
lowercase. In the worst case, that's 14 of Latin's 26 letters, including
4 of the 5 vowels. It would be a significant crimp on Cyrillic domain
names if all names using only those letters didn't work except in .рф
and the like.
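
The tally above can be checked with a short Python sketch. The three groups are taken directly from the letters listed in this message; the `spellable` helper and the sample words are illustrative assumptions, used only to show how many ordinary Latin-looking names fall inside the lookalike set.

```python
# Latin letters that have Cyrillic lookalikes, grouped as in the message above.
RUSSIAN = "aceopxy"        # letters used in Russian
OTHER_CYRILLIC = "ijs"     # letters from non-Russian Cyrillic alphabets
RARE_OR_ARCHAIC = "dhlv"   # rare/archaic letters, font support varies

worst_case = set(RUSSIAN + OTHER_CYRILLIC + RARE_OR_ARCHAIC)
print(len(worst_case))                      # 14
print(sorted(worst_case & set("aeiou")))    # ['a', 'e', 'i', 'o']


def spellable(word, letters):
    """True if every letter of a Latin word has a Cyrillic lookalike in `letters`."""
    return set(word) <= set(letters)


# Hypothetical sample words, chosen only for illustration.
print(spellable("apple", worst_case))   # True (needs the rare 'l' lookalike)
print(spellable("apple", RUSSIAN))      # False
print(spellable("mozilla", worst_case)) # False ('m' and 'z' have no lookalike here)
```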

> (I'm assuming we already require each component to be
> single-script.)

Yes, we do. That is what solves 99% of the problem.

Gerv



Re: Unicode domain names issue (Encrypting a "fake" domain name)

Jonathan Kingston-3
Setting aside the fact that lists are hard to maintain, there isn't
anything technical preventing Firefox from having one for existing
popular sites that registries have registered and shouldn't have, right?
Firefox could then simply show the punycode for sites on this list.
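
A minimal Python sketch of that idea, assuming a hypothetical `FORCE_PUNYCODE` set of hostnames in Unicode form. The `to_ascii` helper uses Python's built-in `punycode` codec per label and skips the normalization steps a real IDNA implementation performs, so treat it as an illustration of the display decision only.

```python
# Hypothetical blocklist entry: a Cyrillic lookalike of a well-known Latin name.
FORCE_PUNYCODE = {"\u0430\u0440\u0440\u04cf\u0435.com"}


def to_ascii(hostname):
    """Encode each non-ASCII label as xn-- plus its punycode form (no normalization)."""
    labels = []
    for label in hostname.split("."):
        if label.isascii():
            labels.append(label)
        else:
            labels.append("xn--" + label.encode("punycode").decode("ascii"))
    return ".".join(labels)


def display_name(hostname):
    """Show punycode only for hostnames on the list; otherwise show Unicode."""
    return to_ascii(hostname) if hostname in FORCE_PUNYCODE else hostname
```

Hostnames not on the list are displayed unchanged, so the list has to be updated for every newly registered spoof.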


Re: Unicode domain names issue (Encrypting a "fake" domain name)

Daniel Veditz-2
In reply to this post by L. David Baron
On Mon, Apr 24, 2017 at 3:53 AM, L. David Baron <[hidden email]> wrote:

> This makes me wonder:  could we become more suspicious (in terms of
> UI indications) of sites where the script changes between different
> parts of the hostname (or eTLD+1), i.e., move towards expecting that
> non-Latin domain names will be using a non-Latin TLD?
>

It would be nice, and sometimes we could (I think I read that the .ru
registry only allows ASCII domains, and the Cyrillic version of their
ccTLD only has Cyrillic domains), but not in other cases. Of course .com is
a complete mess, but even with more thoughtful registries you have .eu,
which explicitly accepts Cyrillic domains because Bulgaria is an EU member.

That would come back around to a TLD whitelist (or blacklist?) scheme.

-Dan Veditz

Re: Unicode domain names issue (Encrypting a "fake" domain name)

Gervase Markham
In reply to this post by Gervase Markham
On 25/04/17 14:59, Jonathan Kingston wrote:
> Setting aside the fact that lists are hard to maintain, there isn't
> anything technical preventing Firefox from having one for existing
> popular sites that registries have registered and shouldn't have, right?
> Firefox could then simply show the punycode for sites on this list.

We could do this, but it seems to me like it would be whack-a-mole, with
a bad press round at each whack because we are ostensibly taking
responsibility for the problem but not resolving it.

Gerv

