Unicode domain names issue (Encrypting a "fake" domain name)

Re: Unicode domain names issue (Encrypting a "fake" domain name)

Gervase Markham
On 24/04/17 11:53, L. David Baron wrote:
> This makes me wonder:  could we become more suspicious (in terms of
> UI indications) of sites where the script changes between different
> parts of the hostname (or eTLD+1), i.e., move towards expecting that
> non-Latin domain names will be using a non-Latin TLD?

That ends up basically being "no .com for _you_, suspicious-looking
non-Latin script". It's another way of treating some scripts as second
class. Admittedly, it's not the worst way of doing so, and a very
measured approach to this (basically, a TLD _black_list for TLDs which
are actively allowing their customers to attack each other) isn't a
totally terrible idea. The trouble is the collateral damage - those
companies and businesses who are happily using <some Cyrillic
string>.com as their domain name and now find it appears as gibberish in
major browsers after they've spent years building their brand, just
because the letters in their name happen all to have Latin homographs.

Gerv

_______________________________________________
dev-security mailing list
[hidden email]
https://lists.mozilla.org/listinfo/dev-security

Re: Unicode domain names issue (Encrypting a "fake" domain name)

L. David Baron
On Tuesday 2017-04-25 10:19 +0100, Gervase Markham wrote:

> On 24/04/17 11:53, L. David Baron wrote:
> > This makes me wonder:  could we become more suspicious (in terms of
> > UI indications) of sites where the script changes between different
> > parts of the hostname (or eTLD+1), i.e., move towards expecting that
> > non-Latin domain names will be using a non-Latin TLD?
>
> That ends up basically being "no .com for _you_, suspicious-looking
> non-Latin script". It's another way of treating some scripts as second
> class. Admittedly, it's not the worst way of doing so, and a very
> measured approach to this (basically, a TLD _black_list for TLDs which
> are actively allowing their customers to attack each other) isn't a
> totally terrible idea. The trouble is the collateral damage - those
> companies and businesses who are happily using <some Cyrillic
> string>.com as their domain name and now find it appears as gibberish in
> major browsers after they've spent years building their brand, just
> because the letters in their name happen all to have Latin homographs.
Couldn't it be done in a pretty limited way?  For example, we could
use the punycode representation if:

 * the component before the eTLD consists entirely of characters
   that are homographs for characters in a single other script, and

 * the component before the eTLD is in a different script from the
   eTLD.

If there are some legitimate sites that this would catch, maybe we
could then whitelist them?

(I'm assuming we already require each component to be
single-script.)
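A minimal Python sketch of the two conditions above, for illustration only. It treats the last label as the TLD (an assumption; finding the real eTLD would need the Public Suffix List), guesses scripts from Unicode character names, and uses a small hypothetical `CYRILLIC_LOOKALIKES` table instead of a full confusables list. All function names are made up.

```python
import unicodedata
from typing import Optional

# Hypothetical, non-exhaustive table: lowercase Cyrillic characters that render
# like Latin letters or digits in common fonts. Escapes are used because the
# characters are, by design, visually indistinguishable from their Latin twins.
CYRILLIC_LOOKALIKES = {
    "\u0430": "a", "\u0441": "c", "\u0435": "e", "\u043e": "o",
    "\u0440": "p", "\u0445": "x", "\u0443": "y",
    "\u0456": "i", "\u0458": "j", "\u0455": "s",
    "\u0437": "3", "\u0447": "4", "\u0431": "6",
}


def script_of(ch: str) -> str:
    """Rough script guess from the Unicode name, e.g. 'LATIN' or 'CYRILLIC'."""
    if ch.isdigit() or ch == "-":
        return "COMMON"  # shared by all scripts
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def label_script(label: str) -> Optional[str]:
    """The single non-common script of a label, or None if mixed or absent."""
    scripts = {script_of(ch) for ch in label} - {"COMMON"}
    return scripts.pop() if len(scripts) == 1 else None


def should_show_punycode(hostname: str) -> bool:
    labels = hostname.lower().split(".")
    if len(labels) < 2:
        return False
    name, tld = labels[-2], labels[-1]  # assumption: last label == TLD
    if label_script(name) != "CYRILLIC":
        return False  # this sketch only models Cyrillic-for-Latin homographs
    # Condition 1: every character has a Latin (or digit) homograph.
    all_homographs = all(
        ch in CYRILLIC_LOOKALIKES or script_of(ch) == "COMMON" for ch in name
    )
    # Condition 2: the TLD is written in a different script.
    return all_homographs and label_script(tld) != "CYRILLIC"


def display_form(hostname: str) -> str:
    """Hostname as the URL bar would show it under this hypothetical policy."""
    if not should_show_punycode(hostname):
        return hostname
    # Raw RFC 3492 Punycode via Python's codec; full IDNA would also normalize.
    return ".".join(
        label if label.isascii() else "xn--" + label.encode("punycode").decode("ascii")
        for label in hostname.split(".")
    )
```

For example, the all-Cyrillic spelling of "epay.bg" is flagged, while the same label under .рф keeps its Unicode form.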

-David

--
𝄞   L. David Baron                         http://dbaron.org/   𝄂
𝄢   Mozilla                          https://www.mozilla.org/   𝄂
             Before I built a wall I'd ask to know
             What I was walling in or walling out,
             And to whom I was like to give offense.
               - Robert Frost, Mending Wall (1914)

Re: Unicode domain names issue (Encrypting a "fake" domain name)

Gervase Markham
In reply to this post by Gervase Markham
On 25/04/17 11:28, L. David Baron wrote:
>  * the component before the eTLD consists entirely of characters
>    that are homographs for characters in a single other script, and

(I assume you mean s/eTLD/TLD/ in each case.)

>  * the component before the eTLD is in a different script from the
>    eTLD.

AIUI this is what Chrome did, for Cyrillic only, and they said it
affected 2,800 sites in .com. I don't know if they did more analysis for
other TLDs - .ru, I suspect, would have a large number, and there would
be more if we extended to all possible homographs across all scripts. A
whitelist might solve that, but of course that would grandfather in
existing examples and not allow for businesses not yet existing or not
yet on the net.

One guiding principle I have found useful here is "what if the Internet
were invented by the Russians, and Latin was the script late to the
party?". I am trying to avoid doing anything to Cyrillic that I would
think were unfair were it done to Latin if the boot were on the other foot.

The trouble with Cyrillic in particular is that there are quite a few
clashing letters:
https://en.wikipedia.org/wiki/IDN_homograph_attack#Cyrillic
In Russian, you have a, c, e, o, p, x and y. Add in numbers, and you
have 3, 4 and 6. Cyrillic non-Russian languages add i, j and s, and if
you go rare/archaic (which may or may not be supported in the font
and/or noticeably different) you can add d, h, l and v. And that's just
lowercase. In the worst case, that's 14 of Latin's 26 letters, including
4 of the 5 vowels. It would be a significant crimp on Cyrillic domain
names if all names using only those letters didn't work except in .рф
and the like.
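The letter count above can be checked with a small Python sketch. The `LATIN_TO_CYRILLIC` table is hypothetical and built only from the letters listed in this message, with one plausible codepoint per letter (the archaic ones are font-dependent); `cyrillic_spoof` is an illustrative helper showing which Latin labels can be spelled entirely in Cyrillic lookalikes.

```python
from typing import Optional

# Hypothetical table: Latin character -> a Cyrillic character that can render
# identically. One plausible codepoint per letter; rendering is font-dependent.
LATIN_TO_CYRILLIC = {
    # Russian alphabet
    "a": "\u0430", "c": "\u0441", "e": "\u0435", "o": "\u043e",
    "p": "\u0440", "x": "\u0445", "y": "\u0443",
    # Other Cyrillic-script languages
    "i": "\u0456", "j": "\u0458", "s": "\u0455",
    # Rare/archaic letters
    "d": "\u0501", "h": "\u04bb", "l": "\u04cf", "v": "\u0475",
    # Digits
    "3": "\u0437", "4": "\u0447", "6": "\u0431",
}

letters = sorted(k for k in LATIN_TO_CYRILLIC if k.isalpha())
vowels = [k for k in letters if k in "aeiou"]
print(len(letters), "of 26 letters;", len(vowels), "of 5 vowels")
# prints: 14 of 26 letters; 4 of 5 vowels


def cyrillic_spoof(latin_label: str) -> Optional[str]:
    """All-Cyrillic lookalike of a lowercase Latin label, or None if impossible."""
    if not all(ch in LATIN_TO_CYRILLIC for ch in latin_label):
        return None
    return "".join(LATIN_TO_CYRILLIC[ch] for ch in latin_label)
```

Under this table `cyrillic_spoof("google")` returns None because g has no entry, while `cyrillic_spoof("apple")` succeeds only because the archaic palochka stands in for l.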

> (I'm assuming we already require each component to be
> single-script.)

Yes, we do. That is what solves 99% of the problem.

Gerv

Re: Unicode domain names issue (Encrypting a "fake" domain name)

Jonathan Kingston-3
Besides the fact that lists are hard to maintain, there isn't anything
technical preventing Firefox from having one for existing popular sites
that registries have registered and shouldn't have, right? The browser
could then just show the punycode for the sites on this list.
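Such a list could look roughly like the following Python sketch. `FORCED_PUNYCODE` and `url_bar_text` are hypothetical names, and the single entry is only an example. Storing entries in ASCII ("xn--") form is a design choice so the list itself cannot contain lookalike characters. Python's built-in `idna` codec implements the older IDNA 2003 rules, which is enough for a sketch.

```python
# Hypothetical blocklist of spoofing hostnames, stored in ACE ("xn--") form so
# the list itself cannot contain confusable characters. The entry below is
# just the Cyrillic spelling of "epay.bg", encoded at startup for illustration.
FORCED_PUNYCODE = {"\u0435\u0440\u0430\u0443.bg".encode("idna").decode("ascii")}


def url_bar_text(hostname: str) -> str:
    """Show the ACE form for listed hosts and the Unicode form otherwise."""
    ace = hostname.encode("idna").decode("ascii")  # IDNA 2003 via Python's codec
    return ace if ace in FORCED_PUNYCODE else hostname
```

A list like this only covers spoofs that someone has already noticed and reported.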

On Tue, Apr 25, 2017 at 11:48 AM, Gervase Markham <[hidden email]> wrote:

> On 25/04/17 11:28, L. David Baron wrote:
> >  * the component before the eTLD consists entirely of characters
> >    that are homographs for characters in a single other script, and
>
> (I assume you mean s/eTLD/TLD/ in each case.)
>
> >  * the component before the eTLD is in a different script from the
> >    eTLD.
>
> AIUI this is what Chrome did, for Cyrillic only, and they said it
> affected 2,800 sites in .com. I don't know if they did more analysis for
> other TLDs - .ru, I suspect, would have a large number, and there would
> be more if we extended to all possible homographs across all scripts. A
> whitelist might solve that, but of course that would grandfather in
> existing examples and not allow for businesses not yet existing or not
> yet on the net.
>
> One guiding principle I have found useful here is "what if the Internet
> were invented by the Russians, and Latin was the script late to the
> party?". I am trying to avoid doing anything to Cyrillic that I would
> think were unfair were it done to Latin if the boot were on the other foot.
>
> The trouble with Cyrillic in particular is that there are quite a few
> clashing letters:
> https://en.wikipedia.org/wiki/IDN_homograph_attack#Cyrillic
> In Russian, you have a, c, e, o, p, x and y. Add in numbers, and you
> have 3, 4 and 6. Cyrillic non-Russian languages add i, j and s, and if
> you go rare/archaic (which may or may not be supported in the font
> and/or noticeably different) you can add d, h, l and v. And that's just
> lowercase. In the worst case, that's 14 of Latin's 26 letters, including
> 4 of the 5 vowels. It would be a significant crimp on Cyrillic domain
> names if all names using only those letters didn't work except in .рф
> and the like.
>
> > (I'm assuming we already require each component to be
> > single-script.)
>
> Yes, we do. That is what solves 99% of the problem.
>
> Gerv

Re: Unicode domain names issue (Encrypting a "fake" domain name)

Daniel Veditz-2
In reply to this post by L. David Baron
On Mon, Apr 24, 2017 at 3:53 AM, L. David Baron <[hidden email]> wrote:

> This makes me wonder:  could we become more suspicious (in terms of
> UI indications) of sites where the script changes between different
> parts of the hostname (or eTLD+1), i.e., move towards expecting that
> non-Latin domain names will be using a non-Latin TLD?
>

It would be nice, and sometimes we could (I think I read that the .ru
registrar only allows ASCII domains, and the Cyrillic version of their
ccTLD only has Cyrillic domains), but not in other cases. Of course .com is
a complete mess, but even with more thoughtful registries you have .eu,
which explicitly accepts Cyrillic domains because Bulgaria is an EU member.

That would come back around to a TLD whitelist (or blacklist?) scheme.
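A per-TLD script policy of this kind might be sketched in Python as follows. The `TLD_SCRIPTS` entries only mirror the examples in this message (the recollection about .ru and its Cyrillic counterpart, and .eu accepting Cyrillic) and are not verified registry policy. TLDs with no entry, such as .com, fall back to punycode for non-ASCII names, which is one possible choice among several. The per-label single-script check is assumed to happen elsewhere.

```python
import unicodedata
from typing import Dict, Set

# Hypothetical policy table mirroring the examples in this message; a real
# table would have to come from each registry's published IDN rules.
TLD_SCRIPTS: Dict[str, Set[str]] = {
    "ru": {"LATIN"},
    "\u0440\u0444": {"CYRILLIC"},  # .рф, ACE form xn--p1ai
    "eu": {"LATIN", "CYRILLIC"},
}


def scripts_in(label: str) -> Set[str]:
    """Scripts used by a label, guessed from Unicode names; digits and '-' ignored."""
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if not (ch.isdigit() or ch == "-")
    }


def show_unicode(hostname: str) -> bool:
    labels = hostname.lower().split(".")
    if all(label.isascii() for label in labels):
        return True  # plain ASCII names are shown as-is
    allowed = TLD_SCRIPTS.get(labels[-1])
    if allowed is None:
        return False  # unlisted TLD (e.g. .com): be conservative, show punycode
    # Every non-ASCII label must use only scripts the TLD's registry permits.
    return all(scripts_in(label) <= allowed for label in labels if not label.isascii())
```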

-Dan Veditz

Re: Unicode domain names issue (Encrypting a "fake" domain name)

Gervase Markham
In reply to this post by Gervase Markham
On 25/04/17 14:59, Jonathan Kingston wrote:
> Besides the fact lists are hard to maintain.
> There isn't anything technical preventing Firefox having one for existing
> popular sites that registries have registered and shouldn't have right?
> This could just make the punycode show in the browser for sites in this
> list.

We could do this, but it seems to me like it would be whack-a-mole, with
a bad press round at each whack because we are ostensibly taking
responsibility for the problem but not resolving it.

Gerv

Re: Unicode domain names issue (Encrypting a "fake" domain name)

akostadinov
In reply to this post by Gervase Markham
On Tuesday, April 25, 2017 at 1:49:04 PM UTC+3, Gervase Markham wrote:
...
> One guiding principle I have found useful here is "what if the Internet
> were invented by the Russians, and Latin was the script late to the
> party?". I am trying to avoid doing anything to Cyrillic that I would
> think were unfair were it done to Latin if the boot were on the other foot.

If the Internet had been invented in a Cyrillic-using country, the whole domain name would have been in Cyrillic, not just some parts of it.

I'm from such a country (one using the Cyrillic alphabet) and I find mixed domains useless. By mixed I mean something like "www.cyrillic-part.com". Am I expected to switch my keyboard layout to type the domain name in the URL bar?

If DNS had been invented by a country with a Cyrillic alphabet, would you want to type some parts in Latin and others in Cyrillic?

I don't care that many people bought mixed-charset domains. Let them buy non-mixed ones and resolve the issue long-term. As a technical user, I want to be able to easily recognize when domains use mixed charsets.

It is strange to me to see so many Latin-only users blocking any progress on this issue because non-Latin users might be alienated. If you are concerned about this, then ask your non-Latin users what they want. You are just guessing and blocking any sensible decision. There are polls and other strategies that can be used.

IMO, at the very least, there should be some highlighting when a domain uses mixed charsets, whether within a single component of the domain name or across components. This is pretty much equal treatment IMO and wouldn't kill anybody.

Even better would be if mixed domains showed up in punycode by default, with some UI to switch them to Unicode if the user decides to. But looking at the sentiment here, I don't have much hope for this. At least *please* add some highlighting, no matter what it is, pretty please.
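The highlighting proposed above needs a whole-hostname check rather than only the existing per-label one. A minimal Python sketch, assuming scripts can be approximated from Unicode character names (production code would use the Unicode Script property, e.g. via ICU); `label_scripts` and `is_mixed_script` are hypothetical helpers.

```python
import unicodedata
from typing import List, Set


def label_scripts(hostname: str) -> List[Set[str]]:
    """Per-label set of scripts, ignoring digits and hyphens (shared by all scripts)."""
    return [
        {unicodedata.name(ch, "UNKNOWN").split()[0]
         for ch in label if not (ch.isdigit() or ch == "-")}
        for label in hostname.split(".")
    ]


def is_mixed_script(hostname: str) -> bool:
    """True if the hostname as a whole uses more than one script, even when
    each individual label is single-script (e.g. a Cyrillic label under .com)."""
    used = set().union(*label_scripts(hostname))
    return len(used) > 1
```

A browser could then attach a warning icon or distinct coloring to the hostname whenever `is_mixed_script` is true, without changing how the name itself is displayed.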

> The trouble with Cyrillic in particular is that there are quite a few
> clashing letters:
> https://en.wikipedia.org/wiki/IDN_homograph_attack#Cyrillic
> In Russian, you have a, c, e, o, p, x and y. Add in numbers, and you
> have 3, 4 and 6. Cyrillic non-Russian languages add i, j and s, and if
> you go rare/archaic (which may or may not be supported in the font
> and/or noticeably different) you can add d, h, l and v. And that's just
> lowercase. In the worst case, that's 14 of Latin's 26 letters, including
> 4 of the 5 vowels. It would be a significant crimp on Cyrillic domain
> names if all names using only those letters didn't work except in .рф
> and the like.
>
> > (I'm assuming we already require each component to be
> > single-script.)
>
> Yes, we do. That is what solves 99% of the problem.

Not really. There are so many high-profile sites that can be abused. The first that come to mind are ерау.bg and ебау.com (Cyrillic spellings of epay.bg and ebay.com).

The former is impossible to spot. The latter needs a careful look. For the "b", "в" and "ь" could also be hard to spot. An icon, different colors for the letters, or whatever would be much more useful. For example, a warning icon that, when you hover over it, shows an explanation with more information about the problem.

In fact such a warning icon might be a good idea for many occasions. Firefox could detect different kinds of warnings going forward. An interested user (usually technical) would be able to make an informed decision whether the warning is relevant or not.

I'm not suggesting to abandon other long-term solutions that might be better for non-technical users. On the other hand, if Firefox ignores technical users, I doubt it would be good for it. I always preferred Firefox for the ability to make it behave as you want.
Presently, Quantum blocked many useful plugins for apparently no better stability, in my personal observation (yes, I had issues with replacements that used only the new APIs, which made my whole browsing experience a mess until I figured out what was going on). Now let's ignore the need for technical people to be sure of what they read in the address bar. I really hope Firefox can be good for both technical and non-technical people. Otherwise it will not matter anymore which browser I am using. It could be whatever comes pre-installed.