Up until last week, Internet domain names were a pretty mature business. Then the folks at ICANN decided to shake things up by enabling non-Latin character ccTLDs (country code Top Level Domains – like .il and .uk). What does that mean for you? Well, here’s a quick test. Try visiting this URL: http://موقع.وزارة-الأتصالات.مصر/.
What you’re looking at is an Internationalized Domain Name, or IDN for short. It doesn’t contain western or “Latin” letters, and chances are everything you know about URLs is about to get turned backwards (in this case, literally). What’s worse is that different browsers handle this kind of domain name differently, and there’s no one right answer.
Are you a software tester? Then your ship has come in because IDNs open up a whole new category of software bugs. Let’s take a look at a few big trouble areas, but hang on tight because this gets goofy fast.
From the ICANN announcement:
The three new top-level domains are السعودية. (“Al-Saudiah”), امارات. ( “Emarat”) and مصر. (“Misr”). All three are Arabic script domains, and will enable domain names written fully right-to-left.
Right to Left TLDs
Take a look at the URL in the first paragraph (which goes to the Egyptian Ministry for Communications and Information Technology). After the http:// you’ll see the Misr (Egypt) TLD, followed by a period, and then the domain name. This makes sense because Arabic is written right-to-left, but it would be like reading the BBC’s URL as http://uk.co.bbc.www.
Of course, you can’t write out any old URL right-to-left – just those in certain scripts, such as Arabic. Which means that when it comes to parsing and displaying domain names, figuring out the script is an important first step to knowing whether the TLD shows up first or last on screen.
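Here’s a rough Python sketch of that first step. It assumes (as far as I can tell, this is how it works) that the labels are stored in the same order as any other domain, with the TLD last, and that only the display gets flipped. The helper names split_host, tld_of and is_rtl_label are made up for this post.

```python
import unicodedata

def split_host(hostname):
    """Split a hostname into labels in stored (logical) order."""
    return hostname.rstrip(".").split(".")

def tld_of(hostname):
    """Return the last label in stored order, whatever the display looks like."""
    return split_host(hostname)[-1]

def is_rtl_label(label):
    """True if any character has a right-to-left bidi class (R or AL)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in label)

if __name__ == "__main__":
    for host in ("موقع.وزارة-الأتصالات.مصر", "www.bbc.co.uk"):
        tld = tld_of(host)
        # Prints the TLD and whether it would be drawn right-to-left
        print(tld, is_rtl_label(tld))
```

A tester could use a check like is_rtl_label to decide which hostnames need a closer look at how the app draws them.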
New Opportunities for Phishing
The next problem is even worse, and there’s no good solution. If you open the first URL in Firefox, you’ll notice that the URL bar shows it as a long string of Latin text. Safari, on the other hand, displays it properly.
Why does Firefox break the URL? Because IDNs have the potential to be very dangerous for web security and phishing. As more languages are approved for IDNs by ICANN, the number of valid character sets will grow. This introduces conflicts with international characters that look very similar to Latin characters.
For example, Russian Cyrillic will be a huge problem according to this article from Mashable. The Russian letters р, а, and у are treated as totally different characters from the Latin p, a, and y. Conveniently, those three letters are enough to spell the first five characters of paypal, meaning the Cyrillic раураl.com is a totally different domain from paypal.com (copy and paste those two domains into your URL bar and you’ll see).
This opens a whole new approach for phishing attacks. For this reason, Firefox defaults to displaying IDNs as gibberish to help manage this confusion. Safari, on the other hand, tries to guess whether it should show the real text or gibberish. Either way, that’s only in the URL bar and not in actual links, meaning everyone has to be more careful.
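To see the look-alike problem for yourself, here’s a minimal Python sketch that flags a label mixing more than one script. It guesses the script from the first word of each character’s Unicode name, which is a shortcut I’m assuming is good enough for a quick test and not how any browser actually decides. The function names scripts_in and looks_mixed are hypothetical.

```python
import unicodedata

def scripts_in(label):
    """Guess the scripts in a label from the first word of each letter's Unicode name."""
    scripts = set()
    for ch in label:
        if ch.isalpha():
            # e.g. "CYRILLIC SMALL LETTER ER" -> "CYRILLIC"
            scripts.add(unicodedata.name(ch).split()[0])
    return scripts

def looks_mixed(label):
    """True if the label combines letters from more than one script."""
    return len(scripts_in(label)) > 1

if __name__ == "__main__":
    # Cyrillic r, a, u, r, a followed by a Latin l
    fake = "\u0440\u0430\u0443\u0440\u0430l"
    real = "paypal"
    print(fake == real)                    # False: different characters
    print(scripts_in(fake), looks_mixed(fake))
    print(scripts_in(real), looks_mixed(real))
```

Keep in mind this only catches mixed labels. A look-alike written entirely in one script would sail right through, so treat it as a starting point for test cases.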
Same Domain, Different Names
The fact that certain browsers handle these domains differently is yet another problem. The readable form of the Egyptian URL above is http://موقع.وزارة-الأتصالات.مصر/, but it can also be written as http://xn--4gbrim.xn----ymcbaaajlc6dj7bxne2c.xn--wgbh1c. Those are one and the same, even though they look entirely different.
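If you want to flip between the two forms while testing, Python’s built-in idna codec can do it. This is only a sketch: the built-in codec follows the older IDNA rules, so I’m assuming that’s close enough for the examples in this post, and the helpers to_ascii, to_unicode and same_domain are names I made up.

```python
def to_ascii(hostname):
    """Convert a hostname to its ASCII (xn--) form, label by label."""
    return hostname.encode("idna").decode("ascii")

def to_unicode(hostname):
    """Convert an ASCII (xn--) hostname back to its readable form."""
    return hostname.encode("ascii").decode("idna")

def same_domain(a, b):
    """Compare two hostnames by their lowercased ASCII forms."""
    return to_ascii(a).lower() == to_ascii(b).lower()

if __name__ == "__main__":
    tld = "\u0645\u0635\u0631"  # the Misr (Egypt) TLD from the ICANN announcement
    print(to_ascii(tld))
    print(to_unicode("xn--wgbh1c") == tld)
    print(same_domain(tld, "XN--WGBH1C"))
```

Comparing the ASCII forms is a handy way to check whether an app treats both spellings as the same domain.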
Software testing just got a lot more complicated, but here are a few ideas to get you started with IDNs:
- First, does it matter if the app handles IDNs at all? Not every app cares about URLs. Use good judgment before testing.
- Next, does the app handle international URLs with IDNs correctly? Feel free to use this URL as a test: http://موقع.وزارة-الأتصالات.مصر/
- Does the web app handle right-to-left domain names correctly? Again, use the URL above as a test.
- How does the app handle domains that could be phishing targets?
- Is the app able to differentiate between the international and Latin versions of a domain?
Did I forget anything? Let me know below.