Haruka's Diary
Chasing After Rainbows: Non-ASCII link

20 February 2010

Non-ASCII link

Although support for this has been around for quite a while, I've rarely come across sites in non-Latin-script languages that use non-ASCII URLs or IDNs (internationalized domain names). Part of the reason is spoofing concerns: the Cyrillic "а", for example, looks virtually identical to the Latin "a" and can mislead people. As a result, the feature is often unsupported or not enabled by default.
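To make the spoofing concern concrete, here is a minimal sketch of how two look-alike names differ underneath. The `scripts_in` helper and the "example" label are my own illustration, and guessing the script from the first word of a character's Unicode name is only a rough heuristic, not how any browser actually decides.

```python
import unicodedata

def scripts_in(label):
    """Rough script guess: first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}

latin = "example"
spoof = "ex\u0430mple"  # third letter is CYRILLIC SMALL LETTER A, not Latin "a"

print(latin == spoof)             # False: they only look the same
print(sorted(scripts_in(latin)))  # ['LATIN']
print(sorted(scripts_in(spoof)))  # ['CYRILLIC', 'LATIN']
```

A label that mixes scripts like this is the kind of thing a browser might refuse to display in its native form.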

Anyway, on my main blog in Japanese, you will notice that the URL is
http://虹を追いかけて.blogspot.com/
in WebKit-based browsers (Safari, Chrome). In browsers that don't support it, or that have it disabled for the reasons mentioned in the previous paragraph, you are most likely to see
http://xn--n8jos8fqkx000ci6n.blogspot.com/
instead.
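The two addresses above are the same domain in two forms. As a small sketch, Python's built-in "idna" codec (which implements the older IDNA 2003 rules, so other tools may differ for some characters) can convert between them:

```python
# Native form of the host name, as shown in the address bar.
host = "虹を追いかけて.blogspot.com"

# Each non-ASCII label becomes "xn--" followed by its Punycode form.
encoded = host.encode("idna").decode("ascii")
print(encoded)  # xn--n8jos8fqkx000ci6n.blogspot.com

# Decoding goes the other way and recovers the native form.
decoded = encoded.encode("ascii").decode("idna")
print(decoded == host)  # True
```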

"But how did you get the name there as the actual domain?" is what you're probably asking. Non-ASCII text that comes after the domain name (in the path) uses a completely different encoding from the above, like
http://ja.wikipedia.org/wiki/%E8%99%B9%E3%82%92%E8%BF%BD%E3%81%84%E3%81%8B%E3%81%91%E3%81%A6
for
http://ja.wikipedia.org/wiki/虹を追いかけて
which you might see when hovering over a link in, e.g., Firefox (of course, the article in question does not exist). It's obviously longer, but that's not what I'm talking about here.
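For comparison, the longer form used in the path is percent-encoding of the UTF-8 bytes. A minimal sketch with Python's standard `urllib.parse` (the variable names are just for illustration):

```python
from urllib.parse import quote, unquote

title = "虹を追いかけて"

# Each character becomes three UTF-8 bytes, each written as %XX.
escaped = quote(title)
print("http://ja.wikipedia.org/wiki/" + escaped)
# http://ja.wikipedia.org/wiki/%E8%99%B9%E3%82%92%E8%BF%BD%E3%81%84%E3%81%8B%E3%81%91%E3%81%A6

# unquote reverses it.
print(unquote(escaped) == title)  # True
```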

As you might know, converting a non-ASCII URL is simple:

Internet Explorer 8 or Firefox 3.6:
Enter the URL you want and hit enter. If you are using IE8, the converted URL might appear for only a second before redirecting to your default search engine.
You should see the converted address, or an error message with that same URL. "www." might be added automatically.

In IE8, you can check both the native and encoded addresses and even what type of characters were used.

Chrome:
The encoded URL appears as you type it.
Yes, it's that simple in Chrome. Anyway, copy that encoded URL, edit out the unwanted parts (if necessary), register it, and you're done!
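If you'd rather not rely on a browser for the copy-and-edit step, it can be sketched in a few lines. The `encoded_host` helper and the "www."-prefixed input are hypothetical, and this again uses Python's IDNA 2003 codec, so treat it as a rough equivalent rather than exactly what the browsers do.

```python
from urllib.parse import urlsplit

def encoded_host(typed_url):
    """Return the ASCII (xn--) host for a typed URL, without a leading 'www.'."""
    host = urlsplit(typed_url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]  # the "unwanted part" a browser may add
    return host.encode("idna").decode("ascii")

print(encoded_host("http://www.虹を追いかけて.blogspot.com/"))
# xn--n8jos8fqkx000ci6n.blogspot.com
```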

I would like to see this take off, but I don't want people using Cyrillic characters to mislead others into thinking they are the look-alike Latin characters. Also, can we hide "http://"? The person who invented it didn't want it to appear either. "www." is also redundant and a mouthful to say.

(Cross-posted from the technology blog.)


My profile

中野区, 東京都, Japan
Returnee who grew up abroad; fluent in English. My hobbies are anime, manga, and editing program code. I usually write my novel in English. Grew up abroad & travelled to different countries. I write my own fictional novel on my blog.