vietnamese
Writing Vietnamese is a pain. You have several different input methods to choose from (VNI, Telex, VIQR, etc.), and they all take some effort to memorize and get used to. But as it turns out, most Vietnamese text can be understood without those accent (diacritic) marks at all. Vietnamese people are accustomed to texting (SMS) each other using words written without marks, because their phones often don't support Vietnamese characters. Still, it's better to have the marks, and emails and forum posts often do. Gmail even has a Vietnamese software keyboard built into the interface now.
But if Vietnamese people can understand Vietnamese without diacritics, can computers? Turns out there is software that can take unaccented Vietnamese text and ADD the diacritics!
Let's take some text:
Chuyến phiêu lưu khám phá bỏ nhà ra đi đầu tiên của bạn vào năm bao nhiêu tuổi? - Nghĩ lại thì thấy hồi xưa mỗi lần mà bị mẹ la là hay giận, bỏ nhà đi lắm, vì lúc nào cũng nghĩ mình đúng hết. Giận thì giận nhưng mà đi lang thang rồi lại về, hoặc về trong tình trạng được tìm thấy và lại tiếp tục bị mắng :D
Then we strip the accents and put it into a few websites to see the results.
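The stripping step can be sketched in a few lines of Python. This is only a guess at one way to do it, not necessarily how the text above was actually de-accented; the function name `strip_vietnamese_marks` is made up for illustration. It relies on Unicode decomposition (NFD) to split each letter from its marks, and handles đ/Đ by hand because that letter does not decompose into "d" plus a combining mark.

```python
import unicodedata


def strip_vietnamese_marks(text: str) -> str:
    """Return text with Vietnamese tone and vowel marks removed (hypothetical helper)."""
    # đ/Đ is a letter of its own with no canonical decomposition, so map it explicitly.
    text = text.replace("đ", "d").replace("Đ", "D")
    # NFD splits e.g. "ế" into "e" + combining circumflex + combining acute.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", stripped)


print(strip_vietnamese_marks("Chuyến phiêu lưu khám phá bỏ nhà ra đi đầu tiên"))
```

Going the other way is the hard part: "la", "là", "lạ" and "lá" all collapse to the same unmarked "la", so the tools below have to guess from context.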
http://vietnameseaccent.com/
Chuyến phiêu lưu khám phá bỏ nhà ra đi đầu tiên của bạn vào năm bao nhiêu tuổi?
- Nghĩ lại thì thấy hồi xưa mỗi lần mà bị mẹ là lạ hay gián, bỏ nhà đi làm, vì lúc nào cũng nghĩ mình dùng hết. Gian thi giản nhưng mà đi lang thang rồi lại về, hoặc vê tròn
http://vietlabs.com/vietizer.html
chuyến phiêu lưu khám phá bỏ nhà ra đi đầu tiên của bạn vào năm bao nhiêu tuổi?
- nghĩ lại thì thấy hồi xưa mỗi lần mà bị mẹ là là hay gian, bộ nhà đi làm, vì lúc nào cũng nghĩ mình đứng hết.
gian thì giản nhưng mà đi lang thang rồi lại về, hoặc về trọn
http://www.easyvn.com/tiengviet/index.php
Chuyến phiêu lưu khám phá bỏ nhà ra đi đầu tiên của bạn vào năm bao nhiêu tuổi?
- Nghĩ lại thì thấy hồi xưa mỗi lần mà bị mẹ là là hay giận, bỏ nhà đi lắm, vì lúc nào cũng nghĩ mình đúng het. Gian thì giận nhưng mà đi lang thang rồi lại về, hoặc về tron
The results are nearly the same, apart from the last word: 'tron'. The real word is "trong", but it got cut off in the de-accenting process! So each tool took a different guess at what the word was, but it was the wrong word to begin with.
All in all, they do a pretty good job, and probably a better one than even some native Vietnamese speakers, who sometimes mix up tones!
- tomo's blog
Inspired by XKCD, this is a password generator for those of you who know English and Vietnamese or another language. Once a random set of words in your languages has been generated, images for those words will be shown to help you visually remember your new password. If the random password seems too hard to remember, you can always spin the wheel a second time!
Each time you click, 4 random words from the selected languages will be loaded. I chose the number 4 so as not to overload Google Image search, so you may want to run it twice to get 5 or more words for added security.
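The word-picking step might look something like the sketch below. It is not the generator's actual code: the two word lists are tiny made-up samples, and `random_passphrase` is a hypothetical name. The one real design point is using Python's `secrets` module rather than `random`, since passwords should come from a cryptographically secure source.

```python
import secrets

# Tiny made-up sample lists; a real generator would load a few thousand words per language.
ENGLISH = ["correct", "horse", "battery", "staple", "river", "window", "garden", "paper"]
VIETNAMESE = ["nhà", "bạn", "tuổi", "mẹ", "sách", "nước", "trời", "đường"]


def random_passphrase(word_lists, count=4):
    """Pick `count` words uniformly at random from the combined lists (hypothetical helper)."""
    # Merge the selected languages into one pool, dropping duplicates.
    pool = sorted({word for words in word_lists for word in words})
    # secrets.choice draws from the operating system's secure random source.
    return " ".join(secrets.choice(pool) for _ in range(count))


print(random_passphrase([ENGLISH, VIETNAMESE]))
```

Passing `count=5` or more does the same thing as clicking twice and keeping the extra words.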
If you still want a password like "!Agt:m%p>" then it's also an option below.
Your Random Password
The other day there was an XKCD strip about password security. The idea is that we've been trained over the years to use passwords like 'Tr0ub4dor&3' because they mix upper and lower case, numbers, and special characters. But a password like that is based on a common English word with a common substitution pattern (l33tsp34k) of numbers for letters, and it is much easier for a hacker to guess than four random words like 'correct horse battery staple', which is longer but much easier to remember.
A good password should be random. Humans aren't random: 'Tr0ub4dor' looks random enough, but it isn't. Even translating the word into a foreign language is weak by itself. Generally, if you come up with the password yourself, it's not anything close to random.
Plenty of software exists to come up with passwords made up of random characters. The problem is that these passwords weren't meant to be memorized. Writing your password down somewhere sort of defeats the purpose.
So four random English words make a pretty good password, but one that is still hard to remember if the words are obscure and unfamiliar. Out of the more than one hundred thousand words in an English dictionary, only a few thousand are commonly used.
So a few thousand English words are generally useful. But those of us who are bilingual can basically double the size of the vocabulary used! This foreign language random password generator seeks to take advantage of that numerical edge. With a large number of possible languages (and even more language combinations), even if a hacker got hold of an encrypted password file, the password could be about as hard to crack as a random 9-character string that is totally impossible to remember.
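To put rough numbers on that comparison, here is a back-of-the-envelope entropy calculation. The figures are my assumptions, not measurements: about 2,000 familiar words per language, 94 printable ASCII characters for the random string, and every word or character picked uniformly at random.

```python
import math


def entropy_bits(choices: int, length: int) -> float:
    """Bits of entropy when `length` items are each drawn uniformly from `choices` options."""
    return length * math.log2(choices)


WORDS_PER_LANGUAGE = 2000  # assumed size of a "commonly used" vocabulary
PRINTABLE_ASCII = 94       # assumed character set for a random-character password

print(round(entropy_bits(WORDS_PER_LANGUAGE, 4), 1))      # four words, one language: 43.9
print(round(entropy_bits(2 * WORDS_PER_LANGUAGE, 4), 1))  # four words, two languages: 47.9
print(round(entropy_bits(2 * WORDS_PER_LANGUAGE, 5), 1))  # five words, two languages: 59.8
print(round(entropy_bits(PRINTABLE_ASCII, 9), 1))         # nine random characters: 59.0
```

Under these assumptions, doubling the vocabulary adds one bit per word, and it takes a fifth word (or a larger pool of languages) to match the 9-character random string.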
You can increase the security of your password further by using a "salt" random string (non-dictionary word) that you remember and always use with your passwords, and by adding punctuation in one of the words.
UPDATE: There is now a Chrome extension that makes creating passwords on the fly really fast and easy! Check out the Correct Horse Battery Staple Google Chrome Extension