Phnom Penh Post - Khmer Unicode creator developing ‘AI’ spell check

A man using Nextspell on his PC to look for errors. Hong Menea

In 2001, Dahn Hong developed Khmer Unicode, which remains the basis for most Khmer language fonts in use today. He designed the Unicode – a standardised system which assigns a unique number to each character – in response to frustration with the Limon font, a much earlier version, which was developed in 1994. Limon was difficult to employ, particularly in the age of smartphones.

Not content with providing a clean, dependable source for nearly all of the Khmer typography in use, he has set his sights on reliable spellchecking software.

Hong, a former law student, was inspired to devise the program after observing a rise in misspelled words in the Kingdom’s national language.

He began working on the Nextspell app in 2019.

Most spellchecking software uses optical character recognition – the process that converts an image of text into a machine-readable text format. This meant that a system needed to be developed that would recognise Khmer text.

“English correction is simple in most well known programs, such as Microsoft’s Office suite. A person can make corrections with a click of a button, misspelled words are underlined automatically, and new words can be added to the program’s dictionary. Nextspell is similar in some ways, but it employs AI to store common corrections in a large central database,” said Hong.

The more people use the program, the more accurate it will become. Google Translate is an excellent example of this progress. As little as 10 years ago, its Khmer-to-English translations were often unintelligible. Now, it is far more accurate.

“As long as data is uploaded on a regular basis, it will continue to improve,” he said.

Ultimately, his goal is to enable the app to correct misspelled words automatically, although this depends on the amount of data that is added to the app’s database.

“Misspelled words are underlined in red, and if the database contains an alternative, it will make the correction, and underline it in blue. If there is no alternative available in the database, all the program can do is highlight it,” said Hong.
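The behaviour Hong describes can be illustrated with a minimal sketch. It is not Nextspell's actual code: the function name, the dictionary set and the corrections table are all hypothetical, the text is assumed to be already split into words, and English words stand in for Khmer for readability.

```python
# Hypothetical sketch of the described behaviour; not Nextspell's real code.
RED, BLUE = "red", "blue"


def check_words(words, dictionary, corrections):
    """Return (word, underline) pairs for a list of pre-segmented words.

    dictionary  -- set of correctly spelled words (assumed)
    corrections -- mapping of known misspellings to their fixes (assumed)
    """
    result = []
    for word in words:
        if word in dictionary:
            # Spelled correctly: left as-is, no underline.
            result.append((word, None))
        elif word in corrections:
            # A known alternative exists: apply it and mark it blue.
            result.append((corrections[word], BLUE))
        else:
            # No alternative available: the word can only be flagged in red.
            result.append((word, RED))
    return result


checked = check_words(
    ["the", "recieve", "xqzt"],
    dictionary={"the", "receive"},
    corrections={"recieve": "receive"},
)
```

In this sketch, a larger corrections table means fewer words fall through to the red-only case, which matches Hong's point that the app improves as more data is added.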

He explained that the Khmer script is one of the most complicated in the world, and regular updates are necessary.

“We depend on the dictionary devised by supreme patriarch Chuon Nath in 1938, but there are modern terms that need to be added, and some words that still need to be standardised,” he said.

Hong is unaware of the exact number of people who use his app, but explained that the free version of the app limited each user to 200 words. Regular users could access up to 3,000 words, for just $12 a year.

“There are many more free users than professionals, but this is common with most apps,” he said.

In addition to Android and Apple apps, Nextspell can be accessed for free via any browser.

Users simply visit the Nextspell website, register for a free account, and then copy and paste Khmer text onto the page.

“The program is becoming more accurate every day. Obviously, the most common words are the ones it identifies most quickly, but its database is constantly improving its base of knowledge,” said Hong.

Although his Unicode remains in wide use, he did not profit from it. Nextspell provides a modest income from app sales, and his work with the Khmer language led to him assisting the government of Laos to create their own Unicode.

“Khmer vowels and syllables are far more complex than English ones, so this program is essential. I am also working on developing fonts for children. The Khmer language must remain relevant in the digital age,” he said.
