How does the text decoder know which text encoder was used?

When I create a text file on Windows using Notepad and hit save, I get to choose the text encoding.

(Screenshot: encoder options)

Suppose I saved this file using UTF-16 LE encoding and later sent it to a friend.

How will the decoder at his end (on his computer) know which encoder was used, so that it can decode the file correctly?

Answer 1

The decoder does not know. It can only make an educated guess by analyzing some or all of the bytes in the file. Some guesses are more reliable than others, especially when the encoding is as distinctive as UTF-16. Otherwise, an encoding can be ruled out if decoding with it produces invalid or unknown characters.
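To make the guessing concrete, here is a minimal sketch in Python of the kind of heuristic a decoder might apply. It is not how Notepad or any particular editor works. The helper name `guess_encoding` is hypothetical, and the fallback to Latin-1 is an assumption chosen only because that codec accepts every byte value. The sketch first looks for a byte order mark (BOM), the short signature that many editors write at the start of UTF-16 and UTF-32 files, and then rules out UTF-8 if strict decoding fails.

```python
import codecs

# Byte order marks, checked longest-first so that the UTF-32 LE mark
# (FF FE 00 00) is not mistaken for the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes) -> str:
    """Return a best-guess Python codec name for data (hypothetical helper)."""
    # Step 1: a BOM is the strongest hint, but it is still only a hint.
    for bom, codec_name in BOMS:
        if data.startswith(bom):
            return codec_name
    # Step 2: rule UTF-8 in or out by attempting a strict decode.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: fall back to Latin-1, which never fails to decode,
        # although the resulting text may be wrong.
        return "latin-1"


# Simulate a file saved as UTF-16 LE with a BOM.
saved = codecs.BOM_UTF16_LE + "Hello, friend".encode("utf-16-le")
guessed = guess_encoding(saved)
print(guessed, saved.decode(guessed))
```

If the file had been saved without a BOM, this sketch would fall through to the trial-decoding step, which is where guesses become less reliable.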

There is a frequent case where ambiguity remains: ASCII text is also valid UTF-8, because UTF-8 was designed to be backward compatible with ASCII. Many editors can be set to treat such files either as UTF-8 or as ASCII. Notepad on Windows 10 appears to treat them as UTF-8.
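The ASCII/UTF-8 overlap can be checked directly. The following short Python sketch (the variable names are illustrative only) shows that pure ASCII text produces identical bytes under both encodings, so the bytes alone cannot reveal which one the author chose. The ambiguity disappears only once a non-ASCII character appears.

```python
text = "Hello, world!"

# Pure ASCII text encodes to exactly the same bytes in ASCII and UTF-8.
as_ascii = text.encode("ascii")
as_utf8 = text.encode("utf-8")
same_bytes = as_ascii == as_utf8

# Either decoder recovers the same string from those bytes.
same_text = as_ascii.decode("utf-8") == as_utf8.decode("ascii") == text

# A non-ASCII character breaks the tie: the UTF-8 bytes are no longer
# valid ASCII, so ASCII can be ruled out.
accented = "café".encode("utf-8")
try:
    accented.decode("ascii")
    ascii_possible = True
except UnicodeDecodeError:
    ascii_possible = False

print(same_bytes, same_text, ascii_possible)
```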
