1、First, unlike UTF-16, UTF-8 has no endianness issues.
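The endianness point can be seen by encoding the same text three ways. This is only a small sketch in Python; the sample string is an arbitrary choice for illustration, not something taken from the sentence above.

```python
# Encode the same two characters (U+0041 and U+20AC) in three encodings.
text = "A\u20ac"

utf8 = text.encode("utf-8")     # one byte sequence, no byte-order variants
be = text.encode("utf-16-be")   # big-endian: high byte of each unit first
le = text.encode("utf-16-le")   # little-endian: low byte of each unit first

print(utf8.hex(" "))  # 41 e2 82 ac
print(be.hex(" "))    # 00 41 20 ac
print(le.hex(" "))    # 41 00 ac 20
```

UTF-8 is defined as a sequence of single bytes, so there is only one possible serialization; UTF-16 works in 16-bit units, and the two bytes of each unit can be stored in either order.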

2、The input file should be encoded in UTF-8 or UTF-16 format.

3、There are others (UTF-16 and UTF-32, for example) defined by the Unicode Consortium, but UTF-8 is the best-supported encoding for international character sets.

4、And unlike UTF-16 and UTF-32, it is backwards-compatible with ASCII, which is one reason UTF-8 is increasingly becoming the default encoding for e-mail and websites.

5、UTF-16: 16-bit UCS Transformation Format, byte order identified by marker.

6、The W3C wisely explains, "In other situations, such as for APIs, UTF-16 or UTF-32 may be more appropriate."

7、UTF-8 was chosen as the default format for character data columns, with UTF-16 for graphic data columns.

8、Even documents that use the default UTF-8 and UTF-16 encodings should have such a declaration.

9、All three of these may or may not be preceded by a Unicode byte order mark in either UTF-8, big-endian UTF-16, or little-endian UTF-16.
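A byte order mark check for the three cases named above might look like the following sketch. The function name `detect_bom` is hypothetical, and the code deliberately ignores UTF-32, whose little-endian mark begins with the same two bytes as the UTF-16 little-endian mark.

```python
import codecs

def detect_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no mark is found.

    Hypothetical helper: only the UTF-8 and UTF-16 marks are checked.
    """
    candidates = (
        (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
        (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
        (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    )
    for bom, name in candidates:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0

print(detect_bom(b"\xef\xbb\xbf<a/>"))                # ('utf-8', 3)
print(detect_bom("\ufeff<a/>".encode("utf-16-be")))   # ('utf-16-be', 2)
print(detect_bom(b"<a/>"))                            # (None, 0)
```

A caller would typically skip the first `bom_length` bytes and decode the rest with the returned encoding, falling back to some default when no mark is present.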

10、The DOMString type is explicitly specified to consist of wide UTF-16 characters.

11、Numeric inputs are converted according to the UTF-16 encoding for characters.

12、The character is specified as one or two UTF-16 code units in hexadecimal notation.
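As an illustration of "one or two UTF-16 code units in hexadecimal notation", the sketch below computes the code units for a single character by hand. The function name `utf16_code_units` is an assumption made for this example; it expects a one-character string holding a valid code point (not a lone surrogate).

```python
def utf16_code_units(ch: str) -> list:
    """Return the UTF-16 code units of one character as 4-digit hex strings."""
    cp = ord(ch)
    if cp < 0x10000:
        # Characters in the Basic Multilingual Plane fit in a single unit.
        return [f"{cp:04X}"]
    # Supplementary characters are split into a surrogate pair.
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return [f"{high:04X}", f"{low:04X}"]

print(utf16_code_units("\u00e9"))      # ['00E9']
print(utf16_code_units("\U0001F600"))  # ['D83D', 'DE00']
```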

13、The DOMString type is explicitly specified as containing wide UTF-16 characters.

14、A number of different encoding schemes are used for this purpose: UTF-8, UTF-16, ISO-8859-1, Cp1252, SJIS, and many others.

15、UTF-8 is less likely than UTF-16 or other Unicode encodings to cause problems for systems that are unaware of Unicode and XML.

16、But even when you're encoding CJK XML in UTF-8, the actual size gain compared to UTF-16 probably isn't so large.
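One way to check the size difference is simply to measure it. The fragment below is a made-up sample, not taken from any real document, and the result depends heavily on the ratio of ASCII markup to CJK text.

```python
# Hypothetical XML fragment: 15 ASCII markup characters and 8 CJK characters.
sample = "<title>日本語のテキスト</title>"

utf8_size = len(sample.encode("utf-8"))       # markup 1 byte each, CJK 3 bytes each
utf16_size = len(sample.encode("utf-16-le"))  # every character here takes 2 bytes

print(utf8_size)   # 39
print(utf16_size)  # 46
```

In this small sample the ASCII markup offsets the extra byte per CJK character; a document with long runs of CJK text and little markup would lean the other way.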

17、Xerces-C++ uses this larger character representation to exchange text as UTF-16 as opposed to UTF-8 or ISO-8859.

18、UTF-8 can also be browsed or read by almost all text-processing tools, many of which would have problems with UTF-16.

19、This method completely ignores all the encoding information available, and the returned string is always encoded in UTF-16.

20、In addition, there is another encoding scheme called UTF-16 that can also be used to represent supplementary characters.

21、In that case, it is recommended to use MemBufFormatTarget instead, for receiving an encoded string other than UTF-16.

22、This paper presents a "Fake UTF-16" coding algorithm, so that all XML parsers can handle GB code in an easy and universal fashion.

23、The UTF-16 encoding alleviates some of this penalty because each character is specified using two bytes, assuming no surrogate characters.

24、Google doesn't even allow alternate encodings of Unicode such as UTF-16, much less non-Unicode encodings like ISO-8859-1.

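25、With supplementary characters, a single supplementary character counts as two UTF-16 code units when measured in CODEUNITS16, but as one UTF-32 code unit when measured in CODEUNITS32.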
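The difference between the two counting units can be imitated outside the database. The sketch below uses Python encodings as a stand-in for CODEUNITS16 and CODEUNITS32; it does not call any database function, and the sample string is an assumption.

```python
# "a" is a BMP character; U+1F600 is a supplementary character.
s = "a\U0001F600"

code_units_16 = len(s.encode("utf-16-le")) // 2  # 2 bytes per UTF-16 code unit
code_units_32 = len(s.encode("utf-32-le")) // 4  # 4 bytes per UTF-32 code unit

print(code_units_16)  # 3  (1 for "a", 2 for the supplementary character)
print(code_units_32)  # 2  (1 for each character)
```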

26、For example, if UTF-16 data is loaded as-is into a C string, the string may be truncated at the second byte of the first ASCII character.
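The truncation comes from the zero byte that UTF-16 uses to pad ASCII characters. The sketch below imitates a NUL-terminated C string in Python by cutting the buffer at the first zero byte; it is a simulation of what `strlen`-style code would see, not actual C code.

```python
# "AB" in little-endian UTF-16 is 41 00 42 00.
data = "AB".encode("utf-16-le")

# A C string ends at the first NUL byte, so only the bytes before it survive.
as_c_string = data.split(b"\x00", 1)[0]

print(data)         # b'A\x00B\x00'
print(as_c_string)  # b'A'
```

With big-endian UTF-16 the zero byte comes first, so the same treatment would yield an empty string.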

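27、Because each character is specified using two bytes, assuming there are no surrogate characters, the UTF-16 encoding mitigates this performance penalty to some extent.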