Specials (Unicode block)

Unicode block containing some special code points and two noncharacters

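Specials is a short Unicode block allocated at the very end of the Basic Multilingual Plane, at U+FFF0–FFFF. Of these 16 code points, five have been assigned since Unicode 3.0; the last two are noncharacters: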

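  • U+FFF9 INTERLINEAR ANNOTATION ANCHOR, marks the start of the annotated text.
  • U+FFFA INTERLINEAR ANNOTATION SEPARATOR, marks the start of the annotating character(s).
  • U+FFFB INTERLINEAR ANNOTATION TERMINATOR, marks the end of the annotation block.
  • U+FFFC OBJECT REPLACEMENT CHARACTER, a placeholder in the text for another, unspecified object, for example in a compound document.
  • U+FFFD � REPLACEMENT CHARACTER, used to replace an unknown, unrecognized, or unrepresentable character.
  • U+FFFE <noncharacter-FFFE>, not a character.
  • U+FFFF <noncharacter-FFFF>, not a character.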
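
To illustrate how the assigned code points above tend to appear in practice, here is a minimal Python sketch. It shows U+FFFD being produced when a decoder is asked to substitute undecodable input, and a hypothetical helper, `strip_annotations`, that keeps the annotated base text and drops the annotating text. The helper names and the decision to handle only non-nested annotations are assumptions made for this example, not something prescribed by the Unicode Standard.

```python
import re

# The three interlinear annotation characters and the replacement character.
ANCHOR, SEPARATOR, TERMINATOR = "\uFFF9", "\uFFFA", "\uFFFB"
REPLACEMENT = "\uFFFD"

# One non-nested annotation: anchor, base text, separator, annotation(s), terminator.
_ANNOTATION = re.compile(
    f"{ANCHOR}([^{ANCHOR}{SEPARATOR}{TERMINATOR}]*)"
    f"{SEPARATOR}[^{ANCHOR}{TERMINATOR}]*{TERMINATOR}"
)


def strip_annotations(text: str) -> str:
    """Hypothetical helper: keep the annotated text, drop the annotating text."""
    return _ANNOTATION.sub(r"\1", text)


def decode_lossy(data: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for bytes that cannot be decoded."""
    return data.decode("utf-8", errors="replace")


if __name__ == "__main__":
    annotated = f"{ANCHOR}漢字{SEPARATOR}かんじ{TERMINATOR}"
    print(strip_annotations(annotated))
    print(REPLACEMENT in decode_lossy(b"ab\xffcd"))
```

Substituting U+FFFD is lossy, so a caller that needs to recover the original bytes would have to keep them separately.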
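Specials
Range: U+FFF0..U+FFFF (16 code points)
Plane: BMP (Basic Multilingual Plane)
Scripts: Common
Assigned: 5 code points
Unused: 9 reserved code points, 2 noncharacters
Unicode version history:
1.0.0: 1 (+1)
2.1: 2 (+1)
3.0: 5 (+3)
Note: [1][2]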

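FFFE and FFFF are not unassigned in the usual sense; rather, they are guaranteed never to be Unicode characters at all. They can be used to guess a text's encoding scheme, since any text containing them is by definition not correctly encoded Unicode text. Unicode's U+FEFF BYTE ORDER MARK character can be inserted at the beginning of a Unicode text to signal its endianness: a program that reads such a text and encounters 0xFFFE instead knows that it should swap the byte order of all the characters that follow.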
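
The byte-order check described above can be sketched in Python as follows. The function names are hypothetical, the sketch looks only at UTF-16 (it does not try to distinguish a UTF-32 little-endian BOM, which begins with the same two bytes), and falling back to big-endian when no BOM is present is an assumption of this example.

```python
from typing import Optional


def detect_utf16_byte_order(data: bytes) -> Optional[str]:
    """Return "big" or "little" based on a leading BOM, or None if there is none."""
    if len(data) < 2:
        return None
    # Read the first 16-bit unit as if the text were big-endian.
    first_unit = int.from_bytes(data[:2], "big")
    if first_unit == 0xFEFF:
        return "big"      # The BOM reads correctly, so the guess was right.
    if first_unit == 0xFFFE:
        return "little"   # Noncharacter U+FFFE: the bytes are swapped.
    return None


def decode_utf16(data: bytes) -> str:
    """Decode UTF-16 text, using the BOM (if any) to choose the byte order."""
    order = detect_utf16_byte_order(data)
    if order is None:
        # Assumption for this sketch: no BOM means big-endian.
        return data.decode("utf-16-be")
    codec = "utf-16-le" if order == "little" else "utf-16-be"
    return data[2:].decode(codec)


if __name__ == "__main__":
    little = "\ufeffhi".encode("utf-16-le")
    print(detect_utf16_byte_order(little), decode_utf16(little))
```

Because U+FFFE can never be a valid character, seeing it where a BOM is expected is an unambiguous signal that the byte order is reversed.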

In Unicode 1.0, the block was named Special.[3]

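Specials[1][2][3]
Official Unicode Consortium code chart (PDF)
|        | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9   | A   | B   | C   | D | E | F |
| U+FFFx | - | - | - | - | - | - | - | - | - | IAA | IAS | IAT | OBJ | � | # | # |
Notes
1.^ As of Unicode version 15.0.
2.^ Grey areas (shown here as "-") indicate unassigned code points.
3.^ Black areas (shown here as "#") indicate noncharacters (code points that are guaranteed never to be assigned as encoded characters in the Unicode Standard).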
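
The layout of the chart can be cross-checked programmatically. The following Python sketch groups the 16 code points using the standard `unicodedata` module; the function names are hypothetical, and the result depends on the Unicode database bundled with the interpreter (any version from Unicode 3.0 onward should agree for this block).

```python
import unicodedata


def is_noncharacter(cp: int) -> bool:
    """True for U+FDD0..U+FDEF and for the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE


def classify_specials() -> dict:
    """Group the code points U+FFF0..U+FFFF by status."""
    groups = {"assigned": [], "reserved": [], "noncharacter": []}
    for cp in range(0xFFF0, 0x10000):
        if is_noncharacter(cp):
            groups["noncharacter"].append(cp)
        elif unicodedata.category(chr(cp)) == "Cn":
            # General category Cn means the code point is not assigned.
            groups["reserved"].append(cp)
        else:
            groups["assigned"].append(cp)
    return groups


if __name__ == "__main__":
    for status, cps in classify_specials().items():
        print(status, len(cps), [f"U+{cp:04X}" for cp in cps])
```

The noncharacter test is done separately because the Unicode database gives noncharacters the same general category (Cn) as ordinary unassigned code points.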

History


The following Unicode-related documents record the purpose and process of defining specific characters in the Specials block:

| Version | Final code points[a] | Count | L2 ID | WG2 ID | Document |
| 1.0.0 | U+FFFD | 1 | | | (to be determined) |
| | U+FFFE..FFFF | 2 | | | (to be determined) |
| | | | L2/01-295R | | Moore, Lisa, Motion 88-M2, Minutes from the UTC/L2 meeting #88, 2001-11-06 |
| | | | L2/01-355 | N2369 (html, doc) | Davis, Mark, Request to allow FFFF, FFFE in UTF-8 in the text of ISO/IEC 10646, 2001-09-26 |
| | | | L2/02-154 | N2403 | Umamaheswaran, V. S., 9.3 Allowing FFFF and FFFE in UTF-8, Draft minutes of WG 2 meeting 41, Hotel Phoenix, Singapore, 2001-10-15/19, 2002-04-22 |
| 2.1 | U+FFFC | 1 | UTC/1995-056 | | Sargent, Murray, Recommendation to encode a WCH_EMBEDDING character, 1995-12-06 |
| | | | UTC/1996-002 | | Aliprand, Joan; Hart, Edwin; Greenfield, Steve, Embedded Objects, UTC #67 Minutes, 1996-03-05 |
| | | | | N1365 | Sargent, Murray, Proposal Summary – Object Replacement Character, 1996-03-18 |
| | | | | N1353 | Umamaheswaran, V. S.; Ksar, Mike, 8.14, Draft minutes of WG2 Copenhagen Meeting # 30, 1996-06-25 |
| | | | L2/97-288 | N1603 | Umamaheswaran, V. S., 7.3, Unconfirmed Meeting Minutes, WG 2 Meeting # 33, Heraklion, Crete, Greece, 20 June – 4 July 1997, 1997-10-24 |
| | | | L2/98-004R | N1681 | Text of ISO 10646 – AMD 18 for PDAM registration and FPDAM ballot, 1997-12-22 |
| | | | L2/98-070 | | Aliprand, Joan; Winkler, Arnold, Additional comments regarding 2.1, Minutes of the joint UTC and L2 meeting from the meeting in Cupertino, February 25-27, 1998 |
| | | | L2/98-318 | N1894 | Revised text of 10646-1/FPDAM 18, AMENDMENT 18: Symbols and Others, 1998-10-22 |
| 3.0 | U+FFF9..FFFB | 3 | L2/97-255R | | Aliprand, Joan, 3.D Proposal for In-Line Notation (ruby), Approved Minutes – UTC #73 & L2 #170 joint meeting, Palo Alto, CA – August 4-5, 1997, 1997-12-03 |
| | | | L2/98-055 | | Freytag, Asmus, Support for Implementing Inline and Interlinear Annotations, 1998-02-22 |
| | | | L2/98-070 | | Aliprand, Joan; Winkler, Arnold, 3.C.5. Support for implementing inline and interlinear annotations, Minutes of the joint UTC and L2 meeting from the meeting in Cupertino, February 25-27, 1998 |
| | | | L2/98-099 | N1727 | Freytag, Asmus, Support for Implementing Interlinear Annotations as used in East Asian Typography, 1998-03-18 |
| | | | L2/98-158 | | Aliprand, Joan; Winkler, Arnold, Inline and Interlinear Annotations, Draft Minutes – UTC #76 & NCITS Subgroup L2 #173 joint meeting, Tredyffrin, Pennsylvania, April 20-22, 1998, 1998-05-26 |
| | | | L2/98-286 | N1703 | Umamaheswaran, V. S.; Ksar, Mike, 8.14, Unconfirmed Meeting Minutes, WG 2 Meeting #34, Redmond, WA, USA; 1998-03-16--20, 1998-07-02 |
| | | | L2/98-270 | | Hiura, Hideki; Kobayashi, Tatsuo, Suggestion to the inline and interlinear annotation proposal, 1998-07-29 |
| | | | L2/98-281R (pdf, html) | | Aliprand, Joan, In-Line and Interlinear Annotation (III.C.1.c), Unconfirmed Minutes – UTC #77 & NCITS Subgroup L2 # 174 JOINT MEETING, Redmond, WA -- July 29-31, 1998, 1998-07-31 |
| | | | L2/98-363 | N1861 | Sato, T. K., Ruby markers, 1998-09-01 |
| | | | L2/98-372 | N1884R2 (pdf, doc) | Whistler, Ken; et al, Additional Characters for the UCS, 1998-09-22 |
| | | | L2/98-416 | N1882.zip | Support for Implementing Interlinear Annotations, 1998-09-23 |
| | | | L2/98-329 | N1920 | Combined PDAM registration and consideration ballot on WD for ISO/IEC 10646-1/Amd. 30, AMENDMENT 30: Additional Latin and other characters, 1998-10-28 |
| | | | L2/98-421R | | Suignard, Michel; Hiura, Hideki, Notes concerning the PDAM 30 interlinear annotation characters, 1998-12-04 |
| | | | L2/99-010 | N1903 (pdf, html, doc) | Umamaheswaran, V. S., 8.2.15, Minutes of WG 2 meeting 35, London, U.K.; 1998-09-21--25, 1998-12-30 |
| | | | L2/98-419 (pdf, doc) | | Aliprand, Joan, Interlinear Annotation Characters, Approved Minutes -- UTC #78 & NCITS Subgroup L2 # 175 Joint Meeting, San Jose, CA -- December 1-4, 1998, 1999-02-05 |
| | | | UTC/1999-021 | | Duerst, Martin; Bosak, Jon, W3C XML CG statement on annotation characters, 1999-06-08 |
| | | | L2/99-176R | | Moore, Lisa, W3C Liaison Statement on Annotation Characters, Minutes from the joint UTC/L2 meeting in Seattle, June 8-10, 1999, 1999-11-04 |
| | | | L2/01-301 | | Whistler, Ken, E. Indicated as "strongly discouraged" for plain text interchange, Analysis of Character Deprecation in the Unicode Standard, 2001-08-01 |
  a. ^ Proposed code points and character names may differ from the final code points and names.

References

  1. ^ Unicode character database. The Unicode Standard. Retrieved 2016-07-09. Archived from the original on 2022-09-25.
  2. ^ Enumerated Versions of The Unicode Standard. The Unicode Standard. Retrieved 2016-07-09. Archived from the original on 2016-06-29.
  3. ^ 3.8: Block-by-Block Charts (PDF). The Unicode Standard, version 1.0. Unicode Consortium. Retrieved 2022-09-30. Archived (PDF) from the original on 2016-02-11.