Character codes

Both Perl 5 and Raku have good Unicode support, though Raku attempts to make working with Unicode effortless. Note that even multi-byte emoji and characters outside the BMP are considered single characters. Also note: all of these routines are built into the base compiler. No need to load external libraries. See Wikipedia: Unicode character properties for explanation of Unicode property.

for 'AΑАπͺš₯πŸ‡ΊπŸ‡ΈπŸ‘¨β€πŸ‘©β€πŸ‘§β€πŸ‘¦'.comb {
    .put for
    [ 'Character',
      'Character name',
      'Unicode property',
      'Unicode script',
      'Unicode block',
      'Added in Unicode version',
      'Ordinal(s)',
      'Hex ordinal(s)',
      'UTF-8',
      'UTF-16LE',
      'UTF-16BE',
      'Round trip by name',
      'Round trip by ordinal'
    ]Β».fmt('%25s:')
    Z
    [ $_,
      .uninames.join(', '),
      .uniprops.join(', '),
      .uniprops('Script').join(', '),
      .uniprops('Block').join(', '),
      .uniprops('Age').join(', '),
      .ords,
      .ords.fmt('0x%X'),
      .encode('utf8'   )Β».fmt('%02X'),
      .encode('utf16le')Β».fmt('%02X').join.comb(4),
      .encode('utf16be')Β».fmt('%02X').join.comb(4),
      .uninamesΒ».uniparse.join,
      .ords.chrs
    ];
    say '';
}

Output:

                Character: A
           Character name: LATIN CAPITAL LETTER A
         Unicode property: Lu
           Unicode script: Latin
            Unicode block: Basic Latin
 Added in Unicode version: 1.1
               Ordinal(s): 65
           Hex ordinal(s): 0x41
                    UTF-8: 41
                 UTF-16LE: 4100
                 UTF-16BE: 0041
       Round trip by name: A
    Round trip by ordinal: A

                Character: Ξ‘
           Character name: GREEK CAPITAL LETTER ALPHA
         Unicode property: Lu
           Unicode script: Greek
            Unicode block: Greek and Coptic
 Added in Unicode version: 1.1
               Ordinal(s): 913
           Hex ordinal(s): 0x391
                    UTF-8: CE 91
                 UTF-16LE: 9103
                 UTF-16BE: 0391
       Round trip by name: Ξ‘
    Round trip by ordinal: Ξ‘

                Character: А
           Character name: CYRILLIC CAPITAL LETTER A
         Unicode property: Lu
           Unicode script: Cyrillic
            Unicode block: Cyrillic
 Added in Unicode version: 1.1
               Ordinal(s): 1040
           Hex ordinal(s): 0x410
                    UTF-8: D0 90
                 UTF-16LE: 1004
                 UTF-16BE: 0410
       Round trip by name: А
    Round trip by ordinal: А

                Character: πͺš₯
           Character name: CJK UNIFIED IDEOGRAPH-2A6A5
         Unicode property: Lo
           Unicode script: Han
            Unicode block: CJK Unified Ideographs Extension B
 Added in Unicode version: 3.1
               Ordinal(s): 173733
           Hex ordinal(s): 0x2A6A5
                    UTF-8: F0 AA 9A A5
                 UTF-16LE: 69D8 A5DE
                 UTF-16BE: D869 DEA5
       Round trip by name: πͺš₯
    Round trip by ordinal: πͺš₯

                Character: πŸ‡ΊπŸ‡Έ
           Character name: REGIONAL INDICATOR SYMBOL LETTER U, REGIONAL INDICATOR SYMBOL LETTER S
         Unicode property: So, So
           Unicode script: Common, Common
            Unicode block: Enclosed Alphanumeric Supplement, Enclosed Alphanumeric Supplement
 Added in Unicode version: 6.0, 6.0
               Ordinal(s): 127482 127480
           Hex ordinal(s): 0x1F1FA 0x1F1F8
                    UTF-8: F0 9F 87 BA F0 9F 87 B8
                 UTF-16LE: 3CD8 FADD 3CD8 F8DD
                 UTF-16BE: D83C DDFA D83C DDF8
       Round trip by name: πŸ‡ΊπŸ‡Έ
    Round trip by ordinal: πŸ‡ΊπŸ‡Έ

                Character: πŸ‘¨β€πŸ‘©β€πŸ‘§β€πŸ‘¦
           Character name: MAN, ZERO WIDTH JOINER, WOMAN, ZERO WIDTH JOINER, GIRL, ZERO WIDTH JOINER, BOY
         Unicode property: So, Cf, So, Cf, So, Cf, So
           Unicode script: Common, Inherited, Common, Inherited, Common, Inherited, Common
            Unicode block: Miscellaneous Symbols and Pictographs, General Punctuation, Miscellaneous Symbols and Pictographs, General Punctuation, Miscellaneous Symbols and Pictographs, General Punctuation, Miscellaneous Symbols and Pictographs
 Added in Unicode version: 6.0, 1.1, 6.0, 1.1, 6.0, 1.1, 6.0
               Ordinal(s): 128104 8205 128105 8205 128103 8205 128102
           Hex ordinal(s): 0x1F468 0x200D 0x1F469 0x200D 0x1F467 0x200D 0x1F466
                    UTF-8: F0 9F 91 A8 E2 80 8D F0 9F 91 A9 E2 80 8D F0 9F 91 A7 E2 80 8D F0 9F 91 A6
                 UTF-16LE: 3DD8 68DC 0D20 3DD8 69DC 0D20 3DD8 67DC 0D20 3DD8 66DC
                 UTF-16BE: D83D DC68 200D D83D DC69 200D D83D DC67 200D D83D DC66
       Round trip by name: πŸ‘¨β€πŸ‘©β€πŸ‘§β€πŸ‘¦
    Round trip by ordinal: πŸ‘¨β€πŸ‘©β€πŸ‘§β€πŸ‘¦

Last updated