Raku works with unicode natively and handles combining characters and multi-byte emoji correctly. In the last string, notice the the length is correctly shown as 11 characters and that the delta with a combining circumflex in position 6 is not the same as the deltas without in positions 5 & 9.
Copy -> $str {
my $i = 0;
print "\n{$str.raku} (length: {$str.chars}), has " ;
my %m = $str.comb.Bag;
if any(%m. values ) > 1 {
say "duplicated characters:" ;
say "'{.key}' ({.key.uninames}; hex ordinal: {(.key.ords).fmt: " 0x%X "})" ~
" in positions: {.value.join: ', '}" for %m. grep ( *.value > 1 ). sort ( *.value[0] );
} else {
say "no duplicated characters."
}
} for
'' ,
'.' ,
'abcABC' ,
'XYZ ZYX' ,
'1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ' ,
'01234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ0X' ,
'๐ฆ๐๐จโ๐ฉโ๐งโ๐ฆ๐ฮฮฬ ๐ฆฮ๐๐จโ๐ฉโ๐งโ๐ฆ'
Copy "" (length: 0), has no duplicated characters.
"." (length: 1), has no duplicated characters.
"abcABC" (length: 6), has no duplicated characters.
"XYZ ZYX" (length: 7), has duplicated characters:
'X' (LATIN CAPITAL LETTER X; hex ordinal: 0x58) in positions: 1, 7
'Y' (LATIN CAPITAL LETTER Y; hex ordinal: 0x59) in positions: 2, 6
'Z' (LATIN CAPITAL LETTER Z; hex ordinal: 0x5A) in positions: 3, 5
"1234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ" (length: 36), has duplicated characters:
'0' (DIGIT ZERO; hex ordinal: 0x30) in positions: 10, 25
"01234567890ABCDEFGHIJKLMN0PQRSTUVWXYZ0X" (length: 39), has duplicated characters:
'0' (DIGIT ZERO; hex ordinal: 0x30) in positions: 1, 11, 26, 38
'X' (LATIN CAPITAL LETTER X; hex ordinal: 0x58) in positions: 35, 39
"๐ฆ๐๐จโ๐ฉโ๐งโ๐ฆ๐ฮฮฬ ๐ฆฮ๐๐จโ๐ฉโ๐งโ๐ฆ" (length: 11), has duplicated characters:
'๐ฆ' (BUTTERFLY; hex ordinal: 0x1F98B) in positions: 1, 8
'๐จโ๐ฉโ๐งโ๐ฆ' (MAN ZERO WIDTH JOINER WOMAN ZERO WIDTH JOINER GIRL ZERO WIDTH JOINER BOY; hex ordinal: 0x1F468 0x200D 0x1F469 0x200D 0x1F467 0x200D 0x1F466) in positions: 3, 11
'ฮ' (GREEK CAPITAL LETTER DELTA; hex ordinal: 0x394) in positions: 5, 9