The m17n Library 1.8.4
Charset objects and API for them.

Macros | |
| #define | MCHAR_INVALID_CODE |
| Invalid code-point. | |
Functions | |
| MSymbol | mchar_define_charset (const char *name, MPlist *plist) |
| MSymbol | mchar_resolve_charset (MSymbol symbol) |
| Resolve charset name. | |
| int | mchar_list_charset (MSymbol **symbols) |
| List symbols representing charsets. | |
| int | mchar_decode (MSymbol charset_name, unsigned code) |
| Decode a code-point. | |
| unsigned | mchar_encode (MSymbol charset_name, int c) |
| Encode a character code. | |
| int | mchar_map_charset (MSymbol charset_name, void(*func)(int from, int to, void *arg), void *func_arg) |
| Call a function for all the characters in a specified charset. | |
Variables | |
| MSymbol | Mcharset |
Variables: Symbols representing a charset. | |
Each of the following symbols represents a predefined charset. | |
| MSymbol | Mcharset_ascii |
| Symbol representing the charset ASCII. | |
| MSymbol | Mcharset_iso_8859_1 |
| Symbol representing the charset ISO/IEC 8859/1. | |
| MSymbol | Mcharset_unicode |
| Symbol representing the charset Unicode. | |
| MSymbol | Mcharset_m17n |
| Symbol representing the largest charset. | |
| MSymbol | Mcharset_binary |
| Symbol representing the charset for ill-decoded characters. | |
Variables: Parameter keys for mchar_define_charset(). | |
These are the predefined symbols to use as parameter keys for the function mchar_define_charset() (which see). | |
| MSymbol | Mmethod |
| MSymbol | Mdimension |
| MSymbol | Mmin_range |
| MSymbol | Mmax_range |
| MSymbol | Mmin_code |
| MSymbol | Mmax_code |
| MSymbol | Mascii_compatible |
| MSymbol | Mfinal_byte |
| MSymbol | Mrevision |
| MSymbol | Mmin_char |
| MSymbol | Mmapfile |
| MSymbol | Mparents |
| MSymbol | Msubset_offset |
| MSymbol | Mdefine_coding |
| MSymbol | Maliases |
Variables: Symbols representing charset methods. | |
These are the predefined symbols that can be a value of the Mmethod parameter of a charset used in an argument to the mchar_define_charset() function. A method specifies how code-points and character codes are converted. See the documentation of the mchar_define_charset() function for the details. | |
| MSymbol | Moffset |
| MSymbol | Mmap |
| Symbol for the map type method of charset. | |
| MSymbol | Munify |
| Symbol for the unify type method of charset. | |
| MSymbol | Msubset |
| MSymbol | Msuperset |
| Symbol for the superset type method of charset. | |
Charset objects and API for them.
The symbol Mcharset.
The m17n library uses charset objects to represent coded character sets (CCS). It supports many predefined coded character sets, and application programs can define additional charsets. A character can belong to multiple charsets.
The m17n library distinguishes the following three concepts:
The type unsigned is used to represent a code-point. An invalid code-point is represented by the macro MCHAR_INVALID_CODE.
Each charset object defines how characters are converted between code-points and character codes. To decode means converting code-points to character codes, and to encode means converting character codes to code-points.
Any decoded M-text has a text property whose key is the predefined symbol Mcharset. The name of Mcharset is "charset".
| #define MCHAR_INVALID_CODE |
Invalid code-point.
The macro MCHAR_INVALID_CODE gives the invalid code-point.
| MSymbol mchar_define_charset | ( | const char * | name, |
| MPlist * | plist | ||
| ) |
| MSymbol mchar_resolve_charset | ( | MSymbol | symbol | ) |
Resolve charset name.
The mchar_resolve_charset() function returns symbol if it represents a charset. Otherwise, it canonicalizes symbol as a charset name and, if the canonicalized name represents a charset, returns the symbol for that name. If neither step yields a charset, it returns Mnil.
| int mchar_list_charset | ( | MSymbol ** | symbols | ) |
List symbols representing charsets.
The mchar_list_charset() function makes an array of symbols, each representing a charset, stores the pointer to the array in the place pointed to by symbols, and returns the length of the array.
| int mchar_decode | ( | MSymbol | charset_name, |
| unsigned | code | ||
| ) |
Decode a code-point.
The mchar_decode() function decodes code-point code in the charset represented by the symbol charset_name to get a character code.
| unsigned mchar_encode | ( | MSymbol | charset_name, |
| int | c | ||
| ) |
Encode a character code.
The mchar_encode() function encodes character code c to get a code-point in the charset represented by the symbol charset_name.
| int mchar_map_charset | ( | MSymbol | charset_name, |
| void(*)(int from, int to, void *arg) | func, | ||
| void * | func_arg | ||
| ) |
Call a function for all the characters in a specified charset.
The mchar_map_charset() function calls func for all the characters in the charset named charset_name. Each call covers a chunk of consecutive characters rather than a single character.
func receives three arguments: from, to, and arg. from and to specify a range of character codes in the charset. arg is the same as func_arg.
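The chunked callback contract described above can be illustrated without the library. The following is a minimal sketch, not the m17n implementation: `sketch_map_chars()`, `struct range_stats`, and `count_range()` are hypothetical names introduced here, and the sketch assumes that both `from` and `to` are inclusive. Only the callback signature is taken from the text.

```c
#include <stddef.h>

/* Accumulator handed to the callback through the opaque argument.
   Hypothetical type, used only by this sketch. */
struct range_stats {
    int chunks;      /* number of times the callback ran */
    int characters;  /* total characters covered by all chunks */
};

/* Callback with the signature described above. The sketch assumes
   that from and to are both inclusive. */
static void count_range(int from, int to, void *arg)
{
    struct range_stats *stats = arg;
    stats->chunks += 1;
    stats->characters += to - from + 1;
}

/* Hypothetical stand-in for the library side: walks a sorted array of
   character codes and reports each run of consecutive codes with a
   single call to func. */
static void sketch_map_chars(const int *chars, size_t n,
                             void (*func)(int from, int to, void *arg),
                             void *func_arg)
{
    size_t i = 0;
    while (i < n) {
        size_t j = i;
        /* Extend the run while the next code is exactly one greater. */
        while (j + 1 < n && chars[j + 1] == chars[j] + 1)
            j++;
        func(chars[i], chars[j], func_arg);
        i = j + 1;
    }
}
```

With a real charset, a callback such as `count_range` would be passed as func and a pointer to the accumulator as func_arg. Reporting ranges keeps the number of calls small for charsets that cover long runs of consecutive character codes.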
Errors: MERROR_CHARSET
| MSymbol Mcharset_ascii |
Symbol representing the charset ASCII.
The symbol Mcharset_ascii has name "ascii" and represents the charset ISO 646, USA Version X3.4-1968 (ISO-IR-6).
| MSymbol Mcharset_iso_8859_1 |
Symbol representing the charset ISO/IEC 8859/1.
The symbol Mcharset_iso_8859_1 has name "iso-8859-1" and represents the charset ISO/IEC 8859-1:1998.
| MSymbol Mcharset_unicode |
Symbol representing the charset Unicode.
The symbol Mcharset_unicode has name "unicode" and represents the charset Unicode.
| MSymbol Mcharset_m17n |
Symbol representing the largest charset.
The symbol Mcharset_m17n has name "m17n" and represents the charset that contains all characters supported by the m17n library.
| MSymbol Mcharset_binary |
Symbol representing the charset for ill-decoded characters.
The symbol Mcharset_binary has name "binary" and represents the fake charset that the decoding functions attach to an M-text as a text property when they encounter an invalid byte or byte sequence.
See Code Conversion for more details.
| MSymbol Mmethod |
| MSymbol Mdimension |
| MSymbol Mmin_range |
| MSymbol Mmax_range |
| MSymbol Mmin_code |
| MSymbol Mmax_code |
| MSymbol Mascii_compatible |
| MSymbol Mfinal_byte |
| MSymbol Mrevision |
| MSymbol Mmin_char |
| MSymbol Mmapfile |
| MSymbol Mparents |
| MSymbol Msubset_offset |
| MSymbol Mdefine_coding |
| MSymbol Maliases |
| MSymbol Moffset |
Symbol for the offset type method of charset.
The symbol Moffset has the name "offset" and, when used as a value of the Mmethod parameter of a charset, it means that the conversion of code-points and character codes of the charset is done by this calculation:
CHARACTER-CODE = CODE-POINT - MIN-CODE + MIN-CHAR
where MIN-CODE is the value of the Mmin_code parameter of the charset, and MIN-CHAR is the value of the Mmin_char parameter.
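The offset calculation can be worked through in a few lines of C. This is a sketch under stated assumptions, not the library's code: `struct offset_charset`, `offset_decode()`, `offset_encode()`, and `SKETCH_INVALID_CODE` are hypothetical names, and returning -1 or the sentinel for out-of-range input is a convention chosen for the sketch.

```c
/* Hypothetical parameter bundle mirroring the Mmin_code, Mmax_code
   and Mmin_char parameters of an offset type charset. */
struct offset_charset {
    unsigned min_code;
    unsigned max_code;
    int min_char;
};

/* Sketch-only stand-in for MCHAR_INVALID_CODE. */
#define SKETCH_INVALID_CODE (~0u)

/* CHARACTER-CODE = CODE-POINT - MIN-CODE + MIN-CHAR */
static int offset_decode(const struct offset_charset *cs, unsigned code)
{
    if (code < cs->min_code || code > cs->max_code)
        return -1;  /* sketch convention for "no such character" */
    return (int)(code - cs->min_code) + cs->min_char;
}

/* Inverse: CODE-POINT = CHARACTER-CODE - MIN-CHAR + MIN-CODE */
static unsigned offset_encode(const struct offset_charset *cs, int c)
{
    int max_char = cs->min_char + (int)(cs->max_code - cs->min_code);
    if (c < cs->min_char || c > max_char)
        return SKETCH_INVALID_CODE;
    return (unsigned)(c - cs->min_char) + cs->min_code;
}
```

For example, with MIN-CODE 0x20 and MIN-CHAR 0x1000, code-point 0x41 maps to character code 0x41 - 0x20 + 0x1000 = 0x1021, and encoding 0x1021 gives back 0x41.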
| MSymbol Mmap |
Symbol for the map type method of charset.
The symbol Mmap has the name "map" and, when used as a value of the Mmethod parameter of a charset, it means that the conversion of code-points and character codes of the charset is done by map lookup. The map must be given by the Mmapfile parameter.
| MSymbol Munify |
Symbol for the unify type method of charset.
The symbol Munify has the name "unify" and, when used as a value of the Mmethod parameter of a charset, it means that the conversion of code-points and character codes of the charset is done by map lookup combined with offsetting. The map must be given by the Mmapfile parameter. For this kind of charset, a unique continuous character code space is assigned for all its characters.
If the map has an entry for a code-point, the conversion is done by looking up the map. Otherwise, the conversion is done by this calculation:
CHARACTER-CODE = CODE-POINT - MIN-CODE + LOWEST-CHAR-CODE
where MIN-CODE is the value of the Mmin_code parameter of the charset, and LOWEST-CHAR-CODE is the lowest character code of the assigned code space.
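The two-step rule above (map lookup first, offset calculation as the fallback) can be sketched as follows. All names here (`struct map_entry`, `struct unify_charset`, `unify_decode()`) are hypothetical, the in-memory table stands in for whatever the Mmapfile parameter supplies, and the range check against a maximum code-point is an assumption of the sketch.

```c
#include <stddef.h>

/* One mapping from a code-point to a character code. */
struct map_entry {
    unsigned code;
    int character;
};

/* Hypothetical description of a unify type charset. */
struct unify_charset {
    const struct map_entry *map;  /* stands in for the map file contents */
    size_t map_len;
    unsigned min_code;
    unsigned max_code;            /* assumed upper bound of valid code-points */
    int lowest_char_code;         /* start of the assigned code space */
};

static int unify_decode(const struct unify_charset *cs, unsigned code)
{
    size_t i;

    if (code < cs->min_code || code > cs->max_code)
        return -1;  /* sketch convention for "no such character" */

    /* Step 1: an explicit map entry takes precedence. */
    for (i = 0; i < cs->map_len; i++)
        if (cs->map[i].code == code)
            return cs->map[i].character;

    /* Step 2: CHARACTER-CODE = CODE-POINT - MIN-CODE + LOWEST-CHAR-CODE */
    return (int)(code - cs->min_code) + cs->lowest_char_code;
}
```

A linear search keeps the sketch short; a sorted table with binary search would be the natural choice for a map of realistic size.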
| MSymbol Msubset |
Symbol for the subset type method of charset.
The symbol Msubset has the name "subset" and, when used as a value of the Mmethod parameter of a charset, it means that the charset is a subset of a parent charset. The parent charset must be given by the Mparents parameter. The conversion of code-points and character codes of the charset is done conceptually by this calculation:
CHARACTER-CODE = PARENT-CODE (CODE-POINT) + SUBSET-OFFSET
where PARENT-CODE is a pseudo function that returns the character code of CODE-POINT in the parent charset, and SUBSET-OFFSET is the value given by the Msubset_offset parameter.
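The conceptual formula can be modeled with a function pointer playing the role of PARENT-CODE. This is only an illustration: `struct subset_charset`, `subset_decode()`, and `ascii_like_parent()` are hypothetical, and restricting the subset to a range of code-points is an assumption of the sketch rather than something stated above.

```c
/* PARENT-CODE as a C function: returns the character code of a
   code-point in the parent charset, or -1 if there is none. */
typedef int (*parent_code_fn)(unsigned code);

/* Hypothetical description of a subset type charset. */
struct subset_charset {
    parent_code_fn parent_code;
    unsigned min_code;   /* assumed lower bound of the subset */
    unsigned max_code;   /* assumed upper bound of the subset */
    int subset_offset;   /* mirrors the Msubset_offset parameter */
};

/* Toy parent used for illustration: code-points 0x00..0x7F map to
   the same character codes. */
static int ascii_like_parent(unsigned code)
{
    return code <= 0x7F ? (int)code : -1;
}

/* CHARACTER-CODE = PARENT-CODE (CODE-POINT) + SUBSET-OFFSET */
static int subset_decode(const struct subset_charset *cs, unsigned code)
{
    int parent_char;

    if (code < cs->min_code || code > cs->max_code)
        return -1;  /* outside the subset */
    parent_char = cs->parent_code(code);
    if (parent_char < 0)
        return -1;  /* the parent has no such character */
    return parent_char + cs->subset_offset;
}
```

A subset covering only the digits of the toy parent would use `{ascii_like_parent, 0x30, 0x39, 0}`; code-point 0x35 then decodes to 0x35, while 0x41 is rejected because it lies outside the subset.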
| MSymbol Msuperset |
Symbol for the superset type method of charset.
The symbol Msuperset has the name "superset" and, when used as a value of the Mmethod parameter of a charset, it means that the charset is a superset of its parent charsets. The parent charsets must be given by the Mparents parameter.
| MSymbol Mcharset |