Previous: Special Characters in Fidonet, Up: Special Characters [Index]
MsgEd TE fully supports FSC 0054. It uses two binary files, a read map and a write map. The read map contains tables that define how to convert all the known transport charsets to the local charset (this even includes transliteration tables for displaying Russian text on Western computers!), and similarly, the write map contains tables that define how to convert the local character set to any allowed transport character sets.
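To get a feeling for what one such table does, here is a minimal Python sketch. It does not reproduce the layout of READMAPS.DAT, nor the way MsgEd TE is implemented; it merely builds a 256 entry lookup table from Python's own codecs. The function name build_read_map is made up for this illustration.

```python
def build_read_map(transport: str, local: str) -> bytes:
    """Return a 256-byte table mapping transport-charset bytes to local bytes.

    Illustration only: the real READMAPS.DAT layout is not modelled here.
    """
    table = bytearray(256)
    for value in range(256):
        # Decode one transport byte, then re-encode it in the local charset.
        # Anything without a counterpart in the local charset becomes "?".
        char = bytes([value]).decode(transport, errors="replace")
        table[value] = char.encode(local, errors="replace")[0]
    return bytes(table)


# A message body as it travels in CP850, read on an ISO 8859-1 machine.
cp850_to_latin1 = build_read_map("cp850", "latin_1")
transported = "Grüße".encode("cp850")
displayed = transported.translate(cp850_to_latin1)
print(displayed)
```

A write map would be the same idea in the opposite direction, for example build_read_map("latin_1", "cp850") in this sketch.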
The default names of the character set translation files are READMAPS.DAT and WRITMAPS.DAT, and MsgEd TE searches for them in the current working directory when it starts. (On Unix, the exact names and locations of these files are determined by the defaults you put into the makefile, or are ~/.msged.readmaps and ~/msged.writmaps for the precompiled binaries, while the RPM distribution might use yet other locations.) You can change the file names and locations with the ‘Readmap’ and ‘Writemap’ keywords (see The Readmap and Writemap Keywords).
The binary release of MsgEd TE ships with a variety of read maps and write maps. For each character set or codepage, there is a pair of read map and write map files, which contain tables to convert from any character set to this codepage and back.
Now all you need to do in order to configure MsgEd TE to properly encode and decode all sorts of special characters is to determine which character set or codepage your computer is using, and copy the corresponding pair of read map and write map files to the default names and location as noted above, or use the ‘Readmap’ and ‘Writemap’ keywords to point MsgEd TE to the correct pair of read map and write map files.
The following table lists all read map and write map files that MsgEd TE currently includes:
Filename     | Type of font / character set / computer
-------------------------------------------------------------------------
READMAPS.850 | Any DOS or OS/2 computer, or Windows computer with the OEM
WRITMAPS.850 | console fonts, that uses Codepage 850. This applies to most
             | computers in Western Europe. Some badly configured Linux
             | systems might also use this code page.
-------------------------------------------------------------------------
READMAPS.437 | Any DOS or OS/2 computer, or Windows computer with the OEM
WRITMAPS.437 | fonts, that uses the standard Codepage 437. This applies to
             | most US-American installations, and also to some European
             | ones that prefer it to Codepage 850 because it has some
             | more IBM graphics characters.
-------------------------------------------------------------------------
READMAPS.865 | This is the "nordic" IBM Codepage 865 used in some
WRITMAPS.865 | Northern European countries.
-------------------------------------------------------------------------
READMAPS.866 | This is IBM Codepage 866, used by DOS and OS/2 or Windows
WRITMAPS.866 | text mode applications in Russia.
-------------------------------------------------------------------------
READMAPS.UKR | This is IBM Codepage 1125, used by some DOS and OS/2 or
WRITMAPS.UKR | Windows installations for text mode applications in
             | Ukraine and Belorussia.
-------------------------------------------------------------------------
READMAPS.IS1 | This is ISO 8859-1. It is the standard encoding for
WRITMAPS.IS1 | X11 fonts and console fonts on Linux and Unix systems
             | in Western Europe.
-------------------------------------------------------------------------
READMAPS.IS5 | This is ISO 8859-5. It is the encoding used for vendor
WRITMAPS.IS5 | shipped additional fonts in Russian editions of commercial
             | Unix systems.
-------------------------------------------------------------------------
READMAPS.KOI | This is KOI8-R, used by the cyrillic cronyx font set,
WRITMAPS.KOI | which is in use in almost any Russian Linux or FreeBSD
             | installation, and can of course also be installed on
             | commercial Unix systems.
-------------------------------------------------------------------------
So, enabling character set translation is easy, but you have to take care to use the correct pair of read map and write map files.
If you have the source code distribution of MsgEd TE, the bin subdirectory contains readmaps.dat and writmaps.dat files for all supported code pages. If you are compiling directly out of CVS, you can go to the maps subdirectory, compile the makemaps utility, and use it to build the proper character set maps.
The following table lists all level 2 transport character sets that MsgEd TE understands when it finds them in mails (meaning all character sets that are defined in the READMAPS.DAT and WRITMAPS.DAT files):
@CHRS-Kludge: | Conventional Name:    | Used by these computers:
--------------+-----------------------+-----------------------------------
LATIN-1 2     | ISO 8859-1            | Western Unix, Amiga, Windows GUI
MAC 2         | MAC                   | Macintosh computers
CP437 2       | IBM PC, Codepage 437  | DOS, OS/2, Windows console
CP850 2       | IBM PC, Codepage 850  | International / European versions
              |                       | of DOS, OS/2 and Windows.
CP865 2       | IBM PC, Codepage 865  | Nordic versions of DOS, OS/2, Win
CP866 2       | IBM PC, Codepage 866  | Russian OS/2, DOS, Windows
CP1125 2      | IBM PC, Codepage 1125 | Ukrainian and Belorussian
              |                       | OS/2, DOS, Windows
Apart from that, MsgEd TE also understands the (outdated) @CODEPAGE kludge line.
Most READMAPS.DAT and WRITMAPS.DAT files define some more charset names, like KOI8-R, ISO-5, and so on. These names are not intended for message transport; they only serve as internal names that tell MsgEd TE which charset your computer uses locally.
There are quite a few other (though incorrect or obsolete) charset kludges in widespread use in Fidonet. The most prominent one is IBMPC 2. Originally, IBMPC 2 meant the DOS codepage 437, but in the meantime, it has come to denote simply whatever codepage an "IBM" computer typically uses in the sender's country. This means that if a Russian user has a kludge that says IBMPC 2, he probably means CP866 2, while if a German user has IBMPC 2, he probably means CP850 2.
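Why this ambiguity matters can be shown with a few lines of Python, using the codecs that Python itself ships for these codepages. This is only an illustration of the problem and has nothing to do with MsgEd TE's own tables; the sample word is an arbitrary choice.

```python
# The same bytes, interpreted under three different "IBM PC" codepages.
raw = "Привет".encode("cp866")  # what a Russian DOS editor would write

for codepage in ("cp866", "cp850", "cp437"):
    # ascii() keeps the output printable on any terminal.
    print(codepage, ascii(raw.decode(codepage)))
```

Only the first interpretation yields the original Russian word; the other two produce unrelated accented letters, which is exactly what a reader sees when IBMPC is taken to mean the wrong codepage.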
MsgEd TE supports those outdated or incorrect charset kludges with the CharsetAlias command. A Western European user should probably put the following command into his config file:
CharsetAlias IBMPC CP850
An American user might prefer
CharsetAlias IBMPC CP437
A Russian user needs a whole bunch of commands:
CharsetAlias IBMPC CP866
CharsetAlias +7_FIDO CP866
CharsetAlias RUFIDO CP866
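Conceptually, such aliases amount to a simple name substitution that happens before the translation table is looked up. The following Python sketch shows the idea under stated assumptions: the names ALIASES and resolve_charset are hypothetical, and the case-insensitive comparison is an assumption of this sketch, not documented MsgEd TE behaviour.

```python
# Hypothetical alias table, mirroring the CharsetAlias lines of a Russian setup.
ALIASES = {"IBMPC": "CP866", "+7_FIDO": "CP866", "RUFIDO": "CP866"}


def resolve_charset(kludge_name: str, aliases: dict) -> str:
    """Map an incoming charset kludge name to the name used for table lookup."""
    # Assumption of this sketch: names are compared case-insensitively.
    name = kludge_name.strip().upper()
    # Names without an alias are passed through unchanged.
    return aliases.get(name, name)


print(resolve_charset("IBMPC", ALIASES))
print(resolve_charset("LATIN-1", ALIASES))
```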
Once FSC 0054 is activated, MsgEd TE will be able to correctly display all mails that have been created with one of the character sets listed above, as long as they contain the proper charset kludge line.
Unfortunately, quite a few mails still lack such a kludge line; do tell their authors to correct their setup! Until those users fix their editors, MsgEd TE offers you two options. The first is the ‘AssumeCharset’ keyword, which sets a default character set kludge name for mails that do not contain such a kludge line. See The AssumeCharset keyword, for more information. The second is the Alt-T key combination that you can press in message reading mode. You will get a menu that allows you to select from all supported codepages and character sets. You can use this menu if you must read a mail that contains a wrong character set kludge line, or one that does not contain such a kludge and does not meet the assumption you made with ‘AssumeCharset’. See Miscellaneous other Reader Functions, for more information.
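The decision between the kludge found in a mail and the assumed default can be sketched in Python as follows. This is an illustration, not MsgEd TE's code: the function names are made up, and the sketch assumes that kludge lines start with the control character 0x01, have the form "CHRS: name level", and that message lines are separated by carriage returns.

```python
from typing import Optional

CHRS_PREFIX = b"\x01CHRS:"


def find_chrs(message: bytes) -> Optional[str]:
    """Return the charset name from a CHRS kludge line, or None if absent."""
    # Assumption: lines in the message body are separated by carriage returns.
    for line in message.split(b"\r"):
        if line.startswith(CHRS_PREFIX):
            fields = line[len(CHRS_PREFIX):].split()
            if fields:
                # First field is the name, e.g. "CP850"; the level follows.
                return fields[0].decode("ascii", errors="replace").upper()
    return None


def effective_charset(message: bytes, assumed: str) -> str:
    """Use the kludge if present, else the assumed default charset."""
    return find_chrs(message) or assumed


with_kludge = b"\x01MSGID: 2:240/1 12345678\r\x01CHRS: CP850 2\rHello\r"
without_kludge = b"\x01MSGID: 2:240/1 12345679\rHello\r"
print(effective_charset(with_kludge, "CP437"))
print(effective_charset(without_kludge, "CP437"))
```

The manual override with Alt-T corresponds to ignoring both values and picking the charset by hand.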
With MsgEd TE set up this far, you can read almost all mails with correct 8 bit characters. Mails that you write yourself may also contain any special characters, but MsgEd TE will convert them into a 7 bit ASCII character set when you save them.
If you want to retain special characters in your mail by adding a charset kludge (which is highly recommended), you have to do two things. First, you have to define which character set to use for saving your mails. See The OutputCharset Keyword, for information on how to do this. In most cases, you will simply want to add the following line to your configuration file:
OutputCharset IBMPC
Although it would be better to use CP437, CP850, CP865 or CP866 (depending on your location), most readers still don’t support this, so using IBMPC might be better for the moment. If you use IBMPC, MsgEd TE will emit an additional CODEPAGE kludge line to make clear which codepage you are actually using.
Exporting in the IBMPC charset will even work on a Linux box, in case you were wondering.
Second, you have to tell MsgEd TE in which areas it is allowable to send mails with special characters and charset kludges. MsgEd TE will only write special characters into areas that have the ‘8’ flag set. If you are importing your message area configuration from a tosser configuration file, you can simply put the following line at the beginning of your configuration file:
AreaFileFlags 8
(If you already have an AreaFileFlags statement, just add the ‘8’ character to its parameters.) See The AreaFileFlags Keyword, for more information.
This will allow MsgEd TE to use special characters in conjunction with a charset kludge in all message areas imported from a tosser configuration file.
If you only want to enable special characters for a few areas, you have to define these areas manually, giving each of them the ‘8’ flag individually. The same applies if you have to define your areas manually for other reasons. See Manual Area Definition, for more information on this.
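The effect of the ‘8’ flag on saving can be pictured with the following Python sketch. It is a simplified stand-in, not MsgEd TE's implementation: the function names are hypothetical, and the 7 bit fallback here merely strips accents via Unicode decomposition, whereas MsgEd TE uses its own transliteration tables.

```python
import unicodedata


def to_ascii(text: str) -> bytes:
    """Crude 7 bit fallback: drop accents, replace what remains with '?'."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return stripped.encode("ascii", errors="replace")


def encode_for_area(text: str, area_flags: str, output_codec: str) -> bytes:
    """Keep special characters only in areas that carry the '8' flag."""
    if "8" in area_flags:
        return text.encode(output_codec, errors="replace")
    return to_ascii(text)


print(encode_for_area("café", "8", "cp850"))  # 8 bit area: bytes in CP850
print(encode_for_area("café", "", "cp850"))   # other areas: plain ASCII
```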