Character encoding on Windows
It is important that we can emit Unicode output.
The Windows API is complicated in the way it represents strings: it provides parallel versions of many functions, each working on a different string representation. These versions are usually identified by a suffix in their name: A for ANSI (8-bit strings in the current code page) and W for wide characters.
We work under the (more or less valid) assumption that Windows supports UTF-16, and not just UCS-2.
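The distinction matters for characters outside the Basic Multilingual Plane: UTF-16 encodes them as a surrogate pair of two 16-bit units, while UCS-2 cannot represent them at all. A minimal portable sketch of that encoding follows. The name encode_utf16 is hypothetical and belongs neither to the engine nor to the Windows API, and the code assumes a valid code point as input.

```cpp
#include <string>

// Encode one Unicode code point as UTF-16 code units.
// Hypothetical helper for illustration only; it assumes the input is a
// valid code point (not a surrogate, not above U+10FFFF).
std::u16string encode_utf16(char32_t code_point) {
    std::u16string units;
    if (code_point < 0x10000) {
        // Basic Multilingual Plane: one unit, identical in UCS-2 and UTF-16.
        units.push_back(static_cast<char16_t>(code_point));
    } else {
        // Supplementary planes: split the 20-bit offset into a surrogate pair.
        char32_t offset = code_point - 0x10000;
        units.push_back(static_cast<char16_t>(0xD800 + (offset >> 10)));
        units.push_back(static_cast<char16_t>(0xDC00 + (offset & 0x3FF)));
    }
    return units;
}
```

For example, U+20AC (the euro sign) yields a single unit, whereas U+1F600 yields the two units 0xD83D 0xDE00. Code that treats every 16-bit unit as one character would miscount the latter.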
The string types are:
LPWSTR, of type wchar_t *;
LPSTR, of type char *;
LPTSTR, which is the same as either LPSTR or LPWSTR, depending on compile-time configuration (whether the UNICODE macro is defined).
String literals may be:
"...", of type const char *;
L"...", of type const wchar_t *;
TEXT("..."), which can be of either type, depending on compile-time configuration.
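The compile-time switch can be illustrated without the Windows headers. In the sketch below, MY_TEXT and my_tchar are hypothetical stand-ins for the TEXT() macro and the TCHAR type that the Windows headers provide; they follow the usual pattern of pasting an L prefix onto the literal when UNICODE is defined.

```cpp
#include <type_traits>

// Hypothetical stand-ins for TEXT() and TCHAR from the Windows headers.
#ifdef UNICODE
#define MY_TEXT(s) L##s
typedef wchar_t my_tchar;
#else
#define MY_TEXT(s) s
typedef char my_tchar;
#endif

// A narrow literal decays to const char *, a wide one to const wchar_t *.
static_assert(std::is_same<std::decay<decltype("abc")>::type,
                           const char *>::value,
              "narrow literal");
static_assert(std::is_same<std::decay<decltype(L"abc")>::type,
                           const wchar_t *>::value,
              "wide literal");

// The configurable literal follows whichever character type was selected.
static_assert(std::is_same<std::decay<decltype(MY_TEXT("abc"))>::type,
                           const my_tchar *>::value,
              "configurable literal");
```

Compiling the same file with and without -DUNICODE (or /DUNICODE) should pass in both cases, which is the point: code written against the configurable forms changes its string type without source changes.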
Internal memory
The p≡p Engine works internally using UTF-8 strings of type char * on every platform, including Windows. The exception is platform_windows.cpp, which takes care of translating as needed.
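The translation at the platform boundary amounts to converting between UTF-8 and UTF-16. On Windows the system calls for this are MultiByteToWideChar with the CP_UTF8 code page and WideCharToMultiByte for the reverse direction; how platform_windows.cpp actually performs the conversion is not shown here. The following portable sketch only illustrates what such a conversion involves. The name utf8_to_utf16 is hypothetical, and the code assumes well-formed input rather than validating it.

```cpp
#include <cstddef>
#include <string>

// Convert well-formed UTF-8 to UTF-16. Hypothetical helper for
// illustration; malformed input is not detected.
std::u16string utf8_to_utf16(const std::string& utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)      { cp = lead;        extra = 0; }
        else if (lead < 0xE0) { cp = lead & 0x1F; extra = 1; }
        else if (lead < 0xF0) { cp = lead & 0x0F; extra = 2; }
        else                  { cp = lead & 0x07; extra = 3; }
        ++i;
        // Each continuation byte contributes its low six bits.
        for (; extra > 0 && i < utf8.size(); --extra, ++i) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i]) & 0x3F);
        }
        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // encode as a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

Applied to the UTF-8 bytes of "p≡p" (70 E2 89 A1 70), the function returns three UTF-16 units, the middle one being 0x2261.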
Console
The p≡p Engine always prints to the console in UTF-8.
It is the user’s responsibility to configure the console, which by default uses an old 8-bit MS-DOS code page: 850 for Western Europe, or some other code page elsewhere. The default console configuration is inadequate for Unicode and unusable for us.
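One common way to configure the console is to run chcp 65001 in it before starting the program; 65001 is the Windows code page identifier for UTF-8. A program that hosts the engine could request the same thing itself through SetConsoleOutputCP. The sketch below is an assumption about how such a host application might do this, not something the engine does. The names use_utf8_console and print_utf8 are hypothetical, and on other platforms the code page switch is a no-op.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <windows.h>
#endif

// Switch the console output code page to UTF-8.
// Returns true on success; always true where no switch is needed.
bool use_utf8_console() {
#ifdef _WIN32
    // CP_UTF8 is 65001; SetConsoleOutputCP returns nonzero on success.
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}

// Print a UTF-8 byte string unchanged.
// Returns false if the console could not be prepared or the write failed.
bool print_utf8(const char* utf8_text) {
    if (!use_utf8_console()) return false;
    return std::fputs(utf8_text, stdout) >= 0;
}
```

Even with the code page set, the console font must contain the required glyphs for the text to display correctly.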
Other output
In every other context on this platform, including system logs, the output is in wide characters. This is achieved using the W variants of the API functions.
Where to find documentation
Volker says that MSDN is the only reliable source of information about the Windows API. Many other people documenting their experience work under the assumptions of some specific setup, and are not to be trusted.
On MSDN, ignore everything about the .NET platform: we are interested in the “unmanaged” alternative, and may even use deprecated functions.