First, you don't mention what platform you're targeting. Although recent Windows versions (Windows 2000, XP, Vista and 7) support both multibyte and Unicode versions of the system calls that take strings, the Unicode versions are faster: the multibyte versions are wrappers that convert to Unicode, call the Unicode version, then convert any returned strings back to multibyte. So if you're making a lot of these calls, the Unicode versions will be faster.
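To make the wrapper relationship concrete, here is a minimal portable sketch of the pattern. The function names (`to_upper_w`, `to_upper_a`, `widen`, `narrow`) are made up for illustration and are not Win32 APIs, and the ASCII-only helpers are an assumption standing in for the code-page conversion Windows actually performs.

```cpp
#include <cassert>
#include <string>

// Assumption: ASCII-only input, so widening is a plain copy per character.
std::wstring widen(const std::string& s) {
    std::wstring out;
    for (char c : s) {
        assert(static_cast<unsigned char>(c) < 0x80);
        out += static_cast<wchar_t>(c);
    }
    return out;
}

// Assumption: ASCII-only output, so narrowing is a plain copy per character.
std::string narrow(const std::wstring& s) {
    std::string out;
    for (wchar_t c : s) {
        assert(c < 0x80);
        out += static_cast<char>(c);
    }
    return out;
}

// Hypothetical "W" entry point: the real work happens on wide text.
std::wstring to_upper_w(const std::wstring& text) {
    std::wstring out = text;
    for (wchar_t& c : out) {
        if (c >= L'a' && c <= L'z') c = static_cast<wchar_t>(c - L'a' + L'A');
    }
    return out;
}

// Hypothetical "A" entry point: convert in, call the W version, convert back.
std::string to_upper_a(const std::string& text) {
    return narrow(to_upper_w(widen(text)));
}
```

Every call through `to_upper_a` pays for two extra conversions and two temporary strings that a direct call to `to_upper_w` avoids.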

Even if you're not planning to explicitly support additional languages, you should still consider supporting Unicode if your application saves and displays text entered by its users. Just because your application is unilingual, it doesn't follow that all its users will be unilingual too. They may be perfectly happy to use your English-language GUI, but might want to enter names, comments or other text in their own language and have it displayed properly.

"you should still consider supporting Unicode if your application saves and displays text entered by the users" - and if your application wants to deal with paths with arbitrary characters - and if it deals in any way with paths, it should.

This is exactly what I wanted to hear.. that one is a wrapper for the other. Unicode all the way baby.

visual c++ - C++ project type: unicode vs multi-byte; pros and cons - ...

c++ visual-c++ unicode ansi

a Japanese character gets converted to two ASCII chars (漢 -> "w). I assume that's correct?

No, that character, U+6F22, should be converted to three bytes in UTF-8: 0xE6 0xBC 0xA2

In UTF-16 (little endian) U+6F22 is stored in memory as 0x22 0x6F, which would look like "o in ASCII (rather than "w), so it looks like something is wrong with your conversion from String^ to std::string.
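The byte values above can be checked by hand with a small encoder. This is a portable sketch only; `encode_utf8` and `utf16le_bytes` are names invented for the example, not library functions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Encode one Unicode scalar value as UTF-8 bytes.
std::string encode_utf8(char32_t cp) {
    // Surrogates and values above U+10FFFF are not valid scalar values.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Lay out one UTF-16 code unit as little-endian bytes (low byte first).
std::vector<std::uint8_t> utf16le_bytes(char16_t unit) {
    return { static_cast<std::uint8_t>(unit & 0xFF),
             static_cast<std::uint8_t>(unit >> 8) };
}
```

For U+6F22, `encode_utf8` yields the three bytes E6 BC A2, and `utf16le_bytes` yields 22 6F, which is the `"o` pair mentioned above.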

I'm not familiar enough with String^ to know the right way to convert from String^ to std::wstring, but I'm pretty sure that's where your problem is.

I don't think the following has anything to do with your problem, but it is obviously wrong:

std::string strTo;
char *szTo = new char[wsValue.length() + 1];

You already know a single wide character can produce multiple narrow characters, so a buffer sized to the number of wide characters (plus one) is not guaranteed to be large enough for the converted narrow string.

You need to call WideCharToMultiByte once to calculate the required buffer size, and then call it again with a buffer of that size. Or you can just allocate a buffer of 3 bytes per wide char, which is the worst case when converting UTF-16 to UTF-8.
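The measure-then-convert pattern looks roughly like this. To keep the sketch portable it does not call WideCharToMultiByte; `utf16_to_utf8` is a hypothetical stand-in written for this example that, like the real API, only reports the required size when given a null destination. `to_utf8` is likewise an invented name.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for WideCharToMultiByte(CP_UTF8, ...).
// With dst == nullptr it only counts the bytes needed; otherwise it writes them.
std::size_t utf16_to_utf8(const std::u16string& src, char* dst, std::size_t cap) {
    std::size_t written = 0;
    auto put = [&](unsigned byte) {
        if (dst) {
            assert(written < cap);  // buffer must have been sized by a counting pass
            dst[written] = static_cast<char>(byte);
        }
        ++written;
    };
    for (std::size_t i = 0; i < src.size(); ++i) {
        char32_t cp = src[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < src.size() &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            // Combine a surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // lone surrogate: substitute the replacement character
        }
        if (cp < 0x80) {
            put(cp);
        } else if (cp < 0x800) {
            put(0xC0 | (cp >> 6));
            put(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            put(0xE0 | (cp >> 12));
            put(0x80 | ((cp >> 6) & 0x3F));
            put(0x80 | (cp & 0x3F));
        } else {
            put(0xF0 | (cp >> 18));
            put(0x80 | ((cp >> 12) & 0x3F));
            put(0x80 | ((cp >> 6) & 0x3F));
            put(0x80 | (cp & 0x3F));
        }
    }
    return written;
}

std::string to_utf8(const std::u16string& wide) {
    std::size_t needed = utf16_to_utf8(wide, nullptr, 0);  // pass 1: measure
    std::string out(needed, '\0');
    if (needed > 0) {
        utf16_to_utf8(wide, &out[0], out.size());          // pass 2: convert
    }
    return out;
}
```

A one-character string holding U+6F22 needs 3 bytes, so a buffer of `length() + 1` chars would already be too small for it once a terminator is included.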

string - Unicode <-> Multibyte conversion (native vs. managed) - Stack...

string unicode native managed multibyte

static String^ FromNativeToDotNet(std::string value)
{
    array<Byte>^ bytes = gcnew array<Byte>(value.length());
    System::Runtime::InteropServices::Marshal::Copy(IntPtr((void*)value.c_str()), bytes, 0, value.length());
    return (gcnew System::Text::UTF8Encoding)->GetString(bytes);
}


static std::string FromDotNetToNative(String^ value)
{ 
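    // Encode the managed UTF-16 string as UTF-8 bytes
    array<Byte>^ bytes = (gcnew System::Text::UTF8Encoding)->GetBytes(value);
    // &bytes[0] throws on an empty array, so handle the empty string first
    if (bytes->Length == 0)
        return std::string();
    pin_ptr<Byte> chars = &bytes[0];
    return std::string((char*)chars, bytes->Length);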
}

string - Unicode <-> Multibyte conversion (native vs. managed) - Stack...

string unicode native managed multibyte

String^ FromNativeToDotNet(std::string value)
{
  // Convert a UTF-8 string to a UTF-16 String
  int len = MultiByteToWideChar(CP_UTF8, 0, value.c_str(), value.length(), NULL, 0);
  if (len > 0)
  {
    std::vector<wchar_t> wszTo(len);
    MultiByteToWideChar(CP_UTF8, 0, value.c_str(), value.length(), &wszTo[0], len);
    return gcnew String(&wszTo[0], 0, len);
  }

  return gcnew String((wchar_t*)NULL);
}

std::string FromDotNetToNative(String^ value)
{ 
  // Pin the managed string so its UTF-16 buffer cannot move during the conversion
  pin_ptr<const wchar_t> wcValue = SafePtrToStringChars(value);

  // Convert a UTF-16 string to a UTF-8 string
  int len = WideCharToMultiByte(CP_UTF8, 0, wcValue, value->Length, NULL, 0, NULL, NULL);
  if (len > 0)
  {
    std::vector<char> szTo(len);
    WideCharToMultiByte(CP_UTF8, 0, wcValue, value->Length, &szTo[0], len, NULL, NULL);
    return std::string(&szTo[0], len);
  }

  return std::string();
}

You cannot convert from Unicode to UTF-8, because UTF-8 is already Unicode!

Semantics. Windows and .NET use UTF-16 as the encoding for Unicode strings. I changed the comments in my answer accordingly, but the code remains the same.

The Unicode Standard actually defines these terms. Microsoft is not free to embrace, extend, embellish, and extinguish terms carefully defined in accepted international standards for their own nefarious purposes. At best, it propagates inaccuracy and confusion. At middle, it is a lie. I have no idea what the worst is, because the outrageous schemes dreamt up by Microsoft for their own monopolistic betterment far exceed my own imagination. These words have standard meanings; I strongly suggest you use them.

string - Unicode <-> Multibyte conversion (native vs. managed) - Stack...

string unicode native managed multibyte