Unify UTF-8 handling using til::u8u16 & revise WriteConsoleAImpl (#4422)
Replace `utf8Parser` with `til::u8u16` in order to have the same
conversion algorithms used in terminal and conhost.

This PR addresses item 2 in this list:
1. ✉ Implement `til::u8u16` and `til::u16u8` (done in PR #4093)
2. ✔ **Unify UTF-8 handling using `til::u8u16` (this PR)**
    2.1. ✔ **Update VtInputThread::_HandleRunInput()**
    2.2. ✔ **Update ApiRoutines::WriteConsoleAImpl()**
    2.3. ❌ (optional / ask the core team) Remove Utf8ToWideCharParser from the code base to avoid further use
3. ❌ Enable BOM discarding (follow up)
    3.1. ❌ extend `til::u8u16` and `til::u16u8` with a 3rd parameter to enable discarding the BOM
    3.2. ❌ Make use of the 3rd parameter to discard the BOM in all current function callers, or (optional / ask the core team) make it the default for `til::u8u16` and `til::u16u8`
4. ❌ Find UTF-16 to UTF-8 conversions and examine if they can be unified, too (follow up)
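To make the BOM item (3) concrete, here is a minimal sketch of what discarding a UTF-8 byte order mark could look like. `DiscardUtf8Bom` is a hypothetical helper for illustration only. It is not part of `til`, and this PR does not say how the planned 3rd parameter will be implemented.

```cpp
#include <string_view>

// Hypothetical helper (not part of til): returns `in` without a leading
// UTF-8 byte order mark (EF BB BF), or `in` unchanged if there is none.
inline std::string_view DiscardUtf8Bom(std::string_view in)
{
    constexpr std::string_view bom{ "\xEF\xBB\xBF", 3 };
    if (in.size() >= bom.size() && in.compare(0, bom.size(), bom) == 0)
    {
        in.remove_prefix(bom.size());
    }
    return in;
}
```

The sketch only looks at the start of the view it is given. Whether a BOM should be dropped only at the very beginning of a stream or at the start of every chunk is a design decision left to the follow-up.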

Closes #4086
Closes #3378
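
The diff passes a state object (`_u8State`) to `til::u8u16` because `ReadFile` hands back fixed-size chunks, so a multi-byte UTF-8 sequence can be split between two reads. The sketch below illustrates the general idea of carrying an incomplete tail over to the next chunk. `Utf8ChunkState`, `IncompleteTailLength` and `TakeCompleteUtf8` are hypothetical names invented for this example. They are not the `til` implementation, which may work differently.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical state holder: bytes of a UTF-8 sequence that was cut off
// at the end of the previous chunk.
struct Utf8ChunkState
{
    std::string partial;
};

// Hypothetical helper: number of trailing bytes in `chunk` that start a
// multi-byte UTF-8 sequence but do not complete it (0 to 3).
inline std::size_t IncompleteTailLength(std::string_view chunk)
{
    const std::size_t n = chunk.size();
    // An incomplete sequence has its lead byte at most 3 bytes from the end.
    for (std::size_t back = 1; back <= 3 && back <= n; ++back)
    {
        const auto b = static_cast<unsigned char>(chunk[n - back]);
        if ((b & 0xC0) == 0x80)
        {
            continue; // continuation byte, keep looking for the lead byte
        }
        std::size_t expected = 1; // ASCII or invalid lead byte
        if ((b & 0xE0) == 0xC0) expected = 2;
        else if ((b & 0xF0) == 0xE0) expected = 3;
        else if ((b & 0xF8) == 0xF0) expected = 4;
        return expected > back ? back : 0;
    }
    return 0;
}

// Prepends the bytes cached in `state`, returns everything up to the last
// complete sequence, and caches the incomplete tail for the next call.
inline std::string TakeCompleteUtf8(std::string_view chunk, Utf8ChunkState& state)
{
    std::string data = std::move(state.partial);
    state.partial.clear();
    data.append(chunk);
    const std::size_t tail = IncompleteTailLength(data);
    state.partial.assign(data, data.size() - tail, tail);
    data.resize(data.size() - tail);
    return data;
}
```

For example, if one read ends with the first two bytes of "€" (`E2 82`) and the next read starts with `AC`, the first call returns only the bytes before `E2` and the second call returns the complete three-byte sequence followed by the rest of that chunk. The sketch does not validate the input or convert it to UTF-16. It only shows why the caller has to keep state between reads.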
german-one authored Feb 4, 2020
1 parent 0d92f71 commit 06b3931
Showing 4 changed files with 86 additions and 148 deletions.
25 changes: 11 additions & 14 deletions src/host/VtInputThread.cpp
@@ -28,7 +28,7 @@ VtInputThread::VtInputThread(_In_ wil::unique_hfile hPipe,
const bool inheritCursor) :
_hFile{ std::move(hPipe) },
_hThread{},
_utf8Parser{ CP_UTF8 },
_u8State{},
_dwThreadId{ 0 },
_exitRequested{ false },
_exitResult{ S_OK }
@@ -47,15 +47,14 @@ VtInputThread::VtInputThread(_In_ wil::unique_hfile hPipe,
}

// Method Description:
// - Processes a buffer of input characters. The characters should be utf-8
// encoded, and will get converted to wchar_t's to be processed by the
// - Processes a string of input characters. The characters should be UTF-8
// encoded, and will get converted to wstring to be processed by the
// input state machine.
// Arguments:
// - charBuffer - the UTF-8 characters recieved.
// - cch - number of UTF-8 characters in charBuffer
// - u8Str - the UTF-8 string received.
// Return Value:
// - S_OK on success, otherwise an appropriate failure.
[[nodiscard]] HRESULT VtInputThread::_HandleRunInput(_In_reads_(cch) const byte* const charBuffer, const int cch)
[[nodiscard]] HRESULT VtInputThread::_HandleRunInput(const std::string_view u8Str)
{
// Make sure to call the GLOBAL Lock/Unlock, not the gci's lock/unlock.
// Only the global unlock attempts to dispatch ctrl events. If you use the
@@ -67,16 +66,14 @@ VtInputThread::VtInputThread(_In_ wil::unique_hfile hPipe,

try
{
std::unique_ptr<wchar_t[]> pwsSequence;
unsigned int cchConsumed;
unsigned int cchSequence;
auto hr = _utf8Parser.Parse(charBuffer, cch, cchConsumed, pwsSequence, cchSequence);
std::wstring wstr{};
auto hr = til::u8u16(u8Str, wstr, _u8State);
// If we hit a parsing error, eat it. It's bad utf-8, we can't do anything with it.
if (FAILED(hr))
{
return S_FALSE;
}
_pInputStateMachine->ProcessString({ pwsSequence.get(), cchSequence });
_pInputStateMachine->ProcessString(wstr);
}
CATCH_RETURN();

@@ -100,12 +97,12 @@ DWORD WINAPI VtInputThread::StaticVtInputThreadProc(_In_ LPVOID lpParameter)
// failed, throw or log, depending on what the caller wants.
// Arguments:
// - throwOnFail: If true, throw an exception if there was an error processing
// the input recieved. Otherwise, log the error.
// the input received. Otherwise, log the error.
// Return Value:
// - <none>
void VtInputThread::DoReadInput(const bool throwOnFail)
{
byte buffer[256];
char buffer[256];
DWORD dwRead = 0;
bool fSuccess = !!ReadFile(_hFile.get(), buffer, ARRAYSIZE(buffer), &dwRead, nullptr);

@@ -120,7 +117,7 @@ void VtInputThread::DoReadInput(const bool throwOnFail)
return;
}

HRESULT hr = _HandleRunInput(buffer, dwRead);
HRESULT hr = _HandleRunInput({ buffer, gsl::narrow_cast<size_t>(dwRead) });
if (FAILED(hr))
{
if (throwOnFail)
5 changes: 2 additions & 3 deletions src/host/VtInputThread.hpp
@@ -15,7 +15,6 @@ Author(s):
#pragma once

#include "..\terminal\parser\StateMachine.hpp"
#include "utf8ToWideCharParser.hpp"

namespace Microsoft::Console
{
@@ -29,7 +28,7 @@ namespace Microsoft::Console
void DoReadInput(const bool throwOnFail);

private:
[[nodiscard]] HRESULT _HandleRunInput(_In_reads_(cch) const byte* const charBuffer, const int cch);
[[nodiscard]] HRESULT _HandleRunInput(const std::string_view u8Str);
DWORD _InputThread();

wil::unique_hfile _hFile;
@@ -40,6 +39,6 @@ namespace Microsoft::Console
HRESULT _exitResult;

std::unique_ptr<Microsoft::Console::VirtualTerminal::StateMachine> _pInputStateMachine;
Utf8ToWideCharParser _utf8Parser;
til::u8state _u8State;
};
}