scuffed-code/icu4c/source/test/fuzzer/break_iterator_fuzzer.cc
Norbert Runge 219730e167 ICU-20217 Interprets fuzzer data as UChar* instead of UTF-8. The conversion
from assumed UTF-8 resulted in an extremely large percentage of Unicode
replacement characters in the data passed to the API under test.

ICU-20217 Uses fuzzer generated bytes to make random selection of locales, converters,
etc., replacing the random number generator. This way the fuzzer can control
the selections.

ICU-20217 Minor follow-ups from code review.
Removes fuzzer target break_iterator_utf32_fuzzer, which does nothing useful
beyond what the regular break iterator fuzzer target already does.

ICU-20217 Fixes for-loop body.

ICU-20217 Uses an allocated buffer to pass head-truncated fuzzer data to the
API under test. The fuzzer may otherwise not detect buffer underflow.

ICU-20217 Typing fix.

ICU-20217 Fixing typing.

ICU-20217 Improve fuzzer targets, move truncated fuzzer data into a
new buffer so that buffer underflow does not go undetected.

ICU-20217 Fixes buffer management of fuzzer-provided data.

ICU-20217 Factor in PR review comments.
2019-02-20 15:22:26 -08:00

71 lines
1.8 KiB
C++

// © 2019 and later: Unicode, Inc. and others.
// License & terms of use: http://www.unicode.org/copyright.html
#include <stddef.h>
#include <stdint.h>
#include <cstring>
#include <memory>
#include <utility>
#include "fuzzer_utils.h"
#include "unicode/brkiter.h"
#include "unicode/utext.h"
IcuEnvironment* env = new IcuEnvironment();
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  UErrorCode status = U_ZERO_ERROR;
  uint8_t rnd8 = 0;
  uint16_t rnd16 = 0;

  if (size < 3) {
    return 0;
  }
  // Extract one and two bytes from fuzzer data for random selection purpose.
  rnd8 = *data;
  data++;
  // Copy rather than cast: the pointer is not guaranteed to be 2-byte aligned.
  std::memcpy(&rnd16, data, sizeof(rnd16));
  data = data + 2;
  size = size - 3;

  // Move the head-truncated fuzzer data into its own allocation so that
  // buffer underflow in the API under test does not go undetected.
  size_t unistr_size = size/2;
  std::unique_ptr<char16_t[]> fuzzbuff(new char16_t[unistr_size]);
  std::memcpy(fuzzbuff.get(), data, unistr_size * 2);

  UText* fuzzstr = utext_openUChars(nullptr, fuzzbuff.get(), unistr_size, &status);

  const icu::Locale& locale = GetRandomLocale(rnd16);

  std::unique_ptr<icu::BreakIterator> bi;

  // The first fuzzer byte selects the break iterator type.
  switch (rnd8 % 5) {
    case 0:
      bi.reset(icu::BreakIterator::createWordInstance(locale, status));
      break;
    case 1:
      bi.reset(icu::BreakIterator::createLineInstance(locale, status));
      break;
    case 2:
      bi.reset(icu::BreakIterator::createCharacterInstance(locale, status));
      break;
    case 3:
      bi.reset(icu::BreakIterator::createSentenceInstance(locale, status));
      break;
    case 4:
      bi.reset(icu::BreakIterator::createTitleInstance(locale, status));
      break;
  }

  // If opening the text or creating the iterator failed, bi may be null;
  // bail out before dereferencing it.
  if (U_FAILURE(status) || !bi) {
    utext_close(fuzzstr);
    return 0;
  }

  bi->setText(fuzzstr, status);

  if (U_FAILURE(status)) {
    utext_close(fuzzstr);
    return 0;
  }

  // Walk every boundary in the text; the positions themselves are not used.
  for (int32_t p = bi->first(); p != icu::BreakIterator::DONE; p = bi->next()) {}

  utext_close(fuzzstr);
  return 0;
}