ICU-544 information about and tools for gb 18030
X-SVN-Rev: 2762
parent be45790deb
commit 2be0117179
120
icu4c/source/tools/makeconv/gb18030/gb18030.html
Normal file
@ -0,0 +1,120 @@
<html lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<title>GB 18030</title>
</head>

<!-- Copyright (C) 2000, International Business Machines Corporation and others. All Rights Reserved. -->

<body>
<h1>GB 18030</h1>
<p align="right">Markus Scherer, 2000-oct-21</p>

<p>GB 18030 is a new Chinese codepage standard, published 2000-mar-17, that is designed for:
<ul>
<li>Upwards compatibility with the GB 2312-1980 standard</li>
<li>Compatibility with the GBK specification, updated for Unicode 3.0</li>
<li>Full coverage of all Unicode code points, similar to a UTF</li>
</ul></p>

<p>Byte sequence structure:
<ul>
<li>Single-byte: 00-80</li>
<li>Two-byte: 81-fe | 40-7e, 80-fe</li>
<li>Four-byte: 81-fe | 30-39 | 81-fe | 30-39</li>
</ul></p>

<p>Special properties of GB 18030:
<ul>
<li>Huge: 1.6 million codepage code points, probably the largest codepage</li>
<li>Similar to a UTF: All 1.1 million Unicode code points U+0000-U+10ffff map to GB 18030 codes.
All but 79 Unicode code points can be mapped from GB 18030.
(I.e., there are 79 Unicode code points with only fallback mappings to GB 18030.)</li>
<li>Most of these mappings, except for parts of the BMP, can be done algorithmically.
This makes it an unusual mix of a Unicode encoding with a traditional codepage.</li>
<li>The length of a byte sequence cannot always be determined from its first byte:
lead bytes 81-fe start both two-byte and four-byte sequences, so the second byte
must be examined as well. This is unusual.</li>
</ul></p>

<h2>Generating a GB 18030 mapping table</h2>

<p>GB 18030 is derived from existing standards and specifications,
and a mapping table can be generated from existing data.
<em>Note: </em>Following this description does not guarantee compatibility with
the standard or any particular implementation.
This section is most useful for understanding the genesis and structure of GB 18030.</p>
<ol>
<li>GBK is a specification (not a standard) that extends GB 2312-1980
to cover the ideographs in Unicode 2.0. Microsoft co-authored GBK.
Get a GBK table, e.g. the one for Microsoft Windows 2000 codepage 936 from the ICU sample charsets.</li>
<li>From the Microsoft codepage table, remove all fallback mappings and the one for GB+ff.
Note that the Windows 2000 version contains the Euro sign at GB+80=U+20ac.
Leave it in there for GB 18030.</li>
<li>Get a copy of appendix E of the GB 18030 standard.
There are 79 characters with "temporary" and "new" Unicode mappings.
The temporary ones map to private-use code points because the characters were not assigned in Unicode 2.0.
In the data, change them from roundtrip mappings to fallbacks.
The new mappings are to Unicode 3.0 code points.
Add them as roundtrip mappings to your data.</li>
<li>U+0080 is not currently mapped by the standard.
Also, there are a small number of known errors, typos, and ambiguities in the original standard publication.
See <a href="ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/pdf/GB18030_Summary.pdf">this summary</a>.
I have added U+0080=GB+8432eb38 to my data.
This is not official at this point!</li>
<li>You should arrive at data like <a href="gbkuni30.txt">gbkuni30.txt</a>.
This file has the following simplified format on each line:<br>
<code>unicode (':' | '>') gb ['*']</code><br>
The left column contains the Unicode code point, the right column the byte sequence in GB 18030.
The delimiter is either a colon for roundtrip mappings or a greater-than sign
for fallbacks from Unicode to the codepage.
I have marked mappings of the appendix E characters with a star.</li>
<li>Now compile <a href="gbmake4.c">gbmake4</a> and run it with the above file on stdin.
The output contains the four-byte mappings for all
BMP code points that do not have a one-byte or two-byte mapping.</li>
<li>All Unicode code points on the supplementary planes, U+10000-U+10ffff, are mapped as well.
Their GB 18030 codes are four-byte sequences starting at GB+90308130.
You can enumerate them lexically by keeping the second and fourth bytes
between 0x30 and 0x39 and the third byte between 0x81 and 0xfe. For example:
<pre>
U+10000=GB+90308130
U+10001=GB+90308131
U+10002=GB+90308132
...
U+1000a=GB+90308230
U+1000b=GB+90308231
...
U+10ffff=GB+e3329a35
</pre>
You can calculate linear values and differences between GB 18030 four-byte sequences
with <a href="lineargb.c">lineargb</a>.</li>
<li>Done! The result is a set of 0x110000 mappings!</li>
<li>Of course, an economical implementation would handle the mappings for the
supplementary planes algorithmically.
Also, large parts of the BMP mappings are contiguous and can be
handled similarly. For an ICU MBCS converter, U+fffe and U+ffff should
in any case be special-cased because these values have special meaning in .cnv files.</li>
<li>You can have gbmake4 generate a list of contiguous four-byte ranges in the BMP.
Run it with the same input but specify "r" as an argument.
Sort the output in descending order.
Select the ranges that you deem useful, and add the one including U+fffe and U+ffff.
For example, see <a href="ranges.txt">ranges.txt</a>.</li>
<li>If you concatenate gbkuni30.txt and your selected ranges, including the
"ranges" line in between, you can run this through gbmake4 again and
get a mapping table without the code points in the ranges.</li>
<li>For an ICU converter, turn your data into a .ucm file and
add the header information.
Keep the roundtrip/fallback information:
roundtrip mappings (':') need a trailing "|0", fallback mappings ('>') a trailing "|1".
You can use <a href="gbtoucm.c">gbtoucm</a>.</li>
<li>Also for an ICU MBCS converter, you need to specify a state table for the codepage
that describes its structure. For example, with the supplementary planes and the
<a href="ranges.txt">suggested ranges</a> handled algorithmically and therefore
declared as "unassigned", see this <a href="gbstates.txt">sample state table</a>.</li>
<li>All valid four-byte codepage code points that do not map to
any Unicode code point are of course unassigned.
This includes 9012 sequences with a 0x84 lead byte and 9824 with a 0xe3 lead byte,
as well as about 0.5 million with lead bytes 0x85..0x8f and 0xe4..0xfe.</li>
</ol>

</body>
</html>
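The counts in the last list item of gb18030.html can be checked with a little arithmetic. The following is a minimal sketch and not part of the commit; the helper names `indexInLeadByte` and `unassignedAfter` are made up for illustration. It assumes that the last assigned sequence with lead byte 0x84 is GB+8432eb37 (the end of the last range in ranges.txt, that is, without the unofficial U+0080 mapping) and that the last assigned sequence with lead byte 0xe3 is GB+e3329a35 (U+10ffff).

```c
/* Zero-based index of a four-byte sequence among all sequences that share
 * its lead byte: the second and fourth bytes take 10 values each (30-39),
 * the third byte takes 126 values (81-fe). */
static long
indexInLeadByte(unsigned int b2, unsigned int b3, unsigned int b4) {
    return ((long)(b2-0x30)*126+(long)(b3-0x81))*10+(long)(b4-0x30);
}

/* Number of structurally valid four-byte sequences that share a lead byte
 * with, and sort after, the given last assigned sequence. */
static long
unassignedAfter(unsigned int b2, unsigned int b3, unsigned int b4) {
    const long perLeadByte=10L*126L*10L; /* 12600 sequences per lead byte */
    return perLeadByte-(indexInLeadByte(b2, b3, b4)+1);
}
```

Under these assumptions, `unassignedAfter(0x32, 0xeb, 0x37)` yields 9012 and `unassignedAfter(0x32, 0x9a, 0x35)` yields 9824. The remaining lead bytes 0x85..0x8f and 0xe4..0xfe (11+27 of them) contribute 38*12600=478800 sequences, which is the "about 0.5 million" mentioned in the text.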
24149
icu4c/source/tools/makeconv/gb18030/gbkuni30.txt
Normal file
File diff suppressed because it is too large
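Since the contents of gbkuni30.txt are not shown here, the line format described in gb18030.html, `unicode (':' | '>') gb ['*']`, can be illustrated with a small parser. This is only a sketch along the lines of the parsing code in gbmake4.c and gbtoucm.c; the `Mapping` type and the `parseMappingLine` function are hypothetical names and do not appear in the commit.

```c
#include <stdlib.h>

typedef struct {
    unsigned long unicode;  /* Unicode code point */
    unsigned long bytes;    /* GB 18030 byte sequence as one big-endian value */
    int isFallback;         /* 0 for ':' (roundtrip), 1 for '>' (fallback) */
    int isAppendixE;        /* 1 if the line ends with a '*' marker */
} Mapping;

/* Parse one line of the form  unicode (':' | '>') gb ['*'].
 * Returns 0 on success and -1 on a syntax error. */
static int
parseMappingLine(const char *line, Mapping *m) {
    char *end;

    /* the Unicode code point is a hexadecimal number */
    m->unicode=strtoul(line, &end, 16);
    if(end==line || (*end!=':' && *end!='>')) {
        return -1;
    }
    m->isFallback= *end=='>';

    /* the byte sequence is read as one hexadecimal value */
    line=end+1;
    m->bytes=strtoul(line, &end, 16);
    if(end==line || (*end!=0 && *end!='*')) {
        return -1;
    }
    m->isAppendixE= *end=='*';
    return 0;
}
```

For example, the Euro sign mapping mentioned in the description would appear as the line `20ac:80` and parse to a roundtrip mapping from U+20ac to the single byte 0x80. A line such as `e000>aaa1*` (an invented example, not taken from the data file) would parse as a fallback marked as an appendix E character.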
211
icu4c/source/tools/makeconv/gb18030/gbmake4.c
Normal file
@ -0,0 +1,211 @@
/*
*******************************************************************************
*
*   Copyright (C) 2000, International Business Machines
*   Corporation and others.  All Rights Reserved.
*
*******************************************************************************
*   file name:  gbmake4.c
*   encoding:   US-ASCII
*   tab size:   8 (not used)
*   indentation:4
*
*   created on: 2000oct19
*   created by: Markus W. Scherer
*
*   This tool reads and processes codepage mapping files for GB 18030.
*   Its main function is to read a mapping table with the one- and two-byte
*   mappings of GB 18030 and to then output a mapping table with all of the
*   four-byte mappings for the BMP.
*   When an "r" argument is specified, it will instead write a list of
*   ranges of contiguous mappings where both Unicode code points and GB 18030
*   four-byte sequences form contiguous blocks.
*   This kind of output can be appended to a mapping table with a "ranges" line
*   in between, and the resulting output will exclude the input ranges.
*   This is useful for generating a partial mapping table and for handling the
*   input ranges algorithmically in conversion.
*
*   To compile, just call a C compiler/linker with this source file.
*   On Windows: cl gbmake4.c
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* in the printed standard, U+303e is mismapped; this sequence must be skipped */
static const unsigned char skip303eBytes[4]={ 0x81, 0x39, 0xa6, 0x34 };

/* array of flags for each Unicode BMP code point */
static char
flags[0x10000]={ 0 };
/* flag values: 0: not assigned  1: one/two-byte sequence  2: four-byte sequence */

/* increment a four-byte sequence to the next one in lexical order */
static void
incFourGB18030(unsigned char bytes[4]) {
    if(bytes[3]<0x39) {
        ++bytes[3];
    } else {
        bytes[3]=0x30;
        if(bytes[2]<0xfe) {
            ++bytes[2];
        } else {
            bytes[2]=0x81;
            if(bytes[1]<0x39) {
                ++bytes[1];
            } else {
                bytes[1]=0x30;
                ++bytes[0];
            }
        }
    }
}

static void
incSkipFourGB18030(unsigned char bytes[4]) {
    incFourGB18030(bytes);
    if(0==memcmp(bytes, skip303eBytes, 4)) {
        /* make sure to skip the mismapped sequence */
        incFourGB18030(bytes);
    }
}

static int
readRanges(void) {
    char line[200];
    char *s, *end;
    unsigned long c1, c2;

    /* parse the input file from stdin, in the format of gb18030markus2.txt */
    while(fgets(line, sizeof(line), stdin)!=NULL) {
        /* remove the line terminator that fgets() keeps */
        line[strcspn(line, "\r\n")]=0;

        /* skip empty and comment lines */
        if(line[0]==0 || line[0]=='#') {
            continue;
        }

        /* find the Unicode code point range */
        s=strstr(line, "U+");
        if(s==NULL) {
            fprintf(stderr, "error parsing range from \"%s\"\n", line);
            return 1;
        }

        /* read range */
        s+=2;
        c1=strtoul(s, &end, 16);
        if(end==s || *end!='-') {
            fprintf(stderr, "error parsing range start from \"%s\"\n", line);
            return 1;
        }

        s=end+1;
        c2=strtoul(s, &end, 16);
        if(end==s || (*end!=' ' && *end!=0)) {
            fprintf(stderr, "error parsing range end from \"%s\"\n", line);
            return 1;
        }

        /* set the flags for all code points in this range */
        while(c1<=c2) {
            flags[c1++]=2;
        }
    }

    return 0;
}

extern int
main(int argc, const char *argv[]) {
    char line[200];
    char *end;
    unsigned long c, b;
    unsigned char bytes[4]={ 0x81, 0x30, 0x81, 0x30 };

    /* parse the input file from stdin, in the format of gb18030markus2.txt */
    while(fgets(line, sizeof(line), stdin)!=NULL) {
        /* remove the line terminator that fgets() keeps */
        line[strcspn(line, "\r\n")]=0;

        /* skip empty and comment lines */
        if(line[0]==0 || line[0]=='#' || line[0]==0x1a) {
            continue;
        }

        /* end of code points, beginning of ranges? */
        if(0==strcmp(line, "ranges")) {
            int result=readRanges();
            if(result!=0) {
                return result;
            }
            break;
        }

        /* read Unicode code point */
        c=strtoul(line, &end, 16);
        if(end==line || (*end!=':' && *end!='>')) {
            fprintf(stderr, "error parsing code point from \"%s\"\n", line);
            return 1;
        }

        /* ignore non-BMP code points */
        if(c>0xffff) {
            continue;
        }

        /* read byte sequence as one long value */
        b=strtoul(end+1, &end, 16);
        if(*end!=0 && *end!='*') {
            fprintf(stderr, "error parsing byte sequence from \"%s\"\n", line);
            return 1;
        }

        /* set the flag for the code point */
        flags[c]= b<=0xffff ? 1 : 2;
    }

    if(argc<=1) {
        /* generate all four-byte sequences that are not already in the input */
        for(c=0x81; c<=0xffff; ++c) {
            if(flags[c]==0) {
                printf("%04lx:%02x%02x%02x%02x\n", c, bytes[0], bytes[1], bytes[2], bytes[3]);
            }
            if(flags[c]!=1) {
                incSkipFourGB18030(bytes);
            }
        }
    } else if(0==strcmp(argv[1], "r")) {
        /* generate ranges of contiguous code points with four-byte sequences for what is not covered by the input */
        unsigned char b1[4], b2[4];
        unsigned long c1, c2;

        printf("ranges\n");
        for(c1=0x81; c1<=0xffff;) {
            /* get start bytes of range */
            memcpy(b1, bytes, 4);

            /* look for the first non-range code point */
            for(c2=c1; c2<=0xffff && flags[c2]==0; ++c2) {
                /* save this sequence to avoid decrementing it after this loop */
                memcpy(b2, bytes, 4);
                /* increment the sequence for the next code point */
                incSkipFourGB18030(bytes);
            }
            /* c2 is the first code point after the range; b2 are the bytes for the last code point in the range */

            /* print this range, number of codes first for easy sorting */
            printf("%06lx U+%04lx-%04lx GB+%02x%02x%02x%02x-%02x%02x%02x%02x\n",
                   c2-c1, c1, c2-1,
                   b1[0], b1[1], b1[2], b1[3],
                   b2[0], b2[1], b2[2], b2[3]);

            /* skip all assigned Unicode BMP code points */
            for(c1=c2; c1<=0xffff && flags[c1]!=0; ++c1) {
                if(flags[c1]==2) {
                    incSkipFourGB18030(bytes);
                }
            }
        }
    } else {
        fprintf(stderr, "unknown mode argument \"%s\"\n", argv[1]);
        return 2;
    }

    return 0;
}
21
icu4c/source/tools/makeconv/gb18030/gbstates.txt
Normal file
@ -0,0 +1,21 @@
# ICU state information for the GB 18030 MBCS codepage
# Note that the entire block for the supplementary Unicode planes is
# marked unassigned because they are handled algorithmically.
# Similarly, some of the BMP mappings are marked as unassigned for the same reason.

# Mostly assigned sequences, with branches in the lead bytes
<icu:state> 0-80, 81:7, 82:8, 83:9, 84:a, 85-fe:4
<icu:state> 30-39:2, 40-7e, 80-fe
<icu:state> 81-fe:3
<icu:state> 30-39

# All-unassigned 4-byte sequences
<icu:state> 30-39:5, 40-7e, 80-fe
<icu:state> 81-fe:6
<icu:state> 30-39.u

# Some unassigned 4-byte sequences, one state for each of the lead bytes 81-84
<icu:state> 30:2, 31-35:5, 36-39:2, 40-7e, 80-fe
<icu:state> 30-35:2, 36-39:5, 40-7e, 80-fe
<icu:state> 30-37:5, 38:2, 39:5, 40-7e, 80-fe
<icu:state> 30:5, 31-32:2, 33-39:5, 40-7e, 80-fe
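The state table follows the byte sequence structure listed in gb18030.html (single-byte 00-80, two-byte 81-fe | 40-7e, 80-fe, four-byte 81-fe | 30-39 | 81-fe | 30-39). As a much simpler illustration of that structure, the length of a sequence can be derived from its first two bytes. This is a sketch only, not ICU code; the function name `gb18030SequenceLength` is made up, and the function checks structure only, not whether a sequence is assigned.

```c
#include <stddef.h>

/* Return 1, 2 or 4 for a structurally valid GB 18030 sequence at s,
 * or 0 if the bytes are truncated or do not form a valid sequence. */
static int
gb18030SequenceLength(const unsigned char *s, size_t length) {
    if(length==0) {
        return 0;
    }
    if(s[0]<=0x80) {
        return 1;   /* single byte: 00-80 */
    }
    if(s[0]==0xff || length<2) {
        return 0;   /* ff is not a lead byte; a lead byte needs a second byte */
    }
    if((0x40<=s[1] && s[1]<=0x7e) || (0x80<=s[1] && s[1]<=0xfe)) {
        return 2;   /* two bytes: 81-fe | 40-7e, 80-fe */
    }
    if(0x30<=s[1] && s[1]<=0x39 && length>=4 &&
       0x81<=s[2] && s[2]<=0xfe &&
       0x30<=s[3] && s[3]<=0x39
    ) {
        return 4;   /* four bytes: 81-fe | 30-39 | 81-fe | 30-39 */
    }
    return 0;
}
```

The second byte decides between the two-byte and four-byte forms, which is why the state table branches on it after every lead byte 81-fe.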
87
icu4c/source/tools/makeconv/gb18030/gbtoucm.c
Normal file
@ -0,0 +1,87 @@
/*
*******************************************************************************
*
*   Copyright (C) 2000, International Business Machines
*   Corporation and others.  All Rights Reserved.
*
*******************************************************************************
*   file name:  gbtoucm.c
*   encoding:   US-ASCII
*   tab size:   8 (not used)
*   indentation:4
*
*   created on: 2000oct19
*   created by: Markus W. Scherer
*
*   This tool reads a mapping table in a very simple format and turns it into
*   .ucm file format.
*   The input format is as follows:
*       unicode (':' | '>') codepage ['*']
*   With
*       unicode = hexadecimal number 0..10ffff
*       codepage = hexadecimal number 0..ffffffff for big-endian bytes
*       ':' for roundtrip mappings
*       '>' for fallbacks from Unicode to codepage
*       '*' ignored
*
*   To compile, just call a C compiler/linker with this source file.
*   On Windows: cl gbtoucm.c
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

extern int
main(int argc, const char *argv[]) {
    char line[200];
    char *end;
    unsigned long c, b;
    unsigned char fallback;

    /* parse the input file from stdin */
    while(fgets(line, sizeof(line), stdin)!=NULL) {
        /* remove the line terminator that fgets() keeps */
        line[strcspn(line, "\r\n")]=0;

        /* pass through empty and comment lines */
        if(line[0]==0 || line[0]=='#' || line[0]==0x1a) {
            puts(line);
            continue;
        }

        /* end of code points, beginning of ranges? */
        if(0==strcmp(line, "ranges")) {
            break; /* ignore the rest of the file */
        }

        /* read Unicode code point */
        c=strtoul(line, &end, 16);
        if(end==line || (*end!=':' && *end!='>')) {
            fprintf(stderr, "error parsing code point from \"%s\"\n", line);
            return 1;
        }
        if(*end==':') {
            fallback=0;
        } else {
            fallback=1;
        }

        /* read byte sequence as one long value */
        b=strtoul(end+1, &end, 16);
        if(*end!=0 && *end!='*') {
            fprintf(stderr, "error parsing byte sequence from \"%s\"\n", line);
            return 1;
        }

        /* output in .ucm format; b is an unsigned long, so print its bytes with %lx */
        if(b<=0xff) {
            printf("<U%04lx> \\x%02lx |%u\n", c, b, fallback);
        } else if(b<=0xffff) {
            printf("<U%04lx> \\x%02lx\\x%02lx |%u\n", c, b>>8, b&0xff, fallback);
        } else if(b<=0xffffff) {
            printf("<U%04lx> \\x%02lx\\x%02lx\\x%02lx |%u\n", c, b>>16, (b>>8)&0xff, b&0xff, fallback);
        } else {
            printf("<U%04lx> \\x%02lx\\x%02lx\\x%02lx\\x%02lx |%u\n", c, b>>24, (b>>16)&0xff, (b>>8)&0xff, b&0xff, fallback);
        }
    }

    return 0;
}
70
icu4c/source/tools/makeconv/gb18030/lineargb.c
Normal file
@ -0,0 +1,70 @@
/*
*******************************************************************************
*
*   Copyright (C) 2000, International Business Machines
*   Corporation and others.  All Rights Reserved.
*
*******************************************************************************
*   file name:  lineargb.c
*   encoding:   US-ASCII
*   tab size:   8 (not used)
*   indentation:4
*
*   created on: 2000oct03
*   created by: Markus W. Scherer
*
*   This tool operates on 4-byte GB 18030 codepage sequences. It can
*   - calculate the linear value of such a sequence, with the lowest one,
*     81 30 81 30, getting value 0
*   - calculate the linear difference between two sequences
*   - calculate a sequence that is linearly offset from another
*
*   To compile, just call a C compiler/linker with this source file.
*   On Windows: cl lineargb.c
*/

#include <stdio.h>
#include <stdlib.h>

#define LINEAR_18030(a, b, c, d) ((((a)*10+(b))*126L+(c))*10L+(d))
#define LINEAR_18030_BASE LINEAR_18030(0x81, 0x30, 0x81, 0x30)

static long
getLinear(const char *argv[]) {
    unsigned int a, b, c, d;

    a=(unsigned int)strtoul(argv[0], NULL, 16);
    b=(unsigned int)strtoul(argv[1], NULL, 16);
    c=(unsigned int)strtoul(argv[2], NULL, 16);
    d=(unsigned int)strtoul(argv[3], NULL, 16);

    return LINEAR_18030(a, b, c, d);
}

extern int
main(int argc, const char *argv[]) {
    if(argc==5) {
        printf("Linear value: %ld\n", getLinear(argv+1)-LINEAR_18030_BASE);
        return 0;
    } else if(argc==6) {
        int a, b, c, d;
        long linear=getLinear(argv+1)-LINEAR_18030_BASE+strtoul(argv[5], NULL, 0);
        d=(int)(0x30+linear%10); linear/=10;
        c=(int)(0x81+linear%126); linear/=126;
        b=(int)(0x30+linear%10); linear/=10;
        a=(int)(0x81+linear);
        printf("Offset byte sequence: 0x%02x 0x%02x 0x%02x 0x%02x\n",
               a, b, c, d);
        return 0;
    } else if(argc==9) {
        printf("Linear difference: %ld\n", getLinear(argv+5)-getLinear(argv+1));
        return 0;
    } else {
        printf("Usage: %s a b c d [offset | e f g h] calculates with hexadecimal GB 18030 byte values.\n"
               "Just one sequence: prints linear value.\n"
               "Two sequences: prints the linear difference.\n"
               "One sequence and an offset (decimal or with 0x): prints offset byte sequence\n",
               argv[0]);
        return 1;
    }
}
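The same linear arithmetic is what makes the supplementary planes easy to handle algorithmically, as gb18030.html suggests. The following is a minimal sketch under that description, not code from the commit or from ICU; the function name `supplementaryToGB18030` is hypothetical. It relies only on the stated facts that U+10000 maps to GB+90308130 and that consecutive code points map to consecutive four-byte sequences.

```c
/* Convert a supplementary code point U+10000..U+10ffff into its four
 * GB 18030 bytes. Returns 0 on success, -1 if c is out of range. */
static int
supplementaryToGB18030(unsigned long c, unsigned char bytes[4]) {
    unsigned long linear;

    if(c<0x10000 || c>0x10ffff) {
        return -1;
    }

    /* linear value of GB+90308130 relative to GB+81308130 (each lead byte
     * covers 10*126*10=12600 sequences), plus the offset from U+10000 */
    linear=(0x90-0x81)*12600UL+(c-0x10000);

    /* split the linear value into the four "digits" of the sequence */
    bytes[3]=(unsigned char)(0x30+linear%10); linear/=10;
    bytes[2]=(unsigned char)(0x81+linear%126); linear/=126;
    bytes[1]=(unsigned char)(0x30+linear%10); linear/=10;
    bytes[0]=(unsigned char)(0x81+linear);
    return 0;
}
```

With this sketch, U+1000a yields 90 30 82 30 and U+10ffff yields e3 32 9a 35, matching the examples in gb18030.html.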
13
icu4c/source/tools/makeconv/gb18030/ranges.txt
Normal file
@ -0,0 +1,13 @@
ranges
00405a U+9fa6-dfff GB+82358f34-83389837
001bbe U+0452-200f GB+8130d239-8136a530
0010c7 U+e865-f92b GB+83389838-8431cc32
00083e U+2643-2e80 GB+8137a838-8138fd37
000406 U+fa2a-fe2f GB+8431e336-8432cc35
000375 U+3ce1-4055 GB+8231d439-8232af33
0002fd U+361b-3917 GB+8230a634-8230f238
0002bf U+49b8-4c76 GB+8234a132-8234e734
0001d7 U+4160-4336 GB+8232c938-8232f838
0001b9 U+478e-4946 GB+8233e839-82349639
000175 U+44d7-464b GB+8233a430-8233c932
00001a U+ffe6-ffff GB+8432e932-8432eb37
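Each line above pairs a contiguous Unicode range with a contiguous block of four-byte sequences, so a converter could map a code point in such a range by adding its offset to the linear value of the range's first sequence. The sketch below illustrates this idea only; the `Range` type and the function names are hypothetical and are not how ICU implements it. It assumes that no skipped sequence (such as 81 39 a6 34) lies inside the range, which holds for the ranges listed here.

```c
typedef struct {
    unsigned long first, last;  /* Unicode code point range, inclusive */
    unsigned char start[4];     /* GB 18030 bytes for the first code point */
} Range;

/* linear value of a four-byte sequence, with 81 30 81 30 getting value 0 */
static long
linearFromBytes(const unsigned char b[4]) {
    return (((long)(b[0]-0x81)*10+(b[1]-0x30))*126+(b[2]-0x81))*10+(b[3]-0x30);
}

/* inverse of linearFromBytes() */
static void
bytesFromLinear(long linear, unsigned char b[4]) {
    b[3]=(unsigned char)(0x30+linear%10); linear/=10;
    b[2]=(unsigned char)(0x81+linear%126); linear/=126;
    b[1]=(unsigned char)(0x30+linear%10); linear/=10;
    b[0]=(unsigned char)(0x81+linear);
}

/* Map code point c through one range.
 * Returns 0 on success, -1 if c is not in the range. */
static int
rangeToGB18030(const Range *r, unsigned long c, unsigned char bytes[4]) {
    if(c<r->first || c>r->last) {
        return -1;
    }
    bytesFromLinear(linearFromBytes(r->start)+(long)(c-r->first), bytes);
    return 0;
}
```

For the range `U+0452-200f GB+8130d239-8136a530`, mapping U+200f this way gives 81 36 a5 30, which agrees with the end of the range in the listing.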