glibc/locale/programs/charmap.c
Ulrich Drepper 601d294296 Update.
2001-06-04  Bruno Haible  <haible@clisp.cons.org>

	* iconv/loop.c (UNICODE_TAG_HANDLER): New macro.
	* iconv/gconv_simple.c (__gconv_transform_internal_ascii): Invoke
	UNICODE_TAG_HANDLER.
	(__gconv_transform_internal_ucs2): Likewise.
	(__gconv_transform_internal_ucs2reverse): Likewise.
	* iconvdata/8bit-gap.c (BODY for TO_LOOP): Invoke UNICODE_TAG_HANDLER.
	* iconvdata/8bit-generic.c (BODY for TO_LOOP): Likewise.
	* iconvdata/ansi_x3.110.c (BODY for TO_LOOP): Likewise.
	* iconvdata/big5.c (BODY for TO_LOOP): Likewise.
	* iconvdata/big5hkscs.c (BODY for TO_LOOP): Likewise.
	* iconvdata/cp1255.c (BODY for TO_LOOP): Likewise.
	* iconvdata/cp1258.c (BODY for TO_LOOP): Likewise.
	* iconvdata/euc-cn.c (BODY for TO_LOOP): Likewise.
	* iconvdata/euc-jp.c (BODY for TO_LOOP): Likewise.
	* iconvdata/euc-kr.c (BODY for TO_LOOP): Likewise.
	* iconvdata/euc-tw.c (BODY for TO_LOOP): Likewise.
	* iconvdata/gbk.c (BODY for TO_LOOP): Likewise.
	* iconvdata/ibm930.c (BODY for TO_LOOP): Likewise.
	* iconvdata/ibm932.c (BODY for TO_LOOP): Likewise.
	* iconvdata/ibm933.c (BODY for TO_LOOP): Likewise.
	* iconvdata/ibm935.c (BODY for TO_LOOP): Likewise.
	* iconvdata/ibm937.c (BODY for TO_LOOP): Likewise.
	* iconvdata/ibm939.c (BODY for TO_LOOP): Likewise.
	* iconvdata/ibm943.c (BODY for TO_LOOP): Likewise.
	* iconvdata/iso646.c (BODY for TO_LOOP): Likewise.
	* iconvdata/iso8859-1.c (BODY for TO_LOOP): Likewise.
	* iconvdata/iso_6937.c (BODY for TO_LOOP): Likewise.
	* iconvdata/iso_6937-2.c (BODY for TO_LOOP): Likewise.
	* iconvdata/iso-2022-cn.c (BODY for TO_LOOP): Likewise.
	* iconvdata/iso-2022-cn-ext.c (BODY for TO_LOOP): Likewise.
	* iconvdata/iso-2022-kr.c (BODY for TO_LOOP): Likewise.
	* iconvdata/johab.c (BODY for TO_LOOP): Likewise.
	* iconvdata/sjis.c (BODY for TO_LOOP): Likewise.
	* iconvdata/t.61.c (BODY for TO_LOOP): Likewise.
	* iconvdata/uhc.c (BODY for TO_LOOP): Likewise.
	* iconvdata/unicode.c (BODY for TO_LOOP): Likewise.
	* iconvdata/iso-2022-jp.c (TAG_none, TAG_language, TAG_language_j,
	TAG_language_ja, TAG_language_k, TAG_language_ko, TAG_language_z,
	TAG_language_zh, CURRENT_TAG_MASK): New enum values.
	(EMIT_SHIFT_TO_INIT): Don't emit an escape sequence if ASCII_set
	is already selected but set2 or tag are set.
	(conversion): New enum type.
	(cvlist_t): New type.
	(CVLIST, CVLIST_FIRST, CVLIST_REST): New macros.
	(conversion_lists): New array.
	(BODY for TO_LOOP): Keep track of Unicode 3.1 language tag. If "ja",
	prefer conversion to Japanese character sets. If "zh", prefer
	conversion to GB2312. If "ko", prefer conversion to KSC5601. Small
	optimizations.
	(INIT_PARAMS): Add tag.
	(UPDATE_PARAMS): Add tag.

2001-06-04  Bruno Haible  <haible@clisp.cons.org>

	* locale/programs/locfile.c (write_locale_data): Before creat(),
	unlink the file, to avoid crashing the processes that mmap it.  Change
	a double slash to a single slash.  Free fname in case of error return.

2001-06-02  Jakub Jelinek  <jakub@redhat.com>

	* sysdeps/i386/fpu/s_frexpl.S (__frexpl): Mostly revert 2000-12-03
	changes, do the special handling for denormal numbers, not for
	normalized numbers (patch by <trevin@xmission.com>).

	* math/test-misc.c (main): Test frexpl with denormal arguments.

2001-06-04  Jakub Jelinek  <jakub@redhat.com>

	* math/libm-test.inc (llround_test): Add two new llround tests.
	* sysdeps/ieee754/ldbl-96/s_llroundl.c (__llroundl): Don't allow
	overflow when rounding away from zero.

2001-06-04  Jakub Jelinek  <jakub@redhat.com>

	* math/Makefile (libm-calls): Add e_log2, w_log2, remove s_log2.
	* math/math_private.h (__ieee754_log2, __ieee754_log2f,
	__ieee754_log2l): New prototypes.
	* sysdeps/generic/w_log2.c: New file.
	* sysdeps/generic/w_log2f.c: New file.
	* sysdeps/generic/w_log2l.c: New file.
	* sysdeps/generic/s_log2l.c: Move...
	* sysdeps/generic/e_log2l.c: ...to here. Rename to __ieee754_log2l.
	* sysdeps/ieee754/k_standard.c (__kernel_standard): Handle log2(0)
	and log2(x < 0).
	* sysdeps/i386/fpu/s_log2.S: Move...
	* sysdeps/i386/fpu/e_log2.S: ...to here. Rename to __ieee754_log2.
	* sysdeps/i386/fpu/s_log2f.S: Move...
	* sysdeps/i386/fpu/e_log2f.S: ...to here. Rename to __ieee754_log2f.
	* sysdeps/i386/fpu/s_log2l.S: Move...
	* sysdeps/i386/fpu/e_log2l.S: ...to here. Rename to __ieee754_log2l.
	* sysdeps/m68k/fpu/s_log2.S: Move...
	* sysdeps/m68k/fpu/e_log2.S: ...to here. Rename to __ieee754_log2.
	* sysdeps/m68k/fpu/s_log2f.S: Move...
	* sysdeps/m68k/fpu/e_log2f.S: ...to here. Rename to __ieee754_log2f.
	* sysdeps/m68k/fpu/s_log2l.S: Move...
	* sysdeps/m68k/fpu/e_log2l.S: ...to here. Rename to __ieee754_log2l.
	* sysdeps/ieee754/dbl-64/s_log2.c: Move...
	* sysdeps/ieee754/dbl-64/e_log2.c: ...to here. Rename to
	__ieee754_log2.
	* sysdeps/ieee754/flt-32/s_log2f.c: Move...
	* sysdeps/ieee754/flt-32/e_log2f.c: ...to here. Rename to
	__ieee754_log2f.

2001-06-04  Jakub Jelinek  <jakub@redhat.com>

	* sysdeps/generic/w_exp2.c (u_threshold): Lower threshold so that
	even arguments which result in denormalized exp2 are accepted.
	(__exp2): Arguments equal to u_threshold already result into
	underflow.
	* sysdeps/generic/w_exp2f.c (u_threshold, __exp2f): Likewise.
	* sysdeps/generic/w_exp2l.c (u_threshold, __exp2l): Likewise.
	* sysdeps/ieee754/dbl-64/e_exp2.c (__ieee754_exp2): Lomark was too
	low, with corrected lowmark use greaterequal, not greater.
	* sysdeps/ieee754/flt-32/e_exp2f.c (__ieee754_exp2f): Likewise.

2001-06-04  Jakub Jelinek  <jakub@redhat.com>

	* math/libm-test.inc (ilogb_test): Test that ilogb(+-Inf) == INT_MAX.
	* sysdeps/i386/fpu/s_ilogb.S (__ilogb): Return INT_MAX for +-Inf.
	* sysdeps/i386/fpu/s_ilogbf.S (__ilogbf): Likewise.
	* sysdeps/i386/fpu/s_ilogbl.S (__ilogbl): Likewise.
	* sysdeps/ieee754/dbl-64/s_ilogb.c (__ilogb): Likewise.
	* sysdeps/ieee754/flt-32/s_ilogbf.c (__ilogbf): Likewise.
	* sysdeps/ieee754/ldbl-128/s_ilogbl.c (__ilogbl): Likewise.
	* sysdeps/ieee754/ldbl-96/s_ilogbl.c (__ilogbl): Likewise.

2001-06-04  Jakub Jelinek  <jakub@redhat.com>

	* sysdeps/generic/w_coshl.c (__coshl): Test if finite argument
	gave non-finite result instead of using constant in generic
	version.
	* sysdeps/generic/w_coshf.c (__coshf): Likewise.
	* sysdeps/generic/w_cosh.c (__cosh): Likewise.
	* sysdeps/generic/w_exp10.c (o_threshold, u_threshold): Remove.
	(__exp10): Test if finite argument gave non-finite result.
	* sysdeps/generic/w_exp10f.c (o_threshold, u_threshold, __exp10f):
	Likewise.
	* sysdeps/generic/w_exp10l.c (o_threshold, u_threshold, __exp10l):
	Likewise.

2001-06-04  Jakub Jelinek  <jakub@redhat.com>

	* sysdeps/ieee754/ldbl-96/e_coshl.c (__ieee754_coshl): Fix
	overflow threshold constant (log(LDBL_MAX)+M_LN2l).

2001-05-29  Bruno Haible  <haible@clisp.cons.org>

	* locale/programs/ld-ctype.c (idx_table): New struct type.
	(idx_table_init, idx_table_get, idx_table_add): New functions.
	(MAX_CHARNAMES_IDX): Remove macro.
	(locale_ctype_t): Change type of charnames_idx field.
	(ctype_startup): Change initialization of charnames_idx field.
	(find_idx): Use idx_table_get and idx_table_add for speed.

	* locale/programs/charmap.c (charmap_new_char): Fix ucs4 value
	computation of characters in a range.

2001-05-29  Bruno Haible  <haible@clisp.cons.org>

	* iconvdata/gb18030.c (__fourbyte_to_ucs1): Add mappings for <U03F4>,
	<U03F5>.
	(__ucs_to_gb18030_tab1): Likewise.
	(BODY for FROM_LOOP): Add mapping for <U00010000>..<U0010FFFF>.
	(BODY for TO_LOOP): Likewise.
	* iconvdata/tst-table-charmap.sh: Update for charmaps containing
	<U00xxxxxx> syntax.
	* iconvdata/tst-table-from.c (bmp_only): New variable.
	(utf8_decode): If bmp_only, don't return characters outside Unicode
	plane 0.
	(main): When testing UTF-8 or GB18030, set bmp_only to 1. Don't print
	a conversion line if utf8_decode returns NULL.
	* iconvdata/tst-table-to.c (main): When testing encodings other than
	UTF-8 and GB18030, loop upto U+30000 instead of U+10000. Use UTF-8
	instead of UCS-2 as input.
	* iconvdata/tst-table.sh: For GB18030, use only the part < 0x10000
	of the charmap.

2001-05-29  Bruno Haible  <haible@clisp.cons.org>

	* iconvdata/cns11643l1.c: Update to Unicode 3.1.
	(__cns11643l1_to_ucs4_tab): Regenerated.
	(__cns11643l1_from_ucs4_tab12): Regenerated.
	* iconvdata/cns11643.c: Update to Unicode 3.1.
	(__cns11643l14_to_ucs4_tab): Remove array.
	(__cns11643l3_to_ucs4_tab, __cns11643l4_to_ucs4_tab,
	__cns11643l5_to_ucs4_tab, __cns11643l6_to_ucs4_tab,
	__cns11643l7_to_ucs4_tab, __cns11643l15_to_ucs4_tab): New arrays.
	(__cns11643_from_ucs4p0_tab): Renamed from __cns11643_from_ucs4_tab.
	(__cns11643_from_ucs4p2_tab): New array.
	* iconvdata/cns11643.h (__cns11643l14_to_ucs4_tab): Remove declaration.
	(__cns11643l3_to_ucs4_tab, __cns11643l4_to_ucs4_tab,
	__cns11643l5_to_ucs4_tab, __cns11643l6_to_ucs4_tab,
	__cns11643l7_to_ucs4_tab, __cns11643l15_to_ucs4_tab): New declarations.
	(cns11643_to_ucs4): Treat planes 3, 4, 5, 6, 7, 15 instead of 14.
	(__cns11643_from_ucs4_tab): Remove declaration.
	(__cns11643_from_ucs4p0_tab, __cns11643_from_ucs4p2_tab): New
	declarations.
	(ucs4_to_cns11643): Update for new arrays. Treat U+3400..U+4DFF and
	U+20000..U+2A6D6.
	* iconvdata/cns11643l2.h (__cns11643_from_ucs4_tab): Remove
	declaration.
	(__cns11643_from_ucs4p0_tab): New declaration.
	(ucs4_to_cns11643l2): Update for new arrays.
	* iconvdata/iso-2022-cn-ext.c (BODY for FROM_LOOP): Handle planes
	3 to 7.
	(BODY for TO_LOOP): Handle planes 3 to 7, instead of plane 14.
	* iconvdata/EUC-TW.irreversible: New file.
	* iconvdata/tst-table.sh: Use it.
	* iconvdata/Makefile (distribute): Add CP1255.irreversible,
	CP1258.irreversible, EUC-TW.irreversible.

2001-05-29  Bruno Haible  <haible@clisp.cons.org>

	* locale/C-translit.h.in: Add transliterations for new Unicode 3.1
	mathematical symbols.
2001-06-06 12:55:46 +00:00


/* Copyright (C) 1996,1998,1999,2000,2001 Free Software Foundation, Inc.
This file is part of the GNU C Library.
Contributed by Ulrich Drepper <drepper@gnu.org>, 1996.
The GNU C Library is free software; you can redistribute it and/or
modify it under the terms of the GNU Library General Public License as
published by the Free Software Foundation; either version 2 of the
License, or (at your option) any later version.
The GNU C Library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Library General Public License for more details.
You should have received a copy of the GNU Library General Public
License along with the GNU C Library; see the file COPYING.LIB. If not,
write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330,
Boston, MA 02111-1307, USA. */
#ifdef HAVE_CONFIG_H
# include <config.h>
#endif
#include <ctype.h>
#include <errno.h>
#include <libintl.h>
#include <limits.h>
#include <obstack.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "error.h"
#include "linereader.h"
#include "charmap.h"
#include "charmap-dir.h"
#include "repertoire.h"
#include <assert.h>
/* Define the lookup function. */
#include "charmap-kw.h"
extern void *xmalloc (size_t __n);
/* Prototypes for local functions. */
static struct charmap_t *parse_charmap (struct linereader *cmfile,
int verbose, int be_quiet);
static void new_width (struct linereader *cmfile, struct charmap_t *result,
const char *from, const char *to,
unsigned long int width);
static void charmap_new_char (struct linereader *lr, struct charmap_t *cm,
int nbytes, char *bytes, const char *from,
const char *to, int decimal_ellipsis, int step);
#ifdef NEED_NULL_POINTER
static const char *null_pointer;
#endif
static struct linereader *
cmlr_open (const char *directory, const char *name, kw_hash_fct_t hf)
{
  FILE *fp;

  fp = charmap_open (directory, name);
  if (fp == NULL)
    return NULL;
  else
    {
      size_t dlen = strlen (directory);
      int add_slash = (dlen == 0 || directory[dlen - 1] != '/');
      size_t nlen = strlen (name);
      char *pathname;
      char *p;

      pathname = alloca (dlen + add_slash + nlen + 1);
      p = stpcpy (pathname, directory);
      if (add_slash)
        *p++ = '/';
      stpcpy (p, name);

      return lr_create (fp, pathname, hf);
    }
}
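/* Illustration only, not part of glibc: a minimal sketch of the path
   joining that cmlr_open performs above, assuming heap allocation in
   place of alloca and memcpy in place of stpcpy.  The name join_path is
   hypothetical.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical helper: join DIRECTORY and NAME with exactly one slash
// between them.  Returns a malloc'd string, or NULL if out of memory.
char *
join_path (const char *directory, const char *name)
{
  size_t dlen = strlen (directory);
  // A slash is added when the directory is empty or lacks a trailing one.
  int add_slash = (dlen == 0 || directory[dlen - 1] != '/');
  size_t nlen = strlen (name);
  char *pathname = malloc (dlen + add_slash + nlen + 1);

  if (pathname == NULL)
    return NULL;

  memcpy (pathname, directory, dlen);
  if (add_slash)
    pathname[dlen++] = '/';
  // Copy the name together with its terminating NUL byte.
  memcpy (pathname + dlen, name, nlen + 1);
  return pathname;
}
```

   Under these assumptions, join_path ("/usr/share/i18n/charmaps", "UTF-8")
   yields "/usr/share/i18n/charmaps/UTF-8", and an empty directory yields
   "/UTF-8", as in cmlr_open.  The caller frees the result.  */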
struct charmap_t *
charmap_read (const char *filename, int verbose, int be_quiet, int use_default)
{
struct charmap_t *result = NULL;
if (filename != NULL)
{
struct linereader *cmfile;
/* First try the name as found in the parameter. */
cmfile = lr_open (filename, charmap_hash);
if (cmfile == NULL)
{
/* Not successful. So start looking through the directories
in the I18NPATH if this is a simple name. */
if (strchr (filename, '/') == NULL)
{
char *i18npath = getenv ("I18NPATH");
if (i18npath != NULL && *i18npath != '\0')
{
char path[strlen (i18npath) + sizeof ("/charmaps")];
char *next;
i18npath = strdupa (i18npath);
while (cmfile == NULL
&& (next = strsep (&i18npath, ":")) != NULL)
{
stpcpy (stpcpy (path, next), "/charmaps");
cmfile = cmlr_open (path, filename, charmap_hash);
if (cmfile == NULL)
{
/* Try without the "/charmaps" part. */
cmfile = cmlr_open (next, filename, charmap_hash);
}
}
}
if (cmfile == NULL)
{
/* Try the default directory. */
cmfile = cmlr_open (CHARMAP_PATH, filename, charmap_hash);
}
}
}
if (cmfile != NULL)
{
result = parse_charmap (cmfile, verbose, be_quiet);
if (result == NULL && !be_quiet)
error (0, errno, _("character map file `%s' not found"), filename);
}
}
if (result == NULL && filename != NULL && strchr (filename, '/') == NULL)
{
/* OK, one more try. We also accept the names given to the
character sets in the files. Sometimes they differ from the
file name. */
CHARMAP_DIR *dir;
dir = charmap_opendir (CHARMAP_PATH);
if (dir != NULL)
{
const char *dirent;
while ((dirent = charmap_readdir (dir)) != NULL)
{
char **aliases;
char **p;
int found;
aliases = charmap_aliases (CHARMAP_PATH, dirent);
found = 0;
for (p = aliases; *p; p++)
if (strcasecmp (*p, filename) == 0)
{
found = 1;
break;
}
charmap_free_aliases (aliases);
if (found)
{
struct linereader *cmfile;
cmfile = cmlr_open (CHARMAP_PATH, dirent, charmap_hash);
if (cmfile != NULL)
result = parse_charmap (cmfile, verbose, be_quiet);
break;
}
}
charmap_closedir (dir);
}
}
if (result == NULL && DEFAULT_CHARMAP != NULL)
{
struct linereader *cmfile;
cmfile = cmlr_open (CHARMAP_PATH, DEFAULT_CHARMAP, charmap_hash);
if (cmfile != NULL)
result = parse_charmap (cmfile, verbose, be_quiet);
if (result == NULL)
error (4, errno, _("default character map file `%s' not found"),
DEFAULT_CHARMAP);
}
/* Test of ASCII compatibility of locale encoding.
Verify that the encoding to be used in a locale is ASCII compatible,
at least for the graphic characters, excluding the control characters,
'$' and '@'. This constraint comes from an ISO C 99 restriction.
ISO C 99 section 7.17.(2) (about wchar_t):
the null character shall have the code value zero and each member of
the basic character set shall have a code value equal to its value
when used as the lone character in an integer character constant.
ISO C 99 section 5.2.1.(3):
Both the basic source and basic execution character sets shall have
the following members: the 26 uppercase letters of the Latin alphabet
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
the 26 lowercase letters of the Latin alphabet
a b c d e f g h i j k l m n o p q r s t u v w x y z
the 10 decimal digits
0 1 2 3 4 5 6 7 8 9
the following 29 graphic characters
! " # % & ' ( ) * + , - . / : ; < = > ? [ \ ] ^ _ { | } ~
the space character, and control characters representing horizontal
tab, vertical tab, and form feed.
Therefore, for all members of the "basic character set", the 'char' code
must have the same value as the 'wchar_t' code, which in glibc is the
same as the Unicode code, which for all of the enumerated characters
is identical to the ASCII code. */
if (result != NULL && use_default)
{
static const char basic_charset[] =
{
'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
'0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
'!', '"', '#', '%', '&', '\'', '(', ')', '*', '+', ',', '-',
'.', '/', ':', ';', '<', '=', '>', '?', '[', '\\', ']', '^',
'_', '{', '|', '}', '~', ' ', '\t', '\v', '\f', '\0'
};
int failed = 0;
const char *p = basic_charset;
do
{
struct charseq * seq = charmap_find_symbol (result, p, 1);
if (seq == NULL || seq->ucs4 != *p)
failed = 1;
}
while (*p++ != '\0');
if (failed)
fprintf (stderr, _("\
character map `%s' is not ASCII compatible, locale not ISO C compliant\n"),
result->code_set_name);
}
return result;
}
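/* Illustration only, not part of glibc: a minimal sketch of the ASCII
   compatibility test at the end of charmap_read above.  It assumes a
   hypothetical 256-entry table (byte value to UCS4 code point) in place
   of struct charmap_t and charmap_find_symbol, and it assumes that the
   compiler's own execution character set is ASCII.  The name
   is_ascii_compatible is made up for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

// TABLE[b] is the UCS4 code point assigned to the single byte b.
// Returns 1 if every member of the ISO C basic character set maps to
// its own byte value, 0 otherwise.
int
is_ascii_compatible (const uint32_t table[256])
{
  static const char basic_charset[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "!\"#%&'()*+,-./:;<=>?[\\]^_{|}~"
    " \t\v\f";

  // sizeof counts the terminating NUL, so the null character is checked
  // as well; it must map to code point zero.
  for (size_t i = 0; i < sizeof basic_charset; ++i)
    {
      unsigned char byte = (unsigned char) basic_charset[i];
      if (table[byte] != byte)
        return 0;
    }
  return 1;
}
```

   Note that '$' and '@' are absent from the list, so a table that remaps
   only those bytes still passes, while one that remaps the backslash
   does not.  */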
static struct charmap_t *
parse_charmap (struct linereader *cmfile, int verbose, int be_quiet)
{
struct charmap_t *result;
int state;
enum token_t expected_tok = tok_error;
const char *expected_str = NULL;
char *from_name = NULL;
char *to_name = NULL;
enum token_t ellipsis = 0;
int step = 1;
/* We don't want symbolic names in string to be translated. */
cmfile->translate_strings = 0;
/* Allocate room for result. */
result = (struct charmap_t *) xmalloc (sizeof (struct charmap_t));
memset (result, '\0', sizeof (struct charmap_t));
/* WIDTH_DEFAULT defaults to 1. */
result->width_default = 1;
#define obstack_chunk_alloc malloc
#define obstack_chunk_free free
obstack_init (&result->mem_pool);
if (init_hash (&result->char_table, 256)
|| init_hash (&result->byte_table, 256))
{
free (result);
return NULL;
}
/* We use a state machine to describe the charmap description file
format. */
state = 1;
while (1)
{
/* What's on? */
struct token *now = lr_token (cmfile, NULL, NULL, verbose);
enum token_t nowtok = now->tok;
struct token *arg;
if (nowtok == tok_eof)
break;
switch (state)
{
case 1:
/* The beginning. We expect the special declarations, EOL or
`CHARMAP'. */
if (nowtok == tok_eol)
/* Ignore empty lines. */
continue;
if (nowtok == tok_charmap)
{
from_name = NULL;
to_name = NULL;
/* We have to set up the real work. Fill in some
default values. */
if (result->mb_cur_max == 0)
result->mb_cur_max = 1;
if (result->mb_cur_min == 0)
result->mb_cur_min = result->mb_cur_max;
if (result->mb_cur_min > result->mb_cur_max)
{
if (!be_quiet)
error (0, 0, _("\
%s: <mb_cur_max> must be greater than <mb_cur_min>\n"),
cmfile->fname);
result->mb_cur_min = result->mb_cur_max;
}
lr_ignore_rest (cmfile, 1);
state = 2;
continue;
}
if (nowtok != tok_code_set_name && nowtok != tok_mb_cur_max
&& nowtok != tok_mb_cur_min && nowtok != tok_escape_char
&& nowtok != tok_comment_char && nowtok != tok_g0esc
&& nowtok != tok_g1esc && nowtok != tok_g2esc
&& nowtok != tok_g3esc && nowtok != tok_repertoiremap
&& nowtok != tok_include)
{
lr_error (cmfile, _("syntax error in prolog: %s"),
_("invalid definition"));
lr_ignore_rest (cmfile, 0);
continue;
}
/* We know that we need an argument. */
arg = lr_token (cmfile, NULL, NULL, verbose);
switch (nowtok)
{
case tok_code_set_name:
case tok_repertoiremap:
if (arg->tok != tok_ident && arg->tok != tok_string)
{
badarg:
lr_error (cmfile, _("syntax error in prolog: %s"),
_("bad argument"));
lr_ignore_rest (cmfile, 0);
continue;
}
if (nowtok == tok_code_set_name)
result->code_set_name = obstack_copy0 (&result->mem_pool,
arg->val.str.startmb,
arg->val.str.lenmb);
else
result->repertoiremap = obstack_copy0 (&result->mem_pool,
arg->val.str.startmb,
arg->val.str.lenmb);
lr_ignore_rest (cmfile, 1);
continue;
case tok_mb_cur_max:
case tok_mb_cur_min:
if (arg->tok != tok_number)
goto badarg;
if (verbose
&& ((nowtok == tok_mb_cur_max
&& result->mb_cur_max != 0)
|| (nowtok == tok_mb_cur_min
&& result->mb_cur_min != 0)))
lr_error (cmfile, _("duplicate definition of <%s>"),
nowtok == tok_mb_cur_min
? "mb_cur_min" : "mb_cur_max");
if (arg->val.num < 1)
{
lr_error (cmfile,
_("value for <%s> must be 1 or greater"),
nowtok == tok_mb_cur_min
? "mb_cur_min" : "mb_cur_max");
lr_ignore_rest (cmfile, 0);
continue;
}
if ((nowtok == tok_mb_cur_max && result->mb_cur_min != 0
&& (int) arg->val.num < result->mb_cur_min)
|| (nowtok == tok_mb_cur_min && result->mb_cur_max != 0
&& (int) arg->val.num > result->mb_cur_max))
{
lr_error (cmfile, _("\
value of <%s> must be greater or equal than the value of <%s>"),
"mb_cur_max", "mb_cur_min");
lr_ignore_rest (cmfile, 0);
continue;
}
if (nowtok == tok_mb_cur_max)
result->mb_cur_max = arg->val.num;
else
result->mb_cur_min = arg->val.num;
lr_ignore_rest (cmfile, 1);
continue;
case tok_escape_char:
case tok_comment_char:
if (arg->tok != tok_ident)
goto badarg;
if (arg->val.str.lenmb != 1)
{
lr_error (cmfile, _("\
argument to <%s> must be a single character"),
nowtok == tok_escape_char ? "escape_char"
: "comment_char");
lr_ignore_rest (cmfile, 0);
continue;
}
if (nowtok == tok_escape_char)
cmfile->escape_char = *arg->val.str.startmb;
else
cmfile->comment_char = *arg->val.str.startmb;
lr_ignore_rest (cmfile, 1);
continue;
case tok_g0esc:
case tok_g1esc:
case tok_g2esc:
case tok_g3esc:
case tok_escseq:
lr_ignore_rest (cmfile, 0); /* XXX */
continue;
case tok_include:
lr_error (cmfile, _("\
character sets with locking states are not supported"));
exit (4);
default:
/* Cannot happen. */
assert (! "Should not happen");
}
break;
case 2:
/* We have seen `CHARMAP' and now are in the body. Each line
must have the format "%s %s %s\n" or "%s...%s %s %s\n". */
if (nowtok == tok_eol)
/* Ignore empty lines. */
continue;
if (nowtok == tok_end)
{
expected_tok = tok_charmap;
expected_str = "CHARMAP";
state = 90;
continue;
}
if (nowtok != tok_bsymbol && nowtok != tok_ucs4)
{
lr_error (cmfile, _("syntax error in %s definition: %s"),
"CHARMAP", _("no symbolic name given"));
lr_ignore_rest (cmfile, 0);
continue;
}
/* If the previous line was not completely correct free the
used memory. */
if (from_name != NULL)
obstack_free (&result->mem_pool, from_name);
if (nowtok == tok_bsymbol)
from_name = (char *) obstack_copy0 (&result->mem_pool,
now->val.str.startmb,
now->val.str.lenmb);
else
{
obstack_printf (&result->mem_pool, "U%08X",
cmfile->token.val.ucs4);
obstack_1grow (&result->mem_pool, '\0');
from_name = (char *) obstack_finish (&result->mem_pool);
}
to_name = NULL;
state = 3;
continue;
case 3:
/* We have two possibilities: We can see an ellipsis or an
encoding value. */
if (nowtok == tok_ellipsis3 || nowtok == tok_ellipsis4
|| nowtok == tok_ellipsis2 || nowtok == tok_ellipsis4_2
|| nowtok == tok_ellipsis2_2)
{
ellipsis = nowtok;
if (nowtok == tok_ellipsis4_2)
{
step = 2;
nowtok = tok_ellipsis4;
}
else if (nowtok == tok_ellipsis2_2)
{
step = 2;
nowtok = tok_ellipsis2;
}
state = 4;
continue;
}
/* FALLTHROUGH */
case 5:
if (nowtok != tok_charcode)
{
lr_error (cmfile, _("syntax error in %s definition: %s"),
"CHARMAP", _("invalid encoding given"));
lr_ignore_rest (cmfile, 0);
state = 2;
continue;
}
if (now->val.charcode.nbytes < result->mb_cur_min)
lr_error (cmfile, _("too few bytes in character encoding"));
else if (now->val.charcode.nbytes > result->mb_cur_max)
lr_error (cmfile, _("too many bytes in character encoding"));
else
charmap_new_char (cmfile, result, now->val.charcode.nbytes,
now->val.charcode.bytes, from_name, to_name,
ellipsis != tok_ellipsis2, step);
/* Ignore trailing comment silently. */
lr_ignore_rest (cmfile, 0);
from_name = NULL;
to_name = NULL;
ellipsis = tok_none;
step = 1;
state = 2;
continue;
case 4:
if (nowtok != tok_bsymbol && nowtok != tok_ucs4)
{
lr_error (cmfile, _("syntax error in %s definition: %s"),
"CHARMAP",
_("no symbolic name given for end of range"));
lr_ignore_rest (cmfile, 0);
continue;
}
/* Copy the to-name in a safe place. */
if (nowtok == tok_bsymbol)
to_name = (char *) obstack_copy0 (&result->mem_pool,
cmfile->token.val.str.startmb,
cmfile->token.val.str.lenmb);
else
{
obstack_printf (&result->mem_pool, "U%08X",
cmfile->token.val.ucs4);
obstack_1grow (&result->mem_pool, '\0');
to_name = (char *) obstack_finish (&result->mem_pool);
}
state = 5;
continue;
case 90:
if (nowtok != expected_tok)
lr_error (cmfile, _("\
`%1$s' definition does not end with `END %1$s'"), expected_str);
lr_ignore_rest (cmfile, nowtok == expected_tok);
state = 91;
continue;
case 91:
/* Waiting for WIDTH... */
if (nowtok == tok_eol)
/* Ignore empty lines. */
continue;
if (nowtok == tok_width_default)
{
state = 92;
continue;
}
if (nowtok == tok_width)
{
lr_ignore_rest (cmfile, 1);
state = 93;
continue;
}
if (nowtok == tok_width_variable)
{
lr_ignore_rest (cmfile, 1);
state = 98;
continue;
}
lr_error (cmfile, _("\
only WIDTH definitions are allowed to follow the CHARMAP definition"));
lr_ignore_rest (cmfile, 0);
continue;
case 92:
if (nowtok != tok_number)
lr_error (cmfile, _("value for %s must be an integer"),
"WIDTH_DEFAULT");
else
result->width_default = now->val.num;
lr_ignore_rest (cmfile, nowtok == tok_number);
state = 91;
continue;
case 93:
/* We now expect `END WIDTH' or lines of the format "%s %d\n" or
"%s...%s %d\n". */
if (nowtok == tok_eol)
/* ignore empty lines. */
continue;
if (nowtok == tok_end)
{
expected_tok = tok_width;
expected_str = "WIDTH";
state = 90;
continue;
}
if (nowtok != tok_bsymbol && nowtok != tok_ucs4)
{
lr_error (cmfile, _("syntax error in %s definition: %s"),
"WIDTH", _("no symbolic name given"));
lr_ignore_rest (cmfile, 0);
continue;
}
if (from_name != NULL)
obstack_free (&result->mem_pool, from_name);
if (nowtok == tok_bsymbol)
from_name = (char *) obstack_copy0 (&result->mem_pool,
now->val.str.startmb,
now->val.str.lenmb);
else
{
obstack_printf (&result->mem_pool, "U%08X",
cmfile->token.val.ucs4);
obstack_1grow (&result->mem_pool, '\0');
from_name = (char *) obstack_finish (&result->mem_pool);
}
to_name = NULL;
state = 94;
continue;
case 94:
if (nowtok == tok_ellipsis3)
{
state = 95;
continue;
}
/* FALLTHROUGH */
case 96:
if (nowtok != tok_number)
lr_error (cmfile, _("value for %s must be an integer"),
"WIDTH");
else
{
/* Store width for chars. */
new_width (cmfile, result, from_name, to_name, now->val.num);
from_name = NULL;
to_name = NULL;
}
lr_ignore_rest (cmfile, nowtok == tok_number);
state = 93;
continue;
case 95:
if (nowtok != tok_bsymbol && nowtok != tok_ucs4)
{
lr_error (cmfile, _("syntax error in %s definition: %s"),
"WIDTH", _("no symbolic name given for end of range"));
lr_ignore_rest (cmfile, 0);
state = 93;
continue;
}
if (nowtok == tok_bsymbol)
to_name = (char *) obstack_copy0 (&result->mem_pool,
now->val.str.startmb,
now->val.str.lenmb);
else
{
obstack_printf (&result->mem_pool, "U%08X",
cmfile->token.val.ucs4);
obstack_1grow (&result->mem_pool, '\0');
to_name = (char *) obstack_finish (&result->mem_pool);
}
state = 96;
continue;
case 98:
/* We now expect `END WIDTH_VARIABLE' or lines of the format
"%s\n" or "%s...%s\n". */
if (nowtok == tok_eol)
/* ignore empty lines. */
continue;
if (nowtok == tok_end)
{
expected_tok = tok_width_variable;
expected_str = "WIDTH_VARIABLE";
state = 90;
continue;
}
if (nowtok != tok_bsymbol && nowtok != tok_ucs4)
{
lr_error (cmfile, _("syntax error in %s definition: %s"),
"WIDTH_VARIABLE", _("no symbolic name given"));
lr_ignore_rest (cmfile, 0);
continue;
}
if (from_name != NULL)
obstack_free (&result->mem_pool, from_name);
if (nowtok == tok_bsymbol)
from_name = (char *) obstack_copy0 (&result->mem_pool,
now->val.str.startmb,
now->val.str.lenmb);
else
{
obstack_printf (&result->mem_pool, "U%08X",
cmfile->token.val.ucs4);
obstack_1grow (&result->mem_pool, '\0');
from_name = (char *) obstack_finish (&result->mem_pool);
}
to_name = NULL;
state = 99;
continue;
case 99:
if (nowtok == tok_ellipsis3)
{
/* A range follows; state 100 reads its end symbol. */
state = 100;
continue;
}
/* Store info. */
from_name = NULL;
/* Warn */
state = 98;
continue;
case 100:
if (nowtok != tok_bsymbol && nowtok != tok_ucs4)
{
lr_error (cmfile, _("syntax error in %s definition: %s"),
"WIDTH_VARIABLE",
_("no symbolic name given for end of range"));
lr_ignore_rest (cmfile, 0);
continue;
}
if (nowtok == tok_bsymbol)
to_name = (char *) obstack_copy0 (&result->mem_pool,
now->val.str.startmb,
now->val.str.lenmb);
else
{
obstack_printf (&result->mem_pool, "U%08X",
cmfile->token.val.ucs4);
obstack_1grow (&result->mem_pool, '\0');
to_name = (char *) obstack_finish (&result->mem_pool);
}
/* XXX Enter value into table. */
lr_ignore_rest (cmfile, 1);
state = 98;
continue;
default:
error (5, 0, _("%s: error in state machine"), __FILE__);
/* NOTREACHED */
}
break;
}
if (state != 91 && !be_quiet)
error (0, 0, _("%s: premature end of file"), cmfile->fname);
lr_close (cmfile);
return result;
}
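/* Illustration only, not part of glibc: a much simplified sketch of the
   state machine idea used by parse_charmap above.  It assumes the input
   has already been split into lines (the real code works on tokens from
   lr_token), keeps only three states, and ignores WIDTH sections.  The
   names section_state and count_mappings are hypothetical.

```c
#include <stddef.h>
#include <string.h>

enum section_state { SEC_PROLOG, SEC_BODY, SEC_TRAILER };

// Count the mapping lines between "CHARMAP" and "END CHARMAP".
// Returns -1 on a malformed body line or on premature end of input.
int
count_mappings (const char *lines[], size_t nlines)
{
  enum section_state state = SEC_PROLOG;
  int count = 0;

  for (size_t i = 0; i < nlines; ++i)
    {
      const char *line = lines[i];

      switch (state)
        {
        case SEC_PROLOG:
          // Declarations such as <code_set_name> are skipped here.
          if (strcmp (line, "CHARMAP") == 0)
            state = SEC_BODY;
          break;

        case SEC_BODY:
          if (strcmp (line, "END CHARMAP") == 0)
            state = SEC_TRAILER;
          else if (line[0] == '<')
            ++count;
          else if (line[0] != '\0')
            return -1;
          break;

        case SEC_TRAILER:
          // Anything after the body is ignored in this sketch.
          break;
        }
    }

  // Like the check on the final state in parse_charmap, input that never
  // reached the trailer is treated as truncated.
  return state == SEC_TRAILER ? count : -1;
}
```

   A well-formed input with two mapping lines gives 2; an input that
   stops before "END CHARMAP" gives -1.  */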
static void
new_width (struct linereader *cmfile, struct charmap_t *result,
           const char *from, const char *to, unsigned long int width)
{
  struct charseq *from_val;
  struct charseq *to_val;

  from_val = charmap_find_value (result, from, strlen (from));
  if (from_val == NULL)
    {
      lr_error (cmfile, _("unknown character `%s'"), from);
      return;
    }

  if (to == NULL)
    to_val = from_val;
  else
    {
      to_val = charmap_find_value (result, to, strlen (to));
      if (to_val == NULL)
        {
          lr_error (cmfile, _("unknown character `%s'"), to);
          return;
        }
    }

  if (result->nwidth_rules >= result->nwidth_rules_max)
    {
      size_t new_size = result->nwidth_rules + 32;
      struct width_rule *new_rules =
        (struct width_rule *) obstack_alloc (&result->mem_pool,
                                             (new_size
                                              * sizeof (struct width_rule)));

      memcpy (new_rules, result->width_rules,
              result->nwidth_rules_max * sizeof (struct width_rule));

      result->width_rules = new_rules;
      result->nwidth_rules_max = new_size;
    }

  result->width_rules[result->nwidth_rules].from = from_val;
  result->width_rules[result->nwidth_rules].to = to_val;
  result->width_rules[result->nwidth_rules].width = (unsigned int) width;
  ++result->nwidth_rules;
}
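/* Illustration only, not part of glibc: a minimal sketch of the
   grow-by-32 rule array that new_width maintains above.  It assumes
   realloc in place of the obstack allocation plus memcpy, and plain
   integers in place of struct charseq pointers.  All names here
   (width_rule_sketch, rule_list, rule_list_add) are hypothetical.

```c
#include <stdlib.h>

struct width_rule_sketch
{
  unsigned int from;
  unsigned int to;
  unsigned int width;
};

struct rule_list
{
  struct width_rule_sketch *rules;
  size_t nrules;
  size_t nrules_max;
};

// Append one rule, enlarging the array by 32 slots whenever it is full.
// Returns 0 on success, -1 if memory is exhausted.
int
rule_list_add (struct rule_list *list, unsigned int from, unsigned int to,
               unsigned int width)
{
  if (list->nrules >= list->nrules_max)
    {
      size_t new_size = list->nrules + 32;
      struct width_rule_sketch *new_rules
        = realloc (list->rules, new_size * sizeof *new_rules);

      if (new_rules == NULL)
        return -1;
      list->rules = new_rules;
      list->nrules_max = new_size;
    }

  list->rules[list->nrules].from = from;
  list->rules[list->nrules].to = to;
  list->rules[list->nrules].width = width;
  ++list->nrules;
  return 0;
}
```

   Starting from an empty list, the 1st and the 33rd insertion each
   trigger a growth step.  Unlike the obstack version, the old storage is
   released by realloc rather than left in the pool.  */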
struct charseq *
charmap_find_value (const struct charmap_t *cm, const char *name, size_t len)
{
  void *result;

  return (find_entry ((hash_table *) &cm->char_table, name, len, &result)
          < 0 ? NULL : (struct charseq *) result);
}
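/* Illustration only, not part of glibc: a minimal sketch of how a
   symbolic name of the form Uxxxx, Uxxxxxxxx, Pxxxx or Pxxxxxxxx can be
   turned into a UCS4 value, as charmap_new_char does below.  NO_UCS4 is
   a hypothetical stand-in for UNINITIALIZED_CHAR_VALUE, and the explicit
   hex-digit check is an addition of this sketch; the original relies on
   the end pointer from strtoul alone.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NO_UCS4 UINT32_MAX

// Return the code point encoded in NAME, or NO_UCS4 if NAME does not
// have one of the recognized forms.
uint32_t
ucs4_from_name (const char *name)
{
  size_t len = strlen (name);

  if ((name[0] != 'U' && name[0] != 'P') || (len != 5 && len != 9))
    return NO_UCS4;

  // Everything after the leading letter must be a hexadecimal digit.
  for (size_t i = 1; i < len; ++i)
    if (!isxdigit ((unsigned char) name[i]))
      return NO_UCS4;

  // At most eight hex digits remain, so the value fits in unsigned long.
  unsigned long value = strtoul (name + 1, NULL, 16);

  // Values with the top bit set are rejected, as in charmap_new_char.
  if (value >= 0x80000000UL)
    return NO_UCS4;
  return (uint32_t) value;
}
```

   With this sketch "U0041" gives 0x41 and "U0001F600" gives 0x1F600,
   while "space" and "U80000000" give NO_UCS4.  */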
static void
charmap_new_char (struct linereader *lr, struct charmap_t *cm,
int nbytes, char *bytes, const char *from, const char *to,
int decimal_ellipsis, int step)
{
hash_table *ht = &cm->char_table;
hash_table *bt = &cm->byte_table;
struct obstack *ob = &cm->mem_pool;
char *from_end;
char *to_end;
const char *cp;
int prefix_len, len1, len2;
unsigned int from_nr, to_nr, cnt;
struct charseq *newp;
len1 = strlen (from);
if (to == NULL)
{
newp = (struct charseq *) obstack_alloc (ob, sizeof (*newp) + nbytes);
newp->nbytes = nbytes;
memcpy (newp->bytes, bytes, nbytes);
newp->name = from;
newp->ucs4 = UNINITIALIZED_CHAR_VALUE;
if ((from[0] == 'U' || from[0] == 'P') && (len1 == 5 || len1 == 9))
{
/* Maybe the name is of the form `Uxxxx' or `Uxxxxxxxx' where
xxxx and xxxxxxxx are hexadecimal numbers. In this case
we use the value of xxxx or xxxxxxxx as the UCS4 value of
this character and we don't have to consult the repertoire
map.
If the name is of the form `Pxxxx' or `Pxxxxxxxx' the xxxx
and xxxxxxxx also give the code point in UCS4 but this must
be in the private, i.e., unassigned, area. This should be
used for characters which do not (yet) have an equivalent
in ISO 10646 and Unicode. */
char *endp;
errno = 0;
newp->ucs4 = strtoul (from + 1, &endp, 16);
if (endp - from != len1
|| (newp->ucs4 == ULONG_MAX && errno == ERANGE)
|| newp->ucs4 >= 0x80000000)
/* This wasn't successful. Signal this name cannot be a
correct UCS value. */
newp->ucs4 = UNINITIALIZED_CHAR_VALUE;
}
insert_entry (ht, from, len1, newp);
insert_entry (bt, newp->bytes, nbytes, newp);
/* Please note that it isn't a bug if a symbol is defined more
than once. All later definitions are simply discarded. */
return;
}
/* We have a range: the two names must share the same prefix and have
an equal number of digits, and the second number must be greater than
or equal to the first. */
len2 = strlen (to);
if (len1 != len2)
{
illegal_range:
lr_error (lr, _("invalid names for character range"));
return;
}
cp = &from[len1 - 1];
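/* Check the bound before dereferencing CP, and pass unsigned char
values to the <ctype.h> functions. */
if (decimal_ellipsis)
while (cp >= from && isdigit ((unsigned char) *cp))
--cp;
else
while (cp >= from && isxdigit ((unsigned char) *cp))
{
if (!isdigit ((unsigned char) *cp) && !isupper ((unsigned char) *cp))
lr_error (lr, _("\
hexadecimal range format should use only capital characters"));
--cp;
}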
prefix_len = (cp - from) + 1;
if (cp == &from[len1 - 1] || strncmp (from, to, prefix_len) != 0)
goto illegal_range;
errno = 0;
from_nr = strtoul (&from[prefix_len], &from_end, decimal_ellipsis ? 10 : 16);
if (*from_end != '\0' || (from_nr == ULONG_MAX && errno == ERANGE)
|| ((to_nr = strtoul (&to[prefix_len], &to_end,
decimal_ellipsis ? 10 : 16)) == ULONG_MAX
&& errno == ERANGE)
|| *to_end != '\0')
{
lr_error (lr, _("<%s> and <%s> are illegal names for range"), from, to);
return;
}
if (from_nr > to_nr)
{
lr_error (lr, _("upper limit in range is not higher than lower limit"));
return;
}
for (cnt = from_nr; cnt <= to_nr; cnt += step)
{
char *name_end;
obstack_printf (ob, decimal_ellipsis ? "%.*s%0*d" : "%.*s%0*X",
prefix_len, from, len1 - prefix_len, cnt);
obstack_1grow (ob, '\0');
name_end = obstack_finish (ob);
newp = (struct charseq *) obstack_alloc (ob, sizeof (*newp) + nbytes);
newp->nbytes = nbytes;
memcpy (newp->bytes, bytes, nbytes);
newp->name = name_end;
newp->ucs4 = UNINITIALIZED_CHAR_VALUE;
if ((name_end[0] == 'U' || name_end[0] == 'P')
&& (len1 == 5 || len1 == 9))
{
/* Maybe the name is of the form `Uxxxx' or `Uxxxxxxxx' where
xxxx and xxxxxxxx are hexadecimal numbers. In this case
we use the value of xxxx or xxxxxxxx as the UCS4 value of
this character and we don't have to consult the repertoire
map.
If the name is of the form `Pxxxx' or `Pxxxxxxxx' the xxxx
and xxxxxxxx also give the code point in UCS4 but this must
be in the private, i.e., unassigned, area. This should be
used for characters which do not (yet) have an equivalent
in ISO 10646 and Unicode. */
char *endp;
errno = 0;
newp->ucs4 = strtoul (name_end + 1, &endp, 16);
if (endp - name_end != len1
|| (newp->ucs4 == ULONG_MAX && errno == ERANGE)
|| newp->ucs4 >= 0x80000000)
/* This wasn't successful. Signal this name cannot be a
correct UCS value. */
newp->ucs4 = UNINITIALIZED_CHAR_VALUE;
}
insert_entry (ht, name_end, len1, newp);
insert_entry (bt, newp->bytes, nbytes, newp);
/* Please note we don't examine the return value since it is no error
if we have two definitions for a symbol. */
/* Increment the value in the byte sequence. */
if (++bytes[nbytes - 1] == '\0')
{
int b = nbytes - 2;
do
if (b < 0)
{
lr_error (lr,
_("resulting bytes for range not representable."));
return;
}
while (++bytes[b--] == 0);
}
}
}
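/* Illustration only, not part of glibc: a minimal sketch of the two
   per-element steps in the range loop of charmap_new_char above, namely
   building the zero-padded symbolic name and incrementing the byte
   sequence with carry.  It assumes snprintf into a caller-supplied
   buffer in place of obstack_printf.  The names format_range_name and
   increment_bytes are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

// Write PREFIX followed by NR, zero-padded to DIGITS characters, in
// decimal or upper-case hexadecimal.  Returns what snprintf returns.
int
format_range_name (char *buf, size_t size, const char *prefix, int digits,
                   unsigned int nr, int decimal)
{
  return snprintf (buf, size, decimal ? "%s%0*u" : "%s%0*X",
                   prefix, digits, nr);
}

// Add one to a big-endian byte sequence.  Returns 0 on success, or -1
// when the carry runs out of the most significant byte, which is the
// case the original reports as not representable.
int
increment_bytes (unsigned char *bytes, int nbytes)
{
  for (int b = nbytes - 1; b >= 0; --b)
    if (++bytes[b] != 0)
      return 0;
  return -1;
}
```

   For a range starting at <U004F>, format_range_name (buf, sizeof buf,
   "U", 4, 0x4F, 0) produces "U004F", and incrementing the bytes 0x81
   0xFF produces 0x82 0x00.  */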
struct charseq *
charmap_find_symbol (const struct charmap_t *cm, const char *bytes,
                     size_t nbytes)
{
  void *result;

  return (find_entry ((hash_table *) &cm->byte_table, bytes, nbytes, &result)
          < 0 ? NULL : (struct charseq *) result);
}