WORDSPLIT
NAME
wordsplit - split string into words
SYNOPSIS
#include <wordsplit.h>
int wordsplit (const char *s, wordsplit_t *ws, int flags);
int wordsplit_len (const char *s, size_t len, wordsplit_t *p, int flags);
void wordsplit_free (wordsplit_t *p);
void wordsplit_free_words (wordsplit_t *ws);
void wordsplit_getwords (wordsplit_t *ws, int *wordc, char ***wordv);
void wordsplit_perror (wordsplit_t *ws);
const char *wordsplit_strerror (wordsplit_t *ws);
void wordsplit_clearerr (wordsplit_t *ws);
DESCRIPTION
The function wordsplit splits the string s into words using a set of rules governed by flags. Depending on flags, the function performs the following operations: whitespace trimming, tilde expansion, variable expansion, quote removal, command substitution, and path expansion. On success, wordsplit returns 0 and stores the words found in the member ws_wordv and the number of words in the member ws_wordc. On error, a non-zero error code is returned.
The function wordsplit_len acts similarly, except that it accesses only the first len bytes of the string s, which is not required to be null-terminated.
When no longer needed, the resources allocated by a call to one of these functions must be freed using wordsplit_free.
The function wordsplit_free_words frees only the memory allocated for the elements of ws_wordv, after which it resets ws_wordv to NULL and ws_wordc to zero.
The usual calling sequence is:
wordsplit_t ws;
size_t i;
/* wordsplit returns 0 on success */
if (wordsplit(s, &ws, WRDSF_DEFFLAGS) == 0) {
    for (i = 0; i < ws.ws_wordc; i++) {
        /* do something with ws.ws_wordv[i] */
    }
}
wordsplit_free(&ws);
Note that wordsplit_free must be called after each invocation of wordsplit or wordsplit_len, even if it resulted in an error.
The function wordsplit_getwords returns in wordv an array of words, and in wordc the number of elements in wordv. The array can be used after calling wordsplit_free. The caller becomes responsible for freeing the memory allocated for each element of the array and the array pointer itself.
The function wordsplit_perror prints the error message from the last invocation of wordsplit. It uses the function pointed to by the ws_error member. By default, it outputs the message on the standard error.
For more sophisticated error reporting, the function wordsplit_strerror can be used. It returns a pointer to the string describing the error. The caller should treat this pointer as a constant string. It should not try to alter or deallocate it.
The function wordsplit_clearerr clears the error condition associated with ws.
INCREMENTAL MODE
In incremental mode, wordsplit parses one word per invocation. It returns WRDSE_OK on success and WRDSE_NOINPUT when the entire input string has been processed.
This mode is enabled if the flag WRDSF_INCREMENTAL is set in the flags argument. Subsequent calls to wordsplit must have NULL as the first argument. Each successful call returns exactly one word in ws.ws_wordv[0].
An example usage:
wordsplit_t ws;
int rc;
int flags = WRDSF_DEFFLAGS|WRDSF_INCREMENTAL;
for (rc = wordsplit(s, &ws, flags); rc == WRDSE_OK;
     rc = wordsplit(NULL, &ws, flags)) {
    process(ws.ws_wordv[0]);
}
/* WRDSE_NOINPUT only signals the end of input; anything else is an error */
if (rc != WRDSE_NOINPUT)
    wordsplit_perror(&ws);
wordsplit_free(&ws);
OPTIONS
The number of flags is limited to 32 (the width of the uint32_t data type). At the time of this writing, every bit is already occupied by a flag, yet wordsplit provides more features than that. Additional features can be requested by setting the corresponding option bit in the ws_options member of the struct wordsplit argument. To inform the wordsplit functions that this member is initialized, the WRDSF_OPTIONS flag must be set.
Option symbolic names begin with WRDSO_. They are discussed in detail in the subsequent sections.
EXPANSION
Expansion is performed on the input after it has been split into words. The kinds of expansion to be performed are controlled by the appropriate bits set in the flags argument. Whatever expansion kinds are enabled, they are always run in the order described in this section.
Whitespace trimming
Whitespace trimming removes any leading and trailing whitespace from the initial word array. It is enabled by the WRDSF_WS flag. Whitespace trimming is enabled automatically if the word delimiters (ws_delim member) contain whitespace characters (" \t\n"), which is the default.
Variable expansion
Variable expansion replaces each occurrence of $NAME or ${NAME} with the value of the variable NAME. It is enabled by default and can be disabled by setting the WRDSF_NOVAR flag. The caller is responsible for supplying the table of available variables. Two mechanisms are provided: an environment array and a callback function.
The environment array is a NULL-terminated array of variables stored in the ws_env member. The WRDSF_ENV flag must be set to instruct wordsplit to use this array.
By default, elements of the ws_env array have the form NAME=VALUE. An alternative format is enabled by the WRDSF_ENV_KV flag. When it is set, each variable is described by two consecutive elements in the array: ws_env[n] containing the variable name, and ws_env[n+1] containing its value. If the latter is NULL, the corresponding variable is undefined.
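The two array layouts can be compared with a small standalone sketch. It does not use wordsplit itself: lookup_env and lookup_env_kv are hypothetical helper names, and the choice to stop scanning the key/value array at a NULL value is an assumption made here so that the loop never steps past the terminator.

```c
#include <stddef.h>
#include <string.h>

/* Look up the variable named by the first len bytes of name in a
   NULL-terminated array of "NAME=VALUE" strings.
   Returns the value, or NULL if the variable is not listed. */
const char *lookup_env(const char **env, const char *name, size_t len)
{
    size_t i;

    for (i = 0; env[i]; i++) {
        size_t elen = strcspn(env[i], "=");   /* length of NAME part */
        if (elen == len && env[i][elen] == '='
            && memcmp(env[i], name, len) == 0)
            return env[i] + elen + 1;
    }
    return NULL;
}

/* Same lookup for the key/value layout: env[n] holds the name and
   env[n+1] its value.  A NULL value means "undefined". */
const char *lookup_env_kv(const char **env, const char *name, size_t len)
{
    size_t i;

    for (i = 0; env[i]; i += 2) {
        if (strlen(env[i]) == len && memcmp(env[i], name, len) == 0)
            return env[i + 1];
        if (!env[i + 1])
            break;   /* assumption: a NULL value also ends the scan */
    }
    return NULL;
}
```

With the array { "HOME", "/root", "USER", "alice", NULL }, the key/value lookup of HOME yields "/root"; the same data in the default layout would be written { "HOME=/root", "USER=alice", NULL }.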
More sophisticated variable tables can be implemented using a callback function. The ws_getvar member should be set to point to that function and the WRDSF_GETVAR flag must be set. The function itself shall be defined as
int getvar (char **ret, const char *var, size_t len, void *clos);
The function shall look up the variable identified by the first len bytes of the string var. If the variable is found, the function shall store a copy of its value (allocated using malloc(3)) in the memory location pointed to by ret, and return WRDSE_OK. If the variable is not found, the function shall return WRDSE_UNDEF. On any other failure, a non-zero error code shall be returned.
If ws_getvar returns WRDSE_USERERR, it must store the pointer to the error description string in *ret. In any case (whether returning 0 or WRDSE_USERERR), the data returned in ret must be allocated using malloc(3).
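A callback following this contract might look like the sketch below. It is written to compile on its own, so it uses stand-in return codes (DEMO_OK, DEMO_NOSPACE, DEMO_UNDEF) in place of WRDSE_OK, WRDSE_NOSPACE and WRDSE_UNDEF; only the value 0 for success is taken from this manual, the other numbers are placeholders. The table type struct demo_var and the function name demo_getvar are likewise hypothetical.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-ins for WRDSE_OK, WRDSE_NOSPACE and WRDSE_UNDEF.  Real code
   would include <wordsplit.h> and return the WRDSE_ constants. */
enum { DEMO_OK = 0, DEMO_NOSPACE = 1, DEMO_UNDEF = 2 };

/* Hypothetical variable table, handed to the callback through clos.
   The table ends with an entry whose name is NULL. */
struct demo_var {
    const char *name;
    const char *value;
};

/* Callback with the signature documented for ws_getvar. */
int demo_getvar(char **ret, const char *var, size_t len, void *clos)
{
    const struct demo_var *tab = clos;

    for (; tab->name; tab++) {
        if (strlen(tab->name) == len && memcmp(tab->name, var, len) == 0) {
            /* The value must be returned in malloc'ed storage. */
            char *copy = malloc(strlen(tab->value) + 1);
            if (!copy)
                return DEMO_NOSPACE;
            strcpy(copy, tab->value);
            *ret = copy;
            return DEMO_OK;
        }
    }
    return DEMO_UNDEF;
}
```

Note that var is compared by length, not as a C string: the name is not necessarily followed by a null byte in the input.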
If both ws_env and ws_getvar are used, the variable is first looked up in ws_env. If it is not found there, the ws_getvar callback is invoked. This order is reversed if the WRDSO_GETVARPREF option is set.
During variable expansion, the forms below cause wordsplit to test for a variable that is unset or null. Omitting the colon results in a test only for a variable that is unset.
${variable:-word}
Use Default Values. If variable is unset or null, the expansion of word is substituted. Otherwise, the value of variable is substituted.
${variable:=word}
Assign Default Values. If variable is unset or null, the expansion of word is assigned to variable. The value of variable is then substituted.
${variable:?word}
Display Error if Null or Unset. If variable is null or unset, the expansion of word (or a message to that effect if word is not present) is output using ws_error. Otherwise, the value of variable is substituted.
${variable:+word}
Use Alternate Value. If variable is null or unset, nothing is substituted, otherwise the expansion of word is substituted.
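The effect of the colon is easiest to see for the Use Default Values and Use Alternate Value forms. The following standalone sketch models only the selection rule described above; select_form is a hypothetical name, and the function is not how wordsplit implements the expansion.

```c
#include <stddef.h>

/* Decide what a ${variable-word} or ${variable+word} style reference
   yields.
     value - current value of the variable, NULL if it is unset;
     colon - nonzero if the form was written with a colon (":-", ":+"),
             in which case a null (empty) value counts as unset;
     op    - '-' for Use Default Values, '+' for Use Alternate Value;
     word  - the already expanded word. */
const char *select_form(const char *value, int colon, char op,
                        const char *word)
{
    int missing = value == NULL || (colon && value[0] == '\0');

    switch (op) {
    case '-':
        return missing ? word : value;
    case '+':
        return missing ? "" : word;
    }
    /* Plain reference: an unset variable expands to an empty string. */
    return value ? value : "";
}
```

For a variable set to the empty string, select_form("", 1, '-', "dflt") selects "dflt", whereas select_form("", 0, '-', "dflt") selects the empty value, because without the colon only an unset variable triggers the default.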
Unless the above forms are used, a reference to an undefined variable expands to an empty string. Three flags affect this behavior. If the WRDSF_UNDEF flag is set, expanding an undefined variable triggers a WRDSE_UNDEF error. If the WRDSF_WARNUNDEF flag is set, a non-fatal warning is emitted for each undefined variable. Finally, if the WRDSF_KEEPUNDEF flag is set, references to undefined variables are left unexpanded.
If two or three of these flags are set simultaneously, the behavior is undefined.
Positional argument expansion
Positional arguments are special parameters that can be referenced in the input string by their ordinal number. The numbering begins at 0. The syntax for referencing positional arguments is the same as for the variables, except that the argument index is used instead of the variable name. If the index is between 0 and 9, the $N form is acceptable. Otherwise, the index must be enclosed in curly braces: ${N}.
During argument expansion, references to positional arguments are replaced with the corresponding values.
Argument expansion is requested by the WRDSO_PARAMV option bit. The NULL-terminated array of arguments shall be supplied in the ws_paramv member. The ws_paramc member shall be initialized to the number of elements in ws_paramv.
Setting the WRDSO_PARAM_NEGIDX option together with WRDSO_PARAMV enables negative positional argument references. A negative reference has the form ${-N}. It is expanded to the value of the argument with index ws_paramc - N.
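The index arithmetic for negative references can be sketched as follows. The helper resolve_param is hypothetical and standalone; in particular, returning NULL for an out-of-range reference is a choice made for this sketch, not documented wordsplit behavior.

```c
#include <stddef.h>

/* Map a positional reference to an element of paramv.
   idx is the number as written in the input: N for ${N}, -N for ${-N}.
   Returns NULL if the reference falls outside the array. */
const char *resolve_param(const char **paramv, size_t paramc, long idx)
{
    if (idx < 0) {
        /* Magnitude of idx, computed without overflowing on LONG_MIN. */
        size_t n = (size_t) -(idx + 1) + 1;

        if (n > paramc)
            return NULL;
        return paramv[paramc - n];   /* index ws_paramc - N */
    }
    if ((size_t) idx >= paramc)
        return NULL;
    return paramv[idx];
}
```

With three arguments, ${-1} therefore refers to index 2 (the last argument) and ${-3} to index 0.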
Quote removal
During quote removal, single or double quotes surrounding a sequence of characters are removed and the sequence itself is treated as a single word. Characters within single quotes are treated verbatim. Characters within double quotes undergo variable expansion and backslash interpretation (see below).
Recognition of single quoted strings is enabled by the WRDSF_SQUOTE flag. Recognition of double quotes is enabled by the WRDSF_DQUOTE flag. The macro WRDSF_QUOTE enables both.
Backslash interpretation
Backslash interpretation translates unquoted escape sequences into corresponding characters. An escape sequence is a backslash followed by one or more characters. By default, each sequence \C appearing in unquoted words is replaced with the character C. In doubly-quoted strings, two backslash sequences are recognized: \\ translates to a single backslash, and \" translates to a double-quote.
Two flags are provided to modify this behavior. If WRDSF_CESCAPES flag is set, the following escape sequences are recognized:
Sequence   Expansion               ASCII
\\         \                       134
\"         "                       042
\a         audible bell            007
\b         backspace               010
\f         form-feed               014
\n         new line                012
\r         carriage return         015
\t         horizontal tabulation   011
\v         vertical tabulation     013
The sequence \xNN or \XNN, where NN stands for a two-digit hex number, is replaced with the ASCII character NN. The sequence \0NNN, where NNN stands for a three-digit octal number, is replaced with the ASCII character whose code is NNN.
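The numeric forms amount to a base conversion of the digits that follow the prefix. The sketch below shows that conversion in isolation; decode_hex_escape and decode_octal_escape are hypothetical names, and returning -1 for malformed digits is a convention of the sketch rather than a statement about wordsplit.

```c
/* Numeric value of a hexadecimal digit, or -1 if c is not one. */
static int hex_digit(int c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    if (c >= 'A' && c <= 'F')
        return c - 'A' + 10;
    return -1;
}

/* Decode the two digits NN that follow \x or \X.
   Returns the character code, or -1 if NN is not two hex digits. */
int decode_hex_escape(const char *nn)
{
    int hi = hex_digit((unsigned char) nn[0]);
    int lo = hi < 0 ? -1 : hex_digit((unsigned char) nn[1]);

    if (hi < 0 || lo < 0)
        return -1;
    return hi * 16 + lo;
}

/* Decode the three digits NNN that follow \0.
   Returns the character code, or -1 if NNN is not three octal digits. */
int decode_octal_escape(const char *nnn)
{
    int i, code = 0;

    for (i = 0; i < 3; i++) {
        if (nnn[i] < '0' || nnn[i] > '7')
            return -1;
        code = code * 8 + (nnn[i] - '0');
    }
    return code;
}
```

Both \x41 and \0101 thus denote the character with code 65, the letter A.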
The WRDSF_ESCAPE flag allows the caller to customize escape sequences. If it is set, the ws_escape member must be initialized. This member provides escape tables for unquoted words (ws_escape[0]) and quoted strings (ws_escape[1]). Each table is a string consisting of an even number of characters. In each pair of characters, the first one is a character that can appear after backslash, and the following one is its translation. For example, the above table of C escapes is represented as "\\\\\"\"a\ab\bf\fn\nr\rt\tv\v".
It is valid to initialize ws_escape elements to zero. In this case, no backslash translation occurs.
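How such a pair table is read can be illustrated with a short standalone lookup routine. The name escape_lookup is hypothetical, and the -1 return for characters not listed in the table is a convention of this sketch; what wordsplit does with unrecognized sequences is governed by the options described below.

```c
#include <stddef.h>

/* Translate the character c that followed a backslash, using an escape
   table laid out in pairs: table[2n] is the character that may follow
   the backslash and table[2n+1] is its translation.
   Returns the translation, or -1 if c is not listed or table is NULL. */
int escape_lookup(const char *table, int c)
{
    if (!table)
        return -1;
    for (; table[0] && table[1]; table += 2) {
        if ((unsigned char) table[0] == c)
            return (unsigned char) table[1];
    }
    return -1;
}
```

For the table "t\tn\n", looking up 't' gives the horizontal tabulation character and looking up 'n' gives the newline, while any other character is reported as not listed.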
Interpretation of octal and hex escapes is controlled by the following bits in ws_options:
WRDSO_BSKEEP_WORD
When an unrecognized escape sequence is encountered in a word, preserve it on output. If that bit is not set, the backslash is removed from such sequences.
WRDSO_OESC_WORD
Handle octal escapes in words.
WRDSO_XESC_WORD
Handle hex escapes in words.
WRDSO_BSKEEP_QUOTE
When an unrecognized escape sequence is encountered in a doubly-quoted string, preserve it on output. If that bit is not set, the backslash is removed from such sequences.
WRDSO_OESC_QUOTE
Handle octal escapes in doubly-quoted strings.
WRDSO_XESC_QUOTE
Handle hex escapes in doubly-quoted strings.
Command substitution
During command substitution, each word is scanned for commands. Each command found is executed and replaced by the output it creates.
The syntax is:
$(command)
Command substitutions may be nested.
Unless the substitution appears within double quotes, word splitting and pathname expansion are performed on its result.
To enable command substitution, the caller must initialize the ws_command member with the address of the substitution function and make sure the WRDSF_NOCMD flag is not set.
The substitution function should be defined as follows:
int command (char **ret, const char *cmd, size_t len, char **argv, void *clos);
On input, the first len bytes of cmd contain the command invocation as it appeared between $( and ), with all expansions performed.
The argv parameter contains the command line split into words using the same settings as the input ws structure.
The clos parameter supplies user-specific data (passed in the ws_closure member).
On success, the function stores a pointer to the output string in the memory location pointed to by ret and returns WRDSE_OK (0). On error, it must return one of the error codes described in the section ERROR CODES. If WRDSE_USERERR is returned, a pointer to the error description string must be stored in *ret.
When WRDSE_OK or WRDSE_USERERR is returned, the data stored in *ret must be allocated using malloc(3).
Tilde and pathname expansion
Both expansions are performed if the WRDSF_PATHEXPAND flag is set.
Tilde expansion affects any word that begins with an unquoted tilde character (~). If the tilde is followed immediately by a slash, it is replaced with the home directory of the current user (as determined by the user’s passwd entry). A tilde alone is handled the same way. Otherwise, the characters between the tilde and the first slash character (or the end of the word, if it doesn’t contain any) are treated as a login name and are replaced (along with the tilde itself) with the home directory of that user. If there is no user with such a login name, the word is left unchanged.
During pathname expansion each unquoted word is scanned for characters *, ?, and [. If any of these appears, the word is considered a pattern (in the sense of glob(3)) and is replaced with an alphabetically sorted list of file names matching the pattern.
If no matches are found for a word and the ws_options member has the WRDSO_NULLGLOB bit set, the word is removed.
If the WRDSO_FAILGLOB option is set, an error message is output for each such word using ws_error.
When matching a pattern, the dot at the start of a name or immediately following a slash must be matched explicitly, unless the WRDSO_DOTGLOB option is set.
VARIABLE NAMES
By default a shell-like lexical structure of a variable name is assumed. A valid variable name begins with an alphabetical character or underscore and contains alphabetical characters, digits and underscores.
The set of characters that constitute a variable name can be augmented. To do so, initialize the ws_namechar member to the C string containing the characters to be added, set the WRDSO_NAMECHAR bit in ws_options and set the WRDSF_OPTIONS bit in the flags argument.
For example, to allow for colons in variable names, do:
struct wordsplit ws;
ws.ws_namechar = ":";
ws.ws_options = WRDSO_NAMECHAR;
wordsplit(str, &ws, WRDSF_DEFFLAGS|WRDSF_OPTIONS);
Certain characters cannot be allowed to be a name constituent. These are: $, {, }, *, @, -, +, ?, and =. If any of these appears in ws_namechar, the wordsplit (and wordsplit_len) function will return the WRDSE_USAGE error.
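A caller that builds the ws_namechar string from configuration data may want to check it before calling wordsplit. The following standalone sketch performs that check against the characters listed above; namechar_allowed is a hypothetical helper, not a wordsplit function.

```c
#include <string.h>

/* Return 1 if every character of extra may be added to the set of
   variable-name characters, or 0 if extra contains one of the
   characters that the manual lists as forbidden. */
int namechar_allowed(const char *extra)
{
    static const char forbidden[] = "${}*@-+?=";

    /* strcspn stops at the first forbidden character, if any. */
    return extra[strcspn(extra, forbidden)] == '\0';
}
```

For example, ":" and ":." pass the check, while ":-" does not because of the minus sign.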
LIMITING THE NUMBER OF WORDS
The maximum number of words to be returned can be limited by setting the ws_maxwords member to the desired count, and setting the WRDSO_MAXWORDS option, e.g.:
struct wordsplit ws;
ws.ws_maxwords = 3;
ws.ws_options = WRDSO_MAXWORDS;
wordsplit(str, &ws, WRDSF_DEFFLAGS|WRDSF_OPTIONS);
If the actual number of words in the expanded input is greater than the supplied limit, the trailing part of the input will be returned in the last word. For example, if the input to the above fragment were Now is the time for all good men, then the returned words would be:
"Now"
"is"
"the time for all good men"
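The effect of the limit can be reproduced with a simplified standalone splitter. This is only a model of the rule stated above: split_limited is a hypothetical function that splits on blanks in place and performs none of the expansions or quote handling that wordsplit does.

```c
#include <stddef.h>
#include <string.h>

/* Split s on blanks into at most maxwords pieces, modifying s in place.
   Once maxwords - 1 words have been cut off, the rest of the input
   (without its leading blanks) becomes the last word.
   Returns the number of words stored in wordv. */
size_t split_limited(char *s, char **wordv, size_t maxwords)
{
    static const char delim[] = " \t\n";
    size_t n = 0;

    while (n < maxwords) {
        s += strspn(s, delim);       /* skip leading delimiters */
        if (*s == '\0')
            break;
        wordv[n++] = s;
        if (n == maxwords)
            break;                   /* last word keeps the remainder */
        s += strcspn(s, delim);      /* find the end of this word */
        if (*s == '\0')
            break;
        *s++ = '\0';                 /* terminate it and move on */
    }
    return n;
}
```

Called with the sentence from the example and a limit of 3, it stores the same three words as listed above.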
WORDSPLIT_T STRUCTURE
The data type wordsplit_t has three members that contain output data upon return from wordsplit or wordsplit_len, and a number of members that the caller can initialize on input in order to customize the function behavior. For each input member there is a corresponding flag bit, which must be set in the flags argument in order to instruct the wordsplit function to use the member.
OUTPUT
size_t ws_wordc
Number of words in ws_wordv. Accessible upon successful return from wordsplit.
char ** ws_wordv
Array of resulting words. Accessible upon successful return from wordsplit.
The caller should not attempt to free or reallocate ws_wordv or any elements thereof, nor to modify ws_wordc.
To store away the words for use after freeing ws with wordsplit_free, the caller should use wordsplit_getwords. It is more efficient than copying the contents of ws_wordv manually.
size_t ws_wordi
Total number of words processed. This field is intended for use with WRDSF_INCREMENTAL flag. If that flag is not set, the following relation holds: ws_wordi == ws_wordc - ws_offs.
int ws_errno
Error code, if the invocation of wordsplit or wordsplit_len failed. This is the same value as returned from the function in that case.
char *ws_errctx
On error, the context in which the error occurred. For WRDSE_UNDEF, it is the name of the undefined variable. For WRDSE_GLOBERR, it is the pattern that caused the error.
The caller should treat this member as const char *.
The following members are used if the variable expansion was requested and the input string contained an Assign Default Values form (${variable:=word}).
char **ws_envbuf
Modified environment. It follows the same arrangement as ws_env on input (see the WRDSF_ENV_KV flag). If ws_env was NULL (or WRDSF_ENV was not set), but the ws_getvar callback was used, the ws_envbuf array will contain only the modified variables.
size_t ws_envidx
Number of entries in ws_envbuf.
If positional parameters were used (see the WRDSO_PARAMV option) and any of them were modified during processing, the following two members supply the modified parameter array.
char ** ws_parambuf
Array of positional parameters.
size_t ws_paramidx
Number of positional parameters.
INPUT
size_t ws_offs
If the WRDSF_DOOFFS flag is set, this member specifies the number of initial elements in ws_wordv to fill with NULLs. These elements are not counted in the returned ws_wordc.
size_t ws_maxwords
Maximum number of words to return. For this field to take effect, the WRDSO_MAXWORDS option and WRDSF_OPTIONS flag must be set. For a detailed discussion, see the chapter LIMITING THE NUMBER OF WORDS.
int ws_flags
Contains flags passed to wordsplit on entry. Can be used as a read-only member when using wordsplit in incremental mode or in a loop with WRDSF_REUSE flag set.
int ws_options
Additional options used when WRDSF_OPTIONS is set.
const char *ws_delim
Word delimiters. If initialized on input, the WRDSF_DELIM flag must be set. Otherwise, it is initialized on entry to wordsplit with the string " \t\n".
const char *ws_comment
A zero-terminated string of characters that begin an inline comment. If initialized on input, the WRDSF_COMMENT flag must be set. By default, its value is "#".
const char *ws_escape[2]
Escape tables for unquoted words (ws_escape[0]) and quoted strings (ws_escape[1]). These are used to translate escape sequences (\C) into characters. Each table is a string consisting of an even number of characters. In each pair of characters, the first one is a character that can appear after backslash, and the following one is its representation. For example, the string "t\tn\n" translates \t into the horizontal tabulation character and \n into newline. The WRDSF_ESCAPE flag must be set if this member is initialized.
const char *ws_namechar
Lists characters that are allowed in a variable name, in addition to alphanumerics and underscore. The WRDSO_NAMECHAR bit must be set in ws_options for this to take effect.
See the chapter VARIABLE NAMES for a detailed discussion.
void (*ws_alloc_die) (wordsplit_t *)
This function is called when wordsplit is unable to allocate memory and the WRDSF_ENOMEMABRT flag was set. The default function prints a message on standard error and aborts. This member can be used to customize error handling. If initialized, the WRDSF_ALLOC_DIE flag must be set.
void (*ws_error) (const char *, ...)
Pointer to function used for error reporting. The invocation convention is the same as for printf(3). The default function formats and prints the message on the standard error.
If this member is initialized, the WRDSF_ERROR flag must be set.
void (*ws_debug) (const char *, ...)
Pointer to function used for debugging output. By default it points to the same function as ws_error. If initialized, the WRDSF_DEBUG flag must be set.
const char **ws_env
A NULL-terminated array of environment variables. It is used during variable expansion. If set, the WRDSF_ENV flag must be set. Variable expansion is enabled only if either WRDSF_ENV or WRDSF_GETVAR (see below) is set, and WRDSF_NOVAR flag is not set.
Each element of ws_env must have the form "NAME=VALUE", where NAME is the name of the variable, and VALUE is its value. Alternatively, if the WRDSF_ENV_KV flag is set, each variable is described by two elements of ws_env: one containing the variable name, and the next one with its value.
int (*ws_getvar) (char **ret, const char *var, size_t len, void *clos)
Points to the function that will be used during variable expansion for environment variable lookups. This function is used if the variable expansion is enabled (i.e. the WRDSF_NOVAR flag is not set), and the WRDSF_GETVAR flag is set.
If both WRDSF_ENV and WRDSF_GETVAR are set, the variable is first looked up in the ws_env array and, if not found there, ws_getvar is called. If the WRDSO_GETVARPREF option is set, this order is reversed.
The name of the variable is specified by the first len bytes of the string var. The clos parameter supplies the user-specific data (see below the description of ws_closure member) and the ret parameter points to the memory location where output data is to be stored. On success, the function must store there a pointer to the string with the value of the variable and return 0. On error, it must return one of the error codes described in the section ERROR CODES. If ws_getvar returns WRDSE_USERERR, it must store the pointer to the error description string in *ret. In any case (whether returning 0 or WRDSE_USERERR), the data returned in ret must be allocated using malloc(3).
void *ws_closure
Additional user-specific data passed as the last argument to ws_getvar or ws_command (see below). If defined, the WRDSF_CLOSURE flag must be set.
int (*ws_command) (char **ret, const char *cmd, size_t len, char **argv, void *clos)
Pointer to the function that performs command substitution. It treats the first len bytes of the string cmd as a command (whatever it means for the caller) and attempts to execute it. On success, a pointer to the string with the command output is stored in the memory location pointed to by ret and 0 is returned. On error, the function must return one of the error codes described in the section ERROR CODES. If ws_command returns WRDSE_USERERR, it must store the pointer to the error description string in *ret. In any case (whether returning 0 or WRDSE_USERERR), the data returned in ret must be allocated using malloc(3).
The parameter argv contains the command split into words using the same settings as the input ws structure, with command substitution disabled.
The clos parameter supplies user-specific data (see the description of ws_closure member).
The following two members are consulted if the WRDSO_PARAMV option is set. They provide an array of positional parameters.
char const **ws_paramv
Positional parameters. These are accessible in the input string using the notation $N or ${N}, where N is the 0-based parameter number.
size_t ws_paramc
Number of positional parameters.
FLAGS
The following macros are defined for use in the flags argument.
WRDSF_DEFFLAGS
Default flags. This is a shortcut for:
(WRDSF_NOVAR | WRDSF_NOCMD | WRDSF_QUOTE | WRDSF_SQUEEZE_DELIMS | WRDSF_CESCAPES),
i.e.: disable variable expansion and command substitution, perform quote removal, treat any number of consecutive delimiters as a single delimiter, and replace C escapes appearing in the input string with the corresponding characters.
WRDSF_APPEND
Append the resulting words to the array left from a previous call to wordsplit.
WRDSF_DOOFFS
Insert ws_offs initial NULLs in the array ws_wordv. These are not counted in the returned ws_wordc.
WRDSF_NOCMD
Don’t do command substitution. The WRDSO_NOCMDSPLIT option set together with this flag prevents splitting command invocations into separate words (see the OPTIONS section).
WRDSF_REUSE
The parameter ws resulted from a previous call to wordsplit, and wordsplit_free was not called. Reuse the allocated storage.
WRDSF_SHOWERR
Print errors using ws_error.
WRDSF_UNDEF
Consider it an error if an undefined variable is expanded.
WRDSF_NOVAR
Don’t do variable expansion. The WRDSO_NOVARSPLIT option set together with this flag prevents variable references from being split into separate words (see the OPTIONS section).
WRDSF_ENOMEMABRT
Abort on ENOMEM error. By default, out of memory errors are treated as any other errors: the error is reported using ws_error if the WRDSF_SHOWERR flag is set, and error code is returned. If this flag is set, the ws_alloc_die function is called instead. This function is not supposed to return.
WRDSF_WS
Trim off any leading and trailing whitespace from the returned words. This flag is useful if the ws_delim member does not contain whitespace characters.
WRDSF_SQUOTE
Handle single quotes.
WRDSF_DQUOTE
Handle double quotes.
WRDSF_QUOTE
A shortcut for (WRDSF_SQUOTE|WRDSF_DQUOTE).
WRDSF_SQUEEZE_DELIMS
Replace each input sequence of repeated delimiters with a single delimiter.
WRDSF_RETURN_DELIMS
Return delimiters.
WRDSF_SED_EXPR
Treat sed(1) expressions as words.
WRDSF_DELIM
ws_delim member is initialized.
WRDSF_COMMENT
ws_comment member is initialized.
WRDSF_ALLOC_DIE
ws_alloc_die member is initialized.
WRDSF_ERROR
ws_error member is initialized.
WRDSF_DEBUG
ws_debug member is initialized.
WRDSF_ENV
ws_env member is initialized.
WRDSF_GETVAR
ws_getvar member is initialized.
WRDSF_SHOWDBG
Enable debugging.
WRDSF_NOSPLIT
Don’t split input into words. This flag is useful for side effects, e.g. to perform variable expansion within a string.
WRDSF_KEEPUNDEF
Keep undefined variables in place, instead of expanding them to empty strings.
WRDSF_WARNUNDEF
Warn about undefined variables.
WRDSF_CESCAPES
Handle C-style escapes in the input string.
WRDSF_CLOSURE
ws_closure is set.
WRDSF_ENV_KV
Each two consecutive elements in the ws_env array describe a single variable: ws_env[n] contains variable name, and ws_env[n+1] contains its value.
WRDSF_ESCAPE
ws_escape is set.
WRDSF_INCREMENTAL
Incremental mode. Each subsequent call to wordsplit with NULL as its first argument parses the next word from the input. See the section INCREMENTAL MODE for a detailed discussion.
WRDSF_PATHEXPAND
Perform pathname and tilde expansion. See the subsection Pathname expansion for details.
WRDSF_OPTIONS
The ws_options member is initialized.
OPTIONS
The ws_options member is consulted if the WRDSF_OPTIONS flag is set. It contains a bitwise OR of one or more of the following options:
WRDSO_NULLGLOB
Remove the words that produce empty string after pathname expansion.
WRDSO_FAILGLOB
Output error message if pathname expansion produces empty string.
WRDSO_DOTGLOB
During pathname expansion allow a leading period to be matched by metacharacters.
WRDSO_BSKEEP_WORD
Backslash interpretation: when an unrecognized escape sequence is encountered in a word, preserve it on output. If that bit is not set, the backslash is removed from such sequences.
WRDSO_OESC_WORD
Backslash interpretation: handle octal escapes in words.
WRDSO_XESC_WORD
Backslash interpretation: handle hex escapes in words.
WRDSO_BSKEEP_QUOTE
Backslash interpretation: when an unrecognized escape sequence is encountered in a doubly-quoted string, preserve it on output. If that bit is not set, the backslash is removed from such sequences.
WRDSO_OESC_QUOTE
Backslash interpretation: handle octal escapes in doubly-quoted strings.
WRDSO_XESC_QUOTE
Backslash interpretation: handle hex escapes in doubly-quoted strings.
WRDSO_MAXWORDS
The ws_maxwords member is initialized. This is used to control the number of words returned by a call to wordsplit. For a detailed discussion, refer to the chapter LIMITING THE NUMBER OF WORDS.
WRDSO_NOVARSPLIT
When WRDSF_NOVAR is set, don’t split variable references, even if they contain whitespace. E.g. ${VAR:-foo bar} will be treated as a single word.
WRDSO_NOCMDSPLIT
When WRDSF_NOCMD is set, don’t split whatever looks like command invocation, even if it contains whitespace. E.g. $(command arg) will be treated as a single word.
WRDSO_PARAMV
Positional arguments are supplied in ws_paramv and ws_paramc. See the subsection Positional argument expansion for a discussion.
WRDSO_PARAM_NEGIDX
Used together with WRDSO_PARAMV, this allows for negative positional argument references. A negative argument reference has the form ${-N}. It is expanded to the value of the argument with index ws_paramc - N, i.e. Nth if counting from the end.
WRDSO_NAMECHAR
When set, indicates that the ws_namechar member of the wordsplit_t struct has been initialized.
This member allows you to modify the notion of what characters can be part of a valid variable name. See the chapter VARIABLE NAMES for a detailed discussion.
ERROR CODES
WRDSE_OK, WRDSE_EOF
Successful return.
WRDSE_QUOTE
Missing closing quote. The ws_endp member points to the position in the input string where the error occurred.
WRDSE_NOSPACE
Memory exhausted.
WRDSE_USAGE
Invalid wordsplit usage.
WRDSE_CBRACE
Unbalanced curly brace.
WRDSE_UNDEF
Undefined variable. This error is returned only if the WRDSF_UNDEF flag is set.
WRDSE_NOINPUT
Input exhausted. This is not actually an error. This code is returned if wordsplit (or wordsplit_len) is invoked in incremental mode and encounters end of input string. See the section INCREMENTAL MODE.
WRDSE_PAREN
Unbalanced parenthesis.
WRDSE_GLOBERR
An error occurred during pattern matching.
WRDSE_USERERR
User-defined error. Normally this error is returned by ws_getvar or ws_command. Use the function wordsplit_strerror to get textual description of the error.
RETURN VALUE
Both wordsplit and wordsplit_len return 0 on success, and a non-zero error code on error (see the section ERROR CODES).
wordsplit_strerror returns a pointer to the constant string describing the last error condition that occurred in ws.
EXAMPLE
The short program below implements a function that parses the input string similarly to the shell. All expansions are performed. Default error reporting is used.
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <wordsplit.h>

/* Run command from str (len bytes long) and store its output in ret.
   argv and closure are not used.
   Return wordsplit error code.
 */
static int runcmd(char **ret, const char *str, size_t len,
                  char **argv, void *closure)
{
    FILE *fp;
    char *cmd;
    int c, lastc = 0;
    char *buffer = NULL;
    size_t bufsize = 0;
    size_t buflen = 0;

    /* Convert to a null-terminated string for popen(3) */
    cmd = malloc(len + 1);
    if (!cmd)
        return WRDSE_NOSPACE;
    memcpy(cmd, str, len);
    cmd[len] = 0;

    fp = popen(cmd, "r");
    if (!fp) {
        char buf[128];

        snprintf(buf, sizeof buf, "can't run %s: %s",
                 cmd, strerror(errno));
        free(cmd);
        *ret = strdup(buf);
        if (!*ret)
            return WRDSE_NOSPACE;
        else
            return WRDSE_USERERR;
    }

    /* Collect the output, reallocating buffer as needed. */
    while ((c = fgetc(fp)) != EOF) {
        lastc = c;
        if (c == '\n')
            c = ' ';
        /* Grow the buffer, keeping one byte spare for the terminator. */
        if (buflen + 1 >= bufsize) {
            char *p;

            if (bufsize == 0)
                bufsize = 80;
            else
                bufsize *= 2;
            p = realloc(buffer, bufsize);
            if (!p) {
                free(buffer);
                free(cmd);
                pclose(fp);
                return WRDSE_NOSPACE;
            }
            buffer = p;
        }
        buffer[buflen++] = c;
    }

    /* Trim off the trailing newline */
    if (buffer) {
        if (lastc == '\n')
            --buflen;
        buffer[buflen] = 0;
    }

    pclose(fp);
    free(cmd);

    /* Return the composed string. */
    *ret = buffer;
    return WRDSE_OK;
}

extern char **environ;

/* Parse s much as shell does. Return array of words on success,
   and NULL on error.
 */
char **shell_parse(char *s)
{
    wordsplit_t ws;
    size_t wc;
    char **wv;
    int rc;

    /* Initialize ws */
    ws.ws_env = (const char **) environ;
    ws.ws_command = runcmd;
    /* Call wordsplit. Let it report errors, if any.
       WRDSF_ENV tells wordsplit that ws_env is initialized. */
    rc = wordsplit(s, &ws,
                   WRDSF_ENV | WRDSF_QUOTE | WRDSF_SQUEEZE_DELIMS
                   | WRDSF_PATHEXPAND | WRDSF_SHOWERR);
    if (rc == WRDSE_OK)
        /* Store away the resulting words on success. */
        wordsplit_getwords(&ws, &wc, &wv);
    else
        wv = NULL;
    wordsplit_free(&ws);
    return wv;
}
AUTHORS
Sergey Poznyakoff
BUGS
Backtick command expansion is not supported.
BUG REPORTS
Report bugs to <gray@gnu.org>.
COPYRIGHT
Copyright © 2009-2019 Sergey Poznyakoff
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.