WORDSPLIT


NAME

wordsplit - split string into words

SYNOPSIS

#include <wordsplit.h>

int wordsplit (const char *s, wordsplit_t *ws, int flags);

int wordsplit_len (const char *s, size_t len, wordsplit_t *p, int flags);

void wordsplit_free (wordsplit_t *p);

void wordsplit_free_words (wordsplit_t *ws);

void wordsplit_getwords (wordsplit_t *ws, int *wordc, char ***wordv);

void wordsplit_perror (wordsplit_t *ws);

const char *wordsplit_strerror (wordsplit_t *ws);

void wordsplit_clearerr (wordsplit_t *ws);

DESCRIPTION

The function wordsplit splits the string s into words using a set of rules governed by flags. Depending on flags, the function performs the following operations: whitespace trimming, tilde expansion, variable expansion, quote removal, command substitution, and path expansion. On success, wordsplit returns 0 and stores the words found in the member ws_wordv and the number of words in the member ws_wordc. On error, a non-zero error code is returned.

The function wordsplit_len acts similarly, except that it accesses only the first len bytes of the string s, which is not required to be null-terminated.

When no longer needed, the resources allocated by a call to one of these functions must be freed using wordsplit_free.

The function wordsplit_free_words frees only the memory allocated for elements of ws_wordv, after which it resets ws_wordv to NULL and ws_wordc to zero.

The usual calling sequence is:

wordsplit_t ws;
size_t i;

/* wordsplit returns 0 (WRDSE_OK) on success */
if (wordsplit(s, &ws, WRDSF_DEFFLAGS) == WRDSE_OK) {
    for (i = 0; i < ws.ws_wordc; i++) {
        /* do something with ws.ws_wordv[i] */
    }
}
wordsplit_free(&ws);

Notice that wordsplit_free must be called after each invocation of wordsplit or wordsplit_len, even if it resulted in error.

The function wordsplit_getwords returns in wordv an array of words, and in wordc the number of elements in wordv. The array can be used after calling wordsplit_free. The caller becomes responsible for freeing the memory allocated for each element of the array and the array pointer itself.

The function wordsplit_perror prints the error message from the last invocation of wordsplit. It uses the function pointed to by the ws_error member. By default, it outputs the message on the standard error.

For more sophisticated error reporting, the function wordsplit_strerror can be used. It returns a pointer to the string describing the error. The caller should treat this pointer as a constant string. It should not try to alter or deallocate it.

The function wordsplit_clearerr clears the error condition associated with ws.

INCREMENTAL MODE

In incremental mode wordsplit parses one word per invocation. It returns WRDSE_OK on success and WRDSE_NOINPUT when the entire input string has been processed.

This mode is enabled if the flag WRDSF_INCREMENTAL is set in the flags argument. Subsequent calls to wordsplit must have NULL as first argument. Each successful call will return exactly one word in ws.ws_wordv[0].

An example usage:

wordsplit_t ws;
int rc;
int flags = WRDSF_DEFFLAGS|WRDSF_INCREMENTAL;

for (rc = wordsplit(s, &ws, flags); rc == WRDSE_OK;
     rc = wordsplit(NULL, &ws, flags)) {
    process(ws.ws_wordv[0]);
}

if (rc != WRDSE_NOINPUT)
    wordsplit_perror(&ws);

wordsplit_free(&ws);

OPTIONS

The number of flags is limited to 32 (the width of the uint32_t data type). At the time of this writing, each bit is already occupied by a corresponding flag, yet the features wordsplit provides require still more. Additional features can be requested by setting the corresponding option bit in the ws_options member of the struct wordsplit argument. To inform the wordsplit functions that this member is initialized, the WRDSF_OPTIONS flag must be set.

Option symbolic names begin with WRDSO_. They are discussed in detail in the subsequent chapters.

EXPANSION

Expansion is performed on the input after it has been split into words. The kinds of expansion to be performed are controlled by the appropriate bits set in the flags argument. Whatever expansion kinds are enabled, they are always run in the order described in this section.

Whitespace trimming
Whitespace trimming removes any leading and trailing whitespace from the initial word array. It is enabled by the WRDSF_WS flag. Whitespace trimming is enabled automatically if the word delimiters (ws_delim member) contain whitespace characters (" \t\n"), which is the default.

Variable expansion
Variable expansion replaces each occurrence of $NAME or ${NAME} with the value of the variable NAME. It is enabled by default and can be disabled by setting the WRDSF_NOVAR flag. The caller is responsible for supplying the table of available variables. Two mechanisms are provided: an environment array and a callback function.

The environment array is a NULL-terminated array of variables, stored in the ws_env member. The WRDSF_ENV flag must be set in order to instruct wordsplit to use this array.

By default, elements of the ws_env array have the form NAME=VALUE. An alternative format is enabled by the WRDSF_ENV_KV flag. When it is set, each variable is described by two consecutive elements in the array: ws_env[n] containing the variable name, and ws_env[n+1] containing its value. If the latter is NULL, the corresponding variable is undefined.
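The two array layouts can be compared with a small standalone sketch. The function env_lookup below is hypothetical and is not part of wordsplit; it only models how a variable could be found in either layout. It assumes that, in the name/value layout, a NULL value also ends the array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Look up the variable named by the first LEN bytes of NAME in a
   NULL-terminated environment array.  If KV is zero, entries have
   the form "NAME=VALUE"; otherwise names and values alternate.
   Returns the value, or NULL if the variable is not defined. */
const char *
env_lookup(const char **env, int kv, const char *name, size_t len)
{
    size_t i;

    assert(name != NULL);
    if (!env)
        return NULL;
    if (kv) {
        /* env[i] is a name, env[i+1] its value (NULL: undefined). */
        for (i = 0; env[i]; i += 2) {
            if (strlen(env[i]) == len && memcmp(env[i], name, len) == 0)
                return env[i + 1];
            if (!env[i + 1])
                break;  /* assumption: a NULL value ends the array */
        }
    } else {
        for (i = 0; env[i]; i++) {
            if (strncmp(env[i], name, len) == 0 && env[i][len] == '=')
                return env[i] + len + 1;
        }
    }
    return NULL;
}
```

With the arrays { "HOME=/home/user", "TERM=xterm", NULL } and { "HOME", "/home/user", "TERM", "xterm", NULL }, looking up TERM yields "xterm" in both cases; only the kv argument differs.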

More sophisticated variable tables can be implemented using a callback function. The ws_getvar member should be set to point to that function and the WRDSF_GETVAR flag must be set. The function itself shall be defined as

int getvar (char **ret, const char *var, size_t len, void *clos);

The function shall look up the variable identified by the first len bytes of the string var. If the variable is found, the function shall store a copy of its value (allocated using malloc(3)) in the memory location pointed to by ret, and return WRDSE_OK. If the variable is not found, the function shall return WRDSE_UNDEF. Otherwise, a non-zero error code shall be returned.

If ws_getvar returns WRDSE_USERERR, it must store the pointer to the error description string in *ret. In any case (whether returning 0 or WRDSE_USERERR), the data returned in ret must be allocated using malloc(3).
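A callback of this shape might look as follows. This is only a sketch: the struct var table and the SKETCH_* codes are hypothetical stand-ins that let the fragment compile without wordsplit.h. Real code would include <wordsplit.h> and return WRDSE_OK, WRDSE_UNDEF and WRDSE_NOSPACE instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder result codes.  Apart from 0 for success, these
   numeric values are NOT the library's. */
enum { SKETCH_OK = 0, SKETCH_UNDEF = 1, SKETCH_NOSPACE = 2 };

/* Hypothetical variable table, handed to the callback via clos.
   The table ends with an entry whose name is NULL. */
struct var {
    const char *name;
    const char *value;
};

int
table_getvar(char **ret, const char *var, size_t len, void *clos)
{
    const struct var *vp;

    assert(ret != NULL && var != NULL);
    for (vp = clos; vp && vp->name; vp++) {
        if (strlen(vp->name) == len && memcmp(vp->name, var, len) == 0) {
            /* The value must be returned in malloc'ed storage. */
            size_t n = strlen(vp->value) + 1;
            char *copy = malloc(n);

            if (!copy)
                return SKETCH_NOSPACE;
            memcpy(copy, vp->value, n);
            *ret = copy;
            return SKETCH_OK;
        }
    }
    return SKETCH_UNDEF;
}
```

Note that var is not null-terminated at len, so the comparison uses the length explicitly rather than strcmp.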

If both ws_env and ws_getvar are used, the variable is first looked up in ws_env. If it is not found there, the ws_getvar callback is invoked. This order is reversed if the WRDSO_GETVARPREF option is set.

During variable expansion, the forms below cause wordsplit to test for a variable that is unset or null. Omitting the colon results in a test only for a variable that is unset.

${variable:-word}

Use Default Values. If variable is unset or null, the expansion of word is substituted. Otherwise, the value of variable is substituted.

${variable:=word}

Assign Default Values. If variable is unset or null, the expansion of word is assigned to variable. The value of variable is then substituted.

${variable:?word}

Display Error if Null or Unset. If variable is null or unset, the expansion of word (or a message to that effect if word is not present) is output using ws_error. Otherwise, the value of variable is substituted.

${variable:+word}

Use Alternate Value. If variable is null or unset, nothing is substituted, otherwise the expansion of word is substituted.
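The selection rules of the four forms can be summarized in a standalone model. The function select_form is hypothetical and does not reproduce wordsplit internals; it assumes that word has already been expanded and that an unset variable is represented by NULL and a null one by an empty string.

```c
#include <assert.h>
#include <stddef.h>

/* Model of the ${variable[:]OPword} forms.
     value   current value (NULL = unset, "" = null)
     colon   non-zero if the colon was present
     op      one of '-', '=', '?', '+'
     word    the already expanded word
   *assign is set if word is to be assigned to the variable,
   *error if the error form fired.  Returns the text to substitute,
   or NULL when *error is set. */
const char *
select_form(const char *value, int colon, int op, const char *word,
            int *assign, int *error)
{
    /* With the colon, a null value counts the same as unset. */
    int missing = value == NULL || (colon && value[0] == '\0');

    assert(word != NULL && assign != NULL && error != NULL);
    *assign = *error = 0;
    switch (op) {
    case '-':                   /* Use Default Values */
        return missing ? word : value;
    case '=':                   /* Assign Default Values */
        if (missing) {
            *assign = 1;
            return word;
        }
        return value;
    case '?':                   /* Display Error if Null or Unset */
        if (missing) {
            *error = 1;
            return NULL;
        }
        return value;
    case '+':                   /* Use Alternate Value */
        return missing ? "" : word;
    }
    *error = 1;                 /* unknown operator */
    return NULL;
}
```

For instance, a null variable yields word for ${variable:-word} but its own (empty) value for ${variable-word}, because without the colon only unset variables are tested.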

Unless the above forms are used, a reference to an undefined variable expands to an empty string. Three flags affect this behavior. If the WRDSF_UNDEF flag is set, expanding an undefined variable triggers a WRDSE_UNDEF error. If the WRDSF_WARNUNDEF flag is set, a non-fatal warning is emitted for each undefined variable. Finally, if the WRDSF_KEEPUNDEF flag is set, references to undefined variables are left unexpanded.

If two or three of these flags are set simultaneously, the behavior is undefined.

Positional argument expansion
Positional arguments are special parameters that can be referenced in the input string by their ordinal number. The numbering begins at 0. The syntax for referencing positional arguments is the same as for the variables, except that argument index is used instead of the variable name. If the index is between 0 and 9, the $N form is acceptable. Otherwise, the index must be enclosed in curly braces: ${N}.

During argument expansion, references to positional arguments are replaced with the corresponding values.

Argument expansion is requested by the WRDSO_PARAMV option bit. The NULL-terminated array of variables shall be supplied in the ws_paramv member. The ws_paramc member shall be initialized to the number of elements in ws_paramv.

Setting the WRDSO_PARAM_NEGIDX option together with WRDSO_PARAMV enables negative positional argument references. A negative reference has the form ${-N}. It is expanded to the value of the argument with index ws_paramc - N.
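The index arithmetic for both kinds of references can be sketched as follows. The function param_ref is hypothetical; it only models the mapping described above. Treating an out-of-range reference as NULL is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Return the positional argument selected by a reference.
   idx >= 0 stands for $N or ${N}; idx < 0 stands for ${-N}, which
   selects paramv[paramc - N].  Returns NULL if the reference is
   out of range. */
const char *
param_ref(const char **paramv, size_t paramc, long idx)
{
    size_t n;

    assert(paramv != NULL);
    if (idx >= 0)
        return (size_t) idx < paramc ? paramv[idx] : NULL;
    /* n is N, computed so that negating LONG_MIN cannot overflow. */
    n = (size_t) -(idx + 1) + 1;
    if (n > paramc)
        return NULL;
    return paramv[paramc - n];
}
```

With three arguments, ${-1} selects index 2 (the last argument) and ${-3} selects index 0.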

Quote removal
During quote removal, single or double quotes surrounding a sequence of characters are removed and the sequence itself is treated as a single word. Characters within single quotes are treated verbatim. Characters within double quotes undergo variable expansion and backslash interpretation (see below).

Recognition of single quoted strings is enabled by the WRDSF_SQUOTE flag. Recognition of double quotes is enabled by the WRDSF_DQUOTE flag. The macro WRDSF_QUOTE enables both.

Backslash interpretation
Backslash interpretation translates unquoted escape sequences into corresponding characters. An escape sequence is a backslash followed by one or more characters. By default, each sequence \C appearing in unquoted words is replaced with the character C. In doubly-quoted strings, two backslash sequences are recognized: \\ translates to a single backslash, and \" translates to a double-quote.

Two flags are provided to modify this behavior. If WRDSF_CESCAPES flag is set, the following escape sequences are recognized:

Sequence   Expansion               ASCII (octal)
\\         \                       134
\"         "                       042
\a         audible bell            007
\b         backspace               010
\f         form-feed               014
\n         new line                012
\r         carriage return         015
\t         horizontal tabulation   011
\v         vertical tabulation     013

The sequence \xNN or \XNN, where NN stands for a two-digit hex number, is replaced with the ASCII character NN. The sequence \0NNN, where NNN stands for a three-digit octal number, is replaced with the ASCII character whose code is NNN.
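Decoding the digits of such a sequence amounts to a fixed-width base conversion. The helper decode_digits below is a hypothetical standalone sketch, not the routine wordsplit uses. It rejects input that has fewer digits than required.

```c
#include <assert.h>
#include <stddef.h>

/* Return the value of digit C in the given base, or -1. */
static int
digit_value(int c, int base)
{
    int v;

    if (c >= '0' && c <= '9')
        v = c - '0';
    else if (c >= 'a' && c <= 'f')
        v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'F')
        v = c - 'A' + 10;
    else
        return -1;
    return v < base ? v : -1;
}

/* Decode exactly NDIGITS digits of S in BASE (16 for the part after
   "\x", 8 for the part after "\0").  Returns the character code, or
   -1 if a digit is missing or invalid. */
int
decode_digits(const char *s, int ndigits, int base)
{
    int i, code = 0;

    assert(s != NULL);
    for (i = 0; i < ndigits; i++) {
        int v = digit_value((unsigned char) s[i], base);

        if (v < 0)
            return -1;
        code = code * base + v;
    }
    return code;
}
```

Both decode_digits("41", 2, 16) and decode_digits("101", 3, 8) produce the code 65.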

The WRDSF_ESCAPE flag allows the caller to customize escape sequences. If it is set, the ws_escape member must be initialized. This member provides escape tables for unquoted words (ws_escape[0]) and quoted strings (ws_escape[1]). Each table is a string consisting of an even number of characters. In each pair of characters, the first one is a character that can appear after backslash, and the following one is its translation. For example, the above table of C escapes is represented as "\\\\\"\"a\ab\bf\fn\nr\rt\tv\v".

It is valid to initialize ws_escape elements to zero. In this case, no backslash translation occurs.
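The pair layout of an escape table can be illustrated with a short lookup routine. The function escape_lookup is a hypothetical sketch of how such a table might be consulted; it is not taken from the wordsplit sources.

```c
#include <assert.h>
#include <stddef.h>

/* Translate the character C that followed a backslash, using an
   escape table laid out as pairs: character, translation.
   Returns the translation, or -1 if C is not listed or TABLE is
   NULL (no translation). */
int
escape_lookup(const char *table, int c)
{
    const char *p;

    assert(c != 0);     /* a null character cannot follow a backslash */
    if (!table)
        return -1;
    for (p = table; p[0] && p[1]; p += 2) {
        if ((unsigned char) p[0] == c)
            return (unsigned char) p[1];
    }
    return -1;
}
```

Given the table "t\tn\n", looking up 't' returns the horizontal tabulation character, while looking up 'q' returns -1. What happens to such an unlisted sequence is then decided by the WRDSO_BSKEEP_WORD and WRDSO_BSKEEP_QUOTE options.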

Interpretation of octal and hex escapes is controlled by the following bits in ws_options:
WRDSO_BSKEEP_WORD

When an unrecognized escape sequence is encountered in a word, preserve it on output. If that bit is not set, the backslash is removed from such sequences.

WRDSO_OESC_WORD

Handle octal escapes in words.

WRDSO_XESC_WORD

Handle hex escapes in words.

WRDSO_BSKEEP_QUOTE

When an unrecognized escape sequence is encountered in a doubly-quoted string, preserve it on output. If that bit is not set, the backslash is removed from such sequences.

WRDSO_OESC_QUOTE

Handle octal escapes in doubly-quoted strings.

WRDSO_XESC_QUOTE

Handle hex escapes in doubly-quoted strings.

Command substitution
During command substitution, each word is scanned for commands. Each command found is executed and replaced by the output it creates.

The syntax is:

$(command)

Command substitutions may be nested.

Unless the substitution appears within double quotes, word splitting and pathname expansion are performed on its result.

To enable command substitution, the caller must initialize the ws_command member with the address of the substitution function and make sure the WRDSF_NOCMD flag is not set.

The substitution function should be defined as follows:

int command (char **ret, const char *cmd, size_t len, char **argv, void *clos);

On input, the first len bytes of cmd contain the command invocation as it appeared between $( and ), with all expansions performed.

The argv parameter contains the command line split into words using the same settings as the input ws structure.

The clos parameter supplies user-specific data (passed in the ws_closure member).

On success, the function stores a pointer to the output string in the memory location pointed to by ret and returns WRDSE_OK (0). On error, it must return one of the error codes described in the section ERROR CODES. If WRDSE_USERERR is returned, a pointer to the error description string must be stored in *ret.

When WRDSE_OK or WRDSE_USERERR is returned, the data stored in *ret must be allocated using malloc(3).

Tilde and pathname expansion
Both expansions are performed if the WRDSF_PATHEXPAND flag is set.

Tilde expansion affects any word that begins with an unquoted tilde character (~). If the tilde is followed immediately by a slash, it is replaced with the home directory of the current user (as determined by the user’s passwd entry). A tilde alone is handled the same way. Otherwise, the characters between the tilde and the first slash character (or the end of the word, if it doesn’t contain any) are treated as a login name and are replaced (along with the tilde itself) with the home directory of that user. If there is no user with such a login name, the word is left unchanged.
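Isolating the tilde prefix is the first step of this procedure. The function tilde_prefix below is a hypothetical standalone sketch of that step only. Looking up the home directory, for example with getpwnam(3) or getpwuid(3), is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* If WORD starts with a tilde, return the length of the tilde
   prefix (the tilde plus the login name, up to but not including
   the first slash) and store the length of the login name in
   *NAMELEN; 0 there means the current user.  Return 0 if WORD does
   not start with a tilde. */
size_t
tilde_prefix(const char *word, size_t *namelen)
{
    assert(word != NULL && namelen != NULL);
    if (word[0] != '~')
        return 0;
    *namelen = strcspn(word + 1, "/");
    return *namelen + 1;
}
```

For "~gray/src" the prefix is 5 bytes long and the login name is the 4 bytes that follow the tilde; for "~/src" and "~" the login name is empty.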

During pathname expansion each unquoted word is scanned for characters *, ?, and [. If any of these appears, the word is considered a pattern (in the sense of glob(3)) and is replaced with an alphabetically sorted list of file names matching the pattern.

If no matches are found for a word and the ws_options member has the WRDSO_NULLGLOB bit set, the word is removed.

If the WRDSO_FAILGLOB option is set, an error message is output for each such word using ws_error.

When matching a pattern, the dot at the start of a name or immediately following a slash must be matched explicitly, unless the WRDSO_DOTGLOB option is set.
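The rule about leading dots corresponds to the FNM_PERIOD flag of fnmatch(3) combined with FNM_PATHNAME. The wrapper name_matches below is a hypothetical illustration of that rule using the POSIX function; whether wordsplit itself relies on fnmatch is not stated here, and the dotglob argument merely plays the role of WRDSO_DOTGLOB.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Return non-zero if NAME matches PATTERN.  Unless DOTGLOB is set,
   a period at the start of NAME or right after a slash must be
   matched explicitly by the pattern. */
int
name_matches(const char *pattern, const char *name, int dotglob)
{
    int flags = FNM_PATHNAME;

    assert(pattern != NULL && name != NULL);
    if (!dotglob)
        flags |= FNM_PERIOD;
    return fnmatch(pattern, name, flags) == 0;
}
```

Under these rules the pattern * does not match .profile unless dotglob is set, whereas .* always does.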

VARIABLE NAMES

By default a shell-like lexical structure of a variable name is assumed. A valid variable name begins with an alphabetical character or underscore and contains alphabetical characters, digits and underscores.

The set of characters that constitute a variable name can be augmented. To do so, initialize the ws_namechar member to the C string containing the characters to be added, set the WRDSO_NAMECHAR bit in ws_options and set the WRDSF_OPTIONS bit in the flags argument.

For example, to allow for colons in variable names, do:

struct wordsplit ws;
ws.ws_namechar = ":";
ws.ws_options = WRDSO_NAMECHAR;
wordsplit(str, &ws, WRDSF_DEFFLAGS|WRDSF_OPTIONS);

Certain characters cannot be allowed to be a name constituent. These are: $, {, }, *, @, -, +, ?, and =. If any of these appears in ws_namechar, the wordsplit (and wordsplit_len) function will return the WRDSE_USAGE error.
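A caller that builds ws_namechar from configuration data may want to check it in advance. The function namechar_valid below is a hypothetical helper written for this illustration; it tests a candidate string against the forbidden characters listed above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Characters that may never be part of a variable name. */
static const char forbidden[] = "${}*@-+?=";

/* Return non-zero if EXTRA contains none of the forbidden
   characters and is thus acceptable as a ws_namechar value. */
int
namechar_valid(const char *extra)
{
    assert(extra != NULL);
    return strpbrk(extra, forbidden) == NULL;
}
```

The string ":." passes the check, while "a-b" does not because of the minus sign.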

LIMITING THE NUMBER OF WORDS

The maximum number of words to be returned can be limited by setting the ws_maxwords member to the desired count, and setting the WRDSO_MAXWORDS option, e.g.:

struct wordsplit ws;
ws.ws_maxwords = 3;
ws.ws_options = WRDSO_MAXWORDS;
wordsplit(str, &ws, WRDSF_DEFFLAGS|WRDSF_OPTIONS);

If the actual number of words in the expanded input is greater than the supplied limit, the trailing part of the input will be returned in the last word. For example, if the input to the above fragment were Now is the time for all good men, then the returned words would be:

"Now"
"is"
"the time for all good men"

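The effect of the limit can be reproduced with a plain whitespace splitter. The function split_limited below is a hypothetical model written for this illustration. It performs no quoting or expansion and is not the algorithm wordsplit uses; it only shows how the last word absorbs the rest of the input.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Split S on blanks into at most MAXWORDS words; the last permitted
   word receives the unsplit remainder of the input.  Stores a
   malloc'ed, NULL-terminated array in *WORDV and returns the number
   of words, or -1 on allocation failure. */
int
split_limited(const char *s, size_t maxwords, char ***wordv)
{
    static const char delim[] = " \t\n";
    char **wv;
    size_t wc = 0;

    assert(s != NULL && wordv != NULL && maxwords > 0);
    wv = calloc(maxwords + 1, sizeof wv[0]);
    if (!wv)
        return -1;
    s += strspn(s, delim);
    while (*s && wc < maxwords) {
        /* The last permitted word takes everything that is left. */
        size_t len = (wc + 1 == maxwords) ? strlen(s) : strcspn(s, delim);

        wv[wc] = malloc(len + 1);
        if (!wv[wc]) {
            while (wc > 0)
                free(wv[--wc]);
            free(wv);
            return -1;
        }
        memcpy(wv[wc], s, len);
        wv[wc][len] = '\0';
        wc++;
        s += len;
        s += strspn(s, delim);
    }
    *wordv = wv;
    return (int) wc;
}
```

Called with the sentence from the example above and a limit of 3, the sketch returns the same three words. The caller frees each element and then the array itself.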
WORDSPLIT_T STRUCTURE

The data type wordsplit_t has three members that contain output data upon return from wordsplit or wordsplit_len, and a number of members that the caller can initialize on input in order to customize the function behavior. For each input member there is a corresponding flag bit, which must be set in the flags argument in order to instruct the wordsplit function to use the member.

OUTPUT
size_t ws_wordc

Number of words in ws_wordv. Accessible upon successful return from wordsplit.

char ** ws_wordv

Array of resulting words. Accessible upon successful return from wordsplit.

The caller should not attempt to free or reallocate ws_wordv or any elements thereof, nor to modify ws_wordc.

To store away the words for use after freeing ws with wordsplit_free, the caller should use wordsplit_getwords. It is more efficient than copying the contents of ws_wordv manually.

size_t ws_wordi

Total number of words processed. This field is intended for use with WRDSF_INCREMENTAL flag. If that flag is not set, the following relation holds: ws_wordi == ws_wordc - ws_offs.

int ws_errno

Error code, if the invocation of wordsplit or wordsplit_len failed. This is the same value as returned from the function in that case.

char *ws_errctx

On error, the context in which the error occurred. For WRDSE_UNDEF, it is the name of the undefined variable. For WRDSE_GLOBERR, it is the pattern that caused the error.

The caller should treat this member as const char *.

The following members are used if the variable expansion was requested and the input string contained an Assign Default Values form (${variable:=word}).

char **ws_envbuf

Modified environment. It follows the same arrangement as ws_env on input (see the WRDSF_ENV_KV flag). If ws_env was NULL (or WRDSF_ENV was not set), but the ws_getvar callback was used, the ws_envbuf array will contain only the modified variables.

size_t ws_envidx

Number of entries in ws_envbuf.

If positional parameters were used (see the WRDSO_PARAMV option) and any of them were modified during processing, the following two members supply the modified parameter array.

char **ws_parambuf

Array of positional parameters.

size_t ws_paramidx

Number of positional parameters.

INPUT
size_t ws_offs

If the WRDSF_DOOFFS flag is set, this member specifies the number of initial elements in ws_wordv to fill with NULLs. These elements are not counted in the returned ws_wordc.

size_t ws_maxwords

Maximum number of words to return. For this field to take effect, the WRDSO_MAXWORDS option and WRDSF_OPTIONS flag must be set. For a detailed discussion, see the chapter LIMITING THE NUMBER OF WORDS.

int ws_flags

Contains flags passed to wordsplit on entry. Can be used as a read-only member when using wordsplit in incremental mode or in a loop with WRDSF_REUSE flag set.

int ws_options

Additional options used when WRDSF_OPTIONS is set.

const char *ws_delim

Word delimiters. If initialized on input, the WRDSF_DELIM flag must be set. Otherwise, it is initialized on entry to wordsplit with the string " \t\n".

const char *ws_comment

A zero-terminated string of characters that begin an inline comment. If initialized on input, the WRDSF_COMMENT flag must be set. By default, its value is "#".

const char *ws_escape[2]

Escape tables for unquoted words (ws_escape[0]) and quoted strings (ws_escape[1]). These are used to translate escape sequences (\C) into characters. Each table is a string consisting of an even number of characters. In each pair of characters, the first one is a character that can appear after backslash, and the following one is its representation. For example, the string "t\tn\n" translates \t into the horizontal tabulation character and \n into newline. The WRDSF_ESCAPE flag must be set if this member is initialized.

const char *ws_namechar

Lists characters that are allowed in a variable name, in addition to alphanumerics and underscore. The WRDSO_NAMECHAR bit must be set in ws_options for this to take effect.

See the chapter VARIABLE NAMES for a detailed discussion.

void (*ws_alloc_die) (wordsplit_t *)

This function is called when wordsplit is unable to allocate memory and the WRDSF_ENOMEMABRT flag was set. The default function prints a message on standard error and aborts. This member can be used to customize error handling. If initialized, the WRDSF_ALLOC_DIE flag must be set.

void (*ws_error) (const char *, ...)

Pointer to function used for error reporting. The invocation convention is the same as for printf(3). The default function formats and prints the message on the standard error.

If this member is initialized, the WRDSF_ERROR flag must be set.

void (*ws_debug) (const char *, ...)

Pointer to function used for debugging output. By default it points to the same function as ws_error. If initialized, the WRDSF_DEBUG flag must be set.

const char **ws_env

A NULL-terminated array of environment variables. It is used during variable expansion. If set, the WRDSF_ENV flag must be set. Variable expansion is enabled only if either WRDSF_ENV or WRDSF_GETVAR (see below) is set, and WRDSF_NOVAR flag is not set.

Each element of ws_env must have the form "NAME=VALUE", where NAME is the name of the variable, and VALUE is its value. Alternatively, if the WRDSF_ENV_KV flag is set, each variable is described by two elements of ws_env: one containing the variable name, and the next one with its value.

int (*ws_getvar) (char **ret, const char *var, size_t len, void *clos)

Points to the function that will be used during variable expansion for environment variable lookups. This function is used if the variable expansion is enabled (i.e. the WRDSF_NOVAR flag is not set), and the WRDSF_GETVAR flag is set.

If both WRDSF_ENV and WRDSF_GETVAR are set, the variable is first looked up in the ws_env array and, if not found there, ws_getvar is called. If the WRDSO_GETVARPREF option is set, this order is reversed.

The name of the variable is specified by the first len bytes of the string var. The clos parameter supplies the user-specific data (see below the description of ws_closure member) and the ret parameter points to the memory location where output data is to be stored. On success, the function must store there a pointer to the string with the value of the variable and return 0. On error, it must return one of the error codes described in the section ERROR CODES. If ws_getvar returns WRDSE_USERERR, it must store the pointer to the error description string in *ret. In any case (whether returning 0 or WRDSE_USERERR), the data returned in ret must be allocated using malloc(3).

void *ws_closure

Additional user-specific data passed as the last argument to ws_getvar or ws_command (see below). If defined, the WRDSF_CLOSURE flag must be set.

int (*ws_command) (char **ret, const char *cmd, size_t len, char **argv, void *clos)

Pointer to the function that performs command substitution. It treats the first len bytes of the string cmd as a command (whatever it means for the caller) and attempts to execute it. On success, a pointer to the string with the command output is stored in the memory location pointed to by ret and 0 is returned. On error, the function must return one of the error codes described in the section ERROR CODES. If ws_command returns WRDSE_USERERR, it must store the pointer to the error description string in *ret. In any case (whether returning 0 or WRDSE_USERERR), the data returned in ret must be allocated using malloc(3).

The parameter argv contains the command split into words using the same settings as the input ws structure, with command substitution disabled.

The clos parameter supplies user-specific data (see the description of ws_closure member).

The following two members are consulted if the WRDSO_PARAMV option is set. They provide an array of positional parameters.

char const **ws_paramv

Positional parameters. These are accessible in the input string using the notation $N or ${N}, where N is the 0-based parameter number.

size_t ws_paramc

Number of positional parameters.

FLAGS

The following macros are defined for use in the flags argument.
WRDSF_DEFFLAGS

Default flags. This is a shortcut for:

(WRDSF_NOVAR | WRDSF_NOCMD | WRDSF_QUOTE | WRDSF_SQUEEZE_DELIMS | WRDSF_CESCAPES),

i.e.: disable variable expansion and command substitution, perform quote removal, treat any number of consecutive delimiters as a single delimiter, replace C escapes appearing in the input string with the corresponding characters.

WRDSF_APPEND

Append the resulting words to the array left from a previous call to wordsplit.

WRDSF_DOOFFS

Insert ws_offs initial NULLs in the array ws_wordv. These are not counted in the returned ws_wordc.

WRDSF_NOCMD

Don’t do command substitution. The WRDSO_NOCMDSPLIT option set together with this flag prevents splitting command invocations into separate words (see the OPTIONS section).

WRDSF_REUSE

The parameter ws resulted from a previous call to wordsplit, and wordsplit_free was not called. Reuse the allocated storage.

WRDSF_SHOWERR

Print errors using ws_error.

WRDSF_UNDEF

Consider it an error if an undefined variable is expanded.

WRDSF_NOVAR

Don’t do variable expansion. The WRDSO_NOVARSPLIT option set together with this flag prevents variable references from being split into separate words (see the OPTIONS section).

WRDSF_ENOMEMABRT

Abort on ENOMEM error. By default, out of memory errors are treated as any other errors: the error is reported using ws_error if the WRDSF_SHOWERR flag is set, and error code is returned. If this flag is set, the ws_alloc_die function is called instead. This function is not supposed to return.

WRDSF_WS

Trim off any leading and trailing whitespace from the returned words. This flag is useful if the ws_delim member does not contain whitespace characters.

WRDSF_SQUOTE

Handle single quotes.

WRDSF_DQUOTE

Handle double quotes.

WRDSF_QUOTE

A shortcut for (WRDSF_SQUOTE|WRDSF_DQUOTE).

WRDSF_SQUEEZE_DELIMS

Replace each input sequence of repeated delimiters with a single delimiter.

WRDSF_RETURN_DELIMS

Return delimiters.

WRDSF_SED_EXPR

Treat sed(1) expressions as words.

WRDSF_DELIM

ws_delim member is initialized.

WRDSF_COMMENT

ws_comment member is initialized.

WRDSF_ALLOC_DIE

ws_alloc_die member is initialized.

WRDSF_ERROR

ws_error member is initialized.

WRDSF_DEBUG

ws_debug member is initialized.

WRDSF_ENV

ws_env member is initialized.

WRDSF_GETVAR

ws_getvar member is initialized.

WRDSF_SHOWDBG

Enable debugging.

WRDSF_NOSPLIT

Don’t split input into words. This flag is useful for side effects, e.g. to perform variable expansion within a string.

WRDSF_KEEPUNDEF

Keep undefined variables in place, instead of expanding them to empty strings.

WRDSF_WARNUNDEF

Warn about undefined variables.

WRDSF_CESCAPES

Handle C-style escapes in the input string.

WRDSF_CLOSURE

ws_closure is set.

WRDSF_ENV_KV

Each two consecutive elements in the ws_env array describe a single variable: ws_env[n] contains variable name, and ws_env[n+1] contains its value.

WRDSF_ESCAPE

ws_escape is set.

WRDSF_INCREMENTAL

Incremental mode. Each subsequent call to wordsplit with NULL as its first argument parses the next word from the input. See the section INCREMENTAL MODE for a detailed discussion.

WRDSF_PATHEXPAND

Perform pathname and tilde expansion. See the subsection Tilde and pathname expansion for details.

WRDSF_OPTIONS

The ws_options member is initialized.

OPTIONS

The ws_options member is consulted if the WRDSF_OPTIONS flag is set. It contains a bitwise OR of one or more of the following options:
WRDSO_NULLGLOB

Remove the words that produce empty string after pathname expansion.

WRDSO_FAILGLOB

Output error message if pathname expansion produces empty string.

WRDSO_DOTGLOB

During pathname expansion allow a leading period to be matched by metacharacters.

WRDSO_BSKEEP_WORD

Backslash interpretation: when an unrecognized escape sequence is encountered in a word, preserve it on output. If that bit is not set, the backslash is removed from such sequences.

WRDSO_OESC_WORD

Backslash interpretation: handle octal escapes in words.

WRDSO_XESC_WORD

Backslash interpretation: handle hex escapes in words.

WRDSO_BSKEEP_QUOTE

Backslash interpretation: when an unrecognized escape sequence is encountered in a doubly-quoted string, preserve it on output. If that bit is not set, the backslash is removed from such sequences.

WRDSO_OESC_QUOTE

Backslash interpretation: handle octal escapes in doubly-quoted strings.

WRDSO_XESC_QUOTE

Backslash interpretation: handle hex escapes in doubly-quoted strings.

WRDSO_MAXWORDS

The ws_maxwords member is initialized. This is used to control the number of words returned by a call to wordsplit. For a detailed discussion, refer to the chapter LIMITING THE NUMBER OF WORDS.

WRDSO_NOVARSPLIT

When WRDSF_NOVAR is set, don’t split variable references, even if they contain whitespace. E.g. ${VAR:-foo bar} will be treated as a single word.

WRDSO_NOCMDSPLIT

When WRDSF_NOCMD is set, don’t split whatever looks like command invocation, even if it contains whitespace. E.g. $(command arg) will be treated as a single word.

WRDSO_PARAMV

Positional arguments are supplied in ws_paramv and ws_paramc. See the subsection Positional argument expansion for a discussion.

WRDSO_PARAM_NEGIDX

Used together with WRDSO_PARAMV, this allows for negative positional argument references. A negative argument reference has the form ${-N}. It is expanded to the value of the argument with index ws_paramc - N, i.e. Nth if counting from the end.

WRDSO_NAMECHAR

When set, indicates that the ws_namechar member of the wordsplit_t struct has been initialized.

This member allows you to modify the notion of what characters can be part of a valid variable name. See the chapter VARIABLE NAMES for a detailed discussion.

ERROR CODES

WRDSE_OK, WRDSE_EOF

Successful return.

WRDSE_QUOTE

Missing closing quote. The ws_endp points to the position in the input string where the error occurred.

WRDSE_NOSPACE

Memory exhausted.

WRDSE_USAGE

Invalid wordsplit usage.

WRDSE_CBRACE

Unbalanced curly brace.

WRDSE_UNDEF

Undefined variable. This error is returned only if the WRDSF_UNDEF flag is set.

WRDSE_NOINPUT

Input exhausted. This is not actually an error. This code is returned if wordsplit (or wordsplit_len) is invoked in incremental mode and encounters end of input string. See the section INCREMENTAL MODE.

WRDSE_PAREN

Unbalanced parenthesis.

WRDSE_GLOBERR

An error occurred during pattern matching.

WRDSE_USERERR

User-defined error. Normally this error is returned by ws_getvar or ws_command. Use the function wordsplit_strerror to get a textual description of the error.

RETURN VALUE

Both wordsplit and wordsplit_len return 0 on success, and a non-zero error code on error (see the section ERROR CODES).

wordsplit_strerror returns a pointer to the constant string describing the last error condition that occurred in ws.

EXAMPLE

The short program below implements a function that parses the input string similarly to the shell. All expansions are performed. Default error reporting is used.

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
#include <wordsplit.h>

/* Run command from str (len bytes long) and store its
   output in ret.  argv and closure are not used.
   Return wordsplit error code.
*/
static int runcmd(char **ret, const char *str, size_t len,
                  char **argv, void *closure)
{
    FILE *fp;
    char *cmd;
    int c, lastc = 0;
    char *buffer = NULL;
    size_t bufsize = 0;
    size_t buflen = 0;

    /* Convert to a null-terminated string for popen(3) */
    cmd = malloc(len + 1);
    if (!cmd)
        return WRDSE_NOSPACE;
    memcpy(cmd, str, len);
    cmd[len] = 0;

    fp = popen(cmd, "r");
    if (!fp) {
        char buf[128];

        snprintf(buf, sizeof buf, "can't run %s: %s",
                 cmd, strerror(errno));
        free(cmd);
        *ret = strdup(buf);
        if (!*ret)
            return WRDSE_NOSPACE;
        else
            return WRDSE_USERERR;
    }

    /* Collect the output, reallocating buffer as needed. */
    while ((c = fgetc(fp)) != EOF) {
        lastc = c;
        if (c == '\n')
            c = ' ';
        /* Keep one byte in reserve for the terminating null. */
        if (buflen + 1 >= bufsize) {
            char *p;

            if (bufsize == 0)
                bufsize = 80;
            else
                bufsize *= 2;
            p = realloc(buffer, bufsize);
            if (!p) {
                free(buffer);
                free(cmd);
                pclose(fp);
                return WRDSE_NOSPACE;
            }
            buffer = p;
        }
        buffer[buflen++] = c;
    }

    /* Trim off the trailing newline */
    if (buffer) {
        if (lastc == '\n')
            --buflen;
        buffer[buflen] = 0;
    }

    pclose(fp);
    free(cmd);

    /* Return the composed string. */
    *ret = buffer;
    return WRDSE_OK;
}

extern char **environ;

/* Parse s much as shell does. Return array of words on
   success, and NULL on error.
*/
char **shell_parse(char *s)
{
    wordsplit_t ws;
    size_t wc;
    char **wv;
    int rc;

    /* Initialize ws */
    ws.ws_env = (const char **) environ;
    ws.ws_command = runcmd;
    /* Call wordsplit. Let it report errors, if any.
       WRDSF_ENV tells it that ws_env is initialized. */
    rc = wordsplit(s, &ws,
                   WRDSF_QUOTE | WRDSF_SQUEEZE_DELIMS | WRDSF_PATHEXPAND
                   | WRDSF_ENV | WRDSF_SHOWERR);
    if (rc == WRDSE_OK)
        /* Store away the resulting words on success. */
        wordsplit_getwords(&ws, &wc, &wv);
    else
        wv = NULL;
    wordsplit_free(&ws);
    return wv;
}

AUTHORS

Sergey Poznyakoff

BUGS

Backtick command expansion is not supported.

BUG REPORTS

Report bugs to <gray@gnu.org>.

COPYRIGHT

Copyright © 2009-2019 Sergey Poznyakoff
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law.

