DISCLAIMER. English is used here only for compatibility (ASCII only), so any suggestions about my grammar (and anything else) are greatly appreciated.

Thursday, December 16, 2010

[bash] Write filenames to array.

Here are two general tasks:
    1. Assign strings (e.g. filenames) separated by '\0' in some input
       stream to corresponding array elements.
    2. Convert an array into a stream of strings separated by '\0'.
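
A minimal sketch of the second task, assuming bash's builtin printf. The
array `files` and the function `array_to_stream` are made-up names used only
for illustration.

```bash
#!/bin/bash
# Hypothetical sample data: names with spaces, quotes and a newline.
files=( 'a"'\''      b.txt' 'plain.txt' $'two\nlines.txt' )

# Print every argument followed by a NUL byte.
# printf reuses its format for each argument, so one call is enough.
array_to_stream() {
    if (( $# > 0 )); then
        printf '%s\0' "$@"
    fi
}

# Count the NUL bytes to confirm there is one terminator per element.
nul_count=$(array_to_stream "${files[@]}" | tr -cd '\0' | wc -c)
echo "elements: ${#files[@]}, NUL terminators: $((nul_count))"
```

The guard is there because printf with no arguments would still print one
NUL, which a reader of the stream would see as a single empty string.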

That is, we have a bash script that obtains such a stream somewhere, e.g. from `find`

    find "$root" -wholename "*/$project" -prune -print0

and then we want to place these filenames into array elements. But there is a
problem: we can't put '\0' in IFS, so we can't split find's output stream
using bash word splitting. And we can't use any other character to separate
the filenames, because a file path may contain any character other than '\0'.

One possible method (I don't know of another, though there may be one) is to
transform the stream into bash code and then `eval` it.
   
But here is another problem: we can't simply wrap a string in double or
single quotes, because the string may itself contain unescaped double or
single quotes (find does not escape characters in filenames, so we must
assume that the string is written "as is"). For example, take

     a"'      b.txt

Then after wrapping it in double quotes

    "a"'      b.txt"

or in single quotes

    'a"'      b.txt'

In both cases part of the string remains unquoted and an unmatched quote
appears. (Another example, with a command, is given at the end of this post.)
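
The failure described above can be reproduced directly. The sketch below
wraps the example name in double quotes and asks a child bash to parse the
result. The second half assumes bash's builtin `printf '%q'`, which is one
way to have bash do the quoting itself. The variable names are made up for
illustration.

```bash
#!/bin/bash
name='a"'\''      b.txt'

# Naive approach: wrap the raw string in double quotes and add an assignment.
naive="var=\"$name\""
if bash -c "$naive" 2>/dev/null; then
    naive_status="parsed"
else
    naive_status="syntax error"
fi
echo "naive quoting: $naive_status"

# Letting bash quote the string: printf %q emits a form that is safe to eval.
quoted=$(printf '%q' "$name")
eval "var=$quoted"
[ "$var" = "$name" ] && echo "printf %q round trip: ok"
```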


So we can't quote the string with tools like sed or awk, which perform text
editing without parsing the string. But we can use bash itself to parse and
quote the string properly, and then output the quoted result, like this

    eval "$(find "$root" -wholename "*/$project" -prune -print0 \
            | sort -z -s \
            | xargs -0 -x bash -c '
                            arr=( "$@" );
                            declare -p arr
                        ' escape_filename)"

(The last argument, 'escape_filename', is used as $0. It may be any string,
but it is required for this to work correctly: without it the first filename
would be taken as $0. For details see chapter 7.4.2 of 'info find'.)
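
The role of that placeholder can be checked with a small sketch. The strings
'first item', 'second item' and 'name' are arbitrary sample values.

```bash
#!/bin/bash
# With a placeholder: "name" becomes $0, and both items reach "$@".
with_name=$(printf '%s\0' 'first item' 'second item' \
            | xargs -0 bash -c 'echo "$#"' name)

# Without a placeholder: 'first item' is consumed as $0 and is lost from "$@".
without_name=$(printf '%s\0' 'first item' 'second item' \
               | xargs -0 bash -c 'echo "$#"')

echo "with placeholder: $with_name argument(s)"
echo "without placeholder: $without_name argument(s)"
```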

The script for the bash instance invoked by xargs may perform other
operations on the strings, like

    j=15;
    eval "$(find "$root" -wholename "*/$project" -prune -print0 \
            | sort -z -s \
            | xargs -0 -x bash -c "
                            set -- \"\${@#$root/}\"
                            set -- \"\${@%$project}\"
                            arr2=( [$j]=\"\$1\" \"\${@:2}\" );
                            declare -p arr2
                        " escape_filename)"

Here we delete the leading ($root) and trailing ($project) portions of each
filename and then assign the resulting set of strings to an array starting at
index $j. The variables $root, $project and $j are substituted by the main
bash process before the pipeline is executed.
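
The string operations of the inner script can be tried in isolation, without
find or xargs. This is a sketch with made-up values for root, project and
the arguments. Unlike the script above, it quotes "$root" and "$project"
inside the expansions so that they are matched literally rather than as
patterns.

```bash
#!/bin/bash
# Hypothetical values standing in for the variables of the main script.
root=/srv/repos
project=.git
j=15

# Stand-in for the file names that xargs would pass as arguments.
set -- "$root/alpha/$project" "$root/beta gamma/$project"

set -- "${@#"$root"/}"        # drop the leading "$root/" from every argument
set -- "${@%"$project"}"      # drop the trailing "$project" from every argument
arr2=( [$j]="$1" "${@:2}" )   # first element at index j, the rest follow

declare -p arr2
```

Note that only $project itself is removed, so each element keeps its
trailing slash.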

If this script uses the array name that you want to assign to in the main
script, no further editing of the output is needed.
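
For comparison, here is a sketch of an approach that avoids eval altogether.
It assumes a bash whose `read` builtin accepts `-d ''` (read up to a NUL
byte) and that supports process substitution. The function `produce_stream`
is a made-up stand-in for the output of `find ... -print0`.

```bash
#!/bin/bash
# Hypothetical stand-in for the output of find -print0.
produce_stream() {
    printf '%s\0' 'a"'\''      b.txt' $'two\nlines.txt' ' leading space'
}

names=()
# IFS= keeps leading/trailing whitespace, -r keeps backslashes literal,
# -d '' makes read stop at each NUL byte instead of at a newline.
while IFS= read -r -d '' name; do
    names+=( "$name" )
done < <(produce_stream)

declare -p names
```

The loop reads from a process substitution rather than from a pipe, so it
runs in the main shell and the array is still set after the loop ends.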


Here is another example of what incorrect quoting may lead to. Suppose the
input stream contains a string like this

     a' rm -rf ~ '

and we wrap it in single quotes and add an assignment (with sed, for example)

    var='a' rm -rf ~ ''

When this is eval-ed, it executes the command

    rm -rf ~

Here is a sample script (it uses a harmless `ls` instead of `rm`)

    #!/bin/bash

    str="a' ls ~ '"
    to_eval="$(echo "$str" | sed -e"s/^/var='/;s/$/'/")"
    eval "$to_eval"
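
A sketch that compares the two ways of quoting side by side. It assumes
that `mktemp -d` is available and uses `touch` on a marker file as a
harmless stand-in for a destructive command. The names `dir`, `marker`,
`unsafe_result` and `safe_result` are made up for illustration.

```bash
#!/bin/bash
dir=$(mktemp -d)                 # hypothetical sandbox directory
marker=$dir/executed
str="a' touch $marker '"         # harmless stand-in for a destructive command

# Unsafe: quotes added by sed; the embedded quotes end the string early.
eval "$(echo "$str" | sed -e "s/^/var='/;s/$/'/")" 2>/dev/null
[ -e "$marker" ] && unsafe_result="command was executed"
rm -f "$marker"

# Safer: bash quotes the value itself via declare -p.
eval "$(var=$str; declare -p var)"
[ -e "$marker" ] || safe_result="nothing was executed"

echo "sed quoting: $unsafe_result"
echo "declare -p:  $safe_result"
rm -rf "$dir"
```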
