BASH’s ‘read’ built-in supports '\0' as delimiter

I thought it was impossible to use '\0' as a delimiter in bash, but noticed yesterday that Gentoo’s ebuild.sh had pipelines like this:
find ..... -print0 |
while read -r -d $'\0' x; do
# Do something with file $x
done

This makes it possible to handle strange filenames correctly, even ones that contain newline ('\n') or carriage return ('\r') characters. (Some other commands, including sort and xargs, have options to use the null character as the delimiter for the same reason.)
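A self-contained sketch of this pattern, assuming bash and a find that supports -print0. The scratch directory, the sample filenames, and the count and names variables are made up for illustration. Two things differ from the ebuild.sh snippet: IFS= is set for read so that leading and trailing whitespace in a name is not trimmed, and find is fed in through process substitution rather than a pipe so that the variables set inside the loop are still visible after it.

```bash
#!/usr/bin/env bash
# Hypothetical scratch directory with deliberately awkward filenames.
tmpdir=$(mktemp -d)
touch "$tmpdir/plain.txt"
touch "$tmpdir/"$'has\nnewline.txt'
touch "$tmpdir/ leading and trailing space "

count=0
names=()
# IFS= stops read from trimming whitespace; -r keeps backslashes literal;
# -d '' makes read stop at each NUL byte emitted by -print0.
while IFS= read -r -d '' x; do
    names+=("$x")
    count=$((count + 1))
    # %q prints the name in a shell-quoted form so odd characters are visible.
    printf 'found: %q\n' "${x##*/}"
done < <(find "$tmpdir" -type f -print0)

echo "count=$count"
rm -rf "$tmpdir"
```

Each name arrives in $x as a single value, whatever characters it contains.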

Because BASH internally uses C-style strings, in which '\0' is the terminator, read -d $'\0' is essentially equivalent to read -d ''. This is why I believed read did not accept null-delimited strings. However, it turns out that BASH actually handles this correctly.

I checked BASH’s source code and found the delimiter was simply determined by delim = *list_optarg; (bash-3.2/builtins/read.def, line 296) where list_optarg points to the argument following -d. Therefore, it makes no difference to the value of delim whether $'\0' or '' is used.
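The equivalence can also be checked from the shell without reading the source. The following is a minimal sketch; the variable names and the sample input are invented for the demonstration.

```bash
#!/usr/bin/env bash
# $'\0' ends up as an empty string, because a shell word cannot hold a NUL.
nul=$'\0'
printf 'length of the expanded argument: %d\n' "${#nul}"

# Read the same NUL-separated input with both spellings of -d.
with_escape=()
while read -r -d $'\0' item; do
    with_escape+=("$item")
done < <(printf 'one\0two\0three\0')

with_empty=()
while read -r -d '' item; do
    with_empty+=("$item")
done < <(printf 'one\0two\0three\0')

printf 'with the escape:  %s\n' "${with_escape[*]}"
printf 'with empty quotes: %s\n' "${with_empty[*]}"
```

Both loops should collect the same three items, since read receives an empty string as the -d argument in either case.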

13 comments:

  1. This is fantastic, thank you. I've encountered this problem in years past, and have been struggling with it the past couple days. If only bash's "for" loop could be used this way, it would be more elegant.

  2. Don't forget to quote the $x as in "$x" :)

  3. Hmm, I discovered this same thing today myself, then I realized: it is still not safe! It appears to handle the null just fine, but it cannot handle trailing newlines; they are still gobbled up. Someone should fix 'read' so that it does not gobble them when the delimiter is null.

    The 'readarray' command strangely does support trailing newlines (it actually keeps them in the array), but since it does not support defining an alternate delimiter (such as null) it is of no use!!!

    But ultimately, it would be better if IFS supported null; then you could actually assign the values to an array without losing your context (a problem which makes 'readarray' and 'read' way less useful even if they worked properly).

    Replies
    1. If you don't need to split the line and just want its whole content (i.e. when reading filenames), you can use this:

      IFS=;
      find -print0 | while read -r -d $'\0' X

    2. find -print0 | while IFS= read -r -d $'\0' X

      avoids affecting $IFS outside the loop

  4. Very useful for reading /proc/$$/cmdline

  5. Nice find, but sadly read still trims values, so filenames cannot have spaces at the end:

    echo -e ' test \0 string ' |
    { read -rd '' s; read -rd '' x; echo "-$s- -$x-" | od -tx1z; }

    0000000 2d 74 65 73 74 2d 20 2d 73 74 72 69 6e 67 2d 0a >-test- -string-.<

    But sometimes there is a workaround: append a sentinel character to each name and strip it after reading.

    find . -printf "%p.\\0" |
    while read -rd '' name;
    do name="${name%.}";
    echo "-$name-";
    done

  6. Hi, you're mistaken: $'\0' won't ever be a valid command line argument, as the NUL character is not a valid character for command lines and variables in bash. What happens here is that \0 is silently removed, and what you are doing is:

    read -r -d '' X

    which happens to be understood by read as separating on the NUL character. You can try your example with $'\0' replaced by ''; it'll work the same.

    Remember: variables and command line arguments can't hold NUL characters; they are silently skipped (it's the only character they can't hold). Of course, pipes support all binary data, so you can write or read NUL characters through them.

    Replies
    1. Oops, just saw your last paragraph about this! ;) A good lesson that I should read the entire post carefully before answering ;)

    2. However, being explicit about it, using -d $'\0' rather than '', makes it obvious what you're expecting the delimiter to be. Readability!

  7. if [[ -z $(read -r -p "Hi there null:" imNULL) ]];then
    echo "true"
    else
    echo "you are not nothing you are nothing not even null"
    fi

  8. Thanks for the tip! I was playing around with this some more and I have a possible slight improvement. In your example, the "while" command runs in a subshell, so if the commands in the loop set variables, the rest of the script won't see them. You can make the left-hand side run in a subshell instead, like this:

    while IFS="" read -r -d $'\0' x ; do
    echo ">>$x<<"
    last_found="$x"
    done < <( find -name \*.txt -print0 )
    echo "last_found = $last_found"

    IFS="" stops read from splitting the input into words and from trimming leading and trailing whitespace.

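Several comments above wish for a way to get null-delimited names into an array or an ordinary for loop. In bash 4.4 and later, mapfile (also spelled readarray) accepts a -d option, so something along the lines of this sketch should work; older versions of bash will reject the option. The scratch directory, the filenames, and the files and nfiles variables are invented for the example.

```bash
#!/usr/bin/env bash
# Assumes bash 4.4 or later, where mapfile gained the -d option.
tmpdir=$(mktemp -d)
touch "$tmpdir/a.txt" "$tmpdir/b c.txt" "$tmpdir/"$'d\ne.txt'

# -d '' splits the input on NUL bytes; -t drops the delimiter from each element.
mapfile -d '' -t files < <(find "$tmpdir" -type f -print0)

# The names are now in an array, so a plain for loop can walk them.
for f in "${files[@]}"; do
    printf 'file: %q\n' "${f##*/}"
done

nfiles=${#files[@]}
echo "nfiles=$nfiles"
rm -rf "$tmpdir"
```

Because mapfile runs in the current shell here, the array remains available after the command, which also sidesteps the subshell issue raised in comment 8.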