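Parsing an array using IFS with a non-whitespace value creates empty elements.
Even using tr -s to shrink multiple delimiters to a single delimiter isn't enough.
An example may explain the issue more clearly.
Is there a way to achieve "normal" results via a tweak to IFS? That is, is there an associated setting that changes the behaviour of IFS so that it acts the same as the default whitespace IFS?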
var=" abc def ghi "
echo "============== IFS=<default>"
arr=($var)
for x in ${!arr[*]} ; do
echo "# arr[$x] \"${arr[x]}\""
done
#
sfi="$IFS" ; IFS=':'
set -f # Disable file name generation (globbing)
# (This data won't "glob", but unless globbing
# is actually needed, turn it off, because
# unusual/unexpected combinations of data can glob,
# and they can do it in the most obscure ways...
# With IFS, "you're not in Kansas any more!" :)
var=":abc::def:::ghi::::"
echo "============== IFS=$IFS"
arr=($var)
for x in ${!arr[*]} ; do
echo "# arr[$x] \"${arr[x]}\""
done
echo "============== IFS=$IFS and tr"
arr=($(echo -n "$var"|tr -s "$IFS"))
for x in ${!arr[*]} ; do
echo "# arr[$x] \"${arr[x]}\""
done
set +f # enable globbing
IFS="$sfi" # re-instate original IFS val
echo "============== IFS=<default>"
Here is the output
============== IFS=<default>
# arr[0] "abc"
# arr[1] "def"
# arr[2] "ghi"
============== IFS=:
# arr[0] ""
# arr[1] "abc"
# arr[2] ""
# arr[3] "def"
# arr[4] ""
# arr[5] ""
# arr[6] "ghi"
# arr[7] ""
# arr[8] ""
# arr[9] ""
============== IFS=: and tr
# arr[0] ""
# arr[1] "abc"
# arr[2] "def"
# arr[3] "ghi"
============== IFS=<default>
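Answer 1
From the bash manpage:
Any character in IFS that is not IFS whitespace, along with any adjacent IFS whitespace characters, delimits a field. A sequence of IFS whitespace characters is also treated as a delimiter.
This means that IFS whitespace (space, tab and newline) is not treated like the other separators. If you want exactly the same behaviour with an alternative separator, you can do some separator swapping with the help of tr or sed: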
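var=":abc::def:::ghi::::"
# Protect real spaces with a sentinel, then turn every ':' into a space
arr=($(echo -n "$var" | sed 's/ /%#%#%#%#%/g;s/:/ /g'))
for x in ${!arr[*]} ; do
# Restore the protected spaces in the current element (not just arr[0])
el=$(echo -n "${arr[x]}" | sed 's/%#%#%#%#%/ /g')
echo "# arr[$x] \"$el\""
done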
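The %#%#%#%#% string is a magic value that stands in for any spaces inside the fields; it is expected to be "unique" (or at least very unlikely to occur in the data). If you are sure the fields will never contain spaces, just drop this part.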
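The man page rule quoted above can be checked with a short sketch. The helper name show_fields is made up for this illustration and is not part of the answer; it splits a string with a given IFS and prints each resulting field in brackets, so that empty fields become visible.

```shell
#!/bin/bash
# show_fields is a hypothetical helper, not part of the original answer.
# It splits string $2 using IFS=$1 and prints every field in brackets.
show_fields() {
    local IFS=$1
    local -a parts
    local out='' p
    set -f                 # no globbing while the unquoted expansion is split
    parts=($2)
    set +f                 # note: this re-enables globbing unconditionally
    for p in "${parts[@]}"; do
        out+="[$p]"
    done
    printf '%s\n' "$out"
}

show_fields ' '  '  abc   def  '         # whitespace only: runs collapse
show_fields ':'  '::abc:::def'           # non-whitespace: every ':' delimits
show_fields ': ' 'abc : def  :  : ghi'   # mixed: whitespace next to ':' is absorbed
```

With a whitespace-only IFS the runs collapse and no empty fields appear. With ':' each extra colon produces an empty field. With ': ' the whitespace around a colon is absorbed into that one delimiter, but a second colon still yields an empty field.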
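Answer 2
To remove multiple consecutive (non-whitespace) delimiter characters, two (string/array) parameter expansions can be used. The trick is to set the IFS variable to the empty string for the array parameter expansion.
This is documented in man bash under Word Splitting:
Unquoted implicit null arguments, resulting from the expansion of parameters that have no value, are removed.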
(
set -f
str=':abc::def:::ghi::::'
IFS=':'
arr=(${str})
IFS=""
arr=(${arr[@]})
echo ${!arr[*]}
for ((i=0; i < ${#arr[@]}; i++)); do
echo "${i}: '${arr[${i}]}'"
done
)
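The same two-pass expansion can be wrapped in a function so that the IFS change stays local and no subshell is needed. This is only a sketch: the function name split_nonempty and the global result array fields are assumptions made for this illustration, not something the answer defines.

```shell
#!/bin/bash
# split_nonempty is a hypothetical wrapper around the two-pass trick above.
# $1: delimiter characters   $2: string to split
# The result is stored in the global array "fields" (an assumed name).
split_nonempty() {
    local IFS=$1               # pass 1: split on the given delimiter(s)
    local -a tmp
    local restore_glob=0
    case $- in
        *f*) ;;                          # globbing is already disabled
        *)   restore_glob=1; set -f ;;   # disable it for the unquoted expansions
    esac
    tmp=($2)                   # may contain empty elements
    IFS=''                     # pass 2: empty IFS, so nothing is split again
    fields=(${tmp[@]})         # unquoted: null elements are dropped
    if (( restore_glob )); then set +f; fi
    return 0
}

split_nonempty ':' ':abc::def:::ghi::::'
for i in "${!fields[@]}"; do
    echo "${i}: '${fields[i]}'"
done
```

Because IFS is empty during the second pass, fields that contain spaces are kept in one piece; only the empty elements disappear.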
Answer 3
You can do it with gawk too, but it's not pretty:
var=":abc::def:::ghi::::"
out=$( gawk -F ':+' '
{
# strip delimiters from the ends of the line
sub("^"FS,"")
sub(FS"$","")
# then output in a bash-friendly format
for (i=1;i<=NF;i++) printf("\"%s\" ", $i)
print ""
}
' <<< "$var" )
eval arr=($out)
for x in ${!arr[*]} ; do
echo "# arr[$x] \"${arr[x]}\""
done
Output
# arr[0] "abc"
# arr[1] "def"
# arr[2] "ghi"
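Answer 4
Since bash IFS does not provide a built-in way to treat consecutive delimiter chars as a single delimiter (for non-whitespace delimiters), I have put together an all-bash version (as opposed to using an external call, e.g. tr, awk, sed).
It can handle a multi-char IFS.
Here are its execution-time results, along with similar tests for the tr and awk options shown on this Q/A page. The tests are based on 10000 iterations of just building the array (with no I/O).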
pure bash 3.174s (28 char IFS)
call (awk) 0m32.210s (1 char IFS)
call (tr) 0m32.178s (1 char IFS)
Here is the output
# dlm_str = :.~!@#$%^&()_+-=`}{][ ";></,
# original = :abc:.. def:.~!@#$%^&()_+-=`}{][ ";></,'single*quote?'..123:
# unified = :abc::::def::::::::::::::::::::::::::::'single*quote?'::123:
# max-w 2^ = ::::::::::::::::
# shrunk.. = :abc:def:'single*quote?':123:
# arr[0] "abc"
# arr[1] "def"
# arr[2] "'single*quote?'"
# arr[3] "123"
Here is the script
#!/bin/bash
# Note: This script modifies the source string.
# so work with a copy, if you need the original.
# also: Use the name varG (Global) it's required by 'shrink_repeat_chars'
#
# NOTE: * asterisk in IFS causes a regex(?) issue, but * is ok in data.
# NOTE: ? Question-mark in IFS causes a regex(?) issue, but ? is ok in data.
# NOTE: 0..9 digits in IFS causes empty/wacky elements, but they're ok in data.
# NOTE: ' single quote in IFS; don't know yet, but ' is ok in data.
#
function shrink_repeat_chars () # A 'tr -s' analog
{
# Shrink repeating occurrences of char
#
# $1: A string of delimiter chars which, when consecutively repeated, are
# considered a shrinkable group. An example is: " " (whitespace delimiter).
#
# $varG A global var which contains the string to be "shrunk".
#
# echo "# dlm_str = $1"
# echo "# original = $varG"
dlms="$1" # arg delimiter string
dlm1=${dlms:0:1} # 1st delimiter char
dlmw=$dlm1 # work delimiter
# More than one delimiter char
# ============================
# When the delimiter string contains more than one char (i.e. different byte values),
# make all delimiter chars in string $varG the same as the 1st delimiter char.
ix=1;xx=${#dlms};
while ((ix<xx)) ; do # Where more than one delim char, make all the same in varG
varG="${varG//${dlms:$ix:1}/$dlm1}"
ix=$((ix+1))
done
# echo "# unified = $varG"
#
# Binary shrink
# =============
# Find the longest "power of 2" group needed for a binary shrink
while [[ "$varG" =~ .*$dlmw$dlmw.* ]] ; do dlmw=$dlmw$dlmw; done # double its length
# echo "# max-w 2^ = $dlmw"
#
# Shrink groups of delims to a single char
while [[ ! "$dlmw" == "$dlm1" ]] ; do
varG=${varG//${dlmw}$dlm1/$dlm1}
dlmw=${dlmw:$((${#dlmw}/2))}
done
varG=${varG//${dlmw}$dlm1/$dlm1}
# echo "# shrunk.. = $varG"
}
# Main
varG=':abc:.. def:.~!@#$%^&()_+-=`}{][ ";></,'\''single*quote?'\''..123:'
sfi="$IFS"; IFS=':.~!@#$%^&()_+-=`}{][ ";></,' # save original IFS and set new multi-char IFS
set -f # disable globbing
shrink_repeat_chars "$IFS" # The source string name must be $varG
arr=(${varG:1}) # Strip leading dlim; a single trailing dlim is ok (strangely)
for ix in ${!arr[*]} ; do # Dump the array
echo "# arr[$ix] \"${arr[ix]}\""
done
set +f # re-enable globbing
IFS="$sfi" # re-instate the original IFS
#
exit
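For comparison with the shrinking step in the script above, here is a shorter sketch of a pure-bash tr -s analog. It uses bash extended globbing, which is a different technique from the binary shrink, and it handles a single delimiter character only. The names squeeze_split and fields are assumptions made for this illustration, and it has not been timed against the figures quoted earlier.

```shell
#!/bin/bash
shopt -s extglob               # enables the +(...) "one or more" pattern

# squeeze_split is a hypothetical name; it supports ONE delimiter char only.
# $1: delimiter char   $2: string to split   result: global array "fields"
squeeze_split() {
    local d=$1 s=$2
    s=${s//+("$d")/$d}         # squeeze each run of the delimiter to one char
    s=${s#"$d"}                # drop a single leading delimiter, if present
    s=${s%"$d"}                # drop a single trailing delimiter, if present
    local IFS=$d
    set -f                     # no globbing during the unquoted split
    fields=($s)
    set +f                     # note: this re-enables globbing unconditionally
}

squeeze_split ':' ':abc::def:::ghi::::'
for ix in "${!fields[@]}"; do
    echo "# fields[$ix] \"${fields[ix]}\""
done
```

Quoting "$d" inside the pattern keeps the delimiter literal. A multi-char delimiter set, as handled by the script above, would still need the unifying step first.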