The Cabin in the City

Redis Transaction

2021-11-17 Redis

基础 Redis 事务

有时，需要在同一时间操作多个数据结构，这就需要进行多次 Redis 命令调用。虽然存在一些命令能够在 key 之间移动元素，但是并没有一个命令能够在不同类型的 key 之间移动元素（例外就是通过 ZUNIONSTORE 敏玲复制一个 SET 到 ZSET）。对于涉及多个 key 的操作（无论是相同类型还是不同类型），Redis 提供了 5 个命令对多个 key 进行操作，且不需要中断：WATCH, MULTI,EXEC, UNWATCH, DISCARD.

Redis 最简单的事务就是使用 MULTI 和 EXEC 命令, 基础事务就意味着这两种命令提供了一种方式，使得一个客户端执行多个命令 A, B, C 时，不会被其它客户端打断，意思就是如果没有这种事务保证，A, B, C 命令一条一条执行时，执行完 A 命令，有可能其它客户端执行了 D 命令，然后这个客户端才去执行 B, A, B, C 的执行被打断了。这与关系数据库的事务时不同的，关系数据库的事务可以部分执行，然后回滚或是提交。在 Redis 中，作为 MULTI/EXEC 事务中一部分的每一个命令都是一个接一个执行的，直到所有命令全部完成，然后其它客户端才能执行它们的命令.

要在 Redis 中之执行一个事务的步骤:

首先调用 MULTI 命令
跟随一系列要执行的其它命令, Redis 会将该连接到来的这些命令存在队列中
执行 EXEC 命令, Redis 会顺序地执行 2 步骤中地所有命令,并且不会被中断

语义上, Redis 的 Python 客户端库使用一种称之为管道 popeline 的方式处理这种事务,在一个连接对象上调用 pipeline() 方法会创建一个事务,使用正确的话, 会自动将一系列命令列封装在 MULTI 和 EXEC命令之间, 于此同时, Python 的客户端也会将要发送的命令存储起来,直到真正要送的时候, 才发送这些命令. 这能够减少 Redis 服务器与客户端的网络通信次数, 能够提高命令的执行效率, 提高性能.

如何验证 ? 可以通过多线程验证, 每个线程都是先对某个计数器 key 加 1, 再减 1, 非事务的情况下, 一个线程的 +1 和 -1 操作可能被其它线程的 +1 和 -1 操作打断; 如果使用了 Redis 事务, 则不会出现这种情况。

其它客户端也是如此 ?

使用事务的好处和坏处是什么？

Redis 事务

MULTI/EXEC 这种基础事务的问题在于，如果没有执行 EXEC 命令，那么之前的任何命令都不会被执行，这意味着不能利用中间一些读操作的结果来在程序中及时做出决策。

WATCH 命令结合 MULTI 和 EXEC 命令，以及 UNWATCH 和 DISCARD 命令，当通过 WATCH 命令关注 (watch) 键时，在执行 EXEC 操作前的任意时刻，这些键被其它客户端替换、更新或删除，那么这时尝试执行 EXEC 会失败，并返回错误信息。通过使用 WATCH, MULTI/EXEC, 以及 UNWATCH/DISCARD 命令可以确保在做一些重要的操作时，数据不会被修改。

UNWATCH/DISCARD 的区别

WATCH
|
|-----> UNWATCH (reset the connection)
|
MULTI
|
|-----> DISCARD (reset the connection: cancel the WATCH and clear out any queued commands)
|
EXEC

如果通过 WATCH 命令关注了一些 key，然后通过 MULTI 命令开启了一个事务，并跟随了一组命令，这时可以通过 DISCARD 命令取消关注，并清除任何缓存的命令。

客户端1操作

127.0.0.1:6379> get guoph2
"2"
127.0.0.1:6379> watch guoph2
OK
127.0.0.1:6379> multi
OK
127.0.0.1:6379> set guoph2 3
QUEUED
# 在客户端2进行 set 后，执行 exec，本次事务执行失败，争产执行会返回 OK
127.0.0.1:6379> exec
(nil)

客户端2操作

127.0.0.1:6379> get guoph2
"2"
# 在客户端1操作 watch，multi,set命令后，对 key 的值进行更新
127.0.0.1:6379> set guoph2 1
OK
127.0.0.1:6379>

结合 WATCH 与 MULTI/EXEC 命令，就可以在关注的键被其它客户端修改时得到通知，可以再次进行重试。

为什么 Redis 不实现典型的锁机制

当处于写数据的目的而访问数据时，即 SQL 中的 SELECT FOR UPDATE, 关系型数据库会对要访问的行进行加锁，直到一个事务通过 COMMIT 或者 ROLLBACK完成事务的处理. 如果其它客户端尝试对相同的行，access data for writing 时，其它客户端会被阻塞直到第一个事务完成。这种形式的锁在实际中应用的很好（特别是所有的关系型数据库都实现了它），但是可能会导致客户端等待获取锁而长时间等到锁。

由于这种可能存在的长时间等待，并且 Redis 的设计就是较少客户端的等待时间，Redis 在 WATCH期间并不会锁数据，相反，Redis 会通知客户端，如果其它客户端先修改了数据，这也被称之为乐观锁 (optimistic locking). 关系型数据库执行的锁可以被视作悲观锁 (pessimistic). 乐观锁同样也应用广泛，因为客户端从来不等待第一个锁的持有者释放锁，它只是在不断地进行重试。

网络相关

2021-11-14 Linux

ping 命令

ping 命令通过 ICMP (Internet Control Message Protocol) 中的 echo 分组测试两台主机的连通性。向某台主机发送 echo 分组时，如果能够送达目标主机，可以返回一条 reply 消息；如果没有到目标主机的路由或其它原因，则 ping 命令失败。

# -c n 限制发送的分组数量
# icmp_seq icmp 序号
# ttl time to live 剩余路由器转发的次数，次数为 0 时还没有到达目标主机，消息将被丢弃
# time RTT(Round Trip Time) 分组的往返时间
guo@DESKTOP-4L69AND:/mnt/e/learning-dir/shell-learning$ ping baidu.com -c 5
PING baidu.com (220.181.38.148) 56(84) bytes of data.
64 bytes from 220.181.38.148 (220.181.38.148): icmp_seq=1 ttl=52 time=35.3 ms
64 bytes from 220.181.38.148 (220.181.38.148): icmp_seq=2 ttl=52 time=35.4 ms
64 bytes from 220.181.38.148 (220.181.38.148): icmp_seq=3 ttl=52 time=35.4 ms
64 bytes from 220.181.38.148 (220.181.38.148): icmp_seq=4 ttl=52 time=35.8 ms
64 bytes from 220.181.38.148 (220.181.38.148): icmp_seq=5 ttl=52 time=35.8 ms

--- baidu.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 5053ms
rtt min/avg/max/mdev = 35.322/35.588/35.876/0.331 ms

ping 命令执行顺利，目标主机可达，退出状态为 0；否则非 0，目标主机不可达。

测试局域网下所有可达的主机：

#!/bin/bash

for ip in 192.168.31.{1..255}
do
        echo "testing $ip"
        ping $ip -c 1 &> /dev/null ;
        if [ $? -eq 0 ];then
                echo $ip is alive
        fi
done

创建套接字

TCP/IP 网络中用于传输数据的套接字，可以使用 netcat 或 nc 命令。

基本使用

首先创建一个监听本地端口 1234 的套接字

1 2	# -l 指定监听一个要连接接到本地某个端口的套接字，而不是初始化一个到远程主机的连接 ph@guo-lenovo:~$ nc -l 1234

本地或其它主机连接到上述监听套接字

1	ph@guo-lenovo:~$ nc localhost 1234

任意一端中输入信息并按下回车键，信息会出现在另一端中，完成了一次通信

ph@guo-lenovo:~$ nc -l 1234
hello
a


ph@guo-lenovo:~$ nc localhost 1234
hello
a

快速复制文件

主要利用的时 shell 的重定向

监听端

1	nc -l 1234 > nc_redirect_out

发送端

1	nc 192.168.31.188 1234 < yihuo.sh

文件内容

ph@guo-lenovo:~$ cat nc_redirect_out
#!/bin/bash

orig=(01101101 01101001 01100100 01101110 01101001 01100111 01101000 01110100)
key=(01001101 01001101 01010100 01111110 01101111 01100001 01000000 00010100)

for i in "${!orig[@]}";do
    o=$(echo -n $((2#${orig[$i]})))
    k=$(echo -n $((2#${key[$i]})))
    echo $(($o ^ $k)) | xargs -n 1 | while read dec; do echo  "ibase=10;obase=2;$dec" | bc | tr "\n" " " | sed 's/^/0/g'; done

done

搭建网桥

主要使用到了 ip link 命令, 主机有两个网卡， eth0 配置连接到子网 192.168.1.0，eth1 通过网桥连接到子网 10.0.0.0，并为网桥配置 ip 地址为 10.0.0.2，如果以太网适配器加入了网桥，该适配器不再配置 ip 地址，需要配置地址的时网桥。

# 创建名为 br0 的网桥
ip link add br0 type bridge

# 将以太网适配器添加到网桥
ip link sed dev eht1 master br0

# 配置网桥的 ip 地址
ifconfig br0 10.0.0.2

# 启用分组转发
echo 1 > /proc/sys/net/ipv4/ip_forward

# 10.0.0.0/24 中的主机添加路由
route add -net 192.168.1.0/16 gw 10.0.0.2

# 192.168.1.0/16 中的主机添加路由
route add -net 10.0.0.0/24 gw 192.168.1.2

Internet 连接共享

Internet 连接共享其实也是利用的分组转发的功能，类似上面的网桥部分，起到了路由器的功能，可以提供防火墙以及连接共享。假设有线网络连接 eth0 连接到了 internet，可以创建无线连接，并创建一个共享的无线网络，是其它设备连接到这个无线网络，其中的分组通过虚拟路由器，最终转发到互联网

使用 iptables 架设简易防火墙

防火墙的目的主要时过滤、阻止不需要的网络流量，允许正常的网络流量通过。

阻止到特定 IP 地址的流量

# -A Append to chain 链就是若干规则的集合，追加到 OUTPUT 链，控制所有的出站流量 (outgoing traffic)
# -I Insert in chain as rulenum (default 1=first)
# -d 匹配分组的目的地址
# -j iptables 执行特定的处理:DROP,ACCEPT,REJECT
iptables -A OUTPUT -d 8.8.8.8 -j DROP

阻止到特定端口的流量

# -P 指定规则仅适用于 TCP
# -dport 指定了对应的端口
# 这里来相当于阻止了所有出站的 FTP 流量
iptables -A OUTPUT -p tcp -dport 21 -j DROP

阻止进入的特定流量

# -I 新的规则将插入到规则集的开头
# INPUT INPUT 链，控制所有的入站流量 (incoming traffic)
# -s 指定了分组的源地址
iptables -I INPUT -s 1.2.3.4 -j DROP

1 2	# -flush 清除对 iptables 链所作出的所有改动 iptables -flush

Wicked Cook Shell Scripts

2021-09-21 Linux

格式化过长的行

可以直接使用 fmt 命令，或者使用下面的脚本，借助了 nroff 命令

#!/bin/bash
# fmt: used to format text lines
# -w width 指定宽度
# -h 是否启用连字符

while getopts "hw:" opt; do
    case "${opt}" in
    h) hype=1 ;;
    w) width="$OPTARG" ;;
    *) echo "ignore unknow argument: $opt" ;;
    esac
done

shift $(($OPTIND - 1))

nroff <<EOF
.ll ${width:-72} # set up default value
.na
.hy ${hype:-0}
.pl 1
$(cat "$@")
EOF

exit 0

显示目录内容

脚本的作用在于打印出目录下的文件块大小（每个块 1024 字节）或目录下的条目个数。

在打印出所有的内容前，为了使得输出的每行展示两列, 并进行对齐，使用了如下的步骤：

使用了 sed 流文本编辑器命令，先将原始输出的两行变为一行，再将可能包含的空格字符先替换为 \0 null 字符（具体的过程见后面的注释）
使用 awk 命令格式化输出

sed 命令真是一个非常有趣的命令！

#!/usr/bin/env bash

if [[ $# -gt 1 ]]; then
    echo "Usage: $0 [dirname]"
    exit 1
elif [[ $# -eq 1 ]]; then
    cd "$@"
    # if cd command failed because it is not a valid path
    if [[ $? -ne 0 ]]; then
        exit 1
    fi
fi

# shell will extend '*' as all files and directories under the current directory
for file in *; do
    if [[ -d "$file" ]]; then
        size=$(ls "$file" | wc -l | sed 's/[^[:digit:]]//g')
        echo "$file ($size entr(y|ies))"
    else
        size=$(ls -sk $file | awk '{print $1}')
        echo "$file (${size}KB)"
    fi
done | \
sed -n \
-e '$!N' \
-E -e 's/\n/\x0/g' \
-e 'p' | \
awk -F "\0" '{ printf "%-39s %-39s\n",$1,$2 }' 

### a bettern output fotmat

## step 1: use sed to make things easy and right
# -n			disable automatic priting
# '$!N' 		append the next line of input into the pattern space, now we have two lines in sed's pattern space
# 's/\n/\x0/g'	replace '\n' in pattern space, so we make two lines a line separated by '\0'
# 'p'			print the content of pattern space

### step 2: awk final output
# make the '\0' as field delimiter, and format the output

测试结果：

guo@DESKTOP-4L69AND:/mnt/e/learning-dir/shell-learning/scripts$ sudo ./formatDir.sh /
bin (171 entr(y|ies))                   boot (0 entr(y|ies))
dev (210 entr(y|ies))                   dump-new.rdb (8KB)
etc (189 entr(y|ies))                   home (1 entr(y|ies))
init (620KB)                            lib (22 entr(y|ies))
lib64 (1 entr(y|ies))                   media (0 entr(y|ies))
mnt (8 entr(y|ies))                     opt (6 entr(y|ies))
proc (46 entr(y|ies))                   root (0 entr(y|ies))
run (11 entr(y|ies))                    sbin (220 entr(y|ies))
snap (0 entr(y|ies))                    srv (0 entr(y|ies))
sys (10 entr(y|ies))                    tmp (79 entr(y|ies))
usr (8 entr(y|ies))                     var (13 entr(y|ies))
zookeeper_server.pid (0KB)

提醒工具

添加提醒

#!/usr/bin/env bash
# 简单的命令行提醒工具

remindFile="$HOME/.remind"

if [ $# -eq 0 ]; then
    echo "enter note and end with ctrl-D:" # ctrl-D Terminate input, or exit shell 终止输入或退出 shell
    cat - >>$remindFile
else
    echo "$@" >>$remindFile
fi

exit 0

查询提醒

#!/usr/bin/env bash
# remindMe 查看提醒

remindFile="$HOME/.remind"

if [ ! -f $remindFile ]; then
    echo "$0: You don't have a .remind file under your user home" >/dev/stderr
    exit 1
fi

if [ $# -eq 0 ]; then
    more $remindFile
else
    grep -i -- "$@" $remindFile | ${PAGER:-more} # 变量为空默认赋值，即使用 more 命令来展示内容；-- 将后面参数不再作为 grep 命令的参数来解析，出于安全执行命令考虑
fi

exit 0

交互式包装器

#!/usr/bin/env bash
# frontBc 交互式计算器，这里我只是为了参考源代码的编写方式，即如何实现一个简单的交互式包装器、

function show_help() {
    cat <<EOF
This is a help documentation, supported commands are as follows:
    (1) del del an item
    (2) add add a new item
    (3) mov move a item
EOF
}

echo "enter 'help' for help, 'quit' to quit"
echo -n "calc> "

while read command args; do
    case $command in
    quit | exit | bye) exit 0 ;;
    help | ?) show_help ;;
    *) echo "$command $args" ;;
    esac
    echo -n "calc> "
done

echo ""
exit 0

图床测试

2021-09-11 Linux

Description

真是一件麻烦的事情呢，竟然使用了 Redis 作为图片去重的校验…

CSV2Table

2021-09-11 Linux

CSV 转换为表格（支持MD格式的表格）

写了一个 CSV 文件转为 stdout 中的表格的脚本，也支持输出 Markdown 表格，考虑了汉字与英文字符的显示宽度。

Shell 使用 utf-8 编码
仅支持 ASCII 字符和占3个字节的字符（其他字符在UTF-8中可能是2个字节或4个字节，不适用；或者要占据3个字节，但是打印的宽度不等于一个ASCII字符的2倍的情况下，也不适用）
支持表头字段带有颜色（黄色），使用 -c 选项
支持 MD 表格格式，使用 -m 选项
分隔符默认为 ，，使用 -d "delemeter" 指定，delemeter 中的第一个字符会作为分隔符

满足上述情况下，才能打印出格式正确的表格

待优化事项

行数增多时，处理的时间明显过长，是由于每个单元格中的值都要计算字节数和字符数，计算打印时要打印的空格字符数
只针对ASCII字符和占三个字节的字符，并且占三个字节的字符的显示宽度是ASCII字符的两倍，其他情况现在还不能正确处理

1 2	# SYNOPSIS ./toTable.sh -m -v -c -p -n 50 -d "," file

示例

示例文本：

$ cat sep
A,B,C
1,郭,2,C
我是人间一朵花,你是人间一头牛
1,郭,2,C
我是人间一朵花,你是人间一头牛,lovedthe ywat uodjsadsandsasdsnadmsmada,dsad

示例输出:

#!/bin/bash

### rule: always use `LF` as the end of lie in linux or unexpected things will drive you crazy
### csv file to md table or normal table
### only ascii and chinese characters allowed in file,and assumes utf-8 is used as charset or it displays not properly 

oldIFS=$IFS

function ExitFunc() {
    IFS=$oldIFS
}

function printSymbolLie() {
    sep=$1
    mid=$2
    shift 2
    echo -n "$sep"
    for count in "$@"; do
        for ((m = -2; m < $count; m++)); do
            echo -n "$mid"
        done
        echo -n "$sep"
    done
    echo
}

trap 'ExitFunc' 2 9 15 20 EXIT

isMD=0
isColor=0
IFS=,
while getopts 'cmd:' OPT; do
    case "$OPT" in
    c)
        isColor=1
        ;;
    m)
        isMD=1
        ;;
    d)
        IFS=$(echo -ne "$OPTARG")
        ;;
    ?)
        echo "avaliable options: [-c] [-m] [-d delimiter]" >&2 ## standard error
        exit 1
        ;;
    esac
done
shift "$(($OPTIND - 1))"

echo "delimiter has been set to [$IFS]"
echo -n "$IFS" | hexdump -C
echo

for item in $@; do
    if [ ! -f "$item" ]; then
        echo "file ${item} not exists"
        continue
    fi

    echo "file name: $item"

    # array contasing all line per file
    line_arr=()
    col_count=0
    row_count=0
    while read line; do
        line_arr[$row_count]=$line
        arr=($line)
        temp_count=${#arr[@]}
        if ((temp_count > col_count)); then
            col_count=$temp_count
        fi
        ((row_count++))
    done <$item
    echo "table column size: ${col_count}, row size: ${row_count}"

    # array contains max length of every col
    max_count_arr=()
    for ((i = 0; i < ${row_count}; i++)); do
        line=${line_arr[i]}
        arr=($line)

        # echo "line: $line ${arr[@]}"
        for ((j = 0; j < $col_count; j++)); do
            max_length=${max_count_arr[j]}
            col_str=${arr[j]}

            # wc output: always in the following order: newline, word, character, byte, maximum line length.
            bytes_chars=($(echo -n $col_str | wc -c -m | xargs -n 1 echo -n ","))
            bytes=${bytes_chars[2]}
            chars=${bytes_chars[1]}

            if ((bytes > chars)); then
                current_length=$(((bytes + chars) / 2))
            else
                current_length=$chars
            fi

            if ((current_length > max_length)); then
                max_count_arr[j]=$current_length
            fi
        done
    done

    echo -e "max width of every col: ${max_count_arr[@]}\n"

    for ((i = 0; i < ${row_count}; i++)); do
        line=${line_arr[i]}
        arr=($line)

        if [[ $isMD -eq 1 && $i -eq 1 ]]; then
            printSymbolLie "|" "-" "${max_count_arr[@]}"
        fi

        # print the table head line +-------+-------+
        if [ $isMD -eq 0 ]; then
            if [[ $i -eq 0 || $i -eq 1 ]]; then
                printSymbolLie "+" "-" "${max_count_arr[@]}"
            fi
        fi

        for ((j = 0; j < $col_count; j++)); do
            bytes_chars=($(echo -n ${arr[j]} | wc -c -m | xargs -n 1 echo -n ","))
            bytes=${bytes_chars[2]}
            chars=${bytes_chars[1]}

            # let's assume x: Chinese char count;  y:ascii char count; default charset is utf-8
            # as for display width: one Chinese character = ascii * 2
            #
            # (1) x*3 + y = bytes (use wc)
            # (2) x + y = chars (use wc)
            # (3) 2*x + y + z = max_width (we have calculated it before, z is the empty char count)
            # 
            # printf command will count bytes as min size: printf "%-10s" "hello" 10 means bytes
            # so finally, the printf min size for current col is:
            #       3*x + y + z = max_col_width + (bytes - chars) / 2 (calculated by (1) (2) (3) above)

            if ((j == 0)); then echo -n "|"; fi
            max_col_width=${max_count_arr[j]}
            min_print=$((max_col_width + (bytes - chars) / 2))

            if ((i == 0 && isColor == 1)); then
                printf " \033[1;33m%-${min_print}s\033[00m |" ${arr[j]}
            else
                printf " %-${min_print}s |" ${arr[j]}
            fi
        done
        echo
    done

    # print the end line +-------+-------+
    if [ $isMD -eq 0 ]; then
        printSymbolLie "+" "-" "${max_count_arr[@]}"
    fi
done

IFS=$oldIFS

并行计算每列的最大宽度（列粒度）

#!/bin/bash

### rule: always use `LF` as the end of lie in linux or unexpected things will drive you crazy
### csv file to md table or normal table
### only ascii and chinese characters allowed in file,or it displays not properly and assumes utf-8 is used as charset

oldIFS=$IFS

function log() {
    echo -e "[$(date +"%F %T")]: $@"
}

function ExitFunc() {
    IFS=$oldIFS
}

function calLength() {
    local line=$1
    local start=$2
    local end=$3
    local arr=($line)
    local result=""

    while ((start < end)); do
        local col_str=${arr[start]}
        # wc output: always in the following order: newline, word, character, byte, maximum line length.
        local bytes_chars=($(echo -n $col_str | wc -c -m | xargs -n 1 echo -n ","))
        local bytes=${bytes_chars[2]}
        local chars=${bytes_chars[1]}
        if ((bytes > chars)); then
            result="${result}$(((bytes + chars) / 2)),"
        else
            result="${result}$chars,"
        fi
        ((start++))
    done

    echo "$result" >"col$2.txt"

    return 0
}

function printSymbolLie() {
    sep=$1
    mid=$2
    shift 2

    echo -n "$sep"
    for count in "$@"; do
        for ((m = -2; m < $count; m++)); do
            echo -n "$mid"
        done
        echo -n "$sep"
    done
    echo
}

trap 'ExitFunc' 2 9 15 20 EXIT

isMD=0
isColor=0
IFS=,
while getopts 'cmd:' OPT; do
    case "$OPT" in
    c)
        isColor=1
        ;;
    m)
        isMD=1
        ;;
    d)
        IFS=$(echo -ne "$OPTARG")
        ;;
    ?)
        echo "avaliable options: [-c] [-m] [-d delimiter]" >&2 ## standard error
        exit 1
        ;;
    esac
done
shift "$(($OPTIND - 1))"

log "delimiter has been set to [$IFS]"
echo -n "$IFS" | hexdump -C
echo

for item in $@; do
    if [ ! -f "$item" ]; then
        echo "file ${item} not exists"
        continue
    fi

    log "file name: $item"

    # array contasing all line per file
    line_arr=()
    col_count=0
    row_count=0
    while read line; do
        line_arr[$row_count]=$line
        arr=($line)
        temp_count=${#arr[@]}
        if ((temp_count > col_count)); then
            col_count=$temp_count
        fi
        ((row_count++))
    done <$item
    log "table column size: ${col_count}, row size: ${row_count}"

    # array contains max length of every col
    max_count_arr=()
    for ((i = 0; i < ${row_count}; i++)); do
        line=${line_arr[i]}
        arr=($line)

        # 并行没有办法返回给一个变量，所以只能都写到文件中，然后再拼接
        # 此处还有一个方案就是行的并行，当前的方案是列的并行
        files=""
        for((l=0;l<$col_count;l++)); do
            (calLength "$line" "$l" "$((l+1))") &
            files="${files}col${l}.txt "
        done
        wait

        result=$(echo $files | xargs cat)
        # echo "command result: $result"
        col_length_arr=($result)
        for ((j = 0; j < $col_count; j++)); do
            max_length=${max_count_arr[j]}
            current_length=${col_length_arr[j]}

            if ((current_length > max_length)); then
                max_count_arr[j]=$current_length
            fi
        done
    done
    
    echo $files | xargs rm


    # log "max width of every col: ${max_count_arr[@]}\n"

    # for ((i = 0; i < ${row_count}; i++)); do
    i=0
    while read line; do
        # line=${line_arr[i]}
        arr=($line)

        # echo "line:######### $line"
        if [[ $isMD -eq 1 && $i -eq 1 ]]; then
            printSymbolLie "|" "-" "${max_count_arr[@]}"
        fi

        # print the table head line +-------+-------+
        if [[ $isMD -eq 0 && ($i -eq 0 || $i -eq 1) ]]; then
            printSymbolLie "+" "-" "${max_count_arr[@]}"
        fi

        for ((j = 0; j < $col_count; j++)); do
            bytes_chars=($(echo -n ${arr[j]} | wc -c -m | xargs -n 1 echo -n ","))
            bytes=${bytes_chars[2]}
            chars=${bytes_chars[1]}

            # let's assume x: Chinese char count;  y:ascii char count; default charset is utf-8
            # as for display width: one Chinese character = ascii * 2
            #
            # (1) x*3 + y = bytes (use wc)
            # (2) x + y = chars (use wc)
            # (3) x*2 + y + z = max_width (we have calculated it before, z is the empty char count)
            #
            # printf command will count bytes as min size: printf "%-10s" "hello" 10 means bytes
            # so finally, the printf min size for current col is:
            #       3*x + y + z = max_col_width + (bytes - chars) / 2 (calculated by (1) (2) (3) above)

            if ((j == 0)); then echo -n "|"; fi
            max_col_width=${max_count_arr[j]}
            min_print=$((max_col_width + (bytes - chars) / 2))
            # z=$((max_col_width - (bytes + chars) / 2))

            if ((i == 0 && isColor == 1)); then
                printf " \033[1;33m%-${min_print}s\033[00m |" ${arr[j]}
            else
                printf " %-${min_print}s |" ${arr[j]}
            fi
        done  
        echo
        ((i++))
    done <$item

    # print the end line +-------+-------+
    if [ $isMD -eq 0 ]; then
        printSymbolLie "+" "-" "${max_count_arr[@]}"
    fi
    echo
done

IFS=$oldIFS

列、行并行

#!/bin/bash

### rule: always use `LF` as the end of lie in linux or unexpected things will drive you crazy
### csv file to md table or normal table
### only ascii and chinese characters allowed in file,or it displays not properly and assumes utf-8 is used as charset

oldIFS=$IFS

function log() {
    echo -e "[$(date +"%F %T")]: $@"
}

function ExitFunc() {
    IFS=$oldIFS
}

function calLength() {
    local line=$1
    local start=$2
    local end=$3
    local arr=($line)
    local result=""

    while ((start < end)); do
        local col_str=${arr[start]}
        # wc output: always in the following order: newline, word, character, byte, maximum line length.
        local bytes_chars=($(echo -n $col_str | wc -c -m | xargs -n 1 echo -n "$IFS"))
        local bytes=${bytes_chars[2]}
        local chars=${bytes_chars[1]}
        if ((bytes > chars)); then
            result="${result}$(((bytes + chars) / 2)),"
        else
            result="${result}$chars,"
        fi
        ((start++))
    done

    echo "$result" >"col$2.txt"

    return 0
}

function calLength2() {
    local line=$1
    local start=$2
    local end=$3
    local row=$4
    local arr=($line)
    local result=""

    while ((start < end)); do
        local col_str=${arr[start]}
        # wc output: always in the following order: newline, word, character, byte, maximum line length.
        local bytes_chars=($(echo -n $col_str | wc -c -m | xargs -n 1 echo -n "$IFS"))
        local bytes=${bytes_chars[2]}
        local chars=${bytes_chars[1]}
        if ((bytes > chars)); then
            result="${result}$(((bytes + chars) / 2)),"
        else
            result="${result}$chars,"
        fi
        ((start++))
    done

    echo "$result" >"row${row}_col$2.txt"

    return 0
}

function rowCal() {
    local line_start_num=$1
    local col_count=$2

    shift 2
    local max_count_arr=()
    for line in "$@"; do
        local arr=($line)
        local files=""
        for ((l = 0; l < $col_count; l++)); do
            (calLength2 "$line" "$l" "$((l + 1))" "${line_start_num}") &
            files="${files}row${line_start_num}_col${l}.txt "
        done
        wait

        local result=$(echo $files | xargs cat)
        # echo "command result: $result"
        local col_length_arr=($result)
        for ((j = 0; j < $col_count; j++)); do
            local max_length=${max_count_arr[j]}
            local current_length=${col_length_arr[j]}

            if ((current_length > max_length)); then
                max_count_arr[j]=$current_length
            fi
        done
    done

    echo $files | xargs rm &

    local final=""
    for count in "${max_count_arr[@]}"; do
        final="${final}${count},"
    done

    echo -n "$final" >"row${line_start_num}"
}

function printSymbolLie() {
    sep=$1
    mid=$2
    shift 2

    echo -n "$sep"
    for count in "$@"; do
        for ((m = -2; m < $count; m++)); do
            echo -n "$mid"
        done
        echo -n "$sep"
    done
    echo
}

trap 'ExitFunc' 2 9 15 20 EXIT

isMD=0
isColor=0
IFS=,
parrel=4
while getopts 'cmp:d:' OPT; do
    case "$OPT" in
    c)
        isColor=1
        ;;
    m)
        isMD=1
        ;;
    p)
        parrel=$OPTARG
        ;;
    d)
        IFS=$(echo -ne "$OPTARG")
        ;;
    ?)
        echo "avaliable options: [-c] [-m] [-d delimiter]" >&2 ## standard error
        exit 1
        ;;
    esac
done
shift "$(($OPTIND - 1))"

log "delimiter has been set to [$IFS]"
echo -n "$IFS" | hexdump -C
echo

for item in $@; do
    if [ ! -f "$item" ]; then
        echo "file ${item} not exists"
        continue
    fi

    log "file name: $item"

    ##########################################################
    # array contasing all line per file
    line_arr=()
    col_count=0
    row_count=0
    while read line; do
        line_arr[$row_count]=$line
        arr=($line)
        temp_count=${#arr[@]}
        if ((temp_count > col_count)); then
            col_count=$temp_count
        fi
        ((row_count++))
    done <$item
    log "table column size: ${col_count}, row size: ${row_count}"

    ##########################################################
    row_files=""
    if ((row_count > parrel)); then
        per_size=$((row_count / parrel))
        for ((r = 0; r < $parrel; r++)); do
            start_index=$((r * per_size))
            if ((r == parrel - 1)); then
                length=$((row_count - (r*per_size)))
            else
                length=$per_size
            fi
            (rowCal "${start_index}" $col_count "${line_arr[@]:${start_index}:${length}}") &
            row_files="${row_files}row${start_index}$IFS"
        done
    else
        (rowCal 0 $col_count "${line_arr[@]:0:${col_count}}") &
        row_files="row0"
    fi
    wait

    # array contains max length of every col
    max_count_arr=()
    for row_cal_file in $row_files; do
        result=$(cat $row_cal_file)
        col_length_arr=($result)

        for ((j = 0; j < $col_count; j++)); do
            max_length=${max_count_arr[j]}
            current_length=${col_length_arr[j]}

            if ((current_length > max_length)); then
                max_count_arr[j]=$current_length
            fi
        done
        rm $row_cal_file &
    done
    log "max width of every col cal \n"

    ##########################################################
    # for ((i = 0; i < ${row_count}; i++)); do
    i=0
    while read line; do
        # line=${line_arr[i]}
        arr=($line)

        # echo "line:######### $line"
        if [[ $isMD -eq 1 && $i -eq 1 ]]; then
            printSymbolLie "|" "-" "${max_count_arr[@]}"
        fi

        # print the table head line +-------+-------+
        if [[ $isMD -eq 0 && ($i -eq 0 || $i -eq 1) ]]; then
            printSymbolLie "+" "-" "${max_count_arr[@]}"
        fi

        for ((j = 0; j < $col_count; j++)); do
            bytes_chars=($(echo -n ${arr[j]} | wc -c -m | xargs -n 1 echo -n "$IFS"))
            bytes=${bytes_chars[2]}
            chars=${bytes_chars[1]}

            # let's assume x: Chinese char count;  y:ascii char count; default charset is utf-8
            # as for display width: one Chinese character = ascii * 2
            #
            # (1) x*3 + y = bytes (use wc)
            # (2) x + y = chars (use wc)
            # (3) x*2 + y + z = max_width (we have calculated it before, z is the empty char count)
            #
            # printf command will count bytes as min size: printf "%-10s" "hello" 10 means bytes
            # so finally, the printf min size for current col is:
            #       3*x + y + z = max_col_width + (bytes - chars) / 2 (calculated by (1) (2) (3) above)

            if ((j == 0)); then echo -n "|"; fi
            max_col_width=${max_count_arr[j]}
            min_print=$((max_col_width + (bytes - chars) / 2))
            # z=$((max_col_width - (bytes + chars) / 2))

            if ((i == 0 && isColor == 1)); then
                printf " \033[1;33m%-${min_print}s\033[00m |" ${arr[j]}
            else
                printf " %-${min_print}s |" ${arr[j]}
            fi

        done

        echo
        ((i++))
    done <$item

    # print the end line +-------+-------+
    if [ $isMD -eq 0 ]; then
        printSymbolLie "+" "-" "${max_count_arr[@]}"
    fi
    echo
done
IFS=$oldIFS

一次性输出

#!/bin/bash

### rule: always use `LF` as the end of lie in linux or unexpected things will drive you crazy
### csv file to md table or normal table
### only ascii and chinese characters allowed in file,or it displays not properly and assumes utf-8 is used as charset

oldIFS=$IFS
isVerbose=0

function debug() {
    if ((isVerbose == 1)); then
        echo -e "[$(date +"%F %T")]: $@"
    fi
}

function info() {
    echo -e "$@"
}

function ExitFunc() {
    IFS=$oldIFS
}

function calLength2() {
    local line=$1
    local start=$2
    local end=$3
    local row=$4
    local arr=($line)
    local result=""

    while ((start < end)); do
        local col_str=${arr[start]}
        # wc output: always in the following order: newline, word, character, byte, maximum line length.
        local bytes_chars=($(echo -n $col_str | wc -c -m | xargs -n 1 echo -n "$IFS"))
        local bytes=${bytes_chars[2]}
        local chars=${bytes_chars[1]}
        if ((bytes > chars)); then
            result="${result}$(((bytes + chars) / 2)),"
        else
            result="${result}$chars,"
        fi
        ((start++))
    done

    echo "$result" >"row${row}_col$2.txt"

    return 0
}

function rowCal() {
    local line_start_num=$1
    local col_count=$2

    shift 2
    local max_count_arr=()
    for line in "$@"; do

        debug "line: $line"

        local arr=($line)
        local files=""
        for ((l = 0; l < $col_count; l++)); do
            (calLength2 "$line" "$l" "$((l + 1))" "${line_start_num}") &
            files="${files}row${line_start_num}_col${l}.txt "
        done
        wait

        local result=$(echo $files | xargs cat)
        # echo "command result: $result"
        local col_length_arr=($result)
        for ((j = 0; j < $col_count; j++)); do
            local max_length=${max_count_arr[j]}
            local current_length=${col_length_arr[j]}

            if ((current_length > max_length)); then
                max_count_arr[j]=$current_length
            fi
        done
    done

    echo $files | xargs rm &

    local final=""
    for count in "${max_count_arr[@]}"; do
        final="${final}${count},"
    done

    echo -n "$final" >"row${line_start_num}"
}

function printSymbolLie() {
    sep=$1
    mid=$2
    shift 2

    echo -n "$sep"
    for count in "$@"; do
        for ((m = -2; m < $count; m++)); do
            echo -n "$mid"
        done
        echo -n "$sep"
    done
}

trap 'ExitFunc' 2 9 15 20 EXIT

isMD=0
isColor=0
isParrel=0
IFS=,
parrel=4

start_seconds=$(date +"%s")

while getopts 'cmvpn:d:' OPT; do
    case "$OPT" in
    c)
        isColor=1
        ;;
    m)
        isMD=1
        ;;
    p)
        isParrel=1
        ;;
    v)
        isVerbose=1
        ;;
    n)
        parrel=$OPTARG
        ;;
    d)
        IFS=$(echo -ne "$OPTARG")
        ;;
    ?)
        echo "avaliable options: [-c] [-v] [-m] [-p] [-n parrel_size] [-d delimiter]" >&2 ## standard error
        exit 1
        ;;
    esac
done
shift "$(($OPTIND - 1))"

debug "delimiter has been set to [$IFS]"
if ((isVerbose == 1)); then echo -n "$IFS" | hexdump -C; fi
echo

for item in $@; do
    if [ ! -f "$item" ]; then
        echo "file ${item} not exists"
        continue
    fi

    debug "file name: $item"

    ##########################################################
    # array contasing all line per file
    line_arr=()
    col_count=0
    row_count=0
    while read line; do
        line_arr[$row_count]=$line
        arr=($line)
        temp_count=${#arr[@]}
        if ((temp_count > col_count)); then
            col_count=$temp_count
        fi
        ((row_count++))
    done <$item
    debug "table column size: ${col_count}, row size: ${row_count}"

    ##########################################################
    row_files=""
    if ((isParrel == 1 && row_count > parrel)); then
        per_size=$((row_count / parrel))
        for ((r = 0; r < $parrel; r++)); do
            start_index=$((r * per_size))
            if ((r == parrel - 1)); then
                length=$((row_count - (r * per_size)))
            else
                length=$per_size
            fi
            (rowCal "${start_index}" $col_count "${line_arr[@]:${start_index}:${length}}") &
            row_files="${row_files}row${start_index}$IFS"
        done
    else
        debug "no parrel function call:" 0 $col_count "${line_arr[@]:0:${row_count}}"
        (rowCal 0 $col_count "${line_arr[@]:0:${row_count}}") &
        row_files="row0"
    fi
    wait

    # array contains max length of every col
    max_count_arr=()
    for row_cal_file in $row_files; do
        result=$(cat $row_cal_file)
        col_length_arr=($result)

        for ((j = 0; j < $col_count; j++)); do
            max_length=${max_count_arr[j]}
            current_length=${col_length_arr[j]}

            if ((current_length > max_length)); then
                max_count_arr[j]=$current_length
            fi
        done
        rm $row_cal_file &
    done
    debug "max width of every col cal \n"

    ##########################################################
    # for ((i = 0; i < ${row_count}; i++)); do
    i=0

    format_str=""
    parameter_str=""
    print_count=0
    while read line; do
        ((print_count++))
        # line=${line_arr[i]}
        arr=($line)

        # echo "line:######### $line"
        if [[ $isMD -eq 1 && $i -eq 1 ]]; then
            format_str=${format_str}$(printSymbolLie "|" "-" "${max_count_arr[@]}")"\n"
        fi

        # print the table head line +-------+-------+
        if [[ $isMD -eq 0 && ($i -eq 0 || $i -eq 1) ]]; then
            format_str=${format_str}$(printSymbolLie "+" "-" "${max_count_arr[@]}")"\n"
        fi

        for ((j = 0; j < $col_count; j++)); do
            bytes_chars=($(echo -n ${arr[j]} | wc -c -m | xargs -n 1 echo -n "$IFS"))
            bytes=${bytes_chars[2]}
            chars=${bytes_chars[1]}

            # let's assume x: Chinese char count;  y:ascii char count; default charset is utf-8
            # as for display width: one Chinese character = ascii * 2
            #
            # (1) x*3 + y = bytes (use wc)
            # (2) x + y = chars (use wc)
            # (3) x*2 + y + z = max_width (we have calculated it before, z is the empty char count)
            #
            # printf command will count bytes as min size: printf "%-10s" "hello" 10 means bytes
            # so finally, the printf min size for current col is:
            #       3*x + y + z = max_col_width + (bytes - chars) / 2 (calculated by (1) (2) (3) above)
            #
            # todo:// optimization has to be done to reduce cal time

            if ((j == 0)); then format_str="${format_str}|"; fi
            max_col_width=${max_count_arr[j]}
            min_print=$((max_col_width + (bytes - chars) / 2))

            if ((i == 0 && isColor == 1)); then
                format_str="${format_str} \033[1;33m%-${min_print}s\033[00m |"
            else
                format_str="${format_str} %-${min_print}s |"
            fi

            parameter_str="${parameter_str}\"${arr[j]}\" "
        done

        format_str="${format_str}\n"

        debug "format_str: $format_str"
        debug "parameter_str: $parameter_str"

        if ((print_count == 50 || print_count == row_count)); then
            echo -n "$parameter_str" | xargs printf "$format_str"
            parameter_str=""
            format_str=""
        fi

        ((i++))
    done <$item

    # echo -n "$parameter_str" | xargs printf "$format_str"

    # print the end line +-------+-------+
    if [ $isMD -eq 0 ]; then
        printSymbolLie "+" "-" "${max_count_arr[@]}"
    fi
    echo
done
IFS=$oldIFS

end_seconds=$(date +"%s")

info "\nTable row size: ${row_count}, column size: ${col_count}"
info "It takes $((end_seconds - start_seconds)) seconds to calculate and display.\n"

Prev Next