Redis Transaction

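Basic Redis Transactions

Sometimes you need to operate on several data structures at the same time, which takes multiple Redis command calls. Although some commands move elements between keys, no single command moves elements between keys of different types (the exception is the ZUNIONSTORE command, which can copy a SET into a ZSET). For operations that involve multiple keys, whether of the same type or of different types, Redis provides five commands that let you work on multiple keys without interruption: WATCH, MULTI, EXEC, UNWATCH, and DISCARD.

The simplest Redis transaction uses the MULTI and EXEC commands. A basic transaction means that these two commands give one client a way to run several commands A, B, and C without being interrupted by other clients. Without this guarantee, when A, B, and C are executed one at a time, another client might run command D after A completes and before this client runs B, so the sequence A, B, C is interrupted. This differs from a relational database transaction, which can be partially executed and then rolled back or committed. In Redis, every command that is part of a MULTI/EXEC transaction is executed one after another until all of them have completed, and only then can other clients run their commands.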

image-20211117203044176

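Steps to execute a transaction in Redis:

  1. Call the MULTI command first
  2. Follow it with the other commands to execute; Redis queues the commands arriving on this connection
  3. Call the EXEC command; Redis executes all the commands queued in step 2 in order, without interruption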
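
The three steps above can be sketched from the shell. This is a minimal sketch rather than part of the original notes: build_tx is a hypothetical helper name, the key name counter is made up, and sending the result to a server assumes that redis-cli is installed and reads the piped commands over a single connection.

```bash
#!/usr/bin/env bash
# build_tx (hypothetical helper): print the given commands wrapped in MULTI ... EXEC,
# one command per line, ready to be piped into redis-cli.
build_tx() {
  local cmd
  printf 'MULTI\n'
  for cmd in "$@"; do
    printf '%s\n' "$cmd"
  done
  printf 'EXEC\n'
}

# Show the command sequence that would be sent.
build_tx 'INCR counter' 'DECR counter'

# To run it against a reachable Redis server (assumption: default host and port):
#   build_tx 'INCR counter' 'DECR counter' | redis-cli
```

Each queued command should be answered with QUEUED, and the replies of all commands arrive together when EXEC runs.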

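Semantically, the Redis Python client library handles this kind of transaction through what it calls a pipeline. Calling the pipeline() method on a connection object creates a transaction which, when used correctly, automatically wraps a sequence of commands between MULTI and EXEC. In addition, the Python client buffers the commands to be sent until they actually need to be sent. This reduces the number of network round trips between the client and the Redis server and improves command execution performance.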

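How can this be verified? Use multiple threads: each thread first increments a counter key by 1 and then decrements it by 1. Without a transaction, one thread's +1 and -1 can be interleaved with another thread's +1 and -1; with a Redis transaction this does not happen.

Is the same true for other clients?

What are the advantages and disadvantages of using transactions?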

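Redis Transactions

The problem with a basic MULTI/EXEC transaction is that none of the queued commands run until EXEC is called, which means the program cannot use the result of an intermediate read to make a decision in time.

The WATCH command is used together with MULTI/EXEC and with UNWATCH/DISCARD. After keys have been watched with WATCH, if another client replaces, updates, or deletes any of them at any point before EXEC runs, the attempt to execute EXEC fails: the transaction is aborted and EXEC returns a null reply, shown as (nil) in redis-cli. Using WATCH, MULTI/EXEC, and UNWATCH/DISCARD ensures that data is not modified underneath you while you perform an important operation.

Difference between UNWATCH and DISCARD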

WATCH
|
|-----> UNWATCH (reset the connection)
|
MULTI
|
|-----> DISCARD (reset the connection: cancel the WATCH and clear out any queued commands)
|
EXEC

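If some keys have been watched with WATCH, and a transaction has then been opened with MULTI and a group of commands queued, the DISCARD command cancels the watch and clears any queued commands.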

Client 1 session

127.0.0.1:6379> get guoph2
"2"
127.0.0.1:6379> watch guoph2
OK
127.0.0.1:6379> multi
OK
127.0.0.1:6379> set guoph2 3
QUEUED
# EXEC is run after client 2 has issued its SET, so this transaction fails; a normal run would return OK
127.0.0.1:6379> exec
(nil)

Client 2 session

127.0.0.1:6379> get guoph2
"2"
# After client 1 has run WATCH, MULTI, and SET, update the value of the key
127.0.0.1:6379> set guoph2 1
OK
127.0.0.1:6379>

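Combining WATCH with MULTI/EXEC lets a client find out when a watched key has been modified by another client, so that it can retry the operation.

Why Redis does not implement typical locking

When data is accessed for the purpose of writing (SELECT FOR UPDATE in SQL), a relational database locks the rows being accessed until the transaction finishes with COMMIT or ROLLBACK. If another client tries to access the same rows for writing, it is blocked until the first transaction completes. This form of locking works well in practice (essentially every relational database implements it), but it can leave clients waiting a long time to acquire a lock.

Because of this potential for long waits, and because Redis is designed to minimize the time clients spend waiting, Redis does not lock data during WATCH. Instead, Redis notifies the client if another client modified the data first; this is called optimistic locking. The locking performed by relational databases can be regarded as pessimistic locking. Optimistic locking is also widely used, because clients never wait for the first lock holder to release the lock; they simply keep retrying.

Networking

The ping command

The ping command tests connectivity between two hosts using echo packets of ICMP (Internet Control Message Protocol). When an echo packet is sent to a host and reaches it, the host returns a reply message; if there is no route to the target host, or delivery fails for some other reason, ping fails.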

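# -c n: limit the number of packets sent
# icmp_seq: ICMP sequence number
# ttl: time to live, the number of router hops remaining; if it reaches 0 before the packet arrives at the target host, the packet is dropped
# time: RTT (round-trip time) of the packet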
guo@DESKTOP-4L69AND:/mnt/e/learning-dir/shell-learning$ ping baidu.com -c 5
PING baidu.com (220.181.38.148) 56(84) bytes of data.
64 bytes from 220.181.38.148 (220.181.38.148): icmp_seq=1 ttl=52 time=35.3 ms
64 bytes from 220.181.38.148 (220.181.38.148): icmp_seq=2 ttl=52 time=35.4 ms
64 bytes from 220.181.38.148 (220.181.38.148): icmp_seq=3 ttl=52 time=35.4 ms
64 bytes from 220.181.38.148 (220.181.38.148): icmp_seq=4 ttl=52 time=35.8 ms
64 bytes from 220.181.38.148 (220.181.38.148): icmp_seq=5 ttl=52 time=35.8 ms

--- baidu.com ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 5053ms
rtt min/avg/max/mdev = 35.322/35.588/35.876/0.331 ms

If ping succeeds and the target host is reachable, the exit status is 0; otherwise it is nonzero and the target host is unreachable.
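
The exit status only says whether the host answered; packet loss and latency have to be read from the summary lines of the sample output above. The following is a minimal sketch that extracts them with awk. parse_ping_summary is a hypothetical helper name, and the parsing assumes the summary format shown in the sample output (other ping implementations may print it differently).

```bash
#!/usr/bin/env bash
# parse_ping_summary (hypothetical helper): read ping output on stdin and
# print the packet loss percentage and the average round-trip time.
parse_ping_summary() {
  awk '
    /packets transmitted/ { for (i = 1; i <= NF; i++) if ($i ~ /%$/) loss = $i }
    /^rtt/ { split($4, t, "/"); avg = t[2] }
    END { printf "loss=%s avg_ms=%s\n", loss, avg }
  '
}

# Feed it the two summary lines from the sample run above.
parse_ping_summary <<'EOF'
5 packets transmitted, 5 received, 0% packet loss, time 5053ms
rtt min/avg/max/mdev = 35.322/35.588/35.876/0.331 ms
EOF
# prints: loss=0% avg_ms=35.588
```

Against a live host it could be used as: ping baidu.com -c 5 | parse_ping_summary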

Test all reachable hosts on the local network:

#!/bin/bash

for ip in 192.168.31.{1..255}
do
echo "testing $ip"
ping $ip -c 1 &> /dev/null ;
if [ $? -eq 0 ];then
echo $ip is alive
fi
done

Creating sockets

To create sockets for transferring data over a TCP/IP network, you can use the netcat (nc) command.

Basic usage

  1. First, create a socket listening on local port 1234
# -l: listen for an incoming connection on a local port instead of initiating a connection to a remote host
ph@guo-lenovo:~$ nc -l 1234
  2. Connect to the listening socket from the local host or from another host
ph@guo-lenovo:~$ nc localhost 1234
  3. Type a message on either end and press Enter; the message appears on the other end, completing one exchange
ph@guo-lenovo:~$ nc -l 1234
hello
a


ph@guo-lenovo:~$ nc localhost 1234
hello
a

Copying files quickly

This mainly relies on shell redirection.

  1. Listening side
nc -l 1234 > nc_redirect_out
  2. Sending side
nc 192.168.31.188 1234 < yihuo.sh
  3. File contents
ph@guo-lenovo:~$ cat nc_redirect_out
#!/bin/bash

orig=(01101101 01101001 01100100 01101110 01101001 01100111 01101000 01110100)
key=(01001101 01001101 01010100 01111110 01101111 01100001 01000000 00010100)

for i in "${!orig[@]}";do
o=$(echo -n $((2#${orig[$i]})))
k=$(echo -n $((2#${key[$i]})))
echo $(($o ^ $k)) | xargs -n 1 | while read dec; do echo "ibase=10;obase=2;$dec" | bc | tr "\n" " " | sed 's/^/0/g'; done

done

Setting up a bridge

image-20211114121038694

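This mainly uses the ip link command. The host has two network interfaces: eth0 is configured and connected to the 192.168.1.0 subnet, and eth1 is connected to the 10.0.0.0 subnet through a bridge, which is given the IP address 10.0.0.2. Once an Ethernet adapter has joined a bridge, the adapter itself is no longer assigned an IP address; the address is configured on the bridge instead.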

# Create a bridge named br0
ip link add br0 type bridge

# Add the Ethernet adapter to the bridge
ip link set dev eth1 master br0

# Configure the bridge's IP address
ifconfig br0 10.0.0.2

# Enable packet forwarding
echo 1 > /proc/sys/net/ipv4/ip_forward

# On hosts in 10.0.0.0/24, add a route to the 192.168.1.0/24 subnet
route add -net 192.168.1.0/24 gw 10.0.0.2

# On hosts in 192.168.1.0/24, add a route to the 10.0.0.0/24 subnet
route add -net 10.0.0.0/24 gw 192.168.1.2

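Internet connection sharing

Internet connection sharing also relies on packet forwarding. As in the bridge section above, the host acts as a router and can provide a firewall as well as connection sharing. Suppose the wired interface eth0 is connected to the Internet: you can create a wireless connection and a shared wireless network so that other devices connect to it, and their packets pass through this virtual router and are finally forwarded to the Internet.

Building a simple firewall with iptables

The main purpose of a firewall is to filter out and block unwanted network traffic while letting normal traffic through.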

  • Block traffic to a specific IP address
# -A: append a rule to a chain (a chain is a set of rules); appending to the OUTPUT chain controls all outgoing traffic
# -I: insert in chain as rulenum (default 1 = first)
# -d: match the packet's destination address
# -j: the action iptables takes: DROP, ACCEPT, REJECT
iptables -A OUTPUT -d 8.8.8.8 -j DROP
  • Block traffic to a specific port
# -p: restrict the rule to a protocol, here TCP
# --dport: the destination port
# This effectively blocks all outgoing FTP traffic
iptables -A OUTPUT -p tcp --dport 21 -j DROP
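  • Block specific incoming traffic
# -I: insert the new rule at the beginning of the rule set
# INPUT: the INPUT chain, which controls all incoming traffic
# -s: match the packet's source address
iptables -I INPUT -s 1.2.3.4 -j DROP

# --flush: remove all the rules that were added to the iptables chains
iptables --flush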

Wicked Cool Shell Scripts

Formatting long lines

You can use the fmt command directly, or use the script below, which relies on the nroff command.

#!/bin/bash
# fmt: used to format text lines
# -w width: set the line width
# -h: enable hyphenation

while getopts "hw:" opt; do
    case "${opt}" in
    h) hype=1 ;;
    w) width="$OPTARG" ;;
    *) echo "ignore unknown argument: $opt" ;;
    esac
done

shift $(($OPTIND - 1))

# the line length defaults to 72 when -w is not given
nroff <<EOF
.ll ${width:-72}
.na
.hy ${hype:-0}
.pl 1
$(cat "$@")
EOF

exit 0
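
For comparison, the plain fmt route mentioned at the start of this section can be tried directly. This is a minimal sketch; wrap_text is a hypothetical wrapper name and the sample sentence is made up. It only assumes an fmt that accepts -w, and no particular line breaks are promised beyond the width limit.

```bash
#!/usr/bin/env bash
# wrap_text (hypothetical helper): reflow stdin so that no output line
# is wider than the given width (72 columns when no width is given).
wrap_text() {
  local width=${1:-72}
  fmt -w "$width"
}

# Reflow one long line to at most 30 columns.
printf '%s\n' "The quick brown fox jumps over the lazy dog and keeps running" | wrap_text 30
```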

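Displaying directory contents

The script prints, for each entry in a directory, either the file's size in blocks (1024 bytes per block) or, for a subdirectory, the number of entries it contains.

Before anything is printed, the following steps arrange the output as two aligned columns per line:

  1. Use the sed stream editor to join each pair of output lines into one, separated by a \0 (null) character rather than a space, since names may themselves contain spaces (see the comments after the script for details)
  2. Use the awk command to format the output

sed really is a fascinating command!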

#!/usr/bin/env bash

if [[ $# -gt 1 ]]; then
echo "Usage: $0 [dirname]"
exit 1
elif [[ $# -eq 1 ]]; then
cd "$@"
# if cd command failed because it is not a valid path
if [[ $? -ne 0 ]]; then
exit 1
fi
fi

# shell will extend '*' as all files and directories under the current directory
for file in *; do
if [[ -d "$file" ]]; then
size=$(ls "$file" | wc -l | sed 's/[^[:digit:]]//g')
echo "$file ($size entr(y|ies))"
else
size=$(ls -sk "$file" | awk '{print $1}')
echo "$file (${size}KB)"
fi
done | \
sed -n \
-e '$!N' \
-E -e 's/\n/\x0/g' \
-e 'p' | \
awk -F "\0" '{ printf "%-39s %-39s\n",$1,$2 }'

### a better output format

## step 1: use sed to make things easy and right
# -n disable automatic printing
# '$!N' append the next line of input to the pattern space, so sed's pattern space now holds two lines
# 's/\n/\x0/g' replace '\n' in the pattern space, turning two lines into one line separated by '\0'
# 'p' print the content of the pattern space

### step 2: awk final output
# make '\0' the field delimiter and format the output
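
The line-pairing step described in these comments can be tried in isolation. This is a minimal sketch, not the script's exact invocation: pair_lines is a hypothetical name, it uses a visible | separator instead of \0, and it relies on sed's automatic printing rather than -n with an explicit p.

```bash
#!/usr/bin/env bash
# pair_lines (hypothetical helper): join every two input lines into one,
# separated by "|". On the last line, $!N skips the N command, so a
# trailing unpaired line is printed on its own.
pair_lines() {
  sed '$!N;s/\n/|/'
}

printf '%s\n' a b c d e | pair_lines
# prints:
# a|b
# c|d
# e
```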

Test result:

guo@DESKTOP-4L69AND:/mnt/e/learning-dir/shell-learning/scripts$ sudo ./formatDir.sh /
bin (171 entr(y|ies)) boot (0 entr(y|ies))
dev (210 entr(y|ies)) dump-new.rdb (8KB)
etc (189 entr(y|ies)) home (1 entr(y|ies))
init (620KB) lib (22 entr(y|ies))
lib64 (1 entr(y|ies)) media (0 entr(y|ies))
mnt (8 entr(y|ies)) opt (6 entr(y|ies))
proc (46 entr(y|ies)) root (0 entr(y|ies))
run (11 entr(y|ies)) sbin (220 entr(y|ies))
snap (0 entr(y|ies)) srv (0 entr(y|ies))
sys (10 entr(y|ies)) tmp (79 entr(y|ies))
usr (8 entr(y|ies)) var (13 entr(y|ies))
zookeeper_server.pid (0KB)

Reminder tools

Adding a reminder

#!/usr/bin/env bash
# A simple command-line reminder tool

remindFile="$HOME/.remind"

if [ $# -eq 0 ]; then
    # Ctrl-D terminates input (or exits the shell)
    echo "enter note and end with ctrl-D:"
    cat - >>"$remindFile"
else
    echo "$@" >>"$remindFile"
fi

exit 0

Querying reminders

#!/usr/bin/env bash
# remindMe: view reminders

remindFile="$HOME/.remind"

if [ ! -f "$remindFile" ]; then
    echo "$0: You don't have a .remind file under your user home" >&2
    exit 1
fi

if [ $# -eq 0 ]; then
    more "$remindFile"
else
    # ${PAGER:-more} falls back to more when PAGER is unset or empty;
    # -- stops grep from parsing the arguments after it as options, which makes the call safer
    grep -i -- "$@" "$remindFile" | ${PAGER:-more}
fi

exit 0
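
Why the -- before the search term matters is easiest to see with a term that begins with a dash. This is a minimal sketch; search_notes and the sample notes are made up for illustration and are not part of the original script.

```bash
#!/usr/bin/env bash
# search_notes (hypothetical helper): case-insensitive search of a notes file.
# The -- marks the end of grep's options, so a pattern such as "-V"
# is treated as text to search for and not as a grep flag.
search_notes() {
  local file=$1 pattern=$2
  grep -i -- "$pattern" "$file"
}

notes=$(mktemp)
printf '%s\n' '-v means verbose' 'buy milk' 'Call Bob' >"$notes"
search_notes "$notes" '-V'   # prints: -v means verbose
search_notes "$notes" 'bob'  # prints: Call Bob
rm -f "$notes"
```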

An interactive wrapper

#!/usr/bin/env bash
# frontBc: an interactive calculator; I include it only as a reference for how to write a simple interactive wrapper

function show_help() {
cat <<EOF
This is a help documentation, supported commands are as follows:
(1) del del an item
(2) add add a new item
(3) mov move a item
EOF
}

echo "enter 'help' for help, 'quit' to quit"
echo -n "calc> "

while read command args; do
case $command in
quit | exit | bye) exit 0 ;;
help | \?) show_help ;;
*) echo "$command $args" ;;
esac
echo -n "calc> "
done

echo ""
exit 0
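
One detail of the dispatch loop is worth checking on its own: in a case pattern an unquoted ? matches any single character, so a literal question mark has to be escaped. This is a minimal sketch; classify is a hypothetical name used only for this demonstration.

```bash
#!/usr/bin/env bash
# classify (hypothetical helper): report which branch a command word selects.
# \? matches only a literal question mark; an unescaped ? would also
# match every other one-character command such as "x".
classify() {
  case $1 in
    quit | exit | bye) echo "quit" ;;
    help | \?) echo "help" ;;
    *) echo "other" ;;
  esac
}

classify '?'   # prints: help
classify x     # prints: other
classify bye   # prints: quit
```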

Image hosting test

Description

image-20201128011943682

What a hassle: Redis ended up being used for the image deduplication check…

image-20201128012358249

CSV2Table

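Converting CSV to a table (Markdown table format supported)

I wrote a script that converts a CSV file into a table on stdout. It can also output a Markdown table, and it accounts for the different display widths of Chinese and English characters.

  • The shell uses UTF-8 encoding
  • Only ASCII characters and 3-byte characters are supported (other characters may take 2 or 4 bytes in UTF-8 and are not handled; a 3-byte character whose display width is not twice that of an ASCII character is not handled either)
  • Header fields can be colored (yellow) with the -c option
  • Markdown table format is available with the -m option
  • The separator defaults to , and can be set with -d "delimiter"; the first character of delimiter is used as the separator

A correctly formatted table is printed only when the conditions above are met.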
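
The width arithmetic behind these restrictions can be checked by hand. This is a minimal sketch under the same assumptions as the script (only ASCII and 3-byte characters, each 3-byte character two columns wide, and a printf that counts field width in bytes, as the script's comments state). display_width and printf_width are hypothetical names; the byte and character counts are passed in directly instead of being measured with wc.

```bash
#!/usr/bin/env bash
# display_width BYTES CHARS (hypothetical helper)
# With x wide characters and y ASCII characters:
#   3x + y = bytes, x + y = chars  =>  x = (bytes - chars) / 2
# The display width is 2x + y, which equals (bytes + chars) / 2.
display_width() {
  local bytes=$1 chars=$2
  local wide=$(((bytes - chars) / 2))
  local narrow=$((chars - wide))
  echo $((2 * wide + narrow))
}

# printf_width BYTES CHARS COLUMN_WIDTH (hypothetical helper)
# Field width to hand to a byte-counting printf so that the cell
# occupies COLUMN_WIDTH display columns.
printf_width() {
  local bytes=$1 chars=$2 col=$3
  echo $((col + (bytes - chars) / 2))
}

# A cell with 2 Chinese characters and 3 ASCII letters: 9 bytes, 5 characters.
display_width 9 5     # prints: 7
printf_width 9 5 10   # prints: 12
```

With a field width of 12, a byte-counting printf pads the 9-byte cell with 3 spaces, so it shows as 7 + 3 = 10 columns.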

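To be optimized

  • Processing time grows noticeably as the number of rows increases, because the byte count and character count are computed for the value in every cell in order to work out how many padding spaces to print
  • Only ASCII characters and 3-byte characters whose display width is twice that of an ASCII character are handled; other cases are not yet handled correctly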
# SYNOPSIS 
./toTable.sh -m -v -c -p -n 50 -d "," file

Example

Sample text:

$ cat sep
A,B,C
1,郭,2,C
我是人间一朵花,你是人间一头牛
1,郭,2,C
我是人间一朵花,你是人间一头牛,lovedthe ywat uodjsadsandsasdsnadmsmada,dsad

Sample output:

image-20210911205933449

#!/bin/bash

### rule: always use `LF` as the end of line in linux or unexpected things will drive you crazy
### csv file to md table or normal table
### only ascii and chinese characters are allowed in the file, and utf-8 is assumed as the charset; otherwise the table is not displayed properly

oldIFS=$IFS

function ExitFunc() {
IFS=$oldIFS
}

function printSymbolLie() {
sep=$1
mid=$2
shift 2
echo -n "$sep"
for count in "$@"; do
for ((m = -2; m < $count; m++)); do
echo -n "$mid"
done
echo -n "$sep"
done
echo
}

trap 'ExitFunc' 2 9 15 20 EXIT

isMD=0
isColor=0
IFS=,
while getopts 'cmd:' OPT; do
case "$OPT" in
c)
isColor=1
;;
m)
isMD=1
;;
d)
IFS=$(echo -ne "$OPTARG")
;;
?)
echo "available options: [-c] [-m] [-d delimiter]" >&2 ## standard error
exit 1
;;
esac
done
shift "$(($OPTIND - 1))"

echo "delimiter has been set to [$IFS]"
echo -n "$IFS" | hexdump -C
echo

for item in $@; do
if [ ! -f "$item" ]; then
echo "file ${item} not exists"
continue
fi

echo "file name: $item"

# array containing all lines of the file
line_arr=()
col_count=0
row_count=0
while read line; do
line_arr[$row_count]=$line
arr=($line)
temp_count=${#arr[@]}
if ((temp_count > col_count)); then
col_count=$temp_count
fi
((row_count++))
done <$item
echo "table column size: ${col_count}, row size: ${row_count}"

# array contains max length of every col
max_count_arr=()
for ((i = 0; i < ${row_count}; i++)); do
line=${line_arr[i]}
arr=($line)

# echo "line: $line ${arr[@]}"
for ((j = 0; j < $col_count; j++)); do
max_length=${max_count_arr[j]}
col_str=${arr[j]}

# wc output: always in the following order: newline, word, character, byte, maximum line length.
bytes_chars=($(echo -n $col_str | wc -c -m | xargs -n 1 echo -n ","))
bytes=${bytes_chars[2]}
chars=${bytes_chars[1]}

if ((bytes > chars)); then
current_length=$(((bytes + chars) / 2))
else
current_length=$chars
fi

if ((current_length > max_length)); then
max_count_arr[j]=$current_length
fi
done
done

echo -e "max width of every col: ${max_count_arr[@]}\n"

for ((i = 0; i < ${row_count}; i++)); do
line=${line_arr[i]}
arr=($line)

if [[ $isMD -eq 1 && $i -eq 1 ]]; then
printSymbolLie "|" "-" "${max_count_arr[@]}"
fi

# print the table head line +-------+-------+
if [ $isMD -eq 0 ]; then
if [[ $i -eq 0 || $i -eq 1 ]]; then
printSymbolLie "+" "-" "${max_count_arr[@]}"
fi
fi

for ((j = 0; j < $col_count; j++)); do
bytes_chars=($(echo -n ${arr[j]} | wc -c -m | xargs -n 1 echo -n ","))
bytes=${bytes_chars[2]}
chars=${bytes_chars[1]}

# let's assume x: Chinese char count; y:ascii char count; default charset is utf-8
# as for display width: one Chinese character = ascii * 2
#
# (1) x*3 + y = bytes (use wc)
# (2) x + y = chars (use wc)
# (3) 2*x + y + z = max_width (we have calculated it before, z is the empty char count)
#
# printf command will count bytes as min size: printf "%-10s" "hello" 10 means bytes
# so finally, the printf min size for current col is:
# 3*x + y + z = max_col_width + (bytes - chars) / 2 (calculated by (1) (2) (3) above)

if ((j == 0)); then echo -n "|"; fi
max_col_width=${max_count_arr[j]}
min_print=$((max_col_width + (bytes - chars) / 2))

if ((i == 0 && isColor == 1)); then
printf " \033[1;33m%-${min_print}s\033[00m |" ${arr[j]}
else
printf " %-${min_print}s |" ${arr[j]}
fi
done
echo
done

# print the end line +-------+-------+
if [ $isMD -eq 0 ]; then
printSymbolLie "+" "-" "${max_count_arr[@]}"
fi
done

IFS=$oldIFS

Computing each column's maximum width in parallel (column granularity)

#!/bin/bash

### rule: always use `LF` as the end of line in linux or unexpected things will drive you crazy
### csv file to md table or normal table
### only ascii and chinese characters are allowed in the file, and utf-8 is assumed as the charset; otherwise the table is not displayed properly

oldIFS=$IFS

function log() {
echo -e "[$(date +"%F %T")]: $@"
}

function ExitFunc() {
IFS=$oldIFS
}

function calLength() {
local line=$1
local start=$2
local end=$3
local arr=($line)
local result=""

while ((start < end)); do
local col_str=${arr[start]}
# wc output: always in the following order: newline, word, character, byte, maximum line length.
local bytes_chars=($(echo -n $col_str | wc -c -m | xargs -n 1 echo -n ","))
local bytes=${bytes_chars[2]}
local chars=${bytes_chars[1]}
if ((bytes > chars)); then
result="${result}$(((bytes + chars) / 2)),"
else
result="${result}$chars,"
fi
((start++))
done

echo "$result" >"col$2.txt"

return 0
}

function printSymbolLie() {
sep=$1
mid=$2
shift 2

echo -n "$sep"
for count in "$@"; do
for ((m = -2; m < $count; m++)); do
echo -n "$mid"
done
echo -n "$sep"
done
echo
}

trap 'ExitFunc' 2 9 15 20 EXIT

isMD=0
isColor=0
IFS=,
while getopts 'cmd:' OPT; do
case "$OPT" in
c)
isColor=1
;;
m)
isMD=1
;;
d)
IFS=$(echo -ne "$OPTARG")
;;
?)
echo "available options: [-c] [-m] [-d delimiter]" >&2 ## standard error
exit 1
;;
esac
done
shift "$(($OPTIND - 1))"

log "delimiter has been set to [$IFS]"
echo -n "$IFS" | hexdump -C
echo

for item in $@; do
if [ ! -f "$item" ]; then
echo "file ${item} not exists"
continue
fi

log "file name: $item"

# array containing all lines of the file
line_arr=()
col_count=0
row_count=0
while read line; do
line_arr[$row_count]=$line
arr=($line)
temp_count=${#arr[@]}
if ((temp_count > col_count)); then
col_count=$temp_count
fi
((row_count++))
done <$item
log "table column size: ${col_count}, row size: ${row_count}"

# array contains max length of every col
max_count_arr=()
for ((i = 0; i < ${row_count}; i++)); do
line=${line_arr[i]}
arr=($line)

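# a background job cannot return a value into a variable, so each job writes its result to a file and the results are concatenated afterwards
# an alternative is to parallelize by row; the current approach parallelizes by column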
files=""
for((l=0;l<$col_count;l++)); do
(calLength "$line" "$l" "$((l+1))") &
files="${files}col${l}.txt "
done
wait

result=$(echo $files | xargs cat)
# echo "command result: $result"
col_length_arr=($result)
for ((j = 0; j < $col_count; j++)); do
max_length=${max_count_arr[j]}
current_length=${col_length_arr[j]}

if ((current_length > max_length)); then
max_count_arr[j]=$current_length
fi
done
done

echo $files | xargs rm


# log "max width of every col: ${max_count_arr[@]}\n"

# for ((i = 0; i < ${row_count}; i++)); do
i=0
while read line; do
# line=${line_arr[i]}
arr=($line)

# echo "line:######### $line"
if [[ $isMD -eq 1 && $i -eq 1 ]]; then
printSymbolLie "|" "-" "${max_count_arr[@]}"
fi

# print the table head line +-------+-------+
if [[ $isMD -eq 0 && ($i -eq 0 || $i -eq 1) ]]; then
printSymbolLie "+" "-" "${max_count_arr[@]}"
fi

for ((j = 0; j < $col_count; j++)); do
bytes_chars=($(echo -n ${arr[j]} | wc -c -m | xargs -n 1 echo -n ","))
bytes=${bytes_chars[2]}
chars=${bytes_chars[1]}

# let's assume x: Chinese char count; y:ascii char count; default charset is utf-8
# as for display width: one Chinese character = ascii * 2
#
# (1) x*3 + y = bytes (use wc)
# (2) x + y = chars (use wc)
# (3) x*2 + y + z = max_width (we have calculated it before, z is the empty char count)
#
# printf command will count bytes as min size: printf "%-10s" "hello" 10 means bytes
# so finally, the printf min size for current col is:
# 3*x + y + z = max_col_width + (bytes - chars) / 2 (calculated by (1) (2) (3) above)

if ((j == 0)); then echo -n "|"; fi
max_col_width=${max_count_arr[j]}
min_print=$((max_col_width + (bytes - chars) / 2))
# z=$((max_col_width - (bytes + chars) / 2))

if ((i == 0 && isColor == 1)); then
printf " \033[1;33m%-${min_print}s\033[00m |" ${arr[j]}
else
printf " %-${min_print}s |" ${arr[j]}
fi
done
echo
((i++))
done <$item

# print the end line +-------+-------+
if [ $isMD -eq 0 ]; then
printSymbolLie "+" "-" "${max_count_arr[@]}"
fi
echo
done

IFS=$oldIFS
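
The pattern this script relies on (start background jobs, have each write to its own file, wait, then concatenate) can be shown in a few lines. This is a minimal sketch, not the script's own code: parallel_lengths is a hypothetical name, the results go to a temporary directory created with mktemp instead of the current directory, and the work per job is just measuring a string's length.

```bash
#!/usr/bin/env bash
# parallel_lengths (hypothetical helper): compute the length of each argument
# in a separate background job and print the lengths in argument order.
parallel_lengths() {
  local tmpdir word i=0 j
  tmpdir=$(mktemp -d)
  for word in "$@"; do
    # a background subshell cannot assign to a variable of the parent shell,
    # so each job writes its result to its own numbered file
    (printf '%s' "${#word}" >"$tmpdir/$i") &
    i=$((i + 1))
  done
  wait # block until every background job has finished

  local results=()
  for ((j = 0; j < i; j++)); do
    results+=("$(cat "$tmpdir/$j")")
  done
  rm -rf "$tmpdir"
  echo "${results[*]}"
}

parallel_lengths a bb ccc   # prints: 1 2 3
```

Numbering the files by argument position keeps the results in order even though the jobs may finish in any order.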

Column and row parallelism

#!/bin/bash

### rule: always use `LF` as the end of line in linux or unexpected things will drive you crazy
### csv file to md table or normal table
### only ascii and chinese characters are allowed in the file, and utf-8 is assumed as the charset; otherwise the table is not displayed properly

oldIFS=$IFS

function log() {
echo -e "[$(date +"%F %T")]: $@"
}

function ExitFunc() {
IFS=$oldIFS
}

function calLength() {
local line=$1
local start=$2
local end=$3
local arr=($line)
local result=""

while ((start < end)); do
local col_str=${arr[start]}
# wc output: always in the following order: newline, word, character, byte, maximum line length.
local bytes_chars=($(echo -n $col_str | wc -c -m | xargs -n 1 echo -n "$IFS"))
local bytes=${bytes_chars[2]}
local chars=${bytes_chars[1]}
if ((bytes > chars)); then
result="${result}$(((bytes + chars) / 2)),"
else
result="${result}$chars,"
fi
((start++))
done

echo "$result" >"col$2.txt"

return 0
}

function calLength2() {
local line=$1
local start=$2
local end=$3
local row=$4
local arr=($line)
local result=""

while ((start < end)); do
local col_str=${arr[start]}
# wc output: always in the following order: newline, word, character, byte, maximum line length.
local bytes_chars=($(echo -n $col_str | wc -c -m | xargs -n 1 echo -n "$IFS"))
local bytes=${bytes_chars[2]}
local chars=${bytes_chars[1]}
if ((bytes > chars)); then
result="${result}$(((bytes + chars) / 2)),"
else
result="${result}$chars,"
fi
((start++))
done

echo "$result" >"row${row}_col$2.txt"

return 0
}

function rowCal() {
local line_start_num=$1
local col_count=$2

shift 2
local max_count_arr=()
for line in "$@"; do
local arr=($line)
local files=""
for ((l = 0; l < $col_count; l++)); do
(calLength2 "$line" "$l" "$((l + 1))" "${line_start_num}") &
files="${files}row${line_start_num}_col${l}.txt "
done
wait

local result=$(echo $files | xargs cat)
# echo "command result: $result"
local col_length_arr=($result)
for ((j = 0; j < $col_count; j++)); do
local max_length=${max_count_arr[j]}
local current_length=${col_length_arr[j]}

if ((current_length > max_length)); then
max_count_arr[j]=$current_length
fi
done
done

echo $files | xargs rm &

local final=""
for count in "${max_count_arr[@]}"; do
final="${final}${count},"
done

echo -n "$final" >"row${line_start_num}"
}

function printSymbolLie() {
sep=$1
mid=$2
shift 2

echo -n "$sep"
for count in "$@"; do
for ((m = -2; m < $count; m++)); do
echo -n "$mid"
done
echo -n "$sep"
done
echo
}

trap 'ExitFunc' 2 9 15 20 EXIT

isMD=0
isColor=0
IFS=,
parrel=4
while getopts 'cmp:d:' OPT; do
case "$OPT" in
c)
isColor=1
;;
m)
isMD=1
;;
p)
parrel=$OPTARG
;;
d)
IFS=$(echo -ne "$OPTARG")
;;
?)
echo "available options: [-c] [-m] [-p parallel_size] [-d delimiter]" >&2 ## standard error
exit 1
;;
esac
done
shift "$(($OPTIND - 1))"

log "delimiter has been set to [$IFS]"
echo -n "$IFS" | hexdump -C
echo

for item in $@; do
if [ ! -f "$item" ]; then
echo "file ${item} not exists"
continue
fi

log "file name: $item"

##########################################################
# array containing all lines of the file
line_arr=()
col_count=0
row_count=0
while read line; do
line_arr[$row_count]=$line
arr=($line)
temp_count=${#arr[@]}
if ((temp_count > col_count)); then
col_count=$temp_count
fi
((row_count++))
done <$item
log "table column size: ${col_count}, row size: ${row_count}"

##########################################################
row_files=""
if ((row_count > parrel)); then
per_size=$((row_count / parrel))
for ((r = 0; r < $parrel; r++)); do
start_index=$((r * per_size))
if ((r == parrel - 1)); then
length=$((row_count - (r*per_size)))
else
length=$per_size
fi
(rowCal "${start_index}" $col_count "${line_arr[@]:${start_index}:${length}}") &
row_files="${row_files}row${start_index}$IFS"
done
else
(rowCal 0 $col_count "${line_arr[@]:0:${row_count}}") &
row_files="row0"
fi
wait

# array contains max length of every col
max_count_arr=()
for row_cal_file in $row_files; do
result=$(cat $row_cal_file)
col_length_arr=($result)

for ((j = 0; j < $col_count; j++)); do
max_length=${max_count_arr[j]}
current_length=${col_length_arr[j]}

if ((current_length > max_length)); then
max_count_arr[j]=$current_length
fi
done
rm $row_cal_file &
done
log "max width of every col cal \n"

##########################################################
# for ((i = 0; i < ${row_count}; i++)); do
i=0
while read line; do
# line=${line_arr[i]}
arr=($line)

# echo "line:######### $line"
if [[ $isMD -eq 1 && $i -eq 1 ]]; then
printSymbolLie "|" "-" "${max_count_arr[@]}"
fi

# print the table head line +-------+-------+
if [[ $isMD -eq 0 && ($i -eq 0 || $i -eq 1) ]]; then
printSymbolLie "+" "-" "${max_count_arr[@]}"
fi

for ((j = 0; j < $col_count; j++)); do
bytes_chars=($(echo -n ${arr[j]} | wc -c -m | xargs -n 1 echo -n "$IFS"))
bytes=${bytes_chars[2]}
chars=${bytes_chars[1]}

# let's assume x: Chinese char count; y:ascii char count; default charset is utf-8
# as for display width: one Chinese character = ascii * 2
#
# (1) x*3 + y = bytes (use wc)
# (2) x + y = chars (use wc)
# (3) x*2 + y + z = max_width (we have calculated it before, z is the empty char count)
#
# printf command will count bytes as min size: printf "%-10s" "hello" 10 means bytes
# so finally, the printf min size for current col is:
# 3*x + y + z = max_col_width + (bytes - chars) / 2 (calculated by (1) (2) (3) above)

if ((j == 0)); then echo -n "|"; fi
max_col_width=${max_count_arr[j]}
min_print=$((max_col_width + (bytes - chars) / 2))
# z=$((max_col_width - (bytes + chars) / 2))

if ((i == 0 && isColor == 1)); then
printf " \033[1;33m%-${min_print}s\033[00m |" ${arr[j]}
else
printf " %-${min_print}s |" ${arr[j]}
fi

done

echo
((i++))
done <$item

# print the end line +-------+-------+
if [ $isMD -eq 0 ]; then
printSymbolLie "+" "-" "${max_count_arr[@]}"
fi
echo
done
IFS=$oldIFS

Batched output

#!/bin/bash

### rule: always use `LF` as the end of line in linux or unexpected things will drive you crazy
### csv file to md table or normal table
### only ascii and chinese characters are allowed in the file, and utf-8 is assumed as the charset; otherwise the table is not displayed properly

oldIFS=$IFS
isVerbose=0

function debug() {
if ((isVerbose == 1)); then
echo -e "[$(date +"%F %T")]: $@"
fi
}

function info() {
echo -e "$@"
}

function ExitFunc() {
IFS=$oldIFS
}

function calLength2() {
local line=$1
local start=$2
local end=$3
local row=$4
local arr=($line)
local result=""

while ((start < end)); do
local col_str=${arr[start]}
# wc output: always in the following order: newline, word, character, byte, maximum line length.
local bytes_chars=($(echo -n $col_str | wc -c -m | xargs -n 1 echo -n "$IFS"))
local bytes=${bytes_chars[2]}
local chars=${bytes_chars[1]}
if ((bytes > chars)); then
result="${result}$(((bytes + chars) / 2)),"
else
result="${result}$chars,"
fi
((start++))
done

echo "$result" >"row${row}_col$2.txt"

return 0
}

function rowCal() {
local line_start_num=$1
local col_count=$2

shift 2
local max_count_arr=()
for line in "$@"; do

debug "line: $line"

local arr=($line)
local files=""
for ((l = 0; l < $col_count; l++)); do
(calLength2 "$line" "$l" "$((l + 1))" "${line_start_num}") &
files="${files}row${line_start_num}_col${l}.txt "
done
wait

local result=$(echo $files | xargs cat)
# echo "command result: $result"
local col_length_arr=($result)
for ((j = 0; j < $col_count; j++)); do
local max_length=${max_count_arr[j]}
local current_length=${col_length_arr[j]}

if ((current_length > max_length)); then
max_count_arr[j]=$current_length
fi
done
done

echo $files | xargs rm &

local final=""
for count in "${max_count_arr[@]}"; do
final="${final}${count},"
done

echo -n "$final" >"row${line_start_num}"
}

function printSymbolLie() {
sep=$1
mid=$2
shift 2

echo -n "$sep"
for count in "$@"; do
for ((m = -2; m < $count; m++)); do
echo -n "$mid"
done
echo -n "$sep"
done
}

trap 'ExitFunc' 2 9 15 20 EXIT

isMD=0
isColor=0
isParrel=0
IFS=,
parrel=4

start_seconds=$(date +"%s")

while getopts 'cmvpn:d:' OPT; do
case "$OPT" in
c)
isColor=1
;;
m)
isMD=1
;;
p)
isParrel=1
;;
v)
isVerbose=1
;;
n)
parrel=$OPTARG
;;
d)
IFS=$(echo -ne "$OPTARG")
;;
?)
echo "available options: [-c] [-v] [-m] [-p] [-n parrel_size] [-d delimiter]" >&2 ## standard error
exit 1
;;
esac
done
shift "$(($OPTIND - 1))"

debug "delimiter has been set to [$IFS]"
if ((isVerbose == 1)); then echo -n "$IFS" | hexdump -C; fi
echo

for item in $@; do
if [ ! -f "$item" ]; then
echo "file ${item} not exists"
continue
fi

debug "file name: $item"

##########################################################
# array containing all lines of the file
line_arr=()
col_count=0
row_count=0
while read line; do
line_arr[$row_count]=$line
arr=($line)
temp_count=${#arr[@]}
if ((temp_count > col_count)); then
col_count=$temp_count
fi
((row_count++))
done <$item
debug "table column size: ${col_count}, row size: ${row_count}"

##########################################################
row_files=""
if ((isParrel == 1 && row_count > parrel)); then
per_size=$((row_count / parrel))
for ((r = 0; r < $parrel; r++)); do
start_index=$((r * per_size))
if ((r == parrel - 1)); then
length=$((row_count - (r * per_size)))
else
length=$per_size
fi
(rowCal "${start_index}" $col_count "${line_arr[@]:${start_index}:${length}}") &
row_files="${row_files}row${start_index}$IFS"
done
else
debug "no parrel function call:" 0 $col_count "${line_arr[@]:0:${row_count}}"
(rowCal 0 $col_count "${line_arr[@]:0:${row_count}}") &
row_files="row0"
fi
wait

# array contains max length of every col
max_count_arr=()
for row_cal_file in $row_files; do
result=$(cat $row_cal_file)
col_length_arr=($result)

for ((j = 0; j < $col_count; j++)); do
max_length=${max_count_arr[j]}
current_length=${col_length_arr[j]}

if ((current_length > max_length)); then
max_count_arr[j]=$current_length
fi
done
rm $row_cal_file &
done
debug "max width of every col cal \n"

##########################################################
# for ((i = 0; i < ${row_count}; i++)); do
i=0

format_str=""
parameter_str=""
print_count=0
while read line; do
((print_count++))
# line=${line_arr[i]}
arr=($line)

# echo "line:######### $line"
if [[ $isMD -eq 1 && $i -eq 1 ]]; then
format_str=${format_str}$(printSymbolLie "|" "-" "${max_count_arr[@]}")"\n"
fi

# print the table head line +-------+-------+
if [[ $isMD -eq 0 && ($i -eq 0 || $i -eq 1) ]]; then
format_str=${format_str}$(printSymbolLie "+" "-" "${max_count_arr[@]}")"\n"
fi

for ((j = 0; j < $col_count; j++)); do
bytes_chars=($(echo -n ${arr[j]} | wc -c -m | xargs -n 1 echo -n "$IFS"))
bytes=${bytes_chars[2]}
chars=${bytes_chars[1]}

# let's assume x: Chinese char count; y:ascii char count; default charset is utf-8
# as for display width: one Chinese character = ascii * 2
#
# (1) x*3 + y = bytes (use wc)
# (2) x + y = chars (use wc)
# (3) x*2 + y + z = max_width (we have calculated it before, z is the empty char count)
#
# printf command will count bytes as min size: printf "%-10s" "hello" 10 means bytes
# so finally, the printf min size for current col is:
# 3*x + y + z = max_col_width + (bytes - chars) / 2 (calculated by (1) (2) (3) above)
#
# todo:// optimization has to be done to reduce cal time

if ((j == 0)); then format_str="${format_str}|"; fi
max_col_width=${max_count_arr[j]}
min_print=$((max_col_width + (bytes - chars) / 2))

if ((i == 0 && isColor == 1)); then
format_str="${format_str} \033[1;33m%-${min_print}s\033[00m |"
else
format_str="${format_str} %-${min_print}s |"
fi

parameter_str="${parameter_str}\"${arr[j]}\" "
done

format_str="${format_str}\n"

debug "format_str: $format_str"
debug "parameter_str: $parameter_str"

if ((print_count == 50 || print_count == row_count)); then
echo -n "$parameter_str" | xargs printf "$format_str"
parameter_str=""
format_str=""
fi

((i++))
done <$item

# echo -n "$parameter_str" | xargs printf "$format_str"

# print the end line +-------+-------+
if [ $isMD -eq 0 ]; then
printSymbolLie "+" "-" "${max_count_arr[@]}"
fi
echo
done
IFS=$oldIFS

end_seconds=$(date +"%s")

info "\nTable row size: ${row_count}, column size: ${col_count}"
info "It takes $((end_seconds - start_seconds)) seconds to calculate and display.\n"