shell - To split into fixed sequences and leave extra out


I want to limit the files to the same fixed length; only the last one may vary in size, and it must not be more than 557. This means the number of files can be more than the number given by the -n flag of the split command.

Code 1 (ok)

$ seq -w 1 1671 > /tmp/k && gsplit -n15 /tmp/k && wc -c xaa && wc -c xao
557 xaa
557 xao

where xaa is the first file of the sequence, while xao is the last one. Increasing the sequence by 1 unit causes a 5-unit increase (557 -> 562) in the last file xao, which I do not understand:

$ seq -w 1 1672 > /tmp/k && gsplit -n15 /tmp/k && wc -c xaa && wc -c xao
557 xaa
562 xao

Why does an increase of one unit in the sequence increase the last item (xao) by 5 units?

Code 2

$ seq -w 1 1671 | gsed ':a;N;$!ba;s/\n//g' > /tmp/k && gsplit -n15 /tmp/k && wc -c xaa && wc -c xao
445 xaa
455 xao
$ seq -w 1 1672 | gsed ':a;N;$!ba;s/\n//g' > /tmp/k && gsplit -n15 /tmp/k && wc -c xaa && wc -c xao
445 xaa
459 xao

So increasing the whole length by 1 sequence item (4 characters) leads to a 4-character increase (455 -> 459), in contrast to the first code, where the increase was 5 characters.

Code 3

Let's keep each unit of the sequence fixed at 4 characters with seq -w 0 0.0001 1 | gsed 's/\.//g':

$ seq -w 0 0.0001 1 | gsed 's/\.//g' | gsed ':a;N;$!ba;s/\n//g' > /tmp/k && gsplit -n15 /tmp/k && wc -c xaa && wc -c xao
3333 xaa
3344 xao
$ seq -w 0 0.0001 1.0001 | gsed 's/\.//g' | gsed ':a;N;$!ba;s/\n//g' > /tmp/k && gsplit -n15 /tmp/k && wc -c xaa && wc -c xao
3334 xaa
3335 xao

So increasing the sequence by 1 item increases xaa by 1 unit and decreases xao by 9 units. This behavior does not seem logical.

How can I limit the sequence length first, for instance fixed at 557, and only later determine the number of files?

Original answer: code 1

Because seq -w 1 1671 generates 5 characters per number: 4 digits and 1 newline. Adding 1 number to the output adds 5 bytes to the output.
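The byte counts can be checked directly with the same seq -w command. This is only a small sketch; the variable names are illustrative and not part of the original commands.

```shell
# Bytes in one line of the zero-padded sequence: 4 digits plus 1 newline.
per_line=$(( $(seq -w 1 1671 | head -n 1 | wc -c) ))

# Total bytes for 1671 numbers and for 1672 numbers.
total_1671=$(( $(seq -w 1 1671 | wc -c) ))
total_1672=$(( $(seq -w 1 1672 | wc -c) ))

# Prints: 5 8355 8360 5
echo "$per_line $total_1671 $total_1672 $(( total_1672 - total_1671 ))"
```

Since 8355 = 15 * 557, the first run divides evenly into 15 files; the 5 extra bytes of the second run are the remainder that shows up in xao.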

Extra answer: code 2

You've asked GNU split (aka gsplit) to split the input file into 15 chunks. It does its best to even the sizes out. There's a limit to what it can do when the total number of bytes is not a multiple of 15. There are options to control what happens.

However, in its basic form, the -n 15 option means that the first 14 output files each get 445 characters, and the last gets 455 because there are 6685 = 445 * 15 + 10 characters in the output file. When you add 4 characters to the file (because you delete the newlines), the last file gets an additional 4 characters (because 6689 = 445 * 15 + 14).
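The arithmetic above can be sketched in plain shell. The function name chunk_sizes is hypothetical, and the sketch assumes the behavior described here, where the whole remainder goes to the last chunk; a split version that distributes the remainder differently would report other sizes.

```shell
# chunk_sizes TOTAL N: print the size of the first N-1 chunks and the size of
# the last chunk, assuming the entire remainder is added to the last chunk.
chunk_sizes() {
    total=$1
    n=$2
    base=$(( total / n ))
    last=$(( base + total % n ))
    echo "$base $last"
}

chunk_sizes 6685 15   # prints: 445 455
chunk_sizes 6689 15   # prints: 445 459
```

The same calculation for 8360 bytes gives 557 and 562, which matches the second run of code 1.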

Extra answer: code 3

First of all, the output from seq -w 0 0.0001 1 looks like:

0.0000
0.0001
0.0002
…
0.9998
0.9999
1.0000

So after the output is edited by the first sed, the numbers 00000 to 10000 are present, 1 per line, with 6 characters per line (including the newline). The second sed then eliminates the newlines again, all except the final one.
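The effect of the joining sed can be seen on a small input. This sketch assumes the script is the usual join idiom with an uppercase N (':a;N;$!ba;s/\n//g'), written here with separate -e expressions so that the label also parses in non-GNU sed; the variable names are illustrative.

```shell
# Join ten zero-padded two-digit numbers into one line and count the result.
joined_bytes=$(( $(seq -w 1 10 | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n//g' | wc -c) ))
joined_lines=$(( $(seq -w 1 10 | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n//g' | wc -l) ))

# 10 numbers * 2 digits + 1 final newline = 21 bytes on a single line.
echo "$joined_bytes $joined_lines"   # prints: 21 1

# The same reasoning for 10001 five-digit numbers:
echo $(( 10001 * 5 + 1 ))            # prints: 50006
```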

There are 50006 bytes in /tmp/k, all on 1 line. That's equal to 15 * 3333 + 11, hence the first output. The second variant has 50011 bytes in /tmp/k, which is 15 * 3334 + 1. Hence the difference of one.
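As for the goal stated in the question (files of at most 557 bytes, with however many files that takes), one option that may fit is -b, which fixes the size of each file instead of the number of files. The following is a sketch under that assumption, not necessarily what the answer had in mind; the temporary directory and variable names are illustrative.

```shell
# Use gsplit when it is installed (as in the question), otherwise plain split.
SPLIT=$(command -v gsplit || command -v split)
tmp=$(mktemp -d)
seq -w 1 1672 > "$tmp/k"

# -b 557: every output file holds 557 bytes, except possibly the last one.
"$SPLIT" -b 557 "$tmp/k" "$tmp/x"

file_count=$(( $(ls "$tmp"/x?? | wc -l) ))
first_size=$(( $(wc -c < "$tmp/xaa") ))
last_file=$(ls "$tmp"/x?? | tail -n 1)
last_size=$(( $(wc -c < "$last_file") ))

# Prints: 16 files, first 557 bytes, last 5 bytes
echo "$file_count files, first $first_size bytes, last $last_size bytes"
rm -rf "$tmp"
```

With 8360 bytes of input, 15 files are filled to 557 bytes and the 5 leftover bytes land in a 16th file, so the file count follows from the size limit.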

