Like split (35.9), csplit lets you break a file into smaller pieces, but csplit (context split) also allows the file to be broken into different-sized pieces, according to context. With csplit, you give the locations (line numbers or search patterns) at which to break each section. csplit comes with System V, but there are also freely available versions.
Let's look at search patterns first. Suppose you have an outline consisting of three main sections. You could create a separate file for each section by typing:
%csplit outline /I./ /II./ /III./
28        number of characters in each file
415           .
372           .
554           .
%ls
outline
xx00          outline title, etc.
xx01          Section I
xx02          Section II
xx03          Section III
This command creates four new files (outline remains intact). csplit displays the character count for each file as it writes it. Note that the first file (xx00) contains any text up to but not including the first pattern, and that xx01 contains the first section, as you'd expect. This is why the naming scheme begins with 00. (Even if outline had begun immediately with I., xx01 would still contain Section I; xx00 would simply be empty in this case.)
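To see the numbering scheme in action, here is a minimal sketch you can run in a scratch directory. The four-line outline file is made up for illustration, and the script assumes a POSIX shell with a csplit that follows the behavior described above (GNU csplit does).

```shell
#!/bin/sh
# Sketch: split a tiny, made-up outline at its three section headings.
set -e
workdir=$(mktemp -d)   # scratch directory so nothing else is touched
cd "$workdir"

# Hypothetical sample file: one title line, then three one-line sections.
printf '%s\n' 'Outline title' 'I. First topic' \
    'II. Second topic' 'III. Third topic' > outline

# Without -s, csplit prints one size per file it creates; capture them.
counts=$(csplit outline '/I./' '/II./' '/III./')
echo "sizes:" $counts

# Show which text landed in which file.
for f in xx00 xx01 xx02 xx03; do
    printf '%s: %s\n' "$f" "$(cat "$f")"
done
```

The title line ends up in xx00 and each section follows in xx01 through xx03, which is the off-by-one the text warns about.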
If you don't want to save the text that occurs before a specified pattern, use a percent sign as the pattern delimiter:
%csplit outline %I.% /II./ /III./
415
372
554
%ls
outline
xx00          Section I
xx01          Section II
xx02          Section III
The preliminary text file has been suppressed, and the created files now begin where the actual outline starts. (The file numbers no longer line up with the section numbers, however: Section I is now in xx00.)
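A quick way to convince yourself that the percent-sign form really discards the leading text is to try it on a throwaway file. This is only a sketch: the sample outline is invented, and a POSIX shell is assumed.

```shell
#!/bin/sh
# Sketch: skip everything before the first heading with %pattern%.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample file: a title line followed by three sections.
printf '%s\n' 'Outline title' 'I. First topic' \
    'II. Second topic' 'III. Third topic' > outline

# %I.% moves to the first match without writing a file for the skipped text.
csplit -s outline '%I.%' '/II./' '/III./'

for f in xx*; do
    printf '%s: %s\n' "$f" "$(cat "$f")"
done
```

Only three files are written this time, and the title line appears in none of them.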
Let's make some further refinements. We'll use the -s option to suppress the display of the character counts, and we'll use the -f option to specify a file prefix other than the conventional xx:
%csplit -s -f part. outline /I./ /II./ /III./
%ls
outline part.00 part.01 part.02 part.03
There's still a slight problem though. In search patterns, a period is a metacharacter (26.10) that matches any single character, so the pattern /I./ may inadvertently match words like Introduction. We need to escape the period with a backslash; however, the backslash has meaning both to the pattern and to the shell, so in fact, we need either to use a double backslash or to surround the pattern in quotes (8.14). A subtlety, yes, but one that can drive you crazy if you don't remember it. Our command line becomes:
%csplit -s -f part. outline "/I\./" /II./ /III./
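The difference is easy to demonstrate. The sketch below runs both forms of the pattern against an invented file whose first line contains the word Introduction. It assumes a POSIX shell, where single quotes pass the backslash through to csplit untouched; the prefixes loose. and exact. are arbitrary names chosen for this example.

```shell
#!/bin/sh
# Sketch: compare an unescaped and an escaped period in a csplit pattern.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample file whose first line contains "Introduction".
printf '%s\n' 'Introduction to the outline' 'I. First topic' \
    'II. Second topic' > outline

# Unescaped: "I." also matches the "In" of "Introduction" on line 1.
csplit -s -f loose. outline '/I./'

# Escaped: only a literal "I." matches, so the split lands on line 2.
csplit -s -f exact. outline '/I\./'

printf 'loose.01 starts with: %s\n' "$(head -n 1 loose.01)"
printf 'exact.01 starts with: %s\n' "$(head -n 1 exact.01)"
```

With the loose pattern the split happens on the very first line, so loose.00 holds nothing and loose.01 holds the whole file.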
You can also break a file at repeated occurrences of the same pattern. Let's say you have a file that describes 50 ways to cook a chicken, and you want each method stored in a separate file. Each section begins with headings WAY #1, WAY #2, and so on. To divide the file, use csplit's repeat argument:
%csplit -s -f cook. fifty_ways /^WAY/ "{49}"
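This command splits the file at the first occurrence of WAY, and the number in braces tells csplit to repeat the split 49 more times, for 50 splits in all. Note that a caret is used to match the beginning of the line and that the C shell requires quotes around the braces (9.5). As in the outline example, the first file holds whatever precedes the first match (it is empty if the file opens with WAY #1), so the command creates 51 files, with the 50 methods in cook.01 through cook.50:
%ls cook.*
cook.00 cook.01 ... cook.49 cook.50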
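Counting files by hand gets confusing, so here is a scaled-down sketch with three methods instead of fifty. The file name ways and its contents are made up, and a POSIX shell is assumed. The pattern is applied once and then repeated twice, which gives one leading file plus one file per heading.

```shell
#!/bin/sh
# Sketch: split at every line starting with WAY, using a repeat count.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample file: a title line and three short methods.
printf '%s\n' 'Ways to cook a chicken' \
    'WAY #1' 'Roast it.' \
    'WAY #2' 'Fry it.' \
    'WAY #3' 'Grill it.' > ways

# Three headings: one match plus two repeats.
csplit -s -f cook. ways '/^WAY/' '{2}'

# Print the first line of each piece to show where each split fell.
for f in cook.*; do
    printf '%s: %s\n' "$f" "$(head -n 1 "$f")"
done
```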
Quite often, when you want to split a file repeatedly, you don't know or don't care how many files will be created; you just want to make sure that the necessary number of splits takes place. In this case, it makes sense to specify a repeat count that is slightly higher than what you need (maximum is 99). Unfortunately, if you tell csplit to create more files than it's able to, this produces an "out of range" error. Furthermore, when csplit encounters an error, it exits by removing any files it created along the way. (A bug, if you ask me.) This is where the -k option comes in. Specify -k to keep the files around, even when the "out of range" message occurs.
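The effect of -k is easiest to see side by side. This sketch asks for many more repeats than an invented three-method file can satisfy, once without -k and once with it. It assumes a POSIX shell and GNU csplit; other versions may word the error differently or leave a different set of files behind. The prefixes lost. and kept. are arbitrary.

```shell
#!/bin/sh
# Sketch: an oversized repeat count with and without -k.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample file: a title line and three short methods.
printf '%s\n' 'Ways to cook a chicken' \
    'WAY #1' 'Roast it.' \
    'WAY #2' 'Fry it.' \
    'WAY #3' 'Grill it.' > ways

# csplit exits with an error when it runs out of matches, so "|| true"
# keeps this script going; the error message itself is discarded.
csplit -s -f lost. ways '/^WAY/' '{9}' 2>/dev/null || true
csplit -s -k -f kept. ways '/^WAY/' '{9}' 2>/dev/null || true

# Count the files each run left behind.
lost_count=$(ls | grep -c '^lost\.' || true)
kept_count=$(ls | grep -c '^kept\.' || true)
echo "without -k: $lost_count files"
echo "with -k:    $kept_count files"
```

If you have GNU csplit, a repeat of '{*}' is another way out: it repeats the pattern as many times as the input allows, so no guess is needed.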
csplit allows you to break a file at some number of lines above or below a given search pattern. For example, to break a file at the line that is five lines below the one containing Sincerely, you could type:
%csplit -s -f letter. all_letters /Sincerely/+5
This situation might arise if you have a series of business letters strung together in one file. Each letter begins differently, but each one begins five lines after the previous letter's Sincerely line. Here's another example, adapted from AT&T's UNIX User's Reference Manual:
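Here is the same idea in miniature. The sample letters are invented and much shorter than real ones, so the sketch uses an offset of +2 rather than +5; a POSIX shell is assumed.

```shell
#!/bin/sh
# Sketch: break a file a fixed number of lines below a matching line.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample: two short letters, each signed one line
# after its "Sincerely," line.
printf '%s\n' 'Dear Pat,' 'Your order shipped.' 'Sincerely,' 'Ann' \
    'Dear Lee,' 'Your refund is ready.' 'Sincerely,' 'Bob' > all_letters

# The match is on line 3, so the break falls at line 3 + 2 = 5.
csplit -s -f letter. all_letters '/Sincerely/+2'

printf 'letter.00 ends with:   %s\n' "$(tail -n 1 letter.00)"
printf 'letter.01 starts with: %s\n' "$(head -n 1 letter.01)"
```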
%csplit -s -k -f routine. prog.c '%main(%' '/^}/+1' '{99}'
The idea is that the file prog.c contains a group of C routines, and we want to place each one in a separate file (routine.00, routine.01, etc.). The first pattern uses % because we want to discard anything before main. The next argument says, "Look for a closing brace at the beginning of a line (the conventional end of a routine) and split on the following line (the assumed beginning of the next routine)." Repeat this split up to 99 times, using -k to preserve the created files. [4]
[4] In this case, the repeat can actually occur only 98 times, since we've already specified two arguments and the maximum number is 100.
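To try that command without risking a real source file, the sketch below builds a two-routine prog.c on the fly. The C text is only sample data (it is never compiled), and the script assumes a POSIX shell and GNU csplit. Because the repeat count overshoots, csplit reports an error at the end, which the script ignores.

```shell
#!/bin/sh
# Sketch: one file per C routine, discarding everything before main.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample source, used purely as text to be split.
printf '%s\n' '#include <stdio.h>' \
    'int main(void)' '{' '    return helper();' '}' \
    'int helper(void)' '{' '    return 0;' '}' > prog.c

# %main(% skips the #include line; /^}/+1 breaks on the line after each
# closing brace. The repeats run out early, so ignore the failure
# and rely on -k to keep what was written.
csplit -s -k -f routine. prog.c '%main(%' '/^}/+1' '{99}' 2>/dev/null || true

printf 'routine.00 starts with: %s\n' "$(head -n 1 routine.00)"
printf 'routine.01 starts with: %s\n' "$(head -n 1 routine.01)"
```

Depending on your csplit, an empty trailing file may be left behind after the last routine; it is harmless and can be removed.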
The csplit command takes line-number arguments in addition to patterns. You can say:
%csplit stuff 50 373 955
to create files split at some arbitrary line numbers. In that example, the new file xx00 will have lines 1-49 (49 lines total), xx01 will have lines 50-372 (323 lines total), xx02 will have lines 373-954 (582 lines total), and xx03 will hold the rest of stuff.
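The arithmetic is the same on a smaller scale. In this sketch a generated 20-line file is split at lines 5 and 12; the file contents and the line numbers are chosen only for illustration, and a POSIX shell is assumed.

```shell
#!/bin/sh
# Sketch: split at fixed line numbers instead of patterns.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Build a hypothetical 20-line file: "line 1" through "line 20".
i=1
while [ "$i" -le 20 ]; do
    echo "line $i"
    i=$((i + 1))
done > stuff

# Each number names the FIRST line of the next piece, so xx00 gets
# lines 1-4, xx01 gets lines 5-11, and xx02 gets lines 12-20.
csplit -s stuff 5 12

wc -l xx00 xx01 xx02
```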
csplit works like split if you repeat the argument. The command:
%csplit top_ten_list 10 "{18}"
breaks the list into 19 segments of 10 lines each. [5]
[5] Not really. The first file contains only nine lines (1-9); the rest contain 10. In this case, you're better off saying split -10 top_ten_list (or split -l 10 top_ten_list with current versions of split).
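A side-by-side run shows the off-by-one. This sketch splits a generated 30-line file both ways; the file name list and the prefixes c. and s. are made up for the example, and a POSIX shell is assumed.

```shell
#!/bin/sh
# Sketch: csplit with a repeated line number versus split -l.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Build a hypothetical 30-line file.
i=1
while [ "$i" -le 30 ]; do
    echo "item $i"
    i=$((i + 1))
done > list

# csplit breaks BEFORE lines 10 and 20, so the first piece is one line
# short and the last piece picks up the extra line.
csplit -s -f c. list 10 '{1}'

# split simply counts off ten lines at a time.
split -l 10 list s.

wc -l c.* s.*
```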