Author: Song Li
As the help page shows, there are two major functions, plus help and version options:
-u to format the text
-w [length] to limit the width of each line to the given length
-h to show the help page
-v to show the version info
gcc -o fmt.out fmt.c
fmt.c: the main code of this program
inputjane: the input file that I used to test performance
HW2 fmt readme.pdf: documentation for this program
-w [length]: if an original line is longer than “length”, it is cut into several lines. After this operation, no line is longer than “length”, and every new line that comes from the same original line keeps the indent of the original line, that is, the same number of leading blanks. Words are never split: if a word falls right at the cut point of the original line, the whole word is moved to the next line. The only exception is a single word that is longer than the limit, which is kept whole on its own line. The default length is 75.
-u: if there is more than one blank character in a row, such as ‘ ’ or ‘\t’, the extra ones are removed, so that there is exactly one blank between every two words. After the end of a sentence, which is marked by a character such as ‘.’, ‘?’ or ‘!’, there are two blanks. After this step, every line is limited to 75 characters by default. If the user gives a maximum length with -w, that number is used as the limit instead.
##INPUT AND OUTPUT##
The input text must come from one or more input files. The result is written to stdout. An example invocation:
./fmt -u -w 50 inputfile1 inputfile2 > outputfile
This program formats text line by line, so it can handle a very large file with very little memory. It opens the first file, reads a single line into an array, formats it, writes the result to the output, and then moves on to the next line. After finishing one file, it handles the next file in the same way.
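The line-by-line loop described above can be sketched roughly as follows. This is only an illustration, not the code from fmt.c: the names format_line, format_stream and format_files and the buffer size MAX_LINE are assumptions, and format_line is a placeholder that copies the line unchanged.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_LINE 4096  /* assumed buffer size, not taken from fmt.c */

/* Hypothetical per-line formatter: this placeholder only copies the line. */
void format_line(const char *in, char *out, size_t out_size)
{
    assert(out_size > 0);
    strncpy(out, in, out_size - 1);
    out[out_size - 1] = '\0';
}

/* Read `in` one buffer at a time, format it, and write it to `out`.
   Returns the number of reads, which equals the number of lines
   as long as every line fits in MAX_LINE - 1 characters. */
long format_stream(FILE *in, FILE *out)
{
    char line[MAX_LINE];
    char formatted[MAX_LINE];
    long count = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        format_line(line, formatted, sizeof formatted);
        fputs(formatted, out);
        count++;
    }
    return count;
}

/* Open each named file in turn and stream it to stdout.
   Returns 0 on success, 1 if any file could not be opened. */
int format_files(int nfiles, char **names)
{
    int status = 0;
    for (int i = 0; i < nfiles; i++) {
        FILE *fp = fopen(names[i], "r");
        if (fp == NULL) {
            perror(names[i]);
            status = 1;
            continue;
        }
        format_stream(fp, stdout);
        fclose(fp);
    }
    return status;
}
```

Only two fixed-size buffers are alive at any time, so the memory cost does not depend on the size of the input file.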
The -u (uniform spacing) function works with two strings: the original string and the new string. There are two pointers, one for each string. The first pointer, which points into the original string, always keeps moving forward.
1) When the first pointer points at a normal character, this character is copied to the new string, and the second pointer, which points into the new string, moves forward too.
2) When the first pointer points at a blank such as ‘ ’ or ‘\t’, it keeps moving until it points at a non-blank character. Meanwhile, the second pointer adds one blank to the new string. If the last non-blank character is ‘.’, ‘?’ or ‘!’, the second pointer adds two blanks instead.
After this, the resulting string is passed to the -w processing, which produces the final result for this line.
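A minimal sketch of this two-pointer pass is shown here. It is not the code from fmt.c: the name squeeze_blanks and the helpers is_blank and ends_sentence are hypothetical. Two details are my own assumptions and are not stated above: the leading indent is copied unchanged, and blanks at the end of a line are dropped.

```c
#include <assert.h>
#include <stddef.h>

static int is_blank(char c) { return c == ' ' || c == '\t'; }
static int ends_sentence(char c) { return c == '.' || c == '?' || c == '!'; }

/* Copy src into dst, collapsing every run of blanks to one blank,
   or to two blanks after '.', '?' or '!'. Output is truncated if dst
   is too small. Returns the length of the string written to dst. */
size_t squeeze_blanks(const char *src, char *dst, size_t dst_size)
{
    assert(dst_size > 0);
    size_t r = 0;       /* read index into the original string */
    size_t w = 0;       /* write index into the new string */
    char last = '\0';   /* last non-blank character copied */

    /* Assumption: keep the leading indent exactly as it is. */
    while (is_blank(src[r]) && w + 1 < dst_size)
        dst[w++] = src[r++];

    while (src[r] != '\0' && w + 1 < dst_size) {
        if (!is_blank(src[r])) {
            /* Case 1: a normal character is copied as-is. */
            last = src[r];
            dst[w++] = src[r++];
            continue;
        }
        /* Case 2: skip the whole run of blanks in the original string. */
        while (is_blank(src[r]))
            r++;
        /* Assumption: drop blanks at the end of the line. */
        if (src[r] == '\n' || src[r] == '\0')
            continue;
        /* Two blanks after a sentence end, otherwise one. */
        int n = ends_sentence(last) ? 2 : 1;
        while (n-- > 0 && w + 1 < dst_size)
            dst[w++] = ' ';
    }
    dst[w] = '\0';
    return w;
}
```

Because a single blank after a sentence end becomes two, the new string can be longer than the original, which is why the sketch takes the size of the destination buffer as a parameter.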
The -w function also uses two strings and two pointers, like the -u function. To get better performance, I designed the following algorithm for it.
1) Reduce the length limit by the number of leading blanks. This gives the number of characters that are really available for text on each line.
2) The original-string pointer jumps from the start position of the new line to the expected end position, which is the start position plus the length limit.
3) If this position is in the middle of a word, the pointer moves backward until it finds a blank character or reaches the new start position.
4) If the pointer finds a blank, it continues to move backward until it finds a normal character.
5) If the pointer reaches the new start position, it moves forward instead to find the end of this word.
6) Use “memcpy” to copy the new line, from the new start position to the end position, into the destination string.
7) Set the new start position to the end position and repeat steps 2 to 7 until the whole line has been processed.
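The seven steps can be sketched as below. This is an illustration under my own assumptions, not the code from fmt.c: the name wrap_line and its signature are hypothetical, the input line is assumed to have no trailing newline, blanks at the start of each continuation line are skipped, and the limit falls back to 1 when the indent is as wide as the requested width.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int is_blank(char c) { return c == ' ' || c == '\t'; }

/* Wrap one line to `width` columns, repeating its indent on every
   output line. Returns the number of lines written to dst, or -1 if
   dst is too small. */
long wrap_line(const char *src, size_t width, char *dst, size_t dst_size)
{
    assert(src != NULL && dst != NULL && dst_size > 0);
    size_t len = strlen(src);
    size_t indent = 0;
    while (is_blank(src[indent]))
        indent++;

    /* Step 1: room left for text (assumed fallback of 1 if none). */
    size_t limit = width > indent ? width - indent : 1;
    size_t start = indent, w = 0;
    long lines = 0;
    dst[0] = '\0';

    while (start < len) {
        size_t end = start + limit;              /* step 2: jump ahead */
        if (end >= len) {
            end = len;                           /* the rest fits */
        } else if (!is_blank(src[end])) {
            while (end > start && !is_blank(src[end]))
                end--;                           /* step 3: back up */
            if (end == start)                    /* step 5: long word */
                while (end < len && !is_blank(src[end]))
                    end++;
        }
        while (end > start && is_blank(src[end - 1]))
            end--;                               /* step 4: trim blanks */

        size_t text = end - start;
        if (w + indent + text + 2 > dst_size)
            return -1;
        memcpy(dst + w, src, indent);            /* repeat the indent */
        memcpy(dst + w + indent, src + start, text);   /* step 6 */
        w += indent + text;
        dst[w++] = '\n';
        dst[w] = '\0';
        lines++;

        start = end;                             /* step 7: next line */
        while (start < len && is_blank(src[start]))
            start++;
    }
    return lines;
}
```

The pointer jumps straight to the expected cut point and only scans backward over at most one word, so it does not have to examine every character of the line one by one.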
I compared the run time of my program with that of the system fmt program. The test data for these two programs is a novel called “Jane Eyre”, which contains 21062 lines. Here is the result. (The time is the average of 10 runs, measured with the Linux time command.) I’m sure that the system’s fmt is much better than mine. There are lots of things that I did not consider, like the proper real length, the dictionary and so on.
Here is the code: