A number of commands available for finding positions that you want in a line and extracting only a certain amount of text can be pretty cumbersome in terms of learning curve. This isn’t to say that once you get the hang of them that they’re terribly complicated but it can take a little while to get the hang of them. And when you need something fast, you might want an easy command for extracting text from lines. In these cases, consider cut. The cut command doesn’t do regular expressions (I guess you could argue that its ability to use a delimiter can be used as a regular expression) and so it’s really easy to use.
Basically, you feed cut some data and then tell it which characters in the line that you want to keep. It then gets rid of the rest. The easiest use of this is to look at a list of data. For example, let’s saw we have a file called test.txt with the following contents:
Now we’re going to cat the file (which just reads the file contents) and then pipe the output of reading that file into a cut command (which is done by simply adding a pipe character at the end of the first part of the command. Then we’re going to use the -c option of cut (which looks at character positions) to simply grab the first three positions (1-3) of the lines. The command would end up looking as follows:
cat test.txt | cut -c 1-3
And the output would look as follows (this output could then be redirected into a new file btw):
You can also specify multiple ranges of characters (or single characters for that matter). For example, to see only characters 1-2 and 5-6:
cat test.txt | cut -c 1-2,5-6
Overall, cut is a very easy to use tool, with a limitation that your pattern that you are looking to maintain must be consistent in terms of the character position that you are using in each line. It also uses every line in a file; however, to go another step and look for all positions in a line only if the line has a pattern that it can match you could simply add a grep in the middle. For example, if you’re looking for each line of our sample text file that has the number 4 then you could do:
cat test.txt | grep 4
This would show you only the last five lines of the file since those are the only lines that have that number in them. You could then pipe the output of that file into your cut and, let’s say, look for characters 1-3 and 6 in the output:
cat test.txt | grep 4 | cut -c 1-3,6
Your result would then be the following:
Finally, there are going to be times when you’re not looking for a specific character position in a line but instead a character position or a pattern that begins with another pattern. For this you’re going to end up needing to use a more advanced tool, such as awk or if you’re feelin’ frisky (maybe I’m speaking for myself there) regex. These tools will have a steeper learning curve, but ultimately be far more useful.
krypted January 17th, 2010
A regular expression “engine” is a piece of software that can process regular expressions, trying to match the pattern to the given string. Usually, the engine is part of a larger application and you do not access the engine directly. Rather, the application will invoke it for you when needed, making sure the right regular expression is applied to the right file or data.
krypted June 15th, 2007
Posted In: Unix