Working Code
Example 1: Basic text analysis with grep and wc
Use pipes to process data step by step:
# Count items in notes.md
grep "item" Documents/notes.md | wc -l
Output:
2
# Find lines matching a specific pattern
cat Documents/notes.md | grep "^-"
Output:
- item 1
- item 2
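A few more grep flags are worth knowing early on. A short sketch (using a temporary copy of the sample file so it runs anywhere):

```shell
# Build a small sample file to search
printf -- '# Notes\n- item 1\n- item 2\n' > /tmp/notes-demo.md

# -i: case-insensitive match
grep -i "ITEM" /tmp/notes-demo.md    # matches both "- item" lines

# -v: invert the match (print lines that do NOT contain "item")
grep -v "item" /tmp/notes-demo.md    # prints "# Notes"

# -n: prefix each match with its line number
grep -n "item" /tmp/notes-demo.md    # 2:- item 1  /  3:- item 2
```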
Example 2: Combining head and tail
# Check both the beginning and end of a file
head -2 Documents/notes.md
echo "---"
tail -1 Documents/notes.md
Output:
# Notes
- item 1
---
- item 2
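tail also accepts a `-n +N` form, which starts output at line N instead of printing the last N lines; combined with head, this lets you slice out a middle section. A sketch using a temporary copy of the sample file:

```shell
# Recreate the sample file so the snippet is self-contained
printf -- '# Notes\n- item 1\n- item 2\n' > /tmp/notes-demo.md

# tail -n +2 starts output AT line 2 (i.e. skips the first line)
tail -n +2 /tmp/notes-demo.md
# Output:
# - item 1
# - item 2

# Combine head and tail to extract just the middle line (line 2)
head -2 /tmp/notes-demo.md | tail -1
# Output:
# - item 1
```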
Example 3: Saving search results to a file
# Save grep results to a file
grep "item" Documents/notes.md > found.txt
cat found.txt
Output:
- item 1
- item 2
# Compare statistics across files with wc
wc -l Documents/hello.txt Documents/notes.md
Output:
1 Documents/hello.txt
3 Documents/notes.md
4 total
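wc counts more than lines: `-w` counts words and `-c` counts bytes. A quick sketch with a one-line sample file:

```shell
# Create a one-line sample file
printf 'hello world\n' > /tmp/hello-demo.txt

wc -l /tmp/hello-demo.txt   # line count:  1
wc -w /tmp/hello-demo.txt   # word count:  2
wc -c /tmp/hello-demo.txt   # byte count: 12 (11 characters + trailing newline)
```

Reading from stdin (`wc -w < file`) prints the number alone, without the file name.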
Try It Yourself
awk: Column-Based Text Processing
awk is a small programming language designed for text processing. It excels at handling delimited data such as CSV files.
Basic structure:
awk 'pattern { action }' file
Example: Basic column output
# Create a simple CSV
echo "name,score,grade" > scores.csv
echo "Alice,85,B" >> scores.csv
echo "Bob,92,A" >> scores.csv
echo "Carol,78,C" >> scores.csv
echo "Dave,65,D" >> scores.csv
# Print only the first column (-F',' sets comma as delimiter)
awk -F',' '{ print $1 }' scores.csv
Output:
name
Alice
Bob
Carol
Dave
# Print name and score only
awk -F',' '{ print $1, $2 }' scores.csv
Output:
name score
Alice 85
Bob 92
Carol 78
Dave 65
Conditional filtering
# Print rows where column 2 (score) is 80 or above
awk -F',' '$2 >= 80' scores.csv
Output:
name,score,grade
Alice,85,B
Bob,92,A
# Skip the header and filter by score (NR: line number)
awk -F',' 'NR > 1 && $2 >= 80' scores.csv
Output:
Alice,85,B
Bob,92,A
Aggregation: BEGIN and END
# Calculate the average
awk -F',' 'NR > 1 { sum += $2; count++ } END { print "Average:", sum/count }' scores.csv
Output:
Average: 80
# Pipe into awk
cat scores.csv | awk -F',' 'NR > 1 { print $1, $2 }'
Output:
Alice 85
Bob 92
Carol 78
Dave 65
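BEGIN runs once before any input is read, which makes it handy for printing a header; END runs once after the last line. A sketch combining both (recreating scores.csv from the earlier example so the block runs standalone):

```shell
# Recreate the sample CSV
printf 'name,score,grade\nAlice,85,B\nBob,92,A\nCarol,78,C\nDave,65,D\n' > scores.csv

# BEGIN prints a header before any row; END prints a summary after the last row
awk -F',' 'BEGIN { print "== Scores ==" }
           NR > 1 { sum += $2; print $1, $2 }
           END    { print "Total:", sum }' scores.csv
```

Output:
```
== Scores ==
Alice 85
Bob 92
Carol 78
Dave 65
Total: 320
```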
"Why?" — Why You Need Text Processing Tools
Server logs, CSV data, config files — most data is text. By chaining these tools with pipes, you can analyze data quickly without spreadsheets or dedicated software.
Real-world scenario: Log analysis
Imagine you have an access log:
2024-01-15 10:30:01 INFO User login: user001
2024-01-15 10:30:05 ERROR Database connection failed
2024-01-15 10:30:10 INFO File upload complete
2024-01-15 10:31:00 WARN Memory usage exceeds 80%
2024-01-15 10:31:05 ERROR File save failed
# Filter ERROR logs only
grep "ERROR" app.log
# Count ERRORs
grep -c "ERROR" app.log
# Extract errors from a specific time window
grep "10:30" app.log | grep "ERROR"
# Save ERROR messages to a file
grep "ERROR" app.log > errors.txt
awk Key Concepts
| Concept | Description | Example |
| ------------- | -------------------------- | ------------------------- |
| $0 | Entire line | print $0 |
| $1, $2... | Each column | print $1, $3 |
| NR | Line number | NR > 1 (skip header) |
| NF | Number of fields (columns) | print NF |
| -F | Field delimiter | -F',', -F'\t' |
| BEGIN | Runs before processing | BEGIN { print "Start" } |
| END | Runs after processing | END { print sum } |
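NF deserves a quick demonstration, since it doubles as a way to grab the last field: `$NF` is "the field whose number is NF". A sketch against a small temporary CSV:

```shell
# Two-line sample: a header row and one data row
printf 'name,score,grade\nAlice,85,B\n' > /tmp/nf-demo.csv

# NF is the field count per line; $NF is that line's last field
awk -F',' '{ print NF, $NF }' /tmp/nf-demo.csv
# Output:
# 3 grade
# 3 B
```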
Common Mistakes
Mistake 1: Assuming $0 is the first column (awk fields start at $1)
# Wrong: $0 is not the first column
awk -F',' '{ print $0 }' file.csv # prints the entire line
# Correct
awk -F',' '{ print $1 }' file.csv # prints the first column
Mistake 2: Forgetting to specify the delimiter
# Processing a comma-delimited CSV with default (space) delimiter
awk '{ print $1 }' scores.csv
# Treats "name,score,grade" as one field
# Correct
awk -F',' '{ print $1 }' scores.csv
Mistake 3: Building long pipes without checking intermediate results
# Build pipes step by step
cat scores.csv | head -3 # check step 1
cat scores.csv | head -3 | grep "8" # check step 2
Build complex pipes incrementally, verifying the output at each step.
Deep Dive
awk: Pattern matching and field operations
# Process only rows matching a regex (/A/ matches any line containing "A",
# so "Alice" matches too, not just rows with grade A)
awk -F',' '/A/ { print $1, "excellent" }' scores.csv
# Field arithmetic
awk -F',' 'NR > 1 { print $1, $2 * 1.1, "adjusted" }' scores.csv
# Multiple conditions
awk -F',' 'NR > 1 && $2 >= 80 && $3 == "A"' scores.csv
# Formatted output
awk -F',' 'NR > 1 { printf "%-10s %3d pts\n", $1, $2 }' scores.csv
sed: The stream editor
sed is a tool for transforming text:
# Text substitution (s/original/replacement/)
echo "Hello World" | sed 's/World/Terminal/'
# Global substitution (g flag)
echo "aaa bbb aaa" | sed 's/aaa/xxx/g'
# Delete a specific line
cat file.txt | sed '2d'
# Print line numbers (sed '=' emits each number on its own line, before the line)
cat file.txt | sed '='
sed doesn't modify the file itself — it outputs to stdout. To edit in place, use the -i option.
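A minimal sketch of in-place editing. Giving -i a backup suffix (here `.bak`, an arbitrary choice) keeps a copy of the original, and the attached-suffix form works on both GNU sed and BSD/macOS sed:

```shell
# Create a file to edit
printf 'Hello World\n' > /tmp/sed-demo.txt

# -i edits the file in place; .bak keeps a backup of the original
sed -i.bak 's/World/Terminal/' /tmp/sed-demo.txt

cat /tmp/sed-demo.txt       # Hello Terminal
cat /tmp/sed-demo.txt.bak   # Hello World (the original)
```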
Error handling in pipe chains
By default, a pipe's exit status is that of its last command, so a failure in the middle of the chain can go unnoticed:
# set -o pipefail: treat the whole pipe as failed if any part fails
set -o pipefail
# Save intermediate results to variables for debugging
result=$(cat file.txt | grep "pattern")
echo "Result: $result"
- Create a file:
  echo "a,1" > data.csv
  echo "b,2" >> data.csv
  echo "c,3" >> data.csv
- Print the first column: awk -F',' '{ print $1 }' data.csv
- Filter rows where column 2 is 2 or above: awk -F',' '$2 >= 2' data.csv
- Sum the second column: awk -F',' '{ sum += $2 } END { print sum }' data.csv
- Count items: cat Documents/notes.md | grep "item" | wc -l
Q1. In awk -F',' '$2 >= 80' scores.csv, what does -F',' do?
- A) Filters values greater than 80
- B) Sets the field delimiter to a comma
- C) Specifies the file format as CSV
- D) Specifies a second file