DaleSchool

Archiving and Compression

Beginner · 25 min

Learning Objectives

  • Bundle, compress, and extract files with tar
  • Compress individual files with gzip and bzip2
  • Work with zip/unzip for cross-platform compatibility
  • Distinguish between compression formats and choose appropriately

Working Code

Example 1: Bundling files with tar

# Create sample files
mkdir myproject
echo "Main file" > myproject/main.py
echo "Config file" > myproject/config.yaml
echo "README" > myproject/README.md

Bundle with tar (no compression):

tar -cf myproject.tar myproject/
ls -lh myproject.tar

Output:

-rw-r--r--  1 dale staff  10K Jan 15 10:30 myproject.tar

View archive contents:

tar -tf myproject.tar

Output:

myproject/
myproject/main.py
myproject/config.yaml
myproject/README.md

Example 2: Adding gzip compression

# Bundle + gzip compress (-z option)
tar -czf myproject.tar.gz myproject/
ls -lh myproject.tar.gz

Output:

-rw-r--r--  1 dale staff  512B Jan 15 10:30 myproject.tar.gz

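Much smaller: 512 bytes instead of 10K. Most of the uncompressed archive is headers and zero padding, which compress very well.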

# Extract
tar -xzf myproject.tar.gz

# Extract to a different location
tar -xzf myproject.tar.gz -C /tmp/
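
A quick way to convince yourself that nothing is lost is a round trip: archive a directory, extract it somewhere else, and compare the two trees. The following is a minimal sketch; the scratch directory from mktemp and the variable names (workdir, roundtrip) are assumptions made for illustration.

```shell
# Round-trip check: archive a directory, extract it elsewhere, compare.
workdir=$(mktemp -d)                      # hypothetical scratch directory
mkdir -p "$workdir/myproject" "$workdir/restore"
echo "Main file" > "$workdir/myproject/main.py"
echo "Config file" > "$workdir/myproject/config.yaml"

# Create the archive from inside workdir so members are stored as myproject/...
(cd "$workdir" && tar -czf myproject.tar.gz myproject/)

# Extract into a separate directory
tar -xzf "$workdir/myproject.tar.gz" -C "$workdir/restore"

# diff -r exits with 0 only when both directory trees have the same content
if diff -r "$workdir/myproject" "$workdir/restore/myproject"; then
  roundtrip="identical"
else
  roundtrip="different"
fi
echo "Round trip: $roundtrip"

rm -rf "$workdir"                         # clean up the scratch directory
```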

Example 3: zip format

# Create a zip (Windows/Mac compatible)
zip -r myproject.zip myproject/

# Extract
unzip myproject.zip

# View contents
unzip -l myproject.zip

Try It Yourself

Mastering tar Options

tar options may look complex, but they follow a pattern:

tar [action] [options] [archive-name] [target]

Actions:
  -c  create
  -x  extract
  -t  list contents

Options:
  -f  archive filename (needed in practically every command)
  -z  gzip compression (.tar.gz)
  -j  bzip2 compression (.tar.bz2)
  -J  xz compression (.tar.xz)
  -v  verbose (list each file as it is processed)
  -C  change to a directory first (typically the extraction target)

Common commands:

# Create compressed archives
tar -czf archive.tar.gz directory/   # gzip (.tar.gz)
tar -cjf archive.tar.bz2 directory/  # bzip2 (.tar.bz2)
tar -czf archive.tgz *.txt           # multiple files

# Extract
tar -xzf archive.tar.gz
tar -xjf archive.tar.bz2
tar -xzf archive.tgz -C /target/dir/

# View contents (without extracting)
tar -tzf archive.tar.gz
tar -tjf archive.tar.bz2
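
When reading an archive file, both GNU tar and bsdtar detect the compression format on their own, so -z or -j is optional with -t and -x (it is still needed with -c). The sketch below compares the two listings; the directory and variable names are illustrative assumptions.

```shell
# Listing with and without -z should give the same result.
workdir=$(mktemp -d)                      # hypothetical scratch directory
mkdir "$workdir/docs"
echo "notes" > "$workdir/docs/a.txt"
(cd "$workdir" && tar -czf docs.tar.gz docs)

with_flag=$(tar -tzf "$workdir/docs.tar.gz")     # compression named explicitly
without_flag=$(tar -tf "$workdir/docs.tar.gz")   # compression auto-detected

if [ "$with_flag" = "$without_flag" ]; then
  echo "Same listing either way"
fi

rm -rf "$workdir"
```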

# Create with verbose output (-v lists each file as it is added)
tar -czvf archive.tar.gz large-dir/

Single-file gzip Compression

tar bundles multiple files; gzip compresses a single file:

# Compress (original is replaced with .gz)
gzip large-file.log
ls
# large-file.log.gz

# Decompress
gunzip large-file.log.gz
# or
gzip -d large-file.log.gz

# Keep original while compressing
gzip -k large-file.log
# large-file.log (kept) + large-file.log.gz (created)

# Check compression ratio
gzip -l large-file.log.gz
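
The -k option needs gzip 1.6 or newer. On older systems, gzip -c writes the compressed data to standard output and leaves the original alone, and gzip -dc reads a compressed file without unpacking it on disk. A small sketch with a generated sample log (the file contents and variable names are made up for illustration):

```shell
# Compress to stdout, keep the original, and read the result without unpacking.
workdir=$(mktemp -d)                      # hypothetical scratch directory
log="$workdir/large-file.log"
yes "INFO request handled in 12ms" | head -n 2000 > "$log"

# -c sends compressed data to stdout; the original file is untouched
gzip -c "$log" > "$log.gz"

original_bytes=$(wc -c < "$log")
compressed_bytes=$(wc -c < "$log.gz")
echo "original: $original_bytes bytes, compressed: $compressed_bytes bytes"

# -dc decompresses to stdout, so the data can be piped straight into other tools
line_count=$(gzip -dc "$log.gz" | wc -l)
echo "lines in compressed log: $line_count"

rm -rf "$workdir"
```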

bzip2 and xz: Higher Compression

# bzip2 (better compression than gzip, slower)
bzip2 file.txt        # -> file.txt.bz2
bunzip2 file.txt.bz2  # -> file.txt

# xz (best compression, slowest)
xz file.txt        # -> file.txt.xz
unxz file.txt.xz   # -> file.txt

Compression Format Comparison

| Format | Extension | Compression | Speed  | Compatibility          |
| ------ | --------- | ----------- | ------ | ---------------------- |
| gzip   | .gz       | Moderate    | Fast   | Excellent              |
| bzip2  | .bz2      | Good        | Medium | Good                   |
| xz     | .xz       | Best        | Slow   | Moderate               |
| zip    | .zip      | Moderate    | Fast   | Best (Windows support) |

Choosing in practice:

  • Linux/server deployment: .tar.gz
  • Very large files: .tar.xz
  • Sharing with Windows users: .zip
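
The ratings in the table are generalizations, and results depend heavily on the data. One way to check on your own files is to compress the same input with each installed tool and compare sizes. This sketch uses a generated, highly repetitive sample, which exaggerates the ratios; the sample text and variable names are assumptions.

```shell
# Compare compressed sizes for whichever of gzip, bzip2, and xz are installed.
workdir=$(mktemp -d)                      # hypothetical scratch directory
sample="$workdir/sample.txt"
yes "2026-03-06 INFO request handled in 12ms" | head -n 5000 > "$sample"

original_bytes=$(wc -c < "$sample")
echo "original: $original_bytes bytes"

for tool in gzip bzip2 xz; do
  if command -v "$tool" > /dev/null 2>&1; then
    # -c writes to stdout, so the sample file is never modified
    size=$("$tool" -c "$sample" | wc -c)
    echo "$tool: $size bytes"
  else
    echo "$tool: not installed, skipped"
  fi
done

gzip_bytes=$(gzip -c "$sample" | wc -c)   # kept for a final sanity check
rm -rf "$workdir"
```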

"Why?" — When You Need Archives

Scenario 1: Server Deployment

# Deploy a project to a server
tar -czf deploy-20260306.tar.gz \
  --exclude='node_modules' \
  --exclude='.git' \
  myproject/

# Upload and extract on the server
scp deploy-20260306.tar.gz user@server:/tmp/
ssh user@server "tar -xzf /tmp/deploy-20260306.tar.gz -C /var/www/"

Scenario 2: Log File Archival

# Compress old logs for storage; delete the originals only if tar succeeded
# (-C keeps the absolute path out of the archive)
tar -czf logs-2025.tar.gz -C /var/log/app 2025 && rm -r /var/log/app/2025/

# Check size
du -sh logs-2025.tar.gz
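
For extra safety, the archive can be read back before the originals are deleted. The sketch below does this in a scratch directory that stands in for /var/log/app; the paths and variable names are illustrative assumptions.

```shell
# Archive a log directory and delete it only after the archive reads back cleanly.
workdir=$(mktemp -d)                      # stands in for /var/log (hypothetical)
mkdir -p "$workdir/app/2025"
echo "old entry" > "$workdir/app/2025/app.log"
archive="$workdir/logs-2025.tar.gz"

tar -czf "$archive" -C "$workdir/app" 2025

# Listing reads the whole archive; a nonzero exit status means it is damaged
if tar -tzf "$archive" > /dev/null; then
  rm -r "$workdir/app/2025"
  status="archived and removed"
else
  status="archive unreadable, originals kept"
fi
echo "$status"

# Record the end state before cleaning up
if [ -d "$workdir/app/2025" ]; then originals="present"; else originals="removed"; fi
rm -rf "$workdir"
```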

Scenario 3: Software Installation

# Download and install open source software
wget https://example.com/software-1.0.tar.gz
tar -xzf software-1.0.tar.gz
cd software-1.0/
./configure && make && sudo make install
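
Source tarballs usually unpack into a single top-level directory such as software-1.0/. If you would rather choose the directory name yourself, --strip-components=1 (supported by GNU tar and bsdtar) drops that first path component during extraction. A sketch using a locally built stand-in tarball; the src directory and the variable names are assumptions.

```shell
# Extract a tarball into a directory of your choice, dropping its top-level folder.
workdir=$(mktemp -d)                      # hypothetical scratch directory
mkdir -p "$workdir/software-1.0" "$workdir/src"
echo "#!/bin/sh" > "$workdir/software-1.0/configure"
tar -czf "$workdir/software-1.0.tar.gz" -C "$workdir" software-1.0

# Peek at the first member to see the top-level directory name
top_level=$(tar -tzf "$workdir/software-1.0.tar.gz" | head -n 1)
echo "top-level entry: $top_level"

# Remove the first path component so files land directly in src/
tar -xzf "$workdir/software-1.0.tar.gz" -C "$workdir/src" --strip-components=1

if [ -f "$workdir/src/configure" ]; then extracted="yes"; else extracted="no"; fi
echo "configure extracted directly into src/: $extracted"
rm -rf "$workdir"
```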

Common Mistakes

Mistake 1: tar option order

# Wrong: -f takes whatever follows it as the archive name
tar -fczv archive.tar.gz dir/   # here "czv" becomes the archive name

# Correct: put -f last (or right before the filename)
tar -czvf archive.tar.gz dir/

Mistake 2: Archiving with absolute paths

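# Risky: the member is named by its full path
tar -czf archive.tar.gz /etc/hosts
# GNU tar and bsdtar strip the leading "/" by default (with a warning),
# but with -P (--absolute-names) extraction writes straight to /etc/hosts!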

# Safe: use relative paths or cd first
cd /etc
tar -czf ~/archive.tar.gz hosts
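
The same result is possible without changing your shell's directory: -C also works when creating an archive, telling tar to switch directories before it reads the files. A minimal sketch, with a scratch directory standing in for /etc (an assumption, so the example never touches system files):

```shell
# Use -C at creation time so members are stored with relative names.
workdir=$(mktemp -d)                      # hypothetical scratch directory
mkdir "$workdir/etc"
echo "127.0.0.1 localhost" > "$workdir/etc/hosts"

# tar changes into $workdir/etc first, then adds "hosts" by its relative name
tar -czf "$workdir/archive.tar.gz" -C "$workdir/etc" hosts

members=$(tar -tzf "$workdir/archive.tar.gz")
echo "$members"                           # the member name has no directory prefix

rm -rf "$workdir"
```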

Mistake 3: Double-compressing already compressed files

# No point in tar.gz for already compressed files
# .jpg, .mp4 and similar formats are already compressed
tar -czf images.tar.gz *.jpg  # almost no compression benefit
tar -cf images.tar *.jpg      # just bundle without compressing
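
The effect is easy to measure. Random bytes behave like already-compressed data, so the sketch below uses /dev/urandom as a stand-in for a .jpg and compares it with repetitive text of the same size. File names and variables are illustrative assumptions.

```shell
# Compare gzip on incompressible data versus repetitive text.
workdir=$(mktemp -d)                      # hypothetical scratch directory
head -c 200000 /dev/urandom > "$workdir/random.bin"               # stand-in for a .jpg
yes "the same line again" | head -n 10000 > "$workdir/text.txt"   # 200000 bytes of text

random_before=$(wc -c < "$workdir/random.bin")
random_after=$(gzip -c "$workdir/random.bin" | wc -c)
text_before=$(wc -c < "$workdir/text.txt")
text_after=$(gzip -c "$workdir/text.txt" | wc -c)

echo "random data: $random_before -> $random_after bytes"
echo "text data:   $text_before -> $text_after bytes"

rm -rf "$workdir"
```

The random file should come out at roughly its original size, while the text file shrinks dramatically.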

Deep Dive

Excluding files from tar

# Exclude specific files/directories
# (GNU tar applies --exclude only to paths listed after it, so put it before the directory)
tar -czf project.tar.gz \
  --exclude='project/node_modules' \
  --exclude='project/.git' \
  --exclude='*.log' \
  --exclude='*.tmp' \
  project/

Useful when creating deployment archives without unnecessary files.
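
Before shipping such an archive, it is worth listing it to confirm the exclusions worked. A sketch with a throwaway project tree; the file names and variable names are assumptions made for the example.

```shell
# Build an archive with exclusions and confirm the excluded paths are absent.
workdir=$(mktemp -d)                      # hypothetical scratch directory
mkdir -p "$workdir/project/node_modules/lib" "$workdir/project/src"
echo "code"  > "$workdir/project/src/app.js"
echo "dep"   > "$workdir/project/node_modules/lib/index.js"
echo "noise" > "$workdir/project/debug.log"

# Excludes come before the directory they should apply to
tar -czf "$workdir/project.tar.gz" -C "$workdir" \
  --exclude='node_modules' \
  --exclude='*.log' \
  project

members=$(tar -tzf "$workdir/project.tar.gz")
echo "$members"

# Count members that should have been excluded (expected: 0)
excluded_hits=$(echo "$members" | grep -c -e 'node_modules' -e '\.log$')
echo "excluded paths found in archive: $excluded_hits"

rm -rf "$workdir"
```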

Extracting specific files from an archive

# View contents
tar -tzf archive.tar.gz

# Extract a specific file
tar -xzf archive.tar.gz myproject/config.yaml

# Extract files matching a pattern (--wildcards is a GNU tar option)
tar -xzf archive.tar.gz --wildcards '*.txt'
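
To read a single file without writing anything to disk, -O (--to-stdout) sends the extracted member to standard output. A minimal sketch; the archive contents and variable names are made up for illustration.

```shell
# Print one archive member to stdout instead of extracting it to disk.
workdir=$(mktemp -d)                      # hypothetical scratch directory
mkdir "$workdir/myproject"
echo "port: 8080" > "$workdir/myproject/config.yaml"
echo "print('hi')" > "$workdir/myproject/main.py"
tar -czf "$workdir/archive.tar.gz" -C "$workdir" myproject

# -O writes the member's content to stdout; nothing is created on disk
config=$(tar -xzf "$workdir/archive.tar.gz" -O myproject/config.yaml)
echo "$config"

rm -rf "$workdir"
```
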
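
Parallel compression with pigz

# Install pigz (parallel gzip); this is the Homebrew command for macOS
brew install pigz

# Parallel compression (pigz spreads the work across all CPU cores)
# -I is GNU tar syntax; with other tar versions, pipe instead:
#   tar -cf - large-directory/ | pigz > archive.tar.gz
tar -I pigz -cf archive.tar.gz large-directory/

# Extraction through pigz (decompression is mostly single-threaded, so the gain is smaller)
tar -I pigz -xf archive.tar.gz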

If you compress large files frequently, pigz provides a significant speed boost.
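
Since pigz writes ordinary gzip output, a script can use it when available and fall back to gzip otherwise. The sketch below uses a pipe, which works with any tar implementation; the compressor variable and the sample data are assumptions.

```shell
# Use pigz when installed, otherwise fall back to gzip; the output format is the same.
workdir=$(mktemp -d)                      # hypothetical scratch directory
mkdir "$workdir/large-directory"
yes "sample data line" | head -n 5000 > "$workdir/large-directory/data.txt"

if command -v pigz > /dev/null 2>&1; then
  compressor=pigz
else
  compressor=gzip
fi
echo "compressing with: $compressor"

# tar writes the uncompressed archive to stdout (-f -); the compressor reads it from stdin
tar -cf - -C "$workdir" large-directory | "$compressor" > "$workdir/archive.tar.gz"

# Either way the result is a regular .tar.gz that plain tar can list
members=$(tar -tzf "$workdir/archive.tar.gz")
echo "$members"

rm -rf "$workdir"
```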

  1. Create a practice directory with a few files in it.
  2. Compress it with tar -czf test.tar.gz directory/.
  3. View the contents with tar -tzf test.tar.gz.
  4. Extract to a different location with tar -xzf test.tar.gz -C /tmp/.
  5. Compress a single file with gzip filename and compare sizes with ls -lh.

Q1. In tar -xzf archive.tar.gz, which correctly describes each option?

  • A) -x create, -z gzip, -f specify file
  • B) -x extract, -z gzip, -f specify file
  • C) -x list, -z compress, -f force
  • D) -x extract, -z zip, -f specify file