Hands-on: Linux refresher¶

Linux history¶

UNIX is a computer operating system originally developed in 1969 by a group of AT&T employees at Bell Labs.

Many UNIX and UNIX-like systems have appeared since then:

BSD
AIX
HP-UX
Linux
etc.

UNIX Philosophy¶

Portable: The same code should behave the same way on different machines
Multi-tasking: Different processes can run simultaneously. Every process has a unique identifier (PID)
Multi-user: Many people can use the same machine at the same time. Users can share resources and processes
Use of plain-text for storing data: also for configuration files
Hierarchical file system
Almost everything is a file: That includes devices and some information of processes
Use of small programs, each doing one task, combined to produce a result instead of a single multifunctional program
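Two of these ideas can be tried directly in the shell. The sketch below is only an illustration with made-up input words: it prints the PID of the current shell and chains three small programs to count distinct words.

```shell
# $$ expands to the PID of the current shell process
echo "This shell runs as PID $$"

# Three small programs chained with pipes: printf emits the words,
# sort -u keeps one copy of each, wc -l counts the remaining lines
distinct=$(printf 'gene\nexon\ngene\n' | sort -u | wc -l)
echo "Distinct words: $distinct"
```

None of the three programs knows about the others; the pipe is what combines them.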

Filesystem structure. Linux and Mac¶

/ (Root)
├── bin (Binaries)
├── boot (Boot files)
├── dev (Device files)
├── etc (Configuration files)
├── home (User personal data)
├── lib (Shared libraries)
├── media (Removable media)
├── mnt (Mount directory)
├── opt (Optional software)
├── proc (Process & kernel files)
├── root (Root user home)
├── sbin (System binaries)
├── srv (Service data)
├── tmp (Temporary files)
├── usr (User binaries & data)
│   ├── bin
│   ├── sbin
│   ├── lib
│   └── share
└── var (Variable data)
    └── log

You can inspect it with ls /. We will discover more later…

Other UNIX systems (e.g., macOS, formerly Mac OS X) are similar, but not exactly the same.

/ (Root)
├── Applications/ (User apps like Chrome, Spotify)
├── Library/ (Shared system resources & settings)
├── System/ (Core macOS files; read-only)
│   └── Applications/ (Built-in apps like Safari, Mail)
├── Users/ (Equivalent to /home in Linux)
│   ├── yourusername/ (Desktop, Documents, etc.)
│   └── Shared/ (Files accessible to all users)
├── Volumes/ (Mounted drives, USBs, and DMGs)
├── bin/ (Essential Unix binaries) -> [Hidden]
├── sbin/ (System administration binaries) -> [Hidden]
├── usr/ (User binaries, libraries, and include files) -> [Hidden]
├── dev/ (Device files) -> [Hidden]
├── private/ (Contains actual etc, var, and tmp) -> [Hidden]
│   ├── etc/ (System configuration files)
│   ├── var/ (Variable data, logs, caches)
│   └── tmp/ (Temporary files)
└── cores/ (Core dumps for debugging) -> [Hidden]

Move around¶

We interact with the operating system through a command-line interface (CLI) provided by a shell, in our case Bash.

Relative paths¶

cd -> go to home
cd ../ -> go one level back
cd Desktop -> go to Desktop
cd ../../ -> go two levels back
cd ../Desktop/.. -> ends up one level back, like cd ../ (provided ../Desktop exists)
cd . -> go nowhere - ‘.’ means current location

Absolute paths¶

cd / -> go to root
cd /home/training/Desktop
cd ~ -> go to home
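The difference between relative and absolute paths can be explored safely in a throwaway directory. This is a minimal sketch: the tree created under a temporary directory (home/training/Desktop) is hypothetical and only mirrors the path used above.

```shell
# Build a throwaway tree: <tmp>/home/training/Desktop (hypothetical names)
base=$(mktemp -d)
mkdir -p "$base/home/training/Desktop"

cd "$base/home/training/Desktop"   # absolute path: works from anywhere
cd ../                             # relative: one level back
after_up=$(basename "$PWD")        # training
cd Desktop/..                      # into Desktop and straight back out
after_loop=$(basename "$PWD")      # still training
cd ./Desktop                       # '.' is the current directory
after_down=$(basename "$PWD")      # Desktop
echo "$after_up $after_loop $after_down"
```

A relative path is resolved from wherever you currently are, so the same cd command can lead to different places; an absolute path always leads to the same place.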

Play with ls and different modifiers: ls -l, ls -lt, ls -la, etc.

Note

Sometimes we get lost and would like to know where we are. We can use the command pwd

PRO TIP

Don’t use man, visit tldr.sh

You can also use it from the terminal:

uv tool install tldr
tldr ls

Create files and directories¶

Creating files and folders in Linux is quite simple: touch creates an empty file and mkdir creates an empty directory.

touch my_beautiful_file.txt
mkdir my_beautiful_folder

To display the list of files and folders we can use the command ls

ls

my_beautiful_file.txt  my_beautiful_folder

To rename a file (or a directory) use the command mv; to copy it, use cp. Adding the option -r (recursive) to cp lets you copy a whole folder and its contents.

mv my_beautiful_file.txt my_ugly_file.txt
mv my_beautiful_folder my_ugly_folder

cp my_ugly_file.txt my_beautiful_file.txt
cp -r my_ugly_folder my_beautiful_folder

If you omit the -r option, cp will refuse to copy the directory (the exact message depends on the cp version):

cp my_ugly_folder my_other_folder

cp: omitting directory ‘my_ugly_folder’

You can also use mv to move a file (or a directory) into a folder. Likewise, cp can place a copy inside a folder.

mv my_beautiful_file.txt my_beautiful_folder
cp my_ugly_file.txt my_ugly_folder

ls

my_beautiful_folder  my_ugly_file.txt  my_ugly_folder

To enter a folder we can use the command cd

cd my_ugly_folder

ls

my_ugly_file.txt

To leave it we can move one level up

cd ../

ls

my_beautiful_folder  my_ugly_file.txt  my_ugly_folder

We can write to a file using the character >, which redirects the output of a command to that file (overwriting any previous content).

echo "ATGTACTGACTGCATGCATGCCATGCA" > my_dna.txt

And display the content of the file using the program cat

cat my_dna.txt

ATGTACTGACTGCATGCATGCCATGCA

To convert this DNA sequence into an RNA one, we can replace the T base with U by using the program sed. Its substitution syntax is s/<TO BE REPLACED>/<REPLACEMENT>/.

By default only the first match on each line is replaced. Add a g at the end to replace every match: s/<TO BE REPLACED>/<REPLACEMENT>/g.

sed 's/T/U/g' my_dna.txt > my_rna.txt

cat my_rna.txt

AUGUACUGACUGCAUGCAUGCCAUGCA
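The effect of the trailing g is easiest to see side by side. The sketch below reuses the sequence from above and keeps everything in shell variables (the variable names are only illustrative); tr is shown as an alternative for single-character replacements.

```shell
dna="ATGTACTGACTGCATGCATGCCATGCA"

first_only=$(echo "$dna" | sed 's/T/U/')   # without g: only the first T changes
every_t=$(echo "$dna" | sed 's/T/U/g')     # with g: every T changes
with_tr=$(echo "$dna" | tr 'T' 'U')        # tr maps single characters, same result as with g

echo "$first_only"   # AUGTACTGACTGCATGCATGCCATGCA
echo "$every_t"      # AUGUACUGACUGCAUGCAUGCCAUGCA
echo "$with_tr"      # AUGUACUGACUGCAUGCAUGCCAUGCA
```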

Let’s remove something:

rm my_rna.txt

Let’s try to remove the previous directory:

rm my_beautiful_folder
# Fails: rm does not remove directories unless -r is given
rmdir my_beautiful_folder
# Fails too: rmdir only removes empty directories
rm my_beautiful_folder/* && rmdir my_beautiful_folder
# Alternative to the previous line
rm -r my_beautiful_folder # Recursive! Use with care

With symbolic links we can make files and directories stored in other locations available in more convenient places (e.g., for performing analyses) without copying them, and so we save some space. A link stops working if its target is moved or deleted, so be careful when using links on shared storage.

ln -s my_dna.txt your_dna.txt
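The following sketch, run in a temporary directory so that nothing in your working directory is touched, shows what the link is and what happens when its target disappears. The variable names are only illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir"
echo "ATGTACTGACTGCATGCATGCCATGCA" > my_dna.txt
ln -s my_dna.txt your_dna.txt

ls -l your_dna.txt                     # shows: your_dna.txt -> my_dna.txt
link_target=$(readlink your_dna.txt)   # the path stored in the link
via_link=$(cat your_dna.txt)           # reading the link reads the target

rm my_dna.txt                          # removing the target leaves a dangling link
if [ -L your_dna.txt ] && [ ! -e your_dna.txt ]; then
    link_state="dangling"
else
    link_state="ok"
fi
echo "$link_target / $via_link / $link_state"
```

The link only stores a path, which is why it takes almost no space and why it breaks when the target goes away.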

Recap¶

touch creates empty files
mkdir creates empty directories
mv moves files (or directories) or changes their name
ls lists files and directories
cp copies files and directories
cd changes the current directory
pwd displays where you are
echo prints values to standard output
cat prints the content of a file to standard output
sed replaces a string with another

rm removes a file
rmdir removes an empty directory
ln provides links to files/directories in other locations

Download files¶

After copying the address of the file we want (e.g., from a web browser), we can go back to the command line, paste the address (CTRL+SHIFT+V in most Linux terminals) and download the file with the program wget:

wget ftp://ftp.ensemblgenomes.org/pub/bacteria/release-42/fasta/bacteria_22_collection/escherichia_coli_bl21_gold_de3_plyss_ag_/dna/README

Save it under a specific filename with -O:

wget -O my_readme.txt ftp://ftp.ensemblgenomes.org/pub/bacteria/release-42/fasta/bacteria_22_collection/escherichia_coli_bl21_gold_de3_plyss_ag_/dna/README

Capture the log (both stdout and stderr) in a file:

wget -O my_readme.txt ftp://ftp.ensemblgenomes.org/pub/bacteria/release-42/fasta/bacteria_22_collection/escherichia_coli_bl21_gold_de3_plyss_ag_/dna/README > wget.log 2>&1

A similar tool is curl:

curl -o my_readme.txt ftp://ftp.ensemblgenomes.org/pub/bacteria/release-42/fasta/bacteria_22_collection/escherichia_coli_bl21_gold_de3_plyss_ag_/dna/README > curl.log 2>&1

curl is more versatile: besides downloading files it can, for example, send requests to test API services.

Piping¶

Standard Streams [2]

Pipe stdout to another command:

ls | grep ".txt"

Lists files, then filters for .txt files.

Redirect stdout to a file (overwrite):

echo "Hello" > output.txt

Append stdout to a file:

echo "World" >> output.txt

Redirect stderr to a file:

ls non_existing_file 2> error.log

Redirect both stdout and stderr to a file:

command > all_output.log 2>&1
# Use the wget or curl example above
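To see the two output streams separately without needing a network connection, the sketch below defines a small helper function (talk, a hypothetical name) that writes one line to each stream, and then redirects it in different ways inside a temporary directory.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical helper that writes one line to each stream
talk() {
    echo "result line"          # goes to stdout (file descriptor 1)
    echo "warning line" >&2     # goes to stderr (file descriptor 2)
}

talk > out.log 2> err.log       # each stream to its own file
talk > all.log 2>&1             # both streams to the same file
talk 2>&1 > swapped.log         # order matters: stderr is not captured here

cat out.log err.log
wc -l < all.log
```

Redirections are processed from left to right. In the last call, 2>&1 points stderr at wherever stdout is going at that moment (the terminal), and only afterwards is stdout sent to the file, so swapped.log receives just the stdout line.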

Basic commands for manipulating text files¶

We will use some of them during the course in practical examples.

more / less paginate the contents of a file when it is large and cat is not convenient
tar packs several files into a single archive and unpacks them again
gzip / gunzip compress and uncompress files; compressed files normally get the .gz extension
zcat works like cat but on gzipped files
grep finds lines matching a pattern
cut extracts selected fields (columns) or characters from each line
head prints the first lines
tail prints the last lines
sort sorts the lines of a file or stdin
wc counts lines, words, and characters

Below are a couple of examples:

# Let's download a gzipped FASTQ file
wget https://biocorecrg.github.io/RNAseq_coursesCRG_2026/latest/data/reads/SRR3091420_1_chr6.fastq.gz
# We see its contents without uncompressing it
zcat SRR3091420_1_chr6.fastq.gz
# We pipe it to the paginator
zcat SRR3091420_1_chr6.fastq.gz | less
# We look for 55 within the file
zcat SRR3091420_1_chr6.fastq.gz | grep 55
# We look for 55 within the file and we read interactively
zcat SRR3091420_1_chr6.fastq.gz | grep 55 | less
# We get the first 10 lines of the file
zcat SRR3091420_1_chr6.fastq.gz | head -n 10
# We get the last 10 lines of the file
zcat SRR3091420_1_chr6.fastq.gz | tail -n 10
# We uncompress the file
gunzip SRR3091420_1_chr6.fastq.gz
# We paginate the uncompressed file
less SRR3091420_1_chr6.fastq
# We compress it back
gzip SRR3091420_1_chr6.fastq

# Let's download a tar archive
wget https://biocorecrg.github.io/RNAseq_coursesCRG_2026/latest/data/annotation/reference_chr6_Hsapiens.tar.gz
# Let's inspect its contents
tar tf reference_chr6_Hsapiens.tar.gz
# reference_chr6/
# reference_chr6/Homo_sapiens.GRCh38.dna.chrom6.fa.gz
# reference_chr6/Homo_sapiens.GRCh38.115.chr6.gtf.gz
# reference_chr6/gencode.v49.transcripts.chr6.fa.gz
# Let's extract a specific file
tar xf reference_chr6_Hsapiens.tar.gz reference_chr6/Homo_sapiens.GRCh38.115.chr6.gtf.gz
# Let's inspect its last 10 lines
zcat reference_chr6/Homo_sapiens.GRCh38.115.chr6.gtf.gz | tail -n 10
# Let's extract the fourth column (the feature start position in a GTF file) of the last 100 lines and save it to a file
zcat reference_chr6/Homo_sapiens.GRCh38.115.chr6.gtf.gz | tail -n 100 | cut -f 4 > start-points.txt
# Let's inspect that file and sort its contents
# (sort compares text by default; use sort -n for numeric order)
less start-points.txt
sort start-points.txt | less
sort start-points.txt | wc -l
sort -u start-points.txt | less
sort -u start-points.txt | wc -l
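The same tools can be tried offline on a tiny file whose expected results are easy to check by eye. This is a minimal sketch assuming a Linux system with zcat available; the file toy.tsv and its three tab-separated rows are made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical tab-separated table: sequence, feature, start, end
printf 'chr6\tgene\t900\t1500\n'   > toy.tsv
printf 'chr6\texon\t900\t1200\n'  >> toy.tsv
printf 'chr6\texon\t1300\t1500\n' >> toy.tsv
gzip toy.tsv                                  # replaces toy.tsv with toy.tsv.gz

zcat toy.tsv.gz | head -n 1                   # first line only
n_exons=$(zcat toy.tsv.gz | grep -c exon)     # number of lines containing "exon"
zcat toy.tsv.gz | cut -f 3 | sort -u          # text order: 1300 comes before 900
starts=$(zcat toy.tsv.gz | cut -f 3 | sort -n -u)   # numeric order: 900, then 1300
n_starts=$(echo "$starts" | wc -l)
echo "$n_exons exon lines, $n_starts distinct start positions"
```

Three rows share two distinct start positions, so sort -u followed by wc -l reports 2, while the plain row count is 3.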

Running programs¶

Making and Running a Bash Script¶

Create a file called myscript.sh by running this command, which writes a single echo line into it:
```
echo 'echo "Hello, world!"' > myscript.sh
```
Make it executable:
```
chmod +x myscript.sh
```
Run it:
```
./myscript.sh
```
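Scripts usually start with a shebang line that tells the system which interpreter to use, and they can read arguments. The sketch below is one possible variation, not part of the course material: the script name greet.sh and its behaviour are hypothetical.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Write a hypothetical script, greet.sh, using a here-document;
# the quoted 'EOF' keeps $1 from being expanded while writing the file
cat > greet.sh << 'EOF'
#!/usr/bin/env bash
# Greets the name given as first argument, or "world" if none is given
name="${1:-world}"
echo "Hello, ${name}!"
EOF

chmod +x greet.sh
./greet.sh            # Hello, world!
./greet.sh training   # Hello, training!
```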

Finding running programs¶

When you run a program in Linux, the shell searches for its executable in directories listed in the PATH environment variable. You can check your current PATH with:

echo $PATH

$PATH is one of the many available environment variables. You can check all of them with env.

If you want to add a directory to your PATH permanently, for example the bin subdirectory of your $HOME directory, edit your .bashrc file (in your home directory) and add:

export PATH="$PATH:$HOME/bin"

After saving .bashrc, reload it with:

source ~/.bashrc

Environment variables like PATH control how programs are found and executed. Modifying .bashrc lets you customize your shell environment.

We can test it by placing myscript.sh in the directory we just added to PATH (creating the directory first if it does not exist):

mkdir -p ~/bin
mv myscript.sh ~/bin/
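The PATH lookup itself can be checked without touching your real .bashrc. This sketch uses a temporary directory as a stand-in for ~/bin and a hypothetical command name, hello-path; the change to PATH lasts only for the current shell session.

```shell
# Hypothetical throwaway directory standing in for ~/bin
demo_bin=$(mktemp -d)
printf '#!/usr/bin/env bash\necho "found via PATH"\n' > "$demo_bin/hello-path"
chmod +x "$demo_bin/hello-path"

# Append the directory to PATH for this shell session only
export PATH="$PATH:$demo_bin"

command -v hello-path     # prints the full path the shell resolved
hello-path                # runs without ./ or a full path
```

command -v is a convenient way to find out which file the shell will actually run for a given command name.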

Running containers¶

We will use pre-made containers, so we will not need to install many programs (or struggle with their dependencies).

For executing the tools from the container images, we will use Apptainer/Singularity.

Why Apptainer/Singularity?

Not as popular as Docker, but very convenient for bioinformatics and HPC environments
You don’t need to worry so much about the different kinds of UNIX permissions

For the command-line parts of the course that do not take place in RStudio, we will use a container image that we generated with Seqera Containers.

singularity pull RNAseq_course.sif docker://community.wave.seqera.io/library/fastq-screen_fastqc_kraken2_multiqc_pruned:3161b0b514b51263

singularity exec -e RNAseq_course.sif fastqc --version

Throughout the course we will use this shortcut to avoid typing so much:

# Important: RNAseq_course.sif must be in your home directory (~)
export RUN="singularity exec -e $HOME/RNAseq_course.sif"
$RUN fastqc --version
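The shortcut works because Bash splits an unquoted variable into separate words before running the command. The sketch below shows the mechanism with a stand-in prefix, since it assumes nothing about Singularity being installed: env GREETING=hello is a hypothetical substitute for the singularity exec prefix, chosen only because it is also a command that runs another command.

```shell
# Stand-in prefix (hypothetical): plays the role of 'singularity exec -e image.sif'
RUN="env GREETING=hello"

# Unquoted $RUN is split into two words: env and GREETING=hello
out=$($RUN sh -c 'echo "$GREETING from the wrapped command"')
echo "$out"

# A shell function is an alternative that keeps each argument intact
run() { env GREETING=hello "$@"; }
out_fn=$(run sh -c 'echo "$GREETING from the wrapped command"')
echo "$out_fn"
```

The variable form relies on word splitting, so it would break if the stored path contained spaces, and shells such as zsh do not split unquoted variables by default. The function form avoids both issues.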

Further recommendations¶

You can browse the contents of a directory in a Graphical User Interface (GUI) by calling the file explorer program with the target directory as its argument.

# This opens Nautilus - GUI file explorer in the current directory 
# Used in Ubuntu GNOME environment
nautilus .

There are also many CLI text editors available (e.g., vim, nano, etc.), but we can also choose to use GUI editors.

By default, the open <file> command opens the file with the default application associated with that file type.

open myfile.txt

We expect that text files in the Ubuntu GNOME environment will be opened with: gnome-text-editor.

Tip

If you execute open <directory>, it will actually open that directory with the default file explorer.

# Open current directory in a file explorer
open .