Since using Bash commands is inevitable if you work on NLP and MT tasks, I thought it would be useful to list the commands I have learnt to use on a daily basis, thanks to practice, searching, and helpful colleagues I have met over the years. Obviously, this is not an exhaustive list; however, I hope it includes most of the one-line Bash commands you would need. Please note that most of these commands have been tested mainly on Linux.
Table of Contents
- File Management
- Reading Files
- Nano Editor Commands
- Finding
- Downloading
- Compressing and Extracting
- Server-related Bash Commands
- Other Useful Packages
File Management
Open a directory:
cd <path/dir_name>
List the files and sub-directories in the current directory:
ls
Create a new directory:
mkdir <dir_name>
Rename or move a file or directory:
mv <old_filename> <new_filename>
Move a file to a directory:
mv <old_filename> <dir_name>
Move all files whose names start with a string, using *:
mv <old_filename>* <folder_name>
Rename multiple files:
rename 's/<original_string>/<new_string>/g' *
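The rename command differs between distributions (the Perl-based version accepts an s/…/…/ expression, while the util-linux version uses a different syntax), so a plain loop can serve as a fallback. This is only a minimal sketch; the scratch directory and the draft_*.txt file names are hypothetical:

```shell
#!/usr/bin/env bash
# Scratch directory with hypothetical file names, so nothing real is touched.
workdir=$(mktemp -d)
touch "$workdir/draft_a.txt" "$workdir/draft_b.txt"

# Rename each file by applying a sed substitution to its name.
for f in "$workdir"/draft_*; do
  base=$(basename "$f")
  new=$(printf '%s\n' "$base" | sed 's/draft/final/')
  mv -- "$f" "$workdir/$new"
done

renamed=$(ls "$workdir" | tr '\n' ' ')
echo "$renamed"
```

Replacing mv with echo mv in the loop gives a dry run that only prints what would be renamed.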
Delete a file:
rm <file_name>
To delete multiple files, just add them after the rm command separated by spaces:
rm <file_name1> <file_name2> <file_name3>
Delete any file that starts with “wow”, using *:
rm wow*
Delete a directory and its contents:
rm -r <dir_name>
To avoid deleting files by mistake, use trash instead of rm; first install trash-cli:
sudo apt-get install trash-cli
• Delete:
trash <file_name>
• List trashed items:
trash-list
• Restore a file (first cd to the root folder or a specific folder):
restore-trash (named trash-restore in newer versions of trash-cli), and then type the number of the item to restore.
• Empty the trash list:
trash-empty
Copy a file:
cp <original_filename> <new_filename>
Copy a directory and its contained files (at least -r is required):
cp -avr <original_dirname> <new_dirname>
Copy and show a progress bar (good for large files)
rsync -ah --progress <source> <destination>
Complete a command or file name (e.g. my_file_name.txt):
- Type my and then press Tab once if no other file starts with “my”.
 OR
- Type my and then press Tab twice to list the files that start with “my”.
Move to a location in a command or text:
Move the cursor to the location, press Alt or Option, and click.
Clear the current window:
- Type clear
 OR
- Press Ctrl+l
End the current command (before it finishes):
- Press Ctrl+c
Move to the last accessed path:
cd -
List your previous commands
history
Search your command history
Ctrl+r
List the *.txt files in the current directory (or path):
ls *.txt
Show the files in all folders whose names start with “aaa”:
ls aaa*
Show files and subdirectories in all directories in the current directory:
ls *
List all the files with details:
ls -l
Display file details:
ls -l <file_name>
List all the files with details, the size is in MB/GB:
ls -lh
List all the files with details, the size in MB/GB, sorted by time, newest first:
ls -lht
OR, for a nested path:
ls -lht <dir_name1>/*/<dir_name2>
List all the files with details, the size in MB/GB, sorted by time, oldest first:
ls -lhtr
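As a quick check of the sort direction: -t on its own puts the most recently modified entry first, and adding -r reverses that. The sketch below uses two hypothetical files with fixed timestamps in a scratch directory:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
# Give two hypothetical files fixed modification times (YYYYMMDDhhmm).
touch -t 202001010000 "$workdir/old.txt"
touch -t 202301010000 "$workdir/new.txt"

newest_first=$(ls -t "$workdir" | head -n 1)   # -t: newest entry at the top
oldest_first=$(ls -tr "$workdir" | head -n 1)  # -r reverses: oldest at the top
echo "ls -t starts with:  $newest_first"
echo "ls -tr starts with: $oldest_first"
```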
List file sizes only for all files in the current directory:
ls -hs
OR
du
Display the file size only:
ls -hs <file_name>
OR
du -h <file_name>
OR, for a single total covering all directories and files:
du -hs <file_name>
Display the last modified file:
ls -t | head -1
Display sizes of the current directory:
du -d 1 -h .
Sort the results in ascending order:
du -d 1 -h . | sort -h
Sort the results in descending order:
du -d 1 -h . | sort -h -r
Find files that are bigger than 200MB:
find /home/$USER/ -type f -size +200000k -exec ls -lh {} \; | awk '{ print $9 ": " $5 }'
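Because the awk step prints the ninth whitespace-separated field of the ls -l output, file names that contain spaces get cut off. One possible alternative is to let du -h print the size and the full path. In this sketch the threshold is lowered to 1000k and the two test files are hypothetical, so that it runs quickly in a scratch directory:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
# Hypothetical test files: one of 2 MiB, one of a few bytes.
head -c 2097152 /dev/zero > "$workdir/big file.bin"
echo "tiny" > "$workdir/small.txt"

# -size +1000k matches files larger than 1000 KiB; du -h prints "size<TAB>path".
found=$(find "$workdir" -type f -size +1000k -exec du -h {} +)
echo "$found"

# Only the name of the match, to show that the space in it survives.
found_name=$(find "$workdir" -type f -size +1000k -exec basename {} \;)
echo "$found_name"
```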
Display file size with stat (Linux):
stat --printf="%s" <file_name>
Display file last edited time (Linux):
stat -c %y <file_name>
Display file last edited time (Mac):
stat -x <file_name>
Get the current path (print working directory):
pwd
Create a symbolic link, i.e. a shortcut to a file or directory:
ln -s <file_name> <shortcut_name>
Get the path of a file:
readlink -f <file_name>
OR
echo "$(pwd)/<file_name>"
OR
realpath <file_name>
Get word count in a file:
wc <file_name>
Get the number of lines in a file:
wc -l <file_name>
Count lines of all files in subdirectories; use * if the file name is partial:
find ./ -type f -name "<file_name>" -exec wc -l {} +
Count lines in a *.gz file; use -c to avoid writing the uncompressed file to disk:
gunzip -c <file_name.gz> | wc -l
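A small sketch of the same idea on a hypothetical five-line file, showing that only the compressed file remains on disk afterwards:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
# Hypothetical five-line file, compressed in place to sample.txt.gz.
seq 1 5 > "$workdir/sample.txt"
gzip "$workdir/sample.txt"

# -c streams the decompressed text to stdout; nothing is written to disk.
line_count=$(gunzip -c "$workdir/sample.txt.gz" | wc -l | tr -d ' ')
remaining=$(ls "$workdir")
echo "lines: $line_count"
echo "files on disk: $remaining"
```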
Split a file into multiple files, 3000 lines each, with numeric-suffixes:
split -a 4 -d -l 3000 --additional-suffix=<extension> <file_name> <prefix>
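To see what the options produce, here is a scaled-down sketch that assumes GNU split (--additional-suffix is a GNU extension). The 7-line input, the part_ prefix, and the 3-line chunk size are hypothetical:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
seq 1 7 > numbers.txt   # hypothetical 7-line input

# 3 lines per chunk, 4-digit numeric suffixes, ".txt" appended (GNU split).
split -a 4 -d -l 3 --additional-suffix=.txt numbers.txt part_

chunk_count=$(ls part_*.txt | wc -l | tr -d ' ')
last_chunk_lines=$(wc -l < part_0002.txt | tr -d ' ')
ls part_*.txt
echo "chunks: $chunk_count, lines in the last chunk: $last_chunk_lines"
```

The numeric suffixes start at 0000, and the last chunk simply holds whatever lines are left over.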
Find out if two files are identical:
cmp --silent <file_name1> <file_name2> || echo "----> Files are different."
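In silent mode, cmp reports its result only through the exit status (0 for identical files, non-zero otherwise), which is why it has to be combined with || or an if statement rather than a pipe. A minimal sketch, with hypothetical files and a hypothetical helper named same_or_diff:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
printf 'same\n' > "$workdir/a.txt"
printf 'same\n' > "$workdir/b.txt"
printf 'other\n' > "$workdir/c.txt"

# cmp -s prints nothing; its exit status is 0 for identical files, 1 otherwise.
same_or_diff() {
  if cmp -s "$1" "$2"; then echo "identical"; else echo "different"; fi
}

result_ab=$(same_or_diff "$workdir/a.txt" "$workdir/b.txt")
result_ac=$(same_or_diff "$workdir/a.txt" "$workdir/c.txt")
echo "a vs b: $result_ab"
echo "a vs c: $result_ac"
```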
Find out the difference between two files:
diff <file_name1> <file_name2>
Find lines that are in file1.txt but not in file2.txt:
comm -23 <(sort file1.txt) <(sort file2.txt) > different.txt
Find common lines in both file1.txt and file2.txt:
comm -12 <(sort file1.txt) <(sort file2.txt) > common.txt
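The two comm calls can be tried on a toy example. This sketch writes the sorted copies to intermediate files instead of using <( ), purely so each step is visible; the file contents are hypothetical:

```shell
#!/usr/bin/env bash
export LC_ALL=C   # comm expects both inputs sorted in the same collation order
workdir=$(mktemp -d)
printf 'cherry\napple\nbanana\n' > "$workdir/file1.txt"
printf 'date\nbanana\ncherry\n' > "$workdir/file2.txt"
sort "$workdir/file1.txt" > "$workdir/file1.sorted"
sort "$workdir/file2.txt" > "$workdir/file2.sorted"

# -23 hides columns 2 and 3, leaving lines found only in the first file.
only_in_first=$(comm -23 "$workdir/file1.sorted" "$workdir/file2.sorted")
# -12 hides columns 1 and 2, leaving lines found in both files.
in_both=$(comm -12 "$workdir/file1.sorted" "$workdir/file2.sorted")
echo "only in file1: $only_in_first"
echo "in both: $(echo "$in_both" | tr '\n' ' ')"
```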
Complete a long command in a new line:
\
Reading Files
Read the whole file:
cat <file_name>
Read the whole file; display line numbers:
cat -n <file_name>
Read the first 10 lines of a file:
head <file_name>
Read the first 4 lines of a file:
head -4 <file_name> OR
head -n 4 <file_name>
Read the first 3 lines of two files:
head -q -n 3 <file_name1> <file_name2>
Read the last 10 lines of a file:
tail <file_name>
Read the last 3 lines of a file:
tail -3 <file_name> OR
tail -n 3 <file_name>
Read a specific line of a file, e.g. line #10:
sed -n 10p <file_name>
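The same sed form also accepts a range, and head piped into tail is another way to reach a single line. A short sketch on a hypothetical file in which line N contains the number N:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
seq 1 20 > "$workdir/numbers.txt"   # hypothetical file: line N contains N

line_10=$(sed -n '10p' "$workdir/numbers.txt")        # a single line
lines_5_to_7=$(sed -n '5,7p' "$workdir/numbers.txt")  # an inclusive range
# head keeps the first 10 lines, tail then keeps the last of those.
also_line_10=$(head -n 10 "$workdir/numbers.txt" | tail -n 1)
echo "$line_10 / $also_line_10"
echo "$lines_5_to_7"
```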
Read the end of the file and use -f to update the output:
tail -f <file_name>
Use Ctrl+c to exit.
Read a file in chunks:
less <file_name>
Press Space to move to the next chunk of the file (Enter moves one line at a time), and “q” to quit.
Read a file in chunks, display line numbers:
less -N <file_name>
Disable sending to stdout (i.e. printing in Terminal) by adding 1> /dev/null
cat <file_name1> <file_name2> | tee <output_file_name> 1> /dev/null
Processing
Merge two files, use > to create the output file:
cat <file_name1> <file_name2> > <output_file>
Merge all the files that end with (say “.en”) to a file (e.g. “all.en”):
cat *.en > all.en
Merge all the files in the current folder:
cat * > <output_file_name>
Merge the source text and target translation into one tab-delimited file
paste -d "\t" all.en all.ar > all.enar
Remove duplicates from a file:
sort -S 95% --parallel=8 all.enar | uniq > all.unique.enar
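The choice of uniq option matters here: plain uniq keeps one copy of every line, whereas uniq -u keeps only the lines that were never repeated and drops duplicated lines altogether. A sketch on hypothetical data:

```shell
#!/usr/bin/env bash
export LC_ALL=C
workdir=$(mktemp -d)
# Hypothetical data: "a" appears twice, "b" and "c" once each.
printf 'b\na\nc\na\n' > "$workdir/lines.txt"

deduplicated=$(sort "$workdir/lines.txt" | uniq)        # one copy of every line
same_result=$(sort -u "$workdir/lines.txt")             # shorthand for sort | uniq
only_singletons=$(sort "$workdir/lines.txt" | uniq -u)  # drops "a" entirely
echo "uniq:    $(echo "$deduplicated" | tr '\n' ' ')"
echo "uniq -u: $(echo "$only_singletons" | tr '\n' ' ')"
```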
Shuffle
shuf all.unique.enar > all.unique.shuf.enar
Split a tab-delimited file into two files, one for the source and one for the target:
cut -f 1 all.unique.shuf.enar > all.unique.en
cut -f 2 all.unique.shuf.enar > all.unique.ar
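The reason for pasting the two sides together before shuffling is that each source line stays attached to its translation. The sketch below checks this on a hypothetical three-line toy corpus (the toy.* file names and the placeholder target strings are made up for illustration):

```shell
#!/usr/bin/env bash
export LC_ALL=C
workdir=$(mktemp -d)
cd "$workdir" || exit 1
# Hypothetical toy corpus; line N of toy.ar is the translation of line N of toy.en.
printf 'hello\nthank you\ngood night\n' > toy.en
printf 'marhaba\nshukran\ntusbih ala khayr\n' > toy.ar

paste toy.en toy.ar > toy.enar     # tab is the default delimiter of paste
shuf toy.enar > toy.shuf.enar      # shuffle whole pairs, not each side separately
cut -f 1 toy.shuf.enar > toy.shuf.en
cut -f 2 toy.shuf.enar > toy.shuf.ar

# Re-pairing the split files must give back exactly the original pairs.
original_pairs=$(sort toy.enar)
rejoined_pairs=$(paste toy.shuf.en toy.shuf.ar | sort)
en_lines=$(wc -l < toy.shuf.en | tr -d ' ')
[ "$original_pairs" = "$rejoined_pairs" ] && echo "alignment preserved"
```

Shuffling the two files independently would break this pairing, since each call to shuf produces a different order.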
Replace “abc” with “XYZ” in a file
sed -i -e 's/abc/XYZ/g' /tmp/file.txt
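Since -i overwrites the file, it can be worth keeping a backup. Attaching a suffix directly to -i is a form that, to the best of my knowledge, works with both GNU sed and the BSD sed shipped with macOS. A sketch on a hypothetical file:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
printf 'abc and abc\n' > "$workdir/file.txt"   # hypothetical input

# -i.bak edits in place and keeps the original as file.txt.bak.
sed -i.bak -e 's/abc/XYZ/g' "$workdir/file.txt"

edited=$(cat "$workdir/file.txt")
backup=$(cat "$workdir/file.txt.bak")
echo "edited: $edited"
echo "backup: $backup"
```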
Nano Editor Commands
Create a new file:
nano <new_file_name>
Open an existing file:
nano <file_name>
Open multiple files:
nano <file_name1> <file_name2>
Search the current file:
Ctrl+w
Move to the end of the file:
Ctrl+w and then Ctrl+v
Move to the end of the line:
Ctrl+e
Move to the start of the line:
Ctrl+a
Delete the current line:
Ctrl+k
Move a page down:
Ctrl+v
Move a page up:
Ctrl+y
Cut the current line:
Ctrl+k
Mark text:
Ctrl+Shift+6 (i.e. Ctrl+^) and then move in the direction you need.
Cut the marked text:
Ctrl+k
Paste the cut text:
Ctrl+u
Note that to be able to paste across multiple files, both files must be open first: open the two files, copy/cut from the first file, close it, and then paste into the second file.
Close the current file:
Ctrl+x
You will be prompted if you want to save; type “y” for yes and “n” for no. If you select to save, just press Enter to keep the current file name. You can also move between two open files as in the next command.
Move between two open files:
Alt+. to move forward one file.
Alt+, to move backward one file.
Note that if you are on Mac, Option+. and Option+, are used to insert ≥≤ symbols, so you need to first press Alt+Command+O to change the behaviour of Option in Terminal.
Finding
Find files that include a phrase (e.g. “really great” in *.txt files):
grep "really great" *.txt
Search sub-directories recursively using grep:
grep -r <word_to_search> * OR
grep -R <word_to_search> *
Use regular expressions with grep, e.g. to match lines that contain only the word “nan”:
grep '^nan$' <file_name>
Find a file on the machine by name:
sudo find / -name <file_name>
Find all files in the directory and subdirectories whose names end with .en:
find "$PWD" -type f | grep '\.en$'
Find all files in the directory and subdirectories whose path has “aaa” followed by some text:
find "$PWD" -type f | grep 'aaa.'
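grep patterns are regular expressions rather than shell wildcards, so an unescaped . matches any character and * means “zero or more of the preceding character”. A small sketch on hypothetical file names shows the difference:

```shell
#!/usr/bin/env bash
# Hypothetical file names fed to grep on stdin, one per line.
names=$(printf 'corpus.en\nwomen\nnan\nbanana\n')

loose=$(echo "$names" | grep '.en$')        # "." matches any character
strict=$(echo "$names" | grep '\.en$')      # "\." matches a literal dot
whole_line=$(echo "$names" | grep '^nan$')  # the entire line must be "nan"
echo "loose:  $(echo "$loose" | tr '\n' ' ')"
echo "strict: $strict"
echo "whole:  $whole_line"
```

The loose pattern also returns “women”, because the “m” satisfies the unescaped dot.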
Find files in the current directory whose names include “wonderful”:
ls | grep "wonderful"
If you have a very long list generated by ls and want to display it page by page:
ls | less
List files whose names include a range of numbers:
ls model.0{1..3}*
List files whose names include different letters:
ls model.{a,b,c,d}
Move multiple files (or run any command on multiple files):
- Put the parts of the file names that differ inside { }, separated by commas.
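Brace expansion is done by Bash itself before the command runs, so echo can be used to preview what a pattern expands to. A sketch with hypothetical model.*.pt files and a hypothetical kept directory:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Bash expands the braces before the command runs, so echo previews the result.
preview=$(echo model.0{1..3}.pt)
touch model.0{1..3}.pt          # creates three hypothetical checkpoint files

mkdir kept
mv model.{01,03}.pt kept/       # same as: mv model.01.pt model.03.pt kept/
kept_files=$(ls kept | tr '\n' ' ')
left_files=$(ls model.* | tr '\n' ' ')
echo "expands to: $preview"
echo "moved: $kept_files"
echo "left:  $left_files"
```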
Find installed Python3 packages:
pip3 freeze
Find installed Python3 packages that start with “tensor”, use -i to ignore case:
pip3 freeze | grep -i tensor
Find the location of a command (e.g python3):
which python3
Downloading
Download a file using curl:
curl <http://some.url> --output <file_name>
If this is the first time you use curl, you might get a message like “Command ‘curl’ not found”, in which case it can be installed with:
sudo apt install curl
Download a file that requires cookies:
curl --cookie <cookies.txt> <http://some.url> --output <file_name>
To get the “cookies.txt” file, you can use a Chrome extension like “cookies.txt” to export cookies into a TXT file.
Copy GitHub repository to the machine:
git clone https://github.com/USERNAME/REPOSITORYNAME
Update a downloaded GitHub repository:
cd <repository_dir_name>
git pull
git checkout master
Stage, commit, and push changes to a GitHub repository:
git add <file_name>
git commit -m "Message, e.g. Update file"
git push origin main
The default branch is usually called “master” or “main” – if it is not, replace it with the right name.
Compressing and Extracting
Extract a *.zip file:
unzip <file_name>
Create a zip archive from file(s):
zip <archive_filename> <file_list>
Create a zip archive from a directory with high level of compression:
zip -r -9 <archive_filename.zip> <dir_name>
Extract a *.gz file:
gunzip <file_name.gz>
Compress all the files separately as file_name.gz
cd <dir_name>
gzip *
Compress all the files in the same directory even if there are subdirectories:
cd <dir_name>
gzip -r .
Extract a *.tar.gz file:
tar xzvf <file_name.tar.gz>
Extract a *.tgz file:
tar xzvf <file_name.tgz>
Extract in a different directory:
tar xzvf <file_name.tgz> -C </path/dir_name>
OR
gunzip -c <file_name.tgz> | tar xvf -
Create a *.tar.gz archive:
tar -czvf archive.tar.gz <dir_name>
Create a *.tar.gz archive from multiple files/directories:
tar -czvf <archive_file_name.tar.gz> <file_name1> <file_name2>
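The create, list, and extract steps can be tried end to end in a scratch directory. In this sketch the data and restored directory names and the file content are hypothetical, and -v is left out to keep the output short:

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir data restored
echo "hello" > data/notes.txt   # hypothetical content

tar -czf archive.tar.gz data            # c = create, z = gzip, f = archive name
listing=$(tar -tzf archive.tar.gz)      # t = list contents without extracting
tar -xzf archive.tar.gz -C restored     # x = extract, -C = target directory

restored_text=$(cat restored/data/notes.txt)
echo "$listing"
echo "restored: $restored_text"
```

Listing with -t before extracting is a cheap way to check whether an archive unpacks into its own directory or straight into the current one.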
Compress as *.tar.bz2 (higher compression):
tar -jcvf <archive_name.tar.bz2> <file_dir_name>
Extract a *.tar.bz2 archive:
tar -jxvf <archive_name.tar.bz2>
Compress all the files separately as file_name.bz2
bzip2 *
Extract file_name.bz2 (without tar)
bzip2 -d <file_name.bz2>
Server-related Bash Commands
Obviously, many of these commands can be used locally, but they are most useful while working on servers.
Find out the server date and time:
date
Measure time taken to run a script or command:
time <python3 script.py>
Find out the free space on the disk:
df -h
Create an alias for a command:
alias <alias_name>="<command>"
To save aliases, put this in ~/.bash_aliases
nano ~/.bash_aliases
For example, you can add this command to the ~/.bash_aliases file, use quotes for multi-word commands:
alias frz="pip3 freeze"
For the alias change to take effect
source ~/.bash_aliases
OR
exec bash
The next time you type frz in the Terminal, it will run the command pip3 freeze.
Repeat the same command at regular intervals (every 2 seconds by default):
watch <command>
Avoid ending a command if the local Terminal is closed:
screen
Create a new screen with a name:
screen -S <name>
Create a new screen with logging enabled; screenlog.0 is created:
screen -L -S <name>
Detach the current screen:
Ctrl+a then d
Resume a single screen:
screen -r
Resume a screen when multiple screens are running:
screen -r <name>
OR
screen -r <id>
List the currently running screens:
screen -list
OR
screen -ls
End a screen:
screen -X -S <id> quit
or resume the screen and then
Ctrl+a then k then y
Shut down the machine after a command finishes; separate the two commands with ;
python3 file.py; sudo shutdown
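With ; the second command runs whether or not the first one succeeded, so the machine would also shut down if the script crashed right away; && runs the second command only if the first one succeeded. The sketch below uses false as a stand-in for a failing job and echo in place of the real shutdown:

```shell
#!/usr/bin/env bash
# "false" stands in for a failing job and "echo" for the follow-up command.
after_semicolon=$(false; echo "follow-up ran")
after_and=$(false && echo "follow-up ran")

echo "with ;  -> '${after_semicolon}'"
echo "with && -> '${after_and}'"
```

Which form is preferable depends on whether the machine should stay up for inspection after a failure.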
Adjust File permissions, access by the current user only:
chmod 700 <file_name>
For example, this is required before using the *.pem key file provided by AWS EC2.
Display RAM used:
free -m
Display GPU memory used:
nvidia-smi
Find the CUDA version:
nvcc --version
Run a command continuously (optionally use -n to set the interval in seconds):
watch -n <seconds> <command>
Check kernel termination errors (use one of these commands)
dmesg
OR
nano /var/log/kern.log
Check currently running processes - use grep if you are looking for a specific type of processes:
ps -ef | grep python3
Copy a file from the local machine to a server (run it from the local machine); add -P <port> if SSH uses a non-default port:
scp <file_name> <user>@<server_ip>:/<dir_name>
Copy a directory from the local machine to a server; use -r (run it from the local machine):
scp -r <dir_name> <user>@<server_ip>:/<dir_name>
Copy a file from the local machine to an AWS EC2 server (run it from the local machine):
scp -i <key.pem> <file_name> ubuntu@ec2[…].compute.amazonaws.com:~/<dir_name>
Copy a file from a server to the local machine (run it from the local machine):
scp <user>@<server_ip>:/<dir_name>/<file_name> </path/on/the/local/machine>
Copy a file from Google Cloud to the local machine:
gcloud compute scp --project <project_name> --recurse <user_name>@<machine_name>:~/<dir_name>/<file_name> </path/on/the/local/machine>
Log out of the current connection (and similar scenarios):
Ctrl+d
Other Useful Packages
Among useful packages that you might want to install yourself are:
- curl or wget for downloading files, or aria2c for faster downloads
- trash-cli for trashing unwanted files into a folder instead of using the rm command
- tree for displaying the directory structure
- htop for monitoring CPU resources
- locate for quickly finding files by name after updatedb
- ack for searching files like grep, but faster
- parallel for running jobs in parallel from Bash
- s3cmd for uploading and downloading files between AWS S3 buckets and non-AWS servers. For AWS EC2 servers, use the aws s3 command instead.