Bash Commands for NLP Engineers

As using Bash commands is inevitable if you work on NLP and MT tasks, I thought it would be useful to list most of the commands I have learnt to use on a daily basis, thanks to practice, searching, and helpful colleagues I have met over the years. Obviously, this is not an exhaustive list; however, I hope it includes most of the one-line Bash commands you would need. Please note that most of these commands have been tested mainly on Linux.

File Management

Fundamentals

Open a directory:
cd <path/dir_name>

List the files and sub-directories in the current directory:
ls

Create a new directory:
mkdir <dir_name>

Rename or move a file or directory:
mv <old_filename> <new_filename>

Move a file to a directory:
mv <old_filename> <dir_name>

Move all files whose names start with a given string to a directory, using *:
mv <prefix>* <dir_name>

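Rename multiple files by replacing a string in their names:
rename 's/<original_string>/<new_string>/g' *
This syntax is for the Perl-based rename (the rename package on Debian and Ubuntu). The util-linux rename shipped by some other distributions uses a different syntax: rename <original_string> <new_string> *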
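Where the Perl-based rename is not available, a plain Bash loop can do the same job. The sketch below is only an illustration: it creates a few hypothetical files (draft_en.txt, draft_de.txt, notes.txt) in a temporary directory and uses Bash parameter expansion, where ${f//draft/final} replaces every occurrence of "draft" in the name, much like the /g flag in the rename pattern.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical example files.
touch draft_en.txt draft_de.txt notes.txt

# Rename every file whose name contains "draft", replacing it with "final".
for f in *draft*; do
  mv -- "$f" "${f//draft/final}"
done

ls
```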

Delete a file:
rm <file_name>

To delete multiple files, just add them after the rm command separated by spaces:
rm <file_name1> <file_name2> <file_name3>

Delete any file that starts with “wow”, using *:
rm wow*

Delete a directory and its contents:
rm -r <dir_name>

Copy a file:
cp <original_filename> <new_filename>

Copy a directory and everything in it (-r is required; -a, which already implies -r, also preserves permissions and timestamps, and -v lists each copied file):
cp -avr <original_dirname> <new_dirname>

Create a new text file:
nano <file_name>

Complete a command or file name (e.g. my_file_name.txt):
Type my and press Tab once; the name is completed if no other file starts with “my”.
OR
Type my and press Tab twice to list all the files starting with “my”.

Move the cursor to a location in a command:
Hold Alt (Option on Mac) and click the location. This works in terminals that support it, such as macOS Terminal.

Clear the current window:
Type clear
OR
Press Ctrl+l

End the current command (before it finishes):
Press Ctrl+c

Advanced

Return to the previous directory:
cd -

List the *.txt files in the current directory (or path):
ls *.txt

List the files whose names start with “aaa”, plus the contents of any directories whose names start with “aaa”:
ls aaa*

List the files in the current directory, plus the contents of each subdirectory:
ls *

List all the files with details:
ls -l

Display file details:
ls -l <file_name>

List all the files with details and human-readable sizes (K, M, G):
ls -lh

List all the files with details and human-readable sizes, sorted by modification time, newest first:
ls -lht
ls -lht <dir_name1>/*/<dir_name2>

List all the files with details and human-readable sizes, sorted by modification time, oldest first (-r reverses the order):
ls -lhtr

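List the sizes of all files in the current directory:
ls -hs
OR
du -sh *

Display the size of a single file:
ls -hs <file_name>
OR
du -h <file_name>
Both report disk usage (allocated blocks), which can differ slightly from the exact size in bytes.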

Display the exact file size in bytes with stat (Linux):
stat --printf="%s" <file_name>

Display the file's last modification time (Linux):
stat -c %y <file_name>

Display file details, including the last modification time (Mac):
stat -x <file_name>

Get the current path (print working directory):
pwd

Create a symbolic link, i.e. a shortcut to a file or directory:
ln -s <file_name> <shortcut_name>

Get the absolute path of a file:
readlink -f <file_name>
OR
echo "$(pwd)/<file_name>"
OR
realpath <file_name>

Get the line, word, and byte counts of a file (use -w for the word count only):
wc <file_name>

Get the number of lines in a file:
wc -l <file_name>

Count the lines of all matching files in the current directory and its subdirectories; wrap a partial file name in * wildcards:
find ./ -type f -name "*<file_name>*" -exec wc -l {} +
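When find passes several files to a single wc call, wc prints one count per file followed by a total line. If only the overall total is needed, one option is to concatenate the files and count once. The sketch below builds a small hypothetical corpus (train.en files in two subdirectories) in a temporary directory to show both.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical corpus: two train.en files in different subdirectories.
mkdir -p "$workdir/a" "$workdir/b"
printf 'one\ntwo\n' > "$workdir/a/train.en"
printf 'three\n' > "$workdir/b/train.en"

# Per-file counts plus a "total" line.
find "$workdir" -type f -name "*train*" -exec wc -l {} +

# Overall total only: concatenate all matches, then count once.
total=$(find "$workdir" -type f -name "*train*" -exec cat {} + | wc -l)
echo "Total lines: $total"
```

The cat approach also avoids getting several partial totals when find splits a very long file list across multiple wc calls.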

Count the lines in a *.gz file; -c sends the decompressed text to stdout instead of writing an uncompressed file to disk:
gunzip -c <file_name.gz> | wc -l
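The sketch below, using a hypothetical corpus.txt in a temporary directory, compresses a small file and then counts its lines without decompressing it to disk; afterwards only the .gz file exists. On Linux, zcat <file_name.gz> is equivalent to gunzip -c.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical three-line corpus, compressed in place (corpus.txt -> corpus.txt.gz).
printf 'hello\nworld\nagain\n' > "$workdir/corpus.txt"
gzip "$workdir/corpus.txt"

# Decompress to stdout and count; no uncompressed copy is written.
lines=$(gunzip -c "$workdir/corpus.txt.gz" | wc -l)
echo "Lines: $lines"
ls "$workdir"
```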

Split a file into multiple files of 3000 lines each (named <prefix>aa, <prefix>ab, and so on):
split -l 3000 <file_name> <prefix>
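To check what split produces, the sketch below generates a hypothetical 7,000-line file with seq, splits it into 3,000-line parts, and verifies that concatenating the parts in name order rebuilds the original file. With GNU split, -d switches to numeric suffixes (00, 01, ...).

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Hypothetical 7,000-line input file.
seq 1 7000 > corpus.txt

# Produces part_aa (3000 lines), part_ab (3000 lines), part_ac (1000 lines).
split -l 3000 corpus.txt part_
wc -l part_*

# The glob sorts the parts alphabetically, so cat restores the original order.
cat part_* | cmp -s - corpus.txt && echo "Parts rebuild the original file."
```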

Find out if two files are identical:
cmp --silent <file_name1> <file_name2> || echo "Files are different."
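cmp is silent with --silent (-s) and reports the result through its exit status: 0 when the files are identical, 1 when they differ. That is what makes the || echo pattern work, and it can also drive an if statement. The files f1, f2, and f3 below are hypothetical.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf 'a\nb\n' > "$workdir/f1"
printf 'a\nb\n' > "$workdir/f2"
printf 'a\nc\n' > "$workdir/f3"

# Exit status 0: identical.
if cmp -s "$workdir/f1" "$workdir/f2"; then
  echo "f1 and f2 are identical."
fi

# Exit status 1: different, so the command after || runs.
cmp -s "$workdir/f1" "$workdir/f3" || echo "f1 and f3 are different."
```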

Find out the difference between two files:
diff <file_name1> <file_name2>

Continue a long command on a new line by ending the line with a backslash:
\
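For example, the find command below is split over three lines. The backslash must be the very last character on each line; a trailing space after it escapes the space instead of the newline and breaks the command. The file names are hypothetical.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
touch "$workdir/train.en" "$workdir/train.de"

# One command written over three lines.
count=$(find "$workdir" \
  -type f \
  -name '*.en' | wc -l)
echo "Found $count .en file(s)"
```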

Reading files

Read the whole file:
cat <file_name>

Read the whole file; display line numbers:
cat -n <file_name>

Concatenate two files into a new file (> creates the output file, overwriting it if it already exists):
cat <file_name1> <file_name2> > <output_file>

Read the first 10 lines of a file:
head <file_name>

Read the first 4 lines of a file:
head -4 <file_name>
OR
head -n 4 <file_name>

Read the first 3 lines of each of two files (-q hides the file name headers):
head -q -n 3 <file_name1> <file_name2>
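Without -q, head prints a ==> file_name <== header before each file's lines when given more than one file. The sketch below uses two hypothetical files created with seq to show the difference.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

seq 1 5 > a.txt
seq 10 15 > b.txt

head -n 3 a.txt b.txt      # each block is preceded by a "==> file <==" header
head -q -n 3 a.txt b.txt   # just the lines: 1 2 3 10 11 12
```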

Read the last 10 lines of a file:
tail <file_name>

Read the last 3 lines of a file:
tail -3 <file_name>
OR
tail -n 3 <file_name>

Follow the end of a file as it grows; -f keeps printing new lines as they are added (useful for logs):
tail -f <file_name>
Use Ctrl+C to exit.

Read a file page by page:
less <file_name>
Press Space to move to the next page, Enter to move one line, and “q” to quit.

Read a file in chunks, display line numbers:
less -N <file_name>

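Merge two text files; tee writes the result to the output file and also prints it in the terminal:
cat <file_name1> <file_name2> | tee <output_file_name>
Note that each file must end with a newline; otherwise, its last line is joined to the first line of the next file.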
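The sketch below reproduces the missing-newline problem with two hypothetical files, then appends a newline to any file whose last byte is not already one before merging. It reads the last byte with tail -c 1; command substitution strips a trailing newline, so the result is empty when the file already ends correctly.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

printf 'line 1\nline 2' > part1.txt   # no newline after the last line
printf 'line 3\n' > part2.txt

cat part1.txt part2.txt > merged_bad.txt   # second line becomes "line 2line 3"

# Add a final newline only to files that lack one.
for f in part1.txt part2.txt; do
  if [ -n "$(tail -c 1 "$f")" ]; then
    echo >> "$f"
  fi
done

cat part1.txt part2.txt | tee merged.txt 1> /dev/null
cat merged.txt
```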

Merge all the files in the current folder:
cat * | tee <output_file_name>

To stop tee from also printing to stdout (i.e. the terminal), add 1> /dev/null:
cat <file_name1> <file_name2> | tee <output_file_name> 1> /dev/null

Finding

Find the lines containing a phrase in a set of files (e.g. “really great” in *.txt files); add -l to print only the names of the matching files:
grep "really great" *.txt

Search sub-directories recursively using grep (-R also follows symbolic links):
grep -r <word_to_search> *
OR
grep -R <word_to_search> *

Find a file on the machine by name:
sudo find / -name <file_name>

Find all files in the current directory and its subdirectories whose names end with .en:
find "$PWD" -type f | grep '\.en$'
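The dot is escaped in '\.en$' because an unescaped . matches any character, so '.en$' would also match a name like broken. find can also match the name itself with -name, which avoids the pipe. The sketch below uses hypothetical files in a temporary directory.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/data/sub"
touch "$workdir/data/train.en" "$workdir/data/sub/dev.en" \
      "$workdir/data/train.de" "$workdir/data/sub/broken"

find "$workdir/data" -type f | grep '.en$'     # also matches .../broken
find "$workdir/data" -type f | grep '\.en$'    # only the two .en files
find "$workdir/data" -type f -name '*.en'      # same result, without grep
```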

Find all files in the current directory and its subdirectories whose paths contain “aaa”:
find "$PWD" -type f | grep "aaa"
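In grep, * is not a shell wildcard: it repeats the preceding character zero or more times, so the pattern 'aaa*' means “aa” followed by any number of “a” and also matches a name like aab.txt. The sketch below, with hypothetical file names, shows this and uses find -name '*aaa*', where * is a wildcard, to match file names only.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
touch aab.txt aaab.txt xyz.txt

find . -type f | grep 'aaa*'    # ./aab.txt and ./aaab.txt
find . -type f | grep 'aaa'     # ./aaab.txt only
find . -type f -name '*aaa*'    # ./aaab.txt only; matches the file name, not the path
```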

Find files in the current directory whose names include “wonderful” (to search their contents instead, use grep -l "wonderful" *):
ls | grep "wonderful"
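ls | grep only sees file names; to match file contents, grep has to read the files, and -l prints the name of each matching file. The two hypothetical files below show the difference.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1
printf 'What a wonderful day\n' > notes.txt
printf 'nothing to see\n' > wonderful_list.txt

ls | grep "wonderful"     # matches names: wonderful_list.txt
grep -l "wonderful" *     # matches contents: notes.txt
```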

If ls produces a very long list, display it page by page:
ls | less

List files whose names include a range of numbers:
ls model.0{1..3}*

List files whose names include different letters:
ls model.{a,b,c,d}
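Unlike the * wildcard, brace expansion is performed by Bash before the command runs and produces every combination whether or not the files exist, so ls reports an error for any missing name. echo shows the exact arguments ls would receive. In Bash 4 and later, zero-padded ranges such as {01..12} are also supported.

```shell
#!/usr/bin/env bash
# echo prints the words that brace expansion generates.
echo model.0{1..3}        # model.01 model.02 model.03
echo model.{a,b,c,d}      # model.a model.b model.c model.d
echo model.{01..12}       # model.01 through model.12 (Bash 4+)
```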

Find installed Python3 packages:
pip3 freeze

Find installed Python3 packages that start with “tensor”, use -i to ignore case:
pip3 freeze | grep -i tensor

Find the location of a command (e.g. python3):
which python3

Downloading

Download a file using curl:
curl <http://some.url> --output <file_name>

If curl is not installed, you might get a message like “Command 'curl' not found, but can be installed with”. On Debian/Ubuntu, install it with:
sudo apt install curl

Download a file that requires cookies:
curl --cookie <cookies.txt> <http://some.url> --output <file_name>
To get the “cookies.txt” file, you can use a Chrome extension like “cookies.txt” to export cookies into a TXT file.

Clone a GitHub repository to the machine:
git clone https://github.com/USERNAME/REPOSITORYNAME

Update a cloned GitHub repository:
cd <repository_dir_name>
git checkout master
git pull
The default branch is usually called master or main; if it is neither, replace it with the right name.

Compressing and Extracting

Extract a *.zip file:
unzip <file_name>

Create a zip archive from file(s):
zip <archive_filename> <file_list>

Create a zip archive from a directory:
zip -r <archive_filename.zip> <dir_name>

Extract a *.gz file:
gunzip <file_name.gz>

Compress each file in a directory with gzip (each file is replaced by a compressed *.gz file):
cd <dir_name>
gzip *

Compress all the files in a directory, including those in subdirectories:
cd <dir_name>
gzip -r .
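Note that gzip replaces each original file with its .gz version rather than keeping both, and plain gzip * skips subdirectories with a warning. The sketch below, with hypothetical files, shows -r reaching a file inside a subdirectory. Recent gzip versions also accept -k to keep the originals.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

mkdir sub
printf 'x\n' > a.txt
printf 'y\n' > sub/b.txt

gzip -r .       # a.txt -> a.txt.gz, sub/b.txt -> sub/b.txt.gz
find . -type f
```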

Extract a *.tar.gz file:
tar xzvf <file_name.tar.gz>

Extract a *.tgz file:
tar xzvf <file_name.tgz>

Extract into a different directory (the directory must already exist):
tar xzvf <file_name.tgz> -C </path/dir_name>
OR
gunzip -c <file_name.tgz> | tar xvf - -C </path/dir_name>
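The sketch below archives a hypothetical corpus directory and extracts it twice into separate target directories, once with tar -C and once by piping gunzip -c into tar, where - tells tar to read the archive from stdin. Both targets are created first with mkdir -p.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

mkdir -p corpus
printf 'hello\n' > corpus/a.txt
tar -czf corpus.tgz corpus

mkdir -p out1 out2
tar xzvf corpus.tgz -C out1
gunzip -c corpus.tgz | tar xvf - -C out2
```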

Create a *.tar.gz archive from a directory:
tar -czvf <archive_file_name.tar.gz> <dir_name>

Create a *.tar.gz archive from multiple files/directories:
tar -czvf <archive_file_name.tar.gz> <file_name1> <file_name2>

Compress as *.tar.bz2 (usually smaller than *.tar.gz, but slower):
tar -jcvf <archive_name.tar.bz2> <file_dir_name>

Extract a *.tar.bz2 archive:
tar -jxvf <archive_name.tar.bz2>

Nano Editor

Create a new file:
nano <new_file_name>

Open an existing file:
nano <file_name>

Open multiple files:
nano <file_name1> <file_name2>

Search the current file:
Ctrl+w

Move to the end of the file:
Ctrl+w and then Ctrl+v

Move to the end of the line:
Ctrl+e

Move to the start of the line:
Ctrl+a

Move a page down:
Ctrl+v

Move a page up:
Ctrl+y

Mark text:
Ctrl+Shift+6 (i.e. Ctrl+^), then move the cursor in the direction you need to extend the selection.

Cut the marked text:
Ctrl+k

Paste the cut text:
Ctrl+u
To paste across files, open both files first (e.g. nano <file_name1> <file_name2>), cut or copy the text in the first file, close it, and then paste it into the second file.

Close the current file:
Ctrl+x
You will be asked whether to save; type “y” for yes or “n” for no. If you choose to save, press Enter to keep the current file name. To switch between open files without closing them, use the shortcuts below.

Move between two open files:
Alt+. moves forward one file.
Alt+, moves backward one file.
Note that on a Mac, Option+. and Option+, insert the ≥ and ≤ symbols, so you first need to press Option+Command+O in Terminal to make Option act as the Alt (Meta) key.

Advanced Commands

Find out the server date and time:
date

Measure the time taken to run a script or command:
time python3 <script.py>

Find out the free space on the disk:
df -h

Check free memory (Linux):
free -m

Create an alias (a short name) for a command:
alias <alias_name>='<command>'
Add the line to ~/.bashrc to keep the alias in new sessions.

Keep a command running even if the local terminal is closed or the connection drops (useful on servers):
screen

Create a new screen with a name:
screen -S <name>

Detach from the current screen (it keeps running in the background):
Press Ctrl+A, then D

Resume a single screen:
screen -r

Resume a screen when multiple screens are running:
screen -r <name>
OR
screen -r <id>

List the currently running screens:
screen -list

End a screen:
screen -X -S <id> quit

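Shut down the machine after a command finishes by separating the two commands with ;:
python3 file.py; sudo shutdown
On systemd-based systems, shutdown without a time argument waits one minute; use sudo shutdown now to shut down immediately. For other options, see man shutdown.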
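Commands separated by ; run one after another regardless of whether the first one succeeded. To shut down only if the script exits successfully, && can be used instead, e.g. python3 file.py && sudo shutdown. The sketch below demonstrates the difference with false, a command that always fails, and echo in place of shutdown.

```shell
#!/usr/bin/env bash
log=$(mktemp)
trap 'rm -f "$log"' EXIT

false ; echo "ran after ;" >> "$log"      # runs even though false failed
false && echo "ran after &&" >> "$log"    # skipped because false failed

cat "$log"
```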

Restrict file permissions so that only the current user can access the file:
chmod 700 <file_name>
For example, this is required before using the *.pem key file provided by AWS EC2.
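In chmod 700, the three digits set the permissions for the owner, the group, and everyone else; 7 means read, write, and execute, and 0 means no access. AWS's own instructions use chmod 400 for key files, which leaves the owner with read access only. The sketch below applies both to a hypothetical empty key file and prints the resulting permission string.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
key="$workdir/my-key.pem"    # hypothetical key file
touch "$key"

chmod 700 "$key"
ls -l "$key" | cut -c1-10    # -rwx------

chmod 400 "$key"
ls -l "$key" | cut -c1-10    # -r--------
```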

Copy a file from the local machine to a server (run it on the local machine; -P sets the port and can be omitted for the default port 22):
scp -P <port> <file_name> <user>@<server_ip>:/<dir_name>

Copy a file from the local machine to an AWS EC2 instance using a *.pem key (run it on the local machine):
scp -i <key.pem> <file_name> ubuntu@ec2[...].compute.amazonaws.com:~/<dir_name>

Copy a file from a server to the local machine (run it on the local machine):
scp -P <port> <user>@<server_ip>:/<dir_name>/<file_name> </path/on/the/local/machine>

Copy a file from a Google Cloud VM to the local machine:
gcloud compute scp --project <project_name> --recurse <user_name>@<machine_name>:~/<dir_name>/<file_name> </path/on/the/local/machine>
To copy multiple files, put the parts of their names that differ inside { }, separated by commas, e.g. ~/<dir_name>/{<file_name1>,<file_name2>}.

Log out of the current connection:
Ctrl+D
