Yasmin Moslem

Machine Translation Researcher.

Bash Commands for NLP Engineers

02 Aug 2020 » mt

Since using Bash commands is inevitable if you work on NLP and MT tasks, I thought it would be useful to list most of the commands I have learnt to use on a daily basis, thanks to practice, searching, and helpful colleagues I have met over the years. Obviously, this is not an exhaustive list; however, I hope it includes most of the one-line Bash commands you would need. Please note that most of these commands have been tested mainly on Linux.



File Management


Open a directory:

cd <path/dir_name>

List the files and sub-directories in the current directory:

ls

Create a new directory:

mkdir <dir_name>

Rename or move a file or directory:

mv <old_filename> <new_filename>

Move a file to a directory:

mv <old_filename> <dir_name>

Move all files whose names start with a string, using *:

mv <old_filename>* <folder_name>

Rename multiple files:

rename 's/<original_string>/<new_string>/g' *
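The rename utility differs between distributions (the Perl version takes an s/…/…/ expression, while the util-linux version takes plain "from" and "to" strings), so a plain loop can serve as a portable fallback. The following is only a sketch; the file names draft_1.txt and draft_2.txt and the scratch directory are hypothetical.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch draft_1.txt draft_2.txt   # hypothetical example files

# Rename every file starting with "draft" so that it starts with "final".
for f in draft*; do
  new_name=$(printf '%s\n' "$f" | sed 's/draft/final/')
  mv -- "$f" "$new_name"
done

ls "$workdir"
```

Running ls before the loop is a cheap way to check which files the pattern will match.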

Delete a file:

rm <file_name>

To delete multiple files, just add them after the rm command separated by spaces:

rm <file_name1> <file_name2> <file_name3>

Delete any file that starts with “wow”, using *:

rm wow*

Delete a directory and its contents:

rm -r <dir_name>

Avoid deleting files by mistake by using trash instead of rm, installing trash-cli:

sudo apt-get install trash-cli
• Delete:
trash <file_name>
• List trashed items:
trash-list
• Restore a file (first cd to the root folder or to the specific folder the file was deleted from):
restore-trash (trash-restore in newer versions of trash-cli), and then type the number of the item.
• Empty the trash list:
trash-empty

Copy a file:

cp <original_filename> <new_filename>

Copy a directory and its contained files (at least -r is required):

cp -avr <original_dirname> <new_dirname>

Complete a command or file name (e.g. my_file_name.txt):

  • Type my and then press Tab once to complete the name if no other file starts with “my”.
    OR
  • Type my and then press Tab twice to list all the files that start with “my”.

Move to a location in a command or text: Move the cursor to the location, press Alt or Option, and click.

Clear the current window:

  • Type clear
    OR
  • Press Ctrl+l

End the current command (before it finishes):

  • Press Ctrl+c

Move to the last accessed path:

cd -

List your previous commands

history

Search your command history
Ctrl+r

List the *.txt files in the current directory (or path):

ls *.txt

Show the files in all folders whose names start with “aaa”:

ls aaa*

Show files and subdirectories in all directories in the current directory:

ls *

List all the files with details:

ls -l

Display file details:

ls -l <file_name>

List all the files with details, the size is in MB/GB:

ls -lh

List all the files with details, the size in MB/GB, sorted by time, newest first:

ls -lht
ls -lht <dir_name1>/*/<dir_name2>

List all the files with details, the size in MB/GB, sorted by time, oldest first:

ls -lhtr

List file sizes only for all files in the current directory:

ls -hs
OR
du

Display the file size only:

ls -hs <file_name> OR
du <file_name>

Display the last modified file:

ls -t | head -1

Display sizes of the current directory:

du -d 1 -h .
Sort the results in ascending order:
du -d 1 -h . | sort -h
Sort the results in descending order:
du -d 1 -h . | sort -h -r

Find files that are bigger than 200MB:

find /home/$USER/ -type f -size +200000k -exec ls -lh {} \; | awk '{ print $5 " --> " $9 }'

Display file size with stat (Linux):

stat --printf="%s" <file_name>

Display file last edited time (Linux):

stat -c %y <file_name>

Display file last edited time (Mac):

stat -x <file_name>

Get the current path (print working directory):

pwd

Create a symbolic link, i.e. a shortcut to a file or directory:

ln -s <file_name> <shortcut_name>

Get the path of a file:

readlink -f <file_name> OR echo "$(pwd)/<file_name>" OR realpath <file_name>

Get word count in a file:

wc <file_name>

Get the number of lines in a file:

wc -l <file_name>

Count the lines of all files in subdirectories; use * if the file name is partial:

find ./ -type f -name "<file_name>" -exec wc -l {} +

Count lines in a *.gz file; use -c to avoid writing the uncompressed file to disk:

gunzip -c <file_name.gz> | wc -l
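To illustrate that -c streams the decompressed text instead of writing it to disk, here is a small sketch with a made-up 5-line file named corpus.txt in a temporary directory (both names are assumptions for the demo).

```shell
workdir=$(mktemp -d)

# Build a small 5-line corpus and compress it;
# gzip replaces corpus.txt with corpus.txt.gz.
seq 1 5 > "$workdir/corpus.txt"
gzip "$workdir/corpus.txt"

# Stream the decompressed text into wc; no uncompressed copy is created.
line_count=$(gunzip -c "$workdir/corpus.txt.gz" | wc -l)
line_count=$((line_count))   # strip the padding some wc versions add
echo "$line_count"

# Only the compressed file is present afterwards.
ls "$workdir"
```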

Split a file into multiple files, 3000 lines each:

split -l 3000 <file_name> <prefix>
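As a rough illustration of how split names its output, the sketch below cuts a hypothetical 7-line file into chunks of 3 lines with the prefix part_ (file name, prefix, and sizes are all made up for the demo).

```shell
workdir=$(mktemp -d)
seq 1 7 > "$workdir/data.txt"

# 7 lines in chunks of 3 give three files: part_aa, part_ab, part_ac.
split -l 3 "$workdir/data.txt" "$workdir/part_"

chunk_count=$(ls "$workdir" | grep -c '^part_')
last_chunk_lines=$(wc -l < "$workdir/part_ac")
echo "chunks: $chunk_count, lines in the last chunk: $((last_chunk_lines))"

# Concatenating the chunks in name order gives back the original file.
cat "$workdir"/part_* | cmp -s - "$workdir/data.txt" && echo "round trip OK"
```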

Find out if two files are identical:

cmp --silent first_file_name second_file_name || echo "------> Files are different."
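The comparison relies on the exit status of cmp: 0 when the files match, non-zero when they differ. A minimal sketch with three made-up files, using -s, the short form of --silent:

```shell
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/a.txt"
printf 'hello\n' > "$workdir/b.txt"
printf 'world\n' > "$workdir/c.txt"

# cmp exits with 0 when the files match, so it can drive an if statement.
if cmp -s "$workdir/a.txt" "$workdir/b.txt"; then same_ab=yes; else same_ab=no; fi
if cmp -s "$workdir/a.txt" "$workdir/c.txt"; then same_ac=yes; else same_ac=no; fi

echo "a.txt vs b.txt identical: $same_ab"
echo "a.txt vs c.txt identical: $same_ac"
```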

Find out the difference between two files:

diff <file_name1> <file_name2>

Continue a long command on a new line by ending the current line with:

\

Reading Files


Read the whole file:

cat <file_name>

Read the whole file; display line numbers:

cat -n <file_name>

Merge two files, use > to create the output file:

cat <file_name1> <file_name2> > <output_file>

Merge all the files that end with (say) “.en” into one file (e.g. “all.en”):

cat *.en > all.en

Read the first 10 lines of a file:

head <file_name>

Read the first 4 lines of a file:

head -4 <file_name> OR
head -n 4 <file_name>

Read the first 3 lines of two files:

head -q -n 3 <file_name1> <file_name2>

Read the last 10 lines of a file:

tail <file_name>

Read the last 3 lines of a file:

tail -3 <file_name> OR
tail -n 3 <file_name>

Read a specific line of a file, e.g. line #10:

sed -n 10p <file_name>
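The same sed pattern also accepts a range, which is handy for inspecting a few neighbouring segments of a corpus. A short sketch on a made-up 5-line file:

```shell
workdir=$(mktemp -d)
printf 'one\ntwo\nthree\nfour\nfive\n' > "$workdir/sample.txt"

# Print only line 2.
second_line=$(sed -n '2p' "$workdir/sample.txt")

# Print lines 2 to 4 inclusive.
middle_lines=$(sed -n '2,4p' "$workdir/sample.txt")

echo "$second_line"
echo "$middle_lines"
```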

Read the end of the file and use -f to update the output:

tail -f <file_name>
Use Ctrl+c to exit.

Read a file in chunks:

less <file_name>
Press Space to move to the next chunk of the file (Enter moves one line), and “q” to quit.

Read a file in chunks, display line numbers:

less -N <file_name>

Merge two text files:

cat <file_name1> <file_name2> | tee <output_file_name>
Note that each file must end with a newline; otherwise, the last line of one file is joined to the first line of the next.
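The sketch below reproduces the problem with two made-up files, one of which lacks the final newline, and shows one possible workaround: a loop that appends a newline only where it is missing. This is just one way to do it, not the only one.

```shell
workdir=$(mktemp -d)
printf 'a\nb' > "$workdir/first.txt"    # no trailing newline
printf 'c\n'  > "$workdir/second.txt"

# Plain cat glues "b" and "c" onto the same line.
cat "$workdir/first.txt" "$workdir/second.txt" > "$workdir/naive.txt"

# Safer merge: add a newline after any file whose last byte is not a newline.
for f in "$workdir/first.txt" "$workdir/second.txt"; do
  cat "$f"
  if [ -n "$(tail -c 1 "$f")" ]; then echo; fi
done > "$workdir/merged.txt"

naive_lines=$(wc -l < "$workdir/naive.txt")
merged_lines=$(wc -l < "$workdir/merged.txt")
echo "naive: $((naive_lines)) lines, merged: $((merged_lines)) lines"
```

For parallel corpora, a lost line like this shifts the alignment between source and target, so it is worth checking line counts with wc -l after merging.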

Merge all the files in the current folder:

cat * | tee <output_file_name>

Disable sending to stdout (i.e. printing in Terminal) by adding 1> /dev/null

cat <file_name1> <file_name2> | tee <output_file_name> 1> /dev/null

Remove duplicates from a file

sort -u input.txt > output.txt
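Keep in mind that sort -u also reorders the lines. If the original order matters, a common alternative (not the command above, but a different technique) is an awk one-liner that keeps the first occurrence of each line. A sketch on a made-up input file:

```shell
workdir=$(mktemp -d)
printf 'b\na\nb\nc\na\n' > "$workdir/input.txt"

# sort -u: unique lines, but in sorted order.
sort -u "$workdir/input.txt" > "$workdir/sorted_unique.txt"

# awk: keep the first occurrence of each line, preserving the original order.
awk '!seen[$0]++' "$workdir/input.txt" > "$workdir/ordered_unique.txt"

# Join the lines with spaces just to print each result on one line.
sorted_result=$(tr '\n' ' ' < "$workdir/sorted_unique.txt")
ordered_result=$(tr '\n' ' ' < "$workdir/ordered_unique.txt")
echo "sort -u: $sorted_result"
echo "awk:     $ordered_result"
```

The awk version keeps every distinct line in memory, so for very large files sort -u may still be the more practical choice.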

Nano Editor Commands


Create a new file:

nano <new_file_name>

Open an existing file:

nano <file_name>

Open multiple files:

nano <file_name1> <file_name2>

Search the current file:

Ctrl+w

Move to the end of the file:

Ctrl+w and then Ctrl+v

Move to the end of the line:

Ctrl+e

Move to the start of the line:

Ctrl+a

Move a page down:

Ctrl+v

Move a page up:

Ctrl+y

Mark text:

Ctrl+Shift+6 (i.e. Ctrl+^) and then move in the direction you need.

Cut the marked text:

Ctrl+k

Paste the cut text:

Ctrl+u

Note that to paste across multiple files, you must open the two files first, copy/cut from the first file, close it, and then paste into the second file.

Close the current file:

Ctrl+x

You will be prompted if you want to save; type “y” for yes and “n” for no. If you select to save, just press Enter to keep the current file name. You can also move between two open files as in the next command.

Move between two open files:

Alt+. to move forward one file.
Alt+, to move backward one file.

Note that if you are on Mac, Option+. and Option+, are used to insert ≥≤ symbols, so you need to first press Alt+Command+O to change the behaviour of Option in Terminal.

Finding


Find files that include a phrase (e.g. “really great” in *.txt files):

grep "really great" *.txt

Search sub-directories recursively using grep:

grep -r <word_to_search> * OR
grep -R <word_to_search> *

Use regular expressions with grep, e.g. the only word in the line is ‘nan’:

grep '^nan$' <file_name>
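Quoting the pattern keeps the shell from interpreting characters such as $ before grep sees them. The sketch below, on a made-up file of scores, counts the lines that are exactly “nan” and writes the remaining lines to a new file with -v (invert match).

```shell
workdir=$(mktemp -d)
printf 'nan\nbanana\n0.5\nnan\nnano\n' > "$workdir/scores.txt"

# Count the lines that consist of "nan" and nothing else.
nan_lines=$(grep -c '^nan$' "$workdir/scores.txt")

# Keep every other line; "banana" and "nano" survive because of ^ and $.
grep -v '^nan$' "$workdir/scores.txt" > "$workdir/clean.txt"
clean_lines=$(wc -l < "$workdir/clean.txt")

echo "nan lines: $nan_lines, remaining lines: $((clean_lines))"
```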

Find a file on the machine by name:

sudo find / -name <file_name>

Find all files in the directory and subdirectories that end with .en:

find "$PWD" -type f | grep '\.en$'

Find all files in the directory and subdirectories whose path includes ‘aaa’ followed by some text:

find "$PWD" -type f | grep "aaa."

Find files in the current directory whose name includes “wonderful”:

ls | grep "wonderful"

If you have very long list generated by ls and want to display them page by page:

ls | less

List files whose names include a range of numbers:

ls model.0{1..3}*

List files whose names include different letters:

ls model.{a,b,c,d}

Move multiple files (or run any command on multiple files):

  • Put the parts of the names that differ inside { }, separated by commas.
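As a sketch of this brace trick, the following moves two of three made-up checkpoint files into a hypothetical selected directory. Brace expansion is a Bash feature, so this assumes the commands run in Bash rather than in a plain sh.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch model.a.pt model.b.pt model.c.pt   # hypothetical checkpoint files
mkdir selected

# Bash expands the braces before the command runs; echo shows the result.
echo model.{a,b}.pt

# So this is the same as: mv model.a.pt model.b.pt selected/
mv model.{a,b}.pt selected/

ls "$workdir" "$workdir/selected"
```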

Find installed Python3 packages:

pip3 freeze

Find installed Python3 packages whose names include “tensor”; use -i to ignore case:

pip3 freeze | grep -i tensor

Find the location of a command (e.g python3):

which python3

Downloading


Download a file using curl:

curl <http://some.url> --output <file_name>

If this is your first time using curl, you might get a message like “Command ‘curl’ not found”. In that case, install it with:

sudo apt install curl

Download a file that requires cookies:

curl --cookie <cookies.txt> <http://some.url> --output <file_name>
To get the “cookies.txt” file, you can use a Chrome extension like “cookies.txt” to export cookies into a TXT file.

Copy GitHub repository to the machine:

git clone https://github.com/USERNAME/REPOSITORYNAME

Update a downloaded GitHub repository:

cd <repository_dir_name>
git pull
git checkout master

Stage, commit, and push changes to a GitHub repository:

git add <file_name>
git commit -m "Message, e.g. Update file"
git push origin main

The default branch is usually called “master” or “main” – if it is not, replace it with the right name.

Compressing and Extracting


Extract a *.zip file:

unzip <file_name>

Create a zip archive from file(s):

zip <archive_filename> <file_list>

Create a zip archive from a directory:

zip -r <archive_filename.zip> <dir_name>

Extract a *.gz file:

gunzip <file_name.gz>

Compress all the files in the same folder as *.gz:

cd <dir_name>
gzip *

Compress all the files in the same directory even if there are subdirectories:

cd <dir_name>
gzip -r .

Extract a *.tar.gz file:

tar xzvf <file_name.tar.gz>

Extract a *.tgz file:

tar xzvf <file_name.tgz>

Extract in a different directory:

tar xzvf <file_name.tgz> -C </path/dir_name>
OR
gunzip -c <file_name.tgz> | tar xvf - -C </path/dir_name>

Create a *.tar.gz archive:

tar -czvf archive.tar.gz <dir_name>

Create a *.tar.gz archive from multiple files/directories:

tar -czvf <archive_file_name.tar.gz> <file_name1> <file_name2>
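To see creation and extraction working together, here is a small round trip under made-up names (a corpus directory containing train.en, and a restored directory as the extraction target). It also uses -t, which lists the contents of an archive without extracting it.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/corpus" "$workdir/restored"
printf 'hello\n' > "$workdir/corpus/train.en"

# Create the archive; -C makes the stored paths relative to $workdir.
tar -czf "$workdir/corpus.tar.gz" -C "$workdir" corpus

# List the contents without extracting anything.
tar -tzf "$workdir/corpus.tar.gz"

# Extract into a different directory.
tar -xzf "$workdir/corpus.tar.gz" -C "$workdir/restored"

restored_text=$(cat "$workdir/restored/corpus/train.en")
echo "$restored_text"
```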

Compress as *.bz2 (higher compression):

tar -jcvf <archive_name.tar.bz2> <file_dir_name>

Extract a *.bz2 archive:

tar -jxvf <archive_name.tar.bz2>


Obviously, many of these commands can be used locally, but they are most useful while working on servers.

Find out the server date and time:

date

Measure time taken to run a script or command:

time <python3 script.py>

Find out the space on the disk:

df -h

Check free memory (Linux):

free -m

Create an alias for a command:

alias <alias_name>="<command>"

To save aliases, put this in ~/.bash_aliases

nano ~/.bash_aliases

For example, add the following line (use quotes for multi-word commands):

alias frz="pip3 freeze"

Save the file and restart your Terminal (or run exec bash). The next time you type frz in the Terminal, it will run the command pip3 freeze.

Avoid ending a command if the local Terminal is closed:

screen

Create a new screen with a name:

screen -S <name>

Create a new screen with a name and logging enabled; screenlog.0 is created:

screen -L -S <name>

Detach the current screen:

Ctrl+a+d

Resume a single screen:

screen -r

Resume a screen when multiple screens are running:

screen -r <name> OR screen -r <id>

List the currently running screens:

screen -list OR screen -ls

End a screen:

screen -X -S <id> quit

Shutdown the machine after finishing a command — separate them with ;

python3 file.py; sudo shutdown
For other shutdown commands, check this answer.
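The choice of separator matters here: with ; the shutdown runs even if the script fails, whereas && would leave the machine running after a failure. The sketch below illustrates the difference safely, with false standing in for a failing script and echo standing in for the shutdown command.

```shell
# ";" runs the second command whatever the first one's exit status was.
with_semicolon=$(sh -c 'false; echo "second command ran"')

# "&&" runs the second command only if the first one succeeded.
with_and=$(sh -c 'false && echo "second command ran"' || true)

echo "with ';' : ${with_semicolon:-nothing ran}"
echo "with '&&': ${with_and:-nothing ran}"
```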

Adjust file permissions so that only the current user has access:

chmod 700 <file_name>

For example, this is required before using the *.pem key file provided by AWS EC2.
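The three digits stand for the owner, the group, and everyone else; 7 grants read, write, and execute, and 0 grants nothing. A quick way to check the result is the first column of ls -l, sketched here on an empty placeholder file (key.pem is a made-up name, not a real key).

```shell
workdir=$(mktemp -d)
touch "$workdir/key.pem"   # hypothetical key file, created empty for the demo

# 7 = read+write+execute for the owner; 0 = no access for group and others.
chmod 700 "$workdir/key.pem"

# The first 10 characters of ls -l show the file type and the permission bits.
permissions=$(ls -l "$workdir/key.pem" | cut -c1-10)
echo "$permissions"
```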

Display RAM used:

free -m

Display GPU memory used:

nvidia-smi

Copy a file from the local machine to a server (run it from the local machine):

scp <file_name> <user>@<server_ip>:/<dir_name>

Copy a directory from the local machine to a server; use -r (run it from the local machine):

scp -r <dir_name> <user>@<server_ip>:/<dir_name>

Copy a file from the local machine to AWS EC2 (run it from the local machine):

scp -i <key.pem> <file_name> ubuntu@ec2[…].compute.amazonaws.com:~/<dir_name>

Copy a file from a server to the local machine (run it from the local machine):

scp <user>@<server_ip>:/<dir_name>/<file_name> </path/on/the/local/machine>

If the server uses a non-default SSH port, add -P <port> right after scp.

Copy a file from Google Cloud to the local machine:

gcloud compute scp --project <project_name> --recurse <user_name>@<machine_name>:~/<dir_name>/<file_name> </path/on/the/local/machine>

Log out of the current connection:

Ctrl+d