- Transcript: https://github.com/data-umbrella/event-transcripts/blob/main/2020/09-sam-bail-terminal.md
- Meetup Event: https://www.meetup.com/nyc-data-umbrella/events/271403343/
- Video: https://youtu.be/sqN0bFWrS6Q
- Slides: https://github.com/spbail/terminal-workshop/blob/master/terminal_workshop_slides.pdf
- GitHub: http://github.com/spbail
- Transcriber: Isaack Mungui (Twitter: @aicky254 )
There you go, okay. Hey everybody, welcome to this Data Umbrella and PyLadies online event. Just to let you know how the event is gonna go: I'm gonna do an introduction about the meetup group, Sam Bail is going to give a talk, and we'll have Q&A. We'll also check the Q&A every 10 minutes or so and answer questions along the way. Just to let you know, this event is being recorded. About me: I'm a statistician and data scientist. I founded Data Umbrella, I'm also a PyLadies organizer for the New York City chapter, and I am on Twitter @reshamas. The mission of Data Umbrella is to provide a welcoming and educational space for underrepresented persons in the field of data science. Our website has information, we're on Twitter if you'd like to share about this event, and just so you know, we are a volunteer-run organization. This event is co-promoted with PyLadies, which is an international group of Python ladies and gender minorities; you can find the local New York City chapter on its website and on Twitter. Our code of conduct, to reiterate from emails: we are dedicated to providing a harassment-free experience for everyone, so please keep this experience professional, welcoming and inclusive for everyone. On the Data Umbrella website there are many resources available. There are resources for open source, accessibility and responsibility, so if you want to check them out later on, please do. To watch for upcoming events: we have them on the meetup page, but we also have them on Twitter, and we're on social media (LinkedIn, YouTube and Facebook), so feel free to follow us on whichever platform is your preference. And so we're gonna get started. Sam is in New York, as am I. Sam and I met through PyLadies; she's an organizer of the New York chapter as well, and so it's really exciting to have Sam present on the terminal. I'm gonna let her introduce herself.
All right, great, thanks for switching over. Hi everyone, welcome to this webinar. My name is Sam. At this point I'm kind of just saying "data person": I do data things. I'm based in New York City. I run the partnerships team at a small start-up called Superconductive. Superconductive is the team behind Great Expectations (the little gentleman on the side here, that's our logo), and Great Expectations is a really cool open-source tool for data quality. So here's a very brief plug: if you're interested in data quality and in using a Python library that's open source and free for data quality, please do hit me up after the talk; we have really cool things going on.

So the plan for today, for this talk about an intro to the terminal, is first to give you some background and talk about terminology, just explain what we mean by terminal, shell, bash, zsh, all these things. Then second, go into some basic navigation, file manipulation and some searching, so just kind of orienting yourself around your file system; go a little bit into environment variables and what your profile is all about; and then I will do a very brief introduction to shell scripting. In terms of the structure, as Reshama already mentioned, I'll be mostly showing slides. This evolved from an interactive workshop where, pre-COVID, we actually used to do hands-on parts, so in this case I'll be doing most of the talking and demoing some stuff. I'll be stopping, not quite after each section, but after each major section for questions. Reshama will field these from the chat, and we will also hopefully have some time for FAQ at the end. It might get a little tight. I might have to skip a couple of things or rush a little bit, but we'll figure it out.
We'll play it by ear. Most importantly, all the slides are going to be online, and we'll post a link to the slides in the chat at the end so you can also follow along. There are little blocks marked as DIY; these are mini exercises that you can try out after the workshop. So hopefully this is going to give you an overview and an idea of how things work, you can ask some questions, and then you can use the slides to work through stuff by yourself after the workshop.

All right, let's get started. Background on terminology: what are all these words? There are a lot of terms that are actually used interchangeably, which is very interesting, and honestly I had to look up some of this too. So what is a shell? Generally, a shell is a user interface, which interestingly can be either a command line, so what we generally think of as a shell, or it could be graphical. So funny enough, we actually even consider a graphical user interface (GUI) a shell too. But in general, when we talk about the shell we kind of mean the terminal, that sort of black box with a green font that runs a shell. And then we often also use the word bash, which is a type of shell, so it's just one particular program. There are lots of others like zsh, ksh, I think there's one called fish; there's a lot of -sh out there. macOS interestingly switched from bash to zsh in version 10.15, so that was around last fall. The basics of bash and zsh are pretty much the same. I think once you get really deep into it you'll notice the differences, but for me, at my level of navigating around things, I didn't actually notice a difference between bash and zsh. So just keep in mind that a lot of times we're so used to saying bash, but funny enough, if you use a MacBook you're actually not in bash any more, you're in zsh.
So I hope that's given you just an overview of the terminology. Honestly, we use a lot of those terms interchangeably because, again, unless you go super deep into it, a lot of times these are pretty much the same: you use your terminal, that opens a shell, etc. So just to clarify those things.

Okay, so now the question, and we actually already had a question in the chat: why use the terminal and shell at all? There are a lot of people at Apple and Microsoft and the Linux distros who spend a lot of time on the usability of their graphical user interfaces, so why should we even go back to the terminal? One reason is that it's often a lot faster or more flexible than doing things by clicking. One of my lecturers had a class called "The keyboard is mightier than the mouse". A lot of times it's really easy to just write a one-liner and do, for example, bulk operations on files, or search for files and text, some things that don't work quite as well in your operating system's GUI. So that's my preference: a lot of times there's something that's just a little easier to do. Second, sometimes you have to. Even just navigating around the file system to, let's say, start up a server or run some programs, run some Python code, whatever: a lot of times you just do that from your shell; you kind of have to. Another good example is if you connect to a server. If you run any kind of jobs or anything remotely, a lot of times you ssh into that server or have some other way of connecting to it, and you only have your terminal; you don't have a GUI that you can click around in. Another thing is setting aliases or environment variables. If you've ever used Postgres, for example, there's an environment variable called `PGPASSWORD` that you set to set a password. There are things like your `PYTHONPATH` if you're using Python. I'm kind of just assuming that a lot of people here work with data and are maybe close to Python and have some familiarity with that, so I'm making a lot of Python references. So a lot of times the only way to do things is via your terminal, and it's very, very helpful to have some familiarity with it and be able to navigate and find your way around. I hope that's given you an explanation of the motivation for this.

So let's just dive right in: navigating the file system. So, some basics to navigate the file system. The way I'm going to do this is I'm going to do the boring stuff and actually just read these things out, and then I'm going to pull up my terminal and show you. We have several commands to navigate the file system once we're in our shell. We have `pwd`, which prints the path of our current working directory; usually your terminal will open a shell in your home directory, and we'll see that in a second. You can do an `ls`, which just lists everything that's in your directory. Pretty much every shell command has flags; a flag is usually the dash-something. In this case `ls -l` adds the long format flag, and in `ls -la` you can concatenate two flags: it's long format plus show all. I can show that right now.

I'm currently in my terminal workshop directory, where I have all the content for this workshop. I hope this is big enough for everyone. So let's just do this. I do `pwd`, which says I'm in `/Users/sam/code/terminal-workshop`, so this is the current working directory that I'm in. I can do an `ls`. `ls` just shows all the files that are in this directory right now. I'm going to do `ls -l`, so I'm adding this flag, which shows me all the files plus some additional information about them, such as the permissions, the owner, the group, and the date. I'm actually not sure if this is the last edit date or the creation date; this must be the last modification date, because I've updated my slides recently. So this is the long format flag. And then I can also do `ls -la`, which lists all, so the all flag also gives me the hidden files. As you can see, I have a `.git` directory here and an `.idea` directory that's created for my PyCharm IDE. So just to give you an idea, this is usually the first thing you do, and you use it a lot: `pwd` to see where you are, `ls` to see what's in the directory.

So now, how do I know what flags are possible here? How do I know I can use `-l`, `-la`, `-la something else`? There is a command called `man`, which stands for manual. It's built in, it's a UNIX command, and I can just say `man` and then something like `ls`, so the command that I want to have the manual for. I do that and I get a really cool commands manual that gives me all the information. This is the documentation for the command that I just typed, `ls`, and it shows me that I can use the spacebar to page through it, and it shows me all the flags that I've just used: the `-a` that I've just used, "include directory entries whose names begin with a dot", or the `-l` that I've just used, "list in long format", etc. So just to give you an idea, `man` gives you the manual for pretty much all shell commands that are built in; it's super helpful. There are also a lot of Easter eggs in there. These have been around for a few decades, so there's occasionally some fun stuff to find. The way you get out of `man`: you can see I have this blinking cursor right at the bottom. Actually what I'm gonna do is make this just tiny and move this up a little bit. This blinking cursor at the bottom: once I see it, all I do is press `q`. Obviously you couldn't see what I just typed, but I just hit the letter `q` to get out of `man`. So that's just an overview of some of the very basics. Alright, cool, so let's actually move to the next slide.
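The commands from this part of the demo can be tried in a throwaway directory. The following is only a sketch: the scratch directory and the file names in it are made up for illustration and are not the contents of the speaker's workshop folder.

```shell
# Create a scratch directory so the listings are predictable.
# The names below are hypothetical stand-ins, not the speaker's files.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
touch slides.md
mkdir .git            # a hidden directory: its name starts with a dot

pwd                   # print the current working directory
ls                    # list visible entries only
ls -l                 # long format: permissions, owner, group, size, modification date
ls -la                # long format plus hidden entries such as .git

# man ls              # opens the manual page for ls; press q to quit

visible="$(ls)"       # plain listing, captured for inspection
all="$(ls -a)"        # listing that includes hidden entries
```

Comparing `$visible` and `$all` shows what the `-a` flag adds: the hidden `.git` entry, plus `.` and `..`.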
Some helpful shortcuts. This is one of the things that I teach almost everyone I work with who I see not using them: you don't always have to type everything in your terminal by hand, so there are some really cool shortcuts, or helpers, to navigate the terminal itself. One example is `clear`, and I'm actually going to show this now. You can see all my stuff is kind of pushed to the bottom, so I write `clear` and it clears it up; it just moves everything up. I can use my up arrow, so instead of typing `clear` again, all I do is hit the up arrow, and you can see I didn't type anything, right? It just gives me the previous command, so I can hit up, up, up, up each time and it shows me all the previous commands. Bash will also tab-complete things, which is great. If I want to write a command, anything that my shell knows about, it will try to autocomplete it. In this case I only type `cl` and then hit tab, and it actually suggests a bunch of stuff to me. So I can do this and it autocompletes; it just works like tab complete or autocomplete in your IDE, for example, if you're writing code. That also works with directories, which is really helpful. So there's a lot of really cool stuff. Oh yeah, and Smith just said that I can also hit `Ctrl + L` to clear my screen. So hang on, ah yes, there's also a shortcut for that.

So speaking of shortcuts, there's a bunch of shortcuts with `Ctrl +` something. The most important ones: let's assume I just typed something and I want to go back to the beginning of the line. I can do `Ctrl + A`, and again you don't see what I'm typing, but just trust me, Ctrl and the letter A takes me to the beginning of the line, and `Ctrl + E` moves my cursor to the end of the line. One other helpful thing is if I'm typing multiple words and then I realize, oh, I actually want to change this: `Ctrl + W` removes the word that's right in front of my cursor, and it takes the adjoining space with it. So I can basically remove all the words or commands that I've just typed, one by one. And then also, if I've typed a command but I don't actually want to run it, `Ctrl + C` will basically just cancel it and move me to the next line. So there's a bunch of stuff, and again this is a good thing to have on cheat sheets to just look up. You don't have to memorize all this, but at some point it will become second nature.

Oh, the one really interesting thing that I want to show is `Ctrl + R`. I've shown you the up arrow, which loops through all the previous commands. Let's assume I want to find the `ls` command that I just wrote because it's, like, super complex. One thing I could do is hit `Ctrl + R`, which takes me to the backward search, and then I type `ls`, for example, and you can see it searches in my history for this terminal for all the times that I've written anything involving `ls`. Each time I hit `Ctrl + R` again, it shows me the match before that; yeah, you can see I've done a lot of `ls`. The way I really use this is for my `marp` command, which is what generates these slides that I've just shown. I can never remember the syntax, so I just hit `Ctrl + R` and then I find the `marp` command and I don't have to memorize it, so this is super helpful. Um, there's a question in the chat: what's the difference between Control and Command? I think it's actually exactly the same, so on my MacBook it's `Ctrl + R`, and I think it's the same in the terminal on Windows if you use Git Bash.
Alright, cool. So, navigating the file system. Now that I know where I am, I don't want to just be stuck in one place in my file system forever; I actually want to change where I am so I can, for example, execute code in a different directory. So there's a bunch of things I can do. Let's assume I open a new tab and, as I've told you... oh, this is fun, sorry. Okay, this is really funny: I actually set my terminal to open a new tab in the same directory as my previous tab. Generally terminals by default are set to open in your home directory; I set mine to open in the same directory. But let's assume, `pwd`, I'm in this directory and I actually want to go one level up, for example to see all the directories that I have under code. I can do `cd` and then either type in the name of the directory that I want to go to, so `/Users/sam/code`, and as you can see I am now in `/Users/sam/code`, or alternatively I can use one of the nice aliases that we have. In this case it's dot dot (`..`) to go one level up, so now I am in `/Users/sam`, right? So the dot dot helps me navigate around. There's another `cd` (change directory) shortcut, and that's the tilde. Tilde takes me to my home directory, so as you can see this is the same `/Users/sam`. Let's say I go back into `code`, so now I'm in `/Users/sam/code`, and I do `cd ~` and I'm back up in my home directory. So the key shortcuts, or aliases, that you can remember here are: the `~` is for your home directory, the `.` just means the current directory, and the `..` means the parent directory. One of the things that you can also do is concatenate these. Let's go back to `code/terminal-workshop`. I can use `../..` as a file path, and this takes me not just one but two levels up, so to the grandparent, and then I'm back in `/Users/sam`; I skipped the code directory here. So just one thing to keep in mind: with `cd` you navigate around, and you have a bunch of aliases to take you to specific directories.
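As a rough sketch of the `cd` shortcuts described here, the snippet below builds a small directory tree in a temporary location and moves around in it. The layout only mimics the `code/terminal-workshop` structure from the talk; the temporary base directory is an assumption made so the example can run anywhere.

```shell
# Hypothetical layout standing in for /Users/sam/code/terminal-workshop.
base="$(mktemp -d)"
mkdir -p "$base/code/terminal-workshop"

cd "$base/code/terminal-workshop"   # absolute path
cd ..                               # parent directory: .../code
parent="$(basename "$PWD")"

cd terminal-workshop                # relative path back down
cd ../..                            # two levels up: the base directory
grandparent="$PWD"

cd .                                # current directory: stays where it is
cd ~                                # home directory
pwd
```

Chaining `..` segments with `/` works to any depth, so `../../..` would go three levels up.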
Alright, cool. Basic file operations. I'm gonna move a little bit faster just because we're really tight on time. Okay, there's a bunch of basic file operations that you might want to use; the keywords here are `mkdir`, `touch` and `open`. I can show this real quick as an example, so let's go back to my `code/terminal-workshop` directory. I'll do an `ls` to see what's in there. I'll just `clear` real quick. So I can do a `mkdir`, call it `mydir`, and as you can see here I just created a directory called `mydir`. Okay, so I can go into `mydir`, and then if I want to create an empty file, I use `touch`. Honestly, I mostly just use `touch` to test whether I have permissions to create anything in there; obviously if I'm creating any kind of real file I'm using a text editor for that. So `touch newfile.txt`, and you can see I created a new file. Then if I want to, I can `open newfile.txt`, and this will open it in whatever is set up as the default editor; in this case it's really just my macOS default text editor that opens `newfile.txt`, and I have the new file that I've just created here. So that's just one example of how you can quickly make a file, often for configurations or if you want to store anything there. This is a quick way to do it, and you don't have to click around in your text editor and navigate in your Finder and stuff. I really like doing this.
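A minimal sketch of the same sequence, run in a scratch directory (the temporary location is an assumption; the names `mydir` and `newfile.txt` are the ones used in the demo). `open` is left commented out because it is a macOS command and launches a GUI application.

```shell
work="$(mktemp -d)"   # scratch location so nothing real is touched
cd "$work"

mkdir mydir           # create a directory
cd mydir
touch newfile.txt     # create an empty file (or update its timestamp if it exists)
ls -l newfile.txt     # the size column shows 0 bytes

# open newfile.txt    # macOS only: opens the file in the default application
```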
Alright, file operations number two. This is where it gets really interesting, because you can copy, move and rename files, and remove files from your command line. This is where I think things start getting more powerful than if you do them in your Finder, because you can bulk-modify stuff. So let's assume I have a file `hello.txt`. Okay, do `ls`: I have `hello.txt` and `newfile.txt`. I can `cp hello.txt` to, let's say, `hello2.txt`. I'm just making a copy, and you can see I now have `hello.txt` and `hello2.txt`, and the content of those would be identical. Now I can rename on the command line, with `mv`, `hello2` to, let's say, `hello3`, for whatever reason. Rename that, and then if I do `ls` again I have `hello` and `hello3`, because `hello2` was renamed to `hello3`. This is pretty straightforward. And then also, if I want to clean up, let's say, `newfile.txt`, I can delete my `newfile.txt` by just typing `rm`, for remove.

Where this gets powerful is by using wildcards. The wildcard operator is the asterisk, the little star, and this is where I can do bulk operations. For example, if I wanted to remove every single file called hello-dot-something in this directory, I could just say `rm hello*`. Okay, I'm doing this; let's consider it irreversible, so please do use this with caution. But I can basically just say `rm` hello-anything, anything that starts with hello, and if I do an `ls` again, I have nothing in there, because I just removed `hello.txt` and `hello3.txt`. So this wildcard operator, if there is one thing you take away from this that you don't know yet, I think this is super, super helpful, because it allows you to do bulk operations. You can use it in a pattern such as `hello*`, for example, or `*.csv` to remove all your CSV files, or you can use it with `ls` to just list everything that starts with hello and is a CSV file, etc. So you can do lots of stuff with that.
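The copy, rename, remove and wildcard steps can be sketched as follows. Everything happens in a temporary directory, and `data.csv` is an extra file added here (not part of the demo) to show that the `hello*` pattern leaves non-matching files alone. Listing with `ls hello*` first is a cautious habit, since `rm` does not ask for confirmation by default.

```shell
work="$(mktemp -d)"
cd "$work"
printf 'hello\n' > hello.txt
touch newfile.txt data.csv   # data.csv is an extra, hypothetical file

cp hello.txt hello2.txt      # copy
mv hello2.txt hello3.txt     # rename (move)
rm newfile.txt               # delete a single file

ls hello*                    # preview what the pattern matches
rm hello*                    # delete every file whose name starts with "hello"
ls *.csv                     # patterns work with ls too: lists data.csv

remaining="$(ls)"
```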
Right, looking at file content. Again, I'm picking up speed a little bit just to get through the key stuff here; this is all going to be on the slides. I have a sample CSV file from the New York open data website as an example, and there's a bunch of stuff you can do to look at CSV files. Just going back to what I said, this is a little bit aimed at data folks, and this is the kind of stuff that you do a lot, or that I do a lot. So let's `cd` up into my directory and `ls` to see I have my `permits.csv`. Let's assume I've downloaded `permits.csv` and I want to know what's going on with this file, and I don't necessarily want to open it in Excel or read it into a pandas DataFrame or all that. I just want to get an idea of what's in that file. I have different options. One, which is a little bit of the danger zone, is a command called `cat`. `cat` really just prints the entire file to the screen. The reason why I'm saying this is a danger zone is that it might be a very large file, and it might just print a lot of content to your screen. In this case I know it's large but not crazy large, so let's just run it. I'm running `cat` and it basically just printed everything in that file to the screen, and if I want, I'm able to just scroll through it. So `cat` is kind of cool, for smaller files in particular, to just print the whole thing to the screen immediately, and you don't have to open it separately.

Another really cool command that I actually find more useful than `cat` is `head`, and you also have a matching `tail`. `head` gives you the first n rows of the file that you're looking at. In this case I have `head permits.csv`, and I think it defaults to five or ten... ten, sorry. It defaults to the first ten rows. Especially for things like CSV files, this is always really nice for me to see the header row and get a little bit of the data. This is literally the head; if you're using pandas, this is like `head` in pandas. And then in addition to `head` we also have `tail`, which gives me the bottom few rows, nice. And again, just like in pandas, if I only want to see the top so-many rows of this file, I can also pass `-n`, which specifies the n, the number of rows that I want to see. I can say I only want the top two rows. Let me repeat this and just do this: now it only prints the top two rows. Just keep in mind that this is a CSV file, and obviously `head` does not know that this is a CSV file, so it doesn't give you the top two data rows; it gives you the top two text lines in whatever file it is that you're looking at. And you can do the exact same thing with `tail`, boom, so that gives me the bottom two lines. Pretty straightforward. This is super helpful, I think, if you download a file or get a file from somewhere and you just want a quick look without opening it anywhere else: `cat`, `head` and `tail` are very helpful. I'll actually pause for... oh, I got it, yep: `head` gives you the top rows and `tail` gives you the bottom rows, so `tail` starts counting from the bottom and `head` starts counting from the top. I'll pause real quick to see if there are any other questions. Yeah, so this is an incremental workshop, so I'm actually going to talk about `less` too; thanks for mentioning that, it's all coming. Any other questions? Alright, great, let's keep going.

So speaking of which, a nicer way to navigate around files is to use a command called `less`. `less` is basically an interactive text reader that allows you to do a little bit more navigation. One thing that you can do is just say `less` and then `permits.csv`. The reason why it's called `less` is that there's also a program called `more`. `less` is basically `more`, oh my god, `less` is `more` but with more functionality. UNIX things are funny and weird and quirky, and it's been going on for a long time; read up on it, it's pretty funny. So if you say `less permits.csv`, `less` is the interactive text reader, so it starts at the top of the file. You can navigate around just using the arrow keys up and down to go through the rows, you can use your spacebar to page through whole pages, and there's also some really cool built-in functionality for searching in `less`. If you type a `/`, and again you see the blinking cursor after the colon at the bottom, that gets you into search mode. So now I can, for example, say I want to find everything that says Film, so I just type in `Film` and it shows me all the search results for Film. I can navigate through those search results using `n` and `N`: `n` goes to the next result and `N` (Shift + n) goes to the previous result. So, well, this is the top of the file, so I keep going down, and you can see it just bumps the line where it finds that search result to the top of the screen, so it moves it up and highlights it. Keep in mind the search is case-sensitive. In order to make it case-insensitive, you type `-i`. Wait, okay, and then it ignores case in searches. So `-i` gives you case-insensitive search, and now I can also search for `film`. Oh great, oh, this is "not found", sorry, I'm getting mixed up. Okay, "ignore case", there we go. Okay, and then I can navigate through the results even though I've searched for lower-case `film`. And again, just like with `man`, for a lot of UNIX programs the way to get out of whatever it is you're stuck in is to just hit `q`, and that will get you back out. Alright, so that was `less`. Next up... oh, I've actually already explained finding text in files, so `less` is a really good way to do this. There's a bunch of other ways, so option one is to use `less`.
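To try `cat`, `head` and `tail` without downloading the real data set, a tiny stand-in file is enough. In the sketch below the file is called `permits.csv` like in the talk, but its two columns and three data rows are invented for illustration. `less` is interactive, so it only appears in comments.

```shell
work="$(mktemp -d)"
cd "$work"
# A tiny stand-in for permits.csv; the columns and rows are made up.
printf 'id,category\n1,Film\n2,Television\n3,Film\n' > permits.csv

cat permits.csv            # print the whole file
head permits.csv           # first 10 lines by default (here: all 4)
head -n 2 permits.csv      # first two text lines: header plus one data row
tail -n 2 permits.csv      # last two text lines

top="$(head -n 2 permits.csv)"
bottom="$(tail -n 2 permits.csv)"

# less permits.csv         # interactive pager: space pages, / searches, q quits
# less -i permits.csv      # same, with lower-case search patterns ignoring case
```

Because `head` counts text lines rather than data rows, `head -n 2` on a CSV returns the header and only one record.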
Option two is to use something called `grep`. I don't know, you might have heard someone saying, like, "oh, I'll just grep for this", and by that we mean `grep`: not grav, not grab, but `grep`. `grep` is also a UNIX tool that comes built into your shell and allows you to search in files from your command line. I can show this real quick. You just type `grep`, then you type your search word (yes, sorry, I get the syntax wrong sometimes) and then the file. In this case I'm grepping for the word film in my `permits.csv` file, and it doesn't give me a result. Why do I not get a result? Because it's case-sensitive. So if I `grep` for Film with a capital F, it now prints every single line where it finds the word Film. Um, the other way to do this is to say `grep -i` and then search in whatever case. I can even do this. So `-i` makes it case-insensitive, and again this is one thing that occurs in so many contexts in your shell: in so many programs, `-i` usually makes things case-insensitive. Okay, and then `grep` just prints all the lines where it finds that search result. So again, if you're working with data, this is occasionally pretty helpful to just search really quickly for a specific word or for specific lines that contain the thing that you're looking for. You just `grep` for it.
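The case-sensitivity behaviour is easy to reproduce on a small file. This sketch reuses an invented four-line `permits.csv` (the real file's contents are not shown in the talk, so the rows here are assumptions).

```shell
work="$(mktemp -d)"
cd "$work"
# Made-up sample data standing in for the real permits.csv.
printf 'id,category\n1,Film\n2,Television\n3,Film\n' > permits.csv

grep film permits.csv       # no output: the match is case-sensitive
grep Film permits.csv       # prints the two lines containing "Film"
grep -i film permits.csv    # -i ignores case, so both lines are printed again

exact="$(grep film permits.csv)"
insensitive="$(grep -i film permits.csv)"
```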
Alright, one way I use `grep` a lot is when I don't want to just print all the lines where I find something, but actually want to see the number of occurrences. And Swasey, I see you're asking to go back to the slide for `less`. I think this is it. I would like to keep moving though, but the slides will be online and you can totally look things up afterwards, just in the interest of time. Okay, so sorry, jumping back: the pipe operator. The pipe operator chains any two commands and pipes the output of the first command, command A, into command B. I'm going to show an example, and I'm going to `clear` my screen. One other tool that's super helpful is word count, `wc`. Word count with `-l` counts the number of lines in a specific file. In this case, if I want to count the lines in my `permits.csv`, I do `wc -l permits.csv` and it tells me I have 42 thousand rows. Again, keep in mind that word count doesn't know that it's a CSV file and that it has a header row, so this always includes the header row; if you're looking at data in CSV files specifically, it just says this is the number of lines that I see. So one way to use this is to pipe the output of your `grep` to your word count. I'm going to show this by grepping, right, a case-insensitive `grep`, so I get all the lines that contain the word film, because I want to see how often I see the word film, and then pipe this to word count. Keep in mind I don't need `permits.csv` as the argument to `wc` here, because word count knows through the pipe that it should be taking this as the input rather than having a file argument as the input. Alright, so what do you think happens? I'll give everyone a second to think: what do you think is the output of this? The output of `grep -i film permits.csv | wc -l` will be a number (oh, that's a fantastic question, but I'll finish this first), and it will be the number of times that the word, sorry, the number of lines where the word film occurs, okay, which is probably, I don't know, somewhat less than the 42,000 that we have in the file. Right, we have exactly 7,410 lines where the word film occurs, which means there could be multiple occurrences in one line. This is really just to give you an idea, but again, if you know your data pretty well and you know your CSV structure pretty well, this can often be super helpful. So yeah, we have some 7,410 lines that contain the word film; there might be multiple mentions of the word film. One thing where you could get really sneaky is to basically be prescriptive about your CSV structure so that, in this case, we know it only occurs once per line. Alright, great, so that was... um, I will actually skip this part just in the interest of time. Actually no, this is... well, we'll have time for that, cool.
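Here is a small sketch of the pipe pattern on invented data (a four-line `permits.csv` made up for this example, so the counts are 4 and 2 rather than the 42 thousand and 7,410 from the demo). The `tr -d ' '` step is only there because some `wc` implementations pad their output with spaces. The last line uses `grep -c`, which is not mentioned in the talk but counts matching lines directly.

```shell
work="$(mktemp -d)"
cd "$work"
# Made-up sample data standing in for the real permits.csv.
printf 'id,category\n1,Film\n2,Television\n3,Film\n' > permits.csv

wc -l permits.csv                    # total number of lines, header included
grep -i film permits.csv | wc -l     # number of lines that mention "film"

total="$(wc -l < permits.csv | tr -d ' ')"
matches="$(grep -i film permits.csv | wc -l | tr -d ' ')"

# Alternative not shown in the talk: let grep count matching lines itself.
direct="$(grep -ic film permits.csv)"
```

Both approaches count lines, not occurrences, so a line mentioning the word twice still counts once.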
Okay, so this is searching stuff in a file that you already know is there: I know I have my permits.csv and I just want to do some data manipulation, data searching and stuff. Cool. One thing that you also might want to do is find files, and honestly in this case I find the terminal so much easier to use than the macOS search. I don't know why, but using the macOS search to look for specific files with specific file types always turns into a huge pain. So one of the things you can do is use a command called `find`. You always have to specify the directory, and remember, in this example the directory could just be dot, right? Then you have to specify the flag. There are tons and tons of flags to `find`; the most common one is either `-name` or `-iname`. Again, `-i` is case-insensitive. So `-name` or `-iname`, and then you can basically just search for anything in your quotes that is in the filename somehow. So I'm going to show this correctly. Where am I? Okay, so I'm going to find in my directory, let's say I just want to find any CSV file, right? Again, you can use an asterisk for a wildcard. I want to find anything that says... oh, because I did not use my `-name` flag, so I'll just do that again. So again: find in this directory and everything below (`find` is recursive by default) anything that in the file name has something matching `.csv`, something that ends in `.csv`. In this case, not that exciting, it's only my permits.csv. To show you that it was really recursive, I'm actually going one level up and just repeating that, let's say with anything that starts with `npi`. I have a lot of `npi` files; they're health care provider index files, and I use them a lot for testing, so I know all my repos have a bunch of copies of this. So I can do `find . -iname` to see if there's anything that's also an uppercase NPI there, anything that starts with NPI, blah blah blah, and ends in CSV, and it shows me a bunch of NPI CSV files. The way I use this a lot of times, which is really cool, is to basically say: how often do I have this file in my directory? Thinking back to the pipe operator and word count, I can pipe the output of `find` to my word count, so instead of me counting each line, one two three four five, I can just say this, boom, and it tells me it's there nine times. This is really cool, and it's something that I don't even know how I would do using my Finder or Explorer on Windows or Mac; I think it's just a really nice quick one-liner in your shell. Anything else about find? Again, keep in mind it's recursive by default, which is super helpful; for a lot of commands you have to specify recursiveness. Oh, and the other thing: you can also specify a specific directory to search in, so it's not just `.` as in this directory and anything below, but you can say, okay, find everything in, let's say, code. Let's find everything in my code directory, which in this case is the same, that says `npi.csv`. So there are ways to basically point it at a directory to look at. Great, any questions so far about find? And again, find has a lot of flags and a lot of functionality; this is just the most frequently used one for me.
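
The `find` usage described above can be sketched like this. It is only an illustration: the scratch directory and the file names in it are made up, and `npi_count` and `csv_count` are hypothetical variable names. It uses the `-name` and `-iname` flags from the talk and pipes the result to `wc -l` to count matches.

```shell
# Scratch directory tree with made-up file names (assumptions for illustration).
workdir="$(mktemp -d)"
mkdir -p "$workdir/code/repo_a" "$workdir/code/repo_b"
touch "$workdir/permits.csv" \
      "$workdir/code/repo_a/npi_sample.csv" \
      "$workdir/code/repo_b/NPI_sample.csv" \
      "$workdir/code/repo_b/notes.txt"

# -name is case-sensitive; quote the pattern so the shell does not expand the asterisk.
find "$workdir" -name "*.csv"

# -iname ignores case, so npi_... and NPI_... both match.
find "$workdir" -iname "npi*.csv"

# Pipe to wc -l to count the matches instead of counting lines by eye.
npi_count=$(find "$workdir" -iname "npi*.csv" | wc -l)

# Point find at a specific subdirectory instead of the top of the tree.
csv_count=$(find "$workdir/code" -name "*.csv" | wc -l)

echo "npi files: $npi_count, csv files under code/: $csv_count"
rm -rf "$workdir"
```

Quoting the pattern matters: without the quotes, the shell may expand `*.csv` against the current directory before `find` ever sees it.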
Right, this one I'm going to go through real quick: environment variables and profile files. So, environment variables are system-wide global variables that your shell knows about. You can literally just type in `env` and it will show you all the environment variables; I don't think I have any passwords stored here. So these are all the... oh, there is a password in my Postgres connection string, so I won't show you that. It's all dummy data, that's all our test databases, so I don't have any concerns. So `env` shows you all the environment variables that your shell currently knows about, and you can `echo` a specific variable by referring to it using the dollar sign. This is the same across shells: variables are referenced with the dollar sign, so I can say `echo $USER`. So USER, and everything is case-sensitive in your shell, so my user is set to Sam. Or I can say, and this is the thing that you probably see a lot, your `PYTHONPATH`. Oh, that's interesting, my PYTHONPATH is empty. Let's see about my `PATH`. Okay, so my PATH for example has Conda things mostly, and then executables and binaries. So now we've learned two things. One is `echo`, as in `echo "hello"`: `echo` just prints anything you throw at it. The other thing is that you reference environment variables using the dollar sign. So now the question is, obviously: cool, these are the environment variables that are built in, that are set somewhere, but how can I set my own environment variables if I want to use them in a script, for example, or if my application wants them? Again, my PYTHONPATH is empty for some reason, but a lot of times instructions say add it to your PYTHONPATH or set your PATH. You set environment variables using a command called `export`. I can show that real quick. The syntax is export, then you assign a name for the variable, and then you say equals the value. So this is pretty straightforward: `export myvar=42`, or some text in quotes. Shells are very, very sensitive to spaces, quoting, etc., so make sure the syntax is correct. So I `export myvar`, and when I'm setting it I don't use the dollar sign, and it is `hello`. This is very boring. Then I can say `echo $myvar`, so now I use the dollar sign to reference it, and it says hello because that's the value of my variable. I can also overwrite this, so now I want myvar to be, say, 43, and I `echo` it and now it's 43. So this is a really nice way to set your environment variables in your shell and in your terminal. It's basically a global variable that your particular shell knows about. The only problem here is if I open a new tab. I'm using a different terminal from usual, so I'm not sure if this actually sources it, but let's see what happens if I type `echo $myvar` in a new tab: it's empty, because only the shell where I set my environment variable actually knows about the variable that I just set. That kind of makes sense, I guess, because there is a scope to that variable. The way to make that permanent, if I want `myvar` to be available by default in all the shells and terminals that I open, is to set it in what's called a profile file. You might have heard of something called a bash profile or a shell profile or bashrc or other things, and I'm really just going through this real quick.
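
The `export` behaviour described above can be sketched as follows. `myvar` is the variable name used in the talk; `local_only`, `child_sees`, and `child_local` are hypothetical names added for illustration. A new terminal tab is a separate shell that never saw the `export`, which is why the variable was empty there. The sketch shows the related rule that can be checked in a single session: a child process inherits exported variables but not unexported ones.

```shell
# myvar is the variable name from the talk; the other names are made up.
export myvar="hello"     # no dollar sign and no spaces around = when setting
echo "$myvar"            # dollar sign when reading

export myvar=43          # exporting again overwrites the value
echo "$myvar"

# A child process inherits exported variables...
child_sees=$(sh -c 'echo "$myvar"')

# ...but a variable set without export stays local to this shell.
local_only="not exported"
child_local=$(sh -c 'echo "$local_only"')

echo "child sees myvar=$child_sees and local_only='$child_local'"
```

The single quotes around the `sh -c` commands matter: they stop the current shell from substituting the variables, so the child shell has to look them up itself.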
There is a profile file that every terminal reads at startup, and it does what's called sourcing your environment variables. So if you're using bash, for example if you're on Git Bash, it's usually called `.bash_profile`, `.profile`, or `.bashrc`. There are differences between them; please do read up, but it gets pretty nitpicky. For zsh, or whatever the default is on macOS right now, it's a file in your home directory called `.zshrc`, and I can `cat` this as we've done before, because I know mine is pretty small, and this is basically everything that's in my `.zshrc`. So it does a bunch of exports. I use Apache Airflow, so it sets my Airflow home to this. Conda adds some stuff to it, and I've got a bunch of aliases for Git. Every time you open your terminal, your terminal will look at this file, your profile file, and create all these variables so you have them available. I'm not going to show the example, but you can totally go through that on your own when you have the slides.
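
What a profile file does at startup can be approximated with the sketch below. It writes a temporary stand-in file instead of touching a real `~/.zshrc` or `~/.bash_profile`; the Airflow path and the `gs` alias are assumptions made up for illustration, not the contents of the file shown in the talk. The `.` command (also spelled `source` in bash and zsh) reads the file into the current shell, which is what a terminal does with its profile when it opens.

```shell
# Stand-in for ~/.zshrc or ~/.bash_profile so the real file is left untouched.
profile="$(mktemp)"
cat > "$profile" <<'EOF'
# Hypothetical entries; the path and the alias are made up for illustration.
export AIRFLOW_HOME="$HOME/airflow"
alias gs="git status"
EOF

# A new terminal reads its profile at startup; "." (source) does the same by hand.
. "$profile"
echo "$AIRFLOW_HOME"
rm -f "$profile"
```

After editing a real profile file, the same `.` command applies the change to a terminal that is already open, without needing a new tab.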
Okay, last thing: shell scripting 101. Typing all these commands into the command line every single time gets pretty annoying, so instead you can basically just dump all your commands into a script. I can show you that. `clear`, and I go into `terminal-workshop`. Okay, so I have this file `samscript.sh` that I've prepared, and again I can use `cat` or `less`, for example, to show what's in this file. In this case it's just one line that echoes something, really boring. There are multiple different ways to run the script, but I can basically say, for example, `./samscript.sh`; this is one way to execute my script. What I would expect to happen is for the `echo` to just print to the screen, so basically all I want to see is `hello this is the script saying hi` if I execute the script. Well, that doesn't work: permission denied. Why is the permission denied? If I go back to the line that I'm highlighting here, this block at the very beginning actually shows you the file permissions, and in this case what I see is that I have read/write permissions. I have this on the slides in more detail, but basically you have read/write/execute, read/write/execute, read/write/execute for user, group and other, and in this case I can only read and write but can't execute my script. So what I have to do is change my permissions. The way you change your permissions is yet another command called `chmod`: change the mode of my file. I'm just gonna make some magic happen right now real quick: `u+x`, so I'm setting it so the user gets execution rights on `samscript.sh`. I'm changing the permissions of what I can do with the script, so if I go back and do an `ls -la samscript.sh`, you can see read/write/execute, so now I have set plus x. That means I can execute and run my script. So, really really basic; this is just the very beginnings of shell scripting. You can do for loops, you can do whiles, you can do counts. You have data structures in there, so shell scripting is a whole new topic on its own. This is just me showing you that you can take any shell command and dump it into a file, change the permissions, make sure the permissions are right, and then you can just run it.
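
The script-and-permissions steps above can be reproduced with a sketch like this. It recreates a script in a temporary directory rather than using the workshop repository, and it adds a `#!/bin/sh` line that the one-line script in the demo may not have had; `workdir`, `script`, and `output` are hypothetical names.

```shell
workdir="$(mktemp -d)"
script="$workdir/samscript.sh"

# Write a small script like the one in the demo (the shebang line is an addition).
cat > "$script" <<'EOF'
#!/bin/sh
echo "hello this is the script saying hi"
EOF

ls -l "$script"          # typically -rw-r--r--: readable and writable, not executable
chmod u+x "$script"      # give the user (owner) execute permission
ls -l "$script"          # the user block now includes x

# Run the script by its path and capture what it prints.
output=$("$script")
echo "$output"
rm -rf "$workdir"
```

Running the script before the `chmod` step would fail with a permission error, as in the demo; passing the file to an interpreter explicitly, as in `sh samscript.sh`, is another way to run it that only needs read permission.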
Alright, I'm going to wrap up so we have a couple of minutes left at the end. Congratulations, this was it. I'm going to do a quick summary, and then we have a couple of minutes for questions. What we've covered: directory navigation and file manipulation, so remember `pwd`, `ls`, `cd`, `cp`, `mv` for move, and all that. We've covered searching for text and finding files, with `less` and `grep` and `find`. We've covered environment variables and your shell profile, and I've shown you a tiny, tiny little shell script. This next section is actually super important, because really you have had the tiniest, tiniest overview, anything I could squeeze into this one hour. Other important things to look up: the `sudo` command, super super important. `sudo` means you're basically overriding default permissions and doing something as a superuser. You will run into this sooner or later, so look up what `sudo` does. One thing that's super helpful, though it's just a nice-to-have, is setting aliases for shell commands. You might have seen that I have a bunch of aliases for my Git commands, just because I don't want to type that much. Text manipulation: using Vim as a text editor, even if it's just the most basic commands for Vim, is super helpful. For all you data folks out there, CSVKit is a super awesome library that allows you to do some really cool CSV manipulation on your command line; I strongly recommend looking it up. Remember when I said that your `find` or your `grep` and all that doesn't know that it's a CSV file, that it has a header row, etc.? CSVKit basically knows that you're dealing with a CSV file, that you're dealing with data, and it knows about columns and rows, so really really cool. One other interesting thing to look up is ways to kill processes and exit things. For example, if you're starting up a Jupyter notebook you can `ctrl + C` it; sometimes processes hang and you have to do other things. So do look up how you exit and kill processes, and what the difference is between `ctrl + z`, `ctrl + C`, `ctrl + D` and all that. And then there's other fun stuff like `cal`, for example, and `date`. You can, whoops, check disk space using things like `du` and `df`. There's a fun utility called `cowsay`. `cowsay` is just like `echo`, but it prints a cow. So there's a bunch of other fun stuff that's built into your shell; I only just scratched the surface. And again, the slides are going to be online. I know this was a whirlwind tour, pretty fast, of all the terminal stuff. I hope this was helpful. I hope this has given you some idea of what the key things are.
Does anyone have questions? We have two minutes left for questions. There aren't any questions. I can maybe see if `cowsay` works. Oh no, I'll just `brew install` it. Any other questions? Okay, it was fine. Oh, Angel, that's a great question. So Angel is asking how important bash or shell scripting is for job searches. You mean for the job search itself, or for actually getting hired? I assume you mean for employers. This is really interesting, because based on my experience interviewing candidates and looking at resumes and stuff, no one ever tests for that in a job search; no interview will ever test your scripting skills. I think it's something that comes out very quickly in your productivity. So I do think being really good at scripting things and navigating around your shell will increase your productivity pretty quickly, and will just show that you kind of know what you're doing, and will stop you from manually having to do a lot of clicking and stuff. But in terms of actually finding a job, it's usually not part of the interview process, so you might not need that. Oh, so Kimberley... I'm sorry, going top down: Candida asked about CSVKit, like here a number of times, missing words at 3:05. I don't actually know about CSVKit specifically. I would recommend looking at it; it's got a bunch of really cool stuff in there. So all I can say is I don't know exactly, but try it out. And then Kimberley asks whether there is a good resource for translating shell commands into Windows-friendly shells. So two things. One option on Windows is something called Git Bash: if you install Git for Windows it will usually also install Git Bash, which is a wrapper around your Windows command line that is Bash-like. It's not that great, honestly, but it does the job for some basic stuff. About translating between Bash, or just the UNIX shells, and the Windows shell, I don't actually know. Fantastic question. I will find something, I'm actually very curious. I'll find something and put it into the repository and in the slides. That's a great question, thanks Kimberley. I know we're over time, sorry everyone. Any other questions? Uh, scripts I would run often to increase efficiency. Great question. So I think I use `find` on the command line lots; it's not so many scripts that I run, it's just commands. There are other things you could do, like having a little script that, for example, copies or bulk renames files. So if you do a lot of file operations, I think that's one of the examples: just moving files from one directory into another, those kinds of things that you usually do by clicking. For bulk operations I think that's the biggest efficiency win, yeah. Any other questions? Alright, just going back to my slides: feel free to connect with me on Twitter, if you have it, at @spbail, or connect with me on LinkedIn. If you have any additional questions, I'm more than happy to answer them, let me know. I'll hand this back to Reshama to wrap up. Thanks everyone. I can hear and see you.
Yes, okay, great. Thank you so much, Sam, for your presentation. There will be a recording, which I will post on the Data Umbrella website. Thank you all so much for joining us, and thank you, Sam, for the presentation. A lot of people really enjoyed it. Thank you so much, that's great. Thanks for all the questions, they were really good.