Lesson 4: Modules and command-line arguments

Today's lesson will go into some details of building a larger program.
But first, a couple of digressions:

============ Comments ===================

Python's comment character is a hash mark, #. Any time you see #
(outside of a quoted string), anything after it on the line isn't part
of the program. It's just a note to yourself or to anyone else who
reads your program.

So comments can be on their own line:

    # Print the "99 bottles of beer" song

or they can be on the same line with some Python code:

    for i in range(99, 0, -1) :       # Loop downward from 99
        print i, "bottles of beer"    # print the next line of the song

As you write longer programs, it's good to add comments for anything
that might be confusing or hard to read.

=============== Shebang! ============================

Another digression, on how to run your programs more easily.

You may have seen some students including this as the first line
of their solutions:

    #!/usr/bin/env python

That's a special code that tells the operating system this is a
python program. That way, once the file has been made executable
(more on that below), you can run the program simply by typing the
filename; no need to say "python progname".

#! is called a "shebang" because # is sometimes pronounced "hash"
(or "sharp") and ! is pronounced "bang", and "shebang" is more fun
to say than "hash-bang" or "sharp-bang". You'll see them in scripts
of all languages -- python, perl, ruby, bash etc.

If you want to know more about the "/usr/bin/env" part, see Wikipedia:
http://en.wikipedia.org/wiki/Shebang_%28Unix%29#Portability

If your system uses python3 by default, this is one of those places
where you should say python2 or python2.7 or whatever instead of
just python.

If you're on Windows, a shebang line won't help you, but it doesn't
hurt to include one in your programs so they'll be easier to run
on Linux and Mac.

============ Importing modules ===================

Okay, let's get back to actual programming.
One of Python's great advantages is that it comes with a huge slew of
built-in libraries, called "modules", to help you do nearly anything.
Want to write a web browser or a video editor, or search Twitter,
or figure out where Jupiter is in the sky? There's a Python module
to help (some come with Python, others you install separately).
Randall Munroe (of XKCD) thinks so too: http://xkcd.com/353/

Okay, so how do you use a module? Simple: once you find the module
you want, just say "import modulename" at the top of your program.

======== The "sys" module, for command-line arguments ==========

One module you'll use a lot is the sys module, because that's the one
that lets you get command-line arguments. Here's a program that just
prints any arguments you give it:

    #! /usr/bin/env python

    import sys

    for arg in sys.argv :
        print arg

Save that to a file -- I called my program "args" (notice I didn't use
a .py extension, though Windows users might need it). Make it
executable:

    chmod +x args

Then run it a few times, with different command-line arguments:

    $ args 1 2 3
    args
    1
    2
    3
    $ args hello, world
    args
    hello,
    world

(If your shell says it can't find args, type ./args instead; the first
line printed will then be ./args.)

When you import the module called sys, you automatically get a variable
called sys.argv that gives you all the command-line arguments the user
typed -- including the program name.

Of course, most of the time you'll want to loop over all the *other*
arguments but not the program name -- you'd want to print hello, and
world, not args and hello, and world. So how do you get around that?

sys.argv is a list so you can use slices. Remember slices from
lesson 3? In this case you want all the arguments starting with
number 1 (0 is the program name). That's sys.argv[1:]. So you can
change the program so the loop reads:

    for arg in sys.argv[1:] :
        print arg

and you'll get exactly what you need: it will print all the arguments
but not the program name.

================= String to int conversion ====================

sys.argv is a list, but each element is a string.
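A quick way to convince yourself that the arguments really are strings
is to "add" them together. Here is a minimal sketch; the function name
join_args and the sample list are made up for illustration, and the
code is written so it should run under both Python 2 and Python 3:

```python
def join_args(argv):
    # argv[0] is the program name; the rest are the user's arguments
    result = ""
    for arg in argv[1:]:
        result = result + arg   # + on strings glues them end to end
    return result

# Pretend the user typed: args 1 2 3
print(join_args(["args", "1", "2", "3"]))   # prints 123, not 6
```

With strings, + joins text rather than doing arithmetic, which is why
the answer comes out as 123 instead of 6.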
So if you need to use them as numbers -- for instance, in
range(0, sys.argv[1]) -- you can convert to an integer with
int(sys.argv[1]). If you need a floating point number, use float()
rather than int().

    import sys

    num = int(sys.argv[1])      # convert the first argument to a number
    for i in range(0, num):
        print i

==================== Reading files ========================

A lot of the time, when you're passing arguments to a program,
they're names of files you want to open. For instance, remember our
word count program? Wouldn't that be a lot more useful if you could
give it a filename, and it would print the number of words in that
file?

Here's how you would read from a file in Python: let's say you wanted
to read from that "args" program you just wrote (and you're still in
that directory).

    file = open("args")              # open the file for reading
    for line in file :               # one line at a time
        print "Read a line:", line
    file.close()                     # done with it, so close it

open(filename) gets you a file object. If you loop over it
(for line in file), it reads the file line by line, giving you each
line as a string.

It's always a good idea to close files after your program is finished
with them. When your program finishes running, Python will close the
file anyway -- but eventually, when you write bigger programs, they
might not finish right away, and they might have to open a whole bunch
of files, and after a while that could cause problems (it's like when
Firefox has been running for days and it just grows bigger and
bigger) ... so it's good to get in the habit right at the beginning.

Of course, you don't necessarily want to read your "args" file: you
want to read whatever file the user suggested. If you're only reading
one file, that's argument number 1 (remember, 0 is the program name):

    import sys

    file = open(sys.argv[1])         # the filename the user typed
    for line in file :
        print "Read a line:", line
    file.close()

========================= Homework ============================

1. With the little example I gave earlier, the one that used
   num = int(sys.argv[1]): if you run it and don't give an argument,
   you'll get an error. Why?
   Can you think of a way to check whether the user forgot to supply
   an argument, and print an error message if so?

2. Write a program that takes a filename and prints the number of
   lines in the file. (You can check its results with wc -l filename.)

3. How would you extend this so that you can count lines in multiple
   files, not just one? So you could say

       $ mywordcounter file1 file2 file3

4. Here's a harder problem, an exercise in debugging (which is a big
   part of programming, sadly):

   a. Write a program that counts words in a file (or multiple files,
      if you prefer). Use the same split() and len() you used in
      lesson 2.

   b. Compare the number of words from your program to what wc -w
      gives. (If you're on a platform that doesn't have wc, run it on
      a small file and count by hand.) Are the answers the same?

   c. Here's the debugging part: why aren't they the same?
      (You don't have to fix it: just figure out the problem.)
      Hint: if you're splitting each line into a list, try printing
      the list to see what's in it. In python, if you have a list
      called words, you can just say print words -- you don't have to
      do anything fancy like you would in some languages.

   d. Optional, harder: fix the problems and make your word count
      program give the same answer as wc -w.

      Hint 1: one Python function that will come in handy is strip():
      it strips off any leading and trailing whitespace (spaces, tabs
      and newlines). So if you have a string s = " hello, world ",
      then s.strip() would give you "hello, world".

      By the way, I haven't mentioned Python's documentation, but most
      Python modules have excellent online docs.
      Here's strip():
      http://docs.python.org/library/string.html#string.strip

      Hint 2: If you're inside a loop, say, looping over lines, and
      you decide you don't care about this line, you can skip to the
      next one by saying:

          continue

      For instance, in a loop where you don't care about negative
      numbers:

          for i in list_of_numbers :
              if i < 0 :
                  continue
              do_stuff_for_positive_numbers(i)

      You can break out of a loop completely with:

          break

      Don't drive yourself too crazy trying to get an exact match
      with wc. There are some special cases where splitting at spaces
      might not give the same answer as wc -w, and there are some
      other Python modules (specifically re, regular expressions)
      that can do a better job. The purpose of this exercise is to
      give you a taste of debugging, fixing problems as you find them
      and thinking about what special cases might arise.
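To see strip(), continue and break working together, here is a small
sketch. It is only an illustration, not a solution to the homework:
the function name clean_lines, the sample list, and the idea of
stopping at a line that says END are all invented for this example.
It also uses the list method append(), and is written so it should
run under both Python 2 and Python 3.

```python
def clean_lines(lines):
    # Collect the non-blank lines, with surrounding whitespace removed,
    # stopping early if we reach a line that says END (a made-up marker).
    kept = []
    for line in lines:
        text = line.strip()   # drop leading/trailing whitespace, newline included
        if text == "":
            continue          # blank line: skip ahead to the next line
        if text == "END":
            break             # stop looping altogether
        kept.append(text)
    return kept

# A pretend file, as a list of lines (each ending in a newline)
sample = ["  hello, world  \n", "\n", "   \n", "second line\n",
          "END\n", "never reached\n"]
print(clean_lines(sample))    # prints ['hello, world', 'second line']
```

The two blank lines are skipped by continue, and the loop never sees
the last line because break ends it at END.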