Python web programming

Python can be used to support web programming that falls into one of two general categories:
client programming – accesses web sites or web applications
server programming – a script executed by the server to provide a web site, perform server-side applications, etc.

What is a web client?
Any program that retrieves data from a web server using the HTTP protocol.
Examples:
web browsers – contact web servers using the HTTP protocol and display HTTP responses
web crawlers – traverse the web automatically to gather information
web service clients (service requesters) – request and process data from a web service provider; the web service provider responds using some web service protocol such as RSS (Rich Site Summary) or RPC (remote procedure call)

Python web client programming
Modules that come standard with Python:
urllib – interface for fetching data across the web
urllib2 – an interface for fetching data across the web that allows you to specify HTTP request headers and handle authentication and cookies
httplib – makes HTTP requests; is used by urllib and urllib2
HTMLParser – for parsing HTML and XHTML files
xmlrpclib – allows clients to call methods on a remote server
cookielib (used to be ClientCookie) – provides classes for handling HTTP cookies

Python web client programming
Popular modules that can be downloaded and used:
utidylib and mxTidy – for cleaning up HTML
BeautifulSoup – permissive HTML parser
html5lib – for parsing HTML into a tree

Python urllib
This module provides a high-level interface for fetching data across the World Wide Web.
The urlopen() function is similar to the built-in function open(), but accepts Uniform Resource Locators (URLs) instead of filenames, and can only open them for reading.

some urllib methods
urlopen(url[, data[, proxies]]) – opens a network object denoted by a URL for reading. If the URL does not represent a local file, it opens a socket (a connection between two programs) to a server somewhere on the network; a file-like object is returned.
Typically the request is a GET; if data is given, then the request is a POST.
proxies is a dictionary that can be used to specify the proxy for different types of requests:
prx = {'http': '

some urllib methods urlopen continued
The file-like object returned supports the following methods:
read([size]) – reads at most size bytes, or the entire response if size is omitted
readline – reads a single line
readlines – returns a list of the lines read
fileno – returns the integer file descriptor
close – closes the file
info – returns the header info
geturl – returns the actual URL used to access the resource
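The slides use the Python 2 urllib module. As a minimal sketch, assuming Python 3 (where urlopen lives in urllib.request), the methods listed above can be tried without any network access by opening a data: URL. The URL and its contents are made up for illustration, and the helper name fetch is hypothetical.

```python
from urllib.request import urlopen

# A data: URL carries its own content, so no server is contacted.
# The URL itself is a made-up example.
EXAMPLE_URL = "data:text/plain,first%0Asecond%0A"

def fetch(url):
    """Open url and return (actual URL, content type, list of lines)."""
    response = urlopen(url)
    try:
        actual_url = response.geturl()                      # URL actually used
        content_type = response.info().get_content_type()   # taken from the headers
        lines = response.readlines()                        # list of bytes objects
    finally:
        response.close()
    return actual_url, content_type, lines

if __name__ == "__main__":
    print(fetch(EXAMPLE_URL))
```

In Python 3 the body is returned as bytes rather than str, so text processing usually starts with a decode() call.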

#!/usr/local/bin/python
import urllib
import re
page = urllib.urlopen("
text = page.read()
atagstr = '<a href="([^"]+)">(.*)</a>'
matches = re.findall(atagstr, text)
for link, name in matches:
    name = name.strip()
    print "%s: %s" % (name, link)

OUTPUT
Home: index.html
Handouts: handouts.html
Resources: resources.html
Homeworks: homeworks.html
Slides: slides.html
CS 5530, fall 2007:
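The same pattern can be exercised on text held in memory. This is a sketch assuming Python 3; the SAMPLE string is a hypothetical stand-in for a downloaded page, and extract_links is a name introduced here for illustration.

```python
import re

# Same pattern as the slide: group 1 is the href, group 2 the link text.
A_TAG = re.compile(r'<a href="([^"]+)">(.*)</a>')

def extract_links(html):
    """Return (name, link) pairs for each simple <a href="..."> found."""
    return [(name.strip(), link) for link, name in A_TAG.findall(html)]

# Hypothetical page text standing in for a downloaded page.
SAMPLE = '''<ul>
<li><a href="index.html"> Home </a></li>
<li><a href="handouts.html">Handouts</a></li>
</ul>'''

if __name__ == "__main__":
    for name, link in extract_links(SAMPLE):
        print("%s: %s" % (name, link))
```

Because (.*) is greedy, two links on the same line would be merged into one match, which is one reason to prefer a real HTML parser for anything beyond simple pages.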

import urllib
page = urllib.urlopen("
#print HTTP response headers
print page.info()
#print actual URL of response
print page.geturl()

OUTPUT
Date: Mon, 06 Oct :34:09 GMT
Server: Apache/2.2.3 (Debian) DAV/2 SVN/1.4.2 mod_ssl/2.2.3 OpenSSL/0.9.8c mod_wsgi/2.0 Python/2.4.4
Last-Modified: Sun, 05 Oct :56:12 GMT
ETag: "6008a-416c-ede3d700"
Accept-Ranges: bytes
Content-Length: 16748
Connection: close
Content-Type: text/html

Some urllib methods
urlencode – takes a sequence of (name, value) pairs and converts it to the appropriate URL-encoded string
urlretrieve(url[, filename[, reporthook[, data]]]) – copies the network object denoted by url to a local file; returns a tuple (filename, headers), where filename is the local file name under which the object can be found and headers is the result of the call to info(); reporthook is a function called after each block is read

import urllib
#pass urlencode a dictionary of key value pairs
data = urllib.urlencode({"op1":13,"op2":27})
print "urlencoded data is", data
#list of tuples also works
#data = urllib.urlencode([("op1",13),("op2",27)])
page = urllib.urlopen(" data)
text = page.read()
print text

OUTPUT
urlencoded data is op1=13&op2=27
<html>
<head> <title>CS 5530 Sum-mer</title> </head>
<body>
<h1>CS 5530 Sum-mer</h1>
<p> The sum of 13 and 27 is 40. </p>
</body>
</html>
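For readers on Python 3, a rough equivalent of the encoding step is sketched below. It assumes urlencode from urllib.parse; the helper name encode_form is hypothetical. POST data must be bytes in Python 3, hence the final encode().

```python
from urllib.parse import urlencode

def encode_form(pairs):
    """URL-encode (name, value) pairs and return bytes usable as POST data."""
    return urlencode(pairs).encode("ascii")

if __name__ == "__main__":
    # A list of tuples keeps the field order explicit.
    print(encode_form([("op1", 13), ("op2", 27)]))
    # Characters with special meaning in a query string are escaped.
    print(urlencode({"q": "a b&c"}))
```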

An easy way to steal a page's source
#!/usr/local/bin/python
import urllib
page = urllib.urlretrieve(" "temp")
print open("temp").read()

Now temp is the name of the file that contains the entire body of the response returned by the server for a request for

Some urllib methods
quote(string[, safe]) – returns a string in which all characters that have special significance in URLs have been replaced by URL-friendly versions (such as %7E instead of ~); the optional safe parameter specifies additional characters that should not be quoted; its default value is '/'
quote_plus – works like quote, but replaces spaces with plus signs and quotes '/'
unquote(string) – reverse of quote
unquote_plus(string) – reverse of quote_plus

#!/usr/local/bin/python
import urllib
url = " Script.pl"
print "url: ", url
print "quote(url): ", urllib.quote(url)
print "quote_plus(url): ", urllib.quote_plus(url)
print "unquote(quote(url)):", urllib.unquote(urllib.quote(url))
print "unquote_plus(quote_plus(url)):", \
    urllib.unquote_plus(urllib.quote_plus(url))

OUTPUT
url: Script.pl
quote(url): http%3A//
quote_plus(url): http%3A%2F%2Fwww.domain.com%2F%7Ejoe%2FThe+Script.pl
unquote(quote(url)): Script.pl
unquote_plus(quote_plus(url)): Script.pl
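A small sketch of the same four functions, assuming Python 3 where they live in urllib.parse. The path below is a hypothetical example chosen because it contains both a space and slashes.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

# Hypothetical path containing a space and slashes.
PATH = "/docs/My File.html"

quoted = quote(PATH)            # '/' is in the default safe set, space is not
plus_quoted = quote_plus(PATH)  # space becomes '+', '/' is quoted too

if __name__ == "__main__":
    print(quoted)
    print(plus_quoted)
    print(unquote(quoted))
    print(unquote_plus(plus_quoted))
```

Each unquote function should be paired with its matching quote function: unquote leaves a plus sign alone, while unquote_plus turns it back into a space.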

HTMLParser module
defines a parser class called HTMLParser, which is used to parse HTML files
used by subclassing HTMLParser and overriding the event-handling methods
event-handling methods are called automatically when a particular piece of input is found in the HTML
works best if the HTML is well-formed (XHTML)

HTMLParser methods
handle_starttag(tag, attrs) – called when a start tag is encountered; attrs is a sequence of (name, value) pairs embedded within the tag
handle_startendtag(tag, attrs) – called on XHTML-style empty tags (<br />, <hr />)
handle_endtag(tag) – called when an end tag is encountered
handle_data(data) – called on textual data (outside of tags)

HTMLParser methods
handle_charref(ref) – called when a character reference of the form &#ref; is encountered; for example, &#188; which refers to ¼
handle_entityref(name) – called when an entity reference of the form &name; is encountered; for example, &frac14; which refers to ¼
handle_comment(data) – called when an HTML comment (<!-- ... -->) is encountered; called with the comment contents

HTMLParser methods
handle_decl(decl) – called when a declaration is encountered; decl is the entire contents between <! and >; for example:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" " g/TR/xhtml1/DTD/xhtml1-transitional.dtd">
handle_pi(data) – called when processing instructions (<?proc ..?>) are encountered; for example, <?php print("hello") >; data is everything between the <? and >
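To see which handler fires for which piece of input, a subclass can simply record every call. The sketch below assumes Python 3, where the class is imported from html.parser; the class name EventLogger and the sample markup are made up for illustration. In Python 3 character references are converted to text by default, so handle_charref and handle_entityref are left out here.

```python
from html.parser import HTMLParser

class EventLogger(HTMLParser):
    """Record each parser callback as a tuple, in the order it occurs."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_decl(self, decl):
        self.events.append(("decl", decl))

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_startendtag(self, tag, attrs):
        self.events.append(("startend", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_comment(self, data):
        self.events.append(("comment", data))

def log_events(markup):
    """Parse markup and return the list of recorded events."""
    parser = EventLogger()
    parser.feed(markup)
    parser.close()
    return parser.events

if __name__ == "__main__":
    sample = '<!DOCTYPE html><p class="x">Hi<br /><!-- note --></p>'
    for event in log_events(sample):
        print(event)
```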

Screen Scraping
Screen scraping is the process whereby a program downloads Web pages and extracts useful information.
If the Web pages are XHTML compliant, then HTMLParser can be used to do the scraping.
If the Web pages are messy HTML (missing end tags, for example), then a program like “Tidy” (available, but not in the Python standard library) can be used to clean up the HTML, or the HTML can be parsed with BeautifulSoup instead of HTMLParser.

#This one finds links embedded like this:
#<h4><a name="google-mountain-view-ca-usa">
#<a class="reference" href="
#Google</a> ...
#To print out: Google (
#
from urllib import urlopen
from HTMLParser import HTMLParser

class Scraper(HTMLParser):
    in_h4 = False
    in_link = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "h4":
            self.in_h4 = True
        if tag == "a" and "href" in attrs:
            self.in_link = True
            self.chunks = []
            self.url = attrs["href"]

    def handle_data(self, data):
        if self.in_link:
            self.chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "h4":
            self.in_h4 = False
        if tag == "a":
            if self.in_h4 and self.in_link:
                print "%s (%s)" % ("".join(self.chunks), self.url)
            self.in_link = False

text = urlopen("
parser = Scraper()
parser.feed(text)
parser.close()

Suppose input to scraper is:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "
<html xmlns=" xml:lang="en" lang="en">
<head> <title>Python Job Board</title> </head>
<body>
<h4><a name="google-mountain-view-ca-usa">
<a class="reference" href=" </h4>
<p>Google has great jobs</p>
<h4><a name="appstate">
<a class="reference" href=" Computer Science Department, Appalachian State University</a></a>
<p>CS department is looking for a new professor</p>
</body>
</html>

Output of Scraper would be:
Google (
Computer Science Department, Appalachian State University (

Monitor calls to handle_starttag
tag= html attrs= [('xmlns', ' ('xml:lang', 'en'), ('lang', 'en')]
tag= head attrs= []
tag= title attrs= []
tag= body attrs= []
tag= h4 attrs= []
tag= a attrs= [('name', 'google-mountain-view-ca-usa')]
tag= a attrs= [('class', 'reference'), ('href', '
tag= p attrs= []
tag= a attrs= [('name', 'appstate')]
tag= a attrs= [('class', 'reference'), ('href', '

Monitor calls to handle_data
data= Python Job Board
data= Google
data= Google has great jobs
data= Computer Science Department, Appalachian State University
data= CS department is looking for a new professor
(extraneous newlines are omitted)

CGI Programming
CGI (Common Gateway Interface) – the standard mechanism by which a Web server can pass queries (for example, supplied by a web form) to a program; the results are displayed as a web page
a simple way of creating web applications without writing a special-purpose application server
we only need to set up the server to support CGI
we can write the script in any language, as long as the language has a mechanism to retrieve inputs from the server

CGI Programming General steps
the server needs to be configured to support CGI scripts
put the script in the indicated cgi directory
use Unix shebang notation to indicate the interpreter for the script (or the script could simply be a binary executable)
set permissions so that everyone can read and execute the script (although only you should be able to write to it)
Note: CGI programming is not without security risks (see Apache slides)

A Simple CGI Python Script
Content-type is used to tell the server what kind of output is being generated by the script.
The Content-type line and the actual content must be separated by a blank line.

#!/usr/local/bin/python
#print HTTP header
print 'Content-type: text/html\n'
#print HTTP body
print '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" \
" \
<html xmlns=" xml:lang="en" lang="en"> \
<head> \
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> \
<title>CS 5530: Web Programming II</title> \
<link rel = "stylesheet" type = "text/css" href="mainPages.css" /> \
</head> \
<body> \
<h1> \
CS 5530: Web Programming II \
</h1> \
<h4>Hello From Python</h4> \
</body> \
</html>'
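The header/blank-line/body layout can be made explicit with a tiny helper. This is only a sketch in Python 3 syntax; the function name cgi_response is hypothetical, and a real CGI script would write the returned text to standard output.

```python
import sys

def cgi_response(body, content_type="text/html"):
    """Return the text a CGI script would write to standard output."""
    header = "Content-Type: %s\n" % content_type
    # The empty line after the header marks the end of the header section.
    return header + "\n" + body

if __name__ == "__main__":
    sys.stdout.write(cgi_response("<h4>Hello From Python</h4>\n"))
```

In the slide's script the same effect comes from the "\n" inside the string plus the newline that print adds.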

Python cgi Module provides classes for accessing form data
Several classes are available:
FieldStorage – newest class added (we'll be looking at this one)
SvFormContentDict – stores single-value form content as a dictionary; it assumes each field name occurs in the form only once
FormContentDict – stores multiple-value form content as a dictionary (the form items are lists of values); useful if your form contains multiple fields with the same name

cgi module FieldStorage class
form = cgi.FieldStorage()
form = cgi.FieldStorage(keep_blank_values = True)
Instantiating the FieldStorage class causes it to read the form data.
Form fields containing empty strings are ignored; to keep such values, provide a true value for the optional keep_blank_values keyword parameter when creating the FieldStorage instance.
A FieldStorage instance can be indexed like a dictionary using the names identified in the form.

A FieldStorage instance supports the dictionary operations has_key, keys, and len:
form.has_key("name") returns True if a value is available for name
form.keys() returns a list of the keys in the FieldStorage instance
len(form) returns the number of key, value pairs in the FieldStorage instance

Indexing into the FieldStorage instance with a key returns another FieldStorage instance; use the value attribute to get the actual value (as a string) associated with the key:
form["name"].value #value of name form field
Or use the getvalue() method to get the value associated with the key; getvalue() takes an optional parameter that indicates what to return if there is no value:
form.getvalue("name", None) #return None if no name

If the form data contains more than one field with the same name, form["name"] is a list of FieldStorage instances and form.getvalue("name") returns a list of strings.
Use form.getlist("name") to always get a list of strings; if only one value is associated with name, then the list contains only one element.
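The cgi module needs a CGI environment to read from, so the sketch below uses urllib.parse.parse_qs from Python 3 as a stand-in to illustrate the same ideas: repeated field names, blank values, and the difference between getvalue-style and getlist-style access. It is not the cgi API itself; the helper functions and the query string are hypothetical.

```python
from urllib.parse import parse_qs

# Made-up form submission: "name" occurs twice and "note" is blank.
QUERY = "name=ann&name=bob&op1=13&note="

default_fields = parse_qs(QUERY)                          # blank values dropped
all_fields = parse_qs(QUERY, keep_blank_values=True)      # blank values kept

def getlist(fields, name):
    """Always return a list of strings (empty if the field is absent)."""
    return fields.get(name, [])

def getvalue(fields, name, default=None):
    """Return a string for one value, a list for several, default for none."""
    values = fields.get(name)
    if not values:
        return default
    return values[0] if len(values) == 1 else values

if __name__ == "__main__":
    print(getvalue(default_fields, "op1"))
    print(getvalue(default_fields, "name"))
    print(getlist(default_fields, "op1"))
    print("note" in default_fields, "note" in all_fields)
```

The getlist form is the safer choice when a script cannot control how many times a field appears.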

Debugging CGI script
If the script does not accept input, then you can simply run the script at the Linux prompt; this is useful for checking for syntax errors.
If you need to run the script via a browser, use an HTML form to invoke the script, or invoke the script yourself by giving the URL and appending the input to the script at the end of the URL (this is an HTTP GET request).

cgitb for debugging
The Python cgitb (CGI traceback) module generates a helpful web page if your program aborts with an exception.
Without it, your browser is going to display the very unhelpful message “Internal Server Error”.
Enable cgitb functionality by:
import cgitb
cgitb.enable()

cgitb for debugging
You'll want to remove the call to cgitb.enable() when the program is deployed; otherwise, cgitb will give too much information (if an exception occurs) to someone accessing your script through a browser.
Note: if the HTTP headers generated by your script are wrong, you'll still get the message “Internal Server Error”.

Simple form
<?xml version = "1.0" encoding= "utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "
<html xmlns=" xml:lang="en" lang="en">
<head> <title>CS 5530 Sum-mer</title> </head>
<body>
<h1>CS 5530 Sum-mer</h1>
<form method="post" action="
<p> <label>Operand 1:<input name = "op1" type = "text" size = "25" /></label> </p>
<p> <label>Operand 2:<input name = "op2" type = "text" size = "25" /></label> </p>
<p> <input type="submit" name="Submit Entry" value="Submit Entry" /> </p>
</form>
</body>
</html>

Python script to process form
#!/usr/local/bin/python
#
import cgitb
import cgi
cgitb.enable()
form = cgi.FieldStorage()
#one way to get the value
op1 = form["op1"].value
#another way to get the value
#if no "op2" key in form, then return "0"
op2 = form.getvalue("op2", "0")
result = int(op1) + int(op2)
print "Content-type: text/html\n"
print '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" \
" \
<html> <head> <title>CS 5530 Sum-mer</title> </head> \
<body> \
<h1>CS 5530 Sum-mer</h1> \
<p> The sum of', op1, ' and ', op2, 'is', result, '</p></body></html>'

What happens if error occurs

mod_python
mod_python embeds the Python interpreter within the Apache server, giving a considerable boost in performance and added flexibility in designing web-based applications.
CGI scripts served via the mod_python cgihandler can be served upwards of 10 times faster than plain CGI.
We'll talk about mod_python handlers, but first, how does Apache handle requests ...

How Apache handles requests
Apache processes requests in phases:
(1) translate the requested URI to a file location
(2) read the file and send it to the client
(3) log the request
Exactly which phases occur depends upon the request; e.g., an authentication phase may need to occur.
A handler is a piece of software that processes one phase.
Apache may (depending upon configuration) call more than one handler to process a phase.

Apache handlers
for each phase performed by Apache, there is a default Apache handler
additional handlers are provided by Apache modules, like mod_python
the main job of mod_python is to act as a dispatcher between an Apache handler and Python code written by a developer
mod_python handlers only do some work if a configuration directive specifies what function to perform

Example configuration
<Directory /mywebdir>
    AddHandler mod_python .py
    PythonHandler myscript
    PythonDebug On
</Directory>
The AddHandler directive tells Apache that any request for any file ending with .py in the /mywebdir directory, or a subdirectory thereof, needs to be processed by mod_python.
The "PythonHandler myscript" directive tells mod_python to process the generic handler using the myscript script (the generic handler is the phase that produces the response).

Example Configuration
The "PythonDebug On" directive in the same configuration instructs mod_python to send error output to the client (in addition to the logs).

What happens
When a request comes in, Apache starts stepping through its request processing phases, calling handlers in mod_python.
mod_python checks whether a directive for that handler was specified in the configuration (mod_python acts as a dispatcher).
With the example configuration, mod_python takes no action for any handler except the generic handler.

What happens
The handler executed by mod_python is the function named handler in the myscript script; mod_python determines the name by taking the directive name (PythonHandler), chopping off the leading Python, and converting the rest to lower case.
Note that no matter which .py file is requested, the same handler is executed.
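The naming rule just described is easy to state in code. The function below is a hypothetical illustration of the rule as the slides describe it, not part of mod_python.

```python
def handler_function_name(directive):
    """Derive the default function name from a Python*Handler directive."""
    prefix = "Python"
    if not directive.startswith(prefix):
        raise ValueError("not a mod_python handler directive: %r" % directive)
    # Chop off the leading "Python" and lower-case what remains.
    return directive[len(prefix):].lower()

if __name__ == "__main__":
    for name in ("PythonHandler", "PythonAuthenHandler", "PythonLogHandler"):
        print(name, "->", handler_function_name(name))
```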

The handler
The handler takes as input the request object.
This object contains all info about the request (IP address, headers, URL, etc.).
The handler uses this request object to generate the response (there is no response object).

from mod_python import apache
def handler(req):
    req.content_type = "text/plain"
    req.write("Hello World!")
    return apache.OK

The handler
The content type is required by the response; this handler doesn't generate HTML, so the content type is text/plain.
The call to req.write causes the response headers to be sent to the client and specifies the body of the response.

The handler
return apache.OK tells Apache that the request was handled
the handler could instead have returned one of the values below, and the appropriate response would have been sent to the client (and the error also logged):
apache.HTTP_INTERNAL_SERVER_ERROR
apache.HTTP_FORBIDDEN

More on handlers
A handler is a function that processes a particular phase of a request.
Apache processes requests in phases: read the request, process headers, provide content, etc.
Every phase is handled either by a handler in the Apache core or by one of its modules, such as mod_python, which passes control to functions provided by the user and written in Python.

Rules of Handlers
a handler function will always be passed a reference to a request object
a handler function must return one of:
apache.OK, meaning this phase of the request was handled by this handler and no errors occurred
apache.DECLINED, meaning Apache needs to look for another handler in subsequent modules
apache.HTTP_ERROR, meaning an HTTP error occurred; there are many possible values for HTTP_ERROR: see

Rules of Handlers
Instead of returning an HTTP error code, handlers can raise the apache.SERVER_RETURN exception, providing the HTTP error as the exception value:
raise apache.SERVER_RETURN, apache.HTTP_FORBIDDEN
Handlers can send content to the client using the req.write() method.
Client data, such as the body of a POST request, can be read by using the req.read() method.
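The return-value rules can be illustrated with a toy dispatcher that runs outside Apache. Everything in this sketch is a stand-in invented for illustration: the constants, the ServerReturn exception, FakeRequest, and run_phase only mimic the behaviour described above and are not mod_python code or an account of Apache's internals.

```python
# Stand-ins for apache.OK, apache.DECLINED and apache.HTTP_FORBIDDEN.
OK, DECLINED, HTTP_FORBIDDEN = 0, -1, 403

class ServerReturn(Exception):
    """Stand-in for apache.SERVER_RETURN; its argument is an HTTP status."""

class FakeRequest:
    """Minimal stand-in for the request object."""
    def __init__(self, uri):
        self.uri = uri
        self.content_type = None
        self.body = []

    def write(self, text):
        self.body.append(text)

def forbid_private(req):
    # Raising instead of returning the error code.
    if req.uri.startswith("/private/"):
        raise ServerReturn(HTTP_FORBIDDEN)
    return DECLINED

def only_text(req):
    if not req.uri.endswith(".txt"):
        return DECLINED          # let some other handler deal with it
    req.content_type = "text/plain"
    req.write("Hello World!")
    return OK

def run_phase(handlers, req):
    """Call handlers in order until one does not decline."""
    for handler in handlers:
        try:
            status = handler(req)
        except ServerReturn as exc:
            return exc.args[0]
        if status != DECLINED:
            return status
    return DECLINED

if __name__ == "__main__":
    for uri in ("/notes.txt", "/private/notes.txt", "/page.html"):
        print(uri, run_phase([forbid_private, only_text], FakeRequest(uri)))
```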

Filter Handler
A filter handler can alter the input (from the client) or the output (to the client) of the server.
The output filter directives below tell the server that all .py files should be processed by the CAPITALIZE filter:
#capitalize is the name of the handler, CAPITALIZE is the name under
#which it is registered
PythonOutputFilter capitalize CAPITALIZE
AddOutputFilter CAPITALIZE .py
The capitalize.py program contains a function called outputfilter that is passed a reference to a filter object, allowing the server output to be read and modified.
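A sketch of what the outputfilter function in capitalize.py might look like follows. It assumes the filter object offers read(), write() and close(), and that read() returns None at the end of the stream; treat those details as assumptions to check against the mod_python documentation. FakeFilter is a hypothetical stand-in so the function can be run outside Apache.

```python
def outputfilter(filter):
    """Upper-case everything that passes through the filter."""
    s = filter.read()
    while s:
        filter.write(s.upper())
        s = filter.read()
    if s is None:
        # Assumed convention: None signals the end of the stream.
        filter.close()

class FakeFilter:
    """Stand-in for the filter object, for running the sketch outside Apache."""
    def __init__(self, chunks):
        self._chunks = list(chunks)
        self.written = []
        self.closed = False

    def read(self):
        return self._chunks.pop(0) if self._chunks else None

    def write(self, s):
        self.written.append(s)

    def close(self):
        self.closed = True

if __name__ == "__main__":
    f = FakeFilter(["hello ", "world"])
    outputfilter(f)
    print("".join(f.written), f.closed)
```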

Python*Handler Directive Syntax
All request handler directives have the following syntax:
Python*Handler handler [handler ...] [ | .ext [.ext ...] ]
handler is a callable object that accepts a single argument, the request object; handler can also be the name of a module, in which case the name of the handler function is taken from the directive
.ext is a file extension
multiple handlers will cause the handlers to be executed sequentially
depending upon the handler, you can use the | to only handle files with a specific extension

request handler directives
PythonPostReadRequestHandler – called after the request is read but before other phases are processed
PythonTransHandler – called to translate the URI into a file name
PythonHeaderParserHandler – called to process request headers early in the process
PythonAccessHandler – called to put restrictions on a resource request (for example, by IP number)
PythonAuthenHandler – called to handle authentication
PythonAuthzHandler – if present, called right after the authen handler to handle authorization
PythonTypeHandler – used to set document type info
PythonFixupHandler – used to fix up header fields
PythonHandler – the generic handler used to process the request body and generate the response; often this is the only one supplied by the programmer
PythonLogHandler – used to perform logging activities
PythonCleanupHandler – called right before Apache destroys the request object
PythonInputFilter – handler for input from the client
PythonOutputFilter – handler for output to the client
PythonConnectionHandler – handler called for the initial TCP connection

Access to Apache Internals
the Python interface to Apache internals is contained in a module named apache
the apache module can only be imported by a script running under mod_python
the apache module is imported like this: from mod_python import apache
the module provides a number of functions and objects that provide access to Apache internals; we're going to focus on the request object

Request object
The request object is a Python mapping to the Apache request_rec structure.
When a handler is invoked, it is always passed a single argument, the request object.
You can dynamically assign attributes to it as a way to communicate between handlers.
There are many request object methods and data members available.

Some request object methods
read([len]) – reads at most len bytes directly from the client, returning a string with the data read; if the len argument is omitted, reads all data given by the client
readline([len]) – like read(), but reads until the end of the line (at most)
readlines([sizehint]) – reads all lines using readline and returns a list of the lines read; the sizehint parameter causes a read of at least sizehint bytes of data, up to the completion of the line
add_handler, add_input_filter, add_output_filter – methods to dynamically add handlers; these persist only through the lifetime of the request
get_remote_host([type, str_is_ip]) – determines the remote client's DNS name or IP number; the optional parameters indicate how to perform the lookup (DNS name or IP address) and what to return
write(string[, flush=1]) – writes string directly to the client, then flushes the buffer, unless flush is 0
flush() – flushes the output buffer
is_https() – returns non-zero if the connection is using SSL/TLS (secure exchange); will always return zero if the mod_ssl Apache module is not loaded

A few request object members
the_request – string containing the first line of the request
assbackwards – indicates an HTTP/0.9 “simple” request; this means that the response will contain no headers, only the body
hostname – host, as set by the full URI or the Host: header
method – string containing the method: 'GET', 'HEAD', 'POST', etc.

SetHandler versus AddHandler
Whether we use a SetHandler directive or an AddHandler directive depends upon how we want to respond to requests.
AddHandler – specifies the handler to use for a request made to a file that has a particular extension
AddHandler mod_python .py
This allows us to serve Python files as well as other types of files from the same directory (the directory to which the directive applies).

SetHandler versus AddHandler
SetHandler – specifies the handler to use for all requests to a resource within a particular directory
SetHandler mod_python
This allows us to access files in our directory without the .py extension.
However, all files in the directory will be handled by mod_python unless other directives specify otherwise.
Note: to serve files without an extension, MultiViews must not be enabled (otherwise Apache figures out the file to serve).

mod_python publisher handler
A limitation of the mod_python generic handler is that we can only specify one script to be executed in response to a request for a Python file within the indicated directory.
The publisher handler allows us to specify a particular Python application, as well as a function or a variable within that application.
In other words, the publisher handler exposes functions and variables to the web as if they were documents.

Suppose the .htaccess file contains these directives:
AddHandler mod_python .py
PythonHandler mod_python.publisher
Request for causes index function in script.py to be executed
Request for causes foo function in script.py to be executed (or value of a foo variable to be displayed)

methods within a module can be visible to the client
variables within a module (outside of any method) are also visible to the client
we can hide a method or variable by starting its name with an underscore
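The visibility rule can be mimicked with a few lines of plain Python. This is a toy model of the behaviour the slides describe, not the publisher's actual code; resolve, the status numbers it returns, and the script namespace are hypothetical stand-ins.

```python
from types import SimpleNamespace

def resolve(module, name="index"):
    """Return (status, text) for a request naming an attribute of module."""
    # Names starting with an underscore are never exposed.
    if name.startswith("_") or not hasattr(module, name):
        return 404, None
    target = getattr(module, name)
    # Functions are called; plain variables are shown as they are.
    value = target() if callable(target) else target
    return 200, str(value)

# Stand-in for a published script.py module.
script = SimpleNamespace(
    index=lambda: "index page",
    greeting="hello from greeting variable",
    _bye="bye from bye variable",
)

if __name__ == "__main__":
    for name in ("index", "greeting", "_bye", "missing"):
        print(name, resolve(script, name))
```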

.htaccess file in directory: ~can/public_html/python
#if directory given, serve the index.py file
DirectoryIndex index.py
#serve all files in directory using mod_python
SetHandler mod_python
#generic handler is mod_python.publisher
PythonHandler mod_python.publisher
#send debugging info to client if error
PythonDebug On

index.py file
from mod_python import apache
def index(req):
    welcome = "<h1>method index welcomes you!</h1>"
    req.content_type = "text/html"
    req.send_http_header()
    req.write(welcome)
def hello(req):
    welcome = "<h1>method hello welcomes you!</h1>"
def _goodbye(req):
    welcome = "<h1>method _goodbye says later!</h1>"
greeting = "hello from greeting variable"
_bye = "bye from bye variable"

Invoking index method in index.py
Any of these will work:

invoking hello method in index.py
Any of these will work:

viewing value of greeting
Any of these will work:
But any access that resolves to _bye or _goodbye will cause the server to generate a 404 (document not found) error.

mod_python.publisher handler
we don't have to use the request object to generate a response for the client
building a string representing an XHTML document works fine
if your function doesn't use the request object, don't declare it and it won't be passed

from mod_python import apache
def index():
    welcome = '''<html>
<title>Individual Welcome</title>
<body>
<h1>Welcome to the one and only you</h1>
</body>
</html>'''
    return welcome

Getting form data
You can list the names of the form's data items in the handler's argument list, and mod_python.publisher will automatically initialize them.
Or, you can use the form attribute of the request object (the form attribute is a FieldStorage object) and get the values via the form attribute.

sum form again
<?xml version = "1.0" encoding= "utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "
<html xmlns=" xml:lang="en" lang="en">
<head> <title>CS 5530 Sum-mer</title> </head>
<body>
<h1>CS 5530 Sum-mer</h1>
<form method="post" action="
<p> <label>Operand 1: <input name = "op1" type = "text" size = "25" /> </label> </p>
<p> <label>Operand 2: <input name = "op2" type = "text" size = "25" /> </label> </p>
<p> <input type="submit" name="Submit Entry" value="Submit Entry" /> </p>
</form>
</body>
</html>

Getting form data from handler parameters
from mod_python import apache
def index(req, op1, op2):
    if (not op1.isdigit() or not op2.isdigit()):
        page = "<h3>Invalid Input</h3>"
    else:
        res = str(int(op1) + int(op2))
        page = "<h3>The sum of " + op1 + " plus " + op2 + " is " + res + "</h3>"
    #Send the content type and header to the browser
    req.content_type = "text/html"
    req.send_http_header()
    req.write(page)

Getting form data from request object
from mod_python import apache
def index(req):
    op1 = req.form.getfirst("op1", "")
    op2 = req.form.getfirst("op2", "")
    if (not op1.isdigit() or not op2.isdigit()):
        page = "<h3>Invalid Input</h3>"
    else:
        res = str(int(op1) + int(op2))
        page = "<h3>The sum of " + op1 + " plus " + op2 + " is " + res + "</h3>"
    #Send the content type and header to the browser
    req.content_type = "text/html"
    req.send_http_header()
    req.write(page)

Notes about form data
There are more methods available on the request object's form attribute for grabbing form data than the getfirst method shown here; look on-line.
If your script is going to return form data to the browser in order to display it, be sure to escape the form data to avoid script injection.
If your script uses the form data to access a database, be sure to sanitize the form data to avoid SQL injection.
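Both precautions can be sketched with the Python 3 standard library: html.escape for text echoed back to the browser, and a parameterized query (shown here with sqlite3) so that form data is never pasted into SQL text. The function names and the users table are hypothetical, chosen only for illustration.

```python
import html
import sqlite3

def render_greeting(name):
    """Echo form data back inside HTML with special characters escaped."""
    return "<p>Hello, %s</p>" % html.escape(name)

def find_user(conn, name):
    """Look up a user; the ? placeholder keeps name out of the SQL text."""
    cur = conn.execute("SELECT id FROM users WHERE name = ?", (name,))
    return cur.fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO users (name) VALUES (?)", ("ann",))
    print(render_greeting("<script>alert(1)</script>"))
    print(find_user(conn, "ann"))
    print(find_user(conn, "x' OR '1'='1"))
```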

mod_python publisher for authentication
The publisher handler looks for __auth__, __access__ and __auth_realm__ attributes.
__auth__ can be either a function or a dictionary:
if a function, it takes the request object, username and password; it returns True (authentication passed) or False (sends HTTP_UNAUTHORIZED to the client, causing a password box to be displayed)
if a dictionary, the username and password attributes of the request object are compared to the key and value pairs in the dictionary; if not found, HTTP_UNAUTHORIZED is sent to the client

mod_python publisher for authentication
__access__ can be a function or a list:
if a function, it takes the request object and username and returns either True or False; False causes HTTP_FORBIDDEN to be returned to the client
if a list, the username attribute of the request object is compared to each item in the list; if not found, HTTP_FORBIDDEN is returned to the client
__auth_realm__ is the string sent to the client to be displayed in the password dialog box

Authentication example
from mod_python import apache
__auth_realm__ = "Cindy's private website"
def index(req):
    welcome = "<h1>You are welcome to visit</h1>"
    #Send the content type and header to the browser
    req.content_type = "text/html"
    req.send_http_header()
    req.write(welcome)
def __auth__(req, user, pswd):
    return ((user == "can" and pswd == "canorris") or
            (user == "jbf" and pswd == "jbfenwick"))
def __access__(req, user):
    return user == "can"

Another authentication example
from mod_python import apache
__auth_realm__ = "Cindy's private website"
__auth__ = {"can":"canorris", "jbf":"jbfenwick"}
__access__ = ["can"]
def index(req):
    welcome = "<h1>You are welcome to visit</h1>"
    #Send the content type and header to the browser
    req.content_type = "text/html"
    req.send_http_header()
    req.write(welcome)

when the script is requested ...

after entering username/password

Problems with publisher authentication
in the example provided, the password is plain text, making it easy for other users on the machine to view the password
the password can instead be encrypted (see the Python sha module)
module-level code is executed even if authentication fails
if this needs to be avoided, use the Apache mod_auth module or mod_python's PythonAuthenHandler
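One way to keep the plain-text password out of the script is to store only a digest and compare digests at login. The sketch below uses hashlib from Python 3 in place of the older sha module; the user name, the password and the helper names are made up. A plain SHA-256 digest is shown because it is the closest counterpart to the slide's suggestion; a salted, deliberately slow scheme would be a stronger choice.

```python
import hashlib
import hmac

def digest(password):
    """Return the hex SHA-256 digest of a password string."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

# In a real script the hex digest would be pasted in as a literal; it is
# computed here only so the sketch is self-contained. Names are made up.
USERS = {"alice": digest("example-password")}

def check(user, pswd):
    """Return True if pswd hashes to the digest stored for user."""
    expected = USERS.get(user)
    if expected is None:
        return False
    # compare_digest avoids leaking information through timing.
    return hmac.compare_digest(expected, digest(pswd))

if __name__ == "__main__":
    print(check("alice", "example-password"))
    print(check("alice", "wrong"))
```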

mod_python PSP handler
The PSP handler is a handler that processes documents using the PSP class in the mod_python.psp module.
The psp module provides a way to convert text documents (usually HTML documents) containing Python code embedded in special brackets into pure Python code suitable for execution within a mod_python handler, thereby providing a versatile mechanism for delivering dynamic content in a style similar to ASP, JSP and others.

How to configure Apache for PSP
in the .htaccess file or a directory container:
AddHandler mod_python .psp
PythonHandler mod_python.psp
if the PythonDebug server configuration is On, then appending an underscore ("_") to the end of the URL gives a side-by-side listing of the original PSP code and the Python code generated by the psp module:
AddHandler mod_python .psp .psp_
PythonDebug On
Note: be sure to change this when debugging is finished

.psp files contain both html (or other text) and python code
Python code needs to be surrounded by special tags
<% ... %> - contains code to be executed; the result of the code does not become part of the generated response
<%= ... %> - contains an expression to be evaluated; the result of the expression becomes part of the generated response
<%@ ... %> - contains a Python directive (include)
<%-- ... --%> - contains comments

Example
<?xml version = "1.0" encoding= "utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>CS 5530 Time</title>
</head>
<body>
<%
import time
%>
<h1>
Hello world, the time is <%=time.strftime("%Y-%m-%d, %H:%M:%S")%>
</h1>
</body>
</html>

Output of time.psp

Response returned to client

Browser view of time.psp_
This is the code that I wrote

Python code in psp files
remember that in Python a change of indentation indicates where a block ends
in psp files, a block contains all non-Python text that follows it, even if that text is not indented
Consider the difference between the next two examples

block.psp
<?xml version = "1.0" encoding= "utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>CS 5530 Block in PSP</title>
</head>
<body>
<%
page = ""
for n in range(3):
    page = page + "<p>Here is a line</p>"
%>
<p>Here is another line</p>
page = "<h1>" + page + "</h1>"
<%=page %>
</body>
</html>

Browser view of block.psp

blockAgain.psp
<?xml version = "1.0" encoding= "utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>CS 5530 Block in PSP</title>
</head>
<body>
<%
page = ""
for n in range(3):
    page = page + "<p>Here is a line</p>"
%>
page = "<h1>" + page + "</h1>"
<p>Here is another line</p>
<%=page %>
</body>
</html>

Browser view of blockAgain.psp
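The block rule can be modeled in plain Python. The sketch below is not the code the psp module generates; it only illustrates, under the rule stated earlier, how literal text that follows an open for block is treated as part of that block, and how the text is emitted once when the block has been ended first (for example by a comment at the outer indentation level). Both function names are hypothetical.

```python
def text_inside_open_block():
    """Literal text follows a code section whose for block is still open."""
    output = []
    for n in range(3):
        # The literal text is treated as part of the loop body
        output.append("<p>Here is another line</p>")
    return "".join(output)


def text_after_block_ended():
    """The block is ended before the literal text appears."""
    output = []
    for n in range(3):
        pass  # loop body holds only the Python statements
    # The literal text now sits outside the loop
    output.append("<p>Here is another line</p>")
    return "".join(output)


print(text_inside_open_block().count("Here is another line"))
print(text_after_block_ended().count("Here is another line"))
```

The first function emits the line once per iteration, the second emits it once in total.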

Use of PSP in my sum example
<?xml version = "1.0" encoding= "utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head><title>CS 5530 Sum-mer</title></head>
<body>
<h1>CS 5530 Sum-mer</h1>
<%
page = ""
op1 = req.form.getfirst("op1", "")
op2 = req.form.getfirst("op2", "")
if (not op1.isdigit() or not op2.isdigit()):
    page = "<h3>Invalid Input</h3>"
else:
    res = str(int(op1) + int(op2))
    page = "<h3>The sum of " + op1 + " plus " + op2 + " is " + res + "</h3>"
# comment ends block
%>
<%=page %>
</body>
</html>
contents of the file sum.psp

PSP for templating
PSP as shown in the earlier examples violates the Model-View-Controller (MVC) paradigm because the application logic (the model) resides inside of the presentation logic (the view)
To get a cleaner separation between the application and the presentation, we can use the PSP class within the psp module as a templating tool
a .py file is used to perform the logic
the template resides in another file to provide the view

Important PSP methods
PSP constructor: takes as input the request object and the name of a file that contains the template
the template file contains PSP tags of the form <%= name %>
the PSP parser will replace these tags by the indicated expression
run([vars, flush]): executes the code by parsing the PSP source
vars is a dictionary keyed by strings that will be passed in as global variables
flush indicates whether output should be flushed (default no)
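The substitution of <%= name %> tags from a vars dictionary can be illustrated with a toy renderer. This is a sketch of the idea only: the real PSP class also handles code sections, directives and comments and writes to the request object. The function render is hypothetical, and because it uses eval it should only ever be given trusted templates.

```python
import re

# Matches <%= expression %> tags, with optional surrounding whitespace
_EXPR_TAG = re.compile(r"<%=\s*(.*?)\s*%>", re.DOTALL)


def render(template, variables):
    """Replace each <%= expr %> tag with str() of the evaluated expression.

    `variables` plays the role of the vars dictionary passed to run():
    its keys are visible as global names inside each expression.
    """
    def substitute(match):
        # Copy the dict so eval does not modify the caller's dictionary
        return str(eval(match.group(1), dict(variables)))

    return _EXPR_TAG.sub(substitute, template)


template = "<body><%=response %></body>"
print(render(template, {"response": "<h3>The sum of 2 plus 3 is 5</h3>"}))
```

With mod_python itself, the equivalent step is the pair of calls psp.PSP(req, filename=...) and run({...}) used in the sum example.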

Using PSP templating for sum example
from mod_python import apache, psp

def index(req, op1, op2):
    if (not op1.isdigit() or not op2.isdigit()):
        page = "<h3>Invalid Input</h3>"
    else:
        res = str(int(op1) + int(op2))
        page = "<h3>The sum of " + op1 + " plus " + op2 + " is " + res + "</h3>"
    req.content_type = "text/html"
    req.send_http_header()
    # Fill the template: the 'response' key becomes a global name in sum.html
    template = psp.PSP(req, filename="sum.html")
    template.run({'response': page})

my sum.py file performs the application logic

Using PSP templating for sum example
<?xml version = "1.0" encoding= "utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>CS 5530 Sum-mer</title>
</head>
<body>
<h1>
CS 5530 Sum-mer
</h1>
<%=response %>
</body>
</html>
my sum.html is the template that provides the view

Output is as expected

Serving .psp files and .py files
Here is how I can set up my .htaccess file so that I can serve .py files and .psp files from the same directory
DirectoryIndex index.psp
AddHandler mod_python .psp .psp_ .py
<FilesMatch "\.(psp|psp_)$">
    PythonHandler mod_python.psp
</FilesMatch>
<Files *.py>
    PythonHandler mod_python.publisher
</Files>
PythonDebug On
