(Warning: CGI scripts without
appropriate safeguards can compromise your site's
security. The scripts presented here are simple
examples and are not assured to be secure for actual
Web use.)
CGI scripts [26] are scripts that reside on a
web server and can be run by a client (browser). The
client accesses a CGI script by its URL, just as they
would a regular page. The server, recognizing that the
URL requested is a CGI script, runs it. How the server
recognizes certain URLs as scripts is up to the server
administrator. For the purposes of this text, we will
assume that they are stored in a distinguished
directory called cgi-bin. Thus, the script
testcgi.scm on the server www.foo.org would
be accessed as http://www.foo.org/cgi-bin/testcgi.scm.
The server runs the CGI script as the user nobody,
who cannot be expected to have any
PATH knowledge (which is highly subjective
anyway). Therefore the introductory magic line for a
CGI script written in Scheme needs to be a bit more
explicit than the one we used for ordinary Scheme
scripts. Eg, the line
":";exec mzscheme -r $0 "$@"
implicitly assumes that there is a particular shell
(bash, say), and that there is a PATH, and that
mzscheme is in it. For CGI scripts, we will need
to be more expansive:
#!/bin/sh
":";exec /usr/local/bin/mzscheme -r $0 "$@"
This gives fully qualified pathnames for the shell and
the Scheme executable. The transfer of control from
shell to Scheme proceeds as for regular scripts.
Here is an example Scheme CGI script,
testcgi.scm, that outputs the settings of some
commonly used CGI environment variables. This
information is returned as a new, freshly created, page
to the browser. The returned page is simply whatever
the CGI script writes to its standard output. This is
how CGI scripts talk back to whoever called them -- by
giving them a new page.
Note that the script first outputs the line
content-type: text/plain
followed by a blank line. This is standard ritual
for a web server serving up a page. These two lines
aren't part of what is actually displayed as the page.
They are there to inform the browser that the page being sent
is plain (ie, un-marked-up) text, so the browser can
display it appropriately. If we were producing text
marked up in HTML, the
content-type would be
text/html.
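As a small sketch of this convention, the header line and the blank line that ends it can be wrapped in one helper. The name emit-content-type and its port parameter are assumptions made for illustration; the scripts in this text simply call display and newline directly.

```scheme
;emit-content-type writes the content-type header line and
;the blank line that must follow it to port.
;(Hypothetical helper, introduced only for this sketch.)

(define emit-content-type
  (lambda (type port)
    (display "content-type: " port)
    (display type port)
    (newline port)
    (newline port)))
```

A script producing HTML could then begin with (emit-content-type "text/html" (current-output-port)).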
The script testcgi.scm:
#!/bin/sh
":";exec /usr/local/bin/mzscheme -r $0 "$@"

;Identify content-type as plain text.

(display "content-type: text/plain") (newline)
(newline)

;Generate a page with the requested info.  This is
;done by simply writing to standard output.

(for-each
 (lambda (env-var)
   (display env-var)
   (display " = ")
   (display (or (getenv env-var) ""))
   (newline))
 '("AUTH_TYPE"
   "CONTENT_LENGTH"
   "CONTENT_TYPE"
   "DOCUMENT_ROOT"
   "GATEWAY_INTERFACE"
   "HTTP_ACCEPT"
   "HTTP_REFERER" ; [sic]
   "HTTP_USER_AGENT"
   "PATH_INFO"
   "PATH_TRANSLATED"
   "QUERY_STRING"
   "REMOTE_ADDR"
   "REMOTE_HOST"
   "REMOTE_IDENT"
   "REMOTE_USER"
   "REQUEST_METHOD"
   "SCRIPT_NAME"
   "SERVER_NAME"
   "SERVER_PORT"
   "SERVER_PROTOCOL"
   "SERVER_SOFTWARE"))
testcgi.scm can be called directly by opening it on
a browser. The URL is:
http://www.foo.org/cgi-bin/testcgi.scm
Alternatively, testcgi.scm can occur as a link in an
HTML file, which you can click. Eg,
... To view some common CGI environment variables, click
<a href="http://www.foo.org/cgi-bin/testcgi.scm">here</a>.
...
However testcgi.scm is launched, it will produce a
plain text page containing the settings of the
environment variables. An example output:
testcgi.scm does not take any input from the user.
A more focused script would take an argument
environment variable from the user, and output the
setting of that variable and none else. For this, we
need a mechanism for feeding arguments to CGI scripts.
The form tag of HTML provides this capability.
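Here is a sample HTML page for this purpose:
<html>
<body>
<form method=get action="http://www.foo.org/cgi-bin/testcgi2.scm">
Enter environment variable: <input type=text name=envvar>
<input type=submit>
</form>
</body>
</html>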
The user enters the desired environment variable (eg,
GATEWAY_INTERFACE) in the
textbox and clicks the submit button. This causes all
the information in the form -- here, the setting of
the parameter envvar to the value
GATEWAY_INTERFACE -- to be collected and sent to
the CGI script identified by the form, viz,
testcgi2.scm. The information can be sent in one
of two ways: (1) if the form's method=get (the
default), the information is sent via the environment
variable called QUERY_STRING; (2) if the form's
method=post, the information is available to the
CGI script at the latter's standard input port
(stdin). Our form uses QUERY_STRING.
It is testcgi2.scm's responsibility to extract the
information from
QUERY_STRING, and output the answer page
accordingly.
The information to the CGI script, whether arriving via
an environment variable or through stdin, is
formatted as a sequence of parameter/argument pairs.
The pairs are separated from each other by the &
character. Within a pair, the parameter occurs first
and is separated from the argument by the =
character. In this case, there is only
one parameter/argument pair, viz,
envvar=GATEWAY_INTERFACE.
The script testcgi2.scm:
#!/bin/sh
":";exec /usr/local/bin/mzscheme -r $0 "$@"

(display "content-type: text/plain") (newline)
(newline)

;string-index returns the leftmost index in string s
;that has character c

(define string-index
  (lambda (s c)
    (let ((n (string-length s)))
      (let loop ((i 0))
        (cond ((>= i n) #f)
              ((char=? (string-ref s i) c) i)
              (else (loop (+ i 1))))))))

;split breaks string s into substrings separated by character c

(define split
  (lambda (c s)
    (let loop ((s s))
      (if (string=? s "") '()
          (let ((i (string-index s c)))
            (if i (cons (substring s 0 i)
                        (loop (substring s (+ i 1)
                                         (string-length s))))
                (list s)))))))

(define args
  (map (lambda (par-arg)
         (split #\= par-arg))
       (split #\& (getenv "QUERY_STRING"))))

(define envvar (cadr (assoc "envvar" args)))

(display envvar)
(display " = ")
(display (getenv envvar))
(newline)
Note the use of a helper procedure split to split
the QUERY_STRING into parameter/argument pairs
along the & character, and then to split each pair
into parameter and argument along the = character.
(If we had used the post method rather than get, we
would have needed to extract the parameters and
arguments from the standard input.)
The <input type=text> and <input type=submit>
are but two of the many different input tags
possible in an HTML form. Consult [26] for
the full repertoire.
In the example above, the parameter's name or the
argument it assumed did not themselves contain any
`&' or `=' characters. In general, they may.
To accommodate such characters, and not have them be
mistaken for separators, the CGI argument-passing
mechanism treats all characters other than letters,
digits, and the underscore, as special, and
transmits them in an encoded form. A space is encoded
as a `+'. For other special characters, the
encoding is a three-character sequence, and consists of
`%' followed by the special character's hexadecimal
code. Thus, the character sequence `20% + 30% =
50%, &c.' will be encoded as
20%25+%2b+30%25+%3d+50%25%2c+%26c%2e
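The encoding rule just described can be sketched in Scheme as follows. The names url-encode and url-encode-char are assumptions introduced for illustration (the browser normally performs this encoding, so a CGI script rarely needs it). The sketch assumes ASCII input, and the case of the hex digits produced by number->string may differ between Scheme implementations.

```scheme
;url-encode-char returns the encoded form of character c
;as a string.  (Hypothetical helper for this sketch;
;assumes c is an ASCII character.)

(define url-encode-char
  (lambda (c)
    (cond ((or (char-alphabetic? c) (char-numeric? c)
               (char=? c #\_))
           (string c))
          ((char=? c #\space) "+")
          (else
           (let ((hex (number->string (char->integer c) 16)))
             ;pad the hex code to two digits
             (string-append "%"
                            (if (< (string-length hex) 2) "0" "")
                            hex))))))

;url-encode encodes every character of string s.

(define url-encode
  (lambda (s)
    (apply string-append
           (map url-encode-char (string->list s)))))
```

Calling (url-encode "20% + 30% = 50%, &c.") returns the encoded form of the character sequence used as the example in the text.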
Instead of dealing anew with the task of getting and
decoding the form data in each CGI script, it is
convenient to collect some helpful procedures into a
library file
cgi.scm. testcgi2.scm can then be written
more compactly as
#!/bin/sh
":";exec /usr/local/bin/mzscheme -r $0 "$@"

;Load the cgi utilities

(load-relative "cgi.scm")

(display "content-type: text/plain") (newline)
(newline)

;Read the data input via the form

(parse-form-data)

;Get the envvar parameter

(define envvar (form-data-get/1 "envvar"))

;Display the value of the envvar

(display envvar)
(display " = ")
(display (getenv envvar))
(newline)
This shorter CGI script uses two utility procedures
defined in cgi.scm. parse-form-data reads
the data supplied by the user via the form. The data
consists of parameters and their associated values.
form-data-get/1 finds the value associated with a
particular parameter.
cgi.scm defines a global table called
*form-data-table* to store form data.
;Load our table definitions

(load-relative "table.scm")

;Define the *form-data-table*

(define *form-data-table* (make-table 'equ string=?))
An advantage of using a general mechanism such
as the parse-form-data procedure is that we can
hide the details of what method (get or
post) was used.
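One way to sketch this hiding of the method is a procedure that picks a parser from the request method, which a CGI script can read from the environment variable REQUEST_METHOD. The name select-form-parser and its three parameters are assumptions made for this sketch, as is the choice to treat a missing method as GET.

```scheme
;select-form-parser returns get-parser if request-method is
;"GET" (in any case) or is #f, and post-parser otherwise.
;(Hypothetical helper, introduced only for this sketch.)

(define select-form-parser
  (lambda (request-method get-parser post-parser)
    (if (string-ci=? (or request-method "GET") "GET")
        get-parser
        post-parser)))
```

With this helper, parse-form-data could amount to calling whichever procedure (select-form-parser (getenv "REQUEST_METHOD") ...) returns, given the two method-specific parsers as its second and third arguments.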
The environment variable
REQUEST_METHOD tells which method was used to transmit
the form data. If the method is GET, then the form
data was sent as the string available via another
environment variable, QUERY_STRING. The auxiliary
procedure
parse-form-data-using-query-string is used to pick
apart QUERY_STRING:
The helper procedure split, and its helper
string-index, are defined as in sec 17.2.
As noted, the incoming form data is a sequence of
name-value pairs separated by &s. Within each
pair, the name comes first, followed by an =
character, followed by the value. Each name-value
combination is collected into a global table, the
*form-data-table*.
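The collection step can be sketched as below. This is not the code of cgi.scm: to stay self-contained it models the table as an association list instead of using table.scm, it takes the query string as an argument instead of calling getenv, and it leaves the names and values undecoded. The names parse-query-string and alist-add are assumptions made for the sketch.

```scheme
;string-index and split are as in testcgi2.scm.

(define string-index
  (lambda (s c)
    (let ((n (string-length s)))
      (let loop ((i 0))
        (cond ((>= i n) #f)
              ((char=? (string-ref s i) c) i)
              (else (loop (+ i 1))))))))

(define split
  (lambda (c s)
    (let loop ((s s))
      (if (string=? s "") '()
          (let ((i (string-index s c)))
            (if i (cons (substring s 0 i)
                        (loop (substring s (+ i 1)
                                         (string-length s))))
                (list s)))))))

;alist-add returns a copy of alist in which value has been
;put at the front of the values stored under name.
;(Hypothetical stand-in for the table operations.)

(define alist-add
  (lambda (alist name value)
    (cond ((null? alist) (list (list name value)))
          ((string=? (car (car alist)) name)
           (cons (cons name (cons value (cdr (car alist))))
                 (cdr alist)))
          (else (cons (car alist)
                      (alist-add (cdr alist) name value))))))

;parse-query-string returns an association list with one
;entry (name value ...) for each distinct name.

(define parse-query-string
  (lambda (query-string)
    (let loop ((pairs (split #\& query-string)) (table '()))
      (if (null? pairs) table
          (let ((name+value (split #\= (car pairs))))
            (if (null? name+value)
                ;skip an empty pair, as in "a=1&&b=2"
                (loop (cdr pairs) table)
                (loop (cdr pairs)
                      (alist-add table
                                 (car name+value)
                                 ;a name with no value gets ""
                                 (if (null? (cdr name+value)) ""
                                     (cadr name+value))))))))))
```

A name that occurs more than once ends up with several values, the most recently seen first.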
Both name and value are encoded, so we
need to decode them using the url-decode procedure
to get their actual representation.
`+' is converted into a space. A triliteral of the
form `%xy'
is converted, using the procedure hex->char, into
the character whose
ASCII encoding is the hex number `xy'.
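A minimal sketch of the two decoding procedures follows. The names url-decode and hex->char come from the text, but the bodies and the two-character signature of hex->char are assumptions. The sketch expects well-formed input: a `%' that is not followed by two characters is passed through unchanged, and two characters that are not hex digits will cause an error.

```scheme
;hex->char returns the character whose code is the hex
;number written with the two digit characters x and y.

(define hex->char
  (lambda (x y)
    (integer->char
     (string->number (string x y) 16))))

;url-decode returns the decoded form of string s.

(define url-decode
  (lambda (s)
    (list->string
     (let loop ((cs (string->list s)))
       (cond ((null? cs) '())
             ((char=? (car cs) #\+)
              (cons #\space (loop (cdr cs))))
             ;a % must be followed by two more characters
             ((and (char=? (car cs) #\%)
                   (pair? (cdr cs)) (pair? (cddr cs)))
              (cons (hex->char (cadr cs) (caddr cs))
                    (loop (cdddr cs))))
             (else (cons (car cs) (loop (cdr cs)))))))))
```

Both upper-case and lower-case hex digits are accepted, since string->number reads either.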
The POST method sends form data via the script's
stdin. The number of characters sent is placed in
the environment variable CONTENT_LENGTH.
parse-form-data-using-stdin reads the required
number of characters from stdin, and populates the
*form-data-table* as before, making sure to decode
the parameters' names and values.
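The reading step can be sketched on its own. The names read-n-chars and content-length->number are assumptions made for this sketch; the string it produces has the same name=value&name=value format as QUERY_STRING, so it can be picked apart in the same way. The sketch assumes the form data is ASCII, so that counting characters and counting bytes agree.

```scheme
;read-n-chars reads at most n characters from port and
;returns them as a string; it stops early at end of file.
;(Hypothetical helper for this sketch.)

(define read-n-chars
  (lambda (n port)
    (let loop ((i 0) (chars '()))
      (if (>= i n)
          (list->string (reverse chars))
          (let ((c (read-char port)))
            (if (eof-object? c)
                (list->string (reverse chars))
                (loop (+ i 1) (cons c chars))))))))

;content-length->number converts the value of CONTENT_LENGTH,
;which may be #f or malformed, into a non-negative integer.
;(Hypothetical helper for this sketch.)

(define content-length->number
  (lambda (s)
    (let ((n (and s (string->number s))))
      (if (and n (exact? n) (integer? n) (>= n 0)) n 0))))
```

In a script, the form data would then be (read-n-chars (content-length->number (getenv "CONTENT_LENGTH")) (current-input-port)).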
It remains to retrieve the values for specific
parameters from the *form-data-table*. Note that
the table associates a list with each parameter, in
order to accommodate the possibility of multiple values
for a parameter. form-data-get retrieves all the
values assigned to a parameter. If there is only one
value, it returns a singleton containing that value.
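The two retrieval procedures might be sketched as follows. Here the table is modelled as an association list with made-up sample entries rather than with table.scm, and the choice of "" as the result of form-data-get/1 for an unknown parameter is an assumption.

```scheme
;For this sketch the table is an association list whose
;entries have the form (name value ...).  The entries
;below are invented sample data.

(define *form-data-table*
  '(("envvar" "GATEWAY_INTERFACE")
    ("color" "red" "blue")))

;form-data-get returns the list of all values for
;parameter k, or the empty list if there are none.

(define form-data-get
  (lambda (k)
    (let ((entry (assoc k *form-data-table*)))
      (if entry (cdr entry) '()))))

;form-data-get/1 returns a single value for parameter k:
;the first one stored, or "" if there are none.

(define form-data-get/1
  (lambda (k)
    (let ((vv (form-data-get k)))
      (if (pair? vv) (car vv) ""))))
```

With the sample table, (form-data-get "envvar") is the singleton list ("GATEWAY_INTERFACE"), while (form-data-get/1 "envvar") is the string itself.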
In our examples so far, the CGI script has generated
plain text. Generally, though, we will want to
generate an HTML page. It is not uncommon for a
combination of HTML form and CGI script to trigger a
series of HTML pages with forms. It is also common to
code all the action corresponding to these various
forms in a single CGI script. In any case, it is
helpful to have a utility procedure that writes out strings
in HTML format, ie, with the HTML special characters
encoded appropriately:
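A minimal sketch of such a procedure is given here. The names display-html and html-escape-char, and the explicit port parameter, are assumptions made for illustration; the four characters handled are the ones HTML treats specially in text and attribute values.

```scheme
;html-escape-char returns the HTML encoding of character c
;as a string.  (Hypothetical helper for this sketch.)

(define html-escape-char
  (lambda (c)
    (case c
      ((#\<) "&lt;")
      ((#\>) "&gt;")
      ((#\") "&quot;")
      ((#\&) "&amp;")
      (else (string c)))))

;display-html writes string s to port with its HTML
;special characters encoded.

(define display-html
  (lambda (s port)
    (for-each (lambda (c)
                (display (html-escape-char c) port))
              (string->list s))))
```

A script would typically call (display-html text (current-output-port)) for any text that comes from the user, such as a form value echoed back in the answer page.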