Teach Yourself Scheme in Fixnum Days

Chapter 2

Data types

A data type is a collection of related values. These collections need not be disjoint, and they are often hierarchical. Scheme has a rich set of data types: some are simple (indivisible) data types and others are compound data types made by combining other data types.

2.1 Simple data types

The simple data types of Scheme include booleans, numbers, characters, and symbols.

2.1.1 Booleans

Scheme's booleans are #t for true and #f for false. Scheme has a predicate procedure called boolean? that checks if its argument is boolean.

(boolean? #t)              =>  #t 
(boolean? "Hello, World!") =>  #f

The procedure not negates its argument, considered as a boolean.

(not #f)              =>  #t 
(not #t)              =>  #f 
(not "Hello, World!") =>  #f

The last expression illustrates a Scheme convenience: In a context that requires a boolean, Scheme will treat any value that is not #f as a true value.
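As a small illustration of this convention, applying not twice turns any value into the boolean it stands for. This is only a sketch using procedures already introduced; the particular test values are chosen for illustration.

```scheme
; Only #f counts as false. Every other value counts as true,
; including zero and the empty string.
(not (not #f))              ; => #f
(not (not #t))              ; => #t
(not (not 0))               ; => #t
(not (not ""))              ; => #t
(not (not "Hello, World!")) ; => #t
```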

2.1.2 Numbers

Scheme numbers can be integers (eg, 42), rationals (22/7), reals (3.1416), or complex (2+3i). An integer is a rational is a real is a complex number is a number. Predicates exist for testing the various kinds of numberness:

(number? 42)       =>  #t 
(number? #t)       =>  #f 
(complex? 2+3i)    =>  #t 
(real? 2+3i)       =>  #f 
(real? 3.1416)     =>  #t 
(real? 22/7)       =>  #t 
(real? 42)         =>  #t 
(rational? 2+3i)   =>  #f 
(rational? 3.1416) =>  #t 
(rational? 22/7)   =>  #t 
(integer? 22/7)    =>  #f 
(integer? 42)      =>  #t

Scheme integers need not be specified in decimal (base 10) format. They can be specified in binary by prefixing the numeral with #b. Thus #b1100 is the number twelve. The octal prefix is #o and the hex prefix is #x. (The optional decimal prefix is #d.)
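The radix prefixes described above can be tried at the listener. The following sketch writes the number twelve in all four notations; the numerals themselves are chosen only for illustration.

```scheme
; The same integer, twelve, in four radixes.
#b1100  ; => 12   (binary)
#o14    ; => 12   (octal)
#xc     ; => 12   (hex)
#d12    ; => 12   (decimal, the default)

; All four numerals denote the same number.
(= #b1100 #o14 #xc #d12)  ; => #t

#xff    ; => 255
```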

Numbers can be tested for equality using the general-purpose equality predicate eqv?.

(eqv? 42 42)   =>  #t 
(eqv? 42 #f)   =>  #f 
(eqv? 42 42.0) =>  #f

However, if you know that the arguments to be compared are numbers, the special number-equality predicate = is more apt.

(= 42 42)   =>  #t 
(= 42 #f)   -->ERROR!!! 
(= 42 42.0) =>  #t

Other number comparisons allowed are <, <=, >, >=.

(< 3 2)    =>  #f 
(>= 4.5 3) =>  #t

Arithmetic procedures +, -, *, /, expt have the expected behavior:

(+ 1 2 3)    =>  6 
(- 5.3 2)    =>  3.3 
(- 5 2 1)    =>  2 
(* 1 2 3)    =>  6 
(/ 6 3)      =>  2 
(/ 22 7)     =>  22/7 
(expt 2 3)   =>  8 
(expt 4 1/2) =>  2.0

For a single argument, - and / return the negation and the reciprocal respectively:

(- 4) =>  -4 
(/ 4) =>  1/4

The procedures max and min return the maximum and minimum respectively of the number arguments supplied to them. Any number of arguments can be so supplied.

(max 1 3 4 2 3) =>  4 
(min 1 3 4 2 3) =>  1

The procedure abs returns the absolute value of its argument.

(abs  3) =>  3 
(abs -4) =>  4

This is just the tip of the iceberg. Scheme provides a large and comprehensive suite of arithmetic and trigonometric procedures. For instance, atan, exp, and sqrt respectively return the arctangent, natural antilogarithm, and square root of their argument. Consult R5RS [22] for more details.
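A brief sketch of the three procedures just named follows. Whether a result prints as an exact integer or as a decimal (eg, 4 versus 4.0) can differ between implementations, so the comments describe the values rather than their exact printed form.

```scheme
(sqrt 16)        ; the square root of 16, numerically equal to 4
(exp 0)          ; e raised to the power 0, numerically equal to 1
(atan 1)         ; the arctangent of 1, ie, pi/4, roughly 0.785
(* 4 (atan 1))   ; roughly 3.14159, an approximation of pi

; The number-equality predicate = ignores exactness, so these hold
; whichever printed form the implementation chooses.
(= (sqrt 16) 4)  ; => #t
(= (exp 0) 1)    ; => #t
```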

2.1.3 Characters

Scheme character data are represented by prefixing the character with #\. Thus, #\c is the character c. Some non-graphic characters have more descriptive names, eg, #\newline, #\tab. The character for space can be written #\ , or more readably, #\space.

The character predicate is char?:

(char? #\c) =>  #t 
(char? 1)   =>  #f 
(char? #\;) =>  #t

Note that a semicolon character datum does not trigger a comment.

The character data type has its set of comparison predicates: char=?, char<?, char<=?, char>?, char>=?.

(char=? #\a #\a)  =>  #t 
(char<? #\a #\b)  =>  #t 
(char>=? #\a #\b) =>  #f

To make the comparisons case-insensitive, use char-ci instead of char in the procedure name:

(char-ci=? #\a #\A) =>  #t 
(char-ci<? #\a #\B) =>  #t

The case conversion procedures are char-downcase and char-upcase:

(char-downcase #\A) =>  #\a 
(char-upcase #\a)   =>  #\A

2.1.4 Symbols

The simple data types we saw above are self-evaluating. Ie, if you type any object of these data types at the listener, the result returned by the listener will be the same as what you typed in.

#t  =>  #t 
42  =>  42 
#\c =>  #\c

Symbols don't behave the same way. This is because symbols are used by Scheme programs as identifiers for variables, and thus will evaluate to the value that the variable holds. Nevertheless, symbols are a simple data type, and symbols are legitimate values that Scheme can traffic in, along with characters, numbers, and the rest.

To specify a symbol without making Scheme think it is a variable, you should quote the symbol:

(quote xyz) 
=>  xyz

Since this type of quoting is very common in Scheme, a convenient abbreviation is provided. The expression

'E

will be treated by Scheme as equivalent to

(quote E)

Scheme symbols are named by a sequence of characters. About the only limitation on a symbol's name is that it shouldn't be mistakable for some other data, eg, characters or booleans or numbers or compound data. Thus, this-is-a-symbol, i18n, <=>, and $!#* are all symbols; 16, -i (a complex number!), #t, "this-is-a-string", and (barf) (a list) are not. The predicate for checking symbolness is called symbol?:

(symbol? 'xyz) =>  #t 
(symbol? 42)   =>  #f

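In R5RS Scheme, symbols are normally case-insensitive. (Implementations that follow R6RS or R7RS treat symbols as case-sensitive instead.) Thus, in a case-insensitive Scheme, the symbols Calorie and calorie are identical: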

(eqv? 'Calorie 'calorie) 
=>  #t

We can use the symbol xyz as a global variable by using the form define:

(define xyz 9)

This says the variable xyz holds the value 9. If we feed xyz to the listener, the result will be the value held by xyz:

xyz 
=>  9

We can use the form set! to change the value held by a variable:

(set! xyz #\c)

Now

xyz 
=>  #\c

2.2 Compound data types

Compound data types are built by combining values from other data types in structured ways.

2.2.1 Strings

Strings are sequences of characters (not to be confused with symbols, which are simple data that have a sequence of characters as their name). You can specify strings by enclosing the constituent characters in double-quotes. Strings evaluate to themselves.

"Hello, World!" 
=>  "Hello, World!"

The procedure string takes a bunch of characters and returns the string made from them:

(string #\h #\e #\l #\l #\o) 
=>  "hello"

Let us now define a global variable greeting.

(define greeting "Hello; Hello!")

Note that a semicolon inside a string datum does not trigger a comment.

The characters in a given string can be individually accessed and modified. The procedure string-ref takes a string and a (0-based) index, and returns the character at that index:

(string-ref greeting 0) 
=>  #\H

New strings can be created by appending other strings:

(string-append "E " 
               "Pluribus " 
               "Unum") 
=>  "E Pluribus Unum"

You can make a string of a specified length, and fill it with the desired characters later.

(define a-3-char-long-string (make-string 3))

The predicate for checking stringness is string?.

Strings obtained as a result of calls to string, make-string, and string-append are mutable. The procedure string-set! replaces the character at a given index:

(define hello (string #\h #\e #\l #\l #\o))  
hello 
=>  "hello" 
 
(string-set! hello 1 #\a) 
hello 
=>  "hallo"
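To illustrate making a string of a given length and filling it in later, here is a minimal sketch. The variable name buf is hypothetical. The optional second argument to make-string (a fill character) and the procedure string-length are standard in R5RS but are not otherwise discussed in this chapter.

```scheme
; buf is a hypothetical name used only for this illustration.
(define buf (make-string 3 #\-))  ; three characters, each initially #\-

buf                  ; => "---"
(string-length buf)  ; => 3
(string? buf)        ; => #t

; Fill in the characters one index at a time.
(string-set! buf 0 #\a)
(string-set! buf 1 #\b)
(string-set! buf 2 #\c)

buf                  ; => "abc"
```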

2.2.2 Vectors

Vectors are sequences like strings, but their elements can be anything, not just characters. Indeed, the elements can be vectors themselves, which is a good way to generate multidimensional vectors.

Here's a way to create a vector of the first five integers:

(vector 0 1 2 3 4) 
=>  #(0 1 2 3 4)

Note Scheme's representation of a vector value: a # character followed by the vector's contents enclosed in parentheses.

In analogy with make-string, the procedure make-vector makes a vector of a specific length:

(define v (make-vector 5))

The procedures vector-ref and vector-set! access and modify vector elements. The predicate for checking if something is a vector is vector?.
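A short sketch of these vector procedures follows. The names scores and grid are hypothetical, and the optional second argument to make-vector (a fill value) is standard in R5RS although it is not mentioned above.

```scheme
; scores and grid are hypothetical names used only for this illustration.
(define scores (make-vector 3 0))  ; three elements, each initially 0

scores                     ; => #(0 0 0)
(vector-set! scores 1 42)  ; replace the element at index 1
(vector-ref scores 1)      ; => 42
scores                     ; => #(0 42 0)

(vector? scores)           ; => #t
(vector? "abc")            ; => #f

; A vector whose elements are vectors acts as a 2-by-2 grid.
(define grid (vector (vector 1 2) (vector 3 4)))
(vector-ref (vector-ref grid 1) 0)  ; => 3  (row 1, column 0)
```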

2.2.3 Dotted pairs and lists

A dotted pair is a compound value made by combining any two arbitrary values into an ordered couple. The first element is called the car, the second element is called the cdr, and the combining procedure is cons.

(cons 1 #t) 
=>  (1 . #t)

Dotted pairs are not self-evaluating, and so to specify them directly as data (ie, without producing them via a cons-call), one must explicitly quote them:

'(1 . #t) =>  (1 . #t) 
 
(1 . #t)  -->ERROR!!!

The accessor procedures are car and cdr:

(define x (cons 1 #t)) 
 
(car x) 
=>  1 
 
(cdr x) 
=>  #t

The elements of a dotted pair can be replaced by the mutator procedures set-car! and set-cdr!:

(set-car! x 2) 
 
(set-cdr! x #f) 
 
x 
=>  (2 . #f)

Dotted pairs can contain other dotted pairs.

(define y (cons (cons 1 2) 3)) 
 
y 
=>  ((1 . 2) . 3)

The car of the car of this pair is 1. The cdr of the car of this pair is 2. Ie,

(car (car y)) 
=>  1 
 
(cdr (car y)) 
=>  2

Scheme provides procedure abbreviations for cascaded compositions of the car and cdr procedures. Thus, caar stands for "car of car of", and cdar stands for "cdr of car of", etc.

(caar y) 
=>  1 
 
(cdar y) 
=>  2

c...r-style abbreviations for up to four cascades are guaranteed to exist. Thus, cadr, cdadr, and cdaddr are all valid. cdadadr might be pushing it.
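The following sketch shows how to read such an abbreviation: the letters between c and r are applied from right to left. The variable z is a hypothetical name for a pair whose car and cdr are both pairs.

```scheme
; z is a hypothetical name: a pair of two pairs.
(define z (cons (cons 1 2) (cons 3 4)))

(caar z)       ; => 1   car of the car
(cdar z)       ; => 2   cdr of the car
(cadr z)       ; => 3   car of the cdr
(cddr z)       ; => 4   cdr of the cdr

; Each abbreviation is the same as the spelled-out composition.
(car (cdr z))  ; => 3
```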

When nested dotting occurs along the second element, Scheme uses a special notation to represent the resulting expression:

(cons 1 (cons 2 (cons 3 (cons 4 5)))) 
=>  (1 2 3 4 . 5)

Ie, (1 2 3 4 . 5) is an abbreviation for (1 . (2 . (3 . (4 . 5)))). The last cdr of this expression is 5.

Scheme provides a further abbreviation if the last cdr is a special object called the empty list, which is represented by the expression (). The empty list is not considered self-evaluating, and so one should quote it when supplying it as a value in a program:

'() =>  ()

The abbreviation for a dotted pair of the form (1 . (2 . (3 . (4 . ())))) is

(1 2 3 4)

This special kind of nested dotted pair is called a list. This particular list is four elements long. It could have been created by saying

(cons 1 (cons 2 (cons 3 (cons 4 '()))))

but Scheme provides a procedure called list that makes list creation more convenient. list takes any number of arguments and returns the list containing them:

(list 1 2 3 4) 
=>  (1 2 3 4)

Indeed, if we know all the elements of a list, we can use quote to specify the list:

'(1 2 3 4) 
=>  (1 2 3 4)

List elements can be accessed by index.

(define y (list 1 2 3 4)) 
 
(list-ref y 0) =>  1 
(list-ref y 3) =>  4 
 
(list-tail y 1) =>  (2 3 4) 
(list-tail y 3) =>  (4)

list-tail returns the tail of the list starting from the given index.

The predicates pair?, list?, and null? check if their argument is a dotted pair, list, or the empty list, respectively:

(pair? '(1 . 2)) =>  #t 
(pair? '(1 2))   =>  #t 
(pair? '())      =>  #f 
(list? '())      =>  #t 
(null? '())      =>  #t 
(list? '(1 2))   =>  #t 
(list? '(1 . 2)) =>  #f 
(null? '(1 2))   =>  #f 
(null? '(1 . 2)) =>  #f

2.2.4 Conversions between data types

Scheme offers many procedures for converting among the data types. We already know how to convert between the character cases using char-downcase and char-upcase. Characters can be converted into integers using char->integer, and integers can be converted into characters using integer->char. (The integer corresponding to a character is usually its ASCII code.)

(char->integer #\d) =>  100 
(integer->char 50)  =>  #\2

Strings can be converted into the corresponding list of characters.

(string->list "hello") =>  (#\h #\e #\l #\l #\o)

Other conversion procedures in the same vein are list->string, vector->list, and list->vector.
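A brief sketch of these three procedures follows; the argument values are chosen only for illustration.

```scheme
(list->string (list #\h #\i))          ; => "hi"
(vector->list (vector 1 2 3))          ; => (1 2 3)
(list->vector (list 1 2 3))            ; => #(1 2 3)

; Converting a string to a list and back yields an equal string.
(list->string (string->list "hello"))  ; => "hello"
```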

Numbers can be converted to strings:

(number->string 16) =>  "16"

Strings can be converted to numbers. If the string corresponds to no number, #f is returned.

(string->number "16") 
=>  16 
 
(string->number "Am I a hot number?") 
=>  #f

string->number takes an optional second argument, the radix.

(string->number "16" 8) =>  14

because 16 in base 8 is the number fourteen.
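In R5RS, number->string accepts an optional radix in the same way, although that is not shown above. The sketch below converts in both directions; the numbers are chosen only for illustration.

```scheme
(number->string 14 8)     ; => "16"    fourteen written in base 8
(number->string 12 2)     ; => "1100"  twelve written in base 2

(string->number "1100" 2) ; => 12
(string->number "ff" 16)  ; => 255

; Converting to a string and back in the same radix returns the number.
(string->number (number->string 14 8) 8)  ; => 14
```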

Symbols can be converted to strings, and vice versa:

(symbol->string 'symbol) 
=>  "symbol" 
 
(string->symbol "string") 
=>  string

2.3 Other data types

Scheme contains some other data types. One is the procedure. We have already seen many procedures, eg, display, +, cons. In reality, these are variables holding procedure values, which, unlike numbers or characters, do not have a printed form that shows their contents:

cons 
=>  <procedure>

The procedures we have seen thus far are primitive procedures, with standard global variables holding them. Users can create additional procedure values.

Yet another data type is the port. A port is the conduit through which input and output are performed. Ports are usually associated with files and consoles.

In our "Hello, World!" program, we used the procedure display to write a string to the console. display can take two arguments, one the value to be displayed, and the other the output port it should be displayed on.

In our program, display's second argument was implicit. The default output port used is the standard output port. We can get the current standard output port via the procedure-call (current-output-port). We could have been more explicit and written

(display "Hello, World!" (current-output-port))
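The sketch below contrasts the implicit and explicit forms. The procedures newline (which also accepts an optional port) and output-port? are standard in R5RS but are not discussed in this section.

```scheme
; Implicit port: output goes to the current output port.
(display "Hello, World!")
(newline)

; Explicit port: the same effect, with the port spelled out.
(display "Hello, World!" (current-output-port))
(newline (current-output-port))

; The value returned by (current-output-port) is an output port.
(output-port? (current-output-port))  ; => #t
```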

2.4 S-expressions

All the data types discussed here can be lumped together into a single all-encompassing data type called the s-expression (s for symbolic). Thus 42, #\c, (1 . 2), #(a b c), "Hello", (quote xyz), (string->number "16"), and (begin (display "Hello, World!") (newline)) are all s-expressions.
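One way to see that a piece of program text is itself an s-expression is to quote it and inspect it with the list procedures from this chapter. This is only a sketch, and e is a hypothetical variable name.

```scheme
; e is a hypothetical name. Quoting stops the call from being evaluated,
; so e holds a two-element list rather than the number 16.
(define e '(string->number "16"))

(list? e)          ; => #t
(car e)            ; => string->number   (a symbol)
(symbol? (car e))  ; => #t
(cadr e)           ; => "16"             (a string)
(string? (cadr e)) ; => #t
```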