

2 The Reader

Clojure is a homoiconic language, which is a fancy term describing the fact that Clojure programs are represented by Clojure data structures. This is a very important difference between Clojure (and Common Lisp) and most other programming languages--Clojure is defined in terms of the evaluation of data structures and not in terms of the syntax of character streams/files. It is quite common, and easy, for Clojure programs to manipulate, transform and produce other Clojure programs.

That said, most Clojure programs begin life as text files, and it is the task of the reader to parse the text and produce the data structure the compiler will see. This is not merely a phase of the compiler. The reader, and the Clojure data representations, have utility on their own in many of the same contexts one might use XML or JSON etc.

One might say the reader has syntax defined in terms of characters, and the Clojure language has syntax defined in terms of symbols, lists, vectors, maps etc. The reader is represented by the function read, which reads the next form (not character) from a stream, and returns the object represented by that form.
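As a minimal sketch of the form-at-a-time behavior described above, the following reads three forms from a string-backed stream. The var names (stream, first-form and so on) are illustrative and not part of the reference.

```clojure
(import '(java.io PushbackReader StringReader))

;; A stream holding three forms: a list, a vector and a keyword.
(def stream (PushbackReader. (StringReader. "(+ 1 2) [a b] :done")))

;; Each call to read consumes one whole form, not one character.
(def first-form  (read stream))   ; the list (+ 1 2)
(def second-form (read stream))   ; the vector [a b]
(def third-form  (read stream))   ; the keyword :done

;; The reader only builds data; evaluation is a separate step.
(def result (eval first-form))
```

Note that first-form is an ordinary list that can be inspected or transformed before (or instead of) being evaluated.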

Since we have to start somewhere, this reference starts where evaluation starts, with the reader forms. This will inevitably entail talking about data structures whose descriptive details, and interpretation by the compiler, will follow.

2.1 Reader forms

2.1.1 Symbols

Symbols begin with a non-numeric character and can contain alphanumeric characters and *, +, !, -, _, and ? (other characters will be allowed eventually, but not all macro characters have been determined). '/' has special meaning: it can be used once in the middle of a symbol to separate the namespace from the name, e.g. my-namespace/foo. '/' by itself names the division function. '.' also has special meaning: it can be used one or more times in the middle of a symbol to designate a fully-qualified class name, e.g. java.util.BitSet, or in namespace names. Symbols beginning or ending with '.' are reserved by Clojure. Symbols containing / or . are said to be 'qualified'. Symbols beginning or ending with ':' are reserved by Clojure. A symbol can contain one or more non-repeating ':'s.
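The parts of a symbol can be inspected with the core functions namespace and name. This is a small sketch; the var names qualified, plain and class-sym are illustrative only.

```clojure
;; '/' separates the namespace part from the name part.
(def qualified 'my-namespace/foo)
(def qualified-ns   (namespace qualified))   ; "my-namespace"
(def qualified-name (name qualified))        ; "foo"

;; A symbol without '/' has no namespace part.
(def plain 'foo)
(def plain-ns (namespace plain))             ; nil

;; '.' inside a symbol can spell out a fully-qualified class name.
(def class-sym 'java.util.BitSet)
(def resolved-class (resolve class-sym))     ; the class java.util.BitSet

;; '/' on its own names the division function.
(def quotient (/ 22 7))                      ; the ratio 22/7
```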

2.1.2 Literals

Strings - Enclosed in "double quotes". May span multiple lines. Standard Java escape characters are supported.
Numbers - As per Java, plus indefinitely long integers are supported, as well as ratios, e.g. 22/7. Floating point numbers with an M suffix are read as BigDecimals. Integers can be specified in any base supported by Integer.parseInt(), that is any radix from 2 to 36; for example 2r101010, 8r52, 36r16, and 42 are all the same Integer.
Characters - Preceded by a backslash: \c. \newline, \space and \tab yield the corresponding characters.
nil - Means 'nothing/no-value'; represents Java null and tests logical false.
Booleans - true and false.
Keywords - Keywords are like symbols, except:
  - They can and must begin with a colon, e.g. :fred.
  - They cannot contain '.' or name classes.
  - A keyword that begins with two colons is resolved in the current namespace: in the user namespace, ::rect is read as :user/rect.
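The literal forms above can be tried directly. The following is a sketch with illustrative var names; the namespace that ::rect resolves to depends on where the code is loaded, so no particular namespace is assumed here.

```clojure
;; A string literal may span several lines and use Java escapes.
(def multi-line "first line
second line")
(def escaped "tab:\there")

;; The same integer written in radix 2, 8, 36 and 10.
(def radix-forms [2r101010 8r52 36r16 42])

(def ratio   22/7)                             ; a ratio, not a rounded double
(def big-dec 1.5M)                             ; read as a BigDecimal
(def big-int 123456789012345678901234567890)   ; beyond the range of a long

(def newline-char \newline)                    ; named character literal
(def nothing nil)                              ; tests logical false

(def kw ::rect)                                ; resolved in the current namespace
```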

2.1.3 Lists

Lists are zero or more forms enclosed in parentheses: (a b c)

2.1.4 Vectors

Vectors are zero or more forms enclosed in square brackets: [1 2 3]

2.1.5 Maps

Maps are zero or more key/value pairs enclosed in braces: {:a 1 :b 2}. Commas are considered whitespace, and can be used to organize the pairs: {:a 1, :b 2}. Keys and values can be any forms.

2.1.6 Sets

Sets are zero or more forms enclosed in braces preceded by #: #{:a :b :c}
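The four collection forms can be compared side by side. This is a brief sketch with illustrative var names; the list is quoted so that it is kept as data rather than evaluated as a call.

```clojure
(def a-list   '(a b c))        ; parentheses: a list
(def a-vector [1 2 3])         ; square brackets: a vector
(def a-map    {:a 1 :b 2})     ; braces: a map of key/value pairs
(def same-map {:a 1, :b 2})    ; commas are whitespace, so this is the same map
(def a-set    #{:a :b :c})     ; braces preceded by #: a set
```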

2.2 Macro characters

The behavior of the reader is driven by a combination of built-in constructs and an extension system called the read table. Entries in the read table provide mappings from certain characters, called macro characters, to specific reading behavior, called reader macros. Unless indicated otherwise, macro characters cannot be used in user symbols.

2.2.1 Quote (')

'form => (quote form)

2.2.2 Character (\)

As per above, yields a character literal.

2.2.3 Comment (;)

Single-line comment, causes the reader to ignore everything from the semicolon to the end-of-line.

2.2.4 Meta (^)

^form => (meta form)

2.2.5 Deref (@)

@form => (deref form)
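The Quote and Deref expansions can be observed by handing the text to the reader with read-string and inspecting the resulting forms. This is a sketch with illustrative names; the assumption is a recent Clojure release, where the symbol produced for @ is namespace-qualified.

```clojure
;; 'foo is read as the two-element list (quote foo).
(def quoted-form (read-string "'foo"))

;; @counter is read as a two-element list: the deref symbol
;; (namespace-qualified in recent releases) followed by counter.
(def deref-form (read-string "@counter"))

;; At run time the two spellings of deref are interchangeable.
(def counter (atom 41))
(def via-macro    @counter)
(def via-function (deref counter))
```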

2.2.6 Dispatch (#)

The dispatch macro causes the reader to use a reader macro from another table, indexed by the character following #:

#{} - see Sets above.
Regex patterns (#"pattern") - A regex pattern is read and compiled at read time. The resulting object is of type java.util.regex.Pattern.
Metadata (#^) - Symbols, Lists, Vectors, Sets and Maps can have metadata, which is a map associated with the object. The metadata reader macro first reads the metadata and attaches it to the next form read: #^{:a 1 :b 2} [1 2 3] yields the vector [1 2 3] with a metadata map of {:a 1 :b 2}. A shorthand version allows the metadata to be a simple symbol or keyword, in which case it is treated as a single entry map with a key of :tag and a value of the symbol provided, e.g. #^String x is the same as #^{:tag String} x. Such tags can be used to convey type information to the compiler.
Var-quote (#') - #'x => (var x)
Anonymous function literal (#()) - #(...) => (fn [args] (...)), where args are determined by the presence of argument literals taking the form %, %n or %&.
Ignore next form (#_) - The form following #_ is completely skipped by the reader. (This is a more complete removal than the comment macro, which yields nil.)
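A few of the dispatch forms in action, as a hedged sketch with illustrative var names. The metadata examples go through read-string and use the #^ spelling described in this reference; it is assumed that the Clojure release in use still accepts that spelling.

```clojure
;; Regex: compiled at read time into a java.util.regex.Pattern.
(def pattern #"\d+")
(def digits (re-find pattern "abc 123"))   ; "123"

;; Var-quote: #'x is read as the list (var x).
(def var-form (read-string "#'x"))

;; Anonymous function literals with %, %n and %& arguments.
(def add-one  #(+ % 1))
(def subtract #(- %1 %2))
(def sum-all  #(apply + %&))

;; Ignore next form: the 2 never reaches the vector.
(def skipped [1 #_2 3])

;; Metadata map attached to the next form read.
(def tagged-vector (read-string "#^{:a 1 :b 2} [1 2 3]"))

;; Shorthand: a bare symbol becomes the value of :tag.
(def tagged-symbol (read-string "#^String x"))
```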

2.2.7 Syntax-quote (`, note, the "backquote" character), Unquote (~) and Unquote-splicing (~@)

For all forms other than Symbols, Lists, Vectors, Sets and Maps, `x is the same as 'x.

For Symbols, syntax-quote resolves the symbol in the current context, yielding a fully-qualified symbol (i.e. namespace/name or fully.qualified.Classname). If a symbol is non-namespace-qualified and ends with '#', it is resolved to a generated symbol with the same name to which '_' and a unique id have been appended. For example, x# will resolve to x_123. All references to that symbol within a syntax-quoted expression resolve to the same generated symbol.
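The generated-symbol rule can be checked by taking a syntax-quoted form apart. This is a sketch with illustrative var names; the exact text of the generated name is an implementation detail and may differ from the x_123 shape shown above, so the code only checks that both occurrences of x# are the same unqualified symbol.

```clojure
;; Both occurrences of x# sit inside one syntax-quoted expression.
(def template `(let [x# 1] (+ x# 1)))

;; Pull the generated symbol out of the binding vector and out of the body.
(def binding-sym (first (second template)))
(def body-sym    (second (nth template 2)))

;; The template is ordinary data and can be evaluated like any other form.
(def value (eval template))
```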

For Lists/Vectors/Sets/Maps, syntax-quote establishes a template of the corresponding data structure. Within the template, unqualified forms behave as if recursively syntax-quoted, but forms can be exempted from such recursive quoting by qualifying them with unquote or unquote-splicing, in which case they will be treated as expressions and be replaced in the template by their value, or sequence of values, respectively.

For example:
user=> (def x 5)
user=> (def lst '(a b c))
user=> `(fred x ~x lst ~@lst 7 8 :nine)
(user/fred user/x 5 user/lst a b c 7 8 :nine)

The read table is currently not accessible to user programs.

(read)
(read stream)
(read stream eof-is-error)
(read stream eof-is-error eof-value)
(read stream eof-is-error eof-value is-recursive)

Reads the next object from stream, which must be an instance of java.io.PushbackReader or a subclass of it. stream defaults to the current value of *in*. eof-is-error defaults to true, in which case encountering the end of file during the read is an error. If eof-is-error is false or nil, then eof-value (which defaults to nil) is returned when EOF is encountered. Finally, is-recursive (which defaults to nil) indicates that this call to read is happening within another call to read.
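One way to use the eof-is-error and eof-value arguments is to read until a sentinel comes back. In the sketch below, read-all is a hypothetical helper and not part of Clojure; a fresh Object serves as eof-value so that a literal nil in the input is not mistaken for the end of the stream.

```clojure
(import '(java.io PushbackReader StringReader))

(defn read-all
  "Hypothetical helper: returns a vector of every form found in the string s."
  [s]
  (let [stream (PushbackReader. (StringReader. s))
        eof    (Object.)]                       ; unique end-of-file marker
    (loop [forms []]
      ;; eof-is-error is false, so read returns eof instead of throwing.
      (let [form (read stream false eof)]
        (if (identical? form eof)
          forms
          (recur (conj forms form)))))))

(def forms (read-all "(def x 5) [1 2] nil :end"))
```

With the default eof-is-error of true, reading from an exhausted stream throws an exception instead of returning a value.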


MARUI Atsushi
2013-01-12