Clojure is a homoiconic language, which is a fancy term describing the fact that Clojure programs are represented by Clojure data structures. This is a very important difference between Clojure (and Common Lisp) and most other programming languages--Clojure is defined in terms of the evaluation of data structures and not in terms of the syntax of character streams/files. It is quite common, and easy, for Clojure programs to manipulate, transform and produce other Clojure programs.
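As a small illustration of programs being data, the following sketch treats a program as an ordinary list, inspects it, rewrites it, and evaluates both versions. The names program and rewritten are chosen here for illustration only.

```clojure
;; A program is just a list: the symbol + followed by two numbers.
(def program '(+ 1 2))

;; Ordinary sequence functions work on it.
(def operator (first program))            ; the symbol +

;; Produce a new program by swapping the operator.
(def rewritten (cons '* (rest program)))  ; (* 1 2)

(eval program)    ; => 3
(eval rewritten)  ; => 2
```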
That said, most Clojure programs begin life as text files, and it is the task of the reader to parse the text and produce the data structure the compiler will see. This is not merely a phase of the compiler. The reader, and the Clojure data representations, have utility on their own in many of the same contexts one might use XML or JSON etc.
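A minimal sketch of using the reader as a data format, much as one might use JSON: a value is printed to text with pr-str and read back with read-string. The value shape is made up for the example, and this assumes the text comes from a trusted source.

```clojure
;; Hypothetical data to exchange.
(def shape {:type :rect, :size [3 4], :tags #{:a :b}})

;; Print the value as text the reader can parse.
(def text (pr-str shape))

;; Read the text back into a data structure.
(def restored (read-string text))

(= shape restored)  ; => true
```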
One might say the reader has syntax defined in terms of characters, and the Clojure language has syntax defined in terms of symbols, lists, vectors, maps etc. The reader is represented by the function read, which reads the next form (not character) from a stream, and returns the object represented by that form.
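The following sketch shows read consuming whole forms rather than characters. It assumes a java.io.PushbackReader wrapped around a StringReader; the name two-forms is hypothetical.

```clojure
(import '(java.io PushbackReader StringReader))

;; A stream holding two forms: a list and a vector.
(def two-forms (PushbackReader. (StringReader. "(+ 1 2) [a b]")))

;; Each call to read returns one complete form as a data structure.
(def first-form  (read two-forms))   ; the list (+ 1 2)
(def second-form (read two-forms))   ; the vector [a b]

(list? first-form)     ; => true
(vector? second-form)  ; => true
(eval first-form)      ; => 3
```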
Since we have to start somewhere, this reference starts where evaluation starts, with the reader forms. This will inevitably entail talking about data structures whose descriptive details, and interpretation by the compiler, will follow.
Symbols begin with a non-numeric character and can contain alphanumeric characters and *, +, !, -, _, and ? (other characters will be allowed eventually, but not all macro characters have been determined). '/' has special meaning: it can be used once in the middle of a symbol to separate the namespace from the name, e.g. my-namespace/foo. '/' by itself names the division function. '.' has special meaning: it can be used one or more times in the middle of a symbol to designate a fully-qualified class name, e.g. java.util.BitSet, or in namespace names. Symbols beginning or ending with '.' are reserved by Clojure. Symbols containing / or . are said to be 'qualified'. Symbols beginning or ending with ':' are reserved by Clojure. A symbol can contain one or more non-repeating ':'s.
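To make the role of '/' concrete, this sketch reads a few symbols and inspects their parts with the core functions namespace and name. The symbol names themselves are only examples.

```clojure
;; '/' separates the namespace from the name.
(def qualified (read-string "my-namespace/foo"))
(namespace qualified)  ; => "my-namespace"
(name qualified)       ; => "foo"

;; A symbol without '/' has no namespace part.
(def plain (read-string "foo"))
(namespace plain)      ; => nil

;; '.' may appear inside a symbol, as in a class name.
(def class-sym (read-string "java.util.BitSet"))
(symbol? class-sym)    ; => true

;; '/' by itself is read as a symbol too.
(def slash (read-string "/"))
(symbol? slash)        ; => true
```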
Strings - Enclosed in "double quotes". May span multiple lines. Standard Java escape characters are supported.
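A short sketch of the string syntax, assuming the clojure.string namespace is available for splitting lines. The var name two-lines is hypothetical.

```clojure
(require '[clojure.string :as str])

;; A string literal may span multiple lines.
(def two-lines "first line
second line")

(str/split-lines two-lines)  ; => ["first line" "second line"]

;; Java-style escapes: \t is a single tab character.
(count "a\tb")               ; => 3
(= "A" "\u0041")             ; => true
```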
Numbers - as per Java, plus indefinitely long integers are supported, as well as ratios, e.g. 22/7. Floating point numbers with an M suffix are read as BigDecimals. Integers can be specified in any base supported by Integer.parseInt(), that is any radix from 2 to 36; for example 2r101010, 8r52, 36r16, and 42 are all the same Integer.
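The numeric forms above can be checked at the REPL. This sketch uses only core predicates (ratio?, decimal?, integer?); the var name big is hypothetical.

```clojure
;; The same value written in radix 2, 8, 36 and 10.
(= 42 2r101010 8r52 36r16)  ; => true

;; Ratios are read as exact values.
(ratio? 22/7)               ; => true
(* 7 22/7)                  ; => 22

;; The M suffix yields a BigDecimal.
(decimal? 1.5M)             ; => true

;; Integers too large for a machine word are still read as integers.
(def big 123456789012345678901234567890)
(integer? big)              ; => true
```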
Characters - preceded by a backslash: \c. \newline, \space and \tab yield the corresponding characters.
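A brief sketch of character literals, comparing the named characters with the first character of the equivalent one-character strings.

```clojure
;; A backslash followed by a character is a character literal.
(char? \c)                 ; => true

;; Named characters correspond to the usual whitespace characters.
(= \newline (first "\n"))  ; => true
(= \space   (first " "))   ; => true
(= \tab     (first "\t"))  ; => true

(int \space)               ; => 32
```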
nil - Means 'nothing/no-value'; represents Java null and tests logical false.
Booleans - true and false
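A small sketch of how nil and the booleans behave in a conditional; the keywords :yes and :no are arbitrary result markers.

```clojure
;; nil and false are the only values that test as logical false.
(if nil   :yes :no)  ; => :no
(if false :yes :no)  ; => :no
(if true  :yes :no)  ; => :yes

;; Other values, including 0 and the empty string, test as true.
(if 0  :yes :no)     ; => :yes
(if "" :yes :no)     ; => :yes

(nil? nil)           ; => true
```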
Keywords
Keywords are like symbols, except:
They can and must begin with a colon, e.g. :fred.
They cannot contain '.' or name classes.
A keyword that begins with two colons is resolved in the current namespace:
In the user namespace, ::rect is read as :user/rect
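The double-colon rule can be sketched by reading "::rect" while *ns* is bound to the user namespace. The binding is used here only so the example does not depend on where it is run; create-ns returns the existing namespace if there is one.

```clojure
;; A plain keyword has no namespace part.
(keyword? :fred)   ; => true
(namespace :fred)  ; => nil

;; Read ::rect as if the current namespace were user.
(def k (binding [*ns* (create-ns 'user)]
         (read-string "::rect")))

(= :user/rect k)   ; => true
(namespace k)      ; => "user"
(name k)           ; => "rect"
```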
Lists are zero or more forms enclosed in parentheses: (a b c)
Vectors are zero or more forms enclosed in square brackets: [1 2 3]
Maps are zero or more key/value pairs enclosed in braces: {:a 1 :b 2}
Commas are considered whitespace, and can be used to organize the pairs: {:a 1, :b 2}
Keys and values can be any forms.
Sets are zero or more forms enclosed in braces preceded by #: #{:a :b :c}
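The collection forms can be compared side by side by reading each from text and checking the type of the result. This is a sketch using read-string and core predicates only; the var name with-vector-key is hypothetical.

```clojure
(list?   (read-string "(a b c)"))      ; => true
(vector? (read-string "[1 2 3]"))      ; => true
(map?    (read-string "{:a 1 :b 2}"))  ; => true
(set?    (read-string "#{:a :b :c}"))  ; => true

;; Commas are whitespace, so both spellings read as the same map.
(= (read-string "{:a 1 :b 2}")
   (read-string "{:a 1, :b 2}"))       ; => true

;; Keys and values can be any forms, including other collections.
(def with-vector-key (read-string "{[1 2] #{:x}}"))
(get with-vector-key [1 2])            ; => #{:x}
```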
The behavior of the reader is driven by a combination of built-in constructs and an extension system called the read table. Entries in the read table provide mappings from certain characters, called macro characters, to specific reading behavior, called reader macros. Unless indicated otherwise, macro characters cannot be used in user symbols.
Syntax-quote (`, the backquote character): for all forms other than Symbols, Lists, Vectors, Sets and Maps, `x is the same as 'x.
For Symbols, syntax-quote resolves the symbol in the current context, yielding a fully-qualified symbol (i.e. namespace/name or fully.qualified.Classname). If a symbol is non-namespace-qualified and ends with '#', it is resolved to a generated symbol with the same name to which '_' and a unique id have been appended. e.g. x# will resolve to x_123. All references to that symbol within a syntax-quoted expression resolve to the same generated symbol.
For Lists/Vectors/Sets/Maps, syntax-quote establishes a template of the corresponding data structure. Within the template, unqualified forms behave as if recursively syntax-quoted, but forms can be exempted from such recursive quoting by qualifying them with unquote (~) or unquote-splicing (~@), in which case they will be treated as expressions and be replaced in the template by their value, or sequence of values, respectively.
For example:
user=> (def x 5)
user=> (def lst '(a b c))
user=> `(fred x ~x lst ~@lst 7 8 :nine)
(user/fred user/x 5 user/lst a b c 7 8 :nine)
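A further sketch of the same rules, written so the results do not depend on the current namespace. It assumes clojure.core is referred as usual; the names spliced, templ, binding-sym and use-sym are hypothetical.

```clojure
(def x 5)
(def lst '(a b c))

;; ~x is replaced by its value, ~@lst by its elements.
(def spliced `(~x ~@lst :nine))          ; (5 a b c :nine)

;; Non-collection, non-symbol forms: `7 is the same as '7.
(= `7 '7)                                ; => true

;; Symbols are resolved to fully-qualified names.
(= 'clojure.core/first `first)           ; => true
(= 'java.lang.String `String)            ; => true

;; A trailing # yields one generated symbol, shared within the template.
(def templ `(let [tmp# 10] (* tmp# 2)))
(def binding-sym (first (second templ)))   ; symbol in the binding vector
(def use-sym     (second (nth templ 2)))   ; symbol in the body
(= binding-sym use-sym)                  ; => true
(eval templ)                             ; => 20
```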
(read)
(read stream)
(read stream eof-is-error)
(read stream eof-is-error eof-value)
(read stream eof-is-error eof-value is-recursive)
Reads the next object from stream, which must be an instance of java.io.PushbackReader or a subclass of it. stream defaults to the current value of *in*. eof-is-error defaults to true, in which case encountering the end of file during the read is an error. If eof-is-error is false or nil, then eof-value (defaults to nil) is returned when EOF is encountered. Finally, is-recursive (defaults to nil) indicates that this call to read is happening within another call to read.
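One way to use the eof arguments is to read every form from a stream until a sentinel value comes back. This is a sketch, not the only approach; the names input, all-forms and the ::eof sentinel are chosen for the example.

```clojure
(import '(java.io PushbackReader StringReader))

(def input (PushbackReader. (StringReader. "42 :done (a b)")))

;; With eof-is-error false, read returns eof-value at end of stream.
(def all-forms
  (loop [acc []]
    (let [form (read input false ::eof)]
      (if (= form ::eof)
        acc
        (recur (conj acc form))))))

all-forms  ; => [42 :done (a b)]

;; With the default eof-is-error of true, end of stream is an error.
(def eof-threw?
  (try
    (read (PushbackReader. (StringReader. "")))
    false
    (catch Exception _ true)))

eof-threw?  ; => true
```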