-
Notifications
You must be signed in to change notification settings - Fork 17
Datatypes
Static type checking helps catch errors at compile time, producing more solid code which often executes faster because expensive dynamic runtime type checks may be omitted. However, it usually hurts conciseness and expressiveness, because the meaning of code may be obscured by the weight of repeated type declarations. Further, nominal type-checking can be overly restrictive, requiring rigid conformance to type-hierarchies for type compatibility.
Irken's global Type-Inference allows most type-declarations to be omitted. When combined with exhaustive pattern matching, it offers a very concise and safe programming model.
Irken's records are Structurally Typed, using Row Polymorphism, allowing record types to be type compatible without any shared type hierarchy. Algebraic datatypes offer both nominal and structural typing.
Irken has parametric typing, allowing types in functions and type specifications to be expressed as a type-variable. With type inference, type parameters are inferred automatically, though they may also be specified explicitly. Parametrics allow us to write one data-structure, such as (list 'a)
, and safely use it over any datatype 'a
; they assure the type (list int)
is not confused with the type (list string)
; and they allow parametric polymorphism, where we can define functions which operate over parametric datatypes.
Here we define a function which takes a function and a value, and calls the function with the value, returning the result. This works for any
(include "lib/basis.scm")
(define (fun a b) (a b)) ;; function of type ( ('a -> 'b) 'c -> 'b )
(fun (lambda (x) "a") 1) ;; -> "a" , fun is ( (int -> string) int -> string)
(fun (lambda (x) x) 1) ;; -> 1 , fun is ( (int -> int ) int -> int)
Irken implements algebraic data types, also known as discriminated unions, or variants, using the datatype
declaration. The following declaration defines Irken's list
type. list
is not built into the compiler, but rather declared as part of the prologue in the file "lib/pair.scm".
(datatype list
(:nil)
(:cons 'a (list 'a))
)
In English, this means that the datatype list
consists of two named variants. The first, :nil
, contains no additional data, and corresponds to an empty list. The second, :cons
, contains two objects. The first element is of type 'a
, and the second is of type (list 'a)
, making this a recursively defined datatype. In other words, a cons object has a CAR
of any type, and a CDR
of a list of that same type. The type 'a
is a type variable, which means it can take on any type.
When you match
against a datatype, the type checker will force you to deal with each variant. This is the source of Irken's type safety: it's impossible to accidentally not handle a nil
case.
Another consequence of static type checking is that in Irken, all lists are monomorphic, i.e. contain elements of all the same type. Lists such as '(1 2 3)
and '("a" "b" "c")
are valid, while the expression '(1 "two" three)
will not pass the type checker. While this may sound restrictive, variants allow you to declare a union datatype
of these three types and make a list of those objects.
A constructor is an automatically generated function that creates an object based on one of the branches/variants of a datatype declaration. The name of a constructor function is formed from the names of the datatype and variant, delimited with a colon: :.
For example, here we create some lists using variant constructors, and a construction macro
(list:nil) ;; -> () empty list
(list:cons 1 (list:nil)) ;; -> (1) list of one element
(LIST 1 2 3 4) ;; -> (1 2 3 4) list constructed with a macro
In the ML family of languages, variant constructors are usually globally unique, so the datatype prefix is unnecessary. A single global namespace for constructors means you can't use the same name more than once. This leads to the use of Hungarian notation (e.g, TXNil
); or constructors that contain their typenames (e.g, Color_Blue
). Irken instead requires the datatype name to disambiguate static variants, Color:Blue
.
In Irken, you can also use variants ad-hoc, without making any type declarations. These are called polymorphic variants, and help make irken feel more like a dynamic language. Irken performs static type-checking to assure that you pack and unpack them properly, and that all occurrences of a specific variant along a code path have the same structure and type. To construct a polymorphic variant, we simply leave off the datatype name:
(define v0 (:thingy 12))
(define v1 (:pair 3 #t))
Next let's define a function that operates over dynamic variants, using pattern matching:
(define fun
(:thingy n) -> n
(:pair n _) -> n
)
(printn v0) ;; -> (:pair 3 #t)
(printn (fun v0)) ;; -> 12
(printn (fun v1)) ;; -> 3
(printn (fun (:thingy "hello"))) ;; -> "hello"
(map fun (LIST (:thingy "a") (:pair "b" "c"))) ;; -> ("a" "b")
The function fun
is parametric, and it's type is written (|thingy='a pair=('a,'b)| -> 'a)
This means fun
will:
- accept a variant called
thingy
containing one element (of type 'a); or - accept a variant named
pair
containing two elements (where the first one is the same 'a type held by thingy) - return a value of type
'a
, the same type as the first element of the variants
Polymorphic variants are very handy; their ease of use, without declaration, anywhere, feels like a dynamically typed language like Python, yet they come with strong compile-time type-safety.
OCaml also supports polymorphic variants, they are introduced with a backquote in OCaml's syntax.
Irken supports records based on Row Polymorphism, a form of compile-time Structural Typing which solves the loss of information problem.
Before we explain what that means, let's cover the basics of Structural Typing. This means that a record is type-compatible with a record type if it contains a matching set of fields, no subtyping relationship is required. These fields can by any datatype, including functions and other records.
Let's start by creating a record. While you can declare a type-signature for record types, there is no need to. You can simply create and use them ad-hoc, and they are still statically type-checked.
(define my-record {name="Fred" age=28 alive=#t})
The brackets {
and }
surround the record literal. Each slot in the record is named with a label. When you run this file through the compiler, you'll see the type assigned to my-record:
my-record: {name=string age=int alive=bool}
As far as the compiler is concerned, the ordering of these fields is arbitrary. Here's a function that will do something with that record:
(define (print-name rec)
(printf rec.name))
Below we show the type of print-name
as discovered by the compiler. This function is polymorphic on the field name
, meaning it accepts any record with the field name. The ellipsis (...
) means it is an open-record-type, meaning it will match records which also contain additional fields.
print-name: {name=string, ...}
Rows are a very powerful feature, and one that you won't find in Standard ML or Haskell. Of the mainstream ML's, only OCaml supports them, and uses them as the base for their OO/class features.
To read more about records, see Row Polymorphism.
Next: Pattern Matching