-
Notifications
You must be signed in to change notification settings - Fork 27
/
Tutorial_04_Polymorphism.lhs
488 lines (384 loc) · 14.5 KB
/
Tutorial_04_Polymorphism.lhs
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
Polymorphism {#polymorphism}
============
\begin{comment}
\begin{code}
{-@ LIQUID "--short-names" @-}
{-@ LIQUID "--no-termination" @-}
{-@ LIQUID "--scrape-used-imports" @-}
module Tutorial_04_Polymorphism
( safeLookup
, unsafeLookup
, vectorSum, vectorSum'
, absoluteSum, absoluteSum'
, dotProduct
, sparseProduct, sparseProduct'
, eeks
, head, head', head''
) where
import Prelude hiding (head, abs, length)
import Data.List (foldl')
import Data.Vector hiding (head, foldl')
absoluteSum' :: Vector Int -> Int
dotProduct :: Vector Int -> Vector Int -> Int
absoluteSum :: Vector Int -> Int
sparseProduct, sparseProduct' :: Vector Int -> [(Int, Int)] -> Int
-- {-@ fail eeks @-}
-- {-@ fail head @-}
-- {-@ fail unsafeLookup @-}
-- {-@ fail dotProduct @-}
\end{code}
\end{comment}
Refinement types shine when we want to establish
properties of *polymorphic* datatypes and higher-order
functions. Rather than be abstract, let's illustrate this
with a [classic][dmlarray] use-case.
\newthought{Array Bounds Verification} aims to ensure
that the indices used to retrieve values from an array are indeed
*valid* for the array, i.e. are between `0` and the
*size* of the array. For example, suppose we create
an `array` with two elements:
~~~~~{.spec}
twoLangs = fromList ["haskell", "javascript"]
~~~~~
Lets attempt to look it up at various indices:
\begin{code}
eeks = [ok, yup, nono]
where
ok = twoLangs ! 0
yup = twoLangs ! 1
nono = twoLangs ! 3
\end{code}
If we try to *run* the above, we get a nasty shock: an
exception that says we're trying to look up `twoLangs`
at index `3` whereas the size of `twoLangs` is just `2`.
~~~~~{.sh}
Prelude> :l 03-poly.lhs
[1 of 1] Compiling VectorBounds ( 03-poly.lhs, interpreted )
Ok, modules loaded: VectorBounds.
*VectorBounds> eeks
Loading package ... done.
"*** Exception: ./Data/Vector/Generic.hs:249 ((!)): index out of bounds (3,2)
~~~~~
\newthought{In a suitable Editor} e.g. Vim or Emacs,
or if you push the "play" button in the online demo,
you will literally see the error *without*
running the code. Lets see how LiquidHaskell
checks `ok` and `yup` but flags `nono`, and along
the way, learn how it reasons about *recursion*,
*higher-order functions*, *data types* and *polymorphism*.
Specification: Vector Bounds {#vectorbounds}
--------------------------------------------
First, let's see how to *specify* array bounds safety by *refining*
the types for the [key functions][vecspec] exported by `Data.Vector`,
i.e. how to
1. *define* the size of a `Vector`
2. *compute* the size of a `Vector`
3. *restrict* the indices to those that are valid for a given size.
<div class="toolinfo">
\newthought{Imports}
We can write specifications for imported modules -- for which we
*lack* the code -- either directly in the client's source file or
better, in `.spec` files which can be reused across multiple client
modules.
\newthought{Include} directories can be specified when checking a file.
Suppose we want to check some file `target.hs` that imports an external
dependency `Data.Vector`. We can write specifications for `Data.Vector`
inside `include/Data/Vector.spec` which contains:
~~~~~{.spec}
-- | Define the size
measure vlen :: Vector a -> Int
-- | Compute the size
assume length :: x:Vector a -> {v:Int | v = vlen x}
-- | Lookup at an index
assume (!) :: x:Vector a -> {v:Nat | v < vlen x} -> a
~~~~~
</div>
Using this new specification is now a simple matter of telling LiquidHaskell
to include this file:
~~~~~{.sh}
$ liquid -i include/ target.hs
~~~~~
LiquidHaskell ships with specifications for `Prelude`, `Data.List`,
and `Data.Vector` which it includes by default.
\newthought{Measures} are used to define *properties* of
Haskell data values that are useful for specification and
verification. Think of `vlen` as the *actual*
size of a `Vector` regardless of how the size was computed.
\newthought{Assumes} are used to *specify* types describing the semantics of
functions that we cannot verify e.g. because we don't have the code
for them. Here, we are assuming that the library function `Data.Vector.length`
indeed computes the size of the input vector. Furthermore, we are stipulating
that the lookup function `(!)` requires an index that is between `0` and the real
size of the input vector `x`.
\newthought{Dependent Refinements} are used to describe relationships
*between* the elements of a specification. For example, notice how the
signature for `length` names the input with the binder `x` that then
appears in the output type to constrain the output `Int`. Similarly,
the signature for `(!)` names the input vector `x` so that the index
can be constrained to be valid for `x`. Thus, dependency lets us
write properties that connect *multiple* program values.
\newthought{Aliases} are extremely useful for defining
*abbreviations* for commonly occurring types. Just as we
enjoy abstractions when programming, we will find it
handy to have abstractions in the specification mechanism.
To this end, LiquidHaskell supports *type aliases*.
For example, we can define `Vector`s of a given size `N` as:
\begin{code}
{-@ type VectorN a N = {v:Vector a | vlen v == N} @-}
\end{code}
\noindent and now use this to type `twoLangs` above as:
\begin{code}
{-@ twoLangs :: VectorN String 2 @-}
twoLangs = fromList ["haskell", "javascript"]
\end{code}
Similarly, we can define an alias for `Int` values
between `Lo` and `Hi`:
\begin{code}
{-@ type Btwn Lo Hi = {v:Int | Lo <= v && v < Hi} @-}
\end{code}
\noindent after which we can specify `(!)` as:
~~~~~{.spec}
(!) :: x:Vector a -> Btwn 0 (vlen x) -> a
~~~~~
Verification: Vector Lookup
---------------------------
Let's try to write some functions to sanity check the specifications.
First, find the starting element -- or `head` of a `Vector`
\begin{code}
head :: Vector a -> a
head vec = vec ! 0
\end{code}
When we check the above, we get an error:
~~~~~{.liquiderror}
src/03-poly.lhs:127:23: Error: Liquid Type Mismatch
Inferred type
VV : Int | VV == ?a && VV == 0
not a subtype of Required type
VV : Int | VV >= 0 && VV < vlen vec
In Context
VV : Int | VV == ?a && VV == 0
vec : Vector a | 0 <= vlen vec
?a : Int | ?a == (0 : int)
~~~~~
\noindent LiquidHaskell is saying that `0` is *not* a valid index
as it is not between `0` and `vlen vec`. Say what? Well, what if
`vec` had *no* elements! A formal verifier doesn't
make *off by one* errors.
\newthought{To Fix} the problem we can do one of two things.
1. *Require* that the input `vec` be non-empty, or
2. *Return* an output if `vec` is non-empty, or
Here's an implementation of the first approach, where we define
and use an alias `NEVector` for non-empty `Vector`s
\begin{code}
{-@ type NEVector a = {v:Vector a | 0 < vlen v} @-}
{-@ head' :: NEVector a -> a @-}
head' vec = vec ! 0
\end{code}
<div class="hwex" id="Vector Head">
Replace the `undefined` with an *implementation* of `head''`
which accepts *all* `Vector`s but returns a value only when
the input `vec` is not empty.
</div>
\begin{code}
head'' :: Vector a -> Maybe a
head'' vec = undefined
\end{code}
<div class="hwex" id="Unsafe Lookup"> The function `unsafeLookup` is
a wrapper around the `(!)` with the arguments flipped. Modify the
specification for `unsafeLookup` so that the *implementation* is
accepted by LiquidHaskell.
</div>
\begin{code}
{-@ unsafeLookup :: Int -> Vector a -> a @-}
unsafeLookup index vec = vec ! index
\end{code}
<div class="hwex" id="Safe Lookup">
Complete the implementation of `safeLookup` by filling
in the implementation of `ok` so that it performs a bounds
check before the access.
</div>
\begin{code}
{-@ safeLookup :: Vector a -> Int -> Maybe a @-}
safeLookup x i
| ok = Just (x ! i)
| otherwise = Nothing
where
ok = undefined
\end{code}
Inference: Our First Recursive Function
---------------------------------------
Ok, let's write some code! Let's start with a recursive
function that adds up the values of the elements of an
`Int` vector.
\begin{code}
-- >>> vectorSum (fromList [1, -2, 3])
-- 2
vectorSum :: Vector Int -> Int
vectorSum vec = go 0 0
where
go acc i
| i < sz = go (acc + (vec ! i)) (i + 1)
| otherwise = acc
sz = length vec
\end{code}
<div class="hwex" id="Guards">
What happens if you *replace* the guard with `i <= sz`?
</div>
<div class="hwex" id="Absolute Sum">
Write a variant of the above function that computes the
`absoluteSum` of the elements of the vector.
</div>
\begin{code}
-- >>> absoluteSum (fromList [1, -2, 3])
-- 6
{-@ absoluteSum :: Vector Int -> Nat @-}
absoluteSum = undefined
\end{code}
\newthought{Inference}
LiquidHaskell verifies `vectorSum` -- or, to be precise,
the safety of the vector accesses `vec ! i`.
The verification works out because LiquidHaskell is able to
*automatically infer* ^[In your editor, click on `go` to see the inferred type.]
~~~~~{.spec}
go :: Int -> {v:Int | 0 <= v && v <= sz} -> Int
~~~~~
\noindent which states that the second parameter `i` is
between `0` and the length of `vec` (inclusive). LiquidHaskell
uses this and the test that `i < sz` to establish that `i` is
between `0` and `(vlen vec)` to prove safety.
**Note** you need to run `liquid` with the option `--no-termination`
or make sure your source file has `{-@ LIQUID "--no-termination" @-},
otherwise the code for `go` fails the now default termination check.
We will come back to this example later to see how to verify termination
using metrics.
<div class="hwex" id="Off by one?">
Why does the type of `go` have `v <= sz` and not `v < sz` ?
</div>
Higher-Order Functions: Bottling Recursion in a `loop`
------------------------------------------------------
Let's refactor the above low-level recursive function
into a generic higher-order `loop`.
\begin{code}
loop :: Int -> Int -> a -> (Int -> a -> a) -> a
loop lo hi base f = go base lo
where
go acc i
| i < hi = go (f i acc) (i + 1)
| otherwise = acc
\end{code}
We can now use `loop` to implement `vectorSum`:
\begin{code}
vectorSum' :: Vector Int -> Int
vectorSum' vec = loop 0 n 0 body
where
body i acc = acc + (vec ! i)
n = length vec
\end{code}
\newthought{Inference} is a convenient option. LiquidHaskell finds:
~~~~~{.spec}
loop :: lo:Nat -> hi:{Nat|lo <= hi} -> a -> (Btwn lo hi -> a -> a) -> a
~~~~~
\noindent In English, the above type states that
- `lo` the loop *lower* bound is a non-negative integer
- `hi` the loop *upper* bound is a greater then or equal to `lo`,
- `f` the loop *body* is only called with integers between `lo` and `hi`.
\noindent
It can be tedious to have to keep typing things like the above.
If we wanted to make `loop` a public or exported function, we
could use the inferred type to generate an explicit signature.
At the call `loop 0 n 0 body` the parameters `lo` and `hi` are
instantiated with `0` and `n` respectively, which, by the way
is where the inference engine deduces non-negativity.
Thus LiquidHaskell concludes that `body` is only called with
values of `i` that are *between* `0` and `(vlen vec)`, which
verifies the safety of the call `vec ! i`.
<div class="hwex" id="Using Higher-Order Loops">
Complete the implementation of `absoluteSum'` below.
When you are done, what is the type that is inferred for `body`?
</div>
\begin{code}
-- >>> absoluteSum' (fromList [1, -2, 3])
-- 6
{-@ absoluteSum' :: Vector Int -> Nat @-}
absoluteSum' vec = loop 0 n 0 body
where
body i acc = undefined
n = length vec
\end{code}
<div class="hwex" id="Dot Product">
The following uses `loop` to compute
`dotProduct`s. Why does LiquidHaskell flag an error?
Fix the code or specification so that LiquidHaskell
accepts it. </div>
\begin{code}
-- >>> dotProduct (fromList [1,2,3]) (fromList [4,5,6])
-- 32
{-@ dotProduct :: x:Vector Int -> y:Vector Int -> Int @-}
dotProduct x y = loop 0 sz 0 body
where
body i acc = acc + (x ! i) * (y ! i)
sz = length x
\end{code}
Refinements and Polymorphism {#sparsetype}
----------------------------------------
While the standard `Vector` is great for *dense* arrays,
often we have to manipulate *sparse* vectors where most
elements are just `0`. We might represent such vectors
as a list of index-value tuples:
\begin{code}
{-@ type SparseN a N = [(Btwn 0 N, a)] @-}
\end{code}
\noindent Implicitly, all indices *other* than those in the list
have the value `0` (or the equivalent value for the type `a`).
\newthought{The Alias} `SparseN` is just a
shorthand for the (longer) type on the right, it does not
*define* a new type. If you are familiar with the *index-style*
length encoding e.g. as found in [DML][dml] or [Agda][agdavec],
then note that despite appearances, our `Sparse` definition
is *not* indexed.
\newthought{Sparse Products}
Let's write a function to compute a sparse product
\begin{code}
{-@ sparseProduct :: x:Vector _ -> SparseN _ (vlen x) -> _ @-}
sparseProduct x y = go 0 y
where
go n [] = n
go n ((i,v):y') = go (n + (x!i) * v) y'
\end{code}
LiquidHaskell verifies the above by using the specification
to conclude that for each tuple `(i, v)` in the list `y`, the
value of `i` is within the bounds of the vector `x`, thereby
proving `x ! i` safe.
\newthought{Folds}
The sharp reader will have undoubtedly noticed that the sparse product
can be more cleanly expressed as a [fold][foldl]:
~~~~~{.spec}
foldl' :: (a -> b -> a) -> a -> [b] -> a
~~~~~
\noindent We can simply fold over the sparse vector, accumulating the `sum`
as we go along
\begin{code}
{-@ sparseProduct' :: x:Vector _ -> SparseN _ (vlen x) -> _ @-}
sparseProduct' x y = foldl' body 0 y
where
body sum (i, v) = sum + (x ! i) * v
\end{code}
\noindent
LiquidHaskell digests this without difficulty.
The main trick is in how the polymorphism of
`foldl'` is instantiated.
1. GHC infers that at this site, the type variable `b` from the
signature of `foldl'` is instantiated to the Haskell type `(Int, a)`.
2. Correspondingly, LiquidHaskell infers that in fact `b`
can be instantiated to the *refined* `(Btwn 0 (vlen x), a)`.
Thus, the inference mechanism saves us a fair bit of typing and
allows us to reuse existing polymorphic functions over containers
and such without ceremony.
Recap
-----
This chapter gave you an idea of how one can use refinements
to verify size related properties, and more generally, to
specify and verify properties of recursive and polymorphic
functions. Next, let's see how we can use LiquidHaskell to
prevent the creation of illegal values by refining data
type definitions.