data Ticker = GOOG | MSFT | APPL | CANO | NOOB
derive Eq Ticker
List and Path
Lists are the predominant data structure in Frege and it will come at no surprise that there are lots of clever ways of dealing with them.
We will start with a data model that is a dramatically simplified version of what you may find in real life. From there we will explore various ways of querying the data.
The domain model
Let’s assume that a bank has many clients that hold many investment portfolios that in turn are made of positions where each position tells how many items of a financial instrument (let’s assume stock shares for a given ticker) are in that portfolio.
The tickers are easily modeled with an enumeration
The rest of the domain is just normal data structures in record syntax.
data Bank = Bank { clients :: [Client] }
data Client = Client { portfolios :: [Portfolio] }
data Portfolio = Portfolio { positions :: [Position] }
data Position = Position { soMany :: Int, ticker :: Ticker }
Now let’s create an example bank that has 1000 clients, each having 3 portfolios with various positions.
bank = Bank { clients = replicate 1000
Client { portfolios = replicate 3
Portfolio { positions = [
Position { soMany = 1, ticker = APPL },
Position { soMany = 2, ticker = MSFT },
Position { soMany = 8, ticker = CANO }
]}
}
}
If you haven’t seen replicate
before: it takes a size and an element and produces a list of
elements of that size. The type is
replicate :: Int → a → [a]
Note
|
Short and friendly to incremental development
Defining the domain has been very short. Using positional parameters instead of record syntax would
have been even shorter. But so it is easier to extend the data structures later.Even the creation of many sample values was quick and it is easy to play around with various sizes. |
Finally, we need some pricing information for our stock tickers for later valuation. We use a simple association list.
prices = [ (GOOG, 100) , (MSFT, 200) , (APPL, 300) , (CANO, 400) ]
Note that there is no NOOB
in the list of prices. We can use that later when testing the
lookup of prices for tickers that are not in the list.
Let’s do that right away with invariants that tell how to value a position: by multiplying the price of the ticker by how many we have of those.
import Test.QuickCheck
zeroValue = once $ 0 == value (Position 5 NOOB)
multipleValue = property $ \n -> 100 * n == value (Position n GOOG)
Note
|
Writing the test made us think about the NOOB case: what should the value of a position be
if no price for the ticker can be found? We decided for 0 , which does the trick for our
application-level task at hand. A more widely used library function should rather
signal the error, maybe by returning a Maybe .
|
When we go in this test-first style, the compiler will complain that there is no
value
function. Let’s give it one, using the lookup
function from Data.List
to find the price for the ticker.
import Data.List (lookup)
value :: Position -> Int
value position = calculate $ lookup position.ticker prices where
calculate Nothing = 0
calculate (Just price) = position.soMany * price
The lookup
function returns a Maybe
since the lookup may fail and we draw
some attention to this possibility by defining the local calculate
function with case discrimination on whether the Maybe is Nothing
or
Just
the price. (The less dramatic alternative would have been to
to use the maybe
function like maybe 0 …
)
How big is the bank?
Banks are often compared by their assets under management, i.e. by the value of all positions in all portfolios of all their clients.
Hey, we have the data to calculate that! And here are the functions that we need to get to that data along with their types.
Number | Function | Type |
---|---|---|
<1> |
|
|
<2> |
|
|
<3> |
|
|
<4> |
|
|
Hm, something interesting becomes apparent here: each function returns a list of elements that the next function needs as single elements. So maybe we can generalize over this pattern and bind the functions like we did in "Easy IO".
So binding <1> and <2> would be
<1> <2> return type [Client] -> (Client -> [Portfolio]) -> [Portfolio]
Binding <2> and <3> would be
<2> <3> return type [Portfolio] -> (Portfolio -> [Position]) -> [Position]
As you see, there is a general pattern behind it such that bind has the type:
[a] → (a → [b]) → [b]
You will be glad to hear that this bind function is already available and just like in the
case of "Easy IO", it is denoted with the >>=
operator.
So combining <1> and <2> becomes
bank.clients >>= Client.portfolios
Combining <2> and <3> becomes
Client.portfolios >>= Portfolio.positions
Combining (<1> and <2>) and <3> becomes
bank.clients >>= Client.portfolios >>= Portfolio.positions
Important
|
Tadaaaa!
We have arrived at a simple "path" expression for all positions of all portfolios for all the bank’s clients!
|
To finally drive the point home, here is the first version of calculating the assets under management by using the bind, mapping positions to their values, and summing those up.
assetsUnderManagement1 = sum $
map value $
bank.clients >>= Client.portfolios >>= Portfolio.positions
The "do" notation and comprehension
Another lesson from "Easy IO" was that bind allows us to use the "do" notation, which leads to the following code.
assetsUnderManagement2 = sum $
map value do
client <- bank.clients
portfolio <- client.portfolios
portfolio.positions
The single intermediate values must now be drawn from the list by means of the ←
arrow.
But wait! This looks and sounds utterly familiar and even has the same meaning as in list comprehensions!
assetsUnderManagement3 = sum
[value position |
client <- bank.clients,
portfolio <- client.portfolios,
position <- portfolio.positions
]
And in fact, both notations are equivalent and differ only in style.
Path queries - almost SQL
Suppose we are not interested in all assets but only in the total value of all Canoo shares in our bank. With a list comprehension, this is simple to do and yields another interesting analogy to SQL queries.
allCanoo3 = sum
[value position | -- SELECT
client <- bank.clients, -- FROM
portfolio <- client.portfolios,
position <- portfolio.positions,
position.ticker == CANO -- WHERE
]
The value
function is like a SQL projection, position
is
a selection, the lists give the data source, and the guards
make the where-clauses.
We said that "do" notation is equivalent. Here is how it looks with filters as where-clauses:
allCanoo2 = sum $
map value do
client <- bank.clients
portfolio <- client.portfolios
filter canoo portfolio.positions
where
canoo position = position.ticker == CANO
One can see the subtle differences in style.
Finally, the path version with filtering.
allCanoo1 = sum $
map value $
bank.clients >>= Client.portfolios >>= filter canoo . Portfolio.positions where
canoo position = position.ticker == CANO
Such a filter can be placed at any step in the path and besides filtering, one can just as well apply mapping inside the path evaluation.
It all falls in place
We started with an everyday business scenario and discovered some profound properties of lists
-
they make nice path expressions
-
they can be used with the "do" notation
-
comprehensions are not so special
-
we can query a graph of references analogous to SQL
Overall, comprehensions seem to be the most versatile notation, especially when filtering and projection is needed anyway. For mere aggregation, path notation is just fine.
Path expressions in other languages can also be rather succinct. Our running example
would for example be the Groovy GPath
bank.clients*.portfolios*.positions.findAll{it.ticker == CANO}*.value().sum()
However, one cannot compare the visual appearance of the code only.