tesser.core

The essential folds: map, mapcat, take, filter, some, any?, into, etc, plus common fold combinators.

“Now we will tesser, we will wrinkle again. Do you understand?” “No,” Meg said flatly. Mrs. Whatsit sighed. “Explanations are not easy when they are about things for which your civilization still has no words. Calvin talked about traveling at the speed of light. You understand that, little Meg?” “Yes,” Meg nodded. “That, of course, is the impractical, long way around. We have learned to take short cuts wherever possible.” “Sort of like in math?” Meg asked. “Like in math.”

– Madeline L’Engle, A Wrinkle In Time.

Tesser structures partly-concurrent folds.

(tesser some-collections a-fold) uses a-fold to combine some information from every element in some-collections: a collection of collections of inputs. Like reducers and transducers, it takes fold as the ultimate collection operation. Unlike transducers, Tesser folds are not sequential. They include an explicitly concurrent reduction over arbitrary partitions of the collection, and a sequential reduction over those reduced values, termed combine. Like transducers, Tesser folds can also map and filter their inputs, and perform post-processing transformation after each reduction and the combine step.

Tesser folds can be composed with the standard collection threading idioms, and evaluated against a collection of collections using (tesser colls fold).

(->> (t/map inc)
     (t/filter even?)
     (t/fold +)
     (t/tesser [[1 2 3] [4 5 6]]))
; => 2 + 4 + 6 = 12

any

(any)(any fold__6851__auto__)

Returns any single input from the collection. O(chunks). For instance:

(t/tesser [[1 2 3] [4 5 6]] (t/any))
; => 4

assert-compiled-fold

(assert-compiled-fold f)

Is this a valid compiled fold? Throws with a helpful message if the fold is invalid.

chunk

compile-fold

(compile-fold fold)

Compiles a fold (a sequence of transforms, each represented as a function taking the next transform) to a single map like

{:reducer-identity  (fn [] ...),
 :reducer           (fn [acc x] ...)
 ...}

count

(count)(count fold__6851__auto__)

How many inputs are there? For instance:

(->> (t/filter even?)
     (t/count)
     (t/tesser [[1 2 3] [4 5 6]]))
; => 3

deftransform

macro

(deftransform & args)

Deftransform, assuming transforms should be appended to the end of the fold; e.g. innermost.

deftransform*

macro

(deftransform* conjoiner name docstring args & body)

We’re trying to build functions that look like…

(defn map
  "Takes a function `f` and an optional fold. Returns a version of the
  fold which finally calls (f element) to transform each element."
  ([f]
   (map f []))
  ([f fold]
   (append fold
           (fn build [{:keys [reducer] :as downstream}]
             (assoc downstream :reducer
                    (fn reducer [acc input] (reducer acc (f input)))))))))

Which involves a fair bit of shared boilerplate: the single-arity variant of the transform, the append/prepend logic, the annealing function and its destructuring bind, etc. We’ll wrap these up in an anaphoric macro called deftransform, which takes a function (e.g. append) to conjoin this transform with the fold. Within the body, reducer-identity-, reducer-, post-reducer-, combiner-identity-, combiner-, post-combiner- are all bound to the downstream transform’s component functions, and downstream is bound to the downstream transform itself.

defwraptransform

macro

(defwraptransform & args)

Like deftransform, but prepends the given transform to the beginning of the fold; e.g. outermost.

empty?

(empty? & [f])

Returns true iff no inputs arrive; false otherwise. For instance:

(t/tesser [[]] (t/empty?))
; => true

every?

(every? pred & [f])

True iff every input satisfies the given predicate, false otherwise. For instance:

(t/tesser [[1 3 5]] (t/every? odd?))
; => true

extremum

(extremum compare)(extremum compare fold__6851__auto__)

Finds the largest element using a comparison function, e.g. compare. For example:

(t/tesser [[5 4] [3 2] [1 0]] (t/extremum compare))
; => 5

facet

(facet)(facet fold__6851__auto__)

Your inputs are maps, and you want to apply a fold to each value independently. Facet generalizes a fold over a single value to operate on maps of keys to those values, returning a map of keys to the results of the fold over all values for that key. Each key gets an independent instance of the fold.

For instance, say you have inputs like

{:x 1, :y 2}
{}
{:y 3, :z 4}

Then the fold

(->> (facet)
     (mean))

returns a map for each key’s mean value:

{:x 1, :y 2, :z 4}

filter

(filter pred)(filter pred fold__6851__auto__)

Takes a predicate function pred and an optional fold. Returns a version of the fold which only passes on inputs to subsequent transforms when (pred input) is truthy.

(->> (t/filter odd?)
     (t/into [])
     (t/tesser [[1 2 3 4 5 6]]))
 ; => [1 3 5]

fold

(fold m)(fold m fold__6851__auto__)

An arbitrary terminal fold. Takes a compiled fold and yields an uncompiled fold that can be composed with the usual Tesser transforms (map, filter, etc), or passed directly to tesser. For instance, a sum of numbers:

(->> (t/fold {:reducer-identity (constantly 0) :reducer + :post-reducer identity :combiner-identity (constantly 0) :combiner + :post-combiner identity}) (t/tesser 5 6 7] [] [8)) ; => 26

Fold has several shortcuts to make defining folds more concise:

  • :reducer: if m is a function, not a map, defaults to m.
  • :combiner: defaults to :reducer
  • :reducer-identity: defaults to :identity, or else :reducer
  • :combiner-identity: defaults to :identity, or else :combiner
  • :post-reducer: defaults to :reducer, or identity if :reducer has no unary arity.
  • :post-combiner defaults to :combiner, or identity if :combiner has no unary arity.

So we can find a sorted set of all inputs using:

(->> (t/fold {:identity sorted-set :reducer conj :combiner into}) (t/tesser 1 2] [2 3)) ; => #{1 2 3}

And if we provide a function instead of a map, Tesser interprets it using the transducer-style arities: (m) for identity, (m acc) for post-reducer/post-combiner, (m acc input) for reducer/combiner, etc.

(t/tesser 1 2 3] [4 5 6 (t/fold +)) ; => 21

frequencies

(frequencies)(frequencies fold__6851__auto__)

Like clojure.core/frequencies, returns a map of inputs to the number of times those inputs appeared in the collection.

(t/tesser [[1 2 3] [1 1 1]] (t/frequencies))
; => {1 4, 2 1, 3 1}

fuse

(fuse fold-map)(fuse fold-map fold__6851__auto__)

You’ve got several folds, and want to execute them in one pass. Fuse is the function for you! It takes a map from keys to folds, like

(->> (map parse-person)
     (fuse {:age-range    (->> (map :age) (range))
            :colors-prefs (->> (map :favorite-color) (frequencies))})
     (tesser people))

And returns a map from those same keys to the results of the corresponding folds:

{:age-range   [0 74],
 :color-prefs {:red        120
               :blue       312
               :watermelon 1953
               :imhotep    1}}

Note that this fold only invokes parse-person once for each record, and completes in a single pass. If we ran the age and color folds independently, it’d take two passes over the dataset–and require parsing every person twice.

Fuse and facet both return maps, but generalize over different axes. Fuse applies a fixed set of independent folds over the same inputs, where facet applies the same fold to a dynamic set of keys taken from the inputs.

Note that fuse compiles the folds you pass to it, so you need to build them completely before fusing. The fold fuse returns can happily be combined with other transformations at its level, but its internal folds are sealed and opaque.

group-by

(group-by category-fn)(group-by category-fn fold__6851__auto__)

Every input belongs to exactly one category, and you’d like to apply a fold to each category separately.

Group-by takes a function that returns a category for every element, and returns a map of those categories to the results of the downstream fold applied to the inputs in that category.

For instance, say we have a collection of particles of various types, and want to find the highest mass of each particle type:

(->> (t/group-by :type)
     (t/map :mass)
     (t/max)
     (t/tesser [[{:name :electron, :type :lepton, :mass 0.51}
                 {:name :muon,     :type :lepton, :mass 105.65}
                 {:name :up,       :type :quark,  :mass 1.5}
                 {:name :down,     :type :quark,  :mass 3.5}]]))
; => {:lepton 105.65, :quark 3.5}

into

(into coll)(into coll fold__6851__auto__)

Adds all inputs to the given collection using conj. Ordering of elements from distinct chunks is undefined.

(t/tesser [[1 2 3] [4 5 6] [7 8 9]] (t/into []))
; => [7 8 9 1 2 3 4 5 6]

keep

(keep f)(keep f fold__6851__auto__)

Takes a function f and an optional fold. Returns a version of the fold which finally calls (f input) to transform each element, and passes it on to subsequent transforms only when the result of (f input) is truthy.

(->> (t/keep {:a 1 :b 2})
     (t/into [])
     (t/tesser [[:a :b] [:c :d]]))
; => [1 2]

map

(map f)(map f fold__6851__auto__)

Takes a function f and an optional fold. Returns a version of the fold which finally calls (f input) to transform each element.

(->> (t/map inc) (t/into []) (t/tesser [[1 2] [3 4]]))
; => [2 3 4 5]

mapcat

(mapcat f)(mapcat f fold__6851__auto__)

Takes a function f and an optional fold. Returns a version of the fold which finally calls (f input) to transform each element. (f input) should return a sequence of inputs which will be fed to the downstream transform independently.

(->> (t/mapcat seq) ; explode strings into seqs of chars
     (t/set)
     (t/tesser [["meow" "mix"]]))
; => #{\e \i \m \o \w \x}

max

(max & [f])

Finds the largest value using compare. For example:

(t/tesser [[:a :b] [:c :d]] (t/max))
; => :d

min

(min & [f])

Finds the smallest value using compare. For example:

(t/tesser [[:a :b] [:c :d]] (t/min))
; => :a

not-every?

(not-every? pred & [f])

True if there exists an input which does not satisfy the given predicate. For instance:

(t/tesser [[1 3 5] [6]] (t/not-every? odd?))
; => true

post-combine

(post-combine f)(post-combine f fold__6851__auto__)

Transforms the output of a fold by applying a function to it.

For instance, to find the square root of the mean of a sequence of numbers, try

(->> (t/mean) (t/post-combine sqrt) (t/tesser nums))

For clarity in ->> composition, post-combine composes in the opposite direction from map, filter, etc. It prepends a transform to the given fold instead of appending one. This means post-combines take effect in the same order you’d expect from ->> with normal function calls:

(->> (t/mean)                 (->> (mean nums)
     (t/post-combine sqrt)         (sqrt)
     (t/post-combine inc))         (inc))

range

(range & [f])

Returns a pair of [smallest largest] inputs, using compare. For example:

(t/tesser [[4 5 6] [1 2 3]] (t/range))
; => [1 6]

reduce

(reduce f init)(reduce f init fold__6851__auto__)

A fold that uses the same function for the reduce and combine phases. Unlike normal Clojure reduce, this reduce doesn’t take a collection: it just returns a fold which can be applied to a collection via tesser. Why? You might want to compose the reduction with something else using fuse, map it with post-combine, etc etc.

Follows the clojure reducers and transducers conventions for arities:

  • (constantly init) is used to generate identity elements.
  • (f acc input) folds elements in the reduce and combine phases.
  • (f acc) post-reduces and post-combines, unless (f acc) throws clojure.lang.ArityException, in which case we return acc directly.

This means you can use probably (reduce f init) as a phase anywhere f is associative and commutative, and where init is immutable.

(->> (t/map inc)
     (t/reduce + 0)
     (t/tesser [[1 2 3] [4 5 6]]))
; => 27

Due to technical limitations Tesser can’t distinguish between

(reduce + upstream-fold)

where we’re transforming an uncompiled fold by adding a reduce phase, and

(reduce + 0)

where we’re defining a new phase out of thin air with 0 as the initial value. Consequently, we always interpret the second argument as an initial value. We don’t provide an equivalent for (reduce +) yet. Someday. Use (fold + +) or (reduce + (+)) instead.

remove

(remove pred)(remove pred fold__6851__auto__)

Takes a predicate function pred and an optional fold. Returns a version of the fold which only passes on inputs to subsequent transforms when (pred input) is nil or false.

(->> (t/remove odd?)
     (t/into [])
     (t/tesser [[1 2 3 4 5 6]]))
 ; => [2 4 6]

replace

(replace m & [f])

Given a map of replacement pairs, maps any inputs which are keys in the map to their corresponding values. Leaves unrecognized inputs alone.

(->> (t/replace {:x false})
     (t/into [])
     (t/tesser [[:x :y]]))
; => [false :y]

set

(set)(set fold__6851__auto__)

A hash-set of distinct inputs. For instance:

(->> (t/map inc)
     (t/set)
     (t/tesser [[1 2 3] [4 5 6]]))
; => #{7 4 6 3 2 5}

some

(some pred)(some pred fold__6851__auto__)

Returns the first logical true value of (pred input). If no such satisfying input exists, returns nil.

This is potentially less efficient than clojure.core/some because each reducer has to find a matching element independently, and they have no way to communicate when one has found an element. In the worst-case scenario, requires N calls to pred. However, unlike clojure.core/some, this version is parallelizable–which can make it more efficient when the element is rare.

(t/tesser [[1 2 3] [4 5 6]] (t/some #{1 2}))
; => 1

take

(take n)(take n fold__6851__auto__)

Like clojure.core/take, limits the number of inputs passed to the downstream transformer to exactly n, or if fewer than n inputs exist in total, all inputs.

(->> (t/map inc)
     (t/take 5)
     (t/into [])
     (t/tesser [[1 2 3] [4 5 6] [7 8 9]]))
; => [6 7 5 3 4]

Space complexity note: take’s reducers produce log2(chunk-size) reduced values per chunk, ranging from 1 to chunk-size/2 inputs, rather than a single reduced value for each chunk. See the source for why this is the case.

tesser

(tesser chunks fold)

Compiles a fold and applies it to a sequence of sequences of inputs. Runs num-procs threads for the parallel (reducer) portion of the fold. Reducers take turns combining their results, which prevents unbounded memory consumption by the reduce phase.

(t/tesser [["hi"] ["there"]] (t/fold str))
; => "therehi"

transform

(transform f)(transform f fold__6851__auto__)

An arbitrary transform. Takes a function that maps one compiled fold map to another, and an optional uncompiled fold, and returns an uncompiled fold with that transformation applied innermost; e.g. it controls inputs last, and post-processes outputs first.

wrap-transform

(wrap-transform f)(wrap-transform f fold__6851__auto__)

An arbitrary transform. Takes a function that maps one compiled fold map to another, and an optional uncompiled fold, and returns an uncompiled fold with that transformation applied outermost; e.g. it controls inputs first, and post-processes outputs last.