tesser.hadoop

Helps you run a fold on Hadoop!

dsink

(dsink work-dir file-name)

Given a work directory and a string name for this file, builds a dsink for a fold to dump its output to, in [NullWritable FressianWritable] format.

error?

(error? x)

Is this an error object?

execute

(execute graph conf)
(execute graph conf job-name)

Like parkour.graph/execute, but specialized for folds. Takes a Parkour graph, a jobconf, and an optional job name. Executes the job, then returns a sequence of fold results. A job name is generated automatically if none is provided.

fold

(fold conf input workdir fold-var & args)

A simple, all-in-one fold operation. Takes a jobconf, an input dseq, a workdir, a var which points to a fold function, and arguments for the fold function. Runs the fold against the dseq and returns its results. Names the output dsink after the metadata key :tesser.hadoop/output-path in the fold symbol. If that is absent, uses the conf key tesser.hadoop.output-path, and finally falls back to the fold symbol. On error, throws an ex-info.
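
The output naming rule above is a three-step fallback. A minimal sketch of that lookup order, assuming the metadata is read from the fold var, that a plain map stands in for the jobconf, and that the final fallback is the var's unqualified name. output-name is a hypothetical helper for illustration, not part of tesser.hadoop.

```clojure
(defn output-name
  "Hypothetical helper: picks an output name for a fold var.
  conf-map is a plain map standing in for a Hadoop jobconf."
  [conf-map fold-var]
  (or (:tesser.hadoop/output-path (meta fold-var))   ; 1. metadata on the var
      (get conf-map "tesser.hadoop.output-path")     ; 2. conf key
      (str (:name (meta fold-var)))))                ; 3. the var's own name

;; A stand-in fold constructor that names its own output via metadata.
(defn ^{:tesser.hadoop/output-path "word-counts"} count-words [] nil)

;; A stand-in fold constructor with no naming metadata.
(defn count-lines [] nil)
```

Calling (output-name {} #'count-words) takes the first branch, while #'count-lines falls through to the conf key or, failing that, to its own name.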

fold*

(fold* graph fold-var & args)

Takes a Parkour graph and applies a fold to it. Takes a var pointing to a function that, given args, constructs a fold. Returns a new (unexecuted) graph. The output of this job will be a single-element Fressian structure containing the results of the fold applied to the job’s inputs.

fold-mapper

(fold-mapper fold-name fold-args input)

A generic, stateful Hadoop mapper for applying a fold to a Hadoop dataset. Returns a mapper for the fold defined by make-fold applied to fold-name and additional args.

fold-reduce-twice

(fold-reduce-twice conf input workdir fold-var & args)

Like fold, but in two stages (jobs). The first job uses the number of reduce tasks specified in mapred.reduce.tasks. The second job does nothing in the map step, then uses one reduce task.

fold-reducer

(fold-reducer fold-name fold-args input)

Returns a Parkour reducer for the fold defined by make-fold applied to fold-name and additional args.

fold-reducer-without-post-combiner

(fold-reducer-without-post-combiner fold-name fold-args input)

Like fold-reducer, but omits :post-combiner so it can be used in reduce tasks that are continued in multiple jobs.
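
Why the :post-combiner has to wait can be shown without Hadoop. The sketch below models a fold as a plain map with the six keys Tesser folds use, and runs a mean in two combine stages. The helper names (mean-fold, reduce-chunk, combine-partials, fold-in-two-stages) are hypothetical, and the grouping of partials stands in for whatever the first job's reduce tasks happen to receive.

```clojure
(def mean-fold
  "A mean, written as a fold map. The accumulator is [sum count]."
  {:reducer-identity  (constantly [0 0])
   :reducer           (fn [[sum n] x] [(+ sum x) (inc n)])
   :post-reducer      identity
   :combiner-identity (constantly [0 0])
   :combiner          (fn [[s1 n1] [s2 n2]] [(+ s1 s2) (+ n1 n2)])
   :post-combiner     (fn [[sum n]] (/ sum n))})

(defn reduce-chunk
  "Map side: reduces one chunk of inputs to a partial accumulator."
  [fold chunk]
  ((:post-reducer fold)
   (reduce (:reducer fold) ((:reducer-identity fold)) chunk)))

(defn combine-partials
  "Reduce side: merges accumulators and leaves :post-combiner unapplied,
  so the result can still be combined again by a later stage."
  [fold partials]
  (reduce (:combiner fold) ((:combiner-identity fold)) partials))

(defn fold-in-two-stages
  "Stage 1 combines partials in groups of group-size (several reducers).
  Stage 2 combines the stage-1 results (one reducer) and only then
  applies :post-combiner."
  [fold chunks group-size]
  (let [partials (map #(reduce-chunk fold %) chunks)
        stage-1  (map #(combine-partials fold %)
                      (partition-all group-size partials))
        stage-2  (combine-partials fold stage-1)]
    ((:post-combiner fold) stage-2)))
```

Had stage 1 applied :post-combiner, each group would emit a finished mean rather than a [sum count] pair, and the second stage would have nothing it could combine correctly.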

gen-job-name!

(gen-job-name!)

Generates a new job name. Job names start at a random small integer and increment sequentially from there. Job names are printed to stderr when generated.
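
The behaviour described above amounts to an atom-backed counter. A minimal sketch, assuming an upper bound of 1000 for the random start and a "job-" prefix, neither of which is taken from the library. example-counter and example-job-name! are hypothetical names.

```clojure
;; Counter seeded once with a random small integer (bound is an assumption).
(defonce example-counter (atom (rand-int 1000)))

(defn example-job-name!
  "Increments the counter, prints the new name to stderr, and returns it.
  The \"job-\" prefix is an assumption made for this sketch."
  []
  (let [job-name (str "job-" (swap! example-counter inc))]
    (binding [*out* *err*]
      (println job-name))
    job-name))
```

Because swap! is atomic, concurrent callers in one JVM get distinct names. Two separate client processes may still collide, which is presumably why the start is randomized.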

identity-mapper

(identity-mapper records)

Does nothing in the map step.

job-name-counter

partition-randomly

(partition-randomly _ _ nparts)

Partitions map task outputs randomly and uniformly among the reduce tasks.
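
A partitioner with this signature can be sketched in one line: ignore the key and value, and pick a reduce task index uniformly at random. random-partition is a hypothetical stand-in, not the library function.

```clojure
(defn random-partition
  "Ignores the key and value and returns a partition index
  in the range [0, nparts), chosen uniformly at random."
  [_key _value nparts]
  (rand-int nparts))
```

Every call returns an integer that is at least 0 and less than nparts, so each reduce task receives roughly the same share of map outputs regardless of what the keys are.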

print-error

(print-error e)

Print an error to err.

rehydrate-fold

(rehydrate-fold fold-name fold-args)

Takes the name of a function that generates a fold (a symbol) and args for that function, and invokes the function with args to build a fold, which is then compiled and returned.
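
Passing a symbol and args instead of a fold lets the description travel as plain data and be rebuilt wherever it is needed. The sketch below illustrates that round trip under loud assumptions: clojure.core/juxt stands in for a fold-constructing function, EDN text stands in for whatever serialization the job uses, and the compile step is left out. rebuild-from-name is a hypothetical helper.

```clojure
(require '[clojure.edn :as edn])

(defn rebuild-from-name
  "Hypothetical helper: loads the symbol's namespace, resolves the symbol
  to a var, and applies it to args. No fold compilation happens here."
  [fn-name args]
  (require (symbol (namespace fn-name)))
  (apply (resolve fn-name) args))

;; The description of what to build is plain data, so it prints to a string...
(def payload (pr-str ['clojure.core/juxt [:min :max]]))

;; ...and can be read back and turned into a live function elsewhere.
(def rebuilt
  (let [[fn-name args] (edn/read-string payload)]
    (rebuild-from-name fn-name args)))
```

Here rebuilt behaves like (juxt :min :max), even though only a symbol and a vector of keywords were ever written out.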

resolve+

(resolve+ sym)

Resolves a symbol to a var, requiring the namespace if necessary. If the namespace doesn’t exist, throws just like clojure.core/require. If the symbol doesn’t exist after requiring, returns nil.
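
The three outcomes described above (a var, nil, or an exception from require) can be reproduced with clojure.core alone. A minimal sketch, where resolve-with-require is a hypothetical name and the handling of unqualified symbols is an assumption:

```clojure
(defn resolve-with-require
  "Requires the symbol's namespace (if it has one), then resolves the symbol.
  Returns a var, or nil if the namespace loads but has no such var.
  Lets any exception from require propagate."
  [sym]
  (when-let [ns-str (namespace sym)]
    (require (symbol ns-str)))
  (resolve sym))
```

For example, (resolve-with-require 'clojure.set/union) loads clojure.set on demand and returns its var, while a symbol naming a missing var in the same namespace yields nil.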

serialize-error

(serialize-error state input e)

Convert an exception to an error.
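
One common way to move a failure out of a task and back to the caller is to turn the exception into plain data and test for it later. The sketch below shows that pattern only. The map shape, the :example/error marker, and the names exception->error, error-map? and throw-if-error are all hypothetical, and the library's actual error representation may differ.

```clojure
(defn exception->error
  "Hypothetical: captures an exception, plus the state and input being
  processed when it was thrown, as a plain printable map."
  [state input e]
  {:example/error true
   :class         (.getName (class e))
   :message       (.getMessage ^Throwable e)
   :state         (pr-str state)
   :input         (pr-str input)})

(defn error-map?
  "Hypothetical: true if x is a map produced by exception->error."
  [x]
  (and (map? x) (true? (:example/error x))))

(defn throw-if-error
  "Hypothetical: rethrows an error map as an ex-info, else returns x."
  [x]
  (if (error-map? x)
    (throw (ex-info (str "Fold failed: " (:message x)) x))
    x))
```

Ordinary results pass through throw-if-error untouched, and an error map surfaces as an ex-info whose ex-data carries the captured state and input.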

set-one-reducer!

(set-one-reducer! conf)

Takes a jobconf and returns it with mapred.reduce.tasks set to 1.

Generated by Codox

Tesser.all 1.0.3
