Written by Felipe Tome
This guide covers the core Dagger.jl pieces you will need for basic usage:
Dagger.@spawn / Dagger.spawn
task options (Options, inline options, with_options)
visualization basics
DArrays (distributed arrays)
datadeps (spawn_datadeps, In, Out, InOut, Deps)
GPU usage
multi-GPU usage
distributed execution
The snippets below were checked against Julia 1.12.4 and Dagger 0.19.3.
Use Dagger.@spawn to create a task graph; passing one DTask into another creates dependencies.
Use options either inline (Dagger.@spawn scope=... name=... f(x)) or with Dagger.Options.
Use Dagger.with_options for block-scoped defaults.
For basic graph visualization, enable logging, run tasks, then call Dagger.show_logs(..., :graphviz).
Use DArrays when you want array-style operations over partitioned data.
For mutable shared data, use Dagger.spawn_datadeps with In/Out/InOut.
For GPU execution, load a backend (CUDA/AMDGPU/oneAPI/Metal) and either run unpinned (automatic placement) or pin with scope(...).
For distributed runs, add workers and scope tasks to specific workers when needed.
Minimal setup
Start a Julia session:
julia
Then in Julia:
using Dagger
If you want more CPU parallelism, launch Julia with threads:
julia -t8
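If Dagger is not installed yet, the standard package-manager install is all that is needed; the sketch below (plain Julia, nothing Dagger-specific beyond the install) also checks how many threads the session has:
import Pkg
Pkg.add("Dagger")        # one-time install

using Dagger
@show Threads.nthreads() # how many threads are available for CPU parallelism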
@spawn essentials
Think of each Dagger.@spawn as one node in a DAG (Directed Acyclic Graph): a graph where edges are dependencies and there are no cycles.
If a task argument is another DTask, Dagger adds an edge automatically.
fetch(task) waits and returns the final value.
wait(task) waits but does not return the value.
using Dagger
square(x) = x * x
inc(x) = x + 1
a = Dagger.@spawn square(4) # returns DTask
b = Dagger.@spawn inc(a) # depends on a
@show fetch(b) # 17
You can also use the function form:
t = Dagger.spawn(+, 40, 2)
@show fetch(t) # 42
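As a companion to fetch, here is a minimal sketch of wait, which blocks until a task finishes but discards its return value:
using Dagger
t = Dagger.spawn(sleep, 0.1) # task whose return value we do not need
wait(t)                      # blocks until the task completes, returns nothing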
How to use options
Dagger options are per-task execution hints/configuration. They let you control how a task is scheduled and labeled, for example:
where it can run (scope)
how it appears in logs/visualizations (name)
other runtime behavior through Dagger.Options fields
You can think of options as metadata attached to a task definition, not the task's actual input data.
In practice, options let you:
place tasks on specific resources (CPU worker/thread/GPU) with scope
make task graphs easier to read with name
add explicit extra task dependencies with syncdeps
control advanced behavior like raw-chunk execution (meta) and scheduling/resource hints
You have three common ways to set options.
Inline options with @spawn
using Dagger
t = Dagger.@spawn name="task-add" +(40, 2)
@show fetch(t) # 42
Two useful options to start with:
scope: where a task can run (worker/thread/device constraints)
name: friendly name for logging and visualization
Example:
using Dagger
t = Dagger.@spawn scope=Dagger.scope(worker=1) name="local-sqrt" sqrt(81)
@show fetch(t) # 9.0
Dagger.Options
using Dagger
opts = Dagger.Options(; name="sum-task", scope=Dagger.scope(worker=1))
t = Dagger.spawn(+, opts, 10, 32)
@show fetch(t) # 42
Dagger.with_options
Use this when you want many tasks in one block to inherit the same option values.
using Dagger
t = Dagger.with_options(; scope=Dagger.scope(worker=1), name="scoped-task") do
Dagger.@spawn sqrt(81)
end
@show fetch(t) # 9.0
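Since with_options exists for shared defaults, here is a minimal sketch (same options as above) where two tasks in one block inherit them:
using Dagger
a, b = Dagger.with_options(; scope=Dagger.scope(worker=1), name="batch-task") do
    t1 = Dagger.@spawn sum(1:100)   # inherits scope and name
    t2 = Dagger.@spawn sum(1:1_000) # inherits scope and name
    (t1, t2)
end
@show fetch(a) fetch(b) # 5050, 500500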
Visualization basics
Basic flow:
enable logging
run/fetch your tasks
collect logs
render and save (inside Julia)
DAG graph (:graphviz): a dependency graph (Directed Acyclic Graph) of tasks and edges. This is best for understanding task ordering, missing/extra dependencies, and overall pipeline structure.
Gantt chart (:plots_gantt): a timeline view where bars show when tasks ran and on which processor. A Gantt chart is best for performance debugging: spotting idle gaps, load imbalance, serialization, or bottlenecks.
For Gantt charts, Dagger can render different targets (:execution, :processor, :scheduler) depending on whether you want task-level timing, processor activity, or scheduler-event timing.
using Dagger
using GraphViz # install once: import Pkg; Pkg.add("GraphViz")
Dagger.enable_logging!(all_task_deps=true)
x = Dagger.@spawn name="base" sum(1:10)
y = Dagger.@spawn name="double" x * 2
fetch(y)
logs = Dagger.fetch_logs!()
gv = Dagger.render_logs(logs, :graphviz) # GraphViz.Graph
# Save SVG directly from Julia (no external CLI pipeline)
open("dagger_graph.svg", "w") do io
show(io, MIME"image/svg+xml"(), gv)
end
Dagger.disable_logging!()
If GraphViz.jl is installed: Dagger.render_logs(logs, :graphviz)
If Plots.jl + DataFrames.jl are installed: Dagger.render_logs(logs, :plots_gantt)
Save graph output as SVG inside Julia:
using Dagger
using GraphViz
Dagger.enable_logging!(all_task_deps=true)
t = Dagger.@spawn sum(1:10)
fetch(t)
logs = Dagger.fetch_logs!()
gv = Dagger.render_logs(logs, :graphviz)
open("dagger_graph.svg", "w") do io
show(io, MIME"image/svg+xml"(), gv)
end
Dagger.disable_logging!()
Save Gantt plots rendered through Plots:
using Dagger
using Plots, DataFrames
Dagger.enable_logging!(all_task_deps=true)
t = Dagger.@spawn sum(1:10)
fetch(t)
logs = Dagger.fetch_logs!()
p = Dagger.render_logs(logs, :plots_gantt)
savefig(p, "dagger_gantt.png")
savefig(p, "dagger_gantt.pdf")
Dagger.disable_logging!()
DArrays basics
DArray is Dagger's distributed array type: data is partitioned into chunks, and operations can run chunkwise in parallel.
DArray
using Dagger
A = Dagger.distribute(rand(Float32, 64, 64), Dagger.Blocks(32, 32))
B = map(x -> 2f0 * x, A)
s = sum(B)
M = collect(B)
@show typeof(A) size(A)
@show s
@show size(M) eltype(M)
using Dagger
A = rand(Dagger.Blocks(64, 64), Float32, 256, 256)
Z = zeros(Dagger.Blocks(64, 64), Float32, 256, 256)
@show typeof(A) typeof(Z)
@show size(A) size(Z)
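DArrays also support broadcasting and reductions directly; a minimal sketch building on the constructors above:
using Dagger
A = rand(Dagger.Blocks(64, 64), Float32, 256, 256)
B = A .+ 1f0          # chunkwise broadcast produces another DArray
@show sum(B) - sum(A) # ≈ 256 * 256 = 65536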
Datadeps basics
Datadeps ("data dependencies") is Dagger's way to safely schedule tasks that read/write shared mutable data by declaring how each task accesses that data.
Use datadeps when tasks mutate shared data.
The rule of thumb:
In(x): read x
Out(x): write x
InOut(x): read and write x
unspecified dependency defaults to read (In)
Run mutable task groups inside Dagger.spawn_datadeps() do ... end.
using Dagger
fill1!(x) = (fill!(x, 1); nothing)
add2!(x) = (x .+= 2; nothing)
sumv(x) = sum(x)
A = zeros(Int, 6)
t_before = Ref{Any}()
t_after = Ref{Any}()
Dagger.spawn_datadeps() do
Dagger.@spawn fill1!(Out(A)) # writes A
t_before[] = Dagger.@spawn sumv(In(A))
Dagger.@spawn add2!(InOut(A)) # mutates A
t_after[] = Dagger.@spawn sumv(In(A))
end
@show fetch(t_before[]) # 6
@show fetch(t_after[]) # 18
@show A # [3, 3, 3, 3, 3, 3]
Important behavior in datadeps regions:
Dagger determines ordering from dependency annotations (see the sketch after this list).
spawn_datadeps waits for all submitted tasks before returning.
Avoid calling fetch on tasks from inside the same datadeps block.
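To illustrate the ordering point, a minimal sketch in the same style as above: tasks writing different arrays share no annotated data, so there is no edge between them, while each read is ordered after the write it depends on:
using Dagger
A = zeros(Int, 4)
B = zeros(Int, 4)
ta = Ref{Any}()
tb = Ref{Any}()
Dagger.spawn_datadeps() do
    Dagger.@spawn fill!(Out(A), 1)  # writes only A
    Dagger.@spawn fill!(Out(B), 2)  # writes only B; independent of the task above
    ta[] = Dagger.@spawn sum(In(A)) # ordered after the write to A
    tb[] = Dagger.@spawn sum(In(B)) # ordered after the write to B
end
@show fetch(ta[]) fetch(tb[]) # 4, 8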
Deps(...)
Deps(x, ...) is an advanced wrapper used to attach one or more dependency modifiers to the same value (for finer-grained partial-region dependency tracking). Most users can start with In/Out/InOut and only move to Deps when they need custom aliasing behavior.
DArray chunks
using Dagger
add2_chunk!(x) = (x .+= 2f0; nothing)
A = zeros(Dagger.Blocks(32, 32), Float32, 64, 64)
Dagger.spawn_datadeps() do
for Ac in Dagger.chunks(A)
Dagger.@spawn fill!(Out(Ac), 1f0)
Dagger.@spawn add2_chunk!(InOut(Ac))
end
end
M = collect(A)
@show sum(M) # 12288.0
@show unique(M) # Float32[3.0]
GPU usage (single GPU)
GPU execution is opt-in through backend packages. Load one of:
using CUDA (NVIDIA)
using AMDGPU (AMD)
using oneAPI (Intel)
using Metal (Apple)
You can use GPUs in two modes:
Unpinned: let Dagger place tasks automatically (usually based on GPU-array compatibility).
Pinned: force task placement to specific device(s) using scope(...).
Unpinned CUDA example:
using Dagger, CUDA
CUDA.functional() || error("CUDA is not functional")
A = CUDA.rand(Float32, 2048, 2048)
t = Dagger.@spawn sum(abs2, A) # no explicit scope pinning
@show fetch(t)
Pinned mode uses backend-specific scope keys:
CUDA: cuda_gpu / cuda_gpus
AMDGPU: rocm_gpu / rocm_gpus
oneAPI: intel_gpu / intel_gpus
Metal: metal_gpu / metal_gpus
If the backend package is not loaded, those scope keys are not available.
Pinned CUDA example:
using Dagger, CUDA
CUDA.functional() || error("CUDA is not functional")
A = CUDA.rand(Float32, 2048, 2048)
t = Dagger.@spawn scope=Dagger.scope(cuda_gpu=1, worker=1) sum(abs2, A)
@show fetch(t)
DArray + GPU example (CUDA)
using Dagger, CUDA
CUDA.functional() || error("CUDA is not functional")
gpu_chunk_sum(x) = sum(abs2, CUDA.CuArray(x))
A = fetch(rand(Dagger.Blocks(512, 512), Float32, 2048, 2048))
chunk_tasks = [Dagger.@spawn gpu_chunk_sum(c) for c in Dagger.chunks(A)]
total = sum(fetch.(chunk_tasks))
@show total
Multi-GPU usage
For multi-GPU workloads you can also choose unpinned or pinned placement.
Unpinned pattern (allocate inputs on each GPU, then spawn normally):
using Dagger, CUDA
CUDA.functional() || error("CUDA is not functional")
length(CUDA.devices()) > 0 || error("No CUDA devices found")
arrays = [CUDA.device!(dev) do
CUDA.rand(Float32, 2048, 2048)
end for dev in CUDA.devices()]
tasks = [Dagger.@spawn sum(abs2, A) for A in arrays] # no scope pinning
results = fetch.(tasks)
@show length(results)
Pinned pattern (explicit one-task-per-device):
using Dagger, CUDA
gpu_sum(n) = sum(abs2, CUDA.rand(Float32, n, n))
ngpu = length(CUDA.devices())
ngpu > 0 || error("No CUDA devices found")
tasks = [
Dagger.@spawn scope=Dagger.scope(cuda_gpu=g, worker=1) name="gpu-$g" gpu_sum(2048)
for g in 1:ngpu
]
results = fetch.(tasks)
@show ngpu results
DArray chunks (pinned CUDA example)
using Dagger, CUDA
CUDA.functional() || error("CUDA is not functional")
ngpu = length(CUDA.devices())
ngpu > 0 || error("No CUDA devices found")
gpu_chunk_sum(x) = sum(abs2, CUDA.CuArray(x))
A = fetch(rand(Dagger.Blocks(1024, 1024), Float32, 4096, 4096))
tasks = [
Dagger.@spawn scope=Dagger.scope(cuda_gpu=mod1(i, ngpu), worker=1) gpu_chunk_sum(c)
for (i, c) in enumerate(Dagger.chunks(A))
]
total = sum(fetch.(tasks))
@show total
Notes:
This pattern scales when each GPU gets enough work (coarse tasks are better than many tiny tasks).
For mixed CPU/GPU pipelines, keep dependent tasks on the same GPU when possible to reduce transfers (see the sketch after these notes).
Use pinning when you need deterministic device placement or strict resource partitioning.
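As a sketch of the same-GPU note (hypothetical helper names make_input and normalize_gpu; assumes CUDA.jl is loaded and functional, as in the examples above), both stages are pinned to one device so the intermediate CuArray never leaves it:
using Dagger, CUDA
CUDA.functional() || error("CUDA is not functional")

make_input(n) = CUDA.rand(Float32, n, n)  # hypothetical first stage
normalize_gpu(A) = A ./ sum(A)            # hypothetical second stage

gpu1 = Dagger.scope(cuda_gpu=1, worker=1) # both tasks share this scope
a = Dagger.@spawn scope=gpu1 name="make-input" make_input(1024)
b = Dagger.@spawn scope=gpu1 name="normalize" normalize_gpu(a)
@show sum(fetch(b)) # ≈ 1.0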
Distributed execution (multi-process / multi-node)
Dagger can schedule across Julia workers (Distributed processes).
Basic workflow:
start workers (julia -p N or addprocs)
load packages on all workers
spawn tasks normally (or pin with scope(worker=...))
using Distributed
pids = addprocs(2)
@everywhere using Dagger
using Dagger
t_any = Dagger.@spawn sum(1:1_000_000)
t_w1 = Dagger.@spawn scope=Dagger.scope(worker=pids[1]) myid()
t_w2 = Dagger.@spawn scope=Dagger.scope(worker=pids[2]) myid()
@show fetch(t_any)
@show fetch(t_w1) fetch(t_w2)
rmprocs(pids)
Distributed + GPU pinning example (CUDA):
using Distributed
pids = addprocs(2)
@everywhere using Dagger, CUDA
using Dagger
t = Dagger.@spawn scope=Dagger.scope(worker=pids[1], cuda_gpu=1) CUDA.device()
@show fetch(t)
rmprocs(pids)
Common pitfalls
@spawn expects a single function call expression, not an arbitrary begin ... end block.
If tasks mutate shared data and you do not use datadeps annotations, results may be wrong.
Over-constraining scope can leave no eligible processors and cause scheduling errors.
Logging/visualization has overhead; disable it in normal benchmark runs.
Backend-specific GPU scope keys only exist after loading the backend package.
Unpinned GPU scheduling is data-driven; use pinning when exact GPU assignment matters.
Dagger.chunks(...) is not exported; call it with the Dagger. prefix.
In distributed mode, remember @everywhere using ... for packages/functions needed on workers.
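For the last pitfall, a minimal sketch (two extra workers, hypothetical helper double_sum): functions called by remote tasks must be defined on the workers with @everywhere, not just on the driver:
using Distributed
pids = addprocs(2)
@everywhere using Dagger

# Define the helper on ALL processes so tasks scheduled on workers can call it.
@everywhere double_sum(n) = 2 * sum(1:n)

t = Dagger.@spawn scope=Dagger.scope(worker=pids[1]) double_sum(1_000)
@show fetch(t) # 1001000
rmprocs(pids)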
Practical starter template
using Dagger, GraphViz
function run_pipeline(x)
Dagger.enable_logging!(all_task_deps=true)
try
t1 = Dagger.@spawn name="square" x^2
t2 = Dagger.@spawn name="plus1" t1 + 1
result = fetch(t2)
logs = Dagger.fetch_logs!()
gv = Dagger.render_logs(logs, :graphviz)
open("pipeline.svg", "w") do io
show(io, MIME"image/svg+xml"(), gv)
end
return result
finally
Dagger.disable_logging!()
end
end
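A usage sketch (the input value is just illustrative):
result = run_pipeline(6)
@show result # 37; pipeline.svg is written to the current directory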
Further reading
Dagger docs: https://juliaparallel.org/Dagger.jl/stable/
Dagger.jl repo: https://github.com/JuliaParallel/Dagger.jl