-
Notifications
You must be signed in to change notification settings - Fork 2
/
presentation_notes.txt
53 lines (42 loc) · 5.24 KB
/
presentation_notes.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
FINAL
Slide 1 [title]:
Our goal is to build a fast differentiable rasterizer for quadratic and cubic bezier curves. Before we talk about why we need this and how we did this, I will explain what a bezier curve is and what rasterization is.
Slide 2 (bezier intro):
Bezier curves are vector curves that are defined by a set of ordered points called control points. For example, the curve on the board is a quadratic bezier curve that is defined by three points and the process of converting three points into a curve displayed on a finite resolution is rasterization.
Our project is building a rasterizer for these curves that can carry gradient information. The ability to carry gradient information, or having the differentiable operation is a tool that can be used in any gradient based methods including deep neural networks which have been shown to be powerful.
Going back to how the curve can be drawn from points, straight lines are drawn between consecutive points, and the bezier curve is drawn by taking a linear combination of two lines. (ON THE BOARD DO THIS). In order to do this for monitors that contain discrete pixels, we divide the curve into small points and compute the location of each point. The points are close enough to each other so that when we project them together on the monitor, it looks like a curve.
Here are some more examples of our rasterization. We have a cubic bezier cuve, and we have a letter a from arial font donated from karol's group.
What is a bezier curve? Vector curve defined by control points.
What is rasterization? In order to render the curve onto a display with a finite resolution, pixels that fall on the curve must be determined.
Font glyphs - TrueType fonts are defined as a combination of quadratic bezier curves (with 3 control points each)
Tons of different algorithms are part of every modern graphics pipeline.
Our project - make it differentiable and fast so it can be used as a building block in gradient based methods like optimization / deep learning.
application - reverse rendering, you can use this in a generative model, and find the control points that define a generated glyph.
Slides X [curves]
show curve examples, describe
heres a quadratic bezier - 3 points
heres a cubic bezier - 4 points
describe how to rasterize a bezier curve - either use iterative pixel based methods, or what many modern libraries do (skia etc) simply find points distributed along the curve, and use very hardware optimized straight line drawing algorithms to form an approximate curve. Instead of that, we need gradient information - draw a gaussian at each point and superimpose them to get a smooth, differentiable function with good gradient information.
Benchmark slides
Naive technique as described above is very slow, running on the CPU in vectorized Torch code.
It is also very memory intensive, its doing a lot of unnecessary operations off of the curve.
Solution? only do operations along the curve, on a square around each point. (raster_shrunk), this greatly reduces memory consumption, but is still slow.
Move the tensors onto the GPU, speeds it up a bit.
This is launching so many kernels (proportional to the number of points) and stressing the CPU-GPU connections, that the cost is not really worth it.
GPU is always a tradeoff between embarrasingly parallel operations and the overhead associated with moving data between the CPU and GPU. You want an algorithm that can effectively leverage a few massively parallel GPU operations, and than move it back to the CPU. Also any operations that can be done on the CPU while the GPU is working is essentially free cycles.
test the original naive implementation (one big multiply) in cuda, while finding the points on the CPU
massive speedup, even with the "wasted" operations
profiled, benchmarked - speed further increased by replacing slow operations (certain calls to expand with certain array sizes perform very poorly do to bad cache/memory access patters, replaced these with much faster array broadcasting ops), and doing all the calculations in half precision. (which also results in lower memory usage by a constant factor)
only need to calculate at the point the region the curve covers-draw a bounding box and calculate only with that
for many curves, the cost of indexing and finding these bounds is passed by the savings from reducing multiplies
slide cubic
but this doesn't really help when the curve covers most of the canvas, and the memory usage still scales horribly.
if you need to render a composition of many bezier curves, you might have to break it into smaller parts
tiled
you want to effectively perform your rasterization at points along the curve, but this incurs a not insignificant cost in finding these regions. How can we do this?
break the canvas in to NxN square tiles
find the points (along with the gradient function) the needs to be rastered in that square, which can be done with a simple bounds check
executed the multiply for each of these in parallel in cuda streams on SM processors
- only need to render in the squares that contain points/gradient information (more squares = tighter bounds but more indexing overhead)
done asynchrounously so memory usage is much less.
big performance boost on composite curves / glyphs, still reasonable performance on simple curves