1 parent d27af30, commit 51c003c

Showing 126 changed files with 3,438 additions and 2 deletions.
@@ -1,2 +1,58 @@
# blog
Some notes on things I find interesting and important.
I am a researcher and computer scientist. I was once in San Francisco, but am now traveling.

**Note**: Things may be a bit of a mess here while I de-Jekyll all the previous posts. This may all go away if I find that Jekyll actually did something valuable for me, but I wouldn't worry about that.


### Posts

---

2015-07-31 [The impact of fast networks on graph analytics, part 2.](https://github.com/frankmcsherry/blog/posts/2015-07-31.md)

---

2015-07-08 [The impact of fast networks on graph analytics, part 1.](https://github.com/frankmcsherry/blog/posts/2015-07-08.md)

---

2015-05-12 [Differential graph computation](https://github.com/frankmcsherry/blog/posts/2015-05-12.md)

---

2015-05-04 [Abomonation: terrifying serialization](https://github.com/frankmcsherry/blog/posts/2015-05-04.md)

---

2015-04-19 [Data-parallelism in timely dataflow](https://github.com/frankmcsherry/blog/posts/2015-04-19.md)

---

2015-04-11 [Worst-case optimal joins, in dataflow](https://github.com/frankmcsherry/blog/posts/2015-04-11.md)

---

2015-04-07 [Differential dataflow](https://github.com/frankmcsherry/blog/posts/2015-04-07.md)

---

2015-02-04 [Bigger data; same laptop](https://github.com/frankmcsherry/blog/posts/2015-02-04.md)

---

2015-01-15 [Scalability! But at what COST?](https://github.com/frankmcsherry/blog/posts/2015-01-15.md)

---

2014-12-29 [Timely dataflow: core concepts](https://github.com/frankmcsherry/blog/posts/2015-12-29.md)

---

2014-12-27 [Timely dataflow: reboot](https://github.com/frankmcsherry/blog/posts/2015-12-27.md)

---

2014-12-16 [Columnarization in Rust, part 2](https://github.com/frankmcsherry/blog/posts/2015-12-16.md)

---

2014-12-15 [Columnarization in Rust](https://github.com/frankmcsherry/blog/posts/2015-12-15.md)
@@ -0,0 +1,21 @@
<html>
<body>
<h1>uk_2007_05, GraphX, 16x8, 10G</h1>
<img src="caelum-401.png" />
<img src="caelum-402.png" />
<img src="caelum-403.png" />
<img src="caelum-404.png" />
<img src="caelum-405.png" />
<img src="caelum-406.png" />
<img src="caelum-407.png" />
<img src="caelum-408.png" />
<img src="caelum-409.png" />
<img src="caelum-410.png" />
<img src="caelum-411.png" />
<img src="caelum-412.png" />
<img src="caelum-413.png" />
<img src="caelum-414.png" />
<img src="caelum-302.png" />
<img src="caelum-303.png" />
</body>
</html>
assets/timeseries/pagerank/graphx_uk_16x8_10g_zoom/index.html (21 changes: 21 additions & 0 deletions)
@@ -0,0 +1,21 @@
<html>
<body>
<h1>uk_2007_05, GraphX, 16x8, 10G; zoomed into iterations</h1>
<img src="caelum-401.png" />
<img src="caelum-402.png" />
<img src="caelum-403.png" />
<img src="caelum-404.png" />
<img src="caelum-405.png" />
<img src="caelum-406.png" />
<img src="caelum-407.png" />
<img src="caelum-408.png" />
<img src="caelum-409.png" />
<img src="caelum-410.png" />
<img src="caelum-411.png" />
<img src="caelum-412.png" />
<img src="caelum-413.png" />
<img src="caelum-414.png" />
<img src="caelum-302.png" />
<img src="caelum-303.png" />
</body>
</html>
assets/timeseries/pagerank/timely_twitter_16x8_10g/index.html (21 changes: 21 additions & 0 deletions)
@@ -0,0 +1,21 @@
<html>
<body>
<h1>twitter_rv, timely, 16x8, 10G</h1>
<img src="caelum-401.png" />
<img src="caelum-402.png" />
<img src="caelum-403.png" />
<img src="caelum-404.png" />
<img src="caelum-405.png" />
<img src="caelum-406.png" />
<img src="caelum-407.png" />
<img src="caelum-408.png" />
<img src="caelum-409.png" />
<img src="caelum-410.png" />
<img src="caelum-411.png" />
<img src="caelum-412.png" />
<img src="caelum-413.png" />
<img src="caelum-414.png" />
<img src="caelum-314.png" />
<img src="caelum-313.png" />
</body>
</html>
@@ -0,0 +1,21 @@
<html>
<body>
<h1>uk_2007_05, timely, 16x8, 10G</h1>
<img src="caelum-401.png" />
<img src="caelum-402.png" />
<img src="caelum-403.png" />
<img src="caelum-404.png" />
<img src="caelum-405.png" />
<img src="caelum-406.png" />
<img src="caelum-407.png" />
<img src="caelum-408.png" />
<img src="caelum-409.png" />
<img src="caelum-410.png" />
<img src="caelum-411.png" />
<img src="caelum-412.png" />
<img src="caelum-413.png" />
<img src="caelum-414.png" />
<img src="caelum-314.png" />
<img src="caelum-313.png" />
</body>
</html>
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_processagg/caelum-313.png (+49.7 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_processagg/caelum-314.png (+50 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_processagg/caelum-401.png (+50.7 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_processagg/caelum-402.png (+50.6 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_processagg/caelum-403.png (+50.4 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_processagg/caelum-404.png (+50.1 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_processagg/caelum-405.png (+50.2 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_processagg/caelum-406.png (+50.4 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_processagg/caelum-407.png (+50 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_processagg/caelum-408.png (+50.8 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_processagg/caelum-409.png (+50.2 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_processagg/caelum-410.png (+50.3 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_processagg/caelum-411.png (+50 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_processagg/caelum-412.png (+50.7 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_processagg/caelum-413.png (+50.4 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_processagg/caelum-414.png (+50.4 KB)
assets/timeseries/pagerank/timely_uk_16x8_10g_processagg/index.html (21 changes: 21 additions & 0 deletions)
@@ -0,0 +1,21 @@
<html>
<body>
<h1>uk_2007_05, timely, 16x8, 10G</h1>
<img src="caelum-401.png" />
<img src="caelum-402.png" />
<img src="caelum-403.png" />
<img src="caelum-404.png" />
<img src="caelum-405.png" />
<img src="caelum-406.png" />
<img src="caelum-407.png" />
<img src="caelum-408.png" />
<img src="caelum-409.png" />
<img src="caelum-410.png" />
<img src="caelum-411.png" />
<img src="caelum-412.png" />
<img src="caelum-413.png" />
<img src="caelum-414.png" />
<img src="caelum-314.png" />
<img src="caelum-313.png" />
</body>
</html>
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_workeragg/caelum-313.png (+57.4 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_workeragg/caelum-314.png (+57.3 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_workeragg/caelum-401.png (+58.5 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_workeragg/caelum-402.png (+57.2 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_workeragg/caelum-403.png (+56.9 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_workeragg/caelum-404.png (+57.3 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_workeragg/caelum-406.png (+57.4 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_workeragg/caelum-407.png (+58.3 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_workeragg/caelum-408.png (+57.9 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_workeragg/caelum-409.png (+57.3 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_workeragg/caelum-411.png (+56.8 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_workeragg/caelum-412.png (+57.5 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_workeragg/caelum-413.png (+57.1 KB)
Binary file added: assets/timeseries/pagerank/timely_uk_16x8_10g_workeragg/caelum-414.png (+57.3 KB)
assets/timeseries/pagerank/timely_uk_16x8_10g_workeragg/index.html (21 changes: 21 additions & 0 deletions)
@@ -0,0 +1,21 @@
<html>
<body>
<h1>uk_2007_05, timely, 16x8, 10G</h1>
<img src="caelum-401.png" />
<img src="caelum-402.png" />
<img src="caelum-403.png" />
<img src="caelum-404.png" />
<img src="caelum-405.png" />
<img src="caelum-406.png" />
<img src="caelum-407.png" />
<img src="caelum-408.png" />
<img src="caelum-409.png" />
<img src="caelum-410.png" />
<img src="caelum-411.png" />
<img src="caelum-412.png" />
<img src="caelum-413.png" />
<img src="caelum-414.png" />
<img src="caelum-314.png" />
<img src="caelum-313.png" />
</body>
</html>
@@ -0,0 +1,150 @@
---
layout: post
title: "Columnarization in Rust"
date: 2014-12-15 17:00:00
categories: columnarization serialization rust
published: true
---

So like everyone without a job, I've started to learn [Rust](http://www.rust-lang.org). And like everyone who has started to learn Rust, I now feel it is very important to tell you about my experience with it.

The project I'll talk about is for **columnarization**, a technique from the database community for laying out structured records in a format that is more convenient for serialization than the records themselves. This is important if you want to send data around in binary format at high throughputs, which is indeed something I often enjoy.

The rough idea is that, starting from a vector of some type, or `Vec<T>` in Rust, we want to repeatedly reduce the complexity of the type `T`, at the cost of increasing the number of vectors we have on hand. There are roughly three types of rules we'll follow:

1. `Vec<uint>`: A vector of base types is our base case. Do nothing.

2. `Vec<(S, T)>`: A vector of pairs is transformed to a pair of vectors: `(Vec<S>, Vec<T>)`.

3. `Vec<Vec<T>>`: A vector of vectors goes to a pair `(Vec<uint>, Vec<T>)`, corresponding to the original vector lengths and their concatenated payloads.

These three rules, and generalizations thereof, allow us to transform vectors of fairly complex types into a sequence of vectors of simple types. These vectors of simple types are then very easy to serialize and deserialize, simply by casting typed vectors to and from byte vectors.

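To make the rules concrete, here is a minimal hand-written sketch (not code from the columnar repository) that applies rules 2 and 3 to a `Vec<(u64, Vec<u64>)>`, producing three flat vectors, and then reverses the process. It is written for current Rust, where `uint` has since been renamed `usize`; the function names `columnarize` and `decolumnarize` are hypothetical.

```rust
/// Hypothetical helper: apply rule 2 (split pairs) and rule 3 (lengths plus
/// concatenated payload) to a vector of `(u64, Vec<u64>)` records.
pub fn columnarize(records: Vec<(u64, Vec<u64>)>) -> (Vec<u64>, Vec<usize>, Vec<u64>) {
    let mut firsts = Vec::with_capacity(records.len());
    let mut lengths = Vec::with_capacity(records.len());
    let mut payload = Vec::new();
    for (first, inner) in records {
        firsts.push(first);        // rule 2: first field of each pair
        lengths.push(inner.len()); // rule 3: length of each inner vector
        payload.extend(inner);     // rule 3: inner contents, concatenated
    }
    (firsts, lengths, payload)
}

/// Hypothetical inverse: rebuild the records from the three flat vectors.
pub fn decolumnarize(firsts: Vec<u64>, lengths: Vec<usize>, payload: Vec<u64>) -> Vec<(u64, Vec<u64>)> {
    let mut records = Vec::with_capacity(firsts.len());
    let mut offset = 0;
    for (first, len) in firsts.into_iter().zip(lengths) {
        // Each record's inner vector is the next `len` entries of the payload.
        records.push((first, payload[offset..offset + len].to_vec()));
        offset += len;
    }
    records
}
```

For the input `[(1, [10, 11]), (2, []), (3, [30])]`, the empty inner vector contributes only a `0` to `lengths` and nothing to `payload`, which is why the lengths column is needed to find the record boundaries again.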
## Implementation ##

I've gone and tried this out in Rust. Naturally, I won't share the first few evolutions with you, because you might conclude that neither Rust nor Frank is much of anything you'd want to be associated with. However, the current state of the project really appeals to me, and shows off some things I don't think I could easily do in most other languages I am familiar with.

For the curious, the repository can be cloned from [https://github.com/frankmcsherry/columnar](https://github.com/frankmcsherry/columnar).

The trait ("interface") I've implemented is `ColumnarVec<T>`, whose role in life is to `push` and `pop` records, much like a `Vec<T>` in Rust. That is to say, it accepts records and adds them to its columnar stash, and it can return those records when asked. This isn't yet much more interesting than `Vec<T>`, so the `ColumnarVec<T>` also adds the ability to `encode` its contents into byte arrays, and to `decode` byte arrays back into its contents.

{% highlight rust %}
pub trait ColumnarVec<T>
{
    fn push(&mut self, record: T);
    fn pop(&mut self) -> Option<T>;

    fn encode(&mut self, buffers: &mut Vec<Vec<u8>>);
    fn decode(&mut self, buffers: &mut Vec<Vec<u8>>);
}
{% endhighlight %}

The intended use of a `ColumnarVec` is to repeatedly call `push` with your favorite records, decide that you'd like to `encode` them to binary, and ship the resulting arrays to a dear friend, who is then able to `decode` into her empty `ColumnarVec` and read the contents out using `pop`.

### uints, and basic types ###

For basic types, we have a very basic implementation of `ColumnarVec<T>`: we just use a `Vec<T>`. When records are pushed or popped, we simply use the underlying methods on `Vec`.

To `encode` we swap in an empty vector at `*self` (using `mem::swap`, which takes two mutable references), cast the swapped-out vector to a `Vec<u8>`, and push it onto the `Vec<Vec<u8>>` stack.

To `decode` we pop a byte vector from the stack, cast it to a `Vec<T>`, and then assign it to `*self`. This overwrites anything currently in the `ColumnarVec`, which ... perhaps is not expected behavior.

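Here is one way the base case might look in current Rust, as a sketch rather than the repository's code. It specializes to `u64`, names the trait parameters (current editions require that), and keeps the `mem::swap` step described above. One deliberate substitution: instead of reinterpreting the vector's memory as bytes with an unsafe cast, it copies each value through `to_le_bytes` and `from_le_bytes`, which is safe and portable but does more work than a cast.

```rust
use std::mem;

/// The post's trait, with parameter names added.
pub trait ColumnarVec<T> {
    fn push(&mut self, record: T);
    fn pop(&mut self) -> Option<T>;
    fn encode(&mut self, buffers: &mut Vec<Vec<u8>>);
    fn decode(&mut self, buffers: &mut Vec<Vec<u8>>);
}

/// Base case: a column of `u64` is just a `Vec<u64>`.
impl ColumnarVec<u64> for Vec<u64> {
    fn push(&mut self, record: u64) {
        Vec::push(self, record) // the inherent Vec method, not this trait method
    }
    fn pop(&mut self) -> Option<u64> {
        Vec::pop(self)
    }
    fn encode(&mut self, buffers: &mut Vec<Vec<u8>>) {
        // Swap an empty vector into *self, taking ownership of the full column.
        let mut column = Vec::new();
        mem::swap(self, &mut column);
        // Assumption: copy to little-endian bytes rather than cast in place.
        let bytes: Vec<u8> = column.iter().flat_map(|x| x.to_le_bytes()).collect();
        buffers.push(bytes);
    }
    fn decode(&mut self, buffers: &mut Vec<Vec<u8>>) {
        let bytes = buffers.pop().expect("no buffer left to decode");
        // As in the post, this overwrites whatever *self held before.
        *self = bytes
            .chunks_exact(8)
            .map(|chunk| {
                let mut word = [0u8; 8];
                word.copy_from_slice(chunk);
                u64::from_le_bytes(word)
            })
            .collect();
    }
}
```

A receiver that already holds data is simply overwritten by `decode`, which is the same caveat the base-type implementation carries.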
### Pairs, tuples, and structs ###

Pairs are a bit more interesting than base types, in that we want to destructure the pair into its component elements so that they can each be pushed into their corresponding vectors. More generally, we will only assume that the two types in the pair have implementations of `ColumnarVec` supporting them.

In Rust, as well as other sophisticated languages, we can indicate that any pair of types implementing `ColumnarVec<T1>` and `ColumnarVec<T2>` itself implements `ColumnarVec<(T1, T2)>`. We simply name the types, state the constraints, and then provide the implementation:

{% highlight rust %}
impl<T1, R1, T2, R2> ColumnarVec<(T1, T2)> for (R1, R2)
where R1: ColumnarVec<T1>,
      R2: ColumnarVec<T2>,
{
    fn push(&mut self, (x, y): (T1, T2))
    {
        self.0.push(x); // push into first ColumnarVec
        self.1.push(y); // push into second ColumnarVec
    }

    fn pop(&mut self) -> Option<(T1, T2)>
    {
        self.0.pop().map(|x| (x, self.1.pop().unwrap()))
    }

    fn encode(&mut self, buffers: &mut Vec<Vec<u8>>)
    {
        self.0.encode(buffers);
        self.1.encode(buffers);
    }

    fn decode(&mut self, buffers: &mut Vec<Vec<u8>>)
    {
        // buffers form a stack, so decode in the reverse order of encode
        self.1.decode(buffers);
        self.0.decode(buffers);
    }
}
{% endhighlight %}

One appealing part of Rust, and several other similar languages, is that one can specify an implementation in a fairly lightweight manner. Here, I've not even defined a new type to support the methods; I've just indicated that they exist for a family of pairs of two types.

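The pair rule composes with itself, which is how a type like `(uint, (uint, uint))` gets handled. The following self-contained sketch in current Rust (not the repository's code) nests the pair implementation over an assumed `u64` base case that copies bytes through `to_le_bytes` rather than casting. The aliases `Rec` and `Cols` are hypothetical. It shows why `decode` visits the fields in the reverse order of `encode`: the buffers form a stack, so the last column encoded is the first one popped.

```rust
use std::mem;

pub trait ColumnarVec<T> {
    fn push(&mut self, record: T);
    fn pop(&mut self) -> Option<T>;
    fn encode(&mut self, buffers: &mut Vec<Vec<u8>>);
    fn decode(&mut self, buffers: &mut Vec<Vec<u8>>);
}

// Assumed base case: one Vec<u64> per column, copied to and from little-endian bytes.
impl ColumnarVec<u64> for Vec<u64> {
    fn push(&mut self, record: u64) { Vec::push(self, record) }
    fn pop(&mut self) -> Option<u64> { Vec::pop(self) }
    fn encode(&mut self, buffers: &mut Vec<Vec<u8>>) {
        let column = mem::replace(self, Vec::new());
        buffers.push(column.iter().flat_map(|x| x.to_le_bytes()).collect());
    }
    fn decode(&mut self, buffers: &mut Vec<Vec<u8>>) {
        let bytes = buffers.pop().expect("no buffer left to decode");
        *self = bytes
            .chunks_exact(8)
            .map(|c| { let mut w = [0u8; 8]; w.copy_from_slice(c); u64::from_le_bytes(w) })
            .collect();
    }
}

// Pair rule: a pair of columnar containers stores a column of pairs.
impl<T1, R1, T2, R2> ColumnarVec<(T1, T2)> for (R1, R2)
where
    R1: ColumnarVec<T1>,
    R2: ColumnarVec<T2>,
{
    fn push(&mut self, (x, y): (T1, T2)) {
        self.0.push(x);
        self.1.push(y);
    }
    fn pop(&mut self) -> Option<(T1, T2)> {
        let x = self.0.pop()?;
        Some((x, self.1.pop().expect("pair columns out of sync")))
    }
    fn encode(&mut self, buffers: &mut Vec<Vec<u8>>) {
        self.0.encode(buffers); // pushed first ...
        self.1.encode(buffers); // ... pushed last
    }
    fn decode(&mut self, buffers: &mut Vec<Vec<u8>>) {
        self.1.decode(buffers); // popped first, since it was pushed last
        self.0.decode(buffers);
    }
}

// Hypothetical aliases for a `(uint, (uint, uint))`-shaped record, using u64.
pub type Rec = (u64, (u64, u64));
pub type Cols = (Vec<u64>, (Vec<u64>, Vec<u64>));
```

Encoding two such records produces three byte buffers, one per leaf column, and a fresh `Cols` can decode them and pop the records back out (last pushed, first popped).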
### Vectors and collections ###

Vectors, and variable-sized fields like `Option<T>`, are where things start to get a bit sticky with columnarization. This is not only where we need to do some non-trivial re-shaping of the data, but also where we need to start dealing with dynamically allocated memory ourselves: people calling `pop` are going to want to see some vectors in their results.

One of the very appealing parts of Rust is its notion of ownership of data, and this is indicated obliquely in the signature of the `push` method. Whereas the `ColumnarVec` itself is passed in as a reference (the `&mut self` oddness), the record of type `T` is unadorned, meaning it is the actual record and we own it now. Owning the record is great, because it means no one else owns it, and if we want to rip it into small pieces in the name of columnarization, we are free to do so.

A second appealing part of owning the record is that we also own all of the dynamically allocated memory the record owns, for example any `Vec` fields we might find. Obviously they hold valuable data we want to columnarize, but once that is done and the data are moved out, we can simply snag the `Vec` and hold on to it for future use, such as needing a `Vec` when someone calls `pop`. By stashing the vectors we can avoid much interaction with the allocator when repeatedly encoding and decoding, and that means performance.

Let's look at the `push` and `pop` methods for the implementation of `ColumnarVec<Vec<T>>`. The first performs the possibly unsurprising tasks of pushing the input vector's length into a `ColumnarVec<uint>` and pushing its contents into a `ColumnarVec<T>`. Once done with the vector's contents, it stashes the empty-but-allocated vector, which is the part that I thought was really clever. The `pop` method runs in the opposite direction: it reads the intended length, fetches a stashed vector (or allocates one if none exists), and then fills the vector's contents before returning it.

{% highlight rust %}
impl<T, R1, R2> ColumnarVec<Vec<T>> for (R1, R2, Vec<Vec<T>>)
where R1: ColumnarVec<uint>,
      R2: ColumnarVec<T>,
{
    fn push(&mut self, mut vector: Vec<T>)
    {
        self.0.push(vector.len());
        while let Some(record) = vector.pop() { self.1.push(record); }
        self.2.push(vector); // once empty, stash the vector
    }

    fn pop(&mut self) -> Option<Vec<T>>
    {
        if let Some(len) = self.0.pop()
        {
            // fetch or allocate a vector, then fill and return.
            let mut vector = self.2.pop().unwrap_or(Vec::new());
            for _ in range(0, len) { vector.push(self.1.pop().unwrap()); }
            Some(vector)
        }
        else { None }
    }

    // ... encode and decode call the corresponding methods on R1 and R2 ...
}
{% endhighlight %}

This approach to vector columnarization steers us clear of the traditional hazards of memory allocation inside what is meant to be a tight loop. In applications where one is repeatedly encoding, exchanging, and decoding data, in steady state the program will not need to allocate any new memory, which is an excellent position to be in.

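For completeness, here is a self-contained sketch of the vector rule in current Rust, including `encode` and `decode` bodies that the post elides. Those bodies are an assumption that mirrors the pair rule (encode lengths, then payload; decode in reverse). Lengths use `usize` (the old `uint`), payloads use `u64`, and the base columns are generated by a hypothetical macro `base_impl!` that copies bytes through `to_le_bytes` rather than casting. The alias `VecCols` is also hypothetical.

```rust
use std::mem;

pub trait ColumnarVec<T> {
    fn push(&mut self, record: T);
    fn pop(&mut self) -> Option<T>;
    fn encode(&mut self, buffers: &mut Vec<Vec<u8>>);
    fn decode(&mut self, buffers: &mut Vec<Vec<u8>>);
}

// Hypothetical macro: base-case columns for fixed-width integers.
macro_rules! base_impl {
    ($t:ty) => {
        impl ColumnarVec<$t> for Vec<$t> {
            fn push(&mut self, record: $t) { Vec::push(self, record) }
            fn pop(&mut self) -> Option<$t> { Vec::pop(self) }
            fn encode(&mut self, buffers: &mut Vec<Vec<u8>>) {
                let column = mem::replace(self, Vec::new());
                buffers.push(column.iter().flat_map(|x| x.to_le_bytes()).collect());
            }
            fn decode(&mut self, buffers: &mut Vec<Vec<u8>>) {
                const WIDTH: usize = mem::size_of::<$t>();
                let bytes = buffers.pop().expect("no buffer left to decode");
                *self = bytes
                    .chunks_exact(WIDTH)
                    .map(|c| { let mut w = [0u8; WIDTH]; w.copy_from_slice(c); <$t>::from_le_bytes(w) })
                    .collect();
            }
        }
    };
}
base_impl!(u64);
base_impl!(usize);

// Vector rule: lengths column, payload column, and a stash of empty vectors.
impl<T, R1, R2> ColumnarVec<Vec<T>> for (R1, R2, Vec<Vec<T>>)
where
    R1: ColumnarVec<usize>,
    R2: ColumnarVec<T>,
{
    fn push(&mut self, mut vector: Vec<T>) {
        self.0.push(vector.len());
        while let Some(record) = vector.pop() { self.1.push(record); }
        self.2.push(vector); // stash the empty-but-allocated vector
    }
    fn pop(&mut self) -> Option<Vec<T>> {
        let len = self.0.pop()?;
        // Reuse a stashed allocation if there is one, else allocate.
        let mut vector = self.2.pop().unwrap_or_default();
        for _ in 0..len { vector.push(self.1.pop().expect("payload column too short")); }
        Some(vector)
    }
    // Assumed bodies: same stack discipline as the pair rule.
    fn encode(&mut self, buffers: &mut Vec<Vec<u8>>) {
        self.0.encode(buffers);
        self.1.encode(buffers);
    }
    fn decode(&mut self, buffers: &mut Vec<Vec<u8>>) {
        self.1.decode(buffers);
        self.0.decode(buffers);
    }
}

pub type VecCols = (Vec<usize>, Vec<u64>, Vec<Vec<u64>>);
```

Because `push` stashes each emptied vector, a later `pop` hands back that same allocation, so a pushed vector's capacity survives the round trip. The base columns in this sketch, by contrast, allocate fresh byte buffers on every `encode`, so it does not achieve the allocation-free steady state on its own.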
## Performance ##

Measuring the performance of columnarization in Rust turned out to be harder than I expected. Mostly, coming from a managed JIT background, one could reasonably believe that the only effect of an optimizing compiler was to randomize line numbers. Rust and LLVM are a fair bit smarter, and without the right test harness they will just optimize away your program.

The harness is in [example.rs](https://github.com/frankmcsherry/columnar/blob/master/examples/example.rs), and tests out columnar encoding and decoding of a few different datatypes. Be warned that these numbers are likely optimistic, as fewer optimizations will apply in the wild.

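The post's harness lives in example.rs and is not reproduced here. As an illustration of the problem it has to solve, here is a minimal sketch of a throughput loop that uses `std::hint::black_box` (stable since Rust 1.66) to keep the optimizer from discarding work whose result is never used. The function `measure_roundtrip` is hypothetical, copies bytes rather than casting them, and its numbers should not be compared with the figures reported in the list that follows.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Hypothetical harness: encode a `u64` column to bytes and decode it back
/// `rounds` times. Returns (approximate bytes encoded per second, checksum).
pub fn measure_roundtrip(column: &[u64], rounds: usize) -> (f64, u64) {
    let start = Instant::now();
    let mut checksum = 0u64;
    for _ in 0..rounds {
        // black_box hides the input from the optimizer on every round ...
        let bytes: Vec<u8> = black_box(column)
            .iter()
            .flat_map(|x| x.to_le_bytes())
            .collect();
        let decoded: Vec<u64> = black_box(&bytes)
            .chunks_exact(8)
            .map(|c| {
                let mut w = [0u8; 8];
                w.copy_from_slice(c);
                u64::from_le_bytes(w)
            })
            .collect();
        // ... and folding the output into a checksum keeps the work observable.
        checksum = decoded.iter().fold(checksum, |acc, &x| acc.wrapping_add(x));
    }
    // Guard against a zero elapsed time on very small inputs.
    let seconds = start.elapsed().as_secs_f64().max(1e-9);
    let bytes_encoded = (column.len() * 8 * rounds) as f64;
    (bytes_encoded / seconds, black_box(checksum))
}
```

The checksum doubles as a correctness check: for the column `0..1000` and 10 rounds it must equal ten times the sum of the column.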
1. Serializing `uint` data goes very fast, because it is just copying data and casting a pointer, if that:

   `Encoding/decoding at 11.24GB/s`

2. Serializing `(uint, (uint, uint))` data goes less fast, but still quite fast:

   `Encoding/decoding at 5.29GB/s`

3. Serializing `Vec<Vec<uint>>` data goes slower, due to conditional logic in the loop, but is still fast:

   `Encoding/decoding at 2.26GB/s`

These throughput numbers will obviously vary as the shape of the data changes. Currently, working with `Option<T>` types is slowest, due to the large ratio of conditional logic to actual bytes moved.

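The post does not show how `Option<T>` is columnarized. One plausible layout, offered here as an assumption rather than the repository's approach, keeps a column of presence flags next to a column of the present values. The sketch below (current Rust, with an assumed `u64` base case that copies bytes, and a hypothetical alias `OptCols`) makes the cost visible: every `push` and `pop` branches on the flag, while a `None` moves no payload bytes at all.

```rust
use std::mem;

pub trait ColumnarVec<T> {
    fn push(&mut self, record: T);
    fn pop(&mut self) -> Option<T>;
    fn encode(&mut self, buffers: &mut Vec<Vec<u8>>);
    fn decode(&mut self, buffers: &mut Vec<Vec<u8>>);
}

// Assumed base case: a u64 column, copied to and from little-endian bytes.
impl ColumnarVec<u64> for Vec<u64> {
    fn push(&mut self, record: u64) { Vec::push(self, record) }
    fn pop(&mut self) -> Option<u64> { Vec::pop(self) }
    fn encode(&mut self, buffers: &mut Vec<Vec<u8>>) {
        let column = mem::replace(self, Vec::new());
        buffers.push(column.iter().flat_map(|x| x.to_le_bytes()).collect());
    }
    fn decode(&mut self, buffers: &mut Vec<Vec<u8>>) {
        let bytes = buffers.pop().expect("no buffer left to decode");
        *self = bytes
            .chunks_exact(8)
            .map(|c| { let mut w = [0u8; 8]; w.copy_from_slice(c); u64::from_le_bytes(w) })
            .collect();
    }
}

// Assumed layout for Option<T>: presence flags plus a column of present values.
impl<T, R> ColumnarVec<Option<T>> for (Vec<bool>, R)
where
    R: ColumnarVec<T>,
{
    fn push(&mut self, record: Option<T>) {
        match record {
            // One branch per record, whatever the payload size.
            Some(value) => {
                self.0.push(true);
                self.1.push(value);
            }
            None => self.0.push(false),
        }
    }
    fn pop(&mut self) -> Option<Option<T>> {
        let present = self.0.pop()?;
        if present {
            Some(Some(self.1.pop().expect("value column too short")))
        } else {
            Some(None)
        }
    }
    fn encode(&mut self, buffers: &mut Vec<Vec<u8>>) {
        let flags = mem::replace(&mut self.0, Vec::new());
        buffers.push(flags.into_iter().map(|b| b as u8).collect());
        self.1.encode(buffers);
    }
    fn decode(&mut self, buffers: &mut Vec<Vec<u8>>) {
        self.1.decode(buffers); // encoded last, so decoded first
        let flags = buffers.pop().expect("no buffer left to decode");
        self.0 = flags.into_iter().map(|b| b != 0).collect();
    }
}

pub type OptCols = (Vec<bool>, Vec<u64>);
```

With this layout, the records `Some(5), None, Some(7)` become the flags `[true, false, true]` and the values `[5, 7]`, so one byte of flag is handled per record while only present values carry eight bytes each.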
These are great numbers in my experience, and for about 150 lines of code all written in the base language (rather than code-gened nonsense), I am delighted.