March 18, 2021
Scan 1 PB at 300 MB/s (SATA revision 2)
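To put the number above in perspective, here is a back-of-the-envelope calculation. It is a sketch that assumes decimal units (1 PB = 10^15 bytes), a sustained 300 MB/s per disk, and perfectly even scaling across workers; the name `scan_days` is illustrative only.

```python
PETABYTE = 10**15            # bytes, decimal units (assumption)
THROUGHPUT = 300 * 10**6     # bytes per second for one disk

def scan_days(total_bytes, bytes_per_second, workers=1):
    """Days needed to scan total_bytes, split evenly over `workers` disks."""
    seconds = total_bytes / (bytes_per_second * workers)
    return seconds / 86_400  # seconds per day

single = scan_days(PETABYTE, THROUGHPUT)                   # about 38.6 days
thousand = scan_days(PETABYTE, THROUGHPUT, workers=1000)   # under one hour
print(f"1 disk: {single:.1f} days, 1000 disks: {thousand * 24 * 60:.0f} minutes")
```

One disk alone needs more than a month, which is the motivation for splitting the scan across many workers.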
Today we want to clearly see the communications.
Replication vs. Partitioning (Sharding)
Look familiar?
Can we run each worker on one partition?
N partitions in, N partitions out
Trick question, just combines partitions!
N partitions in, 1 partition out
Aggregate needs to process N partitions.
Final aggregate only needs to process N tuples.
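The two aggregation lines above can be sketched as a two-phase SUM. This is a minimal illustration, assuming each partition is a plain list of numbers; `partial_sum` and `final_sum` are hypothetical names, not part of any system discussed here.

```python
def partial_sum(partition):
    """Phase 1: runs on each worker, next to its own partition."""
    return sum(partition)

def final_sum(partials):
    """Phase 2: runs once, over one tuple per partition."""
    return sum(partials)

partitions = [[1, 2, 3], [4, 5], [6], [7, 8, 9, 10]]   # N = 4 partitions

partials = [partial_sum(p) for p in partitions]        # N tuples: [6, 9, 6, 34]
total = final_sum(partials)                            # 55

print(partials, total)
```

Only the N partial results cross the network, not the partitions themselves. The same split works for COUNT, MIN, and MAX; AVG needs each worker to send a (sum, count) pair.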
Every partition from one table needs to pair
with every partition from the other.
(R1⋈S1) ⊎ … ⊎ (R1⋈SK) ⊎ … ⊎ (RN⋈S1) ⊎ … ⊎ (RN⋈SK)

⋈ | S1 | S2 | S3 | S4 |
---|---|---|---|---|
R1 | R1⋈S1 | R1⋈S2 | R1⋈S3 | R1⋈S4 |
R2 | R2⋈S1 | R2⋈S2 | R2⋈S3 | R2⋈S4 |
R3 | R3⋈S1 | R3⋈S2 | R3⋈S3 | R3⋈S4 |
R4 | R4⋈S1 | R4⋈S2 | R4⋈S3 | R4⋈S4 |
N workers get us only √N scaling (√N × √N partition pairs)
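The all-pairs strategy can be checked on toy data. In this sketch the tuples, the round-robin partitioning, and the nested-loop `join` helper are all assumptions made for illustration. It confirms that the bag union of every pairwise join equals the join of the full tables.

```python
from itertools import product

def join(r_part, s_part):
    """Nested-loop equi-join on the first field of each tuple."""
    return [(r, s) for r in r_part for s in s_part if r[0] == s[0]]

R = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
S = [(1, "w"), (2, "x"), (2, "y"), (5, "z")]

# Round-robin partitioning: matching keys can land in any partition.
r_parts = [R[i::2] for i in range(2)]
s_parts = [S[i::2] for i in range(2)]

# Every R partition must be paired with every S partition.
pairs = list(product(r_parts, s_parts))
pairwise = [row for r_part, s_part in pairs for row in join(r_part, s_part)]

print(len(pairs), "pairwise joins produce", len(pairwise), "result rows")
```

With 2 partitions per table there are already 4 pairwise joins, so the number of workers needed grows with the square of the partitioning factor.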
⋈ | S1 | S2 | S3 | S4 |
---|---|---|---|---|
R1 | R1⋈S1 | R1⋈S2 | R1⋈S3 | R1⋈S4 |
R2 | R2⋈S1 | R2⋈S2 | R2⋈S3 | R2⋈S4 |
R3 | R3⋈S1 | R3⋈S2 | R3⋈S3 | R3⋈S4 |
R4 | R4⋈S1 | R4⋈S2 | R4⋈S3 | R4⋈S4 |
Back to N scaling for N workers
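One way to get back to N scaling is to partition both tables on the join key, so that only matching partition pairs (the diagonal) can produce output. The sketch below assumes integer join keys and uses `key % n` as a stand-in hash function; `hash_partition` and the sample tuples are hypothetical.

```python
def join(r_part, s_part):
    """Nested-loop equi-join on the first field of each tuple."""
    return [(r, s) for r in r_part for s in s_part if r[0] == s[0]]

def hash_partition(rows, n):
    """Route each row to partition (join key mod n)."""
    parts = [[] for _ in range(n)]
    for row in rows:
        parts[row[0] % n].append(row)
    return parts

R = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
S = [(1, "w"), (2, "x"), (2, "y"), (5, "z")]
N = 2

r_parts = hash_partition(R, N)
s_parts = hash_partition(S, N)

# Only the diagonal pairs Ri ⋈ Si need a worker.
aligned = [row for i in range(N) for row in join(r_parts[i], s_parts[i])]

# Off-diagonal pairs cannot match, because equal keys share a partition.
off_diagonal = [row
                for i in range(N) for j in range(N) if i != j
                for row in join(r_parts[i], s_parts[j])]

print(len(aligned), len(off_diagonal))
```

N workers now cover N partition pairs, at the cost of first shuffling both tables by join key.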
What if the partitions aren't aligned so nicely?
Can we do better?
Focus on R1 ⋈_B S1
Problem: All tuples in R1 and S1 need to be
sent to the same worker.
Idea 1: Put the worker on the node that has the data!
Problem: What if the data is on 2 different nodes?
Idea 1.b: Put the worker on one of the nodes with data.
Can we reduce network use more?
Problem: Worker 2 is still sending a lot of data.
Idea: Compress π_B(S1)
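Before compressing anything, it helps to see what shipping the projection of S1 onto B buys. The sketch below is one possible reading of the plan, not necessarily the exact scheme on the slides: the node holding S1 sends only its distinct B values, and the node holding R1 ships only the rows that can match. The schema (B as the first field) and the sample rows are assumptions.

```python
# Rows are (B, payload); B is the join column (assumed schema).
R1 = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
S1 = [(2, "x"), (4, "y"), (4, "z"), (9, "q")]

# Step 1: the S1 node sends only the distinct join values, i.e. the projection on B.
b_values = {b for b, _ in S1}

# Step 2: the R1 node filters locally and ships only rows that can join.
r1_reduced = [row for row in R1 if row[0] in b_values]

# Step 3: the join runs against the reduced R1.
result = [(r, s) for r in r1_reduced for s in S1 if r[0] == s[0]]

print(len(R1), "rows in R1, only", len(r1_reduced), "shipped")
```

The set of B values is still proportional to the number of distinct keys in S1, which is why a smaller, approximate representation of that set is attractive.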
(not all errors are equal)
User: Is Alice part of the set? | filter: Yes |
User: Is Eve part of the set? | filter: No |
User: Is Fred part of the set? | filter: Yes |
Test always returns Yes if the element is in the set.
Test usually returns No if the element is not in the set.
A bloom filter is an array of bits.
M: Number of bits in the array.
K: Number of hash functions.
Each bit vector has ∼K bits set.
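The definitions above translate into a small class. This is a minimal sketch, assuming the K hash functions are derived from SHA-256 with a per-function prefix; the class and method names are illustrative and not taken from any particular library.

```python
import hashlib

class BloomFilter:
    def __init__(self, m, k):
        self.m = m        # M: number of bits in the array
        self.k = k        # K: number of hash functions
        self.bits = 0     # the bit array, stored as one Python integer

    def key_bits(self, key):
        """Bit vector for one key: up to K bits set (fewer if hashes collide)."""
        vector = 0
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            position = int.from_bytes(digest[:8], "big") % self.m
            vector |= 1 << position
        return vector

    def add(self, key):
        """Insert: OR the key's bit vector into the filter."""
        self.bits |= self.key_bits(key)

    def might_contain(self, key):
        """Test: every bit of the key's vector must already be set."""
        vector = self.key_bits(key)
        return (self.bits & vector) == vector

bloom = BloomFilter(m=64, k=3)
for name in ["Alice", "Bob"]:
    bloom.add(name)

print(bloom.might_contain("Alice"), bloom.might_contain("Bob"))
```

An inserted key always tests positive, because its bits were ORed in. A key that was never inserted usually tests negative, but may test positive if other keys happen to cover all of its bits.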
Key1 | 00101010 |
Key2 | 01010110 |
Key3 | 10000110 |
Key4 | 01001100 |
Key1&01111110 | 00101010 | ✅ |
Key3&01111110 | 00000110 | ❌ |
Key4&01111110 | 01001100 | ✅ |
(False positive)