tidb/join/README.md
This is the week 4 homework for PingCAP Talent Plan Online. It is a simplified version of the ACM SIGMOD Programming Contest 2018.
The task is to evaluate batches of join queries on a set of pre-defined relations. Each join query specifies two relations, multiple (equality) join predicates, and one (sum) aggregation. The challenge is to fully utilize the CPU and memory resources and execute the queries as fast as possible.
NOTE: Go 1.12 is required.
The simple interface Join(f0, f1 string, offset0, offset1 []int) (sum uint64) is defined in join.go. On each call, our test harness feeds two relations and two arrays of column offsets to the interface and checks the correctness of the output. The four input arguments and one output argument of the interface are explained as follows:
The (equality) join predicates are specified by offset0 and offset1. The join predicates take the form:
relation0.cols[offset0[0]] = relation1.cols[offset1[0]] and relation0.cols[offset0[1]] = relation1.cols[offset1[1]]...
Example: Join("/path/T0", "/path/T1", []int{0, 1}, []int{2, 3})
Translated to SQL:
SELECT SUM(T0.COL0)
FROM T0 JOIN T1
ON T0.COL0=T1.COL2 AND T0.COL1=T1.COL3
We provide a sample, JoinExample in join_example.go, which performs a simple hash join. It uses relation0 to build the hash table and probes the hash table for every row in relation1.
join_example.go:JoinExample.
pprof.
Note:
We use BenchmarkJoin and BenchmarkJoinExample, which can be found in benchmark_test.go, to evaluate your program. Test data will NOT be outside of what we've provided.
Implement Join in join.go to accomplish this task.
t. You can load them into a DBMS to test your program.
JoinTest is defined in join_test.go. You can write your own unit tests, but please make sure JoinTest passes first.
Use make test to run all the unit tests.
Use make bench to run all the benchmarks.