docs/design/2021-09-13-parser-as-submodule-of-tidb.md
This is a refined version of the draft thread Move parser back to pingcap/tidb.
In short, I want to move pingcap/parser back to the pingcap/tidb repository in the form of a Go sub-module.
I will explain why and how to migrate the parser back to the TiDB repository.
The original PR that moved the parser out is tidb#7923: Move TiDB parser to a separate repository. When we migrated to Go modules (Go 1.11), @tiancaiamao decided to do so, because:
We can put the generated parser.go in the repository directly, rather than generate it using Makefile script. The drawback of this way is that every time parser.y is touched, there will be many lines change in parser.go. Then the TiDB repo will be inflate quickly. A better way is moving parser to a separated repo, so every time parser is changed, only go.mod need to change accordingly.
Personally, that is a weird argument to me. Indeed, pingcap/tidb does not store the diff of parser.go anymore, but pingcap/parser does. How is moving the inflation from one repo to another any better?
Another argument is about the messy dependency. The separation does improve the case.
In fact, it isn't a big problem. However, the development becomes inconvenient. You could argue that we have workarounds, but why not make the development easier?
The problems above defeat the earlier argument that "only go.mod need to change accordingly". We only split the code into two repositories; the actual work did not decrease. Our development experience, however, did get worse.
Obviously, we could move back to the good old way. But it must be a Go sub-module, because:
The detailed plan:
replace github.com/pingcap/parser => ./parser.
pingcap/tidb/parser. I mean something like import ( . github.com/pingcap/tidb/parser/xxx); creating a dummy function will eliminate the unused-import error. All code of pingcap/parser will be removed in this step. This will deliver new updates without the need to migrate import paths.

There will be a tracking issue, and the whole progress is public. Step 1 or 2 will likely take one or two weeks. Internally we could do the migration for internal tools while doing step 3.
Steps 4 and 5 are long-running tasks to smoothly end support for the old parser.
Nothing needs to be done before step 5. However, one will need to migrate the import path eventually, before the final deprecation.
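The import path migration mentioned above is mechanical, and a minimal sketch of the rewrite rule may make that clearer. It assumes the new module path is github.com/pingcap/tidb/parser, as used in the plan. The function name migrateImportPath and the sample path github.com/pingcap/parser-tools are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	oldParserPath = "github.com/pingcap/parser"
	newParserPath = "github.com/pingcap/tidb/parser" // assumed target path, taken from the plan
)

// migrateImportPath (hypothetical helper) rewrites an import path rooted at
// the old parser module so that it points at the sub-module inside
// pingcap/tidb. Paths that merely share the prefix string are left unchanged.
func migrateImportPath(path string) string {
	if path == oldParserPath {
		return newParserPath
	}
	if strings.HasPrefix(path, oldParserPath+"/") {
		return newParserPath + strings.TrimPrefix(path, oldParserPath)
	}
	return path
}

func main() {
	for _, p := range []string{
		"github.com/pingcap/parser",
		"github.com/pingcap/parser/ast",
		"github.com/pingcap/parser-tools", // hypothetical unrelated module
	} {
		fmt.Printf("%s -> %s\n", p, migrateImportPath(p))
	}
}
```

Matching on the prefix plus a trailing slash, rather than on the bare prefix, keeps unrelated modules whose names merely start with the same string from being rewritten.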
pkg -> forked tidb -> other pingcap/parser. Nothing is different for the complex dependencies. Just do the migration before the deprecation. We will not actually remove the old commits, so old apps will not break.
It is not a problem. pingcap/parser only has unit tests; its integration tests are the make gotest target of pingcap/tidb. We could run integration tests of tidb, and only run unit tests.
Since a sub-module is merely a module nested inside another module, we need to release it separately from the main module. That means our release team needs to tag parser/vX.X.X, which can be handled by a script, thanks to the identical release cycle and version of parser and tidb.
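The tagging rule is simple enough to sketch. Go expects the version tag of a module that lives in a subdirectory to be prefixed with that subdirectory, so the parser tag can be derived from the tidb tag. The following is only an illustration of what a release script could compute: the helper name subModuleTag and the version v5.3.0 are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// subModuleTag (hypothetical helper) returns the Git tag for a Go module that
// lives in subdirectory dir of a repository, given the repository-level
// version tag. For the root module (empty dir) the tag is the version itself.
func subModuleTag(dir, version string) (string, error) {
	if !strings.HasPrefix(version, "v") {
		return "", fmt.Errorf("version %q must start with \"v\"", version)
	}
	dir = strings.Trim(dir, "/")
	if dir == "" {
		return version, nil
	}
	return dir + "/" + version, nil
}

func main() {
	// "v5.3.0" is an illustrative version, not one taken from the proposal.
	tag, err := subModuleTag("parser", "v5.3.0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(tag)
}
```

Because parser and tidb share the same version, the release script only needs the tidb version as input and can create both tags on the same commit.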
We cannot simply remove it. Applications that import pingcap/parser need the parser.go file in order to build, but Go modules do not support anything like build scripts in Rust or postbuild in Node.js to auto-generate parser.go.
One solution is to include the file only for released versions. But TiDB does not support package semver, so go get xxx/parser would not simply give a buildable package. That would be too frustrating for users.