JS monorepos in prod 5: merging Git repositories and preserving commit history
At Adaltas, we maintain several open-source Node.js projects organized as Git monorepos and published on NPM. We have shared our experience of working with Lerna monorepos in a series of articles.
It is now the turn of our popular open-source Node CSV project to be migrated to a monorepo. This article walks you through the available approaches, techniques, and tools used to migrate multiple Node.js projects hosted on GitHub into a Lerna monorepo. At the end, we provide the bash script we used to migrate the Node CSV project. This script can be applied to a different project with only minor modifications.
Requirements for migration
The Node CSV project combines 4 NPM packages for working with CSV files in Node.js, wrapped by the umbrella csv package. Each NPM package has its own rich commit history, and we wanted to preserve as much information as possible from the old repositories. Here are our requirements for the migration:
- preserve commit history with maximum information (such as tags, their messages, and merge commits)
- improve commit messages to follow the Conventional Commits specification
- preserve GitHub issues
In total, we have 5 NPM packages to migrate to the Lerna monorepo: csv, csv-generate, csv-parse, csv-stringify, and stream-transform.
We want to achieve a directory structure that looks like this:
packages/
  csv/
  csv-generate/
  csv-parse/
  csv-stringify/
  stream-transform/
lerna.json
package.json
Choosing Git log strategy
When migrating repositories into a monorepo, you merge their commit logs. There are 3 suggested strategies in the image below.
- Single branch
It provides a straightforward log containing only commits on the default (master) branches of all packages. Different logs are joined sequentially by adding the latest commit of the previous package as a parent commit to the first commit of the next package. This strategy breaks the sorting of the log by the date of commits.
- Multiple branches with a common parent
This improves the visual perception of the log by splitting branches of different repositories. A new parent commit is added to all the first commits of the branches. In the end, all the branches are merged into the default branch.
- Multiple branches with different parents
This strategy doesn’t rewrite the first commits of old repositories. It requires minimal intervention into commit history and seems logically more correct because initially, the repositories didn’t have a common parent.
Merging commit logs
Lerna has a built-in mechanism for gathering existing standalone NPM packages into a monorepo while preserving commit history. The lerna import command imports a package from an external repository into packages/. The sequence of commands is pretty simple: you need to initialize the Git and Lerna repositories, make the first commit, and then start importing packages from locally cloned Git repositories. You can find basic usage instructions in the documentation here.
With lerna import, you can only follow the 1st or the 2nd Git log strategy described above. For the 2nd one, you need to create a separate branch per imported repository like this:
git checkout -b package-1
lerna import /path/to/package-1
git checkout master
git checkout -b package-2
lerna import /path/to/package-2
lerna import provides an easy-to-use tool to migrate repositories to the Lerna monorepo. However, it flattens the commit history reducing merge commits, and it doesn’t migrate tags and their messages. Unfortunately, these limitations didn’t meet our requirement to save maximum information from existing repositories and we had to use a different tool.
The git merge command supports merging unrelated histories with the --allow-unrelated-histories option. It preserves the full commit history of the targeted branch along with its tags. In this case, you achieve the 3rd Git log strategy.
Merging the commit history of an external repository into the current one using --allow-unrelated-histories is as simple as running 2 commands:
git remote add -f <external-repo-name> <external-repo-path>
git merge --allow-unrelated-histories <external-repo-name>/<branch-name>
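To see the two commands in action without touching a real project, here is a minimal sketch that builds two throwaway repositories and merges one into the other. The repository names, file names, and commit messages are hypothetical and only serve the illustration; the default branch name is detected rather than assumed.

```shell
set -e
# Identity for the demo commits (hypothetical values)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# Helper: create a repository containing a single commit
make_repo() {
  git init -q "$1"
  echo "$2" > "$1/$2.txt"
  git -C "$1" add .
  git -C "$1" commit -q -m "initial commit of $2"
}

make_repo "$work/monorepo" root
make_repo "$work/package-1" package-1

# The default branch depends on the local Git configuration
branch=$(git -C "$work/package-1" symbolic-ref --short HEAD)

cd "$work/monorepo"
# Register the external repository and fetch its history right away (-f)
git remote add -f package-1 "$work/package-1"
# Join the two histories, which share no common ancestor
git merge -q --allow-unrelated-histories "package-1/$branch" -m "merge package-1"

# The graph shows two root commits joined by one merge commit
git log --oneline --graph
```

Both initial commits stay untouched, which is what distinguishes this strategy from the first two.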
Rewriting commit messages
To put more order and transparency into the combined commit log, we prefix all commit messages with their package names. Additionally, we make them compatible with the Conventional Commits specification which we follow in our latest projects. This specification standardizes the commit messages making them more readable and easy to automate.
To implement this, we need to rewrite all commit messages by prefixing them with a string like chore(<package-name>): followed by the original message. We chose the chore type just to make the messages compatible with the specification, because we didn't want to write complex regular expressions to fully support it.
There are 2 tools to rewrite commit messages: git filter-branch and git filter-repo.
Following the Git recommendation, we chose git filter-repo. After installing the tool using these instructions, the command to rewrite the commit messages of the current repository is:
git filter-repo --message-callback 'return b"chore(<package-name>): " + message'
To see more usage examples of rewriting repository history with git filter-repo, you can follow this documentation.
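After a rewrite, it can be useful to confirm that every commit subject carries the expected prefix. The sketch below is one possible check, not part of the original migration: the function name non_conventional_commits is hypothetical, and the regular expression only covers the scoped type(scope): form used in this article, not the full Conventional Commits grammar.

```shell
set -e
# Identity for the demo commits (hypothetical values)
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Print the subject of every commit that does not start with "type(scope): "
non_conventional_commits() {
  git -C "$1" log --format=%s | grep -Ev '^[a-z]+\([a-z0-9-]+\): ' || true
}

# Throwaway repository with one compliant and one non-compliant commit
demo=$(mktemp -d)
git init -q "$demo"
git -C "$demo" commit -q --allow-empty -m "chore(csv-parse): add parser"
git -C "$demo" commit -q --allow-empty -m "fix typo"

non_conventional_commits "$demo"   # prints: fix typo
```

An empty output means that all messages in the repository follow the pattern.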
Transferring GitHub issues
After migrating the repositories and publishing the new monorepo to GitHub, we want to transfer the existing GitHub issues from the old repositories. Issues can be transferred from one repository to another using the GitHub interface. You can follow this guide for instructions.
Unfortunately, at the time of this writing, there is no way to transfer issues in bulk. Issues must be transferred one by one. But this can give you an excuse to “forget” to transfer annoying pending issues created by the project community ;)
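The one-by-one transfers can still be scripted. The sketch below assumes the GitHub CLI (gh) is installed and authenticated, which is not something the original migration relied on. It uses the gh issue transfer subcommand in a loop; the function name transfer_issues and the DRY_RUN variable are hypothetical. By default it only prints the commands it would run.

```shell
# Read issue numbers on stdin and transfer each one from $1 to $2.
# With DRY_RUN=1 (the default), the commands are printed, not executed.
transfer_issues() {
  src=$1
  dest=$2
  while read -r number; do
    [ -n "$number" ] || continue
    if [ "${DRY_RUN:-1}" = 1 ]; then
      echo "gh issue transfer $number $dest --repo $src"
    else
      gh issue transfer "$number" "$dest" --repo "$src"
    fi
  done
}

# Dry run with two made-up issue numbers
printf '12\n15\n' | transfer_issues adaltas/node-csv-parse adaltas/node-csv

# Real usage (requires network access and write permissions):
# gh issue list --repo adaltas/node-csv-parse --state open --limit 1000 \
#   --json number --jq '.[].number' \
#   | DRY_RUN=0 transfer_issues adaltas/node-csv-parse adaltas/node-csv
```

Each issue is still transferred individually, so you keep the option of filtering the list before piping it in.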
What about GitHub pull requests? They are lost in the migration, and we have to live with it. The good news is that references to issues written in comments and linked pull requests are preserved thanks to GitHub redirects.
The migration bash script leverages the approaches and tools described above. It generates the ./node-csv directory containing the Node CSV project files reorganized as a Lerna monorepo.
#!/bin/bash
# Arrays are a bash feature, so the script must not run under a plain POSIX sh.
set -e

# 1. Configuration
REPOS=(
  https://github.com/adaltas/node-csv
  https://github.com/adaltas/node-csv-generate
  https://github.com/adaltas/node-csv-parse
  https://github.com/adaltas/node-csv-stringify
  https://github.com/adaltas/node-stream-transform
)
OUTPUT_DIR=node-csv
PACKAGES_DIR=packages
# Fall back to /tmp when TMPDIR is not defined in the environment
TMPDIR=${TMPDIR:-/tmp}

# 2. Initialize a new repository
rm -rf "$OUTPUT_DIR" && mkdir "$OUTPUT_DIR" && cd "$OUTPUT_DIR"
git init .
git remote add origin "${REPOS[0]}"

# 3. Migrate the repositories one by one
for repo in "${REPOS[@]}"; do
  # 3.1. Get package name: last URL segment without the "node-" prefix
  package=${repo##*/}
  package=${package#node-}

  # 3.2. Rewrite commit messages via a temporary repository
  rm -rf "$TMPDIR/$package" && mkdir "$TMPDIR/$package" && git clone "$repo" "$TMPDIR/$package"
  git filter-repo \
    --source "$TMPDIR/$package" \
    --target "$TMPDIR/$package" \
    --message-callback "return b'chore($package): ' + message"

  # 3.3. Merge the repository into monorepo
  git remote add -f "$package" "$TMPDIR/$package"
  git merge --allow-unrelated-histories "$package/master" -m "chore($package): merge branch 'master' of $repo"

  # 3.4. Move repository files to the packages folder
  mkdir -p "$PACKAGES_DIR/$package"
  find . -mindepth 1 -maxdepth 1 ! -name .git ! -name "$PACKAGES_DIR" \
    -exec mv {} "$PACKAGES_DIR/$package/" \;
  git add .
  git commit -m "chore($package): move all package files to $PACKAGES_DIR/$package"

  # 3.5. Create a new branch
  git branch "init/$package" "$package/master"
done

# 4. Cleanup and remove outdated files
rm -f "$PACKAGES_DIR"/*/CONTRIBUTING.md
rm -f "$PACKAGES_DIR"/*/CODE_OF_CONDUCT.md
rm -rf "$PACKAGES_DIR"/*/.github
git add .
git commit -m "chore: remove outdated packages files"
To run this script, create a file, for example named migrate.sh, paste the script's content inside it, make it executable, and run it with the commands:
chmod u+x ./migrate.sh
./migrate.sh
Note! Don't forget to install git-filter-repo before running the script.
Notes for each step of the script:
1.Configuration
Configuration variables define the list of repositories to be migrated, the destination directory of the new Lerna monorepo, and the folder for packages inside it. You can modify these variables to reuse this script for your project.
2.Initialize a new repository
We initialize a new repository. The first repository of the list is also registered as the origin remote.
3.1.Get package name
This step extracts the package name from each repository link. In our case, the repository names are prefixed with node-, which we don't want to keep.
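One way to perform this extraction is with shell parameter expansion alone, shown here in isolation. The function name package_name is hypothetical, and the handling of a trailing .git suffix is an addition that the repository links in this article do not need.

```shell
# Derive a package name from a repository URL
package_name() {
  name=${1##*/}        # drop everything up to the last "/"
  name=${name%.git}    # tolerate a trailing ".git" (assumption, see above)
  echo "${name#node-}" # drop the leading "node-" prefix, if any
}

package_name https://github.com/adaltas/node-csv-parse         # csv-parse
package_name https://github.com/adaltas/node-stream-transform  # stream-transform
```

Repositories whose names do not start with node- pass through unchanged, so the same helper can be reused for other projects.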
3.2.Rewrite commit messages via a temporary repository
To prefix the commits of each package with the pattern chore(<package-name>): , we need to rewrite the messages separately for every repository. This is possible via a repository locally cloned to a temporary folder.
3.3.Merge the repository into monorepo
First, we add the locally cloned repository as a remote of the monorepo. Then, we merge its commit history, specifying a merge commit message.
3.4.Move repository files to the packages folder
After merging, the files of the merged repository appear under the monorepo root directory. Following the structure we want to achieve, we move those files to the packages directory and commit the change.
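The move itself can be tried out on a scratch directory before running it on real repositories. The following sketch uses find with -exec, which copes with hidden files and names containing spaces. The file names and the csv-parse package name are placeholders, and the .git directory here is only a stand-in, not a real repository.

```shell
set -e
demo=$(mktemp -d)
cd "$demo"

# Stand-ins for a freshly merged package sitting at the monorepo root
mkdir .git lib
touch README.md package.json .gitignore "file with spaces.txt"

# Destination folder of the package inside the monorepo
mkdir -p packages/csv-parse

# Move every top-level entry except .git and packages/
find . -mindepth 1 -maxdepth 1 ! -name .git ! -name packages \
  -exec mv {} packages/csv-parse/ \;

ls -A packages/csv-parse
```

In the real migration, the move is followed by git add and git commit so that Git records the files under their new path.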
3.5.Create a new branch
The commit history is now associated with our monorepo through a remote repository. The history will be lost if the original repository is erased. To store the history in the monorepo, we create a branch which tracks the remote repository and prefix its name with init/.
4.Cleanup and remove outdated files
For the sake of illustration, we clean up some package files that the migration made obsolete. Some of those files should be moved to the repository root directory.
The Git repository is now ready and, as such, qualifies as a monorepo. To make it usable, additional files must be created, such as a root package.json file, the lerna.json configuration file if using Lerna, and a README file. Refer to the first article of our series to apply the necessary changes and initialize your monorepo with Lerna.
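As a starting point, the two configuration files can be generated with a few lines of shell. This is only a minimal sketch under stated assumptions: the function name write_root_files, the package name node-csv-monorepo, and the choice of independent versioning are all hypothetical and may differ from the configuration actually used by the Node CSV project.

```shell
set -e

# Write a minimal lerna.json and root package.json into directory $1
write_root_files() {
  cat > "$1/lerna.json" <<'EOF'
{
  "packages": ["packages/*"],
  "version": "independent"
}
EOF
  cat > "$1/package.json" <<'EOF'
{
  "name": "node-csv-monorepo",
  "private": true
}
EOF
}

# Try it on a scratch directory
root=$(mktemp -d)
write_root_files "$root"
ls "$root"
```

The private flag keeps the root package from being published to NPM by accident, while the packages under packages/ remain publishable.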
Migrating existing open-source projects requires you to be tidy and meticulous because a small mistake can disrupt the work of your users. All the steps must be carefully analyzed and well tested. In this article, we covered the scope of work needed to migrate multiple Node.js projects to a Lerna monorepo. We considered different approaches, techniques, and available tools to automate the migration, using our Node CSV open-source project as an example.