Monday, June 15, 2020

Export tfvc to git.

Several people on the internet say they used git-tfs to export and that it took some manual fixing to get it to look OK, but they never went into detail about what that manual fixing was. This is what I did.

Git-tfs ran for me at about 6.8 commits/min, at about 5.97 files/commit. That was measured over a batch of about 300 commits with 1792 file changes, which took about 44 minutes. The final result was 27,353 commits (from 8 years of TFVC) and a 550 MB repository. This took about a month of running in the background, with periodic restarts. It didn't run over weekends, because mandatory restarts would happen "outside work hours", and branch renames needed manual attention. (I probably could have scripted that, but I didn't figure out how to find the renames other than by looking at deleted branches in Source Control Explorer or by reviewing the messages output by git-tfs.) I used 47 grafts, a few to fix history breaks but mostly for renames. If you don't have any renames it will run faster.
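As a rough sanity check on those numbers, here is a small worked calculation. It assumes (and this is only an assumption) that the measured 300-commit batch is representative of the whole history; the variable names are made up for the example.

```python
# Figures quoted in the post.
batch_commits = 300
batch_file_changes = 1792
batch_minutes = 44
total_commits = 27_353

commits_per_min = batch_commits / batch_minutes        # about 6.8
files_per_commit = batch_file_changes / batch_commits  # about 5.97

# Assumption: every commit fetches at the same rate as the measured batch.
estimated_hours = total_commits / commits_per_min / 60

print(f"{commits_per_min:.1f} commits/min, {files_per_commit:.2f} files/commit")
print(f"about {estimated_hours:.0f} hours of pure fetch time")
```

Under that assumption the raw fetching is only a few days of wall-clock time, so most of the month would be restarts, redone sections and rename handling rather than git-tfs throughput.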

Useful gitk views:

Add to .config/git/gitk:
set permviews {{Remote-date {} {--remotes --date-order} {}}
{Remote-Simple-Date {} {--remotes --date-order --simplify-by-decoration} {}}
{{missing authors} {} {--remotes {--author=.*DOMAIN\.tfs.*}} {}}
}
But replace DOMAIN with the actual Windows domain of the users (ours was corp, but I think the Windows default is workgroup).

Batches

Your computer will bluescreen at points, the IT department will force a mandatory restart, TFS will go down for maintenance, your VPN will turn off... 3 times a day. You will miss an author, or git-tfs will not be able to figure out a merge parent, and so you will need to redo a section. Use git tfs fetch -t ###### and copy the resulting git project folder to a safe place. Keep a record of which copy was at what point.
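A minimal sketch of how the fetch, copy and record routine could be scripted. The batch size, the snapshot folder naming and the checkpoints.txt record file are all hypothetical choices for the example, not something git-tfs provides; the only git-tfs call used is git tfs fetch -t as above.

```python
import shutil
import subprocess
from pathlib import Path


def batch_targets(first: int, last: int, step: int) -> list[int]:
    """Changeset numbers to pass to `git tfs fetch -t`, always ending at `last`."""
    if step <= 0:
        raise ValueError("step must be positive")
    targets = list(range(first + step, last, step))
    targets.append(last)
    return targets


def run_batches(repo: Path, backup_root: Path, targets: list[int]) -> None:
    """Fetch up to each target, snapshot the repo folder, and log the checkpoint.

    backup_root must live outside the repo folder, or the copy would recurse.
    """
    backup_root.mkdir(parents=True, exist_ok=True)
    record = backup_root / "checkpoints.txt"  # hypothetical record file name
    for target in targets:
        # check=True stops the loop as soon as a fetch fails.
        subprocess.run(["git", "tfs", "fetch", "-t", str(target)], cwd=repo, check=True)
        snapshot = backup_root / f"upto-C{target}"  # hypothetical naming scheme
        shutil.copytree(repo, snapshot)
        with record.open("a", encoding="utf-8") as log:
            log.write(f"C{target}\t{snapshot}\n")


if __name__ == "__main__":
    # Only prints the planned stopping points; nothing is fetched here.
    print(batch_targets(358000, 359000, 300))
```

Because each snapshot is a full copy of the project folder, redoing a section means going back to the last good folder and fetching from there.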

Branches

git-tfs will decide that the main branch ends at 5.6, but it really continues off on branches of branches of branches of 5.6. Sometimes fetching a branch to its end will cause the next branch to be initialized automatically, due to some merges back into the older branch. But sometimes it won't. In that case, initialize it yourself:
git tfs branch --init $/Team1-scrum/project/6.0/trunk --no-fetch
--no-fetch followed by fetch -t #### helps with renames and computer troubles.

Merging Broken History

Someone will have moved the code from one project repository to another by copying the source folder to a new folder. Manually add the branch to .git/config:
[tfs-remote "default"]
 url = http://tfs:8080/tfs/PrimaryCollection/
 repository = $/Team1/project/1.0/trunk
[tfs-remote "project/2.0/trunk"]
 url = http://tfs:8080/tfs/PrimaryCollection/
 repository = $/Team1/project/2.0/trunk
   ⋮
[tfs-remote "project/5.0/trunk"]
 url = http://tfs:8080/tfs/PrimaryCollection/
 repository = $/Team1-scrum/project/5.0/trunk
Then run git tfs fetch -i project/5.0/trunk, and then use a graft. See the section on grafts below.
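If there are many of these to add, the stanzas can be generated instead of typed. This is only a sketch that reproduces the two keys shown above (url and repository); the function name is made up, and whether your git-tfs version wants any other keys in the section is not something this example checks.

```python
def tfs_remote_section(remote_id: str, url: str, repository: str) -> str:
    """Build one [tfs-remote "..."] stanza in the layout used in this post."""
    return (
        f'[tfs-remote "{remote_id}"]\n'
        f"\turl = {url}\n"
        f"\trepository = {repository}\n"
    )


if __name__ == "__main__":
    print(
        tfs_remote_section(
            "project/5.0/trunk",
            "http://tfs:8080/tfs/PrimaryCollection/",
            "$/Team1-scrum/project/5.0/trunk",
        ),
        end="",
    )
```

Paste the output at the end of .git/config, after the existing tfs-remote sections.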

Renames

AKA: "warning: this changeset 193940 is a merge changeset. But git-tfs is unable to determine the parent changeset." ugh renames. At some point your supervisor thought that renames were "the bomb", and had the team use them everywhere. Renames break git-tfs. You can set IgnoreBranches=true, but then you get helpful commit messages like "Merge - woops", and no relation to the actual commit with the actual message.
Workaround: Treat it kind of like broken history.
  1. Run git-tfs on the main branch up to the changeset of the rename
    git tfs fetch -i project/6.15/trunk -t 358062
  2. Run git-tfs on the branch up to the rename
    git tfs fetch -i project/6.15/branches/6.15.5 -t 358061
  3. Set initial-changeset and add the branch manually to .git/config,
    [git-tfs]
        workspace-dir = c:\\w1
        ignore-branches = false
        ignore-not-init-branches = false
        ignore-branches-regex = .*(?:branchs|Skies).*
        export-metadatas = true
        disable-gitignore-support = true
        initial-changeset = 358062
    [tfs-remote "project/6.15/branches/6.15.6"]
        url = http://tfs:8080/tfs/PrimaryCollection/
        repository = $/Team1-scrum/project/6.15.6/trunk
    
  4. Run fetch on the one changeset
    git tfs fetch -i project/6.15/branches/6.15.6 -t 358062
    (In my case, ~12,000 files took between 1.5 and 3 hours for that one changeset. I don't know what it did during that time. The first 40 minutes is downloading the files; after that git-tfs was in some "if changeset is a rename, do stuff for each file" loop. The process sits at 0% CPU and 0% IO for the whole time. I think there might be a problem in one of the libraries it depends on.)
  5. Match up the child/parent hashes with a graft or replacement, and verify that you didn't lose any commits. Ideally the commit should have no changes, but in my case most of the renames also altered version manifest files, or test playlist files, and other things that have version numbers.
  6. Then move on with another git tfs fetch -i project/6.15/trunk -t #####.
This works with renames because a rename deletes all files in the old branch and creates all files in the new branch, so all the files get fetched. Since fetch only gets the files listed in the changeset, this does not work for skipping changesets in general, unless the next changeset reverts all those changes, etc.
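Finding the affected changesets can be partly scripted by scanning saved git-tfs output for the warning quoted at the top of this section. A minimal sketch, assuming the warning text is exactly as quoted and that the output was saved to a file (for example with tee-object); the function name is made up.

```python
import re

# Assumption: wording copied from the warning quoted in this post.
WARNING = re.compile(
    r"warning: this changeset (\d+) is a merge changeset\. "
    r"But git-tfs is unable to determine the parent changeset\."
)


def unresolved_merge_changesets(log_text: str) -> list[int]:
    """Return changeset numbers git-tfs could not find a parent for, in log order."""
    return [int(number) for number in WARNING.findall(log_text)]


if __name__ == "__main__":
    sample = (
        "C193939 = 22479e47965497a6ef463561fd9bba7e0eec2721\n"
        "warning: this changeset 193940 is a merge changeset. "
        "But git-tfs is unable to determine the parent changeset.\n"
    )
    print(unresolved_merge_changesets(sample))
```

Each number it reports is a candidate for the -t values in steps 1 and 2, though not every such warning is necessarily a rename.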

Grafts / Replacements

I think git-tfs might find previous commit IDs by saving notes that point to commit hashes, so I didn't change the hashes until I was done with git-tfs. (If this isn't the case, you could rebase instead. The way I did it seems to work OK. See the Addendum; with git-tfs bootstrap, rebases probably would have worked.) I used a graft file, .git/info/grafts (https://git.wiki.kernel.org/index.php/GraftPoint), and then at the end of exporting turned that into replacements (https://mirrors.edge.kernel.org/pub/software/scm/git/docs/git-replace.html).
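For reference, each line of a graft file is a commit hash followed by the hashes of the parents it should appear to have. A small sketch that turns such lines into the equivalent git replace --graft commands, so you can see what the conversion amounts to; the function name is made up, and the hashes in the example are the ones that appear later in this post.

```python
def graft_to_replace_commands(grafts_text: str) -> list[str]:
    """Translate graft-file lines into `git replace --graft` command lines."""
    commands = []
    for raw in grafts_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        commit, *parents = line.split()
        commands.append(" ".join(["git", "replace", "--graft", commit, *parents]))
    return commands


if __name__ == "__main__":
    grafts = (
        "a7a9a025f8ecbf26d93387a2727dab9aedfacd5c "
        "bd33b9f18c91070c64797d38fe98aebb894832f3\n"
    )
    for command in graft_to_replace_commands(grafts):
        print(command)
```

A graft line with only a commit hash becomes a replace command with no parents, which makes that commit a root.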

Make Grafts / Replacements Permanent

Since I'm on Windows 10, I downloaded filter-repo (https://github.com/newren/git-filter-repo), saved it as C:\Program Files\Git\mingw64\libexec\git-core\git-filter-repo, and changed #!/usr/bin/env python3 to just #!/usr/bin/env python. I converted the graft file with git replace --convert-graft-file, then ran git filter-repo --force.

Push to Remote

git branch --all will probably show a million branches now. All the ones that start with tfs/ are the git-tfs exported branches. I was at 280. Dump the output of git branch --all somewhere and massage it so each line looks like git checkout -b nice_branch_name tfs/long_path_to_branch_folder. I did something like a regex replace of tfs/Core/(.*)/branches/(.*) with git checkout -b $2 tfs/Core/$1/branches/$2. Run all of those, then git filter-repo --force, and then git push --set-upstream origin --all.
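The massaging step could look something like this. It is a sketch under assumptions: the tfs/Core/<version>/branches/<name> layout is the one from the regex above, the optional remotes/ prefix is how git branch --all usually lists remote-tracking refs, and the function name is made up. Lines that don't match (trunks, local branches) are skipped and still need handling by hand.

```python
import re

# Group 1: full tfs ref, group 2: version folder, group 3: short branch name.
TFS_BRANCH = re.compile(r"^(?:remotes/)?(tfs/Core/(.+)/branches/(.+))$")


def checkout_commands(branch_listing: str) -> list[str]:
    """Turn `git branch --all` output into `git checkout -b` command lines."""
    commands = []
    for raw in branch_listing.splitlines():
        name = raw.strip().lstrip("* ")  # drop the current-branch marker
        match = TFS_BRANCH.match(name)
        if match:
            full_ref, _version, short_name = match.groups()
            commands.append(f"git checkout -b {short_name} {full_ref}")
    return commands


if __name__ == "__main__":
    listing = (
        "* master\n"
        "  remotes/tfs/Core/6.15/branches/6.15.5\n"
        "  remotes/tfs/Core/6.15/trunk\n"
    )
    for command in checkout_commands(listing):
        print(command)
```

If two version folders contain a branch with the same short name, the generated names will collide, so check the list for duplicates before running it.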

Addendum

I thought I was done. During the export the company decided to migrate to Azure DevOps. Now all the markers are wrong. The fix is to use git filter-repo to change the URLs in work item links, git-tfs markers, etc.
The callback is only a few lines, so I can just put it all in the command. I could probably one-line it pretty easily, but it would be a long line, so I'll just use Shift+Enter for multiple lines. Don't forget to escape the double quotes.
PS C:\blah> git filter-repo --commit-callback '
>> msg = commit.message.decode(\"utf-8\")
>> newmsg = msg.replace(\"http://tfs:8080/tfs/PrimaryCollection\", \"https://dev.azure.com/Company\")
>> commit.message = newmsg.encode(\"utf-8\")
>> ' --force
Parsed 52227 commits
New history written in 328.30 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 087f91945a blah blah
Enumerating objects: 346091, done.
Counting objects: 100% (346091/346091), done.
Delta compression using up to 8 threads
Compressing objects: 100% (82068/82068), done.
Writing objects: 100% (346091/346091), done.
Total 346091 (delta 259364), reused 346030 (delta 259303), pack-reused 0
Completely finished after 443.37 seconds.
PS C:\blah> git tfs fetch -i Team1/project/7.15/trunk | tee-object -append -filepath .\.git\git-tfs_output.txt
Fetching from TFS remote 'Team1/project/7.15/trunk'...
C517459 = 22479e47965497a6ef463561fd9bba7e0eec2721

C517495 = a1d489d549619fa7aba9c75eaddbedf71e57e0b0
C517496 = 121904adb55dd14748b8fd5212eb1c1ad485734a
C517497 = c5ff0cd4f9f601356e7e3357716545354096a992
C517523 = ba60e45569c1a31b0574ecea4b7aec59e613d35e
C517544 = c15456e36f48d0bddf9c48b1d236ce6381ea598a
C517547 = 7549153a8c17138dcf6a9057151c61670685749c
C517590 = 0b428c7b897c9a5ea6195ea97434defdceaa2d31
C517611 = 9e395115d83fd0d2170f249ec8e0b4b596a55f85
C517613 = fb68cf3aea8741b2ff3a90eabcc58fe2233b2275
C517633 = 5f120b9d7d8e33a2f4bad39d0eb2480cc8db516a
C517778 = 411b5a02ec104ec485dd6d7a0e8464d28b7701ce
C517877 = 55f0f4ac3776a3ddb445d32584da25dacf107ad1
C517882 = e9221fc672238cbade656c3079c3683c800f45c5
C517884 = 1069db9ef920f610589a8a0d44e50993196bfa5e
PS C:\blah> gitk --date-order --remotes
PS C:\blah>
phew.
Don't use --refs HEAD, or it won't get all the branches. I used --force because I didn't want to bother pushing/cloning/pushing.
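The message rewrite itself can be tried out as a plain Python function before running it over the whole history. This is a sketch of the same replacement as the callback above, written directly on bytes (filter-repo hands the callback commit.message as bytes), so the decode/encode round trip is not needed. The function name and the sample message are made up for the example.

```python
OLD_URL = b"http://tfs:8080/tfs/PrimaryCollection"
NEW_URL = b"https://dev.azure.com/Company"


def rewrite_message(message: bytes) -> bytes:
    """Point every occurrence of the old collection URL at the new organization."""
    return message.replace(OLD_URL, NEW_URL)


if __name__ == "__main__":
    # Hypothetical commit message with a git-tfs marker line.
    sample = (
        b"Fix the build\n\n"
        b"git-tfs-id: [http://tfs:8080/tfs/PrimaryCollection]"
        b"$/Team1/project/1.0/trunk;C123\n"
    )
    print(rewrite_message(sample).decode("utf-8"))
```

Inside the callback the body would then be a single assignment of the replaced bytes back to commit.message.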

Addendum-Addendum

Trying to get more changesets from ADO:

Azure DevOps Services

WARNING - ACTION RECOMMENDED

Your requests are being delayed.
On 6/22/2020 at 5:48 PM (UTC) we detected that your resource consumption on https://dev.azure.com/Company/ exceeded one of our limits. To maintain service availability for other users, we began delaying some of your requests. Your requests will continue being delayed until your resource consumption returns below our limits.
You can audit your request history by visiting the usage page for your organization. Please visit our documentation to learn more.

documentation

personal usage exceeds 200 times the consumption of a typical user within a (sliding) five-minute window.
Delays range from a few milliseconds per request up to 30 seconds. Once consumption goes to zero or the resource is no longer overwhelmed, the delays will stop within five minutes.
This is going to take forever... Maybe I need to run fetches in batches of 5 minutes. ... Checking how long this batch of 2148 commits took: 20 minutes for the initial commit, 23 minutes for ~550 commits until the throttling email, and 45 minutes for ~1600 commits after the throttling email. So 23.9 commits/min before, then 35.6 commits/min after. It was faster after I got the email saying it was throttling. Either way, the ADO docker server is faster than our old on-prem I-don't-know-what-it-was server.
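The rates work out as follows, with the on-prem figure from the top of the post (300 commits in 44 minutes) added for comparison. The commit counts are the approximate ones quoted above, so treat the ratios as rough.

```python
# Approximate figures quoted in the post.
before_throttle = 550 / 23    # commits/min until the throttling email
after_throttle = 1600 / 45    # commits/min after the throttling email
on_prem = 300 / 44            # commits/min measured on the old TFS server

print(f"before: {before_throttle:.1f} commits/min")
print(f"after:  {after_throttle:.1f} commits/min")
print(f"speedup over on-prem: {before_throttle / on_prem:.1f}x to {after_throttle / on_prem:.1f}x")
```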

Branch-DeleteBranch-Rename

  1. Branching 3.5.3 From $/Team1/Project/3.5
  2. Updated some version files
  3. Whoops, deleting, shouldn't have branched
  4. renaming 3.5.2 to 3.5.3
yay.

PS C:\project> git filter-repo --commit-callback '
>> msg = commit.message.decode(\"utf-8\")
>> newmsg = msg.replace(\"3.5.3;C1515\", \"3.5.3_deleted;1515\")
>> commit.message = newmsg.encode(\"utf-8\")
>> ' --force
Parsed 14574 commits
New history written in 35.78 seconds; now repacking/cleaning...
Repacking your repo and cleaning out old unneeded objects
HEAD is now at 4bcefff9c Branched from $/Team1/Project/3.7/trunk
Enumerating objects: 102957, done.
Counting objects: 100% (102957/102957), done.
Delta compression using up to 8 threads
Compressing objects: 100% (27802/27802), done.
Writing objects: 100% (102957/102957), done.
Total 102957 (delta 73787), reused 102953 (delta 73786), pack-reused 0
Completely finished after 42.71 seconds.
PS C:\project>
Also, delete the tfs remote ref from .git/refs/remotes/tfs/..., or, since you just ran git filter-repo, it probably got packed into .git/packed-refs. Just open that file in a text editor and delete the one line with "refs/remotes/tfs/Team1/Project/3.5.3". (For some reason git branch -d -r tfs/Team1/Project/3.5.3 deleted more than just that one ref... maybe it will work for you though.) Then do the steps in "Renames". (set initial-changeset in .git/config...

PS C:\project> git tfs fetch -i Team1/Projects/branches/3.5.3 -t 15161 | tee-object -append -filepath .\.git\git-tfs_output.txt
Fetching from TFS remote 'Team1/Projects/branches/3.5.3'...
C15161 = a7a9a025f8ecbf26d93387a2727dab9aedfacd5c

PS C:\project> git replace --graft a7a9a025f8ecbf26d93387a2727dab9aedfacd5c bd33b9f18c91070c64797d38fe98aebb894832f3
PS C:\project>
)
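The packed-refs edit can also be done with a few lines of Python instead of a text editor. This is a sketch that assumes the usual packed-refs layout (a # header line, then one "<hash> <refname>" line per ref, with an optional ^ line after annotated tags); the function name is made up, and it matches the ref name exactly so that a neighbour such as 3.5.30 is left alone. Git's own git update-ref -d is another way to delete a single ref, though it was not tried for this post.

```python
def drop_packed_ref(packed_refs_text: str, ref_name: str) -> str:
    """Return packed-refs content without the entry for exactly `ref_name`."""
    kept = []
    drop_peeled = False
    for line in packed_refs_text.splitlines(keepends=True):
        stripped = line.rstrip("\r\n")
        if drop_peeled and stripped.startswith("^"):
            drop_peeled = False
            continue  # peeled-tag line belonging to the removed ref
        drop_peeled = False
        if not stripped.startswith(("#", "^")):
            _sha, _, name = stripped.partition(" ")
            if name == ref_name:
                drop_peeled = True
                continue
        kept.append(line)
    return "".join(kept)


if __name__ == "__main__":
    sample = (
        "# pack-refs with: peeled fully-peeled sorted \n"
        "1111111111111111111111111111111111111111 refs/remotes/tfs/Team1/Project/3.5.2\n"
        "2222222222222222222222222222222222222222 refs/remotes/tfs/Team1/Project/3.5.3\n"
    )
    print(drop_packed_ref(sample, "refs/remotes/tfs/Team1/Project/3.5.3"), end="")
```

Keep a copy of the original packed-refs file before overwriting it.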

Branch - DeleteBranch - reBranch

Try moving the remote ref to the new branch parent by editing refs/remotes/tfs/.../branch, setting initial-changeset to newbranchChangeset in .git/config like in the rename scenario, and running git tfs fetch -i branch -t newbranchChangeset.
