Moving from GitHub to Self-hosted Git Repositories: My Journey

This was originally posted to /r/selfhosted on Reddit, a website I no longer consider worth providing with content.

August 29th, 2022

This website functions sort of like a work portfolio, where I show off the things I do. Since one of the things I make is software, I have project showcases for my homemade software tools, large and small. I see my personal website as an opportunity to be a craftsman without deadlines, to come up with solutions that are fun to me even if they're not the most efficient way, and to have everything under one roof if I can. Everything of mine that used to be spread across services like SlideShare, Scribd and YouTube now lives on my website.

For the last few years, the only major holdout was my open source code on GitHub. This post is about my considerations and goals for moving from GitHub to self-hosted Git repositories, possible solutions I considered, the solution I ended up going with, and how I integrated it into my existing website design. It's about a mixture of self-hosting, service integration and web development. I hope some of you will find it interesting.

What I Wanted

At the start of this project, I had about a dozen repositories on GitHub, nearly all with no outside activity to speak of and with me as the sole developer. I barely network in the free software community. I have contributed small patches here and there, but most of the time I make things for my personal use. As a software developer, I'm in the privileged position where if a tool I want doesn't exist, I can take a shot at creating it myself, and if I like what I make, I often put it under a free license because I might as well.

Here's a list of what I wanted for my self-hosted Git repositories:

And here's what I didn't want or need:

Other Possible Solutions

I briefly installed the community version of GitLab, largely because that's what I use at work, but it really struggled to run on my server and starved other processes of resources, so it didn't last longer than an afternoon of experimentation. Way too bulky for my use case.

I also looked at Gitea. In terms of ready-to-run products you can download, it's probably the closest to what I was after, but it still does a lot more than I need, and I feel a bit iffy about it being kind of a black box: a compiled binary based on source code in a language I don't know. Integrating it into my website looked like it would have been a major undertaking. I also gave it a few hours of fiddling and, if I remember right, couldn't really get it to run. Might have been something about dependencies, I don't remember, it's been a while.

A much more barebones idea I also considered was setting up a highly restricted user account on my server, putting all the Git repositories in its home directory with read-only access, and just sort of… informing people of the SSH URL if they wanted to clone anything. I quickly decided against this because it didn't fulfill nearly enough of my wants and I wasn't confident I could secure it enough. Apparently gitolite or gitosis can help with this but I didn't look into them very much.

What I Went With: Gitweb + My Own Wrapper Script

Gitweb is Git's “official” web frontend. It's essentially a CGI script written in Perl that makes a directory containing Git repositories web-browsable. It's very lightweight, doesn't have any concept of user accounts or social features, and just knows what Git commands to run to get the relevant information out of the repositories. The cherry on top is that it's essentially a finished project – it gets like 3 commits a year, mostly to keep up with web standards, and I don't think anything short of Git itself breaking its user interface is going to break this thing.

Its main downside is that it looks and feels a bit old school – the visual design feels very early 2000s and it's not mobile-friendly at all. I knew there were some features inspired by GitHub I'd need to add to the landing pages, like a nice button with the clone URL and a zip download link and a readme display case. I also wasn't too happy with the URLs that use ugly query string parameters instead of a virtual path hierarchy.

I could have made these modifications to the gitweb Perl script directly, but despite the relative rarity of updates, I was still not thrilled with the prospect of merging future versions with my modifications time and again, plus my Perl experience is very limited. So I decided to write a wrapper script in Python that handles the CGI request.

So if you click on a Git link on my website, the server does the following:

I dare say that it's pretty neat, and also plenty fast, which I admit I was concerned about. My wrapper script does things like accepting a hierarchical path request and converting that into a query string for gitweb (and the other way around for internal URLs in the output), adding the download popup to the repository landing page, and surrounding the content with my website's header and footer for the coveted seamless integration.
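To make the path conversion a bit more concrete, here is a minimal sketch of how a hierarchical path could be turned into a gitweb query string. The URL layout (`/<repo>/<action>/<ref>/<file path>`) and the function name are assumptions made up for this illustration, not the scheme my wrapper actually uses. The parameter names `p`, `a`, `hb` and `f` are the ones gitweb uses for project, action, base revision and file name.

```python
from urllib.parse import quote


def path_to_gitweb_query(path: str) -> str:
    """Convert a hierarchical path into a gitweb-style query string.

    Assumed (hypothetical) layout: /<repo>/<action>/<ref>/<file path...>
    """
    parts = [p for p in path.split("/") if p]
    if not parts:
        # No repository named: an empty query makes gitweb show its project list.
        return ""

    params = [("p", parts[0] + ".git")]
    # Without an explicit action, fall back to the repository landing page.
    params.append(("a", parts[1] if len(parts) > 1 else "summary"))
    if len(parts) > 2:
        params.append(("hb", parts[2]))  # base revision (branch, tag or commit)
    if len(parts) > 3:
        params.append(("f", "/".join(parts[3:])))  # file path inside the repository

    # Join with semicolons, the separator gitweb uses in its own links.
    return ";".join(f"{key}={quote(value, safe='/')}" for key, value in params)


print(path_to_gitweb_query("/myrepo/tree/main/src/app.py"))
```

Rewriting the internal links in gitweb's output is the same mapping in reverse. A scheme this simple would not cope with branch names that contain slashes, so a real wrapper needs something more careful.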

Showing the readme file on the landing page was one of the bigger challenges. I found this code snippet by Erik Post, which is a patch for gitweb itself that does exactly this. Reading it taught me which Git commands to run in what order, and I then implemented the same steps in Python in my wrapper script. I also run the file through Python's markdown module at that point.
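As a rough illustration of the readme step, here is one way to pull a file out of a bare repository and render it. This is a sketch under assumptions: it uses `git show <rev>:<file>`, which is not necessarily the command sequence from the patch mentioned above, and the function names, the default file name `README.md` and the example path are hypothetical. The `markdown` import refers to the third-party Markdown package.

```python
import subprocess


def readme_command(repo_dir, rev="HEAD", filename="README.md"):
    """Build the git command that prints one file from a (bare) repository."""
    return ["git", f"--git-dir={repo_dir}", "show", f"{rev}:{filename}"]


def render_readme(repo_dir):
    """Return the readme rendered as HTML, or None if git cannot find it."""
    import markdown  # third-party package, imported only when rendering

    result = subprocess.run(
        readme_command(repo_dir), capture_output=True, text=True
    )
    if result.returncode != 0:
        # The file is missing, or the repository has no commits yet.
        return None
    return markdown.markdown(result.stdout)


# Hypothetical repository path, shown only to illustrate the command.
print(readme_command("/srv/git/myrepo.git"))
```

A real version would probably also try other common file names such as `README` or `README.txt`.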

Also, gitweb doesn't offer clone-able HTTP(S) URLs. I don't fully understand why; it seems they just decided against putting that in there. They have documentation on how to configure Apache to delegate the appropriate URLs to Git's HTTP backend instead of gitweb. After futzing with my own Apache config for a while (which already relies on a probably overly complex web of rewrite rules), I gave up and implemented the rules in my wrapper script, which now knows when to call Git directly instead of gitweb.
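The dispatch decision can be sketched like this. The pattern is adapted from the Apache `ScriptAliasMatch` example in the `git http-backend` documentation. The function name and the example paths are assumptions for illustration, and this is not a copy of the rules in my wrapper.

```python
import re

# Request paths that belong to Git's HTTP transport rather than gitweb.
GIT_HTTP_RE = re.compile(
    r"""^/(?P<repo>.+)/(?:
        HEAD
      | info/refs
      | objects/(?:
            info/[^/]+
          | [0-9a-f]{2}/[0-9a-f]{38}
          | pack/pack-[0-9a-f]{40}\.(?:pack|idx)
        )
      | git-(?:upload|receive)-pack
    )$""",
    re.VERBOSE,
)


def is_git_http_request(path_info: str) -> bool:
    """Return True if the request should go to git http-backend."""
    return GIT_HTTP_RE.match(path_info) is not None


print(is_git_http_request("/myrepo.git/info/refs"))      # clone traffic
print(is_git_http_request("/myrepo/tree/main/src"))      # browsing, goes to gitweb
```

For matching requests, a wrapper could then run `git http-backend` as a CGI program with `GIT_PROJECT_ROOT` pointing at the repository directory. Unless `GIT_HTTP_EXPORT_ALL` is set, the backend only serves repositories that contain a `git-daemon-export-ok` file.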

I made sure that the static files gitweb needs are available on my server and spent some time modifying gitweb's CSS to bring it in line with my website. Making the individual pages responsive involved liberal use of CSS grid to shuffle table cells around. Try dragging your browser window smaller on one of the pages if you'd like to see it.

Conclusion

Here's the result! This list and the repository views linked on it are provided by gitweb, whereas the project pages accessible from the overview are part of my statically generated website. Please let me know if you see anything that doesn't work or that looks wrong.

I'm extremely satisfied with what I got out of this. I've only had it running for a few weeks now so I'll see if I run into any issues down the line, but so far I'm optimistic. If you'd like to know more about any part of this or need help deciding whether to replicate it, ask away.

I have deleted most of my old GitHub repositories now. Some I decided to keep on there because their GitHub URLs have appeared in external publications that I cannot update, so I don't want to kill the links. Those also happen to be my most “popular” repositories (if you go check, please don't laugh – like I said, my projects don't tend to be popular) and this way people can keep one-click forking them over there if they feel like it.
