How to Garbage Collect the GitLab Container Registry to Free Up Storage

GitLab’s Container Registry provides a convenient place to store your Docker images. Over time, the Container Registry can eat up your disk space as more layers are added. Here’s how to free up storage by removing redundant material.

The Container Registry lets you store Docker images alongside your project’s source code. If you keep large images in your registry, your storage costs can quickly balloon beyond your expectations. GitLab retains each layer indefinitely, even after it becomes redundant.

Setting a Cleanup Policy

The first step in regaining your storage space is to configure a Container Registry Cleanup Policy. Cleanup Policies are applied individually to each project. This means you can customise the retention period to suit each codebase.

Visit your project in GitLab and click the “Settings” link in the sidebar. Switch to the “CI / CD” category and expand the “Clean up image tags” section near the bottom of the page.

Toggle the “Enabled” button to the on position to activate the Cleanup Policy. Next, choose when to run the policy – “every day” is a good default.

The next section, “Keep these tags”, allows you to define tags which the Cleanup Policy will leave alone. The two options, “keep the most recent” and “keep tags matching”, are independent of each other. For example, you could keep dev and nightly, supplemented by the five most recent tags. The latest tag is always kept, in addition to any tags you specify.

The following section, “Remove these tags”, defines which tags are eligible for removal. Tags which don’t match the regex pattern won’t be touched. Adjust the “Remove tags older than” value to set the maximum lifetime of each tag before it gets cleaned up. When you’re done, click the green “Save” button.
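Before saving a removal pattern, it can help to preview which tag names it would catch. The sketch below uses grep as a rough stand-in for GitLab's matcher. It assumes the pattern must match the whole tag name, and GitLab's regex dialect may differ from grep's, so treat the output as a guide only. The function name and the tag list are hypothetical.

```shell
#!/bin/sh
# Preview which tag names a removal pattern would match.
# NOTE: grep -E only approximates GitLab's regex engine (assumption).

# preview_matches PATTERN TAG...
# Prints every tag that matches PATTERN in full (-x anchors the whole line).
preview_matches() {
    pattern=$1
    shift
    printf '%s\n' "$@" | grep -Ex -- "$pattern"
}

# Hypothetical tag list; substitute the tags from your own registry.
preview_matches 'v[0-9]+\.[0-9]+\.[0-9]+' dev nightly latest v1.0.0 v1.1.0 v2.0.0-rc1
```

With this tag list, only v1.0.0 and v1.1.0 are printed. v2.0.0-rc1 is skipped because the pattern has to cover the entire name.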

Using the API

Applying cleanup policies through the web UI can quickly get tedious. Use the API instead if you’re changing multiple projects.

curl --request PUT --header 'Content-Type: application/json;charset=UTF-8' --header "PRIVATE-TOKEN: <access_token>" --data-binary '{"container_expiration_policy_attributes":{"cadence":"1month","enabled":true,"keep_n":1,"older_than":"14d","name_regex":"","name_regex_delete":".*","name_regex_keep":"latest"}}' "https://gitlab.example.com/api/v4/projects/<project_id>"

You’ll need to generate an API access token by heading to your Profile page in GitLab. Use the token as <access_token> in the command above. Adjust the URL to point to your project – its ID can be found on its project page within GitLab.

Running the command above will apply a registry cleanup policy that runs every month and cleans images older than 14 days. The latest tag, and the most recent tag (keep_n), will be retained; all others will be eligible for removal (.*).
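If you're updating several projects, the same request can be wrapped in a loop. This is a minimal sketch, not a hardened script: GITLAB_URL, GITLAB_TOKEN and PROJECT_IDS are assumed environment variables, and the function names are made up for illustration. The payload mirrors the single-project command above.

```shell
#!/bin/sh
# Apply one cleanup policy to several projects via the GitLab API.
# Assumptions: GITLAB_URL, GITLAB_TOKEN and PROJECT_IDS are supplied by you.

GITLAB_URL="${GITLAB_URL:-https://gitlab.example.com}"

# Print the policy JSON used in the single-project command.
policy_payload() {
    printf '%s' '{"container_expiration_policy_attributes":{"cadence":"1month","enabled":true,"keep_n":1,"older_than":"14d","name_regex":"","name_regex_delete":".*","name_regex_keep":"latest"}}'
}

# apply_policy PROJECT_ID
# Sends the PUT request; --fail makes curl return non-zero on HTTP errors.
apply_policy() {
    curl --silent --show-error --fail --request PUT \
        --header 'Content-Type: application/json;charset=UTF-8' \
        --header "PRIVATE-TOKEN: ${GITLAB_TOKEN}" \
        --data-binary "$(policy_payload)" \
        "${GITLAB_URL}/api/v4/projects/$1" >/dev/null
}

# Only contact the server when a token has been provided.
if [ -n "${GITLAB_TOKEN:-}" ]; then
    # PROJECT_IDS is a space-separated list, so it is left unquoted on purpose.
    for project_id in ${PROJECT_IDS:-}; do
        if apply_policy "$project_id"; then
            echo "policy applied to project $project_id"
        else
            echo "failed to update project $project_id" >&2
        fi
    done
fi
```

You might invoke it as GITLAB_TOKEN=<access_token> PROJECT_IDS="<id> <id>" sh apply-policy.sh, where the script name is also just an example.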

Effects of the Cleanup Policy

The Cleanup Policy handles untagging of images based on the criteria you set. The tags will be deleted from your container registry. They’ll no longer show up in your project’s Container Registry screen and won’t be pullable by Docker clients.

Untagging an image isn’t the same as actually deleting it though. The Cleanup Policy doesn’t recycle data, so you may still see high storage use even after pruning disused tags.

This is because the image layers remain on your GitLab server, cached for future reference. To remove the data for good, you should next run the Container Registry Garbage Collection procedure.

Garbage Collection

Running Garbage Collection will delete all image layers which aren’t linked to a tag. This will result in the removal of images that got untagged by your cleanup policy. It can also dispose of old layers which became redundant when you pushed a new version of a tag.

Garbage Collection must be invoked manually via GitLab’s command line interface. Connect to your GitLab server over SSH and run the following command:

sudo gitlab-ctl registry-garbage-collect

The Garbage Collection process will run. Any unreferenced layers within your Container Registry will be deleted. Garbage Collection scans your entire GitLab instance, not just a single project.

Assuming you let the cleanup policy run first, you should now see a healthy reduction in storage use. If it’s your first time running garbage collection on a frequently used GitLab installation, you might have reclaimed several gigabytes of space.
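To put a number on the reduction, you can measure the registry's storage directory before and after the run. The sketch below assumes the registry data lives at the default location for a package-based GitLab install. Set REGISTRY_DIR if yours differs. The helper names are hypothetical.

```shell
#!/bin/sh
# Report how much disk space a garbage collection run reclaimed.
# Assumption: registry data is stored under REGISTRY_DIR.
REGISTRY_DIR="${REGISTRY_DIR:-/var/opt/gitlab/gitlab-rails/shared/registry}"

# usage_kb DIR: print the size of DIR in kilobytes.
usage_kb() {
    du -sk "$1" | awk '{print $1}'
}

# reclaimed_kb BEFORE AFTER: print the difference in kilobytes.
reclaimed_kb() {
    echo $(( $1 - $2 ))
}

# Only run on a machine that actually has GitLab and the registry directory.
if [ -d "$REGISTRY_DIR" ] && command -v gitlab-ctl >/dev/null 2>&1; then
    before=$(usage_kb "$REGISTRY_DIR")
    sudo gitlab-ctl registry-garbage-collect
    after=$(usage_kb "$REGISTRY_DIR")
    echo "Reclaimed $(reclaimed_kb "$before" "$after") KB"
fi
```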

Removing Untagged Manifests and Layers

You can reclaim even more space by instructing garbage collection to also remove untagged image manifests and unreferenced layers. This is a more destructive operation, although the result is usually what you’d expect garbage collection to achieve.

sudo gitlab-ctl registry-garbage-collect -m

Adding the -m flag will delete any layer not directly associated with a tagged image manifest. This results in the loss of cached layers and intermediate build steps.

By default, Docker and the GitLab Container Registry retain all created layers, even if they’re no longer referenced. This means you can always retrieve a previously known layer using its unique content-addressable identifier, even if it no longer possesses a tag.

This is why removing these layers is not enabled by default. You need to be aware of the implications before running the command as it could have serious consequences in some workflows. Nonetheless, using the -m switch is often desirable – it will free up much more disk space and shouldn’t have any side effects if you only reference images using tag names.
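One way to check whether you only reference images by tag name is to search your repositories for digest-style references before using -m. The sketch below is a rough heuristic: it looks for the @sha256:<digest> form and will miss references built dynamically in scripts. SCAN_DIR and the function name are assumptions made for illustration.

```shell
#!/bin/sh
# List lines that reference an image by digest rather than by tag.

# find_digest_refs DIR: print matches as file:line:text.
find_digest_refs() {
    grep -rnE '@sha256:[0-9a-f]{64}' "$1"
}

# Scan only when SCAN_DIR points at a checkout you want to inspect.
if [ -n "${SCAN_DIR:-}" ]; then
    find_digest_refs "$SCAN_DIR" || echo "no digest references found in $SCAN_DIR"
fi
```

Any match deserves a closer look, since an image pulled by digest may no longer exist once its untagged manifest has been removed.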

Running Garbage Collection on a Schedule

Cleanup Policies run automatically on the frequency you’ve configured. Garbage Collection isn’t set up by default, which is why a first run can deliver a dramatic reduction in storage utilisation.

To run garbage collection on a schedule, you’ll need to add the command to your system’s crontab. Create a file /etc/cron.d/registry-garbage-collection with the following content to run garbage collection every Monday at 2am:

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

0 2 * * 1  root gitlab-ctl registry-garbage-collect

Limitations of Garbage Collection

The time taken to perform garbage collection will depend on the amount of data there is to delete. Garbage Collection requires the Container Registry service to be stopped while it’s running. This means your users won’t be able to pull or push images until the process completes.

You can reduce the impact of downtime by switching the registry into read-only mode, running the command and then switching back to read-write. The registry can remain running throughout but users won’t be able to push images. In addition, switching modes requires GitLab to be “reconfigured” (sudo gitlab-ctl reconfigure), which can in itself cause downtime depending on how your installation is set up.

You need to edit the following lines in /etc/gitlab/gitlab.rb:

registry['storage'] = {
  'maintenance' => {
    'readonly' => {
      'enabled' => true
    }
  }
}

Run sudo gitlab-ctl reconfigure, then use one of the garbage collection commands. Once it’s done, disable read-only mode by changing the enabled line in your gitlab.rb back to false, then reconfigure GitLab again.
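If you toggle read-only mode regularly, the edit to the enabled line can be scripted. The following is a sketch only, assuming gitlab.rb already contains the readonly block shown above, formatted the same way. The function name is hypothetical, and the demonstration runs against a throwaway file instead of the live /etc/gitlab/gitlab.rb. You'd still need to run sudo gitlab-ctl reconfigure after each change.

```shell
#!/bin/sh
# Flip the registry's read-only flag in a gitlab.rb-style file.
# Assumption: the file already contains the 'readonly' block.

# set_registry_readonly FILE true|false
set_registry_readonly() {
    file=$1
    value=$2
    tmp=$(mktemp) || return 1
    # Limit the substitution to the lines between 'readonly' => { and the
    # next closing brace, so other 'enabled' settings are left untouched.
    if sed -E "/'readonly' => \{/,/\}/ s/('enabled' => )(true|false)/\1${value}/" "$file" > "$tmp"; then
        cat "$tmp" > "$file"   # overwrite contents, keeping ownership and permissions
    fi
    rm -f "$tmp"
}

# Demonstrate on a temporary sample file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
registry['storage'] = {
  'maintenance' => {
    'readonly' => {
      'enabled' => true
    }
  }
}
EOF
set_registry_readonly "$sample" false
cat "$sample"
rm -f "$sample"
```

On a real server, the call would be set_registry_readonly /etc/gitlab/gitlab.rb true, run as root. Take a backup of the file first.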

James Walker
James Walker is a contributor to CloudSavvy IT. He is the founder of Heron Web, a UK-based digital agency providing bespoke software development services to SMEs. He has experience of managing complete end-to-end web development workflows, using technologies including Linux, GitLab, Docker and Kubernetes.
