Be Careful With Your Gradle Repository Declarations

Gradle has a sophisticated process for downloading, caching, and managing third-party dependencies. However, Gradle first needs to find where these dependencies are hosted. It tries to resolve each dependency by checking repositories one at a time, in the order they are listed in the build.gradle files. Out of the box, a new Android Studio project adds two Gradle repositories to the project:

allprojects {
    repositories {
        google()
        jcenter()
    }
}

For each dependency, Gradle will first check Google’s repository for a matching dependency. If a match is found, it will then move on to the next dependency. If not, Gradle will then check JCenter’s repository. This linear search is very inefficient and creates potential security issues during the build process.

The security flaws are well documented in other stories. Simply put, if a malicious person puts a compromised “fake” artifact on a repository that is listed before a repository containing the “real” artifact, then Gradle will use that fake artifact; this situation can be hard to detect if you’re not explicitly looking for it.

I want to focus on the second issue: the inefficiencies caused by Gradle checking repositories that do not have the requested artifact.

Imagine adding another dependency to the empty Android Studio project, one that is hosted on JCenter. Gradle will wastefully search for that dependency in Google’s repository before moving on to JCenter. This leads to increased build times, especially on machines with a poor internet connection.

Finding Redundant Checks

To find these inefficiencies, start by deleting all previously downloaded dependencies. This will delete the dependency cache on the entire machine. An unmetered network connection is essential here because these experiments will produce a lot of network traffic.

$ rm -rf ~/.gradle/caches/

When running a build task with --debug enabled, finding the log messages that indicate when a repository is checked and missed can be daunting, given the amount of output Gradle produces. Take the empty Android Studio project from above and add a new dependency that is deliberately not in Google’s repository. Since Gradle will try Google’s repository first, this dependency acts as a “tracer” that makes the relevant lines easy to find in the massive output of Gradle logs. Any artifact will do; let’s pick Timber and add it to the app/build.gradle dependency list:

app/build.gradle

dependencies {
    ...
    implementation 'com.jakewharton.timber:timber:4.7.1'
}

Now run a build task and grep for the Timber log lines. The goal is to see Gradle first try to download Timber from Google’s repository, fail, and then retry on JCenter.

$ ./gradlew app:buildDebug --debug | grep timber

...
// start to resolve Timber
// list repos to check
Build operation 'Resolve com.jakewharton.timber:timber:4.7.1' started
Attempting to resolve component for com.jakewharton.timber:timber:4.7.1 using repositories [Google, BintrayJCenter]

// first check Google; not found
Detected non-existence of module 'com.jakewharton.timber:timber:4.7.1' in resolver cache 'Google'

// checking JCenter; found
Metadata file found for module 'com.jakewharton.timber:timber:4.7.1' in repository 'BintrayJCenter'.
Using com.jakewharton.timber:timber:4.7.1 from Maven repository 'BintrayJCenter'

// finished
ExternalResourceResolver] Metadata file found for module 
Completing Build operation 'Download https://jcenter.bintray.com/com/jakewharton/timber/timber/4.7.1/timber-4.7.1.pom'

Several snippets highlight what is happening:

  • Attempting to resolve component for
  • Detected non-existence of module
  • Metadata file found for module
  • Completing Build operation 'Download

Next, create a grep matcher to visualize the process for every dependency:

$ rm -rf ~/.gradle/caches/
...
$ ./gradlew app:buildDebug --debug | \
    grep -e "Attempting to resolve component for" \
    -e "Detected non-existence of module" \
    -e "Metadata file found for module" \
    -e "Completing Build operation 'Download" \
    > output.txt

Explore A Real Android Project

I encourage you to explore the output file after running the script on your own projects. Pick one of your dependencies and search the output to see how it’s resolved. Even with the grep matcher, the output file will contain a lot of information, since it includes lines for every dependency, transitive dependencies included. It’s worth noting that Gradle resolves these dependencies in parallel, so you will see some overlapping logs. Also remember that Gradle resolves dependencies for the build script (from repositories listed in the buildscript block) as well as dependencies for the app itself (from repositories listed in the allprojects block).

The goal is not only to see where your dependencies come from, but also to find where you can speed up the build by removing unnecessary resolution attempts. You can see how often a missed resolution happens by searching the output file for “Detected non-existence of module”. I have quite a few in my project, and each one slows down the build.

Update the grep matcher to capture just the repository misses:

$ rm -rf ~/.gradle/caches/
...
$ ./gradlew app:buildDebug --debug | grep -e "Detected non-existence of module" > output.txt
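To see which repository is responsible for most of the misses, the captured lines can be tallied per repository. The following is only a sketch: the function name count_misses_by_repo is my own, and it assumes the log lines keep the exact "Detected non-existence of module '...' in resolver cache '...'" wording shown in the Timber example above, which may differ between Gradle versions.

```shell
#!/bin/sh
# Count repository misses per repository name.
# Reads Gradle --debug log lines on stdin and relies on the message format
# seen in the Timber trace:
#   Detected non-existence of module '<group:artifact:version>' in resolver cache '<Repo>'
# count_misses_by_repo is a hypothetical helper name, not part of Gradle.
count_misses_by_repo() {
    grep -F "Detected non-existence of module" \
        | sed -n "s/.*in resolver cache '\([^']*\)'.*/\1/p" \
        | sort \
        | uniq -c \
        | sort -rn
}

# Usage (output.txt is the file written by the grep command above):
#   count_misses_by_repo < output.txt
```

Each output line is a count followed by a repository name, with the busiest offender first. That ordering tells you which repository to restrict first.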

How To Fix Repository Misses?

Let’s run the above script on a real project of mine, with the following repositories block. The individually listed Maven repositories are for a few libraries that are not hosted on JCenter. We’ll get to those later.

allprojects {
    repositories {
        google()
        jcenter()
        maven { url "https://jitpack.io" }
        maven { url 'https://maven.fabric.io/public' }
    }
}   

Running the script reveals hundreds of failed checks, most of them against Google’s repository. There are many reasons to put Google’s repository first. However, placing Google first means that dependencies that come from JCenter and other repositories will be searched for unnecessarily in Google’s repository.

Let’s make the first efficiency gain by instructing Gradle to use Google only for Google-specific artifacts. Luckily, Gradle lets us limit which dependencies a repository is used for with the includeGroup and includeGroupByRegex methods of a repository’s content block.

allprojects {
    repositories {
        google {
            content {
                includeGroupByRegex "com.android.*"
                includeGroupByRegex "androidx.*"
                includeGroupByRegex "android.arch.*"
                includeGroupByRegex "com.google.*"
            }
        }       
        jcenter()
        maven { url "https://jitpack.io" }
        maven { url 'https://maven.fabric.io/public' }
    }
}

In this example, Gradle will use the Google repository only for groups matching the supplied regexes. The “group” in each regex is the bit of the dependency string before the first colon (e.g. com.jakewharton.timber in com.jakewharton.timber:timber:4.7.1). I prefer to use includeGroupByRegex instead of includeGroup so that I don’t have to list each group individually.
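To check which repository filter a given dependency would fall under, the group extraction and regex test can be imitated in the shell. This is a rough sketch, not Gradle’s implementation: group_of and group_matches are names I made up, and I am assuming that Gradle applies the regex to the whole group string, which grep -x (whole-line match) approximates.

```shell
#!/bin/sh
# Print the group of a "group:artifact:version" coordinate:
# everything before the first colon.
# group_of and group_matches are hypothetical helper names.
group_of() {
    printf '%s\n' "${1%%:*}"
}

# Succeed when the coordinate's group matches the regex as a whole.
# Assumption: the regex must match the entire group string, which
# grep -x (whole-line match) imitates here.
group_matches() {
    group_of "$1" | grep -Eqx -- "$2"
}

# Usage:
#   group_matches 'com.jakewharton.timber:timber:4.7.1' 'com.google.*' \
#       || echo "not served by google()"
```

Note that an unescaped dot in these regexes matches any character, so "com.google.*" is slightly broader than it looks. Writing the pattern as "com\\.google.*" in build.gradle is stricter, at the cost of readability.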

So far, the allprojects block looks fine, but what about the buildscript?

The buildscript block on my project is similar and even contains additional repos:

buildscript {
    repositories {
        google()
        jcenter()
        maven { url 'https://maven.fabric.io/public' }
        maven { url "https://plugins.gradle.org/m2/" }
        maven { url "https://jitpack.io" }
        maven { url "https://plugins.gradle.org/m2/" }
    }
    ...
}

Rerunning the script reveals far fewer misses. These misses fall into two categories:

  • Dependencies that match the Google regex, but are not on Google’s repository (e.g. com.google.dagger, com.google.auto.value)
  • Dependencies that are not on JCenter and need to fall back to one of the individually listed Maven URLs (e.g. com.github.chrisbanes:PhotoView, com.crashlytics.sdk.android:crashlytics)

The former set is actually OK. Believe it or not, Google hasn’t moved all of its artifacts to its own repository, and some still come from JCenter.

Fabric / Crashlytics

Let’s now look at these individual Maven URL repositories. I can’t remember why I needed to add them initially or if they are still needed. The only one that is obvious is of course Fabric. Let’s go look at Fabric’s documentation to see why adding a repository was needed. The docs specify that you need to add a separate Maven URL repo:

buildscript {
    repositories {
        google()

        // The "Fabric" maven repository
        maven {
           url 'https://maven.fabric.io/public'
        }
    }
    ...
}

It’s not clear where to put maven.fabric.io in relation to the other repositories: should JCenter go before or after the Maven Fabric repository?

If JCenter is put before the Maven Fabric repository, Gradle will unnecessarily check JCenter for Crashlytics. If JCenter is listed after the Maven Fabric repository, Gradle will unnecessarily check maven.fabric.io for all other dependencies.

A test will confirm this. With the Maven Fabric repository removed, building the app is expected to fail, since Gradle can’t resolve the Fabric artifacts. A custom grep matcher is not needed this time. Gradle outputs the following error:

$ rm -rf ~/.gradle/caches/
$ ./gradlew app:buildDebug


> Could not resolve all artifacts for configuration ':classpath'.
   > Could not find io.fabric.tools:gradle:1.31.2.
     Searched in the following locations:
       - https://jcenter.bintray.com/io/fabric/tools/gradle/1.31.2/gradle-1.31.2.pom
       - https://plugins.gradle.org/m2/io/fabric/tools/gradle/1.31.2/gradle-1.31.2.pom
       - https://jitpack.io/io/fabric/tools/gradle/1.31.2/gradle-1.31.2.pom
     Required by:
         project :

Gradle does not find Fabric in JCenter or in any of the other Maven repositories. The correct fix is to put the Maven Fabric repository before JCenter, but instruct Gradle to use it only for resolving Crashlytics:

repositories {
    google {
        content {
            includeGroupByRegex "com.android.*"
            includeGroupByRegex "androidx.*"
            includeGroupByRegex "android.arch.*"
            includeGroupByRegex "com.google.*"
        }
    }
    maven {
        url 'https://maven.fabric.io/public'
        content {
            includeGroupByRegex "com.crashlytics.*"
            includeGroupByRegex "io.fabric.*"
        }
    }
    jcenter()
    maven { url "https://plugins.gradle.org/m2/" }
    maven { url "https://jitpack.io" }
    maven { url = uri("https://plugins.gradle.org/m2/") }
}

The process can be repeated until most misses are resolved:

  1. Remove a single Maven repository.
  2. Run the build task.
  3. Identify the artifact Gradle failed to resolve.
  4. Add that Maven repository back, add a matching includeGroupByRegex for that artifact, and move the repository above JCenter.
  5. Repeat until Gradle runs the task without error.
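Step 3 can be partly automated by pulling the failing groups out of the build error. The sketch below assumes the error keeps the "Could not find group:artifact:version." wording shown in the Fabric failure above; unresolved_groups is a helper name I invented for illustration.

```shell
#!/bin/sh
# Print the group of every artifact reported as
#   "Could not find <group>:<artifact>:<version>."
# in Gradle build output read from stdin, one group per line, de-duplicated.
# unresolved_groups is a hypothetical helper name.
unresolved_groups() {
    sed -n 's/.*Could not find \([^: ]*\):.*/\1/p' | sort -u
}

# Usage (Gradle reports build failures on stderr, hence 2>&1):
#   ./gradlew app:buildDebug 2>&1 | unresolved_groups
```

Each printed group is a candidate for an includeGroup or includeGroupByRegex line on the repository that was just removed.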

Results

Once I perform this process on my project, I end up with these resulting repositories blocks:

buildscript {
    repositories {
        google {
            content {
                includeGroupByRegex "com.android.*"
                includeGroupByRegex "androidx.*"
                includeGroupByRegex "android.arch.*"
                includeGroupByRegex "com.google.*"
            }
        }
        maven {
            url 'https://maven.fabric.io/public'
            content {
                includeGroupByRegex "com.crashlytics.*"
                includeGroupByRegex "io.fabric.*"
            }
        }
        jcenter()
    }
    ...
}

...       

allprojects {
    repositories {
        google {
            content {
                includeGroupByRegex "com.android.*"
                includeGroupByRegex "androidx.*"
                includeGroupByRegex "android.arch.*"
                includeGroupByRegex "com.google.*"
            }
        }
        maven {
            url "https://jitpack.io"
            content {
                includeGroupByRegex "com.github.PaulinaSadowska.*" // for RxWorkManagerObservers
                includeGroupByRegex "com.github.andrzejchm.RESTMock"
                includeGroupByRegex "com.github.chrisbanes"

            }
        }
        maven {
            url 'https://maven.fabric.io/public'
            content {
                includeGroupByRegex "io.fabric.*"
                includeGroupByRegex "com.crashlytics.*"
            }
        }
        jcenter()
    }
}

This technique helped identify repositories that are not needed and has the added benefit of documenting why these repositories are needed in the first place. If I later remove Crashlytics, I’ll easily see that I can remove a repository too.

A Few Things To Note

So far these experiments were run by building a debug app. However, your project may list repositories that hold dependencies needed only by other tasks (e.g. testing, app signing, other build variants). So you’ll want to test a bit more exhaustively, or at least be on the lookout for further “Could not resolve all artifacts” errors.

It’s also worth noting that all of these improvements kick in only when Gradle needs to download an artifact. If you are using Gradle’s Dependency Cache effectively, these improvements will not benefit you very much. The real benefit comes on your CI server, if it does not rely on Gradle’s Dependency Cache.

Library Authors

What are the best practices for library authors? The easiest thing is to host directly on JCenter. Most libraries do this already; however, several libraries common to Android development are hosted on neither Google’s repository nor JCenter. In their setup documentation, these libraries include both a dependency code snippet and an addition to the repositories block (e.g. Fabric).

The best way library authors can help their users is to document a repository addition that contains an includeGroup line and a note to place the repository above JCenter. It should be clear why both are needed. These steps protect the library from artifact spoofing: if users list the repository last and a malicious person puts a fake copy of the library on JCenter, users will build with the fake version.

These steps also protect users from increased build time. If users are told to put the custom repository before JCenter but are not given an includeGroup line, Gradle will check the custom repository for every artifact that is eventually resolved from JCenter, slowing down the build.

Taking It Even Further

Running the check command one more time shows that there are still a few missed attempts to download Google artifacts that are not hosted on Google’s repository. These misses also contribute to a slower build. To reduce this inefficiency further, use Gradle’s excludeGroup and excludeGroupByRegex methods.

repositories {
    google {
        content {
            includeGroupByRegex "com.android.*"
            includeGroupByRegex "androidx.*"
            includeGroupByRegex "android.arch.*"
            includeGroupByRegex "com.google.*"

            excludeGroupByRegex "com.google.dagger.*"
            excludeGroupByRegex "com.google.code.gson.*"
            excludeGroupByRegex "com.google.guava.*"
            excludeGroupByRegex "com.google.errorprone.*"
            excludeGroupByRegex "com.google.code.findbugs.*"
            excludeGroupByRegex "com.google.auto.*"
            excludeGroupByRegex "com.google.j2objc.*"
            excludeGroupByRegex "com.google.protobuf.*"
            excludeGroupByRegex "com.google.googlejavaformat.*"
        }
    }
    ...
}

If Google moves these dependencies to its own repository, it will likely stop supplying them through JCenter. This will be apparent from another “Could not resolve all artifacts” error. In that case, remove the offending excludeGroup line.
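Rather than collecting the exclusions by hand, a first draft of the list can be generated from the captured misses. This is a sketch under the same log-format assumption as before ("Detected non-existence of module '...' in resolver cache '...'"); missed_groups and exclude_lines are hypothetical helper names. It emits plain excludeGroup lines, which match a group exactly, so no regex escaping is needed.

```shell
#!/bin/sh
# From Gradle --debug log lines on stdin, list the distinct groups that
# were looked up in the given repository and not found there.
# Relies on the message format seen in the Timber trace:
#   Detected non-existence of module '<group:artifact:version>' in resolver cache '<Repo>'
# missed_groups and exclude_lines are hypothetical helper names.
missed_groups() {
    repo=$1
    grep -F "in resolver cache '$repo'" \
        | sed -n "s/.*Detected non-existence of module '\([^:']*\):.*/\1/p" \
        | sort -u
}

# Turn each missed group into a candidate excludeGroup line.
exclude_lines() {
    missed_groups "$1" | sed 's/.*/excludeGroup "&"/'
}

# Usage (output.txt is the file written by the grep command earlier):
#   exclude_lines Google < output.txt
```

Treat the output as a list to review, not to paste blindly. A group should be excluded only if none of its artifacts are served by that repository.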

Hopefully this technique can speed up some of your builds, especially your initial builds until Gradle’s Dependency Cache kicks in.

Note: Shortly after this writing, Google deprecated Fabric and its custom Maven repository. All Crashlytics artifacts are now hosted on Google’s Maven repository.

A big thanks goes to Martin Marconcini for reviewing this post.