Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for scanning APK files #3517

Merged
merged 25 commits into from
Nov 15, 2024
Merged

Add support for scanning APK files #3517

merged 25 commits into from
Nov 15, 2024

Conversation

joeleonjr
Copy link
Contributor

@joeleonjr joeleonjr commented Oct 28, 2024

Description:

APK (Android Package Kit) files are used by Android to install and distribute applications. These files are essentially zip archives with a specific directory structure for Android apps. We currently scan them as normal zip archives; however, most of the locations within an APK that secrets would live aren't properly decompiled/decoded during our regular zip file scanning. This PR adds special support to decompile APKs and then search them for secrets.

The most robust approach for searching APKs for secrets is to use a decompiler like jadx and then run TruffleHog against the output. The downsides are twofold: (1) this would require TruffleHog users to install jadx, (2) decompiling takes a while (up to several minutes) and a lot of memory. Instead of going this route, we rely on two golang libraries (dextk and apkparser) that balance functionality and performance to get us 80% of the way there without any external dependencies.

Through this PR, TruffleHog users can now scan for secrets in APK files. Here's what is specifically scanned:

XML

Android'sxml files need to be decoded in order to properly scan them because the places were secrets might live are often stored as reference IDs instead of plain text strings. This PR runs an Android XML decoder that uses the resources.arsc file as context to automatically resolve most resource reference IDs into their corresponding value.

AndroidManifest.xml

This approach includes the important AndroidManifest.xml file.

Strings.xml

One of the xml files that is most likely to contain a secret is called strings.xml. This file proved a challenge b/c during the APK compilation process, the file is transformed in a way that when we run unzip file.apk, we can't just see the strings.xml file. A tool like jadx would easily decompile it, but since we're not using it, we had to find a different way to get at that data.

We found that the resources.arsc file houses the key/value pairs that might contain secrets from the strings.xml file in the resources ID range: 0x7f000000-0x7fffffff. So we iterate through all resources of type string in that range, and search for secrets there. This seems to work for most scenarios, but admittedly we need greater testing.

Dex

A DEX file contains compiled code (it’s where the Java or Kotlin source code is transformed into bytecode). APK files generally include at least one DEX file, usually named classes.dex, but if the app is large or modular, there might be multiple DEX files—like classes2.dex, classes3.dex.

We run a golang-based Dex decompiler that helps us identify multiple relevant instruction types from within the bytecode. The one most likely to contain a secret is const-string. The rest are for providing context to our potential secret values.

The challenge is the keyword we need to clue in our scanning engine is often located too far from the secret given that our decompilation method is lightweight and imperfect. As a result, we implement keyword scanning against the decompiled code. If a keyword that we support is found, we then append that keyword to every value (read const-string instruction value) and toss it in for scanning. This ensures we don't lack appropriate coverage and is similar in implementation to our work on Postman.

Note: Since we can't get all of the scanner keywords via the engine pkg (import issues) like we did for Postman, we create a separate file named apk_keywords.go. In an ideal world, defaults.go is ripped out of engine and moved into pkg/detectors, so that we don't need to have the same data listed in two places.

Everything else

All other files are just read-in like normal and passed to a chunk for scanning. We likely won't see many secrets from these files, but it's worth a review. Examples of these types of files are: .json, .properties, etc.

Checklist:

  • Tests passing (make test-community)?
  • Lint passing (make lint this requires golangci-lint)?

@CLAassistant
Copy link

CLAassistant commented Oct 28, 2024

CLA assistant check
All committers have signed the CLA.

pkg/handlers/handlers.go Outdated Show resolved Hide resolved
pkg/handlers/apk.go Show resolved Hide resolved
@rgmz
Copy link
Contributor

rgmz commented Oct 28, 2024

Some other common XML files that can probably be safely excluded:

"error": "failed to decode xml file third_party/java_src/error_prone/project/annotations/Google_internal.gwt.xml: Chunk: 0x00006d20: Unknown chunk id 0x6d20"}
"error": "failed to decode xml file third_party/java_src/error_prone/project/annotations/Annotations.gwt.xml: Chunk: 0x00006d20: Unknown chunk id 0x6d20"}
"error": "failed to decode xml file jsr305_annotations/Jsr305_annotations.gwt.xml: Chunk: 0x00006d20: Unknown chunk id 0x6d20"}

https://github.com/smlbiobot/cr/blob/master/apk/2.0.1/com.supercell.clashroyale-2.0.1.decoded/unknown/third_party/java_src/error_prone/project/annotations/Google_internal.gwt.xml
https://github.com/google/guava/blob/master/guava-gwt/src/com/google/common/annotations/Annotations.gwt.xml
https://github.com/Goujer/kanojo_app/blob/72c8374c139e16d528bdbdd16a274851da68d753/app/src/main/jsr305_annotations/Jsr305_annotations.gwt.xml#L3

// lightweight version that will search for secrets in the most common files that contain them.
// And run in a fraction of the time (ex: 15 seconds vs. 5 minutes)

// ToDo: Scan nested APKs (aka XAPK files). ATM the archive.go file will skip over them.
Copy link

@bugbaba bugbaba Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for .xapk files here is how MobSF a popular security scanning tool handling it.

its unzipping the archive -> reading the manifest.json file -> extracting the apk with base id and only scanning that apk.

MobSF/Mobile-Security-Framework-MobSF@a558693

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.apkm is another common format (at least for ApkMirror).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bugbaba I appreciate the idea re: .xapk files. IMO the cleanest way to resolve the lack of .xapk scanning is to address it in the archive.go file. Basically, unzip .xapk like any other zip, and then call back out to the HandleFile function in handlers.go, so that any unique file that requires a special handler can be dealt with. And maybe it's not that exact approach, but something along those lines.

I'll put some effort into that in a different PR.

@bugbaba
Copy link

bugbaba commented Oct 31, 2024

Actual Filename should also be added to the output.

image

}

// processDexFile decodes the dex file and returns the relevant instructions
func processDexFile(ctx logContext.Context, rdr io.ReadCloser) (io.Reader, error) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have this outside of the apk specific case like if we find .dex file directly outside

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea. Might be something for a separate PR, just so that this can get out the door. But I like where you're going with this. I think we should probably have several additional file handlers to handle specific types like dex, pyc, etc.

@bugbaba
Copy link

bugbaba commented Oct 31, 2024

After I ran this against an android app that I know has an intentionally hard-coded API key for LokaliseToken which gets detected if we decompile the .apk file using the jadx and then scan its output. But when I scanned the same apk file using this code it wasn't detecting it.


The api key was in the .dex file which was correctly handled ✔️

image
The reConstString regex is correct and was detecting the line ✔️

image
The parseDexInstructions was correctly handling it ✔️

But it's not getting detected because, as with the majority of the trufflehog detectors it relies on keywords and since after the processing of the dex file, we only have the API key and no other text around it, it fails.

image
The decompiled code

@joeleonjr
Copy link
Contributor Author

joeleonjr commented Oct 31, 2024

After I ran this against an android app that I know has an intentionally hard-coded API key for LokaliseToken which gets detected if we decompile the .apk file using the jadx and then scan its output. But when I scanned the same apk file using this code it wasn't detecting it.

The api key was in the .dex file which was correctly handled ✔️

image The reConstString regex is correct and was detecting the line ✔️

image The parseDexInstructions was correctly handling it ✔️

But it's not getting detected because, as with the majority of the trufflehog detectors it relies on keywords and since after the processing of the dex file, we only have the API key and no other text around it, it fails.

image The decompiled code

@bugbaba is there anyway you could share that apk file?

@joeleonjr joeleonjr closed this Oct 31, 2024
@joeleonjr joeleonjr reopened this Oct 31, 2024
@bugbaba
Copy link

bugbaba commented Oct 31, 2024

@joeleonjr Couldn't find you in the discord server, Please ping ben10_01 on discord or nomanAli181 on twitter

@joeleonjr
Copy link
Contributor Author

@bugbaba Pleaes give this new implementation a try. Basically, we followed our process for Postman scanning and are now providing relevant keywords close into to all const-string values, so that we don't miss as much. It's still not perfect, but should be better and still performant.

@joeleonjr joeleonjr marked this pull request as ready for review November 1, 2024 19:08
@joeleonjr joeleonjr requested a review from a team as a code owner November 1, 2024 19:08
@joeleonjr joeleonjr requested a review from a team as a code owner November 1, 2024 19:08
@joeleonjr
Copy link
Contributor Author

Note to reviewers: Please look at how we handle an error caused by calling h.processAPK(). Specificallly, I thought it would make sense to call newArchiveHandler().HandleFile() if the APK parsing failed. That would ensure any file that is handled as an .apk would still be processed as a .zip in case it was mislabeled by the user (shouldn't happen, but who knows).

@bugbaba
Copy link

bugbaba commented Nov 2, 2024

It is now able to detect the key for that specific apk file.
But is there any way to avoid having pkg/handlers/apk_keywords.go file? As its become another file to maintain with each detector change.

@bugbaba Pleaes give this new implementation a try. Basically, we followed our process for Postman scanning and are now providing relevant keywords close into to all const-string values, so that we don't miss as much. It's still not perfect, but should be better and still performant.

Copy link
Collaborator

@ahrav ahrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wanted to provide my initial comments before I dive into the actual apk handler logic.

pkg/handlers/apk.go Outdated Show resolved Hide resolved
pkg/handlers/handlers.go Outdated Show resolved Hide resolved
@dustin-decker
Copy link
Contributor

We should add a feature flag for this called EnableAPKHandler which would be off by default (empty bool value) in the feature package: https://github.com/trufflesecurity/trufflehog/blob/main/pkg/feature/feature.go

We can turn it on by default in OSS, and when it's imported into Enterprise it will be off by default unless we override it with a feature flag.

Joe, let me know if you want to sync on how this works.

Copy link
Collaborator

@ahrav ahrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ve taken an initial pass at the review and covered a good portion of the implementation. Let me know if it’s easier to go over any of this synchronously. I’ll finish the remainder of the review tomorrow morning.

pkg/handlers/apk.go Outdated Show resolved Hide resolved
pkg/handlers/apk.go Outdated Show resolved Hide resolved
pkg/handlers/apk.go Outdated Show resolved Hide resolved
pkg/handlers/apk.go Outdated Show resolved Hide resolved
pkg/handlers/apk.go Show resolved Hide resolved
pkg/handlers/apk.go Outdated Show resolved Hide resolved
pkg/handlers/apk.go Outdated Show resolved Hide resolved
pkg/handlers/apk.go Outdated Show resolved Hide resolved
pkg/handlers/apk.go Show resolved Hide resolved
Copy link
Collaborator

@ahrav ahrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, forgot to hit submit review last night for a couple more comments. My mistake.

pkg/handlers/apk.go Outdated Show resolved Hide resolved
pkg/handlers/apk.go Show resolved Hide resolved
pkg/handlers/apk.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@ahrav ahrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few final comments, but otherwise this looks great to me! With the changes I mentioned, we managed to reduce scan times by ~60% on a ~35MB .apk file I tested, dropping from ~9s to ~3s

Screenshot 2024-11-14 at 12 10 55 PM

pkg/handlers/apk.go Show resolved Hide resolved
pkg/handlers/apk.go Outdated Show resolved Hide resolved
pkg/handlers/apk.go Outdated Show resolved Hide resolved
pkg/handlers/apk.go Outdated Show resolved Hide resolved
pkg/handlers/apk.go Outdated Show resolved Hide resolved
pkg/handlers/apk.go Outdated Show resolved Hide resolved
pkg/handlers/apk.go Outdated Show resolved Hide resolved
@ahrav
Copy link
Collaborator

ahrav commented Nov 15, 2024

Great job, @joeleonjr! 🥳 This looks like it was quite the project to get working. I’m excited to see it in action—and to handle the inevitable user question about verifying findings we found in their .apk file. 🤣

Copy link
Contributor

@dustin-decker dustin-decker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now, but i'll defer to @ahrav for final approval

Copy link
Collaborator

@ahrav ahrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. Thanks again for getting this implemented. 🙇

@joeleonjr joeleonjr merged commit 8f2ebc9 into main Nov 15, 2024
13 checks passed
@joeleonjr joeleonjr deleted the apk-scanning branch November 15, 2024 18:25
@iamunixtz
Copy link

usage of trufflehog to scanning apk?

@ahrav
Copy link
Collaborator

ahrav commented Dec 5, 2024

usage of trufflehog to scanning apk?

Yes, that's correct. Feel free to check out our blog for additional details.

@iamunixtz
Copy link

trufflehog shoul improve regex on findings api keys and others secrets like i test on apk which contains api keys but it failed to get them

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

7 participants