
Comparing changes

base repository: 003random/getJS
base: v1.0.0
head repository: 003random/getJS
compare: v2.0.0
  • 1 commit
  • 8 files changed
  • 1 contributor

Commits on Jul 7, 2024

  1. 86ea726
Showing with 648 additions and 328 deletions.
  1. +21 −0 LICENSE
  2. +123 −92 README.md
  3. +133 −0 extractor/extractor.go
  4. +10 −0 go.mod
  5. +35 −0 go.sum
  6. +91 −236 main.go
  7. +50 −0 runner/objects.go
  8. +185 −0 runner/runner.go
21 changes: 21 additions & 0 deletions LICENSE
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 003random

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
215 changes: 123 additions & 92 deletions README.md
@@ -1,109 +1,140 @@
# GetJS
[![License](https://img.shields.io/badge/license-MIT-_red.svg)](https://opensource.org/licenses/MIT)
[![contributions welcome](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/003random/getJS/issues)
<h2 align="center">JavaScript Extraction CLI & Package</h2>
<p align="center">
<a href="https://pkg.go.dev/github.com/003random/getJS">
<img src="https://pkg.go.dev/badge/github.com/003random/getJS">
</a>
<a href="https://github.com/003random/getJS/releases">
<img src="https://img.shields.io/github/release/003random/getJS.svg">
</a>
<a href="https://github.com/003random/getJS/blob/master/LICENSE">
<img src="https://img.shields.io/badge/license-MIT-blue.svg">
</a>
</p>

getJS is a tool to extract all the javascript files from a set of given urls.

The urls can also be piped to getJS, or you can specify a singel url with the -url argument. getJS offers a range of options,
[getJS](https://github.com/003random/getJS) is a versatile tool designed to extract JavaScript sources from web pages. It offers both a command-line interface (CLI) for straightforward URL processing and a package interface for more customized integrations.

varying from completing the urls, to resolving the files.
## Table of Contents

## Prerequisites
- [Installation](#installation)
- [CLI Usage](#cli-usage)
- [Options](#options)
- [Examples](#examples)
- [Package Usage](#package-usage)
- [Importing the Extractor](#importing-the-extractor)
- [Example](#example)
- [Version Information](#version-information)
- [Contributing](#contributing)
- [License](#license)

Make sure you have [GO](https://golang.org/) installed on your system.
## Installation

### Installing
To install `getJS`, use the following command:

getJS is written in GO. You can install it with `go get`:
`go get github.com/003random/getJS`

```
go install github.com/003random/getJS@latest
```
## CLI Usage

# Usage
Note: When you supply urls from different sources, e.g. with stdin and an input file, it will add all the urls together :)
Example: `echo "https://github.com" | getJS --url https://example.com --input domains.txt`

To get all options, do:
```bash
getJS -h
```


| Flag | Description | Example |
|------|-------------|---------|
| --url | The url to get the javascript sources from | getJS --url https://poc-server.com |
| --method | The request method. e.g. POST or GET. Default: "GET"| getJS --url https://poc-server.com --method POST |
| --timeout | The request timeout. Default: 10 (secs) | getJS --url https://poc-server.com --timeout 15 |
| --insecure | Skip SSL certificate verification. Use when the cert is expired or invalid | getJS --url https://poc-server.com --insecure |
| --header | Custom request header(s) | getJS --url https://poc-server.com --header "Authorization: Bearer token" |
| --input | Input file with urls | getJS --input domains.txt |
| --output | The file where to save the output to | getJS --output output.txt |
| --verbose | Display info of what is going on | getJS --verbose |
| --complete | Complete the urls. e.g. /js/index.js -> htt<span></span>ps://example.<span></span>com/js/index.js | getJS --complete |
| --resolve | Resolve the output and filter out the non existing files (Can only be used in combination with --complete) | getJS --complete --resolve |
| --nocolors | Don't color the output | getJS --nocolors |

## Examples

![screenshot](https://poc-server.com/getJS/screenshot_.png)


getJS supports stdin data. To pipe urls to getJS, use the following:

```bash
$ cat domains.txt | getJS
```

To save the js files, you can use:
```bash
$ getJS --complete --url https://poc-server.com | xargs wget
### Options

`getJS` provides several command-line options to customize its behavior:

- `-url string`: The URL from which JavaScript sources should be extracted.
- `-input string`: Optional URLs input files. Each URL should be on a new line in plain text format. Can be used multiple times.
- `-output string`: Optional output file where results are written to. Can be used multiple times.
- `-complete`: Complete/Autofill relative URLs by adding the current origin.
- `-resolve`: Resolve the JavaScript files. Can only be used in combination with `--complete`.
- `-threads int`: The number of processing threads to spawn (default: 2).
- `-verbose`: Print verbose runtime information and errors.
- `-method string`: The request method used to fetch remote contents (default: "GET").
- `-header string`: Optional request headers to add to the requests. Can be used multiple times.
- `-timeout duration`: The request timeout while fetching remote contents (default: 5s).
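
Several of the options above "can be used multiple times". A minimal sketch of how a repeatable flag such as `-header` could be collected with the standard `flag` package is shown below. The names `headerFlags`, `toHeader`, and `parseHeaders` are hypothetical and are not taken from the getJS sources; the actual runner may handle this differently.

```go
package main

import (
	"flag"
	"fmt"
	"net/http"
	"strings"
)

// headerFlags is a hypothetical flag.Value that records every -header occurrence.
type headerFlags []string

// String renders the collected values; it tolerates a nil receiver.
func (h *headerFlags) String() string {
	if h == nil {
		return ""
	}
	return strings.Join(*h, ", ")
}

// Set is called by the flag package once per occurrence of the flag.
func (h *headerFlags) Set(v string) error {
	*h = append(*h, v)
	return nil
}

// toHeader converts "Key: Value" strings into an http.Header,
// skipping entries that contain no colon.
func toHeader(raw []string) http.Header {
	out := http.Header{}
	for _, line := range raw {
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		out.Add(strings.TrimSpace(key), strings.TrimSpace(value))
	}
	return out
}

// parseHeaders uses a private FlagSet so the sketch leaves global flag state alone.
func parseHeaders(args []string) (http.Header, error) {
	var headers headerFlags
	fs := flag.NewFlagSet("getJS-sketch", flag.ContinueOnError)
	fs.Var(&headers, "header", "request header, may be repeated")
	if err := fs.Parse(args); err != nil {
		return nil, err
	}
	return toHeader(headers), nil
}

func main() {
	h, err := parseHeaders([]string{"-header", "User-Agent: foo bar", "-header", "X-Test: 1"})
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(h.Get("User-Agent"), h.Get("X-Test"))
}
```

The resulting `http.Header` has the same type as the `headers` argument of `extractor.FetchResponse`, so a value built this way could be passed straight through.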

### Examples

#### Extracting JavaScript from a Single URL

`getJS -url https://destroy.ai`

or

`curl https://destroy.ai | getJS`

#### Using Custom Request Options

`getJS -url "http://example.com" -header "User-Agent: foo bar" -method POST --timeout=15s`

#### Processing Multiple URLs from a File

`getJS -input foo.txt -input bar.txt`

#### Saving Results to an Output File

`getJS -url "http://example.com" -output results.txt`

## Package Usage

### Importing the Extractor

To use `getJS` as a package, you need to import the `extractor` package and utilize its functions directly.

### Example

```Go
package main

import (
	"fmt"
	"log"
	"net/http"
	"net/url"

	// The module path declared in go.mod carries the /v2 suffix.
	"github.com/003random/getJS/v2/extractor"
)

func main() {
	baseURL, err := url.Parse("https://google.com")
	if err != nil {
		log.Fatalf("Error parsing base URL: %v", err)
	}

	resp, err := extractor.FetchResponse(baseURL.String(), "GET", http.Header{})
	if err != nil {
		log.Fatalf("Error fetching response: %v", err)
	}
	defer resp.Body.Close()

	// Custom extraction points (optional): tag name -> attributes to read.
	extractionPoints := map[string][]string{
		"script": {"src", "data-src"},
		"a":      {"href"},
	}

	sources, err := extractor.ExtractSources(resp.Body, extractionPoints)
	if err != nil {
		log.Fatalf("Error extracting sources: %v", err)
	}

	// Filtering and extending extracted sources. Options run in order:
	// relative URLs are completed first, then checked for reachability.
	filtered, err := extractor.Filter(sources, extractor.WithComplete(baseURL), extractor.WithResolve())
	if err != nil {
		log.Fatalf("Error filtering sources: %v", err)
	}

	// Drain the channel; it is closed once all sources have been emitted.
	for source := range filtered {
		fmt.Println(source.String())
	}
}
```
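
The `WithComplete` option in the example above resolves relative sources against the base URL. The sketch below illustrates that resolution step using only `net/url`; `completeURL` is a hypothetical helper written for this illustration and is not part of the `extractor` package.

```go
package main

import (
	"fmt"
	"net/url"
)

// completeURL is a hypothetical helper: it returns ref unchanged when it
// already has a scheme, and otherwise resolves it against base.
func completeURL(base, ref string) (string, error) {
	b, err := url.Parse(base)
	if err != nil {
		return "", err
	}
	r, err := url.Parse(ref)
	if err != nil {
		return "", err
	}
	if r.IsAbs() {
		return r.String(), nil
	}
	return b.ResolveReference(r).String(), nil
}

func main() {
	base := "https://example.com/app/index.html"
	for _, ref := range []string{
		"/js/index.js",               // root-relative path
		"js/app.js",                  // path relative to the page
		"//cdn.example.net/lib.js",   // scheme-relative reference
		"https://other.example/x.js", // already absolute
	} {
		out, err := completeURL(base, ref)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(ref, "->", out)
	}
}
```

Scheme-relative references such as `//cdn.example.net/lib.js` have no scheme, so they take the resolution branch and inherit the scheme of the base URL.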

If you would like the output to be in JSON format, you can combine it with [@Tomnomnom's](https://github.com/tomnomnom) [toJSON](https://github.com/tomnomnom/hacks/tree/master/tojson):
```bash
$ getJS --url https://poc-server.com | tojson
```

To feed urls from a file use:
```bash
$ getJS --input domains.txt
```

To save the results to a file, and don't display anything, use:
```bash
$ getJS --url https://poc-server.com --output results.txt
```

If you want to have a list of full urls as output use:
```bash
$ getJS --url domains.txt -complete
```

If you want to only show the existing js files, use:
```bash
$ getJS --url domains.txt --complete --resolve
```

## Built With

* [GO](http://golang.org/) - GOlanguage
* [Goquery](https://github.com/PuerkitoBio/goquery) - HTML parser with syntaxes like jquery, in GO

## Version Information

This is the v2 version of `getJS`. The original version can be found under the tag [v1](https://github.com/003random/getJS/tree/v1).

## Contributing

You are free to submit any issues and/or pull requests :)
Contributions are welcome! Please open an issue or submit a pull request for any bugs, feature requests, or improvements.

## License

This project is licensed under the MIT License.

## Acknowledgments

* [@jimen0](https://github.com/jimen0) for helping getting me started with GO


---

*This is my first tool written in GO. I created it to learn the language more. (useful feeback is always welcome!)*
This project is licensed under the MIT License. See the [LICENSE](https://github.com/003random/getJS/blob/master/LICENSE) file for details.
133 changes: 133 additions & 0 deletions extractor/extractor.go
@@ -0,0 +1,133 @@
package extractor

import (
	"fmt"
	"io"
	"log"
	"net/http"
	"net/url"

	"github.com/PuerkitoBio/goquery"
)

// ExtractionPoints defines the default HTML tags and their attributes from which JavaScript sources are extracted.
var ExtractionPoints = map[string][]string{
	"script": {"src", "data-src"},
}

// FetchResponse fetches the HTTP response for the given URL.
func FetchResponse(u string, method string, headers http.Header) (*http.Response, error) {
	req, err := http.NewRequest(method, u, nil)
	if err != nil {
		return nil, err
	}

	req.Header = headers

	return http.DefaultClient.Do(req)
}

// ExtractSources extracts all JavaScript sources found in the provided HTTP response reader.
// The optional extractionPoints can be used to overwrite the default extraction points map
// with a set of HTML tag names, together with a list of what attributes to extract from.
func ExtractSources(input io.Reader, extractionPoints ...map[string][]string) (<-chan url.URL, error) {
	doc, err := goquery.NewDocumentFromReader(input)
	if err != nil {
		return nil, err
	}

	var (
		urls   = make(chan url.URL)
		points = ExtractionPoints
	)

	if len(extractionPoints) > 0 {
		points = extractionPoints[0]
	}

	go func() {
		defer close(urls)
		for tag, attributes := range points {
			doc.Find(tag).Each(func(i int, s *goquery.Selection) {
				for _, a := range attributes {
					if value, exists := s.Attr(a); exists {
						u, err := url.Parse(value)
						if err != nil {
							log.Println(fmt.Errorf("invalid attribute value %s cannot be parsed to a URL: %w", value, err))
							continue
						}

						urls <- *u
					}
				}
			})
		}
	}()

	return urls, nil
}

// Filter applies options to filter URLs from the input channel.
func Filter(input <-chan url.URL, options ...func([]url.URL) []url.URL) (<-chan url.URL, error) {
	output := make(chan url.URL)
	go func() {
		defer close(output)
		var urls []url.URL
		for u := range input {
			urls = append(urls, u)
		}

		for _, option := range options {
			urls = option(urls)
		}

		for _, u := range urls {
			output <- u
		}
	}()
	return output, nil
}

// WithComplete is an option to complete relative URLs.
func WithComplete(base *url.URL) func([]url.URL) []url.URL {
	return func(urls []url.URL) []url.URL {
		var result []url.URL
		for _, u := range urls {
			result = append(result, complete(u, base))
		}
		return result
	}
}

// WithResolve is an option to filter URLs that resolve successfully.
func WithResolve() func([]url.URL) []url.URL {
	return func(urls []url.URL) []url.URL {
		var result []url.URL
		for _, u := range urls {
			if resolve(u) {
				result = append(result, u)
			}
		}
		return result
	}
}

// complete completes relative URLs by adding the base URL.
func complete(source url.URL, base *url.URL) url.URL {
	if source.IsAbs() {
		return source
	}
	return *base.ResolveReference(&source)
}

// resolve checks if the provided URL resolves successfully.
func resolve(source url.URL) bool {
	resp, err := http.Get(source.String())
	if err != nil {
		return false
	}
	defer resp.Body.Close()

	_, err = io.Copy(io.Discard, resp.Body)
	return err == nil && (resp.StatusCode >= http.StatusOK && resp.StatusCode < http.StatusMultipleChoices)
}
}
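
A note on the `Filter` design in the file above: it buffers every value from the input channel, applies each option to the whole slice in the order given, and only then emits results on an unbuffered channel, so the caller has to drain the returned channel for the goroutine to finish. The sketch below reproduces that pattern with plain strings so it runs without network access; `option`, `withPrefix`, `withKeep`, `apply`, `run`, and `isJS` are hypothetical names and not part of the package.

```go
package main

import (
	"fmt"
	"strings"
)

// option transforms the full slice, like the func([]url.URL) []url.URL options above.
type option func([]string) []string

// withPrefix prepends prefix to every item (stand-in for WithComplete).
func withPrefix(prefix string) option {
	return func(items []string) []string {
		out := make([]string, 0, len(items))
		for _, s := range items {
			out = append(out, prefix+s)
		}
		return out
	}
}

// withKeep retains only items accepted by pred (stand-in for WithResolve).
func withKeep(pred func(string) bool) option {
	return func(items []string) []string {
		var out []string
		for _, s := range items {
			if pred(s) {
				out = append(out, s)
			}
		}
		return out
	}
}

// apply buffers the input, runs the options in order, then emits the result.
func apply(in <-chan string, opts ...option) <-chan string {
	out := make(chan string)
	go func() {
		defer close(out)
		var items []string
		for s := range in {
			items = append(items, s)
		}
		for _, o := range opts {
			items = o(items)
		}
		for _, s := range items {
			out <- s
		}
	}()
	return out
}

// isJS reports whether s ends in ".js".
func isJS(s string) bool { return strings.HasSuffix(s, ".js") }

// run feeds inputs through apply and drains the output channel into a slice.
func run(inputs []string, opts ...option) []string {
	in := make(chan string)
	go func() {
		defer close(in)
		for _, s := range inputs {
			in <- s
		}
	}()
	var got []string
	for s := range apply(in, opts...) {
		got = append(got, s)
	}
	return got
}

func main() {
	fmt.Println(run([]string{"/a.js", "/b.css"}, withPrefix("https://example.com"), withKeep(isJS)))
}
```

Because the options run in sequence over the whole slice, their order matters: completing before resolving, as in the README example, means the reachability check sees absolute URLs.
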
10 changes: 10 additions & 0 deletions go.mod
@@ -0,0 +1,10 @@
module github.com/003random/getJS/v2

go 1.22

require github.com/PuerkitoBio/goquery v1.8.1

require (
	github.com/andybalholm/cascadia v1.3.1 // indirect
	golang.org/x/net v0.7.0 // indirect
)