Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid detecting other files as SVG #594

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

fcharlie
Copy link

eg:

var svgText=`<svg></svg>`

before: image/svg+xml
after: text/plain; charset=utf-8

eg:

```golang
var svgText=`<svg></svg>`
```
@gabriel-vasile
Copy link
Owner

gabriel-vasile commented Oct 29, 2024

@fcharlie, the benchmarks report that that svg detection time increased from 200ns to 5000ns and memory allocs from 0 to 5. svg is now the slowest detector.

I'm not against an implementation using regex, like the one in this PR, but I'd like to explore other options as well:

Edit: the slowness is not a blocker, as long as detection accuracy is improved. Finding out which approach provides best accuracy/performance trade-off requires more effort and I would appreciate a lot if you helped with that.

@fcharlie
Copy link
Author

fcharlie commented Oct 30, 2024

@gabriel-vasile Introducing regular expressions to detect svg may indeed reduce performance.

Below is a simple svg detection function I implemented with reference to file/file, but I cannot guarantee that it can fully cover the detection of svg files:

var (
	xmlPrefix         = []byte("<?xml version=")
	xmlVersionRegex   = regexp.MustCompile(`['"\ \t]*[0-9.]+['"\ \t]*`)
	svgDocumentPrefix = []byte("<!DOCTYPE svg PUBLIC")
	svgPrefix         = []byte("<svg")
)

func detectSVG(b []byte) bool {
	if bytes.HasPrefix(b, xmlPrefix) {
		mlen := min(len(b), 10+len(xmlPrefix))
		return xmlVersionRegex.Match(b[len(xmlPrefix):mlen]) && bytes.Contains(b, []byte("<svg"))
	}
	if bytes.HasPrefix(b, svgDocumentPrefix) {
		return true
	}
	return bytes.HasPrefix(b, svgPrefix)
}

@fcharlie
Copy link
Author

@gabriel-vasile Optimized svg detection has been reimplemented

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants