Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v3.2.1 is badly broken, I think because of PR #408 #413

Closed
pond opened this issue Aug 9, 2024 · 5 comments · Fixed by #414
Closed

v3.2.1 is badly broken, I think because of PR #408 #413

pond opened this issue Aug 9, 2024 · 5 comments · Fixed by #414

Comments

@pond
Copy link

pond commented Aug 9, 2024

I'm driving the pipeline in an under-modernisation app for sanitisation only. Configuration:

  class CustomHtmlSanitizationOptions
    CONFIG = Selma::Sanitizer::Config.freeze_config({
      elements: [
        "a",
        "abbr",
        "b",
        # (etc)
    })
  end

Pipeline is super simple:

pipeline = HTMLPipeline.new(
  sanitization_config: CustomHtmlSanitizationOptions::CONFIG,
)

Running with v3.2.1 gives a completely broken result:

irb(main):007> pipeline.call("Some plain text")
=> {}

Running with v3.2.0 works as expected:

irb(main):006> pipeline.call("* This is a bullet point")
=> {:output=>"Some plain text"}

Related gems (albeit not in use as there's no convert filter in this example), all at their latest version as I type this (RedCloth 4.3.4, CommonMarker 1.1.5):

# Textile support [https://rubygems.org/gems/RedCloth]
#
gem 'RedCloth', '~> 4.3'

# Markdown with GFM extensions etc. [https://rubygems.org/gems/commonmarker]
#
gem 'commonmarker', '~> 1.1'

Changing only html-pipeline from = 3.2.1 to = 3.2.0 with bundle update html-pipeline to minimise any other changes in Gemfile.lock toggles the behaviour between working (v3.2.0) or not (v3.2.1).

The error is #408. Noting in passing as a separate issue that I think an internal, hidden, entirely hard-coded and self-described artbirary processing limit of 5MB is a terrible idea that at the very least should've been exposed as config via e.g. ENV var or something, the real bug is here:

v3.2.0...v3.2.1#diff-a879b6756943b6f26d6273ebf61219768dd4ad3482420b8afb36405fe53dec1cR187

...specifically, end unless @convert_filter.nil? # no html, so no sanitization.

Wrong. No convert filter does not mean no sanitisation. Those two concepts are totally unrelated. And what's more, even if you did think you didn't need to sanitise, you should then at least still set result[:output] to the unmodified input.

As far as I can see the code is - with apologies if I offend its author - logically flawed and flat-out broken. The PR should be reverted so it can be reworked and a fixed v3.2.2 released as soon as possible.

I've locked to v.3.2.0 in Gemfile for now.

@gjtorikian
Copy link
Owner

I'm only moderately offended. 😄 But I'm mostly embarrassed. I only had a test for sanitization filter with a convert filter; I don't think it ever occurred to me that someone would want to pass pre-existing HTML through the sanitizer.

In any event, 3.2.2, out soon, will fix this. Thanks for the report!

@pond
Copy link
Author

pond commented Aug 9, 2024

Many thanks. I've been updating a series of Rails 1 and 2 (yep, that old!) apps by essentially rebuilding them into new Rails 7.x containers. HTML Pipeline is a great replacement for Textile processing and ad hoc sanitizers in a forum engine and, likewise, a fault ticket description and comment system. It gives me a relatively easy route to offer Markdown (probably preferred by most for new material) or Textile (for old material) posts along with fine-grained sanitisation control and custom node filters for some edge cases.

The weird use case where I hit this bug is a Wiki engine (Instiki) which has a particularly old architecture and strange (but effective) approach to data conversion and cacheing. There, I just copypasta'd my sanitisation code (eventually I'll centralise that somewhere) but dropped out the converter, because the Wiki engine was already converting stuff, including applying [[WikiWords]] conversion etc., and then passing that for sanitisation.

This is how come I ended up with using the pipeline just for sanitisation (via what will eventually be a cross-application, shared sanitisation approach). Maybe one day I'll dig deeper into the Wiki engine and use a full pipeline including converters, but for now, it's more than enough work just rebuilding all these things under very strict time constraints as-is.

@pond
Copy link
Author

pond commented Dec 15, 2024

@gjtorikian Is there still a plan for 3.2.2 with the above fix in the reasonably near future, or is workload prohibiting progress for a while?

@gjtorikian
Copy link
Owner

3.2.2 was released back in August: https://github.com/gjtorikian/html-pipeline/releases/tag/v3.2.2

But, it looks like the automated workflow to publish the gem failed: https://github.com/gjtorikian/html-pipeline/actions/runs/10322138317

I just kicked off a rebuild and can confirm that 3.2.2 should be live: https://rubygems.org/gems/html-pipeline/versions/3.2.2

Thanks for bringing this to my attention!

@pond
Copy link
Author

pond commented Dec 16, 2024

Ah, that's great news! Many thanks for pushing that - yes, it is indeed now visible on RubyGems 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants