-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Question: Markdown, storage and security #1184
Comments
I don't think pre-rendering the html will helping with XSS since the same user generated content will show on the page regardless of which method you take. The only security issues I have seen in marked have been ReDoS vulnerabilities. If you want to protect against someone using up resources on your servers in a ReDoS attack, you should pre-render the markdown in a separate process, and if that process takes longer than some threshold (more than a second maybe) cancel it and show an error to the user. I can't really say how well marked's |
On our project we always store the raw, and render/sanitize on request only since as you stated it's super fast. Our establishing owner wanted to store the processed raw data however with sanitizers some security issue may come up (there is precedence) and you would need to regenerate the statics again. That's a hefty CPU task to continually do and could in itself generate a DoS of sorts. e.g. we don't currently cache this process. As far as doing it twice, perhaps on client side, Userscripts can usually defeat this so you wouldn't want that security issue... so it's best to have it done server side as much as possible. e.g. CPU time is server side instead of client side (which sucks but is a fact of life in our arena). |
Hi, thanks for the replies - much appreciated, that's some useful info. I've noticed that Github uses server-side technology only, no JS rendering. Presumably they're storing both raw and rendered versions of all posts. That's a lot of database storage! I'm still unsure of the best way to do this, it seems like there's no definitive answer. Storing the raw data and rendering client-side with JS would obviously be ideal, but only if it's going to be secure. I do wonder what other users of markedjs do with the issue of data/storage/security.. |
@oasiz The reason why GitHub renders on the server-side is because they use Ruby to render Markdown to HTML (Kramdown I think). They literally can't render on the client (unless they find a ruby-to-wasm compiler or start using marked.js 😄) |
@oasiz Can you clarify what security concerns you have with rendering the raw markdown on the client side? I'm guessing the scenario you're imagining is: Attacker Sets profile to malicious content. Victim On visiting the attacker's profile, the server returns the raw data. While rendering it something bad happens. Is the "something bad" what you're concerned about, or am I misunderstanding? |
My 2 cents. You either:
Converting into html on the submitting client and then storing that on the server opens security concerns because the client-side code can be manipulated by a malicious user. ps. Github parses markdown content into html by using a customized version of the C implementation of commonmark |
This approach should work fine. I would use a worker pool to render the JS with a timeout.
I think this is a better approach. Since the markdown being rendered is untrusted, perhaps render it in a WebWorker and enforce a timeout. Hopefully this project doesn't have any more vulnerable regexes but you never know. Possible instructions here.
+1. Under the "malicious client pre-renders the HTML" design, the user can upload whatever HTML they want to. They don't have to execute your code. |
Hi, appreciate the replies :-)
Yes, absolutely correct.
That's what I'd like to do if you guys feel it's secure, as it only involves storing the original data and leaving the rest client-side. Less storage space, less server CPU. Win-win right?! |
I'd recommend you consider the WebWorker-based approach I suggested above. Does it seem feasible for you? That ensures that if we slip up and let a REDOS issue through, your clients can't be attacked by a malicious profile. |
@davisjam Can you add a web worker example to the docs? |
@davisjam Yes I will be implementing whatever is necessary in order to secure the app, so I'll be happy to help in any way I can. Always like a bit of free advertising! Hopefully this won't be a problem, but the site I'm working on isn't currently open to the public - although I'm hoping to release a basic version in a couple of weeks or less. I've never used WebWorkers, so looking forward to learning something new :-) |
@oasiz Are you still interested in adding WebWorkers to the docs? |
I am googling how to use |
Please ignore my last comment. I was wrong. It's right to have |
There is a PR to add web workers and node worker threads to the docs #1432 The PR is just waiting for worker threads to be stable in node. |
Hi guys!
I realise this is a bug reporting section so I hope it's OK to post this, but I was hoping for some expert advice regarding Markdown and security.
Firstly, my situation is that I want to allow members on our website to have a section of their user profile that they can edit using the markdown format, parsed by Marked.
Importantly, the data cannot be trusted - anyone can register for an account so I need to be careful with what can/can't be uploaded and displayed.
Here's how things currently stand on my development site:
Hopefully that makes sense. Now, my question(s)..
Is that overkill? To the best of your knowledge is that safe enough?
In an idea world I'd like to cut out having to store the data twice and just work with the raw data client-side as the rendering seems to be blazingly fast - but from a security standpoint I don't know whether this is doable.
Main question: Would it be horribly irresponsible to just work with the raw data and just output:
marked([raw-unsanitized-data], { sanitize: true })
Possibly as an additional safeguard use DOMPurify:
DOMPurify.sanitize(marked(<raw-unsanitized-data>, { sanitize: true }))
? I can't help but feel like it probably would be, but I'd like to hear it from a real expert.
PS Further info, all site data is SSL and all cookies are secure/HttpOnly etc.
OK, if you're still here - thanks for staying awake!
But seriously, I'm no expert on this by any means so I really appreciate any insight/advice you can give.
Best Regards,
Rob
The text was updated successfully, but these errors were encountered: