Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix scheduled report description double encoding error and provide way to do proper sanitization bit by bit. #7997

Closed
wants to merge 11 commits into from

Conversation

diosmosis
Copy link
Member

Fixes the double encoding error in #7987 by storing scheduled reports descriptions unsanitized and sanitizing only on outputting to HTML.

Also includes the following changes which will allow us to fix #4231 one change at a time:

  • Created RequestSanitizer service and moved all sanitization logic to it (all Common::(un)sanitizeInputValue* methods).
  • Changed API\Request's request parameter accessing & sanitizing logic so it didn't use Common::getRequestVar() and instead just accessed the value manually. Moved the sanitization logic for this to a method in RequestSanitizer.
  • Allow DI config to blacklist certain API method parameters from being sanitized. This allows us to change one part of Piwik at a time to input only sanitization. For example, we can blacklist the add/update site methods when making sure site names are stored unsanitized in the DB in one PR, then in another PR deal w/ another part of Piwik.

Eventually, after all the input sanitization is removed in favor of output sanitization, the RequestSanitizer class should be completely removed. Should be done for 3.0, since it will change the API.

Extra Changes

  • Fixed an issue w/ double encoding in exception messages. Exception messages were always escaped, even for non-XML formats (eg, JSON). I changed this so only XML/HTML formats escape messages. Includes tests.
  • Sanitize HTML provided to notification.js & Piwik_Popover.showError() so there will be no XSS when showing exception messages (or other messages) that contain HTML. Uses angular.js $sanitize service.

Sanitization Change Workflow

With this approach, you can change one part of Piwik to use the sanitize on output approach by doing the following:

  1. Find the API method that creates/updates the entity in question. For the API method parameter that should be stored unsanitized, blacklist it through DI config. This will remove sanitization for that specific parameter.

    To blacklist an API parameter, add the method and parameter name to the 'RequestSanitizer.requestParameterSanitizeBlacklist' array in DI config (see ScheduledReports/config/config.php for an example).

  2. Make sure the property is stored w/o sanitization. Normally, you won't have to do anything in this regard.

  3. Make sure the property is properly escaped when outputting HTML/XML.

  4. Test all places where the property can be displayed in the UI for XSS. (including exception messages)

Once all parts of Piwik are changed, we can make the breaking change of not sanitizing API method parameters before calling the method in 3.0.

TODO:

  • add issue for removing magic quote hack/workaround in Piwik 3.0
  • create new Update in ScheduledReports that unsanitizes report descriptions in the DB change
  • make sure to test all other scheduled reports formats + XSS in scheduled reports pages

Refs #7987 and #4231

@diosmosis diosmosis added c: Security For issues that make Matomo more secure. Please report issues through HackerOne and not in Github. c: Platform For Matomo platform changes that aren't impacting any of our APIs but improve the core itself. Pull Request WIP Indicates the current pull request is still work in progress and not ready yet for a review. c: Design / UI For issues that impact Matomo's user interface or the design overall. labels May 26, 2015
@diosmosis diosmosis added this to the 2.14.0 milestone May 26, 2015
@diosmosis diosmosis self-assigned this May 26, 2015
$isValuePresent = isset($parametersRequest[$name]) && (is_array($parametersRequest[$name]) || strlen($parametersRequest[$name]) !== 0);

$value = $isValuePresent ? $parametersRequest[$name] : $defaultValue;
if ($value instanceof NoDefaultValue) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe this logic + throwing exception could be refactored in a method used here and in Common::getRequestVar?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's just the first line that was copied from getRequestVar, but moving it makes a lot of sense. Would clear up part of getRequestVar, too. Already done locally ;)

@mattab
Copy link
Member

mattab commented May 28, 2015

Feedback

@diosmosis diosmosis force-pushed the 7987_sanitize_normalization_start branch from 1bdae5c to 309b7ce Compare May 28, 2015 04:22
@@ -464,13 +365,15 @@ public static function getRequestVar($varName, $varDefault = null, $varType = nu
}
}

// return isset($parametersRequest[$name]) && (is_array($parametersRequest[$name]) || strlen($parametersRequest[$name]) !== 0);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

commented code

@mnapoli
Copy link
Contributor

mnapoli commented May 28, 2015

Would it make sense to have RequestSanitizer be just a Sanitizer? After all it isn't coupled to the request, it only sanitizes values (except for sanitizing API parameters and the blacklist).

Regarding putting the "blacklist" in the DI config, it works, but it's a bit "disconnected" from the controllers (i.e. I work on a controller, it's not easy to see if request parameters are escaped or not, which is a little critical). What about using an annotation for that? e.g.

class Controller
{
    /**
     * @Escape(false)
     */
    public function saveReport($name, $description, ...)
    ...
}

(could be @NoEscape or whatever else)

The other good thing that comes with that is that the Sanitizer would be entirely ignorant of Request/HTTP stuff and be a simple standalone service.

* @return string
* @throws \Exception If the value is null or a boolean.
*/
public function sanitizeValue($value, $alreadyStripslashed = false)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about simpler names, sanitize?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can make this change. Note: this class should be removed completely for 3.0, so the names shouldn't matter too much.

@diosmosis
Copy link
Member Author

Would it make sense to have RequestSanitizer be just a Sanitizer? After all it isn't coupled to the request, it only sanitizes values (except for sanitizing API parameters and the blacklist).

The point of this class (which may not be mentioned in the issue description or class docs, the PR is still a WIP) is to centralize the logic so we can remove it easily in 3.0. We can call it Sanitizer, it's not really important to me what the name is. I went w/ RequestSanitizer since it is only used to sanitize query parameters in an HTTP request.

Regarding putting the "blacklist" in the DI config, it works, but it's a bit "disconnected" from the controllers (i.e. I work on a controller, it's not easy to see if request parameters are escaped or not, which is a little critical). What about using an annotation for that? e.g.

It is disconnected, and it is supposed to be disconnected. Using this class & setup, we can change one part of Piwik at a time to be "store unsanitized, sanitize on output" by adding the info to DI (maintaining existing behavior in other places and thus preventing unwanted xss vulnerabilities). Then, when we change the default handling of parameters to be unsanitized in 3.0, we can just remove the class and DI entries. If we use annotations or something else coupled with the plugins, we'd have to modify every API method and remove the useless annotation.

Also using DI means the code isn't a publicly available API, which is better in this case.

Finally, a quick note, this only affects APIs, so not Controllers (who don't actually get called w/ request parameters).

@diosmosis diosmosis added Needs Review PRs that need a code review and removed Pull Request WIP Indicates the current pull request is still work in progress and not ready yet for a review. labels May 28, 2015
@diosmosis diosmosis force-pushed the 7987_sanitize_normalization_start branch from 8b9f845 to 4d62a83 Compare May 28, 2015 23:02
@mnapoli
Copy link
Contributor

mnapoli commented May 29, 2015

OK I see your point (it's all only temporary). FYI I am not very confident we will be able to fix this whole sanitization issue in 3.0 (and thus remove this class), I mean it's a huge-major thing with high risk of creating security issues (XSS) while refactoring… Somehow I think it won't be as fast as we would like to. That's why I was pushing with spending more time making this more "durable", but if you are confident then I don't want to slow you down :)

@diosmosis
Copy link
Member Author

The ultimate goals is to do no sanitization on query parameters in API requests. That's a HUGE breaking change, so If we can't get to that point by 3.0, it will probably never happen. To make the process faster, maybe I can create an automated XSS test. The UI tests do this already to some extent, but it's certainly not complete. And maybe I can come up w/ a set of patterns for JS coding to avoid XSS (for example, some code builds HTML directly w/ the result of AJAX queries, this is not necessary w/ angularjs).

For an XSS test, we could potentially add <script>$('body').html('xss ?')</script> to every varchar/char/text/blob/etc. column we can find in the DB. Or just add the data by hand, and test that we've covered every potential table column. This would be hardening of what is currently done in UITestFixture. This would test having untrusted data in Piwik's DB.

I'm not sure how to test having untrusted input returned from Piwik (either because it is in the DB, or due to a man in the middle attack). We could fake XHR responses... but triggering specific code paths is impossible w/ the jquery soup much of the frontend consists of.

@diosmosis diosmosis force-pushed the 7987_sanitize_normalization_start branch from 4d62a83 to 7fbc56c Compare May 30, 2015 04:35
@diosmosis
Copy link
Member Author

Note to merger: This PR could potentially introduce XSS, so it would be a good idea to do manual UI tests before merging.

@mattab
Copy link
Member

mattab commented Jun 1, 2015

For an XSS test, we could potentially add <script>$('body').html('xss ?')</script> to every varchar/char/text/blob/etc. column we can find in the DB.

Unfortunately automated tests for XSS is not realistic nor practical for now... we will need to manually check every instance of variable in all templates, and html generators, and manually sanitise the output for the context (js, html attribute, etc.). it's tedious work but needs to be done. Don't think we can do it by 3.0.0 as it's already huge scope https://github.com/piwik/piwik/milestones/3.0.0 but we shall discuss (#4231 is tentatively in 3.0.0)

Note to merger: This PR could potentially introduce XSS

I don't think we should rely on the reviewer to catch XSS but rather the original code author. You don't write in PR that you checked instances of the use of variables and check they are sanitised... did you, or is this PR really risky in this regard?

@diosmosis
Copy link
Member Author

You don't write in PR that you checked instances of the use of variables and check they are sanitised...

I checked where the description property was used and where exception messages were used. Perhaps I checked every possible place, perhaps I missed something. Perhaps someone could double check.

@mattab
Copy link
Member

mattab commented Jun 1, 2015

I checked where the description property was used and where exception messages were used. Perhaps I checked every possible place, perhaps I missed something. Perhaps someone could double check.

Ok thanks for more details, good to know 👍

@diosmosis diosmosis force-pushed the 7987_sanitize_normalization_start branch 2 times, most recently from 0883260 to 7de6acc Compare June 2, 2015 23:52
@mattab mattab added this to the 3.0.0 milestone Jun 23, 2015
@mattab mattab removed this from the 2.14.0 milestone Jun 23, 2015
@mattab
Copy link
Member

mattab commented Jun 23, 2015

@diosmosis as you suspected, because this PR is quite complex, it didn't get merged in time. I guess it be a possibility to fix all sanitisations problems at once in #4231

return $value;
/** @var RequestSanitizer $requestSanitizer */
$requestSanitizer = StaticContainer::get('Piwik\Http\RequestSanitizer');
return $requestSanitizer->sanitizeValues($value, $alreadyStripslashed);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check this for performance stuff and profile it? Eg we sometimes call this during bulk request hundred thousands of times. I presume StaticContainer::get should add too much overhead but just to be sure. Same for following. Maybe we could profile a bulk request with 600 tracking requests?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tracking 600 requests takes 10.5s for me on master and ~11.5s on this branch. Investigating the cause, though likely due to StaticContainer::get.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I can't really fix this now. It's due to StaticContainer, but I can't fix it w/o removing StaticContainer. Which I can't do w/o removing static functions like getRequestVar. It's not possible w/o a large commit.

I suggest, before we work on 3.0, we should create some xhprof profiles for specific parts of Piwik that we don't want performance regressions in (like tracking, archiving, all websites/multisites). We post the PHP benchmark code + xhprof profiles online somewhere. Then while working on 3.0, we can re-run the benchmarks periodically and make sure that we're not regressing, or at least how to eventually fix the regression, before the 3.0 release. What do you think?

If you agree, could you list the places we shouldn't regress? You've done most of the performance tuning, I think.

diosmosis added 11 commits June 30, 2015 11:31
…tion logic in Common.php methods to this class.
…r() so we can specify through DI which API request parameters to not sanitize.
…ports.updateReport and ScheduledReports.addReport.
…ns in the DB. Includes version bump to core (to next beta).
…sages were escaped for XML/HTML regardless of the format, so JSON output would include escaped HTML. When alerted, or displayed on the screen, this results in double encoded messages. Fix involves formatting only for XML/HTML API renderers. More tests included.
…ed in Piwik_Popover.showError and old notification.js JavaScript methods.
@diosmosis diosmosis force-pushed the 7987_sanitize_normalization_start branch from 7de6acc to dde2704 Compare June 30, 2015 19:10
@mattab mattab removed this from the 3.0.0 milestone Dec 15, 2015
@mattab
Copy link
Member

mattab commented Feb 1, 2016

Realistically I think this won't get merged. Instead we shall try to fix the issue at once everywhere, which should make things clearer and easier in the mid term.

@mattab mattab closed this Feb 1, 2016
@mattab mattab added this to the 3.0.0 milestone Feb 1, 2016
@mattab mattab deleted the 7987_sanitize_normalization_start branch March 15, 2016 00:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c: Design / UI For issues that impact Matomo's user interface or the design overall. c: Platform For Matomo platform changes that aren't impacting any of our APIs but improve the core itself. c: Security For issues that make Matomo more secure. Please report issues through HackerOne and not in Github. Needs Review PRs that need a code review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants