-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LTD-5730-eu-xml-bad-chars #2287
Conversation
exporter/core/validators.py
Outdated
|
||
class SpecialCharacterStringValidator: | ||
message = "There's a invalid special charactor in this field." | ||
regex_string = r"^[\000-\031]" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking for comments around this as this may not be the best regex
doesn't cover everything in the table. But for some reason couldn't get the more generic ones to work as expected.
Since These Characters can't be seen by the end user. |
0a2ff7a
to
84faff4
Compare
def clean_address(self): | ||
address = self.cleaned_data["address"] | ||
return remove_non_printable_characters(address) | ||
|
||
def get_layout_fields(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we be adding these for exporter address and name inputs too? Does that have an impact on the xml data?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those form components are different during registration .
I guess potentially it's possible but never seen the issue and the risk is much lower since only non uk based companies can free enter the address.
I guess the risk here comes from the fact they copy and paste party details from other forms during Licence submission.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand. You're probably doing the right thing in only fixing one thing at a time. Gives comfort that you've investigated though.
|
||
|
||
def remove_non_printable_characters(str): | ||
return "".join([c for c in str if ord(c) > 31 or ord(c) in [9, 10, 13]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cleaning the data is the best approach I can think of as the user probably wouldn't know how to react to the validation message.
If I understand this part correctly though it seems to me that the tab, newline and carriage return ascii are being removed completely rather than being replaced by something that can be parsed in python e.g spaces, /n or /r
I know that this is to output correctly in xml but I'm think our data should show as an accurate representation of what the user inputs maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no it's the other way round anything less 31 with the exception of [9, 10, 13] is removed from printable chars.
If you have a look in the tests it confirms these chars stay but everything else is removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything less than 31 and 9, 10 and 13 is being removed but they aren't being replaced with anything. Your test does confirm this but what I'm saying is that it might better represent user input if they are replaced with equivalent python parseable version of what the ascii char represents
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No still not correct anything less then 31 is removed (because these are meaningless unix chars i.e /x02
how we want [9, 10, 13] to stay as this are carriage return linefeed , tab etc and hence stays as is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May I suggest including the character instead of ascii value so it is clear what the character is?
eg ord('\n')
for 10
If that is not possible for all characters then add a comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh yes added comment to make it clearer what we are doing here.
form = parties.EndUserAddressForm(request=request, data=data) | ||
|
||
form.is_valid() | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there an assert missing for the expected
variable here here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no this test doesn't check if form is valid as everything is valid but we need to call .is_valid() so I can access cleaned_data and check the output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I understand better what's going on with this now. Looks like it does the job.
https://uktrade.atlassian.net/browse/LTD-5730
This is attempting to prevent an issue that manifests itself when the EU XML failed causing a P1.
This is the because the end user address typically has ASCII -- Nonprintable Characters
which has accidentally made it's way in during cut/paste in the address field.
I have for now added the validation to address/name as these have been the most problematic.