Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialized input -> Deserialization -> Reserialization should be symmetrical #45

Closed
mcshaz opened this issue Jul 14, 2016 · 20 comments
Closed
Labels

Comments

@mcshaz
Copy link
Contributor

mcshaz commented Jul 14, 2016

The previous nuget release (2.1.3) was not serializing any of the timezone info, but was serializing events (albeit setting any and all attendee properties other than uri to null as part of the Attendees.Add operation).

In the current github master branch, events are not serialized either.

Test cases are included with a pull request, but as a brief example:

using (var cal = new Calendar() { Method = "PUBLISH", Version = "2.0" })
{
    var evt = new Event
    {
         Summary = "Testing",
         Start = new CalDateTime(2010, 3, 25),
         End = new CalDateTime(2010, 3, 26)
     }
    cal.Events.Add(evt);
    var serializer = new CalendarSerializer(new SerializationContext());
    var serializedCalendar = serializer.SerializeToString(cal);
}

will currently produce

BEGIN:VCALENDAR
METHOD:PUBLISH
PRODID:-//github.com/rianjs/ical.net//NONSGML ical.net 2.1//EN
VERSION:2.0
END:VCALENDAR
rianjs added a commit that referenced this issue Jul 14, 2016
…able. Some minor updates to broken serialization unit tests: style, formatting. #45
@rianjs
Copy link
Collaborator

rianjs commented Jul 14, 2016

Did this work in previous versions of ical.net, or is this a recent regression? Do you know?

@rianjs
Copy link
Collaborator

rianjs commented Jul 14, 2016

I will see about writing a whopper test in the next few days that takes all of the ICS files and deserializes, reserializes, and compares the two strings to assert the symmetry of these operations. I think they should be equal, or equal if we sort each subcomponent. If they aren't equal, I want to know how bad it is...

@rianjs rianjs changed the title Serializing is not including event or timezone info Serialized input -> Deserialization -> Reserialization should by symmetrical Jul 14, 2016
@rianjs rianjs added the bug label Jul 14, 2016
@rianjs rianjs changed the title Serialized input -> Deserialization -> Reserialization should by symmetrical Serialized input -> Deserialization -> Reserialization should be symmetrical Jul 14, 2016
@mcshaz
Copy link
Contributor Author

mcshaz commented Jul 14, 2016

I’ll run a git bisect and get back to you (I should have thought to do this when I made the ticket)

@mcshaz
Copy link
Contributor Author

mcshaz commented Jul 14, 2016

nuget packages (not commits)
ICal.Net 2.1.1 to 2.1.3:
AddAttendee - setting properties to null
RemoveAttendee - pass

SerializeDeserialize - pass
EventPropertiesSerialized - pass
TimezoneSerialize - begin & end vtimezone, containing only TZID (no other information)
AttendeesSerialize - missing properties (URI only included)

so it is 2.1.4 (current) where the serialization stops working completely (although attendee properties are no longer being clobbered).

@rianjs
Copy link
Collaborator

rianjs commented Jul 14, 2016

Immensely frustrating, thanks.

rianjs added a commit that referenced this issue Jul 14, 2016
Fix regression bug introduced during a 'minor cleanup' ugh #45
@rianjs
Copy link
Collaborator

rianjs commented Jul 14, 2016

I just merged a PR that fixes a bunch of the serialization unit tests. Only 3 fail now. While I was there, I fixed the platform-specific AppendLine and Environment.NewLine usage to \r\n per the spec. Thanks for the heads-up on that, too.

I unlisted the broken nuget package, too.

@rianjs
Copy link
Collaborator

rianjs commented Jul 16, 2016

It appears that serializing an Event is a lossy operation: put in DateTime.Now as a value, and everything smaller than a millisecond is trimmed, but this loss occurs during serialization, not during CalDateTime instantiation.

I should figure out what, specifically, is happening here. Ceiling? Floor? Rounding?

Anyway, this is a problem. Business rules should be enforced during object instantiation, not during serialization.

rianjs added a commit that referenced this issue Jul 16, 2016
…ents.

Equality and hashing working for Events, CalDateTimes, and Calendars.
Fixed bug where DateTimes didn't have their sub-second precision truncated #45
rianjs added a commit that referenced this issue Jul 16, 2016
rianjs added a commit that referenced this issue Jul 17, 2016
…. Related to work on Attendees #45. Perhaps useful for API changes in v3, too?
rianjs added a commit that referenced this issue Jul 17, 2016
@mcshaz
Copy link
Contributor Author

mcshaz commented Jul 18, 2016

I have thought a bit more time about the discussions in pull request #49 and how best to test serialization. To summarise:

Object -> Serialise -> Deserialise -> Compare objects

if any properties are not serialised, or are serialised incorrectly, these are important tests and will pick that up straight away. Weaknesses of using this as the only test of serialisation are:

  • If deserialization is failing, these tests will also fail, and the developer must be aware of where to concentrate efforts, and therefore it is important to clearly classify these tests as SerializeThenDeserialize.
  • If deserialisation is a little flexible as compared to the spec (e.g. case insensitive property values), features which could be a useful component of the deserialization library can no longer reliably test that serialization output meets the spec claimed by the library.
  • If extra/unnecessary text or properties are being serialized, this will most likely not be picked up. Issue Using Calendar.AddProperty duplicates property name into value #27 is an example.

Current, hastily designed tests of mine

Text parsing will not be as robust as a dedicated library. They are not IMHO useless at this stage of development, as they have helped clarify the nature of a few flaws in serialization.

File -> Deserialise -> Serialise -> Compare text

I do not know enough about how rigid the spec is in terms of the order of properties & values, but it could be argued the 'overly restrictive' nature of the tests I wrote and discussed in the point above would persist or be even more restrictive, and therefore do not offer a great advantage.

Separate spec validation library

This would to some extent be the best solution because it is applicable beyond the tests - when designing a consuming application, having a GetSpecValidationErrors type of function would be very useful - for example returning problems like an Event with a missing Organiser, or METHOD:PUBLISH calendar erroneously including an event with an attendee.
If a 'kitchen sink' type calendar object (multiple events, timezones, alarms etc) both passed the SerializeThenDeserialize test and also the serialised output validated against the spec, it would be very reassuring.
The problems are writing and maintaining such a library would not be a trivial task.
I see Doug Day actually wrote some kind of iCalendar validation parser, but it seems to no longer be hosted, and I am unsure of the licence and where the source code is.
One short term hack could be to add a project which in the short term posts data using something like:

using (var client = new HttpClient())
{
   ...
    var response = await client.PostAsync("http://icalendar.org/validator.html", content);
    var responseString = await response.Content.ReadAsStringAsync();
   //will then have to parse html using an appropriate library
}

Obviously this will be too slow for running regularly, but might be a good idea before updating nuget packages. The clearest weakness will be it will be have to parse HTML output, which has been designed for visual assessment by a human, and might reasonably change over time without warning. Another solution which does nothing about the speed aspect, but makes things a whole lot more robust is to ask the owners of http://iCalendar.org or http://severinghaus.org if they would be willing to add an AJAX type service, which returns a JSON array of problems discovered.
I am happy to put the above hack together if you think it might be useful until something better can be designed.

@rianjs
Copy link
Collaborator

rianjs commented Jul 19, 2016

So interestingly... I went back to a very old version of dday.ical. (Before I started making changes.) Here's what happened with your simple attendee case, @mcshaz :

var simple = new Attendee("MAILTO:[email protected]")
{
    CommonName = "James James",
    Role = "CHAIR",
    RSVP = true,
    ParticipationStatus = ParticipationStatus.Tentative
};

var serialized = SerializeToString(calendar);

The output, once I replaced the \r\n new lines... needless to say, it's completely wrong:

BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//ddaysoftware.com//NONSGML DDay.iCal 1.0//EN
BEGIN:VTIMEZONE
TZID:America/Los_Angeles
END:VTIMEZONE
BEGIN:VEVENT
ATTENDEE:DDay.iCal.Attendee
DTEND:20160719T205402
DTSTAMP:20160719T195403
DTSTART:20160719T195402
SEQUENCE:0
UID:3b38432f-288b-46cd-882b-69d051652964
END:VEVENT
END:VCALENDAR

@rianjs
Copy link
Collaborator

rianjs commented Jul 20, 2016

So, your arguments, as I understand them are:

  • Strictness in casing for determining equality shouldn't be ignored: PROPERTY: != Property:
  • Detecting whether extra text is serialized/deserialized would be difficult to find if you compare objects to objects

I think I don't agree with these assertions. If you look at the branch I'm working on now, I have implemented several symmetric serialization/deserialization tests, focusing on what Equals and GetHashCode each mean. (Bear with the odd state of the code... it's a work in progress.) These tests are written at a lower level, in many cases, specifically comparing strings:

https://github.com/rianjs/ical.net/compare/SymmetricalSerialization?expand=1#diff-f9ec1f5ef49aee7069be21576da75f80R11

If an objects-to-objects comparison isn't catching a bug, I would prefer to write a test targeting the component that's misbehaving, even if it's as simple as:

var simple = new Attendee("MAILTO:[email protected]")
{
    CommonName = "James James",
    Role = "CHAIR",
    RSVP = true,
    ParticipationStatus = ParticipationStatus.Tentative
};

Assert.AreEqual("CHAIR", simple.Role);

If we have to write tests at the parsing level, we can do that, too.

In general, comparing text isn't a good idea. The spec doesn't specify an ordering for various attributes (that I am aware of), so comparing text would be problematic and brittle. Determining whether one attendee is equal to another attendee would be difficult. (Have you looked at how Outlook serializes things? Omg.) The only time this would be appropriate would be comparing serialized outputs in a concurrency scenario to make sure the serialization is stable, deterministic, and thread-safe.

(For what it's worth, ordering is implied in some of my Equals methods, when comparing collection contents. As a policy at some point, Properties should be serialized and deserialized in a sorted fashion, rather than insertion order. Same for lists of strings for DELEGATED-FROM, for example. Collection equivalence is probably OK, too.)

RE: Separate spec validation library... I'm not averse to this idea, but I'm not going to take it on.

Btw, your code sample should be written as:

using (var client = new HttpClient())
{
    using (var response = await client.PostAsync("http://icalendar.org/validator.html", content))
    {
        var responseString = await response.Content.ReadAsStringAsync();
    }
   //will then have to parse html using an appropriate library
}

HttpResponseMessages implement IDisposable, as do HttpContent objects, but disposing the HttpResponseMessage will also dispose of the Content object, but disposing of the HttpClient will NOT dispose the HttpResponseMessage. Yeah, it's weird.

rianjs added a commit that referenced this issue Jul 20, 2016
@rianjs
Copy link
Collaborator

rianjs commented Jul 20, 2016

Hmm, Attendees are starting with ATTENDEE; instead of ATTENDEE:. This might be a good opportunity to change the implementation to use Attendee.ToString() and get rid of the AttendeeSerializer (which really just delegates the serialization semantics to another class anyway. sigh

@mcshaz
Copy link
Contributor Author

mcshaz commented Jul 20, 2016

I think we have somewhat crossed wires here - I have tried to list out 4 alternatives each with positives and negatives, and I think we agree on almost everything:

  • string comparison is a bad idea, which includes my tests, and the concept of string -> object -> string -> compare (File -> Deserialise -> Serialise -> Compare text) in the headings I have given)
  • There was a little side note that the tests have picked some things up - have you seen pull request Attendees validating #54? The tests have picked up that the folded line was occuring without the mandated space character. This problem with line folding is fixed in the pull request. I was merely making the point they have served a purpose, but are not a long term solution & need to be replaced.
  • The object -> string -> object -> compare objects is a great idea and IMHO should be a part of the library, and the string comparison tests I have written in order to quickly identify some problems, does not belong in the tests long term.
  • To make the the tests even better, in addition to the object -> string -> object -> compare test above, I think it would be ideal to also test that the output validates against an icalendar validator of some description. The 4th heading in my post above describes initial thoughts on pros & cons. The ideal would be to get the code and licence from Doug Day's validator. As a quick and nasty measure (I seem to be specialising in that) I have created a fork which posts the serialized output to the icalendar.org online validator. As per the post above - a long way from perfect but a start. If code serialises and deserialises to objects with equivalent property values AND the serialised output validates against the spec, we can be reasonably certain this is a fully working product!

@mcshaz mcshaz closed this as completed Jul 20, 2016
@mcshaz mcshaz reopened this Jul 20, 2016
@mcshaz
Copy link
Contributor Author

mcshaz commented Jul 20, 2016

Appologies - clicked close and comment by mistake - reopened.

As per the comment on online validation in my 'experimental' branch, the tests are validating or not validating serialised output appropriately using the online icalendar.org validator. each test takes 1-2 seconds, and as stated, a single post could be used with an icalendar object containing just about every feature imaginable. If you think this feature would be worth pursuing, I will ask the site owners about the possibility of providing a web service which responds with a JSON array of validation errors, as the current method of parsing the HTML is too unstable to be of real use.

rianjs added a commit that referenced this issue Jul 20, 2016
rianjs added a commit that referenced this issue Jul 20, 2016
@rianjs
Copy link
Collaborator

rianjs commented Jul 21, 2016

Yes, I misread your ideas. I'm on board with the first three bullets.

The validator thing, though... in theory I'm OK with it, but it fails several valid serializations for time zone reasons. ("America/New_York is an invalid time zone", which is bollocks.) I think the actual problem is the way the DATE-TIME time zones are specified which is actually OK unless it's UTC, as you noted in your PR recently. I haven't had the time or energy to pull on that particular thread any more, however, as I have a need for working attachments at work, so that's where my efforts are focused at this moment.

Feel free to reach out to the validator folks... I wouldn't be surprised if the validator itself is just a wrapper around a REST endpoint that you'd just need to know how to call. (That's how I would implement it, anyway.)

@mcshaz
Copy link
Contributor Author

mcshaz commented Jul 21, 2016

("America/New_York is an invalid time zone", which is bollocks.)

I could be wrong, but I think the timezones I have seen do not validate - issue #58 raised in an attempt to clarify

@rianjs
Copy link
Collaborator

rianjs commented Jul 22, 2016

Incredibly, all line folds are technically wrong, and always have been. This is why ATTACH components are not serializing correctly.

https://tools.ietf.org/html/rfc5545#section-3.1

Lines of text SHOULD NOT be longer than 75 octets, excluding the line break. Long content lines SHOULD be split into a multiple line representations using a line "folding" technique. That is, a long line can be split between any two characters by inserting a CRLF immediately followed by a single linear white-space character (i.e., SPACE or HTAB).

The function that purports to do this is TextUtil.WrapLines, which does not insert a whitespace character when it wraps the line... I thought this was something attachment-specific, but it's not. Sigh.

I have implemented a replacement: TextUtil.FoldLines, but I need to get a handle on what the ramifications are with respect to what the unit tests assert... Ugh.

rianjs added a commit that referenced this issue Jul 22, 2016
…"line

folding" where lines are meant to wrap after the 75th character. This was
implemented, but the second part of a fold is the indentation by one space
or tab character on folded lines 2..n. I have re-written the fold function
so that it works properly, which has fixed several serialization failures.
We are now down to 2 broken unit tests instead of 6! #45
@rianjs
Copy link
Collaborator

rianjs commented Jul 22, 2016

@mcshaz - you may be interested in my last commit message:

Serializing many things in dday.ical never worked. The issue was the line folding where lines are meant to wrap after the 75th character. This was implemented, but the second part of a fold is the indentation by one space or tab character on folded lines 2..n. I have re-written the fold function so that it works properly, which has fixed several serialization failures. We are now down to 2 broken unit tests instead of 6!

This fixes binary attachment serialization and deserialization. I'll merge this in this afternoon, and publish a new nuget package. I also want to have a closer look at @ghentsch's issue ( #60 ) which I suspect may be related...

rianjs added a commit that referenced this issue Jul 22, 2016
…ents.

Equality and hashing working for Events, CalDateTimes, and Calendars.
Fixed bug where DateTimes didn't have their sub-second precision truncated #45
rianjs added a commit that referenced this issue Jul 22, 2016
rianjs added a commit that referenced this issue Jul 22, 2016
…. Related to work on Attendees #45. Perhaps useful for API changes in v3, too?
rianjs added a commit that referenced this issue Jul 22, 2016
rianjs added a commit that referenced this issue Jul 22, 2016
…"line

folding" where lines are meant to wrap after the 75th character. This was
implemented, but the second part of a fold is the indentation by one space
or tab character on folded lines 2..n. I have re-written the fold function
so that it works properly, which has fixed several serialization failures.
We are now down to 2 broken unit tests instead of 6! #45
rianjs added a commit that referenced this issue Jul 22, 2016
rianjs added a commit that referenced this issue Jul 22, 2016
…ible types. Mostly constants. There should be no API-breaking changes. #45 #59 #60
@rianjs
Copy link
Collaborator

rianjs commented Jul 22, 2016

Big steps forward in 4f3e4b5

We may be able to close this, but I want to have a look at @mcshaz's broken "remove attendee" test case. A secondary time zone serialization ticket exists at #58 which will carry that shortcoming forward.

@mcshaz - I would appreciate your comments on what might be missing/broken still.

@rianjs
Copy link
Collaborator

rianjs commented Jul 23, 2016

Fixed the "can't remove an attendee" bug in fce4886 -- pulled out as a separate issue #62. Published a new nuget package with the fix, too.

@rianjs
Copy link
Collaborator

rianjs commented Nov 13, 2017

This is largely the case for a while. Closing old tickets.

@rianjs rianjs closed this as completed Nov 13, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants