Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filterable Interface? #86

Closed
olliephillips opened this issue Feb 24, 2017 · 16 comments
Closed

Filterable Interface? #86

olliephillips opened this issue Feb 24, 2017 · 16 comments

Comments

@olliephillips
Copy link
Contributor

Again, just thinking out loud. I know we discussed queries/reports early on and I'll quite happily do this in the client app, but it keeps striking me that a filterable interface could be useful.

A single method item.filter(data) (data) {}

This would allow some basic data manipulation of item type resultset, by ranging through it and making comparisons.

One example, removing items with a timestamp in the future - unpublished as yet if you like. This could be simply achieved if this behaviour was preferred.

Taking it further, if multiple filters could be specified and the API could also accept filter in the querystring to select which filter to use, items could have their own stored procedures in effect.

This cuts down on the handballing of data client side and is maintainable and extendable.

I don't think I could put this together, but wanted to log it. Hope that's ok.

@nilslice
Copy link
Contributor

@olliephillips -

That is a really cool idea.. I think the concept of "stored procedures" in this manner would be very useful. It seems like adding a item.Filterable interface for this fits in with the existing model... in order to have just a single method interface, I would imagine the implementation from the content struct would look like:

// providing the res, req as args.. though could just take the filter name as a string
// what if user wants to do some http/2 server push, or maybe the req has &count=x&offset=y, then
// they could pull just those results to do filtering on? 
func (r *Review) Filter(res http.ResponseWriter, req *http.Request) ([][]byte, error) {
	filter := req.URL.Query().Get("filter")
	var result [][]byte
	switch filter {
	case "published":
		// get all our data out in time-sorted order
		jj := db.ContentAll("Review__sorted")

		// decode to Go structs
		var reviews []Review
		var review Review
		for i := range jj {
			err := json.Unmarshal(jj[i], &review)
			if err != nil {
				return nil, err
			}
			reviews = append(reviews, review)
		}

		// keep reviews with timestamp <= now()
		now := time.Now()
		for i := range reviews {
			rev := reviews[i]
			ts := time.Unix(rev.Timestamp/int64(1000), 0)

			if ts.Before(now) {
				j, err := json.Marshal(rev)
				if err != nil {
					return nil, err
				}

				result = append(result, j)
			}
		}
	}

	return result, nil
}

It feels a bit clunky at first pass, but I think if a Ponzu user is going to want this level of control, its not too much work.. what do you think?

Also, just to clarify, this would only be run on a /api/contents request (multi-result, not single), right?

@olliephillips
Copy link
Contributor Author

I'll work through your code, but yes, only on '/api/contents', to filter rows from the dataset. I can see it's already possible suppress fields from the content item itself.

@nilslice
Copy link
Contributor

nilslice commented Feb 24, 2017

On second thought.. we may be better off thinking about adding some hook into the db package, since the example above would need to buffer the entire data set for Review into memory, and that could be a huge amount if the db is large enough.

By adding a hook into the db package, I mean somehow passing the filtering func into something like

db.ContentFilter(namespace string, func(data []byte) (bool, error)) ([][]byte, error)

This way it could be called from inside a *Bolt.Tx transaction, and process each row the same way it's done here:
https://github.com/ponzu-cms/ponzu/blob/master/system/db/content.go#L301

It would change the above implementation, and simplify it a bit, to something more like:

func (r *Review) Filter(name string) func(data []byte) (bool, error) {
	var fn func(data []byte) (bool, error)

	switch name {
	case "published":
		now := time.Now()
		fn = func(data []byte) (bool, error) {
			var rev Review
			err := json.Unmarshal(data, &rev)
			if err != nil {
				return false, err
			}

			ts := time.Unix(rev.Timestamp/int64(1000), 0)

			if ts.Before(now) {
				return true, nil
			}
		}
	}

	return fn
}

I'm not sure if that code above is even correct, but I hope it illustrates the point.

I really like this idea though -- I appreciate you keeping log of new features/concepts...

@olliephillips
Copy link
Contributor Author

Hey @nilslice I'm keen to have a go at this also, but you'll have to forgive me, interfaces in practice are not my strong point yet. I've put together a strawman of how I think it could work (totally outside of Ponzu here), but I can't really assess the practicalities. I wonder, if you get chance could you let me know if I'm on an efficient or workable track. If you could advise in the context of this example that would be an enormous help too - there's a lot of code to follow in Ponzu :)

@olliephillips
Copy link
Contributor Author

olliephillips commented Mar 8, 2017

Just to add, I envisaged checking if item implements filterable interface in the db.contentAll() function, then ranging through the return of i.filter(), before appending it to posts slice

@olliephillips
Copy link
Contributor Author

It's also occurred to that a map better, given the filters might be individually callable over API, a key to reference them would be needed

@nilslice
Copy link
Contributor

nilslice commented Mar 8, 2017

@olliephillips - the code looks good, and is essentially the pattern I was thinking as well.

I have a couple questions/considerations on what you think would best serve the project & our users.

  1. Since your example shows multiple filter functions, are they all always applied together, or should we allow single filters to be selectively applied individually or in different combinations?

    • Personally, I'd like to be able to mix and match the filters, so that one API call with filters could return something different from another if needed.
  2. Do you think this should completely override the content API results, which by implementing this Filterable interface, every call to get content out is affected by the filter(s)?

    • You can hide normal API results by implementing item.Hideable then checking for the filters param on the request & returning the special error, ErrAllowHiddenItem
  3. Filters should probably be accessible directly through DB calls, but also through the API endpoint. Right?

    • If so, keeping this inside db.Query is probably the best bet, in combination with a &filters=fn1,fn2 HTTP API call.
    • Kept within db.Query, we also get the ability to re-use the other query options like order (asc/desc), offset (paging) & count (limit)

If the Filter(s) shouldn't override all content API results, and be applied selectively, then we need some control over how they are applied from both HTTP API and DB package calls. We could add an option to the db.QueryOpts for 'filters' which would be the key name of any filter func declared in a map[string]filterFunc returned from the item.Filterable#Filter method. Then in db.Query we could check if the filters option is non-nil and apply whichever filter(s) to the results before they are returned. (filters would be a []string, and we'd call the Filter() method on the content type to get the filterFunc from the map).

This is a pretty big feature, and I'd be glad to try and pair on it if you'd like, via screen share or something.. I'm in Los Angeles, CA though so the time difference may be tough. Happy to try and make it work if you're interested.

/cc @kkeuning, @sirikon who have added some great features and I think would have helpful feedback if they'd like to offer it

@olliephillips
Copy link
Contributor Author

olliephillips commented Mar 8, 2017

Thanks for the feedback on that piece.

In terms of:

  1. I think where I got to today was a map return from filter() would be better than slice. I envisage an additional api route such as (for example only) /api/view?name=Published, which could use a single filter func from that map on the key "Pubished". I hadn't really considered chaining, more that each key in the map held a single function representing single filter or stored procedure for the data struct.

  2. I didn't see it as a default override, but activated via a specific request made on the above route. I wanted to have as small a footprint as possible, I think the additional route above is important, with a route handler than implements something similar to ContentAll, though I haven't really thought it a long way through. So everything else works as it does now. I can see what you're saying about getting on the back of the db.Query method and will look further at that.

  3. Hadn't considered internal use, only API. Purely wanting to present different result sets based on properties of data such as published date for example.

If the scope is bigger than I can see, it might be a feature that's too big for me at the moment, since I'm still getting familiar with the Ponzu codebase - but I'd give it a shot :)

I'll have to work on this in my spare time, and since I'm in the UK, I'm not sure I can make a useful pair on this feature, but again happy to give it a shot.

@nilslice
Copy link
Contributor

nilslice commented Mar 9, 2017

Cool - I think we are on the same page for all these points, except maybe the need for another API endpoint. After considering it a bit more, I think we can easily add this to the /api/contents endpoint in a way that wouldn't conflict much with existing code or cause noticeable performance degradation.

Since I am planning to move a couple things around, which would impact this Filterable concept, I think it would be best for me to work on it and check in with you to make sure things are still in line with your goals. If there are parts that you want to work on in parallel with me, that would be great and I'll open a new branch for this once it's ready.

Do you have other considerations for creating a new API endpoint, other than colliding with existing code? I'm just not convinced that adding more than a type assertion and check for the &filter= query param is necessary.

@olliephillips
Copy link
Contributor Author

Appreciate you looking for feedback on this.

Not had lot of time on it, but one thing jumped out at me today.... without knowing the data/item type requested, my approach i.e with separate endpoint on which a view/filter is requested by name leaves me having to work out the data/item struct called from the view name requested. And I'm not sure reflection will get me that far.

So I'd more or less modified my thinking to including the data struct the filter should work on, in the /api/view request too.

And this, I think, makes it little different than what you are suggesting - just another querystring parameter on the api/contents endpoint.

So I think your proposal has smaller footprint actually 👍

Please do include me in the development work, even if only for review.

@nilslice
Copy link
Contributor

nilslice commented Mar 9, 2017

Ok, sounds good!

To clarify:

Not had lot of time on it, but one thing jumped out at me today.... without knowing the data/item type requested, my approach i.e with separate endpoint on which a view/filter is requested by name leaves me having to work out the data/item struct called from the view name requested. And I'm not sure reflection will get me that far.

Since the query params will still include the type name, like this:
/api/contents?type=Review&count=4&order=asc&filter=expired

We'll be able to get the type from our item.Types map just as it's done throughout the codebase, example here: https://github.com/ponzu-cms/ponzu/blob/master/system/api/handlers.go#L40 (where t is the string type name from req.URL.Query().Get("type"), and it gets a pointer to the type from the map by calling the function returned from that map at key t)

Does that cover what you're unsure of?

@olliephillips
Copy link
Contributor Author

olliephillips commented Mar 9, 2017

Sorry, I wasn't clear. I think we are on the same page.

In my comment I was referring to my proposed approach being weak - the separate endpoint. Not yours.

When passed only a view name like this:

/api/view?name=published

My initial idea can't work - I need the object type name or the filter interface implementation is useless?

But by including type name, it's effectively identical to your suggestion:

You had:

/api/contents?type=Review&count=4&order=asc&filter=expired

or

/api/contents?type=Review&filter=expired

I'd have had to do this:

api/view?name=expired&type=Review

Which is, bar, semantics, absolutely the same?

I'm sorry for any confusion - I was in total agreement with your proposal to extend the /api/contents route idea

@nilslice
Copy link
Contributor

nilslice commented Mar 9, 2017

@olliephillips ok, got it! yes, those would be the same aside from semantics. thank you for clarifying

@ivanov
Copy link
Contributor

ivanov commented May 21, 2017

I also found myself reaching for something like this in thinking about how I would implement soft-deletions, by having a Deleted bool on an item... Maybe it should be possible to specify the default filter that gets applies, when there isn't a filter= parameter specified?

@nilslice
Copy link
Contributor

@ivanov - that is interesting.. how would you anticipate it to be configured from the user's perspective? as in, without a filter param set, there would need to be some way for the handler to decide when/when not to filter results. How about using the default case in a switch? If we modify the example above, and call the Filter() method in every request (even if filter= isn't set), you could do:

func (r *Review) Filter(name string) func(data []byte) (bool, error) {
	var fn func(data []byte) (bool, error)

	switch name {
	case "published":
		now := time.Now()
		fn = func(data []byte) (bool, error) {
			var rev Review
			err := json.Unmarshal(data, &rev)
			if err != nil {
				return false, err
			}

			ts := time.Unix(rev.Timestamp/int64(1000), 0)

			if ts.Before(now) {
				return true, nil
			}
		}
        default: 
               // call some function or process data directly here 
               // which would run on every request where filter was
               // not explicitly set
	}

	return fn
}

I think we'd also need to add a http Hook, since the Filter method is in the db package and has no context of the request/response.. This way you could have control over when the Filter is applied based on certain request data.. something like:

// in system/item/item.go
type Hookable interface {
// ...

        BeforeFilter(res http.ResponseWriter, req *http.Request) error
        AfterFilter(res http.ResponseWriter, req *http.Request) error
} 

Whereby returning an error from BeforeFilter would tell the handler to skip the call to Filter altogether, and from AfterFilter would halt the request and respond with a HTTP 500, logging the error.

Your point also highlights what would have been a bug... the Filterable interface was initially only going to be called on /api/contents but, it actually would need to be called on /api/content as well, since in your case a soft-deleted item should be excluded from both single and multiple result-set responses. It should definitely be called on both though, otherwise a soft-deleted item would still be returned if accessed by ID.

Thanks for your feedback on this.. if you have any more thoughts or suggestions please keep them coming!

@olliephillips
Copy link
Contributor Author

Closing this issue. It is on the project roadmap. Please reopen if need to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants