Feedback? #1
Nice work. It will be nice to have somewhere better to point people when
they ask about reusing elfeed-curl.
Since you're using make-process — something I should have done — you can
skip coding-system-for-read and process-connection-type. You're already
using :connection-type, so also use :coding. I'm sure you realize this
already, but I want to highlight it anyway (especially for anyone else
following along): Only the "binary" coding system (or equivalent) makes
sense since the input potentially contains multiple different encodings
that need to be decoded individually.
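For example, a minimal sketch (the process name, buffer, and curl arguments here are only illustrative, not plz's actual invocation):

(make-process :name "plz-curl"
              :buffer (generate-new-buffer " *plz-curl*")
              :command '("curl" "--silent" "--include" "https://example.com")
              :coding 'binary            ; raw bytes; decode each response by hand
              :connection-type 'pipe)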
Oh, never mind, I just noticed you're using call-process in the sync
version, so I suppose you still need to dynamically bind these variables
anyway!
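That is, the sync path still needs something like this sketch (the curl arguments are illustrative):

(let ((coding-system-for-read 'binary)
      (process-connection-type nil))
  (call-process "curl" nil (current-buffer) nil
                "--silent" "--include" "https://example.com"))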
The type of value returned by plz-get (very clever name by the way!)
varies depending on the arguments. You can get away with this sort of
thing in a dynamically typed language, but it would make no sense at all
in a statically typed language. This is a guideline I know I've broken
myself, and, when I have, I always regret it later: Dynamically typed
programs should still be consistent about their types. I mean this in the
deep sense. A function doesn't just consistently return a list, it
returns a list of values of a specific type, and so on — the sort of
consistency required by static typing. It's good practice.
It looks like you never call plz-error? Make it a hard rule for async
that either plz-success or plz-error is called exactly once per request.
(One exception: if the curl program is not found it's fine to report
such a fatal error directly, e.g. via make-process failing.) This is
essential for later managing a queue atop the request function, and it's
one of the things url-retrieve gets completely wrong. With url-retrieve,
some errors are reported synchronously some of the time depending on the
platform (DNS failures, TCP handshake failures, etc.), sometimes
callbacks are called multiple times, and sometimes they're never called
at all! As you're already aware, it's basically impossible to build
anything reliable on top of url-retrieve.
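One way to enforce the rule is to route everything through the process sentinel, which fires when the process dies. A sketch, assuming (hypothetically) that the two callbacks were stashed on the process object:

(defun plz--sentinel (process _event)
  ;; Dispatch exactly once, when the curl process dies.
  (when (memq (process-status process) '(exit signal))
    (unwind-protect
        (if (zerop (process-exit-status process))
            (funcall (process-get process :then) process)
          (funcall (process-get process :else) process))
      (kill-buffer (process-buffer process)))))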
One of the things I enjoy about *not* making elfeed-curl a general URL
fetching library is I get to take some shortcuts. For example, I don't
quite fully parse the response headers. Elfeed is the only consumer of
those headers, and it doesn't need them to be carefully parsed, so it
doesn't matter. But for a general library, you perhaps want to follow
through. Technically I believe you're supposed to decode header text
values as ISO-8859-1, and perhaps also handle RFC 2047 MIME decoding.
(How much does curl handle here already? I don't know, but you can
probably rely on the headers being at least syntactically valid.)
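If one did follow through, the decoding would look roughly like this (rfc2047.el ships with Emacs; the function name is hypothetical):

(require 'rfc2047)

(defun plz--decode-header-value (raw)
  "Decode RAW header bytes as ISO-8859-1, then undo RFC 2047 words."
  (rfc2047-decode-string
   (decode-coding-string raw 'latin-1)))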
It's important for elfeed-curl to support multiple URLs in a single
invocation. The goal isn't to reduce the number of curl processes, but
to exploit HTTP/1.1 keep-alive when those URLs all belong to the same
domain. If you're following multiple feeds hosted on the same site (e.g.
a bunch of YouTube channels), only one TCP connection needs to be
established to request all those feeds. The story is even better with
HTTP/2. (I think? I don't know how well curl employs it.) It puts less
stress on the server and is just generally a nice thing to do. The
server is also less likely to penalize such a burst of requests if they
use HTTP efficiently. The machinery to make this work is a bit
complicated as you've seen (random delimiter "tokens"), so you may not
think it's worth the trouble for your library, but it is important for
some applications, like Elfeed.
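The core of the trick, schematically (a sketch of the approach, not elfeed-curl's exact invocation): give one curl process several URLs and have it print a unique token after each response, so the output buffer can be split back into per-URL chunks.

(let ((token (format "--token-%s--" (md5 (format "%s%s" (random) (float-time))))))
  (call-process "curl" nil (current-buffer) nil
                "--silent" "--include"
                "--write-out" token   ; printed after each transfer
                "https://example.com/feed-a.xml"
                "https://example.com/feed-b.xml"))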
Similar arguments can be made for the Last-Modified and ETag headers, which
reduce the load on servers and are just good netiquette. Elfeed handles
these headers at a higher level, and it's probably also out of scope for
plz for the same reason. But it's something to consider.
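Concretely, honoring them means sending conditional requests (the header values here are illustrative):

(call-process "curl" nil (current-buffer) nil
              "--silent" "--include"
              "--header" "If-None-Match: \"abc123\""
              "--header" "If-Modified-Since: Wed, 23 Oct 2019 10:39:09 GMT"
              "https://example.com/feed.xml")
;; A "304 Not Modified" response means the cached copy is still fresh.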
Returning a string of content is certainly simpler, and it's not wrong
per se. I prefer the buffer-passing method of elfeed-curl purely for the
sake of efficiency. Unlike strings, buffers can be "sliced" using
narrowing, so there's less garbage, fewer allocations, and less copying
when presenting a narrowed process output buffer to callbacks. In
Elfeed's case, the result needs to be stored in a buffer anyway for the
XML parser, so strings are an even less attractive option.
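The buffer-passing pattern is roughly this (a sketch; body-start would mark the end of the parsed headers):

(with-current-buffer (process-buffer process)
  (save-restriction
    (narrow-to-region body-start (point-max)) ; expose only the body
    (goto-char (point-min))
    (funcall callback)))                      ; callback reads the buffer in place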
Very thorough review. Thanks very much, Chris. Your guidance is invaluable.
Thanks, done. There are a lot of process-related details like that in Emacs that I haven't worked with before, so I appreciate your pointing them out.
I guessed that was the case; thanks for confirming it.
See below...
Yeah, I was a little uncomfortable with that too, so I've followed your advice and split the synchronous calls into separate functions. There's a little bit of code reuse, but not too much, and maybe that can be improved later.
Thanks. That should be the case. If it ever isn't, I'll fix it! :)
Errors are now handled: by default, if no error callback is given, one of two errors is signaled, plz-curl-error or plz-http-error. If an error handler is given with the ELSE argument, it's called instead of signaling an error, and the same plz-error struct is passed to it:

(plz-get-sync "https://httpbinnnnnn.org/get/status/404")
;; Signals an error with:
;; (plz-curl-error . #s(plz-error :curl-error (6 . "Couldn't resolve host. The given remote host was not resolved.") :response nil))
(plz-get-sync "https://httpbin.org/get/status/404")
;; Signals an error with:
;; (plz-http-error
;;  . #s(plz-error
;;       :curl-error nil
;;       :response
;;       #s(plz-response
;;          :version 1.1 :status 404
;;          :headers (("Access-Control-Allow-Credentials" . "true")
;;                    ("Access-Control-Allow-Origin" . "*")
;;                    ("Content-Encoding" . "gzip")
;;                    ("Content-Type" . "text/html")
;;                    ("Date" . "Wed, 23 Oct 2019 10:39:09 GMT")
;;                    ("Referrer-Policy" . "no-referrer-when-downgrade")
;;                    ("Server" . "nginx")
;;                    ("X-Content-Type-Options" . "nosniff")
;;                    ("X-Frame-Options" . "DENY")
;;                    ("X-XSS-Protection" . "1; mode=block")
;;                    ("Content-Length" . "198")
;;                    ("Connection" . "keep-alive"))
;;          :body "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 3.2 Final//EN\">\n<title>404 Not Found</title>\n<h1>Not Found</h1>\n<p>The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again.</p>")))
;; Async error handler:
(plz-get "https://httpbin.org/get/status/404"
  :as 'string
  :then (lambda (s)
          (message "%s" s))
  :else (lambda (err)
          (message "OOPS: %S" (plz-response-status (plz-error-response err)))))
Thanks, I'll look into those issues.
I think I'll save that for the future, but it sounds like a feature that ought to be supported. Ideally this library would be efficient and support cases like that.
Hm, yeah, I feel like that's inching toward being a browser[0], haha. But if it's needed, it should be supported, so I'll see what happens. I'll think about ways to better expose headers than only through the response struct.
I've changed the signatures of the functions; for example:

;; Call callback with response body as a decoded string
;; (decoding is automatic unless ":decode nil" is used).
(plz-get "https://httpbin.org/get"
  :as 'string
  :then (lambda (string)
          (setf test-string string)))
;; Call callback with response body as a parsed JSON object.
;; json-read is called in the response buffer with it narrowed to the
;; response body, after decoding it.
(let* ((test-json)
       (process (plz-get "https://httpbin.org/get"
                  :as #'json-read
                  :then (lambda (json)
                          (setf test-json json)))))
  test-json)
;; Return the response body as a non-decoded, binary string (in this
;; case, a JPEG image).
(image-type-from-data
(plz-get-sync "https://httpbin.org/image/jpeg" :decode nil))
;; => jpeg

Next on the to-do list is probably POST requests and file uploads. At that point I can probably start trying to use it in my own packages, like matrix-client (which currently uses …). I know you're busy, so I appreciate your feedback!

0: This is mildly amusing; funny how easy it is in Emacs:

(defun plz-browse (url)
(interactive "sURL: ")
(let ((dom (plz-get-sync url
:as (lambda ()
(libxml-parse-html-region (point-min) (point-max) url)))))
(with-current-buffer (generate-new-buffer "*plz-browse*")
(shr-insert-document dom)
(pop-to-buffer (current-buffer))))) |
You don't necessarily need to split async and sync into different
functions if you instead make these two modes return the same type. For
example, perhaps it always returns a struct representing the request,
and the process object is a slot on that struct. For sync it's always
the dead process object, and for async it's a live or dead process
object. Once the request is resolved, perhaps it's even populated with
the results (status, headers, maybe the content, etc.).
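For example, a sketch (the struct and slot names are hypothetical):

(require 'cl-lib)

(cl-defstruct plz-request
  process    ; always dead for sync calls; live or dead for async
  response)  ; populated with a plz-response once the request resolves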
I should have commented on this before, but I think it's a little
dubious for plz to make a decision about a successful HTTP response
being an "error" or not. That may depend on the context and the nature
of the request. For example, a "410 Gone" might not truly be an error,
but rather a signal to the application that some resource has been removed,
and the application is expecting this result eventually. A failure to
get a complete, valid HTTP response (DNS failure, protocol violation,
certificate issues, timeouts, connection errors, etc.) is definitely an
error, though.
Side note: Did you notice that the curl error codes are all less than
100? That's no accident. It means you can use curl's own "0xx" status
codes in-band with HTTP error codes. Every possible error has an
associated, unique numeric code. Every code from 001 to 099 is a hard
error.
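So one numeric field can carry both kinds of failure without ambiguity. A hypothetical helper:

(defun plz--status-code (curl-exit-code http-status)
  "Return a single status: a curl code (1-99) or an HTTP status (100+)."
  (if (zerop curl-exit-code)
      http-status       ; e.g. 200, 404
    curl-exit-code))    ; e.g. 6 = couldn't resolve host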
I see for :as you allow "buffer" or a function that's called with a
narrowed buffer. Narrowing is a temporary and exclusive state, which is
why it makes sense to invert control and narrow for the duration of a
callback. Here's an alternative to explore: indirect buffers. You could
create an indirect buffer backed by the curl output buffer. Indirect
buffers can be narrowed independently (I think?). Unlike when control is
inverted, the caller still needs to remember to clean up when done (kill
the buffer), though.
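A sketch of that alternative (process and body-start are stand-ins; the caller owns the returned buffer):

(let ((body (make-indirect-buffer (process-buffer process)
                                  (generate-new-buffer-name " *plz-body*"))))
  (with-current-buffer body
    (narrow-to-region body-start (point-max))) ; base buffer stays unnarrowed
  ;; The caller works with BODY at leisure, then must (kill-buffer body).
  body)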
Have you played around with my aio library? If there was an aio wrapper
around plz-get, aio-plz-get, then your final example could be
essentially the same as you've written it, but fully async. The
"(plz-get-sync ...)" just becomes "(aio-await (aio-plz-get ...))".
@skeeto I've finally made an …

Since …
@skeeto Hi Chris,

I finally decided to put together a "simple" HTTP library inspired by elfeed-curl.el. This doesn't implement much yet, but I wonder if you have any feedback on the basic design.

Probably the most notable issue at the moment is that the async version calls the callback with a parsed response struct, with the body as a string, rather than allowing the callback to work directly in the response buffer. I'll see about making that more flexible.
Also, unlike your library, it doesn't allow fetching of multiple URLs with a single curl invocation. Do you think that's important enough that I should implement that, or would the complexity not be generally worth it?
Thanks for any feedback.