Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introducing Cukes #431

Merged
merged 156 commits into from
Jan 24, 2014
Merged

Introducing Cukes #431

merged 156 commits into from
Jan 24, 2014

Conversation

steveklabnik
Copy link
Contributor

Please note this is not merge-able yet, this is mostly for discussion purposes.

Hey all! One of the first things I've been tasked with is improving the API specification. Over the weekend I asked around my network, did some research, and spiked out an idea, and I wanted to run it all past you.

Requirements

As always, the first step is to gather requirements. Here's what I gather we need for the API spec:

  1. A way to define what each aspect of the API does.
  2. An automated way to turn those definitions into executable tests.
  3. A way to run those tests regularly to ensure conformance.

All three of these are pretty straightforward, but might as well spell them out.

  1. These specifications must be easy to understand by a large variety of people.

This one is complicated. There are two kinds of concerns here: {in,out}side of Balanced, and across software language groups.

The first is the in/out part. Basically, while we have a very clear idea of what everything does, people who are new to Balanced won't. Just today I got an email from someone, asking if Balanced was appropriate for them, when their marketplace was basically identical to Rent My Bike. I'm happy to answer those kinds of questions, but we need something that will be easy for people who don't live and breathe payments to grok.

Secondly, software language groups. If we write our specs in Python, Pythonistas will be able to understand it, but Haskellers might not. Well, they probably will, but you know what I mean. Everyone wants their preferred language, but our product is an API, and it's language agnostic. Therefore, we need something that's not incredibly closely tied to a particular software language.

  1. The specification has to facilitate collaboration on what the specification actually does.

This is related to #4, but is a bit different. It's not enough that the spec is legible to everyone, but that it's easy to bikeshed. ;)

What we have

Right now, the API spec looks largely like this: https://github.com/balanced/balanced-api/blob/revision1/scenarios/cards/tokenize_without_secret.yml

It's a bunch of YAML files, with a totally custom, bash/Python runner.

This situation is good in that it's language agnostic, but bad in that it's 100% custom, so it's not natively understood by anyone, and while it communicates to some degree of what the specs do, it's almost 100% code. It took me some time to be able to actually understand what's going on, which will always happen, of course, but it's still hard to turn into actual words, which impedes #4 and #5.

Solution

I've spiked out a temporary solution based on a tool called "Cucumber." Before I tell you more, let's compare:

Current: https://github.com/balanced/balanced-api/blob/revision1/scenarios/cards/tokenize_without_secret.yml

Cucumber: https://github.com/balanced/balanced-api/blob/cukes/features/cards/tokenize.feature

The 'current' has two features, and the cucumber only has one, so ignore past line 18 on 'current.'

Cucumber, if you haven't heard of it, is a tool for collaboration between stakeholders. The idea is that you write high level descriptions of what you want your software to do in psudo-English ("Gherkin"), and then that gets transformed into code to actually run and test those descriptions.

While I've implemented the scenarios in Ruby, you can write the 'code' aspect of the tests in a variety of languages. You can also write the features in a variety of languages, too. For example: https://github.com/cucumber/cucumber/blob/master/examples/i18n/el/features/addition.feature

As for how this gets turned into code, I have two versions:

JSON Schema: https://github.com/balanced/balanced-api/blob/cukes/features/step_definitions/card_steps.rb

Assertions: https://github.com/balanced/balanced-api/blob/b91d4264eaace33373606191ef4d32ca1372ac47/features/step_definitions/card_steps.rb

Since we already assert against JSON Schema, I think sticking with that makes sense, as it'll be easier to port.

I actually think the "200 code" assertion may be too low level.

Pros

Basically, this is the exact use case for Cucumber. Because the specifications are written in English, we help make it easier for people to understand, as well as making it easier from people to collaborate across language boundaries. I have some issue with English being the de facto human language that technologists communicate in, but it is what it is, at least for now. This solves #4 and #5, our biggest problems, nicely.

Furthermore, collaboration can be done on the high-level language, without worrying about the details. For example, Matin had some concern over my wording of the last part of that scenario, because 'that's not exactly what that does.' Excellent, we have some sort of communication error here. Cucumber's high-level descriptions bring that kind of thing out, and people can submit pull requests with just the Gherkin, and we can add the implementation ourselves.

I wonder if "elevator pitches for your code" is a good way of describing Cucumber. Maybe I've been in San Francisco too long. ;)

Cons

Cucumber is... not a tool without some controversy. It was originally described as an 'acceptance testing' tool, and so people got into 'test mode' and started writing tests. Cucumber is very bad when used in this way. Eventually, even some helpers were removed [1] to discourage writing tests at a low level. Basically, if you're not writing very high-level descriptions, you are Cuking It Wrong. ;)

Cucumber can be slow. There's the whole Gherkin-code 'compilation' step, so it can take longer than other frameworks. This is mitigatable, but we're already planning on making HTTP requests against our API for our tests, so they'll be slow no matter what the framework is.

As I mentioned above, some people really, really dislike cukes. I think that's mostly due to having terrible Cucumber suites, people have developed some scars. This can be overcome, but there may be some initial 'eww.' There may be with our specs as of right now. ;)

Other options

Basically, if this is a bad idea, we should just move to some sort of very basic unit testing library in a not-to-unusual language. I'd basically use Python and Requests to do this, since it's what we're most familiar with overall. I think that the downsides of being language specific is pretty big, though.

Conclusion

I'm sure I've forgotten a few things, but I'd love to hear feedback, good and bad. I think this is the right move, but I don't always have all the right answers... I would especially love feedback from @mattwynne and @aslakhellesoy .

1: http://aslakhellesoy.com/post/11055981222/the-training-wheels-came-off

Scenarios

CRUD tests

Here's all the cases we need to cover to merge. From https://docs.balancedpayments.com/1.1/api/ :

API KEYS

  • Create a Key
  • Get a Key
  • List API Keys
  • Delete API Key

BANK ACCOUNT VERIFICATIONS

  • Create a Bank Account Verification
  • Get a Bank Account Verification
  • Confirm a Bank Account Verification

BANK ACCOUNTS

  • Tokenize a Bank Account (Direct)
  • Get a Bank Account
  • List Bank Accounts
  • Delete a Bank Account
  • Credit a Bank Account
  • Debit a Bank Account

CALLBACKS

  • Create a Callback
  • Get a Callback
  • List all Callbacks
  • Delete a Callback

CARDS

  • Tokenize a Card (Direct)
  • Get a Card
  • List All Cards
  • Update a Card
  • Deleting a Card
  • Debit a Card

CREDITS

  • Create a Credit
  • Get a Credit
  • List All Credits

CUSTOMERS

  • Create a Customer
  • Get a Customer
  • List Customers
  • Delete a Customer
  • Associate a Card
  • Associate a Bank Account

DEBITS

  • Create a Debit
  • Get a Debit
  • List All Debits
  • Update a Debit
  • Refund a Debit

EVENTS

  • Get an Event
  • List all Events

REFUNDS

  • Create a Refund
  • Get a Refund
  • List All Refunds
  • Update a Refund

REVERSALS

  • Create a Reversal
  • Get a Reversal
  • List All Reversals
  • Update a Reversal

non-crud tests

That's exhaustive. Let's do this.

@mattwynne
Copy link

I like it!

You'll have to see how it plays out to stick to this kind of high-level language in all of your scenarios. Some people might want to see more detail, yet not want to read the Ruby code underneath in the step definition. You can either address this with hand-crafted user docs (which could go in the Gherkin), or you could consider having one scenario per feature that uses more detail and has concrete examples of the JSON request / responses.

I'm interested to see how this develops.

@tef
Copy link

tef commented Dec 3, 2013

I think as it stands, your requirements are manufactured to want Cucumber, so let's look at them:

A way to define what each aspect of the API does.
An automated way to turn those definitions into executable tests.
A way to run those tests regularly to ensure conformance.

The second step is the one you'll have to justify, and I'm not convinced.

. Basically, while we have a very clear idea of what everything does, people who are new to Balanced won't. Just today I got an email from someone, asking if Balanced was appropriate for them, when their marketplace was basically identical to Rent My Bike

Writing these tests in pseudocode won't make people read them. (Unit) tests themselves do not give big picture views of your code, and should not serve as documentation, or example code.

If you have a comprehensive test suite, would you expect people to browse through them one-by-one to see if their use case works?

Secondly, software language groups.

There is an xkcd cartoon here, but instead of using a language some people know, you're going to use a language no-one knows. Although some tests become 'readable', the maintenance of tests becomes labourious. Frequently for each line of english you have to write a custom regular expression to parse it. Your test runner won't be language agnostic either.

Therefore, we need something that's not incredibly closely tied to a particular software language.

Except the implementation will be, and you will need to rewrite the ad-hoc regex rules in each language if you want to avoid this. This is a huge burden for tests and maintenance.

Basically, this is the exact use case for Cucumber

The use case of cucumber (afaik) was so that clients signing off on a project could agree to pages and pages of legalese that was executable, to enable the customer to both understand and specifiy what they want.

This is not your use case. Your use case as presented is "We need better documentation and examples to showcase features and make it easier to get started with the API". Cucumber solves this if you think reading tests is a good substitute for documentation and worked, comprehensive examples in other languages :-)

My experience with cucumber is that it made writing tests slow, and running tests slow, and debugging tests harder.

@BRMatt
Copy link

BRMatt commented Dec 3, 2013

Sorry if I've misunderstood the problem, but have you considered using a tool like api-blueprint to document the API, then use tools that test the API against the defined documentation?

It adds some semantics on top of Markdown for documenting requests/responses/schemas etc. and it's pretty readable, even when browsing via Github.

Admittedly it probably won't capture intricate scenarios as easily as a cuke scenario/bash script, but thought it might be worth bringing up.

@tef
Copy link

tef commented Dec 3, 2013

(nb I'm not against having documentation and examples which can be tested to check they work, but I don't think that is the same as having tests that serve as documentation and examples. c.f python's doctests and the problems with using them)

@steveklabnik
Copy link
Contributor Author

@BRMatt , I'm not opposed, but one of the tricky bits here is that as Balanced moves even further towards hypermedia, we want to move away from documenting exact URLs and HTTP methods, and focus more on workflows. I don't have the most experience with api-blueprint, but generally, these kinds of tools tend to focus on 'RESTish' style documentation rather than this kind, so that's something that's in here too.

@steveklabnik
Copy link
Contributor Author

@tef, I think your 'use case for cukes' is the same thing, just more cynical. ;)

I actually didn't even come up with cukes until the last minute, so maybe they are described as a fit for cukes, but it happened in the other direction.

I don't think new people will go read a bunch of cukes, but it would be nice to show as a response to this kind of question.

@mattwynne
Copy link

@tef makes an important point that the demands of testing and documentation can come into conflict. The RSpec project have tried this, and found that the comprehensive test suite made for pretty boring user documentation. Users don't want to see every single edge case.

My advice in this situation is usually to push the edge cases down into unit tests. If you think there's still benefit in having that complete specification readable by everyone, consider writing two sets of cukes - one that provides a user manual, and another that provides your complete specification.

@Kosmas
Copy link

Kosmas commented Dec 3, 2013

@tef regarding the issue of speed with the tests, I think that is something that will change soon but @mattwynne will be able to confirm this.

I also believe that the cukes would help to have a quick understanding of how things are supposed to work, but in order to fully understand you would have to go further deep into the code.

@MattyO
Copy link

MattyO commented Dec 3, 2013

The python equivalent to cucumber is called lettuce. It a little immature compared to cucumber. Its usefulness may be found in the use cases of interoperability with any current testing tools and libraries you might have or want to use in the future written in python.

Mainly wanted to let you know of its existence.

@BRMatt
Copy link

BRMatt commented Dec 3, 2013

@BRMatt , I'm not opposed, but one of the tricky bits here is that as Balanced moves even further towards hypermedia, we want to move away from documenting exact URLs and HTTP methods, and focus more on workflows. I don't have the most experience with api-blueprint, but generally, these kinds of tools tend to focus on 'RESTish' style documentation rather than this kind, so that's something that's in here too.

@steveklabnik Fair enough, those are reasonable points. I believe the maintainers of api-blueprint are trying to work in support for hypermedia relations, but I'm not sure how that fits in with your plans.

@mjallday
Copy link
Contributor

mjallday commented Dec 3, 2013

@steveklabnik do you see Balanced then implementing the underlying scenarios for the Gherkin in each client library to ensure they conform to a basic functionality (the spec)?

As I'm sitting here figuring out how we can implement the rev1.1 clients I really like the idea of a basic set of functional tests across all client libraries so we don't end up with the issues we had last round such as under-utliized client libraries having issues opened because they are missing esoteric functionality and we don't realise until someone actually tries to use it.

@mahmoudimus
Copy link
Contributor

Here are my thoughts:

I think Gherkin is the way to go here, definitely +1 on it.

First and foremost, there are many Gherkin/Ragel parsers in many different languages. The official client libraries that are supported are: Ruby, Python, PHP, Java, C# and Node.js -- all of which have Gherkin/Ragel runners. So this is largely aiding #4 as well and helps with the training wheels.

We attempted to something very similar to this in our documentation: https://github.com/balanced/balanced-docs/tree/master/clients -- maybe instead of having scenarios like this, we organize tests implemented in different languages and use those to feed into our API? Ben might have more thoughts on this topic.

This is definitely a step in the right direction and however, I'd like to compare it to something like http://apiary.io or RAML.

@steveklabnik
Copy link
Contributor Author

@mjallday maybe; I'm not sure how directly applicable they are, as the clients would need testing that they consume things correctly, and these are tests that stuff is produced correctly.

That said, we should integration test both together, at some point.

@steveklabnik
Copy link
Contributor Author

I just realized that I wrote a blog post a few years back that's directly relevant to this conversation http://blog.steveklabnik.com/posts/2011-12-20-write-better-cukes-with-the-rel-attribute

@JonRowe
Copy link

JonRowe commented Dec 5, 2013

What'd be great is a way to visualise the gherkin documentation online, with each step able to sho the raw response / differing language implementations...

@steveklabnik
Copy link
Contributor Author

@JonRowe , that's what http://relishapp.com is for :)

@JonRowe
Copy link

JonRowe commented Dec 5, 2013

It allows you to visualise the documentation, but it's not very non-developer friendly, I found @chrismdp's rack-usermanual idea very intriguing in this respect. It also doesn't currently offer a means to inspect what it actually does / provide sample responses.

@steveklabnik
Copy link
Contributor Author

Oh that is super cool!

@aslakhellesoy
Copy link

@steveklabnik we'd be happy to set you up with a free https://cucumber.pro/ account once we go live. Cucumber Pro will replace Relish.

Regarding the idea to document the API with Cucumber. If I was a developer using the API, I would like to see scenarios like this:

Feature: Payment

  Scenario: tokenize
    Given some precondition here
    When I http POST to /cards with JSON body:
      """
      {
        "number": "4111111111111111",
        "expiration_month": "12",
        "expiration_year": 2016
      }
      """
    Then the response code should be 201 with:
      """
      {
        ... whatever the expected reply is...
      }
      """

This is more verbose than what you suggested, but I think this level of detail is needed for the audience you're targeting (developers using the API)

@steveklabnik
Copy link
Contributor Author

@aslakhellesoy thanks! Yeah, that makes sense; with something like this, I'm never 100% sure what the right level of abstraction is

@aslakhellesoy
Copy link

To find the right level of abstraction, ask some stakeholders (in your case, users of the API):

  1. Does this make sense?
  2. Would you be able to do this yourself with this info?
  3. Do you need more info?
  4. Is anything puzzling you?

Looking at your feature I think most developers unfamiliar with your API would answer like this:

  1. Yes, I think so
  2. No. I have no idea what HTTP request to issue
  3. Yes, can you give me an example?
  4. I don't understand how the secret key is relevant. Can you explain? (that's what the scenario description is for)
Feature: feature name
  Scenario: scenario name
    Some descriptive text here is useful

    Given ...

@mahmoudimus
Copy link
Contributor

@aslakhellesoy that looks really good! 👍

@andrewsardone
Copy link

We've been using a similar setup at @nutshellcrm. I think it has been working pretty well, though I admit it's tough to find the right level of abstraction.

@chrismdp
Copy link

chrismdp commented Dec 6, 2013

@steveklabnik thanks. Just realised the Rack::Usermanual demo links were broken: should be fixed now.

@steveklabnik
Copy link
Contributor Author

@andrewsardone would you be interested in some help and / or collaboration to turn this into a gem, maybe?

@steveklabnik
Copy link
Contributor Author

@tef, does this style do anything to mitigate your concerns, or you still think it's a terrible idea?

@mahmoudimus
Copy link
Contributor

👍

@steveklabnik
Copy link
Contributor Author

@aslakhellesoy @mattwynne , one more question: We have scenarios that operate like this: https://github.com/balanced/balanced-api/blob/revision1/scenarios/cards/tokenize_without_secret.yml#L20-L23

This (kinda terrible because of the runner) test basically says "run the 'tokenize' scenario, and then make a request to the URI that's located at cards.href in its response."

This kind of thing is going to come up a lot as we move more and more towards hypermedia; "execute this thing, if it passes, execute this other thing, based on what comes back." These feel like two different tests to me, but GivenScenario was depreciated, so I'm not sure how to accomplish this kind of task.

Thoughts?

@mamund
Copy link

mamund commented Dec 9, 2013

I've thought about this, too. Another possible testing challenge is that you're building a client that could possibly travel several paths to a successful outcome. So it's not just if this then that but also something like start at A and IF you eventually get to Z that's a pass ELSE that's a fail

A likely situation for hypermedia, but not something I am aware that current testing tools can handle.

@matin
Copy link
Member

matin commented Dec 9, 2013

@steveklabnik Could you also add a scenario that debits/charges the card that you just tokenized?

One of the most painful parts of our current scenarios is referencing the result of a previous scenario. This becomes even worse if you want to use some sort of a fixture to avoid doing repetitive tasks like tokenizing a card or creating a customer for different scenarios.

@steveklabnik
Copy link
Contributor Author

I am currently re-running the tests over every commit to make sure that bisect works. Assuming that's successful, it's time to finally merge this sucker.

:shipit:

@matthewfl
Copy link
Contributor

@steveklabnik I just noticed that there are some scenarios that do not appear to be implemented, might we double check that we have everything

https://github.com/balanced/balanced-api/blob/revision1/scenarios/debits/create_new_card.yml

@steveklabnik
Copy link
Contributor Author

Okay! I have re-gone through and double checked. We were missing one or two. Let's get that CI building.

@steveklabnik
Copy link
Contributor Author

🎊 :shipit:

steveklabnik added a commit that referenced this pull request Jan 24, 2014
@steveklabnik steveklabnik merged commit 7c22c05 into revision1 Jan 24, 2014
@mjallday
Copy link
Contributor

oh snap! 💃

@julianpistorius
Copy link

@steveklabnik have you looked at 'behave' instead of Cucumber? It's Python and allegedly better than Lettuce. http://pythonhosted.org/behave/index.html

http://pythonhosted.org/behave/comparison.html

@steveklabnik
Copy link
Contributor Author

@julianpistorius I have not, no.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.