General Build WG issues #562

Closed
Trott opened this issue Jul 8, 2018 · 15 comments

Comments

@Trott
Member

Trott commented Jul 8, 2018

I believe there's general agreement about certain issues surrounding Build WG.

  • Most of the people are around only sporadically. This isn't a problem if it's reliable that someone is always around, but that has not been the case.

  • Related to the previous point: Great potential for burnout for the small number who are doing build stuff more than sporadically.

  • @nodejs/build pings are overwhelming. AFAIK, no one pays attention to them. We need to find a way to reduce and/or categorize the pings so that people can pay attention to them again and they can start being more effective.

I'm raising this to the TSC rather than Build WG because Build WG may not have the bandwidth (or resources) to deal with these issues. (These are not new issues.)

I don't have any solutions to propose, at least not yet. (The obvious one--paying people to monitor the Jenkins infrastructure--has been discussed elsewhere and TBH I don't even remember if the last time it was discussed, the conclusion was "no" or "let's ask the Board" or what. I think @mhdawson was the instigator of that last go-around on that conversation, so maybe he remembers.)

@mcollina
Member

mcollina commented Jul 9, 2018

I would also note that participating in build is a “silent” job: it does not get much recognition and it does not attract many contributions. As an example, we are not listing Build WG members in nodejs/node.

I think we should promote (active) Build WG members more.

@bnb
Contributor

bnb commented Jul 9, 2018

@mcollina I've not participated in build at all, but that's entirely my understanding as well. Zero thanks are given for one of the most vital and important pieces of our infrastructure... especially not to those who dedicate a non-trivial amount of time.

From a CommComm perspective, definitely +1 to actively promoting the Build WG members more.

@mhdawson
Member

mhdawson commented Jul 9, 2018

I did start the discussion on "paying people" last time (this is the issue nodejs/build#1154), but I probably did not push hard enough to keep the discussion going.

I did talk to the executive director and it sounded like it was a possibility.

The stumbling block was the concern that paying somebody might make others less willing to volunteer on the build WG side, and the slippery slope once you start paying some collaborators.

The challenge I see is that volunteers are a good fit for "when I have time, I'll look at what needs to be done and do that" as opposed to "it has to be done now, drop everything". Of course, if the "drop everything" is infrequent, it still works out OK.

It would be good to have the "impact" of our current state of availability captured somewhere (I'd suggest this issue). For example, is it frustrating because things don't get fixed right away, which slows things down (or slows people who currently have time to work on something at a particular time)? Or is it that things don't get fixed at all and we are getting into a worse and worse state? The build WG clearly understands there are lots more things that the project (and the build WG) would like to see happen, but I think identifying the top X impacts on day-to-day work that make it urgent for the TSC to step in and help push forward change would help focus the discussion.

Ideally, the best answer is to enable people to help themselves when they come across a problem, as opposed to needing to call somebody else in. On that front, @gdams is starting to look at setting up Ansible Tower so that we can let people run cleanup-type work more easily, but it's certainly not going to be a silver bullet and is going to take time to make progress.

We did have a number of new people volunteer and I think most of them are now onboarded (although I could be wrong on that front). This came out of the collaborator summit which does show that more promotion can help so definitely +1 on that front as well.

@Trott
Member Author

Trott commented Jul 9, 2018

> It would be good to have the "impact" of our current state of availability captured somewhere (I'd suggest this issue).

I think the more telling impact isn't the impact on the project but the impact on the Build WG itself. I think people burn out and check out, even if they don't say as much. I also think recent friction between the two most active folks on the WG probably stems from some or all of these issues on some level. (I'm not sure if they'd agree with that.) It also makes recruiting and onboarding difficult, thus perpetuating the problems.

> We did have a number of new people volunteer and I think most of them are now onboarded (although I could be wrong on that front).

Two of the four of them have been onboarded. That we got four volunteers was a result of an extra push by Refael, Tierney, me, and probably others during the Collaborators Summit.

@mhdawson
Member

mhdawson commented Jul 9, 2018

If the impact is on the Build WG itself, can we change the expectations on "current state of availability", i.e., set the expectation that somebody may not be available at all times and people just have to wait? If the project is mostly moving forward as it needs to, but the Build WG feels under stress, that might help. If changing that expectation is not reasonable, then that supports the case that we need to get certain things done in a different way (and we'd need to identify those things).

The issue starts out by indicating that the problem is that most people are around only sporadically (I think the expectation in most parts of the project is that people will only be around sporadically). To me, the expectation that people in the build WG will be "more available" might be part of the cause of burnout in the build WG... (I also understand why there might be this expectation, as we want builds, etc. to keep moving forward.) Maybe I'm misunderstanding what you meant by sporadically. I'm interpreting it as meaning that people are only available asynchronously. I also understand that things not being in a good state is also a cause of burnout, even without external pressure, so it might just be that members get frustrated because things are not in as good a state as they would like them to be....

@Trott
Member Author

Trott commented Jul 9, 2018

> Maybe I'm misunderstanding what you meant by sporadically. I'm interpreting it as meaning that people are only available asynchronously.

Lots of subtleties here. First, the people who need Build WG folks available are often other Build WG folks. This is especially true when dealing with the super-privileged infra that is used for releases.

Second, it's not that Old Timer Ted isn't around at a convenient time of day. It's that they might not be paying attention to build stuff for days or weeks at a time.

Third, yes, we can change the expectation that someone will get on a problem within N hours or whatever. But only if we're willing to slow the velocity of the project. I'd actually advocate for that, but I don't think there's much of an appetite for it on the project. I think people might accept that as a temporary measure but I don't know if people would be enthusiastic about it as a permanent solution.

To be honest, though, talking to @maclover7 and @refack about these issues might be more useful than talking to me. They may have different ideas about solutions and whatnot, but I suspect that they would be in agreement about the issues.

@maclover7
Contributor

First, just want to say I'm really happy we're continuing to have these conversations, and to try and work through difficulties facing the Build WG. Most of this stuff is not easy to solve, and continuing the dialogue is very important, at least to me.

To put a face on the issues (my intention is not to make this all about me, or overgeneralize what's happening, but I think it would help to give at least one person's perspective), here is a quick-and-fast list of my current difficulties:

  • Difficult to get pull request reviews from subject experts
  • setup/ Ansible scripts are not fully migrated to ansible/, some machines have no working scripts
  • Higher-up-infra is gated to a very small group of people, many of whom have not been involved in any form for months or years
  • I am only a volunteer!!

Like @mcollina and @bnb mentioned, this is largely "silent"/"hidden" work with little-to-no project recognition, which makes it tough to attract contributions. I recently onboarded two new members (Matheus Marchini and Luca Lanziani), and both seem excited about contributing.

Something that might be good to do would be to reset the relationship with users of Build WG services, and establish a more formal "contract" (read: listing out expectations for everybody involved). Maybe this should be done for the Build WG itself, as well? At that point, IMHO, we can figure out how to better use existing or new WG resources (machines, volunteers, Foundation $$) to get over that finish line.

@mhdawson
Member

@maclover7 thanks for adding your perspective.

In respect to

> Higher-up-infra is gated to a very small group of people, many of whom have not been involved in any form for months or years

I think part of the problem on that front is the visibility of what people are doing. Looking at the list of people listed as Infra Admins, I know that, other than one person who's been pulled away by their current job, people have been active in the last few months (agreed, not to the level we'd all like). The challenge is that everybody has a lot on their plate and prioritizes what they get done, which may not match up with what other people want or need them to do in order to push forward what they are working on. So I'm not saying that having more people who can help on this front would be a bad idea, only that saying that people are completely disengaged is not fair either.

@mhdawson
Member

I think this is a key part of the discussion

> But only if we're willing to slow the velocity of the project.

I think it's reasonable to achieve a certain level with volunteers. If that level does not match the expectations for the project then we need to either adjust the expectations or look for other ways to meet the expectation.

I'd agree with you that we should consider slowing the velocity of the project. Maybe we could start by defining what is reasonable given the current volunteers, and propose that we formalize that as a way to have the discussion about whether we slow the velocity or find another solution.

@Trott
Member Author

Trott commented Jul 10, 2018

Relevant to this discussion: https://medium.com/@Trott/on-landing-code-when-ci-fails-f3aa999cda3d

@mhdawson That's how I think we should throttle velocity, FWIW.

@mhdawson
Member

@Trott do you think there is a better way to have the discussion about adopting that approach other than just opening a PR to update our onboarding/guidance to state that is the approach along with some of the context? I know there might be a fair amount of discussion, but opening the PR is likely the best way to get it started.

@refack

refack commented Jul 10, 2018

From my POV, the situation has improved drastically in the last couple of months.

  1. The status of the test CI cluster seems to be converging towards a minimum of spurious fails.
  2. Number of reported incidents in GitHub & IRC has reduced.
  3. @nodejs/build-files is used more, and is replacing @nodejs/build, so pings have slowed down to a manageable number (less than 5 a week).

This might be due to one of two reasons. Hopefully, it's because capacity is slowly catching up to demand. Alternatively, it's because the Collaborators have given up on the infra. We're trying to better understand which it is...

IMHO better focus and re-aligning of expectations will eliminate the pressure on the Build team.


As a reminder, the Build team is tasked with facilitating two separate tasks:

  • CI testing
  • building of releases

Since the second task is far less frequent, can be coordinated, and is performed by experienced users, IMHO it could receive lower priority for the time being.
So as I see it, stabilizing and then improving CI testing should be the main focus for a while. For that we need better feedback, tracking & reporting, and managed expectations.

@Trott
Member Author

Trott commented Jul 10, 2018

> @Trott do you think there is a better way to have the discussion about adopting that approach other than just opening a PR to update our onboarding/guidance to state that is the approach along with some of the context? I know there might be a fair amount of discussion, but opening the PR is likely the best way to get it started.

@mhdawson That PR already happened, although arguably it snuck in under the radar (although I don't think opening a PR is sneaking anything--then again, it may have been insufficiently clear in the title what was going on?). It was really two PRs. First nodejs/node#19458 and then further tightening in nodejs/node#21645.

What might be good now, assuming there is buy-in on this practice, is maybe to announce it in the discussion board for Collaborators.

@mhdawson
Member

mhdawson commented Jul 10, 2018

I'm guessing many people are not going to be aware: there are so many PRs that there is no way we can reasonably expect everybody to keep on top of all of them, particularly collaborators who have more limited time to contribute. Even though I try to read the titles of every issue, 21645 still slipped by me, and I only learned about "Resume build" (which is great!) from another collaborator last week. Part of that might be that it was only open for 2 days, so you had to catch it during that window.

Since it's a change from past expectations, I think we need to be messaging the whole collaborator base, most likely a number of times until we see behavior change. That should either help people become aware and start following the new practice or ignite discussion which we need anyway if we don't have buy in.

It might even be good to have something on the page for starting the build that says "New, please read", kind of like the signs that advertise when new stop signs are added.

@mhdawson
Member

mhdawson commented Apr 3, 2019

Given that we have added a Strategic initiative in https://github.com/nodejs/TSC/blob/master/Strategic-Initiatives.md to look at Build resources can this be closed and have ongoing discussion covered in that initiative?

Trott closed this as completed Jul 24, 2019

6 participants