Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove Scheduler #5687

Merged
merged 10 commits into from
Feb 8, 2014
Merged

remove Scheduler #5687

merged 10 commits into from
Feb 8, 2014

Conversation

vtjnash
Copy link
Member

@vtjnash vtjnash commented Feb 5, 2014

This removes the Scheduler. @JeffBezanson you've talked about doing this for awhile since it should improve performance by avoiding the extra task switch.

@WestleyArgentum This change set frees the sysimg build of the need for starting the Scheduler. It was needed to permit pulling in other modules into the sysimg. After this change, I can add the following lines to the bottom of my sysimg.jl file to have Gtk fully & instantly available at the REPL.

Base.push!(Base.DL_LOAD_PATH, "/opt/local/lib")
Base.require("Gtk")
Base.pop!(Base.DL_LOAD_PATH)

edit: This is WIP because I need to create a flush_gc_msgs Task to avoid the warning that the scheduler is being called recursively. Also need Jeff's thoughts on how source_path should behave (cf. f9dd0bb).

@WestleyArgentum
Copy link
Member

This is really really exciting! Thanks so much for working on this!

@WestleyArgentum
Copy link
Member

@bass3m might also be interested in trying this out

@jakebolewski
Copy link
Member

@vtjnash looking forward to precompiled packages! It looks like you pulled in some unintended changes (see test/ccall and test/linalg.jl)

@Keno
Copy link
Member

Keno commented Feb 5, 2014

current_module needs to be rooted probably.

@@ -103,10 +103,11 @@
#end

# type Task
# parent::Task
# current_module::Module
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How come you don't need to keep track of the parent task anymore?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The parent task used to be whoever first switched to the task, that isn't a very helpful definition and it caused issues with tasks being awakened by some other random task exiting, rather than the condition variable it was waiting for. It was broken before, but harder to notice since the Scheduler was forbidden from waiting on stuff.

@vtjnash
Copy link
Member Author

vtjnash commented Feb 5, 2014

@loladiro where? it's more rooted now than it used to be

end
return default
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not searching the entire parent task chain seems rather likely to break source_path quite badly: we only populate the TLS lazily and this search process is how we find the correct source path for tasks that haven't populated this entry. We should probably generalize this approach since it's quite likely to be something that will be used for other values stored in TLS.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above for why I think this definition of source_path may have been broken anyways. (task Alice spawns task Bob, task Bob yields for a bit then calls source_path, meanwhile Alice has finished loading the file and proceeded to include some other file. Bob now reports that it is in Alice's other file.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was for it to at least work for tasks whose lifetimes are contained within loading the file. With this change, it no longer works in that simple case.

source_path is dynamic; it's the file currently being loaded. Setting up tasks to use it potentially long after the file is done loading is a strange thing to do. A function inside the file that calls source_path will also give a different answer after loading the file. This isn't too different from the task case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense to capture this as a per-module attribute then? Or should I restore the parent parameter, but set it equal to the task that created the task (as opposed to the task that first switched to the task)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, but source files and modules aren't 1-to-1 so that won't really fix it. The nice thing about the parent field is that you can get good default values for things in task-local storage without copying the dictionary for each new task.

Going by the creating task instead of the first-switching task sounds like a good idea; you're right that in general somebody might make a task, stash it in the work queue, and some random other task switches to it later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The creating task does seem like a much saner choice for the parent. The fact that the creating task and the first task that switches to it are often the same is a lucky accident.

@Keno
Copy link
Member

Keno commented Feb 5, 2014

I may be missing something, but is t->current_module rooted somewhere?

@vtjnash
Copy link
Member Author

vtjnash commented Feb 5, 2014

It is rooted in t since Task is a julia type (it didn't used to be rooted since it used to be a hidden parameter).

@Keno
Copy link
Member

Keno commented Feb 5, 2014

Ah, I see.

@@ -1,4 +1,8 @@
# source path in tasks
path = Base.source_path()
@test endswith(path, joinpath("test","test_sourcepath.jl"))
@test yieldto(@task Base.source_path()) == path
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Breaking this test is a no-no – this definitely needs to work.

@JeffBezanson
Copy link
Member

This is a great change, thanks for tackling it.

@JeffBezanson
Copy link
Member

The binary file ccalltest should be removed, and the unrelated linalg test changes.

perform_work()
process_events(false)
end
yield()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes worker processes to busy wait (rather, this whole while true loop).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i fixed this locally, but didn't hadn't pushed it yet

@StefanKarpinski
Copy link
Member

This is a great change, thanks for tackling it.

Yes, definitely a great change to have.

@vtjnash
Copy link
Member Author

vtjnash commented Feb 5, 2014

The binary file ccalltest should be removed, and the unrelated linalg test changes.

I fixed the ccalltest file. I need the linalg changes to run the tests locally. So, to me, they aren't unrelated. I should probably cherry-pick it directly onto master then rebase this.

@vtjnash
Copy link
Member Author

vtjnash commented Feb 5, 2014

this passes all tests now @JeffBezanson ping

@@ -196,7 +196,7 @@ debug && println("Solve upper trianguler system")

debug && println("Solve lower triangular system")
x = tril(a)\b
@test_approx_eq_eps tril(a)*x b 1000ε
@test_approx_eq_eps tril(a)*x b 1050ε
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes are unrelated and may be a reversion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my quick git blame work told me these are new tests added ~1 week ago (just prior to my branch which is why they ended up here)

@staticfloat
Copy link
Member

@vtjnash I'd appreciate it if you could cherry-pick those linalg test changes, they've been blocking the OSX nightlies for almost a week now. I know @jiahao has been working on a systematic re-think of the linalg tests, but if you've got a relaxation of the bounds that works, I'd love to have that on master.

@andreasnoack
Copy link
Member

@staticfloat I can relax those tests now. My only problem is that all tests pass on my own machines as well as Travis so I need some feedback. I'll relax those that have been reported already.

@staticfloat
Copy link
Member

@andreasnoackjensen I have machines that fail on tests. I'll run any branches/commits you want me to and report on how it works, just ping me.

@andreasnoack
Copy link
Member

Okay. Please try to run anj/relax.

@staticfloat
Copy link
Member

Confirmed working.

On Wed, Feb 5, 2014 at 1:59 PM, Andreas Noack Jensen <
[email protected]> wrote:

Okay. Please try to run anj/relax.

Reply to this email directly or view it on GitHubhttps://github.com//pull/5687#issuecomment-34256403
.

@WestleyArgentum
Copy link
Member

I've also been using this and testing on osx and linux. Works great so far!

@kmsquire
Copy link
Member

kmsquire commented Feb 6, 2014

I've also been using this and testing on osx and linux. Works great so far!

@WestleyArgentum, I'm pretty sure that @staticfloat's response was in regard to a side conversation with @andreasnoackjensen, and not to this PR. ;-)

But it's great that this PR is working for you!

@WestleyArgentum
Copy link
Member

@kmsquire heh, right - sorry for the spam

@vtjnash
Copy link
Member Author

vtjnash commented Feb 7, 2014

@JeffBezanson rebased && ready to merge

local v
try
v = yieldto(P, values...)
if ct.last !== P
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't have to change this now, but this almost seems more like an error --- there should be no reason a task other than P would yield to this one. Something is really broken if that happens.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was needed to pass pollfd test

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, but we should look into why that happens.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. for some reason the old tasks get woken up, finish their work, and then wake up their parent (who was intending to wait for its new task)

First, we keep track of whether a task has asked to be in a particular
module (e.g. by initiating a source file load). If it has, then
current_module is always set appropriately for that task. That should
be uncontroversial.

Next, if a task doesn't care what module it is in, use the
current_module from its parent task. This is similar to the previous
commit's behavior of setting task->current_module at task creation,
except that it sees changes in the parent task's current module. So if
the parent task finishes loading something, the spawned task won't
think it is still inside that module. In other words, remember where
we got current_module rather than remembering the module itself.
state

don't print fatal error for non-root task without exception
handler. this is not fatal and in fact can happen regularly.

small cleanups to yield_until()
now multi.jl deals only with distributed parallelism

remove no-longer-used WeakRemoteRef
@JeffBezanson
Copy link
Member

Ok I'm going to merge this now, as it works perfectly well. We can get deeper into other task-related changes afterwards.

JeffBezanson added a commit that referenced this pull request Feb 8, 2014
@JeffBezanson JeffBezanson merged commit 6a080ea into master Feb 8, 2014
@WestleyArgentum
Copy link
Member

I know I say this a lot, but yay!

@timholy
Copy link
Member

timholy commented Feb 9, 2014

This is totally cool! I love having rapid access to graphics!

I noticed a couple of oddities (e.g., with Gtk compilation fails the 2nd time), but I bet you're aware of them.

My biggest concern is how developers are now going to avoid accidentally contributing their favorite sets of packages to sysimg.jl 😄.

@ihnorton
Copy link
Member

ihnorton commented Feb 9, 2014

Maybe add a userimg.jl and put it in .gitignore?

@ivarne
Copy link
Member

ivarne commented Feb 9, 2014

Or better yet, automatically precompile packages if they are listed in a Make variable that can be set in the Make.user file.

@vtjnash
Copy link
Member Author

vtjnash commented Feb 9, 2014

I noticed a couple of oddities (e.g., with Gtk compilation fails the 2nd time), but I bet you're aware of them.

That's annoying. Somehow I hadn't run across this, but I can see why it happens. @JeffBezanson is there some black-magic we could use to delete all old bindings (esp. Modules) from Main after defining the new Base?

@timholy
Copy link
Member

timholy commented Feb 10, 2014

@ihnorton and @ivarne, I like your suggestions. @ivarne, would your approach allow one to handle the DL_LOAD_PATH issues? (See the example at the top of this issue.)

@vtjnash
Copy link
Member Author

vtjnash commented Feb 10, 2014

ihnorton's solution is nice since it allows arbitrary code to be added to the sysimg

@ivarne
Copy link
Member

ivarne commented Feb 10, 2014

Arbitrary code is definitely more flexible than a make variable. I do not fully understand the LOAD_PATH case, but if packages have different requirements, a simple list of packages to include in the sysimg will be brittle and awkward.

I just wanted to suggest a way to avoid creating a new user editable file, and to simplify debugging because the automatically generated userimg.jl file can print the name of the package that is currently compiling. Automatically finding the compatible package version will also be possible.

@StefanKarpinski
Copy link
Member

I think the userimg.jl idea seems good. It's not a super general solution, of course, since you have to recompile Julia to make it work, but it will make loading a bunch of packages easier for people who are willing to do so. Using a make variable is less flexible and harder to implement.

@StefanKarpinski
Copy link
Member

I did the userimg.jl thing – 78e34eb. Unfortunately, it's a big fragile and you need to use Base.require instead of just require inside of that file. Based on how specific this is, it would actually be nice if this could just be a list of package names to preload, but I couldn't get that to work.

@vtjnash vtjnash deleted the jn/no_scheduler branch February 13, 2014 01:58
@vtjnash
Copy link
Member Author

vtjnash commented Feb 13, 2014

@StefanKarpinski It might be good to move this thread over to pull request #5746, where I address the issue of how fragile it is and the need to use Base.require instead of just require

@StefanKarpinski
Copy link
Member

Ah, I didn't realize that was what you guys were talking about over there. Yes, will do.

@quinnj quinnj mentioned this pull request Feb 25, 2014
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.