Add Inko #440
Conversation
I'm cleaning the code up a bit and changing the structure so you can just run What I noticed so far from running I'll add more notes as I make my way through the code and profiling data.
Per An additional 5% is spent in From there, there's a long tail of methods consuming small amounts of the total execution time (e.g. 3% for bounds checking of array indexes). Most of these involve arrays, which isn't surprising given how much they're used in the benchmark. So in total, roughly 60-70% of the time could be optimized away simply by inlining methods and getting rid of the current reduction-based scheduler. That would leave us with a total runtime of about 1.7 seconds. That's still not great, but it's a lot better and close to what Python does.
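To make that call overhead concrete, here is a minimal, self-contained sketch of the benchmark's hot counting loop. The data is made up, and the comments reflect my reading of the profile rather than anything the compiler reports:

```inko
import std.stdio.STDOUT

class async Main {
  fn async main {
    # Made-up stand-ins for the benchmark's data: the indexes of posts
    # sharing some tag, and the per-post counter array.
    let indexes = [0, 2, 3, 2, 1, 3, 3]
    let counts = Array.filled(with: 0, times: 4)

    # The hot counting loop. Every iteration goes through several small,
    # currently non-inlined method calls (the iterator's next, plus
    # bounds-checked Array.get/Array.set), which is where most of the
    # profiled time goes.
    indexes.iter.each fn (i) {
      counts.set(i, counts.get(i) + 1)
    }

    # Prints "1 1 2 3".
    STDOUT.new.print("{counts.get(0)} {counts.get(1)} {counts.get(2)} {counts.get(3)}")
  }
}
```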
To illustrate the above: if I change the compiler to not emit any reduction code, the runtime drops to just under 4 seconds. At that point, 17% of the time is spent in just
If I further change the compiler to use the "aggressive" optimization level for LLVM (
I added a note about this in inko-lang/inko#595 (comment).
@jinyus One thing I noticed is that the Inko version seems to produce the following error that I'm not seeing with the Rust version:
Perhaps this also explains Inko being a bit slower, due to it doing more work (or the wrong work) compared to, say, Rust?
Yeah, it looks like an off-by-one error I introduced while trying to optimize it. Will fix.
@yorickpeterse Fixed in 6597c6f. The bug actually resulted in less work being done. Re your notes: I can just add it to the repo, but hold off on adding it to the charts until it has been optimized.
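For context on the loop the off-by-one crept into: both versions keep the top N (count, index) pairs in a flat array of length TOP_N * 2 and shift pairs down a slot whenever a better count is found. Below is my own standalone reduction of that selection step with made-up counts, a sketch for readers of the thread rather than the fixed commit itself:

```inko
import std.stdio.STDOUT

let TOP_N = 5

class async Main {
  fn async main {
    # Made-up counts standing in for tagged_post_count in the benchmark.
    let counts = [3, 9, 1, 7, 7, 2, 8, 0, 5, 4]

    # Flat array of (count, index) pairs: [c0, i0, c1, i1, ...].
    let top = Array.filled(with: 0, times: TOP_N * 2)
    let mut min_tags = 0
    let mut idx = 0

    while idx < counts.size {
      let count = counts.get(idx)

      if count > min_tags {
        # Shift smaller pairs one slot down, starting at the second-to-last pair.
        let mut upper = (TOP_N - 2) * 2

        while upper >= 0 and count > top.get(upper) {
          top.set(upper + 2, top.get(upper))
          top.set(upper + 3, top.get(upper + 1))
          upper -= 2
        }

        top.set(upper + 2, count)
        top.set(upper + 3, idx)
        min_tags = top.get(TOP_N * 2 - 2)
      }

      idx += 1
    }

    # Prints the pairs (9,1) (8,6) (7,3) (7,4) (5,8).
    let mut j = 0

    while j < TOP_N {
      STDOUT.new.print("count={top.get(j * 2)} index={top.get(j * 2 + 1)}")
      j += 1
    }
  }
}
```

Interleaving counts and indexes in a single flat Int array avoids allocating a small object per candidate, which matters given the per-call and allocation overhead discussed above.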
@jinyus I cleaned things up with the following patch: 0001-Clean-up-Inko-implementation.patch

From 4eee001a21e31941db63e54771d6d600d3703732 Mon Sep 17 00:00:00 2001
From: Yorick Peterse <[email protected]>
Date: Fri, 17 Nov 2023 17:21:58 +0100
Subject: [PATCH] Clean up Inko implementation
---
inko/related.inko | 126 -------------------------
inko/src/main.inko | 180 ++++++++++++++++++++++++++++++++++++
inko/src/main_old.inko | 202 +++++++++++++++++++++++++++++++++++++++++
inko/utils/io.inko | 26 ------
inko/utils/post.inko | 78 ----------------
run.sh | 6 +-
6 files changed, 385 insertions(+), 233 deletions(-)
delete mode 100644 inko/related.inko
create mode 100644 inko/src/main.inko
create mode 100644 inko/src/main_old.inko
delete mode 100644 inko/utils/io.inko
delete mode 100644 inko/utils/post.inko
diff --git a/inko/related.inko b/inko/related.inko
deleted file mode 100644
index 3b5b308..0000000
--- a/inko/related.inko
+++ /dev/null
@@ -1,126 +0,0 @@
-import post.(json_to_posts,related_to_json,RelatedPost)
-import io.(print,read_file,write_file)
-import std.time.(Instant)
-import std.json.Json
-
-let TOPN = 5
-
-class async Main {
- fn async main {
- let json_string = read_file('../posts.json').unwrap
-
- let posts = json_to_posts(json_string)
-
- let start = Instant.new
-
- let post_count = posts.size
-
- let mut tag_map: Map[String, Array[Int]] = Map.with_capacity(128)
-
- let mut idx = 0
- while idx < post_count {
-
- let tags = posts.get(idx).tags
-
- tags.size.times fn (i) {
-
- let t = tags.get(i)
-
- match tag_map.opt_mut(t) {
-
- case Some(v) -> v.push(idx)
-
- case _ -> tag_map.set(t, [idx])
-
- }
- }
-
- idx += 1
- }
-
- let all_related : Array[RelatedPost] = Array.with_capacity(post_count)
-
-
- let mut i = 0
- while i < post_count{
-
-
- let tagged_post_count = Array.filled(with: 0, times: post_count)
-
- let post = posts.get(i)
-
- let mut pt = 0
- while pt < post.tags.size {
-
- let indexes = tag_map.get(post.tags.get(pt))
-
- let mut it = 0
- while it < indexes.size{
-
- tagged_post_count.set(i, tagged_post_count.get(i) + 1)
-
- it += 1
- }
-
- pt += 1
-
- }
-
- tagged_post_count.set(i,0)
-
- let mut top_idx = Array.filled(0,TOPN * 2)
- let mut min_tags = 0
-
- let mut idx = 0
- while idx < post_count{
-
- let count = tagged_post_count.get(idx)
-
- if count > min_tags {
-
- let mut upper_bound = ( TOPN - 2 ) * 2
-
- while upper_bound >= 0 and count > top_idx.get(upper_bound) {
- top_idx.set(upper_bound+2, top_idx.get(upper_bound))
- top_idx.set(upper_bound+3, top_idx.get(upper_bound+1))
- upper_bound -= 2
- }
-
- let insert_pos = upper_bound + 2
- top_idx.set(insert_pos, count)
- top_idx.set(insert_pos+1, idx)
-
- min_tags = top_idx.get(TOPN * 2 - 2 )
-
- }
-
- idx += 1
-
- }
-
- let top_posts = Array.with_capacity(TOPN)
-
- TOPN.times fn (j) {
- let index = top_idx.get(j*2+1)
- top_posts.push(posts.get(index))
- }
-
- all_related.push(RelatedPost{
- @id = post.id,
- @tags = post.tags,
- @related = top_posts
- })
-
- i += 1
- }
-
- let took = start.elapsed.to_millis
- print("Processing time (w/o IO): {took} ms")
-
- let json = related_to_json(all_related)
-
- write_file('../related_posts_inko.json', json).unwrap
-
- }
-}
-
diff --git a/inko/src/main.inko b/inko/src/main.inko
new file mode 100644
index 0000000..9a4d135
--- /dev/null
+++ b/inko/src/main.inko
@@ -0,0 +1,180 @@
+import std.fs.file.(ReadOnlyFile, WriteOnlyFile)
+import std.json.Json
+import std.stdio.STDOUT
+import std.time.Instant
+
+let TOP_N = 5
+
+class Post {
+ let @id: String
+ let @title: String
+ let @tags: Array[String]
+}
+
+class RelatedPost {
+ let @id: String
+ let @tags: ref Array[String]
+ let @related: Array[ref Post]
+}
+
+fn read_posts(path: String) -> Array[Post] {
+ let bytes = ByteArray.new
+
+ ReadOnlyFile
+ .new(path)
+ .then fn (f) { f.read_all(bytes) }
+ .expect('the JSON file must exist')
+
+ let root = match Json.parse(bytes) {
+ case Ok(Array(v)) -> v
+ case _ -> panic('the JSON file must contain a valid array')
+ }
+
+ root
+ .into_iter
+ .map fn (val) {
+ let obj = match val {
+ case Object(v) -> v
+ case _ -> panic('each entry in the JSON array must be an object')
+ }
+
+ let id = match obj.remove('_id') {
+ case Some(String(v)) -> v
+ case _ -> panic('the "_id" key must be a string')
+ }
+
+ let title = match obj.remove('title') {
+ case Some(String(v)) -> v
+ case _ -> panic('the "title" key must be a string')
+ }
+
+ let tags = match obj.remove('tags') {
+ case Some(Array(array)) -> {
+ array
+ .into_iter
+ .map fn (val) {
+ match val {
+ case String(v) -> v
+ case _ -> panic('each tag must be a string')
+ }
+ }
+ .to_array
+ }
+ case _ -> panic('the "tags" key must be an array of strings')
+ }
+
+ Post { @id = id, @title = title, @tags = tags }
+ }
+ .to_array
+}
+
+fn write_posts(path: String, posts: Array[RelatedPost]) {
+ let values = posts
+ .into_iter
+ .map fn (post) {
+ let related = post
+ .related
+ .iter
+ .map fn (related) {
+ let map = Map.new
+ let tags = related.tags.iter.map fn (t) { Json.String(t) }.to_array
+
+ map.set('_id', Json.String(related.id))
+ map.set('title', Json.String(related.title))
+ map.set('tags', Json.Array(tags))
+ Json.Object(map)
+ }
+ .to_array
+
+ let map = Map.new
+ let tags = post.tags.iter.map fn (t) { Json.String(t) }.to_array
+
+ map.set('_id', Json.String(post.id))
+ map.set('tags', Json.Array(tags))
+ map.set('related', Json.Array(related))
+ Json.Object(map)
+ }
+ .to_array
+
+ let out = Json.Array(values).to_string
+
+ WriteOnlyFile
+ .new(path)
+ .then fn (f) { f.write_string(out) }
+ .expect('failed to write to the output JSON file')
+}
+
+class async Main {
+ fn async main {
+ let posts = read_posts('../posts.json')
+ let start = Instant.new
+ let posts_len = posts.size
+ let tag_map: Map[String, Array[Int]] = Map.with_capacity(100)
+
+ posts.iter.each_with_index fn (idx, post) {
+ post.tags.iter.each fn (tag) {
+ match tag_map.opt_mut(tag) {
+ case Some(v) -> v.push(idx)
+ case _ -> tag_map.set(tag, [idx])
+ }
+ }
+ }
+
+ let all_related_posts = Array.with_capacity(posts_len)
+ let tagged_post_count = Array.filled(with: 0, times: posts_len)
+
+ posts.iter.each_with_index fn (i, post) {
+ posts_len.times fn (i) { tagged_post_count.set(i, 0) }
+ post.tags.iter.each fn (tag) {
+ tag_map.get(tag).iter.each fn (i) {
+ tagged_post_count.set(i, tagged_post_count.get(i) + 1)
+ }
+ }
+
+ tagged_post_count.set(i, 0)
+
+ let top_idx = Array.filled(0, TOP_N * 2)
+ let mut min_tags = 0
+ let mut idx = 0
+
+ while idx < posts_len {
+ let count = tagged_post_count.get(idx)
+
+ if count > min_tags {
+ let mut upper_bound = (TOP_N - 2) * 2
+
+ while upper_bound >= 0 and count > top_idx.get(upper_bound) {
+ top_idx.set(upper_bound + 2, top_idx.get(upper_bound))
+ top_idx.set(upper_bound + 3, top_idx.get(upper_bound + 1))
+ upper_bound -= 2
+ }
+
+ let insert_pos = upper_bound + 2
+
+ top_idx.set(insert_pos, count)
+ top_idx.set(insert_pos + 1, idx)
+ min_tags = top_idx.get(TOP_N * 2 - 2 )
+ }
+
+ idx += 1
+ }
+
+ let top_posts = Array.with_capacity(TOP_N)
+
+ TOP_N.times fn (j) {
+ top_posts.push(posts.get(top_idx.get(j * 2 + 1)))
+ }
+
+ all_related_posts.push(RelatedPost {
+ @id = post.id,
+ @tags = post.tags,
+ @related = top_posts,
+ })
+ }
+
+ let took = start.elapsed
+
+ STDOUT.new.print("Processing time (w/o IO): {took.to_millis} ms")
+ write_posts('../related_posts_inko.json', all_related_posts)
+ }
+}
diff --git a/inko/src/main_old.inko b/inko/src/main_old.inko
new file mode 100644
index 0000000..4056fb5
--- /dev/null
+++ b/inko/src/main_old.inko
@@ -0,0 +1,202 @@
+import std.fs.file.(ReadOnlyFile, WriteOnlyFile)
+import std.io.Error
+import std.json.Json
+import std.stdio.STDOUT
+import std.time.Instant
+
+let TOP_N = 5
+
+class Post {
+ let @id: String
+ let @title: String
+ let @tags: Array[String]
+}
+
+class RelatedPost {
+ let @id: String
+ let @tags: ref Array[String]
+ let @related: Array[ref Post]
+}
+
+fn json_to_posts(json_string: String) -> Array[Post] {
+ let parsed = Json.parse(json_string).unwrap
+ let array = match parsed {
+ case Array(a) -> a
+ case _ -> panic("json is not an array")
+ }
+
+ array
+ .into_iter
+ .map fn (n) {
+ let obj = match n {
+ case Object(m) -> m
+ case _ -> panic("json is not an object")
+ }
+
+ let id = match obj.get("_id") {
+ case String(s) -> s
+ case _ -> panic("no _id")
+ }
+
+ let title = match obj.get("title") {
+ case String(s) -> s
+ case _ -> panic("no title")
+ }
+
+ let json_tags = match obj.get("tags") {
+ case Array(a) -> a
+ case _ -> panic("no tags")
+ }
+
+ let tags = json_tags.iter.map fn (n) {
+ match n {
+ case String(s) -> s
+ case _ -> panic("tag is not a string")
+ }
+ }
+
+ Post { @id = id, @title = title, @tags = tags.to_array }
+ }
+ .to_array
+}
+
+fn related_to_json(related: Array[RelatedPost]) -> String {
+ let array = related.iter.map fn (n) {
+ let related = n
+ .related
+ .iter
+ .map fn (n) {
+ let map = Map.new
+
+ map.set("_id", Json.String(n.id))
+ map.set("title", Json.String(n.title))
+ map.set(
+ "tags",
+ Json.Array(n.tags.iter.map fn (n) { Json.String(n) }.to_array)
+ )
+
+ Json.Object(map)
+ }
+ .to_array
+
+ let map = Map.new
+
+ map.set("_id", Json.String(n.id))
+ map.set(
+ "tags",
+ Json.Array(n.tags.iter.map fn (n) { Json.String(n) }.to_array)
+ )
+
+ map.set("related", Json.Array(related))
+ Json.Object(map)
+ }.to_array
+
+ Json.Array(array).to_string
+}
+
+fn read_file(name:String) -> Result[String, Error] {
+ let file = try ReadOnlyFile.new(name)
+ let bytes = ByteArray.new
+
+ try file.read_all(bytes)
+ Result.Ok(bytes.to_string)
+}
+
+fn write_file(name:String, content:String) -> Result[Nil, Error] {
+ let file = try WriteOnlyFile.new(name)
+
+ try file.write_string(content)
+ Result.Ok(nil)
+}
+
+class async Main {
+ fn async main_old {
+ let json_string = read_file('../posts.json').unwrap
+ let posts = json_to_posts(json_string)
+ let start = Instant.new
+ let post_count = posts.size
+ let tag_map: Map[String, Array[Int]] = Map.with_capacity(128)
+ let mut idx = 0
+
+ posts.iter.each_with_index fn (idx, post) {
+ post.tags.iter.each fn (tag) {
+ match tag_map.opt_mut(tag) {
+ case Some(v) -> v.push(idx)
+ case _ -> tag_map.set(tag, [idx])
+ }
+ }
+ }
+
+ let all_related = Array.with_capacity(post_count)
+ let mut i = 0
+
+ while i < post_count {
+ let tagged_post_count = Array.filled(with: 0, times: post_count)
+ let post = posts.get(i)
+ let mut pt = 0
+
+ while pt < post.tags.size {
+ let indexes = tag_map.get(post.tags.get(pt))
+ let mut it = 0
+
+ while it < indexes.size {
+ tagged_post_count.set(i, tagged_post_count.get(i) + 1)
+ it += 1
+ }
+
+ pt += 1
+ }
+
+ tagged_post_count.set(i, 0)
+
+ let top_idx = Array.filled(0, TOP_N * 2)
+ let mut min_tags = 0
+ let mut idx = 0
+
+ while idx < post_count {
+ let count = tagged_post_count.get(idx)
+
+ if count > min_tags {
+ let mut upper_bound = (TOP_N - 2) * 2
+
+ while upper_bound >= 0 and count > top_idx.get(upper_bound) {
+ top_idx.set(upper_bound + 2, top_idx.get(upper_bound))
+ top_idx.set(upper_bound + 3, top_idx.get(upper_bound + 1))
+ upper_bound -= 2
+ }
+
+ let insert_pos = upper_bound + 2
+
+ top_idx.set(insert_pos, count)
+ top_idx.set(insert_pos+1, idx)
+ min_tags = top_idx.get(TOP_N * 2 - 2 )
+ }
+
+ idx += 1
+ }
+
+ let top_posts = Array.with_capacity(TOP_N)
+
+ TOP_N.times fn (j) {
+ let index = top_idx.get(j * 2 + 1)
+
+ top_posts.push(posts.get(index))
+ }
+
+ all_related.push(RelatedPost {
+ @id = post.id,
+ @tags = post.tags,
+ @related = top_posts,
+ })
+
+ i += 1
+ }
+
+ let took = start.elapsed.to_millis
+
+ STDOUT.new.print("Processing time (w/o IO): {took} ms")
+
+ write_file('../related_posts_inko.json', related_to_json(all_related))
+ .unwrap
+ }
+}
diff --git a/inko/utils/io.inko b/inko/utils/io.inko
deleted file mode 100644
index 416e6aa..0000000
--- a/inko/utils/io.inko
+++ /dev/null
@@ -1,26 +0,0 @@
-import std.io.Error
-import std.fs.file.(ReadOnlyFile,WriteOnlyFile)
-import std.stdio.STDOUT
-
-
-fn pub print(msg:String) {
- STDOUT.new.print(msg)
-}
-
-fn pub read_file(name:String) -> Result[String,Error] {
- let file = try ReadOnlyFile.new(name)
-
- let bytes = ByteArray.new
-
- try file.read_all(bytes)
-
- Result.Ok(bytes.to_string)
-}
-
-fn pub write_file(name:String, content:String) -> Result[Nil,Error] {
- let file = try WriteOnlyFile.new(name)
-
- try file.write_string(content)
-
- Result.Ok(nil)
-}
\ No newline at end of file
diff --git a/inko/utils/post.inko b/inko/utils/post.inko
deleted file mode 100644
index fce77e3..0000000
--- a/inko/utils/post.inko
+++ /dev/null
@@ -1,78 +0,0 @@
-import std.json.Json
-
-class pub Post {
- let pub @id: String
- let pub @title: String
- let pub @tags: Array[String]
-}
-
-class pub RelatedPost {
- let pub @id: String
- let pub @tags: ref Array[String]
- let pub @related: Array[ ref Post]
-}
-
-fn pub json_to_posts(json_string: String) -> Array[Post] {
- let parsed = Json.parse(json_string).unwrap
-
- let array = match parsed {
- case Array(a) -> a
- case _ -> panic("json is not an array")
-
- }
-
- array.iter.map fn (n) {
- let obj = match n {
- case Object(m) -> m
- case _ -> panic("json is not an object")
- }
-
- let id = match obj.get("_id") {
- case String(s) -> s
- case _ -> panic("no _id")
- }
-
- let title = match obj.get("title") {
- case String(s) -> s
- case _ -> panic("no title")
- }
-
- let json_tags = match obj.get("tags") {
- case Array(a) -> a
- case _ -> panic("no tags")
- }
-
- let tags = json_tags.iter.map fn (n) {
- match n {
- case String(s) -> s
- case _ -> panic("tag is not a string")
- }
- }
-
- Post {
- @id = id,
- @title = title,
- @tags = tags.to_array,
- }
- }.to_array
-}
-
-fn pub related_to_json(related: Array[RelatedPost]) -> String {
- let array = related.iter.map fn (n) {
- let related = n.related.iter.map fn (n) {
- let mut map = Map.new
- map.set("_id", Json.String(n.id))
- map.set("title", Json.String(n.title))
- map.set("tags", Json.Array(n.tags.iter.map fn (n) { Json.String(n) }.to_array))
- Json.Object(map)
- }.to_array
-
- let mut map = Map.new
- map.set("_id", Json.String(n.id))
- map.set("tags", Json.Array(n.tags.iter.map fn (n) { Json.String(n) }.to_array))
- map.set("related", Json.Array(related))
- Json.Object(map)
- }.to_array
-
- Json.Array(array).to_string
-}
\ No newline at end of file
diff --git a/run.sh b/run.sh
index bd75d67..38bab13 100755
--- a/run.sh
+++ b/run.sh
@@ -892,12 +892,12 @@ run_inko() {
echo "Running Inko" &&
cd ./inko &&
if [ -z "$appendToFile" ]; then # only build on 5k run
- inko build --opt aggressive -i ./utils related.inko
+ inko build --opt aggressive
fi &&
if [ $HYPER == 1 ]; then
- capture "Inko" hyperfine -r $slow_lang_runs -w $warmup --show-output "./build/aggressive/related"
+ capture "Inko" hyperfine -r $slow_lang_runs -w $warmup --show-output "./build/aggressive/main"
else
- command ${time} -f '%es %Mk' ./build//aggressive/related
+ command ${time} -f '%es %Mk' ./build/aggressive/main
fi
check_output "related_posts_inko.json"
--
2.42.1

This doesn't really improve performance much as far as I can tell, but it does make the code more readable and less messy. I'm perfectly fine with sharing these numbers in the README, though a mention somewhere that Inko doesn't apply optimizations at this time would be nice, so people know why it's slow.
Co-authored-by: yorickpeterse <[email protected]>
Thanks for taking the time to review it.
https://inko-lang.org/
Needs improvement: 6.7s on my machine.