
Add Inko #440

Merged
merged 6 commits into from
Nov 17, 2023

@jinyus (Owner) commented Nov 17, 2023

https://inko-lang.org/

Needs improvement. It takes 6.7s on my machine.

@yorickpeterse (Contributor)

I'm cleaning the code up a bit and changing the structure so you can just run inko build instead of inko build -i ....

What I've noticed so far from running perf is that about 40% of the time is spent in inko_reduce alone. This isn't entirely surprising: the reduction is performed after every method call and requires a function call into the runtime library. There are plans to get rid of this approach, as outlined in inko-lang/inko#522. I might need to prioritize that a bit more, though, as I didn't expect it to be this bad.
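To make that overhead concrete, here is a rough Rust sketch of reduction counting, assuming a scheme similar to Erlang's: every call decrements a per-process budget, and the process yields to the scheduler once the budget reaches zero. `Process`, `reduce` and `REDUCTION_BUDGET` are hypothetical names, not Inko's runtime API. The point is that even a method whose real work is a single addition also pays for an out-of-line call plus a branch.

```rust
/// Hypothetical per-process state. These names are illustrative only and
/// are not Inko's actual runtime API.
struct Process {
    reductions: u32,
}

/// Assumed number of reductions before a process must yield (made-up value).
const REDUCTION_BUDGET: u32 = 1_000;

impl Process {
    fn new() -> Self {
        Process { reductions: REDUCTION_BUDGET }
    }

    /// Stand-in for a reduce hook run after every method call. Returns `true`
    /// when the budget is used up and the scheduler should switch processes.
    #[inline(never)] // modelled as an out-of-line call, like a runtime function
    fn reduce(&mut self) -> bool {
        self.reductions -= 1;
        if self.reductions == 0 {
            self.reductions = REDUCTION_BUDGET;
            true
        } else {
            false
        }
    }
}

/// A tiny "method" whose actual work is a single addition.
fn add(a: u64, b: u64) -> u64 {
    a + b
}

/// Sums 0..n, paying for a reduction check after every call to `add`.
/// Returns the sum and how often the process would have been suspended.
fn sum_with_reductions(process: &mut Process, n: u64) -> (u64, u32) {
    let mut total = 0;
    let mut yields = 0;
    for i in 0..n {
        total = add(total, i);
        if process.reduce() {
            // A real scheduler would suspend the process here.
            yields += 1;
        }
    }
    (total, yields)
}

fn main() {
    let mut process = Process::new();
    let (sum, yields) = sum_with_reductions(&mut process, 10_000);
    println!("sum = {sum}, yields = {yields}"); // sum = 49995000, yields = 10
}
```

Removing the reduction from this loop would leave only the addition, which is roughly the effect of compiling without reduction code.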

I'll add more notes as I make my way through the code and profiling data.

@yorickpeterse (Contributor)

According to perf, Int.+, Int.== and Int.* collectively account for about 20% of the time. This is likely just the cost of function calls, as the Inko compiler performs no inlining at this stage (or any other optimizations, for that matter).

An additional 5% is spent in Array.address_of, which is used to figure out the offset to read an array index from. Again, I suspect that's just the function call overhead.

From there, there's a long tail of methods each consuming a small share of the total execution time (e.g. 3% for bounds checking of array indexes). Most of these involve arrays, which isn't surprising given how heavily the benchmark uses them.

So in total, about 60-70% of the time could easily be optimized away by inlining methods and getting rid of the current reduction-based scheduler. That would leave a total runtime of about 1.7 seconds. That's still not great, but it's a lot better and close to what Python does.
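As a rough illustration of why indexing shows up in the profile, the Rust sketch below models an array read as a bounds check followed by an address computation (base pointer plus index times element size), with both helpers kept out of line the way un-inlined Inko methods would be. `IntArray`, `address_of` and `bounds_check` are hypothetical analogues assumed for illustration; this is not Inko's actual `Array` implementation.

```rust
/// Hypothetical array of 64-bit integers stored in one contiguous buffer.
/// Not Inko's actual Array layout.
struct IntArray {
    data: Vec<i64>,
}

impl IntArray {
    /// Analogue of an `address_of` helper: base pointer plus `index`
    /// elements. `wrapping_add` offsets by whole `i64`s (8 bytes each).
    #[inline(never)]
    fn address_of(&self, index: usize) -> *const i64 {
        self.data.as_ptr().wrapping_add(index)
    }

    /// Panics if `index` is outside the array.
    #[inline(never)]
    fn bounds_check(&self, index: usize) {
        if index >= self.data.len() {
            panic!("index {} is out of bounds (size {})", index, self.data.len());
        }
    }

    /// Reading one element costs two extra calls when nothing is inlined.
    fn get(&self, index: usize) -> i64 {
        self.bounds_check(index);
        // SAFETY: `bounds_check` guarantees `index < len`, so the pointer
        // returned by `address_of` points at an initialized element.
        unsafe { *self.address_of(index) }
    }
}

fn main() {
    let array = IntArray { data: vec![10, 20, 30] };
    let sum: i64 = (0..3).map(|i| array.get(i)).sum();
    let offset = array.address_of(2) as usize - array.address_of(0) as usize;
    println!("sum = {sum}, byte offset of index 2 = {offset}");
}
```

With inlining, both helpers collapse into a compare and a single load, which is why inlining alone is expected to remove much of this cost.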

@yorickpeterse (Contributor)

To illustrate the above: if I change the compiler to not emit any reduction code, the runtime is reduced to just under 4 seconds. At that point, 17% of the time is spent in just Int.+, followed by the others.

@yorickpeterse (Contributor)

If I further change the compiler to use LLVM's "aggressive" optimization level (-O3, I think), the runtime drops to 1.5 seconds. We currently don't enable any LLVM optimizations because I want to figure out which ones we actually need (rather than just enabling some opaque list), but it seems to make a big difference here.

@yorickpeterse (Contributor)

I added a note about this in inko-lang/inko#595 (comment).

@yorickpeterse (Contributor)

@jinyus One thing I noticed is that the Inko version seems to produce the following error that I'm not seeing with the Rust version:

Running Inko
Processing time (w/o IO): 1470 ms
1.87s 73800k
Checking output
Error: Post h5cwwbua is invalid!
expected: 17 shared tag count
actual: 20

Perhaps this also explains why Inko is a bit slower, if it's doing more work (or the wrong work) than, say, the Rust version?

@jinyus (Owner, Author) commented Nov 17, 2023

Yeah, it looks like an off-by-one error from when I was trying to optimize it. Will fix.

@jinyus (Owner, Author) commented Nov 17, 2023

@yorickpeterse fixed 6597c6f

The bug actually resulted in less work being done.

Re your notes:

I can just add it to the repo but hold off on adding it to the charts until it has been optimized.

@yorickpeterse (Contributor) commented Nov 17, 2023

@jinyus I cleaned things up with the following patch:

0001-Clean-up-Inko-implementation.patch
From 4eee001a21e31941db63e54771d6d600d3703732 Mon Sep 17 00:00:00 2001
From: Yorick Peterse <[email protected]>
Date: Fri, 17 Nov 2023 17:21:58 +0100
Subject: [PATCH] Clean up Inko implementation

---
 inko/related.inko      | 126 -------------------------
 inko/src/main.inko     | 180 ++++++++++++++++++++++++++++++++++++
 inko/src/main_old.inko | 202 +++++++++++++++++++++++++++++++++++++++++
 inko/utils/io.inko     |  26 ------
 inko/utils/post.inko   |  78 ----------------
 run.sh                 |   6 +-
 6 files changed, 385 insertions(+), 233 deletions(-)
 delete mode 100644 inko/related.inko
 create mode 100644 inko/src/main.inko
 create mode 100644 inko/src/main_old.inko
 delete mode 100644 inko/utils/io.inko
 delete mode 100644 inko/utils/post.inko

diff --git a/inko/related.inko b/inko/related.inko
deleted file mode 100644
index 3b5b308..0000000
--- a/inko/related.inko
+++ /dev/null
@@ -1,126 +0,0 @@
-import post.(json_to_posts,related_to_json,RelatedPost)
-import io.(print,read_file,write_file)
-import std.time.(Instant)
-import std.json.Json
-
-let TOPN = 5
-
-class async Main {
-  fn async main {
-    let json_string = read_file('../posts.json').unwrap
-
-    let posts = json_to_posts(json_string)
-
-    let start = Instant.new
-
-    let post_count = posts.size
-
-    let mut tag_map: Map[String, Array[Int]] = Map.with_capacity(128)
-
-    let mut idx = 0
-    while idx < post_count {
-
-      let tags = posts.get(idx).tags
-      
-      tags.size.times fn (i) {
-
-        let t = tags.get(i)
-
-        match tag_map.opt_mut(t) {
-          
-          case Some(v) -> v.push(idx)
-          
-          case _ -> tag_map.set(t, [idx])
-          
-        }
-      }
-
-      idx += 1
-    }
-
-    let all_related : Array[RelatedPost] = Array.with_capacity(post_count)
-
-
-    let mut i = 0
-    while i < post_count{
-
-
-    let tagged_post_count = Array.filled(with: 0, times: post_count)
-
-      let post = posts.get(i)
-
-      let mut pt = 0
-      while pt < post.tags.size {
-        
-        let indexes = tag_map.get(post.tags.get(pt))
-
-        let mut it = 0
-        while it < indexes.size{
-
-            tagged_post_count.set(i, tagged_post_count.get(i) + 1)
-
-            it += 1
-        }
-
-        pt += 1
-
-      }
-
-      tagged_post_count.set(i,0)
-
-      let mut top_idx = Array.filled(0,TOPN * 2) 
-      let mut min_tags = 0
-
-      let mut idx = 0
-      while idx < post_count{
-
-        let count = tagged_post_count.get(idx)
-
-        if count > min_tags {
-
-          let mut upper_bound = ( TOPN - 2 ) * 2
-
-          while upper_bound >= 0 and count > top_idx.get(upper_bound) {
-            top_idx.set(upper_bound+2, top_idx.get(upper_bound))
-            top_idx.set(upper_bound+3, top_idx.get(upper_bound+1))
-            upper_bound -= 2
-          }
-
-          let insert_pos = upper_bound + 2
-          top_idx.set(insert_pos, count)
-          top_idx.set(insert_pos+1, idx)
-
-          min_tags = top_idx.get(TOPN * 2 - 2 )
-
-        }
-
-        idx += 1
-
-      }
-
-      let top_posts = Array.with_capacity(TOPN)
-
-      TOPN.times fn (j) {
-        let index = top_idx.get(j*2+1)
-        top_posts.push(posts.get(index))
-      }
-
-      all_related.push(RelatedPost{
-        @id = post.id, 
-        @tags = post.tags, 
-        @related = top_posts
-    })
-
-      i += 1
-    }
-
-    let took = start.elapsed.to_millis
-    print("Processing time (w/o IO): {took} ms")
-
-    let json = related_to_json(all_related)
-
-    write_file('../related_posts_inko.json', json).unwrap
-
-  }
-}
-
diff --git a/inko/src/main.inko b/inko/src/main.inko
new file mode 100644
index 0000000..9a4d135
--- /dev/null
+++ b/inko/src/main.inko
@@ -0,0 +1,180 @@
+import std.fs.file.(ReadOnlyFile, WriteOnlyFile)
+import std.json.Json
+import std.stdio.STDOUT
+import std.time.Instant
+
+let TOP_N = 5
+
+class Post {
+  let @id: String
+  let @title: String
+  let @tags: Array[String]
+}
+
+class RelatedPost {
+  let @id: String
+  let @tags: ref Array[String]
+  let @related: Array[ref Post]
+}
+
+fn read_posts(path: String) -> Array[Post] {
+  let bytes = ByteArray.new
+
+  ReadOnlyFile
+    .new(path)
+    .then fn (f) { f.read_all(bytes) }
+    .expect('the JSON file must exist')
+
+  let root = match Json.parse(bytes) {
+    case Ok(Array(v)) -> v
+    case _ -> panic('the JSON file must contain a valid array')
+  }
+
+  root
+    .into_iter
+    .map fn (val) {
+      let obj = match val {
+        case Object(v) -> v
+        case _ -> panic('each entry in the JSON array must be an object')
+      }
+
+      let id = match obj.remove('_id') {
+        case Some(String(v)) -> v
+        case _ -> panic('the "_id" key must be a string')
+      }
+
+      let title = match obj.remove('title') {
+        case Some(String(v)) -> v
+        case _ -> panic('the "title" key must be a string')
+      }
+
+      let tags = match obj.remove('tags') {
+        case Some(Array(array)) -> {
+          array
+            .into_iter
+            .map fn (val) {
+              match val {
+                case String(v) -> v
+                case _ -> panic('each tag must be a string')
+              }
+            }
+            .to_array
+        }
+        case _ -> panic('the "tags" key must be an array of strings')
+      }
+
+      Post { @id = id, @title = title, @tags = tags }
+    }
+    .to_array
+}
+
+fn write_posts(path: String, posts: Array[RelatedPost]) {
+  let values = posts
+    .into_iter
+    .map fn (post) {
+      let related = post
+        .related
+        .iter
+        .map fn (related) {
+          let map = Map.new
+          let tags = related.tags.iter.map fn (t) { Json.String(t) }.to_array
+
+          map.set('_id', Json.String(related.id))
+          map.set('title', Json.String(related.title))
+          map.set('tags', Json.Array(tags))
+          Json.Object(map)
+        }
+        .to_array
+
+      let map = Map.new
+      let tags = post.tags.iter.map fn (t) { Json.String(t) }.to_array
+
+      map.set('_id', Json.String(post.id))
+      map.set('tags', Json.Array(tags))
+      map.set('related', Json.Array(related))
+      Json.Object(map)
+    }
+    .to_array
+
+  let out = Json.Array(values).to_string
+
+  WriteOnlyFile
+    .new(path)
+    .then fn (f) { f.write_string(out) }
+    .expect('failed to write to the output JSON file')
+}
+
+class async Main {
+  fn async main {
+    let posts = read_posts('../posts.json')
+    let start = Instant.new
+    let posts_len = posts.size
+    let tag_map: Map[String, Array[Int]] = Map.with_capacity(100)
+
+    posts.iter.each_with_index fn (idx, post) {
+      post.tags.iter.each fn (tag) {
+        match tag_map.opt_mut(tag) {
+          case Some(v) -> v.push(idx)
+          case _ -> tag_map.set(tag, [idx])
+        }
+      }
+    }
+
+    let all_related_posts = Array.with_capacity(posts_len)
+    let tagged_post_count = Array.filled(with: 0, times: posts_len)
+
+    posts.iter.each_with_index fn (i, post) {
+      posts_len.times fn (i) { tagged_post_count.set(i, 0) }
+      post.tags.iter.each fn (tag) {
+        tag_map.get(tag).iter.each fn (i) {
+          tagged_post_count.set(i, tagged_post_count.get(i) + 1)
+        }
+      }
+
+      tagged_post_count.set(i, 0)
+
+      let top_idx = Array.filled(0, TOP_N * 2)
+      let mut min_tags = 0
+      let mut idx = 0
+
+      while idx < posts_len {
+        let count = tagged_post_count.get(idx)
+
+        if count > min_tags {
+          let mut upper_bound = (TOP_N - 2) * 2
+
+          while upper_bound >= 0 and count > top_idx.get(upper_bound) {
+            top_idx.set(upper_bound + 2, top_idx.get(upper_bound))
+            top_idx.set(upper_bound + 3, top_idx.get(upper_bound + 1))
+            upper_bound -= 2
+          }
+
+          let insert_pos = upper_bound + 2
+
+          top_idx.set(insert_pos, count)
+          top_idx.set(insert_pos + 1, idx)
+          min_tags = top_idx.get(TOP_N * 2 - 2 )
+        }
+
+        idx += 1
+      }
+
+      let top_posts = Array.with_capacity(TOP_N)
+
+      TOP_N.times fn (j) {
+        top_posts.push(posts.get(top_idx.get(j * 2 + 1)))
+      }
+
+      all_related_posts.push(RelatedPost {
+        @id = post.id,
+        @tags = post.tags,
+        @related = top_posts,
+      })
+    }
+
+    let took = start.elapsed
+
+    STDOUT.new.print("Processing time (w/o IO): {took.to_millis} ms")
+    write_posts('../related_posts_inko.json', all_related_posts)
+  }
+}
diff --git a/inko/src/main_old.inko b/inko/src/main_old.inko
new file mode 100644
index 0000000..4056fb5
--- /dev/null
+++ b/inko/src/main_old.inko
@@ -0,0 +1,202 @@
+import std.fs.file.(ReadOnlyFile, WriteOnlyFile)
+import std.io.Error
+import std.json.Json
+import std.stdio.STDOUT
+import std.time.Instant
+
+let TOP_N = 5
+
+class Post {
+  let @id: String
+  let @title: String
+  let @tags: Array[String]
+}
+
+class RelatedPost {
+  let @id: String
+  let @tags: ref Array[String]
+  let @related: Array[ref Post]
+}
+
+fn json_to_posts(json_string: String) -> Array[Post] {
+  let parsed = Json.parse(json_string).unwrap
+  let array =  match parsed {
+    case Array(a) -> a
+    case _ -> panic("json is not an array")
+  }
+
+  array
+    .into_iter
+    .map fn (n) {
+      let obj = match n {
+        case Object(m) -> m
+        case _ -> panic("json is not an object")
+      }
+
+      let id = match obj.get("_id") {
+        case String(s) -> s
+        case _ -> panic("no _id")
+      }
+
+      let title = match obj.get("title") {
+        case String(s) -> s
+        case _ -> panic("no title")
+      }
+
+      let json_tags = match obj.get("tags") {
+        case Array(a) -> a
+        case _ -> panic("no tags")
+      }
+
+      let tags = json_tags.iter.map fn (n) {
+        match n {
+          case String(s) -> s
+          case _ -> panic("tag is not a string")
+        }
+      }
+
+      Post { @id = id, @title = title, @tags = tags.to_array }
+    }
+    .to_array
+}
+
+fn related_to_json(related: Array[RelatedPost]) -> String {
+  let array = related.iter.map fn (n) {
+    let related = n
+      .related
+      .iter
+      .map fn (n) {
+        let map = Map.new
+
+        map.set("_id", Json.String(n.id))
+        map.set("title", Json.String(n.title))
+        map.set(
+          "tags",
+          Json.Array(n.tags.iter.map fn (n) { Json.String(n) }.to_array)
+        )
+
+        Json.Object(map)
+      }
+    .to_array
+
+    let map = Map.new
+
+    map.set("_id", Json.String(n.id))
+    map.set(
+      "tags",
+      Json.Array(n.tags.iter.map fn (n) { Json.String(n) }.to_array)
+    )
+
+    map.set("related", Json.Array(related))
+    Json.Object(map)
+  }.to_array
+
+  Json.Array(array).to_string
+}
+
+fn read_file(name:String) -> Result[String, Error] {
+  let file = try ReadOnlyFile.new(name)
+  let bytes = ByteArray.new
+
+  try file.read_all(bytes)
+  Result.Ok(bytes.to_string)
+}
+
+fn write_file(name:String, content:String) -> Result[Nil, Error] {
+  let file = try WriteOnlyFile.new(name)
+
+  try file.write_string(content)
+  Result.Ok(nil)
+}
+
+class async Main {
+  fn async main_old {
+    let json_string = read_file('../posts.json').unwrap
+    let posts = json_to_posts(json_string)
+    let start = Instant.new
+    let post_count = posts.size
+    let tag_map: Map[String, Array[Int]] = Map.with_capacity(128)
+    let mut idx = 0
+
+    posts.iter.each_with_index fn (idx, post) {
+      post.tags.iter.each fn (tag) {
+        match tag_map.opt_mut(tag) {
+          case Some(v) -> v.push(idx)
+          case _ -> tag_map.set(tag, [idx])
+        }
+      }
+    }
+
+    let all_related = Array.with_capacity(post_count)
+    let mut i = 0
+
+    while i < post_count {
+      let tagged_post_count = Array.filled(with: 0, times: post_count)
+      let post = posts.get(i)
+      let mut pt = 0
+
+      while pt < post.tags.size {
+        let indexes = tag_map.get(post.tags.get(pt))
+        let mut it = 0
+
+        while it < indexes.size {
+          tagged_post_count.set(i, tagged_post_count.get(i) + 1)
+          it += 1
+        }
+
+        pt += 1
+      }
+
+      tagged_post_count.set(i, 0)
+
+      let top_idx = Array.filled(0, TOP_N * 2)
+      let mut min_tags = 0
+      let mut idx = 0
+
+      while idx < post_count {
+        let count = tagged_post_count.get(idx)
+
+        if count > min_tags {
+          let mut upper_bound = (TOP_N - 2) * 2
+
+          while upper_bound >= 0 and count > top_idx.get(upper_bound) {
+            top_idx.set(upper_bound + 2, top_idx.get(upper_bound))
+            top_idx.set(upper_bound + 3, top_idx.get(upper_bound + 1))
+            upper_bound -= 2
+          }
+
+          let insert_pos = upper_bound + 2
+
+          top_idx.set(insert_pos, count)
+          top_idx.set(insert_pos+1, idx)
+          min_tags = top_idx.get(TOP_N * 2 - 2 )
+        }
+
+        idx += 1
+      }
+
+      let top_posts = Array.with_capacity(TOP_N)
+
+      TOP_N.times fn (j) {
+        let index = top_idx.get(j * 2 + 1)
+
+        top_posts.push(posts.get(index))
+      }
+
+      all_related.push(RelatedPost {
+        @id = post.id,
+        @tags = post.tags,
+        @related = top_posts,
+      })
+
+      i += 1
+    }
+
+    let took = start.elapsed.to_millis
+
+    STDOUT.new.print("Processing time (w/o IO): {took} ms")
+
+    write_file('../related_posts_inko.json', related_to_json(all_related))
+      .unwrap
+  }
+}
diff --git a/inko/utils/io.inko b/inko/utils/io.inko
deleted file mode 100644
index 416e6aa..0000000
--- a/inko/utils/io.inko
+++ /dev/null
@@ -1,26 +0,0 @@
-import std.io.Error
-import std.fs.file.(ReadOnlyFile,WriteOnlyFile)
-import std.stdio.STDOUT
-
-
-fn pub print(msg:String) {
-  STDOUT.new.print(msg)
-}
-
-fn pub read_file(name:String) -> Result[String,Error] {
-  let file = try ReadOnlyFile.new(name)
-
-  let bytes = ByteArray.new
-
-  try file.read_all(bytes)
-
-  Result.Ok(bytes.to_string)
-}
-
-fn pub write_file(name:String, content:String) -> Result[Nil,Error] {
-  let file = try WriteOnlyFile.new(name)
-
-  try file.write_string(content)
-
-  Result.Ok(nil)
-}
\ No newline at end of file
diff --git a/inko/utils/post.inko b/inko/utils/post.inko
deleted file mode 100644
index fce77e3..0000000
--- a/inko/utils/post.inko
+++ /dev/null
@@ -1,78 +0,0 @@
-import std.json.Json
-
-class pub Post {
-  let pub @id: String
-  let pub @title: String
-  let pub @tags: Array[String]
-}
-
-class pub RelatedPost {
-  let pub @id: String
-  let pub @tags: ref Array[String]
-  let pub @related: Array[ ref Post]
-}
-
-fn pub json_to_posts(json_string: String) -> Array[Post] {
-  let parsed = Json.parse(json_string).unwrap
-
-  let array =  match parsed {
-    case Array(a) -> a
-    case _ -> panic("json is not an array")
-
-  }
-
-  array.iter.map fn (n) {
-    let obj = match n {
-      case Object(m) -> m
-      case _ -> panic("json is not an object")
-    }
-
-    let id = match obj.get("_id") {
-      case String(s) -> s
-      case _ -> panic("no _id")
-    }
-
-    let title = match obj.get("title") {
-      case String(s) -> s
-      case _ -> panic("no title")
-    }
-
-    let json_tags = match obj.get("tags") {
-      case Array(a) -> a
-      case _ -> panic("no tags")
-    }
-
-    let tags = json_tags.iter.map fn (n) {
-      match n {
-        case String(s) -> s
-        case _ -> panic("tag is not a string")
-      }
-    }
-
-    Post {
-      @id = id,
-      @title = title,
-      @tags = tags.to_array,
-    }
-  }.to_array
-}
-
-fn pub related_to_json(related: Array[RelatedPost]) -> String {
-  let array = related.iter.map fn (n) {
-    let related = n.related.iter.map fn (n) {
-      let mut map = Map.new
-      map.set("_id", Json.String(n.id))
-      map.set("title", Json.String(n.title))
-      map.set("tags", Json.Array(n.tags.iter.map fn (n) { Json.String(n) }.to_array))
-      Json.Object(map)
-    }.to_array
-
-    let mut map = Map.new
-    map.set("_id", Json.String(n.id))
-    map.set("tags", Json.Array(n.tags.iter.map fn (n) { Json.String(n) }.to_array))
-    map.set("related", Json.Array(related))
-    Json.Object(map)
-  }.to_array
-
-  Json.Array(array).to_string
-}
\ No newline at end of file
diff --git a/run.sh b/run.sh
index bd75d67..38bab13 100755
--- a/run.sh
+++ b/run.sh
@@ -892,12 +892,12 @@ run_inko() {
     echo "Running Inko" &&
         cd ./inko &&
         if [ -z "$appendToFile" ]; then # only build on 5k run
-            inko build --opt aggressive -i ./utils related.inko
+            inko build --opt aggressive
         fi &&
         if [ $HYPER == 1 ]; then
-            capture "Inko" hyperfine -r $slow_lang_runs -w $warmup --show-output "./build/aggressive/related"
+            capture "Inko" hyperfine -r $slow_lang_runs -w $warmup --show-output "./build/aggressive/main"
         else
-            command ${time} -f '%es %Mk' ./build//aggressive/related
+            command ${time} -f '%es %Mk' ./build/aggressive/main
         fi
 
     check_output "related_posts_inko.json"
-- 
2.42.1
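For readers more familiar with Rust, the algorithm in the new main.inko can be sketched roughly as follows. This is an illustrative translation under stated assumptions, not the benchmark's actual Rust implementation: `Post` is reduced to its tags, JSON reading and writing are left out, and `related_indices` is a hypothetical name. It builds the same tag-to-post-indices map, counts shared tags per post, and keeps a fixed-size top 5 via insertion, with unused slots keeping index 0 as in the Inko code.

```rust
use std::collections::HashMap;

const TOP_N: usize = 5;

/// Only the tags matter for the algorithm; ids, titles and JSON I/O are omitted.
struct Post {
    tags: Vec<String>,
}

/// For every post, returns the indices of the TOP_N posts sharing the most
/// tags with it, mirroring the structure of main.inko above.
fn related_indices(posts: &[Post]) -> Vec<[usize; TOP_N]> {
    // tag -> indices of all posts carrying that tag.
    let mut tag_map: HashMap<&str, Vec<usize>> = HashMap::new();
    for (idx, post) in posts.iter().enumerate() {
        for tag in &post.tags {
            tag_map.entry(tag.as_str()).or_default().push(idx);
        }
    }

    // One counter per post, reused (and reset) for every post.
    let mut tagged_post_count = vec![0usize; posts.len()];
    let mut result = Vec::with_capacity(posts.len());

    for (i, post) in posts.iter().enumerate() {
        tagged_post_count.fill(0);
        for tag in &post.tags {
            for &other in &tag_map[tag.as_str()] {
                tagged_post_count[other] += 1;
            }
        }
        tagged_post_count[i] = 0; // a post is not related to itself

        // (count, index) pairs sorted by count, highest first. main.inko
        // stores the same data interleaved in one array of TOP_N * 2 ints.
        let mut top = [(0usize, 0usize); TOP_N];
        let mut min_tags = 0;

        for (idx, &count) in tagged_post_count.iter().enumerate() {
            if count > min_tags {
                // Shift lower-ranked entries down; the last one falls off.
                let mut pos = TOP_N - 1;
                while pos > 0 && count > top[pos - 1].0 {
                    top[pos] = top[pos - 1];
                    pos -= 1;
                }
                top[pos] = (count, idx);
                min_tags = top[TOP_N - 1].0;
            }
        }

        result.push(top.map(|(_, idx)| idx));
    }

    result
}

fn main() {
    let posts: Vec<Post> = vec![vec!["a", "b"], vec!["a"], vec!["b", "c"], vec!["a", "b", "c"]]
        .into_iter()
        .map(|tags| Post { tags: tags.into_iter().map(String::from).collect() })
        .collect();

    for (i, related) in related_indices(&posts).iter().enumerate() {
        println!("post {i}: {related:?}");
    }
}
```

Keeping only a sorted top 5 and evicting the lowest entry avoids sorting all N counters for every post: each post costs one pass over the counters plus at most TOP_N shifts per insertion.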

This doesn't really improve performance much as far as I can tell, but it does make the code more readable/less messy.

I'm perfectly fine with sharing these numbers in the README, though it would be nice to mention somewhere that Inko doesn't apply optimizations at this time, so people know why it's slow.

@jinyus marked this pull request as ready for review on November 17, 2023, 17:18
@jinyus (Owner, Author) commented Nov 17, 2023

Thanks for taking the time to review it.

@jinyus merged commit de9fe56 into main on Nov 17, 2023
@jinyus deleted the inko branch on November 17, 2023, 17:21
@jinyus mentioned this pull request on Jan 5, 2024