From 9a86e562fc81f4ae0dda4a42ee0e602257ed0739 Mon Sep 17 00:00:00 2001
From: Victor Goff
Date: Mon, 29 Jul 2024 05:24:01 -0400
Subject: [PATCH] runes concept: single sentence per line (#2825)

---
 concepts/runes/about.md        | 27 ++++++++++++++++++++-------
 concepts/runes/introduction.md | 24 ++++++++++++++++++------
 2 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/concepts/runes/about.md b/concepts/runes/about.md
index f05c7d688..07f5844cb 100644
--- a/concepts/runes/about.md
+++ b/concepts/runes/about.md
@@ -1,10 +1,14 @@
 # About
 
-The `rune` type in Go is an alias for `int32`. Given this underlying `int32` type, the `rune` type holds a signed 32-bit integer value. However, unlike an `int32` type, the integer value stored in a `rune` type represents a single Unicode character.
+The `rune` type in Go is an alias for `int32`.
+Given this underlying `int32` type, the `rune` type holds a signed 32-bit integer value.
+However, unlike an `int32` type, the integer value stored in a `rune` type represents a single Unicode character.
 
 ## Unicode and Unicode Code Points
 
-Unicode is a superset of ASCII that represents characters by assigning a unique number to every character. This unique number is called a Unicode code point. Unicode aims to represent all the world's characters including various alphabets, numbers, symbols, and even emoji as Unicode code points.
+Unicode is a superset of ASCII that represents characters by assigning a unique number to every character.
+This unique number is called a Unicode code point.
+Unicode aims to represent all the world's characters including various alphabets, numbers, symbols, and even emoji as Unicode code points.
 
 In Go, the `rune` type represents a single Unicode code point.
 
@@ -21,7 +25,9 @@ The following table contains example Unicode characters along with their Unicode
 
 ## UTF-8
 
-UTF-8 is a variable-width character encoding that is used to encode every Unicode code point as 1, 2, 3, or 4 bytes. Since a Unicode code point can be encoded as a maximum of 4 bytes, the `rune` type needs to be able to hold up to 4 bytes of data. That is why the `rune` type is an alias for `int32` as an `int32` type is capable of holding up to 4 bytes of data.
+UTF-8 is a variable-width character encoding that is used to encode every Unicode code point as 1, 2, 3, or 4 bytes.
+Since a Unicode code point can be encoded as a maximum of 4 bytes, the `rune` type needs to be able to hold up to 4 bytes of data.
+That is why the `rune` type is an alias for `int32` as an `int32` type is capable of holding up to 4 bytes of data.
 
 Go source code files are encoded using UTF-8.
 
@@ -76,9 +82,14 @@ fmt.Printf("myRune Unicode character: %c\n", myRune)
 
 ## Runes and Strings
 
-Strings in Go are encoded using UTF-8 which means they contain Unicode characters. Since the `rune` type represents a Unicode character, a string in Go is often referred to as a sequence of runes. However, runes are stored as 1, 2, 3, or 4 bytes depending on the character. Due to this, strings are really just a sequence of bytes. In Go, slices are used to represent sequences and these slices can be iterated over using `range`.
+Strings in Go are encoded using UTF-8 which means they contain Unicode characters.
+Since the `rune` type represents a Unicode character, a string in Go is often referred to as a sequence of runes.
+However, runes are stored as 1, 2, 3, or 4 bytes depending on the character.
+Due to this, strings are really just a sequence of bytes.
+In Go, slices are used to represent sequences and these slices can be iterated over using `range`.
 
-Even though a string is just a slice of bytes, the `range` keyword iterates over a string's runes, not its bytes. In this example, the `index` variable represents the starting index of the current rune's byte sequence and the `char` variable represents the current rune:
+Even though a string is just a slice of bytes, the `range` keyword iterates over a string's runes, not its bytes.
+In this example, the `index` variable represents the starting index of the current rune's byte sequence and the `char` variable represents the current rune:
 
 ```go
 myString := "❗hello"
@@ -94,7 +105,8 @@ for index, char := range myString {
 // Index: 7 Character: o Code Point: U+006F
 ```
 
-Since runes can be stored as 1, 2, 3, or 4 bytes, the length of a string may not always equal the number of characters in the string. Use the builtin `len` function to get the length of a string in bytes and the `utf8.RuneCountInString` function to get the number of runes in a string:
+Since runes can be stored as 1, 2, 3, or 4 bytes, the length of a string may not always equal the number of characters in the string.
+Use the builtin `len` function to get the length of a string in bytes and the `utf8.RuneCountInString` function to get the number of runes in a string:
 
 ```go
 import "unicode/utf8"
@@ -118,7 +130,8 @@ fmt.Println(myString)
 // Output: exercism
 ```
 
-Similarly, a string can be type converted to a slice of runes. Remember, without formatting verbs, printing a rune yields its integer (decimal) value:
+Similarly, a string can be type converted to a slice of runes.
+Remember, without formatting verbs, printing a rune yields its integer (decimal) value:
 
 ```go
 myString := "exercism"
diff --git a/concepts/runes/introduction.md b/concepts/runes/introduction.md
index 411da7d00..57dd0781e 100644
--- a/concepts/runes/introduction.md
+++ b/concepts/runes/introduction.md
@@ -1,10 +1,14 @@
 # Introduction
 
-The `rune` type in Go is an alias for `int32`. Given this underlying `int32` type, the `rune` type holds a signed 32-bit integer value. However, unlike an `int32` type, the integer value stored in a `rune` type represents a single Unicode character.
+The `rune` type in Go is an alias for `int32`.
+Given this underlying `int32` type, the `rune` type holds a signed 32-bit integer value.
+However, unlike an `int32` type, the integer value stored in a `rune` type represents a single Unicode character.
 
 ## Unicode and Unicode Code Points
 
-Unicode is a superset of ASCII that represents characters by assigning a unique number to every character. This unique number is called a Unicode code point. Unicode aims to represent all the world's characters including various alphabets, numbers, symbols, and even emoji as Unicode code points.
+Unicode is a superset of ASCII that represents characters by assigning a unique number to every character.
+This unique number is called a Unicode code point.
+Unicode aims to represent all the world's characters including various alphabets, numbers, symbols, and even emoji as Unicode code points.
 
 In Go, the `rune` type represents a single Unicode code point.
 
@@ -21,7 +25,9 @@ The following table contains example Unicode characters along with their Unicode
 
 ## UTF-8
 
-UTF-8 is a variable-width character encoding that is used to encode every Unicode code point as 1, 2, 3, or 4 bytes. Since a Unicode code point can be encoded as a maximum of 4 bytes, the `rune` type needs to be able to hold up to 4 bytes of data. That is why the `rune` type is an alias for `int32` as an `int32` type is capable of holding up to 4 bytes of data.
+UTF-8 is a variable-width character encoding that is used to encode every Unicode code point as 1, 2, 3, or 4 bytes.
+Since a Unicode code point can be encoded as a maximum of 4 bytes, the `rune` type needs to be able to hold up to 4 bytes of data.
+That is why the `rune` type is an alias for `int32` as an `int32` type is capable of holding up to 4 bytes of data.
 
 Go source code files are encoded using UTF-8.
 
@@ -67,9 +73,14 @@ fmt.Printf("myRune Unicode code point: %U\n", myRune)
 
 ## Runes and Strings
 
-Strings in Go are encoded using UTF-8 which means they contain Unicode characters. Since the `rune` type represents a Unicode character, a string in Go is often referred to as a sequence of runes. However, runes are stored as 1, 2, 3, or 4 bytes depending on the character. Due to this, strings are really just a sequence of bytes. In Go, slices are used to represent sequences and these slices can be iterated over using `range`.
+Strings in Go are encoded using UTF-8 which means they contain Unicode characters.
+Since the `rune` type represents a Unicode character, a string in Go is often referred to as a sequence of runes.
+However, runes are stored as 1, 2, 3, or 4 bytes depending on the character.
+Due to this, strings are really just a sequence of bytes.
+In Go, slices are used to represent sequences and these slices can be iterated over using `range`.
 
-Even though a string is just a slice of bytes, the `range` keyword iterates over a string's runes, not its bytes. In this example, the `index` variable represents the starting index of the current rune's byte sequence and the `char` variable represents the current rune:
+Even though a string is just a slice of bytes, the `range` keyword iterates over a string's runes, not its bytes.
+In this example, the `index` variable represents the starting index of the current rune's byte sequence and the `char` variable represents the current rune:
 
 ```go
 myString := "❗hello"
@@ -85,7 +96,8 @@ for index, char := range myString {
 // Index: 7 Character: o Code Point: U+006F
 ```
 
-Since runes can be stored as 1, 2, 3, or 4 bytes, the length of a string may not always equal the number of characters in the string. Use the builtin `len` function to get the length of a string in bytes and the `utf8.RuneCountInString` function to get the number of runes in a string:
+Since runes can be stored as 1, 2, 3, or 4 bytes, the length of a string may not always equal the number of characters in the string.
+Use the builtin `len` function to get the length of a string in bytes and the `utf8.RuneCountInString` function to get the number of runes in a string:
 
 ```go
 import "unicode/utf8"