Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Building a training set of tags for elm #627

Closed
ErikSchierboom opened this issue Nov 1, 2023 · 29 comments
Closed

Building a training set of tags for elm #627

ErikSchierboom opened this issue Nov 1, 2023 · 29 comments

Comments

@ErikSchierboom
Copy link
Member

Hello lovely maintainers 👋

We've recently added "tags" to student's solutions. These express the constructs, paradigms and techniques that a solution uses. We are going to be using these tags for lots of things including filtering, pointing a student to alternative approaches, and much more.

In order to do this, we've built out a full AST-based tagger in C#, which has allowed us to do things like detect recursion or bit shifting. We've set things up so other tracks can do the same for their languages, but its a lot of work, and we've determined that actually it may be unnecessary. Instead we think that we can use machine learning to achieve tagging with good enough results. We've fine-tuned a model that can determine the correct tags for C# from the examples with a high success rate. It's also doing reasonably well in an untrained state for other languages. We think that with only a few examples per language, we can potentially get some quite good results, and that we can then refine things further as we go.

I released a new video on the Insiders page that talks through this in more detail.

We're going to be adding a fully-fledged UI in the coming weeks that allow maintainers and mentors to tag solutions and create training sets for the neural networks, but to start with, we're hoping you would be willing to manually tag 20 solutions for this track. In this post we'll add 20 comments, each with a student's solution, and the tags our model has generated. Your mission (should you choose to accept it) is to edit the tags on each issue, removing any incorrect ones, and add any that are missing. In order to build one model that performs well across languages, it's best if you stick as closely as possible to the C# tags as you can. Those are listed here. If you want to add extra tags, that's totally fine, but please don't arbitrarily reword existing tags, even if you don't like what Erik's chosen, as it'll just make it less likely that your language gets the correct tags assigned by the neural network.


To summarise - there are two paths forward for this issue:

  1. You're up for helping: Add a comment saying you're up for helping. Update the tags some time in the next few days. Add a comment when you're done. We'll then add them to our training set and move forward.
  2. You not up for helping: No problem! Just please add a comment letting us know :)

If you tell us you're not able/wanting to help or there's no comment added, we'll automatically crowd-source this in a week or so.

Finally, if you have questions or want to discuss things, it would be best done on the forum, so the knowledge can be shared across all maintainers in all tracks.

Thanks for your help! 💙


Note: Meta discussion on the forum

@ErikSchierboom
Copy link
Member Author

ErikSchierboom commented Nov 1, 2023

Exercise: bob

Code

module Bob exposing (..)
import String exposing (trim, toUpper, toLower, endsWith)

hey : String -> String
hey phrase =
  if "" == trim phrase then
    "Fine. Be that way!"
  else if phrase == toUpper phrase && phrase /= toLower phrase then
    "Whoa, chill out!"
  else if endsWith "?" phrase then
    "Sure."
  else
    "Whatever."

Tags:

construct:boolean
construct:if-then-else
construct:string
technique:boolean-logic
paradigm:functional

@ErikSchierboom
Copy link
Member Author

ErikSchierboom commented Nov 1, 2023

Exercise: twelve-days

Code

module TwelveDays exposing (recite)

import Array


ordinals =
    Array.fromList
        [ "zeroeth"
        , "first"
        , "second"
        , "third"
        , "fourth"
        , "fifth"
        , "sixth"
        , "seventh"
        , "eighth"
        , "ninth"
        , "tenth"
        , "eleventh"
        , "twelfth"
        ]


amountNatural i =
    if i == 1 then
        "a"
    else
        Array.get i numbers |> Maybe.withDefault ""


numbers =
    Array.fromList
        [ "zero"
        , "one"
        , "two"
        , "three"
        , "four"
        , "five"
        , "six"
        , "seven"
        , "eight"
        , "nine"
        , "ten"
        , "eleven"
        , "twelve"
        ]


things =
    Array.fromList
        [ ""
        , "Partridge in a Pear Tree"
        , "Turtle Doves"
        , "French Hens"
        , "Calling Birds"
        , "Gold Rings"
        , "Geese-a-Laying"
        , "Swans-a-Swimming"
        , "Maids-a-Milking"
        , "Ladies Dancing"
        , "Lords-a-Leaping"
        , "Pipers Piping"
        , "Drummers Drumming"
        ]


recite : Int -> Int -> List String
recite start stop =
    List.range start stop
        |> List.map
            (\day ->
                "On the "
                    ++ (Array.get day ordinals |> Maybe.withDefault "")
                    ++ " day of Christmas my true love gave to me, "
                    ++ thingsFor day
                    ++ "."
            )


thingsFor day =
    let
        printAnd thingIndex =
            if day > 1 && thingIndex == 1 then
                "and "
            else
                ""
    in
    List.range 1 day
        |> List.foldl
            (\thingIndex ->
                (::) <|
                    printAnd thingIndex
                        ++ amountNatural thingIndex
                        ++ " "
                        ++ (Array.get thingIndex things |> Maybe.withDefault "")
            )
            []
        |> String.join ", "

Tags:

construct:array
construct:if-then-else
construct:integral-number
construct:lambda
construct:list
construct:let
construct:string
construct:pipe-forward
construct:pipe-backward
paradigm:functional
technique:boolean-logic
uses:Maybe
uses:operator-function

@ErikSchierboom
Copy link
Member Author

ErikSchierboom commented Nov 1, 2023

Exercise: pangram

Code

module Pangram exposing (..)
import String
import Char


isPangram s =
  let
      alphabet = List.map Char.fromCode [97..122]
      sContains char = String.contains (String.fromChar char) (String.toLower s)
  in
      List.all sContains alphabet

Tags:

construct:char
construct:let
construct:string
technique:type-conversion
construct:list
paradigm:functional

@ErikSchierboom
Copy link
Member Author

ErikSchierboom commented Nov 1, 2023

Exercise: triangle

Code

module Triangle exposing (..)

triangleKind : Float -> Float -> Float -> Result String String
triangleKind a b c =
  if (a<=0) || (b<=0) || (c<=0) then
    Err "Invalid lengths"
  else if (a+b) <= c || (b+c) <= a || (a+c) <= b then
    Err "Violates inequality"
  else if a == b && b == c then
    Ok "equilateral"
  else if a == b || b == c || a == c then
    Ok "isosceles"
  else
    Ok "scalene"

Tags:

construct:add
construct:floating-point-number
construct:if-then-else
construct:string
paradigm:functional
technique:boolean-logic
uses:Result

@ErikSchierboom
Copy link
Member Author

ErikSchierboom commented Nov 1, 2023

Exercise: strain

Code

module Strain exposing (keep, discard)


keep : (a -> Bool) -> List a -> List a
keep f items =
    case items of
        [] ->
            []

        x :: xs ->
            if f x == True then
                x :: keep f xs
            else
                keep f xs


discard : (a -> Bool) -> List a -> List a
discard f xs =
    keep (f >> not) xs

Tags:

construct:bool
construct:list
construct:if-then-else
construct:pattern-matching
technique:recursion
paradigm:functional
technique:boolean-logic
technique:higher-order-functions
uses:composition

@ErikSchierboom
Copy link
Member Author

ErikSchierboom commented Nov 1, 2023

Exercise: sum-of-multiples

Code

module SumOfMultiples exposing (..)

import Set


sumOfMultiples : List Int -> Int -> Int
sumOfMultiples list max =
    list
        |> List.foldl
            (\x acc ->
                findMultiples max x
                    |> Set.fromList
                    |> Set.union acc
            )
            Set.empty
        |> Set.toList
        |> List.sum


findMultiples : Int -> Int -> List Int
findMultiples max n =
    List.filterMap
        (\x ->
            if x * n < max then
                Just (x * n)
            else
                Nothing
        )
        [1..(max // n)]

Tags:

construct:if-then-else
construct:integral-number
construct:lambda
construct:list
construct:multiply
construct:set
construct:divide
construct:pipe-forward
technique:boolean-logic
paradigm:functional

@ErikSchierboom
Copy link
Member Author

ErikSchierboom commented Nov 1, 2023

Exercise: rna-transcription

Code

module RNATranscription exposing (..)

import String


toRNA : String -> Result Char String
toRNA s =
    List.foldl convertChar (Ok "") (String.toList s)


convertChar : Char -> Result Char String -> Result Char String
convertChar c r =
    case ( c, r ) of
        ( 'G', Ok s ) ->
            Ok <| s ++ String.fromList [ 'C' ]

        ( 'C', Ok s ) ->
            Ok <| s ++ String.fromList [ 'G' ]

        ( 'T', Ok s ) ->
            Ok <| s ++ String.fromList [ 'A' ]

        ( 'A', Ok s ) ->
            Ok <| s ++ String.fromList [ 'U' ]

        ( c, Ok s ) ->
            Err <| c

        ( c, Err c' ) ->
            Err c'

Tags:

construct:char
construct:list
construct:pattern-matching
construct:string
construct:pipe-backward
paradigm:functional
uses:Result

@ErikSchierboom
Copy link
Member Author

ErikSchierboom commented Nov 1, 2023

Exercise: nucleotide-count

Code

module NucleotideCount exposing (nucleotideCounts, version)

import String


type alias Counts =
    { a : Int
    , c : Int
    , t : Int
    , g : Int
    }


newCounts : Counts
newCounts =
    { a = 0
    , c = 0
    , t = 0
    , g = 0
    }


version : Int
version =
    2


nucleotideCounts : String -> Counts
nucleotideCounts dna =
    String.foldl doCount newCounts dna


doCount : Char -> Counts -> Counts
doCount char counts =
    case char of
        'A' ->
            { counts | a = counts.a + 1 }

        'C' ->
            { counts | c = counts.c + 1 }

        'T' ->
            { counts | t = counts.t + 1 }

        'G' ->
            { counts | g = counts.g + 1 }

        _ ->
            counts

Tags:

construct:add
construct:char
construct:field-access
construct:integral-number
construct:pattern-matching
construct:string
construct:type-alias
paradigm:functional
uses:record-update

@ErikSchierboom
Copy link
Member Author

ErikSchierboom commented Nov 1, 2023

Exercise: nucleotide-count

Code

module NucleotideCount exposing (..)

import String


version : Int
version =
    2


type alias Nucleotides =
    { a : Int
    , t : Int
    , c : Int
    , g : Int
    }


nucleotideCounts : String -> Nucleotides
nucleotideCounts dna =
    String.foldl updateCount (Nucleotides 0 0 0 0) dna


updateCount nucleotide counts =
    case nucleotide of
        'A' ->
            { counts | a = counts.a + 1 }

        'T' ->
            { counts | t = counts.t + 1 }

        'C' ->
            { counts | c = counts.c + 1 }

        'G' ->
            { counts | g = counts.g + 1 }

        _ ->
            counts

Tags:

construct:add
construct:char
construct:integral-number
construct:pattern-matching
construct:string
construct:type-alias
paradigm:functional
uses:record-update

@ErikSchierboom
Copy link
Member Author

ErikSchierboom commented Nov 1, 2023

Exercise: phone-number

Code

module PhoneNumber exposing (getNumber, prettyPrint)

import Char exposing (isDigit)
import String exposing (filter, length, slice, dropLeft, left, right)


getNumber : String -> Maybe String
getNumber string =
    let
        digits =
            filter Char.isDigit string
    in
        case length digits of
            10 ->
                Just digits

            11 ->
                if slice 0 1 digits == "1" then
                    Just (dropLeft 1 digits)
                else
                    Nothing

            _ ->
                Nothing


prettyPrint : String -> Maybe String
prettyPrint value =
    let
        areaCode =
            left 3

        centralOffice =
            slice 3 6

        stationNumber =
            right 4

        format digits =
            "("
                ++ (areaCode digits)
                ++ ") "
                ++ (centralOffice digits)
                ++ "-"
                ++ (stationNumber digits)
    in
        getNumber value
            |> Maybe.map format

Tags:

construct:char
construct:if-then-else
construct:integral-number
construct:let
construct:string
construct:pattern-matching
construct:pipe-forward
paradigm:functional
uses:Maybe

@ErikSchierboom
Copy link
Member Author

ErikSchierboom commented Nov 1, 2023

Exercise: phone-number

Code

module PhoneNumber exposing (..)

import Char exposing (isDigit)


getNumber : String -> Maybe String
getNumber source =
    let
        digits =
            String.filter isDigit source
    in
        if String.length digits < 10 || String.length digits > 11 then
            Nothing
        else if String.length digits == 11 then
            if String.startsWith "1" digits then
                Just (String.dropLeft 1 digits)
            else
                Nothing
        else
            Just digits


prettyPrint : String -> Maybe String
prettyPrint source =
    case getNumber source of
        Just phone ->
            ("(" ++ (String.slice 0 3 phone) ++ ") ")
                ++ (String.slice 3 6 phone)
                ++ "-"
                ++ (String.slice 6 10 phone)
                |> Just

        _ ->
            Nothing

Tags:

construct:char
construct:if-then-else
construct:integral-number
construct:let
construct:string
paradigm:functional
technique:boolean-logic
uses:Maybe

@ErikSchierboom
Copy link
Member Author

ErikSchierboom commented Nov 1, 2023

Exercise: grade-school

Code

module GradeSchool exposing (..)

import Dict exposing (..)


type alias Grade =
    Int


type alias Student =
    String


type alias School =
    Dict Int (List Student)


empty : School
empty =
    Dict.empty


addStudent : Grade -> Student -> School -> School
addStudent grade name school =
    let
        updated =
            name :: (studentsInGrade grade school) |> List.sort
    in
        Dict.insert grade updated school


studentsInGrade : Grade -> School -> List Student
studentsInGrade grade school =
    case (Dict.get grade school) of
        Nothing ->
            []

        Just students ->
            students


allStudents : School -> List ( Grade, List Student )
allStudents school =
    Dict.toList school |> List.sortBy (\( grade, _ ) -> grade)

Tags:

construct:dictionary
construct:lambda
construct:let
construct:list
construct:pattern-matching
construct:pipe-forward
construct:string
construct:type-alias
construct:dictionary
construct:tuple
construct:integral-number
paradigm:functional
uses:Maybe

@ErikSchierboom
Copy link
Member Author

ErikSchierboom commented Nov 1, 2023

Exercise: grade-school

Code

module GradeSchool exposing (..)

import Dict


type alias Grade =
    Int


type alias Name =
    String


type alias School =
    Dict.Dict Grade (List Name)


empty : School
empty =
    Dict.empty


addStudent : Grade -> Name -> School -> School
addStudent grade name school =
    Dict.update grade (Maybe.withDefault [] >> (::) name >> List.sort >> Just) school


studentsInGrade : Grade -> School -> List Name
studentsInGrade grade school =
    school
        |> Dict.get grade
        |> Maybe.withDefault []


allStudents : School -> List ( Grade, List Name )
allStudents school =
    Dict.toList school

Tags:

construct:list
construct:integral-number
construct:pattern-matching
construct:pipe-forward
construct:type-alias
construct:tuple
construct:dictionary
construct:string
paradigm:functional
technique:functional-composition
uses:Maybe
uses:operator-function

@ErikSchierboom
Copy link
Member Author

ErikSchierboom commented Nov 1, 2023

Exercise: largest-series-product

Code

module LargestSeriesProduct exposing (largestProduct)

import Char


largestProduct : Int -> String -> Maybe Int
largestProduct size num =
    if size == 0 then
        Just 1
    else if String.length num < size || size < 0 || not (String.all Char.isDigit num) then
        Nothing
    else
        num
            |> String.toList
            |> List.map (String.fromChar >> String.toInt >> Result.withDefault 0)
            |> partition size
            |> List.map List.product
            |> List.maximum


partition : Int -> List Int -> List (List Int)
partition size num =
    if List.length num <= size then
        [ num ]
    else
        List.take size num :: partition size (List.drop 1 num)

Tags:

construct:char
construct:if-then-else
construct:integral-number
construct:list
technique:recursion
construct:string
construct:pipe-forward
paradigm:functional
technique:boolean-logic
technique:functional-composition
uses:Result
uses:Maybe

@ErikSchierboom
Copy link
Member Author

ErikSchierboom commented Nov 1, 2023

Exercise: gigasecond

Code

module Gigasecond exposing (add)

import Date exposing (Date)
import Time


add : Date -> Date
add date =
    date
        |> Date.toTime
        |> (+) 1000000000000
        |> Date.fromTime

Tags:

construct:add
construct:date
construct:time
construct:pipe-forward
paradigm:functional

@ErikSchierboom
Copy link
Member Author

ErikSchierboom commented Nov 1, 2023

Exercise: collatz-conjecture

Code

module CollatzConjecture exposing (collatz)


collatz : Int -> Result String Int
collatz num =
    if num < 1 then
        Err "Only positive numbers are allowed"
    else
        Ok <| collatzHelper num 0


collatzHelper : Int -> Int -> Int
collatzHelper num step =
    if num == 1 then
        step
    else if num % 2 == 0 then
        collatzHelper (num // 2) (step + 1)
    else
        collatzHelper (3 * num + 1) (step + 1)

Tags:

construct:add
construct:modulus
construct:if-then-else
construct:integral-number
construct:multiply
construct:divide
technique:recursion
construct:string
construct:pipe-backward
paradigm:functional
uses:Result

@ErikSchierboom
Copy link
Member Author

ErikSchierboom commented Nov 1, 2023

Exercise: isogram

Code

module Isogram exposing (..)

import Char

isIsogram : String -> Bool
isIsogram =
  String.toLower
  >> String.toList
  >> List.filter Char.isLower
  >> List.sort
  >> allDifferent

allDifferent : List Char -> Bool
allDifferent list =
  case list of
    [] -> True
    x::[] -> True
    x::y::xs -> x /= y && allDifferent (y::xs)

Tags:

construct:boolean
construct:char
construct:list
construct:pattern-matching
construct:point-free
construct:string
paradigm:functional
technique:boolean-logic
technique:recursion
technique:functional-composition

@ErikSchierboom
Copy link
Member Author

ErikSchierboom commented Nov 1, 2023

Exercise: luhn

Code

module Luhn exposing (..)

import Char


valid : String -> Bool
valid input =
    let
        filtered =
            String.filter ((/=) ' ') input
    in
        if String.length filtered < 2 then
            False
        else if not (String.all Char.isDigit filtered) then
            False
        else
            (luhnsum (digits filtered)) % 10 == 0


digits s =
    String.toList s
        |> List.filter Char.isDigit
        |> List.map
            (String.fromChar
                >> String.toInt
                >> Result.withDefault 0
            )


luhnsum digits =
    let
        double xs =
            case xs of
                [] ->
                    []

                x :: y :: xs ->
                    let
                        d =
                            if y < 5 then
                                y * 2
                            else
                                y * 2 - 9
                    in
                        x :: d :: (double xs)

                _ ->
                    xs
    in
        List.reverse digits
            |> double
            |> List.sum

Tags:

construct:modulus
construct:char
construct:if-then-else
construct:integral-number
construct:pipe-forward
construct:let
construct:list
construct:multiply
construct:pattern-matching
paradigm:functional
technique:boolean-logic
technique:functional-composition
uses:List
uses:Result
uses:operator-function

@ErikSchierboom
Copy link
Member Author

ErikSchierboom commented Nov 1, 2023

Exercise: transpose

Code

module Transpose exposing (transpose)


transpose : List String -> List String
transpose lines =
    let
        maxLength =
            lines
                |> List.map String.length
                |> List.maximum
                |> Maybe.withDefault 0
    in
        List.foldl
            (\line result ->
                List.map2 (++) result (String.split "" line |> padRight maxLength "")
            )
            (List.repeat maxLength "")
            (padLines lines)


padLines : List String -> List String
padLines lines =
    padLinesHelper (List.reverse lines) []


padLinesHelper : List String -> List String -> List String
padLinesHelper lines padded =
    case lines of
        [] ->
            padded

        line :: [] ->
            line :: padded

        lineBelow :: lineAbove :: rest ->
            let
                paddedLine =
                    String.padRight (String.length lineBelow) ' ' lineAbove
            in
                padLinesHelper (paddedLine :: rest) (lineBelow :: padded)


padRight : Int -> a -> List a -> List a
padRight length item list =
    let
        pad =
            length - List.length list
    in
        list ++ List.repeat pad item

Tags:

construct:char
construct:lambda
construct:let
construct:list
construct:string
construct:subtract
construct:integral-number
construct:pattern-matching
construct:pipe-forward
paradigm:functional
technique:recursion
uses:operator-function
uses:Maybe

@ErikSchierboom
Copy link
Member Author

ErikSchierboom commented Nov 1, 2023

Exercise: binary-search

Code

module BinarySearch exposing (find)

import Array exposing (Array)


find : Int -> Array Int -> Int
find target xs =
    let
        recur : Int -> Int
        recur midPt =
            case Array.get midPt xs of
                Nothing ->
                    -1

                Just midElement ->
                    case compare midElement target of
                        LT ->
                            let
                                isTargetLessThanNext =
                                    Array.get (midPt + 1) xs
                                        |> Maybe.map (\elt -> elt > target)
                                        |> Maybe.withDefault True
                            in
                            if isTargetLessThanNext then
                                -1
                            else
                                (toFloat midPt * 1.5)
                                    |> ceiling
                                    |> recur

                        GT ->
                            let
                                isTargetGreaterThanLast =
                                    Array.get (midPt - 1) xs
                                        |> Maybe.map (\elt -> elt < target)
                                        |> Maybe.withDefault True
                            in
                            if isTargetGreaterThanLast then
                                -1
                            else
                                (toFloat midPt / 2)
                                    |> floor
                                    |> recur

                        EQ ->
                            midPt
    in
    recur <| floor (toFloat (Array.length xs) / 2)

Tags:

construct:add
construct:char
construct:divide
construct:floating-point-number
construct:if-then-else
construct:integral-number
construct:lambda
construct:let
construct:multiply
construct:pattern-matching
construct:pipe-forward
construct:pipe-backward
technique:recursion
construct:subtract
paradigm:functional
technique:boolean-logic
uses:Array
uses:Maybe

@ceddlyburge
Copy link
Contributor

I'm happy to help out with this, sometime during the next couple of weeks

@ceddlyburge
Copy link
Contributor

Hi @ErikSchierboom , I've completed this now.

The tags are a bit confusing, I've tried to consolidate and make things consistent. For example, I've removed 'logical-or' tags, and instead 'boolean-logic' is used. And I've tried to use the least specific tag when multiple tags describe the same thing (technique:recursion instead of construct:recursion).

It would be good if @jiegillet or @mpizenberg could give it a quick look as well, to see if I have missed anything fundamental.

Another note is that we already parse all solutions in to their abstract syntax trees in the representer (and the analyser I think), so we could calculate these things directly relatively easily, instead of relying on flaky me / flaky ai.

@iHiD
Copy link
Member

iHiD commented Nov 8, 2023

@ceddlyburge Awesome. Thanks!

Very happy for you to have the analyzer do this instead. Having languages that have deterministic, correct tags is really useful for us for training neural networks in general. All the learning so far is based off C#'s so having a different language too would be great!

@jiegillet
Copy link
Contributor

I took a quick look and didn't find anything wrong, thanks for doing it @ceddlyburge

I think an analyzer could do a great job of deterministically tag a lot of concepts on to virtually every part of the AST, but it would be a whole bunch of work!

@ErikSchierboom
Copy link
Member Author

ErikSchierboom commented Nov 8, 2023

it would be a whole bunch of work!

That I can confirm! It has taken me lots of time to build the C# version.

edit: it doesn't help that C# has a lot of constructs

@ceddlyburge
Copy link
Contributor

Cool, I'll make a ticket for it! It probably would take a long time, but should be much much easier than c#. The syntax is a lot smaller and we already parse it into an ast.

We would probably want to have a meeting to map things in the elm syntax to the tags when we start.

@mpizenberg
Copy link
Member

We would probably want to have a meeting to map things in the elm syntax to the tags when we start.

I'm interested in this if you try to organize that.

@ErikSchierboom
Copy link
Member Author

This is an automated comment

Hello 👋 Next week we're going to start using the tagging work people are doing on these. If you've already completed the work, thank you! If you've not, but intend to this week, that's great! If you're not going to get round to doing it, and you've not yet posted a comment letting us know, could you please do so, so that we can find other people to do it. Thanks!

@ErikSchierboom
Copy link
Member Author

Thanks for the help! We've updated the tags.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants