docs: add code convert
phodal committed Apr 7, 2023
1 parent fb1a654 commit ac483ad
Showing 2 changed files with 4,025 additions and 0 deletions.
25 changes: 25 additions & 0 deletions code/text-to-code.py
@@ -0,0 +1,25 @@
# Open datasets/text-to-code/java.jsonl and take the first 4000 items.
#
# Output format (one JSON object per line):
# {
#   "instruction": "text to java code",
#   "input": items[index].text,
#   "output": items[index].code
# }

import json

with open('../datasets/text-to-code/java.jsonl', 'r') as f:
    data = [json.loads(row) for row in f.readlines()]

data = data[:4000]

with open('../datasets/text-to-code/java-train.jsonl', 'w') as f:
    for row in data:
        item = {
            'instruction': 'text to java code',
            'input': row['text'],
            'output': row['code']
        }

        f.write(json.dumps(item) + '\n')
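
Review note: the script above parses the whole input file before slicing off the first 4000 rows. A streaming variant could stop reading once enough rows are collected. The sketch below is one possible shape, not part of this commit: the function name `convert`, its parameters, the explicit UTF-8 encoding, and the skipping of blank lines are all assumptions made for illustration.

```python
import json
from itertools import islice


def convert(src_path, dst_path, limit=4000, instruction="text to java code"):
    """Copy up to `limit` records from src_path to dst_path and return the count.

    Hypothetical helper: each input line is assumed to be a JSON object
    with "text" and "code" keys, as in the script above.
    """
    written = 0
    with open(src_path, "r", encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        # Skip blank lines (an assumption), then read only the first `limit` rows.
        rows = (line for line in src if line.strip())
        for line in islice(rows, limit):
            row = json.loads(line)
            item = {
                "instruction": instruction,
                "input": row["text"],
                "output": row["code"],
            }
            dst.write(json.dumps(item) + "\n")
            written += 1
    return written
```

Called as `convert('../datasets/text-to-code/java.jsonl', '../datasets/text-to-code/java-train.jsonl')`, this would write the same record shape as the committed script, provided the input is UTF-8 and contains no blank lines among its first 4000 rows.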