<!DOCTYPE html>
<html lang="en">
<head>
<!-- Meta, title, CSS, favicons, etc. -->
<meta charset="utf-8">
<title>Sastrawi Tokenizer | PHP library for splitting Indonesian text into tokens | NLP Tokenization</title>
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="Sastrawi Tokenizer | PHP library for splitting Indonesian text into tokens | NLP Tokenization">
<meta name="author" content="Andy Librian, and Sastrawi contributors">
<meta name="keywords" content="sastrawi, library, php, tokenization, tokenizer, text segmentation, natural language processing">
<meta name="copyright" content="2014 Andy Librian">
<!-- Latest compiled and minified CSS -->
<link rel="stylesheet" href="//netdna.bootstrapcdn.com/bootstrap/3.1.1/css/bootstrap.min.css">
<!-- Optional theme -->
<link rel="stylesheet" href="//netdna.bootstrapcdn.com/bootstrap/3.1.1/css/bootstrap-theme.min.css">
<style>
h3 {
color: #111;
font-weight: bold;
padding: 0.5em 0 1em 0;
}
h2 small a {
color: inherit;
}
h1 a {
color: #000;
}
</style>
<script src="//code.jquery.com/jquery-1.11.0.min.js"></script>
<script src="//code.jquery.com/jquery-migrate-1.2.1.min.js"></script>
<!-- Latest compiled and minified JavaScript -->
<script src="//netdna.bootstrapcdn.com/bootstrap/3.1.1/js/bootstrap.min.js"></script>
</head>
<body>
<div class="container">
<h1><a href="https://github.com/sastrawi/tokenizer" title="Sastrawi Tokenizer on GitHub">Sastrawi Tokenizer</a></h1>
<h2><small>A PHP library for <a href="https://github.com/sastrawi/tokenizer" title="Indonesian tokenization">splitting Indonesian sentences into tokens.</a></small></h2>
<a href="https://travis-ci.org/sastrawi/tokenizer"><img src="https://travis-ci.org/sastrawi/tokenizer.svg?branch=master" alt="Build status" /></a>
<a href="https://github.com/sastrawi/tokenizer"><img src="http://img.shields.io/packagist/v/sastrawi/tokenizer.svg" alt="Packagist version" /></a>
<hr />
<div class="row">
<div class="col-md-6">
<h3>Tokenization</h3>
<p>
Tokenization is the process of splitting sentences into tokens. For example:
</p>
<blockquote>
Ini kalimat pertama. Ini kalimat kedua.
</blockquote>
<p>
is split into:
</p>
<ul>
<li>Ini</li>
<li>kalimat</li>
<li>pertama</li>
<li>.</li>
<li>Ini</li>
<li>kalimat</li>
<li>kedua</li>
<li>.</li>
</ul>
<h3>More About Sastrawi Tokenizer</h3>
<ul>
<li><a href="https://github.com/sastrawi/tokenizer">Source code</a></li>
<li><a href="https://github.com/sastrawi/tokenizer#cara-install">Installation</a></li>
<li><a href="https://github.com/sastrawi/tokenizer#penggunaan">Usage</a></li>
<li><a href="https://github.com/sastrawi/tokenizer/wiki">Wiki</a></li>
<li><a href="https://github.com/sastrawi/tokenizer/issues">Bug Report, Questions, Ideas</a></li>
</ul>
</div>
<div class="col-md-6">
<h3>Demo</h3>
<textarea rows="3" class="form-control" name="txt" id="txt" placeholder="Enter the Indonesian text to split into tokens"></textarea>
<br />
<button class="btn btn-primary" id="btn-tokenize">Split into tokens</button>
<br />
<br />
<blockquote>
<p id="result">
</p>
</blockquote>
<br />
<strong>Examples:</strong>
<ul>
<li><a href="#" class="text-example">Ini ibu Budi. Itu ayah Rudi.</a></li>
<li><a href="#" class="text-example">Jangan makan terlalu banyak! Kamu ingin gemuk?!</a></li>
<li><a href="#" class="text-example">Budi pergi ke Jl. KH. Mukmin no. 67 Surabaya. Dia dipanggil untuk interview.</a></li>
</ul>
<br />
</div>
</div>
</div>
<script>
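// Clicking an example sentence copies it into the textarea and tokenizes it.
$(document).on('click', '.text-example', function(e) {
e.preventDefault();
// Use .text() so HTML entities are not copied into the textarea.
$('#txt').val($(this).text());
$('#btn-tokenize').click();
});
// JSONP callback: kept global because the script returned by the server is expected to call it.
window.callback = function(data) {
// Render the tokens as plain text, separated by spaces.
$('#result').text(data.join(" "));
};
$(document).on('click', '#btn-tokenize', function(e) {
e.preventDefault();
var text = $('#txt').val();
// Collapse line breaks into spaces.
text = text.replace(/(?:\r\n|\r|\n)/g, ' ');
if (!text) {
// Nothing to tokenize: clear the result and skip the request.
$('#result').text('');
return;
}
// Request the tokens via JSONP by injecting a script tag.
var script = document.createElement('script');
script.src = 'http://107.170.70.183/sastrawi-tokenizer/tokenize-jsonp.php?text=' + encodeURIComponent(text);
document.body.appendChild(script);
});
// Re-tokenize one second after the user stops typing.
var timeout;
$(document).on('keydown', '#txt', function() {
clearTimeout(timeout);
timeout = setTimeout(function() {
$('#btn-tokenize').click();
}, 1000);
});
$('#btn-tokenize').click();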
</script>
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');
ga('create', 'UA-50867400-1', 'sastrawi.github.io');
ga('send', 'pageview');
</script>
</body>
</html>