webhdfs is a set of Go bindings for Hadoop HDFS via its WebHDFS interface.
It provides typed access to remote HDFS resources via Go's JSON marshaling system. It follows the WebHDFS JSON protocol outlined in http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/WebHDFS.html and has been tested with the Apache Hadoop 2.x series.
GoDoc documentation - https://godoc.org/github.com/gohadoop/webhdfs
Install the package with:
```shell
go get github.com/gohadoop/webhdfs
```
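A minimal program that connects to HDFS and prints a file's checksum:
```go
package main

import (
	"fmt"
	"log"

	"github.com/gohadoop/webhdfs"
)

func main() {
	// Connect to the NameNode's WebHDFS endpoint as user "hdfs".
	fs, err := webhdfs.NewFileSystem(webhdfs.Configuration{Addr: "localhost:50070", User: "hdfs"})
	if err != nil {
		log.Fatal(err)
	}
	// Ask HDFS for the checksum of a remote file.
	checksum, err := fs.GetFileChecksum(webhdfs.Path{Name: "location/to/file"})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(checksum)
}
```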
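To illustrate what "typed access via Go's JSON marshaling" means in practice, here is a sketch that decodes a response in the shape the WebHDFS specification documents for the GETFILECHECKSUM operation (which GetFileChecksum presumably wraps). The `fileChecksum` type and `decodeChecksum` function are hypothetical and not part of the webhdfs package; the sample body is made up, with a placeholder in the `bytes` field.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// fileChecksum mirrors the "FileChecksum" object returned by the WebHDFS
// GETFILECHECKSUM operation. Hypothetical type, for illustration only.
type fileChecksum struct {
	Algorithm string `json:"algorithm"`
	Bytes     string `json:"bytes"`
	Length    int    `json:"length"`
}

// decodeChecksum unmarshals a GETFILECHECKSUM response body into a typed value.
func decodeChecksum(body []byte) (fileChecksum, error) {
	var resp struct {
		FileChecksum fileChecksum `json:"FileChecksum"`
	}
	err := json.Unmarshal(body, &resp)
	return resp.FileChecksum, err
}

func main() {
	// Made-up sample body in the documented shape; "bytes" is a placeholder.
	body := []byte(`{"FileChecksum":{"algorithm":"MD5-of-1MD5-of-512CRC32","bytes":"00ff","length":28}}`)
	cs, err := decodeChecksum(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(cs.Algorithm, cs.Length)
}
```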
For more examples of the API in use, see the test-hdfs directory at https://github.com/gohadoop/webhdfs/tree/master/test-hdfs. Compile that code and run it against a running HDFS deployment.
Before connecting, make sure your cluster is configured as follows:
- Enable the dfs.webhdfs.enabled property in your hdfs-site.xml.
- Ensure the hadoop.http.staticuser.user property is set in your core-site.xml.
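Once WebHDFS is enabled, the NameNode serves REST calls at URLs of the form `http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=...`, as described in the WebHDFS specification. With security off, the caller is identified by the `user.name` query parameter; if it is absent, the server may fall back to a default web user, which is presumably why hadoop.http.staticuser.user matters here. The helper below is a hypothetical sketch of that URL shape (it is not part of the webhdfs package) and can be handy for checking a cluster by hand.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// webhdfsURL builds a WebHDFS REST URL of the documented form
// http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=<OP>[&user.name=<USER>].
// Hypothetical helper, not part of the webhdfs package.
func webhdfsURL(addr, path, op, user string, extra url.Values) string {
	q := url.Values{}
	for k, vs := range extra {
		q[k] = vs
	}
	q.Set("op", op)
	if user != "" {
		// With SIMPLE (pseudo) authentication the caller is identified
		// by the user.name query parameter.
		q.Set("user.name", user)
	}
	u := url.URL{
		Scheme:   "http",
		Host:     addr,
		Path:     "/webhdfs/v1/" + strings.TrimPrefix(path, "/"),
		RawQuery: q.Encode(), // Encode sorts parameters by key
	}
	return u.String()
}

func main() {
	fmt.Println(webhdfsURL("localhost:50070", "/tmp", "GETFILESTATUS", "hdfs", nil))
}
```

Issuing a GET to the printed URL (for example with curl) should return a JSON FileStatus object if WebHDFS is enabled.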
webhdfs lets you access HDFS resources via two structs: FileSystem and FsShell. Use FileSystem for access to low-level calls. FsShell is designed to provide a higher level of abstraction and integration with the local file system.
Use the Configuration{} struct to specify parameters for the file system. You can create a configuration either with a Configuration{} literal or with NewConfiguration() for defaults.
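```go
// NewConfiguration returns a *Configuration holding defaults;
// dereference it to work with a value.
conf := *webhdfs.NewConfiguration()
conf.Addr = "localhost:50070"             // NameNode HTTP address
conf.User = "hdfs"                        // user name for SIMPLE authentication
conf.ConnectionTimeout = time.Second * 15 // requires the "time" package
conf.DisableKeepAlives = false
```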
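The timeout and keep-alive settings look like knobs for the underlying HTTP connection. As a hedged sketch, assuming a connection timeout bounds the TCP dial and DisableKeepAlives maps onto net/http's field of the same name, this is how such options could be applied to a standard `*http.Client`. The `clientOptions` type and both functions are hypothetical; this is not a description of how webhdfs wires its Configuration internally.

```go
package main

import (
	"fmt"
	"net"
	"net/http"
	"time"
)

// clientOptions is a hypothetical stand-in for the connection settings in
// webhdfs.Configuration; it is not the package's own type.
type clientOptions struct {
	ConnectTimeout    time.Duration // bound on establishing the TCP connection
	DisableKeepAlives bool          // open a fresh connection per request
}

// newTransport maps the options onto an *http.Transport.
func newTransport(opts clientOptions) *http.Transport {
	dialer := &net.Dialer{Timeout: opts.ConnectTimeout}
	return &http.Transport{
		DialContext:       dialer.DialContext,
		DisableKeepAlives: opts.DisableKeepAlives,
	}
}

// newHTTPClient wraps the transport in a client usable for WebHDFS calls.
func newHTTPClient(opts clientOptions) *http.Client {
	return &http.Client{Transport: newTransport(opts)}
}

func main() {
	client := newHTTPClient(clientOptions{ConnectTimeout: 15 * time.Second})
	fmt.Printf("%T\n", client.Transport)
}
```

Keeping keep-alives enabled (the setting used above) lets repeated calls reuse TCP connections to the NameNode.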
Create a new FileSystem{} struct before you call any of its methods. You create the FileSystem by passing in a Configuration value, as shown below.
```go
fs, err := webhdfs.NewFileSystem(conf)
```
Now you are ready to communicate with HDFS.
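The FileSystem methods described next appear, from their names, to correspond to individual WebHDFS REST operations. The table below pairs them up; the HTTP verbs and `op` values come from the WebHDFS specification, while the pairing with the Go methods is an assumption based on naming.

```go
package main

import (
	"fmt"
	"sort"
)

// restOp describes one WebHDFS REST call: its HTTP verb and op value.
type restOp struct {
	Method string
	Op     string
}

// fsOps pairs FileSystem method names with the WebHDFS operations they
// appear to wrap. The pairing is an assumption based on method names.
var fsOps = map[string]restOp{
	"Create":          {"PUT", "CREATE"},
	"Open":            {"GET", "OPEN"},
	"Append":          {"POST", "APPEND"},
	"Rename":          {"PUT", "RENAME"},
	"Delete":          {"DELETE", "DELETE"},
	"GetFileStatus":   {"GET", "GETFILESTATUS"},
	"ListStatus":      {"GET", "LISTSTATUS"},
	"GetFileChecksum": {"GET", "GETFILECHECKSUM"},
}

func main() {
	// Print the table in a stable, alphabetical order.
	names := make([]string, 0, len(fsOps))
	for name := range fsOps {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		op := fsOps[name]
		fmt.Printf("%-16s %-6s op=%s\n", name, op.Method, op.Op)
	}
}
```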
FileSystem.Create() creates and stores a remote file on the HDFS server. See https://godoc.org/github.com/gohadoop/webhdfs#FileSystem.Create
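```go
ok, err := fs.Create(
	bytes.NewBufferString("Hello webhdfs users!"), // file content
	webhdfs.Path{Name: "/remote/file"},            // destination path
	false,                                         // overwrite
	0,                                             // blocksize
	0,                                             // replication
	0700,                                          // permission
	0,                                             // buffersize
)
```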
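According to the WebHDFS specification, CREATE is a two-step exchange: the client sends a PUT with `op=CREATE` and no data to the NameNode, which answers 307 Temporary Redirect with a DataNode URL in the `Location` header; the client then PUTs the file content to that URL and gets 201 Created. The sketch below reproduces the exchange with net/http against in-process fake servers from net/http/httptest. `createFile` and `runDemo` are hypothetical and are not claimed to match the webhdfs implementation.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// createFile performs the two-step WebHDFS CREATE exchange. nnURL is the
// NameNode URL including the op=CREATE query. Hypothetical helper.
func createFile(nnURL string, data []byte) (int, error) {
	// Do not follow redirects automatically: step 2 must carry the data.
	client := &http.Client{
		CheckRedirect: func(*http.Request, []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}

	// Step 1: PUT without a body; the NameNode answers 307 with a Location.
	req, err := http.NewRequest(http.MethodPut, nnURL, nil)
	if err != nil {
		return 0, err
	}
	resp, err := client.Do(req)
	if err != nil {
		return 0, err
	}
	resp.Body.Close()
	if resp.StatusCode != http.StatusTemporaryRedirect {
		return resp.StatusCode, fmt.Errorf("expected 307 from NameNode, got %d", resp.StatusCode)
	}

	// Step 2: PUT the file content to the DataNode named in Location.
	req, err = http.NewRequest(http.MethodPut, resp.Header.Get("Location"), bytes.NewReader(data))
	if err != nil {
		return 0, err
	}
	resp, err = client.Do(req)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil // WebHDFS replies 201 Created on success
}

// runDemo wires createFile to fake NameNode and DataNode servers and returns
// the final status plus the bytes the fake DataNode received.
func runDemo(content string) (int, string, error) {
	received := make(chan []byte, 1)
	datanode := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, _ := io.ReadAll(r.Body)
		received <- body
		w.WriteHeader(http.StatusCreated)
	}))
	defer datanode.Close()

	namenode := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		// Point the client at the DataNode, keeping path and query.
		w.Header().Set("Location", datanode.URL+r.URL.Path+"?"+r.URL.RawQuery)
		w.WriteHeader(http.StatusTemporaryRedirect)
	}))
	defer namenode.Close()

	status, err := createFile(namenode.URL+"/webhdfs/v1/remote/file?op=CREATE&user.name=hdfs", []byte(content))
	if err != nil {
		return status, "", err
	}
	return status, string(<-received), nil
}

func main() {
	status, stored, err := runDemo("Hello webhdfs users!")
	if err != nil {
		panic(err)
	}
	fmt.Println(status, stored)
}
```

Disabling automatic redirects is the key design point: if the client followed the 307 on its own, the data would have to be sent to the NameNode first.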
Use FileSystem.Open() to open and read a remote file from HDFS. See https://godoc.org/github.com/gohadoop/webhdfs#FileSystem.Open
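```go
// Read up to 512 bytes starting at offset 0, using a 2048-byte buffer.
data, err := fs.Open(webhdfs.Path{Name: "/remote/file"}, 0, 512, 2048)
if err != nil {
	log.Fatal(err)
}
rcvdData, _ := io.ReadAll(data)
fmt.Println(string(rcvdData))
```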
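The three numeric arguments to Open line up with the optional `offset`, `length`, and `buffersize` parameters of the WebHDFS OPEN operation, so a call reads a byte range rather than necessarily the whole file. The hypothetical `openURL` helper below (not part of webhdfs) shows the request URL such a call would correspond to, assuming the path starts with "/".

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// openURL builds a WebHDFS OPEN URL for a byte range. Hypothetical helper;
// parameter names follow the WebHDFS OPEN operation. path must start with "/".
func openURL(addr, path, user string, offset, length int64, bufferSize int) string {
	q := url.Values{}
	q.Set("op", "OPEN")
	q.Set("user.name", user)
	q.Set("offset", strconv.FormatInt(offset, 10))
	q.Set("length", strconv.FormatInt(length, 10))
	q.Set("buffersize", strconv.Itoa(bufferSize))
	return "http://" + addr + "/webhdfs/v1" + path + "?" + q.Encode()
}

func main() {
	// Same range as the Open call above: 512 bytes from offset 0, 2048-byte buffer.
	fmt.Println(openURL("localhost:50070", "/remote/file", "hdfs", 0, 512, 2048))
}
```

Per the specification, a GET to this URL is redirected (307) to a DataNode; net/http's default client follows that redirect for GET requests automatically.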
To append to an existing HDFS file, use FileSystem.Append(). See https://godoc.org/github.com/gohadoop/webhdfs#FileSystem.Append
```go
ok, err := fs.Append(
	bytes.NewBufferString("Hello webhdfs users!"),
	webhdfs.Path{Name: "/remote/file"},
	4096, // buffersize
)
```
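In the WebHDFS specification, APPEND mirrors CREATE but uses POST: a bodyless POST with `op=APPEND` goes to the NameNode, which redirects (307) to a DataNode, and the data is then POSTed to the `Location` URL. The two hypothetical request builders below (not part of webhdfs) show both steps without touching the network; the DataNode address in `main` is a placeholder.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"strconv"
	"strings"
)

// newAppendRequest builds step 1 of a WebHDFS APPEND: a bodyless POST to the
// NameNode. Hypothetical helper; path must start with "/".
func newAppendRequest(addr, path, user string, bufferSize int) (*http.Request, error) {
	q := url.Values{}
	q.Set("op", "APPEND")
	q.Set("buffersize", strconv.Itoa(bufferSize))
	q.Set("user.name", user)
	return http.NewRequest(http.MethodPost, "http://"+addr+"/webhdfs/v1"+path+"?"+q.Encode(), nil)
}

// newAppendDataRequest builds step 2: a POST carrying the data to the
// DataNode URL taken from the 307 response's Location header.
func newAppendDataRequest(location string, data io.Reader) (*http.Request, error) {
	return http.NewRequest(http.MethodPost, location, data)
}

func main() {
	req, err := newAppendRequest("localhost:50070", "/remote/file", "hdfs", 4096)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())

	// Placeholder DataNode URL standing in for the real Location header.
	dataReq, err := newAppendDataRequest("http://datanode.example:50075/webhdfs/v1/remote/file?op=APPEND",
		strings.NewReader("Hello webhdfs users!"))
	if err != nil {
		panic(err)
	}
	fmt.Println(dataReq.Method, dataReq.ContentLength)
}
```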
Use FileSystem.Rename() to rename HDFS resources. See https://godoc.org/github.com/gohadoop/webhdfs#FileSystem.Rename
```go
ok, err := fs.Rename(webhdfs.Path{Name: "/old/name"}, webhdfs.Path{Name: "/new/name"})
```
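In the WebHDFS protocol, RENAME is a PUT on the source path with the destination passed as a `destination` query parameter, and the server answers with a JSON body of the form `{"boolean": true}`; the `ok` result presumably reflects that value. The hypothetical `renameURL` helper below (not part of webhdfs) shows the request URL; note that the destination path is percent-encoded because it travels in the query string.

```go
package main

import (
	"fmt"
	"net/url"
)

// renameURL builds a WebHDFS RENAME URL. src must start with "/"; dst is an
// absolute HDFS path sent as a query parameter, so it is percent-encoded.
// Hypothetical helper, not part of the webhdfs package.
func renameURL(addr, src, dst, user string) string {
	q := url.Values{}
	q.Set("op", "RENAME")
	q.Set("destination", dst)
	q.Set("user.name", user)
	return "http://" + addr + "/webhdfs/v1" + src + "?" + q.Encode()
}

func main() {
	// Send this URL with an HTTP PUT.
	fmt.Println(renameURL("localhost:50070", "/old/name", "/new/name", "hdfs"))
}
```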
To delete an HDFS resource (file or directory), use FileSystem.Delete(). See https://godoc.org/github.com/gohadoop/webhdfs#FileSystem.Delete
```go
// The second argument requests a recursive delete.
ok, err := fs.Delete(webhdfs.Path{Name: "/remote/file/todelete"}, false)
```
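The WebHDFS DELETE operation is an HTTP DELETE with an optional `recursive` parameter, which must be true to remove a non-empty directory; like RENAME, it replies with `{"boolean": ...}`. This hypothetical sketch (not the webhdfs implementation) builds the request and decodes that reply.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
	"strconv"
)

// newDeleteRequest builds a WebHDFS DELETE request. Hypothetical helper;
// path must start with "/".
func newDeleteRequest(addr, path, user string, recursive bool) (*http.Request, error) {
	q := url.Values{}
	q.Set("op", "DELETE")
	q.Set("recursive", strconv.FormatBool(recursive))
	q.Set("user.name", user)
	return http.NewRequest(http.MethodDelete, "http://"+addr+"/webhdfs/v1"+path+"?"+q.Encode(), nil)
}

// parseBoolean decodes the {"boolean": ...} body returned by DELETE.
func parseBoolean(body []byte) (bool, error) {
	var resp struct {
		Boolean bool `json:"boolean"`
	}
	err := json.Unmarshal(body, &resp)
	return resp.Boolean, err
}

func main() {
	req, err := newDeleteRequest("localhost:50070", "/remote/file/todelete", "hdfs", false)
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())

	ok, err := parseBoolean([]byte(`{"boolean": true}`))
	fmt.Println(ok, err)
}
```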
You can get the status of an existing HDFS resource using FileSystem.GetFileStatus(). See https://godoc.org/github.com/gohadoop/webhdfs#FileSystem.GetFileStatus
```go
fileStatus, err := fs.GetFileStatus(webhdfs.Path{Name: "/remote/file"})
```
webhdfs returns a value of type FileStatus, a struct with information about the remote file:
```go
type FileStatus struct {
	AccesTime        int64
	BlockSize        int64
	Group            string
	Length           int64
	ModificationTime int64
	Owner            string
	PathSuffix       string
	Permission       string
	Replication      int64
	Type             string
}
```
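These fields correspond to the JSON object that the WebHDFS GETFILESTATUS operation returns. The sketch below decodes a sample response in the documented shape into a locally declared, hypothetical `fileStatus` type (webhdfs ships its own FileStatus) and converts two fields that are easy to misread: `permission` is an octal string, and `modificationTime` is in milliseconds since the Unix epoch.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strconv"
	"time"
)

// fileStatus mirrors the documented GETFILESTATUS JSON fields. Declared here
// for illustration only.
type fileStatus struct {
	AccessTime       int64  `json:"accessTime"`
	BlockSize        int64  `json:"blockSize"`
	Group            string `json:"group"`
	Length           int64  `json:"length"`
	ModificationTime int64  `json:"modificationTime"` // ms since the Unix epoch
	Owner            string `json:"owner"`
	PathSuffix       string `json:"pathSuffix"`
	Permission       string `json:"permission"` // octal string such as "755"
	Replication      int64  `json:"replication"`
	Type             string `json:"type"` // "FILE", "DIRECTORY" or "SYMLINK"
}

// decodeStatus unmarshals a GETFILESTATUS response body.
func decodeStatus(body []byte) (fileStatus, error) {
	var resp struct {
		FileStatus fileStatus `json:"FileStatus"`
	}
	err := json.Unmarshal(body, &resp)
	return resp.FileStatus, err
}

// mode parses the octal permission string into an os.FileMode.
func (s fileStatus) mode() (os.FileMode, error) {
	p, err := strconv.ParseUint(s.Permission, 8, 32)
	return os.FileMode(p), err
}

// modTime converts ModificationTime (milliseconds) into a time.Time.
func (s fileStatus) modTime() time.Time {
	return time.UnixMilli(s.ModificationTime)
}

func main() {
	body := []byte(`{"FileStatus":{"accessTime":0,"blockSize":0,"group":"supergroup","length":0,
		"modificationTime":1320173277227,"owner":"webuser","pathSuffix":"","permission":"777",
		"replication":0,"type":"DIRECTORY"}}`)
	st, err := decodeStatus(body)
	if err != nil {
		panic(err)
	}
	m, _ := st.mode()
	fmt.Println(st.Type, st.Owner, m, st.modTime().UTC())
}
```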
You can list the file statuses of a directory's entries using FileSystem.ListStatus():
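```go
stats, err := fs.ListStatus(webhdfs.Path{Name: "/remote/directory"})
if err != nil {
	log.Fatal(err)
}
for _, stat := range stats {
	// PathSuffix is the entry's name relative to the listed directory.
	fmt.Println(stat.PathSuffix, stat.Length)
}
```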
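On the wire, the WebHDFS LISTSTATUS operation wraps the entries as `{"FileStatuses": {"FileStatus": [...]}}`. This sketch decodes only the fields it needs into a hypothetical `entry` type and totals the size of regular files, skipping directories; the sample listing is made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// entry holds the subset of FileStatus fields used here. Hypothetical type.
type entry struct {
	PathSuffix string `json:"pathSuffix"`
	Length     int64  `json:"length"`
	Type       string `json:"type"`
}

// decodeListing unmarshals a LISTSTATUS response body.
func decodeListing(body []byte) ([]entry, error) {
	var resp struct {
		FileStatuses struct {
			FileStatus []entry `json:"FileStatus"`
		} `json:"FileStatuses"`
	}
	err := json.Unmarshal(body, &resp)
	return resp.FileStatuses.FileStatus, err
}

// totalFileBytes sums Length over entries whose Type is "FILE".
func totalFileBytes(entries []entry) int64 {
	var total int64
	for _, e := range entries {
		if e.Type == "FILE" {
			total += e.Length
		}
	}
	return total
}

func main() {
	body := []byte(`{"FileStatuses":{"FileStatus":[
		{"pathSuffix":"a.txt","length":10,"type":"FILE"},
		{"pathSuffix":"sub","length":0,"type":"DIRECTORY"},
		{"pathSuffix":"b.txt","length":5,"type":"FILE"}]}}`)
	entries, err := decodeListing(body)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Println(e.PathSuffix, e.Type, e.Length)
	}
	fmt.Println("total file bytes:", totalFileBytes(entries))
}
```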
To create an FsShell, you need an existing FileSystem instance:
```go
shell := webhdfs.FsShell{FileSystem: fs}
```
Use Put to upload a local file to HDFS. See https://godoc.org/github.com/gohadoop/webhdfs#FsShell.PutOne
```go
ok, err := shell.Put("local/file/name", "hdfs/file/path", true)
```
Use Get to copy a remote HDFS file to the local file system. See https://godoc.org/github.com/gohadoop/webhdfs#FsShell.Get
```go
ok, err := shell.Get("hdfs/file/path", "local/file/name")
```
Use AppendToFile to append local files to a remote HDFS file or directory. See https://godoc.org/github.com/gohadoop/webhdfs#FsShell.AppendToFile
```go
ok, err := shell.AppendToFile([]string{"local/file/1", "local/file/2"}, "remote/hdfs/path")
```
Use Chown to change the owner of remote HDFS files. See https://godoc.org/github.com/gohadoop/webhdfs#FsShell.Chown
```go
ok, err := shell.Chown([]string{"/remote/hdfs/file"}, "owner2")
```
Use Chgrp to change the group of remote HDFS files. See https://godoc.org/github.com/gohadoop/webhdfs#FsShell.Chgrp
```go
ok, err := shell.Chgrp([]string{"/remote/hdfs/file"}, "superduper")
```
Use Chmod to change the permission mode of remote HDFS files. See https://godoc.org/github.com/gohadoop/webhdfs#FsShell.Chmod
```go
ok, err := shell.Chmod([]string{"/remote/hdfs/file/"}, 0744)
```
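Chown, Chgrp, and Chmod presumably map onto the WebHDFS SETOWNER and SETPERMISSION operations, issued once per path. In the specification, SETOWNER takes optional `owner` and `group` parameters, and SETPERMISSION takes the mode as an octal string. The hypothetical helpers below (not part of webhdfs) build those query strings; note how a Go `os.FileMode` of 0744 becomes the string "744".

```go
package main

import (
	"fmt"
	"net/url"
	"os"
	"strconv"
)

// setOwnerQuery builds the query for SETOWNER. An empty owner or group is
// omitted, so a Chgrp-style call sends only the group. Hypothetical helper.
func setOwnerQuery(owner, group string) string {
	q := url.Values{}
	q.Set("op", "SETOWNER")
	if owner != "" {
		q.Set("owner", owner)
	}
	if group != "" {
		q.Set("group", group)
	}
	return q.Encode()
}

// setPermissionQuery builds the query for SETPERMISSION, rendering the
// permission bits as an octal string. Hypothetical helper.
func setPermissionQuery(mode os.FileMode) string {
	q := url.Values{}
	q.Set("op", "SETPERMISSION")
	q.Set("permission", strconv.FormatUint(uint64(mode.Perm()), 8))
	return q.Encode()
}

func main() {
	fmt.Println(setOwnerQuery("owner2", ""))     // Chown-style
	fmt.Println(setOwnerQuery("", "superduper")) // Chgrp-style
	fmt.Println(setPermissionQuery(0744))        // Chmod-style
}
```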
Current limitations:
- Only the "SIMPLE" security mode is supported.
- No Kerberos support (none planned at this time).
- No SSL support yet.