- Architecture
- Basic principles
- Basic usage
- Upload a file (Write File)
- Update a file (Write File)
- Delete a file (Write File)
- View a file (Read File)
- Understanding the file ID
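SeaweedFS was originally built as an object store for handling small files efficiently. Instead of keeping all file metadata in the central master, the central master manages only file volumes and leaves the files and their metadata to the volume servers. This relieves concurrency pressure on the central master and spreads file metadata across the volume servers, which allows faster file access (O(1), usually just one disk read operation). The disk storage overhead for each file's metadata is only 40 bytes.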
Distributed file systems usually split each file into chunks. A central master keeps a mapping from filenames and chunk indices to chunk handles, and also tracks which chunks each chunk server has.
The main drawback is that the central master can't handle many small files efficiently, and since all read requests need to go through the chunk master, it might not scale well for many concurrent users.
Instead of managing chunks, SeaweedFS manages data volumes in the master server. Each data volume is 32GB in size, and can hold a lot of files. And each storage node can have many data volumes. So the master node only needs to store the metadata about the volumes, which is a fairly small amount of data and is generally stable…
From the official documentation: In the current implementation, each volume can hold 32 gibibytes (32GiB or 8x2^32 bytes). This is because we align content to 8 bytes. We can easily increase this to 64GiB, or 128GiB, or more, by changing 2 lines of code, at the cost of some wasted padding space due to alignment.
There can be 4 gibibytes (4GiB or 2^32 bytes) of volumes. So the total system size is 8 x 4GiB x 4GiB which is 128 exbibytes (128EiB or 2^67 bytes).
Each individual file size is limited to the volume size.
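The capacity figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: it reads "4GiB of volumes" as 2^32 possible volume ids and assumes 32-bit offsets counted in 8-byte units, which is what the 8x2^32 figure suggests. The constant names are made up for illustration.

```python
# Sketch of the capacity arithmetic; constant names are illustrative only.
ALIGNMENT_BYTES = 8   # content is aligned to 8 bytes (from the quote)
OFFSET_BITS = 32      # assumption: offsets are 32-bit, counted in aligned units
VOLUME_ID_BITS = 32   # assumption: "4GiB of volumes" means 2^32 volume ids

GIB = 2 ** 30
EIB = 2 ** 60

volume_bytes = ALIGNMENT_BYTES * 2 ** OFFSET_BITS
volume_count = 2 ** VOLUME_ID_BITS
total_bytes = volume_bytes * volume_count

print(volume_bytes // GIB)       # 32  (GiB per volume)
print(total_bytes // EIB)        # 128 (EiB in total)
print(total_bytes == 2 ** 67)    # True

# Doubling the alignment doubles the volume size, at the cost of more padding.
print(16 * 2 ** OFFSET_BITS // GIB)  # 64
```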
From the official documentation: The actual data is stored in volumes on storage nodes. One volume server can have multiple volumes, and can both support read and write access with basic authentication.
The actual file metadata is stored in each volume on volume servers. Since each volume server only manages metadata of files on its own disk, with only 16 bytes for each file, all file access can read file metadata just from memory and only needs one disk operation to actually read file data.
For comparison, consider that an xfs inode structure in Linux is 536 bytes.
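The "metadata in memory, one disk operation per read" idea can be illustrated with a toy model. This is not SeaweedFS's actual on-disk format: `ToyVolume` and `ENTRY` are hypothetical names, an in-memory buffer stands in for the volume file, and the 8-byte key, 4-byte offset, 4-byte size packing is just one layout that happens to add up to 16 bytes.

```python
import io
import struct

# One possible 16-byte index entry: 8-byte key, 4-byte offset, 4-byte size.
# This packing is an assumption made for illustration.
ENTRY = struct.Struct(">QII")


class ToyVolume:
    """Toy append-only volume: one blob 'file' plus an in-memory index."""

    def __init__(self) -> None:
        self.data = io.BytesIO()  # stands in for the volume file on disk
        self.index = {}           # file key -> (offset, size), held in memory

    def write(self, key: int, content: bytes) -> None:
        # Append the content and remember where it landed.
        offset = self.data.seek(0, io.SEEK_END)
        self.data.write(content)
        self.index[key] = (offset, len(content))

    def read(self, key: int) -> bytes:
        offset, size = self.index[key]  # metadata lookup: memory only
        self.data.seek(offset)          # the single "disk" operation
        return self.data.read(size)

    def delete(self, key: int) -> None:
        # Forget the entry; the bytes stay in the file until it is compacted.
        del self.index[key]


volume = ToyVolume()
volume.write(1, b"first photo")
volume.write(2, b"second photo")
print(volume.read(2))  # b'second photo'
print(ENTRY.size)      # 16
```

Writing the same key again appends the new content and repoints the index entry, which is one way to picture an update.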
From the official documentation: All volumes are managed by a master server. The master server contains the volume id to volume server mapping. This is fairly static information, and can be easily cached.
In other words, the actual data lives in data volumes on the storage nodes. One volume server manages multiple volumes and supports read and write access with basic authentication, which improves the concurrency of read and write access.
For the core design principles, see Facebook's paper "Finding a needle in Haystack: Facebook's photo storage".
> ./weed master
> weed volume -dir="/tmp/data1" -max=5 -mserver="localhost:9333" -port=8080 &
> weed volume -dir="/tmp/data2" -max=10 -mserver="localhost:9333" -port=8081 &
By default, the master node runs on port 9333, and the volume nodes run on port 8080. Let’s start one master node, and two volume nodes on port 8080 and 8081. Ideally, they should be started from different machines. We’ll use localhost as an example.
SeaweedFS uses HTTP REST operations to read, write, and delete. The responses are in JSON or JSONP format.
Upload a file (Write File)
To upload a file: first, send an HTTP POST, PUT, or GET request to /dir/assign to get an fid and a volume server url:
> curl http://localhost:9333/dir/assign
{"count":1,"fid":"3,01637037d6","url":"127.0.0.1:8080","publicUrl":"localhost:8080"}
Second, to store the file content, send an HTTP multipart POST request to url + ‘/’ + fid from the response:
> curl -F file=@/home/chris/myphoto.jpg http://127.0.0.1:8080/3,01637037d6
{"name":"myphoto.jpg","size":43234,"eTag":"1cc0118e"}
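The same two-step upload can be sketched in Python using only the standard library. Treat this as an illustration under assumptions: the master address comes from the example setup, the form field name `file` mirrors the `curl -F file=@...` call, and the helper names (`assign`, `upload_url`, `multipart_body`, `upload`) are invented here rather than part of any SeaweedFS client.

```python
import json
import urllib.request
import uuid

MASTER = "http://localhost:9333"  # assumption: master address from the setup above


def assign(master: str = MASTER) -> dict:
    """Step 1: ask the master for an fid and a volume server url."""
    with urllib.request.urlopen(f"{master}/dir/assign") as resp:
        return json.load(resp)


def upload_url(assignment: dict) -> str:
    """Build url + '/' + fid from a /dir/assign response."""
    return f"http://{assignment['url']}/{assignment['fid']}"


def multipart_body(filename: str, content: bytes, boundary: str) -> bytes:
    """Encode one file as a multipart/form-data body under the field 'file'."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + content + tail


def upload(assignment: dict, filename: str, content: bytes) -> dict:
    """Step 2: POST the file content to the assigned volume server."""
    boundary = uuid.uuid4().hex
    request = urllib.request.Request(
        upload_url(assignment),
        data=multipart_body(filename, content, boundary),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

With the master and a volume server running, `upload(assign(), "myphoto.jpg", data)` would perform both steps. Keeping `upload_url` and `multipart_body` free of network calls makes them easy to check on their own.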
Update a file (Write File)
To update, send another POST request with updated file content.
Delete a file (Write File)
For deletion, send an HTTP DELETE request to the same url + ‘/’ + fid URL:
Deleting a file works much like uploading one: access the same URL, but with the DELETE method.
> curl -X DELETE http://127.0.0.1:8080/3,01637037d6
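A Python equivalent of the DELETE call might look like the sketch below. The function names are hypothetical, and the volume server address and fid are passed in by the caller; nothing here is taken from an official client.

```python
import urllib.request


def delete_request(volume_url: str, fid: str) -> urllib.request.Request:
    """Build an HTTP DELETE request for url + '/' + fid."""
    return urllib.request.Request(f"http://{volume_url}/{fid}", method="DELETE")


def delete_file(volume_url: str, fid: str) -> int:
    """Send the DELETE request and return the HTTP status code."""
    with urllib.request.urlopen(delete_request(volume_url, fid)) as resp:
        return resp.status


request = delete_request("127.0.0.1:8080", "3,01637037d6")
print(request.get_method(), request.full_url)
```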
View a file (Read File)
First look up the volume server’s URLs by the file’s volumeId:
> curl "http://localhost:9333/dir/lookup?volumeId=3"
{"volumeId":"3","locations":[{"publicUrl":"localhost:8080","url":"localhost:8080"}]}
Now you can take the public url, render the url or directly read from the volume server via url:
http://localhost:8080/3,01637037d6.jpg
Notice we add a file extension “.jpg” here. It’s optional and just one way for the client to specify the file content type.
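The lookup-then-read flow can be sketched as two small helpers: one builds the lookup URL from an fid, the other turns a lookup response into readable URLs. The helper names are made up for this example, and the sketch assumes the volume id is the part of the fid before the comma.

```python
import urllib.parse


def lookup_url(master: str, fid: str) -> str:
    """Build the /dir/lookup URL for the volume that holds this fid."""
    volume_id = fid.split(",", 1)[0]  # assumption: volume id precedes the comma
    query = urllib.parse.urlencode({"volumeId": volume_id})
    return f"{master}/dir/lookup?{query}"


def read_urls(lookup_response: dict, fid: str, ext: str = "") -> list:
    """Build one readable URL per location; ext is the optional file extension."""
    return [
        f"http://{location['publicUrl']}/{fid}{ext}"
        for location in lookup_response["locations"]
    ]


# Response copied from the lookup example above.
response = {
    "volumeId": "3",
    "locations": [{"publicUrl": "localhost:8080", "url": "localhost:8080"}],
}
print(lookup_url("http://localhost:9333", "3,01637037d6"))
print(read_urls(response, "3,01637037d6", ".jpg"))
```

Since the volume id to server mapping is fairly static, a client could cache the lookup response per volume id instead of asking the master before every read.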
"fid":"3,01637037d6"
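The number 3 before the comma is the volume id, and the 01 right after the comma is the file key. Next comes the file cookie, 637037d6, an unsigned 32-bit integer used to prevent URL guessing. The file key and the file cookie are both hex encoded.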
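Splitting an fid into its parts can be sketched as follows. The sketch assumes the layout of the example fid: a decimal volume id before the comma, then a hex string whose last 8 digits are the 32-bit cookie and whose remaining leading digits are the file key. `Fid` and `parse_fid` are illustrative names, not an official API.

```python
from typing import NamedTuple


class Fid(NamedTuple):
    volume_id: int
    file_key: int
    cookie: int


def parse_fid(fid: str) -> Fid:
    """Split 'volume,keycookie' into its three numeric parts."""
    volume, _, rest = fid.partition(",")
    # The cookie is 32 bits, i.e. the last 8 hex digits; the key needs at least one.
    if len(rest) <= 8:
        raise ValueError(f"malformed fid: {fid!r}")
    return Fid(
        volume_id=int(volume),         # assumption: volume id is written in decimal
        file_key=int(rest[:-8], 16),   # leading hex digits
        cookie=int(rest[-8:], 16),     # trailing 8 hex digits
    )


parsed = parse_fid("3,01637037d6")
print(parsed.volume_id, parsed.file_key, hex(parsed.cookie))  # 3 1 0x637037d6
```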
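Once you have obtained the fid from the master server, you can save it (here 3,01637037d6) in whatever storage suits your needs, such as Redis, MySQL, or a plain text file, and in whatever format you prefer.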