forked from zihuaye/3xsd
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path3fs_white_paper.txt
57 lines (31 loc) · 3.3 KB
/
3fs_white_paper.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
3fs is designed to solve the problem of mass files storage at a cluster of servers, with simple HTTP protocol to access, easy to expand and none centralized architecture.
At internet, most files are stored at web servers, can access using base HTTP protocol: GET/HEAD/PUT/DELETE method. A cluster of servers with WebDAV support, and a set of "routing" servers(3fsd), together make up of a 3fs.
user ----- 3fsd ------ web server
|--------- .....
user ----- 3fsd ------ web server
|--------- web server
3fsd ------ .....
|--------- web server
Architecture is simple: 3fsd as a "router" or "proxy", locating the uri, forward it to corresponding storage server, return response to user and cache it. GET/HEAD will be forward unconditionally, PUT/DELETE will be forwarded only beening authorized(base on client ip).
URI location algorithm
Let's take an example, accessing a 3fs file /path/to/file should use a uri in this form:
http://a.net/_3fs_0/path/to/file
"_3fs" is called the 3fs prefix, uri with it inside, identifying a access to a 3fs file.
"_0", number 0 called stage number, range 0 to max of interger.
a set of server running for sometime, called a stage, it's a relatively fixed state. server numbers, configs, etc.
When you expand your cluster, let say: add some servers to expand storage capability, stage plus one, the adding servers belong to this stage. Plusing stage procedure should not rollback.
When a file been added to cluster, use the current stage num to make the uri. As above, /path/to/file beening added at stage 0.
In 3fs config, also has a item called "region". How many servers in your one stage? region=4096 means you can add/expand max 4096 servers at a time, it can infect the calculating algorithm, should be fixed in 3fs live time, be careful to chose it
Ok, here is how the location algorithm does:
/path/to/file will be calculated to a md5: b4a91649090a2784056565363583d067
assumed that region=256(FF), stage=0, and we have 10 servers at stage 0.
region mask FF0000000000000000000000000000, we got "b4", the first 2 hex num in md5.
0xb4=180, 180/(256/10) = 7.03125, rounded to 7, server 7(0-9), will have it.
Redundancy=2, we have 2 copy of data storaged. How to locate all of them?
As above, the second copy's location is at "a9", the second 2 hex num in md5, 6.601, rounded at server 6. If it's same with first one, just has the server num plus one, server 8 then.
In theory, with region=256, we can have redundancy=16, with region=4096, redundancy=10.
As you can see, at a stage, fixed number of servers, 3fsd can locate the file exactly, algorithm execution time O(1).
When expanding the cluster, old files with old stage num in uri, can be located normally, also the new files with new stage num.
When file is changed, it keeps the stage num original. When deleted and adding again, use the new stage num.
Of cause, a server can belong to different stage, server A can belong to stage 0,1,2, if it's capacity is power enough.
The pool=... config item, is a list of servers at different stage, should be kept fixed at stage level, let say, you can't add a server to a stage randomly, adding servers must leading the stage num to increase. It dosen't matter of the sequence of servers, 3fsd will sort them at stage level.