# vim:sw=2:sts=2:

TODO
====

Legend:
- [ ] not started
- [-] in-progress
- [x] done
- [~] cancelled

In-progress
-----------
- [-] timeline limits
  - [x] by time range
  - [ ] by msg count
    - [ ] per peer
    - [ ] total
  Not necessary for short format, because we have Unix head/tail,
  but may be convenient for long format (because msg spans multiple lines).
- [-] Convert to Typed Racket
  - [x] build executable (otherwise too slow)
  - [-] add signatures
    - [x] top-level
    - [ ] inner
    - [ ] imports
- [-] commands:
  - [x] c | crawl
    Discover new peers mentioned by known peers.
  - [x] r | read
    - see timeline ops above
  - [ ] w | write
    - arg or stdin
    - nick expand to URI
    - Watch FIFO for lines, then read, timestamp and append [+ upload].
      Can be part of a "live" mode, along with background polling and
      incremental printing. Sort of an ii-like IRC experience.
  - [ ] q | query
    - see timeline ops above
    - see hashtag and channels above
  - [x] d | download
    - [ ] options:
      - [ ] all - use all known peers
      - [ ] fast - all except peers known to be slow or unavailable
        REQUIRES: stats
  - [x] u | upload
    - calls user-configured command to upload user's own timeline file to their server

  Looks like a better CLI parser than "racket/cmdline": https://docs.racket-lang.org/natural-cli/
  But it is no longer necessary now that I've figured out how to chain (command-line ..) calls.
- [-] Output formats:
  - [x] text long
  - [x] text short
  - [ ] HTML
  - [ ] JSON
- [-] Peer discovery
  - [-] parse peer refs from peer timelines
    - [x] mentions from timeline messages
      - [x] @<source.nick source.url>
      - [x] @<source.url>
    - [ ] "following" from timeline comments: # following = <nick> <uri>
      1. split file lines in 2 groups: comments and messages
      2. dispatch messages parsing as usual
      3. dispatch comments parsing for:
         - # following = <nick> <uri>
         - what else?
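Both ref sources (mentions in messages, "# following" in comments) reduce to a couple of regexes. An illustrative Python sketch, not tt's actual Racket implementation; the regexes and names here are assumptions:

```python
import re

# Illustrative sketch (tt itself is Racket): extract peer refs from a
# timeline's lines. Both regexes are assumptions, not tt's actual ones.
MENTION_RE = re.compile(r"@<(?:(\S+)\s+)?(\S+://\S+)>")  # @<nick url> or @<url>
FOLLOWING_RE = re.compile(r"^#\s*following\s*=\s*(\S+)\s+(\S+)\s*$")

def peer_refs(lines):
    refs = []
    for line in lines:
        if line.startswith("#"):       # comment: look for "# following = ..."
            m = FOLLOWING_RE.match(line)
            if m:
                refs.append((m.group(1), m.group(2)))
        else:                          # message: collect @<...> mentions
            for nick, url in MENTION_RE.findall(line):
                refs.append((nick or None, url))
    return refs
```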
  - [ ] Parse User-Agent web access logs.
- [-] Update peer ref file(s)
  - [x] peers-all
  - [x] peers-mentioned
  - [ ] peers-followed (by others, parsed from comments)
  - [ ] peers-down (net errors)
  - [ ] redirects?
Rough sketch from late 2019:

  let read file =
    ...

  let write file peers =
    ...

  let fetch peer =
    (* Fetch could mean either or both of:
     * - fetch peer's we-are-twtxt.txt
     * - fetch peer's twtxt.txt and extract mentioned peer URIs
     *)
    ...

  let test peers =
    ...

  let rec discover peers_old =
    let peers_all =
      Set.fold peers_old ~init:peers_old ~f:(fun peers p ->
        match fetch p with
        | Error _ ->
            (* TODO: Should p be moved to down set here? *)
            log_warning ...;
            peers
        | Ok peers_fetched ->
            Set.union peers peers_fetched
      )
    in
    (* Fixed point: stop when a pass discovers no peers beyond peers_old. *)
    if Set.is_empty (Set.diff peers_all peers_old) then
      peers_all
    else
      discover peers_all

  let rec loop interval peers_old =
    let peers_all = discover peers_old in
    let (peers_up, peers_down) = test peers_all in
    write "peers-all.txt" peers_all;
    write "peers-up.txt" peers_up;
    write "peers-down.txt" peers_down;
    sleep interval;
    loop interval peers_all

  let () =
    loop (int_of_string Sys.argv.(1)) (read "peers-all.txt")

Backlog
-------
- [ ] Support date without time in timestamps
- [ ] Associate cached object with nick.
- [ ] Crawl downloaded web access logs
  - [ ] download-command hook to grab the access logs

    (define (parse log-line)
      (match (regexp-match #px"([^/]+)/([^ ]+) +\\(\\+([a-z]+://[^;]+); *@([^\\)]+)\\)" log-line)
        [(list _ client version uri nick) (cons nick uri)]
        [_ #f]))
    (list->set (filter-map parse (file->lines "logs/combined-access.log")))
    (filter (λ (p) (equal? 'file (file-or-directory-type p))) (directory-list logs-dir))

- [ ] user-agent file as CLI option - need to run at least the crawler as another user
- [ ] Support fetching rsync URIs
- [ ] Check for peer duplicates:
  - [ ] same nick for N>1 URIs
  - [ ] same URI for N>1 nicks
- [ ] Background polling and incremental timeline updates.
  We can mark which messages have already been printed and print new ones as
  they come in.
  REQUIRES: polling
- [ ] Polling mode/command, where tt periodically polls peer timelines
- [ ] nick tiebreaker(s)
  - [ ] some sort of a hash of URI?
  - [ ] angry-purple-tiger kind of thingie?
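One possible shape for the hash-of-URI tiebreaker, sketched in Python (tt is Racket; the "~" separator and 8-hex-digit suffix are arbitrary illustrative choices):

```python
import hashlib

# Sketch of a nick tiebreaker: two peers claiming the same nick get
# distinct, deterministic names derived from their URIs.
def disambiguate(nick: str, uri: str) -> str:
    suffix = hashlib.sha256(uri.encode("utf-8")).hexdigest()[:8]
    return f"{nick}~{suffix}"
```

Deterministic, so repeated runs render the same display name for the same peer; an angry-purple-tiger-style scheme would map the same digest to words instead of hex.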
- [ ] P2P nick registration?
  - [ ] Peers vote by claiming to have seen a nick->uri mapping?
    The inherent race condition would be a feature, since all user name
    registrations are races.
    REQUIRES: blockchain
- [ ] stats
  - [ ] download times per peer
- [ ] Support redirects
  - should permanent redirects update the peer ref somehow?
- [ ] optional text wrap
- [ ] write
- [ ] peer refs set operations (perhaps better done externally?)
- [ ] timeline as a result of a query (peer ref set op + filter expressions)
- [ ] config files
- [ ] highlight mentions
- [ ] filter on mentions
- [ ] highlight hashtags
- [ ] filter on hashtags
- [ ] hashtags as channels? initial hashtag special?
- [ ] query language
- [ ] console logger colors by level ('error)
- [ ] file logger ('debug)
- [ ] Support immutable timelines
  - store individual messages
  - where?
    - something like DBM or SQLite - faster
    - filesystem - transparent, easily published - probably best
- [ ] block(chain/tree) of twtxts
  - distributed twtxt.db
  - each twtxt.txt is a ledger
  - peers can verify states of ledgers
  - peers can publish known nick->url mappings
  - peers can vote on nick->url mappings
  - we could break time periods into blocks
  - how to handle the fact that many (most?) twtxts are unseen by peers
    - longest X wins?

Done
----
- [x] Crawl all cache/objects/*, not given peers.
- [x] Support time ranges (i.e. reading the timeline between given time points)
- [x] Dedup read-in peers before using them.
- [x] Prevent redundant downloads
  - [x] Check ETag
  - [x] Check Last-Modified if no ETag was provided
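The redundant-download check amounts to sending the right validator headers on the next GET. A Python sketch of the decision (the cached-metadata key names are assumptions, not tt's actual ones):

```python
# Sketch: pick conditional-request headers from cached response metadata,
# preferring ETag (If-None-Match) over Last-Modified (If-Modified-Since).
# A 304 Not Modified reply then means the cached copy is still fresh.
def conditional_headers(cached: dict) -> dict:
    if cached.get("etag"):
        return {"If-None-Match": cached["etag"]}
    if cached.get("last-modified"):
        return {"If-Modified-Since": cached["last-modified"]}
    return {}  # nothing cached: do an unconditional GET
```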
- [x] Parse rfc2822 timestamps
- [x] caching (use cache by default, unless explicitly asked for update)
  - [x] value --> cache
  - [x] value <-- cache
  REQUIRES: d command
- [x] Logger sync before exit.
- [x] Implement rfc3339->epoch
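For reference, the rfc3339->epoch conversion in Python terms (tt's version is Racket; this sketch leans on datetime's ISO parsing and only handles numeric offsets and a trailing Z):

```python
from datetime import datetime

# Sketch of rfc3339 -> Unix epoch seconds. fromisoformat (Python 3.7+)
# accepts numeric offsets like +01:00 but not "Z", so normalize it first.
def rfc3339_to_epoch(ts: str) -> float:
    if ts.endswith(("Z", "z")):
        ts = ts[:-1] + "+00:00"
    return datetime.fromisoformat(ts).timestamp()
```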
- [x] Remove dependency on rfc3339-old
- [x] remove dependency on http-client
- [x] Build executable
  Implies fix of "collection not found" when executing the built executable
  outside the source directory:

    collection-path: collection not found
      collection: "tt"
      in collection directories:
      context...:
       /usr/share/racket/collects/racket/private/collect.rkt:11:53: fail
       /usr/share/racket/collects/setup/getinfo.rkt:17:0: get-info
       /usr/share/racket/collects/racket/contract/private/arrow-val-first.rkt:555:3
       /usr/share/racket/collects/racket/cmdline.rkt:191:51
       '|#%mzc:p

Cancelled
---------
- [~] named timelines/peer-sets
  REASON: That is basically files of peers, which we already support.