check out chrisdickinson's recent work #3

Closed
max-mapper opened this issue Mar 31, 2013 · 16 comments

@chrisdickinson

current working theory:

  1. git in js/browser can lean on browserify so the various modules work both in-browser and out
  2. git in browser should not implement packfile / loose stores, it can use whatever is "natural" for the browser; this also means that we don't need a filesystem.
  3. all that needs to happen to get git in the browser working is to be able to generate and parse packfiles (sans index files).
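
To make "parse packfiles (sans index files)" a little more concrete, here is a minimal sketch of reading the start of a pack. It relies on the documented pack layout: a 12-byte header ("PACK", a version, an object count, all big-endian), then entries that each begin with a variable-length header carrying a 3-bit type and the inflated size. The function names `parsePackHeader` and `parseEntryHeader` are made up for this illustration and are not from any of the modules discussed in the thread.

```javascript
// Type codes used in pack entry headers.
const OBJECT_TYPES = { 1: 'commit', 2: 'tree', 3: 'blob', 4: 'tag', 6: 'ofs-delta', 7: 'ref-delta' };

// Hypothetical helper: read the fixed 12-byte pack header.
function parsePackHeader(bytes) {
  const signature = String.fromCharCode(bytes[0], bytes[1], bytes[2], bytes[3]);
  if (signature !== 'PACK') throw new Error('not a packfile');
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    version: view.getUint32(4),     // DataView reads big-endian by default
    objectCount: view.getUint32(8),
    nextOffset: 12                  // first entry starts right after the header
  };
}

// Hypothetical helper: read one entry header starting at `offset`.
// First byte: continuation bit, 3-bit type, low 4 bits of the size.
// Each following byte adds 7 more size bits while its high bit is set.
function parseEntryHeader(bytes, offset) {
  let byte = bytes[offset++];
  const type = (byte >> 4) & 0x07;
  let size = byte & 0x0f;
  let shift = 4;
  while (byte & 0x80) {
    byte = bytes[offset++];
    size += (byte & 0x7f) * Math.pow(2, shift); // multiply instead of << to avoid 32-bit overflow
    shift += 7;
  }
  return { type: OBJECT_TYPES[type], size, dataOffset: offset };
}
```

The zlib-compressed object data starts at `dataOffset`. Where it ends is not recorded anywhere in the pack, which is the problem raised further down about inflate.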

this also implies:

  1. git in browser need not necessarily parse or create .git/index files; at least not necessarily in the same way that filesystem-backed git does.

what does this leave out?

  • garbage collection / repacking.
  • I don't have a plan for human-readable diff tooling yet.
  • We can get away with inefficient packfiles at first for a quick demo; i.e., we don't need to be able to create packfiles containing ofs and ref deltas off the bat; the packing heuristics are kind of poorly documented.
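
As a rough illustration of the "inefficient packfiles" idea, the sketch below writes a pack in which every object is stored whole (no ofs or ref deltas): header, one deflated entry per object, then a SHA-1 trailer over everything before it. It assumes Node's built-in `zlib` and `crypto` modules; `encodeEntryHeader` and `writeSimplePack` are hypothetical names, not functions from any module mentioned here.

```javascript
const zlib = require('zlib');
const crypto = require('crypto');

const TYPE_CODES = { commit: 1, tree: 2, blob: 3, tag: 4 };

// Encode an entry header: 3-bit type plus the inflated size,
// 4 bits in the first byte and 7 bits in each continuation byte.
function encodeEntryHeader(typeCode, size) {
  const out = [];
  let byte = (typeCode << 4) | (size & 0x0f);
  size = Math.floor(size / 16);
  while (size > 0) {
    out.push(byte | 0x80);          // high bit set: more size bytes follow
    byte = size & 0x7f;
    size = Math.floor(size / 128);
  }
  out.push(byte);
  return Buffer.from(out);
}

// Hypothetical writer: objects is an array of { type, data } with data as a Buffer.
function writeSimplePack(objects) {
  const header = Buffer.alloc(12);
  header.write('PACK', 0, 'ascii');
  header.writeUInt32BE(2, 4);               // pack version 2
  header.writeUInt32BE(objects.length, 8);  // number of entries

  const parts = [header];
  for (const { type, data } of objects) {
    parts.push(encodeEntryHeader(TYPE_CODES[type], data.length));
    parts.push(zlib.deflateSync(data));     // undeltified: the whole object, deflated
  }
  const body = Buffer.concat(parts);
  const checksum = crypto.createHash('sha1').update(body).digest();
  return Buffer.concat([body, checksum]);   // trailer is the SHA-1 of all preceding bytes
}
```

A pack written this way is larger than one with deltas, but a reader only needs to inflate each entry to get the object back.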

what are weaknesses of the browserify approach?

  • Buffers are shimmed in browserify -- but aren't backed by fast Uint8Arrays. This kind of sucks.
  • The zlib shim only includes inflateSync, which means that we'll have to make use of the "browser" field in package.json's to use inflate/deflate. And those shims should certainly seek to use web workers where possible, to lower the chance of a "this tab timed out" notification while generating/parsing packfiles.

what are weaknesses in node that make this hard?

  • To parse a packfile without an index, you have to be able to say "take N bytes into inflate until you get a result of X size"; currently node's zlib (and, by extension, browserify's) doesn't handle this use case well.
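
For readers on a newer Node than the one discussed here, the following sketch shows one way to get the "how many compressed bytes did that take" answer. It assumes a Node release whose sync zlib calls accept the `info: true` option and whose engine exposes `bytesWritten`; neither existed in the Node of this thread. `inflateAt` is a hypothetical helper name.

```javascript
const zlib = require('zlib');

// Inflate the zlib stream that starts at `offset` and report how many
// compressed bytes it consumed, so the caller can locate the next entry.
// Any bytes after the end of the stream are left untouched.
function inflateAt(buffer, offset) {
  const { buffer: data, engine } = zlib.inflateSync(buffer.subarray(offset), { info: true });
  return { data, consumed: engine.bytesWritten };
}

// Usage: two zlib streams back to back, as object data sits in a pack.
const first = zlib.deflateSync(Buffer.from('first object'));
const second = zlib.deflateSync(Buffer.from('second'));
const joined = Buffer.concat([first, second]);

const a = inflateAt(joined, 0);
const b = inflateAt(joined, a.consumed);
console.log(a.data.toString(), '/', b.data.toString());
```

On a runtime without these options, the fallback is the approach described later in the thread: an inflate implementation that counts input bytes itself.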

my plan for the near future:

  • create a readable/writable stream that accepts packfile data and outputs offsets that can be used by git-packfile to read objects.
  • square away a git-smart-http-remote repo that works with git-fetch-pack.
  • work on a quick read-only git demo in browser
  • then do the read/write demo.

@maks

maks commented Mar 31, 2013

@chrisdickinson that's really great work.

I wholeheartedly agree with working theory points 2 & 3: there is no point using the c-git filesystem directory structure for object or metadata storage in the browser. But then there's the practical need to actually be able to check out a working directory, which does require something resembling a typical file system.

Likewise, I agree with not needing to use the existing index file format, especially given it's a binary format that's not especially well documented. But we will need something similar to allow for staging commits: the index serves the rather core purpose of preparing what will become the next commit's tree and diffing the current working dir against it.

for testing against smart http remotes from a browser, pushover seems a great fit, as it's simple to add CORS support by putting a reverse proxy in front of it or maybe just adding it directly to pushover itself.

@max-mapper
Author

fs-browserify could be useful here, it implements a few of the most common fs operations on top of indexeddb


@chrisdickinson

@maks yep, the idea is that the "local checkout" package will be entirely separate from the other packages -- you'll populate it using git-walk-tree.

basically, you should be able to use any "local checkout" package that supports being piped to from walk-tree; whether that's backed by plain JS objects, indexeddb, localstorage, requestFileSystem, etc, etc.

The other bonus point here is that by making the utility functions agnostic of the ODB being used (and the filesystem), you could actually back this with redis (or some other KV store) on the server.
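
A minimal sketch of what "agnostic of the ODB" could look like: an object store that only needs Map-like `get`/`set`, so the backing could be a plain object, IndexedDB, or a server-side KV store behind the same two calls. The object naming follows git's rule (SHA-1 of "<type> <size>\0" plus the content). The `createOdb` shape and its callback signatures are assumptions for illustration, not the API of git-odb or any other module named in this thread.

```javascript
const crypto = require('crypto');

// Git names an object by the SHA-1 of "<type> <size>\0" followed by its content.
function hashObject(type, data) {
  const header = Buffer.from(`${type} ${data.length}\0`, 'ascii');
  return crypto.createHash('sha1').update(header).update(data).digest('hex');
}

// Hypothetical ODB over any store with Map-like get/set.
// Callbacks fire asynchronously so a real async backend can be swapped in.
function createOdb(store = new Map()) {
  return {
    write(type, data, callback) {
      const hash = hashObject(type, data);
      store.set(hash, { type, data });
      process.nextTick(callback, null, hash);
    },
    find(hash, callback) {
      process.nextTick(callback, null, store.get(hash) || null);
    }
  };
}
```

Because callers only ever see `write` and `find`, code such as a tree walker does not need to know whether objects live in memory, in the browser, or on a server.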

@creationix
Owner

I wonder how much value there is in having interop at the level of the actual files in the .git folder. I know several people showed interest in using js-git server-side to replace shelling out to git for various deployment situations.

Also I'm not sure that leaning on browserify is such a good idea if its shims will make it hard to take full advantage of the native APIs in the browser. I guess it will just depend on what the tradeoffs end up being.

Unfortunately I won't be able to start coding on this for a couple more weeks. I do hope that I'll be able to take advantage of a lot of @chrisdickinson's great work.

@kuba-kubula

For zip and other packaging, what about emscripten / asm.js conversion of zlib? see: https://github.com/richardassar/zpipe
or the whole list of compiled into JS libs at https://github.com/kripken/emscripten/wiki

@chrisdickinson

@creationix Agreed on there not being a huge amount of value for interop at the filesystem level; at best it provides us with test data to feed into the other packages. It shouldn't be the focus (though really, at this point it's 90% done -- both pack and loose odbs are implemented and while it's currently readonly, it's really simple to create loose pack objects).

re: browserify's shims: zlib is probably the most problematic -- though that can be shimmed for something like zpipe using browserify's "browser": {"./file.js": "shim-file.js"} package.json directive. Really, the primary problem with most of the zlib implementations I've found is that they don't differentiate data returned with Z_STREAM_END from data returned with Z_OK; I've been banging my head against this for the last week or so. Ideally, we'll support node (so you can run a git server entirely in JS), so this'll have to get solved sooner rather than later. Dan Lucraft's git.js got around this by including a slightly modified version of raw-deflate.js (the zlib implementation that's been floating around the internet, unlicensed, since 1998) that counts the number of compressed bytes involved in generating output.

In practice, the buffer shim isn't a huge problem -- it can be mostly solved by converting to typed arrays either on-the-fly (which is what I've been doing thus far) using new Uint8Array(buffer.parent).subarray(buffer.offset, buffer.offset + buffer.length) and keeping a map of buffer parents to uint8array instances around (to offset the cost of creating uint8arrays), or at entry / exit points (harder, since this implies that every stream will have to be reworked to accept only uint8arrays and have to be piped through a conversion stream). Really, browserify buys us a lot more than it costs us -- especially with the --standalone generation option.
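
The on-the-fly conversion with a cache of parents can be sketched as follows. This version is an adaptation, not the code used in the thread: it reads `buf.buffer` and `buf.byteOffset` (the typed-array properties of current Node Buffers) in place of the `parent` / `offset` fields quoted above, and uses a `WeakMap` as the parent-to-view map. `toUint8Array` is a hypothetical name.

```javascript
// One cached Uint8Array per backing ArrayBuffer, so repeated conversions
// of Buffers that share a parent only pay for a cheap subarray() call.
const parentViews = new WeakMap();

function toUint8Array(buf) {
  let whole = parentViews.get(buf.buffer);
  if (!whole) {
    whole = new Uint8Array(buf.buffer);
    parentViews.set(buf.buffer, whole);
  }
  // subarray() creates a view over the same memory; nothing is copied.
  return whole.subarray(buf.byteOffset, buf.byteOffset + buf.length);
}
```

Since the result is a view rather than a copy, writes through either the Buffer or the typed array are visible through the other.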

@kuba-kubula thanks for the link! I recently found (like, last night) that imaya's inflate implementation might not be able to inflate certain compressed streams (namely, those with predefined dictionaries) so I might try plugging in zpipe instead.

If all goes well (and I'm optimistic -- I just got the git-pack-unpack stream 90% working last night; it emits offsets of objects within a packfile so we can generate an index / pull them out and put those objects in a more browser-sane object db), I should have some sort of demo by the end of the coming weekend.

@chrisdickinson

also, in re: the zlib shim not being ideal -- it's fairly easy to swap it out for other implementations using browserify.

the harder part is node: its builtin implementation is problematic WRT actually figuring out how many compressed bytes it takes to generate a payload of a given size ):

@chrisdickinson

ahaha, ignore me! browserify's zlib deals just fine with the data, I was just feeding it bad data. So, imaya's zlib should work.

@kuba-kubula looked into zpipe. Sadly, it probably won't work for our purposes:

  • There's no license attached.
  • It only operates on strings (this is a no-no -- we need arrays or typed arrays or buffers -- since utf8 can be lossy (invalid characters might be transcoded to 0xFFFD which loses the original data the characters represented)).
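
The data loss from routing binary data through UTF-8 strings is easy to reproduce. The snippet below is a small demonstration using Node's `Buffer`; the byte values are arbitrary and chosen only because some of them are not valid UTF-8.

```javascript
// Bytes that are not valid UTF-8 (0x78 0x9c happens to be a common zlib stream prefix).
const original = Buffer.from([0x78, 0x9c, 0xff, 0x00]);

// Decoding replaces each invalid byte with U+FFFD...
const asString = original.toString('utf8');

// ...so encoding the string again cannot recover the original bytes.
const roundTripped = Buffer.from(asString, 'utf8');

console.log(original.equals(roundTripped)); // false
```

Here both 0x9c and 0xff become U+FFFD, so after the round trip there is no way to tell which byte each one used to be.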

@chrisdickinson

@juliangruber

@chrisdickinson you are purely awesome!

@chrisdickinson

okay, so the inflate implementation (https://github.com/chrisdickinson/inflate) is "fast enough" for now -- there are a couple of places where it could be sped up, but it doesn't take an hour to clone "mature" repos now. I've wired it up locally using levelidb for storage.

Next steps:

  1. Need a module that given a commit object, a "find" function, and a path, can reconstitute the data for that path.
  2. Need to rework the git-walk-tree module so that it can check the type of an object before recursing into it (to avoid loading all of the blobs at once).
  3. Creating commits.
  4. Creating packfiles.
  5. Implementing git-receive-pack and git-send-pack similar to how git-fetch-pack works.
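
Steps 1 and 2 could look roughly like the sketch below. It assumes a `find(hash, callback)` function that yields `{ type, data }` with `data` as a Buffer, which is an invented shape for this example and not necessarily what the git-* modules use; `parseTree` and `findPath` are hypothetical names. The tree body layout (mode, space, name, NUL, 20 raw hash bytes per entry) and the leading "tree <hash>" line of a commit are git's own formats.

```javascript
// Parse the body of a tree object: repeated "<mode> <name>\0<20-byte hash>".
// A mode of "40000" marks a subtree, so callers can tell trees from blobs
// without loading the object the entry points to.
function parseTree(body) {
  const entries = [];
  let pos = 0;
  while (pos < body.length) {
    const space = body.indexOf(0x20, pos);
    const nul = body.indexOf(0x00, space);
    entries.push({
      mode: body.toString('ascii', pos, space),
      name: body.toString('utf8', space + 1, nul),
      hash: body.toString('hex', nul + 1, nul + 21)
    });
    pos = nul + 21;
  }
  return entries;
}

// Resolve `path` starting from a commit, loading one tree per path segment.
function findPath(find, commit, path, callback) {
  const match = /^tree ([0-9a-f]{40})/.exec(commit.data.toString('utf8'));
  if (!match) return callback(new Error('commit has no tree line'));
  const segments = path.split('/').filter(Boolean);

  (function step(hash, index) {
    find(hash, (err, object) => {
      if (err) return callback(err);
      if (!object) return callback(new Error('missing object ' + hash));
      if (index === segments.length) return callback(null, object);
      if (object.type !== 'tree') return callback(new Error('not a directory in ' + path));
      const entry = parseTree(object.data).find((e) => e.name === segments[index]);
      if (!entry) return callback(new Error('no such path: ' + path));
      step(entry.hash, index + 1);
    });
  })(match[1], 0);
}
```

Only the trees along the requested path and the final object are loaded; sibling blobs are never fetched.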

@creationix
Owner

I'm travelling all day today (it's Thursday morning in France). I'll be getting home late tonight in Texas (Thursday night). Tomorrow morning I start full-time on js-git. Thanks for the pointers. Also, what should go in this repo? Should this be the ceramic that consumes the many tiny modules or something?


@chrisdickinson

@creationix Cool. Welcome back! So, ideally, this repo would provide the following: the project site itself plus possibly a dev blog (as a gh-pages branch), the documentation (which should cover the constituent libraries and give those curious an idea of how it all hangs together), and a porcelain library which should provide similar functionality to the git commands after loading a repo.

Super ideally, but totally up to you:

  • It would use vanilla node concepts -- streams, event emitters, callbacks;
  • It should be concentrated on solving the 80% use case instead of being flexible. i.e., we could allow plugging different odb backends here, but it might be better to just pick one that works in node and one that works in browser and ship something like that.

@maks

maks commented Apr 25, 2013

pretty much agree with @chrisdickinson about a porcelain level library and documentation and "reference" implementation. Especially for the in-browser use case, I'd imagine that people (such as me) would want to use it directly within a text editor/IDE webapp due to sandboxing, rather than as some sort of stand-alone app like you would on a desktop/mobile os.

As for different ODBs, @chrisdickinson, doesn't your https://github.com/chrisdickinson/git-odb already provide a nice pluggable means of having multiple backends?

@ghost ghost assigned creationix Apr 26, 2013
@creationix
Owner

Since we've been working together for the last week and a half and already depend on some of @chrisdickinson's repos, I'm going to close this issue. See new issues for specific tasks.
