Ethereum Core Devs Meeting 30 Agenda #28

Closed
Souptacular opened this issue Dec 1, 2017 · 43 comments

@Souptacular
Contributor

Souptacular commented Dec 1, 2017

Ethereum Core Devs Meeting 30 Agenda

Meeting Date/Time: Friday 12/15/17 at 14:00 UTC

Meeting Duration 1.5 hours

YouTube Live Stream Link

Agenda

  1. Testing Updates.
  2. Digital cats caused network congestion this month. Meow.
    a. Why did this happen and what solutions are available to prevent future network congestion? See comments below for some ideas.
    b. Stateless Clients proposal.
    c. Would having minimum system requirements to set up an optimal client/full node help?
    d. Is the bottleneck not just disk bandwidth, but specifically sequential disk bandwidth?
    e. Vitalik has some ideas around gas cost changes and scalability-relevant client optimizations.
  3. Plans on Quantum-resistant cryptography and any plans to include it in the next update?
  4. Introduction to K-EVM team (Everett H.)
  5. Does it remain the case that the Yellow Paper is intended to be Ethereum's formal specification?

Time permitting:
6. Parity stuck ether proposals.
7. POA Testnet unification [Update]
8. Core team updates.

Please provide comments to add or correct agenda topics.

@5chdn
Contributor

5chdn commented Dec 5, 2017

Shall we talk about transaction backlogs?

Just some random thoughts.

  • Shouldn't the block gas limit go up at this consistently high transaction load?

screenshot at 2017-12-06 10-51-33

  • Is there anything short-term we can do? Like recommending higher gas limits? Is it even safe to recommend higher block gas limits? If yes, what would be a reasonable limit?

  • Is there anything we can do to improve applications like crypto-kitties to use less gas, or anything else to relax the situation? Did anyone look into options yet?

@ethernian

ethernian commented Dec 6, 2017

Is there anything short-term we can do?

Just a mid-term raw idea (not perfect, I know):
We could limit gas usage (or raise the minimum gas price specifically for heavy contracts) per contract group (contracts with the same codebase) if the network becomes overloaded. A contract deployer can't easily get around this restriction by deploying many slightly altered contracts with a different codebase, because such a bunch of different contracts would not be trusted and accepted as easily as a single one. Such "loadbalancing" comes at the cost of acceptance.

@v1thesource

With Cryptokitties making up >10% of all tx's currently (https://ethgasstation.info/gasguzzlers.php), the best medium-term solution may be helping them implement a payment channel mechanism. Uncles / total blocks per day is around 21%, which is not disastrous, but is only aggravated by a gas limit increase. If the gas limit is increased, we'd have to tell everyone to wait for more block confirmations per tx to make sure they get on the right chain.

Crazy idea, but at this point it may be worth looking at increasing the target time per block from ~15s. Users will have to wait for multiple confirmations anyway given the currently increasing uncle rate; at least with a higher block time interval we can increase the gas limit with a reduced effect on the uncle rate.

@rolandkofler

@dip239 while I see the first part of the idea, the second part "contracts must be audited" is easily circumvented by adding harmless nonsense functions. And it would lead to batteries of loadbalancer contracts anyway.

@AlexeyAkhunov
Contributor

I would like to bring up the Stateless Clients proposal, as I described here: https://medium.com/@akhounov/how-to-speed-up-ethereum-in-the-face-of-crypto-kitties-7a9c901d98e9

I am collecting more data now about how much impact it can make and what the overhead is; hopefully I can present something very briefly.

@ethernian

part "contracts must be audited" is easily circumvented by adding harmless nonsense functions

My idea is not perfect, I agree; it is more a way of thinking about the problem: I am just trying to punish excessive gas consumption by target contracts instead of gas provisioning by transactions.

Nevertheless, my point was that "loadbalancing" will not work "for free": a careful user needs to trust N contracts instead of a single one if their codebase is not identical. Personally I wouldn't trust a bunch of loadbalancing contracts with "almost" the same code: too much to check every single one.
But CryptoKitty players possibly do not care about contracts they trust at all.

@coinaisseur

Whatever solution we adopt, we can all agree that this is an emergency situation that must be solved in the short term. With the 'accidental' success of CryptoKitties, we can assume there are a bunch of developers coding Ethereum dApps right now as I write this message, so this transaction backlog will only get worse from now on.

@ethernian

Thought more:
... there should be some "central contract" behind the "loadbalancer", coordinating the whole application. We could sum all gas burned in all transactions going through this "central contract" in some time frame (TxGasBurningRate for this contract). If the network is currently overloaded AND some contract is involved in excessive gas burning, all transactions going through it should be disincentivized by a higher gas price.
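
A minimal sketch of the TxGasBurningRate idea described above, in Python. Everything here is illustrative: the class name `GasBurnTracker`, the window length, the threshold, and the surcharge factor are hypothetical and do not come from any client or EIP. It only shows the bookkeeping: sum the gas burned through one contract over a recent window of blocks, and raise the minimum gas price only when the network is overloaded and that contract is over the threshold.

```python
from collections import defaultdict, deque


class GasBurnTracker:
    """Toy per-contract gas accounting over a sliding window of blocks."""

    def __init__(self, window_blocks, rate_threshold, surcharge_factor):
        self.window_blocks = window_blocks        # hypothetical window length, in blocks
        self.rate_threshold = rate_threshold      # hypothetical gas budget per window
        self.surcharge_factor = surcharge_factor  # hypothetical gas price multiplier
        self.history = defaultdict(deque)         # contract -> deque of (block, gas)

    def record(self, contract, block_number, gas_used):
        """Remember the gas burned by one transaction routed through `contract`."""
        self.history[contract].append((block_number, gas_used))

    def burning_rate(self, contract, current_block):
        """Total gas burned through `contract` within the last `window_blocks` blocks."""
        entries = self.history[contract]
        # Drop entries that have fallen out of the window.
        while entries and entries[0][0] <= current_block - self.window_blocks:
            entries.popleft()
        return sum(gas for _, gas in entries)

    def min_gas_price(self, contract, current_block, base_price, network_overloaded):
        """Apply the surcharge only if the network is overloaded AND the contract is heavy."""
        heavy = self.burning_rate(contract, current_block) > self.rate_threshold
        if network_overloaded and heavy:
            return base_price * self.surcharge_factor
        return base_price
```

The sketch leaves open the hard parts raised in the thread: how to identify the "central contract" behind a set of loadbalancer contracts, and how "overloaded" would be defined in consensus.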

further discussion is moved to ethereum/research

@ghost

ghost commented Dec 8, 2017

Might be missing something obvious here: Why do we have a static blocktime target, variable gas limits, and (a more abstract) acceptable uncle rate (which is actually variable)? Why isn't the blocktime target also variable in order to target a more well defined/specified uncle rate target? (or uncle/time rate to keep it fair for miners)

@vbuterin

vbuterin commented Dec 12, 2017

Why isn't the blocktime target also variable in order to target a more well defined/specified uncle rate target?

The blocktime target is flexible as of Byzantium, to keep total rewards roughly constant. See it rising slightly here: https://etherscan.io/chart/blocktime

I personally oppose further blocktime increases. The contribution of the fast blocktime to the total uncle rate is relatively small, and furthermore it's ADDITIVE, not multiplicative, with contribution to uncle rate from capacity. That is:

uncle_rate ~= k1 / blocktime + k2 * gas_per_sec

This is confirmed with bitcoin in Decker and Wattenhofer's 2013 paper, and experience suggests the same is true with ethereum. Right now it's the second term in the sum that is the problem, not the first.
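
To make the additive structure concrete, here is a small Python sketch of the formula above. The constants `K1` and `K2` are invented for illustration and are not fitted to any network data; only the shape of the formula comes from the comment.

```python
def uncle_rate(blocktime_s, gas_per_sec, k1, k2):
    """Additive model from the thread: a latency term plus a capacity term."""
    latency_term = k1 / blocktime_s    # shrinks as blocks get slower
    capacity_term = k2 * gas_per_sec   # grows with throughput, independent of blocktime
    return latency_term + capacity_term


# Hypothetical constants, chosen only so the second term dominates.
K1 = 0.3
K2 = 4e-7

if __name__ == "__main__":
    gas_per_sec = 8_000_000 / 15  # roughly an 8M gas limit at a 15 s blocktime
    for blocktime in (15, 30):
        rate = uncle_rate(blocktime, gas_per_sec, K1, K2)
        print(f"blocktime={blocktime}s uncle_rate={rate:.3f}")
```

Under this model, doubling the blocktime at constant gas per second only halves the first term. If the second term dominates, as the comment argues, the total barely moves.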

IMO we should consider a few optimizations:

  • Do another round of increasing gas costs on account-accessing opcodes (BALANCE, EXTCODESIZE, etc), and SLOAD, as that's still our major weak point from the PoV of DoS resistance. I'd recommend SLOAD -> 320, BALANCE -> 800, EXTCODESIZE/CALL/CALLCODE/DELEGATECALL/.... -> 1200. But we should add an exception, that self-calls and calls to precompiles cost only 100.
  • Some variant of "Kill dust accounts" (EIPs#168) and "Dust account replay security" (EIPs#169) to alleviate state size growth
  • Increase the cost of sending a tx by 30000 if it goes to a currently empty account
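
The proposed numbers can be written down as a small lookup, sketched here in Python. The gas values are the ones suggested in the list above. The function names are hypothetical, and applying the 100-gas exception only to the CALL family (not to EXTCODESIZE or BALANCE) is an assumption, since the comment does not spell that out. The precompile range 0x01 to 0x08 reflects the Byzantium set, and 21000 is the existing base transaction cost.

```python
# Proposed costs from the comment above (not adopted protocol values).
PROPOSED_GAS = {
    "SLOAD": 320,
    "BALANCE": 800,
    "EXTCODESIZE": 1200,
    "CALL": 1200,
    "CALLCODE": 1200,
    "DELEGATECALL": 1200,
}
CHEAP_CALL_GAS = 100                       # proposed exception price
CALL_FAMILY = {"CALL", "CALLCODE", "DELEGATECALL"}
PRECOMPILES = set(range(1, 9))             # addresses 0x01..0x08 as of Byzantium
BASE_TX_GAS = 21000                        # existing base cost of a transaction
EMPTY_ACCOUNT_SURCHARGE = 30000            # proposed extra cost


def opcode_gas(opcode, caller=None, target=None):
    """Proposed gas for one opcode; self-calls and precompile calls get the exception."""
    is_call = opcode in CALL_FAMILY
    if is_call and target is not None and (target == caller or target in PRECOMPILES):
        return CHEAP_CALL_GAS
    return PROPOSED_GAS[opcode]


def tx_base_gas(recipient_is_empty):
    """Base transaction cost with the proposed surcharge for empty recipients."""
    return BASE_TX_GAS + (EMPTY_ACCOUNT_SURCHARGE if recipient_is_empty else 0)
```

For example, `opcode_gas("CALL", caller=0xabc, target=0xabc)` prices a self-call at 100, while the same call to an unrelated address is priced at 1200.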

I also totally support the idea of stateless clients. Right now it actually already is possible to implement without any core protocol changes, as long as miners are stateful. There's also the possibility of a "stateless partially full node" - be a light node by default, but fully (statelessly) verify specific blocks if a trusted server tells you that they're invalid. This gives the security model that you won't accept an invalid block unless BOTH (i) there is an active 51% attack, and (ii) all trusted servers you're connecting to are colluding.

Also, it would make sense to have a much more coordinated benchmarking effort, so we can see what opcodes are currently the slowest, and what can be done to improve their execution speed.

Finally, we should have a poll on where we are at for key scalability-relevant client optimizations. This includes:

  • Garbage collection
  • On-disk state caching
  • State tree pruning
  • Network compression
  • Database optimization

@tejasriram

I would like to hear about the Ethereum team's plans on Quantum-resistant cryptography and any plans to include it in the next update?

@RSAManagement

Hello, I would like the foundation to recommend the minimum system requirements to set up an optimal client/full node. This is probably a basic step to mitigate the uncle rate problem a bit; it seems that the hard drive is one of the most important bottlenecks, given the high number of I/O calls to the database.
https://medium.com/@akhounov/how-to-speed-up-ethereum-in-the-face-of-crypto-kitties-7a9c901d98e9

@vbuterin

I would like to hear about the Ethereum team's plans on Quantum-resistant cryptography and any plans to include it in the next update?

Properly incorporating this requires account abstraction, which is going into the sharding spec; I don't think there is yet consensus on how/when it's going into the main chain. Abstraction will also be available for Casper validators.

@vbuterin

I do have a question that I'd like to hear answered as well as possible.

It seems to me that the bottleneck is not just disk bandwidth, but specifically sequential disk bandwidth. That is, for example, if we somehow magically knew ahead of time what state tree nodes need to be accessed, and we could make the accesses happen in parallel, then processing speed could be increased greatly.

First, is this true? That is, is it the case that loading 1000 specific state trie keys from the DB in parallel is much faster than doing it sequentially? Second, if so, how much faster?

If there are substantial gains to be made, then there are clever things we can do, like requiring miners to provide a witness specifying what accounts and storage keys get accessed in the block, and additionally it means that there are potential great scalability gains in EIP 648.
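
One way to frame the measurement being asked for is sketched below in Python. This is not a LevelDB benchmark: `SlowStore` is a hypothetical stand-in that sleeps for a fixed, made-up latency on every read, so it only illustrates how overlapping latency-bound reads differs from issuing them one after another. A real test would have to run against the actual client database, where caching and data locality matter.

```python
import time
from concurrent.futures import ThreadPoolExecutor

READ_LATENCY_S = 0.002  # hypothetical per-read latency standing in for a disk seek


class SlowStore:
    """Toy key-value store where every read pays a fixed latency."""

    def __init__(self, data):
        self._data = data

    def get(self, key):
        time.sleep(READ_LATENCY_S)
        return self._data[key]


def read_sequential(store, keys):
    """Read keys one after another; total time is roughly len(keys) * latency."""
    return [store.get(k) for k in keys]


def read_parallel(store, keys, workers=32):
    """Read keys from a thread pool so the per-read latencies overlap."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(store.get, keys))


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

Calling `timed(read_sequential, store, keys)` and `timed(read_parallel, store, keys)` on the same key list gives the two timings to compare. Knowing the key list up front is exactly what a miner-provided witness of accessed accounts and storage keys would enable.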

@AlexeyAkhunov
Contributor

@vbuterin Thanks a lot for the answers! I am still trying to do the full mode sync of geth, and I have now hit a roadblock because my SSD is only 500 GB and doing it on an HDD is simply too slow, so I am currently stuck around block 4.5m - 9th of November 2017 :). That is why I am trying to optimise geth a bit. But I have managed to compute the sizes of the witness for the blocks around the DoS attacks in September 2016. Very often, the witness would be around 37 MB. I have not yet analysed why.

Regarding your second question about parallel reads from the DB, I also thought about it and I looked at how exactly geth (and parity too) organises the accounts and their storage - I will prepare a blog post on that, because it also explains how I calculated the witness size.
I also looked at the LevelDB implementation that geth uses to see if there is any gain from concurrent reads. I doubt there is. Because of the way the data is stored, there is no locality, and data even from nearby trie branches is randomly scattered across the whole database. So reading them in parallel would require loading more LevelDB blocks into memory and seeking them.

@pirapira
Member

@Souptacular About KEVM, Everett and some of his colleagues will be joining the call. So give me an agenda item: "introduction KEVM (Everett)". It would fit nicely before the YP discussion.

@AlexeyAkhunov
Contributor

@vbuterin Actually I take it back - I think there will be an improvement from trying to access trie nodes in parallel, because currently lots of time is spent navigating down the trie, reading the lower level only after the higher one. And that exacerbates the high latency of HDD/SSD. I will definitely try that.
Another thing we could do is only include parts of the keys in the "witness hint", let's say, only the first 8 bytes instead of all 32, and use a non-exact seek operation to read from the DB. I will look into that too.
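
The "witness hint" idea can be sketched in Python as a prefix seek over sorted keys. This is only an illustration of the technique: a sorted in-memory list and `bisect` stand in for the ordered key space of an LSM database, SHA-256 stands in for the 32-byte hashed keys (clients use Keccak-256), and the names `make_hint` and `seek_by_hint` are hypothetical.

```python
import bisect
import hashlib

HINT_BYTES = 8  # the thread suggests the first 8 bytes instead of all 32


def make_hint(key):
    """Truncate a full 32-byte key to the prefix that would go into the witness hint."""
    return key[:HINT_BYTES]


def seek_by_hint(sorted_keys, hint):
    """Non-exact seek: jump to the first key >= hint, then collect keys sharing the prefix."""
    i = bisect.bisect_left(sorted_keys, hint)
    matches = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(hint):
        matches.append(sorted_keys[i])
        i += 1
    return matches


def demo_keys(count):
    """Build `count` sorted 32-byte keys (SHA-256 here, purely as a stand-in)."""
    return sorted(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(count))
```

With 8-byte hints the key part of the witness is a quarter of the size of full 32-byte keys. The cost is that a seek can in principle return more than one candidate, so the reader still has to check which full key it needs.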

@pirapira That is great! I have read the KEVM paper after DevCon3 and will be curious to hear the discussion.

@pkieltyka

pkieltyka commented Dec 13, 2017

I'm just following the discussion regarding data storage - I highly recommend the embedded db https://github.com/dgraph-io/badger which is a RocksDB implementation in pure Go. It's very robust, tested, and supports concurrent reads, ACID transactions, batching and snapshots. The original RocksDB btw is a fork of LevelDB by Facebook with more concurrency features/tuning - so I expect the work necessary to replace geth's existing use of github.com/syndtr/goleveldb/leveldb with badger will be quite minimal. The benefits: more performance, no more CGO for the db (leaks? call penalty?), and maybe disk space too, depending on whether there is any data compaction in geth's db (to release old unused space from deleted/changed entries), or opportunities for compression.

@AlexeyAkhunov
Contributor

@pkieltyka I came across BadgerDB yesterday and it looks interesting. Another thing to try, thanks!

@5chdn
Contributor

5chdn commented Dec 13, 2017

Hey @pkieltyka just FYI, we have massive issues with RocksDB and are currently in the process of replacing it in Parity. openethereum/parity-ethereum#6280

@pkieltyka

@5chdn I wasn't suggesting to use RocksDB, I suggested to evaluate Badger, an alternative implementation of a LSM in pure Go, inspired by RocksDB. I don't think that issue applies here.

@vbuterin

I just synced geth, parity and harmony over the last few days to see how they are handling the load.

Here is my feedback. I ran this on Ubuntu 17.10, with a 512GB SSD with 16 GB RAM; in all three clients I used the appropriate setting to set the cache size to 6 GB.

  • Parity - the warp sync feature failed outright (never even once downloaded a single chunk), and the client did a full sync. This finished after ~2.5 days (not constantly online, there were a few offline periods). The processing speed was sometimes ~25-40 mgas/s, and sometimes ~5 mgas/s (see "Syncing efficiency inconsistent, sometimes quickly changes Mgas/s", openethereum/parity-ethereum#7258). Storage size is 41 GB.
  • Geth - the client randomly crashed the first couple of times I ran it, and then the third time it managed to download all the block receipts/headers and concentrated on downloading the state, and that time it worked. Took ~8 hours, with a total of ~50 million state objects. When processing blocks, the speed is sometimes ~20-30 mgas/s, and sometimes ~3-6 mgas/s. Storage size is 47 GB.
  • Harmony - the client successfully did the fast sync, in ~8 hours, with a total of ~60 million state objects (maybe harmony counts contract code as a state object and geth doesn't, or something similar? not sure what is causing the disagreement; both times it synced around block 4.7m). When processing blocks, the speed is sometimes ~20-30 mgas/s, and sometimes ~3-8 mgas/s. Storage size is 25 GB.

Thoughts at first glance:

  • We should really look into DB optimization
  • All clients should bump up the default cache sizes
  • We need to make fast sync work more reliably, and particularly make it not lose progress if the user closes the client halfway through the fast sync process

@AlexeyAkhunov
Contributor

Yes, I managed to do the fast sync too. But not the "full" sync mode. Never mind, I have now ordered 4TB SSD, should arrive in a couple of days :)

@5chdn
Contributor

5chdn commented Dec 14, 2017

@vbuterin yes, the warp issue is a well-known annoyance. openethereum/parity-ethereum#6372

@LefterisJP

LefterisJP commented Dec 14, 2017

I have now ordered 4TB SSD

@AlexeyAkhunov
Oh damn yeah. Need that too. Any model recommendation?

@vbuterin I have not tried Harmony, but I have similar experience with geth and parity.

One other thing that would be really really nice, but probably quite difficult to achieve, is to make it possible to do a sync in an HDD. I have tried to do mainnet syncs in HDDs many times. Fast/warp works fine (after many many retries), but after finishing it an HDD just can't keep up with the network with either parity or geth.

@AlexeyAkhunov
Contributor

AlexeyAkhunov commented Dec 14, 2017

@LefterisJP

Any model recommendation?

I chose the Samsung 850 EVO, but cannot recommend it until I use it myself :)

make it possible to do a sync in an HDD

I am trying to hack together a version of geth that can do that. That is what I have spent most of my time on over the last few days...
Otherwise we would lose the ability to run full nodes without an SSD.

@cslarson

cslarson commented Dec 15, 2017

EIP648 (Easy parallelizability) was brought up on reddit and there was some hope there might be some discussion of it during the dev meeting. Where/If it fits into the roadmap would be great to hear.

@RSAManagement

RSAManagement commented Dec 15, 2017

I would like to add some more observations :
1- It seems to me that the uncle rate is partially related to reaching the block gas limit and to the growth of the mempool size. So carefully raising the block gas limit could probably lower the uncle rate a bit (in the short term).

The question is: how much does mempool management cost in terms of computational stress (time to manage and broadcast a block) when the gas limit is reached? Is there something to do in this specific area?

2- The current uncle rate is high (about 26%) but lower than the 33% we reached a couple of weeks ago when the gas limit was 6.7 mil. (now it is about 8 mil.).

@holiman

holiman commented Dec 15, 2017

@pkieltyka yes, we have been looking into badger, and have done some experiments. Originally, I think a major blocker was that badger panicked on every fault, instead of surfacing errors. IIUC, that's been changed now, and we've done some more experiments. @fjl knows more, here's the first experiment from May this year: https://github.com/fjl/go-ethereum/tree/badger-exp

@fjl

fjl commented Dec 15, 2017

Badger works, but it's not a lot faster than leveldb. The other thing to keep in mind is that Badger's approach (keeping keyspace separate from value space) is only beneficial on SSDs.

@pkieltyka

@fjl Badger has iterated a lot since 7 months ago when you made your badger-exp branch. It’s prob worth upgrading the dep and trying again. True, it is optimized for SSDs, but it is worth benching on an HDD as well if that’s an important requirement.

@AlexeyAkhunov
Contributor

AlexeyAkhunov commented Dec 15, 2017

Just to leave it here. There are 3 things I am trying to do with geth to create an optimised version (that will also hopefully work with HDD):

  1. Disable background miner unless the miner is enabled (currently it is still running)

  2. REMOVED
    While processing blocks, not to write state to disk in the middle of the block, even for pre-Byzantium blocks. Currently, whenever state.IntermediateRoot() is called, it forces disk write of the trie. At the end of each block, there is a batched write to the disk, only that one should be performed.

  3. Write/read state to/from disk as key-value pairs directly as well as trie-structure, which will require order of magnitude fewer reads
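
A toy Python model of point 3, counting database reads for a trie lookup versus a direct key-value lookup. It is a simplification under stated assumptions: the trie here is an uncompressed hexary trie with fixed-length keys and SHA-256 node hashes, not geth's Merkle Patricia trie, and all names (`CountingDB`, `store_trie`, `flat_get`, and so on) are hypothetical. Real tries compress shared paths, so the real read counts differ, but the shape is the same: each trie level needs its own read, and the child's hash is only known after the parent has been read.

```python
import hashlib


class CountingDB:
    """Dict-backed store that counts how many reads it serves."""

    def __init__(self):
        self.data = {}
        self.reads = 0

    def put(self, key, value):
        self.data[key] = value

    def get(self, key):
        self.reads += 1
        return self.data.get(key)


def nibbles(key):
    """Split a bytes key into 4-bit nibbles, high nibble first."""
    return [n for b in key for n in (b >> 4, b & 0x0F)]


def store_trie(db, items):
    """Store items (equal-length bytes keys) as hash-addressed nodes; return the root hash."""
    def build(pairs, depth):
        if depth == len(pairs[0][0]):
            node = ("leaf", pairs[0][1])
        else:
            children = {}
            for nib in range(16):
                sub = [(p, v) for p, v in pairs if p[depth] == nib]
                if sub:
                    children[nib] = build(sub, depth + 1)
            node = ("branch", children)
        node_hash = hashlib.sha256(repr(node).encode()).digest()
        db.put(node_hash, node)
        return node_hash

    return build([(nibbles(k), v) for k, v in items.items()], 0)


def trie_get(db, root, key):
    """Walk from the root: one DB read per level, each depending on the previous one."""
    node = db.get(root)
    for nib in nibbles(key):
        kind, payload = node
        if kind != "branch" or nib not in payload:
            return None
        node = db.get(payload[nib])
    kind, payload = node
    return payload if kind == "leaf" else None


def flat_put(db, key, value):
    """Also store the value directly under its key, next to the trie nodes."""
    db.put(b"flat:" + key, value)


def flat_get(db, key):
    """Direct lookup: a single DB read."""
    return db.get(b"flat:" + key)
```

With 2-byte keys the toy trie lookup costs 5 reads (the root plus one per nibble) against 1 for the flat lookup. The trie is still kept alongside the flat entries because it is what produces the state root.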

@karalabe
Member

Geth does not write to disk during transaction processing, it keeps the trie in memory. Only statedb.Commit writes to disk, called once per block.

@AlexeyAkhunov
Contributor

@karalabe Yes, you are right, of course. I am removing number 2

@axic
Member

axic commented Dec 15, 2017

@vbuterin:

But we should add an exception, that self-calls and calls to precompiles cost only 100.

I like the idea of different cost for self-calling, because there is no cost for loading the code again, but the rate compared to the "regular" call should be determined more carefully since it still needs to handle state changes (applying or rolling back depending on call outcome).

I'd be against subsidising precompiles even more.

@AlexeyAkhunov
Contributor

@Souptacular @vbuterin I can do experiments with parallel SSD reads if you want (point 2e)

@Souptacular
Contributor Author

@AlexeyAkhunov That would be great! Thanks.

@cdetrio
Member

cdetrio commented Dec 20, 2017

@tomachinz

Can anyone give 1 reason why block gas limits need to be so low? Shouldn't "mining" be 90% useful and only minimal wastage of electricity. I mean gigahashing is the dumbest way to heat this already scorching planet to oblivion ever invented (bless you Satoshi), so a higher block gas limit must surely only help reduce power usage at the expense of miners (screw them anyhow).

I understand that running the transactions likely takes between 0.1 and 2 seconds out of the 17 second block time for Ethereum mainnet. Block gas limit should be targeting somewhere between 50% and 75% time utilisation (via reduced difficulty in the protocol, and via higher block gas limits, and via lower block rewards for miners etc). THAT would be mining.

@karalabe
Member

Uncle rate.

@RSAManagement

RSAManagement commented Feb 19, 2018

@tomachinz ethereum is actually having issues scaling on chain because of the time needed to validate a new block. The problem seems to be the heavy I/O load on the HDD/SSD. Geth 1.8.0 could eventually mitigate this issue with a far better DB setup, and there are other projects on their way such as TurboGeth ( @AlexeyAkhunov ).
We will see (when there is another tx demand peak) if geth 1.8.0 (and/or other clients) allows a higher block gas limit.
Anyway, sooner or later ethereum will hit some bandwidth limit (to maintain decentralization) that imho is much harder to solve (and there is probably some bandwidth problem right now).

Uncle rate is a good measure of the difficulty of ethereum to remain decentralized with increasing use of network resources.

@antonio-fr

antonio-fr commented Feb 19, 2018

"Block gas limit should be targeting somewhere between 50% and 75% time utilization (via reduced difficulty in the protocol, and via higher block gas limits, and via lower block rewards for miners etc). "
So you're proposing that miners perform more work for less revenue? Are you personally okay with doing more work for less money?
It is very good for Ethereum to be heavily used. But on top of bringing viable answers, the solutions to scale up also need to be decentralized. It is much easier to bring fast and simple solutions that pave the way for a centralized system. A decent ETH node already requires a 32GB machine and a 512GB SSD. So for the scaling, please be cautious about decentralization. The fastest path is to build a centralized database controlled and run by a few. This is not what to expect from this project.
