Completely remove cache #96
Conversation
Force-pushed from 516d5c0 to aa64206.
Codecov Report
```diff
@@             Coverage Diff              @@
##             master      #96      +/-   ##
============================================
+ Coverage     85.69%   86.83%    +1.13%
+ Complexity      743      682       -61
============================================
  Files            55       53        -2
  Lines          2524     2294      -230
  Branches        187      175       -12
============================================
- Hits           2163     1992      -171
+ Misses          259      215       -44
+ Partials        102       87       -15
```
Continue to review full report at Codecov.
```java
// Consider sorting by blob pos or even grouping by the page of the blob pos and then flat-mapping the reads by page.
// Values stream can now be parallel, but it breaks everything...
LongStream longStream = blocks.values(lookups[hash].getValue(lookupKey));
return longStream.mapToObj(blobs[hash]::read);
```
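A minimal sketch of the grouping idea from the TODO comment above, assuming an illustrative `pageSize` (this is not Uppend's actual API, just the shape of the bucketing step):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class GroupByPageSketch {
    public static void main(String[] args) {
        final long pageSize = 4096; // illustrative page size, not Uppend's
        LongStream positions = LongStream.of(100, 5000, 4200, 300);
        // Bucket blob positions by the page they land in, so the reads can
        // be flat-mapped one page at a time instead of seeking per value.
        Map<Long, List<Long>> byPage = positions.boxed()
                .collect(Collectors.groupingBy(pos -> pos / pageSize));
        System.out.println(byPage); // e.g. {0=[100, 300], 1=[5000, 4200]}
    }
}
```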
Applying `parallel` here means the calling application cannot also be using `parallel`. The previous version, using lazy concat, did not suffer this issue because calling `parallel` on a stream is more of an aspirational statement than a demand (see the sketch below).
The result is that our reads have never been parallel, and I am unsure how to resolve this. Previously, when we hit this issue with using `parallel` during write, we just removed that code, having decided it was not necessary (it was validating the page file metadata on open; that check and the metadata are now gone).
I think we should look at this in the larger context of how to make the gastronomer list queries actually execute in parallel.
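A small self-contained illustration of why `parallel()` is "aspirational": it sets a single flag for the whole pipeline, and the last call before the terminal operation wins (plain `java.util.stream`, nothing Uppend-specific):

```java
import java.util.stream.LongStream;

public class ParallelFlagDemo {
    public static void main(String[] args) {
        // parallel()/sequential() toggle one flag for the entire pipeline;
        // whoever calls last before the terminal operation wins. A library
        // that forces parallel() internally therefore fights with whatever
        // the calling application asked for.
        LongStream s = LongStream.range(0, 4).parallel();
        System.out.println(s.sequential().isParallel()); // false
    }
}
```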
At present, some portions of the query execute in parallel, but many do not.
Force-pushed from 84f7199 to 2c36e71.
```diff
 StandardOpenOption[] openOptions;
-if (readOnly) {
+if (readOnly){
```
space after paren?
```diff
@@ -213,7 +202,7 @@ public void append(final long pos, final long val) {
         log.trace("appended value {} to {} at {}", val, file, pos);
     }
 
-    public LongStream values(Long pos) {
+    public LongStream values(Long pos){
```
space after paren?
```java
} else {
    int numValues = (int) size;
    long[] values = new long[numValues];
    for (int i = 0; i < numValues; i++) {
```
Use an `offset` variable here as well?
I have a feeling that this method could be rewritten to make it clearer, and the two loops could probably be combined into one. You could also move the `if (size > valuesPerBlock)` and `if (size == 0)` checks to the top of the ladder and make them separate `if` statements -- no `else` clauses would be needed (see the sketch after this comment).
But otoh if you're going for speed, then there's something to be said for putting the most likely case first in the `if` ladder. The value of doing that depends on how often the method gets called, though.
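A hedged sketch of the guard-clause shape this comment describes, reusing names visible in the surrounding diffs (`readLong`, `valuesPerBlock`, `LongStreams.lazyConcat`). The real behavior of the `size > valuesPerBlock` branch isn't shown in this diff, so it is treated as an error state here:

```java
public LongStream values(Long pos) {
    final long size = readLong(pos);
    if (size == 0) {
        return LongStream.empty();
    }
    if (size > valuesPerBlock) {
        // Assumption for the sketch: the diff does not show what this
        // branch really does, so an error state is assumed.
        throw new IllegalStateException("Invalid block size: " + size);
    }
    if (size < 0) {
        // A negative size marks a full block chained to the block at -size.
        long nextPos = -size;
        long[] values = new long[valuesPerBlock];
        for (int i = 0; i < valuesPerBlock; i++) {
            values[i] = readLong(pos + 16 + i * 8);
        }
        return LongStreams.lazyConcat(Arrays.stream(values), () -> values(nextPos));
    }
    // 0 < size <= valuesPerBlock: a single, partially filled block.
    int numValues = (int) size;
    long[] values = new long[numValues];
    for (int i = 0; i < numValues; i++) {
        values[i] = readLong(pos + 16 + i * 8);
    }
    return Arrays.stream(values);
}
```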
You are right - this method is definitely in need of clarification.
I am going to leave the implementation alone for now. I have marked it as deprecated. The next task for Uppend is implementing this method using a spliterator that allows parallel reads.
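Not Uppend code, but a tiny illustration of the spliterator direction: a stream built from a SIZED spliterator via `StreamSupport` is parallel from construction, because `trySplit` gives the framework real work to divide:

```java
import java.util.Arrays;
import java.util.stream.LongStream;
import java.util.stream.StreamSupport;

public class SpliteratorStreamDemo {
    public static void main(String[] args) {
        long[] blockValues = {1, 2, 3, 4, 5, 6, 7, 8};
        // Arrays.spliterator reports SIZED | SUBSIZED, so trySplit can hand
        // half the remaining work to another thread. A lazily concatenated
        // stream cannot split this way, which is why a purpose-built
        // spliterator over the block chain would enable parallel reads.
        LongStream values = StreamSupport.longStream(
                Arrays.spliterator(blockValues), /* parallel = */ true);
        System.out.println(values.sum()); // 36
    }
}
```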
```diff
-        long size = buf.getLong();
-        buf.getLong();
+        final long size = readLong(pos);
```
```java
if (size < 0) {
    long nextPos = -size;
    long[] values = new long[valuesPerBlock];
    for (int i = 0; i < valuesPerBlock; i++) {
```
Use an `offset` variable here?
Commented out the lazy method entirely.
```diff
 if (size < 0) {
     long nextPos = -size;
     long[] values = new long[valuesPerBlock];
     for (int i = 0; i < valuesPerBlock; i++) {
-        values[i] = buf.getLong();
+        values[i] = readLong(pos + 16 + i * 8);
     }
     return LongStreams.lazyConcat(Arrays.stream(values), () -> values(nextPos));
 } else if (size > valuesPerBlock) {
```
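Uppend's actual `LongStreams.lazyConcat` implementation isn't shown in this diff; a minimal sketch of the idea is the flatMap trick below, which defers building the rest of the chain until the first stream is exhausted:

```java
import java.util.function.Supplier;
import java.util.stream.LongStream;

public class LazyConcatSketch {
    // Sketch only: the supplier is not invoked until the single-element
    // stream is reached, i.e. after `first` has been fully consumed.
    static LongStream lazyConcat(LongStream first, Supplier<LongStream> rest) {
        return LongStream.concat(first,
                LongStream.of(0L).flatMap(ignored -> rest.get()));
    }

    public static void main(String[] args) {
        LongStream out = lazyConcat(LongStream.of(1, 2), () -> LongStream.of(3, 4));
        out.forEach(System.out::println); // 1 2 3 4
    }
}
```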
Reorganize with `if (size > valuesPerBlock)` and `if (size == 0)` at top?
Commented out the lazy method entirely.
```java
@CommandLine.Option(names = {"-b", "--buffer-size"}, description = "Buffer Size (small|medium|large)")
BufferSize bufferSize = BufferSize.medium;

@CommandLine.Option(names = {"-s", "--size"}, description = "Benchmark size (nano|micro|small|medium|large|huge|gigantic)")
```
Reminds me of this:
https://en.wikipedia.org/wiki/Intel_Memory_Model#Memory_models
The combinations were for code and/or data to be less than/greater than 64KB.
```java
package com.upserve.uppend.cli.benchmark;

public enum BufferSize {
    small(16 * 1024 * 1024 + 16),
```
Can you add a comment on the need for `+ 16`?
Page size used to include 16 bytes of metadata.
To allow page size == buffer size, I added 16 here.
Removed.
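For the record, the arithmetic being discussed (illustrative constants; as noted above, the `+ 16` has since been removed):

```java
public class BufferSizeArithmetic {
    // Old layout, as described above: each page carried 16 bytes of
    // metadata, so a buffer needed pageSize + 16 bytes to hold a full page.
    static final int PAGE_DATA_BYTES = 16 * 1024 * 1024; // 16 MiB of page data
    static final int METADATA_BYTES = 16;                // per-page metadata (since removed)
    static final int SMALL_BUFFER_BYTES = PAGE_DATA_BYTES + METADATA_BYTES; // 16_777_232
}
```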
```diff
@@ -156,21 +203,36 @@ public Long findKey(VirtualLongBlobStore longBlobStore, LookupKey key) {
         if (comparison < 0) {
```
Since this is the positive comparison, I'd put this one first:

```java
if (comparison == 0) {
    key.setPosition(keyPosition);
    hitCount.increment();
    return longBlobStore.readLong(keyPosition);
}
```

The remaining if/else can be a bit simpler:

```java
if (comparison < 0) {
    upperKey = midpointKey;
    keyIndexUpper = midpointKeyIndex;
    bisectKeyTreeArrayIndex = bisectKeyTreeArrayIndex * 2;
} else {
    lowerKey = midpointKey;
    keyIndexLower = midpointKeyIndex;
    bisectKeyTreeArrayIndex = bisectKeyTreeArrayIndex * 2 + 1;
}
```
```java
        return 0;
    }
    return -1;
}
```
Can shorten the previous `if` statement:

```java
if (o1 == null)
    return o2 == null ? 0 : -1;
```
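For context, a hypothetical complete comparator with this shape; only the null-handling structure comes from the diff above, the element type and method name are assumptions:

```java
import java.util.Comparator;

// Hypothetical null-first comparator built around the suggested shortening.
public class NullFirstComparator implements Comparator<String> {
    @Override
    public int compare(String o1, String o2) {
        if (o1 == null)
            return o2 == null ? 0 : -1; // both null: equal; only o1 null: first
        if (o2 == null)
            return 1;                   // only o2 null: o1 sorts after
        return o1.compareTo(o2);
    }
}
```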
```java
SafeDeleting.removeTempPath(path);

appendOnlyStore = new AppendOnlyStoreBuilder()
        .withPartitionSize(numPartitions)
```
This method name confuses me. Is it the size of the partition or the number of partitions? Should the method be renamed `withPartitionCount`?
Ugh - you're right - let's hope the IDE does most of the work on this one...
```java
SafeDeleting.removeTempPath(path);

appendOnlyStore = new AppendOnlyStoreBuilder()
        .withPartitionSize(numPartitions)
```
`withPartitionCount`?