Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/md5 management #108

Merged
merged 17 commits into from
Jul 9, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
78 changes: 78 additions & 0 deletions dice-where-downloader-lib/pom.xml
Original file line number Diff line number Diff line change
Expand Up @@ -15,8 +15,55 @@
<slf4j.version>1.7.30</slf4j.version>
<jackson.version>2.14.2</jackson.version>
<javax-bind.version>2.3.1</javax-bind.version>
<test-containers.version>1.19.8</test-containers.version>
<wiremock.version>3.0.1</wiremock.version>
</properties>

<build>
<plugins>
<plugin>
<groupId>org.jacoco</groupId>
<artifactId>jacoco-maven-plugin</artifactId>
<version>${jacoco.version}</version>
<executions>
<execution>
<id>pre-unit-tests</id>
<goals>
<goal>prepare-agent</goal>
</goals>
</execution>
<execution>
<id>post-unit-tests</id>
<phase>test</phase>
<goals>
<goal>report</goal>
</goals>
</execution>
<execution>
<id>default-check</id>
<goals>
<goal>check</goal>
</goals>
<configuration>
<rules>
<rule implementation="org.jacoco.maven.RuleConfiguration">
<element>BUNDLE</element>
<limits>
<limit implementation="org.jacoco.report.check.Limit">
<counter>COMPLEXITY</counter>
<value>COVEREDRATIO</value>
<minimum>0.1</minimum>
</limit>
</limits>
</rule>
</rules>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
</build>

<dependencies>
<dependency>
<groupId>software.amazon.awssdk</groupId>
Expand Down Expand Up @@ -60,6 +107,37 @@
<version>${javax-bind.version}</version>
</dependency>

<dependency>
<groupId>org.testcontainers</groupId>
<artifactId>testcontainers</artifactId>
<version>${test-containers.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.testcontainers</groupId>
<artifactId>junit-jupiter</artifactId>
<version>${test-containers.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.testcontainers</groupId>
<artifactId>localstack</artifactId>
<version>${test-containers.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.testcontainers</groupId>
<artifactId>nginx</artifactId>
<version>${test-containers.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.wiremock</groupId>
<artifactId>wiremock</artifactId>
<version>${wiremock.version}</version>
<scope>test</scope>
</dependency>

</dependencies>

</project>
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@
import technology.dice.dicewhere.downloader.source.FileSource;

public abstract class Download {

private static final Logger LOG = LoggerFactory.getLogger(Download.class);

protected final boolean noCheckMd5;
Expand Down Expand Up @@ -38,25 +39,29 @@
result = processFileDoesNotExist(acceptor, fileSource, pathWritable);
}
LOG.info("A new file was" + (result.isNewFileDownloaded() ? "" : " not") + " downloaded");
LOG.info("Download is " + (!result.isSuccessful() ? "un" : "" + "successful"));
return result;
}

private DownloadExecutionResult processFileDoesNotExist(
FileAcceptor<?> acceptor, FileSource fileSource, boolean pathWritable) {

if (pathWritable) {
final MD5Checksum md5Checksum = fileSource.produce(acceptor);
LOG.info("File successfully transferred");
final MD5Checksum md5Checksum = fileSource.produce(acceptor, noCheckMd5);
LOG.info("File transferred");

Check warning on line 51 in dice-where-downloader-lib/src/main/java/technology/dice/dicewhere/downloader/Download.java

View check run for this annotation

Codecov / codecov/patch

dice-where-downloader-lib/src/main/java/technology/dice/dicewhere/downloader/Download.java#L50-L51

Added lines #L50 - L51 were not covered by tests
if (!noCheckMd5) {
boolean checksumMatches = md5Checksum.matches(fileSource.fileInfo().getMd5Checksum());
if (!checksumMatches) {
LOG.warn(
LOG.error(

Check warning on line 55 in dice-where-downloader-lib/src/main/java/technology/dice/dicewhere/downloader/Download.java

View check run for this annotation

Codecov / codecov/patch

dice-where-downloader-lib/src/main/java/technology/dice/dicewhere/downloader/Download.java#L55

Added line #L55 was not covered by tests
"Local and remote files' MD5 do not match: "
+ md5Checksum.stringFormat()
+ " Vs. "
+ fileSource.fileInfo().getMd5Checksum().stringFormat());
} else {
LOG.info("MD5 matches that of the remote file");
LOG.info("MD5 matches that of the remote file: "
+ md5Checksum.stringFormat()

Check warning on line 62 in dice-where-downloader-lib/src/main/java/technology/dice/dicewhere/downloader/Download.java

View check run for this annotation

Codecov / codecov/patch

dice-where-downloader-lib/src/main/java/technology/dice/dicewhere/downloader/Download.java#L61-L62

Added lines #L61 - L62 were not covered by tests
+ " Vs. "
+ fileSource.fileInfo().getMd5Checksum().stringFormat());

Check warning on line 64 in dice-where-downloader-lib/src/main/java/technology/dice/dicewhere/downloader/Download.java

View check run for this annotation

Codecov / codecov/patch

dice-where-downloader-lib/src/main/java/technology/dice/dicewhere/downloader/Download.java#L64

Added line #L64 was not covered by tests
}
return new DownloadExecutionResult(
true, checksumMatches, md5Checksum, acceptor.getUri(), checksumMatches);
Expand Down Expand Up @@ -87,7 +92,10 @@
+ " Vs. "
+ fileSource.fileInfo().getMd5Checksum().stringFormat());
} else {
LOG.info("MD5 matches that of the remote file");
LOG.info("MD5 matches that of the remote file: "
+ existingMd5.map(md5 -> md5.stringFormat()).orElse("?")

Check warning on line 96 in dice-where-downloader-lib/src/main/java/technology/dice/dicewhere/downloader/Download.java

View check run for this annotation

Codecov / codecov/patch

dice-where-downloader-lib/src/main/java/technology/dice/dicewhere/downloader/Download.java#L95-L96

Added lines #L95 - L96 were not covered by tests
+ " Vs. "
+ fileSource.fileInfo().getMd5Checksum().stringFormat());

Check warning on line 98 in dice-where-downloader-lib/src/main/java/technology/dice/dicewhere/downloader/Download.java

View check run for this annotation

Codecov / codecov/patch

dice-where-downloader-lib/src/main/java/technology/dice/dicewhere/downloader/Download.java#L98

Added line #L98 was not covered by tests
}
return new DownloadExecutionResult(
false,
Expand All @@ -106,7 +114,8 @@

protected abstract DownloadExecutionResult execute();

protected void checkNecessaryEnvironmentVariables() {}
protected void checkNecessaryEnvironmentVariables() {
}

Check warning on line 118 in dice-where-downloader-lib/src/main/java/technology/dice/dicewhere/downloader/Download.java

View check run for this annotation

Codecov / codecov/patch

dice-where-downloader-lib/src/main/java/technology/dice/dicewhere/downloader/Download.java#L118

Added line #L118 was not covered by tests

public boolean isVerbose() {
return verbose;
Expand Down
Original file line number Diff line number Diff line change
@@ -1,8 +1,6 @@
package technology.dice.dicewhere.downloader.actions.ipinfo;

import java.net.URI;
import technology.dice.dicewhere.downloader.Download;
import technology.dice.dicewhere.downloader.PathUtils;

public abstract class IpInfoBaseDownload extends Download {

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,9 @@
import technology.dice.dicewhere.downloader.stream.StreamConsumer;

public interface FileAcceptor<T> {
StreamConsumer<T> getStreamConsumer(MD5Checksum originalFileMd5, Instant originalFileTimestamp);

StreamConsumer<T> getStreamConsumer(MD5Checksum originalFileMd5, Instant originalFileTimestamp,
boolean noMd5Check);

boolean destinationExists();

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
import technology.dice.dicewhere.downloader.stream.StreamWithMD5Decorator;

public class LocalFileAcceptor implements FileAcceptor<Void> {

private static final Logger LOG = LoggerFactory.getLogger(LocalFileAcceptor.class);
public static final int BUFFER = 8192;

Expand All @@ -30,7 +31,7 @@ public LocalFileAcceptor(Path destination) {

@Override
public StreamConsumer<Void> getStreamConsumer(
MD5Checksum originalFileMd5, Instant originalFileTimestamp) {
MD5Checksum originalFileMd5, Instant originalFileTimestamp, boolean noMd5Check) {
styxtwo marked this conversation as resolved.
Show resolved Hide resolved
return (stream, size) -> {
try {
Files.createDirectories(destination);
Expand All @@ -39,6 +40,10 @@ public StreamConsumer<Void> getStreamConsumer(
LOG.debug("Destination directory already exists");
}
Files.copy(stream, destination, StandardCopyOption.REPLACE_EXISTING);
if ((!noMd5Check) && (!originalFileMd5.matches(stream.md5()))) {
styxtwo marked this conversation as resolved.
Show resolved Hide resolved
LOG.error("MD5 mismatch. Deleting destination file");
Files.delete(destination);
wermajew marked this conversation as resolved.
Show resolved Hide resolved
}
return null;
};
}
Expand Down Expand Up @@ -68,11 +73,12 @@ public Optional<MD5Checksum> existingFileMd5() {
BufferedInputStream bis = new BufferedInputStream(is);
StreamWithMD5Decorator md5Is = StreamWithMD5Decorator.of(bis)) {
byte[] buffer = new byte[BUFFER];
while ((md5Is.read(buffer)) != -1) {}
while ((md5Is.read(buffer)) != -1) {
}
return Optional.of(md5Is.md5());
} catch (IOException | NoSuchAlgorithmException e) {
throw new RuntimeException(
"Could not obtain md5 of the file existing at the target: " + destination.toString(),
"Could not obtain md5 of the file existing at the target: " + destination,
e);
}
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,8 +8,11 @@
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.DeleteObjectRequest;
import software.amazon.awssdk.services.s3.model.HeadObjectRequest;
import software.amazon.awssdk.services.s3.model.HeadObjectResponse;
import software.amazon.awssdk.services.s3.model.NoSuchKeyException;
Expand All @@ -22,8 +25,8 @@

public class S3FileAcceptor implements FileAcceptor<Void> {

private static final Logger LOG = LoggerFactory.getLogger(S3FileAcceptor.class);
private static final String LATEST_KEY = "latest";
public static final String MD5_METADATA_KEY = "md5";
public static final String TIMESTAMP_METADATA_KEY = "ts";
private final S3Client client;
private final String bucket;
Expand All @@ -42,11 +45,10 @@

@Override
public StreamConsumer<Void> getStreamConsumer(
MD5Checksum originalFileMd5, Instant originalFileTimestamp) {
MD5Checksum originalFileMd5, Instant originalFileTimestamp, boolean noMd5Check) {
return (StreamConsumer)
(stream, size) -> {
Map<String, String> objectMetadata = new HashMap<>();
objectMetadata.put(MD5_METADATA_KEY, originalFileMd5.stringFormat());
objectMetadata.put(
TIMESTAMP_METADATA_KEY, String.valueOf(originalFileTimestamp.toEpochMilli()));
PutObjectRequest putObjectRequest =
Expand All @@ -58,22 +60,29 @@
.storageClass(StorageClass.INTELLIGENT_TIERING)
.build();
client.putObject(putObjectRequest, RequestBody.fromInputStream(stream, size));

Latest latest = new Latest(clock.instant(), key);
String latestContent = mapper.writeValueAsString(latest);

PutObjectRequest putLatest =
PutObjectRequest.builder()
.key(Paths.get(key).getParent().toString() + "/" + LATEST_KEY)
.bucket(bucket)
.contentLength((long) latestContent.length())
.storageClass(StorageClass.INTELLIGENT_TIERING)
.build();
client.putObject(
putLatest,
RequestBody.fromInputStream(
new StringInputStream(latestContent), latestContent.length()));
if ((!noMd5Check) && (!originalFileMd5.matches(stream.md5()))) {
styxtwo marked this conversation as resolved.
Show resolved Hide resolved
LOG.error("MD5 mismatch. Deleting destination file");
client.deleteObject(DeleteObjectRequest.builder()
lcardito marked this conversation as resolved.
Show resolved Hide resolved
.bucket(bucket)
.key(key)
.build());
} else {

PutObjectRequest putLatest =
PutObjectRequest.builder()
.key(Paths.get(key).getParent().toString() + "/" + LATEST_KEY)
.bucket(bucket)
.contentLength((long) latestContent.length())
.storageClass(StorageClass.INTELLIGENT_TIERING)
.build();
client.putObject(
putLatest,
RequestBody.fromInputStream(
new StringInputStream(latestContent), latestContent.length()));
}
return null;
};
}
Expand Down Expand Up @@ -103,8 +112,7 @@

try {
final HeadObjectResponse headObjectResponse = client.headObject(headObjectRequest);
final Map<String, String> metadata = headObjectResponse.metadata();
return Optional.ofNullable(metadata.get(MD5_METADATA_KEY)).map(m -> MD5Checksum.of(m));
return Optional.ofNullable(headObjectResponse.eTag()).map(m -> MD5Checksum.of(m));

Check warning on line 115 in dice-where-downloader-lib/src/main/java/technology/dice/dicewhere/downloader/destination/s3/S3FileAcceptor.java

View check run for this annotation

Codecov / codecov/patch

dice-where-downloader-lib/src/main/java/technology/dice/dicewhere/downloader/destination/s3/S3FileAcceptor.java#L115

Added line #L115 was not covered by tests
} catch (NoSuchKeyException e) {
return Optional.empty();
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
import technology.dice.dicewhere.downloader.stream.StreamWithMD5Decorator;

public abstract class BaseUrlSource implements FileSource {

protected FileInfo fileInfo;
protected final URL dataFileLocation;

Expand All @@ -19,14 +20,14 @@ protected BaseUrlSource(URL dataFileLocation) {
}

@Override
public MD5Checksum produce(FileAcceptor acceptor) {
public MD5Checksum produce(FileAcceptor acceptor, boolean noMd5Check) {
try {
HttpURLConnection httpConnection = (HttpURLConnection) this.dataFileLocation.openConnection();
httpConnection.setRequestMethod("GET");

try (StreamWithMD5Decorator is = StreamWithMD5Decorator.of(httpConnection.getInputStream())) {
acceptor
.getStreamConsumer(fileInfo.getMd5Checksum(), fileInfo.getTimestamp())
.getStreamConsumer(fileInfo.getMd5Checksum(), fileInfo.getTimestamp(), noMd5Check)
.consume(is, fileInfo.getSize());
return is.md5();
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,8 @@
import technology.dice.dicewhere.downloader.destination.FileAcceptor;

public interface FileSource {

FileInfo fileInfo();

MD5Checksum produce(FileAcceptor consumer);
MD5Checksum produce(FileAcceptor consumer, boolean noMd5Check);
lcardito marked this conversation as resolved.
Show resolved Hide resolved
}
Original file line number Diff line number Diff line change
Expand Up @@ -22,8 +22,8 @@
import technology.dice.dicewhere.downloader.stream.StreamWithMD5Decorator;

public class S3Source implements FileSource {

private static Logger LOG = LoggerFactory.getLogger(S3Source.class);
public static final String MD5_METADATA_KEY = "md5";
public static final String TIMESTAMP_METADATA_KEY = "ts";
private final S3Client client;
private final String bucket;
Expand All @@ -43,11 +43,11 @@
HeadObjectRequest.builder().key(key).bucket(bucket).build();

final HeadObjectResponse headObjectResponse = client.headObject(headObjectRequest);
final Map<String, String> metadata = headObjectResponse.metadata();
if (!metadata.containsKey(MD5_METADATA_KEY)) {
if (headObjectResponse.eTag() == null) {
throw new DownloaderException(
"Remote file does not have md5 information. Please delete the file and re-upload");
}
final Map<String, String> metadata = headObjectResponse.metadata();

Check warning on line 50 in dice-where-downloader-lib/src/main/java/technology/dice/dicewhere/downloader/source/s3/S3Source.java

View check run for this annotation

Codecov / codecov/patch

dice-where-downloader-lib/src/main/java/technology/dice/dicewhere/downloader/source/s3/S3Source.java#L50

Added line #L50 was not covered by tests
if (!metadata.containsKey(TIMESTAMP_METADATA_KEY)) {
LOG.warn("Timestamp not available at source. Using now as timestamp.");
}
Expand All @@ -59,20 +59,20 @@
Optional.ofNullable(metadata.get(TIMESTAMP_METADATA_KEY))
.map(m -> Instant.ofEpochMilli(Long.valueOf(m)))
.orElse(Instant.now()),
MD5Checksum.of(metadata.get(MD5_METADATA_KEY)),
MD5Checksum.of(headObjectResponse.eTag()),

Check warning on line 62 in dice-where-downloader-lib/src/main/java/technology/dice/dicewhere/downloader/source/s3/S3Source.java

View check run for this annotation

Codecov / codecov/patch

dice-where-downloader-lib/src/main/java/technology/dice/dicewhere/downloader/source/s3/S3Source.java#L62

Added line #L62 was not covered by tests
size);
}

return this.fileInfo;
}

@Override
public MD5Checksum produce(FileAcceptor consumer) {
public MD5Checksum produce(FileAcceptor consumer, boolean noMd5Check) {
GetObjectRequest getObjectRequest = GetObjectRequest.builder().bucket(bucket).key(key).build();
try (final ResponseInputStream<GetObjectResponse> object = client.getObject(getObjectRequest);
StreamWithMD5Decorator is = StreamWithMD5Decorator.of(object)) {
consumer
.getStreamConsumer(fileInfo.getMd5Checksum(), fileInfo.getTimestamp())
.getStreamConsumer(fileInfo.getMd5Checksum(), fileInfo.getTimestamp(), noMd5Check)

Check warning on line 75 in dice-where-downloader-lib/src/main/java/technology/dice/dicewhere/downloader/source/s3/S3Source.java

View check run for this annotation

Codecov / codecov/patch

dice-where-downloader-lib/src/main/java/technology/dice/dicewhere/downloader/source/s3/S3Source.java#L75

Added line #L75 was not covered by tests
.consume(is, fileInfo.getSize());
return is.md5();
} catch (IOException | NoSuchAlgorithmException e) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -5,5 +5,5 @@

@FunctionalInterface
public interface StreamConsumer<T> {
T consume(InputStream stream, long size) throws IOException;
T consume(StreamWithMD5Decorator stream, long size) throws IOException;
}
Loading
Loading