Add optional support for AEGIS encryption #900

Open. jedisct1 wants to merge 3 commits into master.

@jedisct1 jedisct1 commented Feb 25, 2025

AEGIS is a new family of authenticated encryption algorithms that offers stronger security, higher usage limits, and better performance than AES-GCM.

This pull request adds support for a new -aegis command-line flag, allowing AEGIS-128X2 to be used as an alternative to AES-GCM on CPUs with AES acceleration.

It also introduces the ability to use ciphers with different key sizes, as well as the ability to compile gocryptfs without CGO out of the box, without having to explicitly pass the without_openssl and without_aegis tags.

I believe it would be a great addition, but I understand if it can't be merged.

$ gocryptfs -speed speed # on Apple M1:
AES-GCM-256-OpenSSL              3718.79 MB/s
AES-GCM-256-Go                   5083.43 MB/s   (selected in auto mode)
AES-SIV-512-Go                    625.20 MB/s
XChaCha20-Poly1305-OpenSSL       1358.63 MB/s   (selected in auto mode)
XChaCha20-Poly1305-Go             832.11 MB/s
Aegis128X2-Go                   11818.73 MB/s
$ gocryptfs -speed speed # on AMD Zen 4:
AES-GCM-256-OpenSSL              5215.86 MB/s
AES-GCM-256-Go                   6918.01 MB/s   (selected in auto mode)
AES-SIV-512-Go                    449.61 MB/s
XChaCha20-Poly1305-OpenSSL       2643.48 MB/s
XChaCha20-Poly1305-Go            3727.46 MB/s   (selected in auto mode)
Aegis128X2-Go                   28109.92 MB/s

with export CC='clang -O3 -march=native':

Aegis128X2-Go                   31947.54 MB/s

More information on AEGIS is available here:
- https://cfrg.github.io/draft-irtf-cfrg-aegis-aead/draft-irtf-cfrg-aegis-aead.html
- https://github.com/cfrg/draft-irtf-cfrg-aegis-aead

@rfjakob
Owner

rfjakob commented Feb 26, 2025 via email

@jedisct1
Author

./benchmark.bash
WRITE: 262144000 bytes (262 MB, 250 MiB) copied, 0.236343 s, 1.1 GB/s
READ:  262144000 bytes (262 MB, 250 MiB) copied, 0.0838405 s, 3.1 GB/s
UNTAR: 4.070
MD5:   2.300
LS:    1.515
RM:    0.963
./benchmark.bash -aegis
WRITE: 262144000 bytes (262 MB, 250 MiB) copied, 0.221578 s, 1.2 GB/s
READ:  262144000 bytes (262 MB, 250 MiB) copied, 0.0658518 s, 4.0 GB/s
UNTAR: 4.107
MD5:   2.302
LS:    1.622
RM:    0.961

I'm not convinced that this benchmark is very representative, since actual I/O is the limiting factor. System CPU usage is significantly lower with AEGIS, though, which matters on real workloads.

@rfjakob
Owner

rfjakob commented Feb 27, 2025

FYI Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz; with AES-GCM acceleration

$ gocryptfs -speed
gocryptfs v2.5.1-9-g42f2c13.jedisct1-aegis; go-fuse v2.5.0; 2025-02-27 go1.23.5 linux/amd64
cpu: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz; with AES-GCM acceleration
AES-GCM-256-OpenSSL       	  799.67 MB/s
AES-GCM-256-Go            	 1085.84 MB/s	(selected in auto mode)
AES-SIV-512-Go            	  177.87 MB/s
XChaCha20-Poly1305-OpenSSL	  646.18 MB/s
XChaCha20-Poly1305-Go     	  934.34 MB/s	(selected in auto mode)
Aegis128X2-Go             	 5782.52 MB/s
$ ./benchmark.bash 
Testing gocryptfs    at /tmp/benchmark.bash.zWh: gocryptfs v2.5.1-9-g42f2c13.jedisct1-aegis; go-fuse v2.5.0; 2025-02-27 go1.23.5 linux/amd64
/tmp/benchmark.bash.zWh.mnt is a mountpoint
Downloading linux-3.0.tar.gz
WRITE: 262144000 bytes (262 MB, 250 MiB) copied, 0,557764 s, 470 MB/s
READ:  262144000 bytes (262 MB, 250 MiB) copied, 0,302142 s, 868 MB/s
UNTAR: 10,336
MD5:   5,191
LS:    4,076
RM:    2,129
$ ./benchmark.bash -aegis
Testing gocryptfs  -aegis  at /tmp/benchmark.bash.NZt: gocryptfs v2.5.1-9-g42f2c13.jedisct1-aegis; go-fuse v2.5.0; 2025-02-27 go1.23.5 linux/amd64
/tmp/benchmark.bash.NZt.mnt is a mountpoint
WRITE: 262144000 bytes (262 MB, 250 MiB) copied, 0,421339 s, 622 MB/s
READ:  262144000 bytes (262 MB, 250 MiB) copied, 0,171538 s, 1,5 GB/s
UNTAR: 9,600
MD5:   4,790
LS:    3,920
RM:    2,122

@@ -98,7 +109,7 @@ func New(key []byte, aeadType AEADTypeEnum, IVBitLen int, useHKDF bool) *CryptoC
 	{
 		var emeBlockCipher cipher.Block
 		if useHKDF {
-			emeKey := hkdfDerive(key, hkdfInfoEMENames, KeyLen)
+			emeKey := hkdfDerive(key, hkdfInfoEMENames, keyLen)
Owner

I think this downgrades EME to AES-128 when AEGIS is used.

Author

How about defining a dedicated EmeKeyLen constant? That would avoid confusion between the different key lengths.

Author

Done. This should fix the issue with EME.
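
For readers following the review thread, here is a minimal sketch of why reusing the content cipher's key length for EME matters. It is not the code from this PR: `backend`, `contentKeyLen`, `emeKeyLen` and `aesVariant` are hypothetical names, and the 32-byte EME key is an assumption based on the "downgrade" comment above. The only library behaviour relied on is that `crypto/aes` picks AES-128, AES-192 or AES-256 from the length of the key it is given.

```go
package main

import (
	"crypto/aes"
	"fmt"
)

// backend is a hypothetical stand-in for the cipher selection; these are
// not the identifiers used in gocryptfs.
type backend int

const (
	backendAESGCM backend = iota
	backendAegis128X2
)

// emeKeyLen is fixed (assumed 32 bytes, i.e. AES-256) so that file name
// encryption does not depend on which content cipher is selected.
const emeKeyLen = 32

// contentKeyLen returns the key size of the content cipher in bytes.
func contentKeyLen(b backend) int {
	if b == backendAegis128X2 {
		return 16 // AEGIS-128X2 takes a 128-bit key
	}
	return 32 // AES-256-GCM
}

// aesVariant reports which AES variant crypto/aes selects for an n-byte key.
// It returns an error for key sizes other than 16, 24 or 32 bytes.
func aesVariant(n int) (string, error) {
	if _, err := aes.NewCipher(make([]byte, n)); err != nil {
		return "", err
	}
	return fmt.Sprintf("AES-%d", n*8), nil
}

func main() {
	// Deriving the EME key with the content cipher's length silently
	// shrinks it when the AEGIS backend is active.
	wrong, _ := aesVariant(contentKeyLen(backendAegis128X2))
	right, _ := aesVariant(emeKeyLen)
	fmt.Println("EME keyed with content key length:", wrong)
	fmt.Println("EME keyed with emeKeyLen:", right)
}
```

Keeping the two lengths as separate names makes the mistake hard to repeat: the content cipher can change its key size without touching file name encryption.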

rfjakob added a commit to rfjakob/aegis that referenced this pull request Feb 27, 2025
4 KiB is interesting because it is the default
page size on Linux, and also the block size
that gocryptfs (a disk encryption tool) uses.

gocryptfs is contemplating adding AEGIS support
( rfjakob/gocryptfs#900 ).

Output now looks like this:

$ go test -bench .
goos: linux
goarch: amd64
pkg: github.com/ericlagergren/aegis
cpu: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
BenchmarkSeal16B_128L-4   	12179401	        97.09 ns/op	 164.80 MB/s	       0 B/op	       0 allocs/op
BenchmarkOpen16B_128L-4   	11123548	       104.7 ns/op	 152.75 MB/s	       0 B/op	       0 allocs/op
BenchmarkSeal1K_128L-4    	 4735476	       253.1 ns/op	4045.21 MB/s	       0 B/op	       0 allocs/op
BenchmarkOpen1K_128L-4    	 4614565	       259.0 ns/op	3953.60 MB/s	       0 B/op	       0 allocs/op
BenchmarkSeal4K_128L-4    	 1685662	       712.3 ns/op	5750.17 MB/s	       0 B/op	       0 allocs/op
BenchmarkOpen4K_128L-4    	 1667968	       719.5 ns/op	5692.93 MB/s	       0 B/op	       0 allocs/op
BenchmarkSeal8K_128L-4    	  867411	      1327 ns/op	6175.61 MB/s	       0 B/op	       0 allocs/op
BenchmarkOpen8K_128L-4    	  858153	      1333 ns/op	6143.63 MB/s	       0 B/op	       0 allocs/op
BenchmarkSeal16K_128L-4   	  460386	      2543 ns/op	6441.87 MB/s	       0 B/op	       0 allocs/op
BenchmarkOpen16K_128L-4   	  458068	      2556 ns/op	6409.65 MB/s	       0 B/op	       0 allocs/op
BenchmarkSeal16B_256-4    	12555355	        95.04 ns/op	 168.35 MB/s	       0 B/op	       0 allocs/op
BenchmarkOpen16B_256-4    	11600582	       102.5 ns/op	 156.05 MB/s	       0 B/op	       0 allocs/op
BenchmarkSeal1K_256-4     	 3595903	       336.3 ns/op	3045.34 MB/s	       0 B/op	       0 allocs/op
BenchmarkOpen1K_256-4     	 3511225	       340.0 ns/op	3011.56 MB/s	       0 B/op	       0 allocs/op
BenchmarkSeal4K_256-4     	 1000000	      1056 ns/op	3878.99 MB/s	       0 B/op	       0 allocs/op
BenchmarkOpen4K_256-4     	 1000000	      1046 ns/op	3914.79 MB/s	       0 B/op	       0 allocs/op
BenchmarkSeal8K_256-4     	  572815	      2007 ns/op	4080.76 MB/s	       0 B/op	       0 allocs/op
BenchmarkOpen8K_256-4     	  586015	      1989 ns/op	4119.45 MB/s	       0 B/op	       0 allocs/op
BenchmarkSeal16K_256-4    	  295207	      3930 ns/op	4168.82 MB/s	       0 B/op	       0 allocs/op
BenchmarkOpen16K_256-4    	  303639	      3878 ns/op	4224.89 MB/s	       0 B/op	       0 allocs/op
PASS
ok  	github.com/ericlagergren/aegis	26.662s
@rfjakob
Owner

rfjakob commented Feb 27, 2025

FYI looks like https://github.com/ericlagergren/aegis is faster. I guess the CGO overhead hits libaegis hard.

Correction: ericlagergren/aegis AEGIS-128L is the same speed as go-libaegis AEGIS-128X2.

(4K benchmark added via ericlagergren/aegis#16)

$ go test -bench 4K
goos: linux
goarch: amd64
pkg: github.com/ericlagergren/aegis
cpu: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
BenchmarkSeal4K_128L-4   	 1683678	       712.3 ns/op	5750.21 MB/s	       0 B/op	       0 allocs/op
BenchmarkOpen4K_128L-4   	 1665622	       719.6 ns/op	5691.92 MB/s	       0 B/op	       0 allocs/op
BenchmarkSeal4K_256-4    	 1000000	      1054 ns/op	3887.53 MB/s	       0 B/op	       0 allocs/op
BenchmarkOpen4K_256-4    	 1000000	      1046 ns/op	3917.63 MB/s	       0 B/op	       0 allocs/op
PASS
ok  	github.com/ericlagergren/aegis	6.001s

@rfjakob
Owner

rfjakob commented Feb 27, 2025

https://github.com/aegis-aead/libaegis benchmark:

AEGIS-128X2 being slower than AEGIS-128L means my CPU is too old (Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz) I guess.

AEGIS-256   	44415	Mb/s	5552	MB/s
AEGIS-256X2 	46837	Mb/s	5855	MB/s
AEGIS-256X4 	36029	Mb/s	4504	MB/s
AEGIS-128L   	68178	Mb/s	8522	MB/s
AEGIS-128X2 	54483	Mb/s	6810	MB/s
AEGIS-128X4 	48355	Mb/s	6044	MB/s
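
To compare these figures with the Go benchmark output earlier in the thread, it may help to spell out the unit conversions. The sketch below is only arithmetic, with hypothetical helper names (`mbPerSec`, `mbitToMByte`): Go's `MB/s` column is bytes per operation divided by time per operation with 1 MB = 10^6 bytes, and the `Mb/s` column above is the `MB/s` column multiplied by 8.

```go
package main

import "fmt"

// mbPerSec converts a Go benchmark result (bytes processed per operation
// and nanoseconds per operation) to MB/s, with 1 MB = 1e6 bytes.
// bytes/ns equals GB/s, so multiplying by 1e3 gives MB/s.
func mbPerSec(bytesPerOp int, nsPerOp float64) float64 {
	return float64(bytesPerOp) / nsPerOp * 1e3
}

// mbitToMByte converts megabits per second to megabytes per second.
func mbitToMByte(mbit float64) float64 {
	return mbit / 8
}

func main() {
	// 4 KiB block sealed in 712.3 ns (BenchmarkSeal4K_128L above).
	fmt.Printf("%.0f MB/s\n", mbPerSec(4096, 712.3))
	// AEGIS-128L row of the libaegis benchmark.
	fmt.Printf("%.0f MB/s\n", mbitToMByte(68178))
}
```

The first value lands close to the 5750 MB/s reported by `go test`; the small difference comes from the rounded ns/op figure.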

@jedisct1
Author

On the vast majority of server and desktop CPUs, including on Apple Silicon, 128X2 is generally the fastest.

You are right that 128L is currently faster on older CPUs that don't support AVX2.

An implementation trick will eventually minimize the difference.

So, if we want to only support one variant, 128X2 looks like a decent default.
