Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TODO: Re-enable full compression on DNS requests once https://github.com/miekg/dns/pull/668 is merged #4071

Closed
pierresouchay opened this issue May 1, 2018 · 1 comment · Fixed by #4131

Comments

@pierresouchay
Copy link
Contributor

This issue is to track the fact that Consul 1.0.7 did disable a part of the DNS compression:
While requests are sent compressed, the requests do not take compression into account while performing Len computation because Len computation as computed by DNS library was wrong (see https://github.com/hashicorp/consul/pull/3948/files#diff-0f2130d66e011e1af6e42f93492a3536R724 where we disable compression).

While miekg/dns#657 has been fixed, miekg/dns#663 is still open, but I made a PR to fix the computation: miekg/dns#668

Once miekg/dns#668 has been merged, we will be able to upgrade the DNS library and apply compression to Len() computation. Thus, when requesting 64k of data, 64k of data will be effectively sent, not 64k of data prior to computation)

This was not a big deal because, it only impacted cluster with huge amount of nodes (for which performance was bad), but since #4037 has now been merged, performance will be better with compression, so we might reconsider adding computation of Len() with compression.

Adding this will increase a lot the number of records returned by DNS requests and will be very interesting for DNS-based provisioning services such as HaProxy

@pierresouchay pierresouchay changed the title TODO: Re-enable full compression on DNS requests once https://github.com/miekg/dns/pull/668 TODO: Re-enable full compression on DNS requests once https://github.com/miekg/dns/pull/668 is merged May 1, 2018
pierresouchay added a commit to pierresouchay/consul that referenced this issue May 16, 2018
@pierresouchay
Copy link
Contributor Author

Here are the results of the patch #4131

Test protocol for large number of nodes:

  1. Create script setup_dns_overflow.sh:
#!/bin/bash
CONSUL_ADDR=${1:-http://localhost:8500}

create_node()
{
  curl -fs --request PUT -d '{"Node": "'$1'", "Address": "'$2'", "Service": {"Service": "'$3'", "Tags": ["http", "tag_'$1'", "start_'$(date +%Y%M%d)'"], "Port": 8000}}' $CONSUL_ADDR/v1/catalog/register -o /dev/null
}

networks=$(seq 1 20)
num_services=${NUM_SERVICES:-254}
for j in $networks
do
for i in $(seq 1 $num_services)
do
  create_node host-redis-$j-$i.test.acme.com 192.168.$j.$i redis || exit 1
done
done
  1. Launch agent with patch [BUGFIX] do not break when TCP DNS answer exceeds 64k #3948
$ consul agent -dev
  1. Launch script
$ ./setup_dns_overflow.sh
  1. Examine the size of returned records

while true; do http_count=$(curl -fs localhost:8500/v1/catalog/service/redis?pretty|grep '"Node"'|wc -l) ; dns_count=$(dig @localhost -p 8600 SRV redis.service.consul +short +tcp|wc -l); dns_a=$(dig @localhost -p 8600 redis.service.consul +tcp +short|wc -l); echo "HTTP: $http_count ; DNS_SRV: $dns_count ; DNS_A: $dns_a"; sleep 1; done

Results

Without Patch #4131

while true; do http_count=$(curl -fs localhost:8500/v1/catalog/service/redis?pretty|grep '"Node"'|wc -l) ; dns_count=$(dig @localhost -p 8600 SRV redis.service.consul +short +tcp|wc -l); dns_a=$(dig @localhost -p 8600 redis.service.consul +tcp +short|wc -l); echo "HTTP: $http_count ; DNS_SRV: $dns_count ; DNS_A: $dns_a"; sleep 1; done
HTTP:     1400 ; DNS_SRV:      418 ; DNS_A:     1407
HTTP:     1449 ; DNS_SRV:      419 ; DNS_A:     1438
HTTP:     1500 ; DNS_SRV:      418 ; DNS_A:     1438
HTTP:     1550 ; DNS_SRV:      418 ; DNS_A:     1438
HTTP:     1599 ; DNS_SRV:      419 ; DNS_A:     1439

With Patch #4131 (compression enabled while computing size)

HTTP:      339 ; DNS_SRV:      342 ; DNS_A:      343
HTTP:      386 ; DNS_SRV:      388 ; DNS_A:      389
HTTP:      432 ; DNS_SRV:      435 ; DNS_A:      436
HTTP:      478 ; DNS_SRV:      480 ; DNS_A:      481
HTTP:      523 ; DNS_SRV:      526 ; DNS_A:      527
HTTP:      569 ; DNS_SRV:      572 ; DNS_A:      573
HTTP:      616 ; DNS_SRV:      618 ; DNS_A:      619
HTTP:      662 ; DNS_SRV:      648 ; DNS_A:      666
# From here, num of DNS SRV Records won't increase anymore
HTTP:      709 ; DNS_SRV:      647 ; DNS_A:      713
[...]

HTTP:     2626 ; DNS_SRV:      645 ; DNS_A:     2632
HTTP:     2675 ; DNS_SRV:      645 ; DNS_A:     2681
HTTP:     2724 ; DNS_SRV:      645 ; DNS_A:     2730
HTTP:     2773 ; DNS_SRV:      644 ; DNS_A:     2780
HTTP:     2823 ; DNS_SRV:      643 ; DNS_A:     2777
# From here DNS A records won't increase anymore
HTTP:     2873 ; DNS_SRV:      644 ; DNS_A:     2786
HTTP:     2922 ; DNS_SRV:      643 ; DNS_A:     2785

Analysis

It greatly increase the number of records available.
This patch not only increase the number of records using TCP, but also with UDP.

This patch also add a small fix for small UDP requests: when size was small (ie: 512 bytes), if lots of TXT records were generated (for instance lots of Metadata), then Consul could not return any record while truncating the records. With this patch, Consul will truncate the records, but will try to return at least one record.

There is a wide diff between SRV and A/AAAA records because compression cannot be used fully in DNS for SRV records to be RFC compliant.

Type Max records BEFORE patch Max records AFTER patch increase
SRV 419 648 54%
A 1439 2786 93%

This patch was made possible because Len() is now always consistent with real size of DNS messages thanks to PR miekg/dns#668

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant