Prepare for release v1.16.0-pre.3 #651

Closed
wants to merge 86 commits into from

Owner

@aanm aanm commented Jun 7, 2024

Prepare for release v1.16.0-pre.3

See the included CHANGELOG.md for a full list of changes.

aanm and others added 30 commits June 4, 2024 00:11
With the introduction of 57db22b, Syft creates the SBOM files in the
same directory where the image digest files are created. As a result,
the image-digest-output.txt file unexpectedly contained all the SBOMs.
Thus, using find, we make sure that only the files whose names start
with "image-digests" are copied into the image-digest-output.txt file.

Tested in https://github.com/aanm/cilium/actions/runs/9358191181

Fixes: 57db22b ("Generate SBOMs using Syft instead of bom")
Signed-off-by: André Martins <andre@cilium.io>
Currently, the Agent daemon setup registers the k8s `ServiceCache` with
the LRP Manager via the method `RegisterSvcCache`.

In the meantime, the k8s `ServiceCache` is provided by its own Hive Cell
which makes it possible to have an explicit dependency from the LRP
Manager to the `ServiceCache`.

Therefore, this commit adds the k8s `ServiceCache` as direct dependency to
the LRP Manager and removes the method `RegisterSvcCache`.

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Ensure that the unmarshaled data matches the kvstore key it was
retrieved from, to ensure consistency and prevent the propagation
of corrupted data.
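
A minimal sketch of this kind of consistency check. The `Node` type, its
`<cluster>/<name>` key layout and the function names are hypothetical and
chosen for illustration only; they are not taken from the Cilium code base.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Node is a hypothetical value stored in the kvstore.
type Node struct {
	Cluster string `json:"cluster"`
	Name    string `json:"name"`
}

// KeyName returns the key under which the object is expected to be stored
// (assumed layout: "<cluster>/<name>").
func (n Node) KeyName() string { return n.Cluster + "/" + n.Name }

// unmarshalValidated decodes data and rejects it when the decoded object
// does not map back to the key it was read from.
func unmarshalValidated(key string, data []byte) (Node, error) {
	var n Node
	if err := json.Unmarshal(data, &n); err != nil {
		return Node{}, fmt.Errorf("unmarshal %q: %w", key, err)
	}
	if got := n.KeyName(); got != key {
		return Node{}, fmt.Errorf("key mismatch: read from %q, object maps to %q", key, got)
	}
	return n, nil
}

func main() {
	// The value claims to belong to cluster2 but was read from a cluster1 key.
	_, err := unmarshalValidated("cluster1/node-a",
		[]byte(`{"cluster":"cluster2","name":"node-a"}`))
	fmt.Println("rejected:", err != nil)
}
```

Rejecting the entry at decode time keeps an inconsistent value from ever
reaching downstream consumers.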

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Take the clusterID as a parameter, rather than retrieving it from the
global option. This allows reusing the same methods for clusterIDs
different from the one configured locally.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Fix the handling of CiliumIdentity events received from the k8s informer
to correctly compare the old and new object, rather than the new one
with itself, and propagate modification events as expected. Although
identities are never expected to be modified (only added and removed),
this ensures that the corresponding warnings are emitted as expected.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
In preparation for the subsequent commits, and to simplify the overall
logic, let's remove the difference between Add and Modify events, in
favor of a single Upsert one. Identities are never expected to be
modified (only added and removed), hence this change is effectively a
no-op. However, differentiating add and modify events is typically
fragile and can lead to possible inconsistencies (e.g., if incomplete
or invalid Add events are filtered out, while subsequent updates are
propagated). Additionally, the downstream logic is already designed to
correctly handle (and ignore, when appropriate) updates.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
In preparation for the subsequent commit, let's introduce the
possibility of configuring additional validators to filter out
invalid identity events received from CRD or kvstore. Upsert
events marked as invalid are directly skipped. Deletions, instead,
are propagated for previously known identities only, to avoid
leaving stale identities behind.
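
The filtering rules described above can be sketched as follows. This is
an illustration under assumptions: the `Identity`, `Validator` and
`Watcher` types and the `nonZeroID` validator are hypothetical and do not
mirror the actual Cilium implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// Identity is a simplified, hypothetical stand-in for a security identity.
type Identity struct {
	ID     uint32
	Labels map[string]string
}

// Validator returns a non-nil error for identities that must be filtered out.
type Validator func(Identity) error

// nonZeroID is an example validator (illustrative only).
func nonZeroID(id Identity) error {
	if id.ID == 0 {
		return errors.New("identity ID must not be zero")
	}
	return nil
}

// Watcher filters events before recording them in Events.
type Watcher struct {
	validators []Validator
	known      map[uint32]struct{}
	Events     []string
}

func NewWatcher(v ...Validator) *Watcher {
	return &Watcher{validators: v, known: make(map[uint32]struct{})}
}

// OnUpsert skips identities rejected by any validator.
func (w *Watcher) OnUpsert(id Identity) {
	for _, validate := range w.validators {
		if validate(id) != nil {
			return
		}
	}
	w.known[id.ID] = struct{}{}
	w.Events = append(w.Events, fmt.Sprintf("upsert %d", id.ID))
}

// OnDelete propagates deletions only for previously known identities, so a
// deletion is never dropped for an identity that was propagated earlier.
func (w *Watcher) OnDelete(id Identity) {
	if _, ok := w.known[id.ID]; !ok {
		return
	}
	delete(w.known, id.ID)
	w.Events = append(w.Events, fmt.Sprintf("delete %d", id.ID))
}

func main() {
	w := NewWatcher(nonZeroID)
	w.OnUpsert(Identity{ID: 0}) // invalid: skipped
	w.OnDelete(Identity{ID: 0}) // never known: not propagated
	w.OnUpsert(Identity{ID: 7})
	w.OnDelete(Identity{ID: 7})
	fmt.Println(w.Events) // [upsert 7 delete 7]
}
```

Note that deletions are deliberately not run through the validators:
whether they are propagated depends only on whether the identity was
propagated before.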

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Explicitly validate that the identities retrieved from a remote cluster
match the expected range based on the corresponding clusterID, to ensure
improved consistency and prevent the propagation of corrupted data.
Additionally, ensure that said identities include the cluster label
matching the expected cluster name for the given cluster.
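
A rough sketch of such a validation. It assumes that the clusterID is
encoded in the upper bits of the numeric identity with a shift of 16, and
that the cluster name is carried in a label keyed
`io.cilium.k8s.policy.cluster`; both values, as well as the function name,
are assumptions made for this example.

```go
package main

import "fmt"

const (
	// clusterIDShift is an assumed bit offset of the clusterID within an identity.
	clusterIDShift = 16
	// clusterLabel is the assumed key of the label carrying the cluster name.
	clusterLabel = "io.cilium.k8s.policy.cluster"
)

// validateRemoteIdentity checks that id falls into the range reserved for
// clusterID and that the labels name the expected cluster.
func validateRemoteIdentity(id uint32, labels map[string]string, clusterID uint32, clusterName string) error {
	lo := clusterID << clusterIDShift
	hi := lo | (1<<clusterIDShift - 1)
	if id < lo || id > hi {
		return fmt.Errorf("identity %d outside of range [%d, %d] for cluster ID %d", id, lo, hi, clusterID)
	}
	if got := labels[clusterLabel]; got != clusterName {
		return fmt.Errorf("identity %d has cluster label %q, expected %q", id, got, clusterName)
	}
	return nil
}

func main() {
	labels := map[string]string{clusterLabel: "cluster3"}
	fmt.Println(validateRemoteIdentity(3<<clusterIDShift|1, labels, 3, "cluster3") == nil) // in range, label matches
	fmt.Println(validateRemoteIdentity(1<<clusterIDShift|1, labels, 3, "cluster3") == nil) // belongs to another cluster's range
}
```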

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
We're currently tracking main (v1.16), v1.15, v1.14.

Signed-off-by: Joe Stringer <joe@cilium.io>
The clustermesh-apiserver's etcd sidecar instance is by design stateless,
as etcd data is stored in an emptyDir Kubernetes volume and not preserved
upon restarts. Yet, let's expose to users the medium config, to allow
creating a volume backed by RAM rather than node storage. This allows for
greatly improved etcd read and write performance at the cost of additional
memory usage, which counts against the memory limits of the container.
Additional information is available in the upstream documentation [1].

[1]: https://kubernetes.io/docs/concepts/storage/volumes/#emptydir

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
Currently, we don't inject headless endpoints into envoy XDS, hence
requests fail with the error `no healthy upstream`. This commit
explicitly handles headless services in the CEC controller.

One point worth noting is that k8s.Endpoint watcher is used as it's a
wrapper for both k8s Endpoint and EndpointSlice.

Signed-off-by: Tam Mach <tam.mach@cilium.io>
The egress_gw_request_needs_redirect* functions already return the proper
error code, so there is no need to return DROP_NO_EGRESS_GATEWAY from the
caller side.

Signed-off-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
Currently packets from Pods selected by an egress gateway policy will
not be forwarded to a gateway if there's an L7 policy applied to those
packets. That's because we apply the egress gateway policy at bpf_lxc
right after the L7 policy redirection. Therefore the Egress Gateway
logic will be skipped if packets are redirected to the L7 proxy.

This commit adds the egress gateway handling code at to-netdev@bpf_host
so that packets from the L7 proxy can be properly redirected to an egress gw.

We will keep the egress gw code in from-container@bpf_lxc around
until v1.17 to avoid disruption of traffic to egress gateway.
During an upgrade, the datapath may temporarily be in an incomplete state
where bpf_lxc has been upgraded while bpf_host hasn't. If that situation
were to occur, traffic destined for the egress gateway would be broken for
that period of time. So we will keep the egress gateway code in both
bpf_lxc and bpf_host to avoid this scenario.

Fixes: cilium#19642

Signed-off-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
Signed-off-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
Signed-off-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
Signed-off-by: Yusuke Suzuki <yusuke.suzuki@isovalent.com>
This commit introduces a hive cell for the cgroup manager and replaces
the explicit initialization (and shutdown) in the daemon.

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
This commit introduces a Hive dependency from the cgroup manager on
a logger and replaces the static logger.

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
This commit removes the unused daemon method `GetParentPodMetadata` that
introduced an unnecessary dependency on the CGroupManager.

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Currently, the implementation of the CGroupManager also handles the cases
where socket LB tracing isn't enabled or the setup failed to look up the
cgroup path provider.

To ease the understanding of the actual implementation (aside from testing
and the mentioned cases where the feature is disabled), this commit
introduces a `noopCgroupManager` that is provided as implementation in cases
where the feature isn't enabled.
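
The pattern can be sketched as follows. Only the name `noopCgroupManager`
comes from this commit; the interface methods, the real implementation and
the constructor are hypothetical and simplified for illustration.

```go
package main

import "fmt"

// CGroupManager is a hypothetical, reduced version of the manager interface.
type CGroupManager interface {
	OnAddPod(podID, cgroupPath string)
	PodCount() int
}

// cgroupManager is the real implementation; it no longer needs any
// "feature disabled" branches.
type cgroupManager struct {
	pods map[string]string
}

func (m *cgroupManager) OnAddPod(podID, cgroupPath string) { m.pods[podID] = cgroupPath }
func (m *cgroupManager) PodCount() int                     { return len(m.pods) }

// noopCgroupManager is provided when the feature is disabled: every method
// is a no-op.
type noopCgroupManager struct{}

func (noopCgroupManager) OnAddPod(string, string) {}
func (noopCgroupManager) PodCount() int           { return 0 }

// newCGroupManager picks the implementation once, at construction time.
func newCGroupManager(enabled bool) CGroupManager {
	if !enabled {
		return noopCgroupManager{}
	}
	return &cgroupManager{pods: make(map[string]string)}
}

func main() {
	for _, enabled := range []bool{true, false} {
		m := newCGroupManager(enabled)
		m.OnAddPod("pod-a", "/sys/fs/cgroup/pod-a")
		fmt.Printf("enabled=%t tracked pods=%d\n", enabled, m.PodCount())
	}
}
```

Callers use the interface unconditionally, so the decision whether the
feature is enabled lives in a single place.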

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: Isala Piyarisi <isala@wso2.com>
Update Envoy image to pick up fixes:

- reopen bpf ipcache map on network policy stream restart

  Fixes the problem where cilium agent restart creates a new bpf ipcache
  map and (daemonset) cilium-envoy keeps using the old one.

- change original destination cluster to not create different Host instances for the same destination

  Fixes the problem where multiple Host instances are created when two
  worker threads access the same destination at the same time, and then
  one of them fails to create an upstream connection due to source port
  bind failure.

- update Go dependencies

  Fixes CVEs for the proxylib.

Signed-off-by: Jarno Rajahalme <jarno@isovalent.com>
This first commit adds support for reconciling GAMMA HTTPRoute
objects and updating their status.

Also adds GAMMA reconciler parent checks.

Signed-off-by: Nick Young <nick@isovalent.com>
GAMMA objects will now be correctly ingested,
with predicates added to both the GAMMA
ingestion and the standard Gateway API ingestion
to ensure that each only sees relevant HTTPRoute
updates.

Signed-off-by: Nick Young <nick@isovalent.com>
This also adds the HTTPRoute as a second source to each Listener
in the model.Model, which allows the HTTPRoute to be set as the
owner of the generated CiliumEnvoyConfig.

Signed-off-by: Nick Young <nick@isovalent.com>
Updates Gateway API conformance tests to include Mesh
tests.

GAMMA conformance requires supporting the Port field in
parentRef. This adds support for this for GAMMA only.
This change also does groundwork necessary
to support the Gateway API feature `HTTPRouteListenerPortMatching`
for regular Gateway API objects, which allows HTTPRoutes to select
Gateway parents using the Port field in parentRef.
A followup PR will implement this feature.
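
As an illustration of what matching on the Port field means, here is a
minimal sketch. The `ParentRef` struct is a reduced, hypothetical stand-in
for the Gateway API type, and it assumes that an unset Port selects every
port of the parent.

```go
package main

import "fmt"

// ParentRef is a reduced, hypothetical stand-in for a Gateway API parentRef.
type ParentRef struct {
	Name string
	Port *int32 // optional; nil is assumed to select all ports
}

// parentRefSelects reports whether ref selects the given port of the named parent.
func parentRefSelects(ref ParentRef, parentName string, port int32) bool {
	if ref.Name != parentName {
		return false
	}
	return ref.Port == nil || *ref.Port == port
}

func main() {
	p := int32(8080)
	ref := ParentRef{Name: "echo", Port: &p}
	fmt.Println(parentRefSelects(ref, "echo", 8080)) // true
	fmt.Println(parentRefSelects(ref, "echo", 80))   // false
}
```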

As part of the GAMMA work, we now support the GatewayPort8080
Gateway API feature as well, so that is now added to the
conformance test workflow.

The `MeshConsumerRoute` feature cannot be supported without
significant changes to the model (it requires _egress_
CiliumEnvoyConfigs, which are still being worked on.)

Mesh examples are left for a followup PR.

Dedicated Ingresses using a Nodeport need to _not_ have the
port set in their CiliumEnvoyConfig.

Added a test to verify this for the future.

Signed-off-by: Nick Young <nick@isovalent.com>
Provide a req ID parameter in UpsertIPsecEndpoint, which can be used to
override the default req ID '1'. This is useful in cases where we install
feature-specific xfrm rules and want graceful cleanup of those rules
when the feature is disabled.

Signed-off-by: harsimran pabla <hpabla@isovalent.com>
When enabled, the encrypted overlay feature adds 2 IPv4 xfrm state rules
per node. This change modifies the test script to account for the
additional rules when this feature is enabled.

Signed-off-by: harsimran pabla <hpabla@isovalent.com>
This commit updates bpf_host to identify decrypted overlay traffic and
redirect it to the VXLAN device for decapsulation.

After this decapsulation the original payload will be delivered to the
destination.

Without a redirect the decrypted packet comes back around into the stack
and the stack drops the packet in the XFRM hooks within the UDP receive
portion of the input path.

A redirect seems to clear the XFRM state associated with the skb and
allows the stack to process the packet as if the input device was the
VXLAN device.

Signed-off-by: Louis DeLosSantos <louis.delos@isovalent.com>
sayboras and others added 25 commits June 6, 2024 12:24
This commit bumps the Envoy version to v1.29.5 to address the CVEs below:

- [CVE-2024-34362: Crash (use-after-free) in EnvoyQuicServerStream](https://togithub.com/envoyproxy/envoy/security/advisories/GHSA-hww5-43gv-35jv)
- [CVE-2024-34363: Crash due to uncaught nlohmann JSON exception](https://togithub.com/envoyproxy/envoy/security/advisories/GHSA-g979-ph9j-5gg4)
- [CVE-2024-34364: Envoy OOM vector from HTTP async client with unbounded response buffer for mirror response, and other components](https://togithub.com/envoyproxy/envoy/security/advisories/GHSA-xcj3-h7vf-fw26)
- [CVE-2024-32974: Crash in EnvoyQuicServerStream::OnInitialHeadersComplete()](https://togithub.com/envoyproxy/envoy/security/advisories/GHSA-mgxp-7hhp-8299)
- [CVE-2024-32975: Crash in QuicheDataReader::PeekVarInt62Length()](https://togithub.com/envoyproxy/envoy/security/advisories/GHSA-g9mq-6v96-cpqc)
- [CVE-2024-32976: Endless loop while decompressing Brotli data with extra input](https://togithub.com/envoyproxy/envoy/security/advisories/GHSA-7wp5-c2vq-4f8m)
- [CVE-2024-23326: Envoy incorrectly accepts HTTP 200 response for entering upgrade mode](https://togithub.com/envoyproxy/envoy/security/advisories/GHSA-vcf8-7238-v74c)

Upstream release: https://github.com/envoyproxy/envoy/releases/tag/v1.29.5

Signed-off-by: Tam Mach <tam.mach@cilium.io>
Signed-off-by: Shedrack Akintayo <akintayoshedrack@gmail.com>
Signed-off-by: Quentin Monnet <qmo@qmon.net>
Signed-off-by: Shedrack Akintayo <akintayoshedrack@gmail.com>
Signed-off-by: Quentin Monnet <qmo@qmon.net>
Enable a subsequent patch to specify the security identity of an inserted
endpoint.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
This will be used in a subsequent patch. Also clean up the existing usage.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
Exercise the whole codepath in to-netdev that's needed for encrypted
overlay. This allows us to validate the whole machinery of packet rewrites
and IPsec-related context in the skb mark/cb.

Signed-off-by: Julian Wiedmann <jwi@isovalent.com>
This commit cleans up the fields from the daemon and daemonParams
struct:

- EgressGatewayManager (unused - removed from daemon and daemonParams)
- HealthProvider (unused - removed from daemon and daemonParams)
- DeviceManager (keep in daemonParams)
- EndpointManager (set when initializing daemon struct)

Signed-off-by: Marco Hofstetter <marco.hofstetter@isovalent.com>
Currently, the remoteCluster struct holds a reference to the clustermesh
object, leading to a sort of circular dependency. Let's simplify this by
explicitly propagating only the necessary parameters, for improved
separation and clarity.

Signed-off-by: Marco Iorio <marco.iorio@isovalent.com>
The cell.Health reporter is not needed anymore, since one is already
provided to the job registered in the manager by the JobGroup.

Signed-off-by: Fabio Falzoi <fabio.falzoi@isovalent.com>
Make the BGP config-map settings consistent with how the rest of the
feature flags are deduced.

Signed-off-by: harsimran pabla <hpabla@isovalent.com>
Fixes: cilium#29590

Signed-off-by: Marcel Zieba <marcel.zieba@isovalent.com>
Signed-off-by: André Martins <andre@cilium.io>
These scripts will only be available under github.com/cilium/release to
avoid any confusion when performing releases.

Signed-off-by: André Martins <andre@cilium.io>
This page referred to GitHub projects being used for tracking upcoming
work, but in general we don't have mechanisms in the project to reliably
track work in this manner. The projects tooling we were using is also
being deprecated in favor of a newer tool in GitHub, so the links etc.
will stop working soon. We can always re-introduce that wording if we
find a good way to maintain and manage such projects.

Additionally, there are some minor wording improvements we can make to
the release cadence to clarify the statements according to the way we
manage releases as a project.

Signed-off-by: Joe Stringer <joe@cilium.io>
This paragraph doesn't make sense in context of releases, as release
management is a task for maintainers / committers of the project.

Signed-off-by: Joe Stringer <joe@cilium.io>
Document the process that the Cilium release team typically follows
around publishing prereleases and release candidates, and outline the
expectations around feature freeze / thaw.

Signed-off-by: Joe Stringer <joe@cilium.io>
We've recently been trending towards a process that looks something like
this, with, for instance, a target date of the 15th and a stable branch
cutoff date a week prior, such as the 8th (or an earlier weekday if it
falls on a weekend). Document this in general terms without making hard
commitments to ship or not ship any specific change, subject to the
discretion of the release team.
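
The date arithmetic can be sketched like this. The helper names are made
up for the example, and it assumes that the weekend adjustment applies to
the cutoff date and moves it back to the preceding Friday.

```go
package main

import (
	"fmt"
	"time"
)

// date is a small helper building a UTC calendar date.
func date(year, month, day int) time.Time {
	return time.Date(year, time.Month(month), day, 0, 0, 0, 0, time.UTC)
}

// previousWeekday moves d back until it no longer falls on a weekend.
func previousWeekday(d time.Time) time.Time {
	for d.Weekday() == time.Saturday || d.Weekday() == time.Sunday {
		d = d.AddDate(0, 0, -1)
	}
	return d
}

// cutoffFor returns the stable branch cutoff: one week before the target
// date, or the closest earlier weekday if that day is on a weekend.
func cutoffFor(target time.Time) time.Time {
	return previousWeekday(target.AddDate(0, 0, -7))
}

func main() {
	target := date(2024, 6, 15)
	// June 8, 2024 is a Saturday, so the cutoff moves to Friday, June 7.
	fmt.Println(cutoffFor(target).Format("2006-01-02")) // 2024-06-07
}
```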

Signed-off-by: Joe Stringer <joe@cilium.io>
Since commit 7628b19 ("bpf, ipcache: unconditionally assume support
for LPM trie maps"), LPM_LOOKUP_FN is only used in its own test. Remove
the macro and the test, as it's not used in any actual code, and it
causes verifier errors when upgrading to LLVM 18: the verifier can't
track a pointer spilled to a map (a global variable).

Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
LLVM 18 doesn't align these structs to 8 by default, and our memcpy
implementation fails to pass the verifier when applied to these structs,
because the verifier requires stack access to be aligned. Align all
affected structs.

Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
300236c ("Add the datapath filtering for policy verdict logs.")
introduced a mechanism to generate policy verdict logs only if an
endpoint has a network policy enforced on the direction of the traffic,
to reduce the number of allow events that otherwise would have been
notified in case of default allow policies.

Unfortunately this logic doesn't take into account the case where
send_policy_verdict_notify is called from the bpf_host program (e.g.
Host Firewall policies), as POLICY_VERDICT_LOG_FILTER is always set to 0
for that program, resulting in no policy verdicts being notified.

This change tries to address this by ignoring the filter if
send_policy_verdict_notify is evaluated in the context of bpf_host.

Moreover, to prevent a flood of notifications, the ones for default
allow policies are ignored.
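
The resulting decision can be modelled roughly as below. The datapath code
itself is BPF C; this is only a Go model of the logic as described in this
message, with hypothetical parameter names, and it assumes that the
default-allow suppression applies in the bpf_host context only.

```go
package main

import "fmt"

// shouldNotify models whether a policy verdict notification is emitted.
//   - isHostProgram: the code runs in the context of bpf_host
//   - filter: per-endpoint bitmask of directions with policy enforced
//     (always 0 for bpf_host)
//   - dir: bit of the traffic direction being evaluated
//   - defaultAllow: the verdict comes from a default allow policy
func shouldNotify(isHostProgram bool, filter, dir uint8, defaultAllow bool) bool {
	if isHostProgram {
		// The filter is ignored; only default-allow verdicts are
		// suppressed, to prevent a flood of notifications.
		return !defaultAllow
	}
	// Endpoints: notify only if policy is enforced in this direction.
	return filter&dir != 0
}

func main() {
	const ingress uint8 = 1
	fmt.Println(shouldNotify(true, 0, ingress, false))  // host firewall verdict
	fmt.Println(shouldNotify(true, 0, ingress, true))   // default allow on the host
	fmt.Println(shouldNotify(false, 0, ingress, false)) // endpoint without policy
}
```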

Signed-off-by: Gilberto Bertin <jibi@cilium.io>
To tackle the complexity issue introduced by the previous commit in
cil_to_host in the bpf_host program, use the already existing
CILIUM_CALL_IPV{4,6}_TO_HOST_POLICY_ONLY tail calls to handle the
enforcement of the ingress host firewall policies.

Signed-off-by: Gilberto Bertin <jibi@cilium.io>
It doesn't make sense to pass `--follow` when the container is still
running: the command will hang forever and fail to complete the remaining
steps in the workflow. Remove the follow flag.

Fixes: 9392745 ("ci: l4lb: gather more infos about docker-in-docker issues")
Signed-off-by: Joe Stringer <joe@cilium.io>
Signed-off-by: André Martins <andre@cilium.io>
Signed-off-by: André Martins <andre@cilium.io>
This reverts commit 1a3bf24.

Signed-off-by: André Martins <andre@cilium.io>
@aanm aanm deployed to release-base-images June 7, 2024 14:53 — with GitHub Actions Active
@aanm aanm closed this Jun 7, 2024
@aanm aanm deleted the pr/prepare-v1.16.0-pre.3 branch June 7, 2024 14:56