security/postmortems/cve-2019-15225.md
2019-07-25 - 2019-10-10
@asraa
Final
After an Envoy user publicly reported a crash in regular expression matching during route resolution (https://github.com/envoyproxy/envoy/issues/7728), the Envoy security team found that the issue could be leveraged for a DoS attack and decided that it would go through the public security release process. The fix landed in master with a public PR (https://github.com/envoyproxy/envoy/pull/7878) and was targeted for inclusion in a 1.11.2 security release.
CVE-2019-15226 was detected via fuzzers just after the 1.11.1 security release. With the fix for CVE-2019-15225 in progress, the Envoy security team decided to combine the two fixes into a 1.11.2 security release. This was the first time an Envoy security release included a publicly disclosed vulnerability whose fix had already been merged into master. The security release included a backported patch of that fix as well as the patches for CVE-2019-15226.
CVE-2019-15225 was caused by the use of a recursive algorithm for matching regular
expressions. Envoy’s HTTP router can be configured with regular expressions that route incoming
HTTP requests by matching header values. Envoy used the libstdc++ std::regex implementation for
these regular expressions. As a result, an HTTP request with sufficiently large header values may
consume large amounts of stack memory and cause abnormal process termination. Regular expressions
with the * or + quantifiers are particularly vulnerable. The crash appeared when matching header
values of 16 KB or more.
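To make the stack-growth mechanism concrete, here is a minimal sketch of a toy recursive matcher. It is not libstdc++'s std::regex algorithm; `recursiveMatch`, `DepthTracker`, and `matchWithDepth` are hypothetical names introduced only for illustration. The matcher supports literals, `.`, and a postfix `*`, and records the deepest recursion it reaches, which grows linearly with the input length when a `*` quantifier consumes the input.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Tracks how deep the recursion goes. Illustration only.
struct DepthTracker {
  std::size_t current = 0;
  std::size_t max = 0;
};

// Toy full-match for patterns made of literals, '.', and a postfix '*'.
// Assumption: this is a simplified stand-in, not the libstdc++ implementation.
bool recursiveMatch(const char* pattern, const char* text, DepthTracker& depth) {
  depth.current++;
  depth.max = std::max(depth.max, depth.current);
  bool matched;
  if (*pattern == '\0') {
    // An empty pattern only matches an empty remainder.
    matched = (*text == '\0');
  } else {
    const bool first = (*text != '\0') && (*pattern == '.' || *pattern == *text);
    if (pattern[1] == '*') {
      // Either skip "x*" entirely, or consume one character and recurse
      // with the same pattern. The second branch adds one stack frame per
      // input character.
      matched = recursiveMatch(pattern + 2, text, depth) ||
                (first && recursiveMatch(pattern, text + 1, depth));
    } else {
      matched = first && recursiveMatch(pattern + 1, text + 1, depth);
    }
  }
  depth.current--;
  return matched;
}

struct MatchResult {
  bool matched;
  std::size_t max_depth;
};

// Convenience wrapper that reports both the match result and the peak depth.
MatchResult matchWithDepth(const std::string& pattern, const std::string& text) {
  DepthTracker depth;
  const bool matched = recursiveMatch(pattern.c_str(), text.c_str(), depth);
  return {matched, depth.max};
}
```

In this sketch, matching `a*` against a run of 1,000 `a` characters drives the recursion roughly 1,000 frames deep, and a 16 KB header value would go proportionally deeper. A real engine uses far larger frames, which is how a long header can exhaust the stack.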
CVE-2019-15226 resulted from excessive iteration over the HeaderMap by a time-consuming header
size validation that occurred for each header added. Both codec libraries http_parser and nghttp2
have internal limits for the maximum request header size. Envoy’s HTTP/2 codec originally checked
against a hard-coded max header size of 63K, which was just under the default max headers length in
nghttp2. The check occurred every time a header was added, resulting in O(n^2) performance. Work on
making this limit configurable (https://github.com/envoyproxy/envoy/issues/5626) also introduced the
issue in Envoy’s HTTP/1 codec, where the check was added per header field mimicking the same
problematic pattern as the original HTTP/2 codec.
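The quadratic pattern described above can be sketched as follows. This is an illustrative stand-in, not Envoy's codec or HeaderMapImpl code: `NaiveHeaderMap`, `addAllWithPerHeaderCheck`, and the `entriesVisited()` counter are hypothetical names added to make the cost visible.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a header map whose byteSize() walks all entries.
class NaiveHeaderMap {
public:
  void addHeader(std::string key, std::string value) {
    headers_.emplace_back(std::move(key), std::move(value));
  }

  // O(n): iterates over every stored header on each call.
  std::size_t byteSize() const {
    std::size_t total = 0;
    for (const auto& header : headers_) {
      total += header.first.size() + header.second.size();
      ++entries_visited_;  // Instrumentation for this sketch only.
    }
    return total;
  }

  std::size_t entriesVisited() const { return entries_visited_; }

private:
  std::vector<std::pair<std::string, std::string>> headers_;
  mutable std::size_t entries_visited_ = 0;
};

// Mimics a codec that validates the total header size after every header
// is added. Returns false as soon as the limit is exceeded.
bool addAllWithPerHeaderCheck(NaiveHeaderMap& map, std::size_t count,
                              std::size_t max_bytes) {
  for (std::size_t i = 0; i < count; ++i) {
    map.addHeader("k" + std::to_string(i), "v");
    if (map.byteSize() > max_bytes) {
      return false;
    }
  }
  return true;
}
```

Adding n headers this way visits 1 + 2 + ... + n = n(n+1)/2 entries in total, so 1,000 headers already cost 500,500 entry visits in this sketch.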
To resolve the excessive memory consumption from regex matching, Envoy
1.11.2 deprecates the use of std::regex in user facing paths. A new safe regex matcher introduces
an explicitly configurable regex engine. Currently, the regex engine is limited to Google’s RE2
regex engine that implements a safe subset of the std::regex language features. The existing regex
engine is in a deprecation period to allow users to switch to safe regex engines.
Google’s RE2 regex engine is designed to complete execution in linear time (https://github.com/google/re2/wiki/WhyRE2) and limit the amount of memory used. Envoy 1.11.2 also includes an option to configure a “program size” when using Google RE2, a rough estimate of how complex a compiled regex is to evaluate. A regex that has a program size greater than this value will fail to compile.
CVE-2019-15226 was first noticed via fuzzers when a timeout was reported by
h1_capture_direct_fuzz_test: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=16325 on
2019-08-09. Once a reproducer was made in an Envoy deployment to confirm the issue, and some
profiling work was done by the Envoy security team, we moved to a private fix process targeting the
1.11.2 release along with CVE-2019-15225. Other calls to byteSize() and other iterations over
HeaderMap were also analyzed for potential DoS vulnerabilities and performance issues.
The fix re-implemented the HeaderMapImpl::byteSize() method to have O(1) performance by returning
a cached_byte_size_ member of HeaderMapImpl that is updated as header entries are added, rather
than iterating over the HeaderMap to calculate the byte size. To resolve excessive iterations over
the HeaderMap that can appear in access logging with many header formatters and many headers, the
fix also included configurable limits for the maximum number of headers.
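A minimal sketch of the caching idea follows, under stated assumptions. `CachedHeaderMap`, `recomputeByteSize()`, and the `max_headers_count` constructor parameter are hypothetical and only loosely modeled on the description above; this is not the actual HeaderMapImpl patch.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a header map that caches its byte size.
class CachedHeaderMap {
public:
  explicit CachedHeaderMap(std::size_t max_headers_count)
      : max_headers_count_(max_headers_count) {}

  // Returns false without storing the header once the configured count
  // limit has been reached.
  bool addHeader(std::string key, std::string value) {
    if (headers_.size() >= max_headers_count_) {
      return false;
    }
    // Update the cache before the strings are moved into storage.
    cached_byte_size_ += key.size() + value.size();
    headers_.emplace_back(std::move(key), std::move(value));
    return true;
  }

  // O(1): returns the running total instead of iterating.
  std::size_t byteSize() const { return cached_byte_size_; }

  // O(n): recomputes the size from scratch, useful for checking the cache.
  std::size_t recomputeByteSize() const {
    std::size_t total = 0;
    for (const auto& header : headers_) {
      total += header.first.size() + header.second.size();
    }
    return total;
  }

  std::size_t size() const { return headers_.size(); }

private:
  std::vector<std::pair<std::string, std::string>> headers_;
  std::size_t cached_byte_size_ = 0;
  std::size_t max_headers_count_;
};
```

With the cache in place, a size check after every added header costs constant time, so validating n headers is O(n) overall. Any operation that removes or rewrites a header would also need to adjust the cached value; this sketch omits those paths.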
The following patches were produced:
A 1.11.2 security release was announced on 2019-09-18. An e-mail was sent to the Envoy private distributor list sharing the details of CVE-2019-15226. A week later, on 2019-09-24, the candidate fix patches for CVE-2019-15226 were shared with distributors. This provided two weeks for distributors to test and prepare their software for the security release date, as per the guidelines set in place after security release 1.9.1.
CVE-2019-15225 was reported by Seikun Kambashi in a public GitHub Issue describing a crash caused by a request with a very large URI for routes configured with a regex matcher: https://github.com/envoyproxy/envoy/issues/7728.
Envoy’s route_fuzz_test, which fuzzes route resolution and header finalization, ideally should
have caught this crash. The test takes a RouteConfiguration and a set of headers as inputs, and
routes a request with the input headers using the given RouteConfiguration. It should have been
fairly easy for the fuzzers to produce a wildcard matcher and a long header string. However, the
fuzz test itself had a logical error that resulted in ignoring input path headers and setting them
to a default value of “/”. As a result, the fuzz test would never have tested a large URI and an OOM
or crash would never have been detected. The fuzz test was fixed in
https://github.com/envoyproxy/envoy/pull/8653, and a reproducer for the CVE was added.
The underlying issue behind CVE-2019-15226 was first noticed via fuzzers when a timeout was reported
by h1_capture_direct_fuzz_test: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=16325. Some
profiling work revealed that HeaderMapImpl::byteSize(), which is O(n) in the number of headers, is
called for every single header in both HTTP/1.1 and HTTP/2 codecs. Although Envoy’s stateless HTTP/2
header fuzzers (request_header_fuzz_test and response_header_fuzz_test) perform 10x more
executions per second than this fuzzer, these tested one header frame per testcase and used
nghttp2’s default max header frame size (16 KB). Because of this, the frame size was too small to
amplify the effect of the O(n^2) process enough to produce a timeout.
CVE-2019-15226 was detected quickly after the fuzzer reported the timeout.
The fixes for CVE-2019-15226 were straightforward and localized.
The security release occurred on time and followed the guidelines established in https://github.com/envoyproxy/envoy/blob/main/SECURITY.md
It took nearly a week to set up a branch for fix patches. This was due to some confusion over whether to use the new GitHub Security advisories, which didn’t support the required permission model and CI integrations. In the process, the envoy-setec branch was temporarily made readable to all Envoy contributors.
While resolving the above permission issue, we hit an issue with GitHub permissions on envoyproxy: people could no longer assign issues to members in the Envoy repository. This was fixed with some restructuring of GitHub teams to support the limited GitHub IAM model.
The fix team could push directly to envoy-setec branches, e.g. the 1.11.2 branch (and master as well). We need branch protection to ensure that CI gates merges; this will provide confidence that the staged release branches are likely to work on the main Envoy repository.
We had manual patch sets the day before release, but no envoy-setec branches reflecting them that passed end-to-end. We should not consider a release ready to go until it has a full CI pass.
It wasn’t possible to get a full CI pass due to docs/image/etc push issues. We should have a set of presubmits that provide a simple yes/no in the GH UX.
Our route resolution fuzzer would not have picked up the regex vulnerability due to a logical error in the fuzzer.
Our more efficient request and response fuzzers would not have picked up this vulnerability earlier. They only fuzz a single HEADER frame, and the maximum frame size for HTTP/2 is by default 16 KB.
From a distributor: “We didn't realize about safe_regex until the note this morning. So we're patching ... to switch to safe_regex -- would it be possible in future notes to distributors to note if usage changes are required?”
We coupled the CVE-2019-15225 and CVE-2019-15226 releases. This made sense initially, due to release overhead, but as the release date for the header map fixes was extended, it meant that a somewhat known vulnerability was fixed on master but not on any released version of Envoy.
All times US/Pacific
2019-07-25:
2019-08-09:
2019-08-13:
2019-08-19:
2019-08-20:
2019-08-21:
2019-08-23:
2019-09-18:
2019-09-19:
2019-09-24:
2019-10-07:
2019-10-08:
2019-10-10: