This is a proposal to restructure the HTTP client code in puppet to solve the following problems.
It's difficult to use puppet as a library to call our own REST APIs due to the
coupling of puppet's http code with the indirector. As a result, users have
created REST clients, but they don't behave the same way our agent does, such as
serialization and deserialization of rich data, server_list for high
availability, and JSON to PSON content negotiation, etc.
It would be beneficial to the puppet ecosystem to have a REST client that's reusable by more than the agent.
Persistent HTTP connections allow puppet to establish an HTTP(S) connection once
and reuse it for multiple HTTP requests. This avoids making a new TCP connection
and SSL handshake for each request. This is important for pluginsync, due to the
large number of individual GET requests. However, persistent connections are not
enabled by default, and must be opted into, as was recently done for puppet device and puppet plugin download. More than likely, other applications
should be using persistent connections, but aren't.
Puppet supports 3 ways of routing connections: DNS SRV records, server list, and
static puppet settings. However, some routing methods are not consistently
applied. For example, puppet plugin download and puppet report upload don't
observe server list.
Once a route has been determined, puppet stores the last used server and port in Puppet's context system, but it's more of a hack than anything. As a result, it's difficult to know how the last used server and port were set and when to invalidate them.
Puppet::Network::HTTP::Connection supports two ways of making GET and POST
requests, but they don't behave consistently when handling HTTP redirects, the
Retry-After header, server and proxy authentication, and exception handling.
The Puppet::Network::HttpPool and related classes don't specify which
exceptions can be raised. Instead, they pass through whatever exceptions ruby
raises: everything from SocketError and SystemCallError to
OpenSSL::SSL::SSLError, Net::ProtocolError, and TimeoutError. As a
result, it's hard for clients to build higher level abstractions.
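As a rough illustration of how low-level exceptions could be kept from leaking out, the sketch below translates them into a small error hierarchy at a single choke point. The SketchHTTP module and its error class names are hypothetical and are not part of puppet; only the rescued exception classes come from ruby itself.

```ruby
require 'socket'
require 'openssl'
require 'net/protocol'
require 'timeout'

# Hypothetical error hierarchy; these names are illustrative only.
module SketchHTTP
  class HTTPError < StandardError
    attr_reader :original

    def initialize(message, original = nil)
      super(message)
      @original = original
    end
  end

  class ConnectionError < HTTPError; end
  class ProtocolError < HTTPError; end

  # Runs the block and translates low-level exceptions into the
  # hierarchy above, keeping the original exception for debugging.
  def self.wrap_errors
    yield
  rescue Net::ProtocolError => e
    raise ProtocolError.new("HTTP protocol error: #{e.message}", e)
  rescue SocketError, SystemCallError, OpenSSL::SSL::SSLError, Timeout::Error => e
    raise ConnectionError.new("Connection failed: #{e.message}", e)
  end
end
```

With something like this in place, callers would only need to rescue one base class, while the original exception stays available for logging.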
Puppet only trusts the puppet PKI when connecting to puppet infrastructure, but
needs to additionally trust the system cert store for requests like PMT and
downloading files from https sources. However, the current API doesn't allow the
caller to do that, which is why Puppet::Util::HttpProxy#request_with_redirects
duplicates the logic from Puppet::Network::HTTP::Connection#request_with_redirects.
In order to solve these problems, I propose creating an HTTP client in puppet with the following goals:
* Reuse the existing Puppet::Network::HTTP::Pool, but restructure it with a clear API.
* Ensure Net::HTTP specific exceptions don't leak out.

The Net::HTTP library is fairly buggy; however, we're not switching
away from it right now. We may in the future, but that is out of scope.

The client has a pool of persistent HTTP connections and creates HTTP sessions. It closes persistent connections when its close method is called.
It has low-level HTTP methods, such as get and post, which take the path,
headers, and options, and allow the caller to stream the request and response body.
These methods return a Puppet::HTTP::Response with the response code, etc.
The pool maintains the persistent Net::HTTP connections, keeping track of when
idle connections expire. Its with_connection method takes a block, which
ensures borrowed connections are always returned to the pool.
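The borrow-and-return behavior described above might look roughly like the following. This is a minimal sketch, not the proposed implementation: SketchPool, its constructor arguments, the factory callable, and idle_count are all assumptions made for illustration, and any object that responds to finish can stand in for a Net::HTTP connection.

```ruby
# Illustrative pool; everything except the with_connection idea is assumed.
class SketchPool
  def initialize(keepalive_seconds,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @keepalive = keepalive_seconds
    @clock = clock
    # site => [[connection, expires_at], ...]
    @idle = Hash.new { |hash, site| hash[site] = [] }
  end

  # Borrows a connection for the site (or builds one with the factory),
  # yields it, and returns it to the pool even if the block raises.
  def with_connection(site, factory)
    conn = borrow(site) || factory.call
    begin
      yield conn
    ensure
      release(site, conn)
    end
  end

  def idle_count(site)
    @idle[site].size
  end

  private

  def borrow(site)
    now = @clock.call
    while (entry = @idle[site].pop)
      conn, expires_at = entry
      return conn if expires_at > now

      conn.finish # expired: close it and keep looking
    end
    nil
  end

  def release(site, conn)
    @idle[site].push([conn, @clock.call + @keepalive])
  end
end
```

Putting the release in an ensure clause is what makes the guarantee hold. A real pool would likely also want to discard, rather than reuse, a connection whose request failed partway through.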
A route describes how to reach a REST service. It includes the API prefix, DNS SRV service name, and the puppet server and port settings for that service.
A service represents an instance of a puppet web service. It includes the URL used to connect
to the service, such as https://puppet:8140/puppet/v3. There are four
services: ca, report, fileserver, and the default puppet.
The ca and report services handle certs and reports, respectively. The
fileserver service handles puppet file metadata and content requests, such as
pluginsync and file resources with source => 'puppet://'. The puppet service
handles nodes, facts, and catalogs, and is also the fallback for the other three
services.
Each service is responsible for serializing/deserializing the HTTP entity into a
domain object. It uses the existing Puppet::Network::Format code to do so.
Each resolver represents a different strategy for resolving a service name into a list of candidate servers and ports.
A session represents an HTTP session through which services may be connected to and accessed.
It has a Session#route_to method to route to a web service based on the requested
service name and client configuration:
```ruby
client = Puppet::HTTP::Client.new
session = client.create_session
service = session.route_to(:ca)
cert = service.get_certificate('foo')
puts "Retrieved cert #{cert.subject.to_utf8} from #{service.url}"
```
The Session#route_to(:ca) method (above) returns an instance of
Puppet::HTTP::Service::Ca, which has methods appropriate for that type of
service. All services extend Puppet::HTTP::Service.
If an explicit server and port are specified on the command line or
configuration, such as puppet agent -t --server foo.example.com, then the
Session#route_to method will always return a Service with that host and port.
Otherwise, the session will walk the list of resolvers in priority order: DNS SRV, server list, then puppet server and port settings.
If the route_to method attempts to connect to a service, but it results in an
exception, such as "connection refused", then the session will attempt the next
service.
If the caller successfully uses a service, then the session will return the same
service the next time route_to is called.
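The fallback and reuse rules above can be sketched as follows. SketchSession, the resolver protocol (a callable that returns candidate services), and the connect method on a service are assumptions made for illustration. For simplicity the sketch remembers a service as soon as it connects, rather than after the caller's first successful request.

```ruby
# Illustrative only: not the proposed Puppet::HTTP session class.
class SketchSession
  def initialize(resolvers)
    @resolvers = resolvers # callables, in priority order
    @resolved = {}         # service name => service that last worked
  end

  def route_to(name)
    return @resolved[name] if @resolved.key?(name)

    errors = []
    @resolvers.each do |resolver|
      resolver.call(name).each do |service|
        begin
          service.connect
          return @resolved[name] = service
        rescue StandardError => e
          errors << e # e.g. "connection refused"; try the next candidate
        end
      end
    end
    raise "No more routes to #{name}: #{errors.map(&:message).join(', ')}"
  end
end
```

Collecting the per-candidate errors means the final failure can report why every route was rejected, which is hard to reconstruct from the current context-based approach.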
The DNS SRV resolver performs an SRV lookup, and randomly selects one of the targets based on the weight of each entry in the SRV record. A target with weight 2 would be twice as likely to be chosen as a target with weight 1.
```ruby
client = Puppet::HTTP::Client.new(use_srv: true, srv_domain: 'puppet.example.com')
session = client.create_session
service = session.route_to(:ca)
# service.url is "https://compiler1.puppet.example.com:8140"
```
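The weighted choice among SRV targets could be implemented along these lines. This is a minimal sketch: SrvTarget and pick_weighted are hypothetical names, the SRV priority field is ignored, and weights are assumed to be non-negative integers.

```ruby
# Hypothetical record type; real SRV records also carry a priority field.
SrvTarget = Struct.new(:host, :port, :weight)

# Picks one target with probability proportional to its weight.
# `random` is injectable so the choice can be made deterministic in tests.
def pick_weighted(targets, random: Random.new)
  total = targets.sum(&:weight)
  raise ArgumentError, 'no targets with positive weight' unless total.positive?

  point = random.rand(total) # integer in 0...total
  targets.each do |target|
    return target if point < target.weight

    point -= target.weight
  end
end
```

With weights 1 and 2 there are three equally likely values of point, one of which lands on the first target and two on the second, which gives the 2:1 ratio described above.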
The server list resolver selects the first available server using puppetserver's
simple status endpoint. This applies when routing requests to the :puppet
service, as well as to any service whose server and port are the same as those
of the :puppet service, for example when :ca_server and :report_server have not
been overridden.
```ruby
client = Puppet::HTTP::Client.new(server_list: ['compiler1', 'compiler2'])
session = client.create_session
service = session.route_to(:puppet)
# service.url is "https://compiler1:8140"
```
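Selecting the first available server from the list might be sketched like this. The first_available helper and its available callable are hypothetical: the callable stands in for a request to the status endpoint, and the default port of 8140 is taken from the examples in this document.

```ruby
# Returns [host, port] for the first server whose availability check
# succeeds, or nil if none do. Entries may be a host name or a
# [host, port] pair.
def first_available(server_list, available:, default_port: 8140)
  server_list.each do |entry|
    host, port = Array(entry)
    port ||= default_port
    begin
      return [host, port] if available.call(host, port)
    rescue StandardError
      next # treat errors such as "connection refused" as unavailable
    end
  end
  nil
end
```

Passing the check in as a callable keeps the selection logic independent of how the status request is actually made.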
The settings resolver selects a route based on the puppet settings for that service:
| service | server setting | port setting |
|---|---|---|
| ca | ca_server | ca_port |
| fileserver | server | serverport |
| report | report_server | report_port |
| puppet | server | serverport |
For example, route_to(:report) would use Puppet[:report_server] and
Puppet[:report_port].
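The table above amounts to a lookup from service name to a pair of setting names. A minimal sketch, assuming a plain hash in place of Puppet[...] and a hypothetical resolve_from_settings helper:

```ruby
# Mirrors the table above: service name => [server setting, port setting].
SERVICE_SETTINGS = {
  ca:         [:ca_server, :ca_port],
  fileserver: [:server, :serverport],
  report:     [:report_server, :report_port],
  puppet:     [:server, :serverport]
}.freeze

# `settings` stands in for Puppet[...]; any object with fetch works.
def resolve_from_settings(name, settings)
  server_key, port_key = SERVICE_SETTINGS.fetch(name)
  [settings.fetch(server_key), settings.fetch(port_key)]
end
```

Using fetch makes an unknown service name or a missing setting fail loudly with a KeyError instead of silently producing a nil host or port.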
There are some variations in how the different services are routed. Here is a visual of how the CA service is routed. We have to preserve some interesting behavior with this service, but otherwise the flow is similar to that of other services.
Puppet agents support downloading file content from 3rd party file servers,
which reduces load on the compiler. The Client will provide a low-level API
for making GET requests for an arbitrary URL, and streaming the response body.
Puppet only trusts the puppet PKI for its REST requests. However, it should be possible to additionally trust the system store when making HTTPS requests:
```ruby
client = Puppet::HTTP::Client.new
response = client.get("https://artifactory.example.com/java.tar.gz", options: { include_system_store: true })
response.read_body do |data|
  # data is one chunk of the streamed body; bytesize is its length in bytes
  puts "Read #{data.bytesize} bytes"
end
```
Puppet ruby code running in puppetserver sometimes makes outbound connections, such as from the puppetdb terminus, PE classifier terminus, and 'http' report processor. Currently, puppetserver registers its own http client class so that it can perform the HTTP request using Apache HttpClient.
In order to preserve this capability, puppetserver should have a way of
overriding the get and post methods of Puppet::HTTP::Client to call the
Apache HttpClient instead.
One way might be to create an adapter that overrides Puppet's implementation and delegates to puppetserver's client:
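```ruby
class Puppet::Server::HttpClientAdapter < Puppet::HTTP::Client
  def initialize(http_client)
    # Call the parent with no arguments; a bare `super` would forward http_client.
    super()
    @http_client = http_client
  end

  # Delegate to puppetserver's client. Keyword arguments match the
  # client.get(url, options: { ... }) call shown earlier.
  def get(url, headers: {}, options: {})
    @http_client.get(url, headers, options)
  end
  # etc
end
```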
And register it with puppet:
```ruby
Puppet.push_context(http_client: Puppet::Server::HttpClientAdapter.new(Puppet::Server::HttpClient.new))
```