content/v2.2/guides/troubleshoot-crossplane.md
If you use the Crossplane CLI to install a Provider or
Configuration (for example, crossplane xpkg install provider xpkg.crossplane.io/crossplane-contrib/provider-aws-s3:v2.0.0) and get the server could not find the requested resource error, more often than not, that's an
indicator that your Crossplane CLI needs updating. In other words
Crossplane graduated some API from alpha to beta or stable and the old
plugin isn't aware of this change.
Most Crossplane resources have a status section that can represent the current
state of that particular resource. Running kubectl describe against a
Crossplane resource frequently gives insightful information about its
condition. For example, to determine the status of a GCP CloudSQLInstance
managed resource use kubectl describe for the resource.
kubectl describe cloudsqlinstance my-db
Status:
Conditions:
Last Transition Time: 2019-09-16T13:46:42Z
Reason: Creating
Status: False
Type: Ready
Most Crossplane resources set the Ready condition. Ready represents the
availability of the resource - whether it's creating, deleting, available,
unavailable, binding, etc.
Most Crossplane resources emit events when something interesting happens. You
can see the events associated with a resource by running kubectl describe -
for example, kubectl describe cloudsqlinstance my-db. You can also see all events in a
particular namespace by running kubectl get events.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning CannotConnectToProvider 16s (x4 over 46s) managed/postgresqlserver.database.azure.crossplane.io cannot get referenced ProviderConfig: ProviderConfig.azure.crossplane.io "default" not found
<!-- vale Google.Headings = NO --> <!-- vale Microsoft.Headings = NO -->Note that Kubernetes namespaces events, while most Crossplane resources (XRs, etc) are cluster scoped. Crossplane emits events for cluster scoped resources to the 'default' namespace.
The next place to look to get more information or investigate a failure would be
in the Crossplane pod logs, which should be running in the crossplane-system
namespace. To get the current Crossplane logs, run the following:
kubectl -n crossplane-system logs -lapp=crossplane
Note that Crossplane emits minimal logs by default - events are typically the best place to look for information about what Crossplane is doing. You may need to restart Crossplane with the
--debugflag if you can't find what you're looking for.
Remember that providers provide much of Crossplane's features. You
can use kubectl logs to view provider logs too. By convention, they also emit
minimal logs by default.
kubectl -n crossplane-system logs <name-of-provider-pod>
All providers maintained by the Crossplane community mirror Crossplane's support
of the --debug flag. The easiest way to set flags on a provider is to create a
DeploymentRuntimeConfig and reference it from the Provider:
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
name: debug-config
spec:
deploymentTemplate:
spec:
selector: {}
template:
spec:
containers:
- name: package-runtime
args:
- --debug
---
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
name: crossplane-contrib-provider-aws-s3
spec:
package: xpkg.crossplane.io/crossplane-contrib/provider-aws-s3:v2.0.0
runtimeConfigRef:
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
name: debug-config
Note that you can add a reference to a
DeploymentRuntimeConfigto an already installedProviderand it updates itsDeploymentaccordingly.
Sometimes, for example when you encounter a bug, it can be useful to pause Crossplane if you want to stop it from actively attempting to manage your resources. To pause Crossplane without deleting all its resources, run the following command to scale down its deployment:
kubectl -n crossplane-system scale --replicas=0 deployment/crossplane
After you have been able to rectify the problem or smooth things out, you can unpause Crossplane by scaling its deployment back up:
kubectl -n crossplane-system scale --replicas=1 deployment/crossplane
You can also pause Providers when troubleshooting an issue or orchestrating a
complex migration of resources. Creating and referencing a DeploymentRuntimeConfig is
the easiest way to scale down a provider, and you can change the DeploymentRuntimeConfig or remove the reference to scale it back up:
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
metadata:
name: scale-config
spec:
deploymentTemplate:
spec:
selector: {}
replicas: 0
template: {}
---
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
name: crossplane-contrib-provider-aws-s3
spec:
package: xpkg.crossplane.io/crossplane-contrib/provider-aws-s3:v2.0.0
runtimeConfigRef:
apiVersion: pkg.crossplane.io/v1beta1
kind: DeploymentRuntimeConfig
name: scale-config
Note that you can add a reference to a
DeploymentRuntimeConfigto an already installedProviderand it updates itsDeploymentaccordingly.
The resources that Crossplane manages are automatically cleaned up so as not to leave anything running behind. Crossplane accomplishes this by using finalizers, but in certain scenarios the finalizer can prevent the Kubernetes object from getting deleted.
To deal with this, patch the object to remove its finalizer, which then allows Kubernetes to delete it. Note that this doesn't necessarily delete the external resource that Crossplane was managing, so you want to go to your cloud provider's console and look there for any lingering resources to clean up.
In general, you can remove a finalizer from an object with this command:
kubectl patch <resource-type> <resource-name> -p '{"metadata":{"finalizers": []}}' --type=merge
For example, for a CloudSQLInstance managed resource (database.gcp.crossplane.io) named
my-db, you can remove its finalizer with:
kubectl patch cloudsqlinstance my-db -p '{"metadata":{"finalizers": []}}' --type=merge
Crossplane includes a circuit breaker mechanism to prevent reconciliation thrashing. Thrashing occurs when controllers fight over composed resource state or enter tight reconciliation loops that could impact cluster performance.
<!-- vale alex.ProfanityUnlikely = YES -->Each Composite Resource (XR) has its own token bucket-based circuit breaker that monitors reconciliation rates:
<!-- vale Crossplane.Spelling = YES -->When an XR receives too many watch events (exceeding the burst and refill rate), the circuit breaker opens and blocks most reconciliation requests. While the circuit is open, Crossplane allows one request every 30 seconds to probe for recovery.
<!-- vale write-good.Weasel = YES -->XRs have a Responsive condition that tracks circuit breaker state. When the circuit breaker opens, this condition changes to False:
conditions:
- type: Responsive
status: "False"
reason: WatchCircuitOpen
message: "Too many watch events from ConfigMap/my-config (default). Allowing events periodically."
The message identifies which resource is causing excessive watch events, helping you pinpoint the source of thrashing.
Watch events occur when resources change in your cluster. Excessive watch events typically indicate composition patterns that cause loops, such as resources updating each other in cycles or external systems reverting changes made by Crossplane.
To identify the source of excessive watch events:
The XR's Responsive condition message identifies the problematic resource. Monitor this resource for modification events:
kubectl get <resource-kind> <resource-name> -n <namespace> --output-watch-events --watch-only
Common root causes and fixes:
<!-- vale write-good.TooWordy = NO -->Investigate and fix the root cause before adjusting circuit breaker thresholds.
<!-- vale write-good.TooWordy = YES -->The default circuit breaker settings work well for most environments. You may need to adjust them based on your composition patterns and cluster size.
<!-- vale write-good.Weasel = YES --> <!-- vale write-good.TooWordy = YES --> <!-- vale Crossplane.Spelling = NO --> <!-- vale Microsoft.Adverbs = NO -->For example, increase the burst and refill rate for large-scale deployments with XRs updating frequently, or decrease them if you want stricter protection against thrashing.
<!-- vale Crossplane.Spelling = YES --> <!-- vale Microsoft.Adverbs = YES -->Configure circuit breaker parameters using Crossplane startup arguments via Helm:
helm install crossplane --namespace crossplane-system --create-namespace crossplane-stable/crossplane \
--set args='{"--circuit-breaker-burst=500.0","--circuit-breaker-refill-rate=5.0","--circuit-breaker-cooldown=1m"}'
Available parameters:
--circuit-breaker-burst: Maximum burst of events (default: 100.0)--circuit-breaker-refill-rate: Events per second for sustained rate (default: 1.0)--circuit-breaker-cooldown: Duration to keep circuit open (default: 5m0s)Track circuit breaker activity using these Prometheus metrics.
<!-- vale write-good.TooWordy = YES -->See the [Metrics guide]({{< ref "metrics#circuit-breaker-metrics" >}}) for detailed metric information.
This section covers some common tips, tricks, and troubleshooting steps for working with Composite Resources. If you're trying to track down why your Composite Resources aren't working the [Troubleshooting][trouble-ref] page also has some useful information.
<!-- Named Links -->[Crossplane package]: {{<ref "../packages/configurations/">}} [Handling Crossplane Package Dependency]: #handling-crossplane-package-dependency [semver spec]: https://github.com/Masterminds/semver#basic-comparisons