docs/sources/visualizations/dashboards/build-dashboards/best-practices/index.md
This section provides information about best practices for intermediate Grafana administrators and users about how to build and maintain Grafana dashboards.
For more information about the different kinds of dashboards you can create, refer to Grafana dashboards: A complete guide to all the different types you can build.
When you have a lot to monitor, like a server farm, you need a strategy to decide what is important enough to monitor. This page describes several common methods for choosing what to monitor.
A logical strategy allows you to make uniform dashboards and scale your observability platform more easily.
USE stands for:
This method is best for hardware resources in infrastructure, such as CPU, memory, and network devices. For more information, refer to The USE Method.
RED stands for:
This method is most applicable to services, especially a microservices environment. For each of your services, instrument the code to expose these metrics for each component. RED dashboards are good for alerting and SLAs. A well-designed RED dashboard is a proxy for user experience.
For more information, refer to Tom Wilkie's blog post The RED method: How to instrument your services.
According to the Google SRE handbook, if you can only measure four metrics of your user-facing system, focus on these four.
This method is similar to the RED method, but it includes saturation.
{{< docs/play title="The Four Golden Signals" url="https://play.grafana.org/d/000000109/" >}}
Dashboard management maturity refers to how well-designed and efficient your dashboard ecosystem is. It's recommended that you periodically review your dashboard setup to gauge where you are and how you can improve.
Broadly speaking, dashboard maturity can be defined as low, medium, or high.
Much of the content for this topic was taken from the KubeCon 2019 talk Fool-Proof Kubernetes Dashboards for Sleep-Deprived Oncalls.
At this stage, you have no coherent dashboard management strategy. Almost everyone starts here.
How can you tell you are here?
At this stage, you are starting to manage your dashboard use with methodical dashboards. You might have laid out a strategy, but there are some things you could improve.
How can you tell you are here?
Prevent sprawl by using template variables. For example, you don't need a separate dashboard for each node, you can use query variables. Even better, you can make the data source a template variable too, so you can reuse the same dashboard across different clusters and monitoring backends.
Refer to the list of Variable examples if you want some ideas.
Methodical dashboards according to an observability strategy.
Hierarchical dashboards with drill-downs to the next level.
{{< figure class="float-right" max-width="100%" src="/static/img/docs/best-practices/drill-down-example.png" caption="Example of using drill-down" >}}
Dashboard design reflects service hierarchies. The example shown below uses the RED method (request and error rate on the left, latency duration on the right) with one row per service. The row order reflects the data flow.
{{< figure class="float-right" max-width="100%" src="/static/img/docs/best-practices/service-hierarchy-example.png" caption="Example of a service hierarchy" >}}
Compare like to like: split service dashboards when the magnitude differs. Make sure aggregated metrics don't drown out important information.
Expressive charts with meaningful use of color and normalizing axes where you can.
Directed browsing cuts down on "guessing."
Version-controlled dashboard JSON.
At this stage, you have optimized your dashboard management use with a consistent and thoughtful strategy. It requires maintenance, but the results are worth it.
This page outlines some best practices to follow when creating Grafana dashboards.
Here are some principles to consider before you create a dashboard.
What story are you trying to tell with your dashboard? Try to create a logical progression of data, such as large to small or general to specific. What is the goal for this dashboard? (Hint: If the dashboard doesn't have a goal, then ask yourself if you really need the dashboard.)
Keep your graphs simple and focused on answering the question that you are asking. For example, if your question is "which servers are in trouble?", then maybe you don't need to show all the server data. Just show data for the ones in trouble.
Cognitive load is basically how hard you need to think about something in order to figure it out. Make your dashboard easy to interpret. Other users and future you (when you're trying to figure out what broke at 2 AM) will appreciate it.
Ask yourself:
It's easy to make new dashboards. It's harder to optimize dashboard creation and adhere to a plan, but it's worth it. This strategy should govern both your overall dashboard scheme and enforce consistency in individual dashboard design.
Refer to Common observability strategies and Dashboard management maturity levels for more information.
Once you have a strategy or design guidelines, write them down to help maintain consistency over time. Check out this Wikimedia runbook example.
TEST or TMP in the name.i in the top left corner of the panel.This page outlines some best practices to follow when managing Grafana dashboards.
Here are some principles to consider before you start managing dashboards.
There are several common observability strategies. You should research them and decide whether one of them works for you or if you want to come up with your own. Either way, have a plan, write it down, and stick to it.
Adapt your strategy to changing needs as necessary.
What is your dashboard maturity level? Analyze your current dashboard setup and compare it to the Dashboard management maturity model. Understanding where you are can help you decide how to get to where you want to be.
TEST:. Delete the dashboard when you are finished.