DevOps

Questions

General

<details> <summary>What is DevOps?</summary>

The definition of DevOps from selected companies:

Amazon:

"DevOps is the combination of cultural philosophies, practices, and tools that increases an organization’s ability to deliver applications and services at high velocity: evolving and improving products at a faster pace than organizations using traditional software development and infrastructure management processes. This speed enables organizations to better serve their customers and compete more effectively in the market."

Microsoft:

"DevOps is the union of people, process, and products to enable continuous delivery of value to our end users. The contraction of “Dev” and “Ops” refers to replacing siloed Development and Operations to create multidisciplinary teams that now work together with shared and efficient practices and tools. Essential DevOps practices include agile planning, continuous integration, continuous delivery, and monitoring of applications."

Red Hat:

"DevOps describes approaches to speeding up the processes by which an idea (like a new software feature, a request for enhancement, or a bug fix) goes from development to deployment in a production environment where it can provide value to the user. These approaches require that development teams and operations teams communicate frequently and approach their work with empathy for their teammates. Scalability and flexible provisioning are also necessary. With DevOps, those that need power the most, get it—through self service and automation. Developers, usually coding in a standard development environment, work closely with IT operations to speed software builds, tests, and releases—without sacrificing reliability."

Google:

"...The organizational and cultural movement that aims to increase software delivery velocity, improve service reliability, and build shared ownership among software stakeholders" </details>

<details> <summary>What are the benefits of DevOps? What can it help us to achieve?</summary>

Collaboration
Improved delivery
Security
Speed
Scale
Reliability </details>

<details> <summary>What are the anti-patterns of DevOps?</summary>

A couple of examples:

One person is in charge of specific tasks. For example, there is only one person who is allowed to merge everyone else's code into the repository.
Treating production differently from the development environment. For example, not implementing security in the development environment.
Not allowing someone to push to production on Friday ;) </details>

<details> <summary>How would you describe a successful DevOps engineer or a team?</summary>

The answer can focus on:

Collaboration
Communication
Set up and improve workflows and processes (related to testing, delivery, ...)
Dealing with issues

Things to think about:

What DevOps teams or engineers should NOT focus on or do?
Do DevOps teams or engineers have to be innovative or practice innovation as part of their role? </details>

<details> <summary>One of your team members suggests setting a goal of "deploying at least 20 times a day" with regard to CD. What is your take on that?</summary>

A couple of thoughts:

Why is it an important goal? Is it affecting the business somehow? Is it one of the KPIs? In other words, does it matter?
This might introduce risks such as losing quality in favor of quantity
You might want to set a possibly better goal such as "be able to deploy whenever we need to deploy" </details>

Tooling

<details> <summary>What do you take into consideration when choosing a tool/technology?</summary>

A few ideas to think about:

mature/stable vs. cutting edge
community size
architecture aspects - agent vs. agentless, master vs. masterless, etc.
learning curve </details>

<details> <summary>Can you describe which tool or platform you chose to use in some of the following areas and how?

CI/CD
Provisioning infrastructure
Configuration Management
Monitoring & alerting
Logging
Code review
Code coverage
Issue Tracking
Containers and Containers Orchestration
Tests</summary>

This is a more practical version of the previous question where you might be asked additional specific questions on the technology you chose

CI/CD - Jenkins, Circle CI, Travis, Drone, Argo CD, Zuul
Provisioning infrastructure - Terraform, CloudFormation
Configuration Management - Ansible, Puppet, Chef
Monitoring & alerting - Prometheus, Nagios
Logging - Logstash, Graylog, Fluentd
Code review - Gerrit, Review Board
Code coverage - Cobertura, Clover, JaCoCo
Issue tracking - Jira, Bugzilla
Containers and Containers Orchestration - Docker, Podman, Kubernetes, Nomad
Tests - Robot, Serenity, Gauge </details>

<details> <summary>One of your team members suggests replacing the current CI/CD platform used by the organization with a new one. How would you reply?</summary>

Things to think about:

What do we gain from doing so? Are there new features in the new platform? Does the new platform deal with some of the limitations of the current platform?
What is this suggestion based on? In other words, did he/she try out the new platform? Was there extensive technical research?
What will the switch from one platform to another require from the organization? For example, training the users of the platform? How much time does the team have to invest in such a move? </details>

Version Control

<details> <summary>What is Version Control?</summary>

Version control is the system of tracking and managing changes to software code.
It helps software teams to manage changes to source code over time.
Version control also helps developers move faster and allows software teams to preserve efficiency and agility as the team scales to include more developers. </details>

<details> <summary>What is a commit?</summary>

In Git, a commit is a snapshot of your repo at a specific point in time.
The git commit command will save all staged changes, along with a brief description from the user, in a “commit” to the local repository. </details>

<details> <summary>What is a merge?</summary>

Merging is Git's way of putting a forked history back together again. The git merge command lets you take the independent lines of development created by git branch and integrate them into a single branch. </details>

<details> <summary>What is a merge conflict?</summary>

A merge conflict is an event that occurs when Git is unable to automatically resolve differences in code between two commits. When all the changes in the code occur on different lines or in different files, Git will successfully merge commits without your help. </details>
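
To illustrate the merge conflict answer above, here is a minimal sketch of a line-based three-way merge. It is not Git's actual algorithm: the function name `three_way_merge` is hypothetical, and the sketch assumes all three versions have the same number of lines, so it only models in-place edits.

```python
def three_way_merge(base, ours, theirs):
    """Merge two edited copies of `base`, line by line.

    Simplification: all three versions must have the same number of lines,
    so this only models in-place edits, not insertions or deletions.
    """
    if not (len(base) == len(ours) == len(theirs)):
        raise ValueError("this sketch only handles equal-length versions")

    merged, conflicts = [], []
    for number, (b, o, t) in enumerate(zip(base, ours, theirs), start=1):
        if o == t:        # both sides agree (or neither changed the line)
            merged.append(o)
        elif o == b:      # only "theirs" changed this line
            merged.append(t)
        elif t == b:      # only "ours" changed this line
            merged.append(o)
        else:             # both sides changed the same line differently
            conflicts.append(number)
            merged.append(f"<<<<<<< ours\n{o}\n=======\n{t}\n>>>>>>> theirs")
    return merged, conflicts


base = ["host = localhost", "port = 80", "debug = false"]
ours = ["host = localhost", "port = 8080", "debug = false"]
theirs = ["host = example.org", "port = 80", "debug = false"]

# Edits on different lines merge cleanly.
clean, clean_conflicts = three_way_merge(base, ours, theirs)

# Both sides edit line 2, so the merge needs human help.
both_edit_port = ["host = localhost", "port = 443", "debug = false"]
_, conflicts = three_way_merge(base, ours, both_edit_port)

print(clean)
print(conflicts)
```

The first merge succeeds because the two edits touch different lines; the second reports line 2 as conflicting because both sides changed it in different ways.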

<details> <summary>What best practices are you familiar with regarding version control?</summary>

Use a descriptive commit message
Make each commit a logical unit
Incorporate others' changes frequently
Share your changes frequently
Coordinate with your co-workers
Don't commit generated files
Don't commit binary files </details>

<details> <summary>Would you prefer a "configuration->deployment" model or "deployment->configuration"? Why?</summary>

Both have advantages and disadvantages. With the "configuration->deployment" model, for example, where you build one image to be used by multiple deployments, there is less chance of deployments differing from one another, so it has the clear advantage of a consistent environment. </details>

<details> <summary>Explain mutable vs. immutable infrastructure</summary>

In the mutable infrastructure paradigm, changes are applied on top of the existing infrastructure, and over time the infrastructure builds up a history of changes. Ansible, Puppet and Chef are examples of tools which follow the mutable infrastructure paradigm.

In the immutable infrastructure paradigm, every change is actually a new infrastructure. So a change to a server will result in a new server instead of an update to the existing one. Terraform is an example of a technology which follows the immutable infrastructure paradigm. </details>
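
The mutable vs. immutable distinction above can be sketched in a few lines of Python. This is only a toy model: the `Server` class, its fields, and the version strings are hypothetical and do not correspond to any real tool's API.

```python
from dataclasses import dataclass, field
from itertools import count

_next_id = count(1)  # hypothetical source of unique server IDs


@dataclass
class Server:
    nginx_version: str
    server_id: int = field(default_factory=lambda: next(_next_id))
    history: list = field(default_factory=list)


def upgrade_in_place(server, version):
    """Mutable style: change the existing server; it accumulates history."""
    server.history.append(server.nginx_version)
    server.nginx_version = version
    return server


def replace_server(server, version):
    """Immutable style: build a fresh server; the old one would be destroyed."""
    return Server(nginx_version=version)


old = Server("1.24")
mutated = upgrade_in_place(old, "1.26")   # same object, now with a history
replaced = replace_server(old, "1.27")    # new object, new ID, no history

print(mutated is old, mutated.history)
print(replaced.server_id != old.server_id, replaced.history)
```

The mutated server keeps its identity and carries a record of past changes, while the replacement starts from a clean, known state.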

Software Distribution

<details> <summary>Explain "Software Distribution"</summary>

Read this fantastic article on the topic.

From the article: "Thus, software distribution is about the mechanism and the community that takes the burden and decisions to build an assemblage of coherent software that can be shipped." </details>

<details> <summary>Why are there multiple software distributions? What differences can they have?</summary>

Different distributions can focus on different things: different environments (server vs. mobile vs. desktop), support for specific hardware, specialization in different domains (security, multimedia, ...), etc. Basically, different aspects of the software and what it supports get different priority in each distribution. </details>

<details> <summary>What is a Software Repository?</summary>

Wikipedia: "A software repository, or “repo” for short, is a storage location for software packages. Often a table of contents is stored, as well as metadata."

How communication between a web server and a web browser is established:

Whenever a browser needs a file that is hosted on a web server, the browser requests it from the web server and the web server responds with that file. This communication happens in the following steps:

(1) The user enters the domain name in the browser, and the browser then looks up the IP address for that name. It can do this in two ways:

* By searching its cache.
* By querying one or more DNS (Domain Name System) servers.

(2) Once it knows the IP address, the browser requests the file via HTTP, and the request reaches the correct (hardware) web server.

(3) The (software) HTTP server accepts the request, finds the requested document, and sends it back to the browser, also through HTTP. (If the server doesn't find the requested document, it returns a 404 response instead.)

(4) The browser finally receives the web page and displays it, or displays the error message.
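
The four steps above can be sketched with Python's standard library. This is a simplified illustration, not how a real browser works: the host name `demo.local`, the `PAGES` table, and the pre-filled cache are assumptions made so the example runs locally without a real DNS server.

```python
import http.client
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGES = {"/index.html": b"<h1>hello</h1>"}  # hypothetical documents


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGES.get(self.path)
        if body is None:              # step 3: document not found
            self.send_error(404)
            return
        self.send_response(200)       # step 3: document found
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):     # keep the demo output quiet
        pass


def resolve(hostname, cache):
    """Step 1: check the cache first, then fall back to a DNS lookup."""
    if hostname not in cache:
        cache[hostname] = socket.gethostbyname(hostname)
    return cache[hostname]


def fetch(hostname, port, path, cache):
    """Steps 2 and 4: send the HTTP request and read the response."""
    ip = resolve(hostname, cache)
    conn = http.client.HTTPConnection(ip, port, timeout=5)
    try:
        conn.request("GET", path, headers={"Host": hostname})
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


server = HTTPServer(("127.0.0.1", 0), Handler)   # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

cache = {"demo.local": "127.0.0.1"}  # assumed cache entry, so no DNS is needed
ok_status, ok_body = fetch("demo.local", port, "/index.html", cache)
missing_status, _ = fetch("demo.local", port, "/nope.html", cache)

server.shutdown()
server.server_close()
print(ok_status, ok_body, missing_status)
```

The first request returns the stored page with status 200, and the second returns 404 because the path is not in `PAGES`.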

</details>

<details> <summary>Explain "Open Source"</summary> </details> <details> <summary>Describe the architecture of service/app/project/... you designed and/or implemented</summary> </details> <details> <summary>What types of tests are you familiar with?</summary>

Styling, unit, functional, API, integration, smoke, scenario, ...

You should be able to explain those that you mention. </details>

<details> <summary>You need to periodically install a package (unless it already exists) on different operating systems (Ubuntu, RHEL, ...). How would you do it?</summary>

There are multiple ways to answer this question (there is no right and wrong here):

Simple cron job
Pipeline with configuration management technology (such as Puppet, Ansible, Chef, etc.) ... </details>
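
For the "simple cron job" option in the package installation question above, the script that cron runs could look roughly like the sketch below. Several things here are assumptions: the `INSTALL_COMMANDS` table, the use of `dnf` on RHEL (older releases use `yum`), and treating "a binary is on the PATH" as a stand-in for "the package is installed".

```python
import shutil
import subprocess

# Assumed mapping from the ID field of /etc/os-release to an install command.
INSTALL_COMMANDS = {
    "ubuntu": ["apt-get", "install", "-y"],
    "debian": ["apt-get", "install", "-y"],
    "rhel": ["dnf", "install", "-y"],
    "fedora": ["dnf", "install", "-y"],
}


def read_os_id(path="/etc/os-release"):
    """Return the distribution ID, e.g. 'ubuntu' or 'rhel'."""
    with open(path) as f:
        for line in f:
            key, _, value = line.strip().partition("=")
            if key == "ID":
                return value.strip('"')
    raise RuntimeError("no ID field found")


def install_command(os_id, package):
    try:
        return INSTALL_COMMANDS[os_id] + [package]
    except KeyError:
        raise ValueError(f"unsupported OS: {os_id}") from None


def ensure_installed(package, binary, os_id,
                     which=shutil.which, run=subprocess.run):
    """Install `package` only if `binary` is missing. Returns True if it ran."""
    if which(binary):
        return False                       # already there: nothing to do
    run(install_command(os_id, package), check=True)
    return True


if __name__ == "__main__":
    # Dry run: print the command instead of executing it.
    ensure_installed("htop", "htop", "ubuntu",
                     run=lambda cmd, check: print("would run:", cmd))
```

The check before installing is what makes the job safe to run on a schedule: repeated runs do nothing once the package is present.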

<details> <summary>What is Chaos Engineering?</summary>

Wikipedia: "Chaos engineering is the discipline of experimenting on a software system in production in order to build confidence in the system's capability to withstand turbulent and unexpected conditions"

Read about Chaos Engineering here </details>

<details> <summary>What is "infrastructure as code"? What implementation of IAC are you familiar with?</summary>

IAC (infrastructure as code) is a declarative approach to defining the infrastructure or architecture of a system. Some implementations are ARM templates for Azure and Terraform, which can work across multiple cloud providers. </details>

<details> <summary>What benefits does infrastructure-as-code have?</summary>

fully automated process of provisioning, modifying and deleting your infrastructure
version control for your infrastructure which allows you to quickly rollback to previous versions
validate infrastructure quality and stability with automated tests and code reviews
makes infrastructure tasks less repetitive </details>

<details> <summary>How do you manage build artifacts?</summary>

Build artifacts are usually stored in a repository. They can be used in release pipelines for deployment purposes. Usually there is a retention period on the build artifacts. </details>
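
As a rough illustration of the retention period mentioned in the build artifacts answer above, the sketch below selects artifacts that are older than a cutoff. The function name, the 30-day default, and the rule of always keeping the newest artifact are assumptions; real artifact repositories have their own policy settings.

```python
from datetime import datetime, timedelta


def expired_artifacts(artifacts, now, retention_days=30, keep_latest=1):
    """Return names of artifacts older than the retention period.

    `artifacts` is a list of (name, build_time) pairs. The newest
    `keep_latest` artifacts are never reported, whatever their age.
    """
    cutoff = now - timedelta(days=retention_days)
    newest_first = sorted(artifacts, key=lambda a: a[1], reverse=True)
    protected = {name for name, _ in newest_first[:keep_latest]}
    return [name for name, built in newest_first
            if built < cutoff and name not in protected]


artifacts = [
    ("app-1.0.tar.gz", datetime(2024, 1, 1)),
    ("app-1.1.tar.gz", datetime(2024, 2, 20)),
    ("app-0.9.tar.gz", datetime(2023, 12, 1)),
]
to_delete = expired_artifacts(artifacts, now=datetime(2024, 3, 1))
print(to_delete)
```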

<details> <summary>What Continuous Integration solution are you using/prefer and why?</summary> </details> <details> <summary>What deployment strategies are you familiar with or have used?</summary>

There are several deployment strategies:
* Rolling
* Blue green deployment
* Canary releases
* Recreate strategy
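
As one example from the list above, a rolling deployment can be sketched as updating a fleet in small batches and stopping at the first unhealthy batch. This is a minimal model under stated assumptions: `rolling_update`, the `healthy` callback, and the instance names are hypothetical, and real orchestrators add concerns such as draining traffic and surge capacity.

```python
def rolling_update(instances, new_version, batch_size, healthy):
    """Update `instances` (name -> version) in place, one batch at a time.

    Returns (completed_batches, success). If any instance in a batch fails
    its health check, that batch is rolled back and the rollout stops.
    """
    names = sorted(instances)
    completed = []
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        previous = {name: instances[name] for name in batch}
        for name in batch:
            instances[name] = new_version
        if not all(healthy(name, new_version) for name in batch):
            instances.update(previous)   # roll this batch back and stop
            return completed, False
        completed.append(batch)
    return completed, True


fleet = {f"web-{i}": "v1" for i in range(1, 5)}
batches, ok = rolling_update(fleet, "v2", batch_size=2,
                             healthy=lambda name, version: True)

# A second fleet where web-3 fails its health check on the new version.
broken_fleet = {f"web-{i}": "v1" for i in range(1, 5)}
partial, partial_ok = rolling_update(
    broken_fleet, "v2", batch_size=2,
    healthy=lambda name, version: name != "web-3")

print(batches, ok)
print(partial, partial_ok, broken_fleet)
```

Because only one batch changes at a time, the rest of the fleet keeps serving traffic during the rollout, and a bad release affects at most one batch before it is stopped.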

</details>

<details> <summary>You joined a team where everyone is developing one project, and the practice is to run tests locally on their workstations and push to the repository if the tests pass. What is the problem with the process as it is now, and how can it be improved?</summary> </details> <details> <summary>Explain test-driven development (TDD)</summary> </details> <details> <summary>Explain agile software development</summary> </details> <details> <summary>What do you think about the following sentence?: "Implementing or practicing DevOps leads to more secure software"</summary> </details> <details> <summary>Do you know what a "post-mortem meeting" is? What is your opinion on that?</summary> </details> <details> <summary>What is a configuration drift? What problems is it causing?</summary>

Configuration drift happens when, in an environment of servers with the exact same configuration and software, one or more servers receive updates or configuration changes that the other servers don't get, and over time these servers become slightly different from all the others.

This situation might lead to bugs which are hard to identify and reproduce. </details>
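
One way to picture the configuration drift described above is to compare each server's settings against a shared baseline. The sketch below assumes configuration can be represented as a flat dictionary; `find_drift` and the sample settings are hypothetical.

```python
def find_drift(baseline, servers):
    """Return {server: {key: (expected, actual)}} for settings that differ."""
    drift = {}
    for name, actual in servers.items():
        diff = {
            key: (baseline.get(key), actual.get(key))
            for key in baseline.keys() | actual.keys()
            if baseline.get(key) != actual.get(key)
        }
        if diff:
            drift[name] = diff
    return drift


baseline = {"openssl": "3.0.2", "max_connections": 100}
servers = {
    "web-1": {"openssl": "3.0.2", "max_connections": 100},
    # web-2 received a manual update and an extra setting
    "web-2": {"openssl": "3.0.13", "max_connections": 100, "debug": True},
}

drift = find_drift(baseline, servers)
print(drift)
```

Here only `web-2` is reported, with both the changed version and the setting that does not exist in the baseline.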

<details> <summary>How to deal with a configuration drift?</summary> Configuration drift can be avoided with a desired state configuration (DSC) implementation. A desired state configuration can be a declarative file that defines how a system should be configured. There are tools that enforce the desired state, such as Terraform or Azure DSC. There are incremental or complete strategies. </details> <details> <summary>Explain Declarative and Procedural styles. Do the technologies you are familiar with (or use) follow a procedural or a declarative style?</summary>

Declarative - You write code that specifies the desired end state

Procedural - You describe the steps to get to the desired end state

Declarative Tools - Terraform, Puppet, CloudFormation, Ansible

Procedural Tools - Chef

To better emphasize the difference, consider creating two virtual instances/servers. In declarative style, you would specify two servers and the tool will figure out how to reach that state. In procedural style, you need to specify the steps to reach the end state of two instances/servers - for example, create a loop and in each iteration of the loop create one instance (running the loop twice of course). </details>
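
The two-server example from the declarative vs. procedural answer above can be sketched as follows. Both function names are hypothetical, and the "cloud" is just a Python list standing in for real infrastructure.

```python
def create_servers_procedurally(cloud, count):
    """Procedural: spell out the steps (a loop creating one server per pass)."""
    for _ in range(count):
        cloud.append(f"server-{len(cloud) + 1}")


def apply_desired_state(cloud, desired_count):
    """Declarative: state the end result; the tool works out the steps."""
    while len(cloud) < desired_count:
        cloud.append(f"server-{len(cloud) + 1}")
    while len(cloud) > desired_count:
        cloud.pop()


procedural_cloud, declarative_cloud = [], []
for _ in range(2):  # run each "script" twice
    create_servers_procedurally(procedural_cloud, 2)
    apply_desired_state(declarative_cloud, 2)

print(len(procedural_cloud))   # 4
print(len(declarative_cloud))  # 2
```

Running the procedural script twice creates four servers, while applying the same desired state twice still leaves two. In the procedural style, the author has to account for the current state; in the declarative style, the tool does.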

<details> <summary>Do you have experience with testing cross-project changes? (aka cross-dependency)</summary>

Note: cross-dependency is when you have two or more changes to separate projects and you would like to test them in a mutual build instead of testing each change separately. </details>

<details> <summary>Have you contributed to an open source project? Tell me about this experience</summary> </details> <details> <summary>What is Distributed Tracing?</summary> </details>

GitOps

<details> <summary>What is GitOps?</summary>

GitLab: "GitOps is an operational framework that takes DevOps best practices used for application development such as version control, collaboration, compliance, and CI/CD tooling, and applies them to infrastructure automation". </details>

SRE

<details> <summary>What are the differences between SRE and DevOps?</summary>

Google: "One could view DevOps as a generalization of several core SRE principles to a wider range of organizations, management structures, and personnel."

Read more about it here </details>

<details> <summary>What is an SRE team responsible for?</summary>

Google: "the SRE team is responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning of their services"

Read more about it here </details>

<details> <summary>What is an error budget?</summary>

Atlassian: "An error budget is the maximum amount of time that a technical system can fail without contractual consequences."

Read more about it here </details>

<details> <summary>What do you think about the following statement: "100% is the only right availability target for a system"</summary>

Wrong. No system can guarantee 100% availability, as no system is safe from experiencing downtime. Many systems and services will fall somewhere between 99% and 100% uptime (or at least this is how most systems and services should be). </details>
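
To make the availability discussion above concrete, the following sketch converts an availability target into the downtime it allows over a period. The 30-day period is an assumption; contracts and SLOs may use a different window.

```python
def allowed_downtime_minutes(availability_percent, period_days=30):
    """Minutes of downtime permitted by an availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99, 100.0):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% over 30 days allows about {minutes:.2f} minutes of downtime")
```

A 30-day period has 43,200 minutes, so 99% allows about 432 minutes, 99.9% about 43.2 minutes, and 99.99% about 4.32 minutes. A 100% target allows none, which leaves no room for failures, maintenance, or risky releases.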

<details> <summary>What are MTTF (mean time to failure) and MTTR (mean time to repair)? What do these metrics help us to evaluate?</summary>

* MTTF (mean time to failure), otherwise known as uptime, can be defined as how long the system runs before it fails.
* MTTR (mean time to repair), on the other hand, is the amount of time it takes to repair a broken system.
* MTBF (mean time between failures) is the amount of time between failures of the system.
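
The three metrics above can be computed from an outage log. The sketch below uses one common simplification, which is an assumption rather than the only convention: MTTF is total uptime divided by the number of failures, MTTR is total downtime divided by the number of failures, and MTBF is their sum (the time from the start of one failure to the start of the next).

```python
def reliability_metrics(period_hours, outages):
    """Compute MTTF, MTTR, MTBF and availability for an observation period.

    `outages` is a list of (start_hour, end_hour) pairs that do not overlap
    and lie within the period.
    """
    if not outages:
        raise ValueError("at least one outage is needed to compute the means")
    downtime = sum(end - start for start, end in outages)
    uptime = period_hours - downtime
    failures = len(outages)
    mttr = downtime / failures
    mttf = uptime / failures
    return {
        "mttf": mttf,
        "mttr": mttr,
        "mtbf": mttf + mttr,
        "availability": uptime / period_hours,
    }


# Hypothetical log: 1000 hours observed, a 2-hour outage and a 6-hour outage.
metrics = reliability_metrics(1000, [(100, 102), (500, 506)])
print(metrics)
```

Together these numbers help evaluate how reliable a system is (MTTF, MTBF) and how quickly the team restores service after a failure (MTTR).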

</details>

<details> <summary>What is the role of monitoring in SRE?</summary>

Google: "Monitoring is one of the primary means by which service owners keep track of a system’s health and availability"

Read more about it here </details>

<details> <summary>What are the two main SRE KPIs?</summary>

Service Level Indicators (SLI) and Service Level Objectives (SLO). </details>

<details> <summary>What is toil?</summary>

Google: "Toil is the kind of work tied to running a production service that tends to be manual, repetitive, automatable, tactical, devoid of enduring value, and that scales linearly as a service grows"

Read more about it here </details>

<details> <summary>What is a postmortem?</summary>

The postmortem is a process that should take place following an incident. Its purpose is to identify the root cause of an incident and the actions that should be taken to prevent this kind of incident from happening again. </details>

<details> <summary>What is the core value often put forward when talking about postmortem?</summary>

Blamelessness. Postmortems need to be blameless, and everyone should be reminded of this value at the beginning of every postmortem. This is the best way to ensure that people focus on finding the root cause instead of trying to hide their possible faults.</details>