SY0-701 Security Architecture Study Guide for the CompTIA Security+ Exam

Data Protection Strategies

Maintaining the privacy and security of data is one of the primary goals of cybersecurity professionals. You should be familiar with common concepts related to data privacy and security and how these concepts relate to the field of cybersecurity. You should also be able to compare and contrast data protection strategies.

Data Types

Organizations collect vast amounts of data, and not all pieces of data carry the same weight. To filter and organize what is collected, data is broken down into types based on sensitivity and usage.

Regulated

Regulated data is data governed by external laws and regulations that place restrictions and security requirements on its use and storage. Regulated data includes personally identifiable information (PII) and health-related data.

Trade Secret

Trade secret data is data specific to a particular business, which provides an advantage to the business. Trade secrets may include manufacturing processes, mathematical algorithms, scientific research, or even recipes.

Intellectual Property

Intellectual property (IP) data is creative output protected by IP law through mechanisms such as copyrights, patents, and trademarks (which can cover logos).

Legal Information

Legal information data is any data that pertains to the field of law, including court records, legal statutes, and legally binding contracts.

Financial Information

Financial information is any data pertaining to an individual’s finances, such as credit card and banking information. The Payment Card Industry Data Security Standard (PCI DSS) specifies how payment card data should be stored, processed, and transmitted. PCI DSS is not a government regulation but rather a contractual obligation between entities that accept credit cards and credit card service providers.

Human- and Non-Human-Readable

Human-readable data is data that can be easily read by a human being, while non-human-readable data is data that cannot be easily understood by a human. For example, a PDF is human-readable data, while binary code is non-human-readable data.
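
As a rough illustration of the difference, the sketch below shows the same two characters as text and as the bits a machine stores. The sample string is arbitrary and chosen only for this example.

```python
# The same data in a human-readable form and in a non-human-readable form.
text = "Hi"

# Encode the text to raw bytes, then render each byte as eight binary digits.
raw = text.encode("utf-8")
bits = " ".join(f"{byte:08b}" for byte in raw)

print(text)  # Hi
print(bits)  # 01001000 01101001
```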

Data Classifications

Data classification is the process of identifying the sensitivity of data, as well as the impact a breach of that data may have, and separating the data into predefined general categories.

Note: The classifications below are generalizations. Specific designations can vary from entity to entity.

Sensitive

The sensitive classification can be used to identify data that contains information that should be protected to some extent. In general, the sensitive classification is toward the bottom of the data security scale.

Confidential

The confidential classification is used on data that should have increased protection and should only be accessed by specified entities. For example, data collected by a medical facility is generally classified as confidential and can only be shared under very specific circumstances.

Public

The public classification is a designation for data that can be accessed safely by the general public.

Restricted

The restricted classification pertains to highly confidential data, which requires strict access control. The restricted data classification is commonly applied to the most sensitive business data, which has the largest potential negative impact on the business if breached.

Private

The private classification is a designation for data that should only be accessible by the user or those specifically allowed by the user.

Critical

The critical classification is reserved for data that could have a major impact if exposed. Critical data requires extensive security measures for increased protection. For example, data pertaining to military launch codes would most likely be considered critical.

General Data Considerations

Data protection is vital to security and uses various techniques to protect data no matter where it is on the network or in the system. Data protection involves data in all its states: at rest, in motion, and in processing.

Data States

A data state is a method of classification for what data is doing at a particular moment. Data states can change quickly in a networking environment. No matter its state, data should be protected.

Data at Rest

Data that is “at rest” is data that is currently stored at a permanent location awaiting retrieval or use. Data at rest locations include anything from a Universal Serial Bus (USB) drive or a hard drive to a cloud service, or anywhere else data is stored.

Data in Transit

Data “in motion” or “in transit” is data that is actively being sent or transmitted between two locations or points, such as from one computer to another or from one city to another. When data is in motion, it is in the transmission process.

Data in Use

Data that is “in processing” or “in use” is being actively used by a workstation or a server and is currently held in the device’s temporary memory. For example, if a human resources employee is accessing the employee database to review pay data, the records being reviewed are held in the RAM of the employee’s desktop for quicker retrieval.

Data Sovereignty

Data sovereignty is the principle that any data collected, stored, or processed has to align with the legal restrictions of the location in which it originated. For example, data housed in the cloud can pass through multiple data centers in multiple countries. No matter where those data centers are located, the data must adhere to the legal requirements of its country of origin.

Geolocation

There are numerous geolocation considerations regarding data storage, including off-site storage, distance/location selection, legal implications, and data sovereignty. At least one copy should be stored in a separate geographic location to protect the data from a natural disaster. The general rule of thumb is that an organization should keep a copy of data at least 90 miles away to prevent it from being affected by geographically related incidents, such as natural disasters, or by mechanical failure, such as a power grid outage.

Methods to Secure Data

Data is the most valuable and targeted asset within a network and should be secured in all states. There are multiple methods to use for this purpose.

Geographic Restrictions

Geographic restrictions may be placed on data, which control access to the data based on geographical considerations. Geofencing and geolocation are commonly used to set geographic restrictions.
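
One way to picture a geofence is as a radius check around an approved location. The sketch below is a minimal illustration, not a production control: the office coordinates, the 25-mile radius, and the function names are all hypothetical, and it assumes the user's latitude and longitude are already known and trustworthy.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def inside_geofence(user_lat, user_lon, center_lat, center_lon, radius_miles):
    """Allow access only when the user is within the fence radius."""
    return distance_miles(user_lat, user_lon, center_lat, center_lon) <= radius_miles


# Hypothetical approved location (New York City) with a 25-mile fence.
OFFICE = (40.7128, -74.0060)

print(inside_geofence(40.7306, -73.9352, *OFFICE, radius_miles=25))    # nearby user: True
print(inside_geofence(34.0522, -118.2437, *OFFICE, radius_miles=25))   # Los Angeles: False
```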

Encryption

Encryption algorithms can be applied to data in all of its states. The security of data is dependent on the type of encryption applied and can range from simple encryption algorithms to highly complex algorithms.

Hashing

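Hashing is a process that transforms data into a fixed-length value, called a hash or digest, using a hashing algorithm. A well-designed algorithm makes it extremely unlikely that two different inputs will produce the same hash. Unlike encryption, hashing cannot be reversed. It is commonly used for password storage, where the hash is kept instead of the password itself.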
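
The two properties described above can be seen with Python's standard hashlib module. This is a sketch under stated assumptions: the salt, the choice of PBKDF2, and the iteration count are illustrative choices made for this example and are not taken from the exam objectives.

```python
import hashlib
import hmac
import os

# A hash has a fixed length regardless of the size of the input.
short = hashlib.sha256(b"a").hexdigest()
long_ = hashlib.sha256(b"a" * 10_000).hexdigest()
print(len(short), len(long_))  # 64 64

ITERATIONS = 100_000  # illustrative value; check current guidance before reuse


def hash_password(password, salt=None):
    """Return (salt, digest). A random salt is generated when none is given."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected):
    """Re-hash the candidate password and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```

Only the salt and digest are stored. Verification works by hashing the submitted password again, not by reversing the stored hash.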

Masking

Data masking is a technology that redacts or replaces some portions of data with generic characters. It is commonly seen when a Social Security number is displayed. Instead of revealing the entire number, the first five digits are each replaced by an x or an asterisk.
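
The Social Security number example can be sketched in a few lines. The function name and the output format with dashes are assumptions made for illustration.

```python
def mask_ssn(ssn, mask_char="X"):
    """Replace the first five digits of a Social Security number, keep the last four."""
    digits = [c for c in ssn if c.isdigit()]
    if len(digits) != 9:
        raise ValueError("expected a nine-digit Social Security number")
    last4 = "".join(digits[-4:])
    return f"{mask_char * 3}-{mask_char * 2}-{last4}"


print(mask_ssn("123-45-6789"))       # XXX-XX-6789
print(mask_ssn("123456789", "*"))    # ***-**-6789
```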

Tokenization

Tokenization is a technology that replaces an identifier with another unique identifier that is stored in a lookup table. For example, when a user creates an account, the name entered is replaced by an identifier, such as XXX111. When utilizing tokenization, however, it is vitally important to keep the lookup table secure.
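
A lookup table of this kind might be sketched as follows. The class name, the in-memory dictionaries, and the "tok_" prefix are hypothetical; a real token vault would live in a separately secured, access-controlled store.

```python
import secrets


class TokenVault:
    """Hypothetical in-memory lookup table mapping tokens to original values."""

    def __init__(self):
        self._token_to_value = {}
        self._value_to_token = {}

    def tokenize(self, value):
        """Return the existing token for a value, or issue a new random one."""
        if value in self._value_to_token:
            return self._value_to_token[value]
        token = "tok_" + secrets.token_hex(8)
        self._token_to_value[token] = value
        self._value_to_token[value] = token
        return token

    def detokenize(self, token):
        """Look up the original value. Only the vault can do this."""
        return self._token_to_value[token]


vault = TokenVault()
token = vault.tokenize("Alice Example")
print(token != "Alice Example")   # True: the token reveals nothing about the name
print(vault.detokenize(token))    # Alice Example
```

Because the token is random, it cannot be converted back without the table, which is why the table itself must be protected.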

Obfuscation

Data obfuscation, sometimes referred to as data anonymization, is the process of erasing or encrypting data that can be used for identification. The goal of anonymization is to completely remove all identifiers that can be traced back to a specific individual. Total anonymization, however, may not be practical for some entities.

Segmentation

Data segmentation is the process of placing data in separate areas within a network to minimize the amount of data that may be impacted in case of a breach.

Permission Restrictions

Data can also be secured using permission restrictions. These restrictions can be configured based on the type of data or the user’s authorization level.
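
A simple check based on authorization level could look like the sketch below. The level names reuse classifications from this guide, but their ordering and the rule that a user may read data at or below their own level are assumptions for illustration, since designations vary from entity to entity.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical ordering of classifications, lowest to highest."""
    PUBLIC = 0
    SENSITIVE = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def can_read(user_level, data_level):
    """Allow access only when the user's level meets or exceeds the data's level."""
    return user_level >= data_level


print(can_read(Level.CONFIDENTIAL, Level.SENSITIVE))  # True
print(can_read(Level.SENSITIVE, Level.RESTRICTED))    # False
```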

Resilience and Recovery

Resilience and recovery in a network architecture is the ability of the network to respond to and recover from an interruption. You should be able to understand and explain the importance of concepts related to network resilience and recovery.

High Availability

Availability is the concept that a system should be operational and ready for use by legitimate users. In a high-availability environment, systems such as routers, switches, servers, and more need to remain up at all times with near-zero downtime. For this to happen, the environment must implement redundancy and fault tolerance.
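
To put numbers on "near-zero downtime" and on the value of redundancy, the sketch below converts an availability percentage into hours of downtime per year and estimates the combined availability of redundant components. It assumes the components fail independently and that any single working copy is enough, which is a simplification.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_hours_per_year(availability):
    """Expected yearly downtime for a given availability (0.0 to 1.0)."""
    return (1 - availability) * HOURS_PER_YEAR


def redundant_availability(single, copies):
    """Availability of several independent redundant copies (any one suffices)."""
    return 1 - (1 - single) ** copies


print(round(downtime_hours_per_year(0.99), 1))    # 87.6
print(round(downtime_hours_per_year(0.999), 2))   # 8.76
print(round(redundant_availability(0.99, 2), 6))  # 0.9999
```

Under these assumptions, pairing two devices that are each available 99% of the time reduces expected downtime from about 87.6 hours per year to under one hour.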

Load Balancing vs. Clustering

Load balancing is the process of spreading traffic or workload across multiple devices acting as separate entities to increase availability. Clustering, by contrast, uses multiple devices acting as a single device to increase availability.
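
Round robin is one simple way a load balancer can spread requests across separate servers. The sketch below is illustrative only: the class and server names are hypothetical, and real load balancers typically also track server health and current load.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hand each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(servers)

    def route(self, request):
        """Return the server chosen for this request."""
        return next(self._servers)


balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
assignments = [balancer.route(f"req-{i}") for i in range(5)]
print(assignments)  # ['web-1', 'web-2', 'web-3', 'web-1', 'web-2']
```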

Site Considerations

Site considerations refer to factors that may impact data based on physical location and the surrounding environment. To ensure site resiliency, it’s important to have a backup that can be used to process transactions when the primary site is down or has failed. There are three recovery site types: hot, cold, and warm.

Hot

A hot site is up and running at all times (24/7). When the primary site fails or is under maintenance, this site takes over. This is the best choice for a company that requires high availability.

Cold

A cold site only requires power and network connectivity. This may be a leased building or a data center that can be used when needed. This site does not contain any of the hardware needed for operations; it is merely an empty shell that is ready to be utilized. This is the least expensive site.

Warm

A warm site is in between cold and hot. The site has all the hardware and connections in place; the only thing that is needed is the data. This can be obtained from a backup.

Geographic Dispersion

Geographic dispersion is the process of physically separating resources like data centers to prevent a single incident, such as a natural disaster or power grid failure, from taking down the entire system. A common rule of thumb for data centers is that they should be separated by at least 90 miles.

Platform Diversity

Platform diversity within a network architecture is the use of multiple systems, vendors, or technologies to provide resilience. Platform diversity reduces the impact on a network if a single system fails or is vulnerable.

Multi-Cloud Systems

A multi-cloud system uses multiple cloud-based solutions to provide resilience to a network. It provides redundancy and the ability to quickly shift between cloud systems in case of failure.

Continuity of Operations

Continuity of operations refers to the ability of a network to remain functional in case of a failure, either minimal or catastrophic. The ability of a network to achieve continuity of operations is dependent upon multiple factors, including cost, risk appetite, and complexity.

Capacity Planning

Capacity planning supports network resilience and recovery and refers to the ability of a network to scale as needed in response to current and future demands. It takes multiple aspects into consideration, including people, technology, and infrastructure.

People

The people component of capacity planning involves quickly increasing staff in response to demand. Staff increases may be handled internally or through the use of a third party, such as a staffing agency.

Technology

The technology component of capacity planning is the ability of a network to deploy technology throughout the network to meet demand. For example, load balancers may be used to meet increased capacity when needed.

Infrastructure

The infrastructure component of capacity planning refers to the underlying networking components that would be needed to meet increased demand, such as routers, switches, storage, and data throughput capabilities.

Testing

To ensure the resilience and recovery capabilities of a network are sufficient, testing methods can be used. Testing methods may be minimally or highly intrusive depending on the technique used.

Tabletop Exercises

Tabletop exercises are a testing method involving verbal discussions of a situation or scenario, the planned response, and the potential ramifications. They can be used to identify potential vulnerabilities in a network’s resiliency and recovery plans. Tabletop exercises provide a testing environment that does not directly impact the current working environment.

Failover

A failover test takes place when a full switch to a recovery site or system is performed. While a failover test is the most intrusive and disruptive testing technique, it also provides the most accurate results regarding the network’s resilience and recovery capabilities.

Simulation

Simulations, which involve a complete scenario with real responses, provide a real-time testing environment through which responses may be practiced and fine-tuned to ensure the recovery and response plan in place meets the actual needs of the organization. A simulation may potentially impact the functionality of the current network.

Parallel Processing

Parallel processing is a testing technique that moves processing to an alternative site while also continuing processing at the primary site to test the capabilities of the alternative site.

Backups

A backup is a copy of data from components of a network that can be used to restore network components to a prior state. Backup types are defined by how often the data is collected, how much data is collected, in what form the data is collected, and what storage device is used to hold the data.

Onsite/Offsite

Onsite storage means having a backup on the physical premises, while offsite storage means having at least one copy away from the physical job site, providing geographical diversity. Offsite storage can be handled by the company itself or by a third-party vendor who specializes in secure data storage to ensure resilience in the case of a disaster.

Frequency

Backup frequency refers to how often a copy is made and stored and may vary depending on the type of backup made (i.e., full, incremental, or differential) and the requirements of the network.
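
The three backup types differ in which files they copy, which in turn affects how often each can reasonably run. The sketch below uses the usual definitions: a full backup copies everything, a differential copies what changed since the last full backup, and an incremental copies what changed since the last backup of any type. The file names and day numbers are made up for illustration.

```python
def files_to_copy(files, backup_type, last_full, last_any):
    """Pick the files a backup run would copy.

    files: mapping of file name to the day it was last modified
    last_full: day of the most recent full backup
    last_any: day of the most recent backup of any type
    """
    if backup_type == "full":
        return sorted(files)
    if backup_type == "differential":
        cutoff = last_full
    elif backup_type == "incremental":
        cutoff = last_any
    else:
        raise ValueError(f"unknown backup type: {backup_type}")
    return sorted(name for name, modified in files.items() if modified > cutoff)


# Hypothetical data: full backup on day 3, most recent backup of any type on day 5.
files = {"a.txt": 1, "b.txt": 4, "c.txt": 6}

print(files_to_copy(files, "full", 3, 5))          # ['a.txt', 'b.txt', 'c.txt']
print(files_to_copy(files, "differential", 3, 5))  # ['b.txt', 'c.txt']
print(files_to_copy(files, "incremental", 3, 5))   # ['c.txt']
```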

Encryption

Encryption should be used on backups to secure the data they contain. By their very nature, backups contain vital network data that can be used for malicious purposes and should be protected in all states.

Snapshots

A snapshot backs up the data of a machine at a specific point in time. This is commonly used with virtual machines (VMs) and can provide a recovery point to a specific time or can be used to replicate the VM to another device. Snapshots are full backups of the system or machine and take up a significant amount of space.

Recovery

A backup can be used to recover a network or system after failure. How quickly a backup can be used to recover a network is dependent on multiple factors, including how extensive the recovery is, where the backup is stored, and the storage medium used for the backup.

Replication

Replication is the process of maintaining an exact copy of a system or machine so that it can be restored from that copy. It is often loosely referred to as backing up. Replicated data, or backups, can be stored either physically using a storage area network (SAN) or virtually through the cloud. With either method, the replica should be kept current and complete for the network, system, machine, or file it protects.

Journaling

Journaling is a backup method that creates logs of changes made to a system. This allows for reversion to a previous point and is commonly used in systems that experience frequent changes, such as databases.
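
The idea of logging changes and reverting to an earlier point can be sketched with an append-only list. This is a toy model with hypothetical names; real journaling in file systems and databases is considerably more involved.

```python
class JournaledStore:
    """Key-value store that records every change in an append-only journal."""

    def __init__(self):
        self.data = {}
        self.journal = []  # list of (key, value) changes, oldest first

    def set(self, key, value):
        """Log the change first, then apply it."""
        self.journal.append((key, value))
        self.data[key] = value

    def state_at(self, entry_count):
        """Rebuild the state as it was after the first entry_count changes."""
        state = {}
        for key, value in self.journal[:entry_count]:
            state[key] = value
        return state


store = JournaledStore()
store.set("balance", 100)
store.set("balance", 80)
store.set("owner", "sam")

print(store.state_at(1))  # {'balance': 100}
print(store.state_at(3))  # {'balance': 80, 'owner': 'sam'}
```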

Power

Power is needed to run servers, data center lights, heating and cooling units, and other systems and components. In the event of a power failure, there are several fault-tolerant solutions.

Generators

Generators provide an alternate source of power and can be used for long-term power outages. One or more may be needed during a natural disaster.

Uninterruptible Power Supply (UPS)

A UPS is a hardware solution that can provide power for systems to use in case of a power outage. It can be used for short-term power, and it can also protect against any power fluctuations. Generally, a UPS is powered by batteries that are kept charged by electricity when the power is on. Some larger UPS units may rely on a diesel or other type of power generator.