Process Management with Security Metrics

A metric is a quantitative measurement that can be interpreted in the context of a series of previous or equivalent measurements. Metrics are necessary to show how security activity contributes directly to security goals, to measure how changes in a process contribute to those goals, to detect significant anomalies in processes, and to inform decisions to fix or improve processes. Good management metrics are said to be S.M.A.R.T:

  • Specific: The metric is relevant to the process being measured.
  • Measurable: Metric measurement is feasible with reasonable cost.
  • Actionable: It is possible to act on the process to improve the metric.
  • Relevant: Improvements in the metric meaningfully enhance the contribution of the process towards the goals of the management system.
  • Timely: The metric can be measured quickly enough to be used effectively.

Metrics are fully defined by the following items (a minimal structured sketch follows the list):

  • Name of the metric;
  • Description of what is measured;
  • How the metric is measured;
  • How often the measurement is taken;
  • How the thresholds are calculated;
  • Range of values considered normal for the metric;
  • Best possible value of the metric;
  • Units of measurement.
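
As a minimal sketch (my own illustration in Python, not part of any standard; the example values are invented), a metric definition can be captured as a simple record so that every metric in a catalogue is documented the same way:

from dataclasses import dataclass

@dataclass
class MetricDefinition:
    # One field per item in the list above
    name: str                   # name of the metric
    description: str            # what is measured
    method: str                 # how the metric is measured
    frequency: str              # how often the measurement is taken
    threshold_calculation: str  # how the thresholds are calculated
    normal_range: tuple         # range of values considered normal
    best_value: float           # best possible value of the metric
    units: str                  # units of measurement

viruses_cleaned = MetricDefinition(
    name="Viruses cleaned per week",
    description="Number of viruses cleaned by the antivirus on all user PCs",
    method="Weekly export from the antivirus management console",
    frequency="Weekly, Sunday 21:00",
    threshold_calculation="Mean of the last 12 weeks plus/minus twice the standard deviation",
    normal_range=(0, 40),
    best_value=0,
    units="viruses/week",
)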

Security Metrics are difficult to come by
Unfortunately, it is not easy to find metrics for security goals like security, trust and confidence. The main reason is that security goals are “negative deliverables”. The absence of incidents for an extended period of time leads us to think that we are safe. If you live in a town where neither you nor anyone you know has ever been robbed, you feel safe. Incidents prevented can’t be measured in the same way a positive deliverable can, like the temperature of a room.

Metrics for goals are not just difficult to find; they are not very useful for security management. The reason for this is the indirect relationship between security activity and security goals. Intuitively, most managers think that there is a direct link between what we do (which produces results, or outputs) and what we want to achieve (the most important things: our goals). This belief is supported by real-life experiences like making a sandwich. You buy the ingredients, go home, arrange them, perhaps toast them, and voilà: a warm sandwich ready to eat. The output sought (the sandwich) and the goal (eating a homemade sandwich) match beautifully.

Unfortunately, there is not a direct link every time. A good example is research. There is no direct relationship between the goals (discoveries) and the activity (experiments, publication). You can try hundreds of experiments and still not discover a cure for cancer. The same thing happens with security. The goals (trust, confidence, security) and the activity (controls, processes) are not directly linked.

When there is a direct link between activity and goal, like the temperature in a pot and the heat applied to that pot, we know what decision to take if we want the temperature to drop: stop applying heat. But how will we make a network safer: by adding filtering rules (more accurate filtering) or by consolidating them (less complexity)? We don’t know. If a process produces dropped packets, more or fewer dropped packets won’t necessarily make the network more or less secure, just like a change in the firewall rules won’t necessarily make the network safer or otherwise.

The disconnect present in information security between goals and activity prevents goal metrics from being useful for management, as you can never tell if you are closer to your goals because of decisions recently taken on the security processes.

Goal metric examples:

  • Instances of secret information disclosed per year. What can you do to prevent people with legitimate access from disclosing that information?
  • Use of system by unauthorised users per month. What can you do to prevent people from letting other users use their accounts?
  • Customer reports of misuse of personal data to the Data Protection Agency. Even if you are compliant, what can you do to prevent a customer from filing a report?
  • Risk reduction per year of 10%. As risk depends on internal and external factors, what can you do to actually modify risk?
  • Prevent 99% of incidents. How do you know how many incidents didn’t happen?

Actually useful security metrics
If metrics for goals are difficult to get and are not very useful, what is a security manager to do? Measuring process outputs can be the answer. Measuring outputs is not only possible but very useful, as outputs contribute directly or indirectly to achieving security, trust and confidence. Using output metrics you can:

  • Measure how changes in a process contribute to outputs;
  • Detect significant anomalies in processes;
  • Inform decisions to fix or improve the process.

There are seven basic types of process output metrics:

  • Activity: The number of outputs produced in a time period;
  • Scope: The proportion of the environment or system that is protected by the process. For example, AV could be installed in only 50% of user PCs;
  • Update: The time since the last update or refresh of process outputs.
  • Availability: The time during which a process performs as expected upon demand (uptime), the frequency and duration of interruptions, and the time interval between interruptions.
  • Efficiency / Return on security investment (ROSI): Ratio of losses averted to the cost of the investment in the process. This metric measures the success of a process in comparison to the resources used.
  • Efficacy / Benchmark: Ratio of outputs produced in comparison to the theoretical maximum. Measuring efficacy of a process implies the comparison against a baseline.
  • Load: Ratio of available resources in actual use, like CPU load, repository capacity, bandwidth, licenses and overtime hours per employee.

Examples of use of these metrics (a short computation sketch follows the list):

  • Activity: Measuring the number of new user accounts created per week; a sudden drop could reveal that the new administrator is lazy, or that users have started sharing accounts and are no longer requesting new ones.
  • Scope: In an organization with a large number of third-party connections, measuring the number of connections with third parties protected by a firewall could lead to a management decision not to create more unprotected connections.
  • Update: Measuring the update level of the servers in a DMZ could lead to investigating the root cause if the time since the last update exceeds a certain threshold.
  • Availability: Measuring the availability of a customer service portal could lead to rethinking the high-availability architecture used.
  • Efficiency / Return on security investment (ROSI): Measuring the cost per seat of the Single Sign On systems of two companies being merged could lead to choosing one system over the other.
  • Efficacy / Benchmark: Measuring the backup speed of two different backup systems could lead to choosing one over the other.
  • Load: Measuring and projecting the minimum load of a firewall could lead to taking the decision to upgrade pre-emptively.
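
As an illustration of how cheaply some of these metrics can be computed (a sketch with hypothetical inventory data; the field names and figures are mine), Scope and Update can be derived from information most organisations already keep:

from datetime import date

# Hypothetical asset inventory: per PC, whether antivirus is installed and
# when its signatures were last refreshed.
inventory = {
    "pc-001": {"antivirus": True,  "last_signature_update": date(2024, 5, 2)},
    "pc-002": {"antivirus": False, "last_signature_update": None},
    "pc-003": {"antivirus": True,  "last_signature_update": date(2024, 4, 20)},
}
today = date(2024, 5, 6)

# Scope: proportion of the environment protected by the process.
scope = sum(1 for pc in inventory.values() if pc["antivirus"]) / len(inventory)

# Update: time since the last refresh of process outputs (worst case, in days).
update = max((today - pc["last_signature_update"]).days
             for pc in inventory.values() if pc["last_signature_update"])

print(f"Scope: {scope:.0%}, Update: {update} days")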

There is an important issue to tackle when using output metrics, what I call the Comfort Zone. When there are too many false positives, the metric is quickly dismissed, as it is not possible to investigate every single warning. On the other hand, when the metric never triggers a warning, there is a feeling that the metric is not working or providing value. The Comfort Zone (not too many false positives, pseudo-periodic warnings) can be achieved using an old tool from Quality Management, the control chart. There are some rules used in Quality Management to tell a warning, a condition that should be investigated, from normal statistical variation (the Western Electric rules, Donald J. Wheeler's rules, the Nelson rules), but for security management the best practice is adjusting the multiple of the standard deviation that defines the range of normal values for the metric until we achieve the Comfort Zone: pseudo-periodic warnings without too many false positives.
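
A minimal sketch of that practice, with invented data (the history, the new measurements and the candidate multipliers k are all assumptions): compute the control limits from historical measurements and widen or narrow them until warnings arrive at a comfortable, pseudo-periodic rate.

import statistics

def control_limits(history, k):
    # Control limits as the mean plus/minus k standard deviations
    mean = statistics.mean(history)
    sd = statistics.pstdev(history)
    return mean - k * sd, mean + k * sd

def warnings(history, measurements, k):
    # Flag the measurements that fall outside the control limits
    low, high = control_limits(history, k)
    return [m for m in measurements if m < low or m > high]

history = [120, 130, 110, 125, 140, 115, 135, 128]   # e.g. dropped packets per day
new = [122, 180, 118, 108, 131, 100]
for k in (1, 2, 3):   # tune k until the number of warnings feels right
    print(k, warnings(history, new, k))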

Using Security Management Metrics
There are five steps in the use of metrics: measurement, representation, interpretation, investigation and diagnosis.

Measurement: The measurement of the current value of the metric is periodic and normally refers to a window, for example: “9:00pm Sunday reading of the number of viruses cleaned in the week since the last reading”. Measurements from different sources and different periods need to be normalized before integration in a single metric.

Interpretation: The meaning of a measured value is evaluated by comparing the value of a measurement with a threshold, a comparable measurement, or a target. Normal values (those within thresholds) are estimated from historic or comparable data. The results of interpretation are:

  • Anomaly: When the measurement is beyond acceptable thresholds.
  • Success: When the measurement compares favourably with the target.
  • Trend: General direction of successive measurements relative to the target.
  • Benchmark: Relative position of the measurement or the trend with peers.

Incidents or poor performance take process metrics outside normal thresholds. Shewhart-Deming control charts are useful to indicate whether the metric value is within the normal range, as values within the arithmetic mean plus/minus twice the standard deviation account for about 95.4% of the values of a normally distributed population. Fluctuations within the “normal” range would not normally be investigated.

Investigation: The investigation of abnormal measurements ideally ends with the identification of either a common cause (for example, changes in the environment or the results of management decisions) or a special cause (error, attack, accident) for the current value of the metric.

Representation: Proper visualisation of the metric is key for reliable interpretation. The representation of a metric will vary depending on the type of comparison and the distribution of the resource measured. Bar charts, pie charts and line charts are most commonly used. Colours may help to highlight the meaning of a metric, such as the green-amber-red (equivalent to on-track, at-risk and alert) traffic-light scale. Units, the period represented, and the period used to calculate the thresholds must always be given for the metric to be clearly understood. Rolling averages may be used to help identify trends.

Diagnosis: Managers should use the results of the previous steps to diagnose the situation, analyse alternatives and their consequences and make business decisions.

  • Fault in Plan-Do-Check-Act cycle leading to repetitive failures in a process -> Fix the process.
  • Weakness resulting from lack of transparency, partitioning, supervision, rotation or separation of responsibilities (TPSRSR) -> Fix the assignment of responsibilities.
  • Technology failure to perform as expected -> Change / adapt technology.
  • Inadequate resources -> Increase resources or adjust security targets.
  • Security target too high -> Revise the security target if the effect on the business would be acceptable.
  • Incompetence, dereliction of duty -> Take disciplinary action.
  • Inadequate training -> Institute immediate and/or long-term training of personnel.
  • Change in the environment -> Make improvements to adapt the process to the new conditions.
  • Previous management decision -> Check if the results of the decision were sought or unintended.
  • Error -> Fix the cause of the error.
  • Attack -> Evaluate whether the protection against the attack can be improved.
  • Accident -> Evaluate whether the protection against the accident can be improved.

What management practices become possible?
A side effect of an Information Security Management System (ISMS) lacking useful security metrics is that security management becomes centred on activities like Risk Assessment and Audit. Risk Assessment considers assets, threats, vulnerabilities and impacts to get a picture of security and prioritise design and improvements, while Audit checks the compliance of the actual information security management system with the documented management system, an externally defined management system, or an external regulation. Risk Assessment and Audit are valuable, but there are more useful security management activities, like monitoring, testing, design & improvement and optimisation, that become possible with output metrics. These activities can be described as follows:

  • Monitor—Use metrics to watch process outputs, detect abnormal conditions and assess the effect of changes in the process.
  • Test—Check if inputs to the process produce the expected outputs.
  • Improvement—Making changes to the process to make it more suitable for its purpose, or to reduce its use of resources.
  • Planning—Organising and forecasting the amount, assignment and milestones of tasks, resources, budget, deliverables and performance of a process.
  • Assessment—Evaluating how well the process matches the organisation's needs and compliance goals expressed as security objectives; how changes in the environment or management decisions change the quality, performance and use of resources of the process; whether bottlenecks or single points of failure exist; where the points of diminishing returns lie; how the process benchmarks against other process instances and other organisations; and trends in quality, performance and efficiency.
  • Benefits realisation—Showing how achieving security objectives contributes to achieving business objectives, measuring the value of the process for the organisation, or justifying the use of resources.

While audits can be performed without metrics, monitoring, testing, planning,  improvement and benefits realisation are not feasible without them.

What needs to be done?
S.M.A.R.T security managers need metrics that actually help them perform management activities.

While it is not necessary to drop goal metrics altogether, the day-to-day focus of information security management should be on security monitoring, testing, design & improvement and optimization using output metrics, which are the ones that will show the effects of management decisions, whether things are getting better or worse, whether processes work as designed, and whether changes outside our direct control are causing abnormal conditions in security processes. All these activities are perfectly feasible using output metrics and control charts.

Return On Security Investment

The information security industry recognizes both the necessity and the difficulty of carrying out a quantitative evaluation of ROSI, return on security investment.

The main reason for investing in security measures is to avoid the cost of accidents, errors and attacks. Direct costs of an incident may include lost revenues, damages and property loss, or direct economic loss. The total cost can be considered to be the direct cost plus the cost of restoring the system to its original state before the incident. Some incidents can cost an organisation information, fines, or even human lives.

The indirect cost of an incident may include damage to a company’s public image, loss of client and shareholder confidence, cash-flow problems, breaches of contract and other legal liabilities, failure to meet social and moral obligations, and other costs.

Measuring Return

What do we know intuitively about the risk and cost of security measures? First, the relationship between the factors that affect risk - such as window of opportunity, value of the asset and its value to the attacker, combined assets, number of incidents and their cost, etc. - is quite complex.  We also know that when measures are implemented to reduce risk, the ease of using and managing systems also decreases, generating an indirect cost of the security measures.

How do we go from this intuitive understanding to quantitative information? There is some accumulated knowledge of the relationship between investment in security measures and their results. First, there is the Mayfield paradox, according to which both universal access to a system and absolutely restricted access have an infinite cost, with more acceptable costs corresponding to the intermediate cases.

An empirical study was also done by the CERT at Carnegie Mellon University, which states that the greater the expenditure on security measures, the smaller the effect of the measures on security. This means that after a reasonable investment has been made in security measures, doubling security spending will not make the system twice as secure.

The study that is most easily found on the Internet on this subject cites the formulas created during the implementation of an intrusion detection system by a team from the University of Idaho.

R: losses
E: prevented losses
T: total cost of security measures

ALE = (R - E) + T

ROSI = R - ALE, therefore ROSI = E - T
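
For illustration only, with made-up figures: if yearly losses without the measure would be R = 100,000€, the measure prevents E = 60,000€ of those losses and costs T = 20,000€, then ALE = (100,000 - 60,000) + 20,000 = 60,000€ and ROSI = E - T = 40,000€.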

The problem with this formula is that E is merely an estimate, and even more so if the measure involved is an IDS, which simply collects information on intrusions, so there is no cause-effect relationship between detecting an intrusion and preventing an incident. Dressing this kind of estimate up in mathematical formulas is like combining magic with physics.

What problems do we face in calculating return on investment of security measures? The most important is the lack of concrete data, followed closely by a series of commonly accepted suppositions and half-truths, such as that risk always decreases as investment increases, and that the return on the investment is positive for all levels of investment.

Nobody invests in security measures to make money; they invest in them because they have no choice. Return on investment is calculated to demonstrate that investing in security is worthwhile, to select the best security measures for a given budget, and to determine whether the budget allocated to security is sufficient to fulfill the business objectives, not to demonstrate that companies make money off of the investment.

In general, and also from the point of view of return on investment, there are two types of security measures: measures to reduce vulnerability and measures to reduce impact.

  • Measures that reduce vulnerability barely reduce the impact when an incident does occur. These measures protect against a narrow range of threats. They are normally known as Preventive Measures. Some of these measures are firewalls, padlocks, and access control measures. One example of the narrowness of the protection range is the use of firewalls, which protect against access to unauthorized ports and addresses, but not against the spread of worms or spam.
  • Measures that reduce impact do very little to reduce vulnerability, and do not prevent incidents from occurring. These measures protect against a broad range of threats and are commonly known as Corrective Measures. Examples of these measures include RAID disks, backup copies, and redundant communication links. One example of the range of protection is the use of backups, which do not prevent incidents, but do protect against effective information losses in the case of all types of physical and logical failures.

The profitability of both types of measures is different, as the rest of the article will show.

Preventive or Vulnerability-Reduction Measures

A reduction in vulnerability translates into a reduction in the number of incidents. Security measures that reduce vulnerability are therefore profitable when they prevent incidents for a value that is higher than the total cost of the measure during that investment period.

The following formula can be used:

ROSI = CTprevented / TCP

CT = Cost of Threat = Number of Incidents * Cost per Incident
TCP = Total Cost of Protection

When ROSI > 1, the security measure is profitable.

Several approximations can be used to calculate the prevented cost. One takes the prevented cost to be the difference between the cost of the threat in a period of time before the implementation of the security measure and the cost after it.

CTprevented = ( CTbefore – CTafter)

Calculating the cost of the threat as the number of incidents multiplied by the cost of each incident is an alternative to the traditional calculation of the incident probability multiplied by the incident cost, provided that the number of incidents in the investment period is more than 1. To calculate a probability mathematically, the number of favorable cases and the number of possible cases must be known. Organizations rarely have information on the possible cases of incidents, even when they know the “favorable” cases (the incidents that actually happened). It is impossible to calculate the probability without this information. However, it is relatively simple to determine the number of incidents that occur within a period of time and their cost.

For a known probability to be predictive, it is also necessary to have a large enough number of cases, and conditions must also remain the same. Taking into account the complexity of the behavior of attackers and the organizations that use information systems, it would be foolish to assume that conditions will remain constant. Calculating the cost of a threat using probability information is therefore unreliable in real conditions.

One significant advantage of calculating the cost of a threat as the product of the number of incidents and their unit cost is that this combines the cost of the incidents, the probability, and the total assets (since the number of incidents partly depends on the quantity of the total assets) into a single formula. To make a profitability calculation like this, real information on the incidents and their cost is required, and gathering this information generates an indirect cost of an organization’s security management. If this information is not available, the cost of the threats will have to be estimated to calculate the ROSI, but the value of the calculation result will be low as the estimate can always be changed to generate any desired result.

The profitability of a vulnerability reduction measure depends on the environment. For example, in an environment in which many incidents occur, a security measure will be more profitable than in the case of another environment in which they do not occur. While using a personal firewall on a PC connected to the Internet twenty-four hours a day may be profitable, using one on a private network not connected to the Internet would not. Investing in a reinforced door would be profitable in many regions of Colombia, but in certain rural areas of Canada, this investment would be a waste of money.

Sample profitability calculation:

  1. Two laptops out of a total of 50 are stolen in a year.
  2. The replacement cost of a laptop is 1800 euros.
  3. The following year, the company has 75 laptops.
  4. The laptops are protected with 60€ locks.
  5. The following year only one laptop is stolen.

ROSI = (Rbefore - Rafter) / TCP

ROSI = ( (1800 + Vi)*3 - ((1800 + Vi)*1 + 75*60) ) / (75*60)

(The number of incidents is adjusted for the increase in the number of targets).

If the information on a laptop was worth nothing (Vi = 0), the security measure would not be profitable (ROSI < 1). In this example, the 60€ locks are profitable when a laptop costs more than 2700€, or when, based on historical information, the theft of 5 laptops can be expected for the year in question.
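
The calculation is simple enough to script. The following sketch (illustrative only; the function and variable names are mine, not from the article) reproduces the laptop example; varying vi or the lock price answers what-if questions like the ones below.

def rosi(incidents_expected, incidents_actual, cost_per_incident, total_cost_of_protection):
    # ROSI = (losses expected without the measure - losses with it, including its cost) / cost of the measure
    losses_before = incidents_expected * cost_per_incident
    losses_after = incidents_actual * cost_per_incident + total_cost_of_protection
    return (losses_before - losses_after) / total_cost_of_protection

laptops = 75
lock_price = 60
tcp = laptops * lock_price        # 4500 euros for locks on every laptop
vi = 0                            # value of the information on a laptop (assumption)
cost_per_incident = 1800 + vi     # replacement cost plus information value
expected_thefts = 2 * 75 / 50     # last year's 2 thefts scaled to the larger fleet = 3
actual_thefts = 1

print(rosi(expected_thefts, actual_thefts, cost_per_incident, tcp))   # -0.2, below 1: not profitable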

Using this type of analysis, we could:

  • Use locks only on laptops with valuable information.
  • Calculate the maximum price of locks for all laptops (24€ when Vi=0).

Corrective or Impact-Reduction Measures

Since impact-reduction measures do not prevent incidents, the previous calculation cannot be applied. In the best case scenario, these measures are never used; and if there were two incidents that could each have destroyed the protected assets, the measures would apparently be worth twice the value of those assets. Now then, who would spend twice the value of an asset on security measures? The profitability of corrective measures cannot be measured. These measures are like insurance policies; they put a limit on the maximum loss suffered in the case of an incident.

What is important in the case of impact-reduction measures is the protection that you get for your money. The effectiveness of this protection can be measured, for example depending on the recovery time after an incident. Depending on their effectiveness, there are measures that range from backup copies (with some added cost) to fully redundant systems (which cost more than double).

One interesting alternative to calculating the ROSI of a specific security measure is to measure the ROSI of a set of measures – including detection, prevention, and impact reduction – that protect an asset. In this case, the total cost of protection (TCP) is calculated as the sum of the cost of all of the security measures, while the effort to obtain the information on the cost of the threats is practically identical.

Budget, cost, and selection of measures

The security budget should be at most equal to the annual loss expectancy (ALE) caused by attacks, errors, and accidents in information systems for a tax year. Otherwise, the measures are guaranteed not to be profitable. The graph below shows the expected losses as the area under the curve. To clarify the graph, it represents a company with enormous expected losses, of almost 25% of the value of the company. In the case of an actual company, legibility of the graph could be improved using logarithmic scales.

An evaluation of the cost of a security measure must take into account both the direct costs of the hardware, software, and implementation, as well as the indirect costs, which could include control of the measure by evaluating incidents, ethical hacking (attack simulation), audits, incident simulation, forensic analysis, and code audits.

Security measures are often chosen based on fear, uncertainty and doubt, or out of paranoia, to keep up with trends, or simply at random. However, the calculation of the profitability of security measures can help to select the best measures for a particular budget. Part of the budget must be allocated to the protection of critical assets using impact-reduction measures, and part to the protection of all of the assets using vulnerability-reduction measures and incident and intrusion detection measures.

Conclusions

The main conclusions that can be drawn from all of this are that:

  • To guarantee maximum effectiveness of an investment, it is necessary, and possible if the supporting data is available, to calculate the return on the investment of vulnerability-reduction measures.
  • In order to make real calculations, real information is needed regarding the cost of the incidents for a company or in comparable companies in the same sector.
  • Both incidents and security measures have indirect and direct costs that have to be taken into account when calculating profitability.

Information Security Paradigms

Information security is complex, isn’t it? Confidentiality, Integrity, Availability, Non Repudiation, Compliance, Reliability, Access Control, Authentication, Identification, Authorization, Privacy, Anonymity, Data Quality and Business Continuity are some concepts that are often used.

It is very difficult to define security. Why? The reasons are manifold:

Information systems are very complex: they have structural and dynamic aspects. Unix abstracts these aspects using the file/process dichotomy. Generally speaking, information systems are structured as information repositories and interfaces, connected by channels both physical and logical. Interfaces interconnect information systems, facilitate input/output of information and interact with users. Data repositories in these systems can hold information either temporarily or permanently. The dynamic aspect of information systems is the set of processes that produce results and exchange messages through channels.

Information systems process data, but data is not information. The same information can be rendered as binary data using different formats and conversion rates of data to information. The importance of a single bit of data depends on how much information it represents.

Security is not a presence, but an absence. When there are no known incidents, organisations confidently say that they are safe.

Whether an incident that goes totally undetected is still an incident is a matter of debate.

Security depends on the context. An unprotected computer connected directly to the Internet in 1990 was not as safe as the same computer connected to a company’s network in 2005, or totally isolated. We can be safe when there are no threats, even if we don’t protect ourselves. So security depends on the context.

Security costs money. Any realistic definition must consider the cost of protection, as there is a clear limit on how much we should spend protecting an information system. The expenditure depends both on how much the system is worth to us and on the available budget.

Finally, security depends on our expectations about our use of systems. The higher the expectations, the more difficult they will be to meet. A writer who keeps on a computer everything he has written in his life and someone who has just bought a computer will have totally different expectations. The writer’s expectations will be more difficult to meet, as he might expect his hard drive to last forever, so a crash can mean catastrophe, while the recently bought computer’s hard drive might be replaced with little hassle. The writer’s expectations are independent of his knowledge of the system. The system is sold as an item that holds data, and the writer expects the data to stay there as long as he wants it to. Expectations about the use of a system are not technical expectations about the system. We expect a car to take us places in summer or winter, and, no matter how much you know about cars, they usually do. In the same way, users expect systems to serve a purpose, and their expectations can’t be dismissed as unrealistic or based on ignorance about how reliable computer systems are.

A good security definition should assist in the processes related to protecting an information system, for example:
1. Find what threats are relevant to me.
2. Weigh the threats and measure the risk.
3. Select security measures we can afford that reduce the risk to an acceptable level for the smallest cost.

Unfortunately the definitions currently in use are not up to this task, and worse still, they are not helpful for advancing information security knowledge. Ideally, a security definition should comply with the scientific method, as it is the best tool for the advancement of empirical knowledge. Scientific theories are considered successful if they:
•    Survive every falsification experiment tried.
•    Explain an ample spectrum of phenomena, becoming widely usable.
•    Facilitate the advance of knowledge.
•    Have predictive power.

The demarcation criteria used to distinguish scientific from pseudo-scientific theories are based on Karl Popper’s falsifiability. If a theory is falsifiable, it’s possible to think of an experiment that refutes or confirms the theory. For example, the theory that Koch’s bacillus causes TB is falsifiable, because it’s possible to design an experiment: expose one of two different populations of animals to the germ. If the exposed animals get infected, the theory seems confirmed, but what makes the experiment valid is that if both populations got infected, the theory would be shown to be false, because the cause of the illness would have to be something other than the germ.

The definitions in use normally don’t state their scope and point of view. From now on I will assume an information technology point of view, within the scope of a company.

Let’s have a look at the four main approaches to defining security:
1.    the set of security measures.
2.    to keep a state.
3.    to stay in control.
4.    CIA and derivatives.

The first approach is easy to debunk. If security were the set of security measures, a bicycle with a lock would be just as safe in the countryside of England as in Mogadishu, but it is not. It is interesting that Bruce Schneier has been so often misquoted. “Security is a process, not a product” doesn’t mean that security is impossible to achieve, a point of view favoured by those who think that being secure is the same as being invulnerable. Reading the quote in context, what he means is that security is not something you can buy; it’s not a product. Security is NOT the set of security measures we use to protect something.

The second approach states that security is a state of invulnerability or the state that results from the protection. Examples of proponents of this approach are:

•    Gene Spafford: “The only truly secure system is one that is powered off, cast in a block of concrete and sealed in a lead-lined room with armed guards - and even then I have my doubts.”
•    RFC2828 Internet Security Glossary:
o    Measures taken to protect a system.
o    The condition of a system that results from the establishment and maintenance of measures to protect the system.
o    The condition of system resources being free from unauthorised access and from unauthorised or accidental change, destruction, or loss.

The approach that equates security with invulnerability is purely academic and can’t be applied to real systems, because it neglects to consider that security costs money. Invulnerability leads to protection from highly unlikely threats, at a high cost. It is related to very uncommon expectations, and it focuses on attacks, neglecting protection from errors and accidents.

The third approach, stay in control, is akin to keeping Confidentiality, defined as the ability to grant access to authorised users and deny access to unauthorised users, so this approach can be considered a subset of the CIA paradigm. This approach states that security is “to stay in control” or “protecting information from attacks”. Examples of proponents of this approach are:

•    William R. Cheswick: “Broadly speaking, security is keeping anyone from doing things you do not want them to do to, with, or from your computers or any peripherals”
•    INFOSEC Glossary 2000: “Protection of information systems against access to or modification of information, whether in storage, processing or transit, and against the denial of service to authorised users, including those measures necessary to detect, document, and counter such threats.”
•    Common Criteria for Information Technology Security Evaluation - Part 1: “Security is concerned with the protection of assets from threats, […] in the domain of security greater attention is given to those threats that are related to malicious or other human activities.”

Some access mechanisms used to achieve Confidentiality are often taken as part of security definitions:
•    Identification is defined as the ability to identify a user of an information system at the moment he is granted credentials to that system.
•    Authentication is defined as the ability to validate the credentials presented to an information system at the moment the system is used.
•    Authorisation is defined as the ability to control what services can be used and what information can be accessed by an authenticated user.
•    Audit is defined as the ability to know what services have been used by an authorised user and what information has been accessed, created, modified or erased, including details such as when, from where, etc.
•    Non repudiation is defined as the ability to assert the authorship of a message or information authored by a second party, preventing the author from denying his own authorship.

This has led to different mixes of CIA and these security mechanisms. As these definitions mix the definition of security with the protection mechanisms used to achieve security, I won’t bother debunking them any further (ACIDA, CAIN, etc.).

CIA is the fourth approach to defining security and the most popular: “keeping confidentiality, integrity, availability”, defined as:
•    Confidentiality, already defined, sometimes mistaken for secrecy.
•    Integrity, defined as the ability to guarantee that some information or message hasn’t been manipulated.
•    Availability is defined as the ability to access information or use services at any moment we demand it, with appropriate performance.

Examples of proponents of this approach are:
•    ISO17799: “Preservation of confidentiality, integrity and availability of information”
o    Confidentiality: Ensuring that information is accessible only to those authorised to have access.
o    Integrity: Safeguarding the accuracy and completeness of information and processing methods.
o    Availability: Ensuring that authorised users have access to information and associated assets when required.
•    INFOSEC Glossary 2000: “Measures and controls that ensure confidentiality, integrity, and availability of information system assets including hardware, software, firmware, and information being processed, stored, and communicated”

This popular paradigm classifies incidents and threats by effects, not causes, and therefore is not falsifiable. For example, a classification of illnesses into fevergenic, paingenic, swellgenic and exhaustiongenic is complete, but not falsifiable, because what illness doesn’t produce fever, pain, exhaustion or swelling?

It is curious that, using this example, a change in skin coloration doesn’t fit any of these categories. A doctor using that paradigm will incorrectly classify it as fevergenic (“It’s a local fever”) or swellgenic (“It’s a micro-swelling”). In the same way, professionals who don’t question the CIA paradigm classify the loss of synchronization as an integrity problem (“time information has been changed”), while it’s clear that only stateful information, like a file or a database, can have the property of integrity.

It is impossible to think of an experiment that shows an incident or a threat not to belong to one of the confidentiality, integrity or availability categories. Therefore the CIA paradigm is unscientific.

There are several examples of incidents that are not well treated using CIA, but appear to fit within the paradigm. Uncontrolled permanence of information can lead to Confidentiality Loss. Copying information in violation of authorship rights can lead to Confidentiality Loss, as someone is getting access who is not authorised for it. Copying in violation of privacy rights can lead to Confidentiality Loss, for the same reason. Now, what are these CIA classifications good for? It’s very clear that to prevent “confidentiality” incidents our controls will be very different depending on whether we want to limit access, prevent breaches of authorship rights, or guarantee information erasure. So why are we classifying at all, if the classification doesn’t help with something as simple as selecting a security measure? Some other examples of incidents that don’t fit CIA are operator errors and fraud. To neutralise a threat, a control that regulates the causes of the threat will normally be needed; therefore, for control selection, it would be far more useful to classify by causes than by effects, which is exactly what CIA doesn’t do.

CIA doesn’t consider the context at all. This is why small and medium-sized organisations are intimidated by the demands of Confidentiality, Integrity and Availability and give up on devoting enough resources to security. Only big organisations aim for Confidentiality, Integrity and Availability.

CIA doesn’t consider our expectations about our information systems. You can’t demand confidentiality of public information, like www.cnn.com news; you can’t demand integrity of low-durability information, as it is too easy to reproduce; and you can’t demand availability of low-priority services.

Many practitioners who use the CIA definition have a stance of “We want to prevent attacks from succeeding”; in other words, being safe is equivalent to being invulnerable. The definition of an incident in this light is totally independent of the context, and considers only attacks, neglecting accidents and errors as incidents. Disaster Recovery Plans show that the need to protect a company from catastrophes is well known, but many accidents are treated as a reliability issue rather than a security issue.

So, if no current information security definition or paradigm is satisfactory, what can replace it? An interesting alternative is the use of an operational definition. An operational definition uses the measuring process as the definition of the measured quantity. For example, a meter is defined operationally as the distance travelled by a beam of light in a certain span of time. An example of the need for operational definitions is the collapse of the West Gate Bridge in Melbourne, Australia, in 1970, which killed 35 construction workers. The subsequent enquiry found that the failure arose because engineers had specified the supply of a quantity of flat steel plate. The word “flat” in this context lacked an operational definition, so there was no test for accepting or rejecting a particular shipment or for controlling quality.

Before detailing the operational definition, some words about probability. Probability has predictive power with the following considerations:
•    As long as systems and the environmental conditions don’t change, the future is similar to the past.
•    You can apply probability to sets of phenomena, not to individual phenomena.
•    A sufficiently big set of historic cases must be available for significant probability calculations.

Probability is often misunderstood. If you toss a coin nine times and get nine tails, the probability of getting tails on the tenth toss is still ½, not lower as intuition suggests. Quite the opposite: the more tails we get, the higher our confidence should be that the next toss will be tails too, unless we have previously tested the coin with several runs of ten tosses and got tails, overall, five times out of ten, meaning the coin is “mathematically fair”.
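
A quick simulation illustrates the first point (a sketch assuming a fair coin; the number of trials and the seed are arbitrary):

import random

random.seed(0)
runs = 0
tails_on_tenth = 0
for _ in range(1_000_000):
    # Ten tosses of a fair coin; True means tails
    tosses = [random.random() < 0.5 for _ in range(10)]
    if all(tosses[:9]):           # keep only the runs that start with nine tails
        runs += 1
        tails_on_tenth += tosses[9]
print(tails_on_tenth / runs)      # close to 0.5: the tenth toss is unaffected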

An operational definition for information security is: “The absence of threats that can affect our expectations about information systems equivalently protected in equivalent environments”. Security is something that you get, not something that you do.

In practice threats are always present. This is the reason perfect security is not possible, which is perfectly consistent with the operational definition. This shows how invulnerability and security are different, as the definition, put into practice, shows invulnerability to be unfeasible.

Expectations about a system are expectations about the use of the system, not expectations about how it would respond to an attack, and they are therefore the same even if new vulnerabilities are discovered.

This operational definition is not only falsifiable, but it is expectation-dependent and deals cleanly with the definitional difficulties of context. It is helpful to determine what threats are relevant, to weigh the threats, to measure the risk and to select security measures.

Operational here means a “working definition”, like the definition of the meter given above.

The following definitions of incident and threat follow from the operational definition:
•    Incident: “Any failure to meet our expectations about an information system”. This definition makes our expectations the pivotal point of what should be protected.
•    Threat: “Any historical cause of at least one incident in an equivalent information system”. This implies that the probability is not zero, and brings in the context. This is an operational, “working” definition. Zero-days are considered threats, as they belong to the category of malicious code, which is known to have caused incidents in the past. Whether a “threat” that never causes an incident is really a threat is a matter of debate.

The threats relevant to an information system will be the causes of historic incidents in information systems protected equivalently in equivalent environments. Insecurity can be measured by the cost of historic incidents in a span of time for every information system equivalently protected in an equivalent environment.

Many companies have these general expectations about their information systems and the way they are used:

•    Comply with existing legal regulations.
•    Control the access to secrets and information or services protected by law, like private information and copyrights.
•    Identify the authors of information or messages and keep a record of their use of services.
•    Make the users responsible for their use of services and acceptance of contracts and agreements.
•    Control the physical ownership of information and information systems.
•    Control the existence and destruction of information and services.
•    Control the availability of information and services.
•    Control the reliability and performance of services.
•    Control the precision of information.
•    Reflect the real time and date in all their records.

Every organisation will have a different set of expectations, which leads to different sets of incidents to protect from and different sets of threats to worry about depending on the environment. The more specifically the expectations are defined, the easier it becomes to determine the threats to them and the security measures that can protect them.

To determine how relevant the threats are, it is necessary to gather historical data for incidents in equivalent systems in equivalent environments. Unfortunately, whereas the insurance industry has been doing this for years, information security practitioners lack this statistical information. It is possible to know the likelihood and the cause of a car accident, but there is not enough data to know how likely it is to suffer an information security incident, nor what its cause will be. Quantitative risk measurement without proper historical data is useless. Some practitioners even mix estimated figures with complex formulae, which is equivalent to mixing magic and physics.

Even if there is no accurate data about risk, it is possible to follow a risk assessment process similar to OCTAVE to identify the expectations about the information systems and the significant threats that can prevent those expectations from being met.

With the operational definition, every identified threat can be controlled using suitable security measures. If quantitative risk information is available, the most cost-efficient security measures can be selected.

Previously unknown threats can be controlled using impact-reduction security measures, which are effective against a wide spectrum of threats, such as backups.

The operational definition of an incident helps to focus on what is relevant to our context. If there is no expectation of secrecy, no matter what is revealed, there is no incident. The operational definition of a threat helps focus on threats that are both relevant and likely. It doesn’t make much sense to consider meteors a threat if no information system has ever been destroyed by a meteor. Measuring insecurity by the cost of incidents helps to gauge how much to invest in information security. If our expenses protecting information systems for the last five years were 10,000 euros a year, and our losses were 500 euros a year, it probably doesn’t make sense to raise the budget to 20,000 euros, but to 10,500 at most. Of course this is a gross estimate, but it gives an idea of what could be achieved if statistics on the cost of incidents and their causes were available.

The operational definition is richer than the other paradigms: it addresses expectations, context and cost, and it makes it far easier to determine what security measures to take to protect the expectations placed on an information system. The adoption of a falsifiable definition should enable some progress in information security theory.

Using O-ISM3 with TOGAF

In order to prevent duplication of work and maximize the value provided by the Enterprise Architecture and Information Security disciplines, it is necessary to find ways to communicate and take advantage of each other’s work. We have been examining the relationship between O-ISM3 and TOGAF®, both Open Group standards, and have found that, terminology differences aside, there are quite a number of ways to use these two standards together. We’d like to share our findings with The Open Group’s audience of Enterprise Architects, IT professionals, and Security Architects in this article.

Events Logging Markup Language

Can you think of performing a forensic analysis on a system with no records, no logs? Neither can I. Logs contain events like startup, restart, abnormal termination of services, physical and logical thresholds being exceeded, access to resources, network connections, privilege and access rights changes, configuration changes, etc. Logs are generated everywhere, with a multiplicity of transports, APIs and formats. There are quite a few standards.

It would be interesting to be able to check whether all the important events are considered as part of the requirements of the design of an application. This can prevent nasty surprises when analyzing an incident. Using a good model of an information system can make this task relatively easy. Such a model would describe an information system using the following elements:

  • Repositories (Credentials): Any temporary or permanent storage of information, including RAM, databases, file systems, and any kind of portable media;
  • Interfaces: Any input/output device, such as screens, printers and fax;
  • Channels: Physical or logical pathways for the flow of messages, including buses, LAN networks, etc. A Network is a dynamic set of channels;
  • Borders: The limits of the system;
  • Services: Any value provider in an information system, including services provided by BIOS, operating systems and applications. A service can collaborate with other services or lower level services to complete a task that provides value, like accessing information from a repository;
  • Sessions: A temporary relationship of trust between services. The establishment of this relationship can require the exchange of Credentials;
  • Messages (Instructions): Any meaningful information exchanged between two services or between a user and an interface.

For a log entry to be complete it should contain at least the following elements:

  • Every event can have an eventID.
  • If the event is not logged by the agent of the event, the "logger" can be identified using a "loggerID".
  • The "agent" of the event can be identified using a "sourceID".
  • The "agent" of the event can stay in different locations, identified using a "addressID".
  • The "credential" used by the source to perform a request can be identified using a "credentialID".
  • The "resource" (subject) of the event is identified using a "resourceID".
  • The "request" (access attempt) performed has a "RequestType" and a "Result". The reason for the "Result" is stated in the "ResultText".
  • The "payload" contains the information necessary to perform the request.
  • "dateTime" is the date and time when the request is performed.
  • "signature" is the digital signature of the event using the "credentialID".
  • "hash" is the digital summary of the event. It is recommended that the hash of the previous event in the Record is used to calculate it.

For example:

<sourceID>proftpd.lab.ossec.net</sourceID><addressID>192.168.20.10</addressID><loggerID>slacker proftpd[25530]</loggerID><Result>success</Result><ResultText>FTP session closed.</ResultText><dateTime>21/5/2007 20:22:14</dateTime>

<sourceID>proftpd.lab.ossec.net</sourceID><addressID>190.48.150.156</addressID><credentialID>abad</credentialID><loggerID>proftpd.lab.ossec.net:21:slacker proftpd[31806]</loggerID><RequestType>login</RequestType><Result>failure</Result><ResultText>no such user found</ResultText><dateTime>21/5/2007 20:21:21</dateTime>
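
A small sketch of how such entries could be produced and checked for completeness (my own illustration; the element names come from the list above, but the split between required and optional elements and the helper itself are assumptions):

from datetime import datetime
from xml.sax.saxutils import escape

REQUIRED = ["sourceID", "resourceID", "RequestType", "Result", "dateTime"]
OPTIONAL = ["eventID", "loggerID", "addressID", "credentialID", "ResultText",
            "payload", "signature", "hash"]

def build_entry(**fields):
    # Serialize the given fields in the flat XML-like format shown above,
    # refusing entries that miss any of the elements deemed required.
    missing = [name for name in REQUIRED if name not in fields]
    if missing:
        raise ValueError(f"incomplete log entry, missing: {missing}")
    order = REQUIRED + OPTIONAL
    return "".join(f"<{k}>{escape(str(fields[k]))}</{k}>" for k in order if k in fields)

entry = build_entry(
    sourceID="proftpd.lab.ossec.net",
    addressID="190.48.150.156",
    credentialID="abad",
    loggerID="proftpd.lab.ossec.net:21:slacker proftpd[31806]",
    RequestType="login",
    Result="failure",
    ResultText="no such user found",
    resourceID="ftp",             # assumption: the service being accessed
    dateTime=datetime(2007, 5, 21, 20, 21, 21).isoformat(),
)
print(entry)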

Using this scheme, it is possible to check how complete a log is, by checking:

  • If events need a unique identifier, or even a digital signature or a hash.
  • If there is a need to distinguish the process performing the action from the process logging the event.
  • If there is a need to identify the origin (agent) of the action.
  • If there is a need to identify the logical or physical location of the origin (agent) of the action.
  • If there is a need to identify the credentials used by the origin (agent) of the action.
  • If there is a need to identify the resource that is being accessed by the origin (agent) of the action.
  • If there is a need to identify the nature of the action (RequestType) performed on the resource.
  • If there is a need to identify the result of the action performed on the resource.
  • If there is a need to identify the date and time of the action performed on the resource.

You can find a list of types of request and results here:

Using this list of request types can be very useful, as the RequestType indicates what type of resource is being accessed, making it easier to read the log.

How do you check if your log designs are complete and contain all the information you might ever need?
