
Linköpings universitet SE–581 83 Linköping

Linköping University | Department of Computer and Information Science

Master’s thesis, 30 ECTS | Datateknik

2021 | LIU-IDA/LITH-EX-A--21/018--SE

Using the SEI CERT Secure Coding Standard to Reduce Vulnerabilities

Johan Fisch
Carl Haglund

Supervisors: Senyang Huang, Rahul Hiran, Ioannis Avgouleas
Examiner: Andrei Gurtov


Upphovsrätt

Detta dokument hålls tillgängligt på Internet - eller dess framtida ersättare - under 25 år från publiceringsdatum under förutsättning att inga extraordinära omständigheter uppstår.

Tillgång till dokumentet innebär tillstånd för var och en att läsa, ladda ner, skriva ut enstaka kopior för enskilt bruk och att använda det oförändrat för ickekommersiell forskning och för undervisning. Överföring av upphovsrätten vid en senare tidpunkt kan inte upphäva detta tillstånd. All annan användning av dokumentet kräver upphovsmannens medgivande. För att garantera äktheten, säkerheten och tillgängligheten finns lösningar av teknisk och administrativ art.

Upphovsmannens ideella rätt innefattar rätt att bli nämnd som upphovsman i den omfattning som god sed kräver vid användning av dokumentet på ovan beskrivna sätt samt skydd mot att dokumentet ändras eller presenteras i sådan form eller i sådant sammanhang som är kränkande för upphovsmannens litterära eller konstnärliga anseende eller egenart.

För ytterligare information om Linköping University Electronic Press se förlagets hemsida http://www.ep.liu.se/.

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility.

According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.


Abstract

Security is a critical part of every piece of software developed today, and it will become even more important as more devices are connected to the internet. By striving to improve the quality of the code, in particular its security aspects, the number of vulnerabilities may be reduced and the software improved. By looking at past problems and studying the code in question to see whether it follows the SEI CERT secure coding standard, it is possible to tell whether compliance with the standard would help reduce future problems. In this thesis, vulnerabilities in C and C++ code reported in Common Vulnerabilities and Exposures (CVE) are analyzed to verify whether applying the SEI CERT secure coding standard helps reduce vulnerabilities. This study also evaluates the SEI CERT rule coverage of three different static analysis tools, Rosecheckers, PVS-Studio and CodeChecker, by executing them on these vulnerabilities and comparing them using three metrics: true positives, false negatives and run time. The results of the study are promising, as they show that compliance with the SEI CERT standard does indeed reduce vulnerabilities: about 60% of the analyzed vulnerabilities could have been avoided if the standard had been followed. The results for the tools are also of interest. The tools did not perform as well as the manual analysis, but all of them found some SEI CERT rule violations in different areas. Conclusively, a combination of manual analysis and these three static analysis tools would have resulted in the highest number of vulnerabilities avoided.


Acknowledgments

We would like to thank Ericsson and their employees who have been involved in our work. A special thanks goes out to Rahul Hiran, our supervisor at Ericsson. Without his interesting ideas and help throughout the whole process, the results of the thesis would not have been the same. We would also like to thank the developers of the tool CodeChecker at Ericsson, especially Daniel Krupp, who took the time to have a meeting with us and explain more about the tool. Appreciation also goes out to Linköping University. We would like to thank our supervisors Senyang Huang and Ioannis Avgouleas as well as our examiner Andrei Gurtov, who have assisted us with the thesis writing and provided us with interesting and valuable thoughts about the area.


Contents

Abstract iii

Acknowledgments iv

Contents v

List of Figures vii

List of Tables ix

Listings x

1 Introduction 1
1.1 Motivation . . . 2
1.2 Aim . . . 2
1.3 Research questions . . . 2
1.4 Delimitations . . . 2

2 Theory 4
2.1 Secure software development . . . 4

2.2 CVE . . . 5

2.3 SEI CERT Coding Standard . . . 5

2.4 SEI CERT C Coding Standard . . . 5

2.5 SEI CERT C++ Coding Standard . . . 11

2.6 CVSS . . . 11

2.7 Static Analysis Tools . . . 11

2.8 Programming languages . . . 13

3 Related Work 15
3.1 Secure coding . . . 15

3.2 Benefits of coding standards . . . 16

3.3 Evaluation of static analysis tools . . . 17

3.4 Collection of vulnerabilities . . . 18

4 Method 20
4.1 Approach . . . 20

4.2 Gathering of vulnerabilities in CVE . . . 20

4.3 Analyzing vulnerabilities in CVE . . . 21

4.4 Gathering rule specific CVE vulnerabilities . . . 25

4.5 Analyzing rule specific CVE vulnerabilities . . . 26

4.6 Case studies . . . 26

5 Results 28
5.1 Gathering of vulnerabilities in CVE . . . 28


5.2 Analyzing vulnerabilities in CVE . . . 28

6 Discussion 41
6.1 Method . . . 41

6.2 Results . . . 44

6.3 The work in a wider context . . . 47

7 Conclusion 48
7.1 How can vulnerabilities be reduced in the early phase of software development? . . . 48
7.2 To what extent does SEI CERT compliance help reduce vulnerabilities? . . . 48

7.3 What tools can help complying with the SEI CERT secure coding standard? . . 49

7.4 Future work . . . 49

Bibliography 50

A Script for gathering EXP34-C CVE vulnerabilities. 54
B Script to gather C++ CVE:s 55
C C CVE:s 56
D C++ CVE:s 58
E Rule Specific CVE:s 60
F Rule Specific figures 63
F.1 ARR30-C . . . 63
F.2 EXP33-C . . . 65
F.3 EXP34-C . . . 66
F.4 FIO47-C . . . 67
F.5 INT30-C . . . 68
F.6 INT32-C . . . 69
F.7 INT33-C . . . 69
F.8 MEM30-C . . . 70
F.9 MEM35-C . . . 71
F.10 STR31-C . . . 72


List of Figures

2.1 Abstract syntax tree generated for the code in Listing 2.9 . . . 13

4.1 Description of a CVE vulnerability. . . 21

4.2 Example of PVS-Studio output. . . 23

4.3 Example of Rosecheckers output. . . 24

4.4 Rules that Rosecheckers covers for memory management [rose_source]. . . . 24

4.5 Example of CodeChecker HTML output. . . 25

5.1 SEI CERT C Rule vs. No Rule distribution for the 60 CVE:s analyzed. . . 29

5.2 SEI CERT C Rule distribution for the 38 CVE:s where a rule could be found. . . 29

5.3 Risk level distribution for the 16 different rules found during C analysis. . . 30

5.4 Number of SEI CERT C violations found per rule during C analysis. . . 30

5.5 Percentages of violations found per rule during C analysis. . . 31

5.6 Violations found in relation to size during C analysis. . . 31

5.7 SEI CERT C++ Rule vs. No Rule distribution for the 60 CVE:s analyzed. . . 32

5.8 SEI CERT C++ Rule distribution for the 37 CVE:s where a rule could be found. . . 33

5.9 Risk level distribution for the 12 different rules found during C++ analysis. . . 33

5.10 Number of SEI CERT C++ violations found per rule during C++ analysis. . . 34

5.11 Percentages of violations found per rule during C++ analysis. . . 34

5.12 Static analysis tools Run time comparison. . . 35

5.13 PVS & Rosecheckers Run time in relation to project size. . . 35

5.14 PVS & Rosecheckers Run time in relation to number of files. . . 36

5.15 CodeChecker Run time in relation to project size. . . 36

5.16 CodeChecker Run time in relation to number of files. . . 36

5.17 Rule specific violations found per static analysis tool. . . 37

5.18 Rule specific project size in relation to found violations per static analysis tool. . . 38

5.19 Rule specific project run time in relation to size per static analysis tool. . . 39

5.20 Rule specific number of violations found in relation to CVSS per static analysis tool. 39

F.1 ARR30-C Size related to run time. . . . 63

F.2 ARR30-C Size related to number of found violations. . . 64

F.3 ARR30-C CVSS related to number of found violations. . . 64

F.4 EXP33-C Size related to run time. . . 65

F.5 EXP33-C Size related to number of found violations. . . 65

F.6 EXP33-C CVSS related to number of found violations. . . 65

F.7 EXP34-C Size related to run time. . . 66

F.8 EXP34-C Size related to number of found violations. . . 66

F.9 EXP34-C CVSS related to number of found violations. . . 66

F.10 FIO47-C Size related to run time. . . 67

F.11 FIO47-C Size related to number of found violations. . . 67

F.12 FIO47-C CVSS related to number of found violations. . . 67

F.13 INT30-C Size related to run time. . . 68

F.14 INT30-C Size related to number of found violations. . . 68

F.15 INT30-C CVSS related to number of found violations. . . 68

F.16 INT32-C Size related to run time. . . 69

F.17 INT33-C Size related to run time. . . 69

F.18 INT33-C Size related to number of found violations. . . 69

F.19 INT33-C CVSS related to number of found violations. . . 70

F.20 MEM30-C Size related to run time. . . 70

F.21 MEM30-C Size related to number of found violations. . . 70

F.22 MEM30-C CVSS related to number of found violations. . . 71

F.23 MEM35-C Size related to run time. . . 71

F.24 MEM35-C Size related to number of found violations. . . 71

F.25 MEM35-C CVSS related to number of found violations. . . 72

F.26 STR31-C Size related to run time. . . 72

F.27 STR31-C Size related to number of found violations. . . 72


List of Tables

2.1 Likelihood table in a risk assessment. . . 7

2.2 Severity table in a risk assessment. . . 7

2.3 Remediation Cost table in a risk assessment. . . 7

2.4 Possible levels in a risk assessment. . . 8

4.1 Rules tested in tools analysis. . . 26

5.1 Rules tested in C analysis. . . 29

5.2 True positive and False negative for C analysis. . . 32

5.3 Rules tested in C++ analysis. . . 32

5.4 True positive and False negative for C++ analysis. . . 37

5.5 Found violations per tool for each Size range during Rule specific analysis. . . 38

5.6 True positive and False negative for Rule specific analysis. . . 40

C.1 CVE:s tested in C CVE analysis. . . 56

D.1 CVE:s tested in C++ CVE analysis. . . 58


Listings

2.1 Off-by-One error. . . 8

2.2 Fixed Off-by-One error. . . 8

2.3 Accessing freed memory. . . 9

2.4 No longer accessing freed memory. . . 9

2.5 Format string bug. . . 9

2.6 No longer contains a Format string bug. . . 10

2.7 Integer overflow. . . 10

2.8 Fixed Integer Overflow. . . 10

2.9 Abstract syntax tree example code. . . 13

4.1 Python script for extracting C vulnerabilities. . . 21

4.2 Commands for PVS-Studio analysis. . . 22

4.3 Python script for adding PVS-Studio student license comment. . . 23

4.4 Command for getting the docker container. . . 23

4.5 Example command for running Rosecheckers analysis on the rtp.c file in the Janus-gateway project. . . 23

4.6 Command for setting up and running CodeChecker . . . 24

4.7 Git diff for CVE-2020-14033. . . 27

4.8 Git diff for CVE-2018-9304. . . 27

4.9 Line with problematic code for CVE-2019-9113. . . 27

A.1 Python script for extracting EXP34-C CVE vulnerabilities. . . 54

1 Introduction

In a society that is constantly moving forward, where the number of connected devices increases each day, the danger of cyber attacks is rising and the need for defenses is greater than ever [15]. This potential danger is a significant issue in both the private and public sectors, where the involved parties need to consider different security aspects, such as which systems to defend, the expected frequency of cyber attacks and what type of security to invest in. As Ijaz Ahmad et al. explain in their article "Security for 5G and Beyond" [1], the rise of 5G and the massive growth of connected devices that comes with it have also opened the door to more security threats. The emerging 6G standard, which is expected to be introduced in about 10 years and to change the society we live in, also comes with challenging security threats that need to be taken care of [36].

One way to address the issue of software vulnerabilities, and thereby cyber attacks, is to introduce secure coding standards into the development and maintenance of the code [44]. Mark Grover et al. [25] conclude in their study that cyber attacks will increase over time and that introducing secure coding is an effective countermeasure. By applying these types of standards, programmers are encouraged to follow a collection of guidelines established to make the software more secure. There is a great number of different standards and guidelines that can be followed depending on the type of programming language used, for example the SEI CERT Secure Coding Standard [27] and MISRA C [6].

Static analysis tools are one way to identify existing vulnerabilities within the code. These tools can also help with complying with a specific coding standard, since their purpose is to give warnings where the code is non-compliant with the standard. Multiple studies have shown that static analysis tools produce both false and correct warnings, called false positives and true positives. Studies like Jiang Zheng et al.'s [54] also show that tools are a good way to observe which mistakes occur most often; they found that "possible use of NULL pointer" accounted for 45.92% of all the vulnerabilities.

While it can be discussed whether one static analysis tool is better than another in regards to violations found, it is equally important to include the run time of the tool as well as the number of projects the tool can actually be run on. Therefore, an evaluation of different tools is of great interest when deciding on what tool to use in a project.

1.1 Motivation

Different coding standards have long been suggested to increase security, reliability and overall quality [27, 13]. Currently there is not much empirical evidence backing these statements, and studies have even shown that conformance to all rules in a specific standard may result in more introduced faults [11]. This is without doubt an interesting area and more effort needs to be put into these types of questions. Therefore, the main focus of this thesis is to establish how compliance with different rules of the SEI CERT Coding Standards can be used to reduce the number of vulnerabilities and thereby improve the quality of the code.

1.1.1 Ericsson

This thesis is conducted at Ericsson, a leading company in information and communication technology. Ericsson works with technology ranging from networks and digital services to managed services and emerging business. This means that Ericsson has to deal with thousands of lines of code every day and, as a part of this, needs to handle all vulnerabilities related to the code. A recurring part of Ericsson's development process is to write trouble reports (TRs) when problems occur. Since this takes a substantial amount of time, it is something Ericsson wants to minimize. By investigating whether using the SEI CERT standard is beneficial, Ericsson hopes to reduce the number of vulnerabilities and thereby the workload related to TRs. Ericsson could further improve the quality of its products by writing more secure code.

1.2 Aim

This thesis aims to test if compliance to the SEI CERT secure coding standard can help reduce vulnerabilities. This can be achieved by analyzing vulnerabilities, both manually and with static analysis tools, reported in Common Vulnerabilities and Exposures (CVE) [17], a public database where a significant number of vulnerabilities from different software projects are reported. A secondary aim of this thesis is to evaluate different static analysis tools in regards to SEI CERT coverage and performance.

1.3 Research questions

To achieve the aim of the thesis, the following research questions will be answered:

RQ1. How can vulnerabilities be reduced in the early phase of software development?

RQ2. To what extent does SEI CERT compliance help reduce vulnerabilities?

RQ3. What static analysis tools can help complying with the SEI CERT secure coding standard?

1.4 Delimitations

When analyzing what tools can help complying with SEI CERT, the time it takes to run the tools on the projects is an important factor and it was decided to limit this by not including projects that are very large in size, such as the Linux kernel. The tools analyzed were limited to three different static analysis tools, mainly because the selected ones were free to use and


Another delimitation is that this thesis only looks at the C and C++ SEI CERT coding standards and not the Android, Java or Perl standards. This is primarily because it was requested by Ericsson, but also due to the limited time for the thesis to be completed.

2 Theory

In this chapter, theory about Secure software development, CVE, and the SEI CERT Coding Standard will be given. After that, CVSS will be briefly presented as well as the static analysis tools used.

2.1 Secure software development

In this subsection, secure software development (also referred to interchangeably as secure coding throughout this thesis) and how it can be applied to a project will be presented. Secure coding is at this moment a prevailing topic, more important than ever to devote attention to, because of the ever-present threat of cyber attacks. As more and more systems are connected to the internet, people need to consider the risks and whether protecting the system is cost-effective [10]. As Pawani Porambage et al. discuss in the article "The Quest for Privacy in the Internet of Things" [37], the big increase of the Internet of Things (IoT) is a reason why developers need to take the privacy of their users into consideration and develop secure and trustworthy software that protects the users in any case.

Secure software development is what developers need to adhere to in order to make sure that the software is protected from vulnerabilities. Neglecting this could result in loss of important data, denial of service, leaked company secrets or damage to the system.

The software development life cycle (SDLC) is a model that includes the different processes in the life cycle of a software project [8]. The model consists of six steps: the requirement phase, the architecture and design phase, the implementation phase, the testing phase, the deployment phase and the maintenance phase. Secure coding is not only applied in the implementation phase; it should be considered throughout the whole life cycle. Security requirements should be established as early as the requirement phase. In the architecture and design phase, risk analysis should be exercised. The implementation phase might contain risk-based security testing and static analysis, and the deployment and verification phase could for example include risk analysis and penetration testing.


To support this, developers can for example follow an established secure coding standard, e.g. the SEI CERT Secure Coding Standard, or adhere to a special SDL method. The next section will introduce the CVE database and describe some of its components.

2.2 CVE

The Common Vulnerabilities and Exposures, or CVE, is a program and a database that was launched in 1999 [17]. CVE aims to gather known vulnerabilities from different software projects. CVE consists of countless CVE records, which contain, amongst other things, a CVE ID number, a description, and a section of references. The CVE records also have a vulnerability analysis, and the description often includes references to the source of the vulnerability. The reference section usually contains links, for example a GitHub repository link or a report of the vulnerability on the developer's product website. Because of its high usability, CVE has become the industry standard for vulnerability reports [18].

There are a few other databases, such as the IBM X-Force Exchange [26] and SecurityFocus [45], that also collect and show vulnerabilities. The IBM X-Force Exchange usually has a link to a CVE record in the vulnerability record, if there is one for the vulnerability. Vulnerability records in the IBM X-Force Exchange have tags, making it quite easy to search for particular vulnerability types.

2.3 SEI CERT Coding Standard

The SEI CERT Secure Coding Standard includes five different components where each of them consists of guidelines about secure coding in that specific programming language or area:

1. SEI CERT C Coding Standard
2. SEI CERT C++ Coding Standard
3. SEI CERT Oracle Coding Standard for Java
4. Android™ Secure Coding Standard
5. SEI CERT Perl Coding Standard

CERT stands for Computer Emergency Response Team and is a program that is part of the Software Engineering Institute (SEI) at Carnegie Mellon University [14, p. xxvii]. Originally the CERT program was created to help teams of experts communicate during security emergencies; however, this is no longer the sole purpose of CERT. They now produce analyses in different security areas as well as provide standards for secure coding practices.

Next, theory about the SEI CERT C standard will be introduced. All the sections explained for the C standard can be applied for the SEI CERT C++ standard and most of the SEI CERT C rules are included in the C++ standard as well. Therefore the C++ section will be shorter, as otherwise there would be a lot of repetition.

2.4 SEI CERT C Coding Standard

The CERT C Secure Coding Standard, at times referred to as the CERT C standard, is developed by SEI, and the goal of the standard is to make it easier to develop safe, reliable, and secure systems [27]. Compliance with the CERT C standard will make the system more secure and reliable on a code level, but this is not always enough, since there might exist critical design flaws in the system design, which is not something that SEI CERT directly addresses. In systems where safety is of utmost importance, the requirements are usually stricter than those of the CERT C standard.


The different rules in the SEI CERT C Secure Coding Standard consist of a few parts: a title that briefly describes the rule, and a description that is more specific and explains the requirements of the rule. There are also code examples, both non-compliant and compliant ones. The guidelines also contain recommendations to help guide the programmers towards more secure and reliable code. These recommendations do not need to be followed in the same way as the rules, and a violation of a recommendation does not automatically mean that the code is insecure or bad. To check for compliance with the SEI CERT C coding standard it is most efficient to have a static analysis tool set up, explained in Section 2.7, but it can also be done manually. However, manual analysis takes a lot more time than automated tool analysis.

The SEI CERT C Secure Coding Standard is meant to make project members change the way they think about secure coding in software development. By adhering to this standard the team can create the highest form of value, and also gain knowledge that will be useful for a long time in future work.

2.4.1 Scope of SEI CERT C

As of now, the SEI CERT C Secure Coding Standard focuses mainly on version C11 (ISO/IEC 9899:2011), but it can also be practiced with previous versions. The differences between the versions may lead to ambiguities; therefore, when following the standard, it is important to look for notes about how the standard applies to a specific version.

Some of the issues that are not addressed in the CERT C standard are coding style and rules that are seen as controversial. The reason for this is that coding style is usually subjective and it is extremely difficult to create a style guide that everyone agrees with. Therefore coding style is skipped completely in the CERT C standard. For a similar reason, controversial rules are skipped: since there is no broad consensus on these rules, CERT has decided not to include them at all.

2.4.2 Validation

Compliance with the SEI CERT C Secure Coding Standard can be checked with different static analysis tools. The reason to use them is the increased complexity of a program with thousands of lines of code. The static analysis tools cannot be applied to enforce all of the guidelines, since some of the rules are only meant to be descriptive, for example "MSC41-C. Never hard code sensitive information".

A static analysis tool will in most cases not be able to tell whether a program is following a specific guideline or set of rules, since it is computationally infeasible to decide whether a program complies with a specific rule or recommendation. When deciding which static analysis tool to use, there are certain aspects that should be taken into consideration. Two of these aspects are completeness, which means that no false positives are reported, and soundness, which means that no false negatives are reported. A false negative means that there is a vulnerability in the program but the static analysis tool does not report it, and a false positive means that the tool reports a vulnerability in the code when in reality there is none [20]. The false negative is usually the more serious of the two, since it leaves the users of the tool with the illusion that there are no vulnerabilities in the program. When deciding on which analyzer to use, it is important to choose one that is both sound and complete with respect to the specific guidelines or set of rules that are of importance.
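To make the distinction concrete, the following hypothetical C snippet (our own illustration, not taken from the thesis or the standard) shows one place where a tool may emit a false positive and one place where it may miss a real defect, i.e. a false negative:

#include <stdio.h>
#include <stdlib.h>

int example(int flag) {
    int *p = NULL;
    int ok = 0;
    if (flag) {
        p = (int *) malloc(sizeof(int));
        if (p != NULL) {
            *p = 41;
            ok = 1;
        }
    }
    int result = 0;
    if (ok) {
        /* Safe: ok == 1 implies p != NULL, but a tool that cannot prove
           this relation may still warn here (a false positive). */
        result = *p + 1;
    }
    int *q = (int *) malloc(sizeof(int));
    /* Real defect: q is never checked against NULL, yet a tool that does
       not model a failing malloc() stays silent (a false negative). */
    *q = flag;
    free(q);
    free(p);
    return result;
}

Whether a given tool reports either case depends on how precisely it tracks relations between variables and whether it models failing allocations.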

2.4.3 Rules and recommendations


Table 2.1: Likelihood table in a risk assessment.

Value Category

1 Unlikely

2 Probable

3 Likely

The purpose of the rules is to guide the developers throughout the development process. The rules can be seen as requirements that the coders need to follow to comply with the standard. A failure to comply with a specific rule can result in a defect in the code, which could lead to an exploitable vulnerability in the program. By making sure every rule is adhered to, the program is considered to be more reliable, secure and safe. A recommendation is not treated as a rule; if a specific recommendation is not followed, a vulnerability will not necessarily appear as a result. Instead, a recommendation can be seen as a way to help the developers navigate the development process to make the final product more stable and to improve its safety and security. Together, rules and recommendations are referred to as guidelines for the developers throughout the development process.

Each rule has a risk assessment where it is given a level depending on how likely it is to lead to a vulnerability, how severe the consequences would be, and how costly it would be to fix. These levels are L1, L2, and L3. The likelihood of a rule violation leading to a vulnerability that an attacker can exploit is measured in three categories, i.e. unlikely, probable and likely, each of which is given a value of 1-3 (see Table 2.1). The severity is a measure of the possible consequences of a vulnerability that occurred due to a rule violation. Each violation is categorized as either low, medium or high severity, and each category is explained with an example, as can be seen in Table 2.2. The third metric used in the risk assessment is the remediation cost. This is the estimated cost for the developers to change a program that is violating a rule to make it comply with the standard. The remediation cost is categorized in the same way as the severity, but this time each category is also given a detection and correction class, which can be either automatic or manual, as seen in Table 2.3.

Table 2.2: Severity table in a risk assessment.

Value Category Examples

1 Low DoS attack

2 Medium Data breach

3 High Buffer overflow

Table 2.3: Remediation Cost table in a risk assessment.

Value Category Detection Correction

1 High Manual Manual

2 Medium Auto Manual

3 Low Auto Auto

When all three metrics have been evaluated, it is possible to give each rule a level (L1, L2, L3). This is done by multiplying the values of the likelihood, severity and remediation cost. If the product is in the range 1 to 4, the rule is given level 3 (L3, e.g. low severity, probable, medium remediation cost), if it is in the range 6 to 9 it is given level 2 (L2, e.g. low severity, likely, low remediation cost), and the range 12 to 27 is given level 1 (L1, e.g. high severity, likely, medium remediation cost), as seen in Table 2.4.


Table 2.4: Possible levels in a risk assessment.

Levels Priorities Clarification
L1     12, 18, 27 High severity, likely, medium remediation cost
L2     6, 8, 9    Low severity, likely, low remediation cost
L3     1, 2, 3, 4 Low severity, probable, medium remediation cost
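As a minimal illustration of this computation (our own sketch, not code from the standard), the level can be derived from the three values as follows:

#include <stdio.h>

/* Map severity (1-3), likelihood (1-3) and remediation cost (1-3) onto the
   levels L1-L3 by multiplying them, as described above. */
static int cert_level(int severity, int likelihood, int remediation_cost) {
    int priority = severity * likelihood * remediation_cost;
    if (priority >= 12) {
        return 1;   /* L1: priorities 12, 18, 27 */
    }
    if (priority >= 6) {
        return 2;   /* L2: priorities 6, 8, 9 */
    }
    return 3;       /* L3: priorities 1, 2, 3, 4 */
}

int main(void) {
    /* Example from Section 2.4.4: INT32-C has high severity (3), is likely
       (3) and has a high remediation cost (1), giving priority 9, i.e. L2. */
    printf("INT32-C -> L%d\n", cert_level(3, 3, 1));
    return 0;
}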

2.4.4 Code Examples for Rules

As mentioned in the former section, the SEI CERT C Coding Standard includes 99 different rules. These rules are categorized in different areas, such as Preprocessor, Declarations and Initialization, Integers, and Characters and Strings. Next, examples from these areas will be presented.

Characters and Strings

An example of a rule in this area is "STR31-C. Guarantee that storage for strings has sufficient space for character data and the null terminator" [27, p. 230], which is an L1 rule on the risk assessment scale. This rule is meant to prevent buffer overflows caused by writing more data to a buffer than it can hold. A typical case is an Off-by-One error, which can be seen in Listing 2.1 and may occur if the null terminator is not taken into consideration when writing the length check.

void copy_to_buffer(char *input) {
    char buf[20];
    if (strlen(input) > 20) {
        exit(1);
    }
    strcpy(buf, input);
}

Listing 2.1: Off-by-One error.

In this case the Off-by-One error can be exploited since the strlen() function does not count the null terminator when checking the length of the input. This means that the if-statement will be passed if the input is exactly 20 characters long (not counting the null terminator) and the call to strcpy() will write outside of the buffer. To comply with the rule and avoid this vulnerability, the function could be rewritten as in Listing 2.2, where the length check accounts for the null terminator by rejecting any input that does not fit in the buffer together with the terminator.

void copy_to_buffer(char *input) {
    char buf[20];
    if (strlen(input) >= sizeof(buf)) {
        exit(1);
    }
    strcpy(buf, input);
}

Listing 2.2: Fixed Off-by-One error.

Memory Management

One of the rules this area contains is the L1 rule "MEM30-C. Do not access freed memory" [27, p. 256]. An example of this is accessing a dangling pointer, which can be described as a pointer that used to point to data in memory, but whose memory has since been freed so that the data no longer exists. Accessing it, as in Listing 2.3, may result in undefined behavior.

void use_after_free(char *msg) {
    char *ptr = (char *) malloc(4 * sizeof(char));
    strcpy(ptr, "abc");
    if (msg != NULL) {
        free(ptr);
    }
    printf("error on: %s\n", ptr);
}

Listing 2.3: Accessing freed memory.

To avoid this happening it is important that the pointer is not being dereferenced after it has been freed, as in Listing 2.4.

void use_after_free(char *msg) {
    char *ptr = (char *) malloc(4 * sizeof(char));
    strcpy(ptr, "abc");
    printf("error on: %s\n", ptr);
    if (msg != NULL) {
        free(ptr);
    }
}

Listing 2.4: No longer accessing freed memory.

Input/Output

From the Input/Output area there are rules such as "FIO30-C. Exclude user input from format strings" [27, p. 281], which is also an L1 rule. This type of rule exists to prevent attackers from being able to directly control the contents of a format string. When an attacker controls the format string, it may be possible to view contents of the stack or memory, or even to execute arbitrary code through the vulnerable software. A non-compliant code example can be seen in Listing 2.5, where a user gives his/her password to be able to log in to a service.

void check_user_password(const char *psw) {
    if (psw == NULL) {
        no_psw_input();
    } else if (...) {
        ...
    } else {
        printf("Wrong password, you wrote: ");
        printf(psw);
    }
}

Listing 2.5: Format string bug.

To fix this type of bug it is important to not allow the user to have any control over the format of the string. One way to do it is shown in Listing 2.6, where the printf() call now uses the "%s" format specifier to insert the user input into the string.


void check_user_password(const char *psw) {
    if (psw == NULL) {
        no_psw_input();
    } else if (...) {
        ...
    } else {
        printf("Wrong password, you wrote: %s\n", psw);
    }
}

Listing 2.6: No longer contains a Format string bug.

Integers

One of the rules belonging to the Integers area is "INT32-C. Ensure that operations on signed integers do not result in overflow" [27, p. 147]. This is an L2 rule that has a high severity, is likely, and has a high remediation cost, which makes it L2 instead of L1. The rule ensures that signed integer overflows are handled, meaning that a program will not allow a signed int to go outside of its defined range, which would result in undefined behavior and often means the integer turning negative. An overflow may allow an attacker to bypass weak if-statements meant to prevent it, as seen in Listing 2.7, where an attacker could exploit the code by giving "size_a" and "size_b" INT_MAX values. This could result in the variable "size" becoming negative, which would bypass the if-statement and allow the attacker to overflow the buffer.

int func(signed int size_a, signed int size_b) {
    char buf[1024];
    signed int size = size_a + size_b;
    if (size > 1024) {
        printf("size of c too big for buffer");
        return ERROR_CODE;
    }
    printf("size will fit buffer");
    return OK;
}

Listing 2.7: Integer overflow.

To avoid this, the integers size_a and size_b need to be checked before the addition, as in Listing 2.8.

int func(signed int size_a, signed int size_b) {
    char buf[1024];
    if (((size_b > 0) && (size_a > (INT_MAX - size_b))) ||
        ((size_b < 0) && (size_a < (INT_MIN - size_b)))) {
        exit(0);
    }
    signed int size = size_a + size_b;
    if (size < 0 || size > 1024) {
        printf("size too big (or negative) for buffer");
        return ERROR_CODE;
    }
    printf("size will fit buffer");
    return OK;
}

Listing 2.8: Fixed Integer Overflow.

2.5 SEI CERT C++ Coding Standard

As for the C version of this standard, the CERT C++ Secure Coding Standard [7] is developed by the same institute, SEI. The purpose of this standard is also the same, to develop safe, reliable, and secure systems, but this time for systems written in the programming language C++. The CERT C++ standard references the C standard for some parts; for example, some of the rules included in the C standard also apply in the C++ standard. The standard could also be used by software customers when defining important requirements for the software.

2.5.1 Scope of SEI CERT C++

The scope of the SEI CERT C++ Coding Standard mainly focuses on the C++14 version (the ISO/IEC 14882 standard), but it can be applied to earlier versions as well. As for the C standard, the guidelines of the C++ standard consist of rules and recommendations, where each rule and recommendation has a compliant and a non-compliant code example that conforms to the C++14 guidelines. The issues not addressed in the C++ standard are the same as the ones not addressed in the C standard.

2.5.2 Validation

The validation of the SEI CERT C++ Coding Standard is the same as for the C standard in Section 2.4.2, meaning that compliance with the C++ standard can be checked in the same way.

2.5.3 Rules and recommendations

In the CERT C++ standard, the rules and recommendations are defined and function in the same way as for the CERT C standard, described in Section 2.4.3. However, this time SEI decided that the recommendations should not be included until additional research and development have been done in this area. Two new main rule areas have been added in the C++ standard: Object Oriented Programming (OOP) and Containers (CTR). Containers is very similar to the Array area in the C standard and even includes some of its rules, but it is expanded since C++ has multiple different container types.

2.6 CVSS

Each CVE record is given a CVSS base score by the National Vulnerability Database (NVD), which can be described as a way to classify the overall severity of the specific vulnerability [46]. The CVSS base score ranges from 0 to 10, where 10 is the most severe. Depending on the CVSS base score, the vulnerability is also given a severity rank of "None", "Low", "Medium", "High", or "Critical" if the base score is 0, 0.1-3.9, 4.0-6.9, 7.0-8.9 or 9.0-10.0, respectively.
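A minimal sketch of this severity ranking (our own illustration, using the cut-off values listed above) could look as follows:

#include <stdio.h>

/* Map an NVD CVSS base score (0.0-10.0) onto the severity ranks above. */
static const char *cvss_severity(double base_score) {
    if (base_score >= 9.0) return "Critical";
    if (base_score >= 7.0) return "High";
    if (base_score >= 4.0) return "Medium";
    if (base_score >= 0.1) return "Low";
    return "None";
}

int main(void) {
    printf("%s\n", cvss_severity(7.5));   /* prints "High" */
    return 0;
}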

2.7 Static Analysis Tools

A static analysis tool is a tool that aims to identify as many coding problems as possible during development and testing [16]. Static analysis tools are able to examine the source code statically, without needing to execute the software. However, not all vulnerabilities can be found with static analysis, which means that static analysis alone is not enough. One form of static analysis is manual auditing, which is done by letting people go through the code line by line. This sort of analysis is known to be slow and somewhat problematic, since the people analyzing the code need to know a lot about security vulnerabilities, and many vulnerabilities are missed.


By using a tool instead of manual auditing, the analysis process becomes more precise and consistent, since each analysis is based on the set of rules the tool was programmed to use. However, this set of rules is not complete or perfect in any way, and a static analysis tool should not be trusted completely. When using static analysis tools, one thing to keep in mind is that these tools may report a large number of false negatives or false positives [19, 38], meaning that the software contains vulnerabilities that the tool does not report, or that the tool reports vulnerabilities that really are not a problem in the software, respectively. Static analysis tools will probably not uncover all of the vulnerabilities and bugs in software; this is something developers need to keep in mind when using these types of tools [2]. A good way to use the tools could be as assistance during manual code reviews.

A technique that static analysis tools can apply is pattern matching [3]. This method scans the code for predefined and potentially dangerous patterns, for example unsafe library functions like gets() and sprintf(). Another approach is the data-flow analysis method [3]: by going through all possible paths, this method gathers the information needed to tell whether a set of values or a chosen path is dangerous, for example whether a program variable is used in a way that might be dangerous, e.g. in an unsafe library function. Symbolic execution [28] is another technique that can be used; this method uses symbolic values instead of running the program with actual inputs. When the symbolic values are used, the interpreter obtains constraint expressions based on the symbolic values which cover every possible outcome.
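As a hypothetical illustration (not an example from the thesis), the following C snippet shows the kind of call a pattern-matching checker can flag purely from its presence, together with a bounded alternative:

#include <stdio.h>

void format_id(int id) {
    char buf[16];
    /* sprintf(buf, "id-%d", id);  <- would be flagged: unbounded write */
    snprintf(buf, sizeof(buf), "id-%d", id);   /* bounded alternative */
    printf("%s\n", buf);
}

int main(void) {
    format_id(42);
    return 0;
}

Data-flow analysis and symbolic execution go further than this kind of purely syntactic match by also reasoning about which values can actually reach the dangerous call.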

2.7.1 CodeChecker

CodeChecker [22] is a static analysis tool developed by Ericsson. It uses the analyzers Clang-Tidy and Clang Static Analyzer, with the possibility of using Cross-Translation Unit analysis (CTU), which makes it possible to analyze functions that communicate between multiple files. The Clang Static Analyzer uses techniques based on symbolic execution and path-sensitive inter-procedural analysis [51], which is a type of data-flow analysis. Clang-Tidy is more of a linter-type tool that focuses on finding simpler errors related to style and syntax [52]. There is also the possibility of statistical analysis when such checkers are available. Results can be visualized both in the terminal and in static HTML files. CodeChecker also allows for web-based report storage, which means that the performed analysis can be visualized in a web browser, where it is easier to go through the report of the code.

2.7.2 PVS-Studio

PVS-Studio [31], which will be referred to as PVS at times, is a static analysis tool developed to work with code written in C, C++, C# or Java, and it can be run on most operating systems today, such as Windows, Linux and macOS. PVS-Studio applies techniques such as pattern matching, symbolic execution and data-flow analysis. When PVS-Studio analyzes the source code, it prints out error codes that correspond to some type of rule, for example V501, V517 and V522. These error codes have a short description, and a more in-depth explanation is available on the PVS-Studio website. There is also a classification table of these error codes and warnings on the PVS-Studio website, where one can look up which SEI CERT rule an error code corresponds to.

PVS-Studio costs money for commercial use, but for students there is a free license available [30]. There is a small catch to the free license, however, in that each file that is analyzed needs to start with a specific set of lines stating that it is for an academic project. This free version does not have the full set of features that the paid version has, but the analysis is not hindered; only some of the customization commands are restricted.


2.7.3 Rosecheckers

Another static analysis tool is Rosecheckers, a tool developed by the Software Engineering Institute at Carnegie Mellon University that performs static analysis on software written in the C and C++ programming languages [50]. The tool is made to check for compliance with the SEI CERT Secure Coding Standard. By reading the source code and generating an Abstract Syntax Tree (AST), Rosecheckers is able to create a graph of the analyzed code [40]. The AST is then traversed to check for compliance with SEI CERT. An example of what an AST may look like can be seen in Figure 2.1, which was generated from the code shown in Listing 2.9.

while x > y:
    x -= 1
return x

Listing 2.9: Abstract syntax tree example code.

Figure 2.1: Abstract syntax tree generated for the code in Listing 2.9
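To make the traversal step concrete, the following toy C sketch (our own illustration; it does not reflect Rosecheckers' actual data structures) walks a hand-built AST and flags calls to an unsafe function:

#include <stdio.h>
#include <string.h>

/* A toy AST node: a kind such as "FunctionDef", "While" or "Call", an
   optional callee name for "Call" nodes, and up to four children. */
typedef struct Node {
    const char *kind;
    const char *name;
    const struct Node *children[4];
    int n_children;
} Node;

/* Recursively walk the tree and report calls to gets(), in the spirit of
   checking a SEI CERT rule during AST traversal. */
static void check_node(const Node *node) {
    if (node == NULL) {
        return;
    }
    if (strcmp(node->kind, "Call") == 0 && node->name != NULL &&
        strcmp(node->name, "gets") == 0) {
        printf("violation: call to gets()\n");
    }
    for (int i = 0; i < node->n_children; i++) {
        check_node(node->children[i]);
    }
}

int main(void) {
    Node call = { "Call", "gets", { NULL }, 0 };
    Node body = { "FunctionDef", NULL, { &call }, 1 };
    check_node(&body);
    return 0;
}

A real checker builds the tree from parsed source code instead of by hand, but the principle of visiting every node and matching rule-relevant constructs is the same.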

2.8 Programming languages

Below we will give a short introduction to the programming languages that were used in this thesis.

2.8.1 C

The C programming language [42] [41] is one of the most popular programming languages. It was created in 1972 by Dennis Ritchie while he worked at Bell Labs. C is a low-level programming language that was designed to give developers easier access to memory. To be able to execute C code, it first has to be compiled. It is a cross-platform language and can be run on multiple operating systems like Windows, macOS and different Unix variants. C follows the structured programming paradigm and, unlike many newer languages like Python and JavaScript, it does not include object oriented programming or garbage collection.


2.8.2 C++

The C++ programming language [47] [48] started as an extension of the C programming language. It was developed by Bjarne Stroustrup in the 1980s. In comparison to C, C++ makes it possible for developers to use object oriented programming and to implement classes. C++ was meant to offer the same efficiency and flexibility as C but also include support for high-level programming. To be able to run C++ code, just like C code, it first has to be compiled. The platforms supported for C++ are, as for C, Windows, macOS and different Unix distributions.

2.8.3 Python

Guido van Rossum, the creator of Python [39], started working on an implementation of Python in the 1980s, but the first release did not happen until 1991. Python is an interpreted, high-level, object oriented programming language that supports dynamic typing, functional programming and garbage collection. Python is open source, meaning everyone can contribute improvements. As for the other programming languages described in the previous sections, Python is cross-platform and can be run on most operating systems.

3 Related Work

This section aims to introduce previous studies that have been conducted within the area. The first section will present secure coding. The second section will introduce the benefits of coding standards and what they bring to the table. The third section will present earlier studies that have been conducted to evaluate different static analysis tools. The fourth section will demonstrate a way to collect vulnerabilities.

3.1 Secure coding

It is often questioned by management whether it is worth the time, money, and effort to implement a stricter and more secure coding standard when developing information systems. This problem is also mentioned in the article "Moving Beyond Coding: Why Secure Coding Should be Implemented" by Mark Grover et al. [25]. In the article, where they review this problem, Grover et al. give examples of major data breaches that were caused by poor security. As mentioned in the review, what is even worse is that more of these types of attacks can be expected in the future, since the number of devices connected to the internet is increasing drastically. It follows that secure coding will be more important than ever before. The article also looks at the definition of secure coding and, by combining different definitions, ends up defining secure coding as: "The practice of writing code that is resistant to attacks.". They claim that one of the main reasons for software not being developed in a secure way is that developers are usually under time pressure and the main focus is to develop something that works. This means that security is often an afterthought and not something taken into consideration from the start. This article shows that the topic of this thesis is relevant and that there is a lot of evidence to suggest that secure coding may be helpful for organizations that develop different kinds of systems.

Juan F. García et al. [24] depict another interesting point of view regarding secure coding standards. Unlike this thesis, which tries to answer whether secure coding standards can reduce the number of vulnerabilities in code, García et al.'s "C Secure Coding Standards Performance: CMU SEI CERT vs MISRA" tries to answer whether secure coding standards affect the performance, more specifically the run time, of a program. García et al. also compared the two different secure coding standards, SEI CERT C and MISRA C. To accomplish this they compared the solutions to six different coding problems, where they had three versions of each solution: one where no standard was followed, one where the SEI CERT C coding standard was used, and one final version where the MISRA C standard was used. They then executed these different versions and compared the run times. This showed that, in relation to run time, the original with no standard applied was always the fastest, while the MISRA version was usually the same (only slower on 1/6 problems), and the SEI CERT version was slower than the original on half of the problems (3/6 problems). This study only shows the tip of the iceberg since, as they also claim, the analysis has to be performed on large-scale software projects. However, the result of García et al.'s study shows that compliance with the SEI CERT C Standard has its downsides and needs to be researched more. This makes this thesis even more relevant, as it tries to further investigate the effectiveness of the SEI CERT Secure Coding Standard and how it can be used to reduce vulnerabilities.

In 2008, Cathal Boogerd and Leon Moonen [11] performed an empirical study where they tried to answer whether there is a relation between a MISRA-C:2004 rule violation and a fault in the software. To answer this they used two different methods: one was examining violations of the MISRA-C standard and faults over time to check for correlation, and the other was studying separate violations of the standard closely over a period of time to tell how often they actually lead to faults. They also used two metrics, the number of violations and the number of faults, both divided by KLOC (each taken on a per-version basis). Boogerd and Moonen observed that complying with the MISRA-C:2004 rules completely may lead to a rise in the number of faults in the specific software, due to the fact that making the code comply with a rule risks introducing new faults. In fact, their study showed that only 12 out of the 72 rules that were observed were able to find faults much better than a random predictor that selects a line in the code at random.

In a follow-up work, Boogerd and Moonen [12] tried to measure the quality of two different projects written in the C language. In this study they introduced the term violation density, a metric where the number of violations is divided by the LOC. They analyzed 89 different rules of the MISRA-C:2004 standard and, as in the earlier study, they found that only a small number (10) of these rules are more fault-prone when there is a higher violation density. In contrast to this thesis, where the method is based on searching for vulnerabilities to see whether they could have been avoided using SEI CERT rules, Boogerd and Moonen's studies search for MISRA violations to see whether they are faults and whether they lead to vulnerabilities. That said, the results of their studies are well worth taking into consideration when answering RQ2, especially when contemplating the risk of not following a rule in contrast to the risk of introducing new faults when complying with it.

3.2 Benefits of coding standards

The tech world always strives to be at the forefront of new and revolutionizing ideas. Many of the new technologies and products spawned from these ideas are connected to the internet and tend to handle sensitive information, and as such the need for good software security is greater than ever before. This is commonly known in the software industry and has also been brought up by Srđan Popić et al. [35]. In their study they researched whether different coding standards, including secure ones, can be used to improve software quality, maintenance of the code, stability and safety. The projects that the study analyzed were two Python projects where the PEP8 [43] style guide was followed. Using the two dimensions "new or corrected lines during the implementation step" and "the number of fails detected during the coding standard check", the study came to the conclusion that the number of errors dropped over time as the developers acclimated to the PEP8 standard.

As mentioned in the previous paragraph, program quality is of utmost importance when developing software. One way to help increase the quality of software is to follow a coding standard, as mentioned by Xuefen Fang [23]. In Fang's study he investigated whether the


[33] with some additions made by Fang. Fang introduces four different projects, where three of them have their developers comply with a standard and one does not follow any particular coding standard. To address whether the software quality is increased, Fang looks at the lines of code (LOC) and the comment rate in each file. The result of the study shows that the LOC did not change drastically regardless of whether the project followed a coding standard or not. However, the projects that followed the coding standard had a higher rate of comments. Fang concludes that a coding standard is an efficient way to boost the quality of the code and specifically its maintainability. Fang only takes the quality of a project into consideration, while this thesis focuses on the security aspect; however, it is important to understand the relation between high quality and the security of the software, as Richard Bellairs explains in [9].

3.3 Evaluation of static analysis tools

As explained in Section 2.7, static analysis tools will inevitably report false positives as well as miss important vulnerabilities, which can lead to a high number of false negatives. Thu-Trang Nguyen et al. [32] tried to "Enable Precise Check for SEI CERT C Coding Standard" as well as automate the verification process of true and false positives. In their study they carried out an experiment where they ran the static analysis tool Rosecheckers on two different large projects to get SEI CERT C warnings. The method they used was to first run Rosecheckers to find code in the projects that did not comply with SEI CERT C. They then verified these warnings by running deductive verification, model checking and pattern matching, which combined gave a more accurate result. Their method showed that 60% of the Rosecheckers warnings could be verified and that 87% of the verified warnings in the first project and 57% in the second project were true positives, while 13% and 43% were false positives, respectively. In comparison to this study, Nguyen et al. only checked for compliance with four different SEI CERT C areas (Declarations and Initialization (DCL), Expressions (EXP), Integers (INT) and Arrays (ARR)), both recommendations and rules, while this study considers all areas, but only rules and no recommendations. This study also does not focus on the number of false positives as Nguyen et al. do, but instead on the number of false negatives.

In another study, Andrei Arusoaie et al. [4] compared twelve different static analysis tools: Frama-C, Clang (alpha), Clang (core), Oclint, "System", Cppcheck, Splint, Facebook Infer, Uno, Flawfinder, Sparse and Flint++. In their study they ran the different tools on the Toyota ITC test suite, which contained 639 test cases. To compare the tools they checked how many of the violations each tool found and the run time of each tool. In the study they also reported the number of false positives that were found each time they ran the tools. To find out whether a tool found a violation, they checked if the tool reported an error on the exact line where the violation was said to be located. They manually confirmed this approach, and they found some bugs in the test suite and a few imprecisions in the tools. Arusoaie et al. showed in the study that the different tools varied in found violations from 1.1% up to 44.13% and that the run time varied from 0.27s up to 50.80s, where Clang core and Clang alpha found 15.34% and 28.17% with run times of 6.42s and 13.29s, respectively. This means false negative rates of 84.66% and 71.83% for Clang core and Clang alpha. In regards to false positives, they found that Clang core had a false positive rate of 0.63% and Clang alpha a rate of 10.33%. Finally, Arusoaie et al. summarize that the Clang Static Analyzer (core and alpha combined) offered a good trade-off compared to the other tools in regards to run time vs. violations found. In relation to this thesis the results for Clang are especially interesting, as CodeChecker uses the Clang Static Analyzer for the C and C++ analysis.

In an article from 2020 by Lisa Nguyen Quang Do et al. [21] they researched static analysis tools from a user perspective, taking into account the reasoning for why and how developers use the tools. They state that these are important points to consider when presenting requirements for new static analysis tools and improvements to current tools. In the study they came to the conclusion that time was a very important factor for static analysis tools and that it was a deciding factor for how the tools were used. Since the time factor is very important during tool analysis, it is something that will be considered in this study as well.

Jiang Zheng et al. [54] discussed some interesting points in regards to the static analysis tools FlexeLint and Klocwork. In their study from 2006 they addressed the economic viability of static analysis tools, the effectiveness of a tool and what types of vulnerabilities were most commonly reported by the tools. They approached these questions by analyzing three large projects where they had access to a manifest with reported violations and issues. Zheng et al. ran the different tools on the projects and compared the results with the violations in the manifest. The result showed that about 30% of the reported violations in the manifest were found by the tools and that, compared to manual inspection, the number of findings did not deviate significantly. In regards to the economic viability they found that using static analysis tools in the early phases of development was more advantageous compared to having to fix the violations later on. This result shows that RQ1 is of interest for further research, since companies always strive to decrease the amount of money spent on fixing bugs. The study also showed that the most common vulnerability was "possible use of NULL pointer", which accounted for 45.92% of all faults in the analyzed projects. Zheng et al.’s result will be interesting to compare to the results gathered in this study, to see whether or not "null pointer dereference" is still the most common vulnerability 15 years later.

A more recent study was conducted by Jose D’Abruzzo Pereira and Marco Vieira in 2020 [34], where they, in a similar way, evaluated two static analysis tools, Flawfinder and CppCheck, on the large open source vulnerability data set made by Mozilla. The data set was made up of vulnerabilities from five projects: Mozilla, httpd, glibc, Linux kernel, and Xen Hypervisor. The study came to the conclusion that neither of the tools performed particularly well, but that CppCheck was the much better choice, finding more true negatives, fewer false positives, fewer false negatives, and more true positives at 92.8%, 7.2%, 16.5% and 83.5%, in comparison to Flawfinder which achieved 6.8%, 93.2%, 60.8%, and 36.2%, respectively. Jose D’Abruzzo Pereira and Marco Vieira’s results will be interesting to compare with the results of the different static analysis tools and the different data set used in this thesis.

In the research paper "Evaluating Static Analysis Defect Warnings On Production Software" written by Nathaniel Ayewah et al. [5] they conducted a study where they evaluated the performance of the static analysis tool FindBugs, which is a tool that finds bugs in Java programs. In this paper they explained that a possible reason why static analysis tools in general report more trivial vulnerabilities than critical vulnerabilities is that the tools cannot check if the code is implemented in the way it was meant to be, since the tools do not know the purpose of the code. They continued by explaining that this could be a consequence of the tools' analysis techniques, which are often based on finding rare and unsafe code patterns. This conclusion might be of interest when analyzing the results of the different static analysis tools used in this thesis and their performance on the different languages analyzed.

3.4

Collection of vulnerabilities

James Walden et al. [53] wanted to compare different vulnerability prediction models. To achieve this they created a high quality public data set containing many different kinds of vulnerabilities. The collection of said data set was based on the following requirements:

• The source code must be available
• Must be written in PHP


Even though the goal of Walden et al.’s study was different from the one in this thesis, their method of collecting applications that contain vulnerabilities can be applied in this thesis as well. This can be done by changing the programming language requirement from PHP to C and C++. Unlike Walden et al.’s study, which looked at three different projects with many vulnerabilities, this thesis looks at hundreds of different projects in the CVE database and then selects the latest ones for the C analysis. For the C++ analysis, the CVSS is used to rank the collected vulnerabilities and select the most severe ones.

To summarize, these related works show that there is a certain ambiguity about secure coding, and specifically coding standards, since some of the rules may not be worth implementing due to the risk of introducing new faults. To the best of the authors’ knowledge, there have not been many empirical studies done on the effectiveness of coding standards with regard to security, and few on the SEI CERT standard. This thesis takes a different approach to the selection of vulnerabilities, the analysis, and the testing of the performance of the SEI CERT standards. The results give an overview of how the SEI CERT standards perform over a larger number of real world projects, which combined with the results of other studies will give a more comprehensive picture of the real world effectiveness of the SEI CERT Secure Coding Standards.


4

Method

4.1

Approach

This chapter describes the chosen methods that were used to answer the research questions. The method that was used is structured as follows:

1. Gathering CVE C vulnerabilities
2. Gathering CVE C++ vulnerabilities
3. Analyzing CVE vulnerabilities
4. Gathering Rule Specific CVE vulnerabilities
5. Analyzing Rule Specific CVE vulnerabilities

Over the course of the project, data on which tools helped to comply with the SEI CERT secure coding standards was gathered. This data was then analyzed in order to answer RQ3 in the following way (a small illustrative sketch follows the list):

1. Examine how many of the analyzed CVE SEI CERT violations were found by the tools

2. Study the run time of each tool in relation to the project size (in MB) and the number of files when analyzing
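
A rough sketch of how these two measurements could be combined is given below; the tool names are the ones used in this thesis, but the counts, run times, project sizes and file counts are hypothetical placeholders, not results from the study.

# Hypothetical sketch of the two RQ3 measurements: the share of manually
# confirmed SEI CERT violations that each tool also reported, and the run
# time of the tool relative to project size and file count.

results = [
    # (tool, violations_found, violations_total, run_time_s, size_mb, files)
    ("Rosecheckers", 3, 10, 120.5, 12.4, 310),
    ("PVS-Studio", 5, 10, 340.2, 12.4, 310),
    ("CodeChecker", 4, 10, 290.0, 12.4, 310),
]

for tool, found, total, run_time, size_mb, files in results:
    coverage = 100 * found / total
    print(f"{tool}: {coverage:.0f}% of violations found, "
          f"{run_time:.1f}s for {size_mb} MB / {files} files "
          f"({run_time / size_mb:.1f} s per MB)")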

4.2

Gathering of vulnerabilities in CVE

The gathering process was split into two parts, one for the C programming language and one for the C++ programming language.

C


The C related CVE vulnerabilities were selected by searching for ".c" in the description of each vulnerability. Filtering was also done to extract the most recently updated vulnerabilities and only those that had a link to a public GitHub repository where the source code could be analyzed.

import csv

with open("allitems.csv", encoding="utf-8") as csvfile, open("cve.txt", "w") as out:
    csv_reader = csv.DictReader(csvfile, delimiter=",")
    for row in csv_reader:
        try:
            if ".c " in row["Description"] and "github" in row["References"]:
                out.write(row["Name"] + "\n")
        except UnicodeDecodeError:
            print(row)

Listing 4.1: Python script for extracting C vulnerabilities.

4.2.1

C++

The gathering of C++ related CVE vulnerabilities was done in a similar way as the one described for the C language. The script used can be seen in Appendix B. This time the focus was on the vulnerabilities with the highest severity instead of the most recently updated CVE vulnerabilities. This was done by filtering the CVEs (from 2017-2020) based on the vulnerabilities' CVSS Base Score, which is a score that ranges from 0 to 10, with 10 being the highest severity. However, this method did not only collect vulnerabilities with a severity score of 10; instead, it resulted in most of them being over 7.0.
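
The script in Appendix B is not reproduced here, but the idea can be sketched as follows; the input file name and column layout (a year and a CVSS base score per CVE) are assumptions made for illustration and do not necessarily match the appendix.

import csv

# Hypothetical sketch of the CVSS-based selection of C++ vulnerabilities:
# keep CVE entries from 2017-2020 and sort them by CVSS base score so that
# the most severe ones can be analyzed first. File and column names are
# assumed for illustration.

with open("cpp_cves.csv", encoding="utf-8") as csvfile:
    rows = [row for row in csv.DictReader(csvfile)
            if row["CVSS"] and 2017 <= int(row["Year"]) <= 2020]

rows.sort(key=lambda row: float(row["CVSS"]), reverse=True)

with open("cpp_cves_ranked.txt", "w") as out:
    for row in rows:
        out.write(f'{row["Name"]} {row["CVSS"]}\n')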

4.3

Analyzing vulnerabilities in CVE

When a subset of vulnerabilities had been gathered, the analysis began. Some of the vulnerabilities did not have enough information to tell whether compliance to the SEI CERT standard would have made a noticeable difference to the end result, i.e. preventing the vulnerability. There were also vulnerabilities in projects that could not be compiled or built for some of the static analysis tools for various reasons. Both of these categories were skipped and were not included in the final analysis result. When a manual review of a vulnerability had been completed, an analysis was done with the different static analysis tools. If the outcome of the manual analysis was that there was no rule violation for the CVE, the tools were not run, to save time.


4.3.1

Manually

The first step of the analysis process was to manually examine the code related to the vulnerability as well as the description of the CVE. If the vulnerability had an issue page on GitHub, where users had analyzed and described the problem, this was also taken into consideration. When an understanding of the reported issue was achieved, the different SEI CERT rules were scanned to find the best fit, if there was one covering the vulnerability. The description of the CVEs often included the type of vulnerability, which could be used to narrow down the search area of the SEI CERT rules. For example, in Figure 4.1 the description of the CVE states that the vulnerability relates to some limitation of characters in a format argument. From this description, the conclusion can be drawn that the area can be narrowed down to "Characters and Strings (STR)" or "Input Output (FIO)", since it was a buffer overflow caused by an unsafe format argument to the fscanf function, which falls under the two previously mentioned areas. When the area had been decided, each individual rule was taken into consideration to decide whether at least one rule would have prevented the vulnerability. When this process was finished it should be clear whether compliance to a specific SEI CERT rule would have helped or not.

4.3.2

Static analysis tools

To run the different static analysis tools, the Ubuntu 20.04 LTS operating system was used. The tools that were used did not cover all of the SEI CERT rules; this had to be looked up on the official tool websites [29, 49] to make sure a tool was not executed unnecessarily on a project. For each project, the project size (in MB), git commit, CVSS score, SEI CERT rule and risk level were recorded. For C++ projects the run time (make time + analysis time) of each tool was also recorded.
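
As a simple illustration, such a coverage check could be implemented as the lookup sketched below; the rule sets assigned to each tool are placeholders, and the actual coverage must be taken from the official documentation [29, 49], not from this sketch.

# Hypothetical sketch of deciding whether a tool is worth running on a
# project: only run it if it claims to cover the SEI CERT rule that the
# manual analysis identified. The coverage sets below are placeholders.

RULE_COVERAGE = {
    "PVS-Studio": {"STR31-C", "EXP34-C", "INT32-C"},
    "CodeChecker": {"EXP34-C", "ARR30-C"},
    "Rosecheckers": {"STR31-C", "ARR30-C"},
}

def tools_to_run(rule):
    """Return the tools that document coverage of the given SEI CERT rule."""
    return [tool for tool, rules in RULE_COVERAGE.items() if rule in rules]

print(tools_to_run("STR31-C"))  # ['PVS-Studio', 'Rosecheckers'] with these placeholders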

Hardware

The hardware that was used to run the static analysis tools was two computers with different configurations and specifications. One of them was a Lenovo T5 26AMR5 with an AMD Ryzen 5 3600 6-core processor and 16GB of DDR4 RAM. The virtual machine configuration for this machine was 2 processor cores and 4GB of RAM. The other was a custom-built computer with an Intel Core i7 4770k and 16GB of DDR3 RAM, where the virtual machine was configured to use 3 processor cores and 4GB of RAM.

PVS-Studio

One of the static analysis tools that were used was PVS-Studio [31], version 7.11. This was done by first building the project according to its build instructions and tracing the make process with the first command shown in Listing 4.2. When the make had been successfully completed and traced, PVS-Studio analyzed the make result using the command on the second line of Listing 4.2. The third line of Listing 4.2 was used to create a readable HTML report in which PVS-Studio listed all violations found. To be able to run PVS-Studio using the student license, a specific set of lines also had to be added to the start of each file that was analyzed [30]. Since many projects were analyzed, a Python script was made to speed up the analysis, shown in Listing 4.3.

sudo pvs-studio-analyzer trace -- make
pvs-studio-analyzer analyze -o pvs.log
plog-converter -a GA:1,2 -t fullhtml pvs.log -o ./reports
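
Listing 4.3 is not reproduced in this section, but a minimal sketch of that kind of automation, prepending the required comment lines to every C/C++ source file before analysis, is shown below; the exact comment text and the file extensions are assumptions and should be taken from the PVS-Studio documentation [30].

import pathlib

# Hypothetical sketch of prepending the PVS-Studio free-license comment to
# every C/C++ source file in a project before running the analysis. The
# comment text and file extensions are assumptions for illustration.

LICENSE_LINES = (
    "// This is an open source non-commercial project. Dear PVS-Studio, please check it.\n"
    "// PVS-Studio Static Code Analysis for C, C++, C#, and Java: https://pvs-studio.com\n"
)

def add_license_comments(project_root):
    for path in pathlib.Path(project_root).rglob("*"):
        if path.suffix in {".c", ".cpp", ".cc", ".h", ".hpp"}:
            source = path.read_text(encoding="utf-8", errors="ignore")
            if "Dear PVS-Studio" not in source:  # avoid adding the lines twice
                path.write_text(LICENSE_LINES + source, encoding="utf-8")

add_license_comments(".")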
