Implementation and Evaluation of an Emulated Permission System for VS Code Extensions using Abstract Syntax Trees


Linköpings universitet

Linköping University | Department of Computer and Information Science

Master’s thesis, 30 ECTS | Computer Science and Engineering

2021 | LIU-IDA/LITH-EX-A--21/054--SE


Implementation och Utvärdering av ett Emulerat Behörighetssystem för Extensions i VS Code med hjälp av Abstrakta Syntaxträd

David Åström

Supervisor: Rouhollah Mahfouzi

Examiner: Ahmed Rezine



Copyright

The publishers will keep this document online on the Internet ‐ or its possible replacement ‐ for a period of 25 years starting from the date of publication barring exceptional circumstances.

The online availability of the document implies permanent permission for anyone to read, to download, or to print out single copies for his/her own use and to use it unchanged for non-commercial research and educational purposes. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional upon the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.

For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its www home page: http://www.ep.liu.se/.


Abstract

Permission systems are a common security feature in browser extensions and mobile applications to limit their access to resources outside their own process. IDEs such as Visual Studio Code, however, have no such features implemented, and therefore leave extensions with full user permissions. This thesis explores how VS Code extensions access external resources and presents a proof-of-concept tool that emulates a permission system for extensions. This is done through static analysis of extension source code using abstract syntax trees, scanning for usage of Extension API methods and Node.js dependencies. The tool is evaluated and used on 56 popular VS Code extensions to determine which resources are most prevalently accessed and how. The study concludes that most extensions use only a few APIs, but often rely on Node.js libraries rather than the API for external functionality. This leads to the conclusion that the inclusion of Node.js dependencies and npm packages is the largest hurdle to implementing a permission system for VS Code.


Acknowledgments

First of all, I would like to thank Cybercom Stockholm for the opportunity to conduct my thesis at their Innovation Zone and their constant support throughout the project. I would especially like to thank my supervisor Magnus Kraft as well as the other thesis students at the company for all the help and moral support during the distance work situation. I also want to thank my university supervisor Rouhollah Mahfouzi and examiner Ahmed Rezine for their feedback and aid with the project.


2.4 TypeScript
2.5 Static Analysis
2.6 Related work
3 Method
3.1 Pre-study
3.2 Implementation
3.3 Evaluation
4 Results
4.1 Pre-study
4.2 Implementation
4.3 Evaluation
5 Discussion
5.1 Results
5.2 Method
5.3 Source criticism
5.4 The work in a wider context
6 Conclusion
6.1 Future work
Bibliography
A Tested Extensions
B Non-namespace permission messages
C Testing results - Permissions

List of Figures

3.1 Overview of the analyser architecture
3.2 AST generated from the line let variable = object.function(prop1, prop2)
3.3 Visualisation of the traversal algorithm
3.4 Visualisation of the visitor pattern

List of Tables

2.1 VS Code API namespaces, with description if applicable, as cited from the official documentation
3.1 Permission messages related to each namespace
4.1 Permission messages detected in Bracket Pair Colorizer 2
4.2 Permission messages detected in Path Intellisense
4.3 Permission messages detected in Live Server
4.4 Precision and Recall Metrics
4.5 Commonly occurring and interesting permissions found during testing

Listings

3.1 Tree traversal algorithm
3.2 Default import
3.3 Namespace import
3.4 Named import
3.5 Combined type import
3.6 Require calls
3.7 Variable declaration and property assignment
3.8 Variable assignment binary expression
3.9 Function declaration
3.10 Method call expression
3.11 Example of the ImportReference JSON structure
3.12 Example of the usedProps JSON structure
3.13 Example of the output data JSON structure
3.14 Bash command to extract installed extensions
4.1 Bash command to run the analysis tool
4.2 Code line triggering Interact with the editor environment
4.3 Code line triggering Access all visible extensions


1 Introduction

1.1 Motivation

Many popular Integrated Development Environments (IDEs) and text editors use extensions or plugins as a way for the user to extend functionality and tune the system to their liking. This offers customisation to the user but could also pose risks. Extensions are external software, often open-source, which are installed onto a system and given rights to control certain parts of the application and/or system. It is therefore very possible that they could introduce new security vulnerabilities.

Most web browsers implement similar systems for extensions, and research has been done on the security of browser extensions. Results have shown that extensions can introduce vulnerabilities, or even function as malware. However, little to no formal research seems to have been done on the security of IDE extensions, and it is rarely discussed in internet communities. Despite this, many developers tend to install such plugins without much consideration. In today’s work environment, which is becoming more and more remote, the dependency on these digital tools and extensions becomes even more important. At the same time, the possibility for companies to control and monitor them becomes even lower, which makes this issue very interesting to explore.

In this thesis, the security of extensions in the popular open-source IDE Visual Studio Code (VS Code) will be explored and evaluated. The focus will be on the resources and services the extensions use, and what security implications that entails. In contrast to most browsers, VS Code does not provide a system for the user to view or control permissions for extensions to address this. In a solution similar to those implemented by browsers, the user would be able to choose what permissions the extension is granted, and thereby which resources and services it can access.

1.2 Aim

As a first step towards this functionality, this thesis aims to investigate what resources a VS Code extension may have permission to access, and how access to these is implemented today. Based on this investigation, the possibility of detecting instances of these accesses automatically, by statically analysing an extension, will be explored. This aims to result in a proof-of-concept tool, which will then be validated by using it to evaluate a number of popular VS Code extensions. By doing this evaluation, the aim is to gain a better understanding of what resources and services are more commonly accessed in the current landscape of extensions, and what security implications that may have.

1.3 Research questions

Through these studies, this thesis aims to answer the following research questions:

RQ1. What external resources can be accessed in a VS Code extension and how is this implemented?

RQ2. Is it possible to detect occurrences of accessing external resources through static analysis of the extension source code?

RQ3. What resources and services are more prevalently accessed in practice in popular extensions?

1.4 Delimitations

The study will focus only on VS Code extensions and will not touch upon extensions for other IDEs unless they are relevant as related work.

It will also focus on providing a proof-of-concept rather than a complete solution. Because of this, the study will not necessarily cover all possible resources that an extension may be able to access, nor all supported language constructs in TypeScript, but rather focus on a limited set of common resources that are deemed relevant to examine.

Lastly, the thesis only focuses on VS Code extensions written in TypeScript, as these are the most common. Therefore, it will not attempt to analyse extensions written directly in JavaScript, extensions that do not utilise scripting, or extension components written in other languages.


2 Theory

This chapter establishes the theoretical foundation on which the study is built. First, it covers background which introduces the themes relevant for the study. Following this, the chapter covers related work which has been published in scientific conferences and journals.

2.1 Software Extensions

Software extensions are a broad concept. Also known as plugins or add-ons, they are a type of smaller program that builds upon another host program to expand the functionality of the latter [12]. Extension systems are common in many types of software as a way to provide modularity to the user experience. Instead of the program developers having to design all possible features a user may want, third party developers, or the users themselves, can extend the program with the features they need [48].

Besides the benefits from a feature standpoint, designing a program for extensibility can have other positive implications. Wagner et al. [56] mention that designing for extensibility requires high modularity in the host program, something that is generally seen as an important aspect when designing software systems [18]. Designing for this modularity can often help reduce the complexity of the program as each module is responsible for a specific task. This modularity, and the possibility to even design intended main features of a program as extensions, can allow the host program to include only the central features. Instead, most features can be made optional to the user, something that greatly improves the possibility to customise the entire program to the individual user. This also makes deployment of new features a simpler task, as these can simply be plugged in as an extension [56]. By providing the extension functionality as an Application Programming Interface (API), the extensibility can be provided without the programmer even needing comprehensive knowledge of the host program.

Wagner et al. [56], however, also mention a drawback of providing extensibility: the extension system needs to be robust enough to handle the complexity of several extensions, which may or may not conflict with each other.


Browser extensions

A common domain of extensible programs is web browsers. Most popular web browsers support extensions in some way: Google provides a platform for extensions for the Chrome Browser through its Chrome Web Store [14], Apple Safari has extension support through the App Store [45], and Mozilla Firefox and Microsoft Edge use the term add-ons and provide them through their respective stores [2, 39].

2.2 Permission systems

Permission systems are a common security feature in many client-based software domains. They are a type of access control that aims to restrict an application’s access to sensitive resources [44]. While there is no standard method for implementing them, they are generally based on the principle of least privilege [47]. Permission systems generally run applications in isolated environments with minimised privileges, where the application cannot directly access user-owned resources [44]. These could include sensitive data, hardware and sensors, the file system etc., all of which could contain some kind of private information. To provide access to required resources, the host exposes some kind of API for the application to use. However, in order to use it, the application needs to request the APIs it requires access to, and the user must manually approve the request before access is granted.

In the Android operating system [5], applications are isolated into individual processes called sandboxes [6]. Applications cannot interact with restricted data or actions by default, which includes resources that may contain sensitive information. Android divides permissions into Install-time permissions and Runtime permissions. Install-time permissions are presented before installation of the application and are granted automatically upon installation. These represent data and actions outside of the sandbox which are generally deemed to be of less risk to the user’s privacy or other applications [43]. Runtime permissions represent more sensitive resources. These need to be actively requested before accessing the resource and trigger a prompt which the user must approve before the permission is granted [43]. Both permission types require the permission to be declared in the application’s manifest file [43].
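The distinction between install-time and runtime permissions can be illustrated with a minimal sketch. The class PermissionGate, its methods, and the permission names used with it are hypothetical and only model the behaviour described above; they are not part of the Android API.

```typescript
// Hypothetical model of a manifest-based permission system (not an Android API).
type PermissionKind = "install-time" | "runtime";

class PermissionGate {
  private readonly declared: Map<string, PermissionKind>;
  private readonly askUser: (permission: string) => boolean;
  private readonly granted = new Set<string>();

  constructor(
    declared: Map<string, PermissionKind>,
    askUser: (permission: string) => boolean
  ) {
    this.declared = declared;
    this.askUser = askUser;
    // Install-time permissions are granted automatically upon installation.
    declared.forEach((kind, name) => {
      if (kind === "install-time") this.granted.add(name);
    });
  }

  // Returns true if the application may use the resource behind `permission`.
  request(permission: string): boolean {
    const kind = this.declared.get(permission);
    if (kind === undefined) return false; // not declared in the manifest
    if (this.granted.has(permission)) return true;
    // Runtime permissions trigger a prompt that the user must approve.
    if (this.askUser(permission)) {
      this.granted.add(permission);
      return true;
    }
    return false;
  }
}
```

In this sketch an undeclared permission is always refused, which mirrors the requirement that both permission types are declared in the manifest.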

Browser extension permission systems

All major browsers have a permission system implemented for their extensions [42]. While these have been developed separately for many years, a standard has emerged, called the WebExtensions API [31]. Extensions have very limited functionality by default, and in order to access more powerful functionality and resources, the browsers expose a set of JavaScript APIs. Each API corresponds to a permission which the extension must request access to in an extension manifest file called manifest.json, which is also standardised. As with Android permissions, extension permissions can be declared as required or optional. The required permissions are granted at install-time, while optional permissions must be granted by the user during runtime.

In addition to permissions for APIs, extensions can also declare host permissions, which define which web hosts the extension may interact with. These declarations are also made in the manifest.json and can be made required or optional. Declaration is done through pattern matching, so extensions can request access either to specific hosts, or broader patterns covering some general domains, or even any host [19].
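As an illustration of how pattern-based host permissions can be evaluated, the following is a simplified sketch. The function matchesHostPattern is hypothetical and, as an assumption, only handles http and https URLs and patterns of the form scheme://host/path with * wildcards, plus the special value <all_urls>; actual browser implementations support more cases.

```typescript
// Simplified, hypothetical matcher for WebExtensions-style host patterns.
// Assumption: only http/https URLs are considered.
function matchesHostPattern(pattern: string, url: string): boolean {
  const target = new URL(url);
  const scheme = target.protocol.replace(":", "");
  if (scheme !== "http" && scheme !== "https") return false;
  if (pattern === "<all_urls>") return true;

  // Split the pattern into scheme, host and path parts.
  const parts = /^(\*|https?):\/\/(\*|(?:\*\.)?[^\/*]+)(\/.*)$/.exec(pattern);
  if (parts === null) return false; // malformed or unsupported pattern
  const [, patScheme, patHost, patPath] = parts;

  if (patScheme !== "*" && patScheme !== scheme) return false;

  const host = target.hostname;
  if (patHost !== "*") {
    if (patHost.startsWith("*.")) {
      // "*.example.com" covers example.com and all of its subdomains.
      const base = patHost.slice(2);
      if (host !== base && !host.endsWith("." + base)) return false;
    } else if (host !== patHost) {
      return false;
    }
  }

  // Escape regex metacharacters in the path, then let "*" match any sequence.
  const escaped = patPath.replace(/[.+?^${}()|[\]\\]/g, "\\$&").replace(/\*/g, ".*");
  return new RegExp("^" + escaped + "$").test(target.pathname + target.search);
}
```

With this sketch, the pattern *://*.example.com/* covers https://api.example.com/v1 but not https://badexample.com/, which shows how a broad pattern can still be limited to one domain.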

Permissions are divided based on how powerful they are. If a permission is less powerful, and therefore of lower risk, no action is required by the user when it is used, and the extension is stated to require no special permissions. Using more powerful, higher-risk permissions, however, results in a warning message being presented to the user. Each message is mapped to one or more permissions in a table and describes in a short sentence what the implications of allowing the permissions are [20].

While the overall structure and APIs are more or less standardised between browsers, the actual implementation is up to the individual browsers, which has led to a few differences in compatibility and how the APIs are implemented [13]. In Chrome, Opera, and other Chromium-based browsers, the extension API is implemented under the chrome namespace, using callbacks rather than promises. In Firefox and Edge, the API is instead implemented under the browser namespace; in Firefox the APIs are implemented using promises, while Edge uses callbacks. Compatibility also varies between browsers: some APIs are not supported at all by some browsers, while some are only partly supported [9].
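The difference between the callback and promise styles can be sketched as follows. The function queryTabsWithCallback is a hypothetical stand-in that merely mimics the shape of a callback-based extension call; it is not a browser API, and the returned data is made up for illustration.

```typescript
type Tab = { id: number; url: string };

// Hypothetical stand-in for a callback-style extension API call.
function queryTabsWithCallback(callback: (tabs: Tab[]) => void): void {
  setTimeout(() => callback([{ id: 1, url: "https://example.com/" }]), 0);
}

// Wrapping the callback style yields the promise style.
function queryTabs(): Promise<Tab[]> {
  return new Promise((resolve) => queryTabsWithCallback(resolve));
}

// Callback style: the result is handled inside the callback.
queryTabsWithCallback((tabs) => console.log("callback:", tabs.length));

// Promise style: the result can be chained or awaited.
queryTabs().then((tabs) => console.log("promise:", tabs.length));
```

A wrapper of this kind is one way for an extension to target both styles with a single code path.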

2.3 Visual Studio Code

Visual Studio Code, often abbreviated as VS Code, is a free code editor developed by Microsoft [54]. By default, it provides a relatively small core with support for JavaScript, TypeScript and Node.js [41], as well as features such as a built-in Git client [22]. This core is also available as open source, mainly implemented in TypeScript, which allows companies and teams to implement their own version of the editor, or incorporate the editor into other software [40].

VS Code extensions

While the default functionality in VS Code is relatively small, one of its main features is its extensibility: further features can be added through extensions [35] to customise the editor for the individual user’s needs and preferences. VS Code uses an Extension host, a separate Node.js [41] process which hosts and manages extensions [23]. Through this, the functionality of extensions is separated from the main VS Code process, which limits extensions’ ability to interact with the program in unintended ways, for both stability and security reasons [26].

Similarly to browser extensions, the Extension host exposes a set of API endpoints which the extension can use to access and interact with VS Code itself, its environment, workspaces, editors etc. [55]. The VS Code API is divided into 11 namespaces with different functionality. These contain the main resources and methods related to the host program. A list of these together with a brief description of each namespace can be seen in table 2.1.

Each VS Code extension consists of a few basic components. At its core, an extension is also a separate Node.js package and process. Therefore, it must contain a package.json [51] file in its root folder. This is a manifest file that defines various properties of a Node.js package but is also used as an extension manifest by VS Code. While it may contain several different properties, the VS Code extension documentation [23] lists a few base properties that are the most important for extensions. The name and publisher fields are used as a unique ID for VS Code to identify the extension. The main field declares the main entry file of the extension. This is the file that contains the activate and deactivate functions that the Extension host invokes when an activation event related to the extension is fired. Activation events are the events that cause the extension to activate, such as certain commands or opening documents of a certain file type [1]. These are declared in the activationEvents field. In addition to these, the extension also declares how it contributes to VS Code in the contributes field. These contribution points could for example be commands, themes, or languages [16]. Lastly, the extension manifest must also declare the lowest supported VS Code version through engines.vscode. This property also allows the extension to use the VS Code API in its source files.
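The manifest fields described above can be summarised in a small sketch. The interface ExtensionManifest and the helper extensionId are hypothetical names introduced for illustration, and all field values are made-up examples; only the field names and the publisher.name identifier format follow the description above.

```typescript
// Hypothetical typing of the base manifest properties discussed above.
interface ExtensionManifest {
  name: string;
  publisher: string;
  main: string;
  activationEvents: string[];
  contributes: { commands?: { command: string; title: string }[] };
  engines: { vscode: string };
}

// Example values are invented for illustration only.
const manifest: ExtensionManifest = {
  name: "sample-extension",
  publisher: "sample-publisher",
  main: "./out/extension.js",
  activationEvents: ["onCommand:sample.helloWorld", "onLanguage:python"],
  contributes: {
    commands: [{ command: "sample.helloWorld", title: "Hello World" }],
  },
  engines: { vscode: "^1.55.0" },
};

// The publisher and name fields together form the unique extension ID.
function extensionId(m: ExtensionManifest): string {
  return `${m.publisher}.${m.name}`;
}

console.log(extensionId(manifest)); // sample-publisher.sample-extension
```

Note that nothing in such a manifest states which resources the extension accesses, which is the gap a permission system would fill.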


Table 2.1: VS Code API namespaces, with description if applicable, as cited from the official documentation.

Namespace       Description
authentication  Namespace for authentication.
commands        Namespace for dealing with commands. In short, a command is a function with a unique identifier. The function is sometimes also called command handler.
comments        -
debug           Namespace for debug functionality.
env             Namespace describing the environment the editor runs in.
extensions      Namespace for dealing with installed extensions. Extensions are represented by an extension-interface which enables reflection on them.
languages       Namespace for participating in language-specific editor features, like IntelliSense, code actions, diagnostics etc.
scm             -
tasks           Namespace for tasks functionality.
window          Namespace for dealing with the current window of the editor. That is visible and active editors, as well as, UI elements to show messages, selections, and asking for user input.
workspace       Namespace for dealing with the current workspace. A workspace is the collection of one or more folders that are opened in a VS Code window (instance).

Extension language

As VS Code itself is built on a TypeScript code base [40], TypeScript is also the endorsed language for building extensions. Both official and bundled extensions are generally built on TypeScript, as are the extension examples in the official documentation [25]. However, since TypeScript is compiled to JavaScript before runtime, as described in section 2.4, it is also possible to develop extensions directly in JavaScript.

In addition, some extension functionality does not require scripting, and can simply be added with JSON files. Examples of such are adding colour themes [15] or syntax highlighting support for new languages [49].

2.4 TypeScript

TypeScript is a programming language that is built as a superset of JavaScript [53]. This means that it is a language that builds upon JavaScript, supporting the full JavaScript language, while adding features on top of it. The main feature offered by TypeScript is a static type system for JavaScript. This system allows developers to define object types, but also implements type inference, where the type of a variable is inferred from the value it is assigned. This type system is said to help developers increase the structure of the code, and help find errors earlier in the development phase [53].

TypeScript is implemented to be optional, in order to simplify adoption. It allows developers to convert parts of the codebase of a project to TypeScript at a time, only adding typing where necessary [53].

TypeScript is not traditionally compiled, but rather transformed into JavaScript before runtime. This simplifies combining TypeScript and JavaScript in a project, but also adds another quirk. Since JavaScript does not support typing, this is lost in transformation. Therefore, the typing and type checks only exist during development and analysis of the TypeScript code, but are not enforced during runtime [53].
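A short sketch illustrates both type inference and the fact that types are not enforced at runtime. The names Point, origin, count, untrusted and p are made up for this example.

```typescript
// An explicitly defined object type.
interface Point {
  x: number;
  y: number;
}

const origin: Point = { x: 0, y: 0 }; // explicitly annotated
let count = 42;                       // type inferred as number from the value
// count = "forty-two";               // would be rejected by the type checker
count += origin.x;

// Types are removed in the emitted JavaScript, so the assertion below is
// never checked at runtime: p.x is a string despite the declared type.
const untrusted: unknown = JSON.parse('{"x": "not a number"}');
const p = untrusted as Point;
console.log(typeof p.x); // string
```

For static analysis this means that type information is available when analysing the TypeScript source, but not when analysing the emitted JavaScript.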

2.5 Static Analysis

Static analysis is a general term for programmatically analysing software without executing the code. This is in contrast to dynamic analysis, which analyses programs during runtime. Static analysis normally analyses the non-compiled source code but may also analyse compiled bytecode [34].

Abstract Syntax Trees

Abstract syntax trees (ASTs), or syntax trees, are an abstract representation of a program’s syntax, commonly used in compiler design and derived during compilation [3]. An AST represents syntax in a hierarchical tree, where each node and leaf represents a syntactical construct. In addition to its use in compilers, it is also often a central part of static code analysis and code checking. As it represents the source code in the context of the language, without taking for example white-spaces, dots, and commas into account, it allows for structured traversal and analysis of the source code. It also allows for modification of the source code by inserting or deleting nodes in the tree. One example of such a use case is linters. These static analysis programs use the AST of source files to analyse the code [30]. They look for common coding errors and oversights that the programmer may miss and notify the user or directly correct them before they get compiled. These errors may be such things as unused variables or imports, unreachable code, or assignment of dereferenced pointers [32].
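To make the tree structure and its traversal concrete, the following minimal sketch builds a hand-written tree for the statement let variable = object.function(prop1, prop2) and walks it recursively. The AstNode type, the node kind names and the helper functions are simplified assumptions for illustration; the trees produced by a real parser contain more node types and detail.

```typescript
// Hypothetical, simplified AST node: a kind, optional source text, and children.
interface AstNode {
  kind: string;
  text?: string;
  children: AstNode[];
}

const leaf = (kind: string, text: string): AstNode => ({ kind, text, children: [] });

// Hand-built tree for: let variable = object.function(prop1, prop2)
const tree: AstNode = {
  kind: "VariableDeclaration",
  children: [
    leaf("Identifier", "variable"),
    {
      kind: "CallExpression",
      children: [
        {
          kind: "PropertyAccessExpression",
          children: [leaf("Identifier", "object"), leaf("Identifier", "function")],
        },
        leaf("Identifier", "prop1"),
        leaf("Identifier", "prop2"),
      ],
    },
  ],
};

// Depth-first, pre-order traversal that calls onNode for every node.
function visit(node: AstNode, onNode: (n: AstNode, depth: number) => void, depth = 0): void {
  onNode(node, depth);
  for (const child of node.children) visit(child, onNode, depth + 1);
}

// Example analysis: collect all identifier names in source order.
function collectIdentifiers(root: AstNode): string[] {
  const names: string[] = [];
  visit(root, (n) => {
    if (n.kind === "Identifier" && n.text !== undefined) names.push(n.text);
  });
  return names;
}

console.log(collectIdentifiers(tree)); // [ 'variable', 'object', 'function', 'prop1', 'prop2' ]
```

Whitespace, dots and commas do not appear in the tree, so an analysis only has to inspect node kinds and their relationships.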

2.6 Related work

This section introduces previous research done on the security of extensions, as well as the implementation and effectiveness of permission systems as a security feature. This related work is introduced to put the work of this thesis in a larger context.

Browser extension security

While the risks of IDE extensions have not been explored, the area of browser extensions has been thoroughly researched. Several studies have evaluated popular extensions in search of vulnerabilities.

For example, Carlini et al. [10] evaluated 100 Google Chrome extensions by analysing and modifying their network traffic during runtime, as well as through manual static taint analysis. They found that at least 40% of the analysed extensions contained one or more vulnerabilities. When evaluating these vulnerabilities against the security mechanisms in Chrome (isolated environments, privilege separation, and permissions), they found the mechanisms to be effective overall at preventing vulnerabilities, provided that developers utilise them properly. It was not uncommon for developers to actively circumvent them when implementing features.

Bauer et al. [8] present a set of stealthier attacks involving Chrome extensions. These range from tracking and stealing user data and behaviour, to privilege escalation by having extensions share data and state, thereby making data from an extension with limited permissions available to another with a different set of permissions. Some of these could be solved by websites implementing content security policies restricting the use of data from the website, as well as by enforcing policies for information flow in the browser. More fine-grained permissions could solve some of the problems as well. Finally, they propose methods for increasing user awareness when installing extensions and note the importance of providing the user with clear information about an extension. This allows users to make informed decisions and could help steer them to extensions requiring fewer permissions.

Wang et al. [57] used a modified browser based on Firefox to dynamically analyse 2,465 Firefox extensions for vulnerable behaviours. They categorise vulnerable behaviours in groups of high, medium, low, and none, based on their severity. The behaviours of highest severity include arbitrary file access, functionality to download, install, and launch processes, access to network functionality as well as injecting DOM objects. In their evaluation, they found occurrences of all these behaviours, and while they are not necessarily malicious, they may expose some vulnerabilities.

Effectiveness of permissions

The effectiveness of permissions as a security feature is another subject that has been well covered in literature.

Felt et al. [28] evaluated the effectiveness of install-time permissions in Chrome extensions and Android applications. They find that install-time permissions are an effective security feature, but one that is often compromised by shortcomings in its design. Most evaluated applications request at least some dangerous permission. Several of these were deemed to be over-privileged, i.e., they request permissions that they do not actually need to fulfil their functionality. This was due both to developer errors and to the available permissions not being granular enough, requiring the application to request a permission where only a small part of it is needed. They also discuss the issue of dangerous permissions being too common or too broad. This prevalence causes desensitisation in users, where the warning messages lose their perceived importance. Users simply accept warnings without paying closer attention to them.

Marouf et al. [36] studied the Chrome extension permission model and come to many of the same conclusions as Felt et al. They propose a solution using optional runtime permissions, which gives the user more fine-grained control of the extension’s permissions during usage, something that has since been implemented in both major browsers and Android applications [31, 43].

Over-privilege detection

The issue of over-privilege mentioned above has also been studied further. Here, the focus is on detecting over-privileged applications and extensions.

A common approach is the use of various machine-learning technologies. Khazaei et al. [33] propose a system called OPEXA, Over-Privileged EXtension Analyser, which uses natural language processing on extension descriptions to detect what permissions the described functionality requires. These results were compared to the actual declared permissions to detect over-privilege. Shezan et al. [46] used a similar methodology but increased the dataset by creating a model that maps permissions from different domains with similar functionality. By doing this, browser extensions, mobile applications, and internet of things services can be analysed with the same tool. On the other hand, Wu et al. [58] detected over-privileged Android applications through data mining. They divided applications based on their store category and compared applications’ declared permissions to a set of permissions commonly needed by applications in the same category to detect over-privilege.


The issue of over-privilege is similar to the problem of this thesis, as it concerns analysing what permissions the extension or application actually uses, or should reasonably need to use, compared to what it declares. The machine-learning-based approaches discussed above do, however, require a permission system to be implemented to compare against, which makes those methods infeasible for VS Code extensions at the moment.

Tang et al. [50] take a static analysis approach to the problem in Android applications. They decompile application binaries and check the resulting source code for instances of permission-related method usages. After detecting what the application actually uses, they also apply the semantic approach and compare the application description to the result of the analysis to decide what represents over-privilege. Dennis et al. [21] present the tool P-Lint, a linter that reverse engineers Android application code and detects use of permission methods, focusing especially on improper or vulnerable use patterns. In another study, Chester et al. [11] present M-Perm, which combines static and dynamic analysis to detect both under- and over-permission. They also use decompiled applications to statically compare the declared permissions to those being used in each file. In addition, a call graph of the application is used to deduce the reachability of each permission from each entry-point of the application. With this, they can detect which permissions are active at each application state.
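At its core, comparing declared permissions to detected usages is a set comparison. The following sketch is an assumption about how such a comparison could be expressed and is not taken from any of the cited tools; the function comparePermissions and its result fields are hypothetical names.

```typescript
// Hypothetical comparison of declared permissions against detected usages.
function comparePermissions(
  declared: Iterable<string>,
  used: Iterable<string>
): { overPrivileged: string[]; underPrivileged: string[] } {
  const declaredSet = new Set(declared);
  const usedSet = new Set(used);
  return {
    // Declared in the manifest but never used in the analysed code.
    overPrivileged: Array.from(declaredSet).filter((p) => !usedSet.has(p)),
    // Used in the analysed code but missing from the manifest.
    underPrivileged: Array.from(usedSet).filter((p) => !declaredSet.has(p)),
  };
}

const report = comparePermissions(
  ["android.permission.INTERNET", "android.permission.CAMERA"],
  ["android.permission.INTERNET", "android.permission.READ_CONTACTS"]
);
console.log(report.overPrivileged);  // [ 'android.permission.CAMERA' ]
console.log(report.underPrivileged); // [ 'android.permission.READ_CONTACTS' ]
```

The difficult part in practice is producing the set of used permissions, which is what the static and dynamic analyses described above address.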

These studies show that using static analysis to check the code for permission usages is a valid approach. In Android applications, the fact that applications are published compiled and need decompilation before analysis makes static analysis a more difficult task. Bartel et al. [7] mention a few of these difficulties, which make naive static analysis insufficient in many situations. One example is the difficulty of deciding which permission a permission use is connected to, due to string literals not necessarily being preserved during decompilation. In the case of VS Code extensions, however, this should not be an issue, as extensions generally have their source code fully available as open source.

Evaluation of static analysis tools

While security tools assess the quality of other software, the tools themselves need to be evaluated in order to verify their effectiveness.

Soundness is a commonly used property of static analysers. An analyser is sound if it is guaranteed to report every item it should report, i.e., if there exists a vulnerability or piece of code that the analyser should report, it will report that item [38]. While soundness is often a property to strive for, one issue is that it may result in a large number of false positives, which in the end results in the true positives losing their value.

In contrast to soundness there is the concept of completeness. If an analyser is complete, it is provable that all items reported are true positives [38]. As soundness often brings a lot of false positives, completeness instead often brings false negatives, that is, items that should be reported but are missed by the analyser.

Both soundness and completeness are, by definition, all or nothing [38]. Therefore, in order to prove soundness for example, a formal proof is often needed, which is often not practical in real contexts. Instead of aiming for proving soundness, aiming for a probably sound analysis is often more practical. This can be done through, for example, exhaustive testing or, as done by Andreasen et al. [4], through comparing the results of the static analysis to those observed in dynamic analysis. Evaluating static analysis tools often looks at detected and non-detected instances. These are classified into four categories:


• True Positives - Vulnerabilities detected as such.

• False Positives - Non-vulnerabilities detected as vulnerabilities.

• True Negatives - Non-vulnerabilities not detected by the tool.

• False Negatives - Vulnerabilities missed by the tool.

These categories can be compared in various ways to calculate various metrics related to the effectiveness in detecting vulnerabilities. In order to measure soundness and completeness, two metrics are commonly introduced - Recall and Precision.

Recall measures the level of soundness of an analyser. A high recall metric means a higher soundness. Recall is calculated by comparing the number of true positives to the total number of items the analyser should report [38]. The formula therefore is:

$$\text{Recall} = \frac{\text{TruePositives}}{\text{TruePositives} + \text{FalseNegatives}}$$

Precision, on the other hand, measures the proportion of reported items that are true positives rather than false positives, that is, the level of completeness of the analyser [38]. This is calculated as:

$$\text{Precision} = \frac{\text{TruePositives}}{\text{TruePositives} + \text{FalsePositives}}$$

Tang et al. [50] compare the result of their static and semantic analysis to a human reading the description of an app. Based on the text, the reader makes an assumption of what permissions would be needed for this functionality. Based on these results, they calculate precision and recall, and also introduce two additional metrics:

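$$\text{F-measure} = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$

$$\text{Accuracy} = \frac{\text{TruePositives} + \text{TrueNegatives}}{\text{TruePositives} + \text{FalsePositives} + \text{TrueNegatives} + \text{FalseNegatives}}$$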

F-measure aims to find the best compromise between precision and recall, while accuracy gives a general score on how well the analysis performs in regard to both true positives and true negatives. In their study, they especially focus on the F-measure in order to maximise soundness, while minimising the number of false positives.
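The four metrics above can be computed directly from the outcome counts. The following is a minimal sketch; the names ConfusionCounts, Metrics, ratio and computeMetrics, the example counts, and the choice to return 0 when a denominator is zero are all assumptions made for illustration and are not taken from the cited studies.

```typescript
// Counts of the four outcome categories for one analysed program.
interface ConfusionCounts {
  truePositives: number;
  falsePositives: number;
  trueNegatives: number;
  falseNegatives: number;
}

interface Metrics {
  precision: number;
  recall: number;
  fMeasure: number;
  accuracy: number;
}

// Assumption for this sketch: return 0 instead of NaN on a zero denominator.
function ratio(numerator: number, denominator: number): number {
  return denominator === 0 ? 0 : numerator / denominator;
}

function computeMetrics(c: ConfusionCounts): Metrics {
  const precision = ratio(c.truePositives, c.truePositives + c.falsePositives);
  const recall = ratio(c.truePositives, c.truePositives + c.falseNegatives);
  const fMeasure = ratio(2 * precision * recall, precision + recall);
  const total =
    c.truePositives + c.falsePositives + c.trueNegatives + c.falseNegatives;
  const accuracy = ratio(c.truePositives + c.trueNegatives, total);
  return { precision, recall, fMeasure, accuracy };
}

// Hypothetical counts, chosen only to keep the arithmetic easy to follow:
// precision = 6/8, recall = 6/8, F-measure = 0.75, accuracy = 16/20.
const exampleMetrics = computeMetrics({
  truePositives: 6,
  falsePositives: 2,
  trueNegatives: 10,
  falseNegatives: 2,
});
console.log(exampleMetrics);
```

Note that accuracy is the only one of the four that needs the number of true negatives, so it can only be computed when that count is well defined.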

All of these metrics could be interesting to explore, and a real-life setting could also introduce aspects such as the speed of the analysis as relevant metrics. In this study, however, recall and precision are chosen as the metrics to evaluate.


3 Method

This chapter documents the methodology used throughout the thesis. First, the pre-study is described in section 3.1, followed by the implementation in section 3.2. This section describes how the static analysis for detecting method usage in extensions was implemented. Finally, section 3.3 describes how the effectiveness of the tool was evaluated by analysing a larger set of extensions.

3.1 Pre-study

In order to gather enough information about VS Code extensions, permission systems, and static analysis, to be able to answer RQ1 and determine the possibility of implementing a static analysis tool that detects method usage in VS Code extensions, a pre-study was conducted. This pre-study was divided into two main phases.

First, information on VS Code extensions and their anatomy and architecture was gathered. This was mainly done through studying the official documentation which describes the inner workings of extensions and how to develop your own [24].

The second phase consisted of gathering theoretical knowledge and related work on the subjects relevant to the thesis. This was done through searching for scientific articles, mainly on Google Scholar, using keywords such as extension security, permission models, static analysis, and over-privilege. Through articles found with this method, further material was gathered through their respective referenced articles and citations. In those cases where no published material could be found, less formal material such as documentation and developer blogs was used to gain a better understanding of the subject.

Most of the results from this study can be read in the Theory chapter, while answers to RQ1 are presented in the Results chapter.


3.2 Implementation

The implementation phase was conducted with the goal of creating a static analysis tool that, given the source code of a VS Code extension as input, can analyse the program and return a list of the extension’s capabilities from a permission point of view. The tool itself is built in TypeScript to simplify extracting ASTs for extension TypeScript source files. It is implemented as a Node.js application to allow local execution and is currently implemented for use as a command line interface (CLI), while measures have been taken to simplify connecting the application to a web service or similar. An overview of the system can be seen in figure 3.1 and the individual components and tasks of the tool will be presented in detail below.

Figure 3.1: Overview of the analyser architecture

Defining the extension root

The first task of the analyser is to find and define the root directory of the extension within the provided directory. The input directory is assumed to be cloned or downloaded directly from a Git repository, and while most extensions are published by themselves, with the extension root as the repository root, there might be exceptions to this. The extension MetaGo [37] is an example of this, as it also publishes its sub-extensions MetaJump and MetaWord under the same Git repository. To accommodate this, the possible extension roots of the provided source code are identified by recursively searching through all subdirectories. The goal is to find directories that contain both a package.json file, which indicates an extension, and a tsconfig.json file, which indicates that TypeScript is used throughout the extension. Each
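The recursive search for extension roots can be sketched as follows. To keep the example self-contained, it walks an in-memory directory tree rather than the real file system; the Dir type, the function name findExtensionRoots and the directory names in the example are hypothetical and are not taken from the analyser.

```typescript
// In-memory stand-in for a directory (hypothetical type for this sketch).
interface Dir {
  name: string;
  files: string[];
  subdirs: Dir[];
}

// Collects the paths of all directories that contain both marker files:
// package.json (an extension) and tsconfig.json (TypeScript is used).
function findExtensionRoots(dir: Dir, parentPath: string = ""): string[] {
  const path = parentPath === "" ? dir.name : parentPath + "/" + dir.name;
  const roots: string[] = [];

  const hasPackageJson = dir.files.indexOf("package.json") !== -1;
  const hasTsConfig = dir.files.indexOf("tsconfig.json") !== -1;
  if (hasPackageJson && hasTsConfig) {
    roots.push(path);
  }

  // Keep searching below every directory, so that several extensions
  // published in one repository are all found.
  for (const sub of dir.subdirs) {
    roots.push(...findExtensionRoots(sub, path));
  }
  return roots;
}

// Hypothetical repository containing two extensions and a docs folder.
const repo: Dir = {
  name: "repo",
  files: ["README.md"],
  subdirs: [
    { name: "ext-a", files: ["package.json", "tsconfig.json"], subdirs: [] },
    {
      name: "ext-b",
      files: ["package.json", "tsconfig.json"],
      subdirs: [{ name: "src", files: ["extension.ts"], subdirs: [] }],
    },
    { name: "docs", files: ["package.json"], subdirs: [] },
  ],
};
console.log(findExtensionRoots(repo)); // ["repo/ext-a", "repo/ext-b"]
```

The docs folder is skipped because it lacks a tsconfig.json file, which mirrors the requirement that both files are present.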


Abstract Syntax Trees

In order to analyse each extension, an AST representation of each source file in the project is extracted. TypeScript supplies a compiler API through the TypeScript npm package. This is the same functionality used in the TypeScript compiler, and the API can therefore directly provide an AST representation of an imported .ts source file. An example of the AST structure provided by the compiler API can be seen in figure 3.2.

Figure 3.2: AST generated from the line let variable = object.function(prop1, prop2)

ts-morph

While the AST and methods provided by the compiler API could be used directly for the analysis, ts-morph is an open-source library that wraps the compiler API in order to simplify navigation and manipulation of TypeScript ASTs [52]. ts-morph is used throughout this project to import and navigate the source file ASTs. Some examples of functionality ts-morph adds, that are used in this project, are the possibility of finding AST nodes of certain kinds, and also the mapping of identifiers to references of that identifier. This simplifies the task of finding uses of imported methods.

AST traversal

The analysis is done through traversing the AST of each project source file. Traversal is done depth first, using both a preorder and a postorder tree walk [17]. By doing this, actions and analysis of nodes can be done both on entry and exit of the node, i.e., before or after visiting each child of the node. This strategy is essentially the same as the one used by estraverse in ESLint [30]. The basic traversal algorithm is presented in listing 3.1 and is visualised further in figure 3.3.

Listing 3.1: Tree traversal algorithm

    function traverse(node) {
      before(node)   // preorder step: runs on entry, before any child

      for (childNode in node) {
        traverse(childNode)
      }

      after(node)    // postorder step: runs on exit, after all children
    }

Figure 3.3: Visualisation of the traversal algorithm

Node Visitors

Visiting the nodes during the traversal is implemented using a variant of the visitor pattern [29]. However, as the node types were not modifiable, the usual double dispatch functionality was not possible to implement, which is why the current solution resorts to a list of if statements for type checking. A UML diagram of the implemented pattern can be seen in figure 3.2.

The visitor structure is based around an abstract NodeVisitor class. This class defines three methods. The before and after methods correspond to the same methods in the algorithm above. These take a node as input and define the visit behaviour for different node types, either to be handled before or after visiting the node’s children. Generally, most functionality is implemented in the before method, but after is used in certain situations. The provideResults method is supposed to be called after the analysis is completed. It summarises and returns the results of the visitor’s analysis. The NodeVisitor class is purposefully designed to be modular to allow for extension of the analyser in the future if needed.
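A minimal sketch of this visitor structure, combined with the depth-first traversal, could look as follows. The AstNode type is a simplified stand-in for the ts-morph node types used by the tool, and the CallCounter visitor, its trace field and the example tree are hypothetical; only the three method names before, after and provideResults come from the description above.

```typescript
// Simplified stand-in for an AST node (an assumption for this sketch).
interface AstNode {
  kind: string;
  children: AstNode[];
}

// Visitor with hooks on entry and exit of a node, plus a results summary.
abstract class NodeVisitor<R> {
  abstract before(node: AstNode): void;
  abstract after(node: AstNode): void;
  abstract provideResults(): R;
}

// Depth-first walk calling the visitor before and after the children.
function traverse(node: AstNode, visitor: NodeVisitor<unknown>): void {
  visitor.before(node);
  for (const child of node.children) {
    traverse(child, visitor);
  }
  visitor.after(node);
}

// Hypothetical visitor that counts call expressions. The node kind is
// checked with a plain if statement, since the node types themselves
// cannot be modified to accept a visitor (no double dispatch).
class CallCounter extends NodeVisitor<number> {
  private count = 0;
  readonly trace: string[] = [];

  before(node: AstNode): void {
    this.trace.push("enter " + node.kind);
    if (node.kind === "CallExpression") {
      this.count++;
    }
  }

  after(node: AstNode): void {
    this.trace.push("exit " + node.kind);
  }

  provideResults(): number {
    return this.count;
  }
}

const tree: AstNode = {
  kind: "VariableDeclaration",
  children: [
    { kind: "Identifier", children: [] },
    {
      kind: "CallExpression",
      children: [{ kind: "PropertyAccessExpression", children: [] }],
    },
  ],
};
const counter = new CallCounter();
traverse(tree, counter);
console.log(counter.provideResults()); // 1
console.log(counter.trace);
```

The trace shows that every node is entered before its children and exited after them, which is what makes both the before and the after hook usable for analysis.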


AbstractMethodVisitor

For this thesis, a single subtype of NodeVisitor is implemented, AbstractMethodVisitor. This subtype is also abstract and contains functionality for searching source files for imported method calls. In turn, two concrete classes of this abstract class have been implemented, ImportMethodVisitor and VsCodeMethodVisitor. VsCodeMethodVisitor is responsible for specifically analysing methods from the VS Code API, while ImportMethodVisitor is more general and analyses other imported packages.

This subtype of NodeVisitor is based around individual source files. It requires analysis to be started with a complete source file AST, i.e., an AST with a SourceFile node as root. Before entering a source file, it stores a new SourceFileData object. This object contains basic information about the source file, and the dependencies and methods found during analysis are stored in this object. Upon exiting the same node, the current SourceFileData object is pushed to a separate list for later extraction.

Implemented language constructs

For the implementation, support for a reduced set of language constructs was implemented in the analyser. These were chosen to represent a base set of the most common ways to access dependencies, properties, and methods, as well as to store subreferences to these. In the following sections, these constructs and their implemented behaviour will be described in more detail.

• Import declarations are the main method for importing external packages and modules into a TypeScript source file. When visiting an import declaration, the analyser stores a reference to the imported package for each imported module. Imports in TypeScript come in three main types: Default, Namespace, and Named.

– Default imports the default export from the package.

Listing 3.2: Default import

    import Module from 'package';

– Namespace imports the entire package namespace into a single variable.

Listing 3.3: Namespace import

    import * as Alias from 'package';

– Named imports one or more individual modules from the package.

Listing 3.4: Named import

    import { Module1, Module2 } from 'package';

In addition, it is possible for all imports to be declared with aliases, where the actual variable name is changed from the module name. It is also possible to combine different import types in a single import statement, which is also handled by the visitor.

Listing 3.5: Combined type import


• Require calls are another way to import packages and modules. These function similarly to import statements but are assigned as a regular variable declaration and are therefore processed slightly differently from import statements, although the resulting reference is the same.

Listing 3.6: Require calls

    const module = require('package');
    const { module1, module2 } = require('package');

• Variable declarations and property assignments are treated in much the same way. If a property of an imported package is assigned to a variable or object property, the assigned variable is stored with a reference to the imported property it was assigned with. However, if the variable is assigned with a method call, it is not stored, as the analyser does not handle return objects at this moment. For a further description of the handling of method calls, see Call expressions.

Listing 3.7: Variable declaration and property assignment

    let variable = module.property;
    object.property = module.property;

• Binary expressions represent, among other types of expressions, variable assignments in the AST, i.e., assignment of values to already declared variables. These are identifiable as an Identifier node on the left-hand side followed by an equals sign. In contrast to variable declarations and property assignments, these are not represented by a unique node type in the AST, which is why this special handling is needed. Apart from that, these instances are treated in the same way.

Listing 3.8: Variable assignment binary expression

    variable = module.property;

• Function declarations are partially handled by the analyser. While functions themselves are currently not handled, function parameters may represent assignment of an imported type or interface. Therefore, parameters with imported types are also stored as references to imported modules. This allows method calls on these parameters inside the function to be tracked as well.

Listing 3.9: Function declaration

    function (parameter: ImportedModule) {}

• Call expressions are the node representation of function and method calls. If the property that contains the method being called is a reference to an imported module, the method is tracked as used, and added to the result of the analysis.

Listing 3.10: Method call expression

    module.method();
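How imports, assignments and calls interact can be illustrated with a small, string-based sketch. The analyser itself works on Identifier nodes in the AST, so the ReferenceTracker class below, including its method names and the use of plain name chains such as ["editor", "edit"], is a simplifying assumption made only for illustration.

```typescript
// Maps each tracked local name to its full path inside an imported package,
// e.g. "editor" -> ["vscode", "window", "activeTextEditor"].
class ReferenceTracker {
  private paths: { [localName: string]: string[] } = {};
  readonly usedMethods: string[] = [];

  // Resolves a chain such as ["editor", "edit"] to its package path,
  // or returns undefined when the first name is not tied to an import.
  private resolve(chain: string[]): string[] | undefined {
    const root = chain[0];
    if (!Object.prototype.hasOwnProperty.call(this.paths, root)) {
      return undefined;
    }
    return this.paths[root].concat(chain.slice(1));
  }

  // import * as local from 'pkg'   or   import { exported as local } from 'pkg'
  addImport(local: string, pkg: string, exported?: string): void {
    this.paths[local] = exported === undefined ? [pkg] : [pkg, exported];
  }

  // let local = a.b.c   (the same logic covers `local = a.b.c`)
  addAssignment(local: string, chain: string[]): void {
    const resolved = this.resolve(chain);
    if (resolved !== undefined) {
      this.paths[local] = resolved;
    }
  }

  // a.b.method()   recorded only when `a` leads back to an import
  recordCall(chain: string[]): void {
    const resolved = this.resolve(chain);
    if (resolved !== undefined) {
      this.usedMethods.push(resolved.join("."));
    }
  }
}

const tracker = new ReferenceTracker();
tracker.addImport("win", "vscode", "window"); // import { window as win } from 'vscode'
tracker.addAssignment("editor", ["win", "activeTextEditor"]); // let editor = win.activeTextEditor
tracker.recordCall(["editor", "edit"]); // editor.edit(...)
tracker.recordCall(["console", "log"]); // ignored: not imported
console.log(tracker.usedMethods); // ["vscode.window.activeTextEditor.edit"]
```

In line with the limitation described for variable declarations, this sketch does not follow values returned from method calls, so a variable assigned from a call would stay untracked.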

Reference Handling

Upon finding an instance of an imported method or property being referenced, that instance needs to be stored, either to be able to detect further references to the referenced symbol, or to later extract all used properties and method calls. In each source file object created by a NodeVisitor a set of dependencies is stored, either as a single vsCodeDependency or a list of importDependencies. These each represent an imported npm package. Each dependency contains the name of the dependency, a usedProps property, and an importReferences list.

importReferences

References to a dependency throughout the analysis are stored in a nested list of ImportReference objects. At the top level, DirectImportReference objects are stored, which represent references directly connected to the package. These are references that are added through import statements or require calls. Each reference also stores subReferences. These are assignments that reference this reference. References are identified using Identifier nodes.

When adding a new reference to the structure, all Identifier nodes that reference this node are found using ts-morph functionality and stored in the usages field. When, for example, a call expression or variable assignment is found during analysis, the expression is matched against existing reference usages to identify whether the expression references an imported package or its sub-references. In addition to these fields, each reference also contains the name, any potential alias of the reference, i.e., alias imports or variable names, its declaration Identifier, as well as a reference to the parent node of the tree, either a reference or a dependency. The structure of these objects can be seen in listing 3.11.

Listing 3.11: Example of the ImportReference JSON structure.

    DirectImportReference {
      name: "module",
      identifier: Identifier,
      usages: [Identifier1, Identifier2, ...],
      subReferences: [
        SecondaryImportReference {
          name: "property",
          identifier: Identifier1,
          usages: [Identifier3, Identifier4, ...],
          subReferences: [...],
          aliasReference: "variableName",
          reference: Parent,
        },
        ...
      ],
      aliasReference: "variableName",
      dependency: Parent
    }

usedProps

When imported methods are called or properties accessed, these are also pushed to a separate set called usedProps. This is stored as a nested object, but in contrast to importReferences, this only contains unique values and only the name of the resource as defined by the VS Code API. This is the list of props that is extracted from the source file after analysis. An example can be seen in listing 3.12.

Listing 3.12: Example of the usedProps JSON structure

    usedProps: {
      module1: {
        property1: {
          method1: {}
        },
        method2: {}
      },
      module2: {
        method3: {}
      }
    }
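Building such a nested structure with unique entries can be sketched as below. The PropTree type and the addUsedProp function are hypothetical names for this illustration, and the access paths in the example are chosen from the VS Code API only as sample data.

```typescript
// Nested record: each key is a property or method name, each value its children.
interface PropTree {
  [name: string]: PropTree;
}

// Adds one access path. Levels are created only when missing, so adding
// the same path several times never produces duplicate entries.
function addUsedProp(tree: PropTree, path: string[]): void {
  let level = tree;
  for (const name of path) {
    // hasOwnProperty avoids clashes with inherited names such as "toString".
    if (!Object.prototype.hasOwnProperty.call(level, name)) {
      level[name] = {};
    }
    level = level[name];
  }
}

const usedProps: PropTree = {};
addUsedProp(usedProps, ["window", "activeTextEditor", "edit"]);
addUsedProp(usedProps, ["window", "activeTextEditor", "edit"]); // repeated access
addUsedProp(usedProps, ["window", "showErrorMessage"]);
console.log(JSON.stringify(usedProps));
// {"window":{"activeTextEditor":{"edit":{}},"showErrorMessage":{}}}
```

Storing names as object keys is what gives the uniqueness described above without any separate duplicate check.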

Used Prop extraction

Once analysis of all source files of an extension is finished, all usedProps are combined into a single object for the entire extension. This is done in the provideResults method in each NodeVisitor. The method iterates through each source file and dependency, merging the usedProps into a single object per dependency. The resulting list of dependencies and used props is then saved to the analyser.
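The merging step can be sketched as a recursive merge of nested objects. The names PropTree, mergeUsedProps and combineSourceFiles are hypothetical, and the sketch handles a single dependency; the tool repeats the merge per dependency.

```typescript
// Nested record of used property and method names.
interface PropTree {
  [name: string]: PropTree;
}

// Recursively copies every name in `source` into `target`, keeping
// whatever `target` already contains.
function mergeUsedProps(target: PropTree, source: PropTree): PropTree {
  for (const name of Object.keys(source)) {
    if (!Object.prototype.hasOwnProperty.call(target, name)) {
      target[name] = {};
    }
    mergeUsedProps(target[name], source[name]);
  }
  return target;
}

// Combines the usedProps of all source files for one dependency.
function combineSourceFiles(perFile: PropTree[]): PropTree {
  const combined: PropTree = {};
  for (const fileProps of perFile) {
    mergeUsedProps(combined, fileProps);
  }
  return combined;
}

// Hypothetical per-file results for the vscode dependency.
const fileA: PropTree = { window: { activeTextEditor: {} } };
const fileB: PropTree = {
  window: { showErrorMessage: {} },
  commands: { registerCommand: {} },
};
console.log(JSON.stringify(combineSourceFiles([fileA, fileB])));
// {"window":{"activeTextEditor":{},"showErrorMessage":{}},"commands":{"registerCommand":{}}}
```

The inputs are left unchanged, since all writes go to the freshly created combined object.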

Permissions

As no existing permission system is implemented for VS Code, there exists no mapping between API methods and potential warning messages. This mapping therefore has to be created manually.

API extraction

The VS Code API is available from the documentation and is generated from a vscode.d.ts type declaration file [55]. In order to work with the API, it is first extracted to a JSON file. This is also done using the ts-morph library. Using the AST of the file, all namespace and interface/class names can be easily extracted and stored. For each namespace and interface/class, all properties and methods are then extracted and stored. If a property is of a type defined by the API, the type name is also stored in the property to map the property to any method usage found during analysis. Finally, the extracted API is stored as a JSON file to be imported during analysis.

Permission categories

When designing permission categories for the Extension API, a similar strategy to the system implemented in the WebExtensions API [31] is used. Each namespace in the Extension API is treated as an API in the WebExtensions API. These are therefore each mapped to a permission message as seen in table 3.1. These messages are formulated to be short and concise so as not to overload a potential user with information, while also assuming that the user has some computer knowledge and is familiar with basic concepts in VS Code such as workspaces and commands.

However, as many studies on browser extension permissions suggest, the granularity of grouping permissions by namespace or API is often not enough, leading to over-privilege issues [8, 28, 36]. In an attempt to combat this, additional messages are mapped to properties and methods of namespaces. In the case of properties being of types defined in the API, some interfaces have messages mapped to properties and methods as well, so as to define the specific types of actions being done to the properties. While these more granular permissions could be added to all possible actions, focus has been put on those of types that would be deemed of a more sensitive nature, based on their corresponding permissions in browser extensions, as described in section 2.6. These are methods and properties related to file access and modification, script execution, interactions with external URIs, and system specific resources such as machine ids and using the clipboard. A full list of all messages and the corresponding methods and properties they are mapped to can be seen in appendix B.

Each message is stored using an enum mapping to the message. The enums were in turn added manually to each relevant instance in the API JSON file.

Table 3.1: Permission messages related to each namespace.

Namespace        Permission message
authentication   Interact with and handle third-party authentication providers
commands         Interact with VS Code commands
comments         Interact with the Comments interface
debug            Interact with the Debug interface
env              Interact with the editor environment
extensions       Interact with installed extensions
languages        Interact with language features
scm              Interact with Source Control Managers
tasks            Interact with VS Code Tasks
window           Interact with the editor window
workspace        Interact with the current workspace

Permission mapping

As both used methods and properties, and the extracted API are stored as hierarchical trees identified with strings, the used methods and properties are mapped recursively directly to the API. If any namespace, property, method etc. in the recursive chain maps to an instance in the API that contains a permission message, that message is added to a separate list of triggered permissions for the application. In the case of a method being applied to a namespace property, this method permission is mapped as a child to the property permission. As such, this allows the analyser to, for example, see that it is specifically the document in the active editor that is being edited, rather than any arbitrary document, increasing the granularity further.
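The recursive mapping can be sketched as a parallel walk over the two trees. The types ApiNode and TriggeredPermission, the function mapPermissions and the layout of the api object are assumptions for this illustration; the sketch also leaves out the indirection through API-defined property types and does not remove duplicate messages.

```typescript
// Nested record of used property and method names.
interface PropTree {
  [name: string]: PropTree;
}

// One entry of the extracted API: an optional permission message and
// the nested members below it (hypothetical layout).
interface ApiNode {
  message?: string;
  members: { [name: string]: ApiNode };
}

interface TriggeredPermission {
  message: string;
  subMessages: TriggeredPermission[];
}

// Walks the used props and the API in parallel. A level that carries a
// message becomes a permission, with deeper permissions as its children.
function mapPermissions(
  used: PropTree,
  api: { [name: string]: ApiNode }
): TriggeredPermission[] {
  const result: TriggeredPermission[] = [];
  for (const name of Object.keys(used)) {
    if (!Object.prototype.hasOwnProperty.call(api, name)) {
      continue; // name is not part of the extracted API
    }
    const apiNode = api[name];
    const nested = mapPermissions(used[name], apiNode.members);
    if (apiNode.message !== undefined) {
      result.push({ message: apiNode.message, subMessages: nested });
    } else {
      result.push(...nested); // no message here: pass deeper ones upward
    }
  }
  return result;
}

const api: { [name: string]: ApiNode } = {
  window: {
    message: "Interact with the editor window",
    members: {
      activeTextEditor: {
        message: "Access the active editor",
        members: { edit: { members: {} } },
      },
    },
  },
};
const used: PropTree = { window: { activeTextEditor: { edit: {} } } };
console.log(JSON.stringify(mapPermissions(used, api), null, 2));
```

For this input the result is the window permission with the active editor permission as its only sub-message, since the edit method carries no message of its own in this example.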

Data extraction

The analysis program is designed for the resulting data to be processed by some other program for visualisation, for example a web client, which is why the resulting data is extracted in JSON format. This was chosen because of two advantages. First, it is easily readable both by humans and by most programming languages, as it stores data as readable key-value pairs. Second, it is easy to extract the data from TypeScript objects using built-in serialisation methods. An example of the output data format can be seen in listing 3.13.

Listing 3.13: Example of the output data JSON structure

    {
      "extension_1": {
        "messages": [
          {
            "message": "Interact with the editor window",
            "subMessages": [
              {
                "message": "Access the active editor",
                "subMessages": []
              }
            ]
          }
        ],
        "dependencies": ["dependency1", "dependency2"]
      },
      "extension_2": {
        ...
      }
    }

3.3 Evaluation

The evaluation phase was conducted with two goals. In order to answer RQ2: ”Is it possible to detect occurrences of accessing external resources through static analysis of the extension source code?” the effectiveness of the tool had to be evaluated. In addition, a systematic analysis of a larger set of extensions was done to be able to answer RQ3: ”What resources and services are more prevalently accessed in practice in popular extensions?”

Tool effectiveness

In order to evaluate the analyser’s effectiveness at detecting properties and methods that should trigger a permission message, the output of the tool was compared to the result of a manual review. This evaluation was done by selecting three popular extensions from the VS Code Extension Marketplace [27]. These were chosen according to three criteria. The extensions should be listed among popular extensions on the marketplace, in order to find references that represent commonly used functionality. They should differ in functionality, so as to represent different types of extensions. While there are extension categories implemented in the marketplace, most extensions tend to be put into Other, which is why the categories would not necessarily give a good representation. They should also contain a reasonably sized code base, which is feasible to analyse manually in a limited time frame. The three chosen extensions are briefly described below.

• Bracket Pair Colorizer 2 - This extension colourises matching bracket pairs in unique colours, making it easier for the developer to quickly identify which opening and closing brackets are connected.

• Path Intellisense - This extension provides autocompletion when writing paths and filenames.

• Live Server - This extension provides functionality to run a live development web server directly within the editor, with features such as live reload.

Manual review

The manual review was conducted by the author by opening the cloned extension repositories in VS Code. The aim was to manually identify all API usage that would result in triggering a permission message as described in Permissions. Each .ts source file of the extension was read individually, with the help of search via the Ctrl-F command and other built-in editor tools.


Analysis

By comparing the resulting permission messages of the manual and automatic review, results were divided into true/false positives/negatives. Here, the result of the manual review is treated as the ground truth. Therefore, in this case, a true positive indicates that the permission was triggered by both reviews. False positives indicate that the analyser triggers a message that the manual review does not. False negatives indicate that the manual review triggers a message that the analyser does not.
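This comparison amounts to simple set operations on the two lists of messages. The sketch below uses hypothetical names (Classification, classify) and made-up review results for a fictional extension; the messages themselves are taken from table 3.1.

```typescript
interface Classification {
  truePositives: string[];
  falsePositives: string[];
  falseNegatives: string[];
}

// Treats the manual review as the true answer and sorts every message
// into one of the three categories that can be derived from the two lists.
function classify(manual: string[], tool: string[]): Classification {
  return {
    truePositives: tool.filter((m) => manual.indexOf(m) !== -1),
    falsePositives: tool.filter((m) => manual.indexOf(m) === -1),
    falseNegatives: manual.filter((m) => tool.indexOf(m) === -1),
  };
}

// Made-up review results for a fictional extension.
const manualReview = [
  "Interact with the editor window",
  "Interact with the current workspace",
  "Interact with the editor environment",
];
const toolReview = [
  "Interact with the editor window",
  "Interact with the current workspace",
];

const outcome = classify(manualReview, toolReview);
console.log(outcome.truePositives.length); // 2
console.log(outcome.falsePositives.length); // 0
console.log(outcome.falseNegatives); // ["Interact with the editor environment"]
```

True negatives are not derived here, as they would require listing every permission message that neither review triggered.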

From these results, a recall and precision metric could be calculated for each extension, indicating the effectiveness of the analyser.

Systematic evaluation

The second part of the evaluation aims to gather a statistical representation of how the Extension API and node packages are used by extensions in general. This was done by gathering a larger set of extensions to analyse using the tool.

The dataset was gathered by asking developers at Cybercom, the company where the thesis was conducted, to submit the extensions they use in their daily work. This was done by asking two departments, via email, to share their list of installed extensions. Extracting this list is possible via a command provided directly by VS Code, which was included in the request email.

Listing 3.14: Bash command to extract installed extensions

    code --list-extensions > extensions.list

The developers were then asked to share the extensions.list file with the author. After receiving them, the lists were treated anonymously without any reference to the developers they were provided by.

Once the list was compiled, the extensions were manually checked by the author to confirm that they were of varied functionality and of generally high popularity on the marketplace. In addition, extensions that only supply code snippets or colour themes were filtered out of the list, as these do not use any scripting and are therefore not compatible with the analyser. Finally, a set of 56 extensions had been gathered, which can be seen in appendix A, and the Git repository of each extension was cloned into a subfolder of the analyser. An additional script was then added to the analyser, which runs the analysis on each extension in this folder, combining the results of each pass into a single JSON file grouped by extension.

Two sets of results were extracted from this JSON file for further analysis. One set containing the summed-up number of occurrences of each permission message, and one summarising the occurrences of external dependencies.


4 Results

This chapter presents the results of the work performed.

4.1 Pre-study

One of the main goals of the pre-study was to identify how VS Code extensions may access external resources outside of their own process. The study shows that extensions can interact with VS Code using the Extension API. In addition to editor features, the API also provides access to some system resources such as the local file system, as well as the possibility to interact with external web pages through Webviews.

As the extension is an individual Node.js process, it is also possible for the extension to depend on any external npm package, as well as the use of internal Node.js libraries. While the extension host controls the extension’s interaction with the editor, no such control is documented with regard to the use of these external npm packages.

4.2 Implementation

The main focus of this thesis has been implementation of a static analysis tool that emulates a permission system, like those found in browser extension systems, for VS Code extensions. This extension analyser has been implemented as a CLI tool in TypeScript which scans TypeScript source files of an extension and returns a list of permissions used by the extension, corresponding to the use of methods and properties from the VS Code Extension API, as well as a list of the external npm packages the extension depends on, as a JSON file.

The tool can be used by running the commands seen in listing 4.1 from the root of the project, given that Node.js is installed on the system:

Listing 4.1: Bash command to run the analysis tool

    npm install

where the path is provided in POSIX form, using ”/” as path separators. The resulting JSON file can then be found in the permissions folder under the name EXTENSION_NAME.json for further analysis.

4.3 Evaluation

This section will present the results found during the evaluation, split into the evaluation on the tool effectiveness and the systematic evaluation of extensions.

Tool effectiveness

The result of each extension analysis will be presented individually below, followed by the precision and recall calculations.

Bracket Pair Colorizer 2

The result of the analysis can be seen in table 4.1. The results show that the manual review caught two permissions that the tool missed.

Table 4.1: Permission messages detected in Bracket Pair Colorizer 2.

Permission message                     Manual Review   Tool Review
Interact with VS Code commands         x               x
Interact with the editor environment   x
Interact with installed extensions     x               x
Interact with the editor window        x               x
Interact with the current workspace    x               x
Register new commands                  x               x
Access all installed extensions        x
Access all visible editors             x               x
Access the active editor               x               x

The first missed permission, Interact with the editor environment, indicates the import or use of the env namespace. In this case, the namespace and its property were used directly as a parameter in a function without first assigning it to a variable.

Listing 4.2: Code line triggering Interact with the editor environment

    return path.join(vscode.env.appRoot, 'node_modules.asar', moduleName);

The second permission, Access all installed extensions, was caused by the property extensions.all being assigned directly in a for statement as the array to loop over.

Listing 4.3: Code line triggering Access all installed extensions

    for (const extension of vscode.extensions.all) {...}

Path Intellisense

The result of the analysis can be seen in table 4.2. In this case, the manual review and the tool found the same permissions.


Table 4.2: Permission messages detected in Path Intellisense.

Permission message                     Manual Review   Tool Review
Interact with language features        x               x
Interact with the editor environment   x               x
Read files in a workspace              x               x

Live Server

The result of the analysis can be seen in table 4.3. The manual review found one permission that the tool missed.

Table 4.3: Permission messages detected in Live Server.

| Permission message | Manual Review | Tool Review |
|---|---|---|
| Interact with VS Code commands | x | x |
| Interact with installed extensions | x | x |
| Interact with the editor window | x | x |
| Interact with the current workspace | x | x |
| Execute commands | x | x |
| Register new commands | x | x |
| Access specific extensions' APIs | x | x |
| Create or open user input elements | x | x |
| Show error or warning messages to the user | x | x |
| Read files in a workspace | x | x |
| Access the active editor | x | |
| Save all updated documents | x | x |

In this case, the Access the active editor permission was missed. The reason was that the property was read inside a conditional (ternary) operator within the assignment. The conditional operator works like an if-statement embedded in an expression: if the condition is true, the first value is used; otherwise, the second value is used.

Listing 4.4: Code line triggering Access the active editor

    const openedDocUri = pathUri || (window.activeTextEditor ? window.activeTextEditor.document.fileName : '');
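To make the behaviour of the conditional operator concrete, the following sketch writes the expression from Listing 4.4 in two equivalent forms. The EditorLike interface and the two function names are hypothetical stand-ins introduced here so that the example runs without the VS Code API, and the parameter types are assumptions.

```typescript
// Hypothetical stand-in for the active editor object; not the VS Code API.
interface EditorLike {
  document: { fileName: string };
}

// Same shape as Listing 4.4: the property is read inside a conditional operator.
function fileNameWithConditional(pathUri: string | undefined, activeEditor: EditorLike | undefined): string {
  return pathUri || (activeEditor ? activeEditor.document.fileName : "");
}

// Equivalent form using explicit if-statements.
function fileNameWithIfStatements(pathUri: string | undefined, activeEditor: EditorLike | undefined): string {
  if (pathUri) return pathUri;
  if (activeEditor) return activeEditor.document.fileName;
  return "";
}

const sampleEditor: EditorLike = { document: { fileName: "index.html" } };
const viaConditional = fileNameWithConditional(undefined, sampleEditor);
const viaIfStatements = fileNameWithIfStatements(undefined, sampleEditor);
```

Both forms read activeEditor.document.fileName. An analyser that only inspects the top-level expression of an assignment sees the second form's reads as ordinary statements, but must descend into the branches of the conditional expression to see the read in the first form.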

Precision and Recall

From the results presented above, the precision and recall metrics for each extension, as well as a summarised total for all three extensions, can be calculated. The results can be seen in table 4.4.

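Table 4.4: Precision and Recall Metrics.

| Extension | TP | TN | FP | FN | Precision | Recall |
|---|---|---|---|---|---|---|
| Bracket Pair Colorizer 2 | 7 | - | 0 | 2 | 1 | 0.778 |
| Path Intellisense | 3 | - | 0 | 0 | 1 | 1 |
| Live Server | 11 | - | 0 | 1 | 1 | 0.917 |
| Total | 21 | - | 0 | 3 | 1 | 0.875 |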

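The precision of 1 indicates high completeness, since the tool reported no false positives. The recall of 0.875 indicates high soundness as well, although lower than the completeness, since the tool missed three of the 24 permissions found in the manual review.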
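The values in table 4.4 follow from the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN). The sketch below recomputes them from the counts in the table; the type and function names (ReviewCounts, precisionOf, recallOf) are chosen for illustration only.

```typescript
// Counts per extension, taken from table 4.4.
interface ReviewCounts {
  extension: string;
  tp: number; // permissions found by both the manual review and the tool
  fp: number; // permissions reported by the tool only
  fn: number; // permissions found by the manual review only
}

function precisionOf(c: ReviewCounts): number {
  return c.tp / (c.tp + c.fp);
}

function recallOf(c: ReviewCounts): number {
  return c.tp / (c.tp + c.fn);
}

const reviewCounts: ReviewCounts[] = [
  { extension: "Bracket Pair Colorizer 2", tp: 7, fp: 0, fn: 2 },
  { extension: "Path Intellisense", tp: 3, fp: 0, fn: 0 },
  { extension: "Live Server", tp: 11, fp: 0, fn: 1 },
];

// Sum the counts over all extensions to obtain the "Total" row.
const totalCounts = reviewCounts.reduce<ReviewCounts>(
  (sum, c) => ({ extension: "Total", tp: sum.tp + c.tp, fp: sum.fp + c.fp, fn: sum.fn + c.fn }),
  { extension: "Total", tp: 0, fp: 0, fn: 0 }
);

reviewCounts.concat([totalCounts]).forEach(c =>
  console.log(c.extension, precisionOf(c).toFixed(3), recallOf(c).toFixed(3))
);
```

Note that the total is computed from the summed counts (21 of 24 permissions found), not as the average of the per-extension recall values.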


Table 4.5: Commonly occurring and interesting permissions found during testing

| Permission message | Count |
|---|---|
| Interact with the editor window | 54 |
| Interact with the current workspace | 52 |
| Interact with VS Code commands | 48 |
| Register new commands | 45 |
| Show error or warning messages to the user | 44 |
| Create or open user input elements | 36 |
| Execute commands | 35 |
| Access the active editor | 34 |
| Read any file on the file system | 28 |
| Open text documents | 21 |
| Read files in a workspace | 19 |
| Edit the document/documents | 17 |
| Apply workspace edits | 15 |
| Create webviews | 15 |
| Open external URI:s | 10 |
| Interact with the clipboard | 9 |
| Interact with the file system | 8 |
| Read data from the clipboard | 6 |
| Save all updated documents | 6 |
| Write any file to the file system | 4 |
| Execute shell tasks | 4 |
| Send local content to webviews | 2 |
| Delete files | 1 |

Table 4.6: Commonly occurring and interesting dependencies found during testing

| Dependency | Count |
|---|---|
| path | 39 |
| fs | 34 |
| os | 22 |
| child_process | 18 |
| vscode-languageclient | 15 |
| util | 14 |
| lodash | 12 |
| fs-extra | 12 |
| http | 7 |
| https | 6 |

Systematic evaluation

The systematic tests were run on the extensions listed in appendix A. Once completed, the data was extracted from the resulting JSON file into two tables. The first, which can be seen in full in appendix C, shows the prevalence of each permission message, sorted from most prevalent to least. Here, the resulting permission structure was flattened, as keeping the hierarchical structure would scatter the results and make them difficult to analyse. An extract of this list, containing the most common permissions and a few permissions of special interest, can be seen in table 4.5.

Table 4.6 depicts the prevalence of dependencies. This includes both external npm packages and internal Node.js libraries. To avoid an overly crowded table, it only contains the most common dependencies, while a full list can be seen in appendix D.
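The flattening and counting step can be sketched as follows. The shape of a result node (PermissionNode, with a message and optional nested children), the function names, and the two sample results are assumptions made for illustration; the actual JSON layout produced by the tool may differ. The sketch also assumes that each message is counted at most once per extension.

```typescript
// Hypothetical shape of one node in the hierarchical result of an extension.
interface PermissionNode {
  message: string;
  children?: PermissionNode[];
}

// Flattens one extension's permission tree into a list of unique messages.
function flattenMessages(nodes: PermissionNode[], out: string[] = []): string[] {
  nodes.forEach(node => {
    if (out.indexOf(node.message) === -1) out.push(node.message);
    if (node.children) flattenMessages(node.children, out);
  });
  return out;
}

// Counts in how many extensions each message occurs, most prevalent first.
function countPrevalence(results: PermissionNode[][]): Array<[string, number]> {
  const counts: Record<string, number> = {};
  results.forEach(tree => {
    flattenMessages(tree).forEach(message => {
      counts[message] = (counts[message] || 0) + 1;
    });
  });
  return Object.keys(counts)
    .map((message): [string, number] => [message, counts[message] || 0])
    .sort((a, b) => b[1] - a[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0));
}

// Two made-up extension results, used only to exercise the functions.
const sampleResults: PermissionNode[][] = [
  [{ message: "Interact with the editor window", children: [{ message: "Access the active editor" }] }],
  [
    { message: "Interact with the editor window", children: [{ message: "Create webviews" }] },
    { message: "Interact with the current workspace" },
  ],
];

const prevalence = countPrevalence(sampleResults);
```

Ties are broken alphabetically here only to make the ordering deterministic; the ordering of equally common messages in the appendix is not specified in the text.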


5 Discussion

This chapter presents the discussion of the results and method of the thesis, as well as a reflection on the work in a wider context.

5.1 Results

Pre-study

The results of the pre-study indicate that implementing a proper permission system in the current extension environment would be an arduous task. The extension host could reasonably implement a permission system for the Extension API, where access to each part of the API must be declared by the extension and accepted by the user before it is granted. However, other extension systems rely heavily on extensions running in isolated environments for their security features. Without isolated environments for extensions, the issue remains that extensions can use any npm package or Node.js library, all of which run with user privileges. This includes, for example, the fs library in Node.js, which is used to access the file system and is also used by the analysis tool from this thesis.

Implementation

Currently, there are a few language constructs that the analyser does not support. For example, it stops following a chain of references once a method has been called on a property, i.e., it does not track values returned from methods, even though the use of such values may itself warrant permission messages. It does not support array properties and methods either. Both cases were knowingly left out of the current analyser to reduce the scope of the project. They would, however, be reasonable additions for increasing the soundness of the analyser in a future revision.

As discussed regarding the pre-study, in addition to extensions being able to access external resources through the Extension API, Node.js libraries and npm packages can also be used for the same purpose. The analyser currently handles this by also storing dependencies and use of methods from these. However, the sheer amount of npm packages that are possible to rely on

