Finding tailored educational paths using a graph database

September 2020

Emil Stolpe

Institutionen för informationsteknologi

The Swedish educational system is full of possibilities, but that also makes it rather complicated. Several different paths lead to the same goal, but how do you find them, and which one is the quickest?

This project has tried to make it easier for students to find the right path from start to finish by presenting possible study paths. This was done by collecting information about schools and programs and inserting it into a graph database, which was then traversed to extract the fastest paths from a student’s starting point (e.g. elementary school) to their goal (e.g. doctor) based on a few arguments.

Interviews with student counselors were conducted to evaluate how practical the system is. One conclusion from these interviews is that the system is useful but hampered by the fact that the database contains too little information. The idea is good, but the system would need to be scaled up to be more useful, which is to be expected of a prototype. Filling the database with all the necessary information is left as future work since it would be too time-consuming.

Examiner: Lars-Åke Nordén
Subject reviewer: Anna Eckerdal
Supervisor: Andreas Samuelsson

The Swedish educational system offers many possibilities but is, as a consequence, also complicated. Many different paths lead to the same goal, but how do you find them, and which one is the fastest?

This project has tried to make it easier for students to find the right path from start to finish by presenting possible study paths. This was done by collecting information about schools and programs in a graph database, which was then traversed to extract the fastest path or paths from the student’s starting point (for example elementary school) to their goal (for example doctor) based on a number of arguments.

Interviews with student counselors were conducted to examine how practical the system actually is. One conclusion from these interviews is that the system is useful but hampered by the fact that it holds too little information for the user. It is a good idea, but it would need to be scaled up with more data to become more useful, which is to be expected of a prototype. Filling the database with all the information is left as future work because it is too time-consuming.

1 Introduction

1.1 Delimitations

2 Background

2.1 Graph databases and Neo4j

2.1.1 Neo4j

2.2 The data

2.2.1 Skolverket and EMIL

2.2.2 Jobtech

2.3 Preconditions and motivations

2.4 Finding a path today

2.5 Gestalt principles

2.6 Shortest path algorithms

3 Method

3.1 Example case

3.2 Design

3.3 Population of database

3.4 Forming the path

3.4.1 Collecting nodes

3.4.2 Finding shortest path

3.4.3 Creating the path

3.5 Prototype

3.6 Evaluation method

5 Conclusion

6 Future work

6.1 Improvements on the database

6.2 Improvements on the path finding

6.3 Improving collection of data

6.4 Further evaluation

1 Introduction

Education opens up endless opportunities for the individual, and an educated population is the premise for a range of benefits for a country [8]. A high percentage (43.3%) of Sweden’s population possesses a tertiary education, which ranks Sweden 13th in this category according to the OECD [1]. However, the many educational paths available to students can be overwhelming, and there often exist several ways to reach the same goal. According to Swedish and international research, a student’s choice of studies is strongly affected by a lack of information about which educations and schools are available [23]. In Sweden, students have to take their first step towards higher education at the early age of 15, when they choose a program at an upper secondary school, from here on referred to as gymnasium. This is when they stake out a path that will eventually lead to their working career. The student might know what job they want but not which choices lead to it. The student may also not know where their chosen program will lead, might regret the choice later and may have to compensate for it in the future. Changing programs at gymnasium level can add extra years before a degree is reached, which in turn can be costly.

This thesis was conducted at Ava, a company based in Uppsala that works with digital solutions for automating education and work coaching [3]. Guiding students through inventive solutions is important to the company, which is why the goal of this project is a system that helps students find their way by acting as a guide in the educational jungle. This is done by building a path finder that finds the optimal paths for a student. For example, student A has just finished compulsory school and is about to choose a program at a gymnasium. The student provides the path finder with their goal, which is to become an electrical engineer. They receive a path that shows which programs at which gymnasiums lead directly to a program at Uppsala University where the student receives a degree in electrical engineering. Student B has the same end goal but lacks the grades to enter a program that leads directly there. This student is shown the possible programs and, in addition, the complementary courses they have to take to get the shortest path (in terms of years) to becoming an electrical engineer. With the help of this system, the student should get a clear view of their options and of the optimal path or paths to the goal. Hopefully, by providing students with a clear end goal and sub-goals along the way, the system can increase motivation for education, highlight possibilities and help in choosing an effective path.

1.1 Delimitations

To keep the project within the time frame of a master’s thesis, the following limitations have been set.

• The system is only as large as needed for the path finding to yield interesting results. This includes a geographical limitation: the data covers only Uppsala and includes only a handful of schools.

• Only higher educations offered at Uppsala University are included. Higher educations from other schools are not part of the solution.

• The Swedish education system is designed with no dead ends: a student can become anything through many means. To keep the project within limits, some opportunities have been omitted while trying to preserve the “no dead ends” principle.

• Only two starting points for a student are supported: elementary school and gymnasium.

2 Background

This project covers several areas, from databases and algorithms to UI design and the data being used.

2.1 Graph databases and Neo4j

Databases hold and store the data used by an application. They come in various scales and types, such as relational, NoSQL and graph databases. This project uses a graph database; the more traditional relational database is explained first to highlight the differences between the two types. In a typical relational database (RDB), data is stored as rows in tables. Every row in a table has to describe the same type of entity, so in a table holding data about employees every row corresponds to an employee. The same applies to other entities such as movie reviews, reviewers and employers.

These databases work well for storing large amounts of data, especially homogeneous data, when the relations between entities in different tables are not of high importance. A database with one table of employees and one table of movies works well since the two do not have much to do with each other. However, if you add another table that contains employers, you suddenly need a relation between employers and employees. This link can be made by simply adding a key attribute to the tables, but if information about the relation itself is of interest, for example why or when the employee was hired, then the relation needs a table as well. Continuing on, you might want to add friends of each employee and employer, and movie reviews. All of these new tables come with relations, and subsequently you may need tables for those as well. This quickly becomes complicated and results in complex and inefficient retrieval of data where relations are concerned. Say that a developer wants to find a movie, its review, who reviewed it and an employee who is a friend of the reviewer. This requires a large JOIN operation to put together the data from the different tables, which is not good for performance [7].

Graphs are essentially vertices with connections between them called edges. Databases can be built on graphs, creating a graph database (GDB), in which the vertices and edges are called nodes and relations respectively. These databases put the relations between data in focus, eliminating many of the problems of an RDB. Here a node corresponds to a specific entry in the RDB, e.g. a specific employer, and a relation corresponds to an entry in the relation table, for example between an employer and an employee. Information such as the hiring date is therefore stored in the relation.

There are several benefits of using a graph database, but the most significant ones might be flexibility and performance. It is flexible because nodes and relations are easy to add and remove and the data does not have to be uniform, which is useful when some data is incomplete or when the project needs to scale up or change. A query on a GDB can be localized to a subgraph, so the query is not affected much as the data grows [17].

2.1.1 Neo4j

The best-known graph database according to DB-Engines is Neo4j [11]. The first prototype was developed by Emil Eifrem et al. [22] in 2000, and Neo4j version 1.0 was released in 2010. They developed it because handling non-discrete data in a relational database was too time-consuming, both in terms of performance and in terms of development time because of the complex queries [17].

Neo4j is a so-called labeled property graph, which means it has the following characteristics [17]:

• It contains nodes and relationships.

• Nodes and relationships can have properties in the form of key-value pairs. The keys are the equivalent of column names in a relational database, while the values are the values an object has under those columns.

• Nodes are labeled with one or more labels, and every relationship has a type that names it. In a relational database, a label would correspond to the name of the table to which an object belongs.

• Relationships are directed and are always anchored in a start node and an end node.
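The characteristics listed above can be illustrated with a small data-structure sketch. This is not how Neo4j stores data internally; it only mirrors the model in plain Python. The class names Node and Relationship, the relationship type EMPLOYS, and all property values are assumptions made up for this illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # A node carries one or more labels and free-form key-value properties.
    labels: set
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    # A relationship is directed: it always has a start node and an end node.
    type: str
    start: Node
    end: Node
    properties: dict = field(default_factory=dict)


# Hypothetical example data, not taken from the project's database.
employer = Node({"Employer"}, {"name": "Example Company"})
employee = Node({"Employee", "Person"}, {"name": "Alice"})

# Information about the relation itself (here a hiring date) is stored on it.
employs = Relationship("EMPLOYS", employer, employee, {"hired": "2019-03-01"})

print(employs.start.properties["name"], "-[" + employs.type + "]->",
      employs.end.properties["name"])
```

Note that the two nodes do not share the same set of labels or properties, which reflects the point that data in a labeled property graph does not have to be uniform.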

Figure 1 shows an example of what a very small Neo4j database looks like:

Figure 1 Small database consisting of Employer (purple), Employee (blue), Person (teal) and Movie (orange) nodes and how they are connected. Each node and relation has a label and one or more properties. Relations and nodes with the same label do not have to have the same properties.

Because of the way Neo4j is structured, it is much better suited to handling connected data. A test by Partner and Vukotic shows the difference between a relational database and Neo4j in an experiment where they queried for friends-of-friends at gradually increasing depth. They used a social network of 1 000 000 people where each person had 50 friends [2]. The result is shown in Figure 2:

Figure 2 The difference in performance between Neo4j and a relational database for queries that return friends-of-friends at different depths [2].

It is clear that when it comes to connected data, Neo4j is the better choice of the two. When the relational database has to find friends-of-friends at a depth of 5, the query takes too long to finish, while at that depth Neo4j has only suffered an increase of roughly 2 seconds. This is partly because Neo4j uses something called index-free adjacency. It ensures that connected nodes point directly to each other, resulting in fast traversal over connected data [17].
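The idea behind the friends-of-friends query can be sketched in a few lines of Python. This is only an illustration of traversing direct neighbour references and says nothing about Neo4j's actual implementation; the function name friends_within and the tiny example network are assumptions.

```python
def friends_within(adjacency, person, depth):
    """Return everyone reachable from person in at most depth friend hops."""
    seen = {person}
    frontier = {person}
    for _ in range(depth):
        # Each person points directly at their friends, so expanding one level
        # only touches the neighbours of the current frontier, not all data.
        frontier = {f for p in frontier for f in adjacency[p]} - seen
        seen |= frontier
    return seen - {person}


# Hypothetical network: Alice - Bob - Carol - Dave
friends = {
    "Alice": ["Bob"],
    "Bob": ["Alice", "Carol"],
    "Carol": ["Bob", "Dave"],
    "Dave": ["Carol"],
}

print(sorted(friends_within(friends, "Alice", 2)))  # ['Bob', 'Carol']
```

The work per level depends on the size of the neighbourhood being explored rather than on the total number of people stored, which is the property the text attributes to traversal over connected data.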

The Neo4j team has also developed Cypher, a query language for the graph database. It is a compact declarative language with the intent of being easy to learn for developers.

Here is an example Cypher query that fetches a friend of a person with the name ’Alice’:

    MATCH (:Person {name: 'Alice'})-[:FRIEND_OF]->(p:Person) RETURN p

As can be seen, Cypher has been designed so that its syntax closely resembles how one would draw the query on a whiteboard. To show how Cypher together with Neo4j can simplify a query, a command retrieving the names of all persons who work at the IT department is shown in both SQL and Cypher in Figure 3.

Figure 3 Difference between an SQL and Cypher query [10].

In this report, when describing the database, relations are written in upper-case snake case, like EXAMPLE_RELATION, while node labels are written in camel case with an upper-case first letter, like ExampleLabel.

2.2 The data

To build the educational guide, a lot of data about the Swedish educational system had to be gathered. Schools and counties differ in what they provide and how, and there is no universal hub that gathers all the data, so it had to be fetched from different sources. Data concerning jobs also had to be gathered to complete the path.

2.2.1 Skolverket and EMIL

Skolverket (roughly translated to Education Administration) is an administrative authority in Sweden whose mission is assigned by the government. It has several responsibilities: for example, it is responsible for the curricula, certifies teachers and, most importantly for this project, gathers statistics [24]. Most of the data in this project comes from the statistics that Skolverket has produced.

A project called SUSA (Collective Education Information Skolverket-AMS) was created in 2002 with the purpose of handling information about education. Together with an existing education model on studera.nu, developed by Högskoleverket (roughly translated to Higher Education Administration) and SUSA, a foundation was laid for what became known as the “EMIL standard” (Education Information Markup Language). EMIL was created because different parties needed to gather data about educational events, educations and different schools/providers in a uniform fashion. A hub for collecting EMIL files was placed at Skolverket, and since then several authorities have joined to continue the development of EMIL. The hub receives educational information from several sources [18], including Skolverket. An API called Susanavet lets you fetch the EMIL-standardized data from the hub [26].

Since 2009 the EMIL standard has been developed by SIS (Svenska Institutet för Standarder). Because the education sector is under constant development, the standard has to be continually updated. Although there are systems in place that depend on the standard, SIS has adopted the ambition to revise EMIL relatively often, roughly once a year. That is why the first version, unofficially called EMIL 1, differs a bit from the latest version, EMIL 2 (referred to in this report as EMIL). As it stands now the standard is complicated: in addition to things like the description of the information, rules for XML coding and a comparison with European standards, EMIL also describes the structure of the information. This structure is represented by several smaller classes and three main classes [18]:

• Education Info

This class contains information about a specific course, program or course collection.

• Education Provider

This class contains information about an organisation that provides education, for example a school or university.

• Education Event

This class contains information about the ‘when’, ‘where’ and ‘how’ of an education. It refers to both of the other main classes for this information while also contributing with its own data.

The smaller classes are contained as attributes of the three main classes. Examples are Eligibility, which states the requirements that must be fulfilled in order to begin an education, and TimeLength, which states how long an education is.
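To make the relationship between the three main classes and the smaller classes more concrete, the structure could be modelled roughly as below. This is a simplified sketch and not the EMIL schema: only the class names come from the description above, while every field name and the placeholder values are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Eligibility:
    # Smaller class: requirements for beginning an education (field is assumed).
    requirements: list


@dataclass
class TimeLength:
    # Smaller class: how long an education is (unit is assumed to be years).
    years: float


@dataclass
class EducationInfo:
    # Main class: a specific course, program or course collection.
    title: str
    eligibility: Eligibility
    time_length: TimeLength


@dataclass
class EducationProvider:
    # Main class: an organisation that provides education.
    name: str


@dataclass
class EducationEvent:
    # Main class: the 'when', 'where' and 'how'; it refers to the other two.
    info: EducationInfo
    provider: EducationProvider
    town: str
    start: str


# Placeholder values only; they do not come from an actual EMIL file.
info = EducationInfo(
    "Master's Programme in Computer and Information Engineering",
    Eligibility(["placeholder requirement"]),
    TimeLength(5.0),
)
event = EducationEvent(info, EducationProvider("Uppsala University"),
                       "Uppsala", "placeholder start date")

print(event.info.title, "at", event.provider.name)
```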

Susanavet exposes the EMIL files that it receives through an API, from which much, but not all, of the data for this project has been gathered. Some data is not relevant for Susanavet and is available through other avenues. For example, Skolverket has another API just for elementary schools and gymnasiums. Information from Skolverket has been used by Utbildningsinfo to create Behörighetsvisaren (roughly translated to Eligibility compass) [29], a graphic illustration of which type of gymnasium program grants which eligibility for further education. Additional information about programs at gymnasium level has been found on Utbildningsguiden (Educational Guide) [28].

2.2.2 Jobtech

Jobtech is developed by another governmental authority, Arbetsförmedlingen (the Swedish Public Employment Service), with the purpose of supporting parties that want to create innovative solutions for the job market. Just like Skolverket, Jobtech has an API for accessing its data. There are many kinds of data, but the important one for this report is found through the taxonomy API [19].

They have structured occupations into three layers: group, field and name. The categories become smaller as you work down the tree. The highest layer, group, contains many occupations that very broadly belong to the same category, with names such as “Construction” and “Health care”. Going down the tree to the field layer, the occupations become more specific, and it is from this layer that data has been used for the project. The next and last layer, name, is the most specific and has the smallest categories. If the field occupation is “Doctor”, then its child nodes in the name layer include specialized types of doctors.

2.3 Preconditions and motivations

As mentioned in the introduction, students have many choices when it comes to education, and these choices are influenced in different ways. Research by sociologist Pierre Bourdieu suggests a view in which the individual possesses different forms of capital [6]. These forms of capital can affect a student’s educational choice in different ways. The forms most relevant to this project are social capital and economic capital.

A student has higher social capital when they can draw information about education from their family and friends. They can use this capital to make better choices about what to study and which path to take to achieve their educational goal.

Economic capital concerns the possessions a student has but also their knowledge of the economic system. From an educational perspective this could be knowledge of whether an education will pay off in the end. Social capital can lead to information about this, which in turn increases a student’s economic capital. These forms of capital do not only apply to students and education but are more generally a means for an individual to climb in society; this report, however, is about education and focuses on that aspect.

In computer science, a common approach to solving problems is called divide and conquer. With this method a large problem is divided into smaller subproblems, which helps when the task seems daunting. A relevant example is wanting to become a doctor while being at the stage of choosing a gymnasium. Partitioning the steps ahead might make the path seem more achievable.

The system developed in this project could help a student compensate for a lack of these forms of capital by showing which paths they can take and how much they can earn at the end point of their path. The division into sub-goals will hopefully achieve the same effect as divide and conquer.

2.4 Finding a path today

As implied in the introduction, finding the right education can be complicated because the information is spread over several places. A student first has to know which gymnasium programs they need to attend in order to be eligible for their desired university education. They can get that information from the previously mentioned tool Behörighetsvisaren. Once they know which programs they can choose and have decided on a few, they need to navigate to a web page that shows what is on offer in their town. In Uppsala that would be a solution called Dexter, but it is also possible through other avenues. If the student also wants to know the admission statistics, they can find the statistics site for each respective county on gymnasium.se [15]. In this example the student has had to collect information from three different sources; in other counties it might take more or fewer steps. Either way, it can be hard for a student to get a clear view of their options and possibilities.

The information in the paths that this system produces can be found in several places, but no single party provides all the information for a complete path. The work most closely related to this project is probably a feature on gymnasium.se. It is a tool for searching for a job that returns the type of program and higher education required as well as demand and salary, which is precisely what a path in this system includes. The same site also offers the option to search for a gymnasium program instead and receive examples of occupations that fit that program. This system contributes by collecting information from the mentioned parties and presenting it in one place. Unlike the tool on gymnasium.se, this system is able to show which gymnasiums and institutions provide the necessary educations and to filter out gymnasiums based on grades.

2.5 Gestalt principles

Gestalt principles were introduced by Wertheimer in the early 1920s. They describe the way people observe elements and how they group them together. Since their inception the gestalt principles have been further developed by psychologists such as Kurt Koffka, Wolfgang Köhler and Wolfgang Metzger [5]. Today the gestalt principles have been adapted for use in interaction design, and they have been used in this project. There are many different principles, sometimes called laws, that are applicable in interaction design. Some of them are proximity, similarity, figure-ground, symmetry, closure and continuity. A short summary of each, together with how it can be used in design, is given below.

• Law of proximity. When there are several objects on the screen, the law of proximity says that objects close to each other are perceived as forming a group [25]. This can be used by grouping related form elements together or by keeping a body of text close to its headline.

• Law of similarity. Instead of forming groups by proximity, objects can be perceived as belonging to the same group by being similar to each other, for example by sharing the same color or shape [25]. Following up on the form example, the submit buttons of different forms could be designed in the same way so that the user knows what to expect from them.

• Law of figure-ground. Although this law is perhaps the least relevant for the project, it is interesting nonetheless. A picture can be divided into two basic parts, figure and ground. The figure is the part that the perceiver focuses on, and the ground is experienced as the background of the picture. In some pictures the two parts can switch places depending on what is focused on, so that the figure becomes the ground and vice versa, which results in another interpretation of the picture. A common example is the Rubin vase [25]. In design this can be used so that when the reader focuses on the important elements, these become the figure and the rest recedes into the background.

• Law of symmetry. People tend to perceive objects as symmetrical shapes around their centre. In other words, when two symmetrical objects are observed, they are often grouped together as one object that is symmetrical around the mid-point between the two objects [25].

• Law of closure. Humans are able to “fill in the gaps” when looking at incomplete figures in order to close them. For example, when looking at a dotted circle, an observer easily sees the circle as if it were drawn with a complete line, making the figure “complete” instead of seeing a few isolated dots [27]. This principle can be used when creating logos: reducing the number of elements in the logo can make it more elegant and engaging.

• Law of continuity. Lines and curves tend to lead the eye of the viewer along. Elements placed in a line one after another seem related. In design this can be used, for example, when presenting related products: the observer naturally moves their eyes from product to product and understands that they are related [5] [9].

These laws, or principles, have been used in this project mainly for creating a prototype for evaluation.

2.6 Shortest path algorithms

When traversing a graph there are different approaches. Two widely known ones are depth-first search (DFS) and breadth-first search (BFS). DFS searches the graph by traversing as deep as it can rather than exploring every node at the same level. When the search reaches a leaf node, it backtracks until it finds a path that has not yet been explored. BFS, on the other hand, does not go deeper until it has explored all neighbours on the same level. They have different advantages and disadvantages, but BFS is more commonly used as the basis for shortest path algorithms [21].
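The difference in visiting order between the two approaches can be seen in a minimal Python sketch. The function names and the small example graph are assumptions made for illustration and are not part of the project's code.

```python
from collections import deque


def bfs_order(graph, start):
    """Visit nodes level by level, using a first-in-first-out queue."""
    visited = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.append(neighbour)
                queue.append(neighbour)
    return visited


def dfs_order(graph, start):
    """Go as deep as possible first, using a last-in-first-out stack."""
    visited = []
    stack = [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.append(node)
        # Reversed so that neighbours are explored in their listed order.
        stack.extend(reversed(graph[node]))
    return visited


# Hypothetical tree: A has children B and C, B has D and E, C has F.
tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"],
        "D": [], "E": [], "F": []}

print(bfs_order(tree, "A"))  # ['A', 'B', 'C', 'D', 'E', 'F']
print(dfs_order(tree, "A"))  # ['A', 'B', 'D', 'E', 'C', 'F']
```

BFS reaches every node using the fewest possible edges from the start node, which is why it is a natural starting point for shortest path algorithms.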

Dijkstra’s algorithm, created by Edsger W. Dijkstra [12], is based on BFS and can be used to find the shortest path in a positively weighted graph, either between a source node and a goal node or between a source node and all other nodes. The algorithm works as described in the following steps:

1. Collect all nodes of the graph in a list. Every entry in this list will contain information of the total weighted distance to a node from the start node.

2. Set the distance to 0 on the start node and to INFINITY on all other nodes.

3. From the list, choose the node with the smallest distance that has not already been visited (it will be the start node in the first iteration). This will be node n. Mark n as visited.

4. Pick a neighbour, nb, of n. If the distance of nb in the list is larger than the distance of n plus the cost of traversing from n to nb in the graph, update the distance of nb in the list to the distance of n + the cost of n → nb. Repeat this for all neighbours of n.

5. Break condition 1 - Repeat steps 3 and 4 until node n is the goal node. Then the algorithm can terminate, and the smallest weighted distance to the goal is in its entry in the list.

6. Break condition 2 - Repeat steps 3 and 4 until every node in the list is marked visited. The smallest weighted distance to every node from the start node will be in the list.

Depending on the purpose, break condition 1 or 2 can be used. Condition 1 finds the weighted distance to the goal node only, while condition 2 finds the shortest weighted distance to all nodes. The time complexity of Dijkstra’s algorithm is O(n²) but can be reduced to O((e + n) log n) (where n is the number of nodes and e is the number of edges) by storing the node list in a binary heap [4]. The algorithm as described finds the smallest weighted distance from the start node; to also find the path that results in said distance, additional information needs to be recorded along the way. In step 4 above, when updating the distance, also store the current node (n) on nb as a previous node attribute. This way it is known which node every node was reached from. The path can then be created by going backwards from the goal node to the start node using the previous node attribute. Below is a series of illustrations of Dijkstra’s algorithm applied to a simple graph where A is the start node and G is the goal node.

Step 1 and 2. The nodes have been put into a list where the distance is set to infinity on all nodes except A, the start node.

Step 3. A has been chosen as node n because it has the smallest distance and is not visited. It is then marked visited, indicated by the color green.

Step 4. The neighbours of A are B and C. Each of them has distance infinity, so their distances are updated. A is stored in their Previous attribute so that it is known they were reached from node A.

Steps 3 and 4 have been repeated a couple of times. Now D has been selected as node n and is about to update its neighbours. Note that E has already been updated once by being a neighbour of B. The distance of D is 5 and the weight to get to E is 1, which adds up to 6. The value already stored at E is 7, so the distance is updated to 6 and the Previous attribute is changed from B to D.

The goal node has been selected as node n, so break condition 1 is satisfied (break condition 2 is also satisfied since there are no more unvisited nodes) and the algorithm is complete.

The shortest path from A to G can be obtained by “going backwards” from node G to A by looking at the Previous attribute. The shortest path is A - C - D - F - G
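The procedure above, including the Previous attribute, can be sketched in Python as follows. This is a minimal sketch and not the implementation used in the project: it keeps the candidate nodes in a binary heap (Python's heapq module) instead of scanning a list, and it stops at break condition 1. The edge weights in the example graph are assumptions, chosen only to be consistent with the values mentioned in the walkthrough (D at distance 5, E improved from 7 to 6, shortest path A - C - D - F - G), since the original figures are not reproduced here.

```python
import heapq


def dijkstra(graph, start, goal):
    """Return (distance, path) for the cheapest route from start to goal."""
    distance = {node: float("inf") for node in graph}  # steps 1 and 2
    distance[start] = 0
    previous = {}
    visited = set()
    queue = [(0, start)]
    while queue:
        dist_n, n = heapq.heappop(queue)  # step 3: smallest unvisited node
        if n in visited:
            continue  # outdated heap entry for an already visited node
        visited.add(n)
        if n == goal:  # break condition 1
            break
        for nb, cost in graph[n].items():  # step 4: update the neighbours
            candidate = dist_n + cost
            if candidate < distance[nb]:
                distance[nb] = candidate
                previous[nb] = n
                heapq.heappush(queue, (candidate, nb))
    if distance[goal] == float("inf"):
        return distance[goal], []  # goal is unreachable
    path = [goal]
    while path[-1] != start:  # walk backwards along the Previous attribute
        path.append(previous[path[-1]])
    path.reverse()
    return distance[goal], path


# Assumed undirected example graph (weights are hypothetical, see above).
EDGES = [("A", "B", 4), ("A", "C", 2), ("C", "D", 3), ("B", "E", 3),
         ("D", "E", 1), ("D", "F", 2), ("F", "G", 1), ("E", "G", 5)]
example_graph = {}
for u, v, weight in EDGES:
    example_graph.setdefault(u, {})[v] = weight
    example_graph.setdefault(v, {})[u] = weight

print(dijkstra(example_graph, "A", "G"))  # (8, ['A', 'C', 'D', 'F', 'G'])
```

With these assumed weights, E is first reached through B at distance 7 and later improved to 6 through D, matching the illustrated step.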

3 Method

The method is divided into three main parts: designing the database, gathering data and building a traversal algorithm. These three parts were carried out in an agile fashion, meaning they were revised many times throughout the project.

The system supports three different types of goals that a student can choose:

1. The student can choose an education as their goal, thus receiving the path to a degree in the field.

2. The student can search for a profession. This works as an extension of the education goal, with the target job as the final step in the path. It also includes similar professions as well as the average salary and an indicator of the demand for the profession.

3. The last type of goal is for a student who does not know exactly what they want to become or who just wants to explore their options. This type of goal allows the student to search within a group of occupations and receive paths for a few occupations belonging to the group. The output is exactly the same as when a student searches for one specific job, but there are several of them. The paths that the student receives lead to jobs within the chosen group, sorted based on a priority option that the student supplies. The priority options are

• Time, the student wants to reach the occupations that have the shortest paths, i.e. that take the least amount of time to reach.

• Salary, the student is looking for the occupations that yield the highest salary.

• Demand, the student is looking for the occupation that is in highest demand.
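Sorting the returned paths by the chosen priority option could look roughly like the sketch below. The class CandidatePath, the function rank_paths, the numeric demand indicator and all example values are hypothetical and only illustrate the idea; they are not taken from the system.

```python
from dataclasses import dataclass


@dataclass
class CandidatePath:
    occupation: str
    years: float   # estimated time to reach the occupation
    salary: int    # average salary (unit assumed)
    demand: int    # demand indicator, higher means more demand (assumed scale)


def rank_paths(paths, priority):
    """Sort candidate paths according to the student's priority option."""
    if priority == "time":
        return sorted(paths, key=lambda p: p.years)  # shortest first
    if priority == "salary":
        return sorted(paths, key=lambda p: p.salary, reverse=True)
    if priority == "demand":
        return sorted(paths, key=lambda p: p.demand, reverse=True)
    raise ValueError("unknown priority: " + priority)


# Placeholder data, made up for the example.
candidates = [
    CandidatePath("Occupation A", 8.0, 40000, 2),
    CandidatePath("Occupation B", 6.5, 35000, 5),
    CandidatePath("Occupation C", 9.0, 52000, 3),
]

print([p.occupation for p in rank_paths(candidates, "time")])
```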

3.1 Example case

Here is an example case presented to showcase some of the system’s functionality.

Pelle is 15 years old and is about to graduate from elementary school with grades amounting to 123 points. He lives in Uppsala and wants to study to become an IT engineer, which means he has chosen the second type of goal described in the previous section: to search for a specific profession. He provides the system with his goal (IT engineer), his starting position (finished elementary school), his grades (123) and town (Uppsala). The system returns:


• Pelle should attend the economy program with focus on economy at any of three gymnasiums: Lundellska skolan, Ansgargymnasiet or Fyrisskolan.

• When he has completed the program he should complement it with some courses: Chemistry 1, Math 3 and 4 and Physics 1 and 2.

• Now he can apply for “Master’s Programme in Computer and Information Engineering” at Uppsala University to become an IT engineer.

The estimated total time to reach the IT engineering program is 3.8125 years, which is 3 years, 9 months and 23 days. That includes Pelle's gymnasium program as well as some complementary courses that he needs to add after graduation because his grades are not sufficient for attending a gymnasium program which includes all necessary courses.

Then the system will tell him that the master's program will take 5 more years to complete, landing him at a total of 8.8125 years from elementary school to receiving a master's degree in IT engineering. Of course this is an estimate and assumes that all complementary courses can be taken at any time of the year. He might also find a way to complete courses faster, which will lower the time taken. In addition to the time it takes to finish the path and what steps are on the way, Pelle also gets to know what the average salary is for an IT engineer, the demand for it and a few related professions.

3.2 Design

When creating the database design there are two important things to consider. First, keep it simple by reusing nodes from different sources and second, make it easy for the algorithm to traverse.

Courses, schools/institutions and programs are considered to be nodes, and information about requirements for these nodes is kept in relations.

The graph database was built in an iterative fashion as more and more functionality was added. It was built to mimic how a student would venture through the educational system in the real world. This both simplifies the implementation of the algorithm and makes the graph structure more intuitive. The design implementation is shown in figures 4, 5, 6 and 7, and the reasoning behind the design choices follows.

The path finder's absolute starting point is when the student graduates from elementary school. The starting point is represented in the database as a node labeled GrundSkola (Elementary school). This node is connected to several Town nodes, each representing a Swedish town. To include the gymnasiums of a town, the Town nodes are connected to nodes with the label Gymnasium. Each of these Gymnasium nodes is in turn connected to Program nodes representing the respective programs that each gymnasium offers, yielding the following structure:

Figure 4 A small version of the design so far. The pink node is Grundskola, the brown are Town nodes, the blue are Gymnasium nodes and the green are Program nodes.

The relationship between Gymnasium nodes and Program nodes is called HAS PROGRAM and it contains statistics about the required admission points to apply for the program.

As the design stands now the student can choose a town, a gymnasium and a program and receive information about what grades are required.

To qualify for higher education in Sweden a student must have completed certain courses as well as a program at gymnasium level. The courses vary depending on what education the student pursues and the same list of courses can qualify the student for multiple educations. This list of courses is abstracted into something called Qualification areas.

There are several different qualification areas and they are named 'AX', where 'X' is a number in the range 1-15. An example is 'A1'. If a student is eligible for 'A1' it means that they have completed a gymnasium program and the courses Chemistry 1, Physics 2 and Mathematics 4, and are then qualified for a lawyer education leading to a lawyer degree. 'A6' is a special case because it is the qualification for teaching educations. It is separated by level of education into 'A6a', 'A6b' and 'A6c', where the levels are pre-school, elementary school and high school/gymnasium respectively. 'A6c' is in turn split into 15 different areas depending on the teaching subject. This means that there are a total of 32 different qualification areas. A folder structure is shown below for clarity.

A1 - e.g. Lawyer
A2 - Modern languages
...
A6a - Pre-school teacher
A6b - Lower and middle elementary school teacher
A6c - Upper elementary school and gymnasium teacher
    Subject 1
    Subject 2
    ...
    Subject 15
A7 - e.g. Marine engineer
...
A15 - e.g. Physiotherapist

So the next step in the graph is to add these qualification areas as nodes. A gymnasium program can lead to several different qualification areas and the qualification areas can be reached by different programs. In other words, there is a many-to-many relation between Program and Qualification area nodes.

As mentioned earlier, qualification areas represent which courses are needed to attend an education in order to receive a specific degree. Therefore the Qualification area nodes have relations pointing to Degree nodes representing all degrees that the area unlocks. To achieve a degree one has to complete an education, hence the Degree nodes point to the Education nodes that are required. Where this education is taught is made known by adding a relation from it to a University node. Just as with the HAS PROGRAM relation, the relation between Education and University nodes contains attendee statistics as a property. The version of the graph so far is shown in fig 5.


Figure 5 Continuing from fig 4, the Program nodes lead to two Qualification area nodes (yellow). The chain of relations continues with Degree nodes (pink) and Education nodes (dark green) and ends up in a University node (teal).

As the graph is now, a student can go from elementary school all the way to an education at a university. However, there is no way to set the course straight for a student who picked the “wrong” gymnasium program, or for a student who wants to attend a program that does not lead to their goal. That is why, instead of just a STRONG relation between Program and Qualification area nodes, there should also be a WEAK one. The WEAK relation means that the program will not automatically lead to the qualification area by just attending the obligatory courses of the program; the student has to, within the frames of the program, actively choose some courses. The courses needed are stored in the WEAK relation as a property. This solves the problem for some programs, but what if the student has already completed the “wrong” program or chose a program in which there is not enough time within the time span of a gymnasium to choose the right courses? Then the student can of course complement with the required courses.

To solve this, a new type of node called Course is added to the graph. Course nodes have two types of relations directed from them: UNLOCKS and QUALIFIES PARTLY.

UNLOCKS is between courses. For example, the Course node named Physics 1 will have an UNLOCKS relation directed to the Course node named Physics 2, because you need to have completed the former before you can start on the latter. QUALIFIES PARTLY is between a Course node and a Qualification area node. It says “PARTLY” because there are several courses with relations directed to the Qualification area node, and only when a student has completed every one of them do they meet the qualification.


There were two ways to design this. Either the program points to all courses that need to be completed post-program to achieve the right qualification, or the program just points to the Course nodes that can be completed immediately afterwards and those in turn point onward to the unlocked Course nodes, which lead to the desired Qualification area node. To follow the reasoning of mimicking a real-life path, the latter choice was made. The result of adding courses to the graph is shown in fig 6.

Figure 6 There are different ways to reach qualification area 'A9'. Some Program nodes (green) have a WEAK relation, meaning they can reach 'A9' within the space of the gymnasium program but the student has to actively choose some courses. These courses are in the relation properties. Other programs can reach 'A9' by completing courses (red) after graduation. Some courses are prerequisites for other courses, and all courses with a QUALIFIES PARTLY relation to 'A9' have to be completed in order to reach that Qualification area node.

So far a student can achieve a degree either directly through a gymnasium program or by complementing with additional courses, so what remains is a job that an education can point to. First, though, the graph has to be changed because not all jobs require a higher education; in fact, a few programs at the gymnasium are designed to prepare the student for work once they have received their degree. To solve this, an additional label was added to the Program node depending on the type of program: Practical, if the program leads to a job, i.e. is vocational, and Theoretical, if the program leads to higher education.

The Job nodes have a group id property with the purpose of grouping up similar occupations. For example the Job node ’Nurse’ has the same group id as the Job node ’Doctor’.


The reason for this implementation is that a student might want to work within the medical area and can then find a path that suits the student best and still end up with a job that is within the medical area. As mentioned the student can select such a group and receive a path to a few jobs ordered by a priority that can be either time, salary or demand.

While a property for demand is stored in the Job nodes, the property for salary is stored in the relation connected to the node, CAN BECOME. The reason for this choice is that an occupation can yield different salary depending on the education of the employee.

This means that depending on the priority a job might be reached through a longer path because that will result in a higher salary in the end.

The graph now looks like in fig 7.

Figure 7 A schema of the graph database. Notice how the Program nodes (green) have been split into two different types: Practical and Theoretical.

Note that there are several ways to design the Program and Gymnasium nodes. For example, the Gymnasium nodes could be removed and the Program nodes would instead have the gymnasiums as properties, or vice versa: the Program nodes could be removed and put into the Gymnasium nodes as properties. The main reason why this was not done is simplicity. The Swedish gymnasium programs all share the same obligatory courses and hence the Gymnasium nodes can have relations to the same Program nodes in the graph. This greatly decreases the number of nodes in the database since there is no need for a unique node for each Gymnasium/Program pair, making the graph much simpler and therefore more efficient to traverse and easier to manage.
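To make the design concrete, a tiny in-memory version of part of the schema is sketched below. It is not the Neo4j implementation used in the project. The underscore spelling of the relation types, the type names IN_TOWN and HAS_GYMNASIUM, the property name admission_points and all values are assumptions made for this illustration.

```python
# Node name -> label (a small, partly invented sample).
nodes = {
    "Grundskola": "GrundSkola",
    "Uppsala": "Town",
    "Lundellska skolan": "Gymnasium",
    "Fyrisskolan": "Gymnasium",
    "Economy": "Program",
    "Physics 1": "Course",
    "Physics 2": "Course",
    "A9": "QualificationArea",
}

# (start node, relation type, end node, properties)
relations = [
    ("Grundskola", "IN_TOWN", "Uppsala", {}),               # hypothetical type
    ("Uppsala", "HAS_GYMNASIUM", "Lundellska skolan", {}),  # hypothetical type
    ("Uppsala", "HAS_GYMNASIUM", "Fyrisskolan", {}),
    # Requirements live on the relation, not on the node (made-up points).
    ("Lundellska skolan", "HAS_PROGRAM", "Economy", {"admission_points": 120}),
    ("Fyrisskolan", "HAS_PROGRAM", "Economy", {"admission_points": 110}),
    ("Economy", "COMPLEMENT", "Physics 1", {}),
    ("Physics 1", "UNLOCKS", "Physics 2", {}),
    ("Physics 2", "QUALIFIES_PARTLY", "A9", {}),
]


def incoming(node, relation_type):
    """Return the start nodes of all relations of a given type into node."""
    return [s for s, t, e, _ in relations if e == node and t == relation_type]


program_count = sum(1 for label in nodes.values() if label == "Program")
offered_by = incoming("Economy", "HAS_PROGRAM")
print(program_count, offered_by)  # 1 ['Lundellska skolan', 'Fyrisskolan']
```

Both gymnasiums point to the same Program node, while the admission points differ per relation, which is the reason given above for keeping Gymnasium and Program as separate nodes.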

3.3 Population of database

The data has been gathered from different sources, most of it from Skolverket directly or indirectly. Almost all of the data has been preprocessed and altered before being inserted into the database. Here is how each segment of data has been gathered:

• Gymnasiums. All gymnasiums in Uppsala, their respective programs and admission points were gathered from an API from Skolverket. The programs were later removed and added manually from the web page of each gymnasium, but the admission points were transferred to the new programs.

• Qualification areas. The areas could not be found in any of the APIs, and where they were included in Susanavet an older system was used. Because of this the qualification areas were gathered from both a table produced by antagning.se and through Behörighetsvisaren.

• Courses. Behörighetsvisaren has been used for this data too, mostly to insert the courses into the database and to create relations between them and the right Program and Qualification area nodes. Skolverket has been used to get the points that each course is worth. From a scalability perspective, both courses and qualification areas are data that hold universally for all of Sweden.

• Exam and Educations. Both of these could be fetched directly from Susanavet.

• Job. The jobs were taken from an API on Jobtech:dev but they were also fetched manually. This was mostly because only a few occupations from the same group were needed for this project.

3.4 Forming the path

The algorithm behind the path finder is Dijkstra's algorithm, which is explained in section 2.6. To distinguish between the nodes in Dijkstra's algorithm and the nodes in the graph database, the word “element” will be used to indicate a node in Dijkstra's algorithm and the word “node” will be used when talking about the graph database entities. Furthermore, here are some assumptions that have been made, which mostly affect the system time-wise:

• When taking complementary courses you can start them at any time of the year. The reason for this is that it differs depending on where you live, and some schools even offer distance courses that can be taken at any time.

• It is not specified where the courses are taken but it is assumed that they are offered at Komvux (a school for adults).

• If courses have to be completed to get from a Program node to a Qualification area node, it is assumed that these courses are taken AFTER the student has finished the gymnasium program.

3.4.1 Collecting nodes

To begin traversing the graph using Dijkstra's algorithm, the nodes first have to be collected as elements into a list. This includes all types of nodes except Course nodes, for reasons that are explained in section 3.4.2. While most types of nodes are considered, far from all nodes are added as elements to the initial list. The reason is that a good way of optimising the path finder algorithm is to reduce the number of elements that have to be visited. This is done by using the input information provided by the student and iteratively finding nodes by proceeding from previous elements. This way the elements are actually chosen from a sub-graph much smaller than the original graph, resulting in a small initial list. The nodes are gathered in the following order:

1. The student provides which town they are in. Only that town node is selected.

2. Only gymnasiums connected to the town are gathered.

3. When selecting the programs there are two limiting factors. First, only programs that are connected to the already selected Gymnasium nodes are chosen. Second, the student provides the system with their grade, and therefore programs at gymnasiums which require a higher grade will not be selected.

4. All Qualification area nodes are selected since every theoretical Program node can reach each Qualification area node, even though some paths have to detour through some Course nodes.


5. Nodes from here on can be gathered by looking at the end goal. Only nodes that can reach the goal node will be selected. This includes Exam, Education and Job nodes. However, depending on the type of goal, Job nodes may not even be necessary and can be omitted from the collection.

Once all elements are in place in the initial list the path finding can start.
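Steps 1 to 3 of the collection can be sketched as below. The data structures stand in for queries against the graph database; their names and the admission points are made up for this illustration, and steps 4 and 5 are left out.

```python
# Town -> gymnasiums connected to it.
gymnasiums_in_town = {"Uppsala": ["Lundellska skolan", "Fyrisskolan"]}

# HAS PROGRAM relations with made-up admission points.
has_program = {
    ("Lundellska skolan", "Natural science"): 250,
    ("Lundellska skolan", "Economy"): 120,
    ("Fyrisskolan", "Economy"): 110,
    ("Fyrisskolan", "Technology"): 180,
}


def collect_elements(town, merit_points):
    """Collect the Town, Gymnasium and Program elements for one student."""
    elements = [town]                              # step 1: only the given town
    gymnasiums = gymnasiums_in_town.get(town, [])
    elements.extend(gymnasiums)                    # step 2: its gymnasiums
    for (gymnasium, program), required in has_program.items():
        # Step 3: the program must be offered by a selected gymnasium
        # and the student's grade must be sufficient.
        if (gymnasium in gymnasiums
                and required <= merit_points
                and program not in elements):
            elements.append(program)
    return elements


elements = collect_elements("Uppsala", 123)
print(elements)
# ['Uppsala', 'Lundellska skolan', 'Fyrisskolan', 'Economy']
```

With 123 points only the economy program is within reach in this made-up data, so the initial list stays small.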

3.4.2 Finding shortest path

Just as with the explanation in section 2.6 the algorithm starts off with a list of all nodes, in this case the list with all the relevant elements as explained in section 3.4.1.

The element representing the starting position is assigned the distance value of 0 and the rest are set to infinity. From here it works similarly to how Dijkstra's algorithm is explained in section 2.6, with a few alterations:

The weight in the distance value is the time taken to complete a node. For example, the weight between a Gymnasium node and a Program node is the time it would take to complete the gymnasium program corresponding to the Program node. This means that the weight between nodes that do not take time to complete can be 0, for example between Town and Gymnasium. When considering the neighbours of a Program node, the system skips any Course nodes between Program and Qualification area. Apart from simplicity, the main reason for this is that Dijkstra's algorithm will not take several paths into account. Take for example the courses between the Program node and the Qualification area node in fig 6. There are three COMPLEMENT relations from the Program node, but Dijkstra's algorithm would only choose the fastest one, resulting in an incomplete final path. That is why the decision was made to abstract away the Course nodes and to handle them at the next step, when the weights of nodes are being updated.

Unlike Program and Education nodes, the courses do not have any time property. Instead they have a property called points. A student who is studying a course at Komvux at a 100% pace completes 20 points per week. This means that a 100-point course, which is the size of most courses, takes five weeks to finish. Furthermore, a year is divided into two semesters of 20 weeks each. When calculating the weight between a Program node and a Qualification area node, the points of the courses in between are summed up and converted into the number of years they amount to.
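The conversion from course points to a weight in years can be sketched as follows. The function and constant names are hypothetical, and the three courses are placeholders of 100 points each.

```python
POINTS_PER_WEEK = 20     # full-time pace at Komvux
WEEKS_PER_SEMESTER = 20
SEMESTERS_PER_YEAR = 2


def points_to_years(points):
    """Convert course points into study time in years."""
    weeks = points / POINTS_PER_WEEK
    return weeks / (WEEKS_PER_SEMESTER * SEMESTERS_PER_YEAR)


def complement_weight(course_points):
    """Weight between a Program node and a Qualification area node:
    the summed points of all courses in between, expressed in years."""
    return points_to_years(sum(course_points.values()))


# Placeholder courses, 100 points each (the most common size).
courses = {"Course 1": 100, "Course 2": 100, "Course 3": 100}

print(points_to_years(100))        # 0.125 (five weeks)
print(complement_weight(courses))  # 0.375
```

Under these rules one year corresponds to 800 points, so a fractional part such as the 0.8125 years in the example of section 3.1 would correspond to 650 points of complementary courses. The source does not state the points of those courses, so this is only an inference from the arithmetic.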

Every element has a previous element list that keeps track of which elements precede it. This list is replaced for an element every time a jump to it results in a distance update. If the new distance is not less than the current one, which is the prerequisite for updating the distance, but is instead the same as the current one, the new element is simply appended to the previous element list.


For example, say that element G has a distance value of 10 and that its previous element list contains element C. This means that the path from the start element, S, to G goes through C and results in a total distance of 10. In a later iteration of Dijkstra's algorithm, the algorithm has landed on element B. When looking through B's neighbours, element G is found and the jump to element G results in a total distance of 8. Since the path to G through B gives a smaller distance than the path through C, element C is removed from G's previous element list and B takes its place. If, however, going through element D results in a path of the same length (equal total distance), element D is appended to G's previous element list. This is illustrated in fig 8.

(a) The path to G from S through C. Other paths have not been explored.

(b) The path to G from S. A shorter path through B has been found.

(c) The path to G from S. An equally long path has been found through D. D has been added to the list of previous nodes together with B.

Figure 8 When there exist several shortest paths, a node can have more than one "previous node" leading up to it. G in this case has two: B and D.
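The modification with a list of previous elements can be sketched as follows. The graph mirrors the scenario of fig 8 (G is first reached through C, then through a shorter path via B, then through an equally short path via D), but the weights are assumptions chosen to reproduce that order of events.

```python
import heapq


def dijkstra_all_previous(graph, start):
    """Dijkstra's algorithm that keeps every equally short predecessor."""
    distance = {element: float("inf") for element in graph}
    previous = {element: [] for element in graph}
    distance[start] = 0
    queue = [(0, start)]
    visited = set()
    while queue:
        dist, element = heapq.heappop(queue)
        if element in visited:
            continue
        visited.add(element)
        for neighbour, weight in graph[element].items():
            candidate = dist + weight
            if candidate < distance[neighbour]:
                # Shorter path found: replace the previous element list.
                distance[neighbour] = candidate
                previous[neighbour] = [element]
                heapq.heappush(queue, (candidate, neighbour))
            elif (candidate == distance[neighbour]
                    and element not in previous[neighbour]):
                # Equally short path found: append to the list.
                previous[neighbour].append(element)
    return distance, previous


# Hypothetical weights reproducing the scenario of fig 8.
graph = {
    "S": {"A": 2, "C": 1, "D": 6},
    "A": {"B": 3},
    "B": {"G": 3},
    "C": {"G": 9},
    "D": {"G": 2},
    "G": {},
}

distance, previous = dijkstra_all_previous(graph, "S")
print(distance["G"], previous["G"])  # 8 ['B', 'D']
```

The sketch compares distances with `==`, which is safe for the integer weights used here. With fractional weights in years, a tolerance-based comparison such as `math.isclose` may be the more robust choice.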


3.4.3 Creating the path

When the list of relevant elements has been traversed and the shortest time to reach the goal has been calculated all that remains is extracting the actual paths leading there.

Several paths can result in the same time in the end which is why there can be more than one path.

The paths are created by going backward from the goal element, following the preceding elements. Section 2.6 illustrates how this is done when there is only one previous node; the only difference in this case is that each node can have several previous elements.

Following the scenario in fig 8c yields two paths. G's previous node list contains B and D. Starting with node B: although it is not shown in the figure, B also has a previous node list, which consists only of A. In the same way A only has S as a previous node, so the first path is [S,A,B,G]. The second path is built by first adding D to the path and then prepending D's only previous node (S), resulting in the path [S,D,G].

Once the paths have been produced, the last step is to add the surrounding information about each node. That means adding how much study time in years each step takes, what courses need to be taken (if any), what exam is achieved upon completion of an education, and the salary and demand of a job as well as similar jobs.
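The backward walk with several previous elements can be sketched recursively. The function name is hypothetical, and the previous element lists are those of the fig 8c scenario described above.

```python
def build_paths(previous, start, goal):
    """Return every path from start to goal by following the
    previous element lists backwards from the goal."""
    if goal == start:
        return [[start]]
    paths = []
    for element in previous[goal]:
        for path in build_paths(previous, start, element):
            paths.append(path + [goal])
    return paths


# Previous element lists from the scenario in fig 8c.
previous = {
    "S": [],
    "A": ["S"],
    "B": ["A"],
    "D": ["S"],
    "G": ["B", "D"],
}

paths = build_paths(previous, "S", "G")
print(paths)  # [['S', 'A', 'B', 'G'], ['S', 'D', 'G']]
```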

3.5 Prototype

A prototype of the system was created for evaluation purposes. The purpose of the prototype is to create an interface for the tester in which they can interact with the system as well as get a clear view of the result. To achieve this goal a Flask server was created to host a web page where the testers can access the functionality of the system.

Flask is a micro-framework and suits this purpose because it provides only what is needed and nothing more [13], which allows for a rather simple and small application, which is what this prototype is. The server was launched using Heroku, a web cloud platform [16]. This allows the server to receive a URL where it can be reached over the Internet, as opposed to only being available locally on the host machine. The main reason why Heroku was chosen is that it supports GrapheneDB as an add-on.

GrapheneDB provides a place to store the Neo4j database on the Internet [14] where it can be reached by the Flask server (deployed on Heroku) from anywhere.

To make sure that the test subjects would not be distracted by the UI design and consequently let it influence their answers, the design was created with the gestalt principles in mind. The law of proximity was especially applicable here: objects which are close together are perceived as belonging to the same group. This is for example clear when there is more than one result; the resulting information is then grouped together in separate columns, one for each path. On the same results page the information is lined up from left to right so as to lead the reader's eyes from result to result in descending order according to the set priority. This is where the law of continuity has been used.

The prototype is meant to be tested on student counselors of elementary schools in Uppsala. Hence, the starting point will always be elementary school and not gymnasium level. Other than that, the prototype includes all functionality of the rest of the system. The prototype is shown in fig 9.


(a) Home screen where the user can choose between an educational or vocational goal.

(b) The user has chosen a vocational goal and filled in information in the right column to search within a group of occupations.

(c) The result from the search listed in order from left to right according to a priority. The user has clicked on a specific program to see information about what schools provide the program, additional complementary courses and study time.

Figure 9 A web page for using the system and displaying the results.


3.6 Evaluation method

To get a sense of how practical the system is, an evaluation was done by having a few student counselors test the system and anonymously answer some questions. For this purpose a prototype was made so that the counselors could easily test the system.

Several study counselors were contacted through mail. After repeated requests a few answered, but of those only two agreed to participate in the evaluation. When the interviewees had accepted they were sent a consent form to make sure that they knew what the evaluation was about, that they could retract their answers at any time before submission of this report and that they were to be anonymous throughout.

For each counselor the test was conducted as follows: First an online meeting was set up and the counselor was provided a link to the prototype (see fig 9). During the meeting they were told to find an education, a job and a group of occupations based on a priority of choice. The interviews were 30-60 min long and semi-structured [20]; the counselors were asked questions regarding the system, the prototype and the idea of the project as a whole. Follow-up questions were asked if something of interest came up. The questions below were the initial ones asked in the interviews.

• Did the prototype feel unclear?

• Were the results clear?

• In what situations would this be useful?

• Does this prototype answer questions that students actually have? Is something missing?

• Is this something that you think students would use?

• Is it something you would use as a tool when counseling students?

• What students is it useful for?

• Is it more suitable for students who know what they want to do or students who don’t?

• What did you think about being able to search for occupations within an area and receive several paths?

• Is there anything else you would like to add?


The answers were written down and then sent to the respective counselor to confirm that they had been understood correctly. Both the questions and answers were in Swedish since it was the native language of both the interviewer and the interviewees, but they have been translated for the sake of this report.


4 Results and Discussion

The system can successfully generate study paths from the data in the graph database, and the paths can be adapted depending on goal and starting point. Without this system a student has to find their study path by gathering information from different sources; what this project has accomplished is to put all the information together in one place with the purpose of generating complete study paths. How practical the system is can be discussed based on the evaluation answers.

4.1 Answers from evaluation

Invitations for an interview were sent out to 16 student counselors, but of the five who answered only two said yes. Hence, the results are based on two counselors. Since this was never meant to be a large quantitative study but only a pilot study to get a sense of how the system would function in reality, this is considered sufficient. To separate the answers the counselors are named s1 and s2. Here are the answers:

• Did the prototype feel unclear?

-s1: No, it was clear. Both the functionality and layout.

- s2: It was easy to use but to find jobs within a group is confusing for a student. When it comes to the web page itself I think it was unclear with two forms on the same page. You don't know if you have to fill in both. Also "Grundskolebetyg" (grades for elementary school) is wrong, it should be "Meritvärde" (merit value) because that's what the points converted from grades are.

• Were the results clear?

- s1: No, not very clear. Also, when presenting the results it is perhaps not so wise to recommend natural science programs to a student that just barely passes the math requirements. Other paths might suit that student better. It must be clear that admission statistics is changing from year to year. Important to show that some educations take longer in real life than is displayed here.

- s2: For me as a counselor it was clear. Two improvements to make it more clear: The required courses should be included in the education section of the results. On the complementary courses it should also be clear what courses can be taken during a gymnasium program and which can't. Another thing to note is that complementing of courses can be done in different ways that fit different situations. For example, if the goal is a civil engineer degree a student might attend a basår (base year, a year where the student reads up on natural science courses), but if a student wants to become a nurse then taking a few courses at Komvux is enough.

• In what situations would this be useful?

- s1: If a student sat with this they would need a person with experience in the educational system to sit with them. The results can show that choosing natural science is super easy but would in fact not be suited for that student.

- s2: It could be a good tool for student counselors, we who meet the students. Students could use it but they are so dependent on someone being with them.

• Does this prototype answer questions that students actually have? Is something missing?

-s1: Yes, they often ask about salary and the length of educations. Every student wants to become a doctor so they know the path to that already. Many students ask what schools are good but it is subjective what schools are good and what schools are bad. For example some schools have a sports profile and might suit the athletic student more. The profile of schools could be included in the results.

- s2: Yes, it does. There are not many students who already have a job as a goal, but for those who do it answers questions like how? What school? What's the path?

• Is this something that you think students would use?

- s1: Yes, if it is more complete. However, I think it would be more useful for students at a gymnasium or higher education level. Elementary school students might not understand everything because they have no point of reference. They might for example think that 10 000 SEK/month is a huge amount of money.

-s2: Yes they could but they rarely take the initiative. Also, it’s a bit dangerous to show them the fastest path since there exists several other paths. There are only a few students with as clear a goal as a job and even fewer an education. However, it could be fun to just check what exists but in the end most students choose by program or school.

• Is it something you would use as a tool when counseling students?

-s1: Yes, for some students but I don’t find it necessary enough so that I would pursue or pay for it. As mentioned I think it would be more suited for students at gymnasium level.

- s2: Yes, it is a good complementary tool for us counselors. A good way of showing the fastest path.

• What students is it useful for (asked only to s1)?

-s1: High achieving students.


• Is it more suitable for students who know what they want to do or students who don't?

-s1: It suits both.

- s2: It suits those who know what they want to do. Other students will find it confusing and it will not be very helpful.

• What did you think about being able to search for occupations within an area and receive several paths?

-s1: It is useful.

- s2: I think it was very good. The combination of searching for a job and getting related jobs is good since most students don't think in terms of occupation groups. That's why it's good that related work is in the results for a job.

• Do you think this system could help to increase students’ motivation to study (asked only to s2)?

- s2: The results need to be more clear, for example what I mentioned earlier about what courses can be taken at a gymnasium program and what must be taken outside. It's important with alternative educations and jobs. You don't need to be a doctor to help people, there are other occupations. I think that when you see a clear path it can absolutely increase the motivation.

• Is there anything else you would like to add?

- s1: There are more ways to reach an occupation. For example: You do not have to attend the natural sciences program but you can attend many other programs. Then to become a doctor you do not have to complete the doctors program but can instead become a nurse first and then research up to become a doctor later.

-s2: It’s important that the admission points are updated. A lot happens from year to year. Otherwise it can be misleading. It’s important to show that there exists many possibilities and it could be good to have not only the fastest path. (Lastly, the web page could be prettier.)

To summarise, the system needs to take into account that the students using it might be young, and it should therefore give soft information alongside the results, for example that some programs might be harder and that admission points can change. The system would be useful for student counselors at elementary school, but according to s1 it is probably better suited to older students at gymnasium level. s2 also expressed concerns about whether the students they work with would be able to take advantage of the system. An important point both counselors made is that there are many paths to a goal which are slower than those shown in the results but could still be useful for a student to know about.

When asked what type of students would benefit from the system, counselor s1 answered “High achieving students”. A reason why it would not suit students with lower ambitions could be that the system is heavily focused on theoretical higher education at university and hence does not contain any more practically oriented educations. Hopefully this system could be developed further into a tool that increases a student’s motivation and their capital (see section 2.3) by giving them more knowledge about what they can become and how. For this purpose s2 proposed two concrete improvements (see the answer to the question in the second bullet point) and also agreed that giving clear paths would increase a student’s study motivation.

5 Conclusion

A conclusion that can be drawn from this project is that a path finder can definitely be created and is applicable to the educational system of Sweden. However, how useful it is depends largely on how complex it can be. With more features and more information, both to take the user’s possibly young age into account and simply to be more informative, the system will be more useful. The work required for the system to see widespread use would probably have exceeded the scope of a thesis project. Even so, the counselors from the evaluation said that they could include it as a tool to aid in their work. This goes to show that even if it is not a fully fledged system that can replace everything out there, it can still serve as a pointer to what leads where. Of course, for some students the path finder will succeed in showing the path best suited to them, and for a student who already has a loose plan for how to reach their goal, the system might confirm that it is a good path.

The choice of using a graph database for this project proved to be a good one. It came naturally to build the graph as if the educational system had been mapped out on a whiteboard. That helps to make the traversal intuitive. Because the Neo4j graph database is flexible, it is also possible to change the structure and scale it up without too much work, which definitely needs to be done in future work since the educational system changes frequently.

6 Future work

The Swedish educational system is advantageous since it offers many possibilities for a student. This, however, complicates the system built in this work, and a few functionalities have been left as future work.

6.1 Improvements on the database

The database of this system is rather thin; it contains just a bit more data than is necessary to showcase the implemented functionality. Therefore, in order to make the system complete, an important piece of future work would be to scale up the database. This means scaling it up geographically to include more of Sweden by adding to the labels that already exist, i.e. adding all gymnasiums in Sweden as Gymnasium nodes, all towns as Town nodes, etc. This will enrich the system and allow several different paths which can originate from towns other than Uppsala.
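The geographic scaling described above could be scripted. The following is a minimal sketch in Python, not the import code of this project: it turns a list of (gymnasium, town) pairs into parameterised Cypher MERGE statements that could then be sent to the database. The labels Gymnasium and Town come from the text; the `name` property, the LOCATED_IN relationship type, the function name and the sample rows are assumptions made for illustration.

```python
# Hypothetical schema: the labels Gymnasium and Town are from the text;
# the `name` property and the LOCATED_IN relationship type are assumptions.
MERGE_GYMNASIUM = (
    "MERGE (t:Town {name: $town}) "
    "MERGE (g:Gymnasium {name: $gymnasium}) "
    "MERGE (g)-[:LOCATED_IN]->(t)"
)


def build_import_statements(rows):
    """Turn (gymnasium, town) pairs into (query, parameters) tuples.

    Duplicate rows are skipped here; MERGE additionally avoids creating
    a node or relationship that already exists in the database.
    """
    statements = []
    seen = set()
    for gymnasium, town in rows:
        key = (gymnasium.strip(), town.strip())
        if key in seen:
            continue  # same gymnasium/town pair already queued
        seen.add(key)
        statements.append(
            (MERGE_GYMNASIUM, {"gymnasium": key[0], "town": key[1]})
        )
    return statements


# Made-up sample rows, including one duplicate.
rows = [
    ("Example Gymnasium A", "Uppsala"),
    ("Example Gymnasium B", "Stockholm"),
    ("Example Gymnasium A", "Uppsala"),
]
statements = build_import_statements(rows)
for query, params in statements:
    print(params)
```

Matching gymnasiums on name alone would conflate two schools with the same name in different towns, so a real import would probably need a more specific key than the one assumed here.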

Another improvement, which ties in with the scaling, would give the user more opportunities by adding types of nodes that do not yet exist in the database.

As it stands now, the only types of higher education are those taught at university. There are other types of education that lead to jobs, taught at e.g. Folkhögskolan and Yrkeshögskolor. For example, with the system as it is right now you can only become an assistant nurse by completing the correct program at Gymnasium, but in real life a student can also choose to apply for a special post-gymnasium course collection that also qualifies the student for an assistant nurse license.
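The effect of adding such a node type can be illustrated with a small in-memory graph. This is a sketch in Python rather than the Neo4j traversal used in the project: the node names, the one-year duration of the course collection and the helper `all_paths` are assumptions made for illustration. It lists every path to the goal, fastest first, so that the slower alternative is shown as well.

```python
# Hypothetical miniature graph. Each edge is (next node, duration in years).
# Node names and the course collection's duration are illustrative only.
EDGES = {
    "Elementary school": [
        ("Gymnasium program (assistant nurse track)", 3),
        ("Other Gymnasium program", 3),
    ],
    "Gymnasium program (assistant nurse track)": [("Assistant nurse", 0)],
    "Other Gymnasium program": [("Post-gymnasium course collection", 1)],
    "Post-gymnasium course collection": [("Assistant nurse", 0)],
}


def all_paths(graph, start, goal):
    """Return every simple path from start to goal as
    (total_years, [nodes]), sorted with the fastest path first."""
    results = []

    def visit(node, path, years):
        if node == goal:
            results.append((years, path))
            return
        for nxt, cost in graph.get(node, []):
            if nxt not in path:  # never revisit a node on the same path
                visit(nxt, path + [nxt], years + cost)

    visit(start, [start], 0)
    return sorted(results, key=lambda item: item[0])


paths = all_paths(EDGES, "Elementary school", "Assistant nurse")
for years, path in paths:
    print(years, " -> ".join(path))
```

Without the course collection node only the first path exists; with it, the student who chose another program still gets a route to the same occupation.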

An interesting aspect of the scaling would be to observe how the system functions with a great increase in the amount of data in the database.

6.2 Improvements on the path finding

Adding more advanced options in future work would open up more tailored paths for the user. The user now has three options concerning the end goal: an education, a specific job or a group of jobs. The only things that really change their paths are the starting point (elementary school or gymnasium level) and their grades from elementary school.

The student might have a preference for certain gymnasiums and programs, which could be handled rather easily by implementing a list where the student can select their preferences.
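A preference list of this kind could be applied as a filter on the paths the system already finds. The sketch below is in Python and assumes a hypothetical path representation (a dictionary with `gymnasium`, `program` and `years` keys); the function name and sample data are likewise assumptions and not part of the implemented system.

```python
def filter_by_preferences(paths, preferred_gymnasiums=(), preferred_programs=()):
    """Keep only paths that match the student's preferences.

    An empty preference list means "no restriction" for that field.
    """
    gymnasiums = set(preferred_gymnasiums)
    programs = set(preferred_programs)
    kept = []
    for path in paths:
        if gymnasiums and path["gymnasium"] not in gymnasiums:
            continue
        if programs and path["program"] not in programs:
            continue
        kept.append(path)
    return kept


# Made-up candidate paths, as the path finder might return them.
candidate_paths = [
    {"gymnasium": "Gymnasium A", "program": "Natural sciences", "years": 3},
    {"gymnasium": "Gymnasium B", "program": "Natural sciences", "years": 3},
    {"gymnasium": "Gymnasium B", "program": "Technology", "years": 3},
]

selected = filter_by_preferences(
    candidate_paths,
    preferred_gymnasiums=["Gymnasium B"],
    preferred_programs=["Natural sciences"],
)
print(selected)
```

Filtering after the search keeps the traversal itself unchanged; pushing the preferences into the database query would be an alternative if the result sets grow large.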

Another type of option would solve a possible scenario where the user has completed a gymnasium program and perhaps some other stray courses and now decides what they