Visualization of a blog search engine index using 3D graphics


Master's thesis (Examensarbete) LITH-ITN-MT-EX--07/007--SE

Visualization of a blog search engine index using 3D graphics

Linus Engback, Malin Nilsson

Master's thesis in Media Technology carried out at Linköpings Tekniska Högskola, Campus Norrköping.

Supervisor: Martin Källström
Examiner: Matt Cooper

Norrköping, 2007-02-12

Department of Science and Technology, Linköpings Universitet, SE-601 74 Norrköping, Sweden

Date: 2007-02-12
Division, Department: Department of Science and Technology (Institutionen för teknik och naturvetenskap)
Language: [ ] Swedish, [x] English
Report category: Examensarbete, B-uppsats, C-uppsats, [x] D-uppsats
ISRN: LITH-ITN-MT-EX--07/007--SE
Title: Visualization of a blog search engine index using 3D graphics
Authors: Linus Engback, Malin Nilsson

Abstract: The purpose of this thesis is to find ways to make the extent and constant movement in the blogosphere visible. An application has been developed using C# and OpenGL. The application is an interactive screensaver to be run on the Windows platform. It visualizes data combining 3D and 2D elements. Geographical data is rendered using a model of the Earth, where the blog posts are constantly updated. Various statistics are displayed to give information on the current state of the blogosphere.

Keywords: blog, blogosphere, visualization, c#, .net, opengl, 3D graphics

Copyright

The publishers will keep this document online on the Internet - or its possible replacement - for a considerable time from the date of publication barring exceptional circumstances. The online availability of the document implies a permanent permission for anyone to read, to download, to print out single copies for your own use and to use it unchanged for any non-commercial research and educational purpose. Subsequent transfers of copyright cannot revoke this permission. All other uses of the document are conditional on the consent of the copyright owner. The publisher has taken technical and administrative measures to assure authenticity, security and accessibility. According to intellectual property law the author has the right to be mentioned when his/her work is accessed as described above and to be protected against infringement.
For additional information about the Linköping University Electronic Press and its procedures for publication and for assurance of document integrity, please refer to its WWW home page: http://www.ep.liu.se/. © Linus Engback, Malin Nilsson.

Abstract

The number of blogs is constantly growing. There is a need for methods to collect and analyze information about this ever-changing collection of blogs. The purpose of this thesis is to find ways to make the extent and constant movement in this blogosphere visible. This is made possible by using a blog aggregation framework developed at Primelabs in Linköping. In this framework, blog data is continuously collected to be made available for various services.

To meet the purpose of the thesis an application has been developed using C# and OpenGL. The application is an interactive screensaver to be run on the Windows platform. It visualizes data combining 3D and 2D elements. Geographical data is rendered using a model of the Earth, where the blog posts are constantly updated. Various statistics are displayed to give information on the current state of the blogosphere.

User testing has shown that the application is attractive to use and informative. The user interface is acceptable but still needs some adjustment, and the presentation of some of the statistics could be improved. Testing has also shown that the application requires fairly advanced graphics hardware for smooth operation. It has also been concluded that an installer program is needed if the application is to be distributed and that it should be possible to customize the user experience to a greater extent.


Acknowledgement

Writing a master's thesis at Primelabs in Linköping has been an amazing experience. The atmosphere at the Linköping office is friendly and creates an environment where there is room to grow and creativity flows. We hope we will be able to follow Primelabs as the company conquers the blogosphere. We want to send a special thanks to Martin Källström for unlimited ideas and support, Björn Milton for nearly infinite patience and help, Niclas Wiström for doing all the behind-the-scenes work that had to be done for things to work, and Carl Fredrik Wettermark for inspiration and for teaching us how to write proper C# (even though we may not always have listened). Our gratitude also goes to Johan for serving Malin coffee and thereby saving Linus a lot of pain.

Our gratitude and respect also go to our examiner and supervisors at Linköping University. Matthew Cooper, our examiner, has believed in us and given us helpful advice. Jimmy Johansson has helped us a lot in keeping things together, not letting the project get out of hand, and has helped us in all ways possible. We envy Patric Ljung for his knowledge of OpenGL and computer graphics and thank him for using this knowledge to help us when we were stuck. We would also like to thank our opponents Henrik Engström and Jens Raine for their feedback.

Some people who have helped us with this thesis in other ways also deserve a thank you. John Wilander, we want to thank you for always seeing the broader picture and giving us good advice not limited to this thesis. Thank you for the glögg that kept us warm. Claes Müllern-Aspegren, or 'Class' as we prefer to call him at work, thank you for clearing up who is Pi and who is Pu and for brightening our days with fairytales about the blogosphere.

Finally, we want to thank our moms and dads for their loving support. You have always believed in us, even on early mornings. We also thank our siblings who have supported us. Daniel, thank you for having more programming skills than the two of us combined and for using them to point out where there is a more clever solution. There is a reason why you earn more money than us.

And last, everyone we have forgotten to thank: you too deserve a special thank you!

Linköping, January 2007. Linus Engback and Malin Nilsson.


Once upon a time there was a blogosphere, but it was so large that no one could see it. But one day two energetic IT-people discovered 'the Blog'. They asked the blogosphere why it looked so sad. It answered:
- I am visible to nobody.
The two IT-people, we can call them Pi and Pu, then bravely said to the blogosphere:
- We are going to visualize you, in a way that no one has ever visualized you before!

Claes Müllern-Aspegren


Contents

1 Introduction
  1.1 Background
  1.2 Objectives
  1.3 Limitations
  1.4 Basic conditions
    1.4.1 Platform and hardware
    1.4.2 Provided data
  1.5 Sources
  1.6 Methodology
  1.7 Target group
  1.8 Thesis Outline
  1.9 Glossary

I Theoretical background

2 Agile Estimating and Planning
  2.1 Purpose
  2.2 User stories
  2.3 Story point estimation
    2.3.1 Techniques for estimations
  2.4 Implementation phase

3 The blogosphere

II Research

4 Concept development
  4.1 Inspiration
    4.1.1 Visualizing planet Earth
    4.1.2 Mixing 2D and 3D
  4.2 Sketches
    4.2.1 Main visualization 1: Planet Earth
    4.2.2 Main visualization 2: The Tornado
    4.2.3 Secondary visualizations
  4.3 CG mock-ups

5 Technologies
  5.1 Implementation language
  5.2 3D solutions
    5.2.1 OpenGL implementations for C#
  5.3 XML-parser

III Design and implementation

6 The framework for the application
  6.1 Screensaver
  6.2 Structure

7 Version 1
  7.1 Goal
  7.2 Choices and issues
    7.2.1 Handling time in the system
    7.2.2 Data reading
      7.2.2.1 Compression
      7.2.2.2 File transfer
    7.2.3 Earth
      7.2.3.1 Creating Earth
      7.2.3.2 Visualizing blog posts on Earth
      7.2.3.3 Finding the nearest disk to a latitude and longitude
    7.2.4 Statistics
    7.2.5 Text and image drawing
    7.2.6 Saving and loading data
  7.3 Outcome
    7.3.1 Evaluation
    7.3.2 Final thoughts

8 Version 2
  8.1 Goal
  8.2 Choices and issues
    8.2.1 Scroll List and Information Box
    8.2.2 Statistics
    8.2.3 Optimization
    8.2.4 Usability
  8.3 Outcome
    8.3.1 Evaluation
    8.3.2 Final thoughts

IV Conclusion

9 Discussion and conclusion

10 Future work
  10.1 Core functionality
  10.2 Additional features
  10.3 New concepts

Bibliography

A Program structure
B Look up table
C Performance testing
D Graphical elements


List of Figures

4.1.1 Internet traffic flows and Internet multicast backbone illustrating arches spanning the Earth.
4.1.2 MSN history visualization mixing labels placed in 2D with data placed in 3D.
4.1.3 System to display and analyze complex information, here used as an example of how a mapping between 2D- and 3D-space can be done.
4.1.4 Google Earth, an example of how 2D elements can be overlaid without being intrusive.
4.2.1 Blog posts visualized in a tornado-like spiral. Each square represents one blog post, slowly sliding down the spiral as time passes.
4.2.2 Six ideas of secondary visualizations showing statistics.
4.2.3 CG mock-up of Earth with different levels of detail.
7.1.1 Sketch of the layout and graphical elements in version 1.
7.2.1 Texture of Earth in a standard cylindrical map projection used to place continents on the rendered globe.
7.2.2 In a) and b) a time-based function is used for animation. As seen in b) there is a glitch when two blog posts appear shortly after one another. The same situations using a PI-regulator are shown in c) and d). No glitch appears.
7.2.3 Earth with spheres depicting the region where there is a risk of border cases. Pink spheres should always be to the right of cyan spheres for the algorithm to work properly. In this image this is clearly the case.
7.2.4 Earth with lines for testing the algorithm. Cyan lines are the test positions and yellow lines are the closest disk chosen. As can be seen the algorithm is working fine.
7.3.1 A screenshot of the layout as seen in the finalized version 1.
8.2.1 Here the problem with pixel mapping can clearly be seen. The text must have a consistent brightness, no matter how it is mapped to the pixel grid. Here different bars in the E fade too much into the background.
8.3.1 A screenshot of version 2, the final version of the application.
A.0.1 Schematic view of the application.
B.0.1 Schematic illustration of the data structure used to aid mapping latitudes and longitudes to certain disks.
D.0.1 The graphical elements in version 2.

List of Tables

C.0.1 Typical figures showing performance depending on what is rendered. By Earth is meant the surface of the globe, not including the aggregated blog posts rendered on top of it.


Chapter 1

Introduction

1.1 Background

Primelabs is a small company in Linköping that is about to release its own blog aggregation framework, Twingly. The data collected will not be used for a regular search web page, but for a range of different blog-related services. Primelabs will primarily provide business-to-business services. These services will become more and more attractive as the blogosphere grows and gains momentum. Statistics show that in 2006 the number of blogs doubled every seven months and about 1.3 million blog posts were written every day, which is twice as many as the year before. [19]

This master's thesis is about developing a screensaver that visualizes the flow of blog posts, and at the same time shows some statistics of the indexed blogs and blog posts. The screensaver will be downloadable free of charge and its main purpose will be to market the search engine, to bring positive value to Primelabs and to bring the user closer to the Twingly experience.

1.2 Objectives

The purpose of this thesis is to discover and explore concrete ways to visualize the extent and constant movement in the blogosphere. This shall be done by studying the blogosphere and visualization techniques and merging these two very interesting areas. The aim is to develop a functional piece of software which can be used as a platform for further development.

To achieve the goals set by Primelabs the screensaver should be easy to use and intuitive, but at the same time interesting enough for continued use. It also has to be aesthetically pleasing and be 'cool'. The screensaver should not only be entertaining. For those who wish, it should also be informative and a way to find interesting blog posts.

Getting the screensaver up and running should be easy. There should, however, be more options available for the more advanced user; the user who wants to customize his or her experience should be able to register on the Twingly web page, allowing the application to do various filtering based on the web page settings. Through this connection to the web page the user will be more closely tied to Twingly.com.

1.3 Limitations

Due to the limited time available for this thesis some limitations in the scope had to be made. We mainly focus on the core application and do not prioritize non-central functionality that is more important for Primelabs than for the thesis. Hence functions such as the connection to the Twingly web page will not have priority. Compatibility will also have to be of low priority, as it is more important for the thesis to have a complete functional solution than to explore every possible combination of graphics hardware.

1.4 Basic conditions

The application will be distributed in the form of a screensaver. It is desirable that all functionality which is specific to screensavers is implemented. The application should be scalable, both in the number of clients and in the amount of data each client can handle.

1.4.1 Platform and hardware

The application only needs to run on the Microsoft Windows platform equipped with .NET. An Internet connection is a prerequisite for receiving the continuous blog post data. All computers running this application are assumed to have accelerated graphics hardware. This has been decided by Primelabs since most computers are already equipped with this, and it gives us the opportunity to implement far more advanced graphics.

1.4.2 Provided data

The application will be able to download a stream of data over the Internet. The stream will be in the form of chunks provided periodically. The period will be decided on the data provider side and can be changed dynamically. A normal period would be two minutes. The rate of the data will be a function of the number of blog posts written, the backlog in the server and the resources available for processing data on the provider side. This will make the flow of data rather uneven. The volume of the data will be considerable. An average data rate of 100 kbit/s will not be uncommon.

The main data available will be a stream of blog posts, where for each post the following will be available:

• Blog name and URL.
• Date/time when the post was inserted into the database.
• Post URL, the URL to the specific blog post at hand.
• Title of the blog post.
• Language, ISO 639 language code.
• Latitude and longitude, coordinates on Earth where the blogger is situated, or rather, where the server is located.
• Country, ISO 3166-1 alpha-2 country code.
• Region and city.
• Links, with a title and latitude and longitude. Links from this blog post to other blogs and blog posts.

The latitude and longitude provided are extracted on the server side by using a commercially available IP-address-to-location table. There is often a large geographical distance between the actual user and the server they use for publishing their blog. This is because there are many large blog hosting services that have global users. Since we get the latitude and longitude of the server, the geographical information does not always reflect where the user is located.

The application will also be able to download statistics on demand:

• Current total number of blogs (globally or in a given country/language).
• Number of posts per hour/day/week/month during a given interval.
• Number of posts per language in the last hour/day/week/month.

Finally, a service providing user information will be implemented by Primelabs. This service will provide data from which selected user data such as favorite blogs and tags can be extracted. To get information about a certain user, the user's password will be required.

1.5 Sources

Factual information used in this report has been found primarily in books and articles, either printed or published on the Internet. When deciding if a source was reliable enough for our purposes, we sometimes rated the same source differently depending on our ability to judge the quality of the information and how central the information is to the report. Inspiration material about visualizations did not need a quality assessment, since this material is not used as facts.
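The blog post record listed in section 1.4.2 can be pictured as a small data structure. The following is a minimal sketch in Python, chosen for brevity even though the application itself is written in C#. All class names, field names and example values are hypothetical and only mirror the bullet list; they are not taken from the actual Twingly data format. The helper at the end works out the average size of one chunk from the figures quoted in 1.4.2 (100 kbit/s, two-minute period), assuming 1 kbit = 1000 bits.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class Link:
    """Outgoing link from a blog post (hypothetical field names)."""
    title: str
    latitude: float
    longitude: float


@dataclass
class BlogPost:
    """One record in the blog post stream, mirroring the list in 1.4.2."""
    blog_name: str
    blog_url: str
    inserted_at: datetime   # when the post was inserted into the database
    post_url: str
    title: str
    language: str           # ISO 639 language code
    latitude: float         # position of the server, not necessarily the blogger
    longitude: float
    country: str            # ISO 3166-1 alpha-2 country code
    region: str
    city: str
    links: List[Link] = field(default_factory=list)


def chunk_size_bytes(rate_kbit_per_s: float, period_s: float) -> float:
    """Average size of one chunk in bytes, assuming 1 kbit = 1000 bits."""
    return rate_kbit_per_s * 1000 * period_s / 8


# Made-up example record; none of these values come from real data.
post = BlogPost(
    blog_name="Example blog",
    blog_url="http://blog.example.com/",
    inserted_at=datetime(2007, 1, 15, 12, 0),
    post_url="http://blog.example.com/2007/01/hello",
    title="Hello blogosphere",
    language="sv",
    latitude=58.41,
    longitude=15.62,
    country="SE",
    region="Östergötland",
    city="Linköping",
)

print(post.title, len(post.links))
print(chunk_size_bytes(100, 120))  # 1500000.0 bytes, i.e. about 1.5 MB per chunk
```

At the quoted rate a two-minute chunk averages roughly 1.5 MB, which gives a feel for why the amount of data each client must handle is described as considerable.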

1.6 Methodology

We have chosen to use an adaptation of the methodology suggested in Agile Estimating and Planning (AEP) [10], described in more detail in chapter 2. This method is the main method used at Primelabs. The company uses it because it is a dynamic method which solves some of the problems present in many other methods. It is a method which focuses on short iterations, frequent feedback and continuous planning. This makes it better adapted to handle real-world situations.

Since there are only two team members involved in this project and we have very close contact with our project owners, we have adapted the method to our needs. We use a few long iterations in which there are smaller sub-iterations. The to-do list is evaluated frequently and the priorities of tasks are changed as needed. Tasks are also added, removed and changed as it becomes clearer where our focus should be. In this process feedback from our project owners is also used to make new decisions. By working in this way a very flexible development process is obtained.

1.7 Target group

The target group for this report is those who study information technology or similar subjects. An interest in and basic knowledge about the blogosphere is beneficial but not necessary. The report is of special interest for people working with data visualization and 3D graphics programming.

1.8 Thesis Outline

Part I: Theoretical background. This part gives a theoretical background to the methodology used in this project and an introduction to the blogosphere.

Chapter 2: Agile estimating and planning. A brief summary of the methodology which has been adapted for use in this project.

Chapter 3: The blogosphere. Here a basic introduction to the blogosphere is given for those who have no prior knowledge in the area.

Part II: Research. This part presents and discusses the research that was done before the implementation phase.

Chapter 4: Concept development. A chapter about where inspiration was found, and how the concepts were created.

Chapter 5: Technologies. This chapter is a short comparison between different technologies that could be used in this project.

Part III: Design and implementation. The implementation phase is described in this part.

Chapter 6: The framework for the application. Before the actual implementation could commence a solid foundation had to be in place. This chapter is about the screensaver functionality and the main structure of the project.

Chapter 7: Version 1. Here the implementation of the first version is described. Focus is placed on what choices were made, what issues arose and reflection on the outcome.

Chapter 8: Version 2. The second version is described in this chapter, in much the same manner as the first version.

Part IV: Conclusion. This part reflects back and summarizes the thesis.

Chapter 9: Discussion and conclusion. Here the outcome is compared to the objectives proposed in chapter 1.2.

Chapter 10: Future work. A collection of ideas and concepts for future development.

1.9 Glossary

• AEP - Agile Estimating and Planning.
• API - Application Programming Interface.
• blog - Web log, see chapter 3 for more information.
• blog post - A post in a web log, see chapter 3 for more information.
• CG - Computer Graphics.
• CPU - Central Processing Unit.
• CRC - Cyclic Redundancy Check.
• cylinder - In this report a cylinder means the same as a disk (mentioned below), but with a height indicating aggregated blog posts.
• disk - In this report, one of the objects that the Earth surface is made of.
• FPS - Frames Per Second.
• FTP - File Transfer Protocol.
• GPU - Graphics Processing Unit.
• HTTP - HyperText Transfer Protocol.
• ISO - International Organization for Standardization.
• OpenGL - Open Graphics Library.
• tag - In a blog context: keyword describing the content of a blog or blog post.
• URL - Uniform Resource Locator.
• XML - Extensible Markup Language.
• ZIP - A file compression format.

Part I

Theoretical background


Chapter 2

Agile Estimating and Planning

2.1 Purpose

To succeed in software development, estimating and planning are important, regardless of the size of the project. In many common software development methodologies very detailed plans and specifications are used. This does not necessarily make development easier and can be misleading. AEP focuses more on continuous planning than on the use of rigid plans. This is because making a plan is an effort to find the optimal solution to the product development. Making the plan at the very beginning of the project will not work, since it is hard to foresee everything that will happen as the plan is executed. Instead it has to be an iterative process where the plan is extended and adapted gradually. Good planning will have these benefits [10]:

• Establishing trust.
• Conveying information.
• Reducing risk and uncertainty.
• Supporting better decision making.

2.2 User stories

In AEP, rather than making exact specifications such as 'as the mouse moves 1 pixel to the right...', user stories are used. These describe scenarios and general features, such as 'there should be a mouse interaction which...'. The stories together with drawings, diagrams and other descriptions outline the application.

(28) 2.3. Story point estimation. When all stories in the project have been set up, they are given points relative to each other.. The points reect the size of the story, where size means the. amount of eort, complexity, risk and so on involved in developing the story. This means a story which is given twice as many points as another is considered as being twice the size. It would be easy to consider this as another way to do time estimations of the project. But the points are not directly related to time. Instead the team will use a measure called. velocity which is the number of points completed in. the last iteration the team did. For this to work obviously the length in time of the iterations has to be roughly the same. It may seem using points and velocity to estimate how much eort the project will take is making things complicated without much benet. But the benet of this method is that the planning errors are self correcting over time. If the velocity was wrongly estimated it will be corrected to reect the actual speed of progress, since the points are relative to each other. In the normal case both the size of tasks and the speed of work will be estimated and hence there will be a double error. By using aep one of these errors can be eliminated, only the size is estimated.. 2.3.1 Techniques for estimations When using aep the estimations of a story are made collaboratively. There is well-known evidence that estimates made by the one who will do the work are better than estimates by someone else. Still, collaboration is used since we tend not to know who will do the work in the end. For most teams it's easier to estimate if the numbers are within a given scale. The larger the story at hand, the less accurate the estimation will be. This is a reason why it might be a good idea to use a scale where the gaps between numbers increase with size. For example the Fibonacci sequence could be used. The three most common techniques for estimating are:. Expert Opinion. 
. A common way to estimate is to ask someone who has. special knowledge in the area. This is less useful in aep since the points should be related to how dicult the story is for the team.. However, a nice benet. is that it does not take much time to ask someone, and that often it is more accurate than an analytical approach. [14]. Analogy. . This is when we compare the story at hand to other stories which. we already have given points to. Here it's important to use. triangulation, that. is it should be compared to several other stories to establish its relative size.. Disaggregation. . Here the story is split into smaller pieces which can be. estimated. This might be a good approach, but the stories should not be split into too small pieces since the combined error then might become very large. An approach that combines the three is. planning poker. Here everyone in the. team should participate. If there are more than 10 people in the team you might consider splitting the team when doing planning poker. Each person is given a deck of cards with the points that can be given to a story. For each user story. 10.

(29) each person privately selects a card which reflects the number of points she or he thinks is appropriate. When everyone has chosen a card they flip their cards simultaneously, so that the participants do not affect each other. Now the points will most likely differ. If so, the persons giving the highest and lowest scores explain why they selected their particular points, since these persons might have thought of something no one else thought about. Finally the group can talk about the story for a few minutes before everyone makes a re-estimation. This is done over and over until everyone has the same number of points. This will rarely take more than three rounds.

2.4. Implementation phase

When the time comes to start implementing, this is done in iterations. The length of each iteration is decided from factors such as the release date of the entire project, the ease of getting feedback and the amount of uncertainty. A normal iteration length would be between two and four weeks.

Each iteration is planned at a meeting with all people involved. At the meeting it is decided what stories should be included in the iteration. The decision is made by first prioritizing the user stories. Next, as many of the highest prioritized stories as possible are selected, without using up more points than the team can do in one iteration. Each user story is split into smaller tasks with an estimation of the actual time needed for each task. At the meeting no particular person is assigned to a task. During the iteration all individuals pick one or two tasks at a time to complete. No one is supposed to select a new task before their previous task is completed.

At the end of each iteration no loose ends should be present. That is, only completed user stories should be part of the version at hand. Points for uncompleted stories cannot be counted when calculating the velocity. By having this rule we ensure that focus is on completing tasks, which is productive since multitasking decreases the overall productivity. Having a complete working version at the end of each iteration also guarantees that we do not postpone unattractive problems and bug fixes.
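The selection rule for an iteration can be illustrated with a small Python sketch. It is only one possible reading of the rule: the `Story` class, the convention that a lower `priority` number means higher priority, and the choice to skip a story that does not fit and continue with the next one are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Story:
    name: str
    priority: int       # assumed convention: lower number = higher priority
    points: int         # estimate in story points
    done: bool = False  # set when the story is fully completed


def plan_iteration(stories, velocity):
    """Pick the highest-prioritized stories that fit within the velocity."""
    selected, remaining = [], velocity
    for story in sorted(stories, key=lambda s: s.priority):
        if story.points <= remaining:
            selected.append(story)
            remaining -= story.points
    return selected


def measured_velocity(stories):
    """Only completed stories count; partially finished ones give no points."""
    return sum(s.points for s in stories if s.done)
```

With a velocity of 10 and stories of 5, 8 and 3 points in priority order, this sketch selects the first and the third story, since the second does not fit in the remaining points.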


(31) Chapter 3. The blogosphere

A blog is, in essence, a journal written on a website. Being published on the Internet gives it the opportunity to be far more than a regular journal. One of the main points of having a blog is having other people read it, comment on it and link to it from other blogs. This makes blogs something new which has not been seen before. A blog is not a forum, because a blog is centered around one or a few persons, whereas a forum is a collaborative effort. [15]

Technologically a blog normally differs from a regular web page in that it is published either using software on your computer or using a web form where the tools are simplified to suit the needs of a blog. This makes learning how to write blog posts fast. It also adds an amount of standardization to the blogosphere which aids the tracking services. A blog post would typically consist of the following: [4]

• Title, the headline of the post.
• Body, the main content of the post.
• Permalink, the URL of the current post.
• Post date, when the post was published.

To this comes optionally:

• Comments, from other users.
• Categories, often in the form of 'tags', key words describing the contents.
• Trackback and/or pingback, which gives the possibility to see when other people have linked to your blog and to automatically notify others when you link to theirs.

The collective of blogs is often referred to as the blogosphere. Other terms used are blogtopia, blogspace, blogiverse and blogistan. This blogosphere can be thought of as a very large social network. Topics and discussions spread

(32) from blog to blog just as discussions do in everyday social life. That these social interactions take place in a technologically highly developed environment gives opportunities never seen before to track the patterns in the communications. There are several tracking services which use the hypertext links and the metadata present in blogs to reveal patterns and see what subjects are popular. Using this information, ranking can for example be done, which allows one to determine which blog is the most influential. [5]

(33) Part II. Research


(35) Chapter 4. Concept development

Concept development was done by starting with a broad and open search for inspiration and ideas. Next a number of loose concepts were drawn on paper. After that a pruning was done to select the ideas with the most potential. Finally a computer graphics (CG) mock-up was made from selected ideas.

4.1. Inspiration

When looking for inspiration the Internet was primarily used. There are plenty of sources containing visualizations, as well as other graphics, that can be found by simple browsing.

4.1.1 Visualizing planet Earth

On the web page 'An Atlas of Cyberspaces' [11] many visualizations make use of the Earth as a logical means to display data with a geographical connection. In the leftmost image in figure 4.1.1 there are arches spanning the globe, using colors to indicate traffic load. The geography on the globe is very modest, while still very functional, which leaves space for the information to be in focus. In the rightmost image we find a more crude rendering. The interesting part here is the way it uses different heights of the arches to separate them.

4.1.2 Mixing 2D and 3D

Mixing 2D and 3D can combine the strengths of both techniques: the clarity of 2D and the visualization power of 3D. By doing more than simply layering 2D over 3D, or the other way around, impressive effects can also be obtained. In figure 4.1.2 from the website of Rojas [17] we can see how the labels are clearly presented in two dimensions, whereas the data is nicely presented in three dimensions. Spline curves are used to connect labels with data over the dimension border.

In figure 4.1.3 we can see an example of how three dimensional data can be given more meaning by being mapped onto a two dimensional surface. The 3D

(36) Figure 4.1.1: Internet traffic flows and Internet multicast backbone illustrating arches spanning the Earth.

data would be likely to lose some of its information if projected into a two dimensional space; at the same time, displaying the map in three dimensions would mean certain parts would not be visible at all times. Although it is a somewhat cheap rendering, it still displays a way to combine the two views. [12]

In figure 4.1.4 there are, among other things, stars moving in the background giving the illusion that the planet is actually in space, but this is mainly an example of how two dimensional data can simply be overlaid without being too intrusive. The designers work with transparency and anti-aliasing to make it occupy as little screen real estate as possible and to make it as subdued as possible. [6]

4.2. Sketches

Sketches were done as a form of brainstorming for developing ideas and for making our ideas more concrete. At this stage no narrowing down was made. A wide approach was used to avoid missing any potentially useful ideas. The next step in the process was selecting the ideas with potential, eliminating the ideas without potential, and grouping similar ideas. The concepts could be grouped along two axes.

• Real time visualization versus statistics.
• Main visualization versus secondary visualization.

Most of our main visualizations are real time, but can also be extended to display statistics. After some more culling two main visualizations were left, planet Earth and the tornado, described in the following sections.

(37) Figure 4.1.2: MSN history visualization mixing labels placed in 2D with data placed in 3D.

Figure 4.1.3: System to display and analyze complex information, here used as an example of how a mapping between 2D and 3D space can be done.

(38) Figure 4.1.4: Google Earth, an example of how 2D elements can be overlaid without being intrusive.

4.2.1 Main visualization 1: Planet Earth

Since geographical data on the blogs is available to this project, a geographical visualization is an obvious solution. This, in combination with the implementation being done in 3D, makes an implementation based on planet Earth attractive. This is a real-time visualization where statistics are limited to potentially showing where on Earth most blog posts are written. Through discussions with Primelabs and our research and sketching, the following was decided. Rendering of the Earth could be as follows:

• Not photorealistic. A look closer to a computer game or a movie is wanted.
• Transparent, or nearly transparent, seas.
• Stars in the background to enhance the feeling of planet rotation.
• Continents could be rendered as small disks, as outlines or as single colored fields.

Our ideas on how the arriving blog posts could be visualized on the Earth are the following:

• Beams of light reaching space from the core of Earth, passing through the hull in the place where the post was written.

(39) • Text reaching out in a similar fashion to the light beams. The text should preferably be the title of the blog post.
• Objects falling from space reaching the place where the post was written.
• Objects falling and piling up on Earth, building mountains where many people are blogging.
• Stars falling from the sky as shooting stars reaching the place where a post was written.
• Objects passing by the location of the camera, hence being displayed in great size at the beginning of the fall and shrinking to dot size when reaching Earth. At the beginning of the fall, text such as the title of the blog post could be visible.
• Small lights flashing on Earth.

In a blog post links to other blogs might be present. It is desirable to visualize these since they make the interconnection of the blogosphere apparent. To avoid cluttering of the visualization and information overload, not all links can be shown at all times. Hence a function for selection and removal of links is needed.

• Arches or straight lines passing straight through Earth. If the seas are selected to be wholly or partly transparent this will be a usable solution.
• Arches spanning the globe on the outside. This requires rather advanced design to give a consistent look.

4.2.2 Main visualization 2: The Tornado

A spiral has the special property of being both linear and periodic. One can follow a path along the spiral, but one can also move more swiftly in a radial direction. One can also easily make older data grow smaller compared to new data, either by selecting a perspective or by making size a function of time. From this the idea of the Tornado was derived, see the sketch in figure 4.2.1. The Tornado will show a real time visualization as well as statistics over when blog posts are written.

• The main shape of the visualization will be a spiral. The spiral will have its largest radius at the top and will become increasingly smaller when moving downwards. Hence it will have the shape of a tornado.
• To enhance the 'tornado' look some programming will be used to give it the proper shape and possibly movement.
• Blog posts will be visualized as small units coming from the side of the screen and then attaching to the spiral.

(40) Figure 4.2.1: Blog posts visualized in a tornado-like spiral. Each square represents one blog post, slowly sliding down the spiral as time passes.

• The blog posts will be visible until they are so far down in the spiral that they disappear due to the perspective foreshortening.
• Blog posts of different importance can be marked by different colors or different sizes.
• The spiral will rotate with constant speed. If the period is selected to be, for example, 24 hours then the spiral can be used to see tendencies, such as when people are blogging the most.
• An 'on mouse over' effect might be wanted. This way the user could get more information on a particular post by hovering the mouse pointer over the post.

4.2.3 Secondary visualizations

These visualizations are intended to be displayed in smaller formats at the same time as the primary visualization. They will be more directed towards statistics. Their purpose is mainly to provide information, being easy to read and not stealing attention from the main visualization. The ideas listed below are illustrated in figure 4.2.2.

Statistics pie chart with time axis (a). This is an extension of the normal pie chart. Not only does it show the present state, it also shows what happens over time. This is done by putting a series of pie charts on top of each other.

Sparkline (b). A sparkline is a small graph without either axes or legend. It is basically just a line showing tendencies. No exact values can be derived

(41) Figure 4.2.2: Six ideas of secondary visualizations showing statistics.

from it, but it provides a very easy to grasp and unobtrusive way to display data. [20]

Bar chart (c). A bar chart is a well known way to visualize data. Our version would be fairly regular, but would aim to be non-intrusive and simple. The ideas from the sparkline could also be applied here, that is, having a minimum of decorations. When displaying weeks, for example, shading the weekend would be discreet yet informative.

Stacked area chart with mirroring (d). The same discreet style used in our sparkline and bar chart could also be used here, although a legend would probably be necessary. Our only addition to the regular stacked area chart is centering the chart around a horizontal line.

Snakes on a graph (e). This idea comes from the previous thoughts about spirals. Here statistics are shown in spiral form, which makes it possible to see how things evolve over time while also revealing tendencies, e.g. what time of the day most people are blogging. Here we have three different ideas on how the data could be displayed along this spiral axis.

• Pearls. A predetermined number of blog posts corresponds to one pearl. These gather on the axis, making bigger clusters when the data flow is large.
• Pearls with variable size. Instead of clustering the pearls, the pearl size increases. This poses a still unsolved problem, as the pearls will also extend along the time axis.
• Tube with variable width. The number of blog posts corresponds to the diameter of the tube via some appropriate function. This could possibly give visual problems when rapid changes occur, since we would like a smooth outline.

Counters (f). A counter is not a very revolutionary way to show data. Yet, for some data, like the total number of blogs in the blogosphere, it is appropriate.

(42) Figure 4.2.3: CG mock-up of Earth with different levels of detail.

To make it a bit more interesting it can be designed so that changing numbers happen through some sort of animation, e.g. rotating cylinders.

4.3. CG mock-ups

To test the ideas and get a more realistic image of how they would look, some more detailed computer graphics (CG) mock-ups were made. Since our 3D model of Earth was the most complex of the primary visualizations we focused our efforts on that model. Google Earth [6] was used to obtain the continent outlines of the planet. Various filtering tricks were used to pick the image apart and make a layered image. Working with layers made it possible to try different coloring, shading and lighting. The results (figure 4.2.3) were encouraging, which made us decide this was a track worth following.

(43) Chapter 5. Technologies

5.1. Implementation language

For the implementation three different object-oriented languages were considered.

• Java by Sun Microsystems. This is a multi-platform language, a property which was considered the most important feature of this language.
• C++. This language has long been the de facto standard for applications where both object orientation and performance are important. It is also a native language of both OpenGL and DirectX. For this project the major drawback is the more time consuming development compared to the other alternatives.
• C# by the Microsoft Corporation. C# is Microsoft's fusion of C++ and Java. It is basically C++ enhanced with what they considered the advantages of Java and some other enhancements. Primelabs mainly develops in C# since it makes software development more rapid.

For the above reasons we chose C# as our implementation language. The cross-platform capabilities of Java are not enough to raise it over the other languages, C++ takes too much time from the actual development and C# is already the company language of Primelabs. It being the prime language of the company also gives the great advantage of extensive knowledge and experience being available on site.

5.2. 3D solutions

One of the core components of this project is the use of 3D graphics. Today it is hard to do advanced 3D graphics without using hardware acceleration. This is because the parallel processing capabilities of modern graphics hardware make it possible to render scenes with a high level of detail in real time.

(44) On the market there are two reasonable solutions for using accelerated 3D graphics. One is OpenGL, an open standard governed by the OpenGL Architecture Review Board (ARB); the other is DirectX, owned and controlled by the Microsoft Corporation. A comparison between the two was made before selecting which to use.

We found that DirectX is very game oriented, meaning it has lots of features useful for game developers, such as a joystick interface. This is however of limited importance to this project, since the features that might be of use in this project are also available in OpenGL. The fact that Microsoft has released a version of DirectX designed for the .NET 2.0 framework is an argument for using it, since the implementation language will be C#.

OpenGL is an open standard supported in many operating systems. DirectX on the other hand is proprietary and can only be used with Microsoft Windows. Even though we have chosen to use C#, which is also proprietary, using OpenGL still makes the bulk of the code reusable with minor changes. Given that Primelabs recommended us to use OpenGL, that OpenGL is an open portable standard and that the other differences are minor, we chose to use OpenGL.

5.2.1 OpenGL implementations for C#

To find a good OpenGL solution for C# we did research using the Internet. We found the following solutions.

• Colin Fahey's C# wrapper for OpenGL forms a 'pure C#' wrapper to the OpenGL application programming interface (API) for the Windows operating system. It is limited to supporting only the functions of OpenGL 1.1 and no extensions. Colin Fahey himself recommends the Tao framework for serious applications. [13]
• CsGL is a C# graphics library. The latest news on the web page is from 2003, and there is also a notice that development has essentially stopped. This web page also suggests using Tao. [1]
• SharpGL is another C# wrapper. Unfortunately the web page is broken and somewhat confusing, so we could not find any documentation on SharpGL. This makes the purpose a bit unclear. [2]
• Tao framework. The Tao framework is an open source, language-neutral framework, working on multiple platforms. Tao seems to be a living and professional project. Its suitability for our needs was confirmed through testing. [9]

The Tao framework was the implementation chosen. All other solutions had noticeable flaws or recommended the use of Tao.

(45) 5.3. XML parser

The format for the data transported in the project is XML (Extensible Markup Language). This format has the advantage of being a common standard, which ensures that functionality for handling it is available. In C# we use the functionality in System.Xml to read the data. Another advantage is that XML is readable to humans as well as computers. The major drawback for us in using XML is the very large overhead (such as tag names and formatting), in our case more than 100%. This problem is partly solved by using compression.
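The overhead and the effect of compression can be made concrete with a small Python sketch. It is not the C# implementation used in the project: the tag names and the synthetic data are invented for the illustration, and zlib's DEFLATE stands in for whatever compression the server and client agree on.

```python
import zlib
import xml.etree.ElementTree as ET


def build_sample(n_posts):
    """Build a synthetic XML document; return (xml bytes, payload byte count)."""
    root = ET.Element("posts")
    payload = 0
    for i in range(n_posts):
        post = ET.SubElement(root, "post")
        # Hypothetical field names, chosen only for this example.
        fields = (("blogname", f"Blog {i}"),
                  ("title", f"Post number {i}"),
                  ("country", "SE"))
        for tag, value in fields:
            ET.SubElement(post, tag).text = value
            payload += len(value.encode("utf-8"))
    return ET.tostring(root, encoding="utf-8"), payload


def overhead_ratio(xml_bytes, payload):
    """Markup bytes relative to payload bytes; 1.0 means 100% overhead."""
    return (len(xml_bytes) - payload) / payload


xml_bytes, payload = build_sample(200)
compressed = zlib.compress(xml_bytes, 9)
print(f"overhead: {overhead_ratio(xml_bytes, payload):.0%}")
print(f"compressed to {len(compressed) / len(xml_bytes):.0%} of original size")
```

Because the markup is highly repetitive it compresses well, which is why compression removes most of the cost of the verbose format. The exact figures depend entirely on the data and should not be read as measurements from the project.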


(47) Part III. Design and implementation


(49) Chapter 6. The framework for the application

6.1. Screensaver

Since one of the basic conditions for the project is that it should be run as a screensaver, solving this issue had a high priority. To avoid problems with integrating screensaver functionality into a fairly large application, the integration was done at the very beginning of the project. To do this some research was made into the structure of screensavers. Although a screensaver could be as simple as a regular executable with a changed filename extension, implementing full screensaver capabilities is not trivial. Making the project a screensaver was a requirement, but was not the focus; therefore we chose to search for a working open source screensaver which could be altered to suit our needs. This way we would save valuable project time, avoid reinventing the wheel, and get a base which is well tested.

The screensaver of choice was found on the website 'The Code Project' [16]. It has the advantage of having good multiple monitor support, it can be run both in a window and in full screen, and finally it implements all special screensaver functionality such as the settings dialog. The project is also fairly well structured and not too hard to get acquainted with.

The screensaver design was structured so that the actual drawing is separated from the screensaver layer. This was a good structure but, since we could not settle for just regular drawing, changes had to be made in all layers. Modifying the base screensaver layer was complicated by the fact that the application was designed to be very versatile when it comes to how windowing is done. The screensaver used in this project can be run in three basic modes: as a common window, as a screensaver and as a preview. In addition to this, in screensaver mode it can be run either as one window spanning all screens or as multiple windows with one on each screen. This gives four different setups which had to be modified to implement a Tao OpenGL window. Doing this

(50) was fairly uncomplicated, the only major problems being that the preview window did not work properly and that a bug in the implementation of Tao forced us to make the OpenGL window slightly larger than the screen. We also had to change the registration of mouse and keyboard events to listen to the OpenGL context instead of the frame beneath. In doing this it was discovered that not all functionality is properly implemented in Tao. Some keyboard functionality proved not to be working properly.

6.2. Structure

It was considered important to have a good structure for the application before starting to implement features. At the bottom of the application is the screensaver framework mentioned in chapter 6.1. On top of this is a central core of functionality. This takes care of central tasks such as coordinating the application, rendering, passing data, timing, and keyboard and mouse interaction. Attached to the core are the main data reader, a number of visualizations and various support functionality. The program structure can be seen in appendix A.

When the application is run it has an initial phase and a running phase. As much of the calculation as possible should be done in the initial phase rather than in the running phase, since memory is not a limiting factor whereas the processor and the graphics processing unit (GPU) are scarce resources when drawing frames at high frequency. A delay at program start is clearly less annoying than slower rendering.

The incoming data used in the application is provided as XML files polled from the server. The files downloaded contain a number of blog posts collected over some time period and header data including 'time to next file' and 'total number of blog posts in the database'. The 'time to next file' variable is a number of seconds used to tell the application how long to wait until it requests a new file, and the 'total number of blog posts in the database' variable is used for displaying statistics. The main part of the file consists of blog posts. Each post has basic information consisting of blog name and id, and post title and id. Many posts have localization data such as latitude, longitude, country, region and city. With this information comes a date when the post was created, language information and a list of links, if available.
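The file layout described above can be sketched as follows in Python. The element and attribute names (`feed`, `timeToNextFile`, `totalPosts`, `post`, `location`, `link`) are hypothetical, since the actual schema is not given in the text; only the kinds of fields are taken from the description.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical file layout; the real element and attribute names are not known.
SAMPLE = """<feed timeToNextFile="120" totalPosts="1234567">
  <post id="p1" blogId="b1" blogName="Example blog" title="Hello"
        created="2007-01-15T12:00:00" language="sv">
    <location lat="58.59" lon="16.18" country="Sweden" city="Norrkoping"/>
    <link>http://example.org/other-blog</link>
  </post>
  <post id="p2" blogId="b2" blogName="Another blog" title="No location"
        created="2007-01-15T12:00:05" language="en"/>
</feed>"""


@dataclass
class BlogPost:
    post_id: str
    blog_id: str
    blog_name: str
    title: str
    created: str
    language: Optional[str] = None
    lat: Optional[float] = None
    lon: Optional[float] = None
    links: List[str] = field(default_factory=list)


def parse_feed(xml_text):
    """Return (seconds to wait, total post count, list of BlogPost)."""
    root = ET.fromstring(xml_text)
    wait = int(root.get("timeToNextFile"))
    total = int(root.get("totalPosts"))
    posts = []
    for el in root.findall("post"):
        loc = el.find("location")  # localization data is optional
        posts.append(BlogPost(
            post_id=el.get("id"),
            blog_id=el.get("blogId"),
            blog_name=el.get("blogName"),
            title=el.get("title"),
            created=el.get("created"),
            language=el.get("language"),
            lat=float(loc.get("lat")) if loc is not None else None,
            lon=float(loc.get("lon")) if loc is not None else None,
            links=[link.text for link in el.findall("link")],
        ))
    return wait, total, posts
```

A client would call `parse_feed` on each downloaded file, queue the posts for display, and wait the returned number of seconds before requesting the next file.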

(51) Chapter 7. Version 1

7.1. Goal

The goal of the first version was set in cooperation with the CEO of Primelabs. Different ideas were evaluated on the criteria of visual impact, information gain, feasibility and time consumption. The application need not be a screensaver; an ordinary application would be enough. No special start-up sequence is needed and no customization or user handling should be in this version.

The first priority was reading data from a web server, using compression, and finally anti-aliasing. Since the application should work on a computer anywhere in the world, reading from a web server was essential. As shown earlier in chapter 1.4.2, the amount of data that has to be transferred is large. The data proved to be compressible down to 10% of its previous size, hence this is a technology that should be included. To make the visualization look good, and to be able to use small details, anti-aliasing is crucial.

The second priority was the basic elements needed to make the actual visualization. The amount of screen space available is limited, hence there is only room for one primary visualization. Having two would also make the application less coherent. The Earth was selected over the tornado since the latter lacks geographical representation. The Earth should be rendered as small disks evenly distributed on the landmasses. The height of a disk (or rather cylinder) represents the number of blog posts that have arrived in that location, slowly decreasing over time. To even out the height of the cylinders a non-linear function was to be used. When blog posts arrive the corresponding cylinder pops up and flashes. The Earth should be slowly spinning, except when the user interacts using the mouse to spin the globe. Apart from the globe there should also be a Twingly logotype, a digital clock and a counter showing the total number of blog posts in the database. The counter should increase using an interpolation function to get a continuous

(52) Figure 7.1.1: Sketch of the layout and graphical elements in version 1.

update, since the data flow isn't continuous. The font face used was to be 'Arial'. Finally shading and coloring should be decided in cooperation with Primelabs.

Priority three was a 'statistics ring' placed around the globe showing the proportion of blog posts written in different languages. Text labels should be displayed showing the names of the languages. It is important to visualize the connection between incoming blog posts and the ring. Hence the ring should only depict the current state.

The last priority was a list showing the titles of incoming blog posts. This should be a fast scrolling list giving the feeling of movement in the blogosphere. A sketch of version one can be seen in figure 7.1.1.

7.2. Choices and issues

7.2.1 Handling time in the system

Every blog post carries a time stamp that shows what time it was registered in the database. To avoid downloading many small files the blog posts are collected over a longer time span (e.g. two minutes) and are then made available to the client. For this reason the blog posts in one data file should not all be visualized at the moment the file becomes available.

To solve this problem, in a first approach the system used a time delay. The delay in the system was set when the very first blog post arrived. When the main application requested all due blog posts it dequeued all blog posts having

(53) a time stamp smaller than the present time, with a compensation for the system time delay. This way the posts were displayed at approximately the same rate as they arrived in the database, but later.

In theory this was a good solution to the problem. It would mirror reality as closely as possible. Some testing proved that in real life this solution had some drawbacks. The time stamps in the blog posts do not properly reflect the time when the post was written, but rather the time when the data processing clients have finished their tasks. For this reason many of the blog posts tend to appear at the same instant and at the same location.

To solve this problem a simplified version was implemented. This evened out the flow of blog posts so that they are shown at an even pace. To make it look less calculated, a randomization was used to vary the pace slightly. The fact that many blog posts from the same location tend to end up after one another made it clear that the order of the blog posts had to be randomized as well. The consequence of using this method is that the visualization does look more realistic. We get a more lively view, but at the same time it is slightly less connected to the present reality.

7.2.2 Data reading

The data reader takes care of file transfer, parsing and queuing of the data. In the data reader extensive exception handling has been necessary. Transferring files over the Internet is unreliable and files can arrive damaged, empty or even not at all. For our application to be stable all possible scenarios have to be handled. In this context a decision was made not to implement any retransmissions. Erroneous data is simply dropped and execution continues. Since each single package of data is not critical for the working of the application, this is a feasible solution.

7.2.2.1 Compression

File transfer as well as compression had to be designed in close cooperation with Primelabs, because the provider side and the application must be compatible. We chose to use a third party package from the company ionic [7] for doing the compression. The package uses zip for compression, a well known format which can be read by most decompression tools. Not using a proprietary standard makes it easier both to test the application and to make changes, for example on the server side, without affecting the client. When testing the application it turned out that the package lacks a cyclic redundancy check (CRC). For this reason data was often accepted despite being corrupted, which complicated our debugging of the file transfer.

7.2.2.2 File transfer

At first, fetching the file from the Internet was done using the file transfer protocol (FTP) for the data transfer. FTP was chosen in cooperation with Primelabs

(54) since it is a well tested protocol for file transfer, but was swiftly abandoned since it uses a port number which is often blocked in firewalls because FTP is often considered to be insecure. FTP also demands some sort of login, even when it is used anonymously. For the same reason FTP is session-oriented, which means resources must be allocated for each connection. Instead the hypertext transfer protocol (HTTP) was used. Most of the problems for our application inherent in FTP are not present when using HTTP. HTTP is not session-oriented and demands no login. It also uses a port number hardly ever blocked in a firewall, since it is used for web browsing.

7.2.3 Earth

7.2.3.1 Creating Earth

When it was decided that the Earth should be rendered as small disks placed a small distance from each other, how to spread the disks evenly was our greatest problem. The algorithm used had to give a good result both at the equator and at the poles. Some ideas for a solution were found on a web page concerning algorithms for packing disks on a sphere [18]. Using this information we decided to use an adapted version of the Saff and Kuijlaars algorithm.

From the solid round surface the shape of the landmasses had to be cut. As decided earlier no seas should be visible, hence these disks should not be drawn. This was accomplished by using a texture image of Earth, see figure 7.2.1. For each disk a look-up was made in the texture to find whether the disk is located on a landmass or in the sea.

The way Earth is drawn creates some challenges. First of all small islands, and even some rather large islands, are not drawn. This is partly deliberate, to make the map clearer, and partly an unwanted effect of the fixed positions of the disks. This could potentially make people upset since the place they are living in is not drawn. In some cases it will also mean their blog posts are discarded since they are considered to be located in the ocean. This problem could partly be solved by editing the texture file and tweaking the shapes of the continents. Another solution would be to find a better algorithm for placing the disks.

Drawing the Earth in this fashion also has the disadvantage of using a very large number of polygons. This is heavy on the graphics card, and testing has shown that this is by far the most resource-consuming task in the application, see appendix C. Obviously this part had to be optimized to increase the frame rate. One idea was to use a display list for storing and later rendering the Earth. However, display lists only store the calls for making the drawing. Hence OpenGL operations that are inefficient will still be inefficient when using a display list. Some of these inefficient operations are OpenGL translate, rotate and color changes, operations which are abundantly used in the application. Hence this idea was not the solution to the problem.
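The disk placement and the land look-up can be sketched as below in Python. The sketch uses the standard Saff and Kuijlaars spiral formula, not the adapted version used in the application, whose details are not given here. The land mask is assumed to be a grid of booleans in a cylindrical projection with row 0 at the north pole and column 0 at longitude -180; a real texture image would be reduced to such a grid first.

```python
import math


def spiral_points(n):
    """Return n (lat, lon) pairs in degrees, spread roughly evenly on a sphere.

    Standard Saff and Kuijlaars spiral; requires n >= 2.
    """
    points = []
    phi = 0.0
    for k in range(1, n + 1):
        h = -1.0 + 2.0 * (k - 1) / (n - 1)   # height from south to north pole
        if k == 1 or k == n:
            phi = 0.0                        # poles: longitude is arbitrary
        else:
            phi = (phi + 3.6 / math.sqrt(n * (1.0 - h * h))) % (2.0 * math.pi)
        lat = math.degrees(math.asin(h))
        lon = math.degrees(phi) - 180.0      # map [0, 360) to [-180, 180)
        points.append((lat, lon))
    return points


def is_land(mask, lat, lon):
    """Look up a position in a boolean grid in cylindrical projection."""
    rows, cols = len(mask), len(mask[0])
    row = min(rows - 1, int((90.0 - lat) / 180.0 * rows))
    col = min(cols - 1, int((lon + 180.0) / 360.0 * cols))
    return mask[row][col]


def land_disks(mask, n):
    """Keep only the disk positions that fall on land according to the mask."""
    return [(lat, lon) for lat, lon in spiral_points(n) if is_land(mask, lat, lon)]
```

The mask resolution and the number of points together determine which islands survive, which is one way to see why small landmasses can disappear with fixed disk positions.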

(55) Figure 7.2.1: Texture of Earth in a standard cylindrical map projection used to place continents on the rendered globe.

7.2.3.2 Visualizing blog posts on Earth

Visualizing the blog posts as cylinders is done in a very straightforward fashion. All blog posts are used to populate a list, where they are grouped by their nearest disk, calculated from their latitude and longitude, see chapter 7.2.3.3. Then the placing of the cylinder is done by regular OpenGL rotate and translate calls.

The displayed height of the cylinders is not a linear representation of the number of posts, because this number varies greatly and locations with few posts would be very diminished next to locations with many posts. Several functions for avoiding this have been tested and evaluated. The function should be designed to grow fast for small data and slower for larger data. It should preferably not grow infinitely. The arctan function has all these properties but was discarded since it is rather computationally heavy. An inverted exponential function also has the desired properties, but proved to be hard to optimize for both small and large data. A logarithm fulfils all demands apart from having an upper limit. It is nevertheless the function used, since even a very large error in the estimated maximum number of posts will have very small consequences.

If posts were allowed to remain in the list forever, new blog posts would have increasingly smaller impact while old posts would be considered as important as new ones. On the other hand, if posts disappeared at too high a rate the extent of the blogosphere would not be shown. Hence a function for making blog posts slowly degrade has been implemented. This way old posts will have gradually smaller and smaller influence.
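The two functions can be sketched in Python as follows. The exact functions and constants used in the application are not stated, so the natural logarithm of one plus the weight and an exponential decay with a configurable half-life are assumptions chosen for the illustration.

```python
import math


def cylinder_height(weight, scale=1.0):
    """Logarithmic height: grows fast for small weights, slowly for large ones.

    log1p(weight) = ln(1 + weight), so an empty location has height 0.
    """
    return scale * math.log1p(weight)


def decay(weight, dt, half_life=3600.0):
    """Reduce a location's weight after dt seconds (assumed exponential decay)."""
    return weight * 0.5 ** (dt / half_life)


def add_post(weight):
    """A newly arrived post adds one unit of weight to its location."""
    return weight + 1.0
```

With the logarithm, going from 1 to 2 posts raises the cylinder far more than going from 100 to 101, and a wrong guess of the maximum number of posts only shifts the tallest cylinders slightly.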

Figure 7.2.2: In a) and b) a time-based function is used for animation. As seen in b) there is a glitch when two blog posts appear shortly after one another. The same situations using a PI-regulator are shown in c) and d). No glitch appears.

The arrival of new blog posts should be very visible. On each new blog post the cylinder should pop up and flash in a different color. This presented some challenges. The most obvious solution to the pop effect would have been to make each jump look the same. This was not an entirely good solution, since it would look fairly bad when new posts arrive before the previous jump is finished. Instead it was decided to use a PI-regulator. Such a regulator will always regulate towards the correct value, if properly implemented. Hence a new blog post simply changes the target value for the regulator. An illustration of these two methods can be seen in figure 7.2.2.

The PI-regulator was set up in a slightly unorthodox way. A normal regulator would aim at regulating fast with as little overshoot as possible. In our case the aim was to regulate so slowly that it would be visible, while still giving a fast feel, and with so much overshoot that the jump effect would look good. Hence the proportional factor was lowered and the integrating factor was increased, while still being safely under the instability limit.

Regulating in this way gave very nice results, but combining it with the logarithm function made the jumps invisible for large values. To solve this issue the integrating variable was forced to a rather large value on each new blog post. That is, when a blog post arrives at a cylinder the regulator gets a kick, which makes the overshoot proportional to the height of the cylinder.
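The behaviour of such a regulator can be sketched in a few lines of Python. This is an illustration under assumptions, not the code of the application: the gains `kp` and `ki`, the `kick_gain` factor, the fixed time step and the class name `PiRegulator` are all chosen here for the example. The regulator output is treated as the velocity of the displayed height, so a low proportional gain combined with a high integrating gain gives a visible overshoot.

```python
class PiRegulator:
    """Discrete PI regulator that animates a displayed value towards a target."""

    def __init__(self, kp=2.0, ki=40.0, kick_gain=3.0):
        self.kp = kp                # proportional gain (kept low on purpose)
        self.ki = ki                # integrating gain (kept high on purpose)
        self.kick_gain = kick_gain  # assumed size of the kick per unit height
        self.value = 0.0            # currently displayed height
        self.integral = 0.0         # integrating variable
        self.target = 0.0           # height the cylinder should settle at

    def set_target(self, target, kick=True):
        """Called when a new post arrives: only the target changes."""
        self.target = target
        if kick:
            # Force the integrating variable to a large value that is
            # proportional to the cylinder height, so that the jump stays
            # visible for tall cylinders as well.
            self.integral = self.kick_gain * target

    def step(self, dt):
        """Advance the animation by dt seconds and return the new value."""
        error = self.target - self.value
        self.integral += self.ki * error * dt
        self.value += (self.kp * error + self.integral) * dt
        return self.value


if __name__ == "__main__":
    reg = PiRegulator()
    reg.set_target(1.0)
    heights = [reg.step(0.01) for _ in range(1000)]
    print(round(max(heights), 2), round(heights[-1], 3))
```

Because a new post only changes the target and the integrating variable, a second post arriving in the middle of a jump continues from the current height instead of restarting the animation.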

7.2.3.3 Finding the nearest disk to a latitude and longitude

Since the Earth is drawn as a number of small disks, it is necessary to find which disk a certain latitude and longitude maps to. The problem is not trivial, since the function used for placing the disks is not invertible. The problem was solved by using a multidimensional data structure and an algorithm for searching it, for details see appendix B. This algorithm will find the nearest disk within a given radius. The radius helps to make sure that locations close to the sea end up on land.

When developing this algorithm, potential problems with border cases were identified. It was concluded that there was a small risk of the algorithm failing to choose the best disk. To verify that this does not happen, some on-screen testing was done. An example of the results can be seen in figure 7.2.3. These tests did not prove that a failure will never happen, but made it clear that it is unlikely. This, together with the fact that the error will at most be a displacement by one disk, makes the algorithm good enough.

Finally, testing was done using a number of locations all over the globe. Proving beyond any doubt that the algorithm is correct was deemed unfeasible, especially since the algorithm was not designed to be totally accurate. But on-screen testing, seen in figure 7.2.4, gave very good results.

7.2.4 Statistics

As an additional visualization a statistics ring has been implemented. This is basically a pie chart where the center has been cut out, leaving only a narrow ring. The ring has been designed to be general enough to allow an easy change of data. Although the goal was to fill the ring with language data, in the current implementation the ring shows what countries blog posts are written in, since the language data was not available. The statistics ring uses a degrade function, much in the same manner as the blog post list, to slowly fade out old posts.
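How a set of statistics items turns into ring segments can be shown with a small Python sketch. The function name `ring_slices`, the ordering by size and the example numbers are assumptions made for illustration. The weights may be plain post counts or values already reduced by a degrade function.

```python
def ring_slices(weights, start_angle=0.0):
    """Turn item weights into ring segments.

    Returns a list of (label, start, sweep) tuples in degrees,
    largest item first, that together cover the full 360 degrees.
    """
    total = sum(weights.values())
    if total <= 0:
        return []
    slices = []
    angle = start_angle
    for label, weight in sorted(weights.items(), key=lambda item: -item[1]):
        sweep = 360.0 * weight / total
        slices.append((label, angle, sweep))
        angle += sweep
    return slices


if __name__ == "__main__":
    # Illustrative numbers only, not data from the application.
    for segment in ring_slices({"SE": 2, "US": 6}):
        print(segment)
```

Since the function only sees labels and weights, the same code would work unchanged for language data, which matches the aim of keeping the ring general.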
It was desirable to animate the changes in the ring. Using the same type of PI-regulation as on the Earth proved to be a bad choice, since the movement of one slice affected all other slices in the ring. A PI-regulator simply gave too much chaotic movement in the ring. Instead a linear interpolation was used. This has the advantage of being predictable and smooth, but the disadvantage that no animation is visible when the increase in one statistics item is small. To compensate for the lack of movement a color flash was added.

For a slice in the statistics ring to have a pleasing look it should not be too small. To avoid this a cut-off has been used. This moves statistics items which are too small into an 'other' category. When a statistics item is moved it shrinks, and the 'other' category grows at the same rate. The same happens in reverse when a statistics item leaves the 'other' category.

To make the labels of the ring more informative, the ISO country codes available in the data had to be replaced by real country names. Research showed that there was no simple way to use the culture information in the Windows system classes for doing this. Instead ISO code data was fetched from Wikipedia [8]. The names were also edited to be in their most commonly known form.

Figure 7.2.3: Earth with spheres depicting the region where there is a risk of border cases. Pink spheres should always be to the right of cyan spheres for the algorithm to work properly. In this image this is clearly the case.

Figure 7.2.4: Earth with lines for testing the algorithm. Cyan lines are the test positions and yellow lines are the closest disks chosen. As can be seen, the algorithm is working fine.

7.2.5 Text and image drawing

When researching for the project it became obvious that there is no simple way to draw regular flat text on top of the OpenGL contents. Doing this is a necessary part of the project. It turned out that the best way to accomplish this is to create an image object onto which the text is written. This image is then used as a texture on a polygon. The polygon is then placed and scaled so that the text looks as if it has been drawn flat. For all this to work well the image should have a transparent background and the mapping of the image pixels to the screen pixels should be 1:1.

The drawing order of the textures is important to get the transparency to work properly. The textures are only transparent with respect to each other if they are drawn in the right order. This problem becomes apparent when the textures are drawn in the same plane.

Placing and scaling the textures took some effort, since it was decided that it should be possible to place a texture using both 2D coordinates relative to the window and 3D coordinates relative to the OpenGL space. In all cases the pixel mapping had to be 1:1, regardless of the distance to the camera. To simplify the equation it was decided that the camera had to be placed along the x-axis.

7.2.6 Saving and loading data

While development continued it became apparent that the application was more appealing when more accumulated data was visible. It was frustrating to have to start over collecting data each time the application was run. For this reason it was decided, in cooperation with Primelabs, that the current state should be stored on disk when closing the application and then loaded back from disk when starting. This makes the data displayed more loosely connected to what is really happening, since there will be large gaps in time in the data.
Considering the benefit of not having to start over on each program run, this was considered a minor problem. Saving and loading the state was implemented by serializing the needed data and writing it to a file.

A feature requested by Primelabs was the ability to record sequences of data and replay them later. The reason is that it should be possible to demonstrate the application even when there is no internet connection available, or when there are problems on the data provider side. This was implemented by saving incoming data in an XML file formatted like normal input data. Making the application play back the file could then be done simply by using the same code as is used for live data.
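The idea of recording in the input format and replaying through the same code path can be sketched as follows in Python. The XML layout is not given in this section, so the `posts` and `post` elements and the `lat`, `lon` and `country` attributes are a hypothetical schema, and `handle_input`, `Recorder` and `replay` are names invented for this example.

```python
import xml.etree.ElementTree as ET


def handle_input(xml_data, on_post):
    """Shared code path: parse one input document and dispatch each post.

    Accepts str or bytes. The element and attribute names are hypothetical.
    """
    for post in ET.fromstring(xml_data).iter("post"):
        on_post(float(post.get("lat")), float(post.get("lon")), post.get("country"))


class Recorder:
    """Collects incoming posts so that they can be written to a file later."""

    def __init__(self):
        self.root = ET.Element("posts")

    def record(self, xml_data):
        # Keep the posts in the same form as they arrived.
        self.root.extend(list(ET.fromstring(xml_data).iter("post")))

    def save(self, path):
        ET.ElementTree(self.root).write(path, encoding="utf-8", xml_declaration=True)


def replay(path, on_post):
    """Play back a recording by feeding it to the normal input handler."""
    with open(path, "rb") as recording:
        handle_input(recording.read(), on_post)
```

Because the recording has the same shape as live input, the replay function needs no parsing logic of its own, and a fix in the input handler applies to both live and recorded data.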

Figure 7.3.1: A screenshot of the layout as seen in the finalized version 1.

7.3 Outcome

The first version of the application fulfills most of the goals. It reads data with compression, renders and animates the Earth, allows for mouse interaction, and has a post counter and a clock, a logotype and a statistics ring. The layout can be seen in figure 7.3.1. Most of these elements turned out the way they were intended to.

The statistics ring displays what country blogs are written in, rather than languages. Issues on the data provider side with finding what language blog posts were written in made this necessary. Also, the statistics ring has been given a far longer degrade time than was originally planned. This gives a better connection to the cylinders on the Earth, but a weaker connection to incoming data. To compensate for this the ends of the segments flash in response to incoming data.

The title scroll list, which had the lowest priority in this version, was not implemented. There was simply not enough time to finish this feature.

Saving and loading the state was implemented, as was recording and playing back data. These two related features were not included in the original plan, but were considered crucial for the success of the application.

A major issue is the poor performance, manifested in a very low frame rate, typically 12 frames per second (fps) and even slower when interacting. Also
