Christoph Olbrich
Seminar paper draft
Seminar aus Informationswirtschaft
o. Univ. Prof. Dkfm. Dr. Wolfgang H. Janko
Univ.-Ass. Mag. Dr. Stefan Koch
Abteilung für Informationswirtschaft
Wirtschaftsuniversität Wien, UZA II 3.Ebene
Augasse 2-6, A-1090 Wien, Austria
The demand for data on the internet has grown enormously in recent
years. Servers have become ever more powerful, and the bandwidth of
both end-user internet connections and the backbones has risen
sharply over the last ten years. Nevertheless, end users repeatedly
experience long waiting times when loading a web page or downloading
a file. The reasons for this poor performance can lie directly at the
server (e.g. poorly performing server-side applications, or flash
crowds) as well as in the network infrastructure (e.g. long geographic
distances, network congestion, etc.).
In this seminar paper I would like to present several techniques for
avoiding such problems and improving the performance of data transfers
over the internet. I will present peer-to-peer networks (P2P),
“content delivery networks” (CDNs) and proxy servers, and give a short
overview of server-side acceleration methods.
Keywords:
Key points for management:
Contents
1 Preface
2 Internet Infrastructure
3 Caching techniques
3.1 Proxy Servers
3.2 Content Delivery Networks (CDNs)
4 Peer-to-Peer networks (P2P)
4.1 Definition
4.2 P2P network architecture
4.2.1 The napster model
4.2.2 The gnutella model
4.2.3 HAN (hierarchical network architecture) model
4.2.4 The BitTorrent model
4.3 P2P network traffic analysis
4.4 P2P networks to handle flash crowds
5 Other techniques to improve internet performance
5.1 Load balancing
5.2 Compression
6 Conclusion
1 Preface
2 Internet Infrastructure
In this chapter I will give a short overview of today's internet infrastructure
and of the processes involved in establishing a connection between two hosts
over the internet.
Important points I will mention:
• backbones, internet service providers, end-user connections
• establishing a connection: TCP/IP, DNS, HTTP.
• static versus dynamic content
• internet bandwidth today
• identifying the bottlenecks
3 Caching techniques
In this chapter I will discuss different caching techniques for static and
dynamic content. For static content I will concentrate on proxy caching
and on a special case of “proxy caching”: content delivery networks.
Dynamic content is harder to cache than static content, because the cache
must always check whether the cached copy is outdated. Dynamic content
caching can be implemented during content generation (e.g. database
caching, caching on the application side), during content delivery, or
even on the client side (e.g. browser caching).
3.1 Proxy Servers
Proxy servers act as intermediaries between users who request content
and servers that serve it. A user does not send a request directly to
the server that serves the content but to the proxy server, which
fetches the requested content and delivers it to the user. In most cases
proxy servers cache the content they fetch. When a user requests content
that the proxy server already holds in its cache, the proxy server
delivers it directly from the cache. This technique
can save bandwidth and reduce response times. But proxy servers can go
further and prefetch popular content, which reduces the response time
even for the first user who requests it.
In this chapter I will describe how a proxy server (e.g. Squid) works
and how prefetching can be realised, and I will present an approach to
caching dynamic content.
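The hit/miss behaviour described above can be illustrated with a minimal sketch. It is a toy in-memory model and not a description of how Squid works: the class name CachingProxy, the origin-fetch callback and the fixed freshness lifetime (ttl_seconds) are assumptions made for illustration only.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple


class CachingProxy:
    """Toy model of a caching proxy (hypothetical, for illustration only)."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes],
                 ttl_seconds: float = 60.0) -> None:
        self._fetch = fetch_from_origin   # assumed callback that contacts the origin server
        self._ttl = ttl_seconds           # assumed fixed freshness lifetime per entry
        self._cache: Dict[str, Tuple[float, bytes]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str, now: Optional[float] = None) -> bytes:
        """Serve from the cache while the entry is fresh, else fetch and store."""
        now = time.monotonic() if now is None else now
        entry = self._cache.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            self.hits += 1
            return entry[1]
        self.misses += 1
        body = self._fetch(url)
        self._cache[url] = (now, body)
        return body

    def prefetch(self, urls: Iterable[str], now: Optional[float] = None) -> None:
        """Warm the cache so that the first real request is already a hit."""
        for url in urls:
            self.get(url, now)
```

A request for a URL that was prefetched, or requested less than ttl_seconds ago, is answered without contacting the origin server. A real proxy would additionally honour the freshness information sent by the origin server instead of a single fixed lifetime.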
3.2 Content Delivery Networks (CDNs)
A content delivery network is a network of servers that mainly cache
static content. The servers are geographically distributed, and the
nearest or least loaded server delivers the content to the user. Because
no single server has to handle all requests and the answering server is
usually located near the user, response times are kept low and bandwidth
is saved. CDNs can be compared with distributor/retailer warehouses
[HoKiSm04], except that they store and distribute content instead of
physical goods.
In this chapter I will describe the different architectural approaches to
content delivery networks (overlay approach versus network approach) and
some optimization models for CDNs.
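The rule "nearest or most idle server" can be sketched as follows. This is only one possible selection policy under stated assumptions: the EdgeServer record, the load threshold max_load and the use of great-circle distance as a stand-in for network proximity are all hypothetical and not taken from any particular CDN.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import Sequence


@dataclass(frozen=True)
class EdgeServer:
    name: str
    lat: float
    lon: float
    load: float  # fraction of capacity in use, 0.0 to 1.0 (assumed metric)


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points (haversine formula)."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))


def pick_server(servers: Sequence[EdgeServer], user_lat: float, user_lon: float,
                max_load: float = 0.8) -> EdgeServer:
    """Nearest server below the load threshold; if none, the most idle server."""
    candidates = [s for s in servers if s.load < max_load]
    if candidates:
        return min(candidates,
                   key=lambda s: distance_km(user_lat, user_lon, s.lat, s.lon))
    return min(servers, key=lambda s: s.load)
```

Geographic distance is used here only because it is easy to compute. A production system would more likely rely on measured latency or network topology.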
4 Peer-to-Peer networks (P2P)
Peer-to-Peer networks changed the storage mode of the internet from a “content
located in center” mode to a “content located in edge” mode [Shir02].
In almost all P2P network models (the napster model is an exception) the
network does not depend on centrally located servers, which could be
single points of failure or potential bottlenecks, but on its peers. This leads
to high scalability and high redundancy.
In this chapter I will discuss different architectural approaches to P2P
networks.
4.1 Definition
In this section I will define P2P networks, since they are used not only
to share files but also to share computing capacity (e.g. SETI@home).
4.2 P2P network architecture
In this section I will describe the most common P2P network architectures.
I will outline the main differences and show the improvements in newer architectures.
4.2.1 The napster model
The napster model is one of the first P2P network models. It needs centrally
located infrastructure to organize its nodes and is therefore not as reliable as
other P2P network architectures that do not need any central server.
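The role of the central infrastructure can be sketched as a directory that maps file names to the peers offering them, while the file transfer itself takes place between peers. The class and method names below are hypothetical and only illustrate the idea. They are not the actual napster protocol.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


class CentralIndex:
    """Toy central directory: file name -> peers that currently offer it."""

    def __init__(self) -> None:
        self._peers_by_file: Dict[str, Set[str]] = defaultdict(set)

    def register(self, peer: str, files: Iterable[str]) -> None:
        """Called when a peer joins and announces its shared files."""
        for name in files:
            self._peers_by_file[name].add(peer)

    def unregister(self, peer: str) -> None:
        """Called when a peer leaves the network."""
        for peers in self._peers_by_file.values():
            peers.discard(peer)

    def lookup(self, name: str) -> List[str]:
        """Return the peers to contact directly for the download."""
        return sorted(self._peers_by_file.get(name, ()))
```

A lookup costs a single request to the index, which explains the good search performance of this model. If the index server fails, however, no peer can find any file, which is the reliability problem mentioned above.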
4.2.2 The gnutella model
The gnutella model does not need any central infrastructure; it is
based entirely on its nodes. Its biggest shortcoming is its poor search
performance.
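Why searching is expensive without a central index can be seen in a small simulation of query flooding with a hop limit (TTL). This is a simplified sketch: the function name, the graph representation and the message counting (every forwarded copy is counted, including copies sent back to a peer that has already seen the query) are assumptions for illustration, not the exact gnutella protocol.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def flood_search(neighbors: Dict[str, Sequence[str]],
                 files: Dict[str, Iterable[str]],
                 start: str, wanted: str, ttl: int) -> Tuple[List[str], int]:
    """Return (peers holding `wanted`, number of query messages sent)."""
    seen = {start}
    frontier = [start]
    hits: List[str] = []
    messages = 0
    while frontier and ttl > 0:
        next_frontier = []
        for peer in frontier:
            for other in neighbors[peer]:
                messages += 1          # every forwarded copy costs one message
                if other in seen:
                    continue           # duplicate query is dropped by the receiver
                seen.add(other)
                if wanted in files.get(other, ()):
                    hits.append(other)
                next_frontier.append(other)
        frontier = next_frontier
        ttl -= 1                       # one hop of the query lifetime is used up
    return hits, messages
```

The number of messages grows with every additional hop, and a file that lies beyond the hop limit is not found at all.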
4.2.3 HAN (hierarchical network architecture) model
The HAN model introduces super nodes into the gnutella model to achieve
significant search performance gains.
4.2.4 The BitTorrent model
4.3 P2P network traffic analysis
This section will contain an analysis of traffic patterns on P2P networks.
4.4 P2P networks to handle flash crowds
In this section I would like to introduce a model according to [RuSaSt04].
5 Other techniques to improve internet performance
In this chapter I want to give a short overview of server-side techniques
to improve WWW performance. Load balancers use different policies to
distribute user requests across several back-end servers. If possible, I would
like to describe the load balancer of the LPIS (Lehrveranstaltungs- und
Prüfungsinformationssystem der Wirtschaftsuniversität Wien).
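Two common distribution policies, round robin and least connections, can be sketched as follows. The class names and the server names in the usage note are hypothetical. The sketch does not describe the load balancer of the LPIS, whose policy is not covered here.

```python
from itertools import cycle
from typing import Dict, Sequence


class RoundRobinBalancer:
    """Hands out back-end servers in a fixed rotating order."""

    def __init__(self, backends: Sequence[str]) -> None:
        if not backends:
            raise ValueError("at least one back-end server is required")
        self._rotation = cycle(list(backends))

    def pick(self) -> str:
        return next(self._rotation)


class LeastConnectionsBalancer:
    """Sends each request to the server with the fewest open connections."""

    def __init__(self, backends: Sequence[str]) -> None:
        if not backends:
            raise ValueError("at least one back-end server is required")
        self._active: Dict[str, int] = {b: 0 for b in backends}

    def acquire(self) -> str:
        # Ties are resolved in favour of the server listed first.
        backend = min(self._active, key=self._active.get)
        self._active[backend] += 1
        return backend

    def release(self, backend: str) -> None:
        self._active[backend] -= 1
```

With the back ends "s1", "s2" and "s3", the round robin balancer returns s1, s2, s3, s1 and so on, regardless of how long each request takes. The least connections policy adapts to requests of different duration, at the cost of tracking open connections.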
Compression can also lead to significant performance improvements. Most
of the pictures used on web sites already use a compressed format such as
JPEG, PNG or GIF, but most of the plain text information travels to the
user in uncompressed form. Software such as the Apache module mod_gzip
compresses the output of the web server before it is delivered to the user. This
technique could significantly reduce the traffic needs of web pages.
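The effect on plain text can be tried out with the gzip module of the Python standard library. This is a minimal sketch and not how mod_gzip is implemented: the helper name compression_report and the sample HTML are assumptions, and the sample is far more repetitive than a real page, so real savings will be smaller.

```python
import gzip
from typing import Tuple


def compression_report(body: bytes, level: int = 6) -> Tuple[int, int, float]:
    """Return (original size, compressed size, fraction of bytes saved)."""
    if not body:
        raise ValueError("body must not be empty")
    compressed = gzip.compress(body, compresslevel=level)
    saved = 1 - len(compressed) / len(body)
    return len(body), len(compressed), saved


# Hypothetical, highly repetitive sample page used only for demonstration.
html = ("<tr><td class='course'>Seminar aus Informationswirtschaft</td></tr>\n"
        * 200).encode("utf-8")
original, compressed, saved = compression_report(html)
print(f"{original} bytes -> {compressed} bytes ({saved:.0%} saved)")
```

Running the same function on an image that is already compressed, such as a JPEG file, would show little or no saving, which is why compression of this kind is mainly applied to text.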
5.1 Load balancing
5.2 Compression
6 Conclusion
Bibliography
[AlLi99]
Albitz, Paul; Liu, Cricket: DNS und Bind. 2nd edition, O’Reilly, Köln
1999.
[Ba01]
Barkai, David: Technologies for sharing and collaborating on the Net.
In: Peer-to-Peer Computing, 2001. Proceedings. First International
Conference on, 27-29 Aug. 2001, pp. 13-28
[BlKuVo03]
Bleich, Holger; Kuri, Jürgen; Vogt, Peter: Zwischen Boom und
Baustopp: Schweinezyklus beim Ausbau der Internet-Backbones. In:
c’t, Magazin für Computertechnik, issue 21, 2003, pp. 184-187
[CaCoYu99]
Cardellini, Valeria; Colajanni, Michele; Yu, Philip S.: Dynamic load
balancing on Web-server systems. In: Internet Computing, IEEE,
Volume: 3, Issue: 3, May-June 1999, pp. 28-39
[ChJuLe]
Chon, Kilnam; Jung, Jaeyeon; Lee, Dongman: Proactive Web
Caching with Cumulative Prefetching for Large Multimedia Data.
http://www9.org/w9cdrom/315/315.html, accessed on 12 April 2004.
[CiKuSo01]
Cidon, Israel; Kutten, Shay; Soffer, Ran: Optimal allocation of
electronic content. In: INFOCOM 2001. Twentieth Annual Joint
Conference of the IEEE Computer and Communications Societies.
Proceedings. IEEE, Volume: 3, 22-26 April 2001, pp. 1773-1780
[DaDuTh03]
Datta, Anindya; Dutta, Kaushik; Thomas, Helen; VanderMeer,
Debra: World Wide Wait: A Study of Internet Scalability and
Cache-Based Approaches to Alleviate It. In: Management Science
Vol. 49, No. 10, October 2003, pp. 1425-1444
[DaDuTh02]
Datta, Anindya; Dutta, Kaushik; Thomas, Helen; VanderMeer,
Debra; Suresha; Ramamritham: Proxy-Based Acceleration of
Dynamically Generated Content on the World Wide Web: An
Approach and Implementation. In: Proc. 2002 ACM SIGMOD
Internat. Conf. Management Data. ACM, Madison, WI, pp. 96-108
[DeSoTi01]
DeSouza, Mohan; Tilley, Scott: Spreading knowledge about Gnutella:
a case study in understanding net-centric applications. In: Program
Comprehension, 2001. IWPC 2001. Proceedings. 9th International
Workshop on, 12-13 May 2001, pp. 189-198
[Fox01]
Fox, Geoffrey: Peer-to-peer networks. In: Computing in Science &
Engineering (see also IEEE Computational Science and Engineering),
Volume: 3, Issue: 3, May-June 2001, pp. 75-77
[HoKiSm04]
Hosanagar, Kartik; Krishnan, Ramayya; Smith, Michael; Chuang,
John: Optimal pricing of content delivery network (CDN) services.
In: System Sciences, 2004. Proceedings of the 37th Annual Hawaii
International Conference on, 5-8 Jan. 2004, pp. 205-214
[HuHuLi03]
Li, Zupeng; Huang, Daoying; Liu, Zinrang; Huang, Jianhua: Research
of peer-to-peer network architecture. In: Communication Technology
Proceedings, 2003. ICCT 2003. International Conference on, Volume:
1, 9-11 April 2003, pp. 312-315
[LaTe01]
Lazar, Irwin; Terrill, William: Exploring Content Delivery
Networking. In: IT Pro, July/August 2001, pp. 47-49
[LinWan03]
Lin, Tsungnan; Wang, Hsinping: Search performance analysis in
peer-to-peer networks. In: Peer-to-Peer Computing, 2003. (P2P
2003). Proceedings. Third International Conference on, 1-3 Sept.
2003, pp. 204-205
[Mark02]
Markatos, Evangelos P.: Tracing a large-scale peer to peer system: an
hour in the life of Gnutella. In: Cluster Computing and the Grid 2nd
IEEE/ACM International Symposium CCGRID2002, 21-24 May
2002, pp. 56-65
[PaSuWh01]
Parameswaran, Manoj; Susarla, Anjana; Whinston, Andrew B.: P2P
networking: an information sharing alternative. In: Computer,
Volume: 34, Issue: 7, July 2001, pp. 31-38
[PaVa03]
Pallis, George; Vakali, Athena: Content delivery networks: status and
trends. In: Internet Computing, IEEE, Volume: 7, Issue: 6,
Nov.-Dec. 2003, pp. 68-74
[Pri03]
Primetrica free resources: Interregional Internet Bandwidth, 2003.
http://www.telegeography.com/ee/free resources/gig2004-02.php,
accessed on April 12, (access only after registration).
[Ripe01]
Ripeanu, Matei: Peer-to-peer architecture case study: Gnutella
network. In: Peer-to-Peer Computing, 2001. Proceedings. First
International Conference on, 27-29 Aug. 2001, pp. 99-100
[RuSaSt04]
Rubenstein, Dan; Sahu, Sambit; Stavrou, Angelos: A lightweight,
robust P2P system to handle flash crowds. In: Selected Areas in
Communications, IEEE Journal on, Volume: 22, Issue: 1, Jan. 2004,
pp. 6-17
[SchoScho02]
Schollmeier, Rüdiger; Schollmeier, Gero: Why peer-to-peer (P2P)
does scale: an analysis of P2P traffic patterns. In: Peer-to-Peer
Computing, 2002. (P2P 2002). Proceedings. Second International
Conference on, 5-7 Sept. 2002, pp. 112-119
[SenWan04]
Sen, Subhabrata; Wang, Jia: Analyzing Peer-To-Peer Traffic Across
Large Networks. In: Networking, IEEE/ACM Transactions on,
Volume: 12, Issue: 2, April 2004, pp. 219-232
[Shir02]
Shirky, Clay: What is p2p and what isn’t. O’Reilly’s Emerging
Technology Conference, May 13-16, 2002
[WebRef]
HTTP Compression Speeds up the Web.
http://www.webreference.com/internet/software/servers/http/compression/.
Accessed on April 10, 2004.