I would like to take this opportunity to thank my research supervisor, family, and friends for their support and guidance, without which this research would not have been possible.
DECLARATION
I, [type your full first names and surname here], declare that the contents of this dissertation/thesis represent my own unaided work, and that the dissertation/thesis has not previously been submitted for academic examination towards any qualification. Furthermore, it represents my own opinions and not necessarily those of the University.
Signed __________________ Date _________________
ABSTRACT
In the few years since its inception, the World Wide Web (WWW) has grown dramatically in its number of users and servers and in its geographical distribution. It has emerged as a social medium that people use to create, maintain, and join communities. For the first time, these technologies have allowed people of all economic statuses, ages, and backgrounds to feel connected to the modern world. Indeed, the development of Internet technologies has brought convenience and benefits to the world. The proposed research will be based on a mixed methods approach. Mixed methods emerged during the 1990s as a process of combining quantitative and qualitative approaches at different stages within a single research study. Known as the third paradigm in research methodology, after traditional quantitative and qualitative methods, mixed methods research attempts to legitimate the use of multiple approaches in answering research questions. In this case, users were modelled as random walkers that move along the links of the WWW. The precise timing of a visit to a particular web document was ignored, because the topological quantities of interest can be expressed using diffusion time, where each time step corresponds to a single diffusion step. The Python programming language will be used as the web development tool to extract data from web pages. Python is a dynamically typed language with a rich set of native types. Its number hierarchy includes native arbitrary-length integers, hardware-precision floating-point and complex numbers, and library support for rational numbers and arbitrary-precision floating-point numbers.
TABLE OF CONTENTS
ACKNOWLEDGEMENT
DECLARATION
ABSTRACT
Background of the Study
Problem Statement
Purpose of the Study
Research Questions
Risk & Limitation of the study
CHAPTER 2: LITERATURE REVIEW
Network Connection among Web Pages
Related Work
Web Data Extraction Systems
Interaction with Web pages
Generation of a Wrapper
Automation and Scheduling
Data Transformation
Use of Extracted Data
Applications
Summary
Programming with Python
Who's Using Python?
Python Philosophy
Complex Programming Tasks
Advantages of Python
Disadvantages of Python
Web Connectivity Analysis
CHAPTER 3: METHODOLOGY
Research Design
Overview of Qualitative and Quantitative Research Approaches
Benefits and Disadvantages of Mixed Method
Informed Consent
Confidentiality
Validity
Overview of the Mixed Method Research Approach
Data Collection Tool
Data Analysis & Procedures
Web Development Tool Used
CHAPTER 4: FINDINGS AND DISCUSSION
Python: An Ecosystem for Scientific Computing
Changing Landscape
Python: A General Purpose Programming Language
Background and Overview
Numeric and Data
Languages, Environments, and Distributions
Tools
Interactive, exploratory work
Numerical arrays
Data Visualization
Algorithms
Performance
Symbolic manipulation
Documentation
Distributions
Getting Started
CHAPTER 5: CONCLUSIONS
REFERENCES
CHAPTER 1: INTRODUCTION
Background of the Study
Traditional Web search engines take a query as input and produce a set of (hopefully) relevant pages that match the query terms. While useful in many circumstances, search engines have the disadvantage that users have to formulate queries that specify their ...