Main.WhatTM (r1.1 vs. r1.9)
Diffs

 <<O>>  Difference Topic WhatTM (r1.9 - 02 Nov 2010 - GeoffreyRockwell)

META TOPICPARENT TaAbout

Web Mining for Research

Line: 32 to 32

  1. Change Research. Research to help organizations deal with the disruptive changes brought about by the Internet and the Web. For example, if you Google.ca "web research" the top link is to a New Zealand company, WEB Research, that helps "organisations create a platform for innovation during periods of rapid change and uncertainty."(What WEB Does.)
Changed:
<
<
  1. Research Using Web Sources. Academics have serious anxieties about letting students use Web sources in traditional research instead of print resources. Colleagues will ban the Wikipedia as a source in the fond hope that that will force students into the library (which of course is rapidly divesting itself of print resources in order to get more electronic resources.) A reasonable and common approach is to help students understand how to assess Web sources. The second link if you Google "web research" is A Student's Guide to Research with the WWW, a guide that "will help you explore the resources of the World Wide Web for your research, and introduce you to some strategies for evaluating Web sites." Numerous books are also available on this, like Find It Online, though they are often aimed at a broader audience than just students. The seventh Google result is an example of another type of response to the issue of quality research on the web: A Reader's Guide to Canadian Military History, put together by the Library and Archives of Canada to guide readers to appropriate resources on a topic.
>
>
  1. Research Using Web Sources. Academics have serious anxieties about letting students use Web sources in traditional research instead of print resources. Colleagues will ban the Wikipedia as a source in the fond hope that that will force students into the library (which of course is rapidly divesting itself of print resources in order to get more electronic resources.) A reasonable and common approach is to help students understand how to assess Web sources. The second link if you Google "web research" is A Student's Guide to Research with the WWW, a guide that "will help you explore the resources of the World Wide Web for your research, and introduce you to some strategies for evaluating Web sites." Numerous books are also available on this, like Find It Online, though they are often aimed at a broader audience than just students. (See also Online Research for High School and College Students.) The seventh Google result is an example of another type of response to the issue of quality research on the web: A Reader's Guide to Canadian Military History, put together by the Library and Archives of Canada to guide readers to appropriate resources on a topic.

  1. Web Usability Research. Research into how the Web works at the micro level - how people behave when using the web. The third link on Google is to an Alertbox by Jakob Nielsen on Web Research: Believe the Data.
Line: 143 to 143

Credits

Changed:
<
<
Thanks to Terry Flynn, Stéfan Sinclair, Laurence Mussio and others of the SWiiT group for help with this.
>
>
Thanks to Terry Flynn, Stéfan Sinclair, Laurence Mussio and others of the SWiiT group for help with this. Thanks also to Denise Ross.

-- GeoffreyRockwell - 22 Feb 2007


 <<O>>  Difference Topic WhatTM (r1.8 - 27 Apr 2007 - GeoffreyRockwell)

META TOPICPARENT TaAbout

Web Mining for Research

Line: 136 to 136

Search performed Feb. 22nd, 2007 using www.google.ca (note that it was the Canadian Google.) I should note that using Google this way is an example of Web Research where you use Google's Page Ranking as an indication of popularity.

Added:
>
>

See Also


Credits

Thanks to Terry Flynn, Stéfan Sinclair, Laurence Mussio and others of the SWiiT group for help with this.


 <<O>>  Difference Topic WhatTM (r1.7 - 14 Mar 2007 - GeoffreyRockwell)

META TOPICPARENT TaAbout

Web Mining for Research

Line: 138 to 138

Credits

Changed:
<
<
Thanks to Terry Flynn, Stéfan Sinclair, Laurence Mussio and others of the SWiiT group for help with this.
>
>
Thanks to Terry Flynn, Stéfan Sinclair, Laurence Mussio and others of the SWiiT group for help with this.

-- GeoffreyRockwell - 22 Feb 2007


 <<O>>  Difference Topic WhatTM (r1.6 - 05 Mar 2007 - GeoffreyRockwell)

META TOPICPARENT TaAbout

Web Mining for Research

Line: 84 to 84

  • Data Entry Services like Hi-Tech Export which advertises on Google.ca that they do "Online data collection, website mining services in India." They offer Web searching, market research, analysis, and mining.
Changed:
<
<
  • Crawlers and Spiders are tools that harvest information off the Web, often crawling for pages to index for search engines. They are programs that run unattended and are also called bots, ants, or spiders. http://openkapow.comopenkapow]] is a tool for developers to create robots (bots) crawl the web and gather results.
>
>
  • Crawlers and Spiders are tools that harvest information off the Web, often crawling for pages to index for search engines. They are programs that run unattended and are also called bots, ants, or spiders. openkapow is a tool for developers to create robots (bots) that crawl the web and gather results.

  • Metasearch Engines are tools that gather results from other search engines and indexes providing one-stop searching. Dogpile is an example. HighBeam Research is another that is optimized for research.
Line: 94 to 94

  • Text Analysis tools like those available from the Portal let you analyze electronic texts.
Changed:
<
<
>
>

See also Anatomy of Web Intelligence, a white paper that outlines the types of tools in detail.


Conclusions


 <<O>>  Difference Topic WhatTM (r1.5 - 26 Feb 2007 - GeoffreyRockwell)

META TOPICPARENT TaAbout

Web Mining for Research

Line: 80 to 80

What tools are there to help with Web mining research? Some of the types of tools and services include:

Changed:
<
<
  • News Services like the CNW Group provide customized news feeds as a service.
>
>

  • Data Entry Services like Hi-Tech Export which advertises on Google.ca that they do "Online data collection, website mining services in India." They offer Web searching, market research, analysis, and mining.

 <<O>>  Difference Topic WhatTM (r1.4 - 23 Feb 2007 - GeoffreyRockwell)

META TOPICPARENT TaAbout

Web Mining for Research

Changed:
<
<
The Web provides an unprecedented opportunity for humanities, social science and business research. We have never had so much evidence of human behaviour at hand that is so easy to gather and analyze. This white paper outlines what web mining for research could be and how it is relevant to the humanities.
>
>
The Web provides an unprecedented opportunity for humanities, social science, health and business research. We have never had so much evidence of human behaviour at hand that is so easy to gather and analyze. This white paper outlines what web mining for research could be and how it is relevant to the humanities.


Line: 11 to 11

The Web has the following features that make it useful evidence for research into contemporary culture, politics, marketing, and ideas:

  • It is already digital so it is tractable to text analysis and mining techniques
Changed:
<
<
  • It is accessible to crawlers, bots, search engines and aggregators
>
>
  • It is accessible to crawlers, bots, search engines and aggregators so it can be harvested into study sets

  • Much of the evidence is primary in the sense that it is the uninterpreted opinions and behaviour of individuals as opposed to secondary research evidence. In fact you can find both online.
  • It can be considered as published information so you can get personal information without waivers. (That doesn't mean there aren't ethical problems with Web Research.)
  • There is a wealth of useful tools for research using the Web, like search engines and archives
  • Above all, it provides a deep and wide body of evidence of human communication on just about any subject
Changed:
<
<
As Newhagen and Rafaeli put it in a dialogue published in 1996 online:
>
>
As Newhagen and Rafaeli put it in a dialogue about computer-mediated communications that was published in 1996 online:

Not only does it occur on a computer, communication on the Net leaves tracks to an extent unmatched by that in any other context-the content is easily observable, recorded, and copied. Participant demography and behaviors of consumption, choice, attention, reaction, learning, and so forth, are widely captured and logged. (Newhagen and Rafaeli)
Changed:
<
<
There are, however, some limitations, especially for global and historical research. Most of the evidence on the web is recent (since the 1990s) and most of it is written by Westerners with access to the Web.
>
>
There are, however, some limitations, especially for global and historical research. Most of the evidence on the web is recent (since the 1990s) and most of it is written by Westerners with access to the Web. It is therefore less useful for historical research or research about communities without adequate access.

But what is Web Research?

Line: 28 to 28

Senses of "Web Research"

Changed:
<
<
Some of the senses of Web Research are:
>
>
We can survey some of the senses of Web Research by looking at the types of sites listed when you search Google. (Note) The different senses can be summarized into the following types:

Changed:
<
<
  1. Change Research Research to help organizations deal with the disruptive changes brought about by the Internet and the Web. For example, if you Google "web research" the top link is to a New Zealand company, WEB Research, that helps "organisations create a platform for innovation during periods of rapid change and uncertainty."What WEB Does. (Note)
>
>
  1. Change Research. Research to help organizations deal with the disruptive changes brought about by the Internet and the Web. For example, if you Google.ca "web research" the top link is to a New Zealand company, WEB Research, that helps "organisations create a platform for innovation during periods of rapid change and uncertainty."(What WEB Does.)

Changed:
<
<
  1. Research Using Web Sources Academics have serious anxieties about letting students use Web sources in traditional research instead of print resources. Colleagues will ban the Wikipedia as a source in the fond hope that that will force students into the library (which of course is rapidly divesting itself of print resources in order to get more electronic resources.) A reasonable and common approach is to help student understand how to assess Web sources. The second link if you Google "web research" is A Student's Guide to Research with the WWW. " This guide will help you explore the resources of the World Wide Web for your research, and introduce you to some strategies for evaluating Web sites." Numerous books are also available on this, see Find It Online.
>
>
  1. Research Using Web Sources. Academics have serious anxieties about letting students use Web sources in traditional research instead of print resources. Colleagues will ban the Wikipedia as a source in the fond hope that that will force students into the library (which of course is rapidly divesting itself of print resources in order to get more electronic resources.) A reasonable and common approach is to help student understand how to assess Web sources. The second link if you Google "web research" is A Student's Guide to Research with the WWW, a guide that "will help you explore the resources of the World Wide Web for your research, and introduce you to some strategies for evaluating Web sites." Numerous books are also available on this, like Find It Online, though they are often aimed at a broader audience than just students. The seventh Google result is an example of another type of response to the issue of quality research on the web, it is A Reader's Guide to Canadian Military History put together by the Library and Archives of Canada to guide readers to appropriate resources on a topic.

Changed:
<
<
  1. Web Usability Research Research into how the Web works at the micro level - how people behave when using the web. The third link on Google is to an Alertbox by [[http://www.useit.com/jakob/][Jakob Nielsen] on Web Research: Believe the Data.
>
>
  1. Web Usability Research. Research into how the Web works at the micro level - how people behave when using the web. The third link on Google is to an Alertbox by Jakob Nielsen on Web Research: Believe the Data.

Changed:
<
<
  1. Research about the Web Research at the macro level about Internet statistics, how the Web is structured, social networking on the Web, and about Semantic Web research. This is obviously related to sense 3, but is on a different scale and comes from a different tradition. Web usability research is usually coming out of Human Computer Interface research, while research about the web has elements of market research, network research, and statistics. Web usability often uses eyetracking to look at how individuals use a web page, research about the web looks at traffic logs and often graphs them into cyberatlases.
>
>
  1. Research about the Web. Research at the macro level about Internet statistics, how the Web is structured, social networking on the Web, and about Semantic Web research. The fourth and sixth results in Google.ca, for example, link to Semantic Web Research sites, but there are other types of research about the web, like all the research listed on the Complete Guide to Internet Statistics and Research site. This is obviously related to sense 3, but is on a different scale and comes from a different tradition. Web usability research usually comes out of Human Computer Interface research, while research about the web has elements of market research, network research, and statistics. Web usability often uses eyetracking to look at how individuals use a web page, while research about the web looks at traffic logs and often graphs them into cyberatlases.

Changed:
<
<
  1. Web Mining Research that uses the Web as a source of evidence. This is different from 2 - using Web resources, but in traditional craft research ways. Web mining harvests (mines) the Web on a large scale for information and then applies text analysis and statistical techniques to the resulting repositories of unstructured data. Web mining treats the Web as a whole as evidence rather than as an electronic library. It is a form of research The ninth result from Google, the Chilean Centre for Web Research, for example, is doing basic research on "Data mining" and "Data extraction from the web." This is the sense that is developed in this white paper.
>
>
  1. Web Mining. Research that uses the Web as a source of evidence. This is different from sense 2, which uses Web resources but in traditional craft research ways. Web mining harvests (mines) the Web on a large scale for information and then applies text analysis and statistical techniques to the resulting repositories of unstructured data. Web mining treats the Web as a whole as evidence rather than as an electronic library. It is a form of research that has obvious applications to strategic business intelligence and marketing, but is also relevant to humanists and social scientists. The ninth result from Google, the Chilean Centre for Web Research, for example, is doing basic research on "Data mining" and "Data extraction from the web." This is the sense that is developed in this white paper.

Defining Web Mining for Research

Changed:
<
<
Web mining is a subset of Internet Research that is sometimes called text mining or web knowledge-discovery in data mining circles. In the hyped language of knowledge management, knowledge-discovery is the extraction of useful information from heterogeneous data-sets like the Web. Because the Web is mostly unstructured text data with embedded multimedia objects, web mining, typically involves forms of text analysis and mining. We define web mining as having the following phases:
>
>
Web mining is a subset of Internet Research that is sometimes called text mining or web knowledge-discovery in data mining circles. In the hyped language of knowledge management, knowledge-discovery is the extraction of useful information from heterogeneous data-sets like the Web. Because the Web is mostly unstructured text data with embedded multimedia objects, web mining typically involves forms of text indexing, retrieval and processing. The "mining" usually refers to techniques that "discover" patterns in large datasets through statistical or machine learning techniques. We define web mining as having the following phases:

  1. Gathering Web mining involves gathering a subset of the whole web as a study repository. In principle it is possible to use the whole Web and its archives, but that is not practical so most web mining involves carving out a subset that is amenable to analysis and mining technologies. In cases where the studyset is a set of changing documents like RSS feeds it can be called a stream. Gathering is usually done with scrapers or semi-automated techniques.
Line: 50 to 50

  1. Analysis Finally, repositories are studied using text analysis techniques and mining techniques.
Changed:
<
<
Web mining for research is in a tradition of social science and communications research practices that looked at identity and social communication using computer-mediated communications like e-mails, discussion lists, and chat transcripts. Web mining expands the application beyond the social sciences to humanities and business research. The web is being used to study language, popular culture, the flow of ideas, social networks, reception, and corporate communication. It is being used by industry to track their brands and conduct market research. It is being used by strategic communications researchers to study the discourse around companies, services, and markets.
>
>
Web mining for research is in a tradition of social science and communications research practices that looked at identity and social communication using computer-mediated communications like e-mails, discussion lists, and chat transcripts. Web mining expands the application beyond the social sciences to humanities and business research. The web is being used to study language, popular culture, the flow of ideas, social networks, reception, and corporate communication. It is being used by industry to track their brands and conduct market research. It is being used by strategic communications researchers to study the discourse around companies, services, and markets. In short, Web mining is when the Web is treated as primary evidence for gathering and computer-assisted analysis.

Examples of Web Mining for Research

Line: 82 to 82

  • News Services like the CNW Group provide customized news feeds as a service.
Changed:
<
<
  • Data Entry Services like Hi-Tech Export which offers Web research and mining including sarching, market research, analysis, and mining.
>
>
  • Data Entry Services like Hi-Tech Export which advertises on Google.ca that they do "Online data collection, website mining services in India." They offer Web searching, market research, analysis, and mining.

  • Crawlers and Spiders are tools that harvest information off the Web, often crawling for pages to index for search engines. They are programs that run unattended and are also called bots, ants, or spiders. http://openkapow.comopenkapow]] is a tool for developers to create robots (bots) crawl the web and gather results.
Line: 132 to 132

Notes

Changed:
<
<
Search performed Feb. 22nd, 2007. I should note that using Google this way is an example of Web Research where you use Google's Page Ranking as an indication of popularity.
>
>
Search performed Feb. 22nd, 2007 using www.google.ca (note that it was the Canadian Google.) I should note that using Google this way is an example of Web Research where you use Google's Page Ranking as an indication of popularity.

Credits


 <<O>>  Difference Topic WhatTM (r1.3 - 22 Feb 2007 - GeoffreyRockwell)

META TOPICPARENT TaAbout

Web Mining for Research

Line: 6 to 6


Added:
>
>

Why the Web?


The Web has the following features that make it useful evidence for research into contemporary culture, politics, marketing, and ideas:
Changed:
<
<
  • It is digital so it is tractable to text analysis techniques
  • It is accessible to scrapers, bots, search engines and aggregators
  • Much of the evidence is primary in the sense that it is the uninterpreted opinions and behaviour of individuals as opposed to secondary research evidence. In fact you can find both online.
  • It can be considered as published information so you can get personal information without waivers. (That doesn't mean there aren't ethical problems with Web Research.)
  • Above all, it provides a deep and wide body of evidence of human communication on just about any subject
>
>
  • It is already digital so it is tractable to text analysis and mining techniques
  • It is accessible to crawlers, bots, search engines and aggregators
  • Much of the evidence is primary in the sense that it is the uninterpreted opinions and behaviour of individuals as opposed to secondary research evidence. In fact you can find both online.
  • It can be considered as published information so you can get personal information without waivers. (That doesn't mean there aren't ethical problems with Web Research.)
  • There is a wealth of useful tools for research using the Web, like search engines and archives
  • Above all, it provides a deep and wide body of evidence of human communication on just about any subject

As Newhagen and Rafaeli put it in a dialogue published in 1996 online:

Not only does it occur on a computer, communication on the Net leaves tracks to an extent unmatched by that in any other context-the content is easily observable, recorded, and copied. Participant demography and behaviors of consumption, choice, attention, reaction, learning, and so forth, are widely captured and logged. (Newhagen and Rafaeli)
Added:
>
>
There are, however, some limitations, especially for global and historical research. Most of the evidence on the web is recent (since the 1990s) and most of it is written by Westerners with access to the Web.

But what is Web Research?

Web Research typically refers to research about the Web or from the Web. But what if we don't treat the Web as a library and instead treat it as a unique form of evidence with which to understand ourselves?

Line: 27 to 32

  1. Change Research Research to help organizations deal with the disruptive changes brought about by the Internet and the Web. For example, if you Google "web research" the top link is to a New Zealand company, WEB Research, that helps "organisations create a platform for innovation during periods of rapid change and uncertainty."What WEB Does. (Note)
Changed:
<
<
  1. Research Using Web Sources Academics have serious anxieties about letting students use Web sources in traditional research instead of print resources. Colleagues will ban the Wikipedia as a source in the fond hope that that will force students into the library (which of course is rapidly divesting itself of print resources in order to get more electronic resources.) A reasonable and common approach is to help student understand how to assess Web sources. The second link if you Google "web research" is A Stident's Guide to Research with the WWW. " This guide will help you explore the resources of the World Wide Web for your research, and introduce you to some strategies for evaluating Web sites."
>
>
  1. Research Using Web Sources Academics have serious anxieties about letting students use Web sources in traditional research instead of print resources. Colleagues will ban the Wikipedia as a source in the fond hope that that will force students into the library (which of course is rapidly divesting itself of print resources in order to get more electronic resources.) A reasonable and common approach is to help student understand how to assess Web sources. The second link if you Google "web research" is A Student's Guide to Research with the WWW. " This guide will help you explore the resources of the World Wide Web for your research, and introduce you to some strategies for evaluating Web sites." Numerous books are also available on this, see Find It Online.

Changed:
<
<
  1. Web Usability Research Research into how the Web works at the micro level - how people behave when using the web. The third link on Google is to an Alertbox by [[http://www.useit.com/jakob/][Jakob Nielsen] on Web Resarch: Believe the Data.
>
>
  1. Web Usability Research Research into how the Web works at the micro level - how people behave when using the web. The third link on Google is to an Alertbox by [[http://www.useit.com/jakob/][Jakob Nielsen] on Web Research: Believe the Data.

  1. Research about the Web Research at the macro level about Internet statistics, how the Web is structured, social networking on the Web, and about Semantic Web research. This is obviously related to sense 3, but is on a different scale and comes from a different tradition. Web usability research is usually coming out of Human Computer Interface research, while research about the web has elements of market research, network research, and statistics. Web usability often uses eyetracking to look at how individuals use a web page, research about the web looks at traffic logs and often graphs them into cyberatlases.
Line: 37 to 42

Defining Web Mining for Research

Changed:
<
<
Web mining is a subset of Internet Research that is sometimes called text mining or web knowledge-discovery in data mining circles. In the hyped language of [[http://en.wikipedia.org/wiki/Knowledge_management][knowledge management], knowledge-discovery is the extraction of useful information from heterogeneous data-sets like the Web. Because the Web is mostly unstructured text data with embedded multimedia objects, web mining, typically involves forms of text analysis and mining. We define web mining as having the following phases:
>
>
Web mining is a subset of Internet Research that is sometimes called text mining or web knowledge-discovery in data mining circles. In the hyped language of knowledge management, knowledge-discovery is the extraction of useful information from heterogeneous data-sets like the Web. Because the Web is mostly unstructured text data with embedded multimedia objects, web mining, typically involves forms of text analysis and mining. We define web mining as having the following phases:

  1. Gathering Web mining involves gathering a subset of the whole web as a study repository. In principle it is possible to use the whole Web and its archives, but that is not practical so most web mining involves carving out a subset that is amenable to analysis and mining technologies. In cases where the studyset is a set of changing documents like RSS feeds it can be called a stream. Gathering is usually done with scrapers or semi-automated techniques.
Line: 45 to 50

  1. Analysis Finally, repositories are studied using text analysis techniques and mining techniques.
Changed:
<
<
Web mining for research has evolved out of social science and communications research practices that looked at identity and social communication using computer-mediated communications like e-mail and chat transcripts.
>
>
Web mining for research is in a tradition of social science and communications research practices that looked at identity and social communication using computer-mediated communications like e-mails, discussion lists, and chat transcripts. Web mining expands the application beyond the social sciences to humanities and business research. The web is being used to study language, popular culture, the flow of ideas, social networks, reception, and corporate communication. It is being used by industry to track their brands and conduct market research. It is being used by strategic communications researchers to study the discourse around companies, services, and markets.

Added:
>
>

Examples of Web Mining for Research


Added:
>
>
Journals like the Journal of Computer-Mediated Communication have examples of published research that uses the Internet as evidence. We can imagine uses in the humanities and business:

Changed:
<
<

Bibliography

>
>
  • Studying Popular Culture is possible using the Web. You can track the controversy around the Dixie Chicks and their comments about George W. Bush through the news stories and blogs about them.

Changed:
<
<
White Papers by Marcus P. Zillman includes links to white papers and annotated link compilations on subjects like Business Intelligence Online Resources.
>
>
  • Studying the Epidemiology of Ideas is possible using the Web. You can look at how "postmodernism" appears in academic discourse - how it is taught and how it is discussed outside the academy.

Changed:
<
<
Complete Guide to Internet Statistics and Research This site is a good place to start looking for statistics about the Web.
>
>
  • Studying Everyday Language Use is possible with the Web. You can gather examples of usage of words or patterns in blogs or discussion lists.

Changed:
<
<
Mann, Chris and Stewart, Fiona. Internet Communication and Qualitative Research: A Handbook for Researching Online. SAGE: London, 2000. On Amazon
>
>
  • Studying a Brand like "Nike" is possible by gathering examples of how the brand is discussed by a target population in their blogs or web sites.

Changed:
<
<
Schlein, Alan M. Find It Online. 4th Ed. Facts on Demand Press: Tempe, AZ, 2004. On Amazon
>
>
  • Studying a Community like Hamilton teens is possible if you can identify teen web sites and blogs from Hamilton. What are they interested in? How are their concerns different from Toronto teens?

Changed:
<
<
Jones, Steve. Doing Internet Research: Critical Issues and Methods for Examining the Net. SAGE: Thousand Oaks, CA, 1999. On Amazon
>
>

Philosophical Analysis and the Web


Changed:
<
<
Neuendorf, Kimberly A. The Content Analysis Guidebook. SAGE: Thousand Oaks, CA, 2002. On Amazon
>
>
This white paper is interested in Web Mining as a practice for a renewed form of philosophical analysis. Ian Hacking in his book Historical Ontology talks about how we can look at "thick" concepts, analyze them, and look at how they are constructed over time and through discourse. There is an ethical dimension to Hacking's project. He believes it is possible to show through historical ontology "how to understand, act out, and resolve present problems, even when in so doing it (historical ontology) generated new ones." (pp. 24-25) Web mining gives us one way to look at specific choices people make when using concepts on a large scale.

Changed:
<
<
Newhagen, John E. and Rafaeli, Sheizaf. "Why Communication Researchers Should Study the Internet: A Dialogue". (Originally in Journal of Computer-Mediated Communication [Online], 1996, vol. 1, no. 4.) Now available Online. Also at Blackwell Synergy.
>
>
The connection between philosophical analysis and text analysis is made clear in another quote from Hacking:

Philosophical analysis is the analysis of concepts. Concepts are words in their sites. Sites include sentences, uttered or transcribed, always in a larger site of neighborhood, institution, authority, language. If one took seriously the project of philosophical analysis, one would require a history of the words in their sites in order to comprehend what the concept was. But isn't "analysis" a breaking down, a decomposition into smaller parts, atoms? Not entirely; for example, "analysis" in mathematics denotes the differential and integral calculus, among other things. Atomism is one kind of analysis ... (p. 68)

Web mining and text analysis let us look at "words in their sites", something philosophers have only done anecdotally until now. It isn't the end of philosophical analysis, but it is a way to get at the way concepts are used.
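
One common text analysis technique for seeing "words in their sites" is a keyword-in-context (KWIC) concordance, which lists each occurrence of a word with the text around it. The following is only a minimal sketch in Python, not a description of any tool mentioned in this paper. The function name kwic, the width parameter, and the tiny study set are all hypothetical and made up for illustration.

```python
import re


def kwic(texts, keyword, width=30):
    """Return (source, left context, matched word, right context) tuples.

    texts   -- dict mapping a source label to its plain text (hypothetical study set)
    keyword -- the word to look up, matched case-insensitively as a whole word
    width   -- number of characters of context to keep on each side
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    hits = []
    for source, text in texts.items():
        for match in pattern.finditer(text):
            left = text[max(0, match.start() - width):match.start()]
            right = text[match.end():match.end() + width]
            hits.append((source, left.strip(), match.group(0), right.strip()))
    return hits


# A made-up study set standing in for gathered blog and list postings.
study_set = {
    "blog-a": "Some say postmodernism is over. Others teach postmodernism still.",
    "list-b": "I never understood what Postmodernism meant.",
}

for source, left, word, right in kwic(study_set, "postmodernism", width=20):
    print(f"{source}: {left} [{word}] {right}")
```

Keeping the source label with each hit is a deliberate choice here, since the site of a word (which blog, which list) is part of what the analysis is after.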

Tools and Services

What tools are there to help with Web mining research? Some of the types of tools and services include:

  • News Services like the CNW Group provide customized news feeds as a service.

  • Data Entry Services like Hi-Tech Export offer Web research and mining services, including searching, market research, and analysis.

  • Crawlers and Spiders are tools that harvest information off the Web, often crawling for pages to index for search engines. They are programs that run unattended and are also called bots or ants. openkapow (http://openkapow.com) is a tool for developers to create robots (bots) that crawl the web and gather results.

  • Metasearch Engines are tools that gather results from other search engines and indexes providing one-stop searching. Dogpile is an example. HighBeam Research is another that is optimized for research.

  • Aggregators are tools that gather information from multiple web sites into a collection for analysis. The TAPoRware Googlizer (http://taporware.mcmaster.ca/~taporware/otherTools/googlizer.shtml) and Aggregator are examples.

  • Web Clipping tools like NetSnippets let you clip Web pages or passages from them to a personal collection that can then be organized and shared.

  • Text Analysis tools like those available from the Portal let you analyze electronic texts.
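To make the idea of an aggregator concrete, here is a minimal sketch that turns several pages into one plain-text collection ready for analysis. It is not how the TAPoRware tools work. The class and function names are illustrative assumptions, the example.org addresses are placeholders, and the pages are given as in-memory strings so that no network access is needed. A real aggregator would first fetch the pages.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the visible text of a page, skipping script and style contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def page_text(html):
    """Return the text content of one HTML page as a single string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

def aggregate(pages):
    """Map each source address to its extracted text, forming one collection."""
    return {address: page_text(html) for address, html in pages.items()}

# Hypothetical pages standing in for fetched Web sources.
pages = {
    "http://example.org/a": "<html><head><style>p{}</style></head>"
                            "<body><h1>Web mining</h1><p>Text as evidence</p></body></html>",
    "http://example.org/b": "<p>Second page</p><script>var x = 1;</script>",
}
collection = aggregate(pages)
```

The resulting collection keeps the source address with each text, so later analysis can still report which site a passage came from.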

Conclusions

The Summit on Digital Tools for the Humanities called for the development of four types of tools, including tools for the "Exploration of Resources" which would allow for the aggregation of studysets, the sharing of studysets, the exploration of large studysets, and the visualization and presentation of such sets. (Final Report) At the moment people who do Web research, whether informally or formally, do so using whatever is at hand, from Google to personal text analysis tools. The potential for Web mining for research depends on:

  • the discussion of web research methodologies and the web as evidence,
  • the development of better tools, especially gathering and aggregating tools, and
  • these tools being interoperable so researchers can combine them in innovative ways.

Research in the digital age will not simply be a matter of using online facsimiles of print resources, or creating ever larger textbases of the same stuff we studied before. Digital research has to consider the Web as evidence of human behaviour, art and commerce. Our research practices have to evolve to not just use the web but to think through it.

Bibliography

Complete Guide to Internet Statistics and Research This site is a good place to start looking for statistics about the Web.

Hacking, Ian. Historical Ontology. Harvard University Press: Cambridge, MA, 2002. HUP Site

Jones, Steve. Doing Internet Research: Critical Issues and Methods for Examining the Net. SAGE: Thousand Oaks, CA, 1999. On Amazon


Journal of Computer-Mediated Communication has articles both about Internet communications research methods and articles that are based on computer-mediated communication methods.

Added:
>
>
Mann, Chris and Stewart, Fiona. Internet Communication and Qualitative Research: A Handbook for Researching Online. SAGE: London, 2000. On Amazon

Moretti, Franco. Graphs, Maps, Trees: Abstract Models for Literary History. Verso, 2005. On Amazon
Added:
>
>
Neuendorf, Kimberly A. The Content Analysis Guidebook. SAGE: Thousand Oaks, CA, 2002. On Amazon

Newhagen, John E. and Rafaeli, Sheizaf. "Why Communication Researchers Should Study the Internet: A Dialogue". (Originally in Journal of Computer-Mediated Communication [Online], 1996, vol. 1, no. 4.) Now available Online. Also at Blackwell Synergy.

Schlein, Alan M. Find It Online. 4th Ed. Facts on Demand Press: Tempe, AZ, 2004. On Amazon

Summit on Digital Tools for the Humanities: Report on Summit Accomplishments

Zillman, Marcus P. White Papers by Marcus P. Zillman includes links to white papers and annotated link compilations on subjects like Business Intelligence Online Resources.


Notes

Search performed Feb. 22nd, 2007. I should note that using Google this way is an example of Web Research where you use Google's Page Ranking as an indication of popularity.

Added:
>
>

Credits

Thanks to Terry Flynn, Stéfan Sinclair, Laurence Mussio and others of the SWiiT group for help with this.


-- GeoffreyRockwell - 22 Feb 2007

 <<O>>  Difference Topic WhatTM (r1.2 - 22 Feb 2007 - GeoffreyRockwell)

META TOPICPARENT TaAbout
Changed:
<
<

Web Mining for Research

>
>

Web Mining for Research


Changed:
<
<
The Web provides an unprecedented opportunity for research. The Web has the following features that make it useful evidence for research into contemporary culture:
>
>
The Web provides an unprecedented opportunity for humanities, social science and business research. We have never had so much evidence of human behaviour at hand that is so easy to gather and analyze. This white paper outlines what web mining for research could be and how it is relevant to the humanities.


The Web has the following features that make it useful evidence for research into contemporary culture, politics, marketing, and ideas:


  • It is digital so it is tractable to text analysis techniques
  • It is accessible to scrapers, bots, search engines and aggregators
Changed:
<
<
  • Much of the evidence is primary in the sense that it is the uninterpreted opinions as opposed to secondary research evidence. In fact you can find both online.
  • Much of it
>
>
  • Much of the evidence is primary in the sense that it is the uninterpreted opinions and behaviour of individuals as opposed to secondary research evidence. In fact you can find both online.
  • It can be considered as published information so you can get personal information without waivers. (That doesn't mean there aren't ethical problems with Web Research.)
  • Above all, it provides a deep and wide body of evidence of human communication on just about any subject

As Newhagen and Rafaeli put it in a dialogue published in 1996 online:

Not only does it occur on a computer, communication on the Net leaves tracks to an extent unmatched by that in any other context-the content is easily observable, recorded, and copied. Participant demography and behaviors of consumption, choice, attention, reaction, learning, and so forth, are widely captured and logged. (Newhagen and Rafaeli)

But what is Web Research?

Changed:
<
<
Web Research typically refers to research about the Web or for the Web. Here I will focus on using the Web as evidence to study contemporary politics, culture, and ideas.
>
>
Web Research typically refers to research about the Web or from the Web. But what if we don't treat the Web as a library and instead treat it as a unique form of evidence with which to understand ourselves?

Senses of "Web Research"


Some of the senses of Web Research are:

Changed:
<
<
  • Change Research Research to help organizations deal with the disruptive changes brought about by the Internet and the Web. For example, if you Google "web research" the top link is to a New Zealand company, WEB Research, that helps "organisations create a platform for innovation during periods of rapid change and uncertainty." (What WEB Does.) (Note 1)
>
>
  1. Change Research Research to help organizations deal with the disruptive changes brought about by the Internet and the Web. For example, if you Google "web research" the top link is to a New Zealand company, WEB Research, that helps "organisations create a platform for innovation during periods of rapid change and uncertainty." (What WEB Does.) (Note)

  1. Research Using Web Sources Academics have serious anxieties about letting students use Web sources in traditional research instead of print resources. Colleagues will ban the Wikipedia as a source in the fond hope that that will force students into the library (which of course is rapidly divesting itself of print resources in order to get more electronic resources). A reasonable and common approach is to help students understand how to assess Web sources. The second link if you Google "web research" is A Student's Guide to Research with the WWW: "This guide will help you explore the resources of the World Wide Web for your research, and introduce you to some strategies for evaluating Web sites."

  1. Web Usability Research Research into how the Web works at the micro level - how people behave when using the web. The third link on Google is to an Alertbox by Jakob Nielsen (http://www.useit.com/jakob/) on Web Research: Believe the Data.

  1. Research about the Web Research at the macro level about Internet statistics, how the Web is structured, social networking on the Web, and about Semantic Web research. This is obviously related to sense 3, but is on a different scale and comes from a different tradition. Web usability research usually comes out of Human Computer Interface research, while research about the web has elements of market research, network research, and statistics. Web usability often uses eyetracking to look at how individuals use a web page, while research about the web looks at traffic logs and often graphs them into cyberatlases.

  1. Web Mining Research that uses the Web as a source of evidence. This is different from sense 2, which uses Web resources but in traditional craft research ways. Web mining harvests (mines) the Web on a large scale for information and then applies text analysis and statistical techniques to the resulting repositories of unstructured data. Web mining treats the Web as a whole as evidence rather than as an electronic library. The ninth result from Google, the Chilean Centre for Web Research, for example, is doing basic research on "Data mining" and "Data extraction from the web." This is the sense that is developed in this white paper.

Defining Web Mining for Research

Web mining is a subset of Internet Research that is sometimes called text mining or web knowledge-discovery in data mining circles. In the hyped language of knowledge management (http://en.wikipedia.org/wiki/Knowledge_management), knowledge-discovery is the extraction of useful information from heterogeneous data-sets like the Web. Because the Web is mostly unstructured text data with embedded multimedia objects, web mining typically involves forms of text analysis and mining. We define web mining as having the following phases:

  1. Gathering Web mining involves gathering a subset of the whole web as a study repository. In principle it is possible to use the whole Web and its archives, but that is not practical so most web mining involves carving out a subset that is amenable to analysis and mining technologies. In cases where the studyset is a set of changing documents like RSS feeds it can be called a stream. Gathering is usually done with scrapers or semi-automated techniques.

  1. Structuring Repositories or streams are often enhanced with metadata, edited to remove irrelevant data, and structured so that they can be manipulated in the analysis phase. Typically repositories are so large they are also pre-indexed for rapid search, retrieval, and mining.

  1. Analysis Finally, repositories are studied using text analysis techniques and mining techniques.
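The three phases above can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function names gather, structure and analyze are hypothetical, the "sources" are short in-memory strings standing in for scraped pages, structuring is reduced to tokenizing and building an inverted index, and analysis is reduced to a word-frequency count.

```python
import re
from collections import Counter, defaultdict

def gather(sources):
    """Gathering: carve out a study repository, dropping empty documents."""
    return [{"id": name, "text": text}
            for name, text in sources.items() if text.strip()]

def structure(repository):
    """Structuring: add token lists as metadata and pre-index the repository.

    Returns an inverted index mapping each word to the ids of the
    documents that contain it, for rapid search and retrieval.
    """
    index = defaultdict(set)
    for doc in repository:
        doc["tokens"] = re.findall(r"[a-z']+", doc["text"].lower())
        for token in doc["tokens"]:
            index[token].add(doc["id"])
    return index

def analyze(repository, top=3):
    """Analysis: a simple word-frequency count across the whole repository."""
    counts = Counter()
    for doc in repository:
        counts.update(doc["tokens"])
    return counts.most_common(top)

# Hypothetical documents standing in for a gathered subset of the Web.
sources = {
    "doc1": "Web mining treats the Web as evidence.",
    "doc2": "Mining the Web for evidence of human behaviour.",
    "doc3": "   ",
}
repository = gather(sources)
index = structure(repository)
print(analyze(repository, top=1))
```

Keeping the phases as separate functions reflects the point that gathering, structuring, and analysis can each be done by a different tool, provided the tools can pass the repository from one to the next.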

Web mining for research has evolved out of social science and communications research practices that looked at identity and social communication using computer-mediated communications like e-mail and chat transcripts.

Bibliography

White Papers by Marcus P. Zillman includes links to white papers and annotated link compilations on subjects like Business Intelligence Online Resources.

Complete Guide to Internet Statistics and Research This site is a good place to start looking for statistics about the Web.

Mann, Chris and Stewart, Fiona. Internet Communication and Qualitative Research: A Handbook for Researching Online. SAGE: London, 2000. On Amazon

Schlein, Alan M. Find It Online. 4th Ed. Facts on Demand Press: Tempe, AZ, 2004. On Amazon

Jones, Steve. Doing Internet Research: Critical Issues and Methods for Examining the Net. SAGE: Thousand Oaks, CA, 1999. On Amazon

Neuendorf, Kimberly A. The Content Analysis Guidebook. SAGE: Thousand Oaks, CA, 2002. On Amazon

Newhagen, John E. and Rafaeli, Sheizaf. "Why Communication Researchers Should Study the Internet: A Dialogue". (Originally in Journal of Computer-Mediated Communication [Online], 1996, vol. 1, no. 4.) Now available Online. Also at Blackwell Synergy.

Journal of Computer-Mediated Communication has articles both about Internet communications research methods and articles that are based on computer-mediated communication methods.

Moretti, Franco. Graphs, Maps, Trees: Abstract Models for Literary History. Verso, 2005. On Amazon


Changed:
<
<
  • Research Using Web Sources Academics have serious anxieties about letting students use Web sources in traditional research instead of print resources. Colleagues will ban the Wikipedia as a source in the fond hope that that will force students into the library (which of course is rapidly divesting itself of print resources in order to get more electronic resources). A reasonable and common approach is to help students understand how to assess Web sources. The second link if you Google "web research" is A Student's Guide to Research with the WWW: "This guide will help you explore the resources of the World Wide Web for your research, and introduce you to some strategies for evaluating Web sites."
>
>

Notes


Changed:
<
<
Search performed Feb. 22nd, 2007. I should note that using Google this way is an example of Web Research where you use Google's Page Ranking as an indication of popularity.
>
>
Search performed Feb. 22nd, 2007. I should note that using Google this way is an example of Web Research where you use Google's Page Ranking as an indication of popularity.

-- GeoffreyRockwell - 22 Feb 2007


 <<O>>  Difference Topic WhatTM (r1.1 - 22 Feb 2007 - GeoffreyRockwell)
Line: 1 to 1
Added:
>
>
META TOPICPARENT TaAbout

Web Mining for Research

The Web provides an unprecedented opportunity for research. The Web has the following features that make it useful evidence for research into contemporary culture:

  • It is digital so it is tractable to text analysis techniques
  • It is accessible to scrapers, bots, search engines and aggregators
  • Much of the evidence is primary in the sense that it is the uninterpreted opinions as opposed to secondary research evidence. In fact you can find both online.
  • Much of it

But what is Web Research?

Web Research typically refers to research about the Web or for the Web. Here I will focus on using the Web as evidence to study contemporary politics, culture, and ideas.

Some of the senses of Web Research are:

  • Change Research Research to help organizations deal with the disruptive changes brought about by the Internet and the Web. For example, if you Google "web research" the top link is to a New Zealand company, WEB Research, that helps "organisations create a platform for innovation during periods of rapid change and uncertainty." (What WEB Does.) (Note 1)

  • Research Using Web Sources Academics have serious anxieties about letting students use Web sources in traditional research instead of print resources. Colleagues will ban the Wikipedia as a source in the fond hope that that will force students into the library (which of course is rapidly divesting itself of print resources in order to get more electronic resources). A reasonable and common approach is to help students understand how to assess Web sources. The second link if you Google "web research" is A Student's Guide to Research with the WWW: "This guide will help you explore the resources of the World Wide Web for your research, and introduce you to some strategies for evaluating Web sites."

Search performed Feb. 22nd, 2007. I should note that using Google this way is an example of Web Research where you use Google's Page Ranking as an indication of popularity.

-- GeoffreyRockwell - 22 Feb 2007



Revision r1.1 - 22 Feb 2007 - 16:06 - GeoffreyRockwell
Revision r1.9 - 02 Nov 2010 - 23:32 - GeoffreyRockwell