Introduction to PopTarts

Discussion in 'Strategy' started by llHllAllNllDll-of-llGllOIlDll, Jul 9, 2013.

  1. Meeting notes are documents which contain lots of structure. This structure is often implicit in layout and reserved words. On the other hand, since meetings tend to occur regularly and are repeated for long periods of time, this structure is often (semi-)formalized. This makes these documents suitable for automatic semantic annotation efforts.
    We describe the annotation we performed on the notes of more than 20 years of Dutch parliamentary debates. We annotated every word spoken in parliament with 1) the speaker, 2) her party at the time of speaking, 3) her role/function in parliament and 4) the ISO date. These annotations yield numerous new ways of searching, browsing, mining and summarizing these documents.
    Meetings are always too long, whence so are their verbatim notes. But of course they contain valuable information and notes have to be consulted from time to time. In this paper we show that semantic annotation can make finding things easier, and more fun.
    1. INTRODUCTION
    Parliamentary proceedings are an interesting data set to which to apply state-of-the-art information retrieval technology. Parliamentary proceedings are written records of parliamentary activities containing a wide range of document types. In this paper we only discuss notes of meetings of parliament. As with all meeting notes, the purpose of these records is to store the content of the meeting. They have varying degrees of detail. Currently in most Western democracies it is common to transcribe everything that is being said, keeping the content, but making it grammatically correct and pleasant to read.
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
    ESAIR09 2009, Barcelona, in conjunction with WSDM 09 Copyright 2009 ACM ...$5.00.
    We list a number of characteristics which make these documents of special interest to the IR community:
    • large historical corpora: for example, in Holland all data from 1814 will be available in 2010, and at the time of writing data since 1974 is available; for the Flemish parliament all data since 1971 is available in PDF; the British Hansard archives have all parliamentary minutes since 1803 available in XML.
    • documents contain a lot of consistently applied structure which is rather easy to extract and make explicit;
    • transcripts of meetings might be accompanied by audio and video recordings, creating interconnected multimedia data;
    • data integration issues and opportunities both within one country (collections from different periods, in different formats, styles, language, ...), and across countries (cross-lingual IR);
    • natural corpus for content and structure queries, combining keyword search with XPath navigation and selection;
    • natural corpus for search tasks in which the answers do not consist of documents: expert or people search, video search, and entry point retrieval.
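The idea of a content and structure query from the list above can be sketched as follows. This is only a minimal illustration, not the markup or tooling used for the Dutch proceedings: the element and attribute names (meeting, topic, speech, speaker, party, role, date), the sample text and the helper find_speeches are all made up for the example. It uses the limited XPath subset of Python's standard xml.etree.ElementTree to select speeches by annotation, then filters them by keyword.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: every element name, attribute name and value here
# is an assumption made for illustration, not the format used in the paper.
SAMPLE = """
<meeting date="1999-03-02">
  <topic title="Budget">
    <speech speaker="Jansen" party="A" role="mp">We should raise the education budget.</speech>
    <speech speaker="De Vries" party="B" role="minister">The budget is already fixed.</speech>
  </topic>
  <topic title="Transport">
    <speech speaker="Jansen" party="A" role="mp">Rail needs investment.</speech>
  </topic>
</meeting>
"""


def find_speeches(root, keyword, **attrs):
    """Return (speaker, text) pairs for speeches whose attributes match
    attrs (the structure part) and whose text contains keyword
    (the content part, case-insensitive)."""
    # Build attribute predicates such as [@party='A'][@role='mp'].
    predicate = "".join(f"[@{name}='{value}']" for name, value in attrs.items())
    hits = []
    for speech in root.findall(f".//speech{predicate}"):
        text = speech.text or ""
        if keyword.lower() in text.lower():
            hits.append((speech.get("speaker"), text))
    return hits


root = ET.fromstring(SAMPLE)
print(find_speeches(root, "budget", party="A"))
```

With the sample above, the call selects only speeches by members of the hypothetical party "A" that mention "budget". A real collection would need a proper search index for the keyword part; the substring test is just the simplest stand-in.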
    In this paper, we describe the annotation we performed on the notes of Dutch parliamentary debates (Section 2). These annotations yield numerous new ways of searching, browsing, mining and summarizing these documents. We give examples of all this in .
    2. STRUCTURE OF PARLIAMENTARY PROCEEDINGS
    Notes of a formal meeting with an agenda (e.g., business meeting, council meeting, meeting of the members of a club, etc.) are full of implicit structure and contain many common elements. The notes of meetings with a long historical tradition, like parliamentary debates, are in a uniform format which fluctuates very little over time. This makes these notes very well suited for semantic annotation efforts.
    To our knowledge, at the time of writing no DTD or markup language for meeting notes is available.
    Transcripts of a meeting contain three main structural elements:
    the topics discussed in the meeting (the agenda);
    the speeches made at the meeting: every word that is being said is recorded together with 1) the name of the..... Thank you
     
  2. What...the...****?
     
  3. Crazy......
     
  4. Wrong copy/paste? Lol
     
  5. Wtf is this
     
  6. Really thought this was going to be about Poptahts :(
     
  7. Umm. DA HELL DID I JUST READ
     
  8. Go to the moderator room in palringo, they're always on there