CPSC Computational Linguistics - UBC Department of Computer ...


Intelligent Systems (AI2)
Computer Science cpsc422, Lecture 27
Nov 10, 2017
CPSC 422, Lecture 26, Slide 1

NLP: Knowledge-Formalisms Map (including probabilistic formalisms)

Formalisms:
  • State machines (and prob. versions): Finite State Automata, Finite State Transducers, Markov Models
  • Rule systems (and prob. versions), e.g., (Prob.) Context-Free Grammars
  • Logical formalisms: First-Order Logics, Prob. Logics
  • AI planners: MDP (Markov Decision Processes)
  • Machine Learning: Neural Models, Neural Sequence Modeling

Linguistic levels on the map: Morphology, Syntax, Semantics, Pragmatics, Discourse and Dialogue

Lecture Overview
  • Recap: English Syntax and Parsing
  • Key Problem with Parsing: Ambiguity
  • Probabilistic Context Free Grammars (PCFG)
  • Treebanks and Grammar Learning

Key Constituents: Examples
General pattern: (Specifier) X (Complement)

  Constituent                  Pattern          Example
  Noun phrases (NP)            (Det) N (PP)     the cat on the table
  Verb phrases (VP)            (Qual) V (NP)    never eat a cat
  Prepositional phrases (PP)   (Deg) P (NP)     almost in the net
  Adjective phrases (AP)       (Deg) A (PP)     very happy about it
  Sentences (S)                (NP) (-) (VP)

Context Free Grammar (CFG)
4-tuple (non-term., term., productions, start): (N, Σ, P, S)
P is a set of rules A -> α, where A ∈ N and α ∈ (Σ ∪ N)*

CFG Example
[Figure: grammar with example phrases, and lexicon]

Derivations as Trees
[Figure: derivation tree; surviving node labels: Nominal, Nominal, flight]
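The 4-tuple definition of a CFG can be made concrete with a small sketch. The class name `CFG`, the helper `leftmost_derivation`, and the toy grammar below are illustrative assumptions, not material from the lecture; the example grammar on the slide is a figure that did not survive extraction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """A context-free grammar as the 4-tuple (N, Sigma, P, S)."""
    nonterminals: frozenset  # N
    terminals: frozenset     # Sigma
    productions: tuple       # P: pairs (A, alpha), alpha is a tuple of symbols
    start: str               # S

    def is_well_formed(self) -> bool:
        """Check A in N and alpha in (Sigma U N)* for every rule A -> alpha."""
        symbols = self.nonterminals | self.terminals
        return (
            self.start in self.nonterminals
            and not (self.nonterminals & self.terminals)
            and all(
                lhs in self.nonterminals and all(s in symbols for s in rhs)
                for lhs, rhs in self.productions
            )
        )


def leftmost_derivation(grammar, choices):
    """Rewrite the leftmost non-terminal with each chosen production in turn.

    `choices` holds indices into grammar.productions. Returns every
    sentential form, from the start symbol to the final string.
    """
    form = [grammar.start]
    steps = [tuple(form)]
    for idx in choices:
        lhs, rhs = grammar.productions[idx]
        pos = next(i for i, s in enumerate(form) if s in grammar.nonterminals)
        if form[pos] != lhs:
            raise ValueError(f"rule {idx} does not rewrite {form[pos]}")
        form[pos:pos + 1] = rhs
        steps.append(tuple(form))
    return steps


# Hypothetical toy grammar, chosen only for illustration.
TOY = CFG(
    nonterminals=frozenset({"S", "NP", "VP", "Det", "N", "V"}),
    terminals=frozenset({"the", "cat", "dog", "chased"}),
    productions=(
        ("S", ("NP", "VP")),    # 0
        ("NP", ("Det", "N")),   # 1
        ("VP", ("V", "NP")),    # 2
        ("Det", ("the",)),      # 3
        ("N", ("cat",)),        # 4
        ("N", ("dog",)),        # 5
        ("V", ("chased",)),     # 6
    ),
    start="S",
)

if __name__ == "__main__":
    for form in leftmost_derivation(TOY, [0, 1, 3, 4, 2, 6, 1, 3, 5]):
        print(" ".join(form))
```

Each printed line is one step of the derivation; drawing an edge from every rewritten non-terminal to the symbols that replace it gives the derivation tree.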

Example of relatively complex parse tree
[Figure: parse tree. Source: "Improved Identification of Noun Phrases in Clinical Radiology Reports Using a High-Performance Statistical Natural Language Parser Augmented with ...", Journal of the American Medical Informatics Association, 2005]

Structural Ambiguity (Ex. 1)
  • VP -> V NP ; NP -> NP PP  (the PP attaches to the noun phrase: the elephant is in my pajamas)
  • VP -> V NP PP  (the PP attaches to the verb phrase: I was in my pajamas)
I shot an elephant in my pajamas

Structural Ambiguity (Ex. 2)
I saw Mary passing by cs2

Parse 1 (Mary is the one passing by):
(ROOT
  (S
    (NP (PRP I))
    (VP (VBD saw)
      (S
        (NP (NNP Mary))
        (VP (VBG passing)
          (PP (IN by)
            (NP (NNP cs2))))))))

Parse 2 (I am the one passing by):
(ROOT
  (S
    (NP (PRP I))
    (VP (VBD saw)
      (NP (NNP Mary))
      (S
        (VP (VBG passing)
          (PP (IN by)
            (NP (NNP cs2))))))))

Structural Ambiguity (Ex. 3): Coordination
new student and profs
Readings: [new student] and [profs], or new [student and profs]
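The attachment ambiguity in Ex. 1 can be checked mechanically by counting parse trees. The sketch below is a minimal CKY-style counter; the grammar, the lexicon, and the helper symbol `V+NP` (used to binarize VP -> V NP PP) are assumptions made for illustration and are not the lecture's grammar.

```python
from collections import defaultdict

# Hypothetical binarized grammar covering only the example sentence.
# "V+NP" is a helper symbol: VP -> V NP PP becomes VP -> V+NP PP, V+NP -> V NP.
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "V+NP", "PP"),
    ("V+NP", "V", "NP"),
    ("NP", "NP", "PP"),
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]
LEXICON = {
    "I": ["NP"],
    "shot": ["V"],
    "an": ["Det"],
    "my": ["Det"],
    "elephant": ["N"],
    "pajamas": ["N"],
    "in": ["P"],
}


def count_parses(words, start="S"):
    """Count distinct parse trees for `words` rooted in `start`."""
    n = len(words)
    # chart[(i, j)][A] = number of trees with root A spanning words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            chart[(i, i + 1)][tag] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point
                for parent, left, right in BINARY_RULES:
                    chart[(i, j)][parent] += (
                        chart[(i, k)][left] * chart[(k, j)][right]
                    )
    return chart[(0, n)][start]


if __name__ == "__main__":
    print(count_parses("I shot an elephant in my pajamas".split()))  # 2
    print(count_parses("I shot an elephant".split()))                # 1
```

Adding the prepositional phrase doubles the number of analyses, one per attachment site, which is exactly the ambiguity the two rule sets describe.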

Structural Ambiguity (Ex. 4): NP-bracketing
French language teacher
Readings: [French language] teacher, or French [language teacher]

Lecture Overview
  • Recap: English Syntax and Parsing
  • Key Problem with Parsing: Ambiguity
  • Probabilistic Context Free Grammars (PCFG)
  • Treebanks and Grammar Learning (acquiring the probabilities)
  • Intro to Parsing PCFG

Probabilistic CFGs (PCFGs)
GOAL: assign a probability to parse trees and to sentences
Each grammar rule is augmented with a conditional probability

Question: if these are all the rules for VP, what should ?? be?
  VP -> Verb         .55
  VP -> Verb NP      .40
  VP -> Verb NP NP   ??

  A. 1
  B. 0
  C. .05
  D. None of the above

Probabilistic CFGs (PCFGs)
GOAL: assign a probability to parse trees and to sentences
Each grammar rule is augmented with a conditional probability
The expansions for a given nonterminal sum to 1
  VP -> Verb         .55
  VP -> Verb NP      .40
  VP -> Verb NP NP   .05
Formal Def: 5-tuple (N, Σ, P, S, D), where D assigns a probability to each rule in P
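The sum-to-1 constraint is easy to verify in code. The following is a minimal sketch using the three VP rules from the slide; the function names `unnormalized_lhs` and `missing_probability` and the dictionary layout are assumptions made for illustration.

```python
import math
from collections import defaultdict

# The three VP expansions from the slide, keyed by (lhs, rhs).
VP_RULES = {
    ("VP", ("Verb",)): 0.55,
    ("VP", ("Verb", "NP")): 0.40,
    ("VP", ("Verb", "NP", "NP")): 0.05,
}


def unnormalized_lhs(rules, tol=1e-9):
    """Return {lhs: total} for every non-terminal whose rules do not sum to 1."""
    totals = defaultdict(float)
    for (lhs, _rhs), prob in rules.items():
        totals[lhs] += prob
    return {
        lhs: total
        for lhs, total in totals.items()
        if not math.isclose(total, 1.0, abs_tol=tol)
    }


def missing_probability(known_probs):
    """Probability left over for the one remaining expansion of a non-terminal."""
    return 1.0 - sum(known_probs)


if __name__ == "__main__":
    print(unnormalized_lhs(VP_RULES))               # empty dict: VP is normalized
    print(round(missing_probability([0.55, 0.40]), 2))
```

The comparison uses a tolerance because floating-point sums such as 0.55 + 0.40 + 0.05 are not guaranteed to equal 1.0 exactly.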

Sample PCFG
[Figure: sample PCFG]

PCFGs are used to...
Estimate Prob. of parse tree
  A. Sum of the probs of all the rules applied
  B. Product of the probs of all the rules applied
Estimate Prob. of a sentence
  A. Sum of the probs of all the parse trees
  B. Product of the probs of all the parse trees

PCFGs are used to...
Estimate Prob. of parse tree:
  P(Tree) = product of P(r) over all rules r applied in Tree
Estimate Prob. of a sentence:
  P(Sentence) = sum of P(Tree) over all parse trees of Sentence

Example
  P(Tree_a) = .15 × .4 × ... = 1.5 × 10^-6
  P(Tree_b) = .15 × .4 × ... = 1.7 × 10^-6
  P("Can you....") = 1.7 × 10^-6 + 1.5 × 10^-6 = 3.2 × 10^-6
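The two computations above (product over rules for a tree, sum over trees for a sentence) can be sketched as follows. The rule probabilities, the nested-tuple tree encoding, and the function names are all assumptions made for illustration; the lecture's sample PCFG and its "Can you..." trees are figures that are not available here, so the sketch reuses the pajamas sentence instead.

```python
import math

# Hypothetical rule probabilities (made up for illustration; each
# non-terminal's expansions sum to 1).
RULE_PROBS = {
    ("S", ("NP", "VP")): 1.0,
    ("VP", ("V", "NP")): 0.6,
    ("VP", ("V", "NP", "PP")): 0.4,
    ("NP", ("NP", "PP")): 0.2,
    ("NP", ("Det", "N")): 0.5,
    ("NP", ("I",)): 0.3,
    ("PP", ("P", "NP")): 1.0,
    ("V", ("shot",)): 1.0,
    ("Det", ("an",)): 0.5,
    ("Det", ("my",)): 0.5,
    ("N", ("elephant",)): 0.5,
    ("N", ("pajamas",)): 0.5,
    ("P", ("in",)): 1.0,
}

# Trees are nested tuples (label, child, ...); a leaf child is a plain string.
OBJECT_NP = ("NP", ("Det", "an"), ("N", "elephant"))
PAJAMAS_PP = ("PP", ("P", "in"), ("NP", ("Det", "my"), ("N", "pajamas")))
TREE_A = ("S", ("NP", "I"), ("VP", ("V", "shot"), ("NP", OBJECT_NP, PAJAMAS_PP)))
TREE_B = ("S", ("NP", "I"), ("VP", ("V", "shot"), OBJECT_NP, PAJAMAS_PP))


def rules_used(tree):
    """Yield (lhs, rhs) for every internal node of the tree."""
    label, *children = tree
    yield (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    for child in children:
        if not isinstance(child, str):
            yield from rules_used(child)


def tree_probability(tree, rule_probs):
    """P(Tree): product of the probabilities of all rules applied."""
    return math.prod(rule_probs[rule] for rule in rules_used(tree))


def sentence_probability(trees, rule_probs):
    """P(Sentence): sum of the probabilities of all its parse trees."""
    return sum(tree_probability(t, rule_probs) for t in trees)


if __name__ == "__main__":
    print(tree_probability(TREE_A, RULE_PROBS))
    print(tree_probability(TREE_B, RULE_PROBS))
    print(sentence_probability([TREE_A, TREE_B], RULE_PROBS))
```

With these made-up numbers the verb-attachment tree (`TREE_B`) comes out more probable than the noun-attachment tree, which is how a PCFG would rank the two readings. `sentence_probability` assumes it is given all parse trees of the sentence; enumerating them is the parser's job.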


Treebanks
DEF. corpora in which each sentence has been paired with a parse tree
These are generally created in two steps:
  • Parse the collection with a parser
  • Human annotators revise each parse
Requires detailed annotation guidelines:
  • POS tagset
  • Grammar
  • Instructions for how to deal with particular grammatical constructions

Penn Treebank
Penn TreeBank is a widely used treebank. Its best-known part is the Wall Street Journal section: 1 M words from the 1987-1989 Wall Street Journal.

Treebank Grammars
Such grammars tend to contain lots of rules. For example, the Penn Treebank has 4500 different rules for VPs! Among them...
[Figure: sample VP rules]

Heads in Trees
Finding heads in treebank trees is a task that arises frequently in many applications, and it is particularly important in statistical parsing. We can visualize this task by annotating the nodes of a parse tree with the heads of each corresponding node.
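Annotating each node with its head can be sketched as a bottom-up pass over the tree. The rule table `HEAD_RULES` below is a deliberately tiny, hypothetical stand-in for the per-category traversal rules used in practice, and the tree encoding and function names are assumptions as well.

```python
# Hypothetical head rules: for each non-terminal, a scan direction and a
# priority list of child categories. Real head tables are much larger.
HEAD_RULES = {
    "S": ("left", ["VP"]),
    "VP": ("left", ["V", "VP"]),
    "NP": ("right", ["N", "NP"]),
    "PP": ("left", ["P"]),
}


def find_head_child(label, child_labels):
    """Return the index of the head child among `child_labels`."""
    direction, priorities = HEAD_RULES.get(label, ("left", []))
    order = list(range(len(child_labels)))
    if direction == "right":
        order.reverse()
    for category in priorities:
        for i in order:
            if child_labels[i] == category:
                return i
    return order[0]  # fallback: first child in scan order


def lexicalize(tree):
    """Decorate a nested-tuple tree with head words.

    Input nodes are (label, child, ...); a pre-terminal is (tag, word).
    Output nodes are (label, head_word, decorated_children).
    """
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return (label, children[0], [])  # pre-terminal: head is its word
    decorated = [lexicalize(child) for child in children]
    head = find_head_child(label, [d[0] for d in decorated])
    return (label, decorated[head][1], decorated)


def show(node, depth=0):
    """Print the decorated tree, one node per line."""
    label, head, children = node
    print("  " * depth + f"{label}({head})")
    for child in children:
        show(child, depth + 1)


TREE = (
    "S",
    ("NP", ("Det", "the"), ("N", "cat")),
    ("VP", ("V", "chased"), ("NP", ("Det", "the"), ("N", "dog"))),
)

if __name__ == "__main__":
    show(lexicalize(TREE))
```

The head word percolates upward: the verb heads the VP and, through it, the sentence, while each NP is headed by its noun.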

Lexically Decorated Tree
[Figure: parse tree with each node annotated with its head]

Head Finding
The standard way to do head finding is to use a simple set of tree traversal rules specific to each non-terminal in the grammar. Each rule in the PCFG specifies where the head of the expanded non-terminal should be found.

Noun Phrases
[Figure not recoverable from the transcript]

Acquiring Grammars and Probabilities
Manually parsed text corpora (e.g., PennTreebank)
  • Grammar: read it off the parse trees
    Ex: if an NP contains an ART, ADJ, and NOUN then we create the rule NP -> ART ADJ NOUN.
  • Probabilities: P(A -> β | A) = count(A -> β) / count(A)
    Ex: if the NP -> ART ADJ NOUN rule is used 50 times and all NP rules are used 5000 times, then the rule's probability is 50 / 5000 = .01

Learning Goals for today's class
You can:
  • Provide a formal definition of a PCFG
  • Apply a PCFG to compute the probability of a parse tree of a sentence as well as the probability of a sentence
  • Describe the content of a treebank
  • Describe the process to identify a head of a syntactic constituent
  • Compute the probability distribution of a PCFG from a treebank

Next class on Wed
  • Parsing Probabilistic CFG: CKY parsing
  • PCFG in practice: Modeling Structural and Lexical Dependencies

Assignment-3 due on Nov 20 (last year it took students 8-18 hours)
Assignment-4 will be out on the same day
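Returning to the "Acquiring Grammars and Probabilities" slide: reading the rules off the trees and dividing each rule count by the count of its left-hand side can be sketched as below. The three-tree treebank, the nested-tuple encoding, and the function names are hypothetical and serve only to show the counting.

```python
from collections import Counter


def rules_in(tree):
    """Yield (lhs, rhs) for every internal node of a nested-tuple tree."""
    label, *children = tree
    yield (label, tuple(c if isinstance(c, str) else c[0] for c in children))
    for child in children:
        if not isinstance(child, str):
            yield from rules_in(child)


def estimate_pcfg(treebank):
    """Relative-frequency estimate: P(A -> beta | A) = count(A -> beta) / count(A)."""
    rule_counts = Counter()
    lhs_counts = Counter()
    for tree in treebank:
        for lhs, rhs in rules_in(tree):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {
        rule: count / lhs_counts[rule[0]]
        for rule, count in rule_counts.items()
    }


# Hypothetical three-sentence treebank, for illustration only.
TREEBANK = [
    ("S", ("NP", ("ART", "the"), ("ADJ", "old"), ("NOUN", "dog")),
          ("VP", ("V", "barked"))),
    ("S", ("NP", ("ART", "the"), ("NOUN", "cat")),
          ("VP", ("V", "slept"))),
    ("S", ("NP", ("ART", "a"), ("NOUN", "dog")),
          ("VP", ("V", "slept"))),
]

if __name__ == "__main__":
    for (lhs, rhs), prob in sorted(estimate_pcfg(TREEBANK).items()):
        print(f"{lhs} -> {' '.join(rhs)}   {prob:.3f}")
    # The slide's own example: 50 uses of the rule out of 5000 NP expansions.
    print(50 / 5000)
```

Because every count is divided by the total for its own left-hand side, the estimated expansions of each non-terminal sum to 1 by construction.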
