University of Southern California

Events Calendar


  • PhD Dissertation Defense - Binh Vu

    Fri, May 17, 2024 @ 03:00 PM - 05:00 PM

    Thomas Lord Department of Computer Science

    University Calendar


    Title: Exploiting Web Tables and Knowledge Graphs for Creating Semantic Descriptions of Data Sources  
     
    Committee: Craig Knoblock (Chair), Sven Koenig, Daniel Edmund O'Leary, Yolanda Gil, Jay Pujara  
     
    Date and Time: Friday, May 17, 2024, 3:00 PM - 5:00 PM
     
    Location: SAL 322
     
    Abstract: There is an enormous number of tables available on the web, and they can provide valuable information for diverse applications. To harvest information from the tables, we need precise mappings, called semantic descriptions, of concepts and relationships in the data to classes and properties in a target ontology. However, creating semantic descriptions, or semantic modeling, is a complex task requiring considerable manual effort and expertise. Much research has focused on automating this problem. However, existing supervised and unsupervised approaches both face various difficulties. The supervised approaches require lots of known semantic descriptions for training and, thus, are hard to apply to a new or large domain ontology. On the other hand, the unsupervised approaches exploit the overlapping data between tables and knowledge graphs; hence, they perform poorly on tables with lots of ambiguity or little overlapping data. To address the aforementioned weaknesses, we present novel approaches for two main cases: tables that have overlapping data with a knowledge graph (KG) and tables that do not have overlapping data. Exploiting web tables that have links to entities in a KG, we automatically create a labeled dataset to learn to combine table data, metadata, and overlapping background knowledge (if available) to find accurate semantic descriptions. Our methods for the two cases together provide a comprehensive solution to the semantic modeling problem. In the evaluation, our approach in the overlapping setting yields an improvement of approximately 5% in F1 scores compared to the state-of-the-art methods. In the non-overlapping setting, our approach outperforms strong baselines by 10% to 30% in F1 scores.

    Location: Henry Salvatori Computer Science Center (SAL) - 322

    Audiences: Everyone Is Invited

    Contact: Felante' Charlemagne

