Category Guide

kcclaffy edited this page Dec 9, 2022 · 3 revisions

Categories allow for classification of objects (rows) or properties (columns) in a table.

Consider two tables that both contain information about organizations. The first, "ASes owned by an Organization", has the columns |org_id (Organization)|member (ASN)|. The second, "Profits in 2022", has the columns |stock symbol (Organization)|name|profits|. The column names alone (org_id, stock symbol) do not reveal that both tables describe the same type of entity (Organization). The Category notation lets the data curator indicate that both org_id and stock symbol are organization identifiers, so that, given a mapping from org_id to stock symbol, the two tables can be joined.
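The join described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout, the helper names (`columns_with_category`, `join_on_organization`), the row values, and the `org_id_to_symbol` mapping are all hypothetical and are not part of the catalog's actual schema format.

```python
# Hypothetical in-memory tables. Column names follow the text above;
# every value is made up. The ASNs come from the range reserved for
# documentation (RFC 5398).
as_table = {
    "columns": {"org_id": "organization", "member": "asn"},
    "rows": [
        {"org_id": "ORG-A", "member": 64496},
        {"org_id": "ORG-A", "member": 64497},
        {"org_id": "ORG-B", "member": 64511},
    ],
}
profit_table = {
    "columns": {"stock symbol": "organization", "name": None, "profits": None},
    "rows": [
        {"stock symbol": "AAA", "name": "Example A", "profits": 10},
        {"stock symbol": "BBB", "name": "Example B", "profits": 20},
    ],
}

# Assumed mapping from org_id to stock symbol; the text notes that such
# a mapping is needed before the tables can be joined.
org_id_to_symbol = {"ORG-A": "AAA", "ORG-B": "BBB"}


def columns_with_category(table, category):
    """Return the names of the columns annotated with the given category."""
    return [name for name, cat in table["columns"].items() if cat == category]


def join_on_organization(left, right, mapping):
    """Join two tables on their organization columns via an identifier mapping."""
    left_key = columns_with_category(left, "organization")[0]
    right_key = columns_with_category(right, "organization")[0]
    right_by_key = {row[right_key]: row for row in right["rows"]}
    joined = []
    for row in left["rows"]:
        match = right_by_key.get(mapping.get(row[left_key]))
        if match is not None:
            joined.append({**row, **match})
    return joined


joined = join_on_organization(as_table, profit_table, org_id_to_symbol)
```

The point of the sketch is that the join columns are discovered from the category annotation rather than from the column names, which differ between the two tables.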

Contrast an Organization Table with a Person Table. A row in the Organization Table holds information about one organization, e.g., its name, size, and earnings, so the table itself has the category "organization". A row in the Person Table holds a person's name, age, and organization, so only the organization column in the Person Table has the category "organization".
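One way to picture the difference between a table-level and a column-level category is the following Python sketch. The dictionary layout and the `category_scope` helper are hypothetical, chosen only for illustration; the guide does not prescribe this representation.

```python
# Hypothetical representation: a category can be attached to the table
# as a whole (its rows are organizations) or to individual columns.
organization_table = {
    "category": "organization",
    "columns": {"name": None, "size": None, "earnings": None},
}
person_table = {
    "category": None,  # rows are people, not organizations
    "columns": {"name": None, "age": None, "organization": "organization"},
}


def category_scope(table, category):
    """List where a category applies: the whole table and/or specific columns."""
    scope = []
    if table["category"] == category:
        scope.append("table")
    for column, column_category in table["columns"].items():
        if column_category == category:
            scope.append("column:" + column)
    return scope
```

Under these assumptions, `category_scope(organization_table, "organization")` reports the table itself, while the same call on `person_table` reports only its organization column.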

What follows is not an exhaustive or strict list, but a guide to assigning categories in our data schema.

  • Categories are nested, which allows both general and specific searches. A search for ip finds both ip.address and ip.prefix. A search for location finds both location.country and location.city, but a search for location.city or city finds only location.city.
  • Categories may also have short names, which help us display them concisely in UIs. For example, prefix is the short name for internet.ip.prefix.
    • Different data providers may have used the same short name for different objects; in that case the user must use a more specific name to avoid the collision.
    • CAIDA's catalog namespace will have a single preferred mapping for each short name.
  • Short names can be used when searching, displaying, or defining schemas.
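The search behavior described in the list above can be sketched as a dotted-prefix match combined with short-name expansion. This is a minimal illustration, not the catalog's implementation: the `CATEGORIES` list is a small subset of the categories in this guide, and the `SHORT_NAMES` table and the `expand` and `search` functions are hypothetical.

```python
# A small subset of fully qualified categories from this guide.
CATEGORIES = [
    "internet.ip.address",
    "internet.ip.prefix",
    "location.country",
    "location.city",
]

# Assumed single preferred mapping from short name to full category.
SHORT_NAMES = {
    "ip": "internet.ip",
    "prefix": "internet.ip.prefix",
    "country": "location.country",
    "city": "location.city",
}


def expand(query):
    """Replace a leading short name with its full category path."""
    head, _, rest = query.partition(".")
    full_head = SHORT_NAMES.get(head, head)
    return full_head + ("." + rest if rest else "")


def search(query):
    """Return every category equal to the query or nested beneath it."""
    full = expand(query)
    return [c for c in CATEGORIES if c == full or c.startswith(full + ".")]
```

Matching on `full + "."` rather than on a bare string prefix keeps a query from matching an unrelated category that merely begins with the same characters. In this sketch, `search("ip")` returns both the address and prefix categories, while `search("city")` and `search("location.city")` return only `location.city`.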

Internet categories

  • internet
    • internet.ip (ip)
      • ip.address
        • ip.version_4 (ipv4)
        • ip.version_6 (ipv6)
      • ip.prefix (prefix)
      • ip.port (port)
      • ip.protocol
      • ip.flow (flow)
        • ip.flow.source (flow_src)
          • flow_src.ip.address (src_ip)
          • flow_src.ip.port (src_port)
        • ip.flow.destination (flow_dst)
          • flow_dst.ip.address (dst_ip)
          • flow_dst.ip.port (dst_port)
        • ip.flow.protocol
    • internet.as (as)
      • as.number (asn)
      • as.name

Location categories

  • location
    • location.geoname_id
    • location.usa_zip (usa_zip)
    • location.address (address)
      • address.usa_region_city_street_zip
      • address.country_region_city_street_zip
    • location.road (road)
      • road.cities
      • road.name
    • location.city (city)
      • city.name
      • city.country_state_name
      • city.iata
    • location.region (region, state)
      • region.usa_two_letter
      • region.name
      • region.country_name
      • country.name
    • location.country (country)
      • country.iso2
      • country.name
    • location.continent (continent)
      • continent.name
      • continent.two_letter

Organization categories

  • organization (org)
    • org.caida_as2org_id
    • org.stock_symbol