  1. Planet4
  2. PLANET-4717

Custom solution for Wayback machine API limitation


    • Task
    • Resolution: Merged
    • Must have
    • None
    • None
    • 12
    • Search
    • Sprint #120, Sprint #121, Sprint #122, Sprint #123, Sprint #124, Sprint #125, Sprint #126, Sprint #127, Sprint #128

      Put data from our P3 database into an Elasticsearch database

      Tasks

      • Get P3 db data from the DevOps team - currently they can only provide a whole db dump (21GB)
      • Create a MySQL db locally to dump a smaller data extract - also consider the workaround of creating a db instance in GCP
      • Write a script to transfer data into the Elasticsearch database
      • NEW - Develop a crawler script to crawl P3 sites
      • Thin the entire db down so that it is easier to manipulate - just leave: description, title, and URL (leave content and image out for now)
      • Transfer data <-- this task will happen in PLANET-4827
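
      A minimal sketch of how the crawl, thin, and transfer tasks above could fit together, not the script that was merged. It assumes the title and description can be read from a page's <title> tag and description meta tag, and that records are sent through the Elasticsearch _bulk endpoint, whose request body pairs one action line with one source line per document. The names PageSummaryParser, summarize_page, to_bulk_body and the index name "p3-archive" are hypothetical. Fetching the pages and posting the payload are left out.

```python
import json
from html.parser import HTMLParser


class PageSummaryParser(HTMLParser):
    """Collect the <title> text and the meta description from one HTML page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.description = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            # Only <meta name="description" content="..."> is of interest here.
            if (attr.get("name") or "").lower() == "description":
                self.description = (attr.get("content") or "").strip()

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize_page(url, html):
    """Reduce a crawled page to the three fields kept in the thinned data set."""
    parser = PageSummaryParser()
    parser.feed(html)
    return {
        "url": url,
        "title": parser.title.strip(),
        "description": parser.description,
    }


def to_bulk_body(records, index_name="p3-archive"):
    """Serialize records as newline-delimited JSON for the _bulk endpoint.

    index_name is a hypothetical placeholder, not the project's real index.
    """
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index_name}}))  # action line
        lines.append(json.dumps(record))                             # source line
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"
```

      Calling summarize_page on each crawled page and passing the resulting list to to_bulk_body yields a string that could be sent in a POST to the cluster's _bulk endpoint with the Content-Type application/x-ndjson. Content and images are dropped at the summarize_page step, which matches the "leave content and image out for now" note.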

            sdeshmuk Sagar Deshmukh
            lreyes Lilian Reyes
