Data popularity (Anil)


  • Rucio tracer system identifies when a file is used/downloaded

    • file metadata updated with timestamp and number of times accessed
    • Rucio trace can also send other information
      • account, remotesite, localsite, etc.
      • state (DONE, FAILED, etc.)
      • flags for origin (user job, production job)
      • popularity can be updated based on any combination of this information (e.g. only DONE state)
      • which of these do we want to use?
    • Infrastructure to send trace to Rucio is in place
      • for user jobs, send trace when worker nodes (successfully?) download input files
      • not clear how to handle this on the production system
      • if rucio download is used, the trace is sent automatically; otherwise gb2_ds_get may need to be modified
  • Discussion:

    • should focus on gbasf2 jobs accessing mdst and udst, since this will provide information on which samples need to be replicated and which are not being used (feedback for working group leaders)
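
To make the "any combination of this information" point concrete, here is a minimal sketch of a popularity update that keeps only traces matching chosen states and origins. The Trace record, its field names, and the update_popularity helper are hypothetical and only illustrate the idea; this is not the Rucio trace schema or its API.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Trace:
    """Hypothetical, simplified trace record (field names are assumptions)."""
    filename: str
    state: str       # e.g. "DONE", "FAILED"
    origin: str      # e.g. "user" or "production"
    timestamp: int   # access time, seconds since epoch


def update_popularity(traces, states=("DONE",), origins=None):
    """Return (access counts, last access time) for traces passing the filters.

    A filter set to None accepts every value of that field.
    """
    counts = Counter()
    last_access = {}
    for t in traces:
        if states is not None and t.state not in states:
            continue
        if origins is not None and t.origin not in origins:
            continue
        counts[t.filename] += 1
        last_access[t.filename] = max(last_access.get(t.filename, 0), t.timestamp)
    return counts, last_access


traces = [
    Trace("a.mdst.root", "DONE", "user", 100),
    Trace("a.mdst.root", "FAILED", "user", 110),
    Trace("a.mdst.root", "DONE", "production", 120),
    Trace("b.udst.root", "DONE", "user", 130),
]

# Count only successful accesses from user jobs.
counts, last_access = update_popularity(traces, states=("DONE",), origins=("user",))
print(dict(counts))
```

Changing the states and origins arguments is all it takes to compare candidate definitions of popularity (e.g. DONE only versus all states) on the same set of traces.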

gb2_ds_rm (Justin)

  • PR (492) is open for single-file and empty-dataset deletion
    • two new methods added
      • listDirectory to avoid calling methods directly from FileCatalog in the gb2 tool
      • removeFileAsync to avoid rewriting a method that may be used elsewhere
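
The wrapper idea behind listDirectory can be illustrated as follows. This is only a sketch: StubCatalog is an in-memory stand-in, not the DIRAC FileCatalog API, the helper names are hypothetical, and it assumes that "empty-dataset deletion" means a dataset is removed only when it contains no files. It does not reproduce the code in the PR.

```python
class StubCatalog:
    """In-memory stand-in for a file catalog (NOT the real FileCatalog API)."""

    def __init__(self, entries):
        # Map each directory path to the set of file names it holds.
        self._entries = {path: set(files) for path, files in entries.items()}

    def list_files(self, directory):
        return sorted(self._entries.get(directory, set()))

    def remove_directory(self, directory):
        del self._entries[directory]

    def has_directory(self, directory):
        return directory in self._entries


def list_directory(catalog, directory):
    """Single entry point the tool uses instead of calling the catalog directly."""
    return catalog.list_files(directory)


def remove_dataset_if_empty(catalog, directory):
    """Remove the dataset directory only if it holds no files; report what happened."""
    if list_directory(catalog, directory):
        return False
    catalog.remove_directory(directory)
    return True


catalog = StubCatalog({"/belle/user/ds_full": ["f1.root"], "/belle/user/ds_empty": []})
removed_empty = remove_dataset_if_empty(catalog, "/belle/user/ds_empty")
removed_full = remove_dataset_if_empty(catalog, "/belle/user/ds_full")
```

Keeping the catalog call behind one helper means the tool code does not change if the underlying catalog interface does.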