Attending: Gareth, Jens (chair+mins), John B, John H, Adam, Winnie, Sam, Brian, David, Elena, Matt D, Jeremy, Duncan, Steve, Robert, Govind, Ewan, Peter L; Dave C at 10.28. Special guest stars (from RAL): Tom Byrne, Bruno Canning, Alistair Dewhurst, George Vasilakakos; Tom presenting.

0. Operational blog posts

Sam has things he ought to report in the blog. Many people have things they ought to report in the blog...

1. Wahid's report from last week's pre-GDB

Wahid's presentation: https://indico.cern.ch/event/319743/session/1/contribution/2/material/slides/1.pdf

* CMS still use SRM.
* Issues with DAV for deletion; SRM is quicker for bulk deletion.
* Choice of space if not using a space token (e.g. via xroot or WebDAV).
* CEPH needs an extra layer anyway to manage the ingest and retrieval of data.
* Support for third-party transfer is essential - like GridPP.
* Need to remove RFIO as a protocol, also for DPM's internal communication.
* If only BringOnline is used in the SRM API, there would be little need for SRM even for tape, as BringOnline alone could be implemented more simply (a gfal2-based sketch appears at the end of these minutes).
* lcg-utils is still being used despite being first deprecated and now unsupported.

Note the inertia: people are still using RFIO with CASTOR (at CERN), and still using lcg-utils. It takes a long time for people to change, particularly when they have few resources to modify their middleware.

dCache talked about standards. There are three kinds, if you will: those from a standards body such as the IETF or OGF (HTTP and DAV from the former, SRM and GridFTP from the latter); those that are de facto standards but arising out of our own community (such as xroot); and those that are de facto standards but owned by someone outside our community (like S3). The risk is obviously higher with the latter, as the chance of the protocol being changed by circumstances outside our control is higher.

2. Update from the RAL CEPH team

EOS Diamond (as usual, no relation to either EOS or Diamond): a plugin for xroot; this worked on Firefly, which is the older CEPH release. The infrastructure is currently running on older hardware; RAL saw bad performance and then the cluster died. Experience shows that one OSD per core is needed, and that usage is not necessarily balanced. Guidelines: 1 GHz per OSD and 1 GB RAM per OSD (RAL is seeing ~700 MB/OSD). All the nodes in the cluster should be kept up to spec. RAL is using erasure coding (EC), not replication.

A major upgrade of the (not heavily loaded) test cluster took 30 minutes; a similar upgrade on a production cluster went well, except that some OSDs needed restarting, and one was wiped because its filesystem had corrupted and it was easier to rebuild it than to try to rescue it.

Gateway: it is easier to get things out the same way they were put in. S3 support in FTS: you will need to give it your key...
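As a rough illustration of the S3-in-FTS point (a sketch, not a tested recipe): submitting a transfer whose destination is an S3 endpoint with the FTS3 Python "easy" bindings. The FTS endpoint and both URLs are hypothetical, and the S3 access/secret key pair has to be registered with the FTS service beforehand - that is the "give it your key" step.

    # Sketch (untested): submit an FTS3 transfer with an S3 destination.
    # Assumes the fts3-rest Python client is installed and a valid grid proxy
    # is available; all names/URLs below are made up for illustration.
    import fts3.rest.client.easy as fts3

    endpoint = 'https://fts3.example.ac.uk:8446'  # hypothetical FTS3 REST endpoint
    source = 'gsiftp://gridftp.example.ac.uk/dpm/example.ac.uk/home/vo/file.dat'
    destination = 's3://s3.example.ac.uk/mybucket/file.dat'  # S3 keys registered with FTS beforehand

    context = fts3.Context(endpoint)              # picks up the grid proxy by default
    transfer = fts3.new_transfer(source, destination)
    job = fts3.new_job([transfer], verify_checksum=False)
    print(fts3.submit(context, job))              # prints the job id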
3. AOB

Chat log:

wahid: (21/01/2015 10:04:12)
https://indico.cern.ch/event/319817/
https://indico.cern.ch/event/319743/session/1/contribution/2/material/slides/1.pdf

Samuel Cadellin Skipsey: (10:11 AM)
It should be noted that there's a conflation of "protocol" and "device type" when talking about CEPH. CephFS, S3 and RBD "storage" hosted on CEPH aren't mutually compatible at the low level, but that's not because of the protocol they are accessed by - it's because they represent entirely different types of "virtual storage space". There's nothing stopping you from writing multiple *protocol* shims that all back onto the same kind of CEPH high-level storage paradigm.

Ewan Mac Mahon: (10:33 AM)
So is that one core per disk? That's somewhat CPU heavy compared with what we do at the moment.

Michael Adam James Huffman: (10:34 AM)
Isn't it 1G RAM per OSD? (as the recommendation)

Ewan Mac Mahon: (10:36 AM)
That's a /lot/ of CPU. Say a 1PB cluster =~ 250 4TB disks (or 750 disks for 3x replication); that number of cores is a respectable compute cluster in its own right.

Jens Jensen: (10:43 AM)
:-) (May need to speed up a bit - we should finish by 11 at the latest...)

Samuel Cadellin Skipsey: (10:51 AM)
(It's called radosstriper for those looking for it in the source.)
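Footnote to item 1: a rough, untested sketch of the "BringOnline only" idea using the gfal2 Python bindings - staging a tape-resident file and polling until it is online, with none of the other SRM machinery. The SURL and timings below are made up, and the exact return convention may differ between gfal2 versions.

    # Sketch (untested): stage a tape-resident file via BringOnline only,
    # using the gfal2 Python bindings. SURL and timings are hypothetical.
    import time
    import gfal2

    surl = 'srm://castor.example.ac.uk/castor/example.ac.uk/vo/file.dat'  # made-up SURL
    pintime = 3600    # seconds the replica should stay pinned on disk
    timeout = 86400   # how long we are prepared to wait for staging

    ctx = gfal2.creat_context()
    status, token = ctx.bring_online(surl, pintime, timeout, True)  # asynchronous request
    while status == 0:            # 0 is taken here to mean "still queued on the tape system"
        time.sleep(60)
        status = ctx.bring_online_poll(surl, token)
    print('file is online, request token:', token)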