Attending: Tom, Jens (chair+mins), Winnie, Duncan, John B, Matt, Elena, John H, Ewan, Gareth, Sam, Marcus, Luke, ... David, ... Pete

Apologies: none received.

0. Operational blog posts

Two blog posts so far this quarter (Brian, Daniel) - we need a few more.

Simon's problem with firewalls was not a site firewall issue - his node is already outside the site firewall - but one of firewalling the individual nodes using the system's own firewall. The DPM documentation appears to be out of date, since none of it takes xroot into account, and xroot does not seem to like being firewalled. This should be raised with either the DPM developers or the xroot developers. Note that DPM takes additional components such as GridFTP and xroot pretty much as-is, adapting them only slightly where necessary, as opposed to dCache, which reimplemented xroot. Unhelpfully, David Smith has been reshuffled; he would have been the right person to talk to. (See the port-check sketch below.)

More generally, the firewall question will keep coming back, because people are debating the feasibility of moving large volumes of data between academic sites in the UK. Sometimes specialised infrastructure is necessary, such as Jodrell Bank's e-Merlin, which used UDP on 30 Gb networks - and that was 10 years ago, perhaps more. In any case, the MSDC would surely be interested in the general question.

1. Report from the WLCG workshop (Duncan and Sam)

ATLAS: "reduce number of SRM endpoints by ~10". Patrick: NorduGrid did that ages ago, with dCache. A similar approach has been taken in the UK with the reduction to T2D sites.

Zephyr (Ian Bird): the ideas are being followed up despite not being funded.

xroot caching at Stanford: used by CMS only so far, and on "US style" networks, so it may not generalise automatically to other experiments or to Europe - it needs more testing. We (GridPP) could test it, but would need volunteers.

SRM "retirement": sites feel they can do without SRM and (some) VOs feel the same; however, ATLAS use SRM at the T1s.

A remark about "esoteric protocols" (Sam), similar to our very own Andrew McNab's proposal to use HTTP N years ago (where N could be 10). What would an HTTP cache look like? There is lots of work on caches, e.g. squid, but they tend to cache whole files rather than byte ranges, and they also don't enforce ACLs the way the SE itself would. NIKHEF are working on a CDN. [This sounds eerily similar to the WLCG Amsterdam meeting in June 2010 - but then again, that was a "future" meeting.]

Rucio already "obscures" filenames. There were some heated discussions between experiments, possibly even based on misunderstandings. One could imagine a model with a T2 as an object store and Rucio managing the access controls. Rucio creates its own unique naming, which in turn leads to the empty directories being created (see the naming sketch below). It is also important whether a file is a cache (temporary copy) or permanent [this is similar to SRM, where the TURL may be given to the temporary copy...]. The online documentation for Rucio is not up to date, and not terribly useful anyway, so one will need to read the Documentation (aka the source code).

One endpoint, many sites: DPM multisite use in Italy. There is also an anonymous HTTP federation, Dynafeds: https://svnweb.cern.ch/trac/lcgdm/wiki/Dynafeds. DPM is considered "lightweight" and "agile" enough to adapt to changing requirements and evolution.

There is a general point about jobs requesting a file - we have already discussed remote access vs creating local copies. ATLAS tend to cache the file and do multiple reads.

Another general point about metadata: if your file has structure, it may not be readable sequentially, or in arbitrary pieces, because you need to check its integrity, indices, metadata, or similar, so you need more than just a segment (see the byte-range sketch below).
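On the per-node firewall point, a minimal sketch follows of checking whether the usual DPM-related service ports on a node are reachable. The port list is an assumption (typical defaults: xrootd 1094, GridFTP control 2811, SRMv2.2 8446, DPNS 5010, DPM daemon 5015) and not something confirmed in the meeting; a real check would also need the site-specific GridFTP data port range.

    #!/usr/bin/env python
    # Minimal sketch: probe commonly used DPM-related service ports on a node.
    # The port numbers are assumed typical defaults; check your own site's
    # configuration, in particular the GridFTP data port range.
    import socket
    import sys

    NODE = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    PORTS = {
        "xrootd": 1094,           # assumed default xrootd port
        "gridftp-control": 2811,  # assumed default GridFTP control port
        "srmv2.2": 8446,          # assumed default SRMv2.2 port (head node)
        "dpns": 5010,             # assumed default DPM name server port
        "dpm": 5015,              # assumed default DPM daemon port
    }

    for name, port in sorted(PORTS.items()):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.settimeout(3)
        try:
            s.connect((NODE, port))
            print("%-16s %5d open" % (name, port))
        except (socket.error, socket.timeout):
            print("%-16s %5d blocked or closed" % (name, port))
        finally:
            s.close()

Any service missing from such a list is exactly the kind that a node-level firewall would silently break, which is why the gap in the DPM documentation around xroot matters.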
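On the Rucio naming point, a minimal sketch of the deterministic path convention follows. This is my understanding (an md5 hash of "scope:name" providing two directory levels), to be checked against the Rucio source rather than anything stated in the meeting; the example dataset name is hypothetical.

    # Minimal sketch of a Rucio-style deterministic path, assuming the usual
    # md5("scope:name") convention; consult the Rucio source for the real thing.
    import hashlib

    def deterministic_path(scope, name):
        h = hashlib.md5(("%s:%s" % (scope, name)).encode("utf-8")).hexdigest()
        # two hash-derived directory levels under the scope
        return "%s/%s/%s/%s" % (scope, h[0:2], h[2:4], name)

    # hypothetical scope and filename, for illustration only
    print(deterministic_path("mc15_13TeV", "EVNT.example._000001.pool.root.1"))
    # prints something like mc15_13TeV/xx/yy/EVNT.example._000001.pool.root.1

Because the directory levels are derived from the hash, directories accumulate under the namespace even when the files they held are gone, which is where the empty directories come from.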
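To make the byte-range point concrete, a minimal sketch of a byte-range read at the HTTP level follows (Python 3 standard library only; the URL is a placeholder, not a real endpoint, and a real SE would additionally require X.509 or token authentication). A conventional forward cache such as squid typically stores whole objects, whereas serving requests like this from partial copies, while still honouring the SE's ACLs, is the harder problem raised above.

    # Minimal sketch of an HTTP byte-range read (Python 3 standard library).
    # The URL is a placeholder; real storage endpoints would also need auth.
    import urllib.request

    url = "http://example.org/some/dataset/file.root"   # hypothetical
    req = urllib.request.Request(url, headers={"Range": "bytes=0-1023"})
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content means the server honoured the range request;
        # 200 means it ignored the Range header and sent the whole file.
        print(resp.status, resp.headers.get("Content-Range"))
        first_kb = resp.read()
        print("read %d bytes" % len(first_kb))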
(Similar issues are being discussed, e.g., in climate data.) Storage infrastructure may change - indeed it may have to change - to meet future requirements.

2. Report from the ATLAS Jamboree (Elena)

Storageless sites; federation; caches; protocols for caches - in other words, quite similar to the report from the WLCG workshop.

3. AOB

NOB.

Chat log:

Ewan Mac Mahon: (10/02/2016 10:04:38) If people are bored running stable production services then having some components of their distributed computing system arbitrarily refuse to talk to other bits is an excellent way to make everyone's lives more interesting.

Matt Doidge: (10:08 AM) Undocumented ports are not a good reason for not firewalling though.

Ewan Mac Mahon: (10:09 AM) No, the sheer total pointlessness is. The fact that it makes things work better is a bonus. Worth having though. Working is nice.

Jens Jensen: (10:12 AM) Ian Collier would be interested in Zephyr.

Ewan Mac Mahon: (10:14 AM) Are we really that far behind the US in availability of network capacity? I know we mostly don't have our big sites connected faster than 10Gbit/s, but that's mainly because we don't/haven't needed to. We could upgrade a bunch to 40Gbit. 100Gbit might be a stretch.

Duncan Rand: (10:15 AM) Most US T2s are 100 Gbps I think.

Ewan Mac Mahon: (10:16 AM) But also with a lot more CPU, right?

Matt Doidge: (10:16 AM) %&^% glexec (not a fan)

Ewan Mac Mahon: (10:16 AM) Bandwidth per node isn't wildly different IIRC (which I may not)

Duncan Rand: (10:16 AM) It would be interesting to note what the larger UK sites have. IC: 40, QMUL: 20 (I think), RHUL: 10. US sites are similar in CPU to us I think.

Ewan Mac Mahon: (10:18 AM) (and on the current topic, it's not like there aren't non-PP communities that care quite a lot about HTTP CDNs)

Jens Jensen: (10:31 AM) So you'd need to give hints to the storage/cache system... (like SRM :-) ... like HDF5, I guess. Similar activities in climate, to optimise metadata (in NetCDF).

Lukasz Kreczko: (10:35 AM) Yup, SOLID is also looking into HDF5.

Ewan Mac Mahon: (10:46 AM) This is the 'for research' intro page: https://www.gridpp.ac.uk/users/research/ It does not say "Here's Tom", it says "here's the user guide". But yes, I agree - force-subscribing people as a condition of access seems reasonable.

Marcus Ebert: (10:47 AM) It could be helpful to have a new item directly on the GridPP web page, like "join GridPP" or "for new members", with links there to the relevant documents in the wiki and to the mailing lists that are important for joining VOs.