Attending: Jens (chair+mins), Daniel, Winnie, John B, Steve, Govind, Marcus, Duncan, David, Gareth, Sam, Ewan, Matt, Brian
Apologies: Tom (Internet access issues)

0. Operational blog posts

Marcus sent his preferred address for blogging and has been invited to the blog.

Winnie's primary name node went kablooie and the fallback failed to operate properly. Luke fixed it; it is not clear why it broke, but that would be worth understanding.

Govind's issues were reported to the list. On the file system listing: since it dies for a specific file, it may be a database problem and Govind should run the dpm_dpck tool.
* Don't run it on a live system, or at least not with automated fixing turned on.
* Aim to use the next version, coming out in the next release, as it will have bugfixes and goodness.
* Sam will double check and mail the list.
* Ewan has run the tool. The suggestion is to run it manually, i.e. manually check the fixes it proposes; in Ewan's case it had suggested fixing things on non-existent and long-gone disk servers.

Govind's second issue was on the file listing: biomed want a file listing with timestamps. If the DPM listing is like CASTOR's nsls (with -lR switches), then it generates a directory name followed by a listing of the contents of that directory; these could be put together to provide full path names for files using something simple like an Emacs macro or a Perl script. In Govind's case, biomed only have 23K files.

As an aside, if each VO wants a different flavour of listing, that obviously makes life harder for us; it would be easier if a joint listing (such as syncat) could be produced and the local flavours derived from it. However, ATLAS had looked at syncat and declared it "too unwieldy" - something flexible might still be possible locally, though.

1. Storage related summary of last week's hepsysmen and women - particularly Sam's storage singalong on Friday

The background to this is that Sam has been nominated to speak for sites - all WLCG sites - at the coming WLCG event in Feb. The topic is "medium term evolution", meaning ~2-10 years. Presentations are expected from the experiments (a joint one!), from developers (likely led by dCache but obviously with input from others), and from sites (Sam). Sam's search for input from sites has until now been relatively fruitless, but Alastair Dewhurst from the T1 has promised input, and hepsysman was another opportunity.

Future thinking can include things like what future interfaces would look like and, if we also need to support broader user bases beyond WLCG, whether we should look at the "other" interfaces even if they are not standard ones and are less efficient.

Also relevant to the future is hardware evolution (for storage): Sam presented thoughts that RAID6 would no longer provide the protection it currently provides, due to longer rebuild times (see the illustrative numbers sketched below). Disks are expected to continue to grow in capacity over the next 10 years, but not in performance (cf. shingled drives). Other things may become attractive, like erasure codes, or specific implementations like HDFS or CEPH. Whatever it turns out to be, if it is not the same as today we will need a migration strategy, so this should be a focus for the workshop - and could also usefully be raised at the ATLAS Jamboree.

There is no DPM Skunk Works developing secret stealth DPMs or super high velocity back ends. RAL famously has a CEPH team that has been working for a while on getting CEPH into production; CERN have an EOS-CEPH interface, and RAL has been testing *->CEPH.
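To put rough numbers on the rebuild-time and replication-versus-erasure-coding points above, a quick sketch in python; all figures are illustrative assumptions, not measurements from any site.

# Illustrative only: assumed disk sizes and rebuild throughput, not benchmarks.
SEQ_WRITE_MB_S = 150  # assumed sustained per-disk rebuild rate in MB/s

for tb in (2, 8, 16):                          # nominal disk capacities in TB
    hours = tb * 1e6 / SEQ_WRITE_MB_S / 3600   # time to rewrite one whole disk
    print("%2d TB disk: ~%.0f h rebuild at %d MB/s" % (tb, hours, SEQ_WRITE_MB_S))

# Raw-capacity overhead of redundancy schemes (3x replication vs an 8+3 erasure code):
k, m = 8, 3
print("3x replication: 200%% overhead; EC %d+%d: %.0f%% overhead" % (k, m, 100.0 * m / k))

The point is that as capacities grow while per-disk throughput stays roughly flat, a single-disk rebuild stretches from a few hours towards a day or more, which is the window in which RAID6 has to survive further failures - hence the interest in erasure codes and object stores.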
For the grid we would of course need GridFTP, xroot, and BDII on top of any such back end - and maybe HTTP/WebDAV. There was an attempt at DPM-to-HDFS which did strange and slightly hacky things to the metadata. Is there a role for HDFS or CEPH in a future T2? Some T2s are already running HDFS, although they are predominantly in the US and use BeStMan. dCache has a CEPH backend and may be the better way to run a grid interface to CEPH. RAL has spent a lot of time getting its CEPH into production - far more time than a T2 could allocate to it - and a T2 might be able to benefit from that work, as long as things don't come out with "RAL" written on them (Quattor, anyone?). In which case, *if* CEPH is to be an option for a future T2, we need a volunteer T2 to test it as a secondary SE. A secondary SE could be added to ATLAS's AGIS [thanks to Brian for explaining]. Edinburgh currently have something like a second SE configured. Migration could be accomplished by marking the old SE read-only, after which churn would eventually move much of the data - in theory. Ewan, who is about to leave GridPP, cheerfully volunteered Glasgow to be such a site.

2. Catalogue topics revisited (cf. Govind's question and the ongoing saga of producing catalogues for the experiments)

Apparently the DPM listing is like the CASTOR one, which lists the directory name and then the list of files (or subdirectories) in said directory, e.g.

/castor/ads.rl.ac.uk/prod/biomed/disk/:
drwxrwxr-x 2 bio001 bio 0 Sep 18 22:17 01788d86-2629-4127-9799-335afee36e0d
drwxrwxr-x 2 bio001 bio 0 Mar 05 2015 026a272d-5310-4d15-9d72-4f6b6231ea5d
drwxrwxrwx 2 bio001 bio 0 Dec 11 2013 0bc26966-6ad1-4432-99e6-6bc10eb16de7
drwxrwxr-x 2 bio001 bio 0 Nov 07 2014 0c25fd70-58de-4f4f-ba4c-42f4c45618de
drwxrwxr-x 2 bio001 bio 0 Sep 18 21:13 17d17510-e029-4959-bee4-2ad333a0b8d7
drwxrwxr-x 2 bio001 bio 0 Jul 13 2012 232bcc1a-38d1-4cf8-9d7f-3ef40bdafded
drwxrwxrwx 2 bio001 bio 0 Jan 09 2014 27231445-5da2-43f4-bb2c-8366093b284e
drwxrwxr-x 2 bio001 bio 0 Jun 29 2015 28f628e4-bd1d-4273-9f54-076e4f326349

So what one needs to write is a script that takes this information, throws away what is not needed, and glues the filename onto the directory name. Simple! Whoever does it gets to choose a sensible scripting language (Perl, bash, python, scheme, ruby); a minimal sketch is appended after the chat log below. Maybe the timestamps need a bit of reformatting?

3. AOB

NOB

Chat log:

Steve Jones: (20/01/2016 10:03:43)
Went Kablui like the Nipigon Bridge?

Paige Winslowe Lacesso: (10:05 AM)
Kablooey

Steve Jones: (10:06 AM)
I stand corrected...

Ewan Mac Mahon: (10:20 AM)
The mental image that comes to mind for a 'joint position of all the experiments' is a game of Twister.
IMO there are two site positions: a) everything's going to have to change to be more Ceph; b) please don't change anything, we have neither money nor effort (with people holding both of those, not picking one).

Daniel Traynor: (10:28 AM)
Dynamic Disk Pools instead of raid6
https://github.com/mar-file-system/marfs

Samuel Cadellin Skipsey: (10:32 AM)
(This is because DPM doesn't actually have the manpower.)

Ewan Mac Mahon: (10:34 AM)
I think the long term future of DPM is that there isn't one.

Daniel Traynor: (10:34 AM)
marfs looks like an interesting solution to glue storage systems together + tiered storage; we have got ceph / lustre / hdfs solutions working

Samuel Cadellin Skipsey: (10:37 AM)
with StoRM on top, Dan?
Daniel Traynor: (10:37 AM)
on top of lustre, yes
I meant ceph / hdfs working at other sites

Brian Davies @RAL-LCG2: (10:38 AM)
https://twiki.grid.iu.edu/bin/view/Documentation/Release3/HadoopOverview

Ewan Mac Mahon: (10:38 AM)
The way this works with academic funding, though, is that you have to jump off the top of the cliff and hope someone funds the trampoline at the bottom before you get there. If you don't jump, it's seen as there being no need for the trampoline.

Daniel Traynor: (10:40 AM)
although Terry has had ceph working for his business for at least a year now (in production, earning money)

Ewan Mac Mahon: (10:42 AM)
Ceph clearly works. The questionable bit is the gridftp/xrootd interfaces. But they're not /that/ questionable, since the Tier 1 has basically committed to them. They did already jump off the metaphorical cliff, so they're going to have to make it work. AIUI they don't have a "let's call the whole thing off" option in the plan.

Steve Jones: (10:44 AM)
Let's draw straws!

Paige Winslowe Lacesso: (10:45 AM)
Sorry, another meeting, must -

Ewan Mac Mahon: (10:48 AM)
I think ceph is arguably the conservative option - ceph itself demonstrably works at scale and it's Red Hat backed, which is always nice. And assuming that the Tier 1 stick to (what I think is) the plan, then they'll definitely be doing that, so you'd be going the same way rather than another way, and regardless of the merits of the individual ways, the consistency has some value. Also, does HDFS do erasure coding? Because I'm assuming that Tier 2s aren't going to wear the cost increase of going for replication.

Duncan Rand: (10:51 AM)
There is always dCache...

Daniel Traynor: (10:51 AM)
let's have a storage bakeoff
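Appendix: a starting point for the script discussed in item 2, sketched in python (one of the sensible languages on offer). It assumes the listing looks like the nsls -lR output shown in item 2 and arrives on standard input; the nine-column field layout and the choice to keep files only are assumptions that would need checking against real DPM output.

# Minimal sketch (assumptions as above): flatten an nsls -lR style recursive
# listing read from stdin into "full path <TAB> timestamp" lines.
import sys

current_dir = None
for raw in sys.stdin:
    line = raw.rstrip("\n")
    if not line.strip():
        continue                       # blank separator lines between directories
    if line.endswith(":"):             # directory header, e.g. "/castor/.../disk/:"
        current_dir = line[:-1].rstrip("/")
        continue
    fields = line.split(None, 8)       # perms links user group size month day time/year name
    if current_dir is None or len(fields) < 9:
        continue                       # ignore anything that doesn't look like an entry
    perms, name = fields[0], fields[8]
    if perms.startswith("d"):
        continue                       # biomed want files, not directories
    timestamp = " ".join(fields[5:8])  # e.g. "Sep 18 22:17" or "Mar 05 2015"
    print("%s/%s\t%s" % (current_dir, name, timestamp))

Whether the mixed timestamp formats (HH:MM for recent entries, a year for older ones) need normalising is a question for biomed.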