Attending: Jens (chair+mins), Winnie, John B, Marcus, Daniel, Matt, Steve, Sam, Lukasz, Duncan, Tom, Govind, Brian

0. Operational blog posts...

* Any blog posts that should have been written
  * Currently the Quarterly Count (=Qount) is Marcus=3, everybody else=0
* Any "small VOs" that should have been progressed
* Any other operational issues
  -> Lots of GridFTP transfers into QM are causing HammerCloud timeouts, and lots of data is going out as well. With StoRM one can "just" add another GridFTP server, whereas with DPM one would have to add a full disk server.
  -> Storage boxes in general are seeing high load (in Lustre), with access patterns being mixed rather than, say, sustained writes. We are getting 5-6 Gb/s; perhaps the nodes are underpowered? Would we need more threads than CPUs? We also need to look at the fasterdata recommendations again (a host-tuning sketch follows item 2 below).
  -> HDFS is "not great for local users" (POSIX). In general we are interested in technologies that can be "carried forward" as options for future T2Ds and T2Cs (for caching). HDFS (Luke) with RAID6, one OSD per box; the disk servers are also WNs, which is not necessarily recommended... One could also RAID at other levels, e.g. RAIN (=nodes). Access also depends on local access protocols, e.g. whether POSIX is needed at all. At Glasgow users use direct I/O with xroot.

1. Update on the "einfrastructure" stuff, discussion

There was a report on einfrastructures in general and on the fact that GridPP is one in particular - a successful one, supporting not just WLCG, and arguably a Science DMZ, although interest in Science DMZs depends on whom you talk to (among networkingologists, for example). Nevertheless, as the end-to-end workshop demonstrated, there is interest in the concept. Jens had asked about the value of "new" networking technologies (not being a networkingologist as such) in building more dynamic Science DMZs. Duncan had a set of ATLAS presentations which he'd send, for Jens' general edification.

2. Sam's report on the DPM collaboration meeting, including exciting bullet points:

* DOME
* testing request for DOME
* SRMless DPM drive
* distributed DPM
* CentOS7 timescale (and testing call)
* "caching DPM"

DOME will be replacing RFIO but can coexist with it. It is more HTTP-based and RESTful, and it needs testing! DOME itself is SRMless; supporting SRM will still require running the "legacy" parts. Note that as regards space tokens (used by ATLAS in particular), SRM will still be required, whereas DOME can set per-directory quotas (which are obviously addressed via the path) through the imaginatively named quota tokens. Geographically distributed DPMs should be possible, a la the NorduGrid dCache. DOME is expected to be production ready by the end of Q4, around release 1.9.0. So future T2s might run distributed DPMs across sites (or of course dCache!), although if we were to go that route, some sort of data migration plan would be needed. 1.8.11 is expected to support caching, via different levels of cachiness, i.e. not {0,1} but more like [0,1].

CentOS7 update: an official beta is "available now"; people interested in testing should consult the EPEL testing repo. Full support is expected in 1.9.0.

In general there didn't seem to be much interest in distributed DPM at this stage, at least not outside the UK (we are more open minded, you see). Caching was also of interest to FR and ES, and DOME+RFIO also to FR. SRMless DOME needs testing - a volunteer from the UK?? As ATLAS tend to use space tokens, testing could focus on CMS. Sam will check options for contributing with the board.
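Aside on the fasterdata recommendations mentioned under item 0: a minimal host-tuning sketch for a ~10 Gb/s disk server, roughly following the fasterdata.es.net guidance. The values below are indicative only and should be checked against the current fasterdata pages (and against local hardware) before applying.

    # /etc/sysctl.conf additions, following the fasterdata.es.net host-tuning
    # guidance for a ~10 Gb/s data transfer node (indicative values only)
    net.core.rmem_max = 67108864
    net.core.wmem_max = 67108864
    net.ipv4.tcp_rmem = 4096 87380 33554432
    net.ipv4.tcp_wmem = 4096 65536 33554432
    net.ipv4.tcp_congestion_control = htcp
    net.ipv4.tcp_mtu_probing = 1
    net.core.netdev_max_backlog = 30000

The settings can be applied with "sysctl -p" and verified under /proc/sys/net/.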
3. AOB

None (NOB).

Chat log:

Lukasz Kreczko: (27/04/2016 10:07:52) do not forget HDFS ;)
Duncan Rand: (10:08 AM) dcache
Lukasz Kreczko: (10:15 AM) HTTPS?
Samuel Cadellin Skipsey: (10:15 AM) Well, I think ATLAS was planning to test http/s over FTS, so...
Daniel Peter Traynor: (10:16 AM) also local users like posix access
Lukasz Kreczko: (10:16 AM) Nice. Might make things nicer for non-HEP. Oh, since I mentioned HDFS: MapR is similar, but has proper POSIX
brian: (10:20 AM) Apologies for my tardiness
Lukasz Kreczko: (10:22 AM) Is that mainstream SDN? Bristol IoT network. Our local team is also keen on SDN. http://techspark.co/bristol-becomes-an-internet-of-things-testbed-with-hypercat-initiative/
Daniel Peter Traynor: (10:24 AM) the word I was looking for is erasure coding! That is in the Lustre development plan.
Samuel Cadellin Skipsey: (10:25 AM) ah, okay, but that's not "multiple servers" :) I mean, RAID is also EC... HDFS supports EC too, of course, although the documentation for it has always been a bit shonky
Tom Whyntie: (10:26 AM) Yup. Would SNO+ be a good place to discuss re. networking services/technologies? I'm going to need some help on that as we haven't really done anything like that with CERN@school (as opposed to CVMFS, DIRAC, etc.)
Lukasz Kreczko: (10:29 AM) Solid (neutrino) would like just storage (they process the data as it comes in on 1 node) and they stream at 300 GB/day. Med-imaging want to stream data as well, bursts of ~10 Gbit/s. Authentication is the biggest hurdle atm
Jens Jensen: (10:35 AM) RFIO is probably the oldest code in DPM :-)
Marcus Ebert: (10:39 AM) are there documents available on how to set up a DOME-only DPM system?
Lukasz Kreczko: (10:51 AM) and that's CentOS 7.2, right?
Duncan Rand: (10:53 AM) Please post links to DOME talks, Sam
Samuel Cadellin Skipsey: (10:55 AM) I shall try to dig them up and post in the mailing list, Duncan - there weren't any slides in the collaboration board meeting itself
Duncan Rand: (10:56 AM) thanks
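Following up on Sam's chat comment that ATLAS was planning to test HTTP(S) over FTS: for anyone wanting to try the same against a DPM endpoint, a minimal sketch of a WebDAV/HTTPS transfer submission with the FTS3 command-line client is given below. The FTS endpoint, storage hosts and paths are placeholders (not real sites), and a valid grid proxy is assumed.

    # Submit a single HTTPS/WebDAV third-party copy via an FTS3 service
    # (endpoint, hosts and paths are placeholders only)
    fts-transfer-submit -s https://fts3.example.org:8446 \
        davs://source-se.example.ac.uk/dpm/example.ac.uk/home/cms/testfile \
        davs://dest-se.example.ac.uk/dpm/example.ac.uk/home/cms/testfile

fts-transfer-submit prints a job ID, which can then be polled with fts-transfer-status against the same endpoint.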