Attending: Jens (chair + minutes), Winnie, Gareth, Sam, Brian, John H, Robert, John B, Ewan, David, Pete, Raul, Elena

0. Operational blog posts as usual, plus end-of-quarter stuff for Brian and Sam in particular

For the end-of-quarter reporting, Jens needs to report to the PMB on the publications etc.; since we had a lot of interesting events this quarter - ASGC, CHEP, HEPiX, hepsysman, and a WLCG workshop - there ought to be a year's worth of good stuff. Which is fine: this part of the system was designed to allow a year's worth of publications in a single quarter.

Speaking of reporting targets, we have five blog posts (Jens four, Brian one), which is below the target of eight; but then we also had the purdah, and there are only eight posts altogether for the whole year so far on the GridPP planet aggregator, two of which are from SSI (or are some missing?). So we are doing OK, relatively speaking.

Winnie raised an operational issue with the publication of storage paths. Many storage systems manage storage across aggregated resources and provide a unified addressing mechanism; Bristol, however, run GPFS, so they already have a unified namespace underneath their SE. The DPNS namespace is thus broadly similar to the GPFS namespace, except that you would probably chop a top-level piece off one and prepend something to the other - something like /dpm/gridpp/vo.gridpp.ac.uk/ <-> /gpfs/gridpp/vo.gridpp.ac.uk ? CMS and ATLAS do not use the path published in the information system, nor the "close SE" environment variables, but it still makes sense to publish this information because some tools and some VOs may use it.

1. Semi-regular update on new/small(ish)/non-LHC VOs, and update on WebFTS?

Update on DiRAC: ticking along, some 40-60 TB in 15K files, coming in at ~250 MB/s, with a desire to scale to ~400 MB/s if possible during the summer months, when students' Netflix and Facebook traffic is evicted from Durham's networks. Useful as another GridPP case study. The currently outstanding issues are:

(a) Proxy - probably needs a robot certificate, but the maximum lifetime of a VOMS proxy is 24 hours, which is a policy decision. Transfers can work by DN alone, so do not strictly need vomsification, but accounting would be improved with VOMS data.

(b) Ownership - a new user is needed for each of the other sites. Local file ownership at Durham is of course lost, since only the file is copied, not its associated metadata (as it would be if you had tarred it or dumped the filesystem), so perhaps the best solution is to maintain a file of chmod and chown instructions alongside the data - crude, but it would work (a rough sketch of such a manifest follows at the end of this item).

(c) Data security in general: if they start doing griddy things at the end-user level then we may need to start thinking about access control permissions - this may also be an issue for other VOs. For example, Hydra can manage fine-grained ACLs, but GridPP is not running Hydra.

(d) The BDII publishes the resource as D1T0 or D0T0 when it is actually D0T1 - probably a configuration issue; Brian has raised a ticket in RAL's helpdesk.

No other news on other VOs.
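As a purely illustrative sketch of the manifest idea in (b) - not an agreed tool, and the one-line-per-file format is invented here - something along these lines (Python) could record owner, group and mode for everything under a directory:

    #!/usr/bin/env python
    # Illustrative sketch only: walk a directory tree and write one line per
    # entry recording mode, owner, group and relative path, so that ownership
    # and permissions can be reapplied after a copy that only moves contents.
    import os, pwd, grp, stat, sys

    def write_manifest(root, out):
        for dirpath, dirnames, filenames in os.walk(root):
            for name in dirnames + filenames:
                path = os.path.join(dirpath, name)
                st = os.lstat(path)
                owner = pwd.getpwuid(st.st_uid).pw_name
                group = grp.getgrgid(st.st_gid).gr_name
                mode = oct(stat.S_IMODE(st.st_mode))
                # one chown/chmod "instruction" per line
                out.write("%s %s %s %s\n" % (mode, owner, group,
                                             os.path.relpath(path, root)))

    if __name__ == "__main__":
        write_manifest(sys.argv[1], sys.stdout)

A companion script (or a simple shell loop over the manifest) would then apply the corresponding chown and chmod calls at the receiving site, once Durham usernames had been mapped to local ones.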
2. Yes, let's discuss the CMS thing Brian sent around on 30 Jun 2015.

Background information - the CMS proposal for sites: https://twiki.cern.ch/twiki/bin/view/CMSPublic/SpaceMonSiteAdmin and, more generally, the syncat format for SEs, or as close as we can get: https://twiki.cern.ch/twiki/bin/view/LCG/ConsistencyChecksSEsDumps#Format_of_SE_dumps

Preparing a dump weekly and uploading it manually may be too onerous. Brunel had a go, dumping about 1M files, but ended up with too much metadata(?) and would like to see the requirements for the whole exercise (a rough sketch of what a minimal dump might look like follows at the end of this item).

Clearly the format is different for different SEs, which is not great, so we have some sympathy for CMS's position/proposal. The UK sites that advertise resources for CMS are (in the order dumped from the BDII) RALT1, QMUL, RALPP, Brunel, ECDF, RHUL, UCL, SHEF, IC, BRIS, DURHAM, OX, LIV, BHAM, LANCS, CAM, GLASGOW, MAN - i.e. pretty much everyone. It is also not 100% clear whether the dump is done by SE type or by underlying storage or both (for example, StoRM can manage several types of filesystem, whereas dCache is more of an integrated "solution" and CASTOR is Different(tm)).

To summarise our feedback arising from this discussion:

1. We would like to see the requirements documented, so T2s can see how they can contribute other than just providing technical feedback.
2. It would be best to use a single format, such as syncat.
3. We suggest that the catalogue be stored in a single well-defined location, e.g. /acct/syncat.xml. A VO may have more than one VO path - probably at a T1 - but this should be figure-out-able.

In that case, no extra tools would be required and everybody's lives would be easier.
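For illustration only - the element and attribute names below are made up, and a real dump would have to follow the schema described on the twiki pages above - producing a basic namespace listing need not be much more than a walk over the filesystem (or over the SE's namespace database):

    #!/usr/bin/env python
    # Illustrative sketch only: walk a filesystem and emit a simple XML
    # listing with one <entry> per file, giving its path and size. The tag
    # and attribute names here are invented for illustration, not taken from
    # the CMS/LCG twiki pages.
    import os, sys
    from datetime import datetime
    from xml.sax.saxutils import quoteattr

    def dump_namespace(root, out):
        out.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        out.write('<dump recorded=%s>\n' % quoteattr(datetime.utcnow().isoformat()))
        for dirpath, _, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                size = os.stat(path).st_size
                out.write('  <entry name=%s size="%d"/>\n' % (quoteattr(path), size))
        out.write('</dump>\n')

    if __name__ == "__main__":
        dump_namespace(sys.argv[1], sys.stdout)

Since this streams one line per file it would cope with Brunel's ~1M entries; for a DPM or dCache the same information would more naturally come from the namespace database than from a filesystem walk, and dropping the result at a well-known location such as the /acct/syncat.xml suggested in point 3 would avoid any separate upload step.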
3. AOB

Sam and Ewan suggested a discussion on the technical layout of what a future T2 might look like. With the advance of CEPH and of archive drives (e.g. shingled storage), a T2 could support predominantly-read access patterns on such storage and thus achieve a better capacity/cost ratio than traditional high-end storage systems. Such a T2 would be like a Classic SE, providing GridFTP and xroot access. A test would need to be conducted at a useful scale to be, er, useful - say a PB. As there are risks associated with it, we need something like a business case: here is an opportunity, here are the risks, here is what we would plan to do and what it would cost. We then present the case to the PMB and see what they say. Ewan had started a document, but only just, so he will send it around to some people for suggestions and contributions.

Sam and Ewan had different things they wanted to test:
1. Ewan wanted a new setup, say the aforementioned PB, with a realistic but predominantly reading access pattern;
2. Sam wanted to look at the case of new disks in old chassis;
3. and also pointed out the need for a migration path to a classic SE of this form.

The outcome, other than the study itself, could be the model of a future T2 with even better bang for the buck than today. However, we recognise there are risks, and the presentation (document) to the PMB would need a careful risk register. For example, CERN have run a large-scale CEPH instance (30 PB of disk, around 7500 OSDs) and seen "some scaling problems"; at T2 scale, with far fewer OSDs, those problems should not apply. Also, although access patterns are currently different, Brian suggests that ATLAS and CMS may in the near future begin to use T2 disk with usage patterns similar to T1 disk (i.e. more scratch use), which would rather scupper the case for archival drives. And CEPH may need SSDs for the metadata/journalling: the T1 CEPH team are known to have "played it safe" on the hardware, and new nodes tend to have non-RAIDed disks but with SSDs for the metadata.

Paige Winslowe Lacesso: (01/07/2015 10:03:51)
I'm asking if the pool accounts USE environment variables that YAIM puts into grid-env.sh - it's NOT DPM-specific; I know CMS doesn't, and I think ops does not either? There are *no* DP-anything env vars set for pool accounts.
Partly, yes. Email reply = I could've studied it further & less sound interference - can you send that data in an email to me?

Ewan Mac Mahon: (10:13 AM)
It's also what a tarball does. But I think the 'fork' file idea with the metadata in a magic file would do fine. Especially if they can create them on their filesystem at backup time.

Jens Jensen: (10:14 AM)
Thanks...

Samuel Cadellin Skipsey: (10:14 AM)
Ewan: so, we've just invented the old OS7 filesystem from the 1990s? (Not that I object to this.)

Ewan Mac Mahon: (10:15 AM)
It does have its similarities. Though it could be a single file (basically an ls -lR) with the metadata for everything in, rather than a per-file resource. It's syncat with a /lot/ of knobs on, by the look of it.

Jens Jensen: (10:18 AM)
Could you do an XSLT to convert one XML format to another?

Ewan Mac Mahon: (10:19 AM)
Not quite sure why they need any of the knobs. The 'make a namespace dump' part shouldn't be too hard work, but the whole quasi-PhEDEx validation and upload stage looks rather over-cumbersome. (Also, on a minor point, I'm not sure I expect Tony Wildish to be subscribed to the gridpp-storage list, and, while I haven't asked, I'm not sure Daniela did either.)

Jens Jensen: (10:44 AM)
As I understand CEPH it needs to have a fairly large number of nodes... (from the T1 CEPH team). Or we increase the capacity by being able to buy more.

Ewan Mac Mahon: (10:48 AM)
I'm actually not too worried about migration, btw; there's a lot of churn on our ATLAS data, so if we marked the old thing offline and the new thing online a lot would shift 'naturally'. And then you'd FTS the rest. There are several ways to get 'cheap' kit, of course: there's low spec from a performance POV, and there's low spec from a resilience POV. You can get a lot of oomph for not too much money if you're using i7 CPUs, non-ECC RAM etc. We pay a significant premium for making our machines reliable, and maybe they don't need to be.

Jens Jensen: (10:51 AM)
But even the T1 will move D1T0 to CEPH.

Gareth Douglas Roy: (10:52 AM)
I agree, but I'm not sure how far down the cost scales.

Ewan Mac Mahon: (10:55 AM)
Be fun to find out though, wouldn't it? We've got this question from DB about why we're spending more than the ~£30/TB that he can buy a USB disk for, and we should try to answer it.

Robert Wolfgang Frank: (10:57 AM)
http://www.storagereview.com/seagate_archive_hdd_review_8tb