Present:
	Oxford: Ewan
	Edinburgh: Wahid
	Cambridge: Santanu
	Glasgow: Gareth, Sam, David
	Liverpool: Stephen, Mark
	RALT1: Brian, Jens (chair+mins)
	Imperial: Duncan
	QMUL: Chris
	Lancaster: Matt

https://savannah.cern.ch/task/?group=srmsupportuk

20475 C
20474 ? Ricardo sent Matt something to test
17931 + unbalanced disk server load - Sam has beta code to do some rebalancing, making use of DPM improvements (fs waiting - no new writes but files can be deleted)
16729 - check with Chris
16728 - wiki page started?
16727 C
16725 C
16724 C exists in DPM core - should be prod (sites are not all using it)
16723 C DPM got better at checksumming, on demand
16722 O clarify which values are needed
16721 C Lots of studies around - ATLAS GDB before last - ATLAS monitoring available, Wahid will send links (see chat)
16683 C no longer needed
16350 C Done, wiki
16349 C Done, wiki
15359 C Done
15359 C No longer relevant
15357 C Isn't going to work: StoRM needs POSIX ACLs.
15356 C Done
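
Note on 17931: the weighting trick mentioned in chat (set every fs other than the target to weight 0) can be sketched as below. This is only an illustration assuming weight-proportional selection; it is not DPM's actual selection code, and the function names and filesystem names are hypothetical.

```python
import random


def pick_filesystem(weights, rng=random):
    """Pick a filesystem with probability proportional to its weight.

    `weights` maps a (hypothetical) filesystem name to a non-negative weight.
    Illustration only; not DPM's real selection algorithm.
    """
    candidates = [(fs, w) for fs, w in weights.items() if w > 0]
    if not candidates:
        raise ValueError("no filesystem has a positive weight")
    names, ws = zip(*candidates)
    return rng.choices(names, weights=ws, k=1)[0]


def force_target(weights, target):
    """Return a copy of `weights` where only `target` can be selected."""
    if target not in weights:
        raise KeyError(target)
    # Every other filesystem gets weight 0; the target keeps a weight of at least 1.
    return {fs: (max(w, 1) if fs == target else 0) for fs, w in weights.items()}
```

With weights forced this way, pick_filesystem() can only ever return the target, which is the behaviour the rebalancer would rely on.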


hepsysman,
T1 "EOS" evaluations:
** Criteria for evaluations - which candidates should the T1 be considering?
   How will files migrate from the disk system to the tape system?  They don't,
   or it's managed from the outside, e.g. with FTS.
   - BeStMan on Lustre?
   - Is BeStMan not being supported anymore?  Will they move away from SRM?
** Puppet templates for DPM not out?  Santanu found templates on CERN web site,
   trying to do everything from installation to configuration.
** Two solutions layered on top of storage, StorageD and SDB: would something
   git-like be suitable?
** iRODS can interface to other things, QMUL will be running an iRODS.

Sam and Wahid and Chris will be going to CHEP/WLCG

There shall be a meeting next, nonetheless.


[09:59:49] Stephen Jones joined
[10:00:29] David Crooks joined
[10:01:08] Wahid Bhimji hah
[10:01:19] Wahid Bhimji what a milestone 200m meetings!
[10:01:21] David Crooks left
[10:01:40] Brian Davies joined
[10:02:08] Wahid Bhimji I can't open it
[10:02:19] Ewan Mac Mahon joined
[10:02:37] Jens Jensen https://savannah.cern.ch/task/?group=srmsupportuk
[10:03:08] Mark Norman joined
[10:04:43] Duncan Rand joined
[10:06:32] Ewan Mac Mahon No, but you can set its fs weight to several bajillion.
[10:08:18] Ewan Mac Mahon So for the rebalancer if you just set every fs other than the target to weight 0, it'll definitely go where you want.
[10:12:21] Sam Skipsey https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Dev/Components
[10:12:51] PPRC QMUL joined
[10:13:42] Ewan Mac Mahon I'm not sure anyone actually got fsprobe running, but I think we all decided that it was a) a good idea and b) should be fairly easy
[10:15:40] Ewan Mac Mahon On demand.
[10:15:46] Wahid Bhimji when an atlas job runs on the data it picks corruption up
[10:15:53] Wahid Bhimji which I think is good enough
[10:16:27] Ewan Mac Mahon Though you could imagine a DPM-fsprobe that would do the fsprobe style background checking, but via DPM and comparing with its stored checksums.
[10:16:59] Matthew Doidge joined
[10:20:27] Wahid Bhimji I meant this link
[10:20:28] Wahid Bhimji http://dashb-atlas-job.cern.ch/dashboard/request.py/terminatedjobsstatus_individual?sites=UK&sitesSort=8&start=null&end=null&timeRange=lastMonth&sortBy=0&granularity=Daily&generic=0&series=All&type=pfe
[10:20:40] Govind Songara joined
[10:20:58] Govind Songara left
[10:21:05] Brian Davies yes, actually below is an example for a single site (Oxford in this case)
[10:21:06] Brian Davies http://dashb-atlas-job-prototype.cern.ch/dashboard/request.py/dailysummary#button=successfailures&sites[]=UKI-SOUTHGRID-OX-HEP&sitesSort=0&start=null&end=null&timerange=lastWeek&granularity=Hourly&generic=0&sortby=0&series=All
[10:21:41] Wahid Bhimji My link takes you straight to the error codes for all UK sites
[10:23:08] Wahid Bhimji Bestman support is ending - no point in us using
[10:24:05] Ewan Mac Mahon But the action to test it is complete - it was tested, it didn't work.
[10:24:24] Duncan Rand dpm is being prepared for hdfs
[10:24:52] Wahid Bhimji DMLITE indeed may bring us a whole heap of new combos
[10:25:07] Sam Skipsey And yeah, one of them is a hadoop backend.
[10:25:22] Sam Skipsey (Which is actually interesting)
[10:25:32] Ewan Mac Mahon HepSysMan was good; we like it.
[10:25:55] Ewan Mac Mahon Though I'm not completely clear from Sam's talk - which fs is it we're supposed to be using?
[10:26:56] Sam Skipsey I'm assuming you're trolling, Ewan. :p
[10:27:04] Ewan Mac Mahon
[10:27:15] Duncan Rand didn't the consensus end up with storm & lustre?
[10:27:39] Sam Skipsey "Storm and Lustre" is "good enough", yes. And actually works.
[10:27:49] Wahid Bhimji so does DPM
[10:27:54] Sam Skipsey (Which I think is an underrated virtue)
[10:27:54] Gareth Roy left
[10:28:05] Duncan Rand so does dcache
[10:28:43] Sam Skipsey Sure: DMLITE + HDFS would also be good.
[10:28:47] Gareth Roy joined
[10:28:58] Wahid Bhimji so there are your candidates + EOS (shudder )
[10:29:05] Ewan Mac Mahon I think for doing it right now it's got to be STORM+lustre
[10:29:16] Wahid Bhimji bestman has no long term support
[10:29:16] Sam Skipsey (You might notice that I'm increasingly pro filesystem backends that can actually do block level parallelism)
[10:29:20] Ewan Mac Mahon All the others are potentially interesting for the future.
[10:29:21] Wahid Bhimji so I'd say forget it
[10:29:31] Duncan Rand don't forget gpfs
[10:29:43] Ewan Mac Mahon No, really, DO forget gpfs.
[10:29:51] Ewan Mac Mahon There be dragons.
[10:29:56] Sam Skipsey And expenses.
[10:30:43] Ewan Mac Mahon Not so much expenses, apparently. There are some academic deals around (the Oxford e-Research Centre seem to run it for mostly gratis)
[10:30:50] Ewan Mac Mahon But still dragons.
[10:31:40] Duncan Rand sam are you suggesting DMLITE + HDFS for the tier-1? if so how many replicas?
[10:31:52] Ewan Mac Mahon I think Andrew ruled HDFS out.
[10:31:52] Duncan Rand cos replicas cost money
[10:31:59] Ewan Mac Mahon For that very reason.
[10:32:24] Ewan Mac Mahon Something RAIN like might be interesting, but replicas of whole files are just too damn expensive.
[10:32:25] Sam Skipsey you don't have to replicate.
[10:32:53] Stephen Jones We use puppet, without any specific templates.
[10:32:57] Sam Skipsey In which case, it's the block level distribution that's still useful, as it smears load.
[10:33:28] Sam Skipsey (Annoyingly, all the things that do RAIN that aren't expensive are still beta.)
[10:33:42] Ewan Mac Mahon I think HDFS would still count as slightly weird compared with Lustre as a more mainstream feeling option.
[10:33:49] Wahid Bhimji well no replicas kind of rules EOS out too - anyway I found some of the criteria floated as mandatory as maybe not all that mandatory so would be interesting to see criteria
[10:34:09] Wahid Bhimji iterate on that a bit and then write down the options
[10:34:21] Sam Skipsey Lots of things rule out EOS, Wahid, last time I talked to the RAL guys they were unkeen on it.
[10:35:07] Brian Davies for last month in UK ~15% of jobs failed due to input file missing (possibly solved by federation copy in, also retry?) 15% of jobs failed copying from WN to local SE which would be solved by job recovery.
[10:36:42] Duncan Rand what's wrong with EOS?
[10:37:34] Ewan Mac Mahon iRODS on top of SRM might be fun.
[10:37:49] Ewan Mac Mahon For all the ex-NGS would be iRODS users.
[10:38:47] Ewan Mac Mahon And Sanger is the place with the 16PB Lustre that I keep mentioning.
[10:39:06] Ewan Mac Mahon (and some grid front end nodes that don't get any use)
[10:42:24] Duncan Rand 16 PB is not an insignificant volume
[10:42:26] Ewan Mac Mahon Alternatively, get the responsible folks to blog each of the five bits separately.
[10:42:32] Ewan Mac Mahon What he said.
[10:43:07] Ewan Mac Mahon Jasmine.
[10:43:42] Ewan Mac Mahon What you need to do, Brain, is blog more
[10:45:10] Govind Songara left
[10:45:15] Wahid Bhimji left
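
Follow-up sketch for the "DPM-fsprobe" idea from chat (background checking against stored checksums). It assumes the checksums are Adler-32 hex strings and that the stored values are already available as a plain dict of path to checksum; how they would actually be fetched from DPM is not covered, and all names here are hypothetical.

```python
import zlib


def adler32_of_file(path, chunk_size=1024 * 1024):
    """Return the Adler-32 of a file as an 8-digit lowercase hex string."""
    value = 1  # Adler-32 starting value
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            value = zlib.adler32(chunk, value)
    return format(value & 0xFFFFFFFF, "08x")


def find_corrupt(stored):
    """Compare files on disk with their stored checksums.

    `stored` maps file path -> expected Adler-32 hex string (assumed format).
    Returns a list of (path, expected, actual) for every mismatch; `actual`
    is None when the file could not be read at all.
    """
    bad = []
    for path, expected in stored.items():
        try:
            actual = adler32_of_file(path)
        except OSError:
            bad.append((path, expected, None))
            continue
        if actual != expected.lower():
            bad.append((path, expected, actual))
    return bad
```

Run in the background over each filesystem in turn, this would catch corruption before an ATLAS job trips over it, rather than on demand.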