Attending: John B, Steve, Robert, Rob, Raja, Matt, John H, Brian, Jens, Chris B, Ewan, David, Martin, Chris W, Pete, Alessandra, Raul, Elena

1. WLCG workshop summary
https://indico.cern.ch/conferenceOtherViews.py?view=standard&confId=251191

Who else went to this workshop? Shaun and Dave C. Wahid attended only some of it.

Much planning for run 2; increasing flexibility in T1/T2 (and hence in the use of disk and tape, like using tapes for archives and backups and disks for working repositories, etc.)

Some more testing is needed in preparation, e.g. xroot with FTS, and support for "small" VOs. xroot federations are in production, but lessons learned need to be shared. Monitoring is good, but the requirements now need to be understood.

Is there scope for collaborating on big data? We had this activity in the summer, which was useful, but WLCG may start up an activity next year.

There was also talk about "cloud storage"; it will probably need a HEP-aware layer on top of it, or not?

Improvements to ROOT. IO is likely to be stress-tested prior to the second run; what are the requirements? We need to make sure GridPP is ready for the challenges.

In general, things are evolving towards more standard products instead of custom-built ones.

2. HEPiX summaries from Chris B and Martin
http://indico.cern.ch/conferenceDisplay.py?ovw=True&confId=247864

FNAL will be running an extra dCache instance, and also, interestingly, EOS.

There was some discussion about OpenAFS and YFS (a derivative(?)). Specifically, if OpenAFS is to support IPv6, it will need some funded effort.

dCache itself will get support for "mutable files", so it should be able to support "home" style storage. Mutable files are likely to be mutable until they are fixed (duh) but can then be migrated and/or replicated. So like disk/tape storage...? Are there questions about performance? We should wait and see; this may appear in 2.8.

Lots of talk about Condor and Puppet; Torque seems to be used less.
CEPH (again): the FS is interesting but not quite production ready yet, looking good though [so same as before then]

Talks on clouds, but mostly the compute end, e.g. OpenStack. CERN and others are deploying dropbox-style storage for internal users.

Some discussion about preservation, either bit or DPHEP; at the bit level there was a survey of archives.

There was also a talk about the reliability of drives, looking at the workload of the drive as the parameter. Desktops do lots of idling, whereas a WN (or disk server) would see heavy use.

Other talks about identity management, security, IPv6, etc.

Tier 1 procurements: buying disk servers, 4TB drives instead of 3TB, so total about 120TB. Single CPU, 64GB RAM, 10Gb. 4810 switches. Systems with WD SAS and WD SE, Supermicro 36-bay chassis. 130TB usable file space, no hot spares. Vendor proposing RAID60. Generally, systems start being decommissioned around year five.

Martin will speak to Dell at SC and will report back.

T1 have vendors run tests, basically based on IOZone; vendors meeting minimal requirements are considered. Which benchmarks are useful, and what are the requirements? Martin will share some information with the list (requirements documents). There may also be a WLCG working group which could do stuff in this area.

[10:06:26] David Crooks (Sam and I are a bit late because we've been sent out the office by joiners)
[10:11:24] Christopher Walker Sounded good.
[10:11:39] Christopher Walker I don't consider xrootd as a production service quite yet
[10:12:07] Martin Bly Microphone isn't working atm...
[10:12:20] John Bland chris: redirection or the service itself?
[10:12:47] John Bland (I would agree on the former IME)
[10:13:00] Wahid Bhimji John / Chris - I think we are talking about FAX redirection. For CMS it is (so they say) production
[10:13:08] Ewan Mac Mahon Bearing in mind that the xrootd on Chris' StoRM is a bit different to the DPM based one.
[10:13:13] Wahid Bhimji for ATLAS FAX it is production but a bit broken
[10:13:29] Wahid Bhimji I mean it is actually saving real jobs from failing
[10:13:48] Wahid Bhimji but involves too much intervention / restarting etc to be the same maturity as the rest of our infrastructure
[10:14:38] Wahid Bhimji I think it's on track though to be useful at some point. The local xrootd service on DPM is indeed stable I think
[10:15:09] Sam Skipsey Reading up, I would say that the issue I have with xroot is that most of the really interesting things I'd like out of it are targeted for 4.x, which keeps being delayed.
[10:15:30] Sam Skipsey But, yes, it's stable on DPM for the current 3.x releases.
[10:16:45] Sam Skipsey Frankly, I'd be surprised if OpenAFS went IPv6.
[10:16:52] Martin Bly There was a very distinct lack of appetite!
[10:19:11] Sam Skipsey ...that was an interesting turn of phrase, Chris
[10:20:32] Martin Bly yes.
[10:22:54] Ewan Mac Mahon You're really quiet Sam.
[10:23:24] Sam Skipsey so, my technical question was:
[10:23:38] Sam Skipsey Two ways to do this with dCache: 1) make them really directly writable to
[10:23:48] Sam Skipsey 2) make them immutable files with diffs associated with them
[10:24:07] Sam Skipsey (which gives better write performance on a filesystem optimised for immutable files in the first place)
[10:24:21] Sam Skipsey (then closing the file rationalises the diffs into a final copy)
[10:24:52] Ewan Mac Mahon Sounds like it might be a sort of hierarchical thing where you just keep it on conventional NFS-style storage to start with, then 'archive' it to real dcache.
[10:25:04] Ewan Mac Mahon Like kaving disk fronted tape.
[10:25:10] Ewan Mac Mahon s/k/h/
[10:25:41] Ewan Mac Mahon Oh. Sounds /exactly/ like that then.
[10:27:05] Christopher Walker Martin you have gone quite faint
[10:27:15] Ewan Mac Mahon Now Martin's really quiet.
[10:29:48] Brian Davies we lost you martin
[10:30:06] Ewan Mac Mahon We should just have these meetings in IRC.
[10:31:09] Christopher Walker Quiet again Martin - please speak up.
[10:31:18] Christopher Walker Better thanks.
[10:32:09] Ewan Mac Mahon That seems backwards. I care a lot more if the drive in my desktop goes pop than if one in a RAID array does.
[10:34:10] Ewan Mac Mahon Is it worth going over anything about what kit the T1 is looking to be buying, while we're here?
[10:35:40] Christopher Walker Can you speak up again please.
[10:35:53] Jens Jensen That's better..
[10:36:19] Ewan Mac Mahon That's bonded gig for the worker nodes?
[10:36:24] Ewan Mac Mahon Or storage?
[10:36:33] Ewan Mac Mahon Oh. That's new.
[10:37:53] Wahid Bhimji same as last year is what
[10:38:05] Wahid Bhimji ok -
[10:38:13] Wahid Bhimji 4TB are nearline SAS drives?
[10:38:25] Wahid Bhimji On the dell site they only had 4TB SATA
[10:38:31] Wahid Bhimji I thought
[10:39:01] Wahid Bhimji ok
[10:40:43] Sam Skipsey No hot spares?
[10:40:49] Christopher Walker One partition?
[10:41:27] Sam Skipsey IIRC, the LSI controllers can't do 1 partition with 34 disks in
[10:41:34] Ewan Mac Mahon Can you do that? I didn't think the Adaptecs supported arrays over 32 drives.
[10:45:04] Christopher Walker 2*18 matches 2^n+2 better than 30+2
[10:45:19] Chris Brew got to go
[10:45:22] Christopher Walker How much difference this makes in practice would be interesting to know.
[10:45:59] Chris Brew Upgraded dCache yesterday, few niggles but think I've worked it out now.
[10:46:07] Jens Jensen Thanks Chris
[10:50:58] Ewan Mac Mahon OK, you're turning your old disk servers over rather more rapidly than I am then.
[10:51:14] Ewan Mac Mahon Which is more or less to say that you're doing it at all, actually.
[10:54:06] Sam Skipsey The rumour we heard was the opposite, Wahid.
[10:54:20] Sam Skipsey (Or rather: that it's up to your local Dell people to make deals with you)
[10:54:57] Ewan Mac Mahon If there is anything, particularly internal Dell people's names, that we can point our Dell reps at, that could be very useful.
[10:55:17] Ewan Mac Mahon Dell's easily big enough that one bit doesn't always know what another is doing.
[10:55:39] Wahid Bhimji Borut told me that Andy Lankford said something in some ATLAS ICB thing
[10:55:52] Wahid Bhimji I couldn't find any evidence - so emailed them now ...
[10:57:15] Ewan Mac Mahon Indeed; if we could give the vendors something simple to run, like HS06, that could be quite useful.
[10:57:31] Ewan Mac Mahon We sort of do need a standard disk-spec-06, as it were.
[10:57:45] Wahid Bhimji we've used those RAL tests before
[10:57:53] Sam Skipsey We have, too
[10:58:07] Wahid Bhimji someone circulated them a while back - useful for burn-in apart from anything else
[10:58:34] Ewan Mac Mahon This is different to the 'RAL disk thrashing scripts' though, I assume?
[10:58:47] Wahid Bhimji I was thinking it was the same - maybe not
[10:58:48] Ewan Mac Mahon 'Success' for those is when they ran for long enough without killing the machine.
[10:58:52] Sam Skipsey We're going to have to go now, for our tech meeting. Sorry
[10:59:29] Wahid Bhimji ah sorry for conflating - yeah I haven't seen the benchmark one then
[11:00:57] Wahid Bhimji There is a WLCG Benchmarking Working Group
[11:01:21] Wahid Bhimji Data wise - now in my wlcg data group - but it's trying to make a more accurate thing than iozone
[11:01:28] Wahid Bhimji but that makes it less simple or soon
[11:01:30] Ewan Mac Mahon There is an extent to which the requirement varies with the size of the machine.
[11:01:42] Ewan Mac Mahon And the size of the cluster.
[11:01:42] Wahid Bhimji I would think a simple iozone thing was enough for this purpose
[11:02:15] Ewan Mac Mahon But something that would compare one machine to another on a moderately HEPpy workload could still be useful.
[11:02:35] Christopher Walker Industry standard benchmarks giving information on various IO patterns would be interesting.
[11:02:59] Ewan Mac Mahon The same's true of HS06 too, of course - someone selling machines with half the performance is still interesting if they're proposing to sell you twice as many.
[11:03:02] Christopher Walker Ideally this wouldn't be something they run just for us.
[11:03:47] Ewan Mac Mahon Indeed; an easy first step would be if we can wiki page some figures for our current kit.
[11:04:01] Ewan Mac Mahon And yes, I just used 'wiki page' as a verb.
[11:04:43] Wahid Bhimji Thanks !
[11:05:00] Wahid Bhimji I better go too now..
[11:05:14] Christopher Walker Very useful.
[11:05:19] Wahid Bhimji Bye - see you next week - thanks Martin - really useful
[11:05:41] Ewan Mac Mahon Indeed; thanks for all that.
[11:05:44] Martin Bly Bye.
[11:05:49] Matt Doidge Thanks!
[11:06:30] Ewan Mac Mahon Maybe if Jens just rings you Chris?
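
Note on the RAID layout question raised in the chat (2*18 versus 30+2): the Python sketch below works out the number of data disks and the raw usable capacity for a given RAID6 or RAID60 layout. It is only arithmetic under stated assumptions. The function name and parameters are hypothetical, drive sizes are taken as decimal TB, two parity disks per RAID6 span are assumed, and filesystem and controller overheads are ignored, so the figures need not match a vendor's quoted usable space.

```python
def raid6_layout(total_drives, spans, drive_tb, parity_per_span=2):
    """Hypothetical helper: summarise a RAID6 (spans=1) or RAID60 (spans>1) layout.

    Assumes equal-sized spans and two parity disks per span (plain RAID6);
    ignores filesystem and controller overheads.
    """
    if spans < 1 or total_drives % spans != 0:
        raise ValueError("drives must divide evenly across spans")
    drives_per_span = total_drives // spans
    data_per_span = drives_per_span - parity_per_span
    if data_per_span < 1:
        raise ValueError("each span needs at least one data disk")
    return {
        "drives_per_span": drives_per_span,
        "data_per_span": data_per_span,
        # A power-of-two data-disk count is the "2^n + 2" shape from the chat.
        "data_is_power_of_two": (data_per_span & (data_per_span - 1)) == 0,
        "usable_tb": data_per_span * spans * drive_tb,
    }


if __name__ == "__main__":
    print(raid6_layout(36, 2, 4))  # two spans of 18 in a 36-bay chassis
    print(raid6_layout(32, 1, 4))  # a single 30+2 array
```

With 36 4TB drives in two spans of 18, each span has 16 data disks (a power of two) and the raw capacity is 128TB; a single 30+2 array has 30 data disks (not a power of two) and 120TB. How much the power-of-two shape matters in practice is, as noted in the chat, an open question.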