Duncan Rand, Shaun de Witt, Stephen Jones, Wahid Bhimji, Sam Skipsey, Govind Songara, Mark Norman, David Crooks, Matt Doidge, John Bland, John Hill, Brian Davies, Jeremy Coles, David Colling, Jens Jensen, Ewan Mac Mahon, Pete Gronbech

1. T1 alternatives to CASTOR.

Must cope with hot files.
Must not perform worse than CASTOR (with transfer manager).
CASTOR highly optimised for a large number of small servers. Block based storage solution may be better.
... Would like to have a mountable filesystem.
Action to get criteria onto the GridPP wiki.

Candidates selected for evaluation, some partly because we already have instances running: dCache, Lustre, OrangeFS, Hadoop, Ceph. BeStMan as SRM.
BeStMan will not be supported long term... will it be open source? StoRM is also an option, but getting development effort is non-trivial.

Hot files - how much more disk is needed for that? Not necessarily double, as not all files are equally hot... so HDFS will not be suitable.
HDFS RAID: checksumming in HDFS http://wiki.apache.org/hadoop/HDFS-RAID

Users want NFS4 - or at least a mountable filesystem? FUSE with DPM? FUSE didn't work with CASTOR because of the nameserver. DPM may have the same problem.

What is the difference between a large T2 and a T1 (for disk only storage)? Would things that score highly for the T1 be useful for the T2s?
DPM is sort of ruled out from the T1 perspective - CMS (at IC) complain of things taking a long time to sort out, and CMS would not be keen to see it at the T1 either.

It would be useful for the T1 evaluation to be more visible to the storage group, and maybe also for the T2s to see how much they could gain from the T1 evaluation (e.g. how to support hot files). Either on the mailing list or on the wiki, and Shaun could attend semi-regularly. Conversely, this group could input into this evaluation.

Could we have a document ready, informed by testing and other studies? Need to understand the risks to the project, because there are many uncertainties.
Also need to feed back into the process, to state our position with DPM. There may be risks associated with changing but there may also be opportunities? Prior knowledge (and the devil you know?) counts for something as well.

Are we planning for the future also with the changing experiment requirements in mind? E.g. if SRM in the future were less important. But then we need to know what we're aiming for, so we can put this into the evaluations.

At the next GridPP meeting, this is one of the things that will be presented. This is definitely in scope for the group.

Would RAL consider DM-Lite as a possible SRM front end? How much effort do we need to put into SRM? E.g. if the use of SRM will be phased out? You'd get familiar tools, and also a Lustre back end.

What is the experiment input into the evaluation? There is input from ATLAS.

CERN wanted some support by the middle of Sept., so they could go to the EGI TF and look for further support. Would community support work? Position of other countries unknown.

[10:01:54] Duncan Rand joined
[10:02:19] Matt Doidge joined
[10:02:21] John Bland joined
[10:02:37] John Hill joined
[10:03:49] Brian Davies joined
[10:04:00] Jeremy Coles joined
[10:04:24] David Colling joined
[10:08:17] Wahid Bhimji Ceph
[10:08:57] Wahid Bhimji bestman ! are you going to take over long term support of it
[10:10:05] Stephen Jones Darth Vader is back
[10:11:03] Duncan Rand "BestMan support has transitioned from active development to maintenance only, and from the original development team to the OSG."
[10:14:24] Wahid Bhimji seems quite experimental
[10:15:22] Sam Skipsey It's not that experimental: it's just RAID5/6 applied to block distribution rather than block striping.
(But, yes, it's a new feature in HDFS)
[10:15:55] Sam Skipsey (but ZFS basically already does this, and it's on the btrfs feature plan as well as ceph's)
[10:17:57] Brian Davies HDFS, Lustre and dCache are already in production within WLCG,
[10:18:10] Duncan Rand dcache also i think (namespace)
[10:23:18] Duncan Rand does ceph support posix acl's?
[10:23:49] Duncan Rand sorry not acl's
[10:24:39] Wahid Bhimji yes indeed the posix acls are an issue for cern
[10:24:49] Wahid Bhimji its usually rfio I think?
[10:26:02] Ewan Mac Mahon joined
[10:27:38] Sam Skipsey (a quick run through of the ceph documentation merely repeats several times that the kernel module provides a "fully POSIX filesystem")
[10:28:16] Duncan Rand what I meant was the extra posix features that storm needs to operate (can't remember the term)
[10:31:10] Sam Skipsey I know, Duncan. The documentation doesn't seem to include specific comments about support for the extended POSIX attributes.
[10:35:53] Brian Davies left
[10:36:11] Brian Davies joined
[10:36:25] Brian Davies apologies, evo crashed on me
[10:36:50] Pete Gronbech joined
[10:37:14] Wahid Bhimji ok - no probs thanks
[10:37:16] David Colling left
[10:38:21] Ewan Mac Mahon I think we need to encourage the experiments to have an answer for the question "How much longer are you going to want SRM?" if we're pinning our hopes on status updates at the GridPP meeting.
[10:39:47] Duncan Rand are there any atlas bestman sites apart from eos?
[10:40:23] Brian Davies yes in the US
[10:40:32] Duncan Rand any idea which sites?
[10:41:17] Brian Davies http://www2.ph.ed.ac.uk/~wbhimji/SRMMonitoring/
[10:41:34] Brian Davies put bestman into the version filter text
[10:42:07] Brian Davies Estonia have it so already in the CMS_UK cloud1
[10:42:36] Sam Skipsey https://svnweb.cern.ch/trac/lcgdm/wiki/Dpm/Dev/Dmlite - the Dmlite page on the lcgdm trac
[10:43:09] Jeremy Coles Thanks Shaun.
[10:43:24] Shaun de Witt n/p
[10:43:28] Shaun de Witt always fun
[10:47:21] Duncan Rand 3 FTE from 200 sites doesn't sound much
[10:49:29] Duncan Rand tax each site
[10:49:37] Shaun de Witt
[10:51:28] Ewan Mac Mahon That's not necessarily a bad model. People will stump up thousands for proprietary software, and if (say) 20 sites chip in 2k each, that's 40k, which is a good chunk of an FTE.
[10:51:40] Ewan Mac Mahon And a couple of grand each isn't a lot.
[10:51:41] John Hill left
[10:51:43] Shaun de Witt left
[10:51:43] Jeremy Coles left
[10:51:46] Matt Doidge left
[10:51:48] John Bland left
[10:51:50] Stephen Jones left
[10:51:52] Ewan Mac Mahon left
[10:52:02] Wahid Bhimji left
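
Note on the HDFS-RAID point raised in the chat ("RAID5/6 applied to block distribution rather than block striping"): the single-parity (RAID5-like) idea can be sketched roughly as below. This is only an illustration in Python, not HDFS-RAID code. The function names and the three sample blocks are hypothetical, and it assumes equal-length blocks and the loss of at most one block per group.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Compute one parity block for a group of data blocks (RAID5-like)."""
    return xor_blocks(data_blocks)


def rebuild_missing(surviving_blocks, parity):
    """Rebuild the single missing data block from the survivors plus parity."""
    return xor_blocks(surviving_blocks + [parity])


# Hypothetical group of three blocks; imagine each stored on a different server.
data_blocks = [b"block-A1", b"block-B2", b"block-C3"]
parity = make_parity(data_blocks)

# Simulate losing the server that holds the second block.
lost_index = 1
survivors = [b for i, b in enumerate(data_blocks) if i != lost_index]
recovered = rebuild_missing(survivors, parity)

# Disk used relative to the raw data: (3 data + 1 parity) / 3 data.
storage_overhead = (len(data_blocks) + 1) / len(data_blocks)
```

In this sketch one parity block protects three data blocks, so the group takes 4/3 of the raw data size, compared with 2x for keeping a second full copy of every block.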