Attn: Jens (chair+mins), John H, John B, Matt, Winnie, Daniel, Govind, Marcus, Gareth, David, Steve, Ewan, Sam
Apologies: Brian

0. Operational blog posts
Reminder to people to blog when they do something Interesting(tm).

1. Review of (storage/data) issues for the WLCG workshop and the ATLAS Jamboree
Nothing new/else suggested. For ATLAS at least it would be worth following up on the catalogue testing (right format, right location, instructions for all SEs? - probably everything is fine or close enough at this point) plus the "T2C testing".

2. Discussion of current issues such as T2C testing, ATLAS space tokens, non-LHC VOs and other loose ends - may be quick if there is no news. Maybe also the state of the docs in the wiki. (These issues were raised during the round table discussion (q.v.))

3. Round table
We are overdue for a quick round table of storage and data related issues and interesting things. It would be good to do them slightly more frequently - if nothing else to give people who don't often say much a chance to say something - but we don't always get around to it.

John H: upgrade to DPM 1.8.10 on SL6 - probably taking a "big bang" approach: get as much as possible ready, announce the downtime, do the work and come back up. Intending to use puppet - but it's nontrivial - are there updated instructions? Instructions on the wiki are somewhat out of date, but there is of course the upstream documentation (from DPM). Using a smallish pool as a testbed prior to the upgrade. Marcus needs to upgrade nodes anyway, so could take a look at the documentation. Steve points out that we should just delete (well, archive) it if it is out of date and no longer needed (i.e. all the necessary docs are available from DPM); if we need our own supplementary information then we need to update it. Also looking at options for extending storage, e.g. with iSCSI on SL6 or SL7 - does anyone have relevant experience? Related to that, maybe it's time we had a technology review.

John B: same as John H; needs a new head node in particular. Using CentOS 7, then migrating.

Steve: same as John B.

Matt: same as Johns H and B, and Steve. New head node needed. There is extra space available on the university cluster, which could perhaps be made available via DPM. Also interested in New and Exciting(tm) stuff, e.g. SRM-less storage. Could be volunteered to try out new stuff... [that's the spirit!]

Winnie: unstable storage [also discussed last week under operational issues]. Would be nice if Bristol's support could do some testing?

Daniel: standalone xroot supporting multiple VOs on a single security configuration. Aiming to turn it into production, on Lustre. Everyone else: "Ooo! Interesting!" - turn into a blog post?

Govind: struggling to drain old storage, as it is very slow - only 1/4 TB/day. Tried the dmlite drain feature, which managed a more respectable 2-3 TB in a few hours. Sam points out they need passwordless ssh from the head nodes to the pool nodes in order to speed up the draining - would be much faster than RFIO. Problems on other nodes: dpmck -n checks the whole filesystem, not just selected nodes, and there is currently no way to check only selected nodes. Using -n (dry run) without stopping DPM would be OK but might show spurious problems, which could be resolved manually by inspecting timestamps, or by running the script again later and comparing the outputs (see the sketch below). Ewan points out it is expected to run in ~< 1 hr; the dry run would give an indication of the run time since the scanning is the slow bit; the fixing would be relatively quick.
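[Not from the meeting: a minimal sketch of the "run the dry run twice and compare the outputs" idea above. It assumes the dry run writes a plain-text report with one problem record per line; the report format, file names and script name are assumptions for illustration, not taken from the dpmck documentation.]

#!/usr/bin/env python
# compare_dryruns.py - compare two dpmck-style dry-run reports taken some
# time apart while DPM is still running. Entries reported by both runs are
# likely real problems; entries seen only once are probably transient
# (files in flight) and can be re-checked later or inspected by timestamp.
#
# Assumption: one problem record per line, '#' lines are comments.

import sys

def load_report(path):
    """Return the set of non-empty, non-comment lines from a dry-run report."""
    with open(path) as handle:
        return {line.strip() for line in handle
                if line.strip() and not line.startswith('#')}

def main(first_report, second_report):
    first = load_report(first_report)
    second = load_report(second_report)

    persistent = sorted(first & second)   # reported by both runs: worth fixing
    transient = sorted(first ^ second)    # reported by only one run: re-check later

    print("%d problems reported by both runs:" % len(persistent))
    for entry in persistent:
        print("  " + entry)
    print("%d entries only seen once (likely transient):" % len(transient))
    for entry in transient:
        print("  " + entry)

if __name__ == '__main__':
    if len(sys.argv) != 3:
        sys.exit("usage: compare_dryruns.py <first-report> <second-report>")
    main(sys.argv[1], sys.argv[2])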
Ewan: DPM on SL7; currently has a single-node install using puppet. Amusingly, the WLCG repo name was changed by CERN, so that needs updating. Also pondered the /dpm path name, which can be removed with a single setting. The aim is to pare the config down to a single disk pool, then integrate it with DPM. [It is worth documenting this sort of stuff, or sharing it, since as we have seen with the other sites there is lots of interest in doing DPM with puppet.]

Gareth: discussions at Glasgow: getting new storage would mean an opportunity to retire old hardware and install HDFS, which should also work with the older non-retired hardware. dCache?!

Sam: mostly spent time preparing for the WLCG workshop - gathering requirements from sites, feedback from UK and FR - and teaching...

David: same as Gareth...

Jens: working in Copious Spare Time(tm) on GLUE2 for CASTOR with Rob Appleyard; lots of EUDAT stuff; occasional work on Indigo (think iRODS++) when there is a moment; DiRAC.

As Jens is away next week, and it is the week of the WLCG workshop (vidyo link already available!), we are expecting to cancel next week's meeting. However, the meeting slot *is* available, and if people should feel the need or desire to connect and have a chat, that is absolutely fine.

$ For all you NVM fans, you might be interested in having a look at the presentations (they're all online) from the recent SNIA NVM summit: http://www.snia.org/events/non-volatile-memory-nvm-summit Even if new technologies make you go "pshah!" (I know, unlikely), you will still find fascinating stuff like performance and benchmarking methodologies, musings on data efficiencies, and words of wisdom in general.

$$ BTW, we're hiring! If you want to be a castorologist, or you know someone who might want to be a castorologist: http://www.topcareer.jobs/Vacancy/irc215357_6221.aspx

Chat log:

John Bland: (27/01/2016 09:59:47)
good moaning

Matt Doidge: (09:59 AM)
I was just pissing by...

John Hill: (10:01 AM)
Sorry??

Matt Doidge: (10:01 AM)
Continuing on the 'allo 'allo skit. John started it!

Ewan Mac Mahon: (10:13 AM)
That was a bit too breaky-uppy. I'm pretty sure I missed critical bits.
As a general principle, I have no idea why anyone would run iSCSI unless someone forced them, though.

Daniel Peter Traynor: (10:19 AM)
ok

Samuel Cadellin Skipsey: (10:23 AM)
https://indico.cern.ch/event/432642/attachments/1201318/1748385/dpmdbck2015.pdf

Govind: (10:27 AM)
can you hear me.. looks like my sound gone..

Ewan Mac Mahon: (10:28 AM)
Yes, you can't hear us. Anyway, I shall stop now.
I think I missed about the first 5 mins.

John Bland: (10:34 AM)
our vague plan is (once we know what we're doing): install new pool nodes on C7/puppet, migrate the headnode to C7/puppet, then maybe a rolling drain/reinstall on the remaining pool nodes
this assumes all the different os/dpm versions interoperate

Ewan Mac Mahon: (10:35 AM)
I suspect that if anything, it might be a matter of configuring the new version to turn off the new stuff. But that's total guesswork.
Of course, a pool node can offer clever services without bothering anyone if the head node simply ignores them (e.g. imagine they had web servers, but the head node never directs HTTP requests to them - you'd be fine).
What particularly concerns me is the new gridftp redirection magic.

Samuel Cadellin Skipsey: (10:37 AM)
Generally, yeah, new stuff won't work. (Gridftp redirection, for example.)
Some of the new things fall back to old mechanisms (less efficiently), and some just break stuff.
Ewan Mac Mahon: (10:38 AM)
The interesting point is whether the new pool nodes will respond to old-style requests as directed to them by the old-style head node. But we'll see soon enough.

Samuel Cadellin Skipsey: (10:38 AM)
Gridftp redirection, IIRC, falls back to an inefficient mechanism which makes things slower.

Ewan Mac Mahon: (10:39 AM)
Slower than the new hotness, or slower than it used to be?

Samuel Cadellin Skipsey: (10:39 AM)
Slower than it used to be. The fallback I'm aware of essentially picks a random gridftp server on a random pool node to use.
Although, I'm not actually sure if that fallback works for *pool node* compatibility.

Ewan Mac Mahon: (10:40 AM)
Hmm. Can an old pool node get a file from another pool node? Do they rfio tunnel them?