Attending: Brian, Jens, John B, John H, Winnie, Sam, Wahid, Gareth, Elena, Raja, Matt D, Chris B, David, Steve, Raul, Govind, Ewan
Apologies: Tom

0. Operational blog posts as usual

Discussion on the list about gsoap errors reported in transfers. These have also been seen in CASTOR, although they may not be the same errors, and they don't seem to be associated with failed transfers. Sometimes they involve a particular VO; might they be associated with FTS? Also seen at Lancaster.

1. There are some potentially interesting things in next week's pre-GDB: http://indico.cern.ch/event/319819/

2. Round table (postponed from last week) of storage-related things.

Edinburgh: Puppet on hold, but we intend to get back to it. Also need to look at interfaces and namespace/space reporting.
Liverpool: "If it ain't broke, don't fix it." WebDAV died, however, and refused to restart. Could it be related to an HTTP update that had failed to apply correctly?
Glasgow: testing of the CEPH plugins for GFAL2 is imminent (could turn into a blog post, but also needs to turn into a CHEP talk). Also working on Puppet with Edinburgh.
Brunel: about to upgrade to 1.8.9. Sent a question to the list. Tracing files is much better supported in 1.8.9; Fabricio is mining log data.
LHCb: nothing immediate, but LHCb will need to start some testing.
Bristol: everything OK.
Lancaster: there was a problem with DNS: a change of DNS server was not picked up by DPM, although there shouldn't be any special caching in DPM itself. Maybe worth restarting something. Several people had reported similar problems. Also, some systems are still on SL5.
Cambridge: also on 1.8.8; ready to move to 1.8.9 but no specific timeframe.
RHUL: nothing to report. Still on 1.8.8. Priority is getting services outside the firewall.
Oxford: deliberately remaining on the stable 1.8.8. Growing backlog of stuff, including SL5 and YAIM, so at some point will save the data and reinstall to clear all the issues. Maybe a few days of downtime; not hugely concerned about scheduling otherwise.
Sheffield: disk servers on 1.8.8. Some badly behaving disk servers were taken offline, and some had to be downgraded. Are there any instructions for the upgrade to 1.8.9? We used to have some general installation and upgrade experiences recorded in the wiki. We could try to resurrect this, but for now it's probably simplest and quickest for Elena to ask on the list when particular problems pop up.
RALPP: upgraded to dCache 2.10 last week. Mostly went smoothly, but the SRM didn't start, possibly due to database oddities (duplicate entries). All storage nodes are on SL6; used the new servers to park the data while upgrading the old ones. Got it all on Puppet; expecting to upload the Puppet recipes to GitHub. Interested in whether CEPH can provide a backend to dCache.
RAL (Brian): investigating filesystem draining rates, and a problem with GFAL for SNO+: a GFAL test will become critical, so it may be advantageous for it to work. Also looking at empty directories in CASTOR. A namespace dump takes 10 days.

3. AOB

Next week! An audience with LIGO. Oops, except I forgot I will probably be at Cloud Expo. Hm. Maybe you can have a chat with LIGO without me. LIGO will be using the T1 and SouthGrid initially (although I saw some messages on tb-support which seemed to suggest NorthGrid?)

wahid: (04/03/2015 10:06:21) ssl.conf file in http. It dies a lot anyway in the 1.8.8 version for sure, and I think still with 1.8.9. As for not restarting, there is the ssl.conf issue, but I'm not aware of another issue, so if it happens again the error would be interesting to know.
Matt Doidge: (10:11 AM) There's still a lot of crappy noise in the 1.8.9 logs
John Bland: (10:11 AM) the error seems to be something about "Error string not specified yet: mod_gridsite: mod_ssl_with_insecure_reneg = 1"
Jens Jensen: (10:11 AM) RAL is using elasticsearch to search and summarise the CASTOR logs
John Bland: (10:11 AM) which may well be the ssl thing; looks like webdav was down for longer than I thought
Matt Doidge: (10:12 AM) The standard logrotate for dmlite doesn't compress
John Hill: (10:12 AM) John - that error message looks familiar. I stopped the problem recurring by creating an empty ssl.conf
wahid: (10:13 AM) so the one thing is putting
    cat /etc/dmlite.conf
    LoadPlugin plugin_config /usr/lib64/dmlite/plugin_config.so
    LogLevel 1
    Include /etc/dmlite.conf.d/*.conf
Paige Winslowe Lacesso: (10:13 AM) no update, all is well!
wahid: (10:13 AM) the other is in /etc/rsyslog.conf
    $IncludeConfig /etc/rsyslog.d/*.conf
John Bland: (10:14 AM) john: yes, the messages sound like your ssl problem. I just discounted it at first as I was sure I'd tested webdav recently, but it must have been something else. We have no ssl.conf after yaiming.
wahid: (10:14 AM) then I think they provide a file /etc/rsyslog.d/20-log-dmlite.conf
    ## send all the dmlite log message to dmlite.log
    $RepeatedMsgReduction off # log every message
    :msg,contains," dmlite " -/var/log/dmlite/dmlite.log
    $RepeatedMsgReduction on
I think matt is correct though - this isn't being log rotated - so need to add this conf too
John Hill: (10:16 AM) yaim renames ssl.conf to ssl.conf.yaimsave
John Bland: (10:17 AM) john: yep, that's what we've got
raul: (10:17 AM) wahid: thanks. I might email you for help.
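Wahid's 20-log-dmlite.conf rule picks messages for dmlite.log with rsyslog's case-sensitive "contains" comparison on the msg property. The Python sketch below only emulates that matching logic, to make explicit which lines end up in dmlite.log; it is not part of rsyslog or dmlite, and the function names and sample log lines are hypothetical.

```python
from typing import Iterable, List

# Hypothetical helper: emulates only the matching logic of the rsyslog rule
#   :msg,contains," dmlite " -/var/log/dmlite/dmlite.log
# It does not read or write any real log files.
MARKER = " dmlite "  # note the spaces on both sides of the word


def matches_dmlite_rule(msg: str) -> bool:
    """'contains' is a plain, case-sensitive substring test on msg."""
    return MARKER in msg


def dmlite_log_lines(messages: Iterable[str]) -> List[str]:
    """Return, in order, the messages the rule would write to dmlite.log.

    Duplicates are kept because the rule is wrapped in
    $RepeatedMsgReduction off. The rule has no discard action, so the same
    messages may still be written by other rules (e.g. /var/log/messages).
    """
    return [m for m in messages if matches_dmlite_rule(m)]


if __name__ == "__main__":
    sample = [
        "httpd: dmlite plugin loaded",  # hypothetical sample lines
        "httpd: dmlite plugin loaded",
        "httpd: libdmlite.so loaded",   # no " dmlite " substring
        "sshd: session opened",
    ]
    for line in dmlite_log_lines(sample):
        print(line)
```

Because of the surrounding spaces, a message such as "dmlite: ..." or one mentioning "libdmlite" is not caught; if dmlite.log looks sparse, that is one thing worth checking.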
John Hill: (10:18 AM) Which means on the next http update a new ssl.conf is created
wahid: (10:18 AM) people might start paying more attention once there's a 'task force'
Ewan Mac Mahon: (10:18 AM) I think that's normal. We can hear you. Possibly not the other way around, though.
wahid: (10:18 AM) (http task force, that was)
John Bland: (10:19 AM) john: yes, I might add a local puppet rule for the headnode to ensure ssl.conf is absent
John Hill: (10:20 AM) It also applies to the pool nodes - I've had the same issue there
wahid: (10:22 AM) matt, on the logrotate of the dmlite log: do you have a /etc/logrotate.d/dmlite? Mine seems to have strange content
John Bland: (10:24 AM) john: yes, our pool nodes saw the same as well, but they were reyaimed a while ago so we never noticed. The moral of the story is we/wlcg need to monitor webdav better.
wahid: (10:27 AM) hi - this is the wiki where we were writing general upgrade experiences: https://www.gridpp.ac.uk/wiki/DPMUpgradeTips Anyone is welcome to add more recent ones; it is currently useless for 1.8.9
Chris Brew: (10:28 AM) Oh, we also got NFS 4.1 working, so we have file:// access as well
wahid: (10:30 AM) there is no particular issue
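The ssl.conf thread (YAIM renaming ssl.conf to ssl.conf.yaimsave, a fresh ssl.conf reappearing after an http update, John Hill's empty-ssl.conf workaround, John Bland's idea of a rule ensuring it is absent) suggests a simple check that could run on head and pool nodes. Below is a minimal Python sketch, not an existing tool: the directory /etc/httpd/conf.d (where Apache on SL6 normally reads ssl.conf from) and the function names are assumptions. It only inspects the file and does not probe WebDAV itself.

```python
import tempfile
from pathlib import Path

# Assumption: Apache on SL6 reads extra config from /etc/httpd/conf.d/,
# which is where the mod_ssl package installs ssl.conf.
DEFAULT_CONF_DIR = "/etc/httpd/conf.d"


def ssl_conf_status(conf_dir: str = DEFAULT_CONF_DIR) -> str:
    """Classify ssl.conf in conf_dir as 'absent', 'empty' or 'present'.

    'absent' matches what YAIM leaves (it renames the file to
    ssl.conf.yaimsave); 'empty' matches the workaround of an empty
    ssl.conf; 'present' means a non-empty ssl.conf has reappeared,
    e.g. after an httpd update, and WebDAV may fail to start.
    """
    path = Path(conf_dir) / "ssl.conf"
    if not path.exists():
        return "absent"
    # Treat a file containing only whitespace as empty.
    if path.read_text().strip() == "":
        return "empty"
    return "present"


def needs_attention(conf_dir: str = DEFAULT_CONF_DIR) -> bool:
    """True if a non-empty ssl.conf is back and should be looked at."""
    return ssl_conf_status(conf_dir) == "present"


if __name__ == "__main__":
    # Demo against a throwaway directory rather than the real /etc.
    with tempfile.TemporaryDirectory() as d:
        print(ssl_conf_status(d))  # absent
        Path(d, "ssl.conf").write_text("")
        print(ssl_conf_status(d))  # empty
        Path(d, "ssl.conf").write_text("Listen 443\n")
        print(ssl_conf_status(d), needs_attention(d))  # present True
```

Run periodically (for example from cron or a monitoring wrapper), a "present" result straight after an httpd update would flag the problem before WebDAV silently stays down; it complements, rather than replaces, actually testing the WebDAV endpoint.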