Present: Jens (chair+minutes), John B, John H, Matt D, Winnie, Rob, Robert, Steve, Wahid, Raul, Raja, Elena, Brian, Tom, Sam, Ewan, Jeremy (briefly)
Special guest star: Paul Hopkins (Cardiff/LIGO)
Apologies: Jeremy

0. Operational Blog Posts

Wahid is leaving!! - next time will be his last time!! - he's going to NERSC!!

DPM 1.8.9 - could write up some lessons learned. E.g. the new head node was set up with Puppet - what is needed if you're not using Puppet? Is symlinking libraries really necessary?

Regarding backups, there are two databases: cns_db, which holds the namespace, and dpm_db, which is mostly static configuration plus request records. It is not in general necessary to back up old requests, so those tables could be omitted. Could one do incremental backups of the database? It should be possible using the binary logs. We should back up at least once a day. A migration could then use a checkpoint restore, i.e. restore the most recent full backup and replay the incrementals on top.

Brunel - GridFTP segfaults; the node died and was rebooted with IPMI. After the upgrade the node could not contact MySQL - now fixed. Currently at 40 TB and running out of space. Also, xrootd wasn't configured properly with Puppet - it used the wrong library (feedback from David Smith). Could we run DPM in Docker? Raul is willing and has a testbed.

1. Jeremy listened in to the pre-GDB and will give a report next week.

2. An audience with LIGO (Paul Hopkins, Cardiff)

The VO is about 10^3 people, studying (or trying to detect?) gravitational waves. There are six compute clusters in the US and one in DE, all custom built for LIGO. Cardiff also runs an HTCondor cluster with 700 Westmere cores, with a further 1400 Haswell cores in acceptance testing. LIGO (in the UK) is STFC funded.

Paul is investigating running a data-analysis pipeline on GridPP resources. He has already talked to Catalin from the RAL Tier 1 about CVMFS, but still needs to install the client. We should set up a VO: there is no EGI VO, and there used to be an OSG one but it is no longer maintained. Candidate UK sites would be GLA, BHM, OXF and SHF.
There are already gravitywaveologists in Glasgow, though they are not involved with the LIGO activity; there may be benefits in bringing them together. This is also a good example of IT and science working together (many of us IT folks have science backgrounds anyway, so it is not too much of a stretch).

How do people do LIGO work currently? They log into a cluster and submit a job. Working with the grid - say, with the DIRAC server at IC - they would submit "to the grid" instead. They currently hold 8 TB of raw data at Cardiff, but new observing runs are expected, initially producing 10 TB/yr and perhaps ramping up to 100 TB/yr. A single job needs data on the order of gigabytes; some jobs are long running (~a week), some are short. Suggestion: start with data at one site. Some people also run numerical simulations of black holes, which generate further output data.

We should set up a VO. LIGO own www.ligo.org, so vo.ligo.org would be a good name for a new VO. This is also a good opportunity to debug our to-be-updated instructions for setting up a new VO. Paul should not wait for the VO to be set up but could get started using the GridPP VO.

3. AOB

Brian reports problems with FTS transfers from CERN to RAL tape for ATLAS, and speculates whether there is a generic CERN-to-X problem with FTS for ATLAS due to a below-baseline version of GFAL being used (2.6.8, where the baseline is 2.7.8), or alternatively whether there are other X-to-RAL problems. Recommendation: check whether GFAL components are up to date, particularly since the SAM tests will switch to GFAL. There are known issues with CASTOR which are GFAL related.

Paige Winslowe Lacesso: (11/03/2015 10:03:05) WILL MISS YOU WAHID!!!
wahid: (10:04 AM) yes we are talking https://www.gridpp.ac.uk/w/index.php?title=DPMUpgradeTips ggus at https://ggus.eu/index.php?mode=ticket_info&ticket_id=112272
raul: (10:08 AM) Very sorry that Wahid is leaving. Sam and Wahid are my preferred source of knowledge/help in DPM.
wahid: (10:11 AM) does anyone actually have a problem backing up the whole thing though?
Ewan Mac Mahon: (10:12 AM) Back up the binary logs? Though I'm not quite sure why you'd bother - is it actually at all troublesome to back up the whole thing?
wahid: (10:12 AM) --incremental - maybe - but anyway, is this an academic discussion?
Ewan Mac Mahon: (10:14 AM) There used to be a problem with the dump scripts locking the database for the duration, but I think that went away with the move to InnoDB tables (?) I think this might be a case for 'everyone post their backup scripts to the list/wiki'
wahid: (10:15 AM) https://www.gridpp.ac.uk/wiki/Performance_and_Tuning - it's on there, I think
Ewan Mac Mahon: (10:16 AM) The odd thing is I don't think I have binary logs on. Hmm.
Samuel Cadellin Skipsey: (10:16 AM) Ewan: I shall check, then. InnoDB tables dump faster anyway, though.
John Bland: (10:16 AM) ewan: one line essentially, mysqldump --all-databases --single-transaction, which I use successfully for other dbs as well
Ewan Mac Mahon: (10:19 AM) Ours is basically the same - IIRC the '--single-transaction' is the critical don't-lock-the-entire-db bit.
Samuel Cadellin Skipsey: (10:19 AM) Yes, it is.
Ewan Mac Mahon: (10:20 AM) We've got a bit of boilerplate around it to name the dumps after the day of the week, so after seven days it starts overwriting the old one. Also our tiny little shell script is apparently a tiny little perl script, which is a bit surprising. But apparently we cribbed it from Glasgow.
Samuel Cadellin Skipsey: (10:21 AM) Ah, so with --single-transaction, mysql already does a binary log dump at that point (even without binary logs explicitly enabled). So, yeah, you're fine, Ewan. (We turned them on as it also helped with some other database operations we wanted to do, and avoided locking for them)
Ewan Mac Mahon: (10:28 AM) Usual PSA: Catalin uses male pronouns.
Elena Korolkova: (10:30 AM) We can support LIGO in Sheffield. I'll talk to our LIGO guy.
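Putting together the backup recipe from the chat above - John Bland's mysqldump one-liner plus the day-of-week file rotation Ewan describes - a minimal sketch might look like the following. The backup directory and the use of gzip are assumptions, and the final command is echoed rather than executed so it can be checked (and credentials added) before being run for real.

```shell
#!/bin/sh
# Nightly DPM MySQL dump, a sketch assembled from the chat above:
# John Bland's one-liner plus day-of-week naming, so seven dumps
# rotate automatically as each weekday's file is overwritten.
# BACKUP_DIR is an assumption -- point it wherever your site keeps dumps.
BACKUP_DIR="${BACKUP_DIR:-/var/backups/dpm}"
DAY=$(date +%A)                         # e.g. "Wednesday"
DUMP_FILE="$BACKUP_DIR/dpm-$DAY.sql.gz"

# --single-transaction is the critical "don't lock the entire DB" option
# (it takes a consistent snapshot, which relies on InnoDB tables);
# --flush-logs rotates the binary logs at dump time, marking a clean
# checkpoint for any later incremental replay.
DUMP_CMD="mysqldump --all-databases --single-transaction --flush-logs"

# Echoed as a sketch; drop the echo (and supply credentials) to run it.
echo "$DUMP_CMD | gzip > $DUMP_FILE"
```

Essentially the same script could live in cron; the point of --single-transaction is that the dump no longer blocks the running DPM head node.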
Tom Whyntie: (10:34 AM) There's an overview here: https://www.gridpp.ac.uk/w/images/5/5f/Twhyntie_DRN000024-v1-0_DIRAC-CVMFS-CERNVM_mk01.jpg
Ewan Mac Mahon: (10:35 AM) Is there actually a ligo VO atm, or do we need to set one up?
Jens Jensen: (10:35 AM) Sounds like we need to set one up (with EGI)
Ewan Mac Mahon: (10:35 AM) I think we might be skipping the regional incubator phase here. (and I think we should skip the regional incubator phase in this case) So, who wants to be the one place? I'm thinking either the Tier 1 or Glasgow, given who's involved. But the Tier 1 has better networking. Define 'a lot' - it's always important to use numbers, since we tend to find that our idea of 'big' can be somewhat divergent from other people's. So we need: a VO, stuff in the cvmfs, space at the Tier 1, and CPU-only resources at the Tier 2s.
Paul Hopkins: (10:45 AM) paul.hopkins@astro.cf.ac.uk
Tom Whyntie: (10:45 AM) https://www.gridpp.ac.uk/wiki/Quick_Guide_to_Dirac https://www.gridpp.ac.uk/wiki/DIRAC_new_user_checklist
Ewan Mac Mahon: (10:46 AM) How long does it take to set up a VO? An EGI one. We should ping Jeremy, I think he's most up to date. Unless Tom is?
Elena Korolkova: (10:48 AM) I think the name should be chosen carefully. I mean, we don't want to have to change ligo to ligo.org as we did for t2k.
Ewan Mac Mahon: (10:50 AM) That's true, we'll need a DNS name. Ewan, Jens, Sam, I think. And Elena. Er. Jeremy?
wahid: (10:50 AM) was he ever here, I mean, today
Ewan Mac Mahon: (10:51 AM) That was going a bit existentialist for a moment there.
Tom Whyntie: (10:53 AM) Thanks Paul.
Steve Jones: (10:54 AM) Note to Tom: sounds like good progress is being made on the prototyping process. Good work. Please document it so it can be reused with other VOs. With other _new_ VOs, I mean.
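For completeness, the "checkpoint restore" mentioned under item 0 (restore the most recent full backup, then add in the incrementals) would amount to something like the sketch below. All file and binlog paths are illustrative assumptions, and the commands are echoed rather than run, since a real migration would do this by hand with credentials on the new head node.

```shell
#!/bin/sh
# Sketch of a checkpoint restore for the DPM databases, per item 0:
# load the latest full dump, then replay the binary logs written since.
# Both paths below are assumptions for illustration.
DUMP_FILE=/var/backups/dpm/dpm-Wednesday.sql.gz   # most recent full dump
BINLOG_DIR=/var/lib/mysql                         # where the binlogs live

# Step 1: restore the full dump (the checkpoint).
RESTORE_CMD="gunzip < $DUMP_FILE | mysql"

# Step 2: replay the binlogs newer than the dump. If the dump was taken
# with --flush-logs, the logs were rotated at dump time, so the files to
# replay contain only post-dump changes.
REPLAY_CMD="mysqlbinlog $BINLOG_DIR/mysql-bin.0* | mysql"

# Echoed as a sketch; run by hand (with credentials) during a migration.
echo "$RESTORE_CMD"
echo "$REPLAY_CMD"
```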