Mark Norman
Duncan Rand
Ewan Mac Mahon
Wahid Bhimji
John Hill
Gareth Roy
John Bland
David Crooks
Sam Skipsey
Stephen Jones
Brian Davies
Chris Walker
Jens Jensen (chairing)
Matt Doidge
Someone on phone bridge?


1. Roundup of DPM issues.  Do we need to look at the documentation?
   When I say "documentation," I often mean "source code"...

   The documentation is out of date, and there is a lot of new material
   describing how to install it.  If we got more involved, we might have
   to read more source code and/or do more testing, but we could also
   update the "actual" documentation.

   Negative values are being published in the BDII.  The BDII output is
   generated by dpm-listspaces, which is a Python script.  Maybe we need
   to look at the database - perhaps there is something wrong in it?
   The site in question (Daniela reminded us) seems not to have
   eliminated the problem yet (previously we had a tebi/tera mix-up).
   Could Ricardo help resolve it?

   There should be checks in dpm-listspaces that catch negative values
   and set them to zero.  Maybe the check is only applied to one of the
   free-space values?

   In InstalledCapacity, total = free + used, so free is sometimes
   derived as total - used.  But used is not straightforward to measure,
   and if it is over-estimated the derived free value goes negative.
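
   To make the arithmetic concrete, here is a minimal Python sketch.  The
   function and variable names are hypothetical and not taken from
   dpm-listspaces.  It shows how a tebi/tera mix-up between total and
   used can drive free = total - used negative for a full pool, and how
   clamping at zero keeps that value out of the BDII.

```python
TB = 10**12   # tera (SI, decimal)
TiB = 2**40   # tebi (IEC, binary)


def free_space(total: float, used: float) -> float:
    """free = total - used, clamped so a negative value is never published."""
    return max(total - used, 0.0)


# Hypothetical example: a completely full pool, 100e12 bytes total and used.
total_bytes = 100 * TB
used_bytes = 100 * TB

# Unit mix-up: total converted to TiB but used converted to TB.
raw_free = total_bytes / TiB - used_bytes / TB
print(f"raw free with mixed units: {raw_free:.2f}")  # -9.05
print(f"clamped free: {free_space(total_bytes / TiB, used_bytes / TB):.2f}")  # 0.00
```

   Clamping only masks the symptom.  The underlying fix is to compute
   total and used in the same units from consistent sources.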

   Govind to send the output of dpm-listspaces to the list.

   The Cambridge issue seems to have been resolved - DPM can segfault if
   files are deleted under certain obscure circumstances, but 1.8.3
   fixes the bug (and John has 1.8.3 running).  It's not the threading
   problem?  Sam will send a mail to the list listing the things that
   John should send to the list.

2. FTS 3 for testing - hooray!  This is excellent news - testing plans?
   Presumably we can feed stuff back into the process.

   How does the small-file globbing transfer work?  Does it work with
   SRM or only with GridFTP?  Andrew Lahiff looked into the features,
   though CERN firewalls hampered testing somewhat.  It only supports
   ATLAS, LHCb, and CMS.

   Because FTS 3 is no longer channel-based, there is scope for changing
   the deployment model.  We should continue to assume that the T1 will
   run FTS.  Prototype 1 has no authentication, or could you just not
   renew delegated proxies?

   Brian to talk to Catalin and Andrew L to get one set up, even a
   prototype.

3. How good is "replica" really?  How to measure it...?  Is it site
   dependent?

   Are there any statistical methods useful to decide whether a file
   is likely to go missing?  Other faults may skew the results, eg
   zero length transfers, or timeouts.  ATLAS keep stats on which
   files have gone missing.  Do we need a plan to report this
   information?  Chris and Wahid suggest that we can collect this
   without too much hassle.
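
   One possible way to compare sites, assuming we can get per-site counts
   of files stored and files declared lost (for example from the ATLAS
   statistics mentioned above, after excluding zero-length transfers and
   timeouts), is to estimate each site's loss rate with a confidence
   interval, so that a site holding few files is not judged on a noisy
   point estimate.  This sketch uses the Wilson score interval; the site
   names and counts are hypothetical placeholders, not real data.

```python
import math


def wilson_interval(lost: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval (95% for z=1.96) for the loss rate lost/total."""
    if total == 0:
        raise ValueError("need at least one file")
    p = lost / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical per-site counts: (files lost, files stored).
sites = {"SITE_A": (3, 200_000), "SITE_B": (1, 5_000)}

for name, (lost, total) in sites.items():
    lo, hi = wilson_interval(lost, total)
    print(f"{name}: point={lost / total:.2e}  95% CI=[{lo:.2e}, {hi:.2e}]")
```

   If one site's interval lies entirely above another's, the difference
   is probably real; overlapping intervals suggest it may just be noise.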

   We may need to look at both ingest and storage, in case the ingest
   failure is at the T2.

   Checking for files that have become corrupted would help the storage
   stats, but would not catch files that have gone missing while stored.

   Checksumming a whole filesystem would take a long time, so instead we
   could check only files written, say, six months ago, or similar.
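
   A minimal sketch of the "check files written about six months ago"
   idea, assuming we can obtain (path, write time) pairs from some
   namespace dump; that source, the paths, and the window sizes are all
   hypothetical.  Run weekly with a seven-day window, each file would be
   checked roughly once, around the six-month mark, instead of scanning
   the whole filesystem at once.

```python
import time
from typing import Iterable, Iterator

DAY = 86_400  # seconds


def files_in_age_window(
    entries: Iterable[tuple[str, float]],
    now: float,
    age_days: float = 182,
    width_days: float = 7,
) -> Iterator[str]:
    """Yield paths written between age_days and age_days + width_days ago.

    `entries` is assumed to be (path, write_time) pairs, e.g. from a
    namespace dump; that source is hypothetical here.
    """
    oldest = now - (age_days + width_days) * DAY
    newest = now - age_days * DAY
    for path, written in entries:
        if oldest <= written < newest:
            yield path


# Hypothetical catalogue entries; paths and ages are illustrative only.
now = time.time()
catalogue = [
    ("/dpm/example.ac.uk/home/vo/new.root", now - 10 * DAY),
    ("/dpm/example.ac.uk/home/vo/half_year.root", now - 185 * DAY),
    ("/dpm/example.ac.uk/home/vo/old.root", now - 400 * DAY),
]
selected = list(files_in_age_window(catalogue, now))
print(selected)  # ['/dpm/example.ac.uk/home/vo/half_year.root']
```

   The selected files could then be checksummed on the pool nodes
   themselves, in parallel, as suggested in the chat, so that bandwidth
   scales with storage.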

   Interesting research on how to preserve files and recover from
   errors, eg the loss of a certain number of drives.

   If we find a corrupted file, would it be useful to compare it with
   the original?
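
   As a sketch of the comparison, assuming the original's checksum is
   recorded as an Adler-32 hex string (the usual choice for ATLAS data on
   DPM; treat this as an assumption for other VOs), a replica can be
   checked by recomputing Adler-32 in chunks and comparing.  The helper
   names are hypothetical.

```python
import os
import tempfile
import zlib


def adler32_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Adler-32 of a file as 8 lowercase hex digits, reading it in chunks."""
    value = 1  # Adler-32 is defined to start from 1
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            value = zlib.adler32(chunk, value)  # carry the running value forward
    return f"{value & 0xFFFFFFFF:08x}"


def matches_original(path: str, expected_hex: str) -> bool:
    """Compare with a recorded checksum (hypothetical: catalogue hex string)."""
    # Normalise case and restore leading zeros that some tools strip.
    return adler32_of(path) == expected_hex.lower().zfill(8)


# Self-contained demo on a temporary file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Wikipedia")
checksum = adler32_of(tmp.name)
ok = matches_original(tmp.name, "11E60398")
os.remove(tmp.name)
print(checksum, ok)  # 11e60398 True
```

   Reading in chunks keeps memory flat for multi-gigabyte files.  If a
   second replica exists, checksumming both shows which copy is bad.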

4. Back to stress testing SEs (postponed from last week, but might end
   up getting postponed again - how urgent is it?)

   Benchmarking?

5. AOB

   Brian - a default SE value is published; are non-LHC VOs using it?
   Sam believes that lcg-cp uses it (or it used to come from environment
   variables)?

   Sam - John is applying the 


[11:04:51] John Bland joined
[11:04:51] David Crooks joined
[11:04:51] Gareth Roy joined
[11:04:53] Mark Norman joined
[11:04:53] Duncan Rand joined
[11:04:56] Ewan Mac Mahon joined
[11:04:56] Wahid Bhimji joined
[11:04:56] John Hill joined
[11:04:57] Sam Skipsey joined
[11:04:58] Stephen Jones joined
[11:06:48] Brian Davies joined
[11:11:08] Christopher Walker joined
[11:11:13] Christopher Walker left
[11:11:13] Matt Doidge joined
[11:11:17] Christopher Walker joined
[11:11:34] Christopher Walker left
[11:11:37] Christopher Walker joined
[11:12:51] Phone Bridge joined
[11:13:02] Ewan Mac Mahon Yes, but, philosophically speaking, the sites are right and everyone else is wrong. So there :-P
[11:13:15] Ewan Mac Mahon *prthhrpp*
[11:13:19] Wahid Bhimji we've been around this before. 
[11:15:19] Stephen Jones It's a case of GIGO, by the sound of it.
[11:15:54] Christopher Walker Wasn't there a way to weight storage - and if you gave it a weight of 0, no new files would go there. 
[11:15:57] Ewan Mac Mahon Someone like Sam and/or Wahid needs to get a login on the box and poke at it.
[11:16:14] Ewan Mac Mahon anything else is just pointlessly faffy.
[11:16:42] Ewan Mac Mahon Chris: I think so, but that's only in suitably recent DPM.
[11:17:23] Wahid Bhimji I have to leave at 10.30 btw...
[11:17:24] Sam Skipsey Indeed, the 1.8.x series allows that, Chris.
[11:20:26] Ewan Mac Mahon And by human being we mean Santanu.
[11:20:33] Ewan Mac Mahon Worth checking, I think.
[11:21:36] Wahid Bhimji ps the thing we are talking about is covered in the release notes here:
[11:21:36] Wahid Bhimji https://svnweb.cern.ch/trac/lcgdm/blog/official-release-lcgdm-183
[11:21:43] Wahid Bhimji export GLOBUS_THREAD_MODEL="pthread"
[11:22:01] Wahid Bhimji in the sysconfig for dpm and dpnsdaemon
[11:24:45] David Crooks Sorry, I'm going to have to drop out a bit early today.
[11:24:49] David Crooks left
[11:26:41] Wahid Bhimji please do 
[11:29:50] Wahid Bhimji Can we schedule a transfer with the CERN prototype (if we have a CERN account etc.)? Are there any instructions on how to do that?
[11:30:34] John Bland another meeting, bye
[11:30:37] John Bland left
[11:34:33] Wahid Bhimji I also have to go in a minute - sorry - hopefully can catch up on this later.
[11:35:37] Wahid Bhimji ...bye.... 
[11:35:42] Wahid Bhimji left
[11:36:28] Ewan Mac Mahon For things that aren't lustre there's the possibility of using the pool nodes to do the checksumming, so your bandwidth scales with storage.
[11:36:49] Ewan Mac Mahon Plus you avoid over-the-network transfers
[11:36:51] Duncan Rand exactly - do it in parallel
[11:37:30] Ewan Mac Mahon Sam should write a dmlite plugin for that
[11:38:17] Sam Skipsey I should say that my checksumming tool *does* do the checksumming on the pool nodes
[11:38:37] Sam Skipsey That's why it needs you to annoyingly have a python-ssh module installed.
[11:40:13] John Hill left