Patch-tag is now https safe, so you can access your projects with cryptographic security. Just use https style urls when interacting with patch-tag.
By the way — for you code shops with mission critical, high-value, national security endangering repos hosted at patch-tag — https isn’t completely safe.
You could, for instance, get your password sniffed by a man in the middle attack with arp poisoning, if you are in a public network like an internet cafe, or on a trusted network with a coworker who enjoys hanky panky.
Disclaimer: If you are not technically savvy, and using patch-tag only for the gitit wiki, this option is probably not for you. I have some ideas percolating though that might make this a lot more straightforward if having an offline gitit wiki is something people want.
Thanks to our benchmark suite, a performance regression for check and repair was detected, and Reinier Lamers blocked the current release process until this regression is fixed:
I've taken, with a tiny bit of prodding, the Token Stream Diffs Using Pygments from toy to a nascent toolchain that may even almost be useful.
I've brought in two new dependencies: Google diff-match-patch and, for nice argument parsing, argparse. The diff-match-patch library provides a character-based diff algorithm and patch format (a character-based unidiff-like format with character escaping) in a number of languages, including my friend Python. I can use diff-match-patch to produce useful patch output (and apply said patches with a simple new tokpatch.py file that is but a wrapper around diff-match-patch patching).
tokdiff.py has grown three new output formats. The original "toy" format I've renamed "verbose", and it's quite interesting for debugging and getting an idea of why diffs look the way they do. Most useful, and the new default, is the unidiff-like output. There's also diff-match-patch's much more compact tab-delimited "delta" format, which is interesting, but which I don't think is all that safe. (It's a feature that is undocumented outside of the code itself...)
The final output format is "compare", which outputs some pretty HTML visually showing the differences between the tokenized diff approach and diff-match-patch's standard character-based diff, plus some basic benchmarking of the two algorithms.
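For readers who want a feel for the token-level idea without the toolchain, here is a minimal sketch. It is not the tokdiff.py code: a small regular expression stands in for a Pygments lexer, and the standard library's difflib stands in for diff-match-patch. The names TOKEN_RE, tokenize and token_diff are made up for illustration.

```python
import difflib
import re

# Hypothetical stand-in for a Pygments lexer: identifiers, numbers,
# whitespace runs and single other characters each become one token.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\s+|.", re.DOTALL)


def tokenize(text):
    """Split text into tokens; joining the tokens gives the text back."""
    return TOKEN_RE.findall(text)


def token_diff(old, new):
    """Return (tag, old_text, new_text) triples for the changed regions."""
    a, b = tokenize(old), tokenize(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, "".join(a[i1:i2]), "".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


old = "total = count + 1\n"
new = "total = counter + 1\n"
print(token_diff(old, new))
```

Because the diff runs over whole tokens, the rename is reported as one replaced identifier rather than as three inserted characters, which is the kind of result a token-aware diff is after.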
Both tools and both dependencies can be grabbed from the darcs repository:
darcs get http://repos.worldmaker.net/tokdiff/main tokdiff
I'll consider putting together a deeper code site for it in the near future.
Some brief observations and thoughts for future directions:
I shouldn't be too surprised by it, but the tokenized diff does generally seem to be an order of magnitude faster than diff-match-patch's more generalized character-based diff algorithm.
I think there are still some interesting heuristics that can be further applied to make the tokenized diff even smarter. I'm not sure how exactly to start on that (I've been lucky to get as far as I have on the backs of existing diff algorithms).
I'd like to experiment with darcs-like patch selection UI using the tokenized diffs; particularly using the existing tokenization for syntax highlighting.
I'd like to know if anyone finds interesting real applications of this quick hack.
I have been doing some python brush-up over the last couple of days, and came across the following bit of code in an irc chat about approaches for fibonacci in python.
def fib(n):
    a, b = 0, 1
    for _ in xrange(n):
        a, b = b, a + b
    return a
print fib(1000)
I think this is pretty typical, reasonable, and efficient python code.
I wondered, how would I express this in haskell? How different this computation looks from the canonical
fibs = 1:1:zipWith (+) fibs (tail fibs)
!
The answer, as is commonly the case for code written in imperative languages, is that this algorithm lives in the state monad. It can be ported to haskell, practically transliterated, as:
import Control.Monad.State
{-
-- haskell translation of python algo
def fib(n):
    a, b = 0, 1
    for _ in xrange(n):
        a, b = b, a + b
    return a
print fib(1000)
-}
fib :: Integer -> Integer
fib n = flip evalState (0,1) $ do
  -- forM_ discards the list of unit results that forM would build
  forM_ (xrange n) $ \_ -> do
    (a,b) <- get
    put (b,a+b)
  (a,b) <- get
  return a
xrange n = [0..(n-1)]
-- test that it works
traditionalFibs' = 1: 1: (zipWith (+) traditionalFibs' (tail traditionalFibs'))
traditionalFib n = traditionalFibs' !! (n-1)
t = fib 1000 == traditionalFib 1000
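Going the other way can also help the intuition: the state that the Haskell version threads through get and put can be made explicit in Python too. The following is only a sketch, assuming Python 3 (so range rather than xrange); the helper name step is made up for illustration.

```python
from functools import reduce


def step(state, _):
    """One loop iteration as a pure function from old state to new state."""
    a, b = state
    return (b, a + b)


def fib(n):
    # Start from the state (0, 1), apply step n times, keep the first part.
    a, _ = reduce(step, range(n), (0, 1))
    return a


print(fib(10))
```

Here reduce plays roughly the role of running the stateful loop from an initial state, and taking the first component of the final pair corresponds to the last get and return a.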
Patch-Tag is getting closer to a place where I’d feel comfortable charging for it, but there are a couple of things I need to take care of first.
First, I want to be more confident that I can scale this service out if I start getting a lot more users. It needs to be rock solid. No more downtime, no more crashes. This will definitely involve code tweaks for places that use unjustifiably large amounts of memory, and may (or may not; I haven’t decided yet) mean infrastructure changes.
Second, Patch Tag already does backups but I am going to revisit this process one more time to convince myself that this is as rock solid as it can get.
Happy belated thanksgiving everyone. There will be some more interesting news soon, I am feeling sure!
The Darcs Hacking sprint was co-located in Vienna, Austria and Portland, Oregon, USA.
The Vienna team had a unique experience as the sprint happened to coincide with a student protest that resulted in an occupied University building. Great fun was had by all as we shared our hacking space with the Occupiers on Saturday.
The Portland team was graciously hosted by Galois. As a special treat, we got a visit from Thomas Hartman, one of the folks behind Patch-tag.
Overview
Among other things, the upcoming Darcs 2.4 release has some nice performance improvements from Petr Ročkai's hashed-storage work.
Our first goal in this sprint was to polish up this work, making our hashed repository support good enough for the GHC team to upgrade from the deprecated old-fashioned format. We think we've largely accomplished what we set out to do. Read on for more details!
Our second goal was to provide a space for new Darcs hackers to get started with making contributions to the project. We had 14 new Darcs developers at this sprint! We hope that they will stay on with the project and continue submitting patches.
Issues resolved
Note that some of this work is still pending patch review or amendments.
issue540 - darcs remove --recursive - Roman Plášil
issue835 - show files with arguments - Luca Molteni
issue1224 - darcs convert on darcs-2 repositories - Tomáš Caithaml
issue1377 - hardcoding of darcs executable name - Benedikt Huber
issue1392 - use Parsec for authorspellings - Tomáš Caithaml
issue1394 - show time elapsed after each test - Kim Wallmark
issue1499 - versioned show files - Thomas Hartman
issue1500 - misleading darcs progress reports - Roman Plášil
issue1624 - break up global cache into subdirectories - Luca Molteni
issue1643 - optimize --upgrade should do optimize - Christian Berrer
Hacking
Studying darcs
Thomas DuBuisson and Jason Dagit studied some hanging merge cases on the GHC repository to see if we could garner some insights into them. While this did not yield much fruit, it did result in a few code cleanups.
Jonathan Daugherty and Josh Hoyt took a tour of the Darcs source. We hope they'll be joining us as Darcs hackers in the future :-)
As a warm-up exercise, everybody worked on tidying the Darcs and hashed-storage source, eliminating annoying GHC warnings and implementing suggestions from hlint.
Tomáš Caithaml simplified the darcs show authors authorspellings code by rewriting it in Parsec.
Jason Dagit cleaned up the elegant_merge function and removed commutex in favor of commute.
Jonathan Daugherty submitted some improvements to the user manual.
Performance improvements
Luca Molteni improved Darcs's handling of the global cache, splitting it into buckets for better performance.
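As a rough illustration of what splitting a cache into buckets can look like, here is a small sketch. The details are assumptions, not taken from the Darcs source: it uses SHA-256, a two-character hex prefix as the bucket name, and a made-up helper called bucketed_path.

```python
import hashlib
from pathlib import PurePosixPath


def bucketed_path(cache_root, content, prefix_len=2):
    """Place a cached object under a subdirectory named after its hash prefix.

    Assumption for illustration: the bucket is the first prefix_len hex
    characters of the content hash.
    """
    digest = hashlib.sha256(content).hexdigest()
    return PurePosixPath(cache_root) / digest[:prefix_len] / digest


print(bucketed_path("cache/patches", b"example patch contents"))
```

With a two-character hex prefix, the entries are spread over at most 256 subdirectories instead of sitting in one very large directory, which many filesystems handle poorly.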
User interface improvements
Tomáš Caithaml got darcs convert to abort if the user accidentally tries to convert a darcs 2 repository a second time.
Roman Plášil improved the user experience behind Darcs progress reporting, replacing misleadingly definitive text with a more accurate description, as shown in the following example:
                             OLD TEXT    NEW TEXT
Fetching pristine cache...   4/45        4 done, 41 queued
(sometime later)...          10/166      10 done, 155 queued
(finally)...                 297/297     297 done
Christian Berrer extended the darcs optimize --upgrade command to also act as plain old darcs optimize in addition to upgrading the repository format.
David Markvica modified darcs get --complete so that it no longer offers to create a lazy repository.
Benedikt Huber relaxed the assumption that the darcs executable would be called "darcs", instead fetching this from the command line.
New features
Matthias Fischmann and Radoslav Dorcik worked on a new --bisect option for darcs trackdown, which makes trackdown more useful for larger sets of patches.
Roman Plášil implemented an oft-requested remove -r feature, making it easier to undo accidental adds of large directories.
Luca Molteni implemented darcs show files with an argument to limit the output to a specific file or directory.
Thomas Hartman implemented matchers support for darcs show files, making it possible to get the list of files tracked by Darcs at a given point in the history (eg. darcs show files --tag 2.4).
Petr Ročkai worked on a new, more extensible, self-documenting format mechanism as a future alternative to _darcs/format.
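To show why bisection helps with larger sets of patches, here is a generic sketch of the idea. It is not the darcs trackdown implementation: patches are plain list entries, passes is a hypothetical test callback, and the sketch assumes the test passes with no patches applied, fails with all of them applied, and keeps failing once the culprit is in.

```python
def first_bad(patches, passes):
    """Return the index of the patch that first makes the test fail.

    Invariant: the test passes with patches[:lo] and fails with patches[:hi].
    """
    lo, hi = 0, len(patches)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if passes(patches[:mid]):
            lo = mid
        else:
            hi = mid
    return hi - 1


patches = ["p0", "p1", "p2", "p3", "p4", "p5", "p6", "p7"]
calls = []


def passes(applied):
    # Hypothetical test: anything containing p5 is broken.
    calls.append(len(applied))
    return "p5" not in applied


print(first_bad(patches, passes), "found after", len(calls), "test runs")
```

A linear search may need one test run per patch, while bisection needs about log2(n) runs, which matters when each run means building and testing the project.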
UTF-8 Support
Reinier Lamers continued his work on storing darcs patch metadata in UTF-8. He worked on autodetecting UTF-8 content (in the patch metadata) and found some subtle bugs in the utf8-string library along the way.
SVN integration
The Vienna local team, Christian Berrer, Thomas Danecker and David Markvica, worked on SVN integration and studied the libsvn directory in great detail.
Zero Windows Bugs!
Salvatore Insalaco slayed the 18 Windows bugs that resulted from our recent hashed-storage work. This means that we can definitely release the new hashed-storage work in Darcs 2.4.
Darcs Team infrastructure
Joachim Breitner made some improvements to the darcswatch core and user interface. He also added hooks to Darcswatch for integration with the Darcs Team patch tracker. Joachim Breitner, Eric Kow and Petr Ročkai also added some polish to the patch tracker, in particular to its email gateway interface and interaction with the darcs-users mailing list. The result of these efforts is a more efficient Darcs Team with a simpler and smoother patch review process.
Kim Wallmark worked on making 'cabal test' friendlier by outputting the amount of time elapsed after long tests.
Discussions
Priorities for the Sprint
Salvatore Insalaco, Eric Kow, Reinier Lamers, Luca Molteni and Petr Ročkai discussed the key goals for the sprint and also produced a list of the top ProbablyEasy bugs.
Type witnesses
Jason Dagit gave a presentation about our type-witnesses, how the type hackery in darcs is implemented, and why it is useful.
Patch-tag
Thomas Hartman and the Portland crew discussed patch-tag: how it felt from a user perspective and what the most useful features would be.
SVN integration
Thomas Danecker, Eric Kow, Petr Ročkai, Ganesh Sittampalam (IRC) discussed the SVN integration project: what the roadmap should look like and how this feature might fit into Darcs, whether as a fully integrated feature, a plugin or a standalone application.
Petr Ročkai and Eric Kow also discussed the nature of patch/changeset based revision control (like Darcs) and snapshot based revision control (everything else). Petr observed that changeset based version control can be seen as just snapshot based control with a fixed diffing algorithm. To support things like SVN integration, Darcs will need to freeze its diffing algorithm.
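Petr's observation can be made concrete with a small sketch: given a sequence of snapshots and one fixed diffing function, the changesets fall out mechanically, and replaying them gives the snapshots back. This is only an illustration using Python's difflib as the fixed algorithm; the function names are made up, and real Darcs patches are much richer than these line edits.

```python
import difflib


def make_patch(old, new):
    """Fixed diffing algorithm: edits as (start, end, replacement_lines)."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (i1, i2, new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def apply_patch(old, patch):
    """Rebuild the new snapshot by replacing old[start:end] for each edit."""
    out, pos = [], 0
    for start, end, replacement in patch:
        out.extend(old[pos:start])
        out.extend(replacement)
        pos = end
    out.extend(old[pos:])
    return out


def snapshots_to_changesets(snapshots):
    """Derive one changeset per consecutive pair of snapshots."""
    return [make_patch(a, b) for a, b in zip(snapshots, snapshots[1:])]


snapshots = [
    ["module Main where", "main = return ()"],
    ["module Main where", "import Data.List", "main = return ()"],
    ["module Main where", "import Data.List", "main = print (sort [3,1,2])"],
]
changesets = snapshots_to_changesets(snapshots)

state = snapshots[0]
for patch in changesets:
    state = apply_patch(state, patch)
print(state == snapshots[-1])
```

The round trip only works because both directions agree on the same diffing function, which is the sense in which the algorithm has to be frozen.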
Darcs roadmap
Salvatore Insalaco, Eric Kow, Reinier Lamers and all Vienna hackers developed the roadmap for Darcs 2.4 (January 2010) and Darcs 2.5 (July 2010). Using a chalkboard we listed the features we wanted, how desirable they were and what order we should do them in.
Windows support
Salvatore Insalaco, Eric Kow and Reinier Lamers discussed the future of Windows support in Darcs.
Official binary: Our first priority will be to get an official binary for Windows downloadable from Darcs.net. The upcoming Darcs release (Darcs 2.4 in January 2010) will be the first to provide an official binary. Currently, we have the resources to provide a binary for Windows only, but in the future the list of platforms may grow.
Refined integration: After the Darcs 2.4 release, we will work on developing a friendly Windows installer for Darcs which includes not just the Darcs binary but (optionally), TortoiseSSH for better ssh support and a tool for sending email via darcs send.
Cygwin support: We will not be able to support Cygwin, but we will work on providing more explicit documentation on what works (mostly everything) and what does not (absolute cygwin paths). We think that the improved Windows installer will provide a sufficiently comfortable Darcs experience for most Cygwin users.
Fun and speculation
Joachim Breitner, Reinier Lamers and Petr Ročkai discussed abstracting over patch types to just maps (eg. directories), sets and lists (eg. files).
Better Darcs Hacking Sprints
Every sprint teaches something new for what we hope to be a long tradition of biannual Darcs Hacking events.
One thing which worked out well this year was that we provided a clean list of the best ProbablyEasy bugs for new hackers to work on. This allowed new developers to make highly desirable contributions from the very beginning. This year the list was built with live discussion, a few Darcs Team members huddled in a hostel room over an open bug tracker. Perhaps we can replicate the success by having similar pre-sprint meetings in the future.
Replicating success is one thing; how can we do even better? Mainly we need to improve the experience that new hackers have. We were lucky this year to have many new Haskellers and Darcs developers, but we could have done a better job in helping them to get started.
First, some technical issues: A lot of our new developers lost the better part of Saturday morning setting up the machines to build Darcs and to send patches to the list. In the next sprint, we will be ready with much more precise and detailed setup instructions (developed hastily during this sprint) going all the way to the best default settings for Darcs, configuration files for msmtp (or similar sendmail replacements). It also has been suggested that we provide a virtual machine image for instant Darcs hacking.
Second, our mentoring strategy: We were fairly successful at providing in-depth individual mentoring; however, what we could have done better was to provide more mass mentoring at the very beginning, to help developers send their first patches. To make this work, we should try starting the next sprint with a group mentoring session with the specific purpose of getting people set up to send their first patches. We should encourage new developers to work in small teams, for example with an extra USB keyboard to facilitate pair programming on a single machine.
Attendees
Vienna
Joachim Breitner
Benedikt Huber
Eric Kow
Reinier Lamers
David Markvica
Petr Ročkai
Radoslav Dorcik
Salvatore Insalaco
Matthias Fischmann
Thomas Danecker
Christian Berrer
Luca Molteni
Roman Plášil
Tomáš Caithaml
Pivo
Portland
Jason Dagit
Josh Hoyt
Thomas Hartman
Thomas DuBuisson
Kim Wallmark
Jonathan Daugherty
Thanks!
Many thanks to the local team at TU Vienna and Galois for a wonderful welcome! Thanks to the many generous donors who helped us to subsidise travel to the sprint. Thanks also to Microsoft Austria for sponsoring the sprint with drinks and snacks. Molte grazie to Salvatore Insalaco for a wonderful lunch on Sunday.
Finally, thanks to Petr and Luca for the photos, and more generally to everybody who participated in the sprint! It was great to have you and we hope to see you again either on the mailing list or at future sprints.
See you in March!
The fourth Darcs Hacking Sprint will be taking place in Zürich on 19-21 March 2010 as part of the Haskell Hackathon. Hope to see you there!
The third darcs hacking sprint happened last weekend in Vienna and Portland! An extensive report will be posted here soon. As a teaser of all the work done, here is a draft implementation of trackdown --bisect written in Vienna by Matthias Fischmann:
I’m keeping this one brief… Matt Elder has decided that it’s time to move on from patch-tag, in order to focus on other projects and the new addition to his family.
Thanks for all the good work, Matt. Patch-tag wouldn’t have gotten this far without you.
I am in the process of migrating patch-tag to the latest version of happstack, and I thought I would post some diffs to aid others who have the same task.
This probably isn’t of much interest unless you are actually faced with a migration, but if you are, it could save you some time staring at compiler complaints.
To get the most out of the following blog post, first try writing your own partition function, which does the same thing as Data.List.partition.
*Partition> partition even [1..10]
([2,4,6,8,10],[1,3,5,7,9])
Then write a testing function, tpf, which checks your partition function against a variety of input, including large or infinite input. (Which means we don’t just use quickcheck).
Then read the blog post to see what obstacles I hit when I went through this process.
source here if you’re irked by wordpress aggressively cutting off the right side of the page.
{-# LANGUAGE NoMonomorphismRestriction #-}
module Partition where
import Debug.Trace.Helpers
import qualified Data.List as DL
import Debug.Trace
-- first attempt
partition f [] = ([],[])
partition f (x:xs) =
  let (as,bs) = partition f xs
  in if (f x) then (x:as,bs) else (as,x:bs)
t1 = tpf partition
-- testing function.
tpf partif =
  let a = last . snd . partif even $ [1..(10^6)]
      b = head . snd . partif even $ [1,2..]
      c = last . snd . partif even $ [1..(10^8)]
  in (a,b,c)
-- sanity checks / benchmarks. our own code should be at least as fast as Data.List versions.
-- a is immediate, b is immediate, c takes about 10 seconds.
tDL = tpf DL.partition
-- a test and b test are fine but that 10^8 list seems to take forever.
-- let's think about this. What algorithm are we using?
-- Actually, I don't know the answer to that.
-- but looking at that let/in, I don't have a clear sense of crunching through a list and building up a result, which is usually the ideal.
-- So let's try this with a fold.
-- Next question, what kind of fold?
-- Easy answer: just try both and see what happens!
-- Once you've written one, you have the other just by flipping the arguments in the helper function
-- (Note: for the foldl, we try strict foldl' by default, as there is usually no reason to use lazy foldl:
-- http://haskell.org/haskellwiki/Foldr_Foldl_Foldl'
partitionR f xs = foldr g ([],[]) xs
  where g next accum@(as,bs) = if f next then (next:as,bs) else (as,next:bs)
partitionL' f xs = DL.foldl' g ([],[]) $ reverse xs
  where g accum@(as,bs) next = if f next then (next:as,bs) else (as,next:bs)
-- Harder answer, which requires thought: we probably want a foldr here, because the b test uses an infinite list and a strict fold
-- won't return anything if you feed it an infinite list.
-- OK, let's try foldr
t2 = tpf partitionR -- result: immediate stack overflow on the a test, and if you try them separately, it overflows all 3 tests.
-- that's kind of sad.
-- for no particular reason, let's try a strict foldl version (warning! be careful running this!)
t3 = tpf partitionL'
-- at least the a test succeeds after a few seconds, confirming the intuition that strict foldl' crunches through a list
-- and returns output after it's done.
-- the b test exhausts computer memory, eventually the fan starts whirring louder than I can think, mouse stops responding and have to reboot
-- so like I said, careful running this, and hit control c!
--okay, I give up, let's peek at the standard libs (SL) Data.List
-- this isn't word for word what the ghc standard libs Data.List has, so as to be more similar to the functions we wrote so far,
-- but it's the same algo, and same performance.
-- the important difference is the lazy match
partitionSL p xs = foldr g ([],[]) xs
  where g next accum = if (p next) then (next:fst accum,snd accum) else (fst accum, next:snd accum)
-- which is the same as below, using irrefutable pattern syntax:
-- g next ~accum@(as,bs) = if (p next) then (next:as,bs) else (as, next:bs)
-- Sure enough, same performance as Data.List.partition
-- Note: to get the same performance as lazy list, you need to compile the module before loading in ghci, eg
-- ghc --make PartitionM.hs, then ghci PartitionM
-- if you do ghci PartitionM.hs, this doesn't terminate (at least, not in under a minute I'm an impatient guy.)
tSL = tpf partitionSL
-- So, why does partition need an irrefutable pattern in the accum argument?
-- Let's look at the original right folded partition again, rewritten as follows, for a 3 element list
partitionR2 f xs = foldr (select f) ([],[]) xs
select f next accum@(as,bs) = if f next then (next:as,bs) else (as,next:bs)
-- select' g next accum = if (p next) then (next:fst accum,snd accum) else (fst accum, next:snd accum)
tpr2 :: ([Int],[Int])
tpr2 = partitionR even [1..3]
tpr2a = 1 `g` ( 2 `g` ( 3 `g` ([],[]) ) )
g = select even
-- when the accum argument to select is a strict pattern match, the algorithm can't proceed past the first element
-- without evaluating everything inside the first set of parenthesis.
-- For a million element list, this is going to be a problem.
-- if the accum arg is lazy (just accum without the @ binding), the algorithm can proceed to the if test,
-- calculate the tuple with the list parts as just a thunk to be evaluated later, and keep chugging along,
-- by calculating the list parts next.
It is said that in haskell, If it Compiles, It Works.
This is true, but what do you do when it won’t compile?
I came across a real world scenario for this just now, when I was in the process of removing the HStringTemplateHelpers dependency from happs tutorial. The reason for this is that HStringTemplateHelpers uses the unix package (indirectly, via FileManip), which means that at the present time happs tut won’t run on windows, which lately has me smacking myself like dobby the masochistic house elf in the harry potter movies.*
The dependency wasn’t in one file, but scattered throughout the code. Some places, the dependency wasn’t that important, I could just comment the function out, but in other cases the function was core to the app and commenting it out caused cascading compile failures. Doing the type/function arithmetic in my head for figuring out what depended on what was giving me a headache and tempting me to veg out on youtube rather than face the problem, always a warning sign for me that I’m doing something wrong.
I needed to make my program compile, just so I could think about it with my head screwed on straight.
So, instead of commenting out the HStringTemplateHelpers-dependent functions, I set them equal to undefined and commented out their type signatures (since the type sig might be using a type that was defined in the dependency).
A few comment-outs later and cabal install compiles! Of course, if I actually try running the resulting binary, I will hit an undefined error right away, but the point is that I can think again, and start rewriting my offending functions in a methodical way until there are no more undefineds, which is easily checked with
grep -irn undefined src
For what it’s worth, I could have also used
paintProfile rglobs user cp userimagepath = error "paintProfile uses HStringTemplateHelpers... bad dobby! bad dobby! Dobby will have to shut his ears in the oven door for this."
But whatever
* I suppose the righter thing to do would be to fix FileManip, but I am choosing the easy way out just to get it done. Eventually I would like to move away from HStringTemplate with happs and use mostly Text.XHTML and/or HSP, which is type safe, with HStringTemplate just thrown in for convenience and newb friendliness.
Patch-Tag now writes the _darcs/prefs/email file for repositories with the owner’s email.
That means if you check out a public repo (without being a member with write access to shared) you can contribute patches back to the repo creator simply with the command
darcs send
This assumes you have sendmail configured for sending email from the command line. But if you don’t, no problem, you can output to a file and send as attachment. Every repo has instructions that will hopefully make this easy for newbies, for example one of mine.
Thanks to the users who suggested implementing this feature; now that it’s there, it seems to have been obvious all along.
I am once more without day job, and so patch-tag is getting more attention than it is used to. Could patch-tag be a day job, after all? Let’s just say I am toying with this idea but trying to stay grounded too. I don’t have as many users as I wanted when I started this project — not by a long shot. However, a good proportion of the users I do have seem to be “real” users — actually using repos, browsing around, not just signing up and moseying away when the novelty wears off. This is good. On the other hand, I worry that the logs I am basing this on might not be fair — perhaps bots or other phenomena are responsible for the apparent signs of activity. So, this — understanding true usage — is also something I am working on.
This week was mostly incremental improvements, but it was all stuff that has to happen before shooting for bigger goals (gitit, git format repos, and todo lists).
Edit profile page improvements: you can change your primary email, delete profile, etc
The repo command page has a “push patches over email: ” section.
Added some admin functionality, for better understanding my user base, and in preparation to pull the trigger on paid repos at some time in the not too distant future
Refactored and cleaned up a lot of code. Hey, is that a feature? Maybe not for you… but it’s a feature for me, the guy that has to deal with self-created mess.
I am also working on a talk to evangelize happstack to the southern california FP and web dev community sometime in the next few weeks.
1. Work on hashed-storage, implemented by Petr and reviewed by Ganesh, was finally merged into the main darcs repository. Eric provided explanations for people who want a faster darcs now:
The following is a copy of my recent post to the darcs-users mailing list.
Hi everybody,
So you may have noticed me saying this in a couple of recent threads. Petr Ročkai's hashed-storage work from his 2009 Google Summer of Code project has been merged!
I thought I would take a few moments to give everybody an overview of how this work benefits us, and where we'll be going in the future.
In a nutshell
What does this mean for you? Faster repository-local operations.
Hashed format repositories (with darcs-1 and darcs-2 patches alike) should now be faster to use on a daily basis. We saw the very beginnings of this work in Darcs 2.3.0 with a faster darcs whatsnew. Now these speed improvements cover all repository-local operations.
The next Darcs beta is a couple of months away, but before that, I would like to encourage you to try this out for yourself:
darcs get --lazy http://darcs.net
cd darcs.net
cabal install
For best results, please run darcs optimize --upgrade followed by darcs optimize --pristine. Pay attention over the next couple of weeks when you try a record, amend, revert, unrecord. If we've done our work right, there should be nothing to see. Darcs should be less noticeable, with fewer "Synchronizing pristine" messages and a faster return to the command prompt. We think you'll like it. But please get back to us. Is Darcs faster for you?
If you're particularly interested, I will step through these changes in greater detail at the end of this message. Meanwhile, I would like to step back a little and take stock of how these improvements fit in to the bigger picture.
The road ahead
The hashed storage work is a big step forward and definitely a cause for celebration. I think it is useful to reflect on this progress and consider how it fits in with our progress since darcs 1.0.9:
ssh connection sharing (darcs transfer mode)
HTTP pipelining
lazy repositories
the global cache
and now
index-based diffing
hashed-storage efficiency
We cannot promise that Darcs will magically become fast overnight. But what we can and will do is continue chipping away at it, solving problems one at a time; release by release, a little bit better, a little bit faster every time until one day we can look back and marvel at all the progress we've made.
So Petr's work makes Darcs easier to live with on a day-to-day basis. But that's not enough. Now we need to turn our attention to that crucial first impression; what happens when people try Darcs out for the first time is that they darcs get a repository they want and... then... they... wait...
This is embarrassing, but we can fix it. In fact, we already have started working on the problem. The next version of hashed-storage will likely introduce a notion of "packs" in which the many often very small files that Darcs keeps track of will be concatenated into more substantial "packs" that compress better and reduce the ill effects of latency. My hope is that we will be able to complete the packs work by Darcs 2.5.
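The claim that many very small files compress better once concatenated is easy to check with a toy experiment. The sketch below is not the hashed-storage pack format, just zlib applied to made-up patch-like blobs, first one at a time and then as a single concatenated blob.

```python
import zlib


def individually_compressed_size(blobs):
    """Total size when every small file is compressed on its own."""
    return sum(len(zlib.compress(blob)) for blob in blobs)


def packed_size(blobs):
    """Size when the files are concatenated first and compressed once."""
    return len(zlib.compress(b"".join(blobs)))


# Made-up stand-ins for many small, similar files.
blobs = [
    f"hunk ./src/Module{i}.hs 1\n+import Data.List\n".encode()
    for i in range(200)
]

print("separate:", individually_compressed_size(blobs))
print("packed:  ", packed_size(blobs))
```

Each separately compressed file pays its own header and checksum and cannot share repeated text with its neighbours, so the packed total comes out smaller. A real pack would also need an index of offsets to get individual files back out, and fetching one pack instead of hundreds of files saves a network round trip per file.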
There's a lot more progress to be made: smarter patch representations, tuning for large patches, file-to-patch caching for long histories. And that's just performance! For more details about our performance work, please have a look at
If you could do anything to help, benchmark, profile, anything at all, please let us know :-)
The fight continues.
Thank-you!
Petr and Ganesh deserve a huge round of applause. Petr, thanks for thinking up this work, getting it done and pushing it through. Ganesh, thanks for an extremely thorough and thoughtful review. The two of you, thanks for holding on, for tenacious cooperation in the face of adversity.
Thanks also to all the wider Darcs community for all your support, comments, patch reviews.
I'm looking forward to seeing you at the upcoming Darcs hacking sprint. The sprint will take place in Vienna, Austria on the weekend of 14-15 November. Everybody, especially Darcs and Haskell newbies, is welcome to join in. Details on http://wiki.darcs.net/Sprints/2009-11
And if I may take a paragraph to mention this, Darcs needs your support. Every little counts, if you can send patches, review patches, tweak documentation, profile, benchmark, submit bug reports. Barring that, you could also make a contribution to our travel fund via the Software Freedom Conservancy. See http://darcs.net/donations.html for details.
Thanks everybody and enjoy!
Eric
Changes in detail
Darcs uses an "index" file to compute working directory and pristine cache diffs. This avoids timestamps going out of synch when you have multiple local branches, which saves a huge and needless slowdown.
Hashed storage is more efficient in general. Even if you already have perfect timestamps, the new optimisations should make Darcs faster in general.
The new 'darcs optimize --pristine' reduces spurious mismatches on directories.
Darcs no longer requires a one second sleep after applying patches.
A wise haskell hacker said you don’t need to understand monads to use em, and this I find to be mostly true.
You don’t need to understand category theory either.
Lately though, I’ve been trying to deepen my intuition a bit anyway.
This is something I wrote lately that I keep revisiting when thinking about monadic machinery: specifically the bind, left bind, join and return operators, as well as the pure and fmap operators used with Applicative and Functor.
How do these funny named operators fit together and what are they good for?
Concrete examples help, and for now my concrete example is the list monad. Pure and return do the same thing, since the list monad is also an applicative functor: just (:[]), put in a list. Fmap is, of course, map. I find myself using left bind more often than bind in my code, and in the list monad left bind is concatMap. Oh, and join is concat. You don’t hear about join much, but it turns out to be important when understanding monads categorically. Join takes m (m a) -> m a, ie two monad nestings deep to one. Mysterious, eh?
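If Python is more familiar, the same list-only picture can be sketched with ordinary functions. This is only an analogy for the list case, not a general monad library; the names fmap, join, pure and bind are defined here for illustration.

```python
from itertools import chain


def fmap(f, xs):
    """map: apply f to every element."""
    return [f(x) for x in xs]


def join(xss):
    """concat: two levels of list nesting down to one."""
    return list(chain.from_iterable(xss))


def pure(x):
    """Put a value in a one-element list."""
    return [x]


def bind(f, xs):
    """Left bind, concatMap for lists: map f over xs, then flatten."""
    return join(fmap(f, xs))


def triple_each(ys):
    return fmap(lambda y: y * 3, ys)


xs = [[2, 4, 6], [7, 8], [10, 20]]

# Binding with a list-returning function flattens one level.
flattened = bind(triple_each, xs)

# Wrapping each result with pure first keeps the list-of-lists shape.
nested = bind(lambda ys: pure(triple_each(ys)), xs)

print(flattened)
print(nested)
```

Writing bind as join after fmap is the relationship between the operators that is easiest to see in the list case: map the function over the structure, then pop one level of nesting.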
If any of this rang a bell and you have been scratching your head for a simple bit of code you can stare at and play around with in your head, you may enjoy this.
To start with, try out f1 xs, f2 xs, f3 xs, u1 xs, u2 xs in ghci, just to see what’s going on. Then… well, just read the code for suggestions about how to learn from it.
{-# LANGUAGE NoMonomorphismRestriction #-}
import Control.Applicative
import Control.Monad
import Test.QuickCheck
-- inspired by mixing monads, arrows, and applicative functors
-- http://yumagene.livejournal.com/2245.html
-- http://www.sfu.ca/~ylitus/courses/cmpt481731/slides/FPjul9.pdf
-- An interesting thing for learning you can do in your head:
-- replace (=<<), (<$>), and join with the list equivalents, after staring at this for a while.
-- The equivalents:
-- (=<<): concatMap
-- (<$>): map
-- join: concat
-- doing that helped me understand monad/functor machinery better.
-- a list of lists, to illustrate "burrowing in" two levels with (<$>) . (<$>) and other machinery
xs :: [[Integer]]
xs = [[2,4,6],[7..13],[10,20..50]]
t = sequence_ [tflattened, tunflattened]
infs = repeat [1,2,3] -- the f functions produce output fine with infinite lists as well.
-- you lose the "list of lists" structure, because (=<<)/join are flattening
f1 els = (=<<) ((<$>) (* 3)) $ els
f2 els = join . ((<$>) $ (<$>) (* 3)) $ els
f3 els = join . (((<$>) . (<$>)) (*3)) $ els -- same thing, note the burrowing in two levels thing.
tflattened = quickCheck p
  where p :: [[Integer]] -> Bool
        p xs = (f1 xs) == (f2 xs) && (f1 xs) == (f3 xs)
-- if you want to preserve the list of lists structure, compose your monadic function with (pure .) behind it.
u1 els = (=<<) (pure . (<$>) (* 3)) els
u2 els = join . ((pure . (<$>) (* 3)) <$>) $ els
tunflattened = quickCheck p
  where p :: [[Integer]] -> Bool
        p xs = u1 xs == u2 xs
--flatten=join
-- Questions/Lessons Learned:
-- Q0: are f1 and f2, and u1 and u2 the same function?
-- by same I mean
-- a) produce the same output for the same input everywhere (appears to be true for list monad)
-- b) computed the same way, one is not more efficient than the other
-- Answer to Q0: yes, pretty much.
-- Q1. what is the relationship between bind and join?
-- A. -- (=<<) f mx = join . (f <$>) $ mx -- bind could have been defined this way, though in practice it's not.
-- translation : -- functor map your monad function over the monad value (f <$>)
-- and pop one level of structure (join)
-- in a list, (=<<) f is concat . (map f)
-- Q. can this be proved using monad/functor/applicative laws or similar?
-- A. not proved exactly, but more like trivially true assuming the monad laws hold for your structure, which they should.
-- (Monad laws should be proven separately for each structure you define a monad instance for. They *are* proven for list, Maybe, and maybe others.)
-- Q. does ((<$>) . (<$>)) have some interesting roles in reasoning about monads or applicatives?
-- A. Burrow in two levels before applying the function.
-- (=<<) is concatMap in list context. join is concat in list context.
I have also started logging darcs actions to see which ones take the longest in hopes of providing the development cabal with intelligence for some hard to squash bugs.