Light

The bright meadows - this place should be safe for children.

English

Here you can find my English pages. When there are enough of them, they might get the same or a similar structure as the German ones.

You can view these pages like a blog by checking the

<< new English posts (weblog) >>

- they also feature an RSS feed.

You can also find more of my English writings by looking at the blog entries in LJ which I tagged as English.

Best wishes,
Arne

A tale of foxes and freedom

Singing the songs of creation to shape a free world.

One day the silver kit asked the grey one:

“Who made the light, which brightens our singing place?”

The grey one looked at him lovingly and asked the kit to sit with him, for he would tell a story of old, a story from the days when the tribe was young.

“Once there was a time, when the world was light and happiness. During the day the sun shone on the savannah, and at night the moon cast the grass in a silver sheen.

It was during that time, when there were fewer animals in the wild, that the GNUs learned to work songs of creation, deep and vocal, and they taught us and everyone their new findings, and the life of our skulk was happiness and love.

But while the GNUs spread their songs and made new songs for every idea they could imagine, others invaded the plains, and they stole away the songs and only allowed singing them their way. And they drowned out the light, and with it went the happiness and love.

And when everyone shivered in cold and darkness, and stillness and despair were drawn over the land, the others created a false light which cast small enclosures into a pale flicker, in which they let in only those animals who were willing to wear ropes on their throats and limbs, and many animals went to them to escape the darkness, while some fell deeper still and joined the others in enslaving their former friends.

Upon seeing this, the fiercest of the GNUs, the last one of the original herd, was filled with a terrible anger to see the songs of creation turned into a tool for slavery, and he made one special song which created a spark of true light in the darkness which could not be taken away, and which exposed the falsehood in the light of the others. And whenever he sang this song, those who were near him were touched by happiness.

But the others were many and the GNU was alone, and many animals succumbed to the ropes or the ropers and could move no more on their own.

To spread the song, the GNU now searched for other animals who would sing with it, and the song spread, and with it the freedom.

It was during these days, that the GNU met our founders, who lived in golden chains in a palace of glass.

In this palace they thought themselves lucky, and though the light of the palace grew ever paler and the chains grew heavier with every passing day, they didn't leave, because they feared the utter darkness out there.

When they then saw the GNU, they asked him: "Isn't your light weaker than this whole palace?" and the GNU answered: "Not if we sing it together", and they asked "But how will we eat in the darkness?" and the GNU answered "you'll eat in the light of your songs, and plants will grow wherever you sing", and they asked "But is it a song of foxes?" and the GNU said: "You can make it so", and he began to sing, and when our founders joined in, the light became shimmering silver like the moon they still remembered from the days and nights of light, and they rejoiced in its brightness.

And whenever this light touched the glass of the palace, the glass paled and showed its true being, and where the light touched the chains, they withered and our founders went into the darkness with the newfound light of the moon as companion, and they thanked the GNU and promised to help it whenever they were needed.

Then they set off to learn the many songs of the world and to spread the silver light of the moon wherever they came.

And so our founders learned to sing the light, which brightens every one of our songs, and as our skulk grew bigger, the light grew stronger and it became a little moon, which will grow with each new kit, until its light will fill the whole world again one day.”

The grey one looked around where many kits had quietly found a place, and then he laughed softly, before he got up to fetch himself a meal for the night, and the kits began to speak all at once about his story. And they spoke until the silver kit raised its voice and sang the song of moonlight1, and they joined in and the song filled their hearts with joy and the air with light, and they knew that wherever they would travel, this skulk was where their hearts felt at home.

PS: This story is based on facts far more closely than it looks. There are songs of creation, namely computer programs, which once were free and which were truly taken away and used for casting others into darkness. And there was and still is the fierce GNU with his song of light and freedom, and he did spread it to make it into GNU/Linux and found the free software community we know today. If you want to know more about the story as it happened in our world, just read the less flowery story of Richard Stallman, free hackers and the creation of GNU or listen to the free song Infinite Hands.

PPS: I originally wrote this story for Phex, a free Gnutella-based p2p filesharing program which also has an anonymous sibling (i2phex). It’s an even stronger fit for Firefox, though.

PPPS: License: This text is released to the public under the GNU FDL without invariant sections and under other free licenses by Arne Babenhauserheide (who holds the copyright on it).

P4S: Alternate link: http://draketo.de/english/tale-of-foxes-and-freedom


  1. To make it perfectly clear: This moonlight is definitely not the abhorrent and patent-stricken Silverlight port from the Mono project. The foxes sing a song of freedom. They wouldn’t accept the shackles of Microsoft after having found their freedom. Sadly the PR departments of some groups try to take over analogies and strong names. Don’t be fooled by them. The moonlight in our songs is the light coming from the moon which resonates in the voices of the kits. And that light is free as in freedom, from copyright restrictions as well as from patent restrictions – though there certainly are people who would love to patent the light of the moon. Those are the ones we need to fight to defend our freedom. 

Emacs

Cross platform, Free Software, almost all features you can think of, graphical and in the shell: Learn once, use for everything.

» Get Emacs «

Emacs is a self-documenting, extensible editor, a development environment and a platform for lisp-programs - for example programs to make programming easier, but also for todo-lists on steroids, reading email, posting to identi.ca, and a host of other stuff (learn lisp).

It is one of the origins of GNU and free software (Emacs History).

In Markdown-mode it looks like this:

Emacs with Markdown mode

More on Emacs on my German Emacs page.

Babcore: Emacs Customizations everyone should have

1 Intro

PDF-version (for printing)

Package (to install)

orgmode-version (for editing)

repository (for forking)

project page (for fun ☺)

Emacs Lisp (to use)

I have been tweaking my Emacs configuration for years now, and I have added quite a bit of cruft. But while searching for the right way to work, I also found some gems which I sorely miss in pristine Emacs.

This file is about those gems.

Babcore is strongly related to Prelude. Actually it is exactly like prelude, just with the stuff I consider essential.

But before we start, there is one crucial piece of advice which everyone who uses Emacs should know:

C-g: abort

Hold control and hit g.

That gets you out of almost any situation. If anything goes wrong, just hit C-g repeatedly till the problem is gone - or you cooled off far enough to realize that a no-op is the best way to react.

To repeat: If anything goes wrong, just hit C-g.


2 Package Header

As Emacs package, babcore needs a proper header.

;; Copyright (C) 2013 Arne Babenhauserheide

;; Author: Arne Babenhauserheide (and various others in Emacswiki and elsewhere).
;; Maintainer: Arne Babenhauserheide
;; Created: 03 April 2013
;; Version: 0.0.2
;; Keywords: core configuration

;; This program is free software; you can redistribute it and/or
;; modify it under the terms of the GNU General Public License
;; as published by the Free Software Foundation; either version 3
;; of the License, or (at your option) any later version.

;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with this program. If not, see <http://www.gnu.org/licenses/>.

;;; Commentary:
;; Quick Start / installation:
;; 1. Download this file and put it next to other files Emacs includes
;; 2. Add this to your .emacs file and restart Emacs:
;;      (require 'babcore)
;;
;; Use Case: Use a common core configuration so you can avoid the
;;   tedious act of gathering all the basic stuff over the years and
;;   can instead concentrate on the really cool new stuff Emacs offers
;;   you.
;;
;; Todo:
;;

;;; Change Log:
;; 2013-04-03 - Minor adjustments
;; 2013-02-29 - Initial release

;;; Code:

Additionally it needs the proper last line. See finish up for details.

3 Feature Gems

3.1 package.el, full setup

The first thing you need in emacs 24. This gives you a convenient way to install just about anything, so you really should use it.

Also I hope that it will help consolidate the various emacs tips which float around into polished packages by virtue of giving people ways to actually get the package by name - and keep it updated almost automatically.

;; Convenient package handling in emacs

(require 'package)
;; use packages from marmalade
(add-to-list 'package-archives '("marmalade" . "http://marmalade-repo.org/packages/"))
;; and the old elpa repo
(add-to-list 'package-archives '("elpa-old" . "http://tromey.com/elpa/"))
;; and automatically parsed versiontracking repositories.
(add-to-list 'package-archives '("melpa" . "http://melpa.milkbox.net/packages/"))

;; Make sure a package is installed
(defun package-require (package)
  "Install a PACKAGE unless it is already installed 
or a feature with the same name is already active.

Usage: (package-require 'package)"
  ; try to activate the package with at least version 0.
  (package-activate package '(0))
  ; try to just require the package. Maybe the user has it in his local config
  (condition-case nil
      (require package)
    ; if we cannot require it, it does not exist, yet. So install it.
    (error (package-install package))))

;; Initialize installed packages
(package-initialize)
;; in Emacs 24 package-initialize is also run automatically after reading the init file,
;; but we still have to load the list of available packages explicitly
(package-refresh-contents)

3.2 Flymake

Flymake is an example of a quite complex feature which really everyone should have.

It can check any kind of code, and actually anything which can be verified with a program which gives line numbers.

As an alternative you might want to look into flycheck. It looks really cool, but I don’t have any experience with it yet, so I cannot recommend it yet.

;; Flymake: On the fly syntax checking

; stronger error display
(defface flymake-message-face
  '((((class color) (background light)) (:foreground "#b2dfff"))
    (((class color) (background dark))  (:foreground "#b2dfff")))
  "Flymake message face")

; show the flymake errors in the minibuffer
(package-require 'flymake-cursor)  

3.3 auto-complete

This gives you inline auto-completion preview with an overlay window - even in the text-console. Partially this goes as far as API-hints (for example for elisp code). Absolutely essential.

;; Inline auto completion and suggestions
(package-require 'auto-complete)

3.4 ido

To select a file in a huge directory, just type a few letters from that file in the correct order, leaving out the non-identifying ones. Darn cool!

; use ido mode for file and buffer Completion when switching buffers
(require 'ido)
(ido-mode t)
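The behaviour described above - typing only a few of the letters, in order, and skipping the rest - is ido’s flexible matching. If it should not be active by default in your Emacs, this switches it on explicitly (a small addition, not part of the original setup):

; match the typed letters anywhere in the name, not only as a prefix
(setq ido-enable-flex-matching t)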

3.5 printing

Printing in pristine emacs is woefully inadequate, even though it is a standard function in almost all other current programs.

It can be easy, though:

;; Convenient printing
(require 'printing)
(pr-update-menus t)
; make sure we use localhost as cups server
(setenv "CUPS_SERVER" "localhost")
(package-require 'cups)

3.6 outlining everywhere

Code folding is pretty cool to get an overview of a complex structure. So why shouldn’t you be able to do that with any kind of structured data?

; use allout minor mode to have outlining everywhere.
(allout-mode)
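Note that (allout-mode) only toggles the mode in whichever buffer happens to be current while the configuration loads. If you want outlining in all text and code buffers, enabling it per mode via hooks is one option - a sketch, not part of the original babcore code:

; enable allout outlining in text and programming buffers
(add-hook 'text-mode-hook (lambda () (allout-mode 1)))
(add-hook 'prog-mode-hook (lambda () (allout-mode 1)))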

3.7 Syntax highlighting

Font-lock is the emacs name for syntax highlighting - in just about anything.

; syntax highlighting everywhere
(global-font-lock-mode 1)

3.8 org and babel

Org-mode is that kind of simple thing which evolves to a way of life when you realize that most of your needs actually are simple - and that the complex things can be done in simple ways, too.

It provides simple todo-lists, inline-code evaluation (as in this file) and a full-blown literate programming, reproducible research publishing platform. All from the same simple basic structure.

It might change your life… and it is the only planning solution which ever prevailed against my way of life and organization.

; Activate org-mode
(require 'org)
; and some more org stuff

; http://orgmode.org/guide/Activation.html#Activation

; The following lines are always needed.  Choose your own keys.
(add-to-list 'auto-mode-alist '("\\.org\\'" . org-mode))
; And add babel inline code execution
; babel, for executing code in org-mode.
(org-babel-do-load-languages
 'org-babel-load-languages
 ; load all language marked with (lang . t).
 '((C . t)
   (R . t)
   (asymptote)
   (awk)
   (calc)
   (clojure)
   (comint)
   (css)
   (ditaa . t)
   (dot . t)
   (emacs-lisp . t)
   (fortran)
   (gnuplot . t)
   (haskell)
   (io)
   (java)
   (js)
   (latex)
   (ledger)
   (lilypond)
   (lisp)
   (matlab)
   (maxima)
   (mscgen)
   (ocaml)
   (octave)
   (org . t)
   (perl)
   (picolisp)
   (plantuml)
   (python . t)
   (ref)
   (ruby)
   (sass)
   (scala)
   (scheme)
   (screen)
   (sh . t)
   (shen)
   (sql)
   (sqlite)))

3.9 Nice line wrapping

If you’re used to other editors, you’ll want to see lines wrapped nicely at word boundaries instead of lines which either get cut off at the end or in the middle of a word.

global-visual-line-mode gives you that.

; Add proper word wrapping
(global-visual-line-mode t)

3.10 goto-chg

This is the kind of feature which looks tiny: Go to the place where you last changed something.

And then you get used to it and it becomes absolutely indispensable.

; go to the last change
(package-require 'goto-chg)
(global-set-key [(control .)] 'goto-last-change)
; M-. can conflict with etags tag search. But C-. can get overwritten
; by flyspell-auto-correct-word. And goto-last-change needs a really
; fast key.
(global-set-key [(meta .)] 'goto-last-change)
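Once this is set up, hitting C-. (or M-.) jumps to the place of the last change, and hitting it repeatedly walks back through successively older changes.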

3.11 flyspell

Whenever you write prose, a spellchecker is worth a lot, but it should not unnerve you.

Install aspell, then activate flyspell-mode whenever you need it.

It needs some fiddling, though, to make it work nicely with non-English text.

; Make german umlauts work.
(setq locale-coding-system 'utf-8)
(set-terminal-coding-system 'utf-8)
(set-keyboard-coding-system 'utf-8)
(set-selection-coding-system 'utf-8)
(prefer-coding-system 'utf-8)

;aspell und flyspell
(setq-default ispell-program-name "aspell")

;make aspell faster at the cost of some accuracy
(setq ispell-extra-args '("--sug-mode=ultra" "-w" "äöüÄÖÜßñ"))
(setq ispell-list-command "list")
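If you prefer flyspell to start automatically instead of activating it by hand, hooks are one way to do that - again only a sketch:

; check spelling in prose buffers
(add-hook 'text-mode-hook 'flyspell-mode)
; and only in comments and strings when programming
(add-hook 'prog-mode-hook 'flyspell-prog-mode)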

3.12 control-lock

If you have to do the same action repeatedly, for example with flyspell hitting next-error and next-correction hundreds of times, the need to press control can really be a strain for your fingers.

Sure, you can use viper-mode and retrain your hands for the completely alien command set of vim.

A simpler solution is adding a sticky control key - and that’s what control-lock does: You get modal editing with your standard emacs commands.

Since I am German, I simply use the German umlauts to toggle the control-lock. You will likely want to choose your own keys here.

; control-lock-mode, so we can enter a vi style command-mode with standard emacs keys.
(package-require 'control-lock)
; also bind M-ü and M-ä to toggling control lock.
(global-set-key (kbd "M-ü") 'control-lock-toggle)
(global-set-key (kbd "C-ü") 'control-lock-toggle)
(global-set-key (kbd "M-ä") 'control-lock-toggle)
(global-set-key (kbd "C-ä") 'control-lock-toggle)
(global-set-key (kbd "C-z") 'control-lock-toggle)

3.13 Basic key chords

This is the second strike for saving your pinky. Yes, Emacs is hard on the pinky. Even if it were completely designed to avoid strain on the pinky, it would still be hard, because any system in which you do not have to reach for the mouse is hard on the pinky.

But it also provides some of the neatest tricks to reduce that strain, so you can make Emacs your pinky saviour.

The key chord mode allows you to hit any two keys at (almost) the same time to invoke commands. Since this can interfere with normal typing, I would only use it for letters which are rarely typed after each other.

The default chords have proven themselves to be useful in years of working with Emacs.

; use key chords to invoke commands
(package-require 'key-chord)
(key-chord-mode 1)
; buffer actions
(key-chord-define-global "vg"     'eval-region)
(key-chord-define-global "vb"     'eval-buffer)
(key-chord-define-global "cy"     'yank-pop)
(key-chord-define-global "cg"     "\C-c\C-c")
; frame actions
(key-chord-define-global "xo"     'other-window);
(key-chord-define-global "x1"     'delete-other-windows)
(key-chord-define-global "x0"     'delete-window)
(defun kill-this-buffer-if-not-modified ()
  (interactive)
  ; taken from menu-bar.el
  (if (menu-bar-non-minibuffer-window-p)
      (kill-buffer-if-not-modified (current-buffer))
    (abort-recursive-edit)))
(key-chord-define-global "xk"     'kill-this-buffer-if-not-modified)
; file actions
(key-chord-define-global "bf"     'ido-switch-buffer)
(key-chord-define-global "cf"     'ido-find-file)
(key-chord-define-global "vc"     'vc-next-action)

To complement these tricks, you should also install and use workrave or at least type-break-mode.

3.14 X11 tricks

These are ways to improve the integration of Emacs in a graphical environment.

We have this cool editor. But it is from the 90s, and some of the more modern concepts of graphical programs have not yet been integrated into its core. Maybe because everyone just adds them to the custom setup :)

On the other hand, Emacs always provided split windows and many of the “new” window handling functions in dwm and similar - along with a level of integration with which normal graphical desktops still have to catch up. Open a file, edit it as text, quickly switch to org-mode to be able to edit an ascii table more efficiently, then switch to html mode to add some custom structure - and all that with a consistent set of key bindings.

But enough with the glorification, let’s get to the integration of stuff where Emacs arguably still has weaknesses.

3.14.1 frame-to-front

Get the current Emacs frame to the front. You can for example call this via emacsclient and set it as a keyboard shortcut in your desktop (for me it is F12):

emacsclient -e "(show-frame)"

This sounds much easier than it proves to be in the end… but luckily you only have to solve it once, then you can google it anywhere…

(defun show-frame (&optional frame)
  "Show the current Emacs frame or the FRAME given as argument.

And make sure that it really shows up!"
  (let ((frame (or frame (selected-frame))))
    (raise-frame frame)
    ; yes, you have to call this twice. Don’t ask me why…
    ; select-frame-set-input-focus calls x-focus-frame and does a bit of
    ; additional magic.
    (select-frame-set-input-focus frame)
    (select-frame-set-input-focus frame)))

3.14.2 urgency hint

Make Emacs announce itself in the tray.

;; let emacs blink when something interesting happens.
;; in KDE this marks the active Emacs icon in the tray.
(defun x-urgency-hint (frame arg &optional source)
  "Set the x-urgency hint for the frame to arg: 

- If arg is nil, unset the urgency.
- If arg is any other value, set the urgency.

If you unset the urgency, you still have to visit the frame to make the urgency setting disappear (at least in KDE)."
  (let* ((wm-hints (append (x-window-property 
                            "WM_HINTS" frame "WM_HINTS" 
                            source nil t) nil))
         (flags (car wm-hints)))
    ; (message flags)
    (setcar wm-hints
            (if arg
                (logior flags #x00000100)
              (logand flags #x1ffffeff)))
    (x-change-window-property "WM_HINTS" wm-hints frame "WM_HINTS" 32 t)))

(defun x-urgent (&optional arg)
  "Mark the current emacs frame as requiring urgent attention. 

With a prefix argument which does not equal a boolean value of nil, remove the urgency flag (which might or might not change display, depending on the window manager)."
  (interactive "P")
  (let ((frame (car (car (cdr (current-frame-configuration))))))
    (x-urgency-hint frame (not arg))))

3.14.3 fullscreen mode

Hit F11 to enter fullscreen mode. Any self-respecting program should have that… and now Emacs does, too.

; fullscreen, taken from http://www.emacswiki.org/emacs/FullScreen#toc26
; should work for X and OSX with emacs 23.x (TODO find minimum version).
; for windows it uses (w32-send-sys-command #xf030) (#xf030 == 61488)
(defvar babcore-fullscreen-p t "Check if fullscreen is on or off")
(setq babcore-stored-frame-width nil)
(setq babcore-stored-frame-height nil)

(defun babcore-non-fullscreen ()
  (interactive)
  (if (fboundp 'w32-send-sys-command)
      ;; WM_SYSCOMMAND restore #xf120
      (w32-send-sys-command 61728)
    (progn (set-frame-parameter nil 'width 
                                (if babcore-stored-frame-width
                                    babcore-stored-frame-width 82))
           (set-frame-parameter nil 'height
                                (if babcore-stored-frame-height 
                                    babcore-stored-frame-height 42))
           (set-frame-parameter nil 'fullscreen nil))))

(defun babcore-fullscreen ()
  (interactive)
  (setq babcore-stored-frame-width (frame-width))
  (setq babcore-stored-frame-height (frame-height))
  (if (fboundp 'w32-send-sys-command)
      ;; WM_SYSCOMMAND maximize #xf030
      (w32-send-sys-command 61488)
    (set-frame-parameter nil 'fullscreen 'fullboth)))

(defun toggle-fullscreen ()
  (interactive)
  (setq babcore-fullscreen-p (not babcore-fullscreen-p))
  (if babcore-fullscreen-p
      (babcore-non-fullscreen)
    (babcore-fullscreen)))

(global-set-key [f11] 'toggle-fullscreen)

3.14.4 default key bindings

I always hate it when some usage pattern which is consistent almost everywhere fails with some program. Especially if that is easily avoidable.

This code fixes that for Emacs in KDE.

; Default KDE keybindings to make emacs nicer integrated into KDE. 

; can treat C-m as its own mapping.
; (define-key input-decode-map "\C-m" [?\C-1])

(defun revert-buffer-preserve-modes ()
  (interactive)
  (revert-buffer t nil t))

; C-m shows/hides the menu bar - thanks to http://stackoverflow.com/questions/2298811/how-to-turn-off-alternative-enter-with-ctrlm-in-linux
; f5 reloads
(defconst kde-default-keys-minor-mode-map
  (let ((map (make-sparse-keymap)))
    (set-keymap-parent map text-mode-map)
    (define-key map [f5] 'revert-buffer-preserve-modes)
    (define-key map [?\C-1] 'menu-bar-mode)
    (define-key map [?\C-+] 'text-scale-increase)
    (define-key map [?\C--] 'text-scale-decrease) ; shadows 'negative-argument which is also available via M-- and C-M--, though.
    (define-key map [C-kp-add] 'text-scale-increase)
    (define-key map [C-kp-subtract] 'text-scale-decrease)
    map)
  "Keymap for `kde-default-keys-minor-mode'.")

;; Minor mode for keypad control
(define-minor-mode kde-default-keys-minor-mode
  "Adds some default KDE keybindings"
  :global t
  :init-value t
  :lighter ""
  :keymap 'kde-default-keys-minor-mode-map
  )

3.15 Insert unicode characters

Actually you do not need any configuration here. Just use

M-x ucs-insert

to insert any unicode character. If you want to see them while selecting, have a look at xub-mode from Ergo Emacs.
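For example, to insert a white smiling face (☺) you can type

M-x ucs-insert RET 263a RET

- ucs-insert accepts either the hexadecimal code point or the Unicode name of the character.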

3.16 Highlight TODO and FIXME in comments

This is a default feature in most IDEs. Since Emacs allows you to build your own IDE, it does not offer it by default… but it should, since that does not disturb anything. So we add it.

fic-ext-mode highlights TODO and FIXME in comments for common programming languages.

;; Highlight TODO and FIXME in comments 
(package-require 'fic-ext-mode)
(defun add-something-to-mode-hooks (mode-list something)
  "helper function to add a callback to multiple hooks"
  (dolist (mode mode-list)
    (add-hook (intern (concat (symbol-name mode) "-mode-hook")) something)))

(add-something-to-mode-hooks '(c++ tcl emacs-lisp python text markdown latex) 'fic-ext-mode)

3.17 Save macros as functions

Now for something which should really be provided by default: You just wrote a cool emacs macro, and you are sure that you will need that again a few times.

Well, then save it!

In standard emacs that needs multiple steps. And I hate that. Something as basic as saving a macro should only need one single step. It does now (and Emacs is great, because it allows me to do this!).

This bridges the gap between function definitions and keyboard macros, making keyboard macros something like first class citizens in your Emacs.

; save the current macro as reusable function.
(defun save-current-kbd-macro-to-dot-emacs (name)
  "Save the current macro as named function definition inside
your initialization file so you can reuse it anytime in the
future."
  (interactive "SSave Macro as: ")
  (name-last-kbd-macro name)
  (save-excursion 
    (find-file-literally user-init-file)
    (goto-char (point-max))
    (insert "\n\n;; Saved macro\n")
    (insert-kbd-macro name)
    (insert "\n")))
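To use it, record a macro as usual (C-x ( … C-x ), or F3 … F4), then call M-x save-current-kbd-macro-to-dot-emacs and give it a name. Note that the function only inserts the definition into the buffer of your init file; you still have to save that buffer yourself, and after the next start the macro is available as M-x name.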

3.18 Transparent GnuPG encryption

If you have a diary or similar, you should really use this. It only takes a few lines of code, but these few lines are the difference between encryption for those who know they need it and encryption for everyone.

; Activate transparent GnuPG encryption.
(require 'epa-file)
(epa-file-enable)
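With this enabled, any file whose name ends in .gpg - for example a diary kept in ~/diary.org.gpg (just an illustrative name) - is decrypted transparently when you open it and encrypted again when you save it. On the first save Emacs asks which key to use; choosing none falls back to symmetric encryption with a passphrase.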

3.19 Colored shell commands

A shell without colors is really hard to read. Let’s make that easier.

; colored shell commands via C-!
(add-hook 'shell-mode-hook 'ansi-color-for-comint-mode-on)
(defun babcore-shell-execute(cmd)
  "Execute a shell command in an interactive shell buffer."
   (interactive "sShell command: ")
   (shell (get-buffer-create "*shell-commands-buf*"))
   (process-send-string (get-buffer-process "*shell-commands-buf*") (concat cmd "\n")))
(global-set-key (kbd "C-!") 'babcore-shell-execute)

3.20 Save backups in ~/.local/share/emacs-saves

This is mostly an aesthetic matter: use the directories from the freedesktop specification for backup files.

Thanks to the folks at CERN for this.

(setq backup-by-copying t      ; don't clobber symlinks
      backup-directory-alist
      '(("." . "~/.local/share/emacs-saves"))    ; don't litter my fs tree
      delete-old-versions t
      kept-new-versions 6
      kept-old-versions 2
      version-control t)       ; use versioned backups

3.21 Basic persistency

If I restart the computer I want my editor to make it easy for me to continue where I left off.

It’s bad enough that most likely my brain buffers were emptied. At least my editor should remember how to go on.

3.21.1 saveplace

If I reopen a file, I want to start at the line at which I was when I closed it.

; save the place in files
(require 'saveplace)
(setq-default save-place t)

3.21.2 recentf

Also I want to be able to see the most recently opened files. Almost every single program on my computer has a “recently opened files” list, and now emacs does, too.

; show recent files
(package-require 'recentf)
(recentf-mode 1)
(setq recentf-max-menu-items 1000)

3.21.3 savehist

And I want to be able to call my recent commands in the minibuffer. I normally don’t type the full command name anyway, but rather C-r followed by a small part of the command. Losing that on restart really hurts, so I want to avoid that loss.

; save minibuffer history
(require 'savehist)
(savehist-mode t)

3.21.4 desktop globals

This is the chainsaw of persistency. I commented it out, because it can be overkill and actually disturb more than it helps, when it recovers stuff I did not need.

;; save registers and open files over restarts,
;; thanks to http://www.xsteve.at/prg/emacs/power-user-tips.html
;; save a list of open files in ~/.emacs.desktop
;; save the desktop file automatically if it already exists
;(setq desktop-save 'if-exists)
;(desktop-save-mode 1)

;; ;; save a bunch of variables to the desktop file
;; ;; for lists specify the len of the maximal saved data also
;; (setq desktop-globals-to-save
;;       (append '((extended-command-history . 300)
;;                 (file-name-history        . 100)
;;                 (grep-history             . 30)
;;                 (compile-history          . 30)
;;                 (minibuffer-history       . 5000)
;;                 (query-replace-history    . 60)
;;                 (read-expression-history  . 60)
;;                 (regexp-history           . 60)
;;                 (regexp-search-ring       . 20)
;;                 (search-ring              . 2000)
;;                 (shell-command-history    . 50)
;;                 tags-file-name
;;                 register-alist)))

;; ;; restore only 5 buffers at once and the rest lazily
;; (setq desktop-restore-eager 5)

; maybe nicer: http://github.com/doomvox/desktop-recover

3.22 use the system clipboard

Finally one more minor adaptation: Treat the clipboard gracefully. This is a tightrope stunt and getting it wrong can feel awkward.

This is the only setting for which I’m not sure that I got it right, but it’s what I use…

; Use the system clipboard
(setq x-select-enable-clipboard t)

3.23 Add license headers automatically

In case you mostly write free software, you might be as weary of hunting for the license header and copy pasting it into new files as I am. Free licenses, and especially copyleft licenses, are one of the core safeguards of free culture, because they give free software developers an edge over proprietarizing folks. But they are a pain to add to every file…

Well: No more. We now have legalese mode to take care of the inconvenient legal details for us, so we can focus on the code we write. Just call M-x legalese to add a GPL header, or C-u M-x legalese to choose another license.

(package-require 'legalese)

3.24 finish up

Make it possible to just (require 'babcore) and add the proper package footer.

(provide 'babcore)
;;; babcore.el ends here  

4 Summary

With babcore you have a core setup which exposes some of the essential features of Emacs and adds basic system integration which is missing in pristine Emacs.

Now run M-x package-list-packages to see where you can still go - or just use Emacs and add what you need along the way. The package list is your friend, as is Emacswiki.

Happy Hacking!

Date: 2013-04-03,
Author: Arne Babenhauserheide,
Org version 7.9.2 with Emacs version 24

Note: As almost everything on this page, this text and code is available under the GPLv3 or later.

Conveniently convert CamelCase to words_with_underscores using a small emacs hack

I am currently coping with refactoring in an upstream project for which I maintain some changes that upstream does not merge. One nasty part is that the project converted its function names from CamelCase to words_with_underscores. And that created lots of merge conflicts.

Today I finally decided to speed up my work.

The first thing I needed was a function to convert a string in CamelCase to words_with_underscores. Since I’m lazy, I used google, and that turned up the CamelCase page of Emacswiki - and with it the following string functions:

(defun split-name (s)
  (split-string
   (let ((case-fold-search nil))
     (downcase
      (replace-regexp-in-string "\\([a-z]\\)\\([A-Z]\\)" "\\1 \\2" s)))
   "[^A-Za-z0-9]+"))
(defun underscore-string (s) (mapconcat 'downcase   (split-name s) "_"))
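A quick sanity check in the *scratch* buffer (the example identifier is arbitrary):

(underscore-string "CamelCaseName") ; ⇒ "camel_case_name"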

Quite handy - and elegantly executed. Now I just need to make this available for interactive use. For this, Emacs Lisp offers many useful ways to turn editor information into program information, called interactive codes - in my case the region-code: "r". This gives the function the beginning and the end of the currently selected region as arguments.

With this, I created an interactive function which de-camelcases and underscores the selected region:

(defun underscore-region (begin end) (interactive "r")
  (let* ((word (buffer-substring begin end))
         (underscored (underscore-string word)))
    (save-excursion
      (widen) ; break out of the subregion so we can fix every usage of the function
      (replace-string word underscored nil (point-min) (point-max)))))

And now we’re almost there. Just create a macro which searches for a function, selects its name, de-camelcases and underscores it and then replaces every usage of the CamelCase name by the underscored name. This isn’t perfect refactoring (it can lead to errors), but it’s fast and I see every change it does.

C-x C-(
C-s def 
M-x mark-word
M-x underscore-region
C-x C-)

That’s it, now just call the macro repeatedly.

C-x eeeeee…

Now check the diff to fix where this 13-line hack got something wrong (like changing __init__ into _init_ - I won’t debug this, you’ve been warned ☺).

Happy Hacking!

Attachment (size):
2015-01-14-Mi-camel-case-to-underscore.org (2.39 KB)

Custom link completion for org-mode in 25 lines (emacs)

Update (2013-01-23): The new org-mode removed (org-make-link), so I replaced it with (concat) and uploaded a new example-file: org-custom-link-completion.el.
Happy Hacking!

1 Intro

I recently set up custom completion for two of my custom link types in Emacs org-mode. When I wrote on identi.ca about that, Greg Tucker-Kellog said that he’d like to see that. So I decided I’d publish my code.

The link types I regularly need are papers (PDFs of research papers I take notes about) and bib (the bibtex entries for the papers). The following are my custom link definitions:

(setq org-link-abbrev-alist
      '(("bib" . "~/Dokumente/Uni/Doktorarbeit-inverse-co2-ch4/aufschriebe/ref.bib::%s")
       ("notes" . "~/Dokumente/Uni/Doktorarbeit-inverse-co2-ch4/aufschriebe/papers.org::#%s")
       ("papers" . "~/Dokumente/Uni/Doktorarbeit-inverse-co2-ch4/aufschriebe/papers/%s.pdf")))

For some weeks I had copied the info into the links by hand. Thus an entry about a paper looks like the following.

* Title [[bib:identifier]] [[papers:name_without_suffix]]

This already suffices to be able to click the links for opening the PDF or showing the bibtex entry. Entering the links was quite inconvenient, though.

2 Implementation: papers

The trick to completion in org-mode is to create the function org-LINKTYPE-complete-link.

Let’s begin with the papers-links, because their completion is more basic than the completion of the bib-link.

First I created a helper function to replace all occurrences of a substring in a string1.

(defun string-replace (this withthat in)
  "replace THIS with WITHTHAT' in the string IN"
  (with-temp-buffer
    (insert in)
    (goto-char (point-min))
    (replace-string this withthat)
    (buffer-substring (point-min) (point-max))))

As you can see, it’s quite simple: Just create a temporary buffer and use the default replace-string function I’m using daily while editing. Don’t assume I figured out that elegant way myself. I just searched for it in the net and adapted the nicest code I found :)

Now we get to the real completion:

<<string-replace>>
(defun org-papers-complete-link (&optional arg)
  "Create a papers link using completion."
  (let (file link)
       (setq file (read-file-name "papers: " "papers/"))
       <<cleanup-link>>
    link))

The real magic is in read-file-name. That just uses the standard file completion with a custom prompt and starting directory.

cleanup-link is only a small list of setq’s which removes parts of the filepath to make it compatible with the syntax for paper-links:

(let ((pwd (file-name-as-directory (expand-file-name ".")))
  (pwd1 (file-name-as-directory (abbreviate-file-name
                 (expand-file-name ".")))))
  (setq file (string-replace "papers/" "" file))
  (setq file (string-replace pwd "" (string-replace pwd1 "" file)))
  (setq file (string-replace ".pdf" "" file))
  (setq link (concat "papers:" file)))

And that’s it. A few lines of simple elisp and I have working completion for a custom link-type which points to research papers - and can easily be adapted when I change the location of the papers.

Now don’t think I would have come up with all that elegant code myself. My favorite language is Python and I don’t think that I should have to know emacs lisp as well as Python. So I copied and adapted most of it from existing functions in emacs. Just use C-h f <function-name> and then follow the link to the code :)

Remember: This is free software. Reuse and learning from existing code is not just allowed but encouraged.

3 Implementation: bib

For the bib-links, I chose an even easier way. I just reused reftex-do-citation from reftex-mode:

<<reftex-setup>>
(defun org-bib-complete-link (&optional arg)
  "Create a bibtex link using reftex autocompletion."
  (concat "bib:" (reftex-do-citation nil t nil)))

For reftex-do-citation to allow using the bib-style link, I needed some setup, but I already had that in place for explicit citation inserting (not generalized as link-type), so I don’t count the following as part of the actual implementation. Also I likely copied most of it from emacs-wiki :)

(defun org-mode-reftex-setup ()
  (interactive)
  (and (buffer-file-name) (file-exists-p (buffer-file-name))
       (progn
        ; Reftex should use the org file as master file. See C-h v TeX-master for infos.
        (setq TeX-master t)
        (turn-on-reftex)
        ; don’t ask for the tex master on every start.
        (reftex-parse-all)
        ;add a custom reftex cite format to insert links
        (reftex-set-cite-format
         '((?b . "[[bib:%l][%l-bib]]")
           (?n . "[[notes:%l][%l-notes]]")
           (?p . "[[papers:%l][%l-paper]]")
           (?t . "%t")
           (?h . "** %t\n:PROPERTIES:\n:Custom_ID: %l\n:END:\n[[papers:%l][%l-paper]]")))))
  (define-key org-mode-map (kbd "C-c )") 'reftex-citation)
  (define-key org-mode-map (kbd "C-c (") 'org-mode-reftex-search))

(add-hook 'org-mode-hook 'org-mode-reftex-setup)

And that’s it. My custom link types now support useful completion.

4 Result

For papers, I get an interactive file-prompt to just select the file. It directly starts in the papers folder, so I can simply enter a few letters which appear in the paper filename and hit enter (thanks to ido-mode).

For bibtex entries, a reftex-window opens in a lower split-screen and asks me for some letters which appear somewhere in the bibtex entry. It then shows all fitting entries in brief but nice format and lets me select the entry to enter. I simply move with the arrow-keys, C-n/C-p, n/p or even C-s/C-r for searching, till the correct entry is highlighted. Then I hit enter to insert it.

./2012-06-15-emacs-link-completion-bib.png

And that’s it. I hope you liked my short excursion into the world of extending emacs to stay focussed while connecting separate data sets.

I never saw a level of (possible) integration and consistency anywhere else which even came close to the possibilities of emacs.

And by the way: This article was also written in org-mode, using its literate programming features for code-samples which can actually be executed and extracted at will.

To put it all together I just need the following:

<<org-papers-complete-link>>
<<org-bib-complete-link>>

Now I use M-x org-babel-tangle to write the code to the file org-custom-link-completion.el. I attached that file for easier reference: org-custom-link-completion.el :)

Have fun with Emacs!

PS: Should something be missing here, feel free to get it from my public .emacs.d. I only extracted what seemed important, but I did not check if it runs in a pristine Emacs. My at-home branch is “fluss”.

Footnotes:

1 : Creating a custom function for string replace might not have been necessary, because some function might already exist for that. But writing it myself was faster than searching for it.

Attachment (size):
2012-06-15-emacs-link-completion-bib.png (77.24 KB)
2012-06-15-Fr-org-link-completion.org (7.29 KB)
org-custom-link-completion.el (2.13 KB)

Easily converting ris-citations to bibtex with emacs and bibutils

The problem

Nature only gives me ris-formatted citations, but I use bibtex.

Also ris is far from human readable.

The background

ris can be reformatted to bibtex, but doing that manually disturbs my workflow when getting references while taking notes about a paper in emacs.

I tend to search online for references, often just using google scholar, so when I find a ris reference, the first data I get for the ris-citation is a link.

The solution

Making it possible

bibutils1 can convert ris to an intermediate xml format and then convert that to bibtex.

wget -O reference.ris RIS_URL
cat reference.ris | ris2xml | xml2bib >> ref.bib

This solves the problem, but it is not convenient, because I have to switch to the terminal, download the file, convert it and append the result to my bibtex file.

Making it convenient

Done this way, getting the ris citation is quite inconvenient: I need 3 steps just to get a single citation.

But those steps are always the same, and since I use Emacs, I can automate and integrate them very easily. So I created a simple function in emacs, which takes the url of a ris citation, converts it to bibtex and appends the result to my local bibtex file. Now I can import a ris citation with a simple call to

M-x ris-citation-to-bib

Then I enter the url and the function appends the citation to my bibtex file.2

Feel free to integrate it into your own emacs setup (in addition to the GPLv3 you can use any license used by emacswiki or worg).

(defun ris-citation-to-bib (&optional ris-url) 
  "get a ris citation as bibtex in one step. Just call M-x
ris-citation-to-bib and enter the ris url. 
Requires bibutils: http://sourceforge.net/p/bibutils/home/Bibutils/ 
"
  (interactive "Mris-url: ")
  (save-excursion
    (let ((bib-file "/home/arne/aufschriebe/ref.bib")
          (bib-buffer (get-buffer "ref.bib"))
          (ris-buffer (url-retrieve-synchronously ris-url)))
      ; firstoff check if we have the bib buffer. If yes, move point to the last line.
      (if (not (member bib-buffer (buffer-list)))
          (setq bib-buffer (find-file-noselect bib-file)))
      (progn 
        (set-buffer bib-buffer)
        (goto-char (point-max)))
      (if ris-buffer
          (set-buffer ris-buffer))
      (shell-command-on-region (point-min) (point-max) "ris2xml | xml2bib" ris-buffer)
      (let ((pmin (- (search-forward "@") 1))
            (pmax (search-forward "}\n\n")))
        (if (member bib-buffer (buffer-list))
            (progn
              (append-to-buffer bib-buffer pmin pmax)
              (kill-buffer ris-buffer)
              (set-buffer bib-buffer)
              (save-buffer)))))))

Happy Hacking!


  1. To get bibutils in Gentoo, just call emerge app-text/bibutils

  2. Well, actually I only use M-x ris- TAB, but that’s a detail (though I would not want to work without it :) ) 

El Kanban Org: parse org-mode todo-states to use org-tables as Kanban tables

Kanban for emacs org-mode.

Update (2013-04-13): Kanban.el now lives in its own repository: on bitbucket and on a statically served http-repo (to be independent from unfree software).

Update (2013-04-10): Thanks to Han Duply, kanban links now work for entries from other files. And I uploaded kanban.el on marmalade.

Some time ago I learned about kanban, and the obvious next step was: “I want to have a kanban board from org-mode”. I searched for it, but did not find any. Not wanting to give up on the idea, I implemented my own :)

The result are two functions: kanban-todo and kanban-zero.

“Screenshot” :)

(Example kanban table with the columns TODO, DOING and DONE, filled with truncated headlines such as “Refactor in such a way that the…”, “let Presentation manage dumb sprites…”, “return all actions on every command…”, “Make the UiState adhere the list of…” and “Turn the model into a pure state…”.)

kanban-todo

kanban-todo provides your TODO items as kanban-fields. You can move them in the table without having duplicates, so all the state maintenance is done in the kanban table. Once you are finished, you mark them as done and delete them from the table.

To set it up, put kanban.el somewhere in your load path and (require 'kanban) (more recent but potentially unstable version). Then just add a table like the following:

|   |   |   |
|---+---+---|
|   |   |   |
|   |   |   |
|   |   |   |
|   |   |   |
#+TBLFM: $1='(kanban-todo @# @2$2..@>$>)::@1='(kanban-headers $#)

Hit C-c C-c with the point on the #+TBLFM line to update the table.

The important line is the #+TBLFM. That says “use my TODO items in the TODO column, except if they are in another column” and “add kanban headers for my TODO states”

The kanban-todo function takes an optional parameter match, which you can use to restrict the kanban table to given tags. The syntax is the same as for org-mode matchers. The third argument allows you to provide a scope, for example a list of files.

To only set the scope, use nil for the matcher.

See C-h f org-map-entries and C-h v org-agenda-files for details.
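By analogy with the kanban-zero example further down, a matcher and scope for kanban-todo would presumably look like this (the tag "mytag" and the file path are just placeholders):

#+TBLFM: $1='(kanban-todo @# @2$2..@>$> "mytag" '("~/org/plan.org"))::@1='(kanban-headers $#)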

kanban-zero

kanban-zero is a zero-state Kanban: All state is managed in org-mode and the table only displays the kanban items.

To set it up, put kanban.el somewhere in your load path and (require 'kanban). Then just add a table like the following:

|   |   |   |
|---+---+---|
|   |   |   |
|   |   |   |
|   |   |   |
|   |   |   |
#+TBLFM: @2$1..@>$>='(kanban-zero @# $#)::@1='(kanban-headers $#)

The important line is the #+TBLFM. That says “show my org items in the appropriate column” and “add kanban headers for my TODO states”.

Hit C-c C-c with the point on the #+TBLFM line to update the table.

The kanban-zero function takes an optional parameter match, which you can use to restrict the kanban table to given tags. The syntax is the same as for org-mode matchers. The third argument allows you to provide a scope, for example a list of files.

To only set the scope, use nil for the matcher.

An example for matcher and scope would be:

#+TBLFM: @2$1..@>$>='(kanban-zero @# $# "1w6" '("/home/arne/.emacs.d/private/org/emacs-plan.org"))::@1='(kanban-headers $#)

See C-h f org-map-entries and C-h v org-agenda-files for details.

Contribute

To contribute to kanban.el, just change the file and write a comment about your changes. Maybe I’ll setup a repo on Bitbucket at some point…

Example

In the Hexbattle game-draft, I use kanban to track my progress:


1 Kanban

STARTED: “Refactor in such a way that the…”, “let Presentation manage dumb sprites…”, “return all actions on every command…”, “Make the UiState adhere the list of…”, “Turn the model into a pure state…”

2 refactor Hexbattle    1w6

… and so on …

Advanced usage

“Graphical” TODO states

To make the todo states easier to grok directly you can use unicode symbols for them. Example:

#+SEQ_TODO: ❢ ☯ ⧖ | ☺ ✔ DEFERRED ✘
| ❢ | ☯ | ⧖ | ☺ |
|---+---+---+---|
|   |   |   |   |
#+TBLFM: @1='(kanban-headers $#)::@2$1..@>$>='(kanban-zero @# $#)

In my setup they are ❢ (todo) ☯ (doing) ⧖ (waiting) and ☺ (to report). Not shown in the kanban table are ✔ (finished), ✘ (dropped) and DEFERRED (later), because they don’t require any action from me, so I don’t need to see them all the time.

Collecting kanban entries via SSH

If you want to create a shared kanban table, you can use the excellent transparent network access options from Emacs tramp to collect kanban entries directly via SSH.

To use that, simply pass an explicit list of files to kanban-zero as 4th argument (if you don’t use tag matching just use nil as 3rd argument). "/ssh:host:path/to/file.org" retrieves the file ~/path/to/file.org from the host.

| ❢ | ☯ |
|---+---|
|   |   |
#+TBLFM: @1='(kanban-headers $#)::@2$1..@>$>='(kanban-zero @# $# nil (list (buffer-file-name) "/ssh:localhost:plan.org"))

Caveat: all included kanban files have to use at least some of the same todo states: kanban.el only retrieves TODO states which are used in the current buffer.

Attachment (size):
kanban.el (5.86 KB)

How to show the abstract before the table of contents in org-mode

I use Emacs Org-Mode for writing all kinds of articles. The standard format for org-mode is to show the table of contents before all other content, but that requires people to scroll down to see whether the article is interesting for them. Therefore I want the abstract to be shown before the table of contents.

1 Intro

There is an old guide for showing the abstract before the TOC in org-mode<8, but since I use org-mode 8, that wasn’t applicable to me.

With a short C-h v org-toc TAB TAB (meaning: search all variables which start with org- and contain -toc) I found the following even simpler way. After I got that solution working, I found that this was still much too complex and that org-mode actually provides an even easier and very convenient way to add the TOC at any place.

2 Solution

(from the manual)

At the beginning of your file (after the title) add

#+OPTIONS: toc:nil

Then after the abstract add a TOC:

#+BEGIN_ABSTRACT
Abstract
#+END_ABSTRACT
#+TOC: headlines 2

Done. Have fun with org-mode!

3 Appendix: Complex way

This is the complicated way I tried first. It only works with LaTeX, but there it works. Better use the simple way.

Set org-export-with-toc to nil as a file-local variable. This means you just append the following to the file:

# Local Variables:
# org-export-with-toc: nil
# End:

(another nice local variable is org-confirm-babel-evaluate: nil, but don’t set that globally, otherwise you could run untrusted code when you export org-mode files from others. When this is set file-local, emacs will ask you for each file you open whether you want to accept the variable setting)

Then write the abstract before the first heading and add \tableofcontents after it. Example:

#+BEGIN_ABSTRACT
Abstract
#+END_ABSTRACT
#+LATEX: \tableofcontents
Attachment (size):
2013-11-21-Do-emacs-orgmode-abstract-before-toc.pdf (143.29 KB)
2013-11-21-Do-emacs-orgmode-abstract-before-toc.org (2.23 KB)

Insert a scaled screenshot in emacs org-mode

@marjoleink asked on identi.ca1 whether it is possible to use emacs org-mode for showing scaled screenshots inline while writing. Since I thought I’d enjoy some hacking, I decided to take the challenge.

Org-mode does not auto-scale embedded images, as far as I know, but the screenshot use case can be handled with a simple function (add this to your ~/.emacs or ~/.emacs.d/init.el):

(defun org-insert-scaled-screenshot ()
  "Insert a scaled screenshot 
for inline display 
into your org-mode buffer."
  (interactive)
  (let ((filename 
         (concat "screenshot-" 
                 (substring 
                  (shell-command-to-string 
                   "date +%Y%m%d%H%M%S")
                  0 -1 )
                 ".png")))
    (let ((scaledname 
           (concat filename "-width300.png")))
      (shell-command (concat "import -window root " filename))
      (shell-command (concat "convert -adaptive-resize 300 " filename " " scaledname))
      (insert (concat "[[./" scaledname "]]")))))

Now just call M-x org-redisplay-inline-images to see the screenshot (or add it to the function).

In action:

scaled screenshot

Have fun with Emacs - and happy hacking!

PS: In case it’s not obvious: The screenshot shows emacs just as the screenshot is being shot - with the method shown here ☺)


  1. Matthew Gregg: @marjoleink "way of life" thing again, but if you can invest some time, org-mode is a really powerful note keeping environment. → Marjolein Katsma: @mcg I'm sure it is - but seriously: can you embed a diagram2 or screenshot, scale it, and link it to itself? 

  2. For diagrams, you can just insert a link to the image file without description, then org-mode can show it inline. To get an even nicer user-experience (plain text diagrams or ascii-art), you can use inline code via org-babel using graphviz (dot) or ditaa - the latter is used for the diagrams in my complete Mercurial branching strategy

Attachment (size):
screenshot-20121122101933-width300.png (108.08 KB)
screenshot-20121122101933-width600.png (272.2 KB)

Minimal example for literate programming with noweb in emacs org-mode

If you want to use the literate programming features in emacs org-mode, you can try this minimal example to get started: Activate org-babel-tangle, then put this into the file noweb-test.org:

Minimal example for noweb in org-mode

* Assign 

First we assign abc:

#+begin_src python :noweb-ref assign_abc
abc = "abc"
#+end_src

* Use

Then we use it in a function:

#+begin_src python :noweb tangle :tangle noweb-test.py
def x():
  <<assign_abc>>
  return abc

print(x())
#+end_src

noweb-test.org

Hit C-c C-c to evaluate the source block. Hit C-c C-v C-t to put the expanded code into the file noweb-test.py.
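If C-c C-c refuses to evaluate the python block, python support for babel is probably not yet enabled in your setup. The usual babel setup should take care of that - a minimal sketch, adapt the language list to your needs:

(org-babel-do-load-languages
 'org-babel-load-languages
 '((python . t)))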

The exported code looks like this:

def x():
  abc = "abc"
  return abc
print(x())

noweb-test.py

(html generated with org-export-as-html-to-buffer and slightly reniced to escape the additional parsing I have on my site)

And with org-export-as-pdf we get this:

org-mode-noweb-example

noweb-test.pdf

Add :results output to the #+begin_src line of the second block to see the print results under that block when you hit C-c C-c in the block.

You can also use properties of headlines for giving the noweb-ref. Org-mode can then even concatenate several source blocks into one noweb reference. Just hit C-c C-x p to set a property (or use M-x org-set-property), then set noweb-ref to the name you want to use to embed all blocks under this heading together.

Note: org-babel prefixes each line of an included code-block with the prefix used for the reference (here <<assign_abc>>). This way you can easily include blocks inside python functions.

Note: To keep noweb-references literally in the output or similar, have a look at the different options to :noweb.

Note: To do this with shell-code, it’s useful to change the noweb markers to {{{ and }}}, because << and >> are valid shell-syntax, so they disturb the highlighting in sh-mode. Also confirming the evaluation every time makes plain exporting problematic. To fix this, just add the following somewhere in the file (to keep this simple, just add it to the end):

# Local Variables:
# org-babel-noweb-wrap-start: "{{{"
# org-babel-noweb-wrap-end: "}}}"
# org-confirm-babel-evaluate: nil
# org-export-allow-bind-keywords: t
# End:

Have fun with Emacs and org-mode!

Attachment (size):
noweb-test.pdf (81.69 KB)
noweb-test.org (290 Bytes)
noweb-test.py.txt (49 Bytes)
noweb-test-pdf.png (6.05 KB)

Org-mode with Parallel Babel

Babel in Org

Emacs Org-mode provides the wonderful babel-capability: Including code-blocks in any language directly in org-mode documents in plain text.

In default usage, running such code freezes my emacs until the code is finished, though.

Until a few weeks ago, I solved this with a custom function which spawns a new emacs as script runner for the specific code:

; Execute babel source blocks asynchronously by just opening a new emacs.
(defun bab/org-babel-execute-src-block-new-emacs ()
  "Execute the current source block in a separate emacs,
so we do not block the current emacs."
  (interactive)
  (let ((line (line-number-at-pos))
        (file (buffer-file-name)))
    (async-shell-command (concat 
                          "TERM=vt200 emacs -nw --find-file " 
                          file 
                          " --eval '(goto-line "
                          (number-to-string line) 
                          ")' --eval "
     "'(let ((org-confirm-babel-evaluate nil))(org-babel-execute-src-block t))' "
                          "--eval '(kill-emacs 0)'"))))

and its companion for exporting to beamer-latex presentation pdf:

; Export as pdf asynchronously by just opening a new emacs.
(defun bab/org-beamer-export-new-emacs ()
  "Export the current file in a separate emacs,
so we do not block the current emacs."
  (interactive)
  (let ((line (line-number-at-pos))
        (file (buffer-file-name)))
    (async-shell-command (concat 
                          "TERM=vt200 emacs -nw --find-file " 
                          file 
                          " --eval '(goto-line " 
                          (number-to-string line) 
                          ")' --eval "
     "'(let ((org-confirm-babel-evaluate nil))(org-beamer-export-to-pdf))' "
                          "--eval '(kill-emacs 0)'"))))

But for shell-scripts there’s a much simpler alternative:

GNU Parallel to the rescue! Process-pool made easy.

Instead of spawning an external process, I can just use GNU Parallel for the long-running program-calls in the shell-code. For example like this (real code-block):

#+BEGIN_SRC sh :exports none
  oldPWD=$(pwd)
  cd ~/tm5tools/plotting
  filename="./obsheat-increasing.png" >/dev/null 2>/dev/null
  sem -j -1 ./plotstation.py -c ~/sun-work/ct-production-out-5x7e300m1.0 -C "aircraft" -c ~/sun-work/ct-production-out-5x7e300m1.0no-aircraft -C "continuous"  --obsheat --station allnoaa --title "\"Reducing observation coverage\"" -o ${oldPWD}/${filename}
  cd -
#+END_SRC

Let me explain this.

sem is a part of GNU parallel which makes parallel execution easy. Essentially it gives us a simple version of the convenience we know from make.

for i in {1..100}; do
    sem -j -1 [code] # run N-1 processes with N as the number of
                     # processors in my computer
done

This means that the above org-mode block will finish instantly, but there will be a second process managed by GNU parallel which executes the plotting script.

The big advantage here is that I can also set this to execute on exporting a document which might run hundreds of code-blocks. If I did this with naive multiprocessing, that would spawn 100 processes which overwhelm the memory of my system (yes, I did that…).

sem -j -1 ensures that this does not happen. Essentially it provides a process-pool with which it executes the code.

If you use this on export, take care to add a final code-block which waits until all other blocks have finished:

sem --wait

A word of caution: Shell escapes

If you use GNU parallel to run programs, the arguments are interpreted two times: once when you pass them to sem and a second time when sem passes them on. Due to this, you have to add escaped quote-marks for every string which contains whitespace. This can look like the following code (the example above reduced to its essential parts):

sem -j -1 ./plotstation.py --title "\"Reducing observation coverage\""

I stumbled over this a few times, but the convenience of GNU parallel is worth the small extra-caution.

Besides: For easier editing of inline-source-code, set org-src-fontify-natively to true (t), either via M-x customize-variable or by adding the following to your .emacs:

(setq org-src-fontify-natively t)

Summary

With the tool sem from GNU parallel you get parallel execution of shell code-blocks in emacs org-mode using the familiar syntax from make:

sem -j -1 [escaped code]

Publish a single file with emacs org-mode

I often write small articles about experiences I have, and since I want to move towards using static pages more often, I tried using emacs org-mode publishing for that. Strangely, the simple usecase of publishing a single file seems quite a bit more complex than needed, so I document the steps here.

This is my first use of org-publish, so I likely do not use it perfectly. But as it stands, it works. You can find the org-publish version of this article at draketo.de/proj/orgmode-single-file.

1 Why static pages?

I recently lost a dynamic page to hackers. I could not recover the content from all the spam which flooded it. It was called good news and I had wanted to gather positive news which encourage getting active - but I never really found the time to get it running. See what is left of it: http://gute-neuigkeiten.de

Any dynamic page carries a big maintenance cost, because I have to update all the time to keep it safe from spammers who want to abuse it for commercial spam - in the least horrible case. I can choose a managed solution, but that makes me dependent on the hoster providing what I need. Or I can take the sledgehammer and just use a static site: It never does any writes to the webserver, so there is nothing to hack.

As you can see, that’s what I’m doing nowadays.

2 Why Emacs Org-Mode?

Because after having used MacOS for almost a decade and then various visual-oriented programs for another five years, Emacs is nowadays the program which is most convenient to me. It achieves a level of integration and usability which is still science-fiction in other systems - at least when you’re mostly working with text.

And Org-mode is to Emacs as Emacs is to the Operating System: It begins as a simple todo-list and accompanies you all the way towards programming, reproducible research - and publishing websites.

3 Current Solution

Currently I first publish the single file to FTP and then rename it to index.html. This translates to the following publish settings:

(setq private-publish-ftp-proj (concat "/ftp:" USER "@" HOST ":arnebab/proj/"))

(setq org-publish-project-alist
      ;; use a backquote so the concat and lambda forms below are
      ;; evaluated instead of being stored as unevaluated lists
      `(("orgmode-single-file"
         :base-directory "~/.emacs.d/private/journal"
         :publishing-directory ,(concat private-publish-ftp-proj "orgmode-single-file/")
         :base-extension "org"
         :publishing-function org-html-publish-to-html
         :completion-function ,(lambda () (rename-file 
                                           (concat private-publish-ftp-proj 
                                                   "orgmode-single-file/2013-11-25-Mo-publish-single-file-org-mode.html") 
                                           (concat private-publish-ftp-proj 
                                                   "orgmode-single-file/index.html") t))
         :section-numbers nil
         :with-toc t
         :html-preamble t
         :exclude ".*"
         :include ["2013-11-25-Mo-publish-single-file-org-mode.org"])))

Now I can use C-c C-e P x orgmode-single-file to publish this file to the webserver whenever I change it.

Note the lambda: I just rename the published file to index.html, because I did not find out how to do that by just setting an option. :index-filename did not work. But likely I missed something which would make this much nicer.

Note that if I had wanted to publish a folder full of files, this would have been much easier: There actually is an option to create an automatic index-file and sitemap.

For more details, read the org-mode publishing guide.

4 Conclusion

This is not as simple as I would like it to be. Maybe (or rather: likely) there is a simpler way. But I can now publish arbitrary org-mode files to my webserver without much effort (and without having to switch context to some other program). And that’s something I’ve been missing for a long time, so I’m very happy to finally have it.

And it was less pain than I feared, though publishing this via my drupal-site, too, obviously shows that I’m still far from moving to static pages for everything. For work-in-progress, this is great, though - for example for my Basics for Guile Scheme.

Read your python module documentation from emacs

I just found the excellent pydoc-info mode for emacs from Jon Waltman. It allows me to hit C-h S in a python file and enter a module name to see the documentation right away. If the point is on a symbol (=module or class or function), I can just hit enter to see its docs.

pydoc in action

In its default configuration (see the Readme) it “only” reads the python documentation. This alone is really cool when writing new python code, but it is not enough, since I often use third party modules.

And now comes the treat: If those modules use sphinx for documentation (≥1.1), I can integrate them just like the standard python documentation!

It took me some time to get it right, but now I have all the documentation for the inverse modelling framework I contribute to directly at my fingertips: Just hit C-h S ENTER when I’m on some symbol and a window shows me the docs:

custom pydoc in action
The text in this image is from Wouter Peters. It is used here as a short citation, which should be legal almost everywhere under citation rules.

I want to save you the work of figuring out how to do that yourself, so here’s a short guide for integrating the documentation for your python program into emacs.

Integrating your own documentation into emacs

The prerequisite for integrating your own documentation is to use sphinx for documenting your code. See their tutorial for info on how to set it up. As soon as sphinx works for you, follow this guide to integrate your docs into your emacs.

Install pydoc-info

First get pydoc-info and the python infofile (adapt this to your local setup):

# get the mode
cd ~/.emacs.d/libs
hg clone https://bitbucket.org/jonwaltman/pydoc-info
# and the pregenerated info-file for python
wget https://bitbucket.org/jonwaltman/pydoc-info/downloads/python.info.gz
gunzip python.info.gz
sudo cp python.info /usr/share/info
sudo install-info --info-dir=/usr/share/info python.info

(I also added pydoc-info as a subrepo to my .emacs.d repo to make it easy to transfer my adaptation between my different computers.)

To build the info file for python yourself, have a look at the Readme.

Turn your documentation into info

Now turn your own documentation into an info document and install it.

Sphinx uses a core configuration file named conf.py. Add the following to that file, replacing all values but index and False by the appropriate names for your project:

# One entry per manual page. 
# list of tuples (startdocname, 
# targetname, title, author, dir_entry, 
# description, category, toctree_only).
texinfo_documents = [
  ('index', # startdocname, keep this!
   'TARGETNAME', # targetname
   u'Long Title', # title
   u'Author Name', # author
   'Name in the Directory Index of Info', # dir_entry
   u'Long Description', # description
   'Software Development', # category
   False), # better keep this, too, I think.
]

Then call sphinx and install the info files like this (maybe adapted to your local setup):

sphinx-build -b texinfo source/ texinfo/ 
cd texinfo
sudo install-info --info-dir=/usr/share/info TARGETNAME.info
sudo cp TARGETNAME.info /usr/share/info/

Activate pydoc-info, including your documentation

Finally add the following to your .emacs (or wherever you store your personal adaptations):

; Show python-documentation as info-pages via C-h S
(setq load-path (cons "~/.emacs.d/libs/pydoc-info" load-path))
(require 'pydoc-info)
(info-lookup-add-help
   :mode 'python-mode
   :parse-rule 'pydoc-info-python-symbol-at-point
   :doc-spec
   '(("(python)Index" pydoc-info-lookup-transform-entry)
     ("(TARGETNAME)Index" pydoc-info-lookup-transform-entry)))
Attachments:
emacs-pydoc.png (52 KB)
emacs-pydoc-standardlibrary.png (34.22 KB)

Recipes for presentations with beamer latex using emacs org-mode

I wrote some recipes for creating the kinds of slides I need with emacs org-mode export to beamer latex.

Update: Read ox-beamer to see how to adapt this to work with the new export engine in org-mode 8.0.

The recipes as PDF (21 slides, 247 KiB)

The org-mode sources (12.2 KiB)

Below is an html export of the org-mode file. Naturally it does not look as impressive as the real slides, but it captures all the sources, so I think it has some value.

Note: To be able to use the simple block-creation commands, you need to add #+startup: beamer to the header of your file or explicitly activate org-beamer with M-x org-beamer-mode.
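
If you are on org-mode 8.0 or later (see the update above), the beamer exporter lives in ox-beamer; loading it explicitly in your .emacs should make the beamer entry show up in the export dispatcher (a minimal sketch):

; org-mode 8+: load the beamer export back-end
(require 'ox-beamer)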

«I love your presentation»:

PS: I hereby allow use of these slides under any of the licenses used by worg and/or the emacs wiki.

1 Introduction

1.1 Usage

1.1.1 (configure your emacs, see Basic Configuration at the end)

1.1.2 C-f <file which ends in .org>

1.1.3 Insert heading:

Hello World

#+LaTeX_CLASS: beamer
#+BEAMER_FRAME_LEVEL: 2

* Hello
** Hello GNU
Nice to see you!

1.1.4 M-x org-export-as-pdf

done: Your first org-beamer presentation.

1.2 org-mode + beamer = love

1.2.1 Code    BMCOL

Recipes
#+LaTeX_CLASS: beamer
#+BEAMER_FRAME_LEVEL: 2
* Introduction
** org-mode + beamer =  love
*** Code :BMCOL:
    :PROPERTIES:
    :BEAMER_col: 0.7
    :END:
<example block>
*** Simple block  :BMCOL:B_block:
    :PROPERTIES:
    :BEAMER_col: 0.3
    :BEAMER_env: block
    :END:
it's that easy!

1.2.2 Simple block    BMCOL B_block

it's that easy!

1.3 Two columns - in commands

1.3.1 Commands    BMCOL B_block

** Two columns - in commands
*** Commands
C-c C-b | 0.7
C-c C-b b
C-n
<eTAB (write example) C-n C-n
*** Result
C-c C-b | 0.3
C-c C-b b
even easier - and faster!

1.3.2 Result    BMCOL B_block

even easier - and faster!

2 Recipes

2.1 Four blocks - code

*** Column 1 :B_ignoreheading:BMCOL:
    :PROPERTIES:
    :BEAMER_env: ignoreheading
    :BEAMER_col: 0.5
    :END:

*** One
*** Three                                                           

*** Column 2 :BMCOL:B_ignoreheading:
    :PROPERTIES:
    :BEAMER_col: 0.5
    :BEAMER_env: ignoreheading
    :END:

*** Two
*** Four

2.2 Four blocks - result

2.2.1 Column 1    B_ignoreheading BMCOL

2.2.2 One

2.2.3 Three

2.2.4 Column 2    BMCOL B_ignoreheading

2.2.5 Two

2.2.6 Four

2.3 Four nice blocks - commands

*** 
C-c C-b | 0.5 # column
C-c C-b i # ignore heading
*** One 
C-c C-b b # block
*** Three 
C-c C-b b
*** 
C-c C-b | 0.5
C-c C-b i
*** Two 
C-c C-b b
*** Four 
C-c C-b b

2.4 Four nice blocks - result

2.4.1    BMCOL B_ignoreheading

2.4.2 One    B_block

2.4.3 Three    B_block

2.4.4    BMCOL B_ignoreheading

2.4.5 Two    B_block

2.4.6 Four    B_block

2.5 Top-aligned blocks

2.5.1 Code    B_block BMCOL

*** Code                                                      :B_block:BMCOL:
    :PROPERTIES:
    :BEAMER_env: block
    :BEAMER_col: 0.5
    :BEAMER_envargs: C[t]
    :END:

*** Result                                                    :B_block:BMCOL:
    :PROPERTIES:
    :BEAMER_env: block
    :BEAMER_col: 0.5
    :END:
pretty nice!

2.5.2 Result    B_block BMCOL

pretty nice!

2.6 Two columns with text underneath - code

2.6.1    B_columns

  • Code    BMCOL

    \tiny

    ***  :B_columns:
        :PROPERTIES:
        :BEAMER_env: columns
        :END:
    
    **** Code :BMCOL:
        :PROPERTIES:
        :BEAMER_col: 0.6
        :END:
    
    **** Result :BMCOL:
        :PROPERTIES:
        :BEAMER_col: 0.4
        :END:
    
    *** Underneath :B_ignoreheading:
        :PROPERTIES:
        :BEAMER_env: ignoreheading
        :END:
    Much text underneath! Very Much.
    Maybe too much. The whole width!
    

    \normalsize


  • Result    BMCOL

2.6.2 Underneath    B_ignoreheading

Much text underneath! Very Much. Maybe too much. The whole width!

2.7 Nice quotes

2.7.1 Code    B_block BMCOL

#+begin_quote
Emacs org-mode is a 
great presentation tool - 
Fast to beautiful slides.
- Arne Babenhauserheide
#+end_quote

2.7.2 Result    B_block BMCOL

Emacs org-mode is a great presentation tool - Fast to beautiful slides.

  • Arne Babenhauserheide

2.8 Math snippet

2.8.1 Code    BMCOL B_block

2.8.2 Inline    B_block

\( 1 + 2 = 3 \) is clear

2.8.3 As equation    B_block

\[ 1 + 2 \cdot 3 = 7 \]

2.8.4 Result    BMCOL B_block

2.8.5 Inline    B_block

\( 1 + 2 = 3 \) is clear

2.8.6 As equation    B_block

\[ 1 + 2 \cdot 3 = 7 \]

2.9 \( \LaTeX \)

2.9.1 Code    BMCOL B_block

\( \LaTeX \) gives a space 
after math mode.

\LaTeX{} does it, too.

\LaTeX does not.

At the end of a sentence 
both work.
Try \LaTeX. Or try \LaTeX{}.

Only \( \LaTeX \) and \( \LaTeX{} \) 
also work with HTML export.

2.9.2 Result    BMCOL B_block

\( \LaTeX \) gives a space after math mode.

\LaTeX{} does it, too.

\LaTeX does not.

At the end of a sentence both work. Try \LaTeX. Or try \LaTeX{}.

Only \( \LaTeX \) and \( \LaTeX{} \) also work with HTML export.

2.10 Images with caption and label

2.10.1    B_columns

  • Code    B_block BMCOL
    #+caption: GNU Emacs icon
    #+label: fig:emacs-icon
    [[/usr/share/icons/hicolor/128x128/apps/emacs.png]]
    
    This is image (\ref{fig:emacs-icon})
    

  • Result    B_block BMCOL

    file:///usr/share/icons/hicolor/128x128/apps/emacs.png

    GNU Emacs icon

    This is image (emacs-icon)


2.10.2    B_ignoreheading

Autoscaled to the block width!

2.11 Examples

2.11.1 Code    BMCOL B_block

: #+bla: foo
: * Example Header

Gives an example, which does not interfere with regular org-mode parsing.

#+begin_example
content
#+end_example

Gives a simpler multiline example which can interfere.

2.11.2 Result    BMCOL B_block

#+bla: foo
* Example Header

Gives an example, which does not interfere with regular org-mode parsing.

content

Gives a simpler multiline example which can interfere.

3 Basic Configuration

3.1 Header

<Title>

#+startup: beamer
#+LaTeX_CLASS: beamer
#+LaTeX_CLASS_OPTIONS: [bigger]
#+AUTHOR: <empty for none, if missing: inferred>
#+DATE: <empty for none, if missing: today>
#+BEAMER_FRAME_LEVEL: 2
#+TITLE: <causes <Title> to be regular content!>

3.2 .emacs config

Put these lines into your .emacs or into a file your .emacs pulls in - e.g. via (require 'mysettings) if the other file is named mysettings.el and ends in (provide 'mysettings).

(org-babel-do-load-languages ; babel, for executing 
 'org-babel-load-languages   ; code in org-mode.
 '((sh . t)
   (emacs-lisp . t)))

(require 'org-latex) ; latex export 
(add-to-list         ; with highlighting
  'org-export-latex-packages-alist '("" "minted"))
(add-to-list 
  'org-export-latex-packages-alist '("" "color"))
(setq org-export-latex-listings 'minted)

3.3 .emacs variables

You can easily set these via M-x customize-variable.

(custom-set-variables ; in ~/.emacs, only one instance 
 '(org-export-latex-classes (quote ; in the init file!
    (("beamer" "\\documentclass{beamer}" 
        org-beamer-sectioning))))
 '(org-latex-to-pdf-process (quote 
    ((concat "pdflatex -interaction nonstopmode" 
             "-shell-escape -output-directory %o %f") 
     "bibtex $(basename %b)" 
     (concat "pdflatex -interaction nonstopmode" 
             "-shell-escape -output-directory %o %f")
     (concat "pdflatex -interaction nonstopmode" 
             "-shell-escape -output-directory %o %f")))))

(concat "…" "…") is used here to get nice, short lines. Use the concatenated string instead ("pdflatex…%f").

3.4 Required programs

3.4.1 Emacs - (gnu.org/software/emacs)

To get org-mode and edit .org files effortlessly.

emerge emacs

3.4.2 Beamer \( \LaTeX \) - (bitbucket.org/rivanvx/beamer)

To create the presentation.

emerge dev-tex/latex-beamer app-text/texlive

3.4.3 Pygments - (pygments.org)

To color the source code (with minted).

emerge dev-python/pygments

4 Thanks and license

4.1 Thanks

Thanks go to the writers of emacs and org-mode, and for this guide in particular to the authors of the org-beamer tutorial on worg.

Thank you for your great work!

This presentation is licensed under the GPL (v3 or later) with the additional permission to distribute it without the sources and the copy of the GPL if you give a link to those.1

Footnotes:

1 : \tiny As additional permission under GNU GPL version 3 section 7, you may distribute these works without the copy of the GNU GPL normally required by section 4, provided you include a license notice and a URL through which recipients can access the Corresponding Source and the copy of the GNU GPL.\normalsize

Attachments:
emacs-org-beamer-recipes-thumnail.png (8.92 KB)
emacs-org-beamer-recipes-thumnail-org.png (20.61 KB)
2012-08-08-Mi-recipes-for-beamer-latex-presentation-using-emacs-org-mode.pdf (247.11 KB)
2012-08-08-Mi-recipes-for-beamer-latex-presentation-using-emacs-org-mode.org (12.18 KB)

Sending email to many people with Emacs Wanderlust

I recently needed to send an email to many people1.

Putting all of them into the BCC field did not work (mail rejected by provider) and when I split it into 2 emails, many did not see my mail because it was flagged as potential spam (they were not in the To-Field)2.

I did not want to put them all into the To-Field, because that would have spread their email-addresses around, which many would not want3.

So I needed a different solution. Which I found in the extensibility of emacs and wanderlust4. It now carries the name wl-draft-send-to-multiple-receivers-from-buffer.

You simply write the email as usual via wl-draft, then put all email addresses you want to write to into a buffer and call M-x wl-draft-send-to-multiple-receivers-from-buffer. It asks you about the buffer with email addresses, then shows you all addresses and asks for confirmation.

Then it sends one email after the other, with a randomized wait of 0-10 seconds between messages to avoid flagging as spam.

If you want to use it, just add the following to your .emacs:

(require 'cl) ; for the `loop` macro used below
(defun wl-draft-clean-mail-address (address)
  (replace-regexp-in-string "," "" address))
(defun wl-draft-send-to-multiple-receivers (addresses)
  (loop for address in addresses
        do (progn
             (wl-user-agent-insert-header "To" (wl-draft-clean-mail-address address))
             (let ((wl-interactive-send nil))
               (wl-draft-send))
             (sleep-for (random 10)))))
(defun wl-draft-send-to-multiple-receivers-from-buffer (&optional addresses-buffer-name)
  "Send a mail to multiple recipients - one recipient at a time."
  (interactive "BBuffer with one address per line")
  (let ((addresses nil))
    (with-current-buffer addresses-buffer-name
      (setq addresses (split-string (buffer-string) "\n")))
    (if (y-or-n-p (concat "Send this mail to " (mapconcat 'identity addresses ", ")))
        (wl-draft-send-to-multiple-receivers addresses))))

Happy Hacking!


  1. The email was about the birth of my second child, and I wanted to inform all people I care about (of whom I have the email address), which amounted to 220 recipients. 

  2. Naturally this technique could be used for real spamming, but to be frank: People who send spam won’t need it. They will already have much more sophisticated methods. This little trick just reduces the inconvenience brought upon us by the measures which are necessary due to spam. Otherwise I could just send a mail with 1000 receivers in the BCC field - which is how it should be. 

  3. It only needs one careless friend, and your connections to others get tracked in facebook and the likes. For more information on Facebook, see Stallman about Facebook

  4. Sure, there are also template mails and all such, but learning to use these would consume just as much time as extending emacs - and would be much less flexible: Should I need other ways to transform my mails, I’ll be able to just reuse my code. 

Simple Emacs DarkRoom

I just realized that I let myself be distracted by all kinds of not-so-useful stuff instead of finally getting to type the text I already wanted to transcribe from stenografic at the beginning of … last week.

Screenshot!

Let’s take a break for a screenshot of the final version, because that’s what we want from any program :)

Emacs darkroom, screenshot

As you can see, the distractions are removed — the screenshot is completely full screen and only the text is left. If you switch to the minibuffer (i.e. via M-x), the status bar (modeline) is shown.

Background

To remove the distractions I looked again at WriteRoom and DarkRoom and similar programs which show just the text I want to write. More exactly: I thought about looking at them again, but on second thought I decided to see if I could not just customize emacs to do the same, backed with all the power you get from several decades of being THE editor for many great hackers.

It took some googling and reading emacs wiki, and then some Lisp-hacking, but finally it’s 4 o’clock in the morning and I’m writing this in my own darkroom mode1, toggled on and off by just hitting F11.

Implementation

I build on hide-mode-line (livejournal post or webonastick) as well as the full-screen info in the emacs wiki.

The whole code just takes 76 lines of code plus 26 lines of comments and whitespace:

;;;; Activate distraction free editing with F11

; hide mode line, from http://dse.livejournal.com/66834.html / http://webonastick.com
(autoload 'hide-mode-line "hide-mode-line" nil t)
; word counting
(require 'wc)

(defun count-words-and-characters-buffer ()
  "Display the number of words and characters in the current buffer."
  (interactive)
  (message (concat "The current buffer contains "
           (number-to-string
            (wc-non-interactive (point-min) (point-max)))
           " words and "
           (number-to-string 
            (- (point-max) (point-min)))
           " letters.")))

; fullscreen, taken from http://www.emacswiki.org/emacs/FullScreen#toc26
; should work for X and OSX with emacs 23.x (TODO find minimum version).
; for windows it uses (w32-send-sys-command #xf030) (#xf030 == 61488)
(defvar babcore-fullscreen-p t "Check if fullscreen is on or off")
(setq babcore-stored-frame-width nil)
(setq babcore-stored-frame-height nil)

(defun babcore-non-fullscreen ()
  (interactive)
  (if (fboundp 'w32-send-sys-command)
      ;; WM_SYSCOMMAND restore #xf120
      (w32-send-sys-command 61728)
    (progn (set-frame-parameter nil 'width 
                                (if babcore-stored-frame-width
                                    babcore-stored-frame-width 82))
           (set-frame-parameter nil 'height
                                (if babcore-stored-frame-height 
                                    babcore-stored-frame-height 42))
           (set-frame-parameter nil 'fullscreen nil))))

(defun babcore-fullscreen ()
  (interactive)
  (setq babcore-stored-frame-width (frame-width))
  (setq babcore-stored-frame-height (frame-height))
  (if (fboundp 'w32-send-sys-command)
      ;; WM_SYSCOMMAND maximize #xf030
      (w32-send-sys-command 61488)
    (set-frame-parameter nil 'fullscreen 'fullboth)))

(defun toggle-fullscreen ()
  (interactive)
  (setq babcore-fullscreen-p (not babcore-fullscreen-p))
  (if babcore-fullscreen-p
      (babcore-non-fullscreen)
    (babcore-fullscreen)))

(global-set-key [f11] 'toggle-fullscreen)

; simple darkroom with fullscreen, fringe, mode-line, menu-bar and scroll-bar hiding.
(defvar darkroom-enabled nil)
; TODO: Find out if menu bar is enabled when entering darkroom. If yes: reenable.
(defvar darkroom-menu-bar-enabled nil)

(defun toggle-darkroom ()
  (interactive)
  (if (not darkroom-enabled)
      (setq darkroom-enabled t)
    (setq darkroom-enabled nil))
  (hide-mode-line)
  (if darkroom-enabled
      (progn
        (toggle-fullscreen)
        ; if the menu bar was enabled, reenable it when disabling darkroom
        (if menu-bar-mode
            (setq darkroom-menu-bar-enabled t)
          (setq darkroom-menu-bar-enabled nil))
        ; save the frame configuration to be able to restore to the exact previous state.
        (if darkroom-menu-bar-enabled
            (menu-bar-mode -1))
        (scroll-bar-mode -1)
        (let ((fringe-width 
               (* (window-width (get-largest-window)) 
                  (/ (- 1 0.61803) (1+ (count-windows)))))
              (char-width-pixels 6))
        ; 8 pixels is the default, 6 is the average char width in pixels
        ; for some fonts:
        ; http://www.gnu.org/software/emacs/manual/html_node/emacs/Fonts.html
           (set-fringe-mode (truncate (* fringe-width char-width-pixels))))
    
        (add-hook 'after-save-hook 'count-words-and-characters-buffer))
    
    (progn 
      (if darkroom-menu-bar-enabled
          (menu-bar-mode))
      (scroll-bar-mode t)
      (set-fringe-mode nil)
      (remove-hook 'after-save-hook 'count-words-and-characters-buffer)
      (toggle-fullscreen))))

; Activate with M-F11 -> enhanced fullscreen :)
(global-set-key [M-f11] 'toggle-darkroom)

(provide 'activate-darkroom)

Also I now activated cua-mode to make it easier to interact with other programs: C-c and C-x now copy/cut when the mark is active. Otherwise they are the usual prefix keys. To force them to be the prefix keys, I can use control-shift-c/-x. I thought this would disturb me, but it does not.

To make it faster, I also told cua-mode to have a maximum delay of 5 ms (0.005 s), so I don’t feel the delay. Essentially I just put this in my ~/.emacs:

(cua-mode t)
(setq cua-prefix-override-inhibit-delay 0.005)

Epilog

Well, did this get me to transcribe the text? Not really, since I spent the time building my own DarkRoom/WriteRoom, but I enjoyed the little hacking and it might help me get it done tomorrow - and get far more other stuff done.

And it is really fun to write in DarkRoom mode ;)

PS: If you like the simple darkroom, please leave a comment!

I hereby declare that anyone is allowed to use this post and the screenshot under the same licensing as if it had been written in emacswiki.


  1. Actually there already is a darkroom mode, but it only works for Windows. If you use that platform, you might enjoy it anyway. So you might want to call this mode “simple darkroom”, or darkroom x11 :) 

Attachments:
2011-01-22-emacs-darkroom.png (97.37 KB)

Staying sane with Emacs (when facing drudge work)

I have to sift through 6 really boring config files. To stay sane, I call in Emacs for support.

My task looks like this:

(screenshot of the task in Emacs; full size in the attachments below)

In the lower left window I check the identifier in the table I have to complete (left column), then I search for all instances of that identifier in the right window and insert the instrument type, the SIGMA (uncertainty due to representation error defined for the type of the instrument and the location of the site) and note whether the site is marked as assimilated in the config file.

Then I also check all the other config files and note whether the site is assimilated there.

Drudge work. There are people who can do this kind of work. My wife would likely be able to do it faster without tool support than I can do it with tool support. But I’m really bad at that: When the task gets too boring I tend to get distracted - for example by writing this article.

To get the task done anyway, I create tools which make it more enjoyable. And with Emacs that’s actually quite easy, because Emacs provides most required tools out of the box.

First off: My workflow before adding tools was like this:

  • Hit C-x o to switch from the lower left window to the config file at the right.
  • Use M-x occur, then type the station identifier. This displays all occurrences of the station identifier within the config file in the upper left window.
  • Hit C-x o twice to switch to the lower left window again.
  • Type the information into the file.
  • Switch to the next line and repeat the process.

I now want to simplify this to a single command per line. I’ll use F9 as the key, because it isn’t yet used for other things in my Emacs setup, because it is easy to reach, and because it is my default keybinding for a “useful shortcut for this file”. Other single-keystroke options would be F7 and F8. All other F-keys are already used :)

To make this easy, I define a macro:

  • Move to the line above the line I want to edit.
  • Start Macro-recording with C-x C-(.
  • Go to the beginning of the next line with C-n and C-a.
  • Activate the mark with C-SPACE and select the whole identifier with M-f.
  • Make the identifier lowercase with M-x downcase-region, copy it with M-w and undo the downcasing with C-x u (or use the undo key; I defined one in my xmodmap).
  • Switch to the config file with C-x o
  • Search the buffer with M-x occur, inserting the identifier with C-y.
  • Hit C-x o C-x o (yes, twice) to get back into the list of sites.
  • Move to the end of the instrument column with M-f and kill the word with C-BACKSPACE.
  • Save the macro with C-x C-).
  • Bind kmacro-call-macro to F9 with M-x local-set-key F9 kmacro-call-macro.

Done.
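
The same binding can of course be set from Lisp instead of via M-x; a minimal sketch:

; bind the last recorded keyboard macro to F9 in the buffer's local keymap
(local-set-key (kbd "<f9>") #'kmacro-call-macro)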

My workflow is now reduced to this:

  • Hit F9
  • Type the information.
  • Repeat.

I’m pretty sure that this will save me more time today than I spent writing this text ☺

Happy hacking!

Attachments:
2015-01-26-sane-with-emacs-task.png (79.85 KB)
2015-01-26-sane-with-emacs-task-200.png (7.79 KB)
2015-01-26-sane-with-emacs-task-300.png (15.92 KB)
2015-01-26-sane-with-emacs-task-400.png (27.28 KB)
2015-01-26-sane-with-emacs-task-450.png (33.83 KB)

Tutorial: Writing scientific papers for ACPD using emacs org-mode

PDF-version (for printing)

orgmode-version (for editing)

Emacs Org mode is an excellent tool for reproducible research,1 but research is only relevant if people learn about it.2 To reach people with scientific work, you need to publish your results in a Journal, so I show here how to publish in ACPD with Emacs Org mode.3

1 Requirements

To use this tutorial, you need

  • a fairly recent version of org-mode (8.0 or later - not yet shipped with emacs 24.3, so you will need to install it separately) and naturally
  • Emacs. Also you need to download the
  • copernicus latex package. And it can’t hurt to have a look at the latex-instructions from ACP. I used them to create my setup.
  • lineno.sty. This is required by copernicus, but not included in the package - and neither in the texlive version I use.

2 Basic Setup

2.1 Emacs

The first step in publishing to ACPD is to activate org-mode and latex export and to create a latex-class in Emacs. To do so, just add the following to your ~/.emacs (or ~/.emacs.d/init.el) and eval it (for example by moving to the closing parenthesis and typing C-x C-e):

  (require 'org)
  (require 'org-latex)
  (require 'ox-latex)
  (setq org-latex-packages-alist 
        (quote (("" "color" t) ("" "minted" t) ("" "parskip" t)))
        org-latex-pdf-process 
        (quote (
"pdflatex -interaction nonstopmode -shell-escape -output-directory %o %f" 
"bibtex $(basename %b)" 
"pdflatex -interaction nonstopmode -shell-escape -output-directory %o %f" 
"pdflatex -interaction nonstopmode -shell-escape -output-directory %o %f")))
  (add-to-list 'org-latex-classes
               `("copernicus_discussions"
                 "\\documentclass{copernicus_discussions}
               [NO-DEFAULT-PACKAGES]
               [PACKAGES]
               [EXTRA]"
                 ("\\section{%s}" . "\\section*{%s}")
                 ("\\subsection{%s}" "\\newpage" "\\subsection*{%s}" "\\newpage")
                 ("\\subsubsection{%s}" . "\\subsubsection*{%s}")
                 ("\\paragraph{%s}" . "\\paragraph*{%s}")
                 ("\\subparagraph{%s}" . "\\subparagraph*{%s}"))
               )

This allows you to use #+Latex_Class: copernicus_discussions in your org-mode file to set the PDF to export for ACPD.

Also you will likely want to use reftex for nice bibtex integration. To get it, add the following to your ~/.emacs or ~/.emacs.d/init.el:

(require 'reftex-cite)
(defun org-mode-reftex-setup ()
  (interactive)
  (and (buffer-file-name) (file-exists-p (buffer-file-name))
       (progn
        ; Reftex should use the org file as master file. See C-h v TeX-master for infos.
        (setq TeX-master t)
        (turn-on-reftex)
        ; enable auto-revert-mode to update reftex when bibtex file changes on disk
        (global-auto-revert-mode t) ; careful: this can kill the undo
                                    ; history when you change the file
                                    ; on-disk.
        (reftex-parse-all)
        ; add a custom reftex cite format to insert links
        ; This also changes any call to org-citation!
        (reftex-set-cite-format
         '((?c . "\\citet{%l}") ; natbib inline text
           (?i . "\\citep{%l}") ; natbib with parens
           ))))
  (define-key org-mode-map (kbd "C-c )") 'reftex-citation)
  (define-key org-mode-map (kbd "C-c (") 'org-mode-reftex-search))

(add-hook 'org-mode-hook 'org-mode-reftex-setup)

This binds reftex-citation to C-c ), sets some reftex-defaults and adds a cite-format menu which allows you to choose \citep{} instead of \cite{} (this is what ACPD requires).

For nice source-code highlighting, you should also install Pygments and then add the following to your ~/.emacs or ~/.emacs.d/init.el:

(add-to-list 'org-latex-packages-alist '("" "minted"))
(add-to-list 'org-latex-packages-alist '("" "color"))
(setq org-latex-listings 'minted)

; add emacs lisp support for minted
(setq org-latex-custom-lang-environments
      '((emacs-lisp "common-lispcode")))

2.2 The working folder

As next step, unzip the copernicus latex package in the folder you want to use for writing your article (do use a dedicated folder for that: org-mode leaves around some files). And remember to use a version-tracking system like Mercurial, so you can always take snapshots of your current state.

This will give you the following files:

  • authblk.sty
  • copernicus.bst
  • copernicus_discussions.cls
  • natbib.sty
  • pdfscreen.sty
  • pdfscreencop.sty

Ensure that all of them are in your folder, not in a subfolder. If necessary copy them there.

Also get lineno.sty and copy it into your folder.

If you want to use unicode-symbols in your text, add uniinput.sty, too.

3 The org-mode document

Using the ACPD style requires some deviations from the standard org-mode export process. Luckily org-mode is flexible to adapt to them. Setup your document as follows:

#+title: YOUR TITLE
#+Options: toc:nil ^:nil
#+BIND: org-latex-title-command ""
#+Latex_Class: copernicus_discussions
#+LaTeX_CLASS_OPTIONS: [acpd, hvmath, online]

# Nice code-blocks
#+BEGIN_SRC elisp :noweb no-export :exports results
  (setq org-latex-minted-options
    '(("bgcolor" "mintedbg") ("frame" "single") ("framesep" "6pt") 
      ("mathescape" "true") ("fontsize" "\\footnotesize")))
  nil
#+END_SRC

#+BEGIN_ABSTRACT
Abstract
#+END_ABSTRACT
#+TOC: headlines 2

#+Latex: \runningtitle{SHORT TITLE}
#+Latex: \runningauthor{SHORT AUTHOR}
#+Latex: \correspondence{AUTHOR NAME\\ EMAIL}
#+Latex: \affil{YOUR UNIVERSITY}
#+Latex: \author[2,*]{SECOND AUTHOR}
#+Latex: \author[1]{THIRD AUTHOR SAME INSTITUTE}
#+Latex: \affil[2]{SECOND UNIVERSITY}
#+Latex: \affil[*]{now at: THIRD UNIVERSITY}

#+Latex: \received{}
#+Latex: \pubdiscuss{}
#+Latex: \revised{}
#+Latex: \accepted{}
#+Latex: \published{}
#+Latex: %% These dates will be inserted by ACPD
#+Latex: \firstpage{1}

#+Latex: \maketitle

#+Latex: \introduction
# * Introduction

* Second section

* Discussion

#+Latex: \conclusions
# * Conclusions

#+Latex: \appendix

# use acknowledgements for multiple
#+BEGIN_acknowledgement
Foo Bar Baz.
#+END_acknowledgement

#+Latex: \bibliographystyle{copernicus}
#+Latex: \bibliography{ABSOLUTE_PATH_TO_YOUR_BIBTEX_FILE_WITHOUT_.bib_SUFFIX}{}

# Local Variables:
# org-confirm-babel-evaluate: nil
# org-export-allow-bind-keywords: t
# End:

Let’s look at this in more detail.

3.1 Use the LaTeX class

As first step, we set the LaTeX class. In the options we select the journal (acpd) and such - you can find the detailed options in the latex-instructions from ACP.

#+Latex_Class: copernicus_discussions
#+LaTeX_CLASS_OPTIONS: [acpd, hvmath, online]

3.2 Delayed table of contents

The table of contents is set to be shown after the Abstract by setting the toc:nil option and later explicitly calling #+TOC: headlines 2. In org-mode this is really straightforward.

3.3 Delayed maketitle

Delaying \maketitle is a bit more convoluted than delaying the TOC. First we add the local variable org-export-allow-bind-keywords: t at the bottom to allow file-local custom bindings for functions in the file, then we deactivate the title-command with #+BIND: org-latex-title-command "" and finally we add \maketitle where we need it.

3.4 Define minted style

This defines the variables minted uses for beautiful code-blocks. Without this, your code-blocks will just look like inline text.

#+BEGIN_SRC elisp :noweb no-export :exports results
  (setq org-latex-minted-options
    '(("bgcolor" "mintedbg") ("frame" "single") ("framesep" "6pt") 
      ("mathescape" "true") ("fontsize" "\\footnotesize")))
  nil
#+END_SRC

3.5 Intro and conclusions

The Introduction and the conclusions have their own commands in ACPD, because they use them to add bookmarks. You can also use the commands to specify another name.

We call the commands with #+LaTeX: (just like some others), which allows us to explicitly add arbitrary LaTeX-code.

3.6 Appendix

The appendix should be used sparingly. It changes the numbering of the pages.

#+Latex: \appendix

3.7 Bibliography

The bibliography allows referring to entries from your general bibtex-file. Ensure that you use the correct absolute path to that file. For more information, see the org-tutorial page for biblatex.

3.8 Babel evaluate without confirmation

This allows us to just run all code snippets which we embedded in the document when we export the file. If we do not set this local variable, we have to acknowledge each source block before it runs (the block with local variables also contains the variable which allows binding functions on a per-file basis, as explained above).

# Local Variables:
# org-confirm-babel-evaluate: nil
# org-export-allow-bind-keywords: t
# End:

4 Conclusion

With this setup, you can publish your paper with ACPD using org-mode for the actual writing, which has a much lower overhead than LaTeX and offers quite a few unique features for more efficient working - from easy referencing and inline math preview to planning and code-evaluation directly in your file.

Footnotes:

1

General methods for using Emacs org-mode in scientific publishing have been described by \citet{SchulteEmacs2012}.

2

Research, or rather science not only means to learn new things and to uncover secrets, but just as importantly to share what you learn. Fun fact: The German word for science is “Wissenschaft”, built from the words “wissen” (knowledge) and “schaft” (from schaffen: create), so it more exactly captures the essence of scientific work than the word “science”, that is based on the latin word “scientia” which just means knowledge. It isn’t enough to just learn. Creating knowledge requires telling it to others, so they can build upon it.

3

I chose ACPD as target for this article, because it is an Open Access journal, and because I want to publish in it (which makes it a rather natural choice for a tutorial).

Unicode char \u8:χ not set up for use with LaTeX: Solution (made easy with Emacs)

For years I regularly stumbled over LaTeX-Errors in the form of Unicode char \u8:χ not set up for use with LaTeX. I always took the chicken's path and replaced the unicode characters with the tex-escapes in the file. That was easy, but it made my files needlessly unreadable. Today I decided to FIX the problem once and for all. And it worked. Easily.

First off: The problem I’m facing is that my keyboard layout makes it effortless for me to input characters like ℂ Σ and χ. But LaTeX cannot cope with them out-of-the-box. Org-mode already catches most of these problems, so I can write things like x² instead of x^2, but occasionally it stumbles.

The solution to that is actually pretty simple: I only need to declare the escape-sequences LaTeX should use when it sees one of the characters (to be used before \begin{document}!):

\DeclareUnicodeCharacter{03C7}{\chi}

Or in org-mode:

#+LaTeX_HEADER: \DeclareUnicodeCharacter{03C7}{\chi}

Thanks go to Wikibooks:LaTeX for this. Their solution then suggests reading several Unicode definition documents to track down the codepoint of the character. But we can make that easier with Emacs (almost everything is easier with Emacs ☺).

Instead of browsing huge documents manually, we simply rely on the unicode-definitions in Emacs: Move the cursor over the char and execute M-x describe-char.

When used with χ, this shows the following output:

             position: 672 of 35513 (2%), column: 0
            character: χ (displayed as χ) (codepoint 967, #o1707, #x3c7)
    preferred charset: unicode-bmp (Unicode Basic Multilingual Plane (U+0000..U+FFFF))
code point in charset: 0x03C7
… (and a bit more) …

What we need is code point in charset: Just leave out the 0x and you have the codepoint.
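
If you hit this often, Emacs can also do the copying for you. Here is a small sketch (my own helper, not part of the original workflow) which reads the character at point and inserts a matching header line at the top of the file, leaving the LaTeX replacement for you to fill in:

(defun my-declare-unicode-char-at-point ()
  "Insert a DeclareUnicodeCharacter header for the character at point.
The LaTeX replacement inside the second pair of braces is left empty."
  (interactive)
  (let ((codepoint (format "%04X" (char-after))))
    (save-excursion
      (goto-char (point-min))
      (insert (format "#+LaTeX_HEADER: \\DeclareUnicodeCharacter{%s}{}\n"
                      codepoint)))))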

For the document I currently write, I now use the following definitions:

#+LaTeX_HEADER: \DeclareUnicodeCharacter{03C7}{\chi}
#+LaTeX_HEADER: \DeclareUnicodeCharacter{B2}{^{2}}

And that makes χ² work.

Happy Hacking - and have fun with Emacs Org-Mode!

Unicode-Characters for TODO-States in Emacs Orgmode

By default Emacs Orgmode uses uppercase words for todo keywords. But having tens of entries marked with TODO and DONE in my file looked horribly cluttered to me. So I searched for alternatives. After a few months of experimentation, I decided on the following scheme. It served me well ever since:

  • ❢ To do
  • ☯ In progress
    • ⚙ A program is running (optional detail)
    • ✍ I’m writing (optional detail)
  • ⧖ Waiting
  • ☺ To report
  • ✔ Done
  • ⌚ Maybe do this at some later time
  • ✘ Won’t do

To set this in org-mode, just add the following to the header (and reopen the document, for example with C-x C-v):

#+SEQ_TODO: ❢ ☯ ⧖ | ☺ ✔ ⌚ ✘

or for the complex case (with details on what I do)

#+SEQ_TODO: ❢ ☯ ⚙ ✍ ⧖ | ☺ ✔ ⌚ ✘

Then use C-c C-t or SHIFT-→ (shift + right arrow) to switch to the next state or SHIFT-← (shift + left arrow) to switch to the previous state.

Anything before the | in the SEQ_TODO is shown in red (not yet done), anything after the | is shown in green (done). Things which get triggered when something is done (like storing the time of a scheduled entry) happen when the state crosses the |.
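
If you want the same states in every org file instead of setting them per document, the equivalent .emacs setting is (a sketch using the complex sequence from above):

; global counterpart of the #+SEQ_TODO line; states before the "|"
; count as not done, states after it as done
(setq org-todo-keywords
      '((sequence "❢" "☯" "⚙" "✍" "⧖" "|" "☺" "✔" "⌚" "✘")))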

And with that, my orgmode documents are not only very useful but also look pretty lean. Just as good as having a GUI with images, but additionally I can access them over SSH and edit the todo state with any tool - because it’s just text.

Use the source, Luke! — Emacs org-mode beamer export with images in figure

I just needed to tweak my Emacs org-mode to beamer-latex export to embed images into a figure environment (not wrapfigure!). After lots of googling and documentation reading I decided to bite the bullet and just read the source. Which proved to be much easier than I had expected.

This tutorial requires at least org-mode 8.0 (before that you had to use hacks to get figure without a caption). It is only tested for org-mode 8.0.2: The code you see when you read the source might look different in other versions.

1 Task

I just needed to tweak my org-mode to beamer-latex export to embed images I produce by a codesnippet in a figure environment. Practically speaking: I had this

#+BEGIN_SRC sh :exports results :results output raw
echo '[[./image.png]]'
#+END_SRC

which produces this latex snippet

\includegraphics[width=.9\linewidth]{./image.png}

and I needed a snippet which instead produces this:

\begin{figure}[htb]
\centering
\includegraphics[width=.9\linewidth]{./image.png}
\end{figure}

2 Use the Source!

After lots of googling and documentation reading I decided to bite the bullet and just read the source. Which proved to be much easier than I had expected (warning: obscure list of commands follows. Will be explained afterwards):

C-h f org-latex-export-as-latex
C-x C-o
C-s .el C-b ENTER
C-s figure C-s C-s C-s ...

And less than a minute after starting, I saw this:

(float (let ((float (plist-get attr :float)))
     (cond ((string= float "wrap") 'wrap)
       ((string= float "multicolumn") 'multicolumn)
       ((or (string= float "figure")
            (org-element-property :caption parent))
        'figure))))

Translated: Just add this to the output of the source block:

#+attr_latex: :float figure

which makes the sh block look like this:

#+BEGIN_SRC sh :exports results :results output raw
echo '#+attr_latex: :float figure'
echo '[[./image.png]]'
#+END_SRC

And voila, the export works and the latex looks like this:

\begin{figure}[htb]
\centering
\includegraphics[width=.9\linewidth]{./image.png}
\end{figure}

Mission accomplished!

3 Commands Explained

For all those who are not fluent in emacs commands, here’s a short breakdown of my source-reading process:

C-h f org-latex-export-as-latex

Get the help (Control-h) for the function (f) org-latex-export-as-latex. I knew that org-mode calls that. If you did not know it, you could have simply used C-h k C-c C-e (get help on the export keyboard shortcut), which would have led you to the function org-export-dispatch and the source file ox.el. But since the org-mode guides tell you to use M-x org-latex-export-as-latex, the function to search for is actually pretty obvious. Alternatively just use M-x org-latex- and then hit TAB twice. That will show you all the export functions.

C-x C-o

Switch to the other buffer.

C-s .el C-b ENTER

Focus on the source file and open it (the canonical suffix for emacs lisp files is .el).

C-s figure C-s C-s C-s ...

Search for figure. Repeat 9 times to find the correct place in the code (in emacs that’s really easy and fast to do).

Voilà, you found the snippet which tells you that you can use the float-keyword (:float) with the argument "figure".

4 Conclusion

Using the source was actually faster than googling in this case - and if you practise it, you learn bits and pieces about the foundation of the program you use, which will enable you to adapt it even better to your needs in the future.

And with that, I conclude this text.

Enjoy your Emacs and Happy Hacking!

Attachments:
2013-08-28-Mi-use-the-source-beamer-figure.org (3.8 KB)

Using Macros to avoid tedious tasks (screencast)

Because I am lazy,1 and that makes me fast.

Screencast

(download (ogg theora video))

Using Macros to avoid tedious tasks

Plan

  • [X] Show the task
  • [X] Record Macro
  • [X] Use Macro

Explanation

I record a macro to find ~, then activate the mark and find a space.

C-s ~, C-SPACE, C-s SPACE

Then kill the region and type ${}

C-w ${}

That’s it.

Why??

  • It is resilient: I check each change I do.
  • I avoid repeating unnerving stuff.

Thank you

recorded with recordmydesktop: recordmydesktop --delay 10 --width 800 --height 600 --on-the-fly-encoding


  1. I have lots of stuff to do, so I cannot afford not being lazy ☺ 

Attachments:
using-emacs-macros-to-reduce-tedious-work-screencast.ogv (17.81 MB)
using-emacs-macros-to-reduce-tedious-work-screencast.org (397 Bytes)

Wish: KDE with Emacs-style keyboard shortcuts

I would love to be able to use KDE with emacs-style keyboard shortcuts, because Emacs offers a huge set of already clearly defined shortcuts for many different situations. Since its users tend to do very much with the keyboard alone, even more obscure tasks are available via shortcuts.

I think that this would be useful, because Emacs is like a kind of nongraphical desktop environment itself (just look at emacspeak!). For all those who use Emacs in a KDE environment, it could be a nice timesaver to be able to just use their accustomed bindings.

It also has a mostly clean structure for the bindings:

  • "C-x anything" does changes which affect things outside the content of the current buffer.
  • "C-c anything" is kept for specific actions of programs. For example "C-c C-c" in an email sends the email, while "C-c C-c" in a version tracking commit message finishes the message and starts the actual commit.
  • "C-anything but x or c" acts on the content you're currently editing.
  • "M-x" opens a 'command-selection-dialog' (just like alt-f2). You can run commands by name.
  • "M-anything but x" is a different flavor of "C-anything but x or c". For example "C-f" moves the cursor one character forward, while "M-f" moves one word forward. "C-v" moves one page forward, while "M-v" moves one page backwards.

On the backend side, this would require being able to define multistep shortcuts. Everything else is just porting the emacs shortcuts to KDE actions.

The actual porting of shortcuts would then require mapping of the Emacs commands to KDE actions.

Some examples:

  • "C-s" searches in a file. Replaces C-f.
  • "C-r" searches backwards.
  • "C-x C-s" saves a file -> close. Replaces C-w.
  • "C-x C-f" opens a file -> Open. Replaces C-o.
  • "C-x C-c" closes the program -> quit. Replaces C-q.
  • "C-x C-b" switches between buffers/files/tabs -> switch the open file. Replaces alt-right_arrow and a few other (to my knowledge) inconsistent bindings.
  • "C-x C-2" splits a window (or part of a window) vertically. "C-x C-o" switches between the parts. "C-x C-1" undoes the split and keeps the currently selected part. "C-x C-0" undoes the split and hides the currently selected part.

Write multiple images on a single page in org-mode.

How to show multiple images on one page in the latex-export of emacs org-mode. I had this problem. This is my current solution.


1 Prep

Use the package subfig:

#+latex_header: \usepackage{subfig}

And create an image:

import pylab as pl
import numpy as np
x = np.random.random(size=(2,1000))
pl.scatter(x[0,:], x[1,:], marker=".")
pl.savefig("test.png")
print "\label{fig:image}"
print "[[./test.png]]"

\label{fig:image} test.png

Image: \ref{fig:image}

2 Multiple images on one page in LaTeX

#+BEGIN_LaTeX
\begin{figure}\centering
\subfloat[A gull]{\label{fig:latex-gull} 
\includegraphics[width=0.3\textwidth]{test}
} 
\subfloat[A tiger]{\label{fig:latex-tiger} 
\includegraphics[width=0.3\textwidth]{test}
} 
\subfloat[A mouse]{\label{fig:latex-mouse} 
\includegraphics[width=0.3\textwidth]{test}
}
\caption{Multiple pictures}\label{fig:latex-animals}
\end{figure}
#+END_LaTeX

Latex-Animals \ref{fig:latex-animals}.

3 Multiple images on one page in org-mode

#+latex: \begin{figure}\centering
#+latex: \subfloat[A gull]{\label{fig:org-gull} 
#+attr_latex: :width 0.3\textwidth
[[./test.png]]
#+latex: }\subfloat[A tiger]{\label{fig:org-tiger} 
#+attr_latex: :width 0.3\textwidth
[[./test.png]]
#+latex: }\subfloat[A mouse]{\label{fig:org-mouse} 
#+attr_latex: :width 0.3\textwidth
[[./test.png]]
#+latex: }\caption{Multiple pictures}\label{fig:org-animals}
#+latex: \end{figure}

test.png

test.png

test.png

Org-Animals \ref{fig:org-animals}.

Attachments:
test.png (98.4 KB)
2014-01-14-Di-org-mode-multiple-images-per-page.pdf (281.84 KB)
2014-01-14-Di-org-mode-multiple-images-per-page.org (2.48 KB)
2014-01-14-Di-org-mode-multiple-images-per-page.org2.48 KB

emacs wanderlust.el setup for reading kmail maildir

This is my wanderlust.el file to read kmail maildirs. You need to define every folder you want to read.

;; mode:-*-emacs-lisp-*-
;; wanderlust 
(setq 
  elmo-maildir-folder-path "~/.kde/share/apps/kmail/mail"
          ;; where i store my mail

  wl-stay-folder-window t                       ;; show the folder pane (left)
  wl-folder-window-width 25                     ;; toggle on/off with 'i'
  
  wl-smtp-posting-server "smtp.web.de"            ;; put the smtp server here
  wl-local-domain "draketo.de"          ;; put something here...
  wl-message-id-domain "web.de"     ;; ...

file continued:

  wl-from "Arne Babenhauserheide "                  ;; my From:

  ;; note: all below are dirs (Maildirs) under elmo-maildir-folder-path 
  ;; the '.'-prefix is for marking them as maildirs
  wl-fcc ".sent-mail"                       ;; sent msgs go to the "sent"-folder
  wl-fcc-force-as-read t               ;; mark sent messages as read 
  wl-default-folder ".inbox"           ;; my main inbox 
  wl-draft-folder ".drafts"            ;; store drafts in 'postponed'
  wl-trash-folder ".trash"             ;; put trash in 'trash'
  wl-spam-folder ".gruppiert/Spam"              ;; ...spam as well
  wl-queue-folder ".queue"             ;; we don't use this

  ;; check this folder periodically, and update modeline
  wl-biff-check-folder-list '(".todo") ;; check every 180 seconds
                                       ;; (default: wl-biff-check-interval)

  ;; hide many fields from message buffers
  wl-message-ignored-field-list '("^.*:")
  wl-message-visible-field-list
  '("^\\(To\\|Cc\\):"
    "^Subject:"
    "^\\(From\\|Reply-To\\):"
    "^Organization:"
    "^Message-Id:"
    "^\\(Posted\\|Date\\):"
    )
  wl-message-sort-field-list
  '("^From"
    "^Organization:"
    "^X-Attribution:"
     "^Subject"
     "^Date"
     "^To"
     "^Cc"))


; Encryption via GnuPG

(require 'mailcrypt)
 (load-library "mailcrypt") ; provides "mc-setversion"
(mc-setversion "gpg")    ; for PGP 2.6 (default); also "5.0" and "gpg"

(autoload 'mc-install-write-mode "mailcrypt" nil t)
(autoload 'mc-install-read-mode "mailcrypt" nil t)
(add-hook 'mail-mode-hook 'mc-install-write-mode)

(add-hook 'wl-summary-mode-hook 'mc-install-read-mode)
(add-hook 'wl-mail-setup-hook 'mc-install-write-mode)

;(setq mc-pgp-keydir "~/.gnupg")
;(setq mc-pgp-path "gpg")
(setq mc-encrypt-for-me t)
(setq mc-pgp-user-id "FE96C404")

(defun mc-wl-verify-signature ()
  (interactive)
  (save-window-excursion
    (wl-summary-jump-to-current-message)
    (mc-verify)))

(defun mc-wl-decrypt-message ()
  (interactive)
  (save-window-excursion
    (wl-summary-jump-to-current-message)
    (let ((inhibit-read-only t))
      (mc-decrypt))))

(eval-after-load "mailcrypt"
  '(setq mc-modes-alist
       (append
        (quote
         ((wl-draft-mode (encrypt . mc-encrypt-message)
            (sign . mc-sign-message))
          (wl-summary-mode (decrypt . mc-wl-decrypt-message)
            (verify . mc-wl-verify-signature))))
        mc-modes-alist)))


; flowed text

 ;; Reading f=f
 (autoload 'fill-flowed "flow-fill")
 (add-hook 'mime-display-text/plain-hook
          (lambda ()
            (when (string= "flowed"
                           (cdr (assoc "format"
                                       (mime-content-type-parameters
                                        (mime-entity-content-type entity)))))
              (fill-flowed))))
; writing f=f
;(mime-edit-insert-tag "text" "plain" "; format=flowed")


(provide 'private-wanderlust)

UPDATE (2012-05-07): ~/.folders

I now use a ~/.folders file, to manage my non-kmail maildir subscriptions, too. It looks like this:

.sent-mail
.~/.local/share/mail/mgl_spam   "mgl spam" 
.~/.local/share/mail/to.arne_bab    "to arne_bab"
.inbox  "inbox" 
.trash  "Trash"
..gruppiert.directory/.inbox.directory/Freunde  "Freunde"
.drafts "Drafts"
..gruppiert.directory/.alt.directory/Posteingang-2011-09-18 "2011-09-18"
.outbox

The mail in ~/.local/share/mail is fetched via fetchmail and procmail to get a really reliable mail-fetching setup: one that keeps working even when a mail-client database breaks or the disk runs out of free space…

keep auto-complete from competing with org-mode structure-templates

For a long time it bothered me that auto-complete made it necessary for me to abort completion before being able to use org-mode templates.

I typed <s and auto-complete showed stuff like <string, forcing me to hit C-g before I could use TAB to complete the template with org-mode.

I fixed this for me by adding all the org-mode structure templates as stop-words:

;; avoid competing with org-mode templates.
(require 'cl) ;; provides the `loop' macro used below
(add-hook 'org-mode-hook
          (lambda ()
            (make-local-variable 'ac-stop-words)
            (loop for template in org-structure-template-alist do
                  (add-to-list 'ac-stop-words 
                               (concat "<" (car template))))))

Note that with this snippet you will have to reopen a file if you add a new org-mode structure template and want it recognized as a stop word in that file.

PS: I reported this as a bug to auto-complete, so with some luck you might not have to bother with this if you’re willing to simply wait for the next release ☺

Free Software

„Free, Reliable, Ethical and Efficient“
„Frei, Robust, Ethisch und Innovativ”
„Libre, Inagotable, Bravo, Racional y Encantado“

Articles connected to Free Software (mostly as defined by the GNU Project). This is more technical than Politics and Free Licensing, though there is some overlap.

Also see my lists of articles about specific free software projects:

  • Emacs - THE Editor.
  • Freenet - Decentralized, Anonymous Communication.
  • Mercurial - Decentralized Version Control System.

There is also a German version of this page: Freie Software. Most articles are not translated, so the content of the German page and the English page is very different.

For me, Gentoo is about *convenient* choice

It's often said that Gentoo is all about choice, but that doesn't quite capture what it is for me.

After all, the greatest freedom of choice comes from Linux From Scratch, and I can get any amount of choice in every distribution by just going deep enough (and investing enough time).

What really distinguishes Gentoo for me is that it makes it convenient to choose.

Since we all have a limited time budget, many of us only have real freedom to choose because we use Gentoo, which makes choosing possible with the distribution's own tools. Therefore just calling it “choice” doesn't ring true in general - it misses the reason why we can choose.

So what Gentoo gives me is not just choice, but convenient choice.

Some examples to illustrate the point:

KDE 4 without qt3

I recently rebuilt my system after deciding to switch my disk layout (away from reiserfs towards a simple ext3 with reiser4 for the portage tree). When doing so I decided to try to use a "pure" KDE 4 - that means, a KDE 4 without any remains from KDE3 or qt3.

To use KDE without any qt3 applications, I just had to put "-qt3" and "-qt3support" into my USE flags in /etc/make.conf and run "emerge -uDN world" (and solve any arising conflicts).
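For illustration, the change in /etc/make.conf boils down to something like this (a sketch; <your existing flags> is a placeholder for whatever your USE line already contains):

USE="<your existing flags> -qt3 -qt3support"

After that, the "emerge -uDN world" mentioned above rebuilds the affected packages.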

Imagine doing the same with a (K)Ubuntu...

Emacs support

Similarly, to enable Emacs support on my GentooXO (for all programs which can have Emacs support), I just had to add the "emacs" USE flag and run "emerge -uDN world".

Selecting which licenses to use

Just add

ACCEPT_LICENSE="-* @FSF-APPROVED @FSF-APPROVED-OTHER"

to your /etc/make.conf to make sure you only get software under licenses which are approved by the FSF.

For only free licenses (regardless of the approved state) you can use:

ACCEPT_LICENSE="-* @FREE"

All others get marked as masked by license. The default (no ACCEPT_LICENSE in /etc/make.conf) is “* -@EULA”: everything except packages whose license requires accepting an EULA. You can check your setting via emerge --info | grep ACCEPT_LICENSE. More information…

One program (suite) in testing, but the main system rock stable

Another part where choosing is made convenient in Gentoo are testing and unstable programs.

I remember my pain with a Kubuntu, where I wanted to use the most recent version of Amarok. I either had to add a dedicated Amarok-only testing repository (which I'd need for every single testing program), or I had to switch my whole system into testing. I did the latter and my graphical package manager ceased to work. Just imagine how quickly I ran back to Gentoo.

And then have a look at the ease of deciding to take one package into testing in Gentoo:

  • emerge --autounmask-write =category/package-version
  • etc-update
  • emerge =category/package-version

EDIT: Once I had a note here “It would be nice to be able to just add the missing dependencies with one call”. This is now possible with --autounmask-write.

And for some special parts (like KDE 4) I can easily say something like

  • ln -s /usr/portage/local/layman/kde-testing/Documentation/package.keywords/kde-4.3.keywords /etc/portage/package.keywords/kde-4.3.keywords

(I don't have the kde-testing overlay on my GentooXO, where I write this post, so the exact command might vary slightly)

Closing remarks

So to finish this post: For me, Gentoo is not only about choice. It is about convenient choice.

And that means: Gentoo gives everybody the power to choose.

I hope you enjoy it as I do!

Automatic updates in Gentoo GNU/Linux

To keep my Gentoo up to date, I use daily and weekly update scripts which also always run revdep-rebuild after the Saturday night update :)

My daily update is via pkgcore to pull in all important security updates:

pmerge @glsa

That pulls in the Gentoo Linux Security Advisories - important updates with mostly short compile time. (You need pkgcore for that: "emerge pkgcore")

Also I use two cron scripts.

Note: It might be useful to add the lafilefixer to these scripts (source).

The following is my daily update (in /etc/cron.daily/update_glsa_programs.cron )

Daily Cron

#! /bin/sh

### Update the portage tree and the glsa packages via pkgcore

# spew a status message
echo $(date) "start to update GLSA" >> /tmp/cron-update.log

# Sync only portage
pmaint sync /usr/portage

# security relevant programs
pmerge -uDN @glsa > /tmp/cron-update-pkgcore-last.log || cat \
    /tmp/cron-update-pkgcore-last.log >> /tmp/cron-update.log

# And keep everything working
revdep-rebuild

# Finally update all configs which can be updated automatically
cfg-update -au

echo $(date) "finished updating GLSA" >> /tmp/cron-update.log

And here's my weekly cron - executed every saturday night (in /etc/cron.weekly/update_installed_programs.cron ):

Weekly Cron

#!/bin/sh

### Update my computer using pkgcore,
### since that also works if some dependencies couldn't be resolved.

# Sync all overlays
eix-sync

## First use pkgcore
# security relevant programs (with build-time dependencies (-B))
pmerge -BuD @glsa

# system, world and all the rest
pmerge -BuD @system
pmerge -BuD @world
pmerge -BuD @installed

# Then use portage for packages pkgcore misses (including overlays)
# and for *EMERGE_DEFAULT_OPTS="--keep-going"* in make.conf
emerge -uD @security
emerge -uD @system
emerge -uD @world
emerge -uD @installed

# And keep everything working
emerge @preserved-rebuild
revdep-rebuild

# Finally update all configs which can be updated automatically
cfg-update -au

pkgcore vs. eix → pix (find packages in Gentoo)

For a long time it bugged me that eix uses a separate database which I need to keep up to date. But no longer: with pkgcore as fast as it is today, I set up pquery to replace eix.

The result is pix:

alias pix='pquery --raw -nv --attr=keywords'

(put the above in your ~/.bashrc)

The output looks like this:

$ pix pkgcore
 * sys-apps/pkgcore
    versions: 0.5.11.6 0.5.11.7
    installed: 0.5.11.7
    repo: gentoo
    description: pkgcore package manager
    homepage: http://www.pkgcore.org
    keywords: ~alpha ~amd64 ~arm ~hppa ~ia64 ~ppc ~ppc64 ~s390 ~sh ~sparc ~x86

It’s still a bit slower than eix, but it operates directly on the portage tree and my overlays — and I no longer have to use eix-sync for syncing my overlays, just to make sure eix is updated.

Some other treats of pkgcore

Aside from pquery, pkgcore also offers pmerge to install packages (almost the same syntax as emerge) and pmaint for synchronizing and other maintenance stuff.

From my experience, pmerge is hellishly fast for simple installs like pmerge kde-misc/pyrad, but it sometimes breaks with world updates. In that case I just fall back on portage. Both are Python, so when you have one, adding the other is very cheap (spacewise).

Also pmerge has the nice pmerge @glsa feature: get the Gentoo Linux security updates. Due to its almost unreal speed (compared to portage), checking for security updates no longer hurts.

$ time pmerge -p @glsa
 * Resolving...
Nothing to merge.

real    0m1.863s
user    0m1.463s
sys     0m0.100s

It differs from portage in that you call world explicitly as a set — either via a command like pmerge -aus world or via pmerge -au @world.

pmaint on the other hand is my new overlay and tree synchronizer. Just call pmaint sync to sync all, or pmaint sync /usr/portage to sync only the given overlay (in this case the portage tree).

Caveats

Using pix as replacement of eix isn’t yet perfect. You might hit some of the following:

  • pix always shows all packages in the tree and the overlays. The keywords are only valid for the highest version, though. marienz from #pkgcore on irc.freenode.net is working on fixing that.

  • If you only want to see the packages which you can install right away, just use pquery -nv. pix is intended to mimic eix as closely as possible, so I don’t have to change my habits ;) If it doesn’t fit your needs, just change the alias.

  • To search only in your installed packages, you can use pquery --vdb -nv.

  • Sometimes pquery might miss something in very broken overlay setups (like my very grown one). In that case, please report the error in the bugtracker or at #pkgcore on irc.freenode.net:

    23:27 <marienz> if they're reported on irc they're probably either fixed pretty quickly or they're forgotten
    23:27 <marienz> if they're reported in the tracker they're harder to forget but it may take longer before they're noticed

I hope my text helps you in changing your Gentoo system further towards the system which fits you best!

No, it ain’t “forever” (GNU Hurd code_swarm from 1991 to 2010)

If the video doesn’t show, you can also download it as Ogg Theora & Vorbis “.ogv” or find it on youtube.

This video shows the activity of the Hurd coders and answers some common questions about the Hurd, including “How stagnated is Hurd compared to Duke Nukem Forever?”. It is created directly from commits to Hurd repositories, processed by community codeswarm.

Every shimmering dot is a change to a file. These dots align around the coder who made the change. The questions and answers are quotes from today’s IRC discussions (2010-07-13) in #hurd at irc.freenode.net.

You can clearly see the influx of developers in 2003/2004 and then again a strengthening of the development in 2008, with fewer participants but higher activity than in 2003 (though part of that change likely comes from the switch to git, with generally more but smaller commits).

I hope you enjoyed this high-level look at the activity of the Hurd project!

PS: The last part is only the information title with music to honor Sean Wright for allowing everyone to use and adapt his Album Enchanted.

Some technical advantages of the Hurd

→ An answer to just accept it, truth hurds, where Flameeyes gave his reasons for not liking the Hurd and asked for technical advantages (and claimed that the Hurd does not offer a concept which got incorporated into other free software and thereby contributed to other projects). Note: These are the points I see. Very likely there are more technical advantages which I don’t see well enough to explain them.

Information for potential testers: The Hurd is already usable, but it is not yet in production state. It progressed a lot during the recent years, though. Have a look at the status report if you want to see if it’s already interesting for you. See running the Hurd for testing it yourself.

Thanks for explaining your reasons. As answer:

Influence on other systems: FUSE in Linux and limited translators in BSD

First off: FUSE is essentially an implementation of parts of the translator system (which is the main building block of the Hurd) for Linux, and NetBSD recently got a port of the Hurd’s translator system. That’s the main contribution to other projects that I see.

translator-based filesystem

On the bare technical side, the translator-based filesystem stands out: the filesystem allows making arbitrary programs responsible for displaying a given node (which can also be a directory tree) and starting these programs on demand. To make them persistent over reboots, you only need to add them to the filesystem node (for which you need the right to change that node). You can also start translators on any node without changing the node itself; then they are not persistent and only affect your view of the filesystem, without affecting other users. These translators are called active, and you don’t need write permissions on a node to add them.
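A rough sketch of the two variants with the standard settrans tool (the node and device paths here are just placeholders):

settrans /mnt /hurd/ext2fs /dev/hd0s1      # passive: recorded in the node, survives reboots
settrans -a ~/mnt /hurd/ext2fs /dev/hd0s1  # active: affects only your view, not recorded in the node
showtrans /mnt                             # show the passive translator recorded on a node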

network transparency on the filesystem level

The filesystem implements stuff like Gnome VFS (gvfs) and KDE network transparency on the filesystem level, so those are available for all programs. And you can add a new filesystem as simple user, just as if you’d write into a file “instead of this node, show the filesystem you get by interpreting file X with filesystem Y” (this is what you actually do when setting a translator but not yet starting it (passive translator)).

One practical advantage of this is that the following works:

settrans -a ftp\: /hurd/hostmux /hurd/ftpfs /
dpkg -i ftp://ftp.gnu.org/path/to/*.deb

This installs all deb-packages in the folder path/to on the FTP server. The shell sees normal directories (beginning with the directory “ftp:”), so shell expressions just work.

You could even define a Gentoo mirror translator (settrans mirror\: /hurd/gentoo-mirror), so every program could just access mirror://gentoo/portage-2.2.0_alpha31.tar.bz2 and get the data from a mirror automatically: wget mirror://gentoo/portage-2.2.0_alpha31.tar.bz2

unionmount as user

Or you could add a unionmount translator to root which makes writes happen at another place. Every user is able to make a readonly system readwrite by just specifying where the writes should go. But the writes only affect his view of the filesystem.

persistent translators, started when needed

Starting a network process is done by a translator, too: The first time something accesses the network card, the network translator starts up and actually provides the device. This replaces most initscripts in the Hurd: Just add a translator to a node, and the service will persist over restarts.

It’s a surprisingly simple concept, which reduces the complexity of many basic tasks needed for desktop systems.

And at its most basic level, Hurd is a set of protocols for messages which allow using the filesystem to coordinate and connect processes (along with helper libraries to make that easy).

add permissions at runtime (capabilities)

Also it adds POSIX compatibility to Mach while still providing access to the capabilities-based access rights underneath, if you need them: You can give a process permissions at runtime and take them away at will. For example you can start all programs without permission to use the network (or write to any file) and add the permissions when you need them.

Different from Linux, you do not need to start privileged and drop the permissions you do not need (governed by the program which is run); instead you start as an unprivileged process and add the permissions you need (governed by an external process):

groups # → root
addauth -p $(ps -L) -g mail
groups # → root mail 

lightweight virtualization

And then there are subhurds (essentially lightweight virtualization which allows cutting off processes from other processes without the overhead of creating a virtual machine for each process). But that’s an entire post of its own…

Easy to test lowlevel hacking

And the fact that a translator is just a simple standalone program means that these can be shared and tested much more easily, opening up completely new options for lowlevel hacking, because it massively lowers the barrier of entry.

For example the current Hurd can use the Linux network device drivers and run them in userspace (via DDE), so you can simply restart them and a crashing driver won’t bring down your system.

subdividing memory management

And then there is the possibility of subdividing memory management and using different microkernels (by porting the Hurd layer, as partly done in the NetBSD port), but that is purely academic right now (search for Viengoos to see what it’s about).

Summary

So in short:

The translator system in the Hurd is a simple concept which makes many tasks easy that are complex with Linux (like init, network transparency, new filesystems, …). Additionally there are capabilities (give programs only the access they need - at runtime), subhurds and (academic) memory management.

Best wishes,
Arne

PS: I decided to read flameeyes’ post as “please give me technical reasons to dispel my emotional impression”.

PPS: If you liked this post, it would be cool if you’d flattr it: Flattr this

PPPS: Additional information can be found in Gaël Le Mignot’s talk notes, in niches for the Hurd and the GNU Hurd documentation pages.

P4S: This post is also available in the Hurd Staging Wiki.

P5S: As an update in 2015: A pretty interesting development I saw in the past few years is that the systemd developers have been bolting features onto Linux which the Hurd already provided 15 years ago. Examples: socket activation provides on-demand startup like passive translators, but as a crude hack piggybacked on dbus which can only be used by dbus-aware programs, while passive translators can be used by any program which can access the filesystem; calling privileged programs via systemd provides jailed privilege escalation like adding capabilities at runtime, but again as a crude hack piggybacked on dbus and specialized services.

That means there is a need for the features of the Hurd, but instead of just using the Hurd, where they are cleanly integrated, these features are bolted onto a system where they do not fit and suffer from bad performance, because lots of unnecessary cruft is needed to circumvent limitations of the base system. The clean solution would be to set 2-3 full-time developers onto the task of resolving the last few blockers (mainly sound and USB) and then just use the Hurd.

(A)GPL as hack on a Python-powered copyright system

AGPL is a hack on copyright, so it has to use copyright, else it would not compile/run.

All the GPL licenses are a hack on copyright. They insert a piece of legal code into copyright law to force it to turn around on itself.

You run that on the copyright system, and it gives you code which can’t be made unfree.

To be able to do that, it has to be written in copyright language (else it could not be interpreted).

my_code = "<your code>"

def AGPL ( code ): 
    """
    >>> is_free ( AGPL ( code ) )
    True
    """
    return eval (
        transform_to_free ( code ) )

copyright ( AGPL ( my_code ) )

You pass “AGPL ( code )” to the copyright system, and it ensures the freedom of the code.

The transformation means that I am allowed to change your code, as long as I keep the transformation, because copyright law sees only the version transformed by AGPL, and that stays valid.

Naturally both the AGPL definition and the code transformed to free © must be ©-compatible. And that means: all rights reserved. Otherwise I could go in and say: I just redefine AGPL and make your code unfree without ever touching the code itself (which is initially owned by you by the laws of ©):

def AGPL ( code ): 
    """ 
    >>> is_free ( AGPL ( code ) )
    False
    """
    return eval (
        transform_to_mine ( code ) )

In this Python-powered copyright system, I could just define this after your definition but before your call to copyright(), and all calls to AGPL ( code ) would suddenly return code owned by me.

Or you would have to include another way of defining which exact AGPL you mean. Something like “AGPL, but only the versions with the sha1 hashes AAAA, BBBB and AABA”. CC tries to use links for that, but what do you do if someone changes the DNS resolution to point creativecommons.org to allmine.com? Whose DNS server is right, then - legally speaking?
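Staying within the Python metaphor, pinning the exact license text by hash might look roughly like this (purely illustrative; the hash value is a placeholder):

import hashlib

PINNED_AGPL_SHA1 = "aaaabbbbaaba"  # placeholder for the hash of the license text you mean

def apply_license (license_text, code):
    # refuse any redefined “AGPL” whose text does not match the pinned hash
    if hashlib.sha1 (license_text.encode ("utf-8")).hexdigest () != PINNED_AGPL_SHA1:
        raise ValueError ("not the AGPL version this code was released under")
    return AGPL (code)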

In short: AGPL is a hack on copyright, so it has to use copyright, else it would not compile/run.

Communicating your project: honest marketing for free software projects

Communicating your project is an essential step for getting users. Here I summarize my experience from working on several different projects including KDE (where I learned the basics of PR - yay, sebas!), the Hurd (where I could really make a difference by improving the frontpage and writing the Month of the Hurd), Mercurial (where I practiced the minimally invasive PR) and 1d6 (my own free RPG where I see how much harder it is to do PR, if the project to communicate is your own).

Since voicing the claim that marketing is important often leads to discussions with people who hate marketing of any kind, I added an appendix with an example which illustrates nicely what happens when you don’t do any PR - and what happens if you do PR of the wrong kind.

If you’re pressed for time and want the really short form, just jump to the questionnaire.

What is good marketing?

Before we jump directly to the guide, there is an important term to define: good marketing. That is the kind of marketing we want to do.

The definition I use here is this:

Good marketing ensures that the people to whom a project would be useful learn about the project.

and

Good marketing starts with the existing strengths of a project and finds people to whom these strengths are useful.

Thus good marketing does not try to interfere with the greater plan of the project, though it might identify some points where a little effort can make the project much more interesting to users. Instead it finds users to whom the project as it is can be useful - and ensures that these know about the project.

Be fair to competitors, be honest to users, put the project goals before generic marketing considerations.

As such, good marketing is an interface between the project and its (potential) users.

How to communicate your project?

This guide depends on one condition: Your project already has at least one area in which it excels over other projects. If that isn’t the case, please start by making your project useful to at least some people.

The basic way for communicating your project to its potential users always follows the same steps.

To make this text easier to follow, I’ll intersperse it with examples from the latest project where I did this analysis: GNU Guile: The GNU Ubiquitous Intelligent Language for Extensions. Guile provides a nice example, because its mission is clearly established in its name and it has lots of backing, but up until our discussion it actually had a Wikipedia page which was unappealing to the point of being hostile towards Guile itself.

To improve the communication of our project, we first identify our target groups.

Who are our Target Groups?

To do so, we begin by asking ourselves who would profit from our project:

  • What can we do well and how do we compare to others?
  • To whom would we already be useful or interesting if people knew about our strengths?
  • To whom are we already the best option?

Try to find about 3 groups of people and give them names which identify them. Those are the people we must reach to grow in the short term.

In the next step, we ask ourselves whom we want or need as users to fulfill our mission (our long-term goal):

  • Where do we want to get? What is our goal? (do we have a mission statement?)
  • Whom do we need to get there?
  • Whom do we want as users? Those shape us as they take part in the development - either as users or as fellow developers.

Again try to find about 3 groups of people and give them names which identify them. Those are the people we must reach to achieve our long-term goal. If, while writing this down, you find that one of the already identified groups we could reach would actually lead us away from our goal, mark them. If they aren’t direly needed, we do best to avoid targeting them in our communication, because they will hinder our long-term progress: they could become a liability which we cannot get rid of again.

Now we have about 6 target groups: those are the people who should know about our project, either because they would benefit from it for pursuing their goals, or because we need to reach them to achieve our own goals. We now need to find out which kind of information they actually need or search for.

Example: Target Groups for Guile

GNU Guile is called The GNU Ubiquitous Intelligent Language for Extensions. So its mission is clear: Guile wants to become the de-facto standard language for extending programs - at least within the GNU project.

For whom are we already useful or interesting? Name them as Target-Groups.
  1. Schemer: Wants to see what GNU Scheme can do.
  2. Extender: GNU enthusiast wants to extend an existing program with a scripting language.
  3. Learner: Free Software enthusiast thinks about using Guile to learn programming
  4. Project-Starter: Experienced Programmer wants to start a new project.
  5. 1337: Programmer wants the coolness-factor.
  6. Emacser: Emacs users want to see what the potential future of Emacs would hold.
Whom do we want as users on the long run? Name them as Target-Groups.

  7. GNU-folk: All GNU developers.

What could they ask?

This part just requires thinking ourselves into the role of each of the target groups. For each of the target groups, ask yourself:

What would you want to know, if you were to read about our project?

As a result of this step, we have a set of answers. Judge them on their strengths: would these answers make you want to invest time to test our project? If not, can we find a better answer?

Example: Questions for the Target-Groups of Guile

  1. Schemer: What can guile do better than other Schemes?
  2. Extender: What does Guile offer my program? Why Guile and not Python/Lua?
  3. Learner: How easy and how powerful is Guile Scheme? Why Guile and not Python?
  4. Starter: What’s the advantage of starting my advanced project with guile?
  5. 1337: Why is guile cool?
  6. Emacser: What does Guile offer for Emacs?
  7. GNU-folk: What does Guile offer my program? (Being a GNU package is a distinct advantage, so there is less competition by non-GNU languages)

Whose wishes can we fulfill?

If our answers for a given group are not yet strong enough, we cannot yet communicate our project convincingly to them. In that case it is best to postpone reaching out to that group; otherwise they could get a lasting weak image of our project, which would make it harder to reach them once we have stronger answers at some point in the future.

Remove all groups whose wishes we cannot yet fulfill, or for whom we do not see ourselves as the best choice.

Example: Chosen Target-Groups

  1. Schemer: Guile is a solid implementation of Scheme. For a comparison, see An opinionated Guide to Scheme implementations.
  2. Extender: The Guile manual offers a nicely detailed guide for extending a program with Guile. We’re a bit weak on the examples and existing extensions, though, especially on non-GNU platforms.
  3. Learner: There aren’t yet tutorials for learning to program in Guile, though there are tutorials for learning to write Scheme - and even one for understanding Scheme from the view of a Python user. But our project resources cannot yet adequately support people who cannot program at all, so we have to restrict ourselves to programmers who want to learn a new language.
  4. Starter: Guile has solid support for many unix-specific things, but it is not yet a complete project-publishing solution. So we have to restrict ourselves to targeting people who want to start a project which is mainly intended to be used in environments with proper package management (mostly GNU/Linux).
  5. 1337: Guile is explicitly named in the GNU Coding Standards. It doesn’t get much cooler than that - at least for a certain idea of cool. We can’t get the Java-1337s, but we can get the Free-Software-1337s.
  6. Emacser: Guile provides foreign-function-call. If guile gets used as base for Emacs, Emacs users get direct access to all scheme functions, too - as well as real threading. And that’s pretty strong. Also Geiser provides solid Guile Scheme support in Emacs.
  7. GNU-folk: They are either extenders or project starters or learners, so we don’t need to treat them as their own group.

Provide those answers!

Now we have answers for the target groups. When we now talk or write about our project, we should keep those target groups in mind.

You can make that arbitrarily complex, for example by trying to find out which of our target groups use which medium. But let’s keep it simple:

Ensure that our website (and potentially existing wikipedia page) includes the information which matters to our target groups. Just take all the answers for all the target groups we can already reach and check whether the basic information contained in them is given on the front page of our website.

And if not, find ways to add it.

As next steps, we can make sure that the questions we found for the target groups not only get answered, but directly lead the target groups to actions: For example to start using our project.

Example: The new Wikipedia-Page of Guile

For Guile, we used this analysis to fix the Wikipedia page. The old version mainly talked about history and weaknesses (to the point of sounding hostile towards Guile), and aside from the latest release number, it was horribly outdated. And it did not provide the information our target groups required.

The current Wikipedia page of GNU Guile works much better - for the project as well as for the readers of the page. Just compare them directly and you’ll see quite a difference. But aside from sounding nicer, the new page also addresses the questions of our target groups. To check that, we now ask: did we include information for all the potential user groups?

  1. Schemers: Yepp (it’s Scheme, and there’s a section on Guile Scheme).
  2. Extenders: Yepp (libguile)
  3. Learners: Not yet. We might need a syntax-section with some examples. But wikipedians do not like Howto-Like sections. Also the interpreter should get a notice.
  4. Project-Starters: Partly in the “core idea”-part in the section Guile Scheme. It might need one more paragraph showing advantages of Guile which make it especially suited for that.
  5. 1337s: It is the preferred extension system for the GNU Project. If you’re not that kind of 1337: The Macro-System is hygienic (no surprising side-effects).
  6. Emacs users: They got their own section.

So there you go: Not perfect, but most of the groups are covered. And this also ensures that the Wikipedia-page is more interesting to its readers: A clear win-win.

Further points

Additional points which we should keep in mind:

  • On the website, do all of our target groups quickly find their way to advanced information about their questions? This is essential to keep the ones interested who aren’t completely taken by the short answers.
  • What is a common negative misconception about our project? We need to ensure that we do not write anything which strengthens this misconception. Is there an existing strength, which we can show to counter the negative misconception?
  • Where do we want to go? Do we have a mission statement?

bab-com q: Arne Babenhauserheide’s Project Communication Questionnaire

  • For whom are we already useful or interesting? Name them as Target-Groups.

    • (1)
    • (2)
    • (3)
  • Whom do we want as users on the long run? Name them as Target-Groups.

    • (4)
    • (5)
    • (6)
  • What could the Target-Groups ask? What are their needs? Formulate them as questions.
    • (1)
    • (2)
    • (3)
    • (4)
    • (5)
    • (6)
  • Answer their questions.
    • (1)
    • (2)
    • (3)
    • (4)
    • (5)
    • (6)
  • Whose needs can we already fulfill well? For whom do we see ourselves as the best choice?
    • (1)
    • (2)
    • (3)
    • (4)
  • Ensure that our communication includes the answers to these questions (i.e. website, wikipedia page, talks, …), at least for the groups who are likely to use the medium on which we communicate!

Use bab-com to avoid bad-com ☺

Note: The mission statement

The mission statement is a short paragraph in which a project defines its goal.

A good example is:

Our mission is to create a general-purpose kernel suitable for the GNU operating system, which is viable for everyday use, and gives users and programs as much control over their computing environment as possible. (GNU Hurd mission explained)

Another example again comes from Guile:

Guile was conceived by the GNU Project following the fantastic success of Emacs Lisp as an extension language within Emacs. Just as Emacs Lisp allowed complete and unanticipated applications to be written within the Emacs environment, the idea was that Guile should do the same for other GNU Project applications. This remains true today. (Guile and the GNU project)

Closely tied to the mission statement is the slogan: a catch-phrase which helps anchor the gist of your project in your readers’ minds. Guile does not have one yet, but judging from its strengths, the following could work quite well for Guile 2.0 - though it falls short of Guile in general:

GNU Guile scripting: Use Guile Scheme, reuse anything.

Summary

We saw why it is essential to communicate the project to the outside, and we discussed a simple structure to check whether our way of communication actually fits our project’s strengths and goals.

Finding the communication strategy actually boils down to 3 steps:

  • Target those who would profit from our project or whom we need.
  • Check what they need to know.
  • Answer that.

Also a clear mission statement, slogan and project description help to make the project more tangible for readers. In this context, good marketing means to ensure that people learn about the real strengths of the project.

With that I’ll conclude this guide. Have fun and happy hacking!
— Arne Babenhauserheide


Appendix: Why communicating your project?

In free software we often think that quality is a guarantee for success. But in just the 10 years I have been using free software, I have seen my share of technically great projects succumb to inferior projects which simply reached more people and used that to build a dynamic which greatly outpaced the technically better product.

One example of that is the story of pkgcore and paludis. When portage, the package manager of Gentoo, grew too slow because it did ever more extensive tests, two teams set out to build a replacement.

One of the teams decided that the fault for the low performance lay with Python, the language used by portage. That team built a package manager in C++ which had --wonderfully-long-command-options without shortcuts (have fun typing), and you actually had to run it twice: once to see what would get installed and then again to actually install it (while portage had had an --ask option for ages, with -a as shortcut). And it forgot all the work it had done in the previous run, so you could wait twice as long for the result. They also had wonderful Latin names, and they managed the feat of being even slower than portage, despite being written in C++. So their claim that C++ would be magically faster than Python was simply wrong. They called their program paludis.

Note: Nowadays paludis has a completely new commandline interface which actually supports short command options. That interface is called cave and looks sane.

The other team did a performance analysis and realized that the low performance actually lay with the filesystem: the portage tree, which holds the required information, contains about 30,000 ebuilds and almost 200,000 files in total, and portage accessed far more of those files than actually needed for resolving the dependencies required to install a package. They picked Python as their language - just like portage. They used almost the same commandline options as portage, except in the places where functionality differed. And they actually got orders of magnitude faster than portage - so fast that their search command often finished in less than a second, while portage took over 10 seconds. They called their program pkgcore.

Both had more exact resolution of packages and could break cyclic dependencies and so on.

So, judging from my account of the quality, which project would you expect to succeed?

I sure expected pkgcore to replace portage within a few months. But this is not what happened. And as I see it in hindsight, the difference lay purely in PR.

The paludis team with their slow and hard-to-use program went all over the Gentoo forums claiming that Python is a horrible language and that a C++ program would beat portage any time. On their website they repeated their attacks against Python and claimed superiority at every step. And they gathered quite a few zealots - while actually being slower than portage. Eventually they rebranded paludis as just better and more correct, not faster. And they created their own distribution (exherbo) as a direct rival of Gentoo, with a new, portage-incompatible package format. As if they had read the book on how not to be a friendly competitor.

The pkgcore team on the other hand focussed on good technology. They created the snakeoil library for high-performance Python code, but they were friendly about it and actually contributed back to portage where code could be shared. But their website was out of date, often not noting the newest release, and you actually had to run pmerge --help to see the most current commandline options (though you could simply guess them if you knew portage). And they got attacked by paludis zealots so much that this year the main developer finally gave up on the project: he told me on IRC that he had taken so much vitriol over the years that it simply wasn’t worth the cost anymore.

So, what can we learn from this? Technical superiority does not gain you anything, if you fail to convince people to actually use your project.

If you don't communicate your project, you don't get users. If you don’t get users, your chances of losing motivation are orders of magnitude higher than if you get users who support you.

And aggressive marketing works, even if you cannot actually deliver on your promises. Today they have a better user interface and even short option names. But even to date, exherbo has far fewer packages in its repositories than Gentoo: if the number of files is any measure, the 10,000 files in their special repositories are just about 5% of the almost 200,000 files portage holds. Still, they managed quite well to fracture the Gentoo user base - at least for some time. And their repeated pushes for new standards in the portage tree (EAPIs) created constant pressure on pkgcore to adapt, which had the effect that nowadays pkgcore cannot install from the portage tree anymore (the search still works, though, and I still use it - I will curse mightily on the day they manage to break that, too).

So aggressive marketing and doing everything in the book of unfriendly competition might have allowed the paludis devs to gather some users and destroy the momentum of pkgcore, but it did not allow them to actually become a replacement of portage within Gentoo. Their behaviour alienated far too many people for that. So aggressive and unfriendly marketing is better than no marketing, but it has severe drawbacks which you will likely want to avoid.

If you use overly aggressive, unfriendly or dishonest communication tactics, you get some users, but if your users know their stuff, you won’t win the mindshare you need to actually make a difference.

If on the other hand you want to see communication done right, just take a look at KDE and Gnome nowadays. They cooperate quite well, and they compete on features and by improving their project so users can take an informed choice about the project they choose.

And their number of contributors steadily keeps growing.

So what do they do? Besides being technically great, it boils down to good marketing.

Download one page from a website with all its prerequisites

Often I want to simply backup a single page from a website. Until now I always had half-working solutions, but today I found one solution using wget which works really well, and I decided to document it here. That way I won’t have to search it again, and you, dear readers, can benefit from it, too ☺

In short: This is the command:

wget --no-parent --timestamping --convert-links --page-requisites --no-directories --no-host-directories --span-hosts --adjust-extension --no-check-certificate -e robots=off -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4' [URL]

Optionally add --directory-prefix=[target-folder-name]

(see the meaning of the options and getting wget for some explanation)

That’s it! Have fun copying single sites! (but before passing them on, ensure that you have the right to do it)

Does this really work?

As a test, how about running this:

wget -np -N -k -p -nd -nH -H -E --no-check-certificate -e robots=off -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4' --directory-prefix=download-web-site http://draketo.de/english/download-web-page-with-all-prerequisites

(this command uses the short forms of the options)

Then test the downloaded page with firefox:

firefox download-web-site/download-web-page-with-all-prerequisites.html

Getting wget

If you run GNU/Linux, you likely already have it - and if not, then your package manager has it. GNU wget is one of the standard tools available everywhere.

Some information in the (sadly) typically terse style can be found on the wget website from the GNU project: gnu.org/s/wget.

In case you run Windows, have a look at Wget for Windows from the gnuwin32 project or at GNU Wget for Windows from eternallybored.

Alternatively you can get a graphical interface via WinWGet from cybershade.

Or you can get serious about having good tools and install MSYS or Cygwin - the latter gets you some of the functionality of a unix working environment on windows, including wget.

If you run MacOSX, either get wget via fink, homebrew or MacPorts, or follow the guide from osxdaily or the German guide from dirk (likely there are more guides - these two were just the first hits on Google).

The meaning of the options (and why you need them):

  • --no-parent: Only get this file, not other articles higher up in the filesystem hierarchy.
  • --timestamping: Only get newer files (don’t redownload files).
  • --page-requisites: Get all files needed to display this page.
  • --convert-links: Change files to point to the local files you downloaded.
  • --no-directories: Do not create directories: Put all files into one folder.
  • --no-host-directories: Do not create separate directories per web host: Really put all files in one folder.
  • --span-hosts: Get files from any host, not just the one with which you reached the website.
  • --adjust-extension: Add a .html extension to the file.
  • --no-check-certificate: Do not check SSL certificates. This is necessary if you’re missing one of the host certificates one of the hosts uses. Just use this. If people with enough power to snoop on your browsing would want to serve you a changed website, they could simply use one of the fake certifications authorities they control.
  • -e robots=off: Ignore robots.txt files which tell you to not spider and save this website. You are no robot, but wget does not know that, so you have to tell it.
  • -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4': Fake being an old Firefox to avoid blocking based on being wget.
  • --directory-prefix=[target-folder-name]: Save the files into a subfolder to avoid having to create the folder first. Without that options, all files are created in the folder in which your shell is at the moment. Equivalent to mkdir [target-folder-name]; cd [target-folder-name]; [wget without --directory-prefix]

Conclusion

If you know the required options, mirroring single pages from websites with wget is fast and easy.

Note that if you want to get the whole website, you can just replace --no-parent with --mirror.
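With the short options from the test command above, that substitution looks roughly like this (a sketch; the URL is a placeholder):

wget --mirror -N -k -p -nd -nH -H -E --no-check-certificate -e robots=off -U 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.6) Gecko/20070802 SeaMonkey/1.1.4' --directory-prefix=download-web-site http://example.com/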

Happy Hacking!

Fix Quod Libet empty panes on Gentoo GNU/Linux (bug solving process)

PDF-version (for printing)

orgmode-version (for editing)

For a few days now my Quod Libet has been broken, showing only empty space instead of information panes.

2013-12-11-quod-libet-broken.png

I investigated halfheartedly, but did not find the cause with quick googling. Today I decided to change that. I document my path here, because I did not yet write about how I actually tackle problems like these - and I think I would have profited from having a writeup like this when I started, instead of having to learn it by trial-and-error.

Update: Quodlibet 2.6.3 is now in the Gentoo portage tree - using my ebuild. The update works seamlessly. So to get your Quodlibet 2.5 running again, just call emerge =media-sound/quodlibet-2.6.3 =media-plugins/quodlibet-plugins-2.6.3. Happy Hacking!

Update: I got a second reply in the bug tracker which solved the plugins problem: I had user-plugins which require Quod Libet 3. Solution: mv ~/.quodlibet/plugins ~/.quodlibet/plugins.for-ql3. Quod Libet works completely again.

Solution for the impatient: Update to Quod Libet 2.5.1. In Gentoo that’s easy.

1 Gathering Information

As starting point I installed the Quod Libet plugins (media-libs/quodlibet-plugins), thinking that the separation between plugins and mediaplayer might not be perfect. That did not fix the problem, but a look at the plugin listing gave me nice backtraces:

2013-12-11-quod-libet-broken-plugins.png

And these actually show the reason for the breakage: Cannot import GTK:

Traceback (most recent call last):
  File "/home/arne/.quodlibet/plugins/songsmenu/albumart.py", line 51, in <module>
    from gi.repository import Gtk, Pango, GLib, Gdk, GdkPixbuf
  File "/usr/lib64/python2.7/site-packages/gi/__init__.py", line 27, in <module>
    from ._gi import _API, Repository
ImportError: cannot import name _API

Let’s look which package this file belongs to:

equery belongs /usr/lib64/python2.7/site-packages/gi/__init__.py
 * Searching for /usr/lib64/python2.7/site-packages/gi/__init__.py ... 
dev-python/pygobject-3.8.3 (/usr/lib64/python2.7/site-packages/gi/__init__.py)

So I finally have an answer: pygobject changed the API. Can’t be hard to fix… (a realization process follows)

2 The solution-hunting process

  • let’s check the Gentoo forums for pygobject
  • pygobject now pulls systemd??? - and they wonder why I’m pissed off by systemd: hugely invasive changes just for some small packages… KDE gets rid of the monolithic approach, and now Gnome starts it, just much more invasive into the basic structure of all distros?
  • set the USE flag -systemd to avoid systemd (why didn’t I have that yet? I guess I did not expect that Gentoo would push that on me…)
  • check when I updated pygobject:
qlop -l pygobject
...
Thu Dec  5 00:26:27 2013 >>> dev-python/pygobject-3.8.3
  • a week ago - that fits the timeframe. Damn… pygobject-3.8.3, you have to go.
echo =dev-python/pygobject-3.8.3 >> /usr/portage/package.mask
emerge -u pygobject
  • hm, no, the backtrace was for the plugin, but when I start Quod Libet from the shell, I see this:
LANG=C quodlibet
/usr/lib64/python2.7/site-packages/quodlibet/qltk/songlist.py:44: GtkWarning: Unable to locate theme engine in module_path: "clearlooks",
  _label = gtk.Label().create_pango_layout("")
  • emerge x11-themes/clearlooks-phenix to get clearlooks again. Looks nicer now, but still not fixed.

2013-12-11-quod-libet-broken-clearlooks.png

  • back to the drawing board. Let’s tackle this pygobject thing: emerge -C =dev-python/pygobject-3.8.3, emerge -1 =dev-python/pygobject-2.28.6-r55.
  • not fixed. OK… let’s report a bug: empty information panes (screenshots attached).

3 The core solution

In the bug report at Quod Libet I got a reply: known issue with Quod Libet 2.5, “which triggered a bug in a recent pygtk release, resulting in lists not showing”. The plugins seem to be unrelated. Solution to my immediate problem: update to 2.5.1. That’s not yet in Gentoo, but this is easy to fix:

cd /usr/portage/media-sound/
# create the category in my local portage overlay, defined as
# PORTDIR_OVERLAY=/usr/local/portage in /etc/make.conf
mkdir -p /usr/local/portage/media-sound
# copy over the quodlibet directory, keeping the permissions with -p
cp -rp quodlibet /usr/local/portage/media-sound
# most times it is enough to simply rename the ebuild to the new version
cd /usr/local/portage/media-sound/quodlibet
mv quodlibet-2.5.ebuild quodlibet-2.5.1.ebuild
# now prepare all the metadata portage needs - this requires
# app-portage/gentoolkit
ebuild quodlibet-2.5.1.ebuild digest compile 
# now it's prepared for the package manager. Just update it as usual:
emerge -u quodlibet

I wrote the solution in the Gentoo bug report. I should also state that the Gentoo package for Quod Libet is generally out of date (releases 2.6.3 and 3.0.2 are not yet in the tree).

Quod Libet works again.

2013-12-11-quod-libet-fixed.png

As soon as the ebuild in the portage tree is renamed, Quod Libet should work again for all Gentoo users.

The plugins still need to be fixed, but I’ll worry about that later.

4 Conclusion

Solving the core problem took me some time, but it wasn’t really complicated. The part of the solution process which got me forward boils down to:

  • checking the project bug tracker,
  • checking the distribution bug tracker,
  • reporting a bug for the project with the information I could gather - including screenshots (or anything else which shows the problem directly - see How to Report Bugs Effectively for hints on that), and
  • checking the reported bug again a few hours or days later - and synchronizing the information between the project bug tracker and the distribution bug tracker to help fixing the bug for all users of the distribution and of other distributions.

And that’s it: To get something working again, check the bug trackers, report bugs and help synchronizing bug tracker info.

Attachments (size):
  • 2013-12-11-quod-libet-broken.png (49.59 KB)
  • 2013-12-11-quod-libet-broken-clearlooks.png (50.44 KB)
  • 2013-12-11-quod-libet-broken-plugins.png (27.47 KB)
  • 2013-12-11-quod-libet-fixed.png (85.61 KB)
  • 2013-12-11-Mi-quodlibet-broken.org (7.11 KB)
  • 2013-12-11-Mi-quodlibet-broken.pdf (419.37 KB)

GnuPG/PGP signature, short explanation

»What is the .asc file?« This explanation is intended to be copied as-is into emails when someone asks about your signature.

The .asc file is a signature which can be used to verify that the email was really sent by me and wasn’t tampered with.[1] It can be verified with standard email security tools like Enigmail[2], Gpg4win[3] or MacGPG[4] - and others tools supporting OpenPGP[5].

Best wishes,
Arne

[1]: For further information on signatures see
    https://www.gnupg.org/gph/en/manual/x135.html

[2]: Enigmail enables secure communication in Thunderbird:
    https://addons.mozilla.org/de/thunderbird/addon/enigmail/

[3]: GPG4win provides secure encryption for Windows:
    http://gpg4win.org/download.html

[4]: MacGPG provides encryption for MacOSX:
    https://gpgtools.org/

[5]: Encryption for other systems is available from the GnuPG website:
    https://www.gnupg.org/download/
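If you want to check such a signature manually outside a mail program, GnuPG can verify the detached signature directly on the command line (a sketch; the file names are placeholders):

    gpg --verify message.asc message

GnuPG then reports whether the signature is good and which key made it.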

Going from a simple Makefile to Autotools


Intro

I recently started looking into Autotools, to make it easier to run my code on multiple platforms.

Naturally you can use cmake or scons or waf or ninja or tup, all of which are interesting in their own respect. But none of them has seen the amount of testing which went into autotools, and none of them has the amount of tweaks needed to support about every system under the sun. And I recently found pyconfigure, which allows using autotools with Python and offers detection of library features.

I had already used Makefiles for easily storing the build information of anything from python projects (python setup.py build) to my PhD thesis with all the required graphs.

I also had used scons for those same tasks.

But I wanted to test what autotools have to offer. And I found no simple guide which showed me how to migrate from a Makefile to autotools - and what I could gain through that.

So I decided to write one.

My Makefile

The starting point is the Makefile I use for building my PhD. That’s pretty generic and just uses the most basic features of make.

If you do not know it yet: A basic makefile has really simple syntax:

# comments start with #
thing : required source files # separated by spaces
    build command
    second build command
# ^ this is a TAB.

The code above is a rule. If you put a file with this content into some folder using the filename Makefile and then run make thing in that folder (in a shell), the program “make” will check whether the source files have been changed after it last created the thing and if they have been changed, it will execute the build commands.

You can use things from other rules as source file for your thing and make will figure out all the tasks needed to create your thing.

My Makefile below creates plots from data and then builds a PDF from an org-mode file.

all: doktorarbeit.pdf sink.pdf

sink.pdf : sink.tex images/comp-t3-s07-tem-boas.png images/comp-t3-s07-tem-bona.png images/bona-marble.png images/boas-marble.png
    pdflatex sink.tex
    rm -f  *_flymake* flymake* *.log *.out *.toc *.aux *.snm *.nav *.vrb # kill litter

comp-t3-s07-tem-boas.png comp-t3-s07-tem-bona.png : nee-comp.pyx nee-comp.txt
    pyxplot nee-comp.pyx

doktorarbeit.pdf : doktorarbeit.org
    emacs --batch --visit "doktorarbeit.org" --funcall org-export-as-pdf  

Feature Equality

The first step is simple: How can I replicate with autotools what I did with the plain Makefile?

For that I create the files configure.ac and Makefile.am. The basic Makefile.am is simply my Makefile without any changes.

The configure.ac sets the project name, inits automake and declares that a Makefile should be generated (by the configure script which autoreconf creates).

dnl run `autoreconf -i` to generate a configure script. 
dnl Then run ./configure to generate a Makefile.
dnl Finally run make to generate the project.

AC_INIT([Doktorarbeit Inverse GHG], [0.1], [arne.babenhauserheide@kit.edu])
dnl we use the build type foreign here instead of gnu because I do not have a NEWS file and similar, yet.
AM_INIT_AUTOMAKE([foreign])
AC_CONFIG_FILES([Makefile])
AC_OUTPUT

Now, if I run `autoreconf -i` followed by `./configure`, a Makefile gets generated for me. Nothing fancy here: The Makefile just does what my old Makefile did.

But it is much bigger, offers real --help output and can generate a distribution - which does not work yet, because it is missing the source files. But it clearly tells me that with `make distcheck`.

make dist: distributing the project

Since `make dist` does not work yet, let’s change that.

… easier said than done. It took me the better part of a day to figure out how to make it happy. Problems there:

  • I have to explicitly give automake the list of sources so it can copy them to the distributed package.
  • distcheck uses a separate build dir. Yes, this is the clean way, but it needs some hacking to get everything to work.
  • I use pyxplot for generating some plots. Pyxplot does not have a way (that I know of) to search for datafiles in a different folder. I have to copy the files to the build dir and kill them after the build. But only if I use a separate build dir.
  • pdflatex can’t find included images. I have to adapt the TEXINPUTS environment variable to give it the srcdir as an additional search path.
  • Some of my commands litter the build directory with temporary or intermediate files. I have to clean them up.

So, after much haggling with autotools, I have a working make distcheck:

pdf_DATA = sink.pdf doktorarbeit.pdf

sink = sink.tex
pkgdata_DATA = images/comp-t3-s07-tem-boas.png images/comp-t3-s07-tem-bona.png
dist_pkgdata_DATA = images/bona-marble.png images/boas-marble.png

plotdir = .
dist_plot_DATA = nee-comp.pyx nee-comp.txt

doktorarbeit = doktorarbeit.org

EXTRA_DIST = ${sink} ${dist_pkgdata_DATA} ${doktorarbeit}

MOSTLYCLEANFILES = \#* *~ *.bak # kill editor backups
CLEANFILES = ${pdf_DATA}
DISTCLEANFILES = ${pkgdata_DATA}

sink.pdf : ${sink} ${pkgdata_DATA} ${dist_pkgdata_DATA}
    TEXINPUTS=${TEXINPUTS}:$(srcdir)/:$(srcdir)/images// pdflatex $<
    rm -f  *_flymake* flymake* *.log *.out *.toc *.aux *.snm *.nav *.vrb # kill litter

${pkgdata_DATA} : ${dist_plot_DATA}
    $(foreach i,$^,if test "$(i)" != "$(notdir $(i))"; then cp -u "$(i)" "$(notdir $(i))"; fi;)
    ${MKDIR_P} images
    pyxplot $<
    $(foreach i,$^,if test "$(i)" != "$(notdir $(i))"; then rm -f "$(notdir $(i))"; fi;)

doktorarbeit.pdf : ${doktorarbeit}
    if test "$<" != "$(notdir $<)"; then cp -u "$<" "$(notdir $<)"; fi
    emacs --batch --visit "$(notdir $<)" --funcall org-export-as-pdf
    if test "$<" != "$(notdir $<)"; then rm -f "$(notdir $<)"; rm -f $(basename $(notdir $<)).tex $(basename $(notdir $<)).tex~; else rm -f $(basename $<).tex $(basename $<).tex~; fi

You might recognize that this is not the simple Makefile anymore. It is now a setup which defines files for distribution and has custom rules for preparing script runs and for cleanup.

But I can now make a fully working distribution, so when I want to publish my PhD thesis, I can simply add the generated release tarball. I work in a Mercurial repo, so I would more likely just include the repo, but there might be reasons for leaving out the history - even if only that the history might grow quite big.

An advantage is that in the process of preparing the dist, my automake file got cleanly separated into a section defining files and dependencies and one defining build rules.

But I now also understand where newer build tools like scons got their inspiration for the abstractions they use.

I should note, however, that if I were building a software project in one of the languages supported by automake (C, C++, Python and quite a few others), I would not have needed to specify the build rules myself.

And being able to freely mix the dependency declaration in automake style with Makefile rules gives a lot of flexibility which I missed in scons.

Finding programs

Now I can build and distribute my project, but I cannot yet make sure that the programs I need for building actually exist.

And that’s finally something which can really help my build, because it gives clear error messages when something is missing, and it allows users to specify which of these programs to use via the configure script. For example I could now build 5 different versions of Emacs and try the build with each of them.

Also I added cross compilation support, though that is a bit over the top for simple PDF creation :)

First off I edited my configure.ac to check for the tools:

dnl run `autoreconf -i` to generate a configure script. 
dnl Then run ./configure to generate a Makefile.
dnl Finally run make to generate the project.

AC_INIT([Doktorarbeit Inverse GHG], [0.1], [arne.babenhauserheide@kit.edu])
# Check for programs I need for my build
AC_CANONICAL_TARGET
AC_ARG_VAR([emacs], [How to call Emacs.])
AC_CHECK_TARGET_TOOL([emacs], [emacs], [no])
AC_ARG_VAR([pyxplot], [How to call the Pyxplot plotting tool.])
AC_CHECK_TARGET_TOOL([pyxplot], [pyxplot], [no])
AC_ARG_VAR([pdflatex], [How to call pdflatex.])
AC_CHECK_TARGET_TOOL([pdflatex], [pdflatex], [no])
AS_IF([test "x$pdflatex" = "xno"], [AC_MSG_ERROR([cannot find pdflatex.])])
AS_IF([test "x$emacs" = "xno"], [AC_MSG_ERROR([cannot find Emacs.])])
AS_IF([test "x$pyxplot" = "xno"], [AC_MSG_ERROR([cannot find pyxplot.])])
# Run automake
AM_INIT_AUTOMAKE([foreign])
AM_MAINTAINER_MODE([enable])
AC_CONFIG_FILES([Makefile])
AC_OUTPUT

And then I used the created variables in the Makefile.am: See the @-characters around the program names.

pdf_DATA = sink.pdf doktorarbeit.pdf

sink = sink.tex
pkgdata_DATA = images/comp-t3-s07-tem-boas.png images/comp-t3-s07-tem-bona.png
dist_pkgdata_DATA = images/bona-marble.png images/boas-marble.png

plotdir = .
dist_plot_DATA = nee-comp.pyx nee-comp.txt

doktorarbeit = doktorarbeit.org

EXTRA_DIST = ${sink} ${dist_pkgdata_DATA} ${doktorarbeit}

MOSTLYCLEANFILES = \#* *~ *.bak # kill editor backups
CLEANFILES = ${pdf_DATA}
DISTCLEANFILES = ${pkgdata_DATA}

sink.pdf : ${sink} ${pkgdata_DATA} ${dist_pkgdata_DATA}
    TEXINPUTS=${TEXINPUTS}:$(srcdir)/:$(srcdir)/images// @pdflatex@ $<
    rm -f  *_flymake* flymake* *.log *.out *.toc *.aux *.snm *.nav *.vrb # kill litter

${pkgdata_DATA} : ${dist_plot_DATA}
    $(foreach i,$^,if test "$(i)" != "$(notdir $(i))"; then cp -u "$(i)" "$(notdir $(i))"; fi;)
    ${MKDIR_P} images
    @pyxplot@ $<
    $(foreach i,$^,if test "$(i)" != "$(notdir $(i))"; then rm -f "$(notdir $(i))"; fi;)

doktorarbeit.pdf : ${doktorarbeit}
    if test "$<" != "$(notdir $<)"; then cp -u "$<" "$(notdir $<)"; fi
    @emacs@ --batch --visit "$(notdir $<)" --funcall org-export-as-pdf
    if test "$<" != "$(notdir $<)"; then rm -f "$(notdir $<)"; rm -f $(basename $(notdir $<)).tex $(basename $(notdir $<)).tex~; else rm -f $(basename $<).tex $(basename $<).tex~; fi  

Summary

With this I’m at the limit of the advantages of autotools for my simple project.

They allow me to create and check a distribution tarball with relative ease (if I know how to do it), and I can use them to check for tools - and to specify alternative tools via the commandline.

For a C or C++ project, autotools would have given me a lot of other things for free, but even the basic features shown here can be useful.

You have to judge for yourself whether they outweigh the cost of moving away from the dead simple Makefile syntax.

Comparing SCons

A little bonus I want to share.

I also wrote a scons script as an alternative to my Makefile, which I think might be interesting to you. It is almost equivalent to my Makefile: it can build my files, but it does not match the features of the full autotools build and distribution system. Missing: cleaning up temporary files and creating a validated distribution tarball.

You might notice that the more declarative style with explicit dependency information looks quite a bit more similar to automake than to plain Makefiles.

The following is my SConstruct file:

#!/usr/bin/env python
## I need a couple of special builders for my projects
# the $SOURCE replacement only uses the first source file. $SOURCES gives all.
# specifying all source files makes it possible to rerun the build if a single source file changed.
orgexportpdf = 'emacs --batch --visit "$SOURCE" --funcall org-export-as-pdf'
pyxplot = 'pyxplot $SOURCE'
# pdflatex is quite dirty. I directly clean up after it with rm.
pdflatex = 'pdflatex $SOURCE -o $TARGET; rm -f  *_flymake* flymake* *.log *.out *.toc *.aux *.snm *.nav *.vrb'

# build the PhD thesis from emacs org-mode.
Command("doktorarbeit.pdf", "doktorarbeit.org",
        orgexportpdf)

# create plots
Command(["images/comp-t3-s07-tem-boas.png", 
         "images/comp-t3-s07-tem-bona.png"], 
        ["nee-comp.pyx", 
         "nee-comp.txt"],
        pyxplot)

# build my sink.pdf
Command("sink.pdf", 
        ["sink.tex", 
         "images/comp-t3-s07-tem-boas.png", 
         "images/comp-t3-s07-tem-bona.png", 
         "images/bona-marble.png", 
         "images/boas-marble.png"],
        pdflatex)

# My editors leave tempfiles around. I want them gone after a build clean. This is not yet supported!
tempfiles = Glob('*~') + Glob('#*#') + Glob('*.bak')
# using this here would run the cleaning on every run.
#Command("clean", [], Delete(tempfiles))

If you want to integrate building with scons into a Makefile, the following lines allow you to run scons with `make sconsrun`. You might have to also mark sconsrun as .PHONY.

sconsrun : scons
    python scons/bootstrap.py -Q

scons : 
    hg clone https://bitbucket.org/ArneBab/scons

Here you can see part of the beauty of autotools, because you can just add this to your Makefile.am instead of the Makefile and it will work inside the full autotools project (though without the dist-integration). So autotools is a real superset of simple Makefiles.

Notes

If org-mode export keeps pestering you about selecting a TeX-master every time you build the PDF, add the following to your org-mode file:

#+BEGIN_LaTeX
%%% Local Variables:
%%% TeX-master: t
%%% End:
#+END_LaTeX
Attachment: 2013-03-05-Di-make-to-autotools.org (12.9 KB)

Huge datafiles in free culture projects under GPL

Four ways in which large raw artwork files are treated in free culture projects to provide the editable source.1

In the discussion about license compatibility of the Creative Commons ShareAlike license with the GPL, Anthony asked how the source requirement is met for artwork, which often has huge raw files. These are the 4 basic ways I described in my answer.

1. The Wesnoth Way

“The Source is what we have”

The project just asks artists for full resolution PNG image files (without all the layering information) - and only uses these to develop the art. This was spearheaded by the GPL-licensed strategy game Battle for Wesnoth.

This is a viable strategy and also allows developing art, though a bit less convenient than with the layered sources. For example the illustrator who created many of the images in the RPG I work on used our PNG instead of her photoshop file to extract a die from the cover she created for us. She took the chance to also touch up the colors a bit - she had learned some new tricks to improve her paintings.

This clearly complies with the GPL, because the GPL just requires providing the files you use for editing the published files. If the released file is what you actually use to change the published work, then the published file is the source.

2. The External Storage

“Use the FTP, Luke”

Here, files which are too big to be versioned effectively or which most people don’t need when working with the project get version-numbers and are put into an external storage - like an FTP server.

I do that for gimp-files: I put these into our public release-listing via FTP. For example I used that for a multi-layer cover which gets baked into our PDF.

3. The Elegant Way

“Make it so!”

Here huge files are simply versioned alongside the other files, and the versions to be used are created directly from the multi-layered files. The usual way to do that is a Makefile in which rules explicitly define how the derived files are created.

This is most elegant, because it has no duplication of information, the source is always trivial to find, it’s always clear that the derived file really originated from the source and it is easy to avoid quality loss or even reduce it later.

The disadvantage is that it can be very cumbersome to force new developers to get all the huge files and then generate the derived files before being able to really start developing.

The common way to do this is a Makefile - for example the one I use for building my PhD thesis.

4. Pragmatic Elegance

“Hybrids win”

All the ways above can be combined: Huge files are put in version control, but the derived files are included, too, to make it easier for new people to get in. Maybe the huge files are only included on request - for example they could be stubs with which the version control system can retrieve the full files when the user wants them. This can partially be done with the largefiles extension in Mercurial by just not getting the large files.

Also you can just keep separate raw files and derived files. This is also used in Battle for Wesnoth: Optimized files of the right size for the game are stored in one folder while the bigger full resolution files are stored separately.

If you want to include free art in a GPL-covered work, I hope this article gave you some inspiration!


  1. The die was created by Trudy Wenzel (2013) and is licensed under GPLv3 or later. 

Installing GNU Guix 0.6, easily

Org-Source (for editing)

PDF (for printing)

“Got a power-outage while updating? No problem: Everything still works”

GNU Guix is the new functional package manager from the GNU Project which complements the Nix-Store with a nice Guile Scheme based package definition format.

What sold it to me was “Got a power-outage while updating? No problem: Everything still works” from Ludovic’s Guix talk at the GNU Hackers Meeting 2013. My son once found the on-off-button of our power-connector while I was updating my Gentoo box. It took me 3 evenings to get it completely functional again. This would not have happened with Guix.

Update (2014-05-17): Thanks to zerwas from IRC @ freenode for the patch to guix 0.6 and nice cleanup!

Intro

Installation of GNU Guix is straightforward, except if you follow the docs, but it’s not as if we’re not used to that from other GNU utilities, which often terribly short-sell their quality with overly general documentation ☺

So I want to provide a short guide on how to set up and run GNU Guix with ease. My system natively runs Gentoo, so some details might vary for you. If you use Gentoo, you can simply copy the commands here into the shell, but better copy them to a text-file first to ensure that I do not try to trick you into doing evil things with the root access you need.

In short: This guide provides the First Contact and Black Triangle for GNU Guix.

Getting GNU Guix

mkdir guix && cd guix
wget http://alpha.gnu.org/gnu/guix/guix-0.6.tar.gz
wget http://alpha.gnu.org/gnu/guix/guix-0.6.tar.gz.sig
gpg --verify guix-0.?.tar.gz.sig

Installing GNU Guix

tar xf guix-0.?.tar.gz
cd guix-0.?
./configure && make -j16
sudo make install

Setting up GNU Guix

Build users

Build-users allow for strong separation of build processes: They cannot affect each other, because they actually run as different users.

sudo screen
groupadd guix-builder
for i in `seq 1 10`;
  do
    useradd -g guix-builder -G guix-builder           \
            -d /var/empty -s `which nologin`          \
            -c "Guix build user $i" --system          \
            guix-builder$i;
  done
exit

(if you do not have GNU screen yet, you should get it. It makes working on remote servers enjoyable.)

Add user work folder.

Also we want to run guix as a regular user, so we need to pre-create the user-specific profile directory. Note: This should really be done automatically.

sudo mkdir -p /usr/local/var/nix/profiles/per-user/$USER
sudo chown -R $USER:$USER /usr/local/var/nix/profiles/per-user/$USER

Fix store permissions

sudo chgrp guix-builder /nix/store; sudo chmod 1775 /nix/store

Starting the guix daemon and making it launch at startup

This might be quite Gentoo-specific.

sudo screen
echo "#\!/bin/sh" >> /etc/local.d/guix-daemon.start
echo "guix-daemon --build-users-group=guix-builder &" >> /etc/local.d/guix-daemon.start
echo "#\!/bin/sh" >> /etc/local.d/guix-daemon.stop
echo "pkill guix-daemon" >> /etc/local.d/guix-daemon.stop
chmod +x /etc/local.d/guix-daemon.start
chmod +x /etc/local.d/guix-daemon.stop
exit

(the pkill is not the nice way of killing the daemon. Ideally the daemon should have a --kill option)

To avoid having to restart, we just launch the daemon once, now.

sudo /etc/local.d/guix-daemon.start

Adding the guix-installed programs to your PATH

Guix installs each state of the system in its own directory, which actually enables rollbacks. The current state is made available via ~/.guix-profile/, and so we need ~/.guix-profile/bin in our path:

echo "export PATH=$PATH:~/.guix-profile/bin" >> ~/.bashrc
. ~/.bashrc

Using guix

Guix comes with a quite complete commandline interface. The basics are

  • Update the package listing: guix pull
  • List available packages: guix package -A
  • Install a package: guix package -i PACKAGE
  • Update all packages: guix package -u

Experience

For a new distribution-tool, Guix is quite nice. Remember, though, that it builds on Nix: It is not a complete reinvention but rather “stands on the shoulders of giants”.

The download speeds are abysmal, though. http://hydra.gnu.org seems to have a horribly slow internet connection…

And what I direly missed is a short command explanation in the help output:

$ guix --help
Usage: guix COMMAND ARGS...
Run COMMAND with ARGS.

COMMAND must be one of the sub-commands listed below:

   build
   download
   gc
   hash
   import
   package
   pull
   refresh
   substitute-binary

Also I miss the categories I know from Gentoo: Having package-names like grue-hunter seems very unorganized compared to the games-text/grue-hunter which I know from Gentoo.

And it would be nice to have shorthands for the command names:

  • "guix pa -i" instead of "guix package -i" (though there is a namespace clash with guix pull :( )
  • "guix pu" for "guix pull"

and so on.

But anyway: A very interesting project which I plan to keep tracking. It might allow me to do less risky local package installs of stuff I need, like small utilities I wrote myself.

The big advantage of that would be that I could actually take them with me when I have to use different distros (though I’ve been a happy Gentoo user for ~10 years and I don’t see it as likely that I’ll switch completely: Guix would have to include all the roughly 30k packages in Gentoo to actually be a full-fledged alternative - and provide USE flags and all the convenient configurability which makes Gentoo such a nice experience).

Using guix for such small stuff would allow me to decouple experiments from my production environment (which has to keep working).

But enough talk: Have fun with GNU Guix and Happy Hacking!

Author: Arne Babenhauserheide

Created: 2014-05-17 Sa 23:40

Emacs 24.3.1 (Org mode 8.2.5h)


Attachments: 2013-09-04-Mi-guix-install.org (6.53 KB), 2013-09-04-Mi-guix-install.pdf (171.32 KB)

Installing Scipy and PyNIO on a Bare Cluster with the Intel Compiler

Two years ago I had the task of running a Python program using scipy on our university cluster, using the Intel Compiler. I needed all of that (as well as PyNIO and some other stuff) for running TM5 with the python shell on the HC3 of KIT.

This proved to be quite a bit more challenging than I had expected - but it was very interesting, too (and there I learned the basics of GNU autotools which still help me a lot).

But no one should have to go to the same effort with as little guidance as I had, so I decided to publish the script and the patches I created for installing everything we needed.1

The script worked 2 years ago, so you might have to fix some bits. I won’t promise that this contains everything you need to run the script - or that it won’t be broken when you install it. Actually I won’t promise anything at all, except that if the stuff here had been available 2 years ago, that could have saved me about 2 months of time (each of the patches here required quite some tracking of problems, experimenting and fixing, until it provided basic functionality - but actually I enjoyed doing that - I learned a lot - I just don’t want to be forced to do it again). Still, this stuff contains quite some hacks - even a few ugly ones. But it worked.

2 Libraries and programs which get installed (= requirements)

This script requires and installs quite a few libraries. I retrieved most of the following tarballs from my Gentoo distfiles dir after installing the programs locally. I uploaded them to draketo.de/dateien/scipy-pynio-deps. These files are included there:

satexp_utils.so also needs interpolate_levels.F90, which I think I am not allowed to share, so you’re on your own there. Guess why I do not like using non-free (or not-guaranteed-to-be-free) software.

3 Known Bugs

3.1 HDF autotools patch throws away some CFLAGS

The hdf autotools patch only retrieves the last CFLAG instead of all:

export CC='gcc-4.8.1 -Wall -Werror'                                                          
echo $CC | grep \ - | sed 's/.* -/-/'                                                                     
-Werror

If you have the regexp-foo to fix that, please improve the patch! But without perl (otherwise we’d have to install perl, too).

3.2 SciPy inline-C via weave does not work

Udo Grabowski, the maintainer of our institute’s Sun cluster, somehow managed to get that working on OpenIndiana with the Sun compiler, but since I did not need it, I did not dig deeper to see whether I could adapt his solutions to the Intel compiler.

5 Implementation

This is the full install script I used to install all necessary dependencies.

#!/bin/bash

# Untar

for i in *.tar* *.tgz; do
  tar xvf $i || exit
done

# Install

PREFIX=/home/ws/babenhau/
PYPREFIX=/home/ws/babenhau/python/

# Blas

cd BLAS
cp ../blas-make.inc make.inc || exit
#make -j9 clean
F77=ifort make -j9 || exit
#make -j9 install --prefix=$PREFIX
# OR for Intel compiler:
ifort -fPIC -FI -w90 -w95 -cm -O3 -xHost -unroll -c *.f || exit
#Continue below irrespective of compiler:
ar r libfblas.a *.o || exit
ranlib libfblas.a || exit
cd ..
ln -s BLAS blas

## Lapack

cd lapack-3.3.1
ln -s ../blas
# this has a hardcoded absolute path to blas in it: replace it with the appropriate one for you.
cp ../lapack-make.inc make.inc || exit
make -j9 clean  || exit
make -j9
make -j9 || exit
cp lapack_LINUX.a libflapack.a || exit
#make -j9 install --prefix=$PREFIX
cd ..

# C interface

patch -p0 < lapacke-ifort.diff

cd lapacke
# patch for lapack 3.3.1 and blas
for i in gnu inc intel ; do 
    sed -i s/lapack-3\.2\.1\\/lapack\.a/lapack-3\.3\.1\\/lapack_LINUX.a/ make.$i; 
    sed -i s/lapack-3\.2\.1\\/blas\.a/blas\\/blas_LINUX.a/ make.$i; 
done

make -j9 clean || exit
#make -j9
LINKER=ifort LDFLAGS=-nofor-main make -j9 # || exit
#LINKER=ifort LDFLAGS=-nofor-main make -j9 install
cd ..

## ATLAS

cd ATLAS
cp ../Make.Linux_HC3 . || exit
echo "ATLAS needs manual intervention. Run make by hand first."
#echo "just say yes. It makes some stuff we need later."
#make
#mv bin/Linux_UNKNOWNSSE2_8 bin/Linux_HC3
#for i in bin/Linux_HC3/*; do sed -i s/UNKNOWNSSE2_8/HC3/ $i ; done
#rm bin/Linux_HC3/Make.inc
#cd bin/Linux_HC3/
#ln -s ../../Make.Linux_HC3 Make.inc
#cd -

make -j9 install arch=Linux_HC3 || exit
cd lib
for i in Linux_HC3/* ; do ln -s $i ; done
cd ../bin
for i in Linux_HC3/* ; do ln -s $i ; done
cd ../include
for i in Linux_HC3/* ; do ln -s $i ; done
cd ..
cd ..

# Numpy and SciPy with intel compilers

# Read this: http://marklodato.github.com/2009/08/30/numpy-scipy-and-intel.html

# patching

patch -p0 < SuiteSparse.diff  || exit
patch -p0 < SuiteSparse-umfpack.diff  || exit

rm numpy
ln -s numpy-*.*.*/ numpy
patch -p0 < numpy-icc.diff  || exit
patch -p0 < numpy-icpc.diff || exit
patch -p0 <<EOF
--- numpy/numpy/distutils/fcompiler/intel.py      2009-03-29 07:24:21.000000000 -0400
+++ numpy/numpy/distutils/fcompiler/intel.py  2009-08-06 23:08:59.000000000 -0400
@@ -47,6 +47,7 @@
     module_include_switch = '-I'

     def get_flags(self):
+        return ['-fPIC', '-cm']
         v = self.get_version()
         if v >= '10.0':
             # Use -fPIC instead of -KPIC.
@@ -63,6 +64,7 @@
         return ['-O3','-unroll']

     def get_flags_arch(self):
+        return ['-xHost']
         v = self.get_version()
         opt = []
         if cpu.has_fdiv_bug():
EOF
# include -fPIC in the fcompiler.
sed -i "s/w90/w90\", \"-fPIC/" numpy/numpy/distutils/fcompiler/intel.py
# and more of that
patch -p0 < numpy-ifort.diff

rm scipy
ln -s scipy-*.*.*/ scipy

patch -p0 < scipy-qhull-icc.diff || exit
patch -p0 < scipy-qhull-icc2.diff || exit

# # unnecessary!
# patch -p0 <<EOF
# --- scipy/scipy/special/cephes/const.c    2009-08-07 01:56:43.000000000 -0400
# +++ scipy/scipy/special/cephes/const.c        2009-08-07 01:57:08.000000000 -0400
# @@ -91,12 +91,12 @@
# double THPIO4 =  2.35619449019234492885;       /* 3*pi/4 */
# double TWOOPI =  6.36619772367581343075535E-1; /* 2/pi */
# #ifdef INFINITIES
# -double INFINITY = 1.0/0.0;  /* 99e999; */
# +double INFINITY = __builtin_inff();
# #else
# double INFINITY =  1.79769313486231570815E308;    /* 2**1024*(1-MACHEP) */
# #endif
# #ifdef NANS
# -double NAN = 1.0/0.0 - 1.0/0.0;
# +double NAN = __builtin_nanf("");
# #else
# double NAN = 0.0;
# #endif
# EOF


# building

# TODO: try again later

cd SuiteSparse

make -j9 -C AMD || exit
make -j9 -C UMFPACK || exit

cd ..

# TODO: build numpy again and make sure it has blas and lapack (and ATLAS?)

cd numpy
python setup.py -v build_src config --compiler=intel build_clib \
    --compiler=intel build_ext --compiler=intel || exit
python setup.py install --prefix=$PYPREFIX || exit
cd ..

# scons and numscons
cd scons-2.0.1
python setup.py -v install --prefix=/home/ws/babenhau/python/ || exit
cd ..

git clone git://github.com/cournape/numscons.git
cd numscons 
python setup.py -v install --prefix=/home/ws/babenhau/python/  || exit
cd ..

# adapt /home/ws/babenhau/python/lib/python2.7/site-packages/numpy/distutils/fcompiler/intel.py by hand to include fPIC for intelem

cd scipy

PYTHONPATH=/home/ws/babenhau/python//lib/scons-2.0.1/ ATLAS=../ATLAS/ \
    LAPACK=../lapack-3.3.1/libflapack.a LAPACK_SRC=../lapack-3.3.1 BLAS=../BLAS/libfblas.a \
    F77=ifort f77_opt=ifort python setup.py -v config --compiler=intel --fcompiler=intelem build_clib \
    --compiler=intel --fcompiler=intelem build_ext --compiler=intel --fcompiler=intelem \
    -I../SuiteSparse/UFconfig # no exit, because we do the linking by hand later on.

# one file is C++ :(
icpc -fPIC -I/home/ws/babenhau/python/include/python2.7 -I/home/ws/babenhau/python/lib/python2.7/site-packages/numpy/core/include -I/home/ws/babenhau/python/lib/python2.7/site-packages/numpy/core/include -c scipy/spatial/qhull/src/user.c -o build/temp.linux-x86_64-2.7/scipy/spatial/qhull/src/user.o || exit

# linking by hand

# for x in csr csc coo bsr dia; do
#    icpc -xHost -O3 -fPIC -shared \
#        build/temp.linux-x86_64-2.7/scipy/sparse/sparsetools/${x}_wrap.o \
#        -o build/lib.linux-x86_64-2.7/scipy/sparse/sparsetools/_${x}.so || exit
# done
#icpc -xHost -O3 -fPIC -openmp -shared \
#   build/temp.linux-x86_64-2.7/scipy/interpolate/src/_interpolate.o \
#   -o build/lib.linux-x86_64-2.7/scipy/interpolate/_interpolate.so || exit

# build again with the C++ file already compiled

PYTHONPATH=/home/ws/babenhau/python//lib/scons-2.0.1/ ATLAS=../ATLAS/ \
    LAPACK=../lapack-3.3.1/libflapack.a LAPACK_SRC=../lapack-3.3.1 BLAS=../BLAS/libfblas.a \
    F77=ifort f77_opt=ifort python setup.py config --compiler=intel --fcompiler=intelem build_clib \
    --compiler=intel --fcompiler=intelem build_ext --compiler=intel --fcompiler=intelem \
    -I../SuiteSparse/UFconfig || exit

# make sure we have cephes
cd scipy/special
PYTHONPATH=/home/ws/babenhau/python//lib/scons-2.0.1/ ATLAS=../../../ATLAS/ \
    LAPACK=../../../lapack-3.3.1/libflapack.a LAPACK_SRC=../lapack-3.3.1 BLAS=../../../BLAS/libfblas.a \
    F77=ifort f77_opt=ifort python setup.py -v config --compiler=intel --fcompiler=intelem build_clib \
    --compiler=intel --fcompiler=intelem build_ext --compiler=intel --fcompiler=intelem \
    -I../../../SuiteSparse/UFconfig
cd ../..

# install
PYTHONPATH=/home/ws/babenhau/python//lib/scons-2.0.1/ ATLAS=../ATLAS/ \
    LAPACK=../lapack-3.3.1/libflapack.a LAPACK_SRC=../lapack-3.3.1 BLAS=../BLAS/libfblas.a \
    F77=ifort f77_opt=ifort python setup.py config --compiler=intel --fcompiler=intelem build_clib \
    --compiler=intel --fcompiler=intelem install --prefix=$PYPREFIX || exit

cd ..

# PyNIO

# netcdf-4

patch -p0 < netcdf-patch1.diff || exit
patch -p0 < netcdf-patch2.diff || exit

cd netcdf-4.1.3

CPPFLAGS="-I/home/ws/babenhau/libbutz/hdf5-1.8.7/include -I/home/ws/babenhau/include" LDFLAGS="-L/home/ws/babenhau/libbutz/hdf5-1.8.7/lib/ -L/home/ws/babenhau/lib -lsz -L/home/ws/babenhau/libbutz/szip-2.1/lib -L/opt/intel/Compiler/11.1/080/lib/intel64/libifcore.a -lifcore" ./configure --prefix=/home/ws/babenhau/ --enable-netcdf-4 --enable-shared || exit

make -j9; make check install -j9 || exit

cd ..

# NetCDF4
cd netCDF4-0.9.7
HAS_SZIP=1 SZIP_PREFIX=/home/ws/babenhau/libbutz/szip-2.1/ HAS_HDF5=1 HDF5_DIR=/home/ws/babenhau/libbutz/hdf5-1.8.7 HDF5_PREFIX=/home/ws/babenhau/libbutz/hdf5-1.8.7 HDF5_includedir=/home/ws/babenhau/libbutz/hdf5-1.8.7/include HDF5_libdir=/home/ws/babenhau/libbutz/hdf5-1.8.7/lib HAS_NETCDF4=1 NETCDF4_PREFIX=/home/ws/babenhau/ python setup.py build_ext --compiler="intel" --fcompiler="intel -fPIC" install --prefix $PYPREFIX
cd ..

# parallel netcdf and hdf5: ~/libbutz/

patch -p0 < pynio-fix-no-grib.diff || exit

cd PyNIO-1.4.1
HAS_SZIP=1 SZIP_PREFIX=/home/ws/babenhau/libbutz/szip-2.1/ HAS_HDF5=1 HDF5_DIR=/home/ws/babenhau/libbutz/hdf5-1.8.7 HDF5_PREFIX=/home/ws/babenhau/libbutz/hdf5-1.8.7 HDF5_includedir=/home/ws/babenhau/libbutz/hdf5-1.8.7/include HDF5_libdir=/home/ws/babenhau/libbutz/hdf5-1.8.7/lib HAS_NETCDF4=1 NETCDF4_PREFIX=/home/ws/babenhau/ python setup.py install --prefix=$PYPREFIX || exit
# TODO: Make sure that the install goes to /home/ws/.., not home/ws/...
cd ..

# satexp_utils.so

f2py -c -m satexp_utils --f77exec=ifort --f90exec=ifort interpolate_levels.F90 || exit

## pyhdf

# recompile hdf with fPIC - grr!
cd hdf-4*/
# Fix configure for compilers with - in the name.
patch -p0 < ../hdf-fix-configure.ac.diff
autoconf
FFLAGS="-ip -O3 -xHost -fPIC -r8" CFLAGS="-ip -O3 -xHost -fPIC" CXXFLAGS="$CFLAGS -I/usr/include/rpc  -DBIG_LONGS -DSWAP" F77=ifort ./configure --prefix=/home/ws/babenhau/ --disable-netcdf --with-szlib=/home/ws/babenhau/libbutz/szip-2.1 # --with-zlib=/home/ws/babenhau/libbutz/zlib-1.2.5 --with-jpeg=/home/ws/babenhau/libbutz/jpeg-8c
# finds zlib and jpeg due to LD_LIBRARY_PATH (hack but works…)
make
make install
cd ..

# build pyhdf
cd pyhdf-0.8.3/
INCLUDE_DIRS="/home/ws/babenhau/include:/home/ws/babenhau/libbutz/szip-2.1/include" LIBRARY_DIRS="/home/ws/babenhau/lib:/home/ws/babenhau/libbutz/szip-2.1/lib" python setup.py build -c intel --fcompiler ifort install --prefix=/home/ws/babenhau/python 
cd ..

## matplotlib

cd matplotlib-1.1.0
patch -p0 < ../matplotlib-add-icc-support.diff
python setup.py build -c intel install --prefix=/home/ws/babenhau/python
cd ..

# GEOS → http://download.osgeo.org/geos/geos-3.3.2.tar.bz2

cd geos*/ 
./configure --prefix=/home/ws/babenhau/
make
make check
make install 
cd ..

# basemap

easy_install --prefix /home/ws/babenhau/python basemap
# fails but should now have all dependencies.

cd basemap-*/

python setup.py build -c intel install --prefix=/home/ws/babenhau/python

cd ..

6 Appendix

6.1 All patches inline

To ease usage and upstreaming of my fixes, I include all the patches below, so you can find them directly in this text instead of having to browse external textfiles.

6.1.1 SuiteSparse-umfpack.diff

--- SuiteSparse/UMFPACK/Lib/GNUmakefile 2009-11-11 21:09:54.000000000 +0100
+++ SuiteSparse/UMFPACK/Lib/GNUmakefile 2011-09-09 14:18:57.000000000 +0200
@@ -9,7 +9,7 @@
 C = $(CC) $(CFLAGS) $(UMFPACK_CONFIG) \
     -I../Include -I../Source -I../../AMD/Include -I../../UFconfig \
     -I../../CCOLAMD/Include -I../../CAMD/Include -I../../CHOLMOD/Include \
-    -I../../metis-4.0/Lib -I../../COLAMD/Include
+    -I../../COLAMD/Include

 #-------------------------------------------------------------------------------
 # source files

6.1.2 SuiteSparse.diff

--- SuiteSparse/UFconfig/UFconfig.mk    2011-09-09 13:14:03.000000000 +0200
+++ SuiteSparse/UFconfig/UFconfig.mk    2011-09-09 13:15:03.000000000 +0200
@@ -33,11 +33,11 @@
 # C compiler and compiler flags:  These will normally not give you optimal
 # performance.  You should select the optimization parameters that are best
 # for your system.  On Linux, use "CFLAGS = -O3 -fexceptions" for example.
-CC = cc
-CFLAGS = -O3 -fexceptions
+CC = icc
+CFLAGS = -O3 -xHost -fPIC -openmp -vec_report=0

 # C++ compiler (also uses CFLAGS)
-CPLUSPLUS = g++
+CPLUSPLUS = icpc

 # ranlib, and ar, for generating libraries
 RANLIB = ranlib
@@ -49,8 +49,8 @@
 MV = mv -f

 # Fortran compiler (not normally required)
-F77 = f77
-F77FLAGS = -O
+F77 = ifort
+F77FLAGS = -O3 -xHost
 F77LIB =

 # C and Fortran libraries
@@ -132,13 +132,13 @@
 # The path is relative to where it is used, in CHOLMOD/Lib, CHOLMOD/MATLAB, etc.
 # You may wish to use an absolute path.  METIS is optional.  Compile
 # CHOLMOD with -DNPARTITION if you do not wish to use METIS.
-METIS_PATH = ../../metis-4.0
-METIS = ../../metis-4.0/libmetis.a
+# METIS_PATH = ../../metis-4.0
+# METIS = ../../metis-4.0/libmetis.a

 # If you use CHOLMOD_CONFIG = -DNPARTITION then you must use the following
 # options:
-# METIS_PATH =
-# METIS =
+METIS_PATH =
+METIS =

 #------------------------------------------------------------------------------
 # UMFPACK configuration:
@@ -194,7 +194,7 @@
 # -DNSUNPERF       for Solaris only.  If defined, do not use the Sun
 #          Performance Library

-CHOLMOD_CONFIG =
+CHOLMOD_CONFIG = -DNPARTITION

 #------------------------------------------------------------------------------
 # SuiteSparseQR configuration:

6.1.3 hdf-fix-configure.ac.diff (fixes a bug but still contains another known bug - see Known Bugs!)

--- configure.ac    2012-03-01 15:00:28.000000000 +0100
+++ configure.ac    2012-03-01 15:00:40.000000000 +0100
@@ -815,7 +815,7 @@
 dnl Report anything stripped as a flag in CFLAGS and 
 dnl only the compiler in CC_VERSION.
 CC_NOFLAGS=`echo $CC | sed 's/ -.*//'`
-CFLAGS_TO_ADD=`echo $CC | grep - | sed 's/.* -/-/'`
+CFLAGS_TO_ADD=`echo $CC | grep \ - | sed 's/.* -/-/'`
 if test -n $CFLAGS_TO_ADD; then
   CFLAGS="$CFLAGS_TO_ADD$CFLAGS"
 fi

6.1.4 lapacke-ifort.diff

--- lapacke/make.intel.old  2011-10-05 13:24:14.000000000 +0200
+++ lapacke/make.intel  2011-10-05 16:17:00.000000000 +0200
@@ -56,7 +56,7 @@
 # Ensure that the libraries have the same data model (LP64/ILP64).
 #
 LAPACKE = lapacke.a
-LIBS = ../../../lapack-3.3.1/lapack_LINUX.a ../../../blas/blas_LINUX.a -lm
+LIBS = /opt/intel/Compiler/11.1/080/lib/intel64/libifcore.a ../../../lapack-3.2.1/lapack.a ../../../lapack-3.2.1/blas.a -lm -ifcore
 #
 #  The archiver and the flag(s) to use when building archive (library)
 #  If your system has no ranlib, set RANLIB = echo.

6.1.5 matplotlib-add-icc-support.diff

diff -r 38c2a32c56ae matplotlib-1.1.0/setup.py
--- a/matplotlib-1.1.0/setup.py Fri Mar 02 12:29:47 2012 +0100
+++ b/matplotlib-1.1.0/setup.py Fri Mar 02 12:30:39 2012 +0100
@@ -31,6 +31,13 @@
 if major==2 and minor1<4 or major<2:
     raise SystemExit("""matplotlib requires Python 2.4 or later.""")

+if "intel" in sys.argv or "icc" in sys.argv:
+    try: # make it compile with the intel compiler
+        from numpy.distutils import intelccompiler
+    except ImportError:
+        print "Compiling with the intel compiler requires numpy."
+        raise
+
 import glob
 from distutils.core import setup
 from setupext import build_agg, build_gtkagg, build_tkagg,\

6.1.6 netcdf-patch1.diff

--- netcdf-4.1.3/fortran/ncfortran.h    2011-07-01 01:22:22.000000000 +0200
+++ netcdf-4.1.3/fortran/ncfortran.h    2011-09-14 14:56:03.000000000 +0200
@@ -658,7 +658,7 @@
  * The following is for f2c-support only.
  */

-#if defined(f2cFortran) && !defined(pgiFortran) && !defined(gFortran)
+#if defined(f2cFortran) && !defined(pgiFortran) && !defined(gFortran) &&!defined(__INTEL_COMPILER)

 /*
  * The f2c(1) utility on BSD/OS and Linux systems adds an additional

6.1.7 netcdf-patch2.diff

--- netcdf-4.1.3/nf_test/fortlib.c  2011-09-14 14:58:47.000000000 +0200
+++ netcdf-4.1.3/nf_test/fortlib.c  2011-09-14 14:58:38.000000000 +0200
@@ -14,7 +14,7 @@
 #include "../fortran/ncfortran.h"


-#if defined(f2cFortran) && !defined(pgiFortran) && !defined(gFortran)
+#if defined(f2cFortran) && !defined(pgiFortran) && !defined(gFortran) &&!defined(__INTEL_COMPILER)
 /*
  * The f2c(1) utility on BSD/OS and Linux systems adds an additional
  * underscore suffix (besides the usual one) to global names that have

6.1.8 numpy-icc.diff

--- numpy/numpy/distutils/intelccompiler.py 2011-09-08 14:14:03.000000000 +0200
+++ numpy/numpy/distutils/intelccompiler.py 2011-09-08 14:20:37.000000000 +0200
@@ -30,11 +30,11 @@
     """ A modified Intel x86_64 compiler compatible with a 64bit gcc built Python.
     """
     compiler_type = 'intelem'
-    cc_exe = 'icc -m64 -fPIC'
+    cc_exe = 'icc -m64 -fPIC -xHost -O3'
     cc_args = "-fPIC"
     def __init__ (self, verbose=0, dry_run=0, force=0):
         UnixCCompiler.__init__ (self, verbose,dry_run, force)
-        self.cc_exe = 'icc -m64 -fPIC'
+        self.cc_exe = 'icc -m64 -fPIC -xHost -O3'
         compiler = self.cc_exe
         self.set_executables(compiler=compiler,
                              compiler_so=compiler,

6.1.9 numpy-icpc.diff

--- numpy-1.6.1/numpy/distutils/intelccompiler.py   2011-10-06 16:55:12.000000000 +0200
+++ numpy-1.6.1/numpy/distutils/intelccompiler.py   2011-10-10 10:26:14.000000000 +0200
@@ -10,11 +10,13 @@
     def __init__ (self, verbose=0, dry_run=0, force=0):
         UnixCCompiler.__init__ (self, verbose,dry_run, force)
         self.cc_exe = 'icc -fPIC'
+   self.cxx_exe = 'icpc -fPIC'
         compiler = self.cc_exe
+   compiler_cxx = self.cxx_exe
         self.set_executables(compiler=compiler,
                              compiler_so=compiler,
-                             compiler_cxx=compiler,
-                             linker_exe=compiler,
+                             compiler_cxx=compiler_cxx,
+                             linker_exe=compiler_cxx,
                              linker_so=compiler + ' -shared')

 class IntelItaniumCCompiler(IntelCCompiler):

6.1.10 numpy-ifort.diff

--- numpy-1.6.1/numpy/distutils/fcompiler/intel.py.old  2011-10-10 17:52:34.000000000 +0200
+++ numpy-1.6.1/numpy/distutils/fcompiler/intel.py  2011-10-10 17:53:51.000000000 +0200
@@ -32,7 +32,7 @@
     executables = {
         'version_cmd'  : None,          # set by update_executables
         'compiler_f77' : [None, "-72", "-w90", "-fPIC", "-w95"],
-        'compiler_f90' : [None],
+        'compiler_f90' : [None, "-fPIC"],
         'compiler_fix' : [None, "-FI"],
         'linker_so'    : ["<F90>", "-shared"],
         'archiver'     : ["ar", "-cr"],
@@ -129,7 +129,7 @@
         'version_cmd'  : None,
         'compiler_f77' : [None, "-FI", "-w90", "-fPIC", "-w95"],
         'compiler_fix' : [None, "-FI"],
-        'compiler_f90' : [None],
+        'compiler_f90' : [None, "-fPIC"],
         'linker_so'    : ['<F90>', "-shared"],
         'archiver'     : ["ar", "-cr"],
         'ranlib'       : ["ranlib"]
@@ -148,7 +148,7 @@
         'version_cmd'  : None,
         'compiler_f77' : [None, "-FI", "-w90", "-fPIC", "-w95"],
         'compiler_fix' : [None, "-FI"],
-        'compiler_f90' : [None],
+        'compiler_f90' : [None, "-fPIC"],
         'linker_so'    : ['<F90>', "-shared"],
         'archiver'     : ["ar", "-cr"],
         'ranlib'       : ["ranlib"]
@@ -180,7 +180,7 @@
         'version_cmd'  : None,
         'compiler_f77' : [None,"-FI","-w90", "-fPIC","-w95"],
         'compiler_fix' : [None,"-FI","-4L72","-w"],
-        'compiler_f90' : [None],
+        'compiler_f90' : [None, "-fPIC"],
         'linker_so'    : ['<F90>', "-shared"],
         'archiver'     : [ar_exe, "/verbose", "/OUT:"],
         'ranlib'       : None
@@ -232,7 +232,7 @@
         'version_cmd'  : None,
         'compiler_f77' : [None,"-FI","-w90", "-fPIC","-w95"],
         'compiler_fix' : [None,"-FI","-4L72","-w"],
-        'compiler_f90' : [None],
+        'compiler_f90' : [None, "-fPIC"],
         'linker_so'    : ['<F90>',"-shared"],
         'archiver'     : [ar_exe, "/verbose", "/OUT:"],
         'ranlib'       : None

6.1.11 pynio-fix-no-grib.diff

--- PyNIO-1.4.1/Nio.py  2011-09-14 16:00:13.000000000 +0200
+++ PyNIO-1.4.1/Nio.py  2011-09-14 16:00:18.000000000 +0200
@@ -98,7 +98,7 @@
         if ncarg_dir == None or not os.path.exists(ncarg_dir) \
           or not os.path.exists(os.path.join(ncarg_dir,"lib","ncarg")):
             if not __formats__['grib2']:
-                return None
+                return "" # "", because an env variable has to be a string.
             else:
                 print "No path found to PyNIO/ncarg data directory and no usable NCARG installation found"
                 sys.exit()

6.1.12 scipy-qhull-icc.diff

--- scipy/scipy/spatial/qhull/src/qhull_a.h 2011-02-27 11:57:03.000000000 +0100
+++ scipy/scipy/spatial/qhull/src/qhull_a.h 2011-09-09 15:42:12.000000000 +0200
@@ -102,13 +102,13 @@
 #elif defined(__MWERKS__) && defined(__INTEL__)
 #   define QHULL_OS_WIN
 #endif
-#if defined(__INTEL_COMPILER) && !defined(QHULL_OS_WIN)
-template <typename T>
-inline void qhullUnused(T &x) { (void)x; }
-#  define QHULL_UNUSED(x) qhullUnused(x);
-#else
+/*#if defined(__INTEL_COMPILER) && !defined(QHULL_OS_WIN)*/
+/*template <typename T>*/
+/*inline void qhullUnused(T &x) { (void)x; }*/
+/*#  define QHULL_UNUSED(x) qhullUnused(x);*/
+/*#else*/
 #  define QHULL_UNUSED(x) (void)x;
-#endif
+*/#endif*/

 /***** -libqhull.c prototypes (alphabetical after qhull) ********************/

6.1.13 scipy-qhull-icc2.diff

--- scipy/scipy/spatial/qhull/src/qhull_a.h 2011-09-09 15:43:54.000000000 +0200
+++ scipy/scipy/spatial/qhull/src/qhull_a.h 2011-09-09 15:45:17.000000000 +0200
@@ -102,13 +102,7 @@
 #elif defined(__MWERKS__) && defined(__INTEL__)
 #   define QHULL_OS_WIN
 #endif
-/*#if defined(__INTEL_COMPILER) && !defined(QHULL_OS_WIN)*/
-/*template <typename T>*/
-/*inline void qhullUnused(T &x) { (void)x; }*/
-/*#  define QHULL_UNUSED(x) qhullUnused(x);*/
-/*#else*/
 #  define QHULL_UNUSED(x) (void)x;
-*/#endif*/

 /***** -libqhull.c prototypes (alphabetical after qhull) ********************/

6.1.14 scipy-spatial-lifcore.diff

--- scipy-0.9.0/scipy/spatial/setup.py  2011-10-10 17:11:23.000000000 +0200
+++ scipy-0.9.0/scipy/spatial/setup.py  2011-10-10 17:11:09.000000000 +0200
@@ -22,6 +22,8 @@
                                      get_numpy_include_dirs()],
                        # XXX: GCC dependency!
                        #extra_compiler_args=['-fno-strict-aliasing'],
+                       # XXX intel compiler dependency
+                       extra_compiler_args=['-lifcore'],
                        )

     lapack = dict(get_info('lapack_opt'))

7 Summary

I hope this helps someone out there save some time - or even better: improve the upstream projects. At least it should be a nice reference for all who need to get scipy working on not-quite-supported architectures.

Happy Hacking!

Footnotes:

1

: Actually I already wanted to publish that script more than a year ago, but time flies and there’s always stuff to do. But at least I now managed to get it done.

Author: Arne Babenhauserheide

Created: 2013-09-26 Do

Emacs 24.3.1 (Org mode 8.0.2)


Attachment: 2013-09-26-Do-installing-scipy-and-matplotlib-on-a-bare-cluster-with-the-intel-compiler.org (29.2 KB)

Memory requirement of Python datastructures: numpy array, list of floats and inner array

Easily answering the question: “How much space does this need?”

Intro

We just had the problem of finding out whether a given dataset would be shareable without complex trickery. So we took the easiest road and checked the memory requirements of the datastructure.

If you have such a need, there’s always a first stop: Fire up the interpreter and try it out.

The test

We just created a three dimensional numpy array of floats and then looked at the memory requirement in the system monitor - conveniently bound to CTRL-ESC in KDE. By making the array big enough we can ignore all constant costs and directly get the cost per stored value by dividing the total memory of the process by the number of values.

All our tests are done in Python3.
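If you prefer to read the numbers from within the process instead of from the system monitor, the following is a minimal sketch of the same measurement using the resource module (the variable names are mine, not part of the original test; on Linux ru_maxrss is reported in KiB):

import resource
import numpy as np

values = 100 * 100 * 10000
a = np.array(np.random.random((100, 100, 10000)), dtype="float")

# peak resident set size of this process, in KiB on Linux
peak_kib = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
print("total: %.1f MiB" % (peak_kib / 1024.0))
print("per value: %.3f Bytes" % (peak_kib * 1024.0 / values))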

Numpy

For numpy we just create an array of random values cast to floats:

import numpy as np
a = np.array(np.random.random((100, 100, 10000)), dtype="float")

Also we tested what happens when we use "f4" and "f2" instead of "float" as dtype in numpy.
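The only change is the dtype argument, for example for 4-byte floats:

a = np.array(np.random.random((100, 100, 10000)), dtype="f4")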

Native lists

For the native lists, we use the same array, but convert it to a list of lists of lists:

import numpy as np
a = [[[float(i) for i in j] for j in k] 
     for k in list(np.array(np.random.random((100, 100, 10000)), dtype="float"))]

Array module

Instead of using the full-blown numpy, we can also turn the inner list into an array.

import array
import numpy as np
a = [[array.array("d", [float(i) for i in j]) for j in k] 
     for k in list(np.array(np.random.random((100, 100, 10000)), dtype="float"))]

The results

With a numpy array we need roughly 8 Bytes per float. A list of Python floats, however, requires roughly 32 Bytes per float. So switching from native Python lists to numpy reduces the required memory per floating point value by a factor of 4.

Using an inner array (via array module) instead of the innermost list provides roughly the same gains.

I would have expected a factor of 3 - the value plus a pointer per entry - but the actual overhead is larger: each list entry is an 8 Byte pointer to a separate float object, and a CPython float object itself takes another 24 Bytes (the value plus reference count and type pointer).
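A quick check in the interpreter shows this per-object overhead (CPython on a 64 bit system; my own check, not part of the original measurement):

import sys
print(sys.getsizeof(1.0))   # 24: one float object (value, reference count, type pointer)
floats = [float(i) for i in range(1000)]
# the list itself only stores pointers, roughly 8-9 Bytes per entry
print(sys.getsizeof(floats) / len(floats))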

The details are in the following table.

Table 1: Memory requirement of different ways to store values in Python

                         total memory   per value
list of floats           3216.6 MiB     32.166 Bytes
numpy array of floats     776.7 MiB      7.767 Bytes
np f4                     395.2 MiB      3.95 Bytes
np f2                     283.4 MiB      2.834 Bytes
inner array               779.1 MiB      7.791 Bytes

This test was conducted on a 64 bit system; Python floats are C doubles, so each value carries 8 Bytes of payload.

The scipy documentation provides a list of all the possible dtype definitions cast to C-types.

Summary

In Python large numpy arrays require 4 times less memory than a nested list structure with the same data. Using an inner array from the array module instead of the innermost list provides roughly the same gains.

Ogg Theora and h.264 - which video codec as standard for internet-video?

Links:
- Video encoder comparison - a much more thorough comparison than mine

We had a kinda long discussion on identi.ca about Ogg Theora and h.264, and since we lacked a simple comparison method, I hacked up a quick script to test them.

It uses frames from Big Buck Bunny and outputs the files bbb.ogg and bbb.264 (license: cc by).

The resulting files are attached below for comparison: the Ogg file (bbb-400bps.ogg) and the h.264 file (bbb-400bps.264).

Results

What you can see by comparing both is that h.264 wins in terms of raw image quality at the same bitrate (single pass).

So why am I still strongly in favor of Ogg Theora?

The reason is simple:

Due to the licensing costs of h.264 (a few million per year, due from 2015 onwards), making h.264 the standard for internet video would have the effect that only big companies would be able to make a video enabled browser - or we would get a kind of video tax for free software: if you want to view internet video with free software, you have to pay for the right to use the x264 library (else the developers couldn't cough up the money to pay for the patent license). And no one but the main developers and huge corporations could distribute the x264 library, because they’d have to pay license fees for that.

And no one could hack on the browser or library and distribute the changed version, so the whole idea of free software would be reduced to absurdity. It wouldn't matter that all the code would be free licensed, since only those with an h.264 patent license could change it.

So this post boils down to a simple message:

“Support !theora against h.264 and #flash [as video codec for the web]. Otherwise only big companies will be able to write video browsers - or we get a h.264 tax on !fs”

Theora’s raw quality may still be worse, but the license costs and their implications provide very clear reasons for supporting Theora - which in my view are far more important than raw technical stuff.

The test-script

for k in {0..1}
     do for i in {0..9}
         do for j in {0..9}
             do
wget http://media.xiph.org/BBB/BBB-360-png/big_buck_bunny_00$k$i$j.png
         done
     done
done

mplayer -vo yuv4mpeg -ao null -nosound mf://*png -mf fps=50

theora_encoder_example -z 0 --soft-target -V 400 -o bbb.ogg stream.yuv

mencoder stream.yuv -ovc x264 -of rawvideo -o bbb.264 -x264encopts bitrate=400 -aspect 16:9 -nosound -vf scale=640:360,harddup

Attachments: bbb-400bps.ogg (212.88 KB), bbb-400bps.264 (214.39 KB), encode.sh (428 Bytes)

Phoronix conclusions distort their results, shown with the example of GCC vs. LLVM/Clang On AMD's FX-8350 Vishera

Phoronix recently did a benchmark of GCC vs. LLVM on AMD hardware. Sadly their conclusion did not fit the data they showed. Actually it misrepresented the data so strongly that I decided to speak up here instead of having my comments disappear in their forums. This post was started on 2013-05-14 and got updates when things changed - first for the better, then for the worse.

Update 3 (the last straw, 2013-11-09): In the most recent and most blatant attack by Phoronix on copyleft programs - this time openly targeted at GNU - Michael Larabel directly misrepresented a post from Josh Klint to badmouth GDB (Josh confirmed this1). Josh gave a report of his initial experience with GDB in a Kickstarter Update in which he reported some shortcomings he saw in GDB (of which the major gripe is easily resolved with better documentation2) and concluded with “the limitations of GDB are annoying, but I can deal with it. It's very nice to be able to run and debug our editor on Linux”. Michael Larabel only quoted the conclusion up to “annoying” and abused that to support the claim that game developers (in general) call GDB “crap” and for further badmouthing of GDB. With this he provided the straw which I needed to stop reading Phoronix: Michael Larabel is hostile to copyleft and in particular to GNU, and he goes as far as rigging test results3 and misrepresenting the words of others to further his agenda. I even donated to Phoronix a few times in the past. I guess I won’t do that again, either. I should have learned from the error of the German Pirates and should have avoided reading media which is controlled by people who want to destroy what I fight for (sustainable free software).
Update 2 (2013-07-06): But the next one went down the drain again… “Of course, LLVM/Clang 3.3 still lacks OpenMP support, so those tests are obviously in favor of GCC.” — I couldn’t find a better way to say that those tests are completely useless while at the same time devaluing OpenMP support as “ignore this result along with all others where GCC wins”…
Update (2013-06-21): The recent report of GCC 4.8 vs. LLVM 3.3 looks much better. Not perfect, but much better.

Taking out the OpenMP benchmarks (where GCC naturally won, because LLVM only processes those tests single-threaded) and the build times (which are irrelevant to the speed of the produced binaries), their benchmark had the following result:

LLVM is slower than GCC by:

  • 10.2% (HMMer)
  • 12.7% (MAFFT)
  • 6.8% (BLAKE2)
  • 9.1% (HIMENO)
  • 42.2% (C-Ray)

With these results (which were clearly visible on their result summary on OpenBenchmarking), Michael Larabel from Phoronix concluded:

» The performance of LLVM/Clang 3.3 for most tests is at least comparable to GCC «

Nobu from their Forums supplied a conclusion which represents the data much better:

» GCC is much faster in anything which uses OpenMP, and moderately faster or equal in anything (except compile times) which doesn't [use OpenMP] «

But Michael from Phoronix did not stop at just ignoring the performance difference between GCC and LLVM. He went on claiming, that

In a few benchmarks LLVM/Clang is faster, particularly when it comes to build times.

And this is blatant reality-distortion which I am very tempted to ascribe to favoritism. LLVM is not “particularly” faster when it comes to build times.

LLVM on AMD FX-8350 Vishera is faster ONLY when it comes to build times!

This was not the first time that I read data-distorting conclusions on Phoronix - and my complaints about that in their forum did not change their actions. So I hope that my post here can help make them aware that deliberately distorting test results is unacceptable.

For my work, compiler performance is actually quite important, because I use programs which run for days or weeks, so 10% runtime reduction can mean saving several days - not counting the cost of using up cluster time.

To fix their blunders, what they would have to do is:

  • Avoiding Benchmarks which only one compiler supports properly (OpenMP).
  • Marking the compile time tests explicitly, so they strongly stand out from the rest, because they measure a completely different parameter than the other tests: Compiler Runtime vs. Performance of the Compiled Binaries.
  • Writing conclusions which actually fit their results.

Their current approach gives a distinct disadvantage to GCC (even for the OpenMP tests, because they convey the notion that if LLVM only had OpenMP, it would be better in everything - which as this test shows is simply false), so the compiler-tests from Phoronix work as covert propaganda against GCC, even in tests where GCC flat-out wins. And I already don’t like open propaganda, but when the propaganda gets masked as objective testing, I actually get angry.

I hope my post here can help move them towards doing proper testing again.

PS: I write so strongly here, because I actually like the tests from Phoronix a lot. I think we need rather more than less testing and their testsuite actually seems to do a good job - when given the right parameters - so seeing Phoronix distorting the tests to a point where they become almost useless (except as political tool against GCC) is a huge disappointment to me.


  1. Josh Klint from Leadwerks confirmed that Phoronix misrepresented his post and wrote a followup-post: » @ArneBab That really wasn't meant to be controversial. I was hoping to provide constructive feedback from the view of an Xcode / VS user.« » Slightly surprised my complaints about GDB are a hot topic. I can make just as many criticisms of other compilers and IDEs.« » The first 24 hours are the best for usability feedback. I figure if they notice a pattern some of those things will be improved.« » GDB Follwup «@Leadwerks, 2:04 AM - 11 Nov 13, 2:10 AM - 11 Nov 13 and @JoshKlint, 2:07 AM - 11 Nov 13, 8:48 PM - 11 Nov 13

  2. The first-impression criticism from Josh Klint was addressed by a Phoronix reader by pointing to the frame command. I do not blame Josh for not knowing all tricks: He wrote a fair account of his initial experience with GDB (and he said later that he wrote the post after less than 24 hours of using GDB, because he considers that the best time to provide feedback) and his experience can serve as constructive criticism to improve tutorials, documentation and the UI of GDB. Sadly his visibility and the possible impact of his work on free software made it possible for Phoronix to abuse a personal report as support for a general badmouthing of the tool. In contrast the full message of Josh Klint ended really positive: Although some annoyances and limitations have been discovered, overall I have found Linux to be a completely viable platform for application development. — Josh Klint, Leadwerks 

  3. I know that rigging of tests is a strong claim. The actions of Michael Larabel deserve being called rigging for three main reasons: (1) including compile-time data along with runtime performance without a clear distinction between both, even though compile time of the full code is mostly irrelevant when you use a proper build system, and compile time and runtime are completely different classes of results, (2) including pointless tests between incomparable setups whose only use is to downplay any weakness of his favorite system, and (3) blatantly lying in the summaries (as I show in this article). 

Python for beginning programmers

(written on ohloh for Python)

Since we already have two good reviews from experienced programmers, I'll focus on the area I know about: Python as a first language.

My experience:

  • I began to get into coding only a short time ago. I already knew about processes in programs, but not how to get them into code.
  • I wanted to learn C/C++ and failed at general structure. After a while I could do it, but it didn't feel right.
  • I tried my luck with Java and didn't quite get going.
  • Then I tried Python, and got in at once.

Advantages of Python:

  • The structure of programs can be understood easily.
  • The Python interpreter lets you experiment very quickly.
  • You can realize complex programs, but Python also allows for quick and simple scripting.
  • Code written by others is extremely readable.
  • And coding just flows - almost like natural speaking/thinking.

How it looks:

def hello(user):
    print("Hello " + user + "!")
hello("Fan")
# prints Hello Fan! on screen

As a bonus, there is the great open book How to Think Like a Computer Scientist, which teaches programming with Python and is used for teaching at universities.

So I can wholeheartedly recommend Python to beginners in programming, and as the other reviews on Ohloh show, it is also a great language for experienced programmers and seems to be a good language to accompany you in your whole coding life.

PS: Yes, I know about the double meaning of "first language" :)

Recursion wins!

I recently read the little schemer and that got me thinking about recursion and loops.

After starting my programming life with Python, I normally use for-loops to solve problems. But they are actually an inferior mechanism compared to recursion, if the language provides proper syntactic support for the latter. Since that claim pretty much damns Python on a theoretical level (even though it is still a very good tool in practice and I still love it!), I want to share a simplified version of the code which made me realize this.

Let’s begin with how I would write that code in Python.

res = ""
instring = False
for letter in text:
    if letter == "\"":
        # special conditions for string handling go here
        # lots of special conditions
        # and more special conditions
        # which cannot easily be moved out, 
        # because we cannot skip multiple letters
        # in one step
        instring = not instring
    if instring:
        res += letter
        continue
    # other cases

Did you spot the comment “special conditions go here”? That’s the point which damns for-loops: You cannot easily factor out these special conditions.1 In this example all the complexity is in the variable instring. But depending on the use case, this could require lots of different states being tracked within the loop, cluttering up the namespace as well as entangling complexity from different parts of the loop.

This is how the same could be done with proper let-recursion:

; first get SRFI-71: multi-value let for syntactic support for what I
; want to do
use-modules : srfi srfi-71

let process-text
    : res ""
      letter : string-take text 1
      unprocessed : string-drop text 1
    when : equal? letter "\""
           let-values 
               ; all the complexity of string-handling is neatly
               ; confined in the helper-function consume-string
               : (to-res next-letter still-unprocessed) : consume-string unprocessed
               process-text
                   string-append res to-res
                   . next-letter
                   . still-unprocessed
    ; other cases

The basic code for recursion is a bit longer, because the new values for the next step of the processing are given explicitly. But it is almost trivial to shell out parts of the loop to another function: It just needs to return the next state of the recursion.

And that’s what consume-string does:

define : consume-string text
    let
        : res ""
          next-letter : string-take text 1
          unprocessed : string-drop text 1
        ; lots of special handling here
        values res next-letter unprocessed

To recite from the Zen of Python:

Explicit is better than implicit.

It’s funny to see how Guile Scheme allows me to follow that principle more thoroughly than Python.

(I love Python, but this is a case where Scheme simply wins - and I’m not afraid to admit that)

PS: Actually I found this technique when thinking about use-cases for multiple return-values of functions.

PPS: This example uses wisp-syntax for the scheme-examples to avoid killing Pythonistas with parens.


  1. While you cannot factor out parts of for loops easily, functions which pass around iterators get pretty close to the expressivity of tail recursion (see the sketch below). They might even go a bit further, and I already missed them for some scheme code where I needed to generate expressions step by step from a function which always returned an unspecified number of expressions per call. If Python continues to make it easier to use iterators, that could reduce the impact of the points I make in this article.
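
As a rough sketch of what that iterator-based factoring could look like in Python (the helper consume_string and the example input are my own, made up for illustration):

def consume_string(letters):
    """Consume a quoted string from the shared letter iterator.

    All the special string handling lives here; the caller does not
    need to track an instring flag.
    """
    res = []
    for letter in letters:
        if letter == "\"":
            break
        res.append(letter)
    return "".join(res)

def process(text):
    letters = iter(text)
    res = []
    for letter in letters:
        if letter == "\"":
            # the helper advances the shared iterator past the string,
            # so we effectively skip multiple letters in one step
            res.append(consume_string(letters))
            continue
        # other cases
        res.append(letter)
    return "".join(res)

print(process('abc"def"ghi'))  # prints abcdefghi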

AnhangGröße
2014-03-05-Mi-recursion-wins.org3.36 KB

Reducing the Python startup time

The Python startup time always nagged me (17-30 ms), and I just searched again for a way to reduce it when I found this:

The Python-Launcher caches GTK imports and forks new processes to reduce the startup time of python GUI programs.

Python-launcher does not solve my problem directly, but it points into an interesting direction: If you create a small daemon which you can contact via the shell to fork a new instance, you might be able to get rid of your startup time.

To get an example of the possibilities, download the python-launcher and socat and do the following:

PYTHONPATH="../lib.linux-x86_64-2.7/" python python-launcher-daemon &
echo pass > 1
for i in {1..100}; do 
    echo 1 | socat STDIN UNIX-CONNECT:/tmp/python-launcher-daemon.socket & 
done

Todo: Adapt it to a given program and remove the GTK stuff. Note the & at the end: Closing the socket connection seems to be slow, so I just don’t wait for socat to finish. Breaks at somewhere over 200 simultaneous connections. Option: Use a datagram socket instead.

The essential trick is to just create a server which opens a socket. Then it reads all the data from the socket. Once it has the data, it forks like the following:

        # (this snippet sits inside the request handler of the daemon;
        #  it needs `import os, signal` and `program`, the path of the
        #  script to run, from the surrounding code)
        pid = os.fork()
        if pid:
            # parent: just return to accepting the next connection
            return

        # child: restore default signal handlers, then run the script
        # as if it had been started as __main__
        signal.signal(signal.SIGPIPE, signal.SIG_DFL)
        signal.signal(signal.SIGCHLD, signal.SIG_DFL)

        glob = dict(__name__="__main__")
        print 'launching', program
        execfile(program, glob, glob)

        raise SystemExit

Running a program that way 100 times took just 0.23 seconds for me, so the Python startup time of 17 ms got reduced to 2.3 ms.
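
For completeness, here is a minimal, self-contained sketch of such a launcher daemon (my own simplification, not the actual python-launcher code; the socket path and the one-script-path-per-connection protocol are made up for illustration):

import os
import signal
import socket

SOCKET_PATH = "/tmp/launcher-demo.socket"  # made-up path for this sketch

def serve():
    if os.path.exists(SOCKET_PATH):
        os.remove(SOCKET_PATH)
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(SOCKET_PATH)
    server.listen(5)
    while True:
        conn, _ = server.accept()
        # the client sends the path of the script to run
        program = conn.makefile().read().strip()
        conn.close()
        if not program:
            continue
        pid = os.fork()
        if pid:
            continue  # parent: back to accepting connections
        # child: run the requested script and exit
        signal.signal(signal.SIGPIPE, signal.SIG_DFL)
        signal.signal(signal.SIGCHLD, signal.SIG_DFL)
        glob = dict(__name__="__main__")
        exec(compile(open(program).read(), program, "exec"), glob, glob)
        raise SystemExit

if __name__ == "__main__":
    serve()

You could then trigger it the same way as above, for example with echo /path/to/script.py | socat STDIN UNIX-CONNECT:/tmp/launcher-demo.socket.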

You might have to switch from forking to just executing the code if you want to be even faster and the code snippets are small. For example, when running the same test without the fork and the signals, 100 executions of the same code took just 0.09 s, cutting the startup time down to an impressive 0.9 ms - at the cost of no longer running in parallel.

(That’s what I also do with emacsclient… My emacs takes ~30s to start (due to excessive use of additional libraries I added), but emacsclient -c shows up almost instantly.)

I tested the speed by just sending a file with the following snippet to the server:

import time
with open("2", "a") as f:
    f.write(str(time.time()) + "\n")

Note: If your script only needs the included python libraries (batteries) and no custom-installed libs, you can also reduce the startuptime by avoiding site initialization:

python -S [script]

Without -S python -c '' takes 0.018s for me. With -S I am down to

time python -S -c '' → 0.004s. 

Note that you might miss some installed packages that way. This is slower than the daemon method by up to a factor of 4 (4 ms instead of 0.9 ms), but still faster than the default way. Note that cold disk buffers can make the difference much bigger on the first run, which is not relevant in this case but very much relevant in general for the impression of startup speed.

PS: I attached the python-launcher 0.1.0 in case its website goes down. License: GPL and MIT; included. This message was originally written at stackoverflow.

AnhangGröße
python-launcher-0.1.0.tar.gz11.11 KB

Screencast: Tabbing of everything in KDE

I just discovered tabbing of everything in KDE:

(download)

Created with recordmydesktop, cut with kdenlive, encoded to ogg theora with ffmpeg2theora (encoding command).

Music: Beat into Submission on Public Domain by Tryad.

To embed the video on your own site you can simply use:

<video 
src="http://draketo.de/files/screencast-tabbing-everywhere-kde.ogv"
controls=controls>
</video>

If you do so, please provide a backlink here.

License: cc by-sa, because that’s the license of the song. If you omit the audio, you can also use one of my usual free licenses (or all of them, including the GPL). Here’s the raw recording (=video source).

¹: Feel free to upload the video to youtube or similar. I license my stuff under free licenses to make it easy for everyone to use, change and spread them.

²: Others have shown this before, but I don’t mind that. I just love the feature, so I want to show it :)

³: The command wheel I use for calling programs is the pyRad.

AnhangGröße
screencast-tabbing-everywhere-kde.ogv10.75 MB

Simple daemon with start-stop-daemon and runit

PDF

PDF (to print)

Org (source)

Creating a daemon with almost zero effort.

start-stop-daemon

The example with the start-stop-daemon uses Gentoo OpenRC as root.

The simplest daemon we can create is a while loop:

echo '#!/bin/sh' > whiledaemon.sh
echo 'while true; do true; done' >> whiledaemon.sh
chmod +x whiledaemon.sh

Now we start it as daemon

start-stop-daemon --pidfile whiledaemon.pid \
--make-pidfile --background ./whiledaemon.sh

Top shows that it is running:

top | grep whiledaemon.sh

We stop it using the pidfile:

start-stop-daemon --pidfile whiledaemon.pid \
--stop ./whiledaemon.sh

That’s it.

Hint: To add cgroups support on a Gentoo install, open /etc/rc.conf and uncomment

rc_controller_cgroups="YES"

Then in the initscript you can set the other variables described below that line. Thanks for this hint goes to Luca Barbato!

If you want to ensure that the daemon keeps running without checking a PID file (which might in some corner cases fail because a new process claims the same PID), we can use runsvdir from runit.

daemon with runit

Minimal examples for runit daemons - first as unprivileged user, then as root.

runit as simple user

Create a script which dies

printf '#!/usr/bin/env python\nfor i in range(100): a = i*i\n' >/tmp/foo.py
chmod +x /tmp/foo.py

Create the daemon folder

mkdir -p ~/.local/run/runit_services/python
ln -sf /tmp/foo.py ~/.local/run/runit_services/python/run

Run the daemon via runsvdir

runsvdir ~/.local/run/runit_services

Manage it with sv (part of runit)

# stop the running daemon
SVDIR=~/.local/run/runit_services/ sv stop python
# start the service (it shows as `run` in top)
SVDIR=~/.local/run/runit_services/ sv start python

runit as root

Minimal working example for setting up runit as root - like a sysadmin might do it.

printf '#!/usr/bin/env python\nfor i in range(100): a = i*i\n' >/tmp/foo.py &&
    chmod +x /tmp/foo.py &&
    mkdir -p /run/arne_service/python &&
    printf '#!/bin/sh\nexec /tmp/foo.py' >/run/arne_service/python/run &&
    chmod +x /run/arne_service/python/run &&
    chown -R arne /run/arne_service &&
    su - arne -c 'runsvdir /run/arne_service'

Or without bash indirection (giving up some flexibility we don’t need here)

printf '#!/usr/bin/env python\nfor i in range(100): a = i*i\n' >/tmp/foo.py && 
    chmod +x /tmp/foo.py &&
    mkdir -p /run/arne_service/python &&
    ln -s /tmp/foo.py /run/arne_service/python/run &&
    chown -R arne /run/arne_service &&
    su - arne -c 'runsvdir /run/arne_service'
AnhangGröße
2015-04-15-Mi-simple-daemon-openrc.org2.92 KB
2015-04-15-Mi-simple-daemon-openrc.pdf152.99 KB

Simple positive trust scheme with thresholds

(The rest of this article is written for freetalk inside freenet, and was also posted there with my non-anonymous ID.)

I don’t see a reason for negative reputation schemes — voting down is in my view a flawed concept. It just allows for community censorship, which I see as incompatible with the goals of freenet.

Would it be possible to change that to use only positive votes and a threshold?

  • If I like what some people write, I give them positive votes.
  • If I get too much spam, I increase the threshold for all people.
  • Effective positive votes get added. It suffices that some people I trust also trust someone else and I’ll see the messages.
  • Effective trust is my trust (0..1) · the trust of the next in the chain (0..1) · …

Usecase:

  • Zwister trusts Alice and Bob.
  • Alice trusts Lilith.
  • Bob hates Lilith.

In the current scheme (as I understand it), zwister wouldn’t see posts from Lilith.

In a pure positive scheme, zwister would see the posts. If zwister wants to avoid seeing the posts from Lilith, he has to untrust Alice or ask Alice to untrust Lilith. Add to that a personal (and not propagating) blocking option which allows me to “never see anything from Lilith again”.

Bob should not be able to interfere with me seeing the messages from Lilith, when Alice trusts Lilith.

If zwister’s trust for Alice (0..1) multiplied with Alice’s trust for Lilith (0..1) is lower than zwister’s threshold, zwister doesn’t see the messages.
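
As a toy sketch of this calculation in Python (the concrete trust values and the 0.5 threshold are just examples):

# direct trust values (0..1); absence means no positive vote
trust = {
    "zwister": {"Alice": 0.9, "Bob": 0.8},
    "Alice": {"Lilith": 0.7},
    "Bob": {},  # Bob gives Lilith no positive vote - and that is all he can do
}

def effective_trust(viewer, author, seen=()):
    """Best product of trust values along any chain from viewer to author."""
    direct = trust.get(viewer, {})
    best = direct.get(author, 0.0)
    for friend, value in direct.items():
        if friend in seen:
            continue  # avoid cycles
        best = max(best, value * effective_trust(friend, author, seen + (viewer,)))
    return best

threshold = 0.5  # zwister raises this when too much spam gets through
print(effective_trust("zwister", "Lilith") >= threshold)  # True: 0.9 * 0.7 = 0.63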

PS: loosely adapted from Credence, which would have brought community spam control to Gnutella if LimeWire had adopted it.

PPS: An adaptation for news voting: You give positive votes on news which show up. Negative votes assign a private threshold to the author of the news, so you then only see news from that author which enough people vote for.

Simple steps to attach the GNU General Public License (GPL) to your project

Here are the simple steps to attach a GPL license to your source files (written after requests by DiggClone and Bandnet):

For your own project, just add the following text-notice to the header/first section of each of your source-files, commented out in whatever way your language uses:

----------------following is the notice-----------------
/*
* Your Project Name - -you slogan-
* Copyright (C) 2007 - 2007 Your Name
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*/
----------------------------------------------
the "2007 - 2007" needs to be adjusted to "year when you gave it the license in the first place" - "current year".

Then put the file gpl.txt into the source-folder or a docs folder: http://www.gnu.org/licenses/gpl.txt

If you are developing together with other people, you need their permission to put the project under the GPL.

------

Just for additional info, I found this license comparison paper by Sun: http://mediacast.sun.com/share/webmink/SunLicensingWhitePaper042006.pdf

And comments to it: http://blogs.sun.com/webmink/entry/open_source_licensing_paper#comments

It does look nice, but it misses one point:

GPL is trust: Contributors can trust, that their contributions will keep helping the community, and that the software they contribute to will keep being accessible for the community.

(That's why I decided some years ago to only support GPL projects. My contributions to one semi-closed project got lost, because the project wasn't free and the developer just decided not to offer them anymore, and I could only watch hundreds of hours of work disappear, and that hurt.)

Best wishes,
Arne
PS: If anything's missing, please write a comment!

Some Python Programs of mine

heavily outdated page. See bitbucket.org/ArneBab for many more projects…

Hi,

I created some projects with pyglet and some tools to facilitate 2D
game development (for me), and I thought you might be interested.

  • babglet: basic usage of pyglet for 2D games with optional collision
    detection and avoidance.
  • blob_swarm: a swarm of blobs with emerging swarm behaviour through only pair relations.
  • blob_battle: a duel-style battle between two blobs (basic graphics,
    control and movement done)
  • fuzzy_collisions: 2 groups of blobs. One can be controlled. When two
    blobs collide, they move away a (random) bit to avoid the collision.

They are available from the rpg-1d6 project on sourceforge:
-> https://sf.net/projects/rpg-1d6/

The download can be found at the sf.net download page:
-> https://sourceforge.net/project/showfiles.php?group_id=199744

Surprising behaviour of Fortran (90/95)

1 Introduction

I recently started really learning Fortran (as opposed to just dabbling with existing code until it did what I wanted it to).

Here I document the surprises I found along the way.

As reference: I come from Python, C++ and Lisp, and I actually started to like Fortran while learning it. So the horror-stories I heard while studying were mostly proven wrong. I uploaded the complete code as base60.f90.

2 Testing Skeleton

This is a code sample for calculating a base60 value from an integer.

The surprises are taken out of the program and marked with double angle brackets («surprise»). They are documented in the chapter Surprises.

program base60
  ! first step: Base60 encode. 
  ! reference: http://faruk.akgul.org/blog/tantek-celiks-newbase60-in-python-and-java/
  ! 5000 should be 1PL
  implicit none
  <<declare-function-type-program>>
  <<function-test-calls>>
end program base60
<<declare-function-type-function>>
  implicit none
  !!! preparation
  <<unchanged-argument>>
  <<parameter>>
  ! work variables
  integer :: n = 0
  integer :: remainder = 0
  ! result
  <<variable-declare-init>>
  ! actual algorithm
  if (number == 0) then
     <<return>>
  end if
  ! calculate the base60 string
  <<variable-reset>>
  n = number ! the input argument: that should be safe to use.
  ! catch number = 0
  do while(n > 0)
     remainder = mod(n, 60)
     n = n/60
     <<indizes-start-at-1>>
     ! write(*,*) number, remainder, n
  end do
<<return-end>>

2.1 Helpers

write(*,*) 0, trim(numtosxg(0))
write(*,*) 100000, trim(numtosxg(100000))
write(*,*) 1, trim(numtosxg(1))
write(*,*) 2, trim(numtosxg(2))
write(*,*) 60, trim(numtosxg(60))
write(*,*) 59, trim(numtosxg(59))

3 Surprises

3.1 I have to declare the return type of a function in the main program and in the function

! I have to declare the return type of the function in the main program, too.
character(len=1000) :: numtosxg
character(len=1000) function numtosxg( number )

Alternatively to declaring the function in its header, I can also declare its return type in the declaration block inside the function body:

function numtosxg (number)
  character(len=1000) :: numtosxg
end function numtosxg

3.2 Variables in Functions accumulate over several function calls

This even happens, when I initialize the variable when I declare it:

character(len=1000) :: res = ""

Due to that I have to begin the algorithm with resetting the required variable.

res = "" ! I have to explicitely set res to "", otherwise it
         ! accumulates the prior results!

This provides a hint that initialization in a declaration inside a function is purely compile-time.

program accumulate
  implicit none
  integer :: acc
  write(*,*) acc(), acc(), acc() ! prints 1 2 3
end program accumulate

integer function acc()
  implicit none
  integer :: ac = 0
  ac = ac + 1
  acc = ac
end function acc

The same function with the variable reset in the body instead of in the declaration prints 1 1 1:

program accumulate
  implicit none
  integer :: acc
  write(*,*) acc(), acc(), acc() ! prints 1 1 1
end program accumulate

integer function acc()
  implicit none
  integer :: ac
  ac = 0
  ac = ac + 1
  acc = ac
end function acc

3.3 parameter vs. intent(in)

Defining a variable as parameter gives a constant, not an unchanged function argument:

! constants: marked as parameter: not function parameters, but
! algorithm parameters!
character(len=61), parameter :: base60chars = "0123456789"&
     //"ABCDEFGHJKLMNPQRSTUVWXYZ_abcdefghijkmnopqrstuvwxyz"

An argument the function is not allowed to change is defined via intent(in):

! input: ensure that this is purely used as input.
! intent is only useful for function arguments.
integer, intent(in) :: number

3.4 To return values from functions, assign the value to the function itself

This feels surprisingly obvious, but it was surprising to me nonetheless.

numtosxg = "0"
return

The return statement is only needed when returning within a function. At the end of the function it is implied.

  numtosxg = res
end function numtosxg

3.5 Fortran array indices start at 1 - and are inclusive

For an algorithm like the example base60, where 0 is identified by the first character of a string, this requires adding 1 to the index.

! note that fortran indices start at 1, not at 0.
res = base60chars(remainder+1:remainder+1)//trim(res)

Also note that the indices are inclusive. The following actually gets the single letter at index n+1:

base60chars(n+1:n+1)

In Python, on the other hand, the end index of a slice is exclusive, so to get the same result you would use [n:n+1]:

pythonarray[n:n+1]
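
For example (plain Python, just to make the correspondence explicit):

letters = "0123456789"
n = 3
print(letters[n:n+1])  # "3" - a single character, because the end index is exclusive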

3.6 I have to trim strings when concatenating

It is necessary to get rid of the trailing blanks (whitespace) which fill the string from its last character to the end of its declared length, otherwise there will be huge gaps in combined strings - or you will get missing characters.

program test
  character(len=5) :: res
  write(*,*) res ! undefined. In the last run it gave me null-bytes, but
                 ! that is not guaranteed.
  res = "0"
  write(*,*) res ! 0
  res = trim(res)//"a"
  write(*,*) res ! 0a
  res = res//"a"
  write(*,*) res ! 0a: trailing characters are silently removed.
  ! who else expected to see 0aa?
  write(res, '(a, "a")') trim(res) ! without trim, this gives an error!
                                   ! *happy*
  write(*,*) res
end program test

Hint from Alexey: use trim(adjustl(…)) to get rid of whitespace on the left and the right side of the string. Trim only removes trailing blanks.

Author: Arne Babenhauserheide

Emacs 24.3.1 (Org mode 8.0.2)

AnhangGröße
surprises.org8.42 KB
accumulate.f90226 Bytes
accumulate-not.f90231 Bytes
base60-surprises.f901.6 KB
trim.f90501 Bytes
surprises.pdf206.83 KB
surprises.html22.47 KB
base60.f902.79 KB

Tail Call Optimization (TCO), dependency, broken debug builds in C and C++ — and gcc 4.8

TCO: Reducing the algorithmic complexity of recursion.
Debug build: Add overhead to a program to trace errors.
Debug without TCO: Obliterate any possibility of fixing recursion bugs.

“Never develop with optimizations which the debug mode of the compiler of the future maintainer of your code does not use.”°

UPDATE: GCC 4.8 gives us -Og -foptimize-sibling-calls which generates nice-backtraces, and I had a few quite embarrassing errors in my C - thanks to AKF for the catch!

1 Intro

Tail Call Optimization (TCO) makes this

def foo(n):
    print(n)
    return foo(n+1)
foo(1)

behave like this

def foo(n):
    print(n)
    return n+1
n = 1
while True:
    n = foo(n)


I recently told a colleague how neat tail call optimization in scheme is (along with macros, but that is a topic for another day…).

Then I decided to actually test it (being mainly not a schemer but a pythonista - though very impressed by the possibilities of scheme).

So I implemented a very simple recursive function which I could watch to check the Tail Call behaviour. I tested scheme (via guile), python (obviously) and C++ (which proved to provide a surprise).

2 The tests

2.1 Scheme

(define (foo n)
  (display n)
  (newline)
  (foo (1+ n)))

(foo 1)

2.2 Python

def foo(n):
    print n
    return foo(n+1)

foo(1)

2.3 C++

The C++ code needed a bit more work (thanks to AKF for making it less ugly/horrible!):

#include <stdio.h>

int recurse(int n)
{
  printf("%i\n", n);
  return recurse(n+1);
}

int main()
{
  return recurse(1);
}

In addition to the code, I added 4 different ways to build it: standard optimization (-O2), debug (-g), optimized debug (-g -O2), and only slightly optimized (-O1).

all : C2 Cg Cg2 C1

# optimized
C2 : tailcallc.c
    g++ -O2 tailcallc.c -o C2

# debug build
Cg : tailcallc.c
    g++ -g tailcallc.c -o Cg

# optimized debug build
Cg2 : tailcallc.c
    g++ -g -O2 tailcallc.c -o Cg2

# only slightly optimized
C1 : tailcallc.c
    g++ -O1 tailcallc.c -o C1

3 The results

So now, let’s actually check the results. Since I’m interested in tail call optimization, I check the memory consumption of each run. If we have proper tail call optimization, the required memory will stay the same over time, if not, the function stack will get bigger and bigger till the program crashes.

3.1 Scheme

Scheme gives the obvious result. It starts counting numbers and keeps doing so. After 10 seconds it’s at 1.6 million, consuming 1.7 MiB of memory - and never changing the memory consumption.

3.2 Python

Python is no surprise either: it counts to 999 and then dies with the following traceback:

Traceback (most recent call last):
 File "tailcallpython.py", line 6, in <module>
   foo(1)
 File "tailcallpython.py", line 4, in foo
   return foo(n+1)
… repeat about 997 times …
RuntimeError: maximum recursion depth exceeded

Python has an arbitrary limit on recursion which keeps people from using tail calls in algorithms.
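
The limit can be raised with sys.setrecursionlimit, but that only postpones the crash, because every call still consumes a stack frame - a quick sketch (the concrete numbers are arbitrary):

import sys

def foo(n):
    if n >= 10000:
        return n
    return foo(n + 1)

sys.setrecursionlimit(20000)  # postpones the problem, it does not remove it
print(foo(1))  # 10000 - works only because we raised the limit by hand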

3.3 C/C++

C/C++ is a bit trickier.

First let’s see the results for the optimized run:

3.3.1 Optimized

g++ -O2 C.c -o C2
./C2

Interestingly that runs just like the scheme one: After 10s it’s at 800,000 and consumes just 144KiB of memory. And that memory consumption stays stable.

3.3.2 Debug

So, cool! C/C++ has tail call optimization. Let’s write lots of recursive, tail-calling code!

Or so I thought. Then I did the debug run.

g++ -g C.c -o Cg
./Cg 

It starts counting just like the optimized version. Then, after about 5 seconds and counting to about 260,000, it dies with a segmentation fault.

And here’s a capture of its memory consumption while it was still running (thanks to KDEs process monitor):

Private

7228 KB   [stack]
56 KB [heap]
40 KB /usr/lib64/gcc/x86_64-pc-linux-gnu/4.7.2/libstdc++.so.6.0.17
24 KB /lib64/libc-2.15.so
12 KB /home/arne/.emacs.d/private/journal/Cg

Shared

352 KB    /usr/lib64/gcc/x86_64-pc-linux-gnu/4.7.2/libstdc++.so.6.0.17
252 KB    /lib64/libc-2.15.so
108 KB    /lib64/ld-2.15.so
60 KB /lib64/libm-2.15.so
16 KB /usr/lib64/gcc/x86_64-pc-linux-gnu/4.7.2/libgcc_s.so.1

That’s 7 MiB after less than 5 seconds runtime - all of it in the stack, since that has to remember all the recursive function calls when there is no tail call optimization.

So we now have a program which runs just fine when optimized but dies almost instantly when run in debug mode.

But at least we have nice gdb traces for the start:

recurse (n=43) at C.c:5
5         printf("%i\n", n);
43
6         return recurse(n+1);

3.4 Optimized debug build

So, is all lost? Luckily not: We can actually specify optimization with debugging information.

g++ -g -O2 C.c -o Cg2
./Cg2

When doing so, the optimized debug build chugs along just like the optimized build without debugging information. At least that’s true for GCC.

But our debug trace now looks like this:

5         printf("%i\n", n);
printf (__fmt=0x40069c "%i\n") at /usr/include/bits/stdio2.h:105
105       return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ());
5
6         return recurse(n+1);

That’s not so nice, but at least we can debug with tail call optimization. We can also improve on this (thanks to AKF for that hint!): We just need to enable tail call optimization separately:

g++ -g -O1 -foptimize-sibling-calls C.c -o Cgtco
./Cgtco

But this still gives ugly backtraces (if I leave out -O1, it does not do TCO). So let’s turn to GCC 4.8 and use -Og.

g++ -g -Og -foptimize-sibling-calls C.c -o Cgtco
./Cgtco

And we have nice backtraces!

recurse (n=n@entry=1) at C.c:4
4       {
5         printf("%i\n", n);
1
6         return recurse(n+1);
5         printf("%i\n", n);
2
6         return recurse(n+1);

3.5 Slightly optimized

Can we invert the question? Is all well, now?

Actually not…

If we activate minor optimization, we get the same unoptimized behaviour again.

g++ -O1 C.c -o C1
./C1

It counts to about 260,000 and then dies from a stack overflow. And that is pretty bad™, because it means that a programmer cannot trust his code to work when he does not know all the optimization strategies which will be used with his code.

And he has no way to define in his code that it requires TCO to work.

4 Summary

Tail Call Optimization (TCO) turns an operation with a memory requirement of O(N)1 into one with a memory requirement of O(1).

It is a nice tool to reduce the complexity of code, but it is only safe in languages which explicitly require tail call optimization - like Scheme.

And from this we can find a conclusion for compilers:

C/C++ compilers should always use tail call optimization, including debug builds, because otherwise C/C++ programmers should never use that feature, because it can make it impossible to use certain optimization settings in any code which includes their code.

And as a finishing note, I’d like to quote (very loosely) what my colleague told me from some of his real-life debugging experience:

“We run our project on an AIX ibm-supercomputer. We had spotted a problem in optimized runs, so we activated the debugger to trace the bug. But when we activated debug flags, a host of new problems appeared which were not present in optimized runs. We tried to isolate the problems, but they only appeared if we ran the full project. When we told the IBM coders about that, they asked us to provide a simple testcase… The problems likely happened due to some crazy optimizations - in our code or in the compiler.”

So the problem of undebuggable code due to a dependency of the program on optimization changes is not limited to tail call optimization. But TCO is a really nice way to show it :)

Let’s use that to make the statement above more general:

C/C++ compilers should always do those kinds of optimizations which lead to changes in the algorithmic cost of programs.

Or from a pessimistic side:

You should only rely on language features, which are also available in debug mode - and you should never develop your program with optimization turned on.

And by that measure, C/C++ does not have Tail Call Optimization - at least until all mainstream compilers include TCO in their default options. Which is a pretty bleak result after the excitement I felt when I realized that optimizations can actually give C/C++ code the behavior of Tail Call Optimization.

Never develop with optimizations which the debug mode of the compiler of the future maintainer of your code does not use. Or, more generally: Never develop with optimizations which are not required by the language standard.

Note, though, that GCC 4.8 added the -Og option, which improves the debugging a lot (Phoronix wrote about plans for that last september). It still does not include -foptimize-sibling-calls in -Og, but that might be only a matter of time… I hope it is.

Footnotes:

1 : O(1) and O(N) describe the algorithmic cost of an algorithm. If it is O(N), then the cost rises linearly with the size of the problem (N is the size, for example printing 20,000 consecutive numbers). If it is O(1), the cost is stable regardless of the size of the problem.

Top 5 systemd troubles - a strategic view for distros

systemd is a new way to start a Linux-system with the expressed goal of rethinking all of init. These are my top 5 gripes with it. (»skip the updates«)

Update (2014-12-11): One more deconstruction of the strategies around systemd: systemd: Assumptions, Bullying, Consent. It shows that the attitude which forms the root of the dangers of systemd is even visible in its very source code.

Update (2014-11-19): The Debian General Resolution resulted in “We do not need a general resolution to decide systemd”. The vote page provides detailed results and statistics. Ian Jackson resigned from the Technical Committee: “And, speaking personally, I am exhausted.”

Update (2014-10-16): There is now a vote on a General Resolution in Debian for preserving the ability to switch init systems. It is linked under “Are there better solutions […]?” on the site Shall we fork Debian™? :^|.

Update (2014-10-07): Lennart hetzt (German) describes the rhetorical tricks used by Lennart Poettering to make people forget that he is a major part of the communication problems we’re facing at times - and to hide valid technical, practical, pragmatic, political and strategic criticism of systemd.

Update (2014-09-24): boycott systemd calls for action with 12 reasons against systemd: “We do recognize the need for a new init system in the 21st century, but systemd is not it.”

Update (2014-04-03): And now we have Julian Assange warning about NSA control over Debian, Theodore Ts’o, maintainer of ext4, complaining about incomprehensible systemd, and Linus Torvalds (you know him, right?) ranting against disruptive behavior from systemd developers, going as far as refusing to merge anything from the developers in question into Linux. Should I say “I said so”? Maybe not. After all, I came pretty late. Others saw this trend 2 years before I even knew about systemd. Can we really assume that there won’t be intentional disruption? Maybe I should look for solutions. It could be a good idea to start having community-paid developers.

Update (2014-02-18): An email to the mailing list of the technical committee of Debian summarized the strategic implications of systemd adoption for Debian and RedHat. It was called a conspiracy theory right away, but the gains for RedHat are obvious: RedHat would be dumb not to try this. And only a fool trusts a company. Even the best company has to put money before ethics.

Update (2013-11-20): Further reading shows that people have been giving arguments from my list since 2011, and they got answers in the range of “anything short of systemd is dumb”, “this cannot work” (while OpenRC clearly shows that it works well), requests for implementation details without justification and insults and further insults; but the arguments stayed valid for the last 2 years. That does not look like systemd has a friendly community - or is healthy for distributions adopting it. Also an OpenRC developer wrote the best rebuttal of systemd propaganda I read so far: “Alternativlos”: Systemd propaganda (note, though, that I am biased against systemd due to problems I had in the past with udev kernel-dependencies)

  1. Losing Control: systemd does so many crucial things itself that the developers of distributions lose their control over the init process: If systemd developers decide to change something, the distributions might actually have to fork systemd and keep the fork up-to-date, and this requires rare skills and lots of resources (due to the pace of systemd). See the Gentoo eudev-Project for a case where this had to happen so the distribution could keep providing features its users rely on. Systemd nowadays incorporates udev. Go reason how systemd devs will act.1 Why losing control is a bad idea: Strategy Letter V: Commodities
  2. No scripts (as if you could know beforehand all the things the init system will need to do in each distribution). Nowadays any system should be user-extendable to avoid bottlenecks for development. This essentially boils down to providing a scripting language. Using the language which almost every system administrator knows is a very sane choice for that - and means making it possible to use shell scripts to extend the init system. Scripts mean that the distribution will never be in a position where it is blocked because it absolutely can’t provide a given fringe feature. And as the experiment with paludis in Gentoo shows, an implementation in C isn’t magically faster than one in a scripting language and can actually be much slower (just compare paludis to pkgcore), because the execution time of the language is only very rarely the real bottleneck - and you can easily shell out that part to a faster language with negligible time loss,2 especially in shell scripts (pun partially intended). While systemd can be told to run a shell script, this requires a mental context switch, and the script cannot tie into all the machinery inside systemd. If there’s a bug in systemd, you need to fix systemd; if you need more than systemd provides out of the box, you need either a script or you have to patch systemd; and otherwise you write in a completely different language (so most people won’t have the skills to go beyond the fences of the ground defined by the systemd developers as proper for users). Why killing scripts is a bad idea: Bloatware and the 80/20 Myth
  3. Linux-specific3 (are you serious??). This makes the distribution an add-on to the kernel instead of the distribution being a focus point of many different development efforts. This is a second point where distributions become commodities, and as for systemd itself, this is against the interest of the distributions. On the other hand, enabling the use of many different kernels strengthens the Distribution - even if currently only few people are using them. Why being Linux-only is a bad idea for distributions: Strategy Letter V: Commodities
  4. Requiring an up-to-date kernel. This problem already gives me lots of headaches for my OLPC due to udev (from the same people as systemd… which is one of the reasons why I hope that Gentoo-devs will succeed with eudev), since it is not always easy to go to a newer kernel when you’re on a fringe platform (I’m currently fighting with that). An init system should not require some special kernel version just to boot… Why those hard dependencies are a bad idea: Bloatware and the 80/20 Myth AND Strategy Letter V: Commodities
  5. Requiring D-Bus. D-Bus was already broken a few times for me, and losing not just some KDE functionality but instead making my system unbootable is unacceptable. It’s bad enough that so much stuff relies on udev.4

In my understanding, we need more services which can survive without the others, so that the system becomes resilient against failures in any given part. As the system gets more and more complex, this constantly becomes more important: fewer interdependencies, and the services which are crucial to get my system into a debuggable state should be small and simple - and should not require many changes to implement new features.

Having multiple tools to solve the same problem looks like wasted resources, but actually this extends the range of problems which can be solved with our systems and avoids bottlenecks and single points of failure (either tools or communities), so it makes us resilient. Also it encourages standard-formats to minimize the cost of maintaining several systems side-by-side.

You can see how systemd manages to violate all these principles…

This does not mean, that the features provided by systemd are useless. It says that the way they are embedded in systemd with its heavy dependencies is detrimental to a healthy distribution.

Note: I am neither a developer of systemd, nor of upstart, sysvinit or OpenRC. I am just a humble user of distributions, but I can recognize impending horrible fallout when I see it.

References:

I’ll finish this with a quote from 30 myths about systemd:

We try to get rid of many of the more pointless differences of the various distributions in various areas of the core OS. As part of that we sometimes adopt schemes that were previously used by only one of the distributions and push it to a level where it's the default of systemd, trying to gently push everybody towards the same set of basic configuration.
— Lennart Poettering, main developer of systemd

I could not show much clearer why distributions should be very wary about systemd than Lennart Poettering does here in the post where he tries to refute myths about systemd.

PS: I’m definitely biased against systemd, after having some horrifying experiences with kernel-dependencies in udev. Resilience looks different. And I already modified some init scripts to adjust my systems behavior so it better fits my usecase. Now go and call me part of a fringe group which wants to add “pointless differences” to the system. If you force Gentoo devs to issue a warning in the style of “you MUST activate feature X in your kernel, else your system will become unbootable”, this should be a big red flag to you that you’re doing something wrong. If you do that twice, this is a big red flag to users not to trust your software. And regaining that trust requires reestablishing a long record of solid work. Which I do not see at the moment. Also do read Bloatware and the 80/20 Myth (if you didn’t do that by now): It might be true that 80% of the users only use 20% of the features, but they do not use the same 20%.


  1. Update 2014: Actually there is no need to guess how the systemd developers will act: They showed (again) that they will keep breaking systems of their users: “udev now silently fails to do anything useful if devtmpfs is missing, almost as if resilience was a disease” — bonsaikitten, Gentoo developer, 2014-01, long after udev was subsumed into systemd. 

  2. Running a program in a subshell increases the runtime by just six milliseconds. I measured that when testing ways to run GNU Guile modules as scripts. So you have to start almost 100 subshells during bootup to lose half a second of runtime. Note that OpenRC can boot a system and power down again in under 0.7 seconds and the minimal boot-to-login just takes 250 ms. There is no need for systemd to get a faster boot. 

  3. The systemd proponents in the debian initsystem discussion explicitly stated that they don’t want to port systemd to other kernels. 

  4. And D-Bus is slow, slow, slow when your system is under heavy memory and IO-pressure, as my systems tend to be (I’m a Gentoo user. I often compile a new version of all KDE-components or of Firefox while I do regular work on the computer). From dbus I’m used to reaction times up to several seconds… 

Weltenwald-theme under AGPL (Drupal)

After the last round of polishing, I decided to publish my theme under AGPLv3. Reason: If you use AGPL code and people access it over a network, you have to offer them the code. Which I hereby do ;)
That’s the only way to make sure that website code stays free.

It’s still for Drupal 5, because I didn’t get around to porting it, and it has some ugly hacks, but it should be fully functional.

Just untar it in any Drupal 5 install.

tar xjf weltenwald-theme-2010-08-05_r1.tar.bz2

Maybe I’ll get around to properly package it in the future…

Until then, feel free to do so yourself :)

And should I change the theme without posting a new layout here, just drop me a line and I’ll upload a new version — as required by AGPL. And should you have some problem, or if something should be missing, please drop me a line, too.

No screenshot, because a live version kicks a screenshot any day ;)
(in case it isn’t clear: Weltenwald is the theme I use on this site)

AnhangGröße
weltenwald-theme-2010-08-05_r1.tar.bz2877.74 KB

Why Gnutella scales quite well

You might have read in some (almost ancient) papers that a network like Gnutella can't scale. So I want to show you why the current version of Gnutella does scale, and scales well.

In earlier versions, up to v0.4, Gnutella was a pure broadcast network. That means that every search request reached every participant, so in an optimal network the number of search requests hitting each node was exactly equal to the number of requests made by all nodes in the network. It is easy to see why that can't scale.
But that was only true for Gnutella 0.4.

In the current incarnation of Gnutella (Gnutella 0.6), Gnutella is no longer a pure broadcast network. Instead, only a small percentage of the traffic is done via broadcast.

If you want to read about the methods used to realize this, please have a look at the GnuFU guide (english, german).

Here I want to limit it to the statement that the first two hops of a search request are governed via Dynamic Querying, which stops the request as soon as it has enough sources (this stops a search as soon as it gets about 250 results), and that the last two hops are governed via the Query Routing Protocol, which ensures that a search request reaches only those hosts which can actually have the file (only about 5% of the nodes).
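
To make the idea behind the Query Routing Protocol tangible, here is a much-simplified sketch in Python of keyword-based query routing (real QRP exchanges compressed hash tables rather than plain keyword sets; the leaf names and keywords are made up):

# each leaf announces the keywords of the files it shares to its ultrapeer
leaf_keywords = {
    "leaf-1": {"free", "software", "song"},
    "leaf-2": {"holiday", "photos"},
}

def route_query(query):
    """Forward the query only to leaves that could have all its keywords."""
    words = set(query.lower().split())
    return [leaf for leaf, keywords in leaf_keywords.items()
            if words <= keywords]

print(route_query("free song"))  # ['leaf-1'] - leaf-2 never sees the query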

So in todays reality, Gnutella is a quite structured and very flexible network.

To scale it, ultrapeers can increase their number of connections from the current 32 upwards, which makes Dynamic Querying (DQ) and the Query Routing Protocol (QRP) even more effective.

In the case of DQ, most queries for popular files will still provide enough results after the same number of clients have been contacted, so increasing the number of connections won't change the network traffic caused by the first two steps at all.

In the case of QRP, queries will still only reach the hosts which can have the file, and if ultrapeers are connected to more nodes at the same time (by increasing the number of connections), each connection will provide more results, so DQ will stop even earlier than with fewer connections per ultrapeer.

So Gnutella is now far from a broadcast model, and the act of increasing the size of the Gnutella Network can even increase its efficiency for popular files.

For rare files, QRP kicks in with full force, and even though DQ will likely check all other nodes for content, QRP will make sure that only those nodes are reached which can have the content - which might be only 0.1% of the net or even far less.

Here, increasing the number of nodes per ultrapeer means that nodes with rare files are in effect closer to you than before, so Gnutella also gets more efficient as the network grows, if rare file searches are your major concern.

So you can see that Gnutella has become a network which scales extremely well for keyword searches, and due to that it can also be used very efficiently to search for metadata and similar concepts.

The only thing Gnutella can't do well are searches for strings which aren't separate words (for example file hashes), because those kill QRP, so they will likely not reach (m)any hosts. For these types of searches, the Gnutella developers are working on a DHT (Distributed Hash Table), which will only be used if the string can't be split into separate words. That DHT will most likely be Kademlia, which is also proven to work quite well.

And with that, the only problem which remains in need of fixing is spam, because it inhibits DQ when you do a rare search. But I am sure that the devs will also find a way to stop spamming, and even with spam, Gnutella is quite effective and consumes very little bandwidth when you are acting as a leaf, and only moderate bandwidth when you are acting as an ultrapeer.

Some figures as finishing touch:

  • Leaf network traffic: About 1 kB/s if you add outgoing and incoming traffic, which is about a seventh of the speed of a 56k modem.
  • Ultrapeer traffic: About 7 kB/s, outgoing and incoming added together, which is about one full ISDN line or less than 1/8th of a DSL line's outgoing speed.

Have fun with Gnutella!
- ArneBab 08:14, 15. Nov 2006 (CET)

PS: This guide ignores that requests must travel through intermediate nodes. But since those nodes make up only about 3% of the network and only 3% of those nodes will be reached by a (QRP-routed) rare file request, it seems safe to ignore these 0.1% of the network in the calculations, for the sake of making them easier to follow mentally (QRP takes care of that).

Write programs you can still hack when you feel dumb

"Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it." --Brian Kernighan

I just read the post Hyperfocus and balance of Arc Riley from PySoy who talks about trying to get to the Hyperfocus state without endangering his health. Since I have similar needs, I am developing some strategies for that myself (though not for my health, but because my wife and children can’t be expected to let me work 8h without any interruptions in my free time).

Different from Arc, I try to change my programming habits instead of changing myself to fit to the requirements of my habits.1

Easy times

Let’s begin with Programming while you feel great.

The guideline I learned from writing PnP roleplaying games is to keep the number of things you need to know at any given point below 7 (well, the actual limit for average humans is 4 objects!). For a function of code I would translate that as follows:

  1. You need to keep in mind the function you work in (location), and
  2. the task it should perform (purpose and effect), and
  3. the resources it uses (arguments or global values/class attributes).

Only 4 things are left for the code of your function (three if you use both class attributes/global values and function arguments; two if you have complex custom data structures with peculiar names or access methods which you have to understand to do anything; one if you also have to remember the commands of an unfamiliar editor or VCS tool - see how fast this approaches zero, even starting from 7 things?).

Add an if-switch, for-loop or similar and you have only 3 things left.

You need those for what the function should actually do, so better put further complexities into subfunctions.

Also ensure that each of the things you work with is easy enough. If you get the things you use down to 7 by writing functions with 20 arguments, you don’t win anything. Just the resources you could use in the function will blow your mind when you try to change the function a few months later. This goes for every part of your program: The number of functions, the number of function arguments, the number of variables, the lines of code per function and even the number of hierarchy levels you use to reduce the other things you need to keep in mind at any given time.

Hard times

But if you want to be able to hack that code while you feel dumb (compared to those streaks of genius when you can actually hold the whole structure of your program in your head and forsee every effect of a given change before actually doing it), you need to make sure that you don’t have to take all 7 things into account.

Tune it down for the times when you feel dumb by starting with 5 things.2 After substracting one for the location, for the task and for the resources, you are left with only two things:

Two things for your function. Some Logic and calling stuff are 2 things.

If it is an if-switch, let it be just an if-switch calling other functions (see the sketch below). Yes, it may feel much easier to do it directly here, when you are fully embedded in your code and feel great, but it will bite you when you are down - which is exactly when you won't want to be bitten by your own code.
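
A toy Python sketch of that rule - the branch itself stays trivial and each case is just a call to a small helper (all names here are made up for illustration):

def handle_message(message):
    """Only the branching lives here; each case is delegated to its own function."""
    if message.startswith("!"):
        return handle_command(message)
    if message.strip() == "":
        return handle_empty()
    return handle_text(message)

def handle_command(message):
    return "command: " + message[1:]

def handle_empty():
    return "nothing to do"

def handle_text(message):
    return "text: " + message

print(handle_message("!quit"))  # command: quit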

Loose coupling and tight cohesion

Programming is a constant battle against complexity. Stumble from the sweet spot of your program into any direction, and complexity raises its ugly head. But finding the sweet spot requires constant vigilance, as it shifts with the size and structure of your program and your development group.

To find a practical way of achieving this, Django’s concept of loose coupling and tight cohesion (more detailed) helped me most, because it reduces the interdependencies.

The effects of any given change should be contained in the part of the code you work in - and in one type of code.

As a web framework, Django separates the templates, the URI definitions, the program code and the database access from each other (see how these are already 4 categories, hitting the limit of our mind again?).

For a game, on the other hand, you might want to separate story, game logic, presentation (what you see on the screen) and input/user actions. Also, people who write a scenario or level should only have to work in one type of code, neatly confined in one file or a small set of files which reside in the same place.

And for a scientific program, data input, task definition, processing and data output might be separated.

Remember that this separation does not only mean that you put those parts of the code into different files, but that they are loosely coupled:

They only use lean and clearly defined interfaces and don’t need to know much about each other.
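
A tiny Python sketch of what such a lean interface can look like for the game example (the functions and the position tuple are made up for illustration):

# game logic: knows nothing about how things are drawn
def move_player(position, direction):
    dx, dy = {"north": (0, 1), "south": (0, -1),
              "east": (1, 0), "west": (-1, 0)}[direction]
    return position[0] + dx, position[1] + dy

# presentation: only needs the position, not the rules that produced it
def draw(position):
    print("player is now at %s" % (position,))

# the only coupling between the two parts is the position value
position = (0, 0)
for step in ["north", "north", "east"]:
    position = move_player(position, step)
    draw(position)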

Conclusions

This strategy does not only make your program easier to adapt (because the parts you need to change to implement a given feature are smaller). If you apply it not only to the bigger structure, but to every part of the program, its main advantage is that any part of the code can be understood without having to understand other parts.

And you can still understand and hack your code when your child is sick, your wife is overworked, you slept 3 hours the night before - and you can only work for half an hour straight, because it’s evening and you don’t want to be a creep (but this change has to be finished nonetheless).

Note that finding a design which accomplishes this is far more complex than it sounds. If people can read your code and say “oh, that’s easy. I can hack that” (and manage to do so), then you did it right.

Designing a simple structure to solve a complex task is far harder than designing a complex structure to solve that task.

And being able to hack your program while you feel dumb (and maybe even hold it in your head) is worth investing some of your genius-time into your design (and repeating that whenever your code grows too hairy).


  1. Where I got bitten badly by my high-performance coding habits is the keyboard layout evolution program. I did not catch my error when the structure grew too complex (while adding stuff), and now that I do not have as much uninterrupted time as before, I cannot actually work on it efficiently anymore. I’m glad that this happened with a mostly finished project on whose evolution no one’s future depended. Still it is sad that this will keep me from turning it into a realtime visual layout optimizer. I can still work on its existing functionality (I kept improving it for the most important task: the cost calculation), but adding new functionality is a huge pain.

  2. See how I actually don’t get below 5 here? A good TODO list which shows you the task so you can forget it while coding might get you down to 4. But don’t bet on it. Not knowing where you are or where you want to go is a recipe for disaster… And if you make your functions too small, the collection of functions gets more complex, or the object hierarchy too deep, adding complexity at other places. Well, no one said creating well-structured programs would be easy. You need to find the right compromise for you.

Your browser history can be sniffed with just 64 lines of Python (tested with Firefox 3.5.3)

After the example of making-the-web, I was quite intrigued by the ease of sniffing the history via simple CSS tricks.

- Firefox Bug report - still open!
- Start Panic! - a site dedicated to spreading the news about the vulnerability.
- What the internet knows about you - easily sniff yourself.
- Cute kitten - look at cute kittens. Does this look suspicious? :)

So I decided to test how small I could get a Python program which sniffs the history via CSS - without requiring any scripting ability on the browser side.

I first produced fully commented code (see server.py) and then stripped it down to just 64 lines (server-stripped.py), to make it crystal clear that making your browser vulnerable to this exploit is a damn bad idea. I hope this will help get Firefox fixed quickly.
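
The core of the trick is ordinary CSS, roughly like what this little Python fragment generates (a simplified sketch, not the actual server.py; the /visited endpoint and the element ids are made up):

def sniffing_page(urls):
    """Build a page whose CSS makes the browser request /visited?url=...
    only for links that are in its history."""
    css = []
    links = []
    for index, url in enumerate(urls):
        css.append('#link-%d:visited { background: url("/visited?url=%s"); }'
                   % (index, url))
        links.append('<a id="link-%d" href="%s">%s</a>' % (index, url, url))
    return ("<html><head><style>%s</style></head><body>%s</body></html>"
            % ("\n".join(css), "\n".join(links)))

print(sniffing_page(["http://example.com", "http://blubber.blau"]))

The server then only has to log which of the /visited?url=… images the browser actually requests.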

If you see http://blubber.blau as found, you're safe. If you don't see any links as found, you're likely to be safe. In any other case, everyone on the web can grab your history - given enough time (a few minutes) or enough iframes (which check your history in parallel). This doesn't use Javascript.

It currently only checks for the 1000 or so most visited websites and doesn't keep any logs in files (all info is in memory and wiped on every restart), since I don't really want to create a full fledged history ripper but rather show how easy it would be to create one.

Besides: It does not need to be run in an iframe. Any Python-powered site could just run this test as a regular part of the site while you browse it (and wonder why your browser has so much to do for a simple site, but since we’re already used to high load due to Javascript, who is going to care?). So don’t feel safe just because there are no iframes. To feel and be safe, use one of the solutions from What the Internet knows about you.
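To make the mechanism concrete, here is a minimal sketch of the same idea (this is not the original server.py; the port, the site list and the /hit path are made-up examples). It serves a page whose :visited rules load a per-link “background image”, and the requests for those images tell the server which links your browser has visited - no Javascript needed. As noted below, Konqueror and Firefox 4 already seem to be immune.

#!/usr/bin/env python3
# minimal sketch of CSS-based history sniffing (illustration only)
from http.server import BaseHTTPRequestHandler, HTTPServer

SITES = ["http://www.example.com/", "http://blubber.blau/"]  # links to test
found = set()  # sites the browser revealed as visited (memory only, no logs)

page = "<html><head><style>\n"
for i, site in enumerate(SITES):
    # only a :visited link loads its background image, so the request leaks the visit
    page += "#l%d:visited { background-image: url('/hit?id=%d'); }\n" % (i, i)
page += "</style></head><body>\n"
for i, site in enumerate(SITES):
    page += '<a id="l%d" href="%s">%s</a><br/>\n' % (i, site, site)
page += "</body></html>"

class Sniffer(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path.startswith("/hit?id="):
            found.add(SITES[int(self.path.split("=", 1)[1])])
            print("visited so far:", found)
            self.send_response(204)  # the request itself is the leak, no body needed
            self.end_headers()
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(page.encode("utf-8"))

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), Sniffer).serve_forever()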

Konqueror seems to be immune: It also (pre-)loads the "visited"-images from unvisited links, so every page is seen as visited - which is the only way to avoid spreading my history around the web while still providing "visited" image-hints in the browser!

Firefox 4.0.1 seems to be immune, too: It does not show any :visited-images, so the server does not get any requests.

So please don't let your browser load anything depending on the :visited state of a link tag! It shouldn't load anything based on internal information, because that always publicizes private information - and you don't know who will read it!

In short: Don't keep repeating Ennesby's Mistake:

  • Mistake: http://www.schlockmercenary.com/d/20071201.html

  • Effects: http://www.schlockmercenary.com/d/20071206.html

(comic strips not hosted here and not free licensed → copyright: Howard V. Tayler)

And to the Firefox developers: Please remove the optimization of only loading the required CSS data based on the visited info! I already said so in a bug report, and since the bug isn't fixed, this is my way to put a bit of weight behind it. Please stop putting your users' privacy at risk.

Usage:

  • python server.py
    start the server at port 8000. You can now point your browser to http://127.0.0.1:8000 to get sniffed :)

To get more info, just use ./server.py --help.

complex number compiler and libc bugs (cexp+conj) on OSX and with the intel compiler (icc)

Today a bug in complex number handling surfaced in guile which only appeared on OSX.

This is a short note just to make sure that the bug is reported somewhere.

Test-code (written mostly by Mark Weaver who also analyzed the bug - I only ran the code on a few platforms I happened to have access to):

// test.c
// compile with gcc -O0 -o test test.c -lm
// or with icc -O0 -o test test.c -lm
#include <complex.h>
#include <stdio.h>

int
main (int argc, char **argv)
{
  double complex z = conj (1.0);
  double complex result;

  if (argc == 1)
    z = conj (0.0);

  result = cexp (z);

  printf ("cexp (%f + %f i) => %f + %f i\n",
          creal (z), cimag (z), creal (result), cimag (result));
  result = conj(result);
  printf ("conj(cexp (%f + %f i)) => %f + %f i\n",
          creal (z), cimag (z), creal (result), cimag (result));

  return 0;
}

According to the C11 standard (pages 561 and 216), this should return:

cexp (0.000000 + -0.000000 i) => 1.000000 + -0.000000 i

conj(cexp (0.000000 + -0.000000 i)) => 1.000000 + 0.000000 i

Page 561:

— cexp(conj(z)) = conj(cexp(z)).

Page 216:

The conj functions compute the complex conjugate of z, by reversing the sign of its imaginary part.

On OSX it returns (compiled with GCC):

TODO: Check the second line!

cexp (0.000000 + -0.000000 i) => 1.000000 + 0.000000 i

With the intel compiler it returns:

cexp (0.000000 + 0.000000 i) => 1.000000 + 0.000000 i

conj(cexp (0.000000 + 0.000000 i)) => 1.000000 + 0.000000 i

In short: On OSX cexp seems broken. With the intel compiler conj seems broken.

icc --version
# => icc (ICC) 13.1.3 20130607
# => Copyright (C) 1985-2013 Intel Corporation.  All rights reserved.

The OSX compiler is GCC 4.8.2 from MacPorts.
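As a side note (not part of the original report): the expected signed-zero behaviour can also be illustrated from Python. CPython computes the complex exponential itself instead of calling the C library's cexp, so this does not exercise the platform libm, but it shows what the standard's identity looks like in practice:

# illustration only, not part of the bug report
import cmath

z = complex(0.0, -0.0)   # the conj(0.0) case from the C test above
r = cmath.exp(z)         # e^(0 - 0i) should keep the negative zero: (1-0j)
print(r, r.conjugate())  # expected: (1-0j) (1+0j)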


[taylanub] ArneBab: You might want to add that compiler optimizations can result in cexp() calls where there are none (which is how this bug surfaced in our case).

[mark_weaver] cexp(z) = e^z = e^(a+bi) = e^a * e^(bi) = e^a * (cos(b) + i*sin(b))

[mark_weaver] for real 'b', e^(bi) is a point on the unit circle on the complex plane.

[mark_weaver] so cexp(bi) can be used to compute cos(b) and sin(b) simultaneously, and probably faster than calling 'sin' and 'cos' separately.

pyRad - a wheel type command interface for KDE

Arrrrrr! Ye be replacin' th' walk th' plank alt-tab wi' th' keelhaulin' pirate wheel, matey! — Lacrocivious

pyRad is a wheel type command interface for KDE1, designed to appear below your mouse pointer at a gesture.

install | setup | usage and screenshots | download and sources

pyRad command wheel

Install

in any distro

  • Get Python.
  • call easy_install pyRadKDE in any shell.
  • Test it by calling pyrad.py.
  • This should automatically pull in pyKDE4. If it doesn’t, you need to install that separately.
  • Visual icon selection requires the kdialog program (a standard part of KDE).

  • For a "live" version, just clone the pyrad Mercurial repo and let KDE run "path/to/repo/pyrad.py" at startup. You can stop a running pyrad via pyrad.py --quit. pyrad.py --help gives usage instructions.

In Gentoo

  • emerge -a kde-misc/pyrad

In unfree systems (like MacOSX and Windows)

  • I have no clue since I don’t use them. You’ll need to find out yourself or install a free system. Examples are Kubuntu for beginners and Gentoo for convenient tinkering. Both run GNU/Linux.

Setup

  • Run /usr/bin/pyrad.py. Then add it as script to your autostart (systemsettings→advanced→autostart). You can now use Alt-F6 and Meta-F6 to call it.

Mouse gesture (optional)

  • Add the mouse gesture in systemsettings (systemsettings→shortcuts) to call D-Bus: Program: org.kde.pyRad ; Object: /MainApplication ; Function: newInstance (you might have to enable gestures in the settings, too - in the shortcuts-window you should find a settings button).

  • Alternatively set the gesture to call the command dbus-send --type=method_call --dest=org.kde.pyRad /MainApplication org.kde.KUniqueApplication.newInstance.

Customize the wheel

Customize the menu by editing the file "$HOME/.pyradrc" or middle-clicking (add) and right-clicking (edit) items.

Usage and screenshots

To call pyRad and see the command wheel, you simply use the gesture or key you assigned.

pyRad command wheel

Then you can activate an action with a single left click. Actions can be grouped into folders. To open a folder, you also simply left-click it.

You can also press the keyboard key shown at the beginning of the tooltip to activate an action (hover the mouse over an icon to see the tooltip).

To make the wheel disappear or leave a folder, click the center or hit the key 0. To just make it disappear, hit escape.

To edit an action, just right-click it, and you’ll see the edit dialog.

pyRad edit dialog

Each item has an icon (either an icon name from KDE or the path to an icon) and an action. The action is simply the command you would call in the shell (only simple commands, though, no real shell scripting or glob).

To add a new action, simply middle-click the action before it. The wheel goes clockwise, with the first item being at the bottom. To add a new first item, middle-click the center.

To add a new folder (or turn an item into a folder), simply click on the folder button, say OK and then click it to add actions in there.

See it in action:

pyRad in action (screenshot)

download and sources

pyRad is available from

PS: The name is a play on ‘python’, ‘Rad’ (German for wheel) and pirate :-)

PPS: KDE, K Desktop Environment and the KDE Logo are trademarks of KDE e.V.

PPPS: License is GPL+ as with almost everything on this site.


  1. powered by KDE 


pyRad is now in Gentoo portage! *happy*

My wheel type command interface pyRad just got included in the official Gentoo portage-tree!

So now you can install it in Gentoo with a simple emerge kde-misc/pyrad.

pyRad command wheel

Many thanks go to the maintainer Andreas K. Hüttel (dilfridge) and to jokey and Tommy[D] from the Gentoo sunrise project (wiki) for providing their user-overlay and helping users with creating ebuilds, as well as to Arfrever, neurogeek and floppym from the Gentoo Python herd for helping me clean up the ebuild and convert it to EAPI 3!

shell basics (bash)

These are the notes to a short tutorial I gave to my working group as part of our groundwork group meetings. Some parts here require GNU Bash.

1 Outline

1.1 Outline

  • user-output: echo
  • pipes: |, xargs, - (often stdin)
  • text-processing: cat/tac, sed, grep, cut, head/tail
  • variables (foo=1; echo ${foo})
  • subshell: $(command)
  • loops (for; do; done) (while; do; done)
  • conditionals (if; then; fi)
  • scripts: shebang
  • return values: $?
  • script-arguments: $1, $#, $@ and getopt
  • command chaining: ;, &, && and ||
  • functions and function-arguments
  • math: $((1+2))
  • help: man and info

2 Notes

2.1 user-output

echo "foobar"
echo foobar
echo echo # second echo not executed but printed!

2.2 Pipes

  • basic way of passing info between programs
echo foobar | xargs echo
# same output as
echo foobar
echo foo > test.txt # pipe into file, replacing the content
echo bar >> test.txt # append to file
# warning: 
cat test.txt > test.txt # the redirection truncates test.txt before cat reads it: you get an empty file!

2.3 text-processing

echo foobar | sed s/foo.*/foo/ | xargs echo
# same output as 
echo foo
echo foo | grep bar # empty
echo foobar | grep oba # foobar, oba highlighted

2.4 Variables

foo=1 # no spaces around the equal sign!
echo ${foo} # "$foo" == "1", "$foobar" == "", "${foo}bar" == "1bar"

2.5 Subshells

echo $(echo foobar)
# equivalent to
echo foobar | xargs echo

2.6 loops

for i in a b c; do 
    echo $i
done
# ; can replace a linebreak
for i in a b c; do echo $i; done
for i in {1..5}; do # 1 2 3 4 5
    echo $i
done
while true; do 
    break; 
done
# break: stop
# continue: start the loop again

2.7 Quoting

foo=1
echo "${foo}" # 1
echo '${foo}' # ${foo} <- literal string
for i in "a b c"; do # quoted: one argument
    echo ${i}; 
done 
# => a b c
for i in a b c; do # unquoted: whitespace is separator!
    echo ${i}; 
done 
# a
# b
# c

2.8 conditionals

# string equality
a="foo"
b="bar"
if [[ x"${a}" == x"${b}" ]] ; then
    echo a
else
    echo b
fi
# other tests
if test -z ""; then 
    echo empty
fi
if [ -z "" ]; then
    echo same check
fi
if [ ! -z "not empty" ]; then
    echo inverse check
fi
if test ! -z "not empty"; then
    echo inverse check with test
fi
if test 5 -ge 2; then
    echo 5 is greater or equal 2
fi

Also check test 1 -eq 1, and see info test.

2.9 scripts: shebang/hashbang

#!/usr/bin/env bash
echo "Hello World"
# save the two lines above as hello.sh, then make it executable and run it:
chmod +x hello.sh
./hello.sh

2.10 Scripts: return value

echo 1
echo $? # 0: success
grep 1 /dev/null # fails
echo $? # 1: failure
exit 0 # exit a script with success value (no further processing of the script)
exit 1 # exit with failure (anything but 0 is a failure)

2.11 define shell arguments with getopt

# info about this script
version="shell option parsing example 0.1"
# check for the kind of getopt
getopt -T > /dev/null
if [ $? -eq 4 ]; then
    # GNU enhanced getopt is available
    eval set -- `getopt --name $(basename $0) --long help,verbose,version,output: --options hvo: -- "$@"`
else
    # Original getopt is available
    eval set -- `getopt hvo: "$@"`
fi

# # actually parse the options
# PROGNAME=`basename $0`
# ARGS=`getopt --name "$PROGNAME" --long help,verbose,version,output: --options hvo: -- "$@"`
# if [ $? -ne 0 ]; then
#   exit 1
# fi
# eval set -- $ARGS

# default options
HELP=no
VERBOSE=no
VERSION=no
OUTPUT=no

# check, if the default wisp exists and can be executed. If not, fall
# back to wisp.py (which might be in PATH).
if [ ! -x $WISP ]; then
    WISP="wisp.py"
fi

while [ $# -gt 0 ]; do
    case "$1" in
        -h | --help)        HELP=yes;;
        -o | --output)      OUTPUT="$2"; shift;;
        -v | --verbose)     VERBOSE=yes;;
        --version)          VERSION=yes;;
        --)              shift; break;;
    esac
    shift
done
# all other arguments stay in $@
<<using-options>>

2.12 act on options

# Provide help output

if [[ $HELP == "yes" ]]; then
    echo "$0 [-h] [-v] [-o FILE] [- | filename]
        Show commandline option parsing.

        -h | --help)        This help output.
        -o | --output)      Save the executed wisp code to this file.
        -v | --verbose)     Provide verbose output.
        --version)          Print the version string of this script.
"
    exit 0
fi

if [[ x"$VERSION" == x"yes" ]]; then
    echo "$version"
    exit 0 # script ends here
fi

if [[ ! x"$OUTPUT" == x"no" ]]; then
    echo writing to $OUTPUT
fi

# just output all other arguments
if [ $# -gt 0 ]; then
    echo $@
fi

2.13 default help output formatting

prog [OPTIONAL_FLAG] [OPTIONAL_ARGUMENT VALUE] REQUIRED_ARGUMENT...
# ... means that you can specify something multiple times
# short and long options
prog [-h | --help] [-v | --verbose] [--version] [-f FILE | --file FILE] 
# concatenated short options
hg help [-ec] [THEMA] # hg help -e -c == -ec

2.14 Common parameters for commands

prog --help # provide help output. Often also -h
prog --version # version of the program. Often also -v
prog --verbose # often to give more detailed information. Also --debug

These parameters follow convention and the minimal GNU coding standards.

2.15 Command chaining

echo 1 ; echo 2 ; echo 3 # sequential
echo 1 & echo 2 & echo 3 # backgrounding: possibly parallel

grep foo test.txt && echo foo is in test.txt # conditional: Only if grep is successful
grep foo test.txt || echo foo is not in test.txt # conditional: on failure

2.16 Math (bash-builtin)

echo $((1+2)) # 3
a=2
b=3
echo $((a*b)) # 6
echo $((a**$(echo 3))) # 8

2.17 help

man [command]
info [topic]
info [topic subtopic]
# emacs: C-h i

more convenient info:

function i()
{
    if [[ "$1" == "info" ]]; then
        info --usage -f info-stnd
    else
        # check for usage from fast info, if that fails check man and if that also fails, just get the regular info page.
        info --usage -f "$@" 2>/dev/null || man "$@" || info "$@"
    fi
}

turn files with wikipedia syntax to html (simple python script using mediawiki api)

I needed to convert a huge batch of mediawiki files to html (I had a 2010-03 copy of the now dead limewire wiki lying around). With a tip from RoanKattouw in #mediawiki@freenode.net I created a simple Python script to convert arbitrary files from mediawiki syntax to html.

Usage:

  • Download the script and install the dependencies (yaml and python 3).
  • ./parse_wikipedia_files_to_html.py <files>

This script is not written for speed or anything like that (do you know how slow a web request is compared to even horribly inefficient code? …): The only optimization is for programming convenience - the advantage being that it’s just 47 lines of code :)

It also isn’t perfect: it breaks at some pages (and informs you about that).

It requires yaml and Python 3.x.

#!/usr/bin/env python3

"""Simply turn all input files to html. 
No errorchecking, so keep backups. 
It uses the mediawiki webapi, 
so you need to be online.

Copyright: 2010 © Arne Babenhauserheide
License: You can use this under the GPLv3 or later, 
         if you add the appropriate license files
         → http://gnu.org/licenses/gpl.html
"""

from urllib.request import urlopen
from urllib.parse import quote
from urllib.error import HTTPError, URLError
from time import sleep
from random import random
from yaml import load
from sys import argv

mediawiki_files = argv[1:]

def wikitext_to_html(text):
    """parse text in mediawiki markup to html."""
    url = "http://en.wikipedia.org/w/api.php?action=parse&format=yaml&text=" + quote(text, safe="") + " "
    f = urlopen(url)
    y = f.read()
    f.close()
    text = load(y)["parse"]["text"]["*"]
    return text

for mf in mediawiki_files:
    with open(mf) as f:
        text = f.read()
    HTML_HEADER = "<html><head><title>" + mf + "</title></head><body>"
    HTML_FOOTER = "</body></html>"
    try: 
        text = wikitext_to_html(text)
        with open(mf, "w") as f:
            f.write(HTML_HEADER)
            f.write(text)
            f.write(HTML_FOOTER)
    except HTTPError:
        print("Error converting file", mf)
    except URLError:
        print("Server doesn’t like us :(", mf)
        sleep(10*random())
    # add a random wait, so the api server doesn’t kick us
    sleep(3*random())

wisp: Whitespace to Lisp

» I love the syntax of Python, but crave the simplicity and power of Lisp.«

display "Hello World!" ↦  (display "Hello World!")
define : factorial n      (define (factorial n)            
    if : zero? n       ↦      (if (zero? n)                
       . n                       n                      
      * n : factorial {n - 1}    (* n (factorial {n - 1}))))

Wisp basics

Update (2015-04-10): wisp v0.8.3 released with line information in backtraces. For more info, see the NEWS file. To test it, install Guile 2.0.x or 2.2.x and Python 3 and bootstrap wisp:
wget https://bitbucket.org/ArneBab/wisp/downloads/wisp-0.8.3.tar.gz;
tar xf wisp-0.8.3.tar.gz ; cd wisp-0.8.3/;
./configure; make check;
guile -L . --language=wisp tests/factorial.w; echo
If it prints 120120 (two times 120, the factorial of 5), your wisp is fully operational.
That’s it - have fun with wisp syntax!
Update (2015-03-18): wisp v0.8.2 released with reader bugfixes, new examples and an updated draft for SRFI 119 (wisp). For more info, see the NEWS file. To test it, install Guile 2.0.x or 2.2.x and Python 3 and bootstrap wisp:
wget https://bitbucket.org/ArneBab/wisp/downloads/wisp-0.8.2.tar.gz;
tar xf wisp-0.8.2.tar.gz ; cd wisp-0.8.2/;
./configure; make check;
guile -L . --language=wisp tests/factorial.w; echo
If it prints 120120 (two times 120, the factorial of 5), your wisp is fully operational.
That’s it - have fun with wisp syntax!
Update (2015-02-03): The wisp SRFI just got into draft state: SRFI-119 — on its way to an official Scheme Request For Implementation!
Update (2014-11-19): wisp v0.8.1 released with reader bugfixes. To test it, install Guile 2.0.x and Python 3 and bootstrap wisp:
wget https://bitbucket.org/ArneBab/wisp/downloads/wisp-0.8.1.tar.gz;
tar xf wisp-0.8.1.tar.gz ; cd wisp-0.8.1/;
./configure; make check;
guile -L . --language=wisp tests/factorial.w; echo
If it prints 120120 (two times 120, the factorial of 5), your wisp is fully operational.
That’s it - have fun with wisp syntax!
Update (2014-11-06): wisp v0.8.0 released! The new parser now passes the testsuite and wisp files can be executed directly. For more details, see the NEWS file. To test it, install Guile 2.0.x and bootstrap wisp:
wget https://bitbucket.org/ArneBab/wisp/downloads/wisp-0.8.0.tar.gz;
tar xf wisp-0.8.0.tar.gz ; cd wisp-0.8.0/;
./configure; make check;
guile -L . --language=wisp tests/factorial.w;
echo
If it prints 120120 (two times 120, the factorial of 5), your wisp is fully operational.
That’s it - have fun with wisp syntax!
On a personal note: It’s mind-boggling that I could get this far! This is actually a fully bootstrapped indentation sensitive programming language with all the power of Scheme underneath, and it’s a one-person when-my-wife-and-children-sleep side project. The extensibility of Guile is awesome!
Update (2014-10-17): wisp v0.6.6 has a new implementation of the parser which now uses the scheme read function. `wisp-scheme.w` parses directly to a scheme syntax-tree instead of a scheme file to be more suitable to an SRFI. For more details, see the NEWS file. To test it, install Guile 2.0.x and bootstrap wisp:
wget https://bitbucket.org/ArneBab/wisp/downloads/wisp-0.6.6.tar.gz;
tar xf wisp-0.6.6.tar.gz; cd wisp-0.6.6;
./configure; make;
guile -L . --language=wisp
That’s it - have fun with wisp syntax at the REPL!
Caveat: It does not support the ' prefix yet (syntax point 4).
Update (2014-01-04): Resolved the name-clash together with Steve Purcell and Kris Jenkins: the javascript wisp-mode was renamed to wispjs-mode and wisp.el is called wisp-mode 0.1.5 again. It provides syntax highlighting for Emacs and minimal indentation support via tab. You can install it with `M-x package-install wisp-mode`
Update (2014-01-03): wisp-mode.el was renamed to wisp 0.1.4 to avoid a name clash with wisp-mode for the javascript-based wisp.
Update (2013-09-13): Wisp now has a REPL! Thanks go to GNU Guile and especially Mark Weaver, who guided me through the process (along with nalaginrut who answered my first clueless questions…).
To test the REPL, get the current code snapshot, unpack it, run ./bootstrap.sh, start guile with $ guile -L . (requires guile 2.x) and enter ,language wisp.
Example usage:
display "Hello World!\n"
then hit enter thrice.
Voilà, you have wisp at the REPL!
Caveat: the wisp-parser is still experimental and contains known bugs. Use it for testing, but please do not rely on it for important stuff yet.
Update (2013-09-10): wisp-guile.w can now parse itself! Bootstrapping: The magical feeling of seeing a language (dialect) grow up to live by itself: python3 wisp.py wisp-guile.w > 1 && guile 1 wisp-guile.w > 2 && guile 2 wisp-guile.w > 3 && diff 2 3. Starting today, wisp is implemented in wisp.
Update (2013-08-08): Wisp 0.3.1 released (Changelog).


2 What is wisp?

Wisp is a simple preprocessor which turns indentation sensitive syntax into Lisp syntax.

The basic goal is to create the simplest possible indentation based syntax which is able to express all possibilities of Lisp.

Basically it works by inferring the brackets of lisp by reading the indentation of lines.

It is related to SRFI-49 and the readable Lisp S-expressions Project (and actually inspired by the latter), but it tries to Keep it Simple and Stupid. Instead of a full alternate reader like readable, it is a simple preprocessor which can be called by any lisp implementation to add support for indentation sensitive syntax.

Just call ./wisp.py --help to see what you can do with it (`./wisp.py -` takes its input from stdin, so it can be used with pipes):

./wisp.py --help
Usage: [-o outfile] [file | -]

Options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output=OUTPUT

Currently wisp is implemented in Python, because that’s the language which I know best and which inspired my wish to use indentation-sensitive syntax in Lisp. To repeat the initial quote:

I love the syntax of Python, but crave the simplicity and power of Lisp.

With wisp I hope to make it possible to create lisp code which is easily readable for non-programmers (and me!) and at the same time keeps the simplicity and power of Lisp.

Its main technical improvements over SRFI-49 and Project Readable are using lines prefixed by a dot (". ") to mark the continuations of the parameters of a function after intermediate function calls and working as a simple preprocessor which can be used with any flavor of Lisp.

The dot-syntax means, instead of marking every function call, it marks every line which does not begin with a function call - which is the much less common case in lisp-code.

3 Wisp syntax rules

  1. A line without indentation is a function call, just as if it would start with a bracket.
    display "Hello World!"      ↦      (display "Hello World!")
    

     
  2. A line which is more indented than the previous line is a child of that line: It opens a new bracket.
    display                              ↦    (display
      string-append "Hello " "World!"    ↦      (string-append "Hello " "World!"))
    

     
  3. A line which is not more indented than previous line(s) closes the brackets of all previous lines which have higher or equal indentation. You should only reduce the indentation to indentation levels which were already used by parent lines, else the behaviour is undefined.
    display                              ↦    (display
      string-append "Hello " "World!"    ↦      (string-append "Hello " "World!"))
    display "Hello Again!"               ↦    (display "Hello Again!")
    

     
  4. To add any of ' , or ` to a bracket, just prefix the line with any combination of "' ", ", " or "` " (symbol followed by one space).
    ' "Hello World!"      ↦      '("Hello World!")
    

     
  5. A line whose first non-whitespace characters are a dot followed by a space (". ") does not open a new bracket: it is treated as simple continuation of the first less indented previous line. In the first line this means that this line does not start with a bracket and does not end with a bracket, just as if you had directly written it in lisp without the leading ". ".
    string-append "Hello"        ↦    (string-append "Hello"
      string-append " " "World"  ↦      (string-append " " "World")
      . "!""!")
    

     
  6. A line which contains only whitespace and a colon (":") defines an indentation level at the indentation of the colon. It opens a bracket which gets closed by the next less-indented line. If you need to use a colon by itself, you can escape it as "\:".
    let                       ↦    (let
      :                       ↦      ((msg "Hello World!"))
        msg "Hello World!"    ↦      (display msg))
      display msg             ↦      
    

     
  7. A colon surrounded by whitespace (" : ") starts a bracket which gets closed at the end of the line.
    define : hello who                    ↦    (define (hello who)
      display                             ↦      (display 
        string-append "Hello " who "!"    ↦        (string-append "Hello " who "!")))
    

     
  8. You can replace any number of consecutive initial spaces by underscores, as long as at least one whitespace is left between the underscores and any following character. You can escape initial underscores by prefixing the first one with \ ("\___ a" → "(___ a)"), if you have to use them as function names.
    define : hello who                    ↦    (define (hello who)
    _ display                             ↦      (display 
    ___ string-append "Hello " who "!"    ↦        (string-append "Hello " who "!")))
    

     

To make that easier to understand, let’s just look at the examples in more detail:

3.1 A simple top-level function call

display "Hello World!"      ↦      (display "Hello World!")

This one is easy: Just add a bracket before and after the content.

3.2 Multiple function calls

display "Hello World!"      ↦      (display "Hello World!")
display "Hello Again!"      ↦      (display "Hello Again!")

Multiple lines with the same indentation are separate function calls (except if one of them starts with ". ", see Continue arguments, shown in a few lines).

3.3 Nested function calls

display                              ↦    (display
  string-append "Hello " "World!"    ↦      (string-append "Hello " "World!"))

If a line is more indented than a previous line, it is a child of the previous function: The bracket of the previous function gets closed after the (last) child line.

3.4 Continue function arguments

By using a . followed by a space as the first non-whitespace character on a line, you can mark it as a continuation of the previous less-indented line. Then it is not a function call but continues the list of parameters of the function.

I use a very synthetic example here to avoid introducing additional unrelated concepts.

string-append "Hello"        ↦    (string-append "Hello"
  string-append " " "World"  ↦      (string-append " " "World")
  . "!""!")

As you can see, the final "!" is not treated as a function call but as parameter to the first string-append.

This syntax extends the notion of the dot as identity function. In many lisp implementations1 we already have `(= a (. a))`.

= a        ↦    (= a
  . a      ↦      (. a))

With wisp, we extend that equality to `(= '(a b c) '((. a b c)))`.

. a b c    ↦    a b c

3.5 Double brackets (let-notation)

If you use `let`, you often need double brackets. Since using pure indentation in empty lines would be really error-prone, we need a way to mark a line as indentation level.

To add multiple brackets, we use a colon to mark an intermediate line as additional indentation level.

let                       ↦    (let
  :                       ↦      ((msg "Hello World!"))
    msg "Hello World!"    ↦      (display msg))
  display msg             ↦      

3.6 One-line function calls inline

Since we already use the colon as syntax element, we can make it possible to use it everywhere to open a bracket - even within a line containing other code. Since wide unicode characters would make it hard to find the indentation of that colon, such an inline-function call always ends at the end of the line. Practically that means, the opened bracket of an inline colon always gets closed at the end of the line.

define : hello who                            ↦    (define (hello who)
  display : string-append "Hello " who "!"    ↦      (display (string-append "Hello " who "!")))

This also allows using inline-let:

let                       ↦    (let
  : msg "Hello World!"    ↦      ((msg "Hello World!"))
  display msg             ↦      (display msg))

and can be stacked for more compact code:

let : : msg "Hello World!"     ↦    (let ((msg "Hello World!"))
  display msg                  ↦      (display msg))

3.7 Visible indentation

To make the indentation visible in non-whitespace-preserving environments like badly written html, you can replace any number of consecutive initial spaces by underscores, as long as at least one whitespace is left between the underscores and any following character. You can escape initial underscores by prefixing the first one with \ ("\___ a" → "(___ a)"), if you have to use them as function names.

define : hello who                    ↦    (define (hello who)
_ display                             ↦      (display 
___ string-append "Hello " who "!"    ↦        (string-append "Hello " who "!")))

4 Syntax justification

I do not like adding any unnecessary syntax element to lisp. So I want to show explicitly why the syntax elements are required to meet the goal of wisp: indentation-based lisp with a simple preprocessor.

4.1 . (the dot)

We have to be able to continue the arguments of a function after a call to a function, and we must be able to split the arguments over multiple lines. That’s what the leading dot allows. Also the dot at the beginning of the line as marker of the continuation of a variable list is a generalization of using the dot as identity function - which is an implementation detail in many lisps.

`(. a)` is just `a`.

So for the single variable case, this would not even need additional parsing: wisp could just parse ". a" to "(. a)" and produce the correct result in most lisps. But forcing programmers to always use separate lines for each parameter would be very inconvenient, so the definition of the dot at the beginning of the line is extended to mean “take every element in this line as parameter to the parent function”.

Essentially this dot-rule means that we mark variables at the beginning of lines instead of marking function calls, since in Lisp variables at the beginning of a line are much rarer than in other programming languages. In lisp assigning a value to a variable is a function call while it is a syntax element in many other languages, so what would be a variable at the beginning of a line in other languages is a function call in lisp.

(Optimize for the common case, not for the rare case)

4.2 : (the colon)

For double brackets and for some other cases we must have a way to mark indentation levels without any code. I chose the colon, because it is the most common non-alpha-numeric character in normal prose which is not already reserved as syntax by lisp when it is surrounded by whitespace, and because it already gets used for marking keyword arguments to functions in Emacs Lisp, so it does not add completely alien characters.

The function call via inline " : " is a limited generalization of using the colon to mark an indentation level: If we add a syntax-element, we should use it as widely as possible to justify the added syntax overhead.

But if you need to use : as variable or function name, you can still do that by escaping it with a backslash (example: "\:"), so this does not forbid using the character.

4.3 _ (the underscore)

In Python the whitespace hostile html already presents problems with sharing code - for example in email list archives and forums. But in Python the indentation can mostly be inferred by looking at the previous line: If that ends with a colon, the next line must be more indented (there is nothing to clearly mark reduced indentation, though). In wisp we do not have this help, so we need a way to survive in that hostile environment.

The underscore is commonly used to denote a space in URLs, where spaces are inconvenient, but it is rarely used in lisp (where the dash ("-") is mostly used instead), so it seems like a natural choice.

You can still use underscores anywhere but at the beginning of the line. If you want to use it at the beginning of the line you can simply escape it by prefixing the first underscore with a backslash (example: "\___").

5 Background

A few months ago I found the readable Lisp project which aims at producing indentation based lisp, and I was thrilled. I had already done a small experiment with an indentation to lisp parser, but I was more than willing to throw out my crappy code for the well-integrated parser they had.

Fast forward half a year. It’s February 2013 and I started reading the readable list again after being out of touch for a few months because the birth of my daughter left little time for side-projects. And I was shocked to see that the readable folks had piled lots of additional syntax elements on their beautiful core model, which for me destroyed the simplicity and beauty of lisp. When language programmers add syntax using \\, $ and <>, you can be sure that it is no simple lisp anymore. To me readability does not just mean beautiful code, but rather easy to understand code with simple concepts which are used consistently. I prefer having some ugly corner cases to adding more syntax which makes the whole language more complex.

I told them about that and proposed a simpler structure which achieved almost the same as their complex structure. To my horror they proposed adding my proposal to readable, making it even more bloated (in my opinion). We discussed a long time - the current syntax for inline-colons is a direct result of that discussion in the readable list - then Alan wrote me a nice mail, explaining that readable will keep its direction. He finished with «We hope you continue to work with or on indentation-based syntaxes for Lisp, whether sweet-expressions, your current proposal, or some other future notation you can develop.»

It took me about a month to answer him, but the thought never left my mind (@Alan: See what you did? You anchored the thought of indentation based lisp even deeper in my mind. As if I did not already have too many side-projects… :)).

Then I had finished the first version of a simple whitespace-to-lisp preprocessor.

And today I added support for reading indentation based lisp from standard input which allows actually using it as in-process preprocessor without needing temporary files, so I think it is time for a real release outside my Mercurial repository.

So: Have fun with wisp v0.2 (tarball)!

PS: If you want to run wisp code pseudo-directly, you can use the following script:

#!/bin/sh
~/path/to/wisp.py -o /tmp/wisptmp.scm "$@" && guile -l ~/.guile -s /tmp/wisptmp.scm

PPS: Wisp is linked in the comparisons of SRFI-110.

Freenet

“When free speech dies, we need a place to organize”

Freenet is a censorship resistant, distributed p2p-publishing platform.

It lets you anonymously share files, browse and publish “freesites”, chat on forums and even do microblogging, using a generic Web of Trust, shared by different plugins, to avoid spam. For really careful people it offers a “darknet” mode, where users only connect to their friends, in which it is very hard to detect that they are running freenet.

The overarching design goal of freenet is to make censorship as hard as technically possible. That’s the reason for providing anonymity (else you could be threatened with repercussions - as seen in the case of the wikileaks informant from the army in the USA), building it as a decentralized network (else you could just shut down the central website, as people tried with wikileaks), providing safe pseudonyms and caching of the content on all participating nodes (else people could censor by spamming or overloading nodes) and even the darknet mode and enhancements in usability (else freenet could be stopped by just prosecuting everyone who uses it, or it would reach too few people to be able to counter censorship in the open web).

I don’t know anymore what triggered my use of freenet initially, but I know all too well what keeps me running it instead of other anonymizers:

I see my country (Germany) turning more and more into a police state, starting with attacks on p2p, continuing with censorship of websites (“we all know child porn is bad, so it can’t be bad to censor it, right? Sure we could just make the providers delete it, so no one can access it, but… no, we have to censor it, so only people who can use google can find it – which luckily excludes us, because we are not pedocriminals.”) and leading into directions I really don’t like.

And in case the right for freedom of speech dies, we need a place where we can organize to get it back and fight for the rights laid out in our constitution (the Grundgesetz).

And that’s what Freenet is to me.

A technical way to make sure we can always organize, acting by article 20 of our constitution (German link — google translated version): the right to oppose everyone who wants to abolish our constitutional order.

PS: New entries on my site are also available in freenet (via freereader, which downloads RSS feeds and republishes them in freenet).

PPS: If you like this text, please redent/retweet the associated identi.ca/twitter notices so it spreads:

  • https://identi.ca/notice/46221737
  • https://twitter.com/ArneBab/status/21217822748

50€ for the Freenet Project - and against censorship

As I pledged1, I just donated to freenet 50€ of the money I got back because I cannot go to FilkCONtinental. Thanks go to Nemesis, a proud member of the “FiB: Filkers in Black” who will take my place at the Freusburg and fill these old walls with songs of stars and dreams - and happy laughter.

It’s a hard battle against censorship, and as I now had some money at hand, I decided to do my part (freenetproject.org/donate.html).


  1. The pledge can be seen in identi.ca and in a Sone post in freenet (including a comment thread; needs a running freenet node (install freenet in a few clicks) and the Sone plugin). 

A bitcoin-marketplace using Freenet?

A few days ago xor, the developer of the Web of Trust in Freenet, got in contact with the brain behind the planned Web of Trust for Openbazaar, and toad, the former maintainer of Freenet, questioned whether we would actually want a marketplace using Freenet.

I took a few days to ponder the question, and I think a marketplace using Freenet would be a good idea - for Freenet as well as for society.

Freenet is likely the most secure way for implementing a digital market, which means it can work safely for small sums, but not for large ones - except if you can launder huge amounts of digital money. As such it is liberating for small people, but not for syndicates. For example a drug cartel needs to be able to turn lots of money into clean cash to pay henchmen abroad. Since you can watch bitcoin more easily than cash and an anonymous network makes it much harder to use scare-tactics against competing sellers, moving the marketplace from the street to the internet weakens syndicates and other organized crime by removing part of their options for creating a monopoly by force.

If a bitcoin marketplace with some privacy for small-scale users should become a bigger problem than the benefit it brings by weakening organized crime, any state or other big player can easily force the majority of users to reveal their identities by using the inherent traceability of bitcoin transactions.

Also the best technologies in freenet were developed (or rather: got into widespread use) because it had to actually withstand attacks.

Freenet as marketplace with privacy for small people equivalent to cash-payments would also help improve its suitability for whistleblowers - see hiding in the forest: A better alternative.

For free speech this would also help, because unlike other solutions, freenet has the required properties for that: a store with lifetime depending on the popularity of content, not the power of the publisher, which provides DoS-resistant hosting without the need to have a 24/7 server, stable and untraceable pseudonyms (ignoring fixable attack-vectors) and an optional friend-to-friend darknet.

In short: A decentralized ebay-killer would be cool and likely beneficial to Freenet and Free Speech without bringing actual benefit for organized crime.

Also this might be what is needed to bring widespread darknet adoption.

And last but not least, we would not be able to stop people from implementing a marketplace over freenet: Censorship resistance also means resistance against censorship by us.

Final note: Openbazaar is written in Python and Freenet has decent Python bindings (though they are not beautiful everywhere), so it should not be too hard to use it for Openbazaar. A good start could be the WoT code written for Infocalypse in last year's GSoC: Web of Trust integration as well as private messaging.


A vision for a social Freenet with WoT, FreeTalk and Sone

I let my thoughts wander a bit around the question of how a social Freenet (2.0 ;) ) could look from the view of a newcomer.

I imagine myself installing freenet. The first thing to come up after starting it is the node page. (Italic text in brackets is a comment. The links need a Freenet node running on 127.0.0.1 to work.)


“Welcome to Freenet, where no one can tell you’re reading”

“Freenet tries hard to protect your privacy. Therefore we created a pseudonymous ID for you. Its name is Gandi Schmidt. Visit the [your ID's site] to see a legend we prepared for you. You can use this legend as fictional background for your ID, if you are really serious about staying anonymous.”

(The name should be generated randomly for each ID. A starting point for that could be a list of scientists from around the world compiled from the wikipedia (link needs freenet). The same should be true for the legend, though it is harder to generate. The basic information should be a quote (people remember that), a job and sex, the country the ID comes from (maybe correlated with the name) and a hobby.)

“During the next few restarts, Freenet will ask you to solve various captchas to prove that you are indeed human. Once enough other nodes successfully confirmed that you are human, you will gain write access to the forums and microblogging. This might take a few hours to a few days.”

(as soon as the ID has sufficient trust, automatically activate posting to FreeTalk, Sone and others. Access is delayed to ensure that when people talk they can get answers)

“Note that other nodes don’t know who you are. They don’t know your IP, nor your real identity. The only thing they know is that you exist, that you can solve captchas and how to send you a message.”

“You can create additional IDs at any time and give them any name and legend you choose by adding them on the WebOfTrust-page. Each new ID has to verify for itself that it’s human, though. If you carefully keep them separate, others can only find out with a lot of effort that your IDs are related. Mind your writing style. When in doubt, keep your sentences short. To make it easier for you to stay anonymous, you can autogenerate name and legend at random. Don’t use the nicest from many random trials, else you can be traced by the kind of random IDs you select.”

“While your humanity is being confirmed, you can find a wealth of content on the following indexes, some published anonymously, some not. If you want to publish your own anonymous site, see Upload a Freesite. The list of indexes uses dynamic bookmarks. You get notified whenever a bookmarked site (like the indexes below) gets updated.”

“Note: If you download content from freenet, it is being cached by other nodes. Therefore popular content is faster than rare content and you cannot overload nodes by requesting their data over and over again.”

“You are currently using medium security in the range from low to high.”

“In this security level, separate IDs are no perfect protection of your anonymity, though: other members might not be able to see what you do in Freenet, but they can know that you use freenet in the first place, and corporations or governments with medium-sized infrastructure can launch attacks which might make it possible to trace your contributions and accesses. If you want to disappear completely from the normal web and keep your freenet usage hidden, as well as make it very hard to trace your contributions, to be able to really exercise your right of free speech without fearing repercussions, you can use Freenet as Darknet — the more secure but less newcomer friendly way to use freenet; the current mode is Opennet.”

“To enter the Darknet, you add people you know and trust personally as your darknet friends. As soon as you have enough trusted friends, you can increase the security level to high and freenet will only connect to your trusted friends, making you disappear from the regular internet. The only way to tell that you are using freenet will then be to force your ISP to monitor all traffic coming from your computer.”

“And once transport plugins are integrated, steganography will come into reach and allow masking your traffic as regular internet usage, making it very hard to distinguish freenet from encrypted internet-telephony. If you want to help making this a reality in the near future, please consider contributing or donating to freenet.”

“Welcome to the pseudonymous web where no one can know who you are, but only that you are always using the same ID — if you do so.”

“To show this welcome message again, you can at any time click on Intro in the links.”


What do you think? Would this be a nice way to integrate WoT, FreeTalk, Sone and general user education in a welcome message, while adding more incentive to keep the node running?

PS: Also posted in the Freenet Bugtracker, in Freetalk and in Sone – the last two links need a running Freenet to work.

PPS: This vision is not yet a reality, but all the necessary infrastructure is already in place and working in Freenet. You can already do everything described in here, just without the nice guide and the level of integration (for example activating plugins once you have proven your humanity, which equals enough trust by others to be actually seen).

Anonymous code collaboration with Mercurial and Freenet

Anonymous DVCS in the Darknet.

There is a new Mercurial extension for interaction with Freenet called "infocalypse" (which should keep working after the information apocalypse).

It offers "fn-push" and "fn-pull" as an optimized way to store code in freenet: bundles are inserted and pulled one after the other. An index tells infocalypse in which order to pull the bundles. It makes using Mercurial in freenet far more efficient and convenient.

easy setup of infocalypse (script)
distributed, anonymous development

Also you can use it to publish collaborative anonymous websites like the freefaq and Technophob.

And it is a perfect fit for the workflow automatic trusted group of committers.

Otherwise it offers the same features as FreenetHG.


The rest of the article is concerned with the older FreenetHG extension. If you need to choose between the two, use Infocalypse: Its concept for sharing over Freenet is more robust.


Using FreenetHG you can collaborate anonymously without having to give everyone direct write access to your code.

To work with others, you simply set up a local repository for your own work and use FreenetHG to upload your code automatically into Freenet under your private ID. Others can then access your code with the corresponding public ID, do their changes locally and publish them in their own anonymous repository.

You then pull changes you like into your repository and publish them again under your key.

FreenetHG uses freenet which offers the concept of pseudonymity to make anonymous communication more secure and Mercurial to allow for efficient distributed collaboration.

With pseudonymity you can't find out whom you're talking to, but you know that it is the same person, and with distributed collaboration you don't need to let people write to your code directly, since every code repository is a full clone of the main repository.

Even if the main repository should go down, every contributor can still work completely unhindered, and if someone else breaks things in his repository, you can simply decide not to pull the changes from him.

What you need

To use FreenetHG you obviously need a running freenet node and a local Mercurial installation. Also you need the FreenetHG plugin for Mercurial and PyFCP which provides Python bindings for Freenet.

  • get FreenetHG (the link needs a running freenet node on 127.0.0.1)
  • alternatively just do

    hg clone static-http://127.0.0.1:8888/USK@fQGiK~CfI8zO4cuNyhPRLqYZ5TyGUme8lMiRnS9TCaU,E3S1MLoeeeEM45fDLdVV~n8PCr9pt6GMq0tuH4dRP7c,AQACAAE/freenethg/1/

Setup a simple anonymous workflow

To guide you through the steps, let's assume we want to create the anonymous repository "AnoFoo".

After you got all dependencies, you need to activate the FreenetHG plugin in your ~/.hgrc file

[extensions]
freenethg = path/to/FreenetHG.py

You can get the FreenetHG.py from the freenethg website or from the Mercurial repository you cloned.

Now you setup your anofoo Mercurial repository:

hg init AnoFoo

As a next step we create some sections in the .hg/hgrc file in the repository:

[ui]

[freenethg]

[hooks]

Now we enter the repository and use the setup wizard

cd AnoFoo
hg fcp-setupwitz

The setup wizard asks us for the username to use for this repository (to avoid accidentally breaking our anonymity), the address of our freenet instance and the path to our repository on freenet.

The default answers should fit. The only one where we have to set something else is the project name. There we enter AnoFoo.

Since we don't yet have a freenet URI for the repository, we just answer '.' to let FreenetHG generate one for us. That's also the default answer.

The commit hook makes sure that we don't commit with another but the selected username.

Also the wizard will print a line like the following:

Request uri is: USK@xlZb9yJbGaKO1onzwawDvt5aWXd9tLZRoSoE17cjXoE,zFqFxAk15H-NvVnxo69oEDFNyU9uNViyNN5ANtgJdbU,AQACAAE/freenethg_test/1/

This is the line others can use to clone your project and pull from it.

And with this we finished setting up our anonymous collaboration repository.

When we commit, every commit will directly be uploaded into Freenet.

So now we can pass the freenet Request uri to others who can clone our repository and set up their own repositories in freenet. When they add something interesting, we then pull the data from their Request uri and merge their code with ours.

Setup a more convenient anonymous workflow

This workflow is already useful, but it's a bit inconvenient to have to wait after each commit until your changes have been uploaded. So we'll now change this basic workflow a bit to be able to work more conveniently.

First step: clone our repositories to a backup location:

hg clone AnoFoo BackFoo

Second step: change our .hg/hgrc to only update when we push to the backup repository, and add the default-push path to the backup repository:

[paths]
default-push = ../BackFoo

[hooks]                                                               
pretxncommit = python:freenethg.username_checker                      
outgoing = python:freenethg.updatestatic_hook                           

[ui]
username = anonymuse

[freenethg]
commitusername = anonymuse
inserturi = USK@VERY_LONG_PRIVATE_KEY/AnoFoo/1/

Changes: We now have a default-push path, and we changed the "commit" hook to an "outgoing" hook which is invoked every time changes leave this repository. It will also be invoked when someone pulls from this repo, but not when we clone it locally.

Now our commits roll as fast as we're used to from other Mercurial repositories and freenethg will make sure we don't use the wrong username.

When we want to anonymously publish the repository we then simply use

hg push

This will push the changes to the backup and then upload them to your anonymous repository.

And now we have finished setting up our repository and can begin using an anonymous and almost infinitely scalable workflow which only requires our freenet installation to be running when we push the code online.

One last touch: If an upload should happen to fail, you can always repeat it manually with

hg fcp-uploadstatic

Time to go

...out there and do some anonymous coding (Maybe with the workflow automatic trusted group of committers).

Happy hacking!

And if this post caught your interest or you want to say anything else about it, please write a comment.

Also please have a look at and vote for the wish to add a way to contribute anonymously to freenet, to make it secure against attacks on developers.

And last but not least: vote for this article on digg and on yigg.

Background of Freenet Routing and the probes project (GSoC 2012)

The probes project is a Google Summer of Code project by Steve Dougherty intended to optimize the network structure of freenet. Here I will give the background of his project very briefly:

The Small World Structure

Freenet organizes nodes by giving them locations - like coordinates. The nodes know some others and can send data only to those to which they are connected directly. If your node wants to contact someone it does not know directly, it sends a message to one of the nodes it knows and asks that one to forward the message. The decision whom to ask to forward the message is part of the routing.

And the routing algorithm in Freenet assumes a small world network: Your node knows many people who are close to you and a few who are far away. Imagine that as knowing many people in your home town and few in other towns. There is mathematical proof that the routing is very efficient and scales to billions of users - if it really operates on a small world network.

So each freenet node tries to organize its connections in such a way that it is connected to many nodes close by and some from far away.⁽¹⁾ The structure of the local connections of your own node can be characterized by the link length distribution: “How many short and how many long connections do you have?”
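As an illustration (this is not Freenet code, just a sketch of the metric with made-up example locations): locations live on a circle of circumference 1, and the link length distribution is simply a histogram of the circular distances between your location and the locations of your directly connected peers.

# sketch: link length distribution on the [0, 1) location circle
def circ_dist(a, b):
    d = abs(a - b)
    return min(d, 1.0 - d)  # distance on the circle, at most 0.5

def link_length_distribution(my_location, peer_locations, bins=10):
    counts = [0] * bins
    for loc in peer_locations:
        d = circ_dist(my_location, loc)
        counts[min(int(d / 0.5 * bins), bins - 1)] += 1
    return counts

# many short links, one or two long ones: roughly what a small world node looks like
print(link_length_distribution(0.1, [0.12, 0.13, 0.2, 0.7, 0.95]))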

Probes and their Promise

The probes project from Steve is to analyze the structure of the network and the structure of the local connections of nodes in an anonymous way, to improve the self-organization algorithm in freenet. The reason is that if the structure of the network is not a small world network, the routing algorithm becomes much less efficient.

That in turn means that if you want to get some data on the network, that data has to travel over far more intermediate nodes, because freenet cannot determine the shortest route. And if the data has to travel over more nodes, it consumes more bandwidth and takes longer to reach you. In the worst case it could happen that freenet does not find the data at all.

To estimate the effect of that, you can look at the bar chart The Seeker linked to:

chart

Low is an ideal structure with 16 connections per node, Conforming is the measured structure with about 17 connections per node (a cluster with 12, one with ~25). Ideally we would want Normal with 26 connections per node and an ideal structure. High is 86 connections. The simulated network sizes are 6000 nodes (Small), 18 000 (Normal, as measured), 36 000 (Large). Fewer hops is better.

It shows how many steps a request has to take to find some content. “Conforming” is the actually measured structure. “low”, “normal” and “high” show the number of connections per node in an optimal network: 16, 26 and 86. The actually measured mean number of connections in freenet is similar to “low”, so that’s the bar with which we need to compare the “conforming” bar to see the effect of the suboptimal structure. And that effect is staggering: By default a request needs about twice as many steps in the real network as it would need in an optimally structured network.

Practically: If freenet managed to get closer to the optimal structure, it could double its speed and cut the reaction times in half. Without changing anything else - and also without changing the local bandwidth consumption: You would simply get your content much faster.

If we managed to increase the mean number of connections to about 26 (that’s what a modern DSL connection can manage without too many ill effects), we could double the speed and halve the reaction times again (but that requires more bandwidth on the nodes which currently have a low number of connections: Many have only about 12 connections, many have about 25 or so, few have something in between).

Essentially that means we could gain a factor of 2 to 4 in speed and reaction times. And better scalability (compare the normal and the large network).


Note ⁽¹⁾: Network Optimization using Only Local Knowledge

To achieve a good local connection-structure, the node can use different strategies for Opennet and Darknet (this section is mostly guessed, take it with a grain of salt. I did not read the corresponding code).

In Opennet it can look whether it finds nodes which would improve its local structure. If it finds one, it can replace the local connection which distorts its local structure the most with the new connection.

In Darknet on the other hand, where it can only connect to the folks it already knows, it looks at the locations of nodes it hears about. It then checks whether its local connections would be better if it had that other node's location. In that case, it asks the other node whether it would agree to swap locations (without changing any real connections: It only changes the notion of where it lives. As if you swapped flats with someone else without changing who your friends are. Afterwards both of you live closer to your respective friends).

In short: In Opennet, Freenet changes to whom it is connected in order to achieve a small world structure: It selects its friends based on where it lives. In Darknet it swaps its location with strangers to live closer to its friends.
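As a toy illustration of the Darknet case (again only my guess, not the actual algorithm): one plausible swap rule is to exchange locations whenever that brings both nodes, taken together, closer to their own friends.

# toy sketch of a location swap check - NOT the real Freenet code
def circular_distance(a, b):
    d = abs(a - b)
    return min(d, 1 - d)

def total_distance(location, peer_locations):
    return sum(circular_distance(location, p) for p in peer_locations)

def should_swap(loc_a, peers_a, loc_b, peers_b):
    # swap if the summed distance to the friends shrinks for both nodes together
    before = total_distance(loc_a, peers_a) + total_distance(loc_b, peers_b)
    after = total_distance(loc_b, peers_a) + total_distance(loc_a, peers_b)
    return after < before

# node A sits at 0.1 but its friends live around 0.85, and vice versa for B
print should_swap(0.1, [0.8, 0.85, 0.9], 0.9, [0.05, 0.1, 0.15]) # True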

Attachment: freenet-probes-size-degree-chart.png (13.94 KB)

Bootstrapping the Freenet WoT with GnuPG - and GnuPG with Freenet

Intro

When you enter the freenet Web of Trust, you first need to get some trust from people by solving captchas. And even when people trust you somehow, you have no way to prove your identity in an automatic way, so you can’t create identities which freenet can label as trusted without manual intervention from your side.

Proposal

To change this, we can use the Web of Trust used in GnuPG to infer trust relationships between freenet WoT IDs.

Practically that means:

  • Write a message: “I am the WoT ID USK@” (replace with the public key of your WoT ID).
  • Sign that message with a GnuPG key you want to connect to the ID. The signature proves that you control the GnuPG key.
  • Upload the signed message to your WoT key: USK@/bootstrap/0/gnupg.asc. To make this upload, you need the private key of the ID, so the upload proves that you control the WoT ID.

Now other people can download the file from you, and when they trust the GnuPG key, they can transfer their trust to the freenet WoT-ID.
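A rough sketch of how such a bootstrap file could be created and uploaded with pyFreenet and GnuPG (everything here is an assumption on my part: the helper name, the exact upload path taken from the proposal above, and that gpg is available on the command line):

import subprocess
import fcp

def bootstrap_wot_id(wot_public, wot_private, gnupg_key_id):
    # the message from the proposal: "I am the WoT ID USK@..."
    message = "I am the WoT ID %s\n" % wot_public
    # gpg --clearsign reads the message from stdin and wraps it in a signature
    gpg = subprocess.Popen(["gpg", "--clearsign", "--local-user", gnupg_key_id],
                           stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    signed, _ = gpg.communicate(message)
    n = fcp.node.FCPNode()
    try:
        # upload below the WoT insert URI as proposed: .../bootstrap/0/gnupg.asc
        return n.put(uri=wot_private + "/bootstrap/0/gnupg.asc",
                     data=signed, mimetype="text/plain")
    finally:
        n.shutdown()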

Automatic

Ideally all this should be mostly automatic:

  • Click a link in the freenet interface and select the WoT ID to have freenet create the file and run your local GnuPG program.
  • Then select your GnuPG key in the GnuPG program and enter your password.
  • Finally check the information to be inserted and press a button to start the upload.

As soon as you have a GnuPG key connected with your WoT ID, freenet should scout all other WoT IDs for gnupg keys and check if the local GnuPG key you assigned to your WoT ID trusts the other key. If yes, give automatic trust (real person → likely no spammer).

Anonymously

To make the connection one-way (bootstrap the WoT from GnuPG, but not expose the key), you might be able to encrypt the message to all people who signed your GnuPG key. Then these can recognize you, but others cannot.

This will lose you the indirect trust in the GnuPG web-of-trust, though.


I hope this bootstrap-WoT draft sounded interesting :)

Happy hacking!

De-Orchestrating Freenet with the QUEEN program

So Poul-Henning Kamp thought this was just a thought experiment …

At FOSDEM 2014 Poul-Henning Kamp talked about a hypothetical “Project ORCHESTRA” by the NSA with the goal of disrupting internet security: Information, Slides, Video (with some gems not in the slides).

One of the ideas he mentioned was the QUEEN program: Psy-Ops for Nerds.

I’ve been a contributor to the Freenet Project for several years. And in that time, I experienced quite a few of these hypothetical tactics first-hand.

This is the list of good matches: Disruptive actions which managed to keep Freenet from moving onwards, often for several months. It’s quite horrifying how many there are. Things which badly de-orchestrated Freenet:

  • Steer discussions to/from hot spots (“it can’t be that hard to exchange a text file!” ⇒ noderef exchange fails all the time, which is the core of darknet!)
  • Disrupt consensus building: Horribly long discussions which cause the resolution to be forgotten due to a fringe issue.
  • “Secrecy without authentication is pointless”.
  • “It gives a false sense of security” (if you tailor [these kinds of things] carefully, they speak to people's political leanings: If it’s not perfect: “No, that wouldn’t do it”. This stopped many implementations, till finally Bombe got fed up and started the simple and working microblogging tool Sone)
  • “you shouldn’t do that! Do you really know what you are doing? Do you have a PhD in that? The more buttons you press, the more warnings you get” ← this is “filter failed”: No, I don’t understand this, “get me out of that!” ⇒ Freenet downloads fail when the filter failed.
  • Getting people to not do things by misdirecting their attention on it. Just check the Freenet Bugtracker for unresolved simple bugs with completely fleshed out solutions that weren’t realized.
  • FUD: I could be supporting bad content! (just like you do if your provider has a transparent proxy to reduce outgoing bandwidth - or with any VPN, Tor, i2p, .... Just today I read this: « you seriously think people will ever use freenet to post their family holiday photos, favourite recipes etc? … can you envisage ordinary people using freenet for stuff where they don't really have anything to hide? » — obvious answer: I do that, so naturally other people might do it, too.)
  • “Bikeshed” discussions: Sometimes just one single email from an anonymous person can derail a free software project for months!
  • Soak mental bandwidth with bogus crypto proposals: PSKs? (a new key-proposal which could have made forums scale better but actually just soaked up half a year of the main developer's time and wasn’t implemented - and in return, critical improvements for existing forums were delayed)
  • Witless volunteers (overlooking practical advantages due to paranoia, theoretical requirements which fail in the real world, overly pessimistic stance which scares away newcomers, voicing requirements for formal specification of protocols which are in flux).
  • Affect code direction (lots of the above - also ensuring that there is no direction, so it doesn’t really work well for anybody because it tries to have the perfect features for everybody before actually getting a reasonable user experience).
  • Code obfuscation (some of the stuff is pretty bad, lots of it looks like it was done in a hurry, because there was so much else to do).
  • Misleading documentation (or outdated or none…: There is plenty of Freenet 0.5 documentation while 0.7 is actually a very different beast)
  • Deceptive defaults (You have to set up your first pseudonym by hand, load two plugins manually and solve CAPTCHAs before you are able to talk to people anonymously, darknet does not work out of the box, the connection speed when given without a unit is interpreted as bytes/s - I’m sure someone once voiced a reason for that)

Phew, quite a list…

I provided this because naming the problems is an important step towards resolving them. I am sure that we can fix most of this, but it’s important to realize that while many of the points I named are most probably homegrown, it is quite plausible that some of them were influenced from the outside. Freenet was always a pretty high profile project in the crypto community, so it is an obvious target. We’d be pretty naive to think that we weren’t targeted.

And we have to keep this in mind when we communicate: We don’t only have to look out for bad code, but also for influences which make us take up toxic communication patterns which keep us from moving forward.

The most obvious fix is: Stay friendly, stick together, keep honest and greet every newcomer as a potential ally. And call out disrupting behaviour early on: If someone insults new folks or takes up huge amounts of discussion time by rehashing old discussions instead of talking about the way forward - in a way which actually leads to going forward - then say that this is your impression. Still stay friendly: Most of the time that’s not intentional. And people can be affected by outside influences like someone attacking them in other channels, so it would be important to help them recover and not to push them away because their behaviour became toxic for some time (as long as the time investment for that is not overarching).

Overall it’s about keeping the community together despite the knowledge that some of us might actually be aggressors or influenced from the outside to disrupt our work.

Effortless password protected sharing of files via Freenet

Inserting a file into freenet using the key KSK@<password> creates an invisible, password protected file which is available over Freenet.

Often you want to exchange some content only with people who know a given password and make it accessible to everyone in your little group but invisible to the outside world.

Until yesterday I considered that problem slightly complex, because everyone in your group needs a given encryption program, and you need a way to share the file without exposing the fact that you are sharing it.

Then I learned two handy facts about Freenet:

  • <ArneBab> evanbd: If I insert a tiny file without telling anyone the key, can they get the content in some way?
    <evanbd> ArneBab: No.

  • <toad_> dogon: KSK@<any string of text> -> generate an SSK private key from the hash of the text
    <toad_> dogon: if you know the string, you can both insert and retrieve it

In other words: Just inserting a file into freenet using the key KSK@<password> creates an invisible, password protected file which is shared over Freenet.

The file is readable and writeable by everyone who knows the password (within limits¹), but invisible to everyone else.

To upload a file as KSK, just go to the filesharing tab, click “upload a file”, switch to advanced mode and enter the KSK key.

Or simply click here (requires freenet to be running on your computer with default settings).
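For scripts, the same thing works via pyFreenet (a minimal sketch; the password is hypothetical and a local node with default settings is assumed):

import fcp

n = fcp.node.FCPNode()
password = "our-little-group" # hypothetical shared password
n.put(uri="KSK@" + password, data="the secret meeting notes")
# everyone who knows the password can fetch it:
print n.get("KSK@" + password)[1]
n.shutdown()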

It’s strange to think that I only learned this after more than 7 years of using Freenet. How many more nuggets might be hidden there, just waiting for someone to find them and document them in a style which normal users understand?

Freenet is a distributed datastore which can find and transfer data efficiently on restricted routes (search for meshnet scaling to see why that type of routing is really hard), and it uses a WebOfTrust for real-life spam-resistance without the need for a central authority (look at your mailbox to see how hard that is, even with big money).

How many more complex problems might it already have solved as byproduct of the search for censorship resistance?

So, what’s still to be said? Well, if Freenet sounds interesting: Join in!


  1. A KSK is writeable with the limit that you cannot replace the file if people still have it in their stores: You have to wait till it has been displaced, or be aware that now two states for the file exist: one with your content and one with the old content. Better just define a series of KSKs: Add a number to the KSK and if you want to write, simply insert the next one. 

Exact Math to the rescue - with Guile Scheme

I needed to calculate the probability that for every freenet user there are at least 70 others in a distance of at most 0.01. That needs binomial coefficients with n and k on the order of 4000. My old Python script failed me with an OverflowError: integer division result too large for a float. So I turned to Guile Scheme and exact math.

1 The challenge

I need the probability that within 4000 random numbers between 0 and 1, at least 70 are below 0.02.

Then I need the probability that within 4000 random numbers, at most 5 find less than 70 others to which the distance is at most 0.02.

Or more exactly: I need to find the right maximum length to replace the 0.02.

2 The old script

I had a Python-script lying around which I once wrote for estimating the probability that a roleplaying group will have enough people to play in a given gaming night.

It’s called spielfaehig.py (german for “able to play”).

It just does this:

from math import factorial
fac = factorial
def nük(n, k): 
   if k > n: return 0
   return fac(n) / (fac(k)*fac(n-k))

def binom(p, n, k): 
   return nük(n, k) * p** k * (1-p)**(n-k)

def spielfähig(p, n, min_spieler): 
   try: 
      return sum([binom(p, n, k) for k in range(min_spieler, n+1)])
   except ValueError: return 1.0

Now when I run this with p=0.02, n=4000 and min_spieler=70, it returns

OverflowError: integer division result too large for a float

The reason is simple: There are some intermediate numbers which are much larger than what a float can represent.

3 Solution with Guile

To fix this, I rewrote the script in Guile Scheme:

#!/usr/bin/env guile-2.0
!#

(define-module (spielfaehig)
  #:export (spielfähig))
(use-modules (srfi srfi-1)) ; for iota with count and start

(define (factorial n)
  (if (zero? n) 1 
      (* n (factorial (1- n)))))

(define (nük n k)
  (if (> k n) 0
      (/ (factorial n) 
         (factorial k) 
         (factorial (- n k)))))

(define (binom p n k)
  (* (nük n k) 
     (expt p k) 
     (expt (- 1 p) (- n k))))

(define (spielfähig p n min_spieler) 
  (apply + 
         (map (lambda (k) (binom p n k)) 
              (iota (1+ (- n min_spieler)) min_spieler))))

To use this with exact math, I just need to call it with p as exact number:

(use-modules (spielfaehig))
(spielfähig #e.03 4000 70)
;           ^ note the #e - this means to use an exact representation
;                           of the number

; To make Guile show a float instead of some huge division, just
; convert the number to an inexact representation before showing it.
(format #t "~A\n" (exact->inexact (spielfähig #e.03 4000 70)))

And that’s it. Automagic hassle-free exact math is at my fingertips.

It just works and uses less than 200 MiB of memory - even though the intermediate factorials return huge numbers. And huge means huge. It effortlessly handles numbers on the order of 10^8000. That is 10 to the power of 8000 - a number with 8000 digits.

4 The Answer

42! :)

The real answer is 0.0125: That’s the maximum length we need to choose for short links to get more than a 95% probability that in a network of 4000 nodes there are at most 5 nodes for which there are less than 70 peers with a distance of at most the maximum length.

If we can assume 5000 nodes, then 0.01 is enough. And since this is the number we directly got from an analysis of our link length distribution, it is the better choice, though it will mean that people with huge bandwidth cannot always max out their 100 connections.

5 Conclusion

Most of the time, floats are OK. But there are the times when you simply need exact math.

In these situations Guile Scheme is a lifesaver.

Dear GNU Hackers, thank you for this masterpiece!

And if you were crazy enough to read till here, Happy Hacking to you!

Attachment: 2014-07-21-Mo-exact-math-to-the-rescue-guile-scheme.org (4.41 KB)

Exploring the probability of successfully retrieving a file in freenet, given different redundancies and chunk lifetimes

In this text I want to explore the behaviour of the degrading yet redundant anonymous file storage in Freenet. It only applies to files which were not subsequently retrieved.

Every time you retrieve a file, it gets healed which effectively resets its timer as far as these calculations here are concerned. Due to this, popular files can and do live for years in freenet.

1 Static situation

First off, we can calculate the retrievability of a given file with different redundancy levels, given fixed chunk retrieval probabilities.

Files in Freenet are cut into segments which are again cut into up to 256 chunks each. With the current redundancy of 100%, only half the chunks of each segment have to be retrieved to get the whole file. I call that redundancy “2x”, because it inserts data 2x the size of the file (actually that’s just what I used in the code and I don’t want to force readers - or myself - to make mental jumps while switching from prose to code).

We know from the tests done by digger3 that after 31 days about 50% of the chunks are still retrievable, and after 30 days about 30%. Let’s look at how that affects our retrieval probabilities.

# encoding: utf-8
from spielfaehig import spielfähig
from collections import defaultdict
data = []
res = []
for chunknumber in range(5, 105, 5):...
byred = defaultdict(list)
for num, prob, red, retrieval in data:...
csv = "; num prob retrieval"
for red in byred:...

# now plot the files

plotcmd = """
set term png
set width 15
set xlabel "chunk probability"
set ylabel "retrieval probability"
set output "freenet-prob-redundancy-2.png"
plot "2.csv" using 2:3 select ($1 == 5) title "5 chunks", "" using 2:3 select ($1 == 10) title "10 chunks", "" using 2:3 select ($1 == 30) title "30 chunks", "" using 2:3 select ($1 == 100) title "100 chunks"
set output "freenet-prob-redundancy-3.png"
plot "3.csv" using 2:3 select ($1 == 5) title "5 chunks", "" using 2:3 select ($1 == 10) title "10 chunks", "" using 2:3 select ($1 == 30) title "30 chunks", "" using 2:3 select ($1 == 100) title "100 chunks"
set output "freenet-prob-redundancy-4.png"
plot "4.csv" using 2:3 select ($1 == 5) title "5 chunks", "" using 2:3 select ($1 == 10) title "10 chunks", "" using 2:3 select ($1 == 30) title "30 chunks", "" using 2:3 select ($1 == 100) title "100 chunks"
"""
with open("plot.pyx", "w") as f:...

from subprocess import Popen
Popen(["pyxplot", "plot.pyx"])

So what does this tell us?


Retrieval probability of a given file in a static case. redundancy 100% (2)

redundancy 200% (3)

Retrieval probability of a given file in a static case. redundancy 200% (3)

redundancy 300% (4)

Retrieval probability of a given file in a static case. redundancy 300% (4)

This looks quite good. After all, we can push the lifetime as high as we want by just increasing redundancy.

Sadly it is also utterly wrong :) Let’s try to get closer to the real situation.

2 Dynamic Situation: The redundancy affects the replacement rate of chunks

To find a better approximation of the effects of increasing the redundancy, we have to stop looking at freenet as a fixed store and have to start seeing it as a process. More exactly: We have to look at the replacement rate.

2.1 Math

A look at the stats from digger3 shows us that after 4 weeks 50% of the chunks are gone. Let’s call this the dropout rate. The dropout rate consists of churn and chunk replacement:

dropout = churn + replacement

Since after one day the dropout rate is about 10%, I’ll assume that the churn is lower than 10%. So for the following parts, I’ll just ignore the churn (naturally this is wrong, but since the churn is not affected by redundancy, I just take it as constant factor. It should reduce the negative impacts of increasing redundancy). So we will only look at replacement of blocks.

Replacement consists of new inserts and healing of old files.

replacement = insert + healing

If we increase the redundancy from 2 to 3, the insert and healing rate should both increase by 50%, so the replacement rate should increase by 50%, too. The healing rate might increase a bit more, because healing can now restore 66% of the file as long as at least 33% are available. I’ll ignore that, too, for the time being (which is wrong again. We will need to keep this in mind when we look at the result).

redundancy 2 → 3 ⇒ replacement rate × 1.5

Increasing the replacement rate by 50% should decrease the lifetime of chunks by 1/1.5, or:

chunk lifetime × 2/3

So we will be at the 50% limit not after 4 weeks, but after 10 days. But on the other hand, redundancy 3 only needs 33% chunk probability, which has 2× the lifetime of 50% chunk probability. So the file lifetime should change by 2×2/3 = 4/3:

file lifetime × 4/3 = file lifetime +33%

Now doesn’t that look good?

As you can imagine, this pretty picture hides a clear drawback: The total storage capacity of Freenet gets reduced by 33%, too, because now every file requires 1.5× as much space as before.
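To keep the chain of factors straight, here it is as a tiny calculation (it only restates the assumptions above, nothing more):

old_redundancy, new_redundancy = 2.0, 3.0
replacement_factor = new_redundancy / old_redundancy   # 1.5: more data gets inserted
chunk_lifetime_factor = 1 / replacement_factor         # 2/3: chunks are displaced faster
threshold_factor = 2.0                                 # 33% instead of 50% needed: ~2x the lifetime
file_lifetime_factor = chunk_lifetime_factor * threshold_factor
print file_lifetime_factor                             # 4/3: about +33% file lifetime
storage_per_file_factor = new_redundancy / old_redundancy
print storage_per_file_factor                          # 1.5: each file needs 50% more space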

2.2 Caveats (whoever invented that name? :) )

We ignored churn, so the chunk lifetime reduction should be a bit less than the estimated 33%. That’s good and life is beautiful, right? :)

NO. We also ignored the increase in the healing rate. This should be higher, because every retrieved file can now insert more of itself in the healing process. If we had no new inserts, I would go as far as saying that the healing rate might actually double with the increased redundancy. So in a completely filled network without new data, the effects of the higher redundancy and the higher replacement rate would exactly cancel - but the higher redundancy would be able to store fewer files. Since we are constantly pushing new data into the network (for example via discussions in Sone), this should not be the case.

2.3 Dead space

Aside from hiding some bad effects, this simple model also hides a nice effect: A decreased amount of dead space.

First off, let's define it:

2.4 What is dead space?

Dead space is the part of the storage space which cannot be used for retrieving files. With any redundancy, that dead space is just about the size of the original file without redundancy multiplier. So for redundancy 2, the storage space occupied by the file is dead, when less than 50% are available. With redundancy 3, it is dead when less than 33% are available.

2.5 Effect

That dead space is replaced like any other space, but it is never healed. So the higher replacement rate means that dead space is recovered more quickly. So, while a network with higher redundancy can store fewer files overall, those files which can no longer be retrieved take up less space. I won’t add the math for that here, though (because I did not do it yet).

2.6 Closing

So, as closing remark, we can say that increasing the redundancy will likely increase the lifetime of files. It will also reduce the overall storage space in Freenet, though. I think it would be worthwhile.

It might also be possible to give probability estimates in the GUI which show how likely it is that we can retrieve a given file after a few percent were downloaded: If more than 1/redundancy of the chunks succeed, the probability to get the file is high. If close to 1/redundancy succeed, the file will be slow, because we might have to wait for nodes which went offline and will come back at some point. Essentially we will have to hope for churn. If much less than 1/redundancy of the chunks succeed, we can stop trying to get the file.

Just use the code in here for that :)
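A sketch of such an estimate (my own illustration, mirroring spielfaehig.py with ASCII names; the segment size and redundancy are just the values discussed above):

from math import factorial

def binom_coeff(n, k):
    return factorial(n) // (factorial(k) * factorial(n - k))

def at_least(p, n, k_min):
    # probability of at least k_min successes out of n tries
    return sum(binom_coeff(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))

def file_retrieval_estimate(chunk_success_rate, redundancy=2, needed=100):
    # chunk_success_rate: fraction of successful chunk fetches so far
    total = int(needed * redundancy)
    return at_least(chunk_success_rate, total, needed)

print file_retrieval_estimate(0.6) # well above 1/redundancy: likely retrievable
print file_retrieval_estimate(0.5) # right at 1/redundancy: uncertain
print file_retrieval_estimate(0.4) # well below 1/redundancy: probably lost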

3 Background and deeper look

Why redundancy at all? Redundancy 1: one failed chunk ⇒ the file fails. Redundancy 2: 50% of the chunks suffice. Redundancy 3: 33% suffice.

3.1 No redundancy

Let’s start with redundancy 1. If one chunk fails, the whole file fails.

Compared to freenet today the replacement rate would be halved, because each file takes up only half the current space. So the 50% dead chunks rate would be reached after 8 weeks instead of after 4 weeks. And 90% would be after 2 days instead of after 1 day. We can guess that 99% would be after a few hours.

Let’s take a file with 100 chunks as an example. That’s 100 × 32 kiB, or about 3 megabytes. After a few hours the chance will be very high that it has lost one chunk and is irretrievable. Freenet will still have 99% of the chunks, but they will be wasted space, because the file cannot be recovered anymore. The average lifetime of a file will just be a few hours.

With 99% probability of retrieving a chunk, the probability of retrieving a file will be only about 37%.

from spielfaehig import spielfähig
return spielfähig(0.99, 100, 100)
→ 0.366032341273

To achieve 90% retrievability of the file, we need a chunk availability of 99.9%! The file is essentially dead directly after the insert finishes.

from spielfaehig import spielfähig
return spielfähig(0.999, 100, 100)
→ 0.904792147114

3.2 1% redundancy

Now, let's add one redundant chunk. Almost nothing will have changed for inserting and replacing, but now the probability of retrieving the file when the chunks have 99% availability is 73%!

from spielfaehig import spielfähig
return spielfähig(0.99, 101, 100)
→ 0.732064682546

The replacement rate is increased by 1%, as is the storage space.

To achieve 90% retrievability, we actually need a chunk availability of 99.5%. So we might have 90% retrievability one hour after the insert.

from spielfaehig import spielfähig
return spielfähig(0.995, 101, 100)
→ 0.908655654736

Let’s check for 50%: We need a chunk probability of about 98.4%.

from spielfaehig import spielfähig
return spielfähig(0.984, 101, 100)
→ 0.518183035909

The mean lifetime of a file changed from about zero to a few hours.

3.3 50% redundancy

Now, let’s take a big step: redundancy 1.5. Now we need 71.2% block retrievability to have a 90% chance of retrieving the file.

from spielfaehig import spielfähig
return spielfähig(0.712, 150, 100)
→ 0.904577767501

For 50% retrievability we need 66.3% chunk availability.

from spielfaehig import spielfähig
return spielfähig(0.663, 150, 100)
→ 0.500313163333

66% would be reached in the current network after about 20 days (between 2 weeks and 4 weeks), and in a zero-redundancy network after 40 days (see the fetch-pull stats).

At the same time, though, the chunk replacement rate increased by 50%, so the mean chunk lifetime decreased to 2/3 of its previous value. So the lifetime of a file would be 4 weeks.

3.4 Generalize this

So, now we have calculations for redundancy 1, 1.5, 2 and 3. Let’s see if we can find a general (if approximate) rule for redundancy.

From the fetch-pull graph from digger3 we see empirically that between one week and 18 weeks each doubling of the lifetime corresponds to a reduction of the chunk retrieval probability of 15% to 20%.

Also we know that 50% probability corresponds to 4 weeks lifetime.

And we know that redundancy x has a minimum required chunk probability of 1/x.

With this, we can model the required chunk lifetime as a function of redundancy:

chunk lifetime = 4 * 2**((0.5-1/x)/0.2)

with x as redundancy. Note: this function is purely empirical and approximate.

Having the chunk lifetime, we can now model the lifetime of a file as a function of its redundancy:

file lifetime = (2/x) * 4 * (2**((0.5-1/x)/0.2))

We can now use this function to find an optimum of the redundancy if we are only concerned about file lifetime. Naturally we could get the trusty wxmaxima and take the derivative to find the maximum. But that is not installed right now, and my skills at taking derivatives by hand are a bit rusty (note: install running). So we just do it graphically. The function is not perfectly exact anyway, so the errors introduced by the graphical solution should not be too big compared to the errors in the model.

Note however, that this model is only valid in the range between 20% and 90% chunk retrieval probability, because the approximation for the chunk lifetime does not hold anymore for values above that. Due to this, redundancy values close to or below 1 won’t be correct.

Also keep in mind that it does not include the effect due to the higher rate of removing dead space - which is space that belongs to files which cannot be recovered anymore. This should mitigate the higher storage requirement of higher redundancy.
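Before plotting, a quick numeric sweep of the model (my own addition; it just evaluates the file-lifetime formula above on a grid) gives a rough idea where the maximum lies:

def file_lifetime_weeks(x):
    # the model from above, x is the redundancy
    return (2.0 / x) * 4 * 2**((0.5 - 1.0 / x) / 0.2)

redundancies = [1.0 + 0.1 * i for i in range(1, 91)] # 1.1 to 10.0
best = max(redundancies, key=file_lifetime_weeks)
print "redundancy with the highest modeled lifetime:", best
print "modeled lifetime in weeks:", file_lifetime_weeks(best)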

# encoding: utf-8
plotcmd = """
set term png
set width 15
set xlabel "redundancy"
set ylabel "lifetime [weeks]"
set output "freenet-prob-function.png"
set xrange [0:10]
plot (2/x) * 4 * (2**((0.5-1/x)/0.2))
"""
with open("plot.pyx", "w") as f:...

from subprocess import Popen
Popen(["pyxplot", "plot.pyx"])

4 Summary: Merit and outlook

Now, what do we make of this?

First off: If the equations are correct, an increase in redundancy would improve the lifetime of files by a maximum of almost a week. Going further reduces the lifetime, because the increased replacement of old data outpaces the improvement due to the higher redundancy.

Also higher redundancy needs a higher storage capacity, which reduces the overall capacity of freenet. This should be partially offset by the faster purging of dead storage space.

The results support an increase in redundancy from 2 to 3, but not to 4.

Well, and aren’t statistics great? :)

Additional notes: This exploration ignores:

  • healing creates less insert traffic than new inserts by only inserting failed segments, and it makes files which get accessed regularly live much longer,
  • inter-segment redundancy improves the retrieving of files, so they can cope with a retrievability of 50% of any chunks of the file, even if the distribution might be skewed for a single segment,
  • Non-uniformity of the network which makes it hard to model effects with global-style math like this,
  • Separate stores for SSK and CHK keys, which improve the availability of small websites, and
  • Usability and security impact of increased insert times (might be reduced by only inserting 2/3rd of the file data and letting healing do the rest when the first downloader gets the file)

Due to that, the findings can only provide clues for improvements, but cannot perfectly predict the best path of action. Thanks to evanbd for pointing them out!

If you are interested in other applications of the same theory, you might enjoy my text Statistical constraints for the design of roleplaying games (RPGs) and campaigns (german original: Statistische Zwänge beim Rollenspiel- und Kampagnendesign). The script spielfaehig.py I used for the calculations was written for a forum discussion which evolved into that text :)

This text was written and checked in emacs org-mode and exported to HTML via `org-export-as-html-to-buffer`. The process integrated research and documentation. In hindsight, that was a pretty awesome experience, especially the inline script evaluation. I also attached the org-mode file for your leisure :)

Attachments:
freenet-prob-redundancy-2.png (67.05 KB)
freenet-prob-redundancy-3.png (65.67 KB)
freenet-prob-redundancy-4.png (63.43 KB)
freenet-success-probability.org (14.84 KB)
freenet-prob-function.png (20.5 KB)
fetch_dates_graph-2012-03-16.png (17.25 KB)
spielfaehig.py.txt (1.15 KB)

Freenet Communication Primitives: Part 1, Files and Sites

Basic building blocks for communication in Freenet.

This is a guide to using Freenet as backend for communication solutions - suitable for anything from filesharing over chat up to decentrally hosted game content like level-data. It uses the Python interface to Freenet for its examples.

Image: TheTim, by Tim Moore, licensed under CC BY.

This guide consists of several installments: Part 1 (this text) is about exchanging data, Part 2 is about confidential communication and finding people and services without drowning in spam and Part 3 ties it all together by harnessing existing plugins which already include all the hard work which distinguishes a quick hack from a real-world system. Happy Hacking and welcome to Freenet, the forgotten cypherpunk paradise where no one can watch you read!

1 Introduction

The immutable datastore in Freenet provides the basic structures for implementing distributed, pseudonymous, spam-resistant communication protocols. But until now there was no practically usable documentation on how to use them. Every new developer had to find out about them by asking, speculating and second-guessing the friendly source (also known as SGTFS).

We will implement the answers using pyFreenet. Get it from http://github.com/freenet/pyFreenet

We will not go into special cases. For those, have a look at the API documentation of fcp.node.FCPNode().

1.1 Install pyFreenet

To follow the code examples in this article, install Python 2 with setuptools and then run

easy_install --user pyFreenet

2 Sharing a File: The CHK (content hash key)

The first and simplest task is sharing a file. You all know how this works in torrents and file hosters: You generate a link and give that link to someone else.

To create that link, you have to know the exact content of the file beforehand.

import fcp
n = fcp.node.FCPNode()
key = n.put(data="Hello Friend!")
print key
n.shutdown()

Just share this key, and others can retrieve it. Use http://127.0.0.1:8888/ as prefix, and they can even click it - if they run Freenet on their local computer or have an SSH forward for port 8888.

The code above only returns once the file finished uploading. The Freenet Client Protocol (that’s what fcp stands for) however is asynchronous. When you pass async=True to n.put() or n.get(), you get a job object which gives you the result via job.wait().

To generate the key without actually uploading the file, use chkonly=True as argument to n.put().
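A tiny illustration of these two options (assuming, as I read the API, that job.wait() on a put job returns the same key a blocking n.put() would return):

import fcp

n = fcp.node.FCPNode()
key_only = n.put(data="Hello Friend!", chkonly=True) # compute the key, upload nothing
job = n.put(data="Hello Friend!", async=True)        # start the upload in the background
uploaded_key = job.wait()                            # block until the upload finished
print key_only
print uploaded_key
n.shutdown()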

Let’s test retrieving a file:

import fcp
n = fcp.node.FCPNode()
key = n.put(data="Hello Friend!")
mime, data, meta = n.get(key)
print data
n.shutdown()

This code anonymously uploads an invisible file into Freenet which can only be retrieved with the right key. Then it downloads the file from Freenet using the key and shows the data.

That the put and the get request happen from the same node is a mere implementation detail: They could be fired by total strangers on different sides of the globe and would still work the same. Even the performance would be similar.

Note: fcp.node.FCPNode() opens a connection to the Freenet node. You can have multiple of these connections at the same time, all tracking their own requests without interfering with each other. Just remember to call n.shutdown() on each of them to avoid getting ugly backtraces.

So that’s it. We can upload and download files, completely decentrally, anonymously and confidentially.

There’s just one caveat: We have to exchange the key. And to generate that key, we have to know the content of the file.

Let’s fix that.

3 Public/Private key publishing: The SSK (signed subspace key)

Our goal is to create a key where we can upload a file in the future. We can generate this key and tell someone else: Watch this space.

So we will generate a key, start to download from the key and insert the file to the key afterwards.

import fcp
n = fcp.node.FCPNode()
# we generate a key with the additional filename hello.
public, private = n.genkey(name="hello")
job = n.get(public, async=True)
n.put(uri=private, data="Hello Friend!")
mime, data, meta = job.wait()
print data
n.shutdown()

These 8 lines of code create a key which you could give to a friend. Your friend will start the download and when you get hold of that secret hello-file, you upload it and your friend gets it.

Hint: If you want to test whether the key you give is actually used, you can check the result of n.put(). It returns the key with which the data can be retrieved.

Using the .txt suffix makes Freenet use the mimetype text/plain. Without extension it will use application/octet-stream.

If you start downloading before you upload as we do here, you can trigger a delay of about half an hour due to overload protections (the mechanism is called “recently failed”).

Note that you can only write to a given key-filename combination once. If you try to write to it again, you’ll get conflicts – your second upload will in most cases just not work. You might recognize this from immutable datastructures (without the conflict stuff). Freenet is the immutable, distributed, public/private key database you’ve been fantasizing about when you had a few glasses too many during that long night. So best polish your functional programming skills. You’re going to use them on the level of practical communication.

3.1 short roundtrip time (speed hacks)

An SSK is a special type of key, and similar to inodes in a filesystem it can carry data. But if used in the default way, it will forward to a CHK: The file is salted and then inserted as a CHK which depends on the content and then some, ensuring that the key cannot be predicted from the data (this helps avoid some attacks against your anonymity).

When we want a fast round trip time, we can cut that out. The condition is that your data plus filename is less than 1 KiB after compression, the amount of data an SSK can hold. And we have to get rid of the metadata. That means: Use the application/octet-stream mime type, because that’s the default one, so it is left out on upload. And insert single files (we did not yet cover uploading folders: You can do that, but they will forward to a CHK).

import fcp
n = fcp.node.FCPNode()
# we generate a key with the additional filename hello.
public, private = n.genkey(name="hello.txt")
job = n.get(public, async=True, realtime=True, priority=0)
n.put(uri=private, data="Hello Friend!", mimetype="application/octet-stream", realtime=True, priority=0)
mime, data, meta = job.wait()
print public
print data
n.shutdown()

To check whether we managed to avoid the metadata, we can use the KeyUtils plugin to analyze the key.

If it is right, when putting the key into the text field on the http://127.0.0.1:8888/KeyUtils/ site, you’ll see something like this:

0000000: 4865 6C6C 6F20 4672 6965 6E64 21
         Hello Friend!

Also we want to use realtime mode (optimized for the webbrowser: reacting quickly but with low throughput) with a high priority.

Let’s look at the round trip time we achieve:

import time
import fcp
n = fcp.node.FCPNode()
# we generate two keys with the additional filename hello.
public1, private1 = n.genkey(name="hello1.txt")
public2, private2 = n.genkey(name="hello2.txt")
starttime = time.time()
job1 = n.get(public1, async=True, realtime=True, priority=1)
job2 = n.get(public2, async=True, realtime=True, priority=1)
n.put(uri=private1, data="Hello Friend!",
      mimetype="application/octet-stream",
      realtime=True, priority=1)
mime, data1, meta = job1.wait()
n.put(uri=private2, data="Hello Back!",
      mimetype="application/octet-stream",
      realtime=True, priority=1)
mime, data2, meta = job2.wait()
rtt = time.time() - starttime
n.shutdown()
print public1
print public2
print data1
print data2
print "RTT (seconds):", rtt

When I run this code, I get less than 80 seconds round trip time. Remember that we’re uploading two files anonymously into a decentralized network, discovering them and then downloading them, and all that in serial. Less than a minute to detect an upload to a known key.

90s is not instantaneous, but when looking at usual posting frequencies in IRC and other chat, it’s completely sufficient to implement a chat system. And in fact it’s how FLIP is implemented: IRC over Freenet.

Compare this to the performance when we do not use the short round trip time trick of avoiding the Metadata and using the realtime queue:

import time
import fcp
n = fcp.node.FCPNode()
# we generate two keys with the additional filename hello.
public1, private1 = n.genkey(name="hello1.txt")
public2, private2 = n.genkey(name="hello2.txt")
starttime = time.time()
job1 = n.get(public1, async=True)
job2 = n.get(public2, async=True)
n.put(uri=private1, data="Hello Friend!")
mime, data1, meta = job1.wait()
n.put(uri=private2, data="Hello Back!")
mime, data2, meta = job2.wait()
rtt = time.time() - starttime
n.shutdown()
print public1
print public2
print data1
print data2
print "RTT (seconds):", rtt

With 300 seconds (5 minutes), that’s more than 3x slower. So you see, if you have small messages and you care about latency, you want to do the latency hacks.

4 Upload Websites: SSK as directory

So now we can upload single files, but the links look a lot like what we see on websites: http://127.0.0.1:8888/folder/file. So can we just mirror a website? The answer is: Yes, definitely!

import fcp
n = fcp.node.FCPNode()
# We create a key with a directory name
public, private = n.genkey() # no filename: we need different ones
index = n.put(uri=private + "index.html",
      data='''<html>
  <head>
    <link rel="stylesheet" type="text/css" href="style.css">
    <title>First Site!</title></head>
  <body>Hello World!</body></html>''')
n.put(uri=private + "style.css", 
      data='body {color: red}\n')
print index
n.shutdown()

Now we can navigate to the key in the freenet web interface and look at our freshly uploaded website! The text is colored red, so it uses the stylesheet. We have files in Freenet which can reference each other by relative links.

4.1 Multiple directories below an SSK

So now we can create simple websites on an SSK. But here’s a catch: key/hello/hello.txt simply returns key/hello. What if we want multiple folders?

For this purpose, Freenet provides manifests instead of single files. Manifests are tarballs which include several files which are then downloaded together and which can include references to external files - named redirects. They can be uploaded as folders into the key. And in addition to these, there are quite a few other tricks. Most of them are used in freesitemgr which uses fcp/sitemgr.py.

But we want to learn how to do it ourselves, so let’s do a more primitive version manually via n.putdir():

import os
import tempfile

import fcp
n = fcp.node.FCPNode()
# we create a key again, but this time with a name: The folder of the
# site: We will upload it as a container.
public, private = n.genkey()
# now we create a directory
tempdir = tempfile.mkdtemp(prefix="freesite-")
with open(os.path.join(tempdir, "index.html"), "w") as f:
    f.write('''<html>
    <head>
    <link rel="stylesheet" type="text/css" href="style.css">
    <title>First Site!</title></head>
    <body>Hello World!</body></html>''')

with open(os.path.join(tempdir, "style.css"), "w") as f:
    f.write('body {color: red}\n')

uri = n.putdir(uri=private, dir=tempdir, name="hello", 
               filebyfile=True, allatonce=True, globalqueue=True)
print uri
n.shutdown()

That’s it. We just uploaded a folder into Freenet.

But now that it’s there, how do we upload a better version? As already said, files in Freenet are immutable. So what’s the best solution if we can’t update the data, but only upload new files? The obvious solution would be to just number the site.

And this is how it was done in the days of old. People uploaded hello-1, hello-2, hello-3 and so forth, and in hello-1 they linked to an image under hello-2. When visitors of hello-1 saw that the image loaded, they knew that there was a new version.

When more and more people adopted that, Freenet added core support: USKs, the updatable subspace keys.

We will come to that in the next part of this series: Service Discovery and Communication.

Attachment: thetim-tim_moore-flickr-cc_by-2471774514_8c9ed2a7e5_o-276x259.jpg (19.79 KB)

Freenet Communication Primitives: Part 2, Service Discovery and Communication

Basic building blocks for communication in Freenet.

This is a guide to using Freenet as backend for communication solutions - suitable for anything from filesharing over chat up to decentrally hosted game content like level-data. It uses the Python interface to Freenet for its examples.

Image: Mirror, Freenet Project, by Arne Babenhauserheide, license: GPL.

This guide consists of several installments: Part 1 is about exchanging data, Part 2 is about confidential communication and finding people and services without drowning in spam and Part 3 ties it all together by harnessing existing plugins which already include all the hard work which distinguishes a quick hack from a real-world system.

Note: You need the current release of pyFreenet for the examples in this article (0.3.2). Get it from PyPI:

# with setuptools
easy_install --user pyFreenet
# or pip
pip install --user --egg pyFreenet

This is part 2: Service Discovery and Communication. It shows how to find new people, build secure communication channels and create community forums. Back when I contributed to Gnutella, this was the holy grail of many p2p researchers (I still remember the service discovery papers). Here we’ll build it in 300 lines of Python.

Welcome to Freenet, where no one can watch you read!

USK: The Updatable Subspace Key

USKs allow uploading increasing versions of a website into Freenet. Like the numbered uploads from the previous article they simply add a number to the site, but they automate upload and discovery of new versions in roughly constant time (using Date Hints and automatic checking for new versions), and they allow accessing a site as <key>/<name>/<minimal version>/ (never underestimate the impact of convenience!).

With this, we only need a single link to provide an arbitrary number of files, and it is easy and fast to always get the most current version of a site. This is the ideal way to share a website in Freenet. Let’s do it practically.

import os
import tempfile

import fcp
n = fcp.node.FCPNode()
# we create a key again, but this time with a name: The folder of the
# site: We will upload it as a container.
public, private = n.genkey()
# now we create a directory
tempdir = tempfile.mkdtemp(prefix="freesite-")
with open(os.path.join(tempdir, "index.html"), "w") as f:
    f.write('''<html>
    <head>
    <link rel="stylesheet" type="text/css" href="style.css">
    <title>First Site!</title></head>
    <body>Hello World!</body></html>''')

with open(os.path.join(tempdir, "style.css"), "w") as f:
    f.write('body {color: red}\n')

uri = n.putdir(uri=private, dir=tempdir, name="hello",
               filebyfile=True, allatonce=True, globalqueue=True,
               usk=True)
print uri
n.shutdown()

But we still need to first share the public key, so we cannot just tell someone where to upload the files so we see them. Though if we were to share the private key, then someone else could upload there and we would see it in the public key. We could not be sure who uploaded there, but at least we would get the files. Maybe we could even derive both keys from a single value… and naturally we can. This is called a KSK (old description).

KSK: Upload a file to a password

KSKs allow uploading a file to a pre-determined password. The file will only be detectable for those who know the password, so we have effortless, invisible, password protected files.

import fcp
import uuid # avoid spamming the global namespace

n = fcp.node.FCPNode()
_uuid = str(uuid.uuid1())
key = "KSK@" + _uuid
n.put(uri=key, data="Hello World!",
      Global=True, persistence="forever",
      realtime=True, priority=1)
print key
print n.get(key)[1]
n.shutdown()

Note: We’re now writing a communication protocol, so we’ll always use realtime mode. Be aware, though, that realtime is rate limited. If you use it for large amounts of data, other nodes will slow down your requests to preserve quick reaction of the realtime queue for all (other) Freenet users.

Note: Global=True and persistence="forever" allow telling Freenet to upload some data and then shutting down the script. Use async=True and waituntilsent=True to just start the upload. When the function returns you can safely exit from the script and let Freenet upload the file in the background - if necessary it will even keep uploading over restarts. And yes, the capitalized Global looks crazy. For pyFreenet that choice is sane (though not beautiful), because Global gets used directly as a parameter in the Freenet Client Protocol (FCP). This is the case for many of the function arguments. In putdir() there’s a globalqueue parameter which also sets persistence. That should become part of the put() API, but isn’t yet. There are lots of places where pyFreenet is sane, but not beautiful. It seems like that’s its secret for how it could keep working from 2008 till 2014 with almost no maintenance.
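A tiny illustration of that insert-and-quit pattern (the password is hypothetical; a running local node is assumed):

import fcp

n = fcp.node.FCPNode()
n.put(uri="KSK@a-hypothetical-password", data="uploaded in the background",
      Global=True, persistence="forever",
      async=True, waituntilsent=True)
# safe to exit now: the node keeps uploading on its global queue
n.shutdown()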

For our purposes the main feature of KSKs is that we can tell someone to upload to an arbitrary phrase and then download that.

If we add a number, we can even hand out a password to multiple people and tell them to just upload to the first unused version. This is called the KSK queue.

KSK queue: Share files by uploading to a password

The KSK queue used to be the mechanism of choice to find new posts in forums, until spammers proved that real anonymity means total freedom to spam: they burned down the Frost Forum System. But we’ll build this, since it provides a basic building block for the spam-resistant system used in Freenet today.

Let’s just do it in code (descriptions are in the comments):

import fcp
import uuid # avoid spamming the global namespace

n = fcp.node.FCPNode()
_uuid = str(uuid.uuid1())
print "Hey, this is the password:", _uuid
# someone else used it before us
for number in range(2):
    key = "KSK@" + _uuid + "-" + str(number)
    n.put(uri=key, data="Hello World!",
          Global=True, persistence="forever",
          realtime=True, priority=1,
          timeout=360) # 6 minutes
# we test for a free slot
for number in range(4):
  key = "KSK@" + _uuid + "-" + str(number)
  try:
    n.get(key,
          realtime=True, priority=1,
          timeout=60)
  except fcp.node.FCPNodeTimeout:
    break
# and write there
n.put(uri=key, data="Hello World!",
      Global=True, persistence="forever",
      realtime=True, priority=1,
      timeout=360) # 6 minutes
print key
print n.get(key)[1]
n.shutdown()

Note that currently a colliding put – uploading where someone else uploaded before – simply stalls forever instead of failing. This is a bug in pyFreenet. We work around it by giving an explicit timeout.

But it’s clear how this can be spammed.

And it might already become obvious how this can be avoided.

KSK queue with CAPTCHA

Let’s assume I do not tell you a password. Instead I tell you where to find a riddle. The solution to that riddle is the password. Now only those who are able to solve riddles can upload there. And each riddle can be used only once. This restricts automated spamming, because it requires an activity which, we hope, only humans can do reliably.

In the clearweb this is known as CAPTCHA. For the examples in this guide a plain text version is much easier.

import fcp
import uuid # avoid spamming the global namespace

n = fcp.node.FCPNode()
_uuid = str(uuid.uuid1())
_uuid2 = str(uuid.uuid1())
riddlekey = "KSK@" + _uuid
riddle = """
What goes on four legs in the morning,
two legs at noon, and three legs in the
evening?
A <answer>
"""
# The ancient riddle of the sphinx
n.put(uri=riddlekey, data="""To reach me, answer this riddle.

%s

Upload your file to %s-<answer>
""" % (riddle, _uuid2),
      Global=True, persistence="forever",
      realtime=True, priority=1)

print n.get(riddlekey, realtime=True, priority=1)[1]
answer = "human"
print "answer:", answer
answerkey = "KSK@" + _uuid2 + "-%s" % answer

n.put(uri=answerkey, data="Hey, it's me!",
      Global=True, persistence="forever",
      realtime=True, priority=1)

print n.get(answerkey, realtime=True, priority=1)[1]
n.shutdown()

Now we have fully decentralized, spam-resistant, anonymous communication.

Let me repeat that: fully decentralized, spam-resistant, anonymous communication.

The need to solve a riddle every time we want to write is not really convenient, but it provides the core of the feature we need. Everything we now add just makes this more convenient and makes it scale for many-to-many communication.

(originally I wanted to use the Hobbit riddles for this, but I switched to the sphinx riddle to avoid the swamp of multinational (and especially German) quoting restrictions)

Convenience: KSK queue with CAPTCHA via USK to reference a USK

The first step to improve this is getting rid of the requirement to solve a riddle every single time we write to a person. The second is to automatically update the list of riddles.

For the first, we simply upload a public USK key instead of the message. That gives a potentially constant stream of messages.

For the second, we upload the riddles to a USK instead of to a KSK. We pass out this USK instead of a password. Let’s realize this.

To make this easier, let’s use names. Alice wants to contact Bob. Bob gave her his USK. The answer-uuid we’ll call namespace.

import fcp
import uuid # avoid spamming the global namespace
import time # to check the timing

tstart = time.time()
def elapsed_time():
    return time.time() - tstart


n = fcp.node.FCPNode()

bob_public, bob_private = n.genkey(usk=True, name="riddles")
alice_to_bob_public, alice_to_bob_private = n.genkey(usk=True, name="messages")
namespace_bob = str(uuid.uuid1())
riddle = """
What goes on four legs in the morning,
two legs at noon, and three legs in the
evening?
A <answer>
"""
print "prepared:", elapsed_time()
# Bob uploads the ancient riddle of the sphinx
put_riddle = n.put(uri=bob_private,
                   data="""To reach me, answer this riddle.

%s

Upload your key to %s-<answer>
""" % (riddle, namespace_bob),
                   Global=True, persistence="forever",
                   realtime=True, priority=1, async=True,
                   IgnoreUSKDatehints="true") # speed hack for USKs.

riddlekey = bob_public
print "riddlekey:", riddlekey
print "time:", elapsed_time()
# Bob shares the riddlekey. We're set up.

# Alice can insert the message before telling Bob about it.
put_first_message = n.put(uri=alice_to_bob_private,
                          data="Hey Bob, it's me, Alice!",
                          Global=True, persistence="forever",
                          realtime=True, priority=1, async=True,
                          IgnoreUSKDatehints="true")

print "riddle:", n.get(riddlekey, realtime=True, priority=1, followRedirect=True)[1]
print "time:", elapsed_time()

answer = "human"
print "answer:", answer
answerkey = "KSK@" + namespace_bob + "-%s" % answer
put_answer = n.put(uri=answerkey, data=alice_to_bob_public,
                   Global=True, persistence="forever",
                   realtime=True, priority=1, async=True)

print ":", elapsed_time()
# Bob gets the messagekey and uses it to retrieve the message from Alice

# Due to details in the insert process (i.e. ensuring that the file is
# accessible), the upload does not need to be completed for Bob to be
# able to get it. We just try to get it.
messagekey_alice_to_bob = n.get(answerkey, realtime=True, priority=1)[1]

print "message:", n.get(uri=messagekey_alice_to_bob, realtime=True, priority=1,
                        followRedirect=True, # get the new version
                        )[1]

print "time:", elapsed_time()
# that's it. Now Alice can upload further messages which Bob will see.

# Bob starts listening for a more recent message. Note that this does
# not guarantee that he will see all messages.
def next_usk_version(uri):
    elements = uri.split("/")
    elements[2] = str(abs(int(elements[2])) + 1)
    # USK@.../name/N+1/...
    return "/".join(elements)

next_message_from_alice = n.get(
    uri=next_usk_version(messagekey_alice_to_bob),
    realtime=True, priority=1, async=True,
    followRedirect=True) # get the new version

print "time:", elapsed_time()
# Alice uploads the next version.
put_second_message = n.put(uri=next_usk_version(alice_to_bob_private),
                           data="Me again!",
                           Global=True, persistence="forever",
                           realtime=True, priority=1,
                           IgnoreUSKDatehints="true",
                           async=True)

# Bob sees it.
print "second message:", next_message_from_alice.wait()[1]
print "time:", elapsed_time()

print "waiting for inserts to finish"
put_riddle.wait()
put_answer.wait()
put_first_message.wait()
put_second_message.wait()
print "time:", elapsed_time()

n.shutdown()

From start to end this takes less than two minutes, and now Alice can send Bob messages with roughly one minute of delay.

So now we have set up a convenient communication channel. Since Alice already knows Bob’s key, Bob could simply publish a bob-to-alice public key there, and if both publish GnuPG keys, these keys can be hidden from others: instead of uploading her key in the clear, Alice encrypts it to Bob, and Bob encrypts his bob-to-alice key using the GnuPG key from Alice. By regularly sending each other new public keys, they could even establish perfect forward secrecy. I won’t implement that here, because when we get to the third part of this series, we will simply use the Freemail and Web of Trust plugins which already provide these features.
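
Just to make the encryption step concrete, here is a minimal sketch (my own illustration only, assuming the python-gnupg module, that Bob’s GnuPG key is already in the local keyring, and reusing the names n, answerkey and alice_to_bob_public from the example above; the fingerprint is a placeholder):

import gnupg # python-gnupg; assumes GnuPG is installed and Bob's key is in the keyring

gpg = gnupg.GPG()
BOB_FPR = "0123456789ABCDEF0123456789ABCDEF01234567" # placeholder fingerprint of Bob's GnuPG key
# upload the encrypted alice-to-bob key instead of the plain key
encrypted_key = str(gpg.encrypt(alice_to_bob_public, BOB_FPR, always_trust=True))
put_answer = n.put(uri=answerkey, data=encrypted_key,
                   Global=True, persistence="forever",
                   realtime=True, priority=1, async=True)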

This gives us convenient, fully decentralized, spam-resistant, anonymous communication channels. Setting up a communication channel to a known person requires solving one riddle (in a real setting likely a CAPTCHA, or a password-prompt), and then the channel persists.

Note: To speed up these tests, I added another speed hack: IgnoreUSKDatehints. That turns off date hints, so discovering new versions no longer takes constant time in the number of intermediate versions. For our messaging system that does not hurt, since we don’t have many intermediate messages we want to skip. For websites, however, it could lead your visitors to see several old versions before they finally get the most current version. So be careful with this hack - just like you should be with the other speed hacks.

But if we want to reach many people, we have to solve one riddle per person, which just doesn’t scale. To fix this, we can publish a list of all people we trust to be real people. Let’s do that.

Many-to-many: KSK->CAPTCHA->USK->USK which is linked in the original USK

To enable (public) many-to-many communication, we propagate the information that we believe that someone isn’t a spammer and add a blacklist to get rid of people who suddenly start to spam.

Let’s take Alice and Bob again, but add Carol. First Bob introduces himself to Alice, then Carol introduces herself to Alice. Because the information about who already solved a riddle is propagated, Carol can then write to Bob directly, without first solving another riddle. Scaled up, that means you only need to prove a single time that you are not a spammer (or rather: not disruptive) when you want to enter a community.

To make it easier to follow, we will implement this with a bit of abstraction: People have a private key, can introduce themselves and publish lists of messages. Also they keep a public list of known people and a list of people they see as spammers who want to disrupt communication.

I got a bit carried away while implementing this, but please bear with me: I worked hard to make it this much fun.

The finished program is available as alice_bob_carol.py. Just download and run it with python alice_bob_carol.py.

Let’s start with the minimal structure of any program using pyFreenet:

import fcp

n = fcp.node.FCPNode() # for debugging add verbosity=5

<<body>>

n.shutdown()

The body contains the definition of a person with different actors, an update step (as a simplification I use global stepwise updates), as well as the setup of the communication. Finally we need an event loop to run the system.

<<preparation>>

<<person>>

<<update>>

<<setup>>

<<event_loop>>

We start with some imports – and a bit of fun :)

import uuid
import random
try:
    import chatterbot # let's get a real conversation :)
    # https://github.com/guntherc/ChatterBot/wiki/Quick-Start

    # get with `pip install --user chatterbot`
    irc_loguri = "USK@Dtz9FjDPmOxiT54Wjt7JwMJKWaqSOS-UGw4miINEvtg,cuIx2THw7G7cVyh9PuvNiHa1e9BvNmmfTcbQ7llXh2Q,AQACAAE/irclogs/1337/"

    print "Getting the latest IRC log as base for the chatterbot"
    IRC_LOGLINES = n.get(uri=irc_loguri, realtime=True, priority=1, followRedirect=True)[1].splitlines()
    import re # what follows is an evil hack, but what the heck :)
    p = re.compile(r'<.*?>')
    q = re.compile(r'&.*?;')
    IRC_LOGLINES = [q.sub('', p.sub('', str(unicode(i.strip(), errors="ignore"))))
                    for i in IRC_LOGLINES]
    IRC_LOGLINES = [i[:-5] for i in IRC_LOGLINES # skip the time (last 5 letters)
                    if (i[:-5] and # skip empty
                        not "spam" in i # do not trigger spam-marking
                    )][7:] # skip header

except ImportError:
    chatterbot = None

The real code begins with some helper functions – essentially data definition.

def get_usk_namespace(key, name, version=0):
    """Get a USK key with the given namespace (foldername)."""
    return "U" + key[1:] + name + "/" + str(version) + "/"

def extract_raw_from_usk(key):
    """Get an SSK key as used to identify a person from an arbitrary USK."""
    return "S" + (key[1:]+"/").split("/")[0] + "/"

def deserialize_keylist(keys_data):
    """Parse a known file to get a list of keys. Reverse: serialize_keylist."""
    return [i for i in keys_data.split("\n") if i]

def serialize_keylist(keys_list):
    """Serialize the known keys into a text file. Reverse: deserialize_keylist."""
    return "\n".join(keys_list)

Now we can define a person. The person is the primary actor. To keep everything contained, I use a class with some helper functions.

class Person(object):
    def __init__(self, myname, mymessage):
        self.name = myname
        self.message = mymessage
        self.introduced = False
        self.public_key, self.private_key = n.genkey()
        print self.name, "uses key", self.public_key
        # we need a list of versions for the different keys
        self.versions = {}
        for name in ["messages", "riddles", "known", "spammers"]:
            self.versions[name] = -1 # does not exist yet
        # and sets of answers, watched riddle-answer keys, known people and spammers.
        # We use sets for these, because we only need membership-tests and iteration.
        # The answers contain KSKs, the others the raw SSK of the person.
        # watched contains all persons whose messages we read.
        self.lists = {}
        for name in ["answers", "watched", "known", "spammers", "knowntocheck"]:
            self.lists[name] = set()
        # running requests per name, used for making all persons update asynchronously
        self.jobs = {}
        # and just for fun: get real conversations. Needs chatterbot and IRC_LOGLINES.
        # this is a bit slow to start, but fun.
        try:
            self.chatbot = chatterbot.ChatBot(self.name)
            self.chatbot.train(IRC_LOGLINES)
        except:
            self.chatbot = None


    def public_usk(self, name, version=0):
        """Get the public usk of type name."""
        return get_usk_namespace(self.public_key, name, version)
    def private_usk(self, name, version=0):
        """Get the private usk of type name."""
        return get_usk_namespace(self.private_key, name, version)

    def put(self, key, data):
        """Insert the data asynchronously to the key. This is just a helper to
        avoid typing the realtime arguments over and over again.

        :returns: a job object. To get the public key, use job.wait(60)."""
        return n.put(uri=key, data=data, async=True,
                     Global=True, persistence="forever",
                     realtime=True, priority=1,
                     IgnoreUSKDatehints="true")

    def get(self, key):
        """Retrieve the data asynchronously to the key. This is just a helper to
        avoid typing the realtime arguments over and over again.

        :returns: a job object. To get the public key, use job.wait(60)."""
        return n.get(uri=key, async=True,
                     realtime=True, priority=1,
                     IgnoreUSKDatehints="true",
                     followRedirect=True)

    def introduce_to_start(self, other_public_key):
        """Introduce self to the other by solving a riddle and uploading the messages USK."""
        riddlekey = get_usk_namespace(other_public_key, "riddles", "-1") # -1 means the latest version
        try:
            self.jobs["getriddle"].append(self.get(riddlekey))
        except KeyError:
            self.jobs["getriddle"] = [self.get(riddlekey)]

    def introduce_start(self):
        """Select a person and start a job to get a riddle."""
        known = list(self.lists["known"])
        if known: # introduce to a random person to minimize
                  # the chance of collisions
            k = random.choice(known)
            self.introduce_to_start(k)

    def introduce_process(self):
        """Get and process the riddle data."""
        for job in self.jobs.get("getriddle", [])[:]:
            if job.isComplete():
                try:
                    riddle = job.wait()[1]
                except Exception as e: # try again next time
                    print self.name, "getting the riddle from", job.uri, "failed with", e
                    return
                self.jobs["getriddle"].remove(job)
                answerkey = self.solve_riddle(riddle)
                messagekey = self.public_usk("messages")
                try:
                    self.jobs["answerriddle"].append(self.put(answerkey, messagekey))
                except KeyError:
                    self.jobs["answerriddle"] = [self.put(answerkey, messagekey)]

    def introduce_finalize(self):
        """Check whether the riddle answer was inserted successfully."""
        for job in self.jobs.get("answerriddle", [])[:]:
            if job.isComplete():
                try:
                    job.wait()
                    self.jobs["answerriddle"].remove(job)
                    self.introduced = True
                except Exception as e: # try again next time
                    print self.name, "inserting the riddle-answer failed with", e
                    return

    def new_riddle(self):
        """Create and upload a new riddle."""
        answerkey = "KSK@" + str(uuid.uuid1()) + "-answered"
        self.lists["answers"].add(answerkey)
        self.versions["riddles"] += 1
        next_riddle_key = self.private_usk("riddles", self.versions["riddles"])
        self.put(next_riddle_key, answerkey)


    def solve_riddle(self, riddle):
        """Get the key for the given riddle. In this example we make it easy:
        The riddle is the key. For a real system, this needs user interaction.
        """
        return riddle

    def update_info(self):
        for name in ["known", "spammers"]:
            data = serialize_keylist(self.lists[name])
            self.versions[name] += 1
            key = self.private_usk(name, version=self.versions[name])
            self.put(key, data)

    def publish(self, data):
        self.versions["messages"] += 1
        messagekey = self.private_usk("messages", version=self.versions["messages"])
        print self.name, "published a message:", data
        self.put(messagekey, data)

    def check_network_start(self):
        """start all network checks."""
        # first cancel all running jobs which will be replaced here.
        for name in ["answers", "watched", "known", "knowntocheck", "spammers"]:
            for job in self.jobs.get(name, []):
                job.cancel()
        # start jobs for checking answers, for checking all known people and for checking all messagelists for new messages.
        for name in ["answers"]:
            self.jobs[name] = [self.get(i) for i in self.lists[name]]
        for name in ["watched"]:
            self.jobs["messages"] = [self.get(get_usk_namespace(i, "messages")) for i in self.lists[name]]
        self.jobs["spammers"] = []
        for name in ["known", "knowntocheck"]:
            # find new nodes
            self.jobs[name] = [self.get(get_usk_namespace(i, "known")) for i in self.lists[name]]
            # register new nodes marked as spammers
            self.jobs["spammers"].extend([self.get(get_usk_namespace(i, "spammers")) for i in self.lists[name]])

    def process_network_results(self):
        """wait for completion of all network checks and process the results."""
        for kind, jobs in self.jobs.items():
            for job in jobs:
                if not kind in ["getriddle", "answerriddle"]:
                    try:
                        res = job.wait(60)[1]
                        self.handle(res, kind, job)
                    except:
                        continue

    def handle(self, result, kind, job):
        """Handle a successful job of type kind."""
        # travel the known nodes to find new ones
        if kind in ["known", "knowntocheck"]:
            for k in deserialize_keylist(result):
                if (not k in self.lists["spammers"] and
                    not k in self.lists["known"] and
                    not k == self.public_key):
                    self.lists["knowntocheck"].add(k)
                    self.lists["watched"].add(k)
                    print self.name, "found and started to watch", k
        # read introductions
        elif kind in ["answers"]:
            self.lists[kind].remove(job.uri) # no longer need to watch this riddle
            k = extract_raw_from_usk(result)
            if not k in self.lists["spammers"]:
                self.lists["watched"].add(k)
                print self.name, "discovered", k, "through a solved riddle"
        # remove found spammers
        elif kind in ["spammers"]:
            for k in deserialize_keylist(result):
                if not k in self.lists["known"]:
                    # stop watching reported spammers (discard: they might not be watched)
                    self.lists["watched"].discard(k)
        # check all messages for spam
        elif kind in ["messages"]:
            k = extract_raw_from_usk(job.uri)
            if not "spam" in result:
                if not k == self.public_key:
                    print self.name, "read a message:", result
                    self.chat(result) # just for fun :)
                    if not k in self.lists["known"]:
                        self.lists["known"].add(k)
                        self.update_info()
                        print self.name, "marked", k, "as known person"
            else:
                self.lists["watched"].remove(k)
                if not k in self.lists["spammers"]:
                    self.lists["spammers"].add(k)
                    self.update_info()
                    print self.name, "marked", k, "as spammer"


    def chat(self, message):
        if self.chatbot and not "spam" in self.message:
            msg = message[message.index(":")+1:-10].strip() # remove name and step
            self.message = self.name + ": " + self.chatbot.get_response(msg)

# some helper functions; the closest equivalent to structure definition
<<helper_functions>>

Note that nothing in here depends on running these from the same program. All communication between persons is done purely over Freenet. The only requirement is that there is a bootstrap key: One person known to all new users. This person could be anonymous, and even with this simple code there could be multiple bootstrap keys. In freenet we call these people “seeds”. They are the seeds from which the community grows. As soon as someone besides the seed adds a person as known, the seed is no longer needed to keep the communication going.

The spam detection implementation is pretty naive: It trusts people to mark others as spammers. In a real system, there will be disputes about what constitutes spam, and the system needs to show who marks whom as spammer, so users can decide to stop trusting the spam notices from someone when they disagree. As an example of a real-life system, the Web of Trust plugin uses trust ratings between -100 and 100 and calculates a score from the ratings of all trusted people to decide how much to trust people who are not rated explicitly by the user.
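
To make the idea concrete, here is a toy sketch of score propagation (my own simplification for illustration, not the actual Web of Trust algorithm): the score of someone I did not rate myself is derived from the ratings given by the people I rated positively, weighted by how much I trust them.

# Toy sketch of score propagation, NOT the actual Web of Trust algorithm.
def propagated_score(my_ratings, ratings_by_others, identity):
    """my_ratings: {person: trust, -100..100}; ratings_by_others: {person: {identity: trust}}"""
    votes = [trust / 100.0 * ratings_by_others[person][identity]
             for person, trust in my_ratings.items()
             if trust > 0 and identity in ratings_by_others.get(person, {})]
    if not votes:
        return None # nobody I trust directly has rated this identity
    return sum(votes) / len(votes)

# example: I trust Bob with 80, Bob rates Chuck with -100 => Chuck scores -80.0
print propagated_score({"bob": 80}, {"bob": {"chuck": -100}}, "chuck")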

With this in place, we need the update system to be able to step through the simulation. We have a list of people who check keys of known other people.

We first start all checks for all people quasi-simultaneously and then check the results in serial to avoid long wait times from high latency. Freenet can check many keys simultaneously, but serial checking is slow.

people = []

def update(step):
    for p in people:
        if not p.introduced:
            p.introduce_start()
    for p in people:
        p.check_network_start()
    for p in people:
        if p.message:
            p.publish(p.name + ": " + p.message + "   (step=%s)" % step)
        p.new_riddle()
    for p in people:
        if not p.introduced:
            p.introduce_process()
    for p in people:
        p.process_network_results()
    for p in people:
        if not p.introduced:
            p.introduce_finalize()

So those are the update tasks - not really rocket science, thanks to the fleshed-out Person class. Only two things remain: setting up the scene and actually running it.

For setup: We have Alice, Bob and Carol. Let’s also add Chuck, who wants to prevent the others from communicating by flooding them with spam.

def gen_person(name):
    try:
        return Person(myname=name, mymessage=random.choice(IRC_LOGLINES))
    except:
        return Person(myname=name, mymessage="Hi, it's me!")

# start with alice
alice = gen_person("Alice")
people.append(alice)

# happy, friendly people
for name in ["Bob", "Carol"]:
    p = gen_person(name)
    people.append(p)

# and Chuck
p = Person(myname="Chuck", mymessage="spam")
people.append(p)

# All people know Alice (except for Alice).
for p in people:
    if p == alice:
        continue
    p.lists["known"].add(alice.public_key)
    p.lists["watched"].add(alice.public_key)

# upload the first version of the spammer and known lists
for p in people:
    p.update_info()

That’s it. The stage is set, let the trouble begin :)

We don’t need a while loop here, since we just want to know whether the system works. So the event loop is pretty simple: Just call the update function a few times.

for i in range(6):
    update(step=i)

That’s it. We have spam-resistant message-channels and community discussions. Now we could go on and implement more algorithms on this scheme, like the turn-based games specification (ever wanted to play against truly anonymous competitors?), Fritter (can you guess from its name what it is? :)), a truly privacy-respecting dropbox or an anonymizing, censorship-resistant, self-hosting backend for a digital market like OpenBazaar (there’s a 4 Bitcoin bounty on that – info, wallet – you might want to give it a shot, if you believe the offer. And if it should turn out to be a lie, you’ll at least have made OpenBazaar much more resilient).

But that would go far beyond the goal of this article – which is to give you, my readers, the tools to create the next big thing by harnessing the capabilities of Freenet.

These capabilities have been there for years, but hidden beneath non-existent and outdated documentation, misleading claims of being in alpha-stage even though Freenet has been used in what amounts to production for over a decade and, not to forget, the ever-recurring, ever-damning suggestion to SGTFS (second-guess the friendly source). As written in Forgotten Cypherpunk Paradise, Freenet already solved many problems which researchers only begin to tackle now, but there are reasons why it was almost forgotten. With this series I intend to fix some of them and start moving Freenet documentation towards the utopian vision laid out in Teach, Don’t Tell. It’s up to you to decide whether I succeeded. If I did, it will show up as a tiny contribution to the utilities and works of art and vision you create.

Note that this is not fast (i.e. enough for blogging but not enough for chat). We could make it faster by going back to SSKs, which can be fetched in O(1) without the USKs’ additional logic for finding the newest version. But for USKs there are very cheap methods to get notified of new versions for large numbers of keys (subscribing), which are used by more advanced tools like the Web of Trust and the Sone plugin, so this would be an optimization we would have to revert later. With these methods, Sone reaches round trip times of 5-15 minutes despite using large uploads.

Also, since this uses Freenet as backend, it scales up: If Alice, Bob, Carol and Chuck used different computers instead of running on my single node, their communication would actually be faster, and if they called in all their alphabet and unicode friends, the system would still run fast. We’re harvesting part of the payoff from using a fully distributed backend :)

And with that, this installment ends. You can now implement really cool stuff using Freenet. In the next article I’ll describe how to avoid doing this stuff myself by interfacing with existing plugins. Naturally I could have done that from the start, but then how could I have explained the Freenet communication primitives these plugins use? :)

If you don’t want to wait, have a look at how Infocalypse uses wot to implement github-like access with user/repo, interfaces with Freemail to realize truly anonymous pull-requests from the command line and builds on FMS to provide automated updates of a DVCS wiki over Freenet.

Happy Hacking!

PS: You might ask “What is missing?”. You might have a nagging feeling that something we do every day isn’t in there. And you’re right. It’s scalable search. Or rather: scalable, spam- and censorship-resistant search. Scalable search would be Gnutella. Spam-resistance would be Credence on the social graph (the people you communicate with). Censorship-resistant is unsolved – even Google fails there. But seeing that Facebook just overtook Google as the main source of traffic, we might not actually need fully global search. Together with the cheap and easy update notifications in Freenet (via USKs), a social recommendation and bookmark-sharing system should make scalable search over Freenet possible. And until then there’s always the decentralized YaCy search engine which has been shown to be capable of crawling Freenet. Also there are the Library and Spider plugins, but they need some love to work well.

PPS: You can download the final example as alice_bob_carol.py

Freenet anonymity: Best case and Worst case

As the i2p people say, anonymity is not a boolean. Freenet allows you to take it a good deal further than i2p or tor, though. If you do it right.

  • Worst case: If all of Apple would want to find you, because you declared that you would post the videos of the new iDing - and already sent them your videos as a teaser before starting to upload them from an Apple computer (and that just after they lost their beloved dictator), you might be in trouble if you use Opennet. You are about as safe as with tor or i2p.

  • Best case: If a local politician wanted to find you, after you uploaded proof that he takes bribes, and you compressed these files along with some garbage data and used Freenet in Darknet-mode with connections only to friends who would rather die than let someone take over their computer, there’s no way in hell you’d get found due to Freenet (the file data could betray you, or they could find you by other means, but Freenet won’t be your weak spot).

Naturally real life is somewhere in-between.

Things which improve anonymity a lot in the best case:

  • Don’t let others know the data you are going to upload before the upload has finished (that would allow some attacks).
  • Use only Darknet with trusted friends (Darknet means that you connect only to people you know personally. For that it is necessary to know other people who use Freenet).
  • Upload small files so the time in which you are actively uploading is short.

Implied are:

  • Use an OS without trojans. So no Windows. (Note: Linux can be hacked, too, but it is far less likely to already have been compromised)
  • Use no Apple devices. You don’t control them yourself and can’t know what they have under the hood. (You are compromised from the time you buy them)
  • If you use Android, flash it yourself to give it an OS you control (Freenet is not yet available for Android. That would be a huge task).
  • Know your friends.

Important questions to ask:

  • Who would want to find you?
  • How much would they invest to find you?
  • Do they already try to monitor Freenet? (in that case uploading files with known content would be dangerous)
  • Do they already know you personally? If yes and if they might have already compromised your computer or internet connection, you can’t upload anything anonymously anywhere. In that case, never let stuff get onto your computer in the first place. Let someone else upload it, who is not monitored (yet).
  • Can they eavesdrop on your internet connection? Then they might guess that you use Freenet from the amount of encrypted communication you do and might want to bug your computer just in case you want to use freenet against them some day.

See the Security Summary (mostly possible attacks) in the freenet wiki for details.

Freenet protects your DickPic!

Afraid that the NSA could steal your DickPic? Freenet to the rescue!

Freenet protects your DickPic!
(mirror via Freenet)

Don’t know what this is about?

Watch Edward Snowden reveal DickPic, the latest, most massive surveillance program from the NSA:

Thanks to John Oliver for one of the most awesome acts of journalism I’ve seen!

FAQ

Anonymous@lFG3mGbGf0b8nE6j8RC0i5ZgWEhsQXDG3ghkYIa-1wQ wrote:
I thought Freenet wasn’t able to protect against the NSA?

The link “Connect to your friends” shows how to connect via darknet and communicate via darknet N2N messages (node-to-node messages). From my understanding, these are currently one of the most secure communication methods we can get, because they hide our personal communication beneath Freenet traffic.

They aren’t suited to communicating anonymously (because we can only talk with our friends), but they are well suited to communicating confidentially.

PS: The image is licensed under GPL, copyright: the freenet team (for the rabbit) and Arne Babenhauserheide. It uses the source images Zuchineee (thanks to Arthurcravan prrrr!) and National Security Agency from the public domain. See the sources below. … but you know what: Just share it any way you like. I’m sure the author of the rabbit agrees, and I for sure do ☺

PPS: Yes, I had lots of fun creating this ;-)

PPPS: For some reason, the image disappeared from my server. I did not take it down. Yes, that worries me. What you see above is served from an in-proxy into Freenet. Should that go down, too, you can still use Freenet to access the image or set up your own in-proxy to allow others to see it.

Attachment Size
freenet-protects-your-dickpic-vs-nsa.gif659.81 KB
freenet-protects-your-dickpic-vs-nsa.png126.31 KB
freenet-protects-your-dickpic-vs-nsa.xcf2.54 MB

Freenet: The forgotten cryptopunk paradise

PDF

PDF (to print)

Org (source)

Text (for email)

I planned to get this into a newspaper, but it was too technical for the Guardian and too non-practical for Linux Voice. Then my free time ran out. Today I saw Barrett Brown comment on his 5-year sentence for quoting a Fox news commentator and sharing a public link. I knew it was time to publish. Welcome to Freenet: The forgotten cryptopunk paradise!

A long time ago in a chatroom far away, select groups of crypto-anarchists gathered to discuss the death of privacy, since the NSA could spy on all communications with ease. Among those who proposed technical solutions was a student going by the name sanity, and he published what is widely regarded as the first paper on Freenet: a decentralized anonymous datastore which was meant to be a cryptopunk paradise: true censorship resistance, no central authority and long lifetime only for information which people were actually interested in.

Many years passed, two towers fell, the empire expanded its hunt for rebels all over the globe, and now, as the empire’s grip has become so horrid that even the most loyal servants of the emperors turn against them and expose their dark secrets to the masses, Freenet is still moving forward. Lost to the eye of the public, it shaped and reshaped itself - all the while maintaining its focus to provide true freedom of the press in the internet.

A new old hope

Once only a way to publish one-shot websites anonymously into Freenet that other members of the group could see, Freenet now provides its users with most services found in the normal internet, yet safe from the prying eyes of the empire. Its users communicate with each other using email which hides metadata, micro-blogging with real anonymity, forums on a wide number of topics - from politics to drug-experiences - and websites with update-notifications (howto) whose topics span from music and anime over religion and programming to life without a state and the deepest pits of depravity.

All these possibilities emerge from its decentralized datastore and the tools built on top of a practically immutable data structure, and all its goals emerge from providing real freedom of the press. Decentralization is required to avoid providing a central place for censorship. Anonymity is needed to protect people against censorship by threat of subsequent punishment, prominently used in China where it is only illegal to write something against the state if too many people should happen to read it. Private communication is needed to allow whistleblowers to contact journalists and also to discuss articles before publication. Invisible access to information makes it hard to censor articles by making everyone a suspect who reads one of those articles, as practiced by the NSA, which puts everyone on the watchlist who accesses freenetproject.org (reported by the German public TV program Panorama). And all this has to be convenient enough that journalists actually use it during their quite stressful daily work. As a side effect it provides true online freedom, because if something is safe enough for a whistleblower, it is likely safe enough for most other communication, too.

These goals pushed Freenet development into areas which other groups only touched much later - or not at all. And except for convenience, which is much harder to get right in a privacy-sensitive context than it seems, Freenet nowadays manages to fulfill these goals very well.

The empire strikes the web

The cloud was “invented” and found to be unsafe, yet Freenet already provided its users with a safe cloud. Email was found to spill all your secrets, while Freenet already provided its users with privacy preserving emails. Disaster control became all the rage after hurricane Katrina and researchers scrambled to find solutions for communicating on restricted routes, and Freenet already provided a globally connectable darknet on friend-to-friend connections. Blogs drowned in spam comments and most caved in and switched to centralized commenting solutions, which made the fabled blogosphere into little more than a PR outlet for Facebook, but Freenet already provided spam resistance via an actually working web of trust - after seeing the non-spam-resistant forum system Frost burn when some trolls realized that true anonymity also means complete freedom to use spam-bots. Censorship and total surveillance of user behavior on Facebook were exposed, G+ required users to use their real names and Twitter got blocked in many repressive regimes, whereas Freenet already provided hackers with convenient, decentralized, anonymous micro-blogging. Now websites are cracked by the minute and constant attacks made it a chore for private webmasters simply to stay available, though Freenet already offers attack-resistant hosting which stays online as long as people are interested in the content.

All these developments happened in a private microcosmos, where new and strange ideas could form and hatch, an incubator where reality could be rethought and rewritten to reestablish privacy in the internet. The internet was hit hard, and Freenet evolved to provide a refuge for those who could use it.

The return of privacy

What started as the idea of a student was driven forward by about a dozen free-time coders and one paid developer for more than a decade - funded by donations from countless individuals - and turned into a true forgotten cryptopunk paradise: actual working solutions to seemingly impossible problems, highly detailed documentation streams in a vast nothingness to be explored only by the initiated (where RTFS is a common answer: Read The Friendly Source), all this with plans and discussions about saving the world mixed in.

The practical capabilities of Freenet should be known to every cryptopunk - but a combination of mediocre user experience, bad communication and worse PR (and maybe something more sinister, if Poul-Henning Kamp should prove to be farsighted about project Orchestra) brought us to a world where a new, fancy, half finished, partially thought through, cash-cow searching project comes around and instead of being asked “how’s that different from Freenet?”, the next time I talk to a random crypto-loving stranger about Freenet I am asked “how is Freenet different from X which just made the news?” (the answer which fits every single time is: “Even if X should work, it would provide only half of Freenet, and none of the really important features - friend-to-friend darknet, access dependent content lifetime, decentralized spam resistance, stable pseudonyms, hosting without a server”).

Now, after many years of work have culminated in a big step forward, it is time for Freenet to re-emerge from hiding and take its place as one of the few privacy tools actually proven to work - and as the single tool with the most ambitious goal: Reestablishing freedom of the press and freedom of speech in the internet.

Join in

If you do not have the time for large scale contribution, a good way to support freenet is to run and use it - and ask your friends to join in, ideally over darknet.

Freenet Logo: Follow the Rabbit - freenetproject.org

More information about the movement which spawned Freenet can be found in Wikipedia under Cypherpunk, which would have made a more correct title for this text, but did not rhyme with "forgotten paradise".

If you can program, there are lots of low hanging fruit: small tasks which allow reaping the fruits of existing solutions to hard problems. For example my recent work on freenet includes 4 hours of hacking the Python-based site uploader in pyFreenet which sped up the load time of its sites by up to a factor of 4. If you want to join, come to #freenet @ freenode to chat, discuss with us in the freenet devl mailing list and check the github-project.

Freenet Logo: Follow the Rabbit. Welcome to Freenet, where no one can watch you read.

Creative Commons License

I hereby release this article under the CC attribution License: You can use the text however you like as long as you name me (Arne Babenhauserheide) and link here ( draketo.de/english/freenet/forgotten-cryptopunk-paradise or draketo.de/node/656 ).

A huge thank you goes to Lacrocivious who helped me improve this text a lot! A second thank you goes to the other Freenet users with whom I discussed the article via Darknet-messages, when we were still thinking about submitting it to Wired and therefore needed to keep it confidential.

Attachment Size
2014-08-24-So-freenet-forgotten-cryptopunk-paradise.pdf85.01 KB
freenet-forgotten-cryptopunk-paradise-mail.txt8.4 KB
freenet-forgotten-cryptopunk-paradise-pdf-thumb.png8.51 KB
2014-08-24-So-freenet-forgotten-cryptopunk-paradise.org7.93 KB
freenet_logo.png2.26 KB

I now have a spam-resistant, decentralized comment system via Freenet

Over the last years, spam became worse and worse. The more my site grew, the more time I had to spend deleting blatant advertisements. Even captchas did not help anymore: Either they were so hard that I myself needed 3 tries on average to get through, or I got hundreds of spam messages per day. A few years ago, I caved in and disabled comments. The alternative would have been to turn my website into a mere PR-outlet of Facebook, Twitter or one of the commenting platforms out there.

But this all changed now. I finally have decentralized, spam-resistant comments using babcom with Freenet as backend!

» babcom: decentralized, spam-resistant comments! «

The comment-system builds on the decentralized, spam-resistant social features of the Freenet Project, one of the old cypherpunk creations which started in 2000 with the goal to provide true Freedom of the Press in the Internet and has been evolving ever since. It’s an irony that nowadays spam has become a vehicle to push people into censorship-enabling platforms, to use up their limited free time or to drown their words in a pile of dung, so other people cannot find them.

If you do not run Freenet now, this screenshot shows how the comment-system looks for me:

Babcom Screenshot

And for me this is a huge relief: I can finally get comments to my articles again without having to sell my conscience or waste most of my time deleting advertisements.

If that sounds interesting, head over to babcom and check if it suits you!

And if you like it, please Flattr babcom and Flattr Sone!

Infocalypse - Make your code survive the information apocalypse

Anonymous DVCS in the Darknet.

easy setup of infocalypse (script)
Freenet Development over Freenet

This is a mirror of the documentation of the infocalypse extension for Mercurial written by djk - published here with his permission. It is licensed solely under the GPLv2 or later. The text is long. For concise information, use the second Link above (Freenet Development over Freenet).

Introduction

The Infocalypse 2.0 hg extension is an extension for Mercurial that allows you to create, publish and maintain incrementally updateable repositories in Freenet.

Your code is then hosted decentrally and anonymously, making it just as censorship-resistant as all other content in Freenet.

It works better than the other DVCS currently available for Freenet.

Most of the information you will find in this document can also be found in the extension's online help. i.e.:

hg help infocalypse

HOWTO: Infocalypse 2.0 hg extension


updated: 20090927

Note: Contains Freenet only links

Table of Contents


Requirements

The extension has the following dependencies:

  • Freenet
    You can find more information on Freenet here:

    http://freenetproject.org/ [HTTP Link!]

  • Python
    I test on Python 2.5.4 and 2.6.1. Any 2.5.x or later version should work. Earlier versions may work.

    You probably won't have to worry about installing Python. It's included in the Windows binary Mercurial distributions and most *nix flavor OS's should have a reasonably up to date version of Python installed.

  • Mercurial
    You can find more information on Mercurial here:

    http://mercurial.selenic.com/ [HTTP Link!]

    Version 1.0.2 won't work.

    I use version 1.2.1 (x86 Gentoo) on a daily basis. Later versions should work.

    I've smoke tested 1.1.2 (on Ubuntu Jaunty Jackalope) and 1.3 (on Windows XP) without finding any problems.

  • FMS
    Installation of the Freenet Messaging System (FMS) is optional but
    highly recommended. The hg fn-fmsread and hg fn-fmsnotify commands won't work without FMS. Without fn-fmsread it is extremely difficult to reliably detect repository updates.

    The official FMS freesite is here:

    USK@0npnMrqZNKRCRoGojZV93UNHCMN-6UU3rRSAmP6jNLE,~BG-edFtdCC1cSH4O3BWdeIYa8Sw5DfyrSV-TKdO5ec,AQACAAE/fms/106/
    
    

[TOC]


Installation

You checked the requirements and understand the risks, right?

Here are step-by-step instructions on how to install the extension.

  • Download the bootstrap hg bundle:
    CHK@S~kAIr~UlpPu7mHNTQV0VlpZk-f~z0a71f7DlyPS0Do,IB-B5Hd7WePtvQuzaUGrVrozN8ibCaZBw3bQr2FvP5Y,AAIC--8/infocalypse2_1723a8de6e7c.hg
        

    You'll get a Potentially Dangerous Content warning from fproxy because the mime type isn't set. Choose 'Click here to force your browser to download the file to disk.'.

    I'll refer to the directory that you saved the bundle file to as DOWNLOAD_DIR.

  • Create an empty directory where you want to install the extension.
    I'll refer to that directory as INSTALL_DIR in the
    rest of these instructions.

  • Create an empty hg repository there. i.e.:
    cd INSTALL_DIR
    hg init
    
  • Unbundle the bootstrap bundle into the new repository. i.e:
    hg pull DOWNLOAD_DIR/infocalypse2_1723a8de6e7c.hg
    hg update
    
  • Edit the '[extensions]' section of your .hgrc/mercurial.ini
    file to point to the infocalypse directory in the unbundled source.

    # .hgrc/mercurial.ini snippet
    [extensions]
    infocalypse = INSTALL_DIR/infocalypse
    

    where INSTALL_DIR is the directory you unbundled into.

    If you don't know where to find/create your .hgrc/mercurial.ini file this link may be useful:

    http://www.selenic.com/mercurial/hgrc.5.html [HTTP Link!]

  • Run fn-setup to create the config file and temp dir. i.e.
    hg fn-setup
       

    If you run your Freenet node on another machine or on a non-standard port you'll need to use the --fcphost and/or --fcpport parameters to set the FCP host and port respectively.

    By default fn-setup will write the configuration file for the extension (.infocalypse on *nix, infocalypse.ini on Windows) into your home directory and also create a temp directory called infocalypse_tmp there.

    You can change the location of the temp directory by using the --tmpdir argument.

    If you want to put the config file in a different location set the cfg_file option in the [infocalypse] section of your .hgrc/mercurial.ini file before running fn-setup.

    Example .hgrc entry:
    # Snip, from .hgrc
    [infocalypse]
    cfg_file = /mnt/usbkey/s3kr1t/infocalypse.cfg
  • Edit the fms_id and possibly fms_host/fms_port information in the .infocalypse/infocalypse.ini file. i.e.:

    # Example .infocalypse snippet
    fms_id = YOUR_FMS_ID
    
    fms_host = 127.0.0.1
    fms_port = 1119
    

    where YOUR_FMS_ID is the part of your fms id before the '@' sign.

    If you run FMS with the default settings on the same machine you are running
    Mercurial on, you probably won't need to adjust the fms_host or fms_port.

    You can skip this step if you're not running fms.

  • Read the latest known version of the extension's repository USK index from FMS.
    hg fn-fmsread -v
    

    You can skip this step if you're not running fms.

  • Pull the latest changes to the extension from Freenet for the first time. Don't skip this step! i.e.:
    hg fn-pull --aggressive --debug --uri USK@kRM~jJVREwnN2qnA8R0Vt8HmpfRzBZ0j4rHC2cQ-0hw,2xcoQVdQLyqfTpF2DpkdUIbHFCeL4W~2X1phUYymnhM,AQACAAE/infocalypse.hgext.R1/41
    hg update
    

    You may have trouble finding the top key if you're not using fn-fmsread. Just keep retrying. If you know the index has increased, use the new index in the URI.

    After the first pull, you can update without the URI.

[TOC]


Updating

This extension is under active development. You should periodically update to get the latest bug fixes and new features.

Once you've installed the extension and pulled it for the first time, you can get updates by cd'ing into the initial INSTALL_DIR and typing:

hg fn-fmsread -v
hg fn-pull --aggressive
hg update

If you're not running FMS you can skip the fn-fmsread step. You may have trouble getting the top key. Just keep retrying.

If you're having trouble updating and you know the index has increased, use the full URI with the new index as above.

[TOC]


Background

Here's background information that's useful when using the extension. See the Infocalypse 2.0 hg extension page on my freesite for a more detailed description of how the extension works.

Repositories are collections of hg bundle files

An Infocalypse repository is just a collection of hg bundle files which have been inserted into Freenet as CHKs and some metadata describing how to pull the bundles to reconstruct the repository that they represent. When you 'push' to an infocalypse repository a new bundle CHK is inserted with the changes since the last update. When you 'pull', only the CHKs for bundles for changesets not already in the local repository need to be fetched.
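
As a rough illustration of that idea (not the extension's actual code; the base revision and bundle file name are placeholders), an incremental bundle can be created with plain Mercurial and inserted into Freenet as a CHK via pyFreenet:

# Illustration only, not the extension's code: create an incremental hg bundle
# and insert it into Freenet as a CHK.
import subprocess
import fcp

# bundle only the changesets the other side does not have yet (placeholder revision)
subprocess.check_call(["hg", "bundle", "--base", "LAST_SHARED_REV", "incremental.hg"])

n = fcp.node.FCPNode()
chk_uri = n.put(uri="CHK@", data=open("incremental.hg", "rb").read())
print "new bundle CHK:", chk_uri
n.shutdown()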

Repository USKs

The latest version of the repository's metadata is stored on a Freenet Updateable Subspace Key (USK) as a small binary file.

You'll notice that repository USKs end with a number without a trailing '/'. This is an important distinction. A repository USK is not a freesite. If you try to view one with fproxy you'll just get a 'Potentially Dangerous Content' warning. This is harmless. It's ugly, but unavoidable at the current time because of limitations in fproxy/FCP.

Repository top key redundancy

Repository USKs that end in *.R1/<number> are inserted redundantly, with a second USK insert done on *.R0/<number>. Top key redundancy makes it easier for other people to fetch your repository.

Inserting to a redundant repository USK makes the inserter more vulnerable to
correlation attacks. Don't use '.R1' USKs if you're worried about this.

Repository Hashes

Repository USKs can be long and cumbersome. A repository hash is the first 12 hex digits of the SHA1 hash of the zero index version of a repository USK. e.g.:

SHA1( USK@kRM~jJVREwnN2qnA8R0Vt8HmpfRzBZ0j4rHC2cQ-0hw,2xcoQVdQLyqfTpF2DpkdUIbHFCeL4W~2X1phUYymnhM,AQACAAE/infocalypse.hgext.R1/0 )
  == 'be68e8feccdd'
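
A minimal sketch of that calculation (illustration only; it assumes the hash is taken directly over the USK string with the index set to 0, matching the example above):

# Illustration of the repository hash described above.
import hashlib

def repo_hash(zero_index_usk):
    """First 12 hex digits of the SHA1 of the /0 version of a repository USK."""
    return hashlib.sha1(zero_index_usk).hexdigest()[:12]

# the example above gives 'be68e8feccdd' for the infocalypse repository USK
print repo_hash("USK@kRM~jJVREwnN2qnA8R0Vt8HmpfRzBZ0j4rHC2cQ-0hw,2xcoQVdQLyqfTpF2DpkdUIbHFCeL4W~2X1phUYymnhM,AQACAAE/infocalypse.hgext.R1/0")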

You can get the repository hash for a repository USK using:

hg fn-info

from a directory the repository USK has been fn-pull'd into.

You can get the hashes of repositories that other people have announced via fms with:

hg fn-fmsread --listall

Repository hashes are used in the fms update trust map.

The default private key

When you run fn-setup, it creates a default SSK private key, which it stores in the default_private_key parameter in your .infocalypse/infocalypse.ini file.

You can edit the config file to substitute any valid SSK private key you want.

If you specify an Insert URI without the key part for an infocalypse command the default private key is filled in for you. i.e

hg fn-create --uri USK@/test.R1/0

Inserts the local hg repository into a new USK in Freenet, using the private key in your config file.

USK <--> Directory mappings

The extension's commands 'remember' the insert and request repository USKs they were last run with when run again from the same directory.

This makes it unnecessary to retype cumbersome repository USK values once a repository has been successfully pulled or pushed from a directory.

Aggressive top key searching

fn-pull and fn-push have an --aggressive command line argument which causes them to search harder for the latest request URI.

This can be slow, especially if the USK index is much lower than the latest index in Freenet.

You will need to use it if you're not using FMS update notifications.

[TOC]


Basic Usage

Here are examples of basic commands.

Generating a new private key

You can generate an new private key with:

hg fn-genkey

This has no effect on the stored default private key.

Make sure to change the 'SSK' in the InsertURI to 'USK' when supplying the insert URI on the command line.

Creating a new repository

hg fn-create --uri USK@/test.R1/0

Inserts the local hg repository into a new USK in Freenet, using the private key in your config file. You can use a full insert URI value if you want.

If you see an "update -- Bundle too big to salt!" warning message when you run this command you should consider running
fn-reinsert --level 4.

Pushing to a repository

hg fn-push --uri USK@/test.R1/0

Pushes incremental changes from the local directory into an existing Infocalypse repository.

The <keypart>/test.R1/0 repository must already exist in Freenet. In the example above the default private key is used. You could have specified a full Insert URI. The URI must end in a number but the value doesn't matter because fn-push searches for the latest unused index.

You can omit the --uri argument when you run from the same directory the fn-create (or a previous fn-push) was run from.

Pulling from a repository

hg fn-pull --uri <request uri>

pulls from an Infocalypse repository in Freenet into the local repository.
Here's an example with a fully specified uri.

You can omit the --uri argument when you run from the same directory a previous fn-pull was successfully run from.

For maximum reliability use the --aggressive argument.

[TOC]


Using FMS to send and receive update notifications

The extension can send and receive repository update notifications via FMS. It is highly recommended that you set up this feature.

The update trust map

There's a trust map in the .infocalypse/infocalypse.ini config file which determines which fms ids can update the index values for which repositories. It is purely local and completely separate from the trust values which appear in the FMS web of trust.

The format is:
<number> = <fms_id>|<usk_hash0>|<usk_hash1>| ... |<usk_hashn>

The number value must be unique, but is ignored.

The fms_id values are the full FMS ids that you are trusting to update the repositories with the listed hashes.

The usk_hash* values are repository hashes.

Here's an example trust map config entry:

# Example .infocalypse snippet
[fmsread_trust_map]
1 = test0@adnT6a9yUSEWe5p8J-O1i8rJCDPqccY~dVvAmtMuC9Q|55833b3e6419
0 = djk@isFiaD04zgAgnrEC5XJt1i4IE7AkNPqhBG5bONi6Yks|be68e8feccdd|5582404a9124
2 = test1@SH1BCHw-47oD9~B56SkijxfE35M9XUvqXLX1aYyZNyA|fab7c8bd2fc3

You must update the trust map to enable index updating for repos other than the one this code lives in (be68e8feccdd). You can edit the config file directly if you want.

However, the easiest way to update the trust map is by using the --trust and --untrust options on fn-fmsread.

For example to trust falafel@IxVqeqM0LyYdTmYAf5z49SJZUxr7NtQkOqVYG0hvITw to notify you about changes to the repository with repo hash 2220b02cf7ee, type:

hg fn-fmsread --trust --hash 2220b02cf7ee --fmsid falafel@IxVqeqM0LyYdTmYAf5z49SJZUxr7NtQkOqVYG0hvITw

And to stop trusting that FMS id for updates to 2220b02cf7ee, you would type:

hg fn-fmsread --untrust --hash 2220b02cf7ee --fmsid falafel@IxVqeqM0LyYdTmYAf5z49SJZUxr7NtQkOqVYG0hvITw

To show the trust map type:

hg fn-fmsread --showtrust

Reading other people's notifications

hg fn-fmsread -v

Will read update notifications for all the repos in the trust map and locally cache the new latest index values. If you run with -v it prints a message when updates are available which weren't used because the sender(s) weren't in the trust map.

hg fn-fmsread --list

Displays announced repositories from fms ids that appear in the trust map.

hg fn-fmsread --listall

Displays all announced repositories including ones from unknown fms ids.

Pulling an announced repository

You can use the --hash option with fn-pull to pull any repository you see in the fn-fmsread --list or fn-fmsread --listall lists.

For example to pull the latest version of the infocalypse extension code, cd to an empty directory and type:

hg init
hg fn-pull --hash be68e8feccdd --aggressive

Posting your own notifications

hg fn-fmsnotify -v

Posts an update notification for the current repository to fms.

You MUST set the fms_id value in the config file to your fms id for this to work.

Use --dryrun to double check before sending the actual fms message.

Use --announce at least once if you want your USK to show up in the fmsread --listall list.

By default notifications are written to and read from the infocalypse.notify fms group.

The read and write groups can be changed by editing the following variables in the config file:

fmsnotify_group = <group>
fmsread_groups = <group0>[|<group1>|...]

fms can have pretty high latency. Be patient. It may take hours (sometimes a day!) for your notification to appear. Don't send lots of redundant notifications.

[TOC]


Reinserting and 'sponsoring' repositories

hg fn-reinsert

will re-insert the bundles for the repository that was last pulled into the directory.

The exact behavior is determined by the level argument.

level:

  • 1 - re-inserts the top key(s)
  • 2 - re-inserts the top keys(s), graphs(s) and the most recent update.
  • 3 - re-inserts the top keys(s), graphs(s) and all keys required to bootstrap the repo.

    This is the default level.

  • 4 - adds redundancy for big (>7Mb) updates.
  • 5 - re-inserts existing redundant big updates.

Levels 1 and 4 require that you have the private key for the repository. For other levels, the top key insert is skipped if you don't have the private key.

DO NOT use fn-reinsert if you're concerned about
correlation attacks. The risk is on the order of re-inserting a freesite, but may be worse if you use redundant (i.e. USK@<line noise>/name.R1/0) top keys.

[TOC]


Forking a repository onto a new USK

hg fn-copy --inserturi USK@/name_for_my_copy.R1/0

copies the Infocalypse repository which was fn-pull'd into the local directory onto a new repository USK under your default private key. You can use a full insert URI if you want.

This only requires copying the top key data (a maximum of 2 SSK inserts).

[TOC]


Sharing private keys

It is possible for multiple people to collaborate anonymously over Freenet by sharing the private key to a single Infocalypse repository.

The FreeFAQ is an example of this technique.

Here are some things to keep in mind when sharing private keys.

  • There is no (explict) key revocation in Freenet

    If you decide to share keys, you should generate a special key on a per repo basis with fn-genkey. There is no way to revoke a private key once it has been shared. This could be mitigated with an ad-hoc convention. e.g. if I find any file named USK@<public_key>/revoked.txt, I stop using the key.
  • Non-atomic top key inserts

    Occasionally, you might end up overwriting someone elses commits because the FCP insert of the repo top key isn't atomic. I think you should be able to merge and re fn-push to resolve this. You can fn-pull a specific version of the repo by specify the full URI including the version number with --uri and including the --nosearch option.
  • All contributors should be in the fn-fmsread trust map

[TOC]


Inserting a freesite

hg fn-putsite --index <n>

inserts a freesite based on the configuration in the freesite.cfg file in the root of the repository.

Use:

hg fn-putsite --createconfig

to create a basic freesite.cfg file that you can modify. Look at the comments in it for an explanation of the supported parameters.

The default freesite.cfg file inserts using the same private key as the repo and a site name of 'default'. Editing the name is highly recommended.

You can use --key CHK@ to insert a test version of the site to a CHK key before writing to the USK.

Limitations:

  • You MUST have fn-pushed the repo at least once in order to insert using the repo's private key. If you haven't fn-push'd you'll see this error: "You don't have the insert URI for this repo. Supply a private key with --key or fn-push the repo."
  • Inserts all files in the site_dir directory in the freesite.cfg file. Run with --dryrun to make
    sure that you aren't going to insert stuff you don't want to.
  • You must manually specify the USK edition you want to insert on. You will get a collision error
    if you specify an index that was already inserted.
  • Don't use this for big sites. It should be fine for notes on your project. If you have lots of images
    or big binary files use a tool like jSite instead.
  • Don't modify site files while the fn-putsite is running.

[TOC]


Risks

I don't believe that using this extension is significantly more dangerous than using any other piece of Freenet client code, but here is a list of the risks which come to mind:

  • Freenet is beta software
    The authors of Freenet don't pretend to guarantee that it is free of bugs that could compromise your anonymity or worse.

    While written in Java, Freenet loads native code via JNI (FEC codecs, bigint stuff, wrapper, etc.) that makes it vulnerable to the same kinds of attacks as any other C/C++ code.

  • FMS == anonymous software
    FMS is published anonymously on Freenet and it is written in C++ with dependencies on large libraries which could contain security defects.

    I personally build FMS from source and run it in a chroot jail.

    Somedude, the author of FMS, seems like a reputable guy and has conducted himself as such for more than a year.

  • correlation attacks
    There is a concern that any system which inserts keys that can be predicted ahead of time could allow an attacker with control over many nodes in the network to eventually find the IP of your node.

    Any system which has this property is vulnerable, e.g. fproxy Freesite insertion, Freetalk, FMS, FLIP. This extension's optional use of
    redundant top keys may make it particularly vulnerable. If you are concerned, don't use '.R1' keys.

    Running your node in pure darknet mode with trusted peers may somewhat reduce the risk of correlation attacks.

  • Bugs in my code, Mercurial or Python
    I do my best but no one's perfect.

    There are lots of eyes over the Mercurial and Python source.

[TOC]


Advocacy

Here are some reasons why I think the Infocalypse 2.0 hg extension is better than
pyFreenetHg and
egit-freenet:

  • Incremental

    You only need to insert/retrieve what has actually changed. Changes of up to 32k of compressed deltas can be fetched in as little as one SSK fetch and one CHK fetch.

  • Redundant

    The top level SSK and the CHK with the representation of the repository state are inserted redundantly so there are no 'critical path' keys. Updates of up to ~7MB are inserted redundantly by cloning the splitfile metadata at the cost of a single 32k CHK insert.

  • Re-insertable

    Anyone can re-insert all repository data except for the top level SSKs with a simple command (hg fn-reinsert). The repository owner can re-insert the top level SSKs as well.

  • Automatic rollups

    Older changes are automatically 'rolled up' into large splitfiles, such that the entire repository can almost always be fetched in 4 CHK fetches or less.

  • Fails explicitly

    REDFLAG DCI

[TOC]


Source Code

The authoritative repository for the extension's code is hosted in Freenet:

hg init
hg fn-fmsread -v
hg fn-pull --aggressive --debug --uri USK@kRM~jJVREwnN2qnA8R0Vt8HmpfRzBZ0j4rHC2cQ-0hw,2xcoQVdQLyqfTpF2DpkdUIbHFCeL4W~2X1phUYymnhM,AQACAAE/infocalypse.hgext.R1/41
hg update

It is also mirrored on bitbucket.org:

hg clone http://bitbucket.org/dkarbott/infocalypse_hgext/

[TOC]


Fixes and version information

  • hg version: c51dc4b0d282

    Fixed abort: <bundle_file> not found! problem on fn-pull when hg-git plugin was loaded.
  • hg version: 0c5ce9e6b3b4

    Fixed intermittent stall when bootstrapping from an empty repo.
  • hg version: 7f39b20500f0

    Fixed bug that kept fn-pull --hash from updating the initial USK index.
  • hg version: 7b10fa400be1

    Added fn-fmsread --trust and --untrust and fn-pull --hash support.


    fn-pull --hash isn't really usable until 7f39b20500f0
  • hg version: ea6efac8e3f6

    Fixed a bug that was causing the berkwood binary 1.3 Mercurial distribution
    (http://mercurial.berkwood.com/binaries/Mercurial-1.3.exe [HTTP Link!]) not to work.

[TOC]


Freenet-only links

This document is meant to be inserted into Freenet.

It contains links (starting with 'CHK@' and 'USK@') to Freenet keys that will only work from within fproxy [HTTP link!].

You can find a reasonably up-to-date version of this document on my freesite:

USK@-bk9znYylSCOEDuSWAvo5m72nUeMxKkDmH3nIqAeI-0,qfu5H3FZsZ-5rfNBY-jQHS5Ke7AT2PtJWd13IrPZjcg,AQACAAE/feral_codewright/15/infocalypse_howto.html

[TOC]


Contact

FMS:
djk@isFiaD04zgAgnrEC5XJt1i4IE7AkNPqhBG5bONi6Yks

I lurk on the freenet and fms boards.

If you really need to you can email me at d kar bott at com cast dot net but I prefer FMS.

freesite:
USK@-bk9znYylSCOEDuSWAvo5m72nUeMxKkDmH3nIqAeI-0,qfu5H3FZsZ-5rfNBY-jQHS5Ke7AT2PtJWd13IrPZjcg,AQACAAE/feral_codewright/15/

[TOC]


Install and setup infocalypse on GNU/Linux (script)

Install and setup infocalypse on GNU/Linux:

setup_infocalypse_on_linux.sh

Just download and run1 it via

wget http://draketo.de/files/setup_infocalypse_on_linux.sh_1_0.txt
bash setup_infocalypse*

This script needs a running freenet node to work!

In-Freenet-link: CHK@RZjy7Whe3vT3aEdox3pEG4fRbmRGsyuybPPhdvr7MoQ,g8YZO1~FAJM5suS7Uch06ugblVPE4YJd1rl15DxAwkY,AAMC--8/setup_infocalypse_on_linux.sh

The script allows you to get and set up the infocalypse extension with a few keystrokes, so you can instantly use the Mercurial DVCS for decentralized, anonymous code-sharing over freenet.

« Real Life Infocalypse »
DVCS in the Darknet. The decentralized p2p code repository (using Infocalypse)

This gives you code hosting like a minimal version of BitBucket, Gitorious or GitHub, but without the central control. Additionally, the Sone plugin for freenet supplies anonymous communication, and the site extension allows creating static sites with information about the repo, recent commits and such, without the need for a dedicated hoster.

Basic Usage

Clone a repo into freenet with a new key:

hg clone localrepo USK@/repo

(Write down the insert key and request key after the upload! Localrepo is an existing Mercurial repository)

Clone a repo into or from freenet (respective key known):

hg clone localrepo freenet://USK@<insert key>/repo.R1/0
hg clone freenet://USK@<request key>/repo.R1/0 [localpath]

Push or pull new changes:

hg push freenet://USK@<insert key>/repo.R1/0
hg pull freenet://USK@<request key>/repo.R1/0

For convenient copy-pasting of freenet keys, you can omit the “freenet://” here, or use freenet:USK@… instead.

Also, as shown in the first example, you can let infocalypse generate a new key for your repo:

hg clone localrepo USK@/repo

Mind the “USK@/” (the slash directly after the @ means the key is missing, so infocalypse generates a new one). Also note the missing .R1/0 after the repo name and the missing freenet://. Being able to omit those on repository creation is just a convenience feature - but one which helps me a lot.

You can also add the keys to the <repo>/.hg/hgrc:

[paths]
example = freenet://USK@<request key>/repo.R1/0
example-push = freenet://USK@<insert key>/repo.R1/0
# here you need the freenet:// !

then you can simply use

hg push example-push

and

hg pull example

Contribute

This script is just a quick sketch; feel free to improve it and upload improved versions (for example with support for more GNU/Linux distros). If you experience any problems, please contact me! (i.e. write a comment)

If you want to contribute more efficiently to this script, get the repo via

hg clone freenet://USK@73my4fc2CLU3cSfntCYDFYt65R4RDmow3IT5~gTAWFk,Fg9EAv-Hut~9NCJKtGaGAGpsn1PjA0oQWTpWf7b1ZK4,AQACAAE/setup_infocalypse/1 

Then hack on it, commit and upload it again via

hg clone setup_infocalypse freenet://USK@/setup_infocalypse

Finally share the request URI you got.

Alternate repo: http://draketo.de/proj/setup_infocalypse


  1. On systems based on Debian or Gentoo - including Ubuntu and many others - this script will install all needed software except for freenet itself. You will have to give your sudo password in the process. Since the script is just a text file with a set of commands, you can simply read it to make sure that it won’t do anything evil with those sudo rights.

Attachments:
setup_infocalypse_on_linux.sh.txt (2.39 KB)
setup_infocalypse_on_linux.sh_1.txt (2.49 KB)

Let us talk over Freenet, so I can speak freely again

I sent this email to many of my friends to regain confidential private communication. If you want to do the same, feel free to reuse the text-version (be sure to replace the noderef textblock with your own noderef from http://127.0.0.1:8888/friends/myref.txt).

About 10% of my friends joined - which is enough to build the darknet and makes it possible for me to speak freely again.

First: The Essence of this text:

I’ve been censoring my emails for years. Not just what I write, but also whom and when.

Freenet allows me to write invisible messages to my friends. Those are messages I do not need to censor. They give me freedom. Surveillance can show that we could write, but not whether, when or what we actually write. If Freenet is used for that, it needs very little resources.

This is how to connect:

  1. Download and install Freenet from https://freenetproject.org,
  2. in the automatically opened setup wizard select “only friends”
  3. Copy the textblock1 you got with my email and paste it into the textfield on http://127.0.0.1:8888/addfriend/
  4. Then just send me what Freenet shows on the page http://127.0.0.1:8888/friends/myref.txt (attach to an email or just copy it into the email)

As soon as I add you, too, we are connected. We can then write messages via the friends page (click my name):

  • Write messages: http://127.0.0.1:8888/friends/
  • Read messages: http://127.0.0.1:8888/alerts/

Hi,

I’ve been self-censoring what I write by email for years. But over the past year, with ever more details of surveillance being proven as fact and not just conspiracy theory, that became more serious: I no longer see email as safe, and with that, email is lost for me as a medium for personal communication. If I want to talk privately, I don’t use email.

You might have noticed that since then I’ve been writing fewer and fewer non-public emails.

This started impeding my life when the critical law reporting at groklaw stopped publishing, because the owner did not consider sending information via email safe anymore. Now I self-censor what I write, to whom I write, and when I write.

There is now no shield from forced exposure.2

But I have one haven left: Instead of writing private stuff by email, I’m communicating more and more via Freenet, especially with darknet contacts: People I know personally. And I’d like to do that with you, too. The reason is that Freenet Darknet messages hide even the information that we have a conversation at all:

I can finally send completely invisible messages.

This gives me the confidentiality back which allows talking freely. Talking without self-censoring every word I write.

And I would like to have that freedom when talking to you online. So I would be very happy if you’d install Freenet and connect to me over Darknet.

Install Freenet

To install Freenet, just go to https://freenetproject.org and click the green install-button

Then click through the installer as usual. After that your browser should open and show the Freenet Setup Wizard.

The Wizard

In the wizard, choose "Connect only to friends: (high security)".

For the following questions, just use the default or the option called "normal".

You can always revisit the wizard at http://127.0.0.1:8888/wizard/

Connect with me

Now go to the page “connect to a friend”: http://127.0.0.1:8888/addfriend/

There simply paste the following into the empty text field below the blurb of explanation (note: for this article I replaced the identifying info with X-es. Use your own from http://127.0.0.1:8888/friends/myref.txt):

identity=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
lastGoodVersion==XXXXXXXXXXXXXXXXXXXXXXX
location==XXXXXXXXXXXXXXXXXXXXXXXX
myName=XXXXXXX
opennet=XXXXX
sig=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
sigP256=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
version==XXXXXXXXXXXXXXXXXXXXXXX
ark.number=XXXX
ark.pubURI=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
auth.negTypes==XXXXXX
dsaGroup.g=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
dsaGroup.p=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
dsaGroup.q=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
dsaPubKey.y=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
ecdsa.P256.pub=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
physical.udp==XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
End

Just put my name in the description above the “Add” button and leave everything else at default.

Then send me an email3 with the text you find at the URL http://127.0.0.1:8888/friends/myref.txt

Once I copy that text into my own addfriends page, our computers will connect over Freenet.

(no need to babysit Freenet for this: simply let it run when you’re online, and as soon as I add you, our computers will connect over Freenet. Please give me a few days: with the PhD and the two little ones I’m often no longer able to answer email daily, but I do see them)

And that’s it. We’re connected. In the rest of this mail, I’ll describe what you can do with Freenet.

Welcome to Freenet, where no one can watch you read!

I hope we will connect soon!

Best wishes, Arne

Using Freenet

Talk with me over Freenet

Once we are connected, you can send me confidential messages by going on the Friends page and clicking my name.

Friends-page: http://127.0.0.1:8888/friends/

That page lists all the people you are connected to. You can also tick the checkbox for multiple people and then use the drop down list “– Select Action –” and select “Send N2NTM to selected peers”. An N2NTM is a “node to node text message”.

You can see all messages you received on the messages page:

Messages-page: http://127.0.0.1:8888/alerts/

These messages are invisible to the outside.

Send me files over Freenet

If you want to send me bigger files, you can upload them from the upload page:

Upload-page: http://127.0.0.1:8888/insertfile/

When they finish uploading, just go to the list of Uploads, select the files you want to share with me and click the button “Recommend files to friends”. Then select my name and click the “Recommend” button at the bottom.

List of Uploads: http://127.0.0.1:8888/uploads/

You can also do the same for downloads, so it’s easy to pass on files.

The files you upload are stored encrypted in Freenet and can only be found by people who have the link to the file. Like a filehoster, but encrypted and completely decentralized.

Advanced Freenet Usage

What I show here aren’t all the features of Freenet. Not by a long shot. But it’s enough to provide confidential communication between friends:

I can talk to you without self-censoring every single thought.

If you want to explore further features of Freenet, there are three central features:

  • Bookmarks to have hidden websites which inform you when they are updated.
  • Your own website in Freenet.
  • Anonymous Discussions with a Web of Trust to prevent spam.

Bookmarks

Bookmarks are easy. Just go to the main freenet page and click the [Edit] link above the bookmarks. It gets you to the bookmarks editor for changing and sharing bookmarks.

Bookmark-editor: http://127.0.0.1:8888/bookmarkEditor/

Websites in Freenet

Websites in Freenet are also simple. To get a basic website, just install the ShareWiki plugin, enter text, click publish, and once the upload has finished, send the URL to your friends by clicking “share” in the list of uploads. With this you can publish in Freenet: your friends will know that it’s your site, but no one else will.

Configure Plugins: http://127.0.0.1:8888/plugins/
The key for ShareWiki to add as “Plugin from Freenet”: CHK@aCQTjPQI3uGsahMiTuddwJ51UJypA5Mqg4y0tf1VqXQ,eEkO3uge6IJ1QcrT5KGlJ1R6kEcMhQV4rXfv6NzoL5o,AAMC--8/ShareWiki-b17.jar

(note: ignore the search box on the main page. It’s broken)

Anonymous Discussions

Anonymous Discussions are somewhat different from the other features, because they require the Web of Trust, and that is very heavyweight.

If you want to keep the resource consumption of Freenet low, avoid the anonymous discussion platforms.

You will see people recommend it - even me. It is cool, but you should only enable it if you have a computer which is always running and for which it does not matter if it runs at high load.

If you only want confidential communication with friends, just avoid the Web of Trust for now. If you stick to the basic features (darknet messages, uploads, downloads, bookmarks), Freenet will require few resources and little bandwidth.

For a low-spec computer or a laptop, avoid the Web of Trust and anonymous discussions: They are really cool, but still require lots of resources.

If you value truly anonymous discussions higher than keeping the load on your computer low, or if you have a computer which is always running, have a look at the Freenet Social Networking guide. It shows you how to setup and use the social features of Freenet.

Freenet Social Networking Guide: http://freesocial.draketo.de

Have fun in Freenet!

Troubleshooting

High resource usage

If Freenet makes your fans run at full speed and your disk crackle, you can fix that with three steps:

Technical details


  1. Censored version of my textblock (you’ll get an uncensored version by email) identity=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    lastGoodVersion==XXXXXXXXXXXXXXXXXXXXXXX
    location==XXXXXXXXXXXXXXXXXXXXXXXX
    myName=ArneBab
    opennet=false
    sig=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    sigP256=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    version==XXXXXXXXXXXXXXXXXXXXXXX
    ark.number=XXXX
    ark.pubURI=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    auth.negTypes==XXXXXX
    dsaGroup.g=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    dsaGroup.p=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    dsaGroup.q=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    dsaPubKey.y=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    ecdsa.P256.pub=XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    physical.udp==XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
    End 

  2. Groklaw: Forced exposure 

  3. Naturally it would be better to send the freenet addfriend text via encrypted email with a full chain of trusted signatures. But for the basic goal of confidential communication that is not necessary. We can check later whether the text we exchanged was changed, so if someone wants to eavesdrop, we can detect that. And we would have proof, which would make for the next great story for political magazines like Panorama - which would help a lot at fighting surveillance in the long term (so it’s unlikely that people who want surveillance will dare to do that). Example: “NSA targets privacy conscious” in German public media documentation

Attachments:
connect-over-freenet-01.txt (11.32 KB)

Lots of site uploads into freenet

I just finished lots of new uploads of sites into freenet - with the new freesitemgr (which actually uploads quickly when WoT is disabled; check today's IRC-logs tomorrow to get background on that). You can get the new freesitemgr from github.com/ArneBab/lib-pyfreenet-staging or via infocalypse:

hg clone freenet://USK@kDVhRYKItV8UBLEiiEC8R9O8LdXYNOkPYmLct9oT9dM%2CdcEYugEmpW6lb9fe4UzrJ1PgyWfe0Qto2GCdEgg-OgE%2CAQACAAE/pyfreenet.R1/14 

The sites are also available via my freenet inproxy:

freenet-team - an introduction of most of the freenet hackers I know.

mathmltest - example of mathml in freenet.

winterface-deadlines - deadlines for the Winterface GSoC project

freenet-funding - the freenet fundraising plan, still lacking good design and crisp presentation slides or a video

freenet-meltdown - on the recent massive performance degradation which lasted a few months and ended with the link length fix.

fix-link-length - background on the link-length fix which made freenet actually do small world routing again instead of random routing (into which it had degraded, partially due to local requests and partially due to having so many peers per node that random routing still worked at the current network size, so the pressure from routing success to go back to small world routing was too weak compared to the pressure from local requests to randomize the connections)

download-web-site - how to download a single page from a website - for example to mirror it into freenet. Hint: For all the sites on draketo.de or 1w6.org you are allowed to do so freely (licensed under GPL).

guiledocs - the online documentation for GNU Guile with a focus on Scheme (using Guile): A powerful lisp-like language with multiple implementations.

decorrespondent-metadata - an experiment showing how much information one can glean about your life from just one week of metadata, in Dutch.

netzpolitiz-metadaten - the same article translated to German. License: cc by-nc-sa

Adventures of a Pythonista in Schemeland - the adventures of a Pythonista in Schemeland: A deep understanding of Scheme for Python users. I learned to love Scheme with this. BSD license.

programming-languages - The Programming languages lecture. License: cc by-nc-sa

tao of programming - "When you have learned to snatch the error code from the trap frame, it will be time for you to leave."

On the 2014 freenet-meltdown

Update (2014-09-06): The meltdown is stopped and reversed. We implemented the link length fix and this solved an issue with the network structure that had existed for the last few years. We’re currently watching anxiously whether the performance only comes back to the level before the meltdown or whether the lifetime of data actually gets much better. Watch the fetch-pull stats!

Current Fetch Performance, 1 day

^ inserted one day ago: You see the meltdown starting in April and the improvement with the latest version: It’s back to at least the level before the meltdown.

Current Fetch Performance, 4/2 weeks

^ inserted 4 weeks ago, accessed 2 weeks ago: If this goes above 0.6 starting 2014-09-19, the improvement could prove to be huge: It could give us much longer lifetimes of data in freenet.

Update (2014-07-23): The fetch-pull graphs look like we have an oscillation here. This could mean that this is NOT an attack, but rather the direct effect of the KittyPorn patches: First the good connections get broken. This ruins the network. Then they can’t get any worse and the network recovers. Then they break again. This is still speculative. For an up-to-date plot, see fetchplots1.

Update (2014-05-22): The performance stats are much better again and the link-length distribution recovered. We might have been hit by an attack after all (which failed to take down freenet, but hurt quite a bit). With some luck we’ll soon see a paper published with the evaluation of the attack and ways to mitigate it cleanly. (hint to computer scientists: we link to papers from the freenetproject.org website, so if you want a small boost to your chances of citation, send the link to your paper to devl@freenetproject.org)

Summary: There is a freenet patch floating around which claims to increase performance. The reality is (to our current knowledge) that it breaks the network as soon as more than a few percent of the nodes run it. And that is the case right now, which is why the network is almost completely broken. If you run that patch, please get rid of it!

Freenet is currently experiencing a meltdown, with extremely slow downloads, high connection churn and lifetimes for bigger files down to about a day. For a visualization, see the fetch-performance in the following graph and take note of the drop at the end. It nicely shows how a bad patch spread while more and more users installed it (hoping for better performance) and slowly killed the network. When that line goes below 50%, bigger files are dead approximately one day after being uploaded.

Fetch Performance (thanks for these stats go to fetchpull from digger3)

We suspect that patch, because the number of nodes reporting 100 or more connections in the anonymised probe-stats increased a lot over the past few weeks (this is only possible with a patched freenet) and the link-length-distribution almost completely lost a bump it had at 0.004, suggesting that freenet essentially reverted to random routing, while the number of nodes did not change significantly.

Plots: connections per node, link length distribution, and the number of freenet nodes which report stats
(thanks for these stats go to the probe stats which operhiem1 implemented in Google Summer of Code 2012)

We are working on creating a clean solution.

Freesites still work, because the SSK-queue did not get hammered, so if you are a freesite author, please inform your readers about the patch and ask them to get rid of it!

In case you use freenet and want information on that patch, please read the note from TheSeeker:

Information from TheSeeker in IRC (#freenet @ freenode)

Recently Kittyporn released an autopatcher-script: CHK@r6dUGAYs2No4lWT3DTkY2dIYdgA-eoBAcU~U-kLU-0I,hxGN5OTN4j~04YnCS4UTflMK4fpW2hfhB58CU1KNRAw,AAMC--8/FNAutoPatch-1.0_by_Kittyporn.rar

This increased usage of the patch by probably several hundred nodes, judging by the partial logs from the webserver that we have for fetches of the source tarball.

The script stupidly pulls the freenet source from freenetproject.org rather than say, github, or freenet. Really bad for anonymity, but good for tracking.

logs only go back a couple weeks, which is why they are incomplete, and we don't know the real number of people that have run it. hard to tell how much less the people that are cheating feel the effects of the whole network collapsing around them. surely can't be long before they too start complaining about speeds given the data retention issues it's causing.

NLM was supposed to fix all this shit. :|

modified nodes are flooding the network, creating broad backoff issues. this makes routing suffer, and avg path lengths increase, which reduces overall availability of bandwidth and more backoff and more misrouting. Death spiral until we hit some equilibrium that is roughly equal to random routing.

essentially what the broken NLM did. thankfully, it is only routing for bulk chk, so it'll still be possible to do some things if forced through the realtime queue... e.g. if we want to deploy an update, and have the constituent blocks actually get routed anywhere near the correct destination...

Additional comment

To do the math: a few hundred users easily equals 10% of the network. No wonder we have a meltdown.

and even worse, these few hundred users are likely the high-bandwidth folks with a huge number of connections.

Let’s assume that they each have 40 connections while the others have ~10. Every node connected to such an abusive node will essentially be blocked. That’s 100% of the nodes…

40 other nodes wrecked × 10% = ouch!

Attachments:
fetchpull-stats-1148-fetchplots1.png (43.8 KB)
probe-stats-489-plot_link_length.png (6.67 KB)
probe-stats-489-plot_peer_count.png (7.64 KB)
probe-stats-489-year_900x300_plot_network_size.png (26.24 KB)
fetchpull-stats-1228-fetchplots1.png (46.38 KB)

Real Life Infocalypse

Freenet logo: Follow the Rabbit. DVCS in the Darknet. The decentralized p2p code repository.

In this guide I show by example how you can use the Infocalypse plugin for distributed development without a central point of failure or reliance on unfree tools.12

If you think “I have no idea what this tool is for” (like this reddit commenter): Infocalypse gives you fully decentralized Github with real anonymity, using only free software.

# freenet -> local
hg clone freenet://ArneBab/life-repo
# local -> freenet
hg clone life-repo real-life
hg clone real-life freenet://ArneBab/real-life
# send pull request
cd real-life
hg fn-pull-request --wot ArneBab/life-repo
# check for pull-requests
cd ../life-repo
hg fn-check-notifications --wot ArneBab

If you like this, please don’t only click like or +1, but share it with everyone who could be interested. The one who knows best how to reach your friends is you — and that’s how it should be.

Setup

(I only explain the setup for GNU/Linux because that’s what I use. If you want Infocalypse for other platforms, come to the #freenet IRC channel so we can find the best way to do it)

Freenet Setup

Install and start Freenet. This should just take 5 minutes.

Then activate the Web of Trust plugin and the Freemail plugin. As soon as your Freenet is running, you’ll find the Web of Trust and Freemail plugins on the Plugins-Page (this link will work once you have a running Freenet. If you want to run Freenet on another computer, you can make it accessible to your main machine via ssh port forwarding: ssh -NL 8888:localhost:8888 -L 9481:localhost:9481 <host>).

Now create a new Pseudonym on the OwnIdentities-page.

Infocalypse Setup

Install Mercurial, defusedxml, PyYAML for Python2. The easiest way of doing so is using easy_install from setuptools:

cd ~/
echo '
export PATH="${PATH}:~/.local/bin:~/bin"
export PYTHONPATH="${PYTHONPATH}:~/.local/lib64/python2.7:~/.local/lib/python2.7"
export PYTHONPATH="${PYTHONPATH}:~/lib/python2.7:~/lib64/python2.7"
' >> ~/.bashrc
source ~/.bashrc
wget https://bootstrap.pypa.io/ez_setup.py -O - | python2.7 - --user
easy_install --user Mercurial defusedxml PyYAML pyFreenet

Then get and activate the Infocalypse extension:

hg clone https://bitbucket.org/ArneBab/infocalypse
echo '[extensions]' >> ~/.hgrc
echo 'infocalypse=~/infocalypse/infocalypse' >> ~/.hgrc

Infocalypse with Pseudonym

Finally, set up Infocalypse for the Pseudonym you just created. The Pseudonym is used for pull-requests and for shorter repository URLs.1

hg fn-setup --truster <Nickname of your Web of Trust Pseudonym>
hg fn-setupfreemail --truster <Nickname of your Web of Trust Pseudonym>

That’s it. You’re good to go. You can now share your code over Freenet.

Welcome to the Infocalypse!

Example

This example shows how to share code over Freenet (using your Pseudonym instead of ArneBab).
# Create the repo
hg init life-repo
cd life-repo
echo "my" > life.txt
hg commit -Am "first steps"
cd ..

# Share the repo
hg clone life-repo freenet://ArneBab/life-repo

# Get a repo and add changes
hg clone freenet://ArneBab/life-repo real-life
cd real-life
echo "real" > life.txt
hg commit -m "getting serious"

# Share the repo and file a pull-request
hg clone . freenet://ArneBab/real-life
# the . stands for "the current folder"
hg fn-pull-request --wot ArneBab/life-repo # enter a message
cd ..

# Check for pull-requests and share the changes
cd life-repo
hg fn-check-notifications --wot ArneBab
hg pull -u freenet://ArneBab/real-life
hg push freenet://ArneBab/life-repo

Privacy Protections

Infocalypse takes your privacy seriously. When you clone a repository from freenet, your username for that repository is automatically set to “anonymous” and when you commit, the timezone is faked as UTC to avoid leaking your home country.

If you want to add more security to your commits, consider also using a fake time-of-day:

hg commit -m "Commit this sometime today" --date \
   "$(date -u "+%Y-%m-%d $(($RANDOM % 24)):$(($RANDOM % 60)):$(($RANDOM % 60)) +0000")"

Open path/to/repo-from-freenet/.hg/hgrc to set this permanently via an alias (just adapt the alias for rewriting the commit-date to UTC - these are already in the hgrc file if you cloned from Freenet).
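
A minimal sketch of such an alias (the alias name is made up; Mercurial runs aliases that start with "!" through the shell, so the date command is evaluated on each commit):

[alias]
# commit with a randomized UTC time-of-day; further arguments like -m "message" are passed through via $@
rcommit = !hg commit --date "$(date -u "+%Y-%m-%d $(($RANDOM % 24)):$(($RANDOM % 60)):$(($RANDOM % 60)) +0000")" $@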

Background Information

Let’s look at a few interesting steps in the example to highlight the strengths of Infocalypse, and provide an outlook with steps we already took to prepare Infocalypse for future development.

Efficient storage in Freenet

hg clone life-repo freenet://ArneBab/life-repo

Here we clone the local repository into Freenet. Infocalypse looks up the private key from the identity ArneBab. Then it creates two repositories in Freenet: <private key>/life-repo.R1/0 and <private key>/life-repo.R0/0. The URLs only differ in the R1 / R0: They both contain the same pointers to the actual data, and if one becomes inaccessible, the chances are good that the other still exists. Doubling them reduces the chance that they fall out and become inaccessible, which is crucial because they are the only part of your repository which does not have 100% redundancy. Also, these pointers are the only part of the repository which only you can insert. As long as they stay available, others can reinsert the actual data to keep your repository accessible.

To make that easy, you can run the command hg fn-reinsert in a cloned repository. It provides 5 levels:

  • 1 - re-inserts the top key(s)
  • 2 - re-inserts the top keys(s), graphs(s) and the most recent update.
  • 3 - re-inserts the top keys(s), graphs(s) and all keys required to bootstrap the repo (default).
  • 4 - adds redundancy for big (>7Mb) updates.
  • 5 - re-inserts existing redundant big updates.
To reinsert everything you can insert, just run a tiny bash-loop:

for i in {1..5}; do hg fn-reinsert --level $i; done

Let’s get to that “actual data”. When uploading your data into Freenet, Infocalypse creates a bundle with all your changes and uploads it as a single file with a content-dependent key (a CHK). Others who know which data is in that bundle can always recreate it exactly from the repository.

When someone else uploads additional changes into Freenet, Infocalypse calculates the bundle for only the additional changes. This happens when you push:

hg push freenet://ArneBab/life-repo

To clone a repository, Infocalypse first downloads the file with pointers to the data, then downloads the bundles it needs (it walks the graph of available bundles and only gets the ones it needs) and reassembles the whole history by pulling it from the downloaded bundles.

hg clone freenet://ArneBab/life-repo real-life

By reusing the old bundles and only inserting the new data, Infocalypse minimizes the amount of data it has to transfer in and out of Freenet, and more importantly: Many repositories can share the same bundles, which provides automatic deduplication of content in Freenet. When you take into account that in Freenet frequently accessed content is faster and more reliable than rarely accessed content, this gives Infocalypse a high degree of robustness and uses the capabilities of Freenet in an optimal way.

If you want to go into Infocalypse-specific commands, you can also clone a repository directly to your own keyspace without having to insert any actual data yourself:

hg fn-copy --requesturi USK@<other key>/<other reponame>.R1/N \
   --inserturi USK@<your key>/<your reponame>.R1/N

Pull requests via anonymous Freemail

Since the Google Summer of Code project from Steve Dougherty in 2013, Infocalypse supports sending pull-requests via Freemail, anonymous E-Mail over Freenet.

hg fn-pull-request --wot ArneBab/life-repo # enter a message
hg fn-check-notifications --wot ArneBab

This works by sending a Freemail to the owner of that repository which contains a YAML-encoded footer with the data about the repository to use.

You have to trust the owner of the other repository to send the pull-request, and the owner of the other repository has to trust you to receive the message. If the other does not trust you when you send the pull-request, you can change this by introducing your Pseudonym in the Web of Trust plugin (this means solving CAPTCHAs).

Convenience

To make key management easier, you can add the following into path/to/repo/.hg/hgrc

[paths]
default = freenet://ArneBab/life-repo
real-life = freenet://ArneBab/real-life

Now pull and push will by default go to freenet://ArneBab/life-repo and you can pull from the other repo via hg pull real-life.

Your keys are managed by the Web of Trust plugin in Freenet, so you can use the same freenet-uri for push and pull, and you can share the paths without having to take care that you don’t spill your private key.

DVCS WebUI

When looking for repositories with the command line interface, you are reliant on finding the addresses of repositories somewhere else. To ease that, Steve also implemented the DVCS WebUI for Freenet during his GSoC project. It provides a web interface via a Freenet plugin. In addition to providing a more colorful user interface, it could add 24/7 monitoring, walking remote repositories and pre-fetching of relevant data to minimize delays in the command line interface. It is still in rudimentary stages, though.

All the heavy lifting is done within the Infocalypse Mercurial plugin: Instead of implementing DVCS parsing itself, the DVCS WebUI asks you to connect Infocalypse so it can defer that processing to the extension:

hg fn-connect

The longterm goal of the DVCS WebUI is to provide a full-featured web interface for repository exploration. The current version provides the communication with the Mercurial plugin and lists the paths of locally known repositories.

You can get the DVCS WebUI from http://github.com/Thynix/plugin-Infocalypse-WebUI

Gitocalypse

If you prefer working with git, you can use gitocalypse written by SeekingFor to seamlessly use Infocalypse repositories as git remotes. Gitocalypse is available from https://github.com/SeekingFor/gitocalypse

The setup is explained in the README.

Troubleshooting

  • When I'm running "hg fn-setup" I get the error "abort: No module named fcp.node"
    Do you have pyFreenet installed? Also ensure that you installed it for python 2.
    wget bootstrap.pypa.io/ez_setup.py -O - | python2.7 - --user
    easy_install --user Mercurial defusedxml PyYAML pyFreenet

Conclusion

Infocalypse provides hosting of repositories in Freenet with a level of convenience similar to GitHub or Bitbucket, but decentralized, anonymous and entirely built of Free Software.

You can leverage it to become independent from centralized hosting platforms for sharing your work and collaborating with other hackers.


  1. This guide shows the convenient way of working, which has a higher barrier of entry. It uses WoT Pseudonyms to allow you to insert repositories by Pseudonym and repository name. If you can cope with inserting by private key and sending pull-requests manually, you can use it without the WoT, too, which reduces the setup effort quite a bit. Just skip the setup of the Web of Trust and Freemail plugins. You can then clone the life repo via hg clone freenet://USK@6~ZDYdvAgMoUfG6M5Kwi7SQqyS-gTcyFeaNN1Pf3FvY,OSOT4OEeg4xyYnwcGECZUX6~lnmYrZsz05Km7G7bvOQ,AQACAAE/life-repo.R1/4 life-repo. See hg fn-genkey and hg help infocalypse for details. 

  2. Infocalypse shows one of many really interesting possibilities offered by Freenet. To get a feeling of how much more is possible, have a look at The Forgotten Cryptopunk Paradise

Spread Freenet: A call for action on identi.ca and twitter

“Daddy, where were you, when they took the freedom of the press away from the internet?” — Mike Godwin, Electronic Frontier Foundation

Reposted from Freetalk, the distributed pseudonymous forum in Freenet.

For all those among you, who use twitter1 and/or identi.ca2, this is a call to action.

Go to your identi.ca or twitter accounts and post about freenet. Tell us in 140 letters why freenet is your tool of choice, and remember to use the !freenet group (identi.ca) or #freenet hashtag (twitter), so we can resend your posts!

I use !freenet because we might soon need it as safe harbour to coordinate the fight against censorship → freenetproject.org !zensur — ArneBab

The broader story is the emerging concept of a right to freely exchange arbitrary data — Toad (the main freenet developer)

Background

There are still very many people out there who don’t know what freenet is. Just today a coder came into the #freenet IRC channel, asked what it did and learned that it already does everything he had thought about. And I still remember someone telling me “It would be cool if we had something like X-net from Cory Doctorow’s ‘Little Brother’” — he did not know that freenet already offers that with much improved security.

So we need to get the word out about freenet. And we have powerful words to choose from, beginning with Mike Godwin’s quote above but going much further. To just name a few buzz-words: Freenet is a crowdfunded, distributed and censorship-resistant freesoftware cloud publishing system. And different from info about corporate PR-powered projects, all these buzz words are true.

But to make us effective, we need to achieve critical mass. And to reach that, we need to coordinate and cross promote heavily.

Call to action

So I want to call to you to go to your identi.ca or twitter accounts and post about freenet. Tell us in 140 letters why freenet is your tool of choice, and remember to use the !freenet group or #freenet hashtag, so we can find and retweet your posts!

If you use identi.ca, join the !freenet group, so you get informed about new freenet-posts automatically.

We can make a difference, if we fight together.

And if you always wanted to get an identi.ca account, here’s the opportunity to get it and do something good at the same time :)

If you already have a twitter-account, you can connect your identi.ca account to your twitter account, then post to identi.ca and have your post forwarded to twitter automatically.

Additional info

Besides: My accounts are:

But no need to tell me your account and connect your Freetalk ID with it. Just use identi.ca or twitter and remember to tell your friends to talk about freenet, too (so we can’t find out who read this post and who decided to join in because he learned about the action from a friend).

As second line of defense, I also posted this message to my website and hereby allow anyone to reuse it in any form and under any license (up to the example tweets), so I can’t know who saw it here and who saw it elsewhere.

http://draketo.de/light/english/spread-freenet-a-call-to-action-on-twitter-and-identica

I hope I’ll soon see floods of enthusiastic tweets and dents about Freenet!

Some example tweets and/or dents

I’ll gladly post and link yours here, if you allow it!

!Freenet: #crowdfunded distributed and censorship resistant !freesoftware cloud publishing → http://freenetproject.org — rightful buzz! — ArneBab

#imhappiestwhen when the internet is free. I hope it will remain so thanks to projects like #Freenet http://t.co/GMRXmDt — Gaming4JC

#freenet: freedom to publish that you may have to rely on, because censorship and ©ensorship are on the rise — Ixoliva


  1. Twitter is a service for sending small text messages to people who “follow” you (up to 140 letters), so it works like a newsticker for journalists. Sadly it is not free software, so you can’t trust them to keep your data or even just the service available. Its distinctive features are hashtags (#blafoo) for marking and searching messages and retweeting for passing a message on to people who read your messages. 

  2. identi.ca is like twitter and offers the same features plus a few more advanced ones, but as a decentralized free software system where everyone can create his own server and connect it to others. When using identi.ca, you make yourself independent from any single provider and can even run the system yourself. And it will stay free, because it uses the AGPL (v3 or later). 

USK and Date-Hints: Finding the newest version of a site in Freenet's immutable datastore

Freenet provides a global, anonymous datastore where you can upload sites which then work like normal websites. But different from websites, they have a version-number.

The reason for this is that you can only upload to a given key once1. This data then gets stored in the network and is effectively immutable (much like immutable data structures in functional programming).

In this model conflicts can arise from uploads of different users and from uploads of different versions of the site.

Avoid conflicts between users

So what if Alice uploads the file gpl.txt, and then Mallory tries to upload it again before users get the upload from Alice?

To avoid these conflicts between users, you can upload to an address defined by a key-pair. That key-pair has two keys, a public and a private one. The URL of the site is derived from the public key; everyone who has this URL can access the site. The private key allows uploading new data to the site, so only the owner of the private key can upload files to it. This is the SSK: the Signed Subspace Key. It defines a space in Freenet which only you can update.

An SSK looks like this: SSK@[key]/[sitename]/[path/to/file]

Avoid conflicts between versions

But now what if Alice wants to upload a new version of gpl.txt - say GPLv3?

To avoid conflicts between different versions, each new version gets a unique identifier. The reason for using version numbers and not some other identifier is historical: To update sites despite not being able to rewrite published data, freenet users started to version their sites by simply appending a number to the name and then adding small images for future versions. If these images showed up, the new version existed.2

Most sites in freenet had a section like this (the images might take a bit to load - they are downloaded from a freenet outproxy):

technophob-116, technophob-117, technophob-118, technophob-119 (activelink images for the respective versions of the technophob site)

At some point, the freenet developers decided to integrate that function into freenet. They added a new key-type: The Updatable Subspace Key, in short: USK.

A USK looks like this: USK@[key]/[sitename]/[version]/[path/to/file]

If you enter a USK, freenet automatically checks for newer versions and then shows you the most recent version of the site.

As a practical example:

technophob

Note that this link will automatically get you to version 117 (or whatever version is the current one when you read this article), even though it has version 116 in its URL.

Internally the USK simply gets translated to an SSK in the form of SSK@[key]/[sitename]-[version]/[path/to/file]. You’ll surely recognize the scheme which is used here.
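
For example (the path is just an illustration):

USK@[key]/technophob/117/index.html
→ SSK@[key]/technophob-117/index.html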

This is a prime example of demand-driven development: Users found a way to make sites dynamic with the activelink-hack. Then the Freenet developers added this as an official feature. As a nice side-effect, the activelink-images stayed with us as part of the Freenet culture: Almost every site in freenet has a small logo with a width and height of 108x36 pixels.

Date-Hints

USKs solved the problem of having updatable sites by checking some versions into the future. But they had a limitation: If your USK-Link was very old, freenet would have to check hundreds or even thousands of URLs to find the newest version. And this would naturally be very, very slow. Due to the distributed nature of Freenet, it is also not possible to just list all files under a given Key. You can only check for directories - the sitenames.

Also, files in Freenet only stay available while people access them - but checking whether some file might still be accessible isn’t a well-defined problem: The data for that file could be on the computer of someone who is currently offline. When he or she comes online again, the file could suddenly be available again, so determining that a file does not exist isn’t actually possible.

A timeline of versions could look like this:

Year      2009    2010   2011   2012                    2013   2014
Versions  1,2,3   4,5    6      7,8,9,10,11,12,13,14    15     16,17,18

Now imagine that you find a link on a site which was added in 2010. It would for example link to version 4 of the site. If you access this site in 2014, freenet has to check versions 5,6,7,8...18 to find the most recent version. That requires 14 downloads - and for normal freesites the versions can be as high as 1200.

But remember that you can upload to arbitrary filenames. So what if the author of the site gave you a hint of the first version in 2014? With that, freenet would only have to start at version 16 - just 3 versions to check, and the hint.

Why the first? Remember that files cannot be overwritten, so the author cannot give you the most recent version in 2014.

And this is just what the freenet developers did: Date-Hints are simply files in freenet which contain the information about the most recent version of the site at some point in time.

The datehint keys look like this: SSK@[key]/[sitename]-DATEHINT-[year]

The file found at this key is a simple plain text file with content like the following:

HINT
46
2013-7-5

The first line is the identifier, the second is the most recent version at the time of insert (the first version in the year) and the last is the date of the upload of that version.

A yearly date-hint speeds up getting the most recent version a lot. But since sites in freenet have hundreds of versions rather than tens, it is a bit too coarse: It can still leave you with 20 or 30 possible new versions. So freenet actually provides additional date hints on a monthly, weekly and daily basis:

  • SSK@[key]/[sitename]-DATEHINT-[year]
  • SSK@[key]/[sitename]-DATEHINT-[year]-WEEK-[week]
  • SSK@[key]/[sitename]-DATEHINT-[year]-[month]
  • SSK@[key]/[sitename]-DATEHINT-[year]-[month]-[day]
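
Continuing the timeline example above, the yearly hint for 2014 would live at SSK@[key]/[sitename]-DATEHINT-2014 and contain something like this (the exact date is just an illustration):

HINT
16
2014-1-10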

If you give freenet a USK-link, it starts on the order of 10 requests: 4 date hints with the current date and requests for versions following the version in the link. Normally it gets a result in under 10 seconds.

Conclusion

With USKs and Date-Hints Freenet implements updatable sites with acceptable performance in its anonymous datastore with effectively immutable data.

If you want to see it for yourself, come to freenetproject.org and install freenet. It’s free software and available for Windows, Linux and MacOSX.


  1. If you try to upload to a given key twice, you can get collisions. In that case, it isn’t clear which data a client will retrieve - similar to race conditions in threaded programs. That’s why we do not write to the same key twice in practice (though there is a key-type which can be used for passwords or simple file-names. It is called KSK and was the first key-type freenet provided. That led to wars on overwriting files like gpl.txt - similar to the edit-wars we nowadays get on Wikipedia, but with real anonymity thrown in ☺). 

Attachments:
technophob-activelink.png (5.25 KB)
freenet-logo.png (2.26 KB)

What can Freenet do well already?

From the #freenet IRC channel at freenode.net:

toad_1: what can freenet do well already?

  • sharing and retrieving files asynchronously, freemail, IRC2, publishing sites without need of a central server, sharing code repositories

  • I can simply go online, upload a file, send the key to a friend and go offline. the friend can then retrieve the file, even though I am already offline without needing a central server.

  • and nobody can eavesdrop.

  • it might be kinda slow, but it actually makes it easy to publish stuff: via jSite, floghelper and others.

  • floghelper is cool: spam-resistant anonymous blogging without central server

  • and freereader is, too (even though it needs lots of polish): forward RSS feeds into freenet

  • you can actually exchange passwords in a safe way via freemail: anonymous email with an integrated web-interface and imap access.

    • Justus and I coordinated the upload of the social networking site onto my FTP solely over freemail, and I did not have any fear of eavesdropping - different from any other mail I write.

… I think I should store this conversation somewhere

which I hereby did - I hope you enjoyed this little insight into the #freenet channel :)

And if you grew interested, why not install freenet yourself? It only takes a few clicks via webstart and you’re part of the censorship-resistant web.


  1. toad alias Matthew Toseland is the main developer of freenet. He tends to see more of the remaining challenges and fewer of the achievements than me - which is a pretty good trait for someone who builds a system to which we might have to entrust our basic right of free speech if the world goes on like this. From a PR perspective it is a pretty horrible trait, though, because he tends to forget to tell people what freenet can already do well :) 

  2. To setup the social networking features of Freenet, have a look at the social networking guide 

Wrapup: Make Sone scale - fast, anonymous, decentral microblogging over freenet

Sone1 allows fast, identi.ca-style microblogging in Freenet. This is my wrapup on a discussion on the steps to take until Sone can become an integral part of Freenet.

Current state

  • Is close to realtime.

  • Downloads all IDs and all their posts and replies → polling which won’t scale; short term local breakage.

  • Uploads all posts on every update → Can displace lots of content. Effective Size: X*M, X = revisions which did not drop out, M = total number of your messages. Long term self-DDoS of freenet.

Future

  • Is close to realtime for those you follow and your usual discussion group.

  • Uploads only recent posts directly and bundles older posts → much reduced storage need: Effective size: B*Z + Y*M; B = posts per bundle, Z = number of bundles which did not drop out, Y = number of not yet bundled messages; Z << Y, B << X, Y << X.

  • Downloads only the ones you follow + ones you get told about. Telling others means that you need to include info about people you follow, because you only get information from them.

Telling others about replies, options

  • Include all replies to anyone which I see in my own Sone → size rises massively, since you include all replies of all people you follow in your own Sone.

  • Include all IDs from which you saw replies along with the people they replied to → needs to poll more IDs. Optionally forward that info for several hops → for efficient routing it needs knowledge about the full follower topology, which is a privacy risk.

  • Discovering replies from people you don’t know yet: Add a WoT info: replies. Updated only when you reply to someone you did not reply to before. Poll people’s reply lists based on their WoT rating. Keep a list of people who answered one of your posts and poll these more often. Maybe poll people instantly who solve one of your captchas (your general captcha queue) → new users can enter quickly. When you solve captchas in WoT, preferably solve those from people you follow.
    → four ways to discover a reply:

    1. poll those you follow,
    2. poll the people who posted the latest replies to you (your usual discussion-group),
    3. poll those who solve one of your captchas (get new people in as fast as possible) and
    4. poll the replies-info from everyone with the polling frequency based on their WoT rating.

  1. You can find Sone in Freenet using the key USK@nwa8lHa271k2QvJ8aa0Ov7IHAV-DFOCFgmDt3X6BpCI,DuQSUZiI~agF8c-6tjsFFGuZ8eICrzWCILB60nT8KKo,AQACAAE/sone/38/ 

“regarding B.S. like SOPA, PIPA, … freenet seems like a good idea after all!”

“Some years ago, I had a look at freenet and wasn't really convinced, now I'm back - a lot has changed, it grew bigger and insanely fast (in freenet terms), like it a lot, maybe this time I'll keep it. Especially regarding B.S. like SOPA, PIPA and other internet-crippling movements, freenet seems like a good idea after all!”
— sparky in Sone

So, if you know freenet and it did not work out for you in the past, it might be time to give it another try: freenetproject.org

This quote just grabbed me, and sparky gave me permission to cite it.

Freenet: WoT, database error, recovery patch

I just had a database error in WoT (the Freenet generic Web of Trust plugin) and couldn’t access one of my identities anymore (plus I didn’t have a backup of its private keys though it told me to keep backups – talk about carelessness :) ).

I asked p0s on IRC and he helped me patch together a WoT which doesn’t access the context for editing the ID (and in turn misses some functionality). This allowed me to regain my ID's private key and with that redownload my ID from freenet.

I didn’t want that patch rotting on my drive, so I uploaded it here: disable-context-checks-regain-keys.path

Applied to revision 4f84492d277e25618003e0e5a0cb14159a50535d of WoT staging.

Essentially it just comments out some stuff.
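
If you want to try it yourself, applying the patch should look roughly like this - just a sketch, assuming a local git checkout of WoT staging; the directory name and the path to the downloaded patch are placeholders:

cd plugin-WebOfTrust                 # your local checkout of WoT staging (name assumed)
git checkout 4f84492d277e25618003e0e5a0cb14159a50535d
git apply /path/to/disable-context-checks-regain-keys.path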

Attachment  Size
disable-context-checks-regain-keys.path  3.79 KB

Mercurial

Mercurial is a distributed source control management tool.

Mercurial links:
- Mercurial Website.
- bitbucket.org - Easy repository publishing.
- Hg Init - A very nice Mercurial tutorial for newcomers.

With it you can save snapshots of your work on documents and go back to these at all times.

Also you can easily collaborate with other people and use Mercurial to easily merge your work.

Someone changes something in a text file you also worked on? No problem. If you didn't work on the same line, you can simply let Mercurial do an automatic merge and your work will be joined. (If you worked on the same line you'll naturally have to select how you want to merge these two changes).

It doesn't need a network connection for normal operation, except when you want to push your changes over the internet or pull changes of others from the web, so its commands are very fast. The time to do a commit is barely noticeable which makes atomic commits easy to do.

And if you already know subversion, the switch to Mercurial will be mostly painless.

But its most important strength is not its speed. It is that Mercurial just works. No hassle with complicated setup. No arcane commands. Almost everything I ever wanted to do with it just worked out of the box, and that's a rare and precious feature today.
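
To give you a feel for it, here is a minimal sketch of that everyday cycle (the repository URL and the commit messages are just placeholders):

hg clone https://example.org/project   # get your own copy (placeholder URL)
cd project
(edit)
hg commit -m "describe your change"    # snapshot your work locally
hg pull                                # fetch the changes of others
hg merge                               # join their work with yours (only needed if there are two heads)
hg commit -m "merged"
hg push                                # publish the result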

And to answer a common question:

“Once you have learned git well, what use is hg?” — Ross Bartlett in Why Mercurial?

  • Easier usage (with git I shot myself in the foot quite often; Mercurial just works), access to both hg and git repos from one UI, and thoroughly planned features.
  • No need to think that much about the tool. There is a reason why hg users tend to talk less about hg: There is no need to talk about it that much.
  • Also versioned tags and the option to use persistent branches to make it easier to track why a commit was added later on.
  • And many great extensions.

I wish you much fun with Mercurial!

A complete Mercurial branching strategy

This is a complete collaboration model for Mercurial. It shows you all the actions you may need to take, except for the basics already found in the guide Mercurial in workflows or the talk hg init science. Extensions allow optimizing the model for special needs like maintaining multiple releases1 and an explicit code review stage.

Summary

Any model to be used by people should consist of simple, consistent rules. Programming is complex enough without having to worry about elaborate branching directives. Therefore this model boils down to 3 simple rules:

(1) you do all the work on default2 - except for hotfixes.

(2) on stable you only do hotfixes, merges for release3 and tagging for release. Only maintainers4 touch stable.

(3) you can use arbitrary feature-branches5, as long as you don’t call them default or stable. They always start at default (since you do all the work on default).

Diagram

To visualize the structure, here’s a 3-tiered diagram. To the left are the actions of developers (commits and feature branches) and in the center the tasks for maintainers (release and hotfix). The users to the right just use the stable branch.6

Overview Diagram
An overview of the branching strategy. Click the image to get the emacs org-mode ditaa-source.

Practical Actions

Now we can look at all the actions you will ever need to do in this model:7

  • Regular development

    • commit changes: (edit); hg ci -m "message"

    • continue development after a release: hg update; (edit); hg ci -m "message"

  • Feature Branches

    • start a larger feature: hg branch feature-x; (edit); hg ci -m "message"

    • continue with the feature: hg update feature-x; (edit); hg ci -m "message"

    • merge the feature: hg update default; hg merge feature-x; hg ci -m "merged feature x into default"

    • close and merge the feature when you are done: hg update feature-x; hg ci --close-branch -m "finished feature x"; hg update default; hg merge feature-x; hg ci -m "merged finished feature x into default"

  • Tasks for Maintainers

    • Initialize (only needed once)

      • create the repo: hg init reponame; cd reponame

      • first commit: (edit); hg ci -m "message"

      • create the stable branch and do the first release: hg branch stable; hg tag tagname; hg up default; hg merge stable; hg ci -m "merge stable into default: ready for more development"

    • apply a hotfix8: hg up stable; (edit); hg ci -m "message"; hg up default; hg merge stable; hg ci -m "merge stable into default: ready for more development"

    • do a release9: hg up stable; hg merge default; hg ci -m "(description of the main changes since the last release)" ; hg tag tagname; hg up default ; hg merge stable ; hg ci -m "merged stable into default: ready for more development"

Example

This is the output of a complete example run 10 of the branching model, including all complications you should ever hit.

We start with the full history. In the following sections, we will take it apart to see what the commands do. So just take a glance, take in the basic structure and then move on for the details.

hg log -G
@    changeset:   15:855a230f416f
|\   tag:         tip
| |  parent:      13:e7f11bbc756c
| |  parent:      14:79b616e34057
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:49 2013 +0100
| |  summary:     merged stable into default: ready for more development
| |
| o  changeset:   14:79b616e34057
|/|  branch:      stable
| |  parent:      7:e8b509ebeaa9
| |  parent:      13:e7f11bbc756c
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:48 2013 +0100
| |  summary:     merged default into stable for release
| |
o |    changeset:   13:e7f11bbc756c
|\ \   parent:      11:e77a94df3bfe
| | |  parent:      12:aefc8b3a1df2
| | |  user:        Arne Babenhauserheide <bab@draketo.de>
| | |  date:        Sat Jan 26 15:39:47 2013 +0100
| | |  summary:     merged finished feature x into default
| | |
| o |  changeset:   12:aefc8b3a1df2
| | |  branch:      feature-x
| | |  parent:      9:1dd6209b2a71
| | |  user:        Arne Babenhauserheide <bab@draketo.de>
| | |  date:        Sat Jan 26 15:39:46 2013 +0100
| | |  summary:     finished feature x
| | |
o | |  changeset:   11:e77a94df3bfe
|\| |  parent:      10:8c423bc00eb6
| | |  parent:      9:1dd6209b2a71
| | |  user:        Arne Babenhauserheide <bab@draketo.de>
| | |  date:        Sat Jan 26 15:39:45 2013 +0100
| | |  summary:     merged feature x into default
| | |
o | |  changeset:   10:8c423bc00eb6
| | |  parent:      8:dc61c2731eda
| | |  user:        Arne Babenhauserheide <bab@draketo.de>
| | |  date:        Sat Jan 26 15:39:44 2013 +0100
| | |  summary:     3
| | |
| o |  changeset:   9:1dd6209b2a71
|/ /   branch:      feature-x
| |    user:        Arne Babenhauserheide <bab@draketo.de>
| |    date:        Sat Jan 26 15:39:43 2013 +0100
| |    summary:     x
| |
o |  changeset:   8:dc61c2731eda
|\|  parent:      5:4c57fdadfa26
| |  parent:      7:e8b509ebeaa9
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:43 2013 +0100
| |  summary:     merged stable into default: ready for more development
| |
| o  changeset:   7:e8b509ebeaa9
| |  branch:      stable
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:42 2013 +0100
| |  summary:     Added tag v2 for changeset 089fb0af2801
| |
| o  changeset:   6:089fb0af2801
|/|  branch:      stable
| |  tag:         v2
| |  parent:      4:d987ce9fc7c6
| |  parent:      5:4c57fdadfa26
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:41 2013 +0100
| |  summary:     merge default into stable for release
| |
o |  changeset:   5:4c57fdadfa26
|\|  parent:      3:bc625b0bf090
| |  parent:      4:d987ce9fc7c6
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:40 2013 +0100
| |  summary:     merge stable into default: ready for more development
| |
| o  changeset:   4:d987ce9fc7c6
| |  branch:      stable
| |  parent:      1:a8b7e0472c5b
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:39 2013 +0100
| |  summary:     hotfix
| |
o |  changeset:   3:bc625b0bf090
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:38 2013 +0100
| |  summary:     2
| |
o |  changeset:   2:3e8df435bcb0
|\|  parent:      0:f97ea6e468a1
| |  parent:      1:a8b7e0472c5b
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:38 2013 +0100
| |  summary:     merged stable into default: ready for more development
| |
| o  changeset:   1:a8b7e0472c5b
|/   branch:      stable
|    user:        Arne Babenhauserheide <bab@draketo.de>
|    date:        Sat Jan 26 15:39:36 2013 +0100
|    summary:     Added tag v1 for changeset f97ea6e468a1
|
o  changeset:   0:f97ea6e468a1
   tag:         v1
   user:        Arne Babenhauserheide <bab@draketo.de>
   date:        Sat Jan 26 15:39:36 2013 +0100
   summary:     1

Action by action

Let’s take the log apart to show the actions contributors will do.

Initialize

Initializing and doing the first commit creates the first changeset:

o  changeset:   0:f97ea6e468a1
   tag:         v1
   user:        Arne Babenhauserheide <bab@draketo.de>
   date:        Sat Jan 26 15:39:36 2013 +0100
   summary:     1

Nothing much to see here.

Commands:

hg init test-branch; cd test-branch
(edit); hg ci -m "message"

Stable branch and first release

We add the first tagging commit on the stable branch as release and merge back into default:

o    changeset:   2:3e8df435bcb0
|\   parent:      0:f97ea6e468a1
| |  parent:      1:a8b7e0472c5b
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:38 2013 +0100
| |  summary:     merged stable into default: ready for more development
| |
| o  changeset:   1:a8b7e0472c5b
|/   branch:      stable
|    user:        Arne Babenhauserheide <bab@draketo.de>
|    date:        Sat Jan 26 15:39:36 2013 +0100
|    summary:     Added tag v1 for changeset f97ea6e468a1
|
o  changeset:   0:f97ea6e468a1
   tag:         v1
   user:        Arne Babenhauserheide <bab@draketo.de>
   date:        Sat Jan 26 15:39:36 2013 +0100
   summary:     1

Mind the tag field which is now shown in changeset 0 and the branchname for changeset 1. This is the only release which will ever be on the default branch (because the stable branch only starts to exist after the first commit on it: The commit which adds the tag).

Commands:

hg branch stable
hg tag tagname
hg up default
hg merge stable
hg ci -m "merged stable into default: ready for more development"`

Further development

Now we just chug along. The one commit shown here could be an arbitrary number of commits.

o    changeset:   3:bc625b0bf090
|    user:        Arne Babenhauserheide <bab@draketo.de>
|    date:        Sat Jan 26 15:39:38 2013 +0100
|    summary:     2
|  
o    changeset:   2:3e8df435bcb0
|\   parent:      0:f97ea6e468a1
| |  parent:      1:a8b7e0472c5b
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:38 2013 +0100
| |  summary:     merged stable into default: ready for more development

Commands:

(edit)
hg ci -m "message"

Hotfix

If a hotfix has to be applied to the release out of order, we just update to the stable branch, apply the hotfix and then merge the stable branch into default11. This gives us changesets 4 for the hotfix and 5 for the merge (2 and 3 are shown as reference).

o    changeset:   5:4c57fdadfa26
|\   parent:      3:bc625b0bf090
| |  parent:      4:d987ce9fc7c6
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:40 2013 +0100
| |  summary:     merge stable into default: ready for more development
| |
| o  changeset:   4:d987ce9fc7c6
| |  branch:      stable
| |  parent:      1:a8b7e0472c5b
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:39 2013 +0100
| |  summary:     hotfix
| |
o |  changeset:   3:bc625b0bf090
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:38 2013 +0100
| |  summary:     2
| |
o |  changeset:   2:3e8df435bcb0
|\|  parent:      0:f97ea6e468a1
| |  parent:      1:a8b7e0472c5b
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:38 2013 +0100
| |  summary:     merged stable into default: ready for more development

Commands:

hg up stable
(edit)
hg ci -m "message"
hg up default
hg merge stable
hg ci -m "merge stable into default: ready for more development"    

Regular release

To do a regular release, we just merge the default branch into the stable branch and tag the merge. Then we merge stable back into default. This gives us changesets 6 to 812. The commit-message you use for the merge to stable will become the description for your tag, so you should choose a good description instead of “merge default into stable for release”. Userfriendly, simplified release notes would be a good choice.

o    changeset:   8:dc61c2731eda
|\   parent:      5:4c57fdadfa26
| |  parent:      7:e8b509ebeaa9
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:43 2013 +0100
| |  summary:     merged stable into default: ready for more development
| |
| o  changeset:   7:e8b509ebeaa9
| |  branch:      stable
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:42 2013 +0100
| |  summary:     Added tag v2 for changeset 089fb0af2801
| |
| o  changeset:   6:089fb0af2801
|/|  branch:      stable
| |  tag:         v2
| |  parent:      4:d987ce9fc7c6
| |  parent:      5:4c57fdadfa26
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:41 2013 +0100
| |  summary:     merge default into stable for release
| |
o |  changeset:   5:4c57fdadfa26
|\|  parent:      3:bc625b0bf090
| |  parent:      4:d987ce9fc7c6
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:40 2013 +0100
| |  summary:     merge stable into default: ready for more development

Commands:

hg up stable
hg merge default
hg ci -m "merge default into stable for release"
hg tag tagname
hg up default
hg merge stable
hg ci -m "merged stable into default: ready for more development"

Feature branches

Now we want to do some larger development, so we use a feature branch. The one feature-commit shown here (x) could be an arbitrary number of commits, and as long as you stay in your branch, the development of your colleagues will not disturb your own work. Once the feature is finished, we merge it into default. The feature branch gives us changesets 9 to 13 (with 10 being an example for an unrelated intermediate commit on default).

o    changeset:   13:e7f11bbc756c
|\   parent:      11:e77a94df3bfe
| |  parent:      12:aefc8b3a1df2
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:47 2013 +0100
| |  summary:     merged finished feature x into default
| |
| o  changeset:   12:aefc8b3a1df2
| |  branch:      feature-x
| |  parent:      9:1dd6209b2a71
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:46 2013 +0100
| |  summary:     finished feature x
| |
o |  changeset:   11:e77a94df3bfe
|\|  parent:      10:8c423bc00eb6
| |  parent:      9:1dd6209b2a71
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:45 2013 +0100
| |  summary:     merged feature x into default
| |
o |  changeset:   10:8c423bc00eb6
| |  parent:      8:dc61c2731eda
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:44 2013 +0100
| |  summary:     3
| |
| o  changeset:   9:1dd6209b2a71
|/   branch:      feature-x
|    user:        Arne Babenhauserheide <bab@draketo.de>
|    date:        Sat Jan 26 15:39:43 2013 +0100
|    summary:     x
|  
o    changeset:   8:dc61c2731eda
|\   parent:      5:4c57fdadfa26
| |  parent:      7:e8b509ebeaa9
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Sat Jan 26 15:39:43 2013 +0100
| |  summary:     merged stable into default: ready for more development

Commands:

  • Start the feature

    hg branch feature-x 
    (edit)
    hg ci -m "message"
    
  • Do an intermediate commit on default

    hg update default
    (edit)
    hg ci -m "message"
    
  • Continue working on the feature

    hg update feature-x
    (edit)
    hg ci -m "message"
    
  • Merge the feature

    hg update default
    hg merge feature-x
    hg ci -m "merged feature x into default"`
    
  • Close and merge a finished feature

    hg update feature-x
    hg ci --close-branch -m "finished feature x"
    hg update default; hg merge feature-x
    hg ci -m "merged finished feature x into default"
    

Note: Closing the feature branch hides that branch in the output of hg branches (except when using --closed) to keep the repository state lean and simple while still keeping the feature branch information in history. It shows your colleagues that they no longer have to keep the feature in mind as soon as they merge the most recent changes from the default branch into their own feature branches.

Note: To make the final merge of your feature into default easier, you can regularly merge the default branch into the feature branch.
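
For example (the same update-merge-commit cycle as above, just in the other direction):

hg update feature-x
hg merge default
hg ci -m "merged default into feature-x to keep the feature branch up to date"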

Note: We use feature branches to ensure that new clones start at a revision which other developers can directly use. With bookmarks you could get trapped on a feature-head which might not be merged to default for quite some time. For more reasons, see the bookmarks footnote.

The final action is a regular merge to stable to get into a state from which we could safely do a release. Since we already showed how to do that, we are finished here.

Extensions

This realizes the successful Git branching model13 with Mercurial while maintaining one release at any given time.

If you have special needs, this model can easily be extended to fulfill your requirements. Useful extensions include:

  • multiple releases - if you need to provide maintenance for multiple releases side-by-side.
  • grafted micro-releases - if you need to segment the next big changes into smaller releases while leaving out some potentially risky changes.
  • explicit review - if you want to ensure that only reviewed changes can get into a release, while making it possible to leave out some already reviewed changes from the next releases. Review gets decoupled from releasing.

All these extensions are orthogonal, so you can use them together without getting side-effects.

Multiple maintained releases

To use the branching model with multiple simultaneously maintained releases, you only need to change the hotfix procedure: When applying a hotfix, you go back to the old release with hg update tagname, fix there, add a new tag for the fixed release and then update to the next release. There you merge the new fix-release and do the same for all other releases. If the most recent release is not the head of the stable branch, you also merge into stable. Then you merge the stable branch into default, as for a normal hotfix.14

With this merge-chain you don’t need special branches for releases, but all changesets are still clearly recorded. This simplification over git is a direct result of having real anonymous branching in Mercurial.

hg update release-1.0
(edit)
hg ci -m "message"
hg tag release-1.1
hg update release-2.0
hg merge release-1.1
hg ci -m "merged changes from release 1.1"
hg tag release-2.1
… and so on

In the Diagram this just adds a merge path from the hotfix to the still maintained releases. Note that nothing changed in the workflow of programmers.

Overview Diagram
An overview of the branching strategy with maintained releases. Click the image to get the emacs org-mode ditaa-source.

Graft changes into micro-releases

If you need to test parts of the current development in small chunks, you can graft micro releases. In that case, just update to stable and merge the first revision from default, whose child you do not want, and graft later changes15.

Example for the first time you use micro-releases16:

You have changes 1, 2, 3, 4 and 5 on default. First you want to create a release which contains 1 and 4, but not 2, 3 or 5.

hg update 1
hg branch stable
hg graft 4

As usual tag the release and merge stable back into default:

hg tag rel-14 
hg update default
hg merge stable
hg commit -m "merge stable into default. ready for more development"

Example for the second and subsequent releases:

Now you want to release the changes 2 and 5, but you’re still not ready to release 3. So you merge 2 and graft 5.

hg update stable
hg merge 2
hg commit -m "merge all changes until 2 from default"
hg graft 5

As usual tag the release and finally merge stable back into default:

hg tag rel-1245 
hg update default
hg merge stable
hg commit -m "merge stable into default. ready for more development"

The history now looks like this17:

@    merge stable into default. ready for more development (default)
|\
| o  Added tag rel-1245 for changeset 4e889731c6ca (stable)
| |
| o  5 (stable)
| |
| o    merge all changes until 2 from default (stable)
| |\
o---+  merge stable into default. ready for more development (default)
| | |
| | o  Added tag rel-14 for changeset cc2c95dd3f27 (stable)
| | |
| | o  4 (stable)
| | |
o | |  5 (default)
| | |
o | |  4 (default)
| | |
o | |  3 (default)
|/ /
o /  2 (default)
|/
o  1 (default)
|
o  0 (default)

In the Diagram this just adds graft commits to stable:

Overview Diagram
An overview of the branching strategy with grafted micro-releases. Click the image to get the emacs org-mode ditaa-source.

Grafted micro-releases add another layer between development and releases. They can be necessary in cases where testing requires actually deploying a release, as for example in Freenet.

Explicit review branch

If you want to add a separate review stage, you can use a review branch1819 into which you only merge or graft reviewed changes. The review branch then acts as a staging area for all changes which might go into a release.

To use this extension of the branching model, just create a branch on default called review in which you merge or graft reviewed changes. The first time you do that, you update to the first commit whose children you do not want to include. Then create the review branch with hg branch review and use hg graft REV to pull in all changes you want to include.

On subsequent reviews, you just update to the review branch with hg update review, merge the first revision which has a child you do not want with hg merge REV and graft additional later changes with hg graft REV, as you would do for micro-releases.

In both cases you create the release by merging the review branch into stable.
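
Sketched as commands (REV stands for the revisions you pick; this only spells out the steps described above):

# first review: start the review branch
hg update REV                # the first commit whose children you do not want to include
hg branch review
hg graft REV                 # repeat for every reviewed change you want to include

# subsequent reviews
hg update review
hg merge REV                 # the first revision which has a child you do not want
hg ci -m "merged reviewed changes"
hg graft REV                 # graft additional later changes

# release
hg update stable
hg merge review
hg ci -m "merged reviewed changes for release"
hg tag tagname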

A special condition when using a review branch is that you always have to merge hotfixes into the review branch, too, because the review branch does not automatically contain all changes from the default branch.

In the Diagram this just adds the review branch between default and stable instead of the release merge. Also it adds the hotfix merge to the review branch.

Overview Diagram
An overview of the branching strategy with a review branch. Click the image to get the emacs org-mode ditaa-source.

Frequently Asked Questions (FAQ)

Where does QA (Quality Assurance) come in?

In the default flow, when the users directly use the stable branch, you do QA on the default branch before merging to stable. QA is part of the maintainer's job there.

If your users want external QA, that QA is done for revisions on the stable branch. It is restricted to signing good revisions. Any changes have to be done on the default branch - except for hotfixes for previously signed releases. It is only a hotfix if your users could already be running a broken version.
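
A minimal sketch of that external signing step, following the footnote on verification (hg sign comes from the gpg extension shipped with Mercurial, which has to be enabled):

hg update stable
hg sign                      # signs the parent revision of the working directory
hg update default
hg merge stable
hg ci -m "merged stable into default: signed release"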

There is also an extension with an explicit review branch. There QA is done on the review branch.

Simple Summary

This realizes the successful Git branching model with Mercurial.

We now have nice graphs, examples, potential extensions and so on. But since this strategy uses Mercurial instead of git, we don’t actually need all the graphics, descriptions and branch categories in the git version - or in this post.

Instead we can boil all of this down to 3 simple rules:

(1) you do all the work on default - except for hotfixes.

(2) on stable you only do hotfixes, merges for release and tagging for release. Only maintainers touch stable.

(3) you can use arbitrary feature-branches, as long as you don’t call them default or stable. They always start at default (since you do all the work on default).

They are the rules you already know from the starting summary. Keep them in mind and you’re good to go. And when you’re doing regular development, there is only one rule to remember:

You do all the work on default.

That’s it. Happy hacking!


  1. if you need to maintain multiple very different releases simultaneously, see the extension for multiple maintained releases or footnote 20 for adaptations 

  2. default is the default branch. That’s the named branch you use when you don’t explicitly set a branch. Its alias is the empty string, so if no branch is shown in the log (hg log), you’re on the default branch. Thanks to John for asking! 

  3. If you want to release the changes from default in smaller chunks, you can also graft specific changes into a release preparation branch and merge that instead of directly merging default into stable. This can be useful to get real-life testing of the distinct parts. For details see the extension Graft changes into micro-releases

  4. Maintainers are those who do releases, while they do a release. At any other time, they follow the same patterns as everyone else. If the release tasks seem a bit long, keep in mind that you only need them when you do the release. Their goal is to make regular development as easy as possible, so you can tell your non-releasing colleagues “just work on default and everything will be fine”. 

  5. This model does not use bookmarks, because they don’t offer benefits which outweigh the cost of introducing another concept: If you use bookmarks for differentiating lines of development, you have to define the canonical revision to clone by setting the @ bookmark. For local work and small features, bookmarks can be used quite well, though, and since this model does not define their use, it also does not limit it.
    Additionally bookmarks could be useful for feature branches, if you use many of them (in that case reusing names is a real danger and not just a rare annoyance) or if you use release branches:
    “What are people working on right now?” → hg bookmarks
    “Which lines of development do we have in the project?” → hg branches 

  6. Those users who want external verification can restrict themselves to the tagged releases - potentially GPG signed by trusted 3rd-party reviewers. GPG signatures are treated like hotfixes: reviewers sign on stable (via hg sign without options) and merge into default. Signing directly on stable reduces the possibility of signing the wrong revision. 

  7. hg pull and hg push to transfer changes and hg merge when you have multiple heads on one branch are implied in the actions: you can use any kind of repository structure and synchronization scheme. The practical actions only assume that you synchronize your repositories with the other contributors at some point. 

  8. Here a hotfix is defined as a fix which must be applied quickly out-of-order, for example to fix a security hole. It prompts a bugfix-release which only contains already stable and tested changes plus the hotfix. 

  9. If your project needs a certain release preparation phase (like translations), then you can simply assign a task branch. Instead of merging to stable, you merge to the task branch, and once the task is done, you merge the task branch to stable. An Example: Assume that you need to update translations before you release anything. (next part: init: you only need this once) When you want to do the first release which needs to be translated, you update to the revision from which you want to make the release and create the “translation” branch: hg update default; hg branch translation; hg commit -m "prepared the translation branch". All translators now update to the translation branch and do the translations. Then you merge it into stable: hg update stable; hg merge translation; hg ci -m "merged translated source for release". After the release you merge stable back into default as usual. (regular releases) If you want to start translating the next time, you just merge the revision to release into the translation branch: hg update translation; hg merge default; hg commit -m "prepared translation branch". Afterwards you merge “translation” into stable and proceed as usual. 

  10. To run the example and check the output yourself, just copy-paste the following your shell: LC_ALL=C sh -c 'hg init test-branch; cd test-branch; echo 1 > 1; hg ci -Am 1; hg branch stable; hg tag v1 ; hg up default; hg merge stable; hg ci -m "merged stable into default: ready for more development"; echo 2 > 2; hg ci -Am 2; hg up stable; echo 1.1 > 1; hg ci -Am hotfix; hg up default; hg merge stable; hg ci -m "merge stable into default: ready for more development"; hg up stable; hg merge default; hg ci -m "merge default into stable for release" ; hg tag v2; hg up default ; hg merge stable ; hg ci -m "merged stable into default: ready for more development" ; hg branch feature-x; echo x > x ; hg ci -Am x; hg up default; echo 3 > 3; hg ci -Am 3; hg merge feature-x; hg ci -m "merged feature x into default"; hg update feature-x; hg ci --close-branch -m "finished feature x"; hg update default; hg merge feature-x; hg ci -m "merged finished feature x into default"; hg up stable ; hg merge default; hg ci -m "merged default into stable for release"; hg up default; hg merge stable ; hg ci -m "merged stable into default: ready for more development"; hg log -G' 

  11. We merge the hotfix into default to define the relevance of the fix for general development. If the hotfix also affects the current line of development, we keep its changes in the merge. If the current line of development does not need the hotfix, we discard its changes in the merge. We do this to ensure that it is clear in future how to treat the hotfix when merging new changes: let the merge record the decision. 

  12. We can also merge to stable regularly as soon as some set of changes is considered stable, but without making an actual release (==tagging). That way we always have a stable branch which people can test without having to create releases right away. The releases are those changesets on the stable branch which carry a tag. 

  13. If you look at the Git branching model which inspired this Mercurial branching model, you’ll note that its diagram is a lot more complex than the diagram of this Mercurial version.

    The reason for that is the more expressive history model of Mercurial. In short: The git version has 5 types of branches: feature, develop, release, hotfix and master (for tagging). With Mercurial you can reduce them to 3: default, stable and feature branches:

    • Tags are simple in-history objects, so we need no special branch for them: a tag signifies a release (down to 4 branch-types - and no more duplication of information, since in the git-model a release is shown by a tag and a merge to master).
    • Hotfixes are simple commits on stable followed by a merge to default, so we also need no branch for them (down to 3 branch-types). And if we only maintain one release at a time, we only need one branch for them: stable (down from branch-type to single branch).
    • And feature branches are not required for clean separation since Mercurial can easily cope with multiple heads in a branch, so developers only have to worry about them if they want to use them (down to 2 mandatory branches).
    • And since the default branch is the branch to which you update automatically when you clone a repository, new developers don’t have to worry about branches at all.

    So we get down from 5 mandatory branches (2 of them are categories containing multiple branches) to 2 simple branches without losing functionality.

    And new developers only need to know two things about our branching model to contribute:

    “If you use feature branches, don’t call them default or stable. And don’t touch stable”.

  14. Merging old releases into new ones sounds like a lot of work. If you get that feeling, then have a look at how many releases you really maintain right now. In my Gentoo tree most programs actually have only one single release, so using actual release branches would incur an additional burden without adding real value. You can also look at the rule of thumb for deciding whether to choose release branches instead 

  15. If you want to make sure that every changeset on stable is production-ready, you can also start a new release-branch on stable, then merge the first revision, whose child you do not want, into that branch and graft additional changes. Then close the branch and merge it into stable. You can achieve the same with much lower overhead (unneeded complexity) by changing the requirement to “every tagged revision on stable is production-ready”. To only see tagged revisions on stable, just use hg log -r "branch(stable) and tag()". This also works for incoming and outgoing, so you can use it for triggering a build system. 

  16. To test this workflow yourself, just create the test repository with hg init 12345; cd 12345; for i in {0..5}; do echo $i > $i; hg ci -Am $i; done

  17. The short graphlog for the grafted micro-releases was created via hg glog --template "{desc} ({branch})"

  18. The review branch is a special preparation-branch, because it can get discontinuous changes if maintainers decide to graft some changes which have ancestors they did not review yet. 

  19. We use one single review branch which gets reused at every review to ensure that there are no changes in stable which we did not have in the review. As an alternative, you could use one branch per review. In that case, ensure that you start the review-* branches from stable and not from default. Then merge and graft the changes from default which you want to review for inclusion in your next release. 

  20. If you want to adapt the model to multiple very distinct releases, simply add multiple release-branches (i.e. release-x). Then hg graft the changes you want to use from default or stable into the releases and merge the releases into stable to ensure that the relationship of their changes to current changes is clear, recorded and will be applied automatically by Mercurial in future merges21. If you use multiple tagged releases, you need to merge the releases into each other in order - starting from the oldest and finishing by merging the most recent one into stable - to record the same information as with release branches. Additionally it is considered impolite to other developers to keep multiple heads in one branch, because with multiple heads other developers do not know the canonical tip of the branch which they should use to make their changes - or in case of stable, which head they should merge to for preparing the next release. That’s why you are likely better off creating a branch per release, if you want to maintain many very different releases for a long time. If you only use tags on stable for releases, you need one merge per maintained release to create a bugfix version of one old release. By adding release branches, you reduce that overhead to one single merge to stable per affected release by stating clearly that changes to old versions should never affect new versions, except if those changes are explicitly merged into the new versions. If the bugfix affects all releases, release branches require two times as many actions as tagged releases, though: You need to graft the bugfix into every release and merge the release into stable.22 

  21. If for example you want to ignore that change to an old release for new releases, you simply merge the old release into stable and use hg revert --all -r stable before committing the merge. 

  22. A rule of thumb for deciding between tagged releases and release branches is: If you only have a few releases you maintain at the same time, use tagged releases. If you expect that most bugfixes will apply to all releases, starting with some old release, just use tagged releases. If bugfixes will only apply to one release and the current development, use tagged releases and merge hotfixes only to stable. If most bugfixes will only apply to one release and not to the current development, use release branches. 

Attachment  Size
hgbranchingoverview.png  28.75 KB
hgbranchinggraft.png  29.36 KB
hgbranchingreview.png  35.6 KB
2012-09-03-Mo-hg-branching-diagrams.org  12.43 KB
hgbranchingmaintain.png  45.08 KB
2012-09-03-Mo-hg-branching-diagrams.org  10.74 KB

A short introduction to Mercurial with TortoiseHG (GNU/Linux and Windows)

Note: This tutorial is for the old TortoiseHG (with gtk interface). The new one works a bit differently (and uses Qt). See the official quick start guide. The right-click menus should still work similarly to the ones described here, though.

Downloading the Repository

After installing TortoiseHG, you can download a repository to your computer by right-clicking in a folder and selecting the menu "TortoiseHG" and then "Clone" in there (currently you still need Windows for that - all other dialogs can be invoked in GNU/Linux on the commandline via "hgtk").

Right-Click menu, Windows:

Right-click-Menu

Create Clone, GNU/Linux:

Create Clone

In the dialog you just enter the url of the repository, for example:

http://www.bitbucket.org/ArneBab/md-esw-2009

(that's also the address of the repository in the internet - just try clicking the link.)

When you log in to bitbucket.org you will find a clone-address directly on the site. You can also use that clone address to upload changes (it contains your login-name, and I can give you "push" access on that site).
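
If you prefer the plain commandline on GNU/Linux, the same clone is a single command (using the address above):

hg clone http://www.bitbucket.org/ArneBab/md-esw-2009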

Workflow with TortoiseHG

This gives you two basic abilities:

  • Save and view changes locally, and
  • synchronize changes with others.

(I assume that part of what I say is redundant, but I'd rather write a bit too much than omit a crucial bit)

To save changes, you can simply select "HG Commit" in the right-click-menu. If some of your files aren't known to HG yet (the box before the file isn't ticked), you have to add them (tick the box) to be able to commit them.

Commit

To go back to earlier changes, you can use "Checkout Revision" in the "TortoiseHG" menu. In that dialog you can then select the revision you want to see and use the icon on the upper left to get all files to that revision.

Update

Update-Result

You can synchronize by right-clicking in the folder and selecting "Synchronize" in the "TortoiseHG" menu (inside the right-click menu). In the opening dialog you can "push" (upload changes - arrow up with the bar above it), "pull" (download changes to your computer - arrow down with bar below), and check what you would pull or push (arrows without bars). I think that using this dialog will soon become second nature for you, too :)
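
On the commandline the same actions map to plain Mercurial commands:

hg pull        # download changes to your computer
hg push        # upload your changes
hg incoming    # check what you would pull
hg outgoing    # check what you would push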

Synchronize

Pull

Have fun with TortoiseHG! :) - Arne

PS: There's also a longer intro to TortoiseHG and an overview to DVCS.

PPS: md-esw-2009 is a repository in which Baddok and I planned a dual-gm roleplaying session Mechanical Dream.

PPPS: There's also a german version of this article on my german pages.

Basic usecases for DVCS: Workflow Failures

If you came here searching for a way to set the username in Mercurial: just edit $HOME/.hgrc and add
    [ui]
    username = YOURNAME <EMAIL>
If that file does not exist, simply create it.

Update (2014-05-01): The Mercurial breakage is fixed in Mercurial 3.0: When you commit without username it now says “Abort: no username supplied (use "hg config --edit" to set your username)”. The editor shows a template with a commented-out field for the username. Just put your name and email after the pre-filled username = and save the file. The Git breakage still exists.

Update (2013-04-18): In #mercurial @ irc.freenode.net there were discussions yesterday for improving the help output if you do not have your username setup, yet.

1 Intro

I recently tried contributing to a new project again, and I was quite surprised which hurdles can be in your way when you have not set up your environment yet.

So I decided to put together a small test for the basic workflow: Cloning a project, doing and testing a change and pushing it back.

I did that for Git and Mercurial, because both break at different points.

I’ll express the basic usecase in Subversion:

  • svn checkout [project]
  • (hack, test, repeat)
  • (request commit rights)
  • svn commit -m "added X"

You can also replace the request for commit rights with creating a patch and sending it to a mailing list. But let’s take the easiest case of a new contributor who is directly welcomed into the project as trusted committer.

dvcs-basic-svn.png

A slightly more advanced workflow adds testing in a clean tree. In Subversion it looks almost like the simple commit:

dvcs-basic-svn-testing.png

2 Git

Let’s start with Linus’ DVCS. And since we’re using a DVCS, let’s also try it out in real life.

2.1 Setup the test

LC_ALL=C
LANG=C
PS1="$"
rm -rf /tmp/gitflow > /dev/null
mkdir -p /tmp/gitflow > /dev/null
cd /tmp/gitflow > /dev/null
# init the repo
git init orig  > /dev/null
cd orig > /dev/null
echo 1 > 1
# add a commit
git add 1 > /dev/null
git config user.name upstream > /dev/null
git config user.email up@stream > /dev/null
git commit -m 1 > /dev/null
# checkout another branch but master. YES, YOU SHOULD DO THAT on the shared repo. We’ll see later, why.
git checkout -b never-pull-this-temporary-useless-branch master 2> /dev/null
cd .. > /dev/null
echo # purely cosmetic and implementation detail: this adds a new line to the output
ls
orig
git --version

git version 1.8.1.5

2.2 Simplest case

2.2.1 Get the repo

First I get the repo

git clone orig mine
echo $ ls
ls
Cloning into 'mine'...
done.
$ ls
mine  orig

2.2.2 Hack a bit

cd mine
echo 2 > 1
git commit -m "hack"

On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)

modified:   1
no changes added to commit (use "git add" and/or "git commit -a")

ARGL… but let’s paste the commands into the shell. I do not use --global, since I do not want to shoot my test environment here.

git config user.name "contributor"
git config user.email "con@tribut.or"

and try again

git commit -m "hack"

On branch master
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)

modified:   1
no changes added to commit (use "git add" and/or "git commit -a")

ARGL… well, paste it in again…

git add 1
git commit -m "hack"

[master aba911a] hack
 1 file changed, 1 insertion(+), 1 deletion(-)

Finally I managed to commit my file. Now, let’s push it back.

2.2.3 Push it back

git push
warning: push.default is unset; its implicit value is changing in
Git 2.0 from 'matching' to 'simple'. To squelch this message
and maintain the current behavior after the default changes, use:

  git config --global push.default matching

To squelch this message and adopt the new behavior now, use:

  git config --global push.default simple

See 'git help config' and search for 'push.default' for further information.
(the 'simple' mode was introduced in Git 1.7.11. Use the similar mode
'current' instead of 'simple' if you sometimes use older versions of Git)

Counting objects: 5, done.
(1/3)   
Writing objects:  66% (2/3)   
Writing objects: 100% (3/3)   
Writing objects: 100% (3/3), 222 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
To /tmp/gitflow/orig
master

HA! It’s in.

2.2.4 Overview

In short the required commands look like this:

  • git clone orig mine
  • cd mine; (hack)
  • git config user.name "contributor"
  • git config user.email "con@tribut.or"
  • git add 1
  • git commit -m "hack"
  • (request permission to push)
  • git push

dvcs-basic-git.png

compare Subversion:

./dvcs-basic-svn.png

Now let’s see what that initial setup with setting a non-master branch was about…

2.3 With testing

2.3.1 Test something

I want to test a change and ensure, that it works with a fresh clone. So I just clone my local repo and commit there.

cd ..
git clone mine test
cd test
# setup the user locally again. Normally you do not need that again, since you’d use --global.
git config user.name "contributor"
git config user.email "con@tribut.or"
# hack and commit
echo test > 1
git add 1
echo # cosmetic
git commit -m "change to test" >/dev/null
# (run the tests)

2.3.2 Push it back

git push
warning: push.default is unset; its implicit value is changing in
Git 2.0 from 'matching' to 'simple'. To squelch this message
and maintain the current behavior after the default changes, use:

  git config --global push.default matching

To squelch this message and adopt the new behavior now, use:

  git config --global push.default simple

See 'git help config' and search for 'push.default' for further information.
(the 'simple' mode was introduced in Git 1.7.11. Use the similar mode
'current' instead of 'simple' if you sometimes use older versions of Git)

Counting objects: 5, done.
(1/3)   
Writing objects:  66% (2/3)   
Writing objects: 100% (3/3)   
Writing objects: 100% (3/3), 234 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
remote: error: refusing to update checked out branch: refs/heads/master        
remote: error: By default, updating the current branch in a non-bare repository        
remote: error: is denied, because it will make the index and work tree inconsistent        
remote: error: with what you pushed, and will require 'git reset --hard' to match        
remote: error: the work tree to HEAD.        
remote: error:         
remote: error: You can set 'receive.denyCurrentBranch' configuration variable to        
remote: error: 'ignore' or 'warn' in the remote repository to allow pushing into        
remote: error: its current branch; however, this is not recommended unless you        
remote: error: arranged to update its work tree to match what you pushed in some        
remote: error: other way.        
remote: error:         
remote: error: To squelch this message and still keep the default behaviour, set        
remote: error: 'receive.denyCurrentBranch' configuration variable to 'refuse'.        
To /tmp/gitflow/mine
master (branch is currently checked out)
error: failed to push some refs to '/tmp/gitflow/mine'

Uh… what? If I were a real first time user, at this point I would just send a patch…

The simple local test clone does not work: You actually have to also checkout a different branch if you want to be able to push back (needless duplication of information - and effort). And it actually breaks this simple workflow.

(experienced git users will now tell me that you should always checkout a work branch. But that would mean that I would have to add the additional branching step to the simplest case without testing repo, too, raising the bar for contribution even higher)

git checkout -b testing master
git push ../mine testing
Switched to a new branch 'testing'
Counting objects: 5, done.
(1/3)   
Writing objects:  66% (2/3)
Writing objects: 100% (3/3)
Writing objects: 100% (3/3), 234 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
To ../mine
   testing

Since I only pushed to mine, I now have to go there, merge and push.

cd ../mine
git merge testing
git push
Updating aba911a..820dea8
Fast-forward
 1 | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
warning: push.default is unset; its implicit value is changing in
Git 2.0 from 'matching' to 'simple'. To squelch this message
and maintain the current behavior after the default changes, use:

  git config --global push.default matching

To squelch this message and adopt the new behavior now, use:

  git config --global push.default simple

See 'git help config' and search for 'push.default' for further information.
(the 'simple' mode was introduced in Git 1.7.11. Use the similar mode
'current' instead of 'simple' if you sometimes use older versions of Git)

Counting objects: 5, done.
(1/3)   
Writing objects:  66% (2/3)   
Writing objects: 100% (3/3)   
Writing objects: 100% (3/3), 234 bytes, done.
Total 3 (delta 0), reused 0 (delta 0)
To /tmp/gitflow/orig
master

2.3.3 Overview

In short the required commands for testing look like this:

  • git clone mine test
  • cd test; (hack)
  • git add 1
  • git checkout -b testing master
  • git commit -m "hack"
  • git push ../mine testing
  • cd ../mine
  • git merge testing
  • git push

./dvcs-basic-git-testing.png

Compare to Subversion

./dvcs-basic-svn-testing.png

2.4 Wrapup

The git workflows broke at several places:

Simplest:

  • Set the username (minor: it’s just pasting shell commands)
  • Add every change (==staging. Minor: paste shell commands again - or use `commit -a`)

Testing clone (only additional breakages):

  • Cannot push to the local clone (major: it spews about 20 lines of error messages which do not tell me how to actually get my changes into the local clone)
  • Have to use a temporary branch in a local clone to be able to push back (annoyance: makes using clean local clones really annoying).

3 Mercurial

Now let’s try the same.

3.1 Setup the test

LC_ALL=C
LANG=C
PS1="$"
rm -rf /tmp/hgflow > /dev/null
mkdir -p /tmp/hgflow > /dev/null
cd /tmp/hgflow > /dev/null
# init the repo
hg init orig  > /dev/null
cd orig > /dev/null
echo 1 > 1
# add a commit
hg add 1 > /dev/null
hg commit -u upstream -m 1 > /dev/null
cd .. >/dev/null
echo # purely cosmetic and implementation detail: this adds a new line to the output
ls
orig
hg --version

Mercurial Distributed SCM (version 2.5.2)
(see http://mercurial.selenic.com for more information)

Copyright (C) 2005-2012 Matt Mackall and others
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

3.2 Simplest case

3.2.1 Get the repo

hg clone orig mine
echo $ ls
ls
updating to branch default
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
$ ls
mine  orig

3.2.2 Hack a bit

cd mine
echo 2 > 1
echo
# I disable the username to show the problem
hg --config ui.username= commit -m "hack" 

abort: no username supplied (see "hg help config")

ARGL, what??? Mind the update at the top of this article: This is fixed in Mercurial 3.0

Well, let’s do what it says (but only see the first 30 lines to avoid blowing up this example):

hg help config | head -n 30 | grep -B 3 -A 1 per-repository
These files do not exist by default and you will have to create the
appropriate configuration files yourself: global configuration like the
username setting is typically put into "%USERPROFILE%\mercurial.ini" or
"$HOME/.hgrc" and local configuration is put into the per-repository
"<repo>/.hg/hgrc" file.

Are you serious??? I have to actually read a guide just to commit my change??? As normal user this would tip my frustration with the tool over the edge and likely get me to just send a patch… Mind the update at the top of this article: This is fixed in Mercurial 3.0

But I am no normal user, since I want to write this guide. So I assume a really patient user, who does the following (after reading for 3 minutes):

echo '[ui]
username = "contributor"' >> .hg/hgrc

and tries again:

hg commit -m "hack"

Now it worked. But this is MAJOR BREAKAGE. Mind the update at the top of this article: This is fixed in Mercurial 3.0

3.2.3 Push it back

hg push
pushing to /tmp/hgflow/orig
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files

Done. This was easy, and I did not get yelled at (different from the experience with git :) ).

3.2.4 Overview

In short the required commands look like this:

  • hg clone orig mine
  • cd mine; (hack)
  • hg help config ; (read) ; echo '[ui]
    username = "contributor"' >> .hg/hgrc (are you serious?)

  • hg commit -m "hack"
  • (request permission to push)
  • hg push

dvcs-basic-hg.png

Compare to Subversion

./dvcs-basic-svn.png

and to git

./dvcs-basic-git.png

3.3 With testing

3.3.1 Test something

cd ..
hg clone mine test
cd test
# setup the user locally again. Normally you do not need that again, since you’d use --global.
echo '[ui]
username = "contributor"' >> .hg/hgrc
# hack and commit
echo test > 1
echo # cosmetic
hg commit -m "change to test"
# (run the tests)

updating to branch default
1 files updated, 0 files merged, 0 files removed, 0 files unresolved

3.3.2 Push it back

hg push
pushing to /tmp/hgflow/mine
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files

It’s in mine now, but I still need to push it from there.

cd ../mine
hg push

pushing to /tmp/hgflow/orig
searching for changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files

Done.

If I had worked on mine in the meantime, I would have to merge there, too - just as with git with the exception that I would not have to give a branch name. But since we’re in the simplest case, we don’t need to do that.

3.3.3 Overview

In short the required commands for testing look like this:

  • hg clone mine test
  • cd test; (hack)
  • hg commit -m "hack"
  • hg push ../mine
  • cd ../mine
  • hg push

dvcs-basic-hg-testing.png

Compare to Subversion

./dvcs-basic-svn-testing.png

and to git

./dvcs-basic-git-testing.png

3.4 Wrapup

The Mercurial workflow broke only ONCE, but there it broke HARD: To commit you actually have to READ THE HELP PAGE on config to find out how to set your username.

So, to wrap it up: ARE YOU SERIOUS? Mind the update at the top of this article: This is fixed in Mercurial 3.0

That’s a really nice workflow, disturbed by a devastating user experience for just one of the commands.

This is a place where hg should learn from git: The initial setup must be possible from the commandline, without reading a help page and without changing to an editor and then back into the commandline.

4 Summary

  • Git broke at several places, and in one place it broke hard: Pushing between local clones is a huge hassle, even though that should be a strong point of DVCSs.
  • Mercurial broke only once, but there it broke hard: Setting the username actually requires reading help output and hand-editing a text file.

Also the workflows for a user who gets permission to push always required some additional steps compared to Subversion.

One of the additional steps cannot be avoided without losing offline-commits (which are a major strength of DVCS), because those make it necessary to split svn commit into commit and push: That separates storing changes from sharing them.

But git actually requires additional steps which are only necessary due to implementation details of its storage layer: pushing to a repo with the same branch checked out is not allowed, so you have to create an additional branch in your local clone and merge it in the other repo, even if all your changes are siblings of the changes in the other repository. On top of that it requires either a flag on every commit command or explicitly adding the changes. That amounts not to the one unavoidable additional command, but to three further commands, so the number of commands to get code, hack on it and share it increases from 5 to 9. And if you work in a team where people trust you to write good code, those extra steps do not actually reduce the effort required to share your changes.

On the other hand, both Mercurial and Git allow you to work offline, and you can do as many testing steps in between as you like, without needing to get the changes from the server every time (because you can simply clone a local repo for that).

4.1 Visually

4.1.1 Subversion

./dvcs-basic-svn-testing.png

4.1.2 Mercurial

./dvcs-basic-hg-testing.png

4.1.3 Git

./dvcs-basic-git-testing.png

Date: 2013-04-17T20:39+0200

Author: Arne Babenhauserheide


Creating nice logs with revsets in Mercurial

On the Mercurial mailing list, Stanimir Stamenkov asked how to get rid of intermediate merges in the log, to simplify reading the history (and to stop worrying about missing some of the details).

Update: Since Mercurial 2.4 you can simply use
hg log -Gr "branchpoint()"

I did some tests for that and I think the nicest representation I found is this:

hg log -Gr "(all() - merge()) or head()"

This article shows examples for this. To find more revset options, run hg help revsets.
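
If you use this regularly, you can turn it into an alias in your ~/.hgrc (a small sketch; the name nicelog is arbitrary):

[alias]
nicelog = log -Gr "(all() - merge()) or head()"

After that, hg nicelog shows the cleaned-up graph.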

The result

It shows that in the end the revisions converged again - and it preserves the actual states of the development.

$ hg log -Gr "(all() - merge()) or head()"

@    changeset:   7:52fe4a8ec3cc
|\   tag:         tip
| |  parent:      6:7d3026216270
| |  parent:      5:848c390645ac
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Tue Aug 14 15:09:54 2012 +0200
| |  summary:     merge
| |
| \
| |\
| | o  changeset:   3:55ba56aa8299
| | |  parent:      0:385d95ab1fea
| | |  user:        Arne Babenhauserheide <bab@draketo.de>
| | |  date:        Tue Aug 14 15:09:40 2012 +0200
| | |  summary:     4
| | |
| o |  changeset:   2:b500d0a90d40
| |/   parent:      0:385d95ab1fea
| |    user:        Arne Babenhauserheide <bab@draketo.de>
| |    date:        Tue Aug 14 15:09:39 2012 +0200
| |    summary:     3
| |
o |  changeset:   1:8cc66166edc9
|/   user:        Arne Babenhauserheide <bab@draketo.de>
|    date:        Tue Aug 14 15:09:38 2012 +0200
|    summary:     2
|
o  changeset:   0:385d95ab1fea
   user:        Arne Babenhauserheide <bab@draketo.de>
   date:        Tue Aug 14 15:09:38 2012 +0200
   summary:     1

Even shorter, but not quite correct

The shortest representation is without the heads, though. It does not represent the current state of development if the last commit was a merge or if some branches were not merged. Otherwise it is equivalent.

$ hg log -Gr "(all() - merge())"

o  changeset:   3:55ba56aa8299
|  parent:      0:385d95ab1fea
|  user:        Arne Babenhauserheide <bab@draketo.de>
|  date:        Tue Aug 14 15:09:40 2012 +0200
|  summary:     4
|
| o  changeset:   2:b500d0a90d40
|/   parent:      0:385d95ab1fea
|    user:        Arne Babenhauserheide <bab@draketo.de>
|    date:        Tue Aug 14 15:09:39 2012 +0200
|    summary:     3
|
| o  changeset:   1:8cc66166edc9
|/   user:        Arne Babenhauserheide <bab@draketo.de>
|    date:        Tue Aug 14 15:09:38 2012 +0200
|    summary:     2
|
o  changeset:   0:385d95ab1fea
   user:        Arne Babenhauserheide <bab@draketo.de>
   date:        Tue Aug 14 15:09:38 2012 +0200
   summary:     1

The basic log, for reference

The vanilla-log looks like this:

$ hg log -G

@    changeset:   7:52fe4a8ec3cc
|\   tag:         tip
| |  parent:      6:7d3026216270
| |  parent:      5:848c390645ac
| |  user:        Arne Babenhauserheide <bab@draketo.de>
| |  date:        Tue Aug 14 15:09:54 2012 +0200
| |  summary:     merge
| |
| o    changeset:   6:7d3026216270
| |\   parent:      2:b500d0a90d40
| | |  parent:      4:8dbc55213c9f
| | |  user:        Arne Babenhauserheide <bab@draketo.de>
| | |  date:        Tue Aug 14 15:09:45 2012 +0200
| | |  summary:     merged 4
| | |
o | |  changeset:   5:848c390645ac
|\| |  parent:      3:55ba56aa8299
| | |  parent:      2:b500d0a90d40
| | |  user:        Arne Babenhauserheide <bab@draketo.de>
| | |  date:        Tue Aug 14 15:09:43 2012 +0200
| | |  summary:     merged 2
| | |
+---o  changeset:   4:8dbc55213c9f
| | |  parent:      3:55ba56aa8299
| | |  parent:      1:8cc66166edc9
| | |  user:        Arne Babenhauserheide <bab@draketo.de>
| | |  date:        Tue Aug 14 15:09:41 2012 +0200
| | |  summary:     merged 1
| | |
o | |  changeset:   3:55ba56aa8299
| | |  parent:      0:385d95ab1fea
| | |  user:        Arne Babenhauserheide <bab@draketo.de>
| | |  date:        Tue Aug 14 15:09:40 2012 +0200
| | |  summary:     4
| | |
| o |  changeset:   2:b500d0a90d40
|/ /   parent:      0:385d95ab1fea
| |    user:        Arne Babenhauserheide <bab@draketo.de>
| |    date:        Tue Aug 14 15:09:39 2012 +0200
| |    summary:     3
| |
| o  changeset:   1:8cc66166edc9
|/   user:        Arne Babenhauserheide <bab@draketo.de>
|    date:        Tue Aug 14 15:09:38 2012 +0200
|    summary:     2
|
o  changeset:   0:385d95ab1fea
   user:        Arne Babenhauserheide <bab@draketo.de>
   date:        Tue Aug 14 15:09:38 2012 +0200
   summary:     1

Creating the test repo

To create the test repo, I just used a few short loops in the shell:

hg init test ; cd test
# four commits, jumping back to an older revision before each new one, so they form anonymous branches
for i in 1 2 3 4; do echo $i > $i ; hg ci -Am "$i"; hg up -r -$i; done
# merge several of the heads pairwise (the echo/commit pairs change nothing here, only the merges matter)
for i in 1 2 3 4; do echo $i > $i ; hg ci -Am "$i"; hg up -r -$i; hg merge $i ; hg ci -m "merged $i"; done
# finally merge the remaining heads
for i in $(hg heads --template "{node} ") ; do hg merge $i ; hg ci -m "merge"; done

Better representations?

Do you have better representations for viewing convoluted history?

PS: Yes, you can rewrite history, but that’s a really bad idea if you have many people who closely interact and publish early and often.

Factual Errors in “Git vs Mercurial: Why Git?” from Atlassian

2 years ago, Atlassian developer Charles O’Farrell published the article Git vs. Mercurial: Why Git? in which he “showed the winning side of Git” as he sees it. This article was part of the Dev Tools series at Atlassian and written as a reply to the article Why Mercurial?. It was spiced with so much misinformation that the comments exploded right away. But the article was never corrected. Just now I was referred to the text again, and I decided to do what I should have done 2 years ago: Write an answer which debunks the myths.

“I also think that git isn’t the most beginner-friendly program. That’s why I’m only using its elementary features” — “I hear that from many git-users …” — part of the discussion which got me to write this article

Safer history and rewriting history with Git

Charles starts off by contradicting himself: He claims that git is safer because it “actually never lets you change anything” - and goes on to explain that all unreferenced data can be garbage collected after 30 days. Since nowadays the git garbage collector runs automatically, all unreferenced changes are lost after approximately 30 days.

This obviously means that git does allow you to change something. That this change only becomes irreversible after 30 days is an implementation detail which you have to keep in mind if you want to be safe.1

He then goes on to say how this allows for easy history rewriting with the interactive rebase, and correctly adds that the histedit extension of Mercurial allows you to do the same. (He also mentions the Mercurial Queues extension (mq), just to admit that it is not the equivalent of git rebase -i but instead provides a staging area for future commits.)

Then he starts the FUD2: Since histedit stores its backup in an external file, he asks rhetorically what new commands he would have to learn to restore it.

Dear reader, what new command might be required to pull data out of a backup? Something like git ref? Something like git reflog to find it and then something else?

It turns out this is as easy and consistent as most things in Mercurial: backup bundles can be treated just like repositories. To restore the changes, simply use

hg pull backup.bundle
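
If you do not know where histedit put the backup, look below .hg/strip-backup/ - that is where the backup bundles typically end up (a sketch; the exact file name depends on the changeset hash and the Mercurial version):

ls .hg/strip-backup/
hg pull .hg/strip-backup/<hash>-backup.hg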

So, all FUD removed, his take on safer history and rewriting history is reduced to “in hg it’s different, and potentially confusing features are shipped as extensions. Recovering changes from backups is consistent with your day-to-day usage of hg”.

(Note that the flexibility of hg also enables extensions like mutable hg, which avoids all the potential race conditions of git rebase - even for code you share between repositories, which is a total no-go in git - with a safety net that warns you if you try to change published history, thanks to the core feature phases.)

Branching

On branching, Charles goes deep into misinformation: He wrote his article in 2012, when Mercurial had already provided named branches as well as anonymous branching for 6 years, and one year after bookmarks became a core feature in hg 1.8. Still he kept talking about how Mercurial advised keeping one clone per branch, referring to a blog post which incorrectly assumed that the hg developers were using that workflow (obviously he did not bother to check that claim). He also went on clamoring that bookmarks initially could not be pushed between repositories, and that they were only added “due to popular demand”. The reality is that at some point a developer simply said “I’ll write that”, and within a few months he implemented the equivalent of git branches. Before that, no hg developer saw enough need for them to exert that effort, and today most still simply use named branches.
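
For readers who do not know both concepts, a minimal sketch (the names are just examples):

hg branch feature-x     # named branch: the name is recorded in every commit made on it
hg commit -m "start feature-x"
hg bookmark quick-fix   # bookmark: a movable pointer to a changeset, the rough equivalent of a git branch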

But obviously Charles could not imagine named branches working, so he kept talking about how bookmarks do not have namespaces while git branches have them, and how this would create confusion. He showed the following example for git and Mercurial (shortened here):

* 9e4b1b8 (origin/master, origin/test) Remove unused variable
| * 565ad9c (HEAD, master) Added Hello example
|/
* 46f0ac9 Initial commit

and

o  changeset:   2:67deb4acba33
|  bookmark:    master@default
|  summary:     Third commit
|
| @  changeset:   1:2d479c025719
|/   bookmark:    master
|    summary:     Second commit
|
o  changeset:   0:e0e024ff06ad
   summary:     First commit

Then he asked: “would the real master branch please stand up?”

Let’s try to answer that:

Git: there is a commit marked as (origin/master, origin/test), and one marked as (HEAD, master). If you know that origin is the canonical remote repository in git, then you can guess that the names prefixed with origin/ come from the remote repository.

Mercurial: There is a commit with the bookmark master@default and one with the bookmark master. If you know that default is the canonical remote repository in Mercurial, then you can guess that the bookmark suffixed with @default comes from the remote repository.

But Charles concludes his example with the sentence: “Because there is no notion of namespaces, we have no way of knowing which bookmarks are local and which ones are remote, and depending on what we call them, we might start running into conflicts.”

And this is not only FUD, it is factually wrong and disproven in his own example. After this, I cannot understand how anyone could take his text seriously.

But he goes on.

Staging

His final misinformation is about the git index - a staging area for uncommitted changes. He correctly identifies the index as “one of the things that people either love or hate about Git”. As Mercurial cares a lot about giving newcomers a safe environment to work in, it ships this controversial feature as an extension and not as a core command.

Charles now claims that the equivalent of the git index is the record extension - and then complains that it does not imitate the index exactly, because it does not provide a staging area but rather allows committing partial changes. Instead of then turning to the Mercurial Queues extension, which he mentioned earlier as a staging area for commits, he asserts that record cannot provide the same feature as git.

Not very surprisingly, when you have an extension to provide partial commits (record) and one to provide a staging area (mq), if you want both, you simply activate both extensions. When you do that, Mercurial offers the qrecord command which stores partial changes in the current staging area.
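
For reference, a minimal sketch of that combination (the hgrc section names are standard; the patch name my-change is just an example):

[extensions]
record =
mq =

hg qrecord my-change   # interactively pick hunks into a new mq patch - the staging area
hg qfinish -a          # once you are satisfied, turn the applied patches into regular commits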

Not mentioning this is simply a matter of not having done proper research for his article - and not updating the post means that he intentionally continues to spread misinformation.

Blame

The only thing he got right is that git blame is able to reconstruct copies of code from one file to another.

Mercurial provides this for renamed files, but not for directly copy-pasted lines. Analysis of the commits would naturally allow doing the same, and all the information for that is available, but this is not implemented yet. If people ask for it loudly enough, it will only be a matter of time, though. As bookmarks showed, the Mercurial code base is clean enough that it suffices to have a single developer who steps up and creates an extension for this. If enough people use it, the extension can become a core feature later on.

Conclusion

“There is a reason why hg users tend to talk less about hg: There is no need to talk about it that much.” — Arne Babenhauserheide as answer to Why Mercurial?

Charles concludes with “Git means never having to say, you should have”, and “Mercurial feels like Git lite”. Since he obviously did not do his research on Mercurial while he took the time to acquire in-depth knowledge of git, it’s quite understandable that he thinks this. But it is no basis for writing an article - especially not for Atlassian, the most prominent Mercurial hosting provider since its acquisition of Bitbucket, which grew big as a pure Mercurial hoster and added git only after being acquired by Atlassian.

He then manages to finish his article with one more unfounded smoke bomb: “The repository format drives what is possible with our DVCS tools, now and in the future.”

While this statement actually is true, in the context of git-vs-mercurial it is a horrible misfit: The hg-git extension has shown since 2009, 3 years before Charles wrote his article, that it is possible to convert transparently from git to Mercurial and back. So the repository format of Mercurial has all the capabilities of the repository format of git - and since git cannot natively store named branches, represent branches with multiple heads, or push changes into a checked-out branch, the capabilities of the repository format of Mercurial are actually a superset of the capabilities of the storage format of git.
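
To illustrate the transparent conversion, a small sketch of using hg-git (this assumes the hg-git extension is installed; the URL is just an example):

[extensions]
hggit =

hg clone git://example.org/project.git project
cd project
hg log -G     # browse the git history with Mercurial tools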

But what he also states is that “there are more important things than having a cuddly command line”. And this is the final misleading statement to debunk: While the command line does not determine what is theoretically possible with the tool, it does determine what regular users can do with it. The horrible command line of git likely contributes to the many git users who never use anything but commit -a, push and pull - and to the proliferation of git gurus whom normal users call when git has shot them in the foot again.

It’s sad when someone uses his writing skills to wrap FUD and misinformation into pretty packaging to get people to take his side. Even sadder is that this often works for quite some time and that few people read the comments section.3

And now that I finished debunking the article, there is one final thing I want to share. It is a quote from the discussion which prompted me to write this piece:

<…> btw. I also think that git isn’t the most beginner-friendly program.
<…> That’s why I’m only using its elementary features
<ArneBab> I hear that from many git-users…
<…> oh, maybe I should have another look at hg after all

The above is a translation; the original quote in German is:

<…> ich finde btw auch dass git nicht gerade das anfängerfreundlichste programm ist
<…> darum nutze ich das auch nur recht rudimentär
<ArneBab> das höre ich von vielen git-Nutzern…
<…> oha. nagut, dann sollte ich mir hg vielleicht doch nochmal ansehen

Note: hg is short for Mercurial. It is the name of the Mercurial command on the command line.

Footnotes:

1

Garbage collection after 30 days means that you have to remember additional information while you work. And that is a problem: You waste resources which would be better spent on the code you write. A DVCS should be about having to remember less, because your DVCS keeps the state for you.