diff --git a/README.md b/README.md
index 6ffa519..729692e 100644
--- a/README.md
+++ b/README.md
@@ -81,32 +81,40 @@ To reproduce our benchmarks, you can go through the notebooks provided in the `/
 You can add new test cases by adding new notebooks to the `/notebooks/human-eval-bia` directory. Check out the examples there and make sure to stick to the following rules.
 
-![CAUTION]
+> [!CAUTION]
 > Most importantly: When writing new test case notebooks, do not use language models for code generation.
 > You would otherwise bias the benchmark towards this model.
-> Use human-writen code only and/or examples from the documentation of specific librarires.
+> Use human-written code only and/or examples from the documentation of specific libraries.
 
 The notebooks have to have the following format:
-* Within one cell there must be a function that solves a specific [bio-image analysis] task. Very basic example, computing the sum of two numbers:
+* Within one cell there must be a function that solves a specific [bio-image analysis] task. An example would be to compute the number and sum of all pixels in an image:
 
 ```python
-def sum(a, b):
+def compute_image_sum(image):
     """
-    This function computes the sum of two numbers.
+    Takes an image as a numpy array as an input and returns the number and sum of all pixels as outputs.
     """
-    return a + b
+    import numpy as np
+    flattened_image = image.flatten()
+    num_pixels = flattened_image.size
+    sum_pixels = np.sum(flattened_image)
+    return num_pixels, sum_pixels
 ```
-* This function must have a meaningful docstring between """ and """. It must be so meaningful that a language model could possibly write the entire function.
-* There must be another code cell that starts with `def check(candiate):` and contains test code to test the generated code.
-* The text code must use `assert` statements and call the `candidate` function. E.g. if a given function to test is `sum`, then a valid test for `sum` would be:
+* The function must have a meaningful docstring between """ and """, which will serve as the prompt together with the function signature. Ideally, write a short natural sentence one could hear between two humans. It must be specific enough, though, that a language model (or a human) has all the necessary information to write the entire function. For example, it is **not** specific enough to just write "Takes an image as an input...", because then the model cannot know whether this is a path to an image, a numpy array, or something else. Also make sure you describe the return values in enough detail; and if there is more than one return value, be aware that the order of those values matters: "...returns the number and sum of all pixels..." is different from "...returns the sum and number of all pixels...". [Check out the list of pre-existing prompts](https://github.com/haesleinhuepf/human-eval-bia/blob/main/test_cases/readme.md) to get some inspiration.
+* There must be another code cell that starts with `def check(candidate):` and contains test code to test the generated code.
+* The test code must use `assert` statements and call the `candidate` function. E.g. if the function to test is `compute_image_sum`, then a valid test for `compute_image_sum` would be:
 ```
 def check(candidate):
-    assert candidate(3, 4) == 7
+    import numpy as np
+    image = np.array([[1, 2], [3, 4]])
+    num_pixels, sum_pixels = candidate(image)
+    assert num_pixels == 4
+    assert sum_pixels == 10
 ```
 * A third python code cell in the notebook must call the `check` function with your custom function, e.g. like this, to prove that the code you provided works with the tests you wrote:
 ```
-check(sum)
+check(compute_image_sum)
 ```
-* Save the new test-case in a notebook that has the same name as the test, so that people can find it easily. In our case above: `sum.ipynb`.
+* Save the new test case in a notebook that has the same name as the test function, so that others can find it easily. In our case above: `compute_image_sum.ipynb`.
 * Optional: You can add as many markdown cells as you like to explain the test case.
 
 ## Adding dependencies
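
For reference, here is a minimal sketch that assembles the three required cells from the example above into one runnable script (assuming only that numpy is installed); executing it is a quick way to confirm that the reference implementation and its `check` agree before you save the notebook:

```python
import numpy as np


def compute_image_sum(image):
    """
    Takes an image as a numpy array as an input and returns the number
    and sum of all pixels as outputs.
    """
    flattened_image = image.flatten()
    num_pixels = flattened_image.size
    sum_pixels = np.sum(flattened_image)
    return num_pixels, sum_pixels


def check(candidate):
    # 2x2 test image: 4 pixels summing to 10
    image = np.array([[1, 2], [3, 4]])
    num_pixels, sum_pixels = candidate(image)
    assert num_pixels == 4
    assert sum_pixels == 10


# Third cell: prove that the reference implementation passes its own tests.
check(compute_image_sum)
```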