Unexpected double dilution

Hi there!

I’m running a turbidostat experiment with an OD range between 0.3 and 0.4. Lately, some of the vials have been diluting below 0.25 at most of the dilution points, as you can see in the example image.

I investigated the first dilution shown in the image by looking at the log files of the experiment:

> vial12_OD.txt

1213.7257,0.4041455191221323 <- Median OD higher than threshold, triggers dilution
1213.7434,0.4118299634803643 <- OD is not going down
1213.779,0.412122912499114 <- Second dilution triggered (see pump log)
1213.7912,0.3112550659559478 <- OD finally goes down
1213.8034,0.3151366065371684 <- Median OD near lower threshold
1213.8448,0.23536777571092626 <- Unexpected drop in OD

> vial12_pump_log.txt

1213.7257,10.98 <- First dilution, matches OD file
1213.779,11.24 <- Second dilution 3 minutes later, not reflected in OD

> vial12_ODSet.txt

1213.7257,0.3 <- Time of first dilution (correct)
1213.8034,0.4 <- Median OD near lower threshold, file updated with upper threshold

All this makes me think that the pumps are diluting way after the command is sent, so there may be some kind of miscommunication between the client and the server. Any suggestions on how to solve this?


If a command goes into the queue on the server it is immediately executed - there should not be a delay. If the command fails there is an event that is triggered to message out to connected clients, but the DPU currently is not acting on this. A quick thing you could do is add a listener for this event and log it, that will help narrow down what is going on. There is also this event that reports a command was received if you think the command is being lost between DPU->RPi.
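To illustrate the on_&lt;event&gt; listener convention described above, here is a minimal, self-contained sketch. The class name, the dispatch helper, and the `commandbroadcast` event name are hypothetical stand-ins; the real DPU wires handlers through a socketio client namespace, which routes events to methods named `on_<event>` in much the same way:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("evolver-dpu")

class EvolverListener:
    """Toy stand-in for a socketio namespace class (names hypothetical)."""

    def on_commandbroadcast(self, data):
        # Would fire when the server reports it received a command;
        # logging here helps narrow down whether commands are lost.
        logger.info("command received: %s", data)
        return data

    def dispatch(self, event, data):
        # socketio-style clients route events to on_<event> methods
        handler = getattr(self, "on_" + event, None)
        if handler is not None:
            return handler(data)
```

With a listener like this attached, every acknowledged command leaves a timestamped log line that can be compared against the pump log.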

You can also compare the logs on the server to the experiment logs here. If this problem persists, you could run sudo supervisorctl on the RPi, then run tail -f evolver to watch in real time as commands come in and are executed.


Hi Zack,

Thanks for your fast answer.

I would say the commands are not failing, because we end up with two dilutions being triggered (and logged). Also, I am not sure how to listen to the event you are suggesting. For now, I added something like this in eVOLVER.py:

def on_commandbroadcast(self, data):
    logger.info('Command received')

However, I hear the pump working before I see the log (sometimes it's just one timepoint, sometimes more), so I don’t think I can correctly debug the problem with the way I implemented this.

Another option could be that the culture is not mixing well enough after the dilution so the OD actually does not go down, and eVOLVER triggers yet another dilution. However, I find this hard to believe, since we end up having a much lower OD than expected.

For now, I set pump_wait = 5 instead of 3 to see if waiting a bit more for the cultures to properly mix solves the problem.

Any other ideas?

Looking at it again, it appears that the first dilutions did not get the OD below the set threshold. You can try changing this line of code so that it is more lenient in determining when the threshold has been reached. The timings here are still pretty weird, though.

The problem could also be that the code takes the median of the last seven measurements. Maybe change this to the average instead.

As for the listeners, what you did was correct. You just need to add the on_<event> function and then log things.

After thinking about it more, the problem probably is with the median not capturing the new OD before the pump_wait times out - your solution of increasing pump_wait will probably work. We use median to be insensitive to outlier data points, but it does mean that instantaneous dilutions have a delay before they appear in the data. You could also decrease the number of points averaged over to 4 or 5 instead of 7, which should save a minute or so.
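A toy illustration of the median delay described above (the OD values here are made up for the example, not taken from the experiment): right after a dilution, only the newest readings in the window reflect the drop, so the median still reports the pre-dilution OD while the mean already shows movement.

```python
import statistics

# Last 7 OD readings; the dilution happened two measurements ago,
# so only the final two points reflect the lower OD.
window = [0.41, 0.41, 0.41, 0.41, 0.41, 0.31, 0.31]

print(statistics.median(window))            # 0.41 - dilution still invisible
print(round(statistics.mean(window), 3))    # 0.381 - drop already showing
```

Only once four of the seven points are post-dilution does the median move, which is why a shorter window (or a longer pump_wait) gives the controller time to "see" the dilution before deciding to dilute again.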


Yeah I think either reducing the median window or increasing pump wait would work.
Increasing pump wait will make the system more robust to outlier data points, but can be a problem if doing very large dilutions where diluting in multiple steps is preferable (to prevent overflows).
Reducing the OD median window will make it less robust to outliers (and bubbles are most frequent during a dilution) but should make the system respond much more quickly.

The size of the median window may have been set back in an earlier version of the server with a different delay between measurements and it might not hurt to default to 5, thoughts?

Thanks for your input, Chris! Yeah, modifying those parameters comes with a tradeoff.

However, I think we might have found what the problem is. It turns out that the client is dealing with large files (> 6 MB) and takes too long to process the data from the server (about 25 s, while the server broadcasts every 20 s). This causes events to pile up in a queue faster than the client can handle them. Therefore, when the client decides to trigger a dilution, the dilution is executed immediately, but the OD data reflecting it stalls in the queue. The client then still sees pre-dilution OD values, thinks it needs to dilute again, and triggers the second dilution.
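Some back-of-the-envelope arithmetic using the numbers from this thread (20 s broadcast interval, ~25 s processing time; the exact figures will vary per setup) shows how quickly the backlog grows:

```python
# Hypothetical numbers from the thread, not measured constants.
broadcast_interval = 20.0   # s, broadcast_timing in conf.yml
processing_time = 25.0      # s, measured inside on_broadcast

lag_per_broadcast = processing_time - broadcast_interval    # 5 s behind per event
broadcasts_per_hour = 3600 / broadcast_interval             # 180 broadcasts
lag_after_an_hour = broadcasts_per_hour * lag_per_broadcast # 900 s of lag
stale_events_queued = lag_after_an_hour / broadcast_interval
print(stale_events_queued)  # 45.0 broadcasts behind after one hour
```

So by the end of each hour the client is acting on OD data that is many broadcasts old, which is plenty of time for a second, unnecessary dilution to fire.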

Thanks to @heinsz and his suggestions, I’m working on a way to make the processing faster and potentially avoid this problem for people running long eVOLVER experiments.

Haha just now realizing the timestamps on those measurements, definitely going the distance! There’s always the nuclear option of starting a new “experiment” and appending new data to the old data files separately. But solving this problem would be better overall, so good luck!


Daniel and I talked for a while off of the forum and tried some things out. The culprit, I think, was a numpy function call that loaded every OD and ODset file on every iteration. @danigarcia and I are working on a simple function that only reads the last N lines of the data (similar to the UNIX tail command) instead of loading the whole file into memory on every iteration, which is computationally wasteful.


I’m writing this as a small summary of what happened and how we solved it:

My turbidostat experiment had been running for about 50 days and I started seeing an unexpected double dilution on all the vials. This happened for almost all the dilution points without an apparent pattern.

First, I looked at the OD, ODset, and pump_log files, and it seemed that when a dilution was triggered, it took a while for the OD to go down and reflect the dilution. The problem could not be the dilution happening late, since I would hear the pumps immediately after the DPU sent the dilution command. It seemed that the OD data was not real-time, or that it was somehow lagging.

It was troubleshooting time! I paused and resumed the experiment, stopped and restarted the experiment, stopped and restarted the server, and stopped and restarted the eVOLVER. Yet the problem persisted. The raw OD values were also lagging, so it was not a problem with the new OD calculation method I had implemented on my eVOLVER.

Then, I just tried to compare the messages sent by eVOLVER and the ones received by my DPU. I did:

  • DPU: Printed the data object received inside the on_broadcast function
  • Server: ran sudo supervisorctl and tail -f evolver as Zack suggested here

Indeed, the server was fine, but the DPU was receiving the messages late. Well, the first messages after restarting were only a bit late, but they soon started to pile up. Therefore, I thought of timing the logic inside on_broadcast:

import time

def on_broadcast(self, data):
    start = time.time()
    # Code
    # Custom functions
    print(time.time() - start)  # total processing time for this broadcast

This gave me timings around 25s. Given the default broadcast_timing in conf.yml (20s), messages were piling up in a queue. After digging a bit more, I noticed that using the function np.genfromtxt() to read all the OD data was taking most of the time.

Why were some dilutions triggered normally? Because, every hour, the DPU restarts the connection with the server, precisely to avoid message buildup:

A quick workaround was to change the 3600 to 300 here, so the connection would reset every 5 minutes. However, the last 20-25% of the broadcasts would still be lost.

Anyway, the best way to solve it was to implement a function that reads the OD file from the end, getting just the recent OD points needed for the average. And this is what was implemented in the new version.

As a comparison with the previous implementation, I took one of my example files (6.5 MB) and read it 16 times using both the previous and the current implementation, printing the times:

Time of tail_to_np():  # Reads 6 last lines

Time of np.genfromtxt():  # Reads whole file

I think that’s it. Thanks to @heinsz for all the help, suggestions and for reviewing my code.