<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Dr. Lukas Pfannschmidt | Lukas Pfannschmidt</title><link>https://lpfann.me/author/dr.-lukas-pfannschmidt/</link><atom:link href="https://lpfann.me/author/dr.-lukas-pfannschmidt/index.xml" rel="self" type="application/rss+xml"/><description>Dr. Lukas Pfannschmidt</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><copyright>Lukas Pfannschmidt © 2026</copyright><image><url>https://lpfann.me/author/dr.-lukas-pfannschmidt/avatar_hu76ce8048580910718dc4c38eabc0134f_89317_270x270_fill_q70_lanczos_center.jpg</url><title>Dr. Lukas Pfannschmidt</title><link>https://lpfann.me/author/dr.-lukas-pfannschmidt/</link></image><item><title>Supercharge Your Developer Productivity</title><link>https://lpfann.me/post/productivity-tools/</link><pubDate>Sat, 04 Nov 2023 17:05:04 +0100</pubDate><guid>https://lpfann.me/post/productivity-tools/</guid><description>&lt;h1 id="boost-your-macbooks-productivity-with-these-power-tools">Boost Your Macbook&amp;rsquo;s Productivity with These Power Tools&lt;/h1>
&lt;p>Maximize your efficiency as a developer on macOS with a few savvy tools and shortcuts designed to speed up your workflow. Here&amp;rsquo;s a rundown of the tools I&amp;rsquo;ve integrated into my routine to navigate and manage projects more efficiently.&lt;/p>
&lt;h2 id="oh-my-zsh-the-powerhouse-shell">Oh My Zsh: The Powerhouse Shell&lt;/h2>
&lt;p>&lt;a href="https://ohmyz.sh/" target="_blank" rel="noopener">Oh My Zsh&lt;/a> is a collection of extensions for the normal &lt;code>ZSH&lt;/code>. It comes packed with handy features and plugins to help enhance your terminal experience.
It also allows using custom prompts like the &lt;a href="https://spaceship-prompt.sh/" target="_blank" rel="noopener">Spaceship prompt&lt;/a>, which provides a wealth of information at a glance, including the current directory, git status, and Python virtual environment.&lt;/p>
&lt;h3 id="venv-display-plugin">Venv Display plugin&lt;/h3>
&lt;p>For developers who work with Python, keeping track of virtual environments is crucial. Spaceship allows you to display your current environment directly in the prompt, ensuring you&amp;rsquo;re always aware of the context you&amp;rsquo;re working in.&lt;/p>
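&lt;p>As a rough illustration of where this information comes from: the &lt;code>activate&lt;/code> script of a Python virtual environment exports a &lt;code>VIRTUAL_ENV&lt;/code> variable holding the environment&amp;rsquo;s path, which a prompt can read. The Python sketch below mimics such a prompt segment. The function name &lt;code>venv_segment&lt;/code> is hypothetical, and this is not Spaceship&amp;rsquo;s actual implementation.&lt;/p>
&lt;pre>&lt;code class="language-python">import os


def venv_segment(environ=None):
    # Activation scripts of venv/virtualenv export VIRTUAL_ENV with the env path
    if environ is None:
        environ = os.environ
    path = environ.get('VIRTUAL_ENV', '')
    if not path:
        return ''  # no active virtual environment: show nothing
    # Show only the last path component, e.g. .venv or myproject-env
    return os.path.basename(path.rstrip('/'))


print(venv_segment({'VIRTUAL_ENV': '/home/me/proj/.venv'}))  # .venv
&lt;/code>&lt;/pre>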
&lt;h3 id="k8s-display">K8s Display&lt;/h3>
&lt;p>If you&amp;rsquo;re juggling multiple &lt;a href="https://kubernetes.io/" target="_blank" rel="noopener">Kubernetes&lt;/a> contexts, Spaceship can display the current context and namespace, sparing you from repeatedly checking them with &lt;code>kubectl config current-context&lt;/code>.&lt;/p>
&lt;h3 id="git-branch-display">Git Branch Display&lt;/h3>
&lt;p>With the current branch name always on display, you can keep tabs on your work without running &lt;code>git status&lt;/code> or &lt;code>git branch&lt;/code> just to check where you are.&lt;/p>
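&lt;p>Showing the branch is cheap because Git stores it in a small file: in a normal checkout, &lt;code>.git/HEAD&lt;/code> contains a line such as &lt;code>ref: refs/heads/main&lt;/code>. The Python sketch below reads that file directly. &lt;code>current_branch&lt;/code> is a hypothetical helper that only illustrates where the information lives; it is not Spaceship&amp;rsquo;s implementation and ignores worktrees and submodules.&lt;/p>
&lt;pre>&lt;code class="language-python">from pathlib import Path


def current_branch(repo_dir='.'):
    # In a normal checkout, .git/HEAD holds a symbolic ref such as
    #   ref: refs/heads/main
    # (worktrees and submodules use a .git file instead and are not handled here)
    head = (Path(repo_dir) / '.git' / 'HEAD').read_text().strip()
    prefix = 'ref: refs/heads/'
    if head.startswith(prefix):
        return head[len(prefix):]
    # Detached HEAD: the file contains a commit hash, so show a short form
    return head[:7]
&lt;/code>&lt;/pre>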
&lt;p>Integrating these features into your terminal can greatly improve your navigation and productivity within complex development workflows.&lt;/p>
&lt;h2 id="jump-to-any-directory-based-on-substring">Jump to any directory based on substring&lt;/h2>
&lt;p>&lt;a href="https://github.com/ajeetdsouza/zoxide" target="_blank" rel="noopener">zoxide&lt;/a> replaced &lt;code>cd&lt;/code> command for me. It allows you to jump to any directory based on a substring of its name. It learns your habits and uses a ranking algorithm to prioritize the most likely directory you want to jump to.&lt;/p>
&lt;pre>&lt;code class="language-bash">z foo
&lt;/code>&lt;/pre>
&lt;p>jumps to the highest-ranked directory whose path contains &lt;code>foo&lt;/code>.
In practice I rarely need a second attempt, because its first guess is almost always the directory I want.&lt;/p>
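&lt;p>The ranking idea, often called frecency, can be sketched in a few lines of Python. The in-memory database, the function names, and the weighting factors (recent visits count more, old ones less) are illustrative assumptions, not zoxide&amp;rsquo;s actual implementation, which also matches keywords against the path more carefully than a plain substring check.&lt;/p>
&lt;pre>&lt;code class="language-python">HOUR = 3600
DAY = 24 * HOUR
WEEK = 7 * DAY


def frecency(visits, last_access, now):
    # Weight the visit count by how recently the directory was used
    age = now - last_access
    if age >= WEEK:
        return visits * 0.25
    if age >= DAY:
        return visits * 0.5
    if age >= HOUR:
        return visits * 2
    return visits * 4


def best_match(db, query, now):
    # Keep directories whose path contains the query, ignoring case
    candidates = [
        (frecency(visits, last_access, now), path)
        for path, (visits, last_access) in db.items()
        if query.lower() in path.lower()
    ]
    if not candidates:
        return None
    # The highest score wins
    return max(candidates)[1]


# Hypothetical database: path -> (number of visits, last access in seconds)
NOW = 1_000_000.0
db = {
    '/home/me/projects/foo-api': (10, NOW - 2 * DAY),
    '/home/me/old/foo-legacy': (30, NOW - 30 * DAY),
    '/home/me/projects/foo-web': (3, NOW - 10 * 60),
}
print(best_match(db, 'foo', NOW))  # /home/me/projects/foo-web
&lt;/code>&lt;/pre>
&lt;p>Although &lt;code>foo-legacy&lt;/code> has the most visits, its last use was a month ago, so the recently used &lt;code>foo-web&lt;/code> wins.&lt;/p>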
&lt;h2 id="jump-to-any-command-based-on-substring">Jump to any command based on substring&lt;/h2>
&lt;p>Similarly, for my shell command history I use &lt;a href="https://github.com/junegunn/fzf" target="_blank" rel="noopener">fzf&lt;/a>, a command-line fuzzy finder. Its shell integration binds &lt;code>Ctrl-R&lt;/code>, which lets me search through my command history and pick any previous command by typing parts of it.&lt;/p>
&lt;p>Here is an example of searching for &lt;code>docker&lt;/code> in my command history:
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Example of using docker command in shell history. It showjs multiple commands which contain the word docker, like docker-compose logs" srcset="
/post/productivity-tools/fzf_hucdaace42089d01d626724e6dfd195f58_30318_6512e95012e89c6f57c38f03bfd1a294.png 400w,
/post/productivity-tools/fzf_hucdaace42089d01d626724e6dfd195f58_30318_cb7f1cf668b7d0047b0302667b6ef3d0.png 760w,
/post/productivity-tools/fzf_hucdaace42089d01d626724e6dfd195f58_30318_1200x1200_fit_lanczos_3.png 1200w"
src="../../post/productivity-tools/fzf_hucdaace42089d01d626724e6dfd195f58_30318_6512e95012e89c6f57c38f03bfd1a294.png"
width="606"
height="249"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
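&lt;p>At its core, fuzzy matching checks whether the typed characters appear in a candidate in the same order, though not necessarily next to each other. The Python sketch below filters a small, made-up history this way. It is a simplification, because fzf additionally scores and ranks the matches, and the function names are hypothetical.&lt;/p>
&lt;pre>&lt;code class="language-python">def is_fuzzy_match(query, candidate):
    # True if all characters of query appear in candidate in the same order;
    # the shared iterator is consumed, so each character must come after the last
    remaining = iter(candidate.lower())
    return all(ch in remaining for ch in query.lower())


def search_history(history, query):
    # Newest commands first, as in a reverse history search
    return [cmd for cmd in reversed(history) if is_fuzzy_match(query, cmd)]


history = [
    'ls -la',
    'docker ps',
    'git status',
    'docker-compose logs -f',
    'dig example.com',
]
print(search_history(history, 'dkr'))  # ['docker-compose logs -f', 'docker ps']
&lt;/code>&lt;/pre>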
&lt;h2 id="raycast-your-search-supercharged">Raycast: Your Search, Supercharged&lt;/h2>
&lt;p>&lt;a href="https://raycast.com/" target="_blank" rel="noopener">Raycast&lt;/a> replaces the need for multiple apps and tools by consolidating all your search and command needs into one sleek, unified application.
It can be accessed with a simple keyboard shortcut, allowing you to search for files, open applications, and execute commands without ever leaving your keyboard.&lt;/p>
&lt;h3 id="github-repositories">GitHub Repositories&lt;/h3>
&lt;p>Raycast&amp;rsquo;s extension system lets you add the &lt;a href="https://www.raycast.com/raycast/github" target="_blank" rel="noopener">GitHub extension&lt;/a>, which provides quick navigation through all the repositories in your organization. This is a real time-saver for developers handling multiple projects.&lt;/p>
&lt;h3 id="vs-code-project-switching">VS Code Project Switching&lt;/h3>
&lt;p>Forget about sifting through your project directories. Raycast lets you switch between your &lt;a href="https://code.visualstudio.com/" target="_blank" rel="noopener">Visual Studio Code&lt;/a> projects without breaking your flow.&lt;/p>
&lt;h3 id="instant-zoom-access">Instant Zoom Access&lt;/h3>
&lt;p>Raycast also gives you direct access to your scheduled Zoom meetings, so you can join one with a single command instead of digging through emails or calendars.&lt;/p>
&lt;h2 id="wrapping-up">Wrapping Up&lt;/h2>
&lt;p>Incorporating Oh My Zsh, zoxide, fzf, and Raycast into your daily routine is like adding superpowers to your development workflow. These tools minimize friction and maximize productivity, letting you focus on what you do best: creating incredible software. Try them out and see the difference for yourself.&lt;/p>
&lt;p>This curated selection of tools is by no means exhaustive, but it represents a personal toolbox that has significantly improved my efficiency on macOS. Hopefully, they&amp;rsquo;ll do the same for you.&lt;/p></description></item><item><title> Knative Serving: Streamlining Microservice Deployment on Kubernetes</title><link>https://lpfann.me/post/knative-serving/</link><pubDate>Sat, 04 Nov 2023 11:09:29 +0100</pubDate><guid>https://lpfann.me/post/knative-serving/</guid><description>&lt;p>Kubernetes has revolutionized the way organizations deploy and manage applications at scale. However, its complexity can be daunting for developers who may not be familiar with container orchestration concepts. Enter Knative Serving, a Kubernetes-based platform that simplifies the deployment and scaling of serverless applications.&lt;/p>
&lt;h2 id="knative-serving-making-kubernetes-accessible-to-all-developers">Knative Serving: Making Kubernetes Accessible to All Developers&lt;/h2>
&lt;p>Knative Serving builds on Kubernetes to deploy and serve applications and functions as serverless, autoscaling services. It abstracts away much of the Kubernetes-specific workflow, allowing developers to focus on writing code. This simplifies the deployment model: a single Knative Service manifest can replace several Kubernetes objects and commands.&lt;/p>
&lt;h2 id="scaling-microservices-with-knative-serving">Scaling Microservices with Knative Serving&lt;/h2>
&lt;p>One of the standout features of Knative Serving is its ability to automatically manage the scale of your applications, including scaling down to zero when services are not in use. This feature is particularly useful for cost-saving and efficient resource utilization.&lt;/p>
&lt;p>Knative Serving supports various scaling metrics and parameters, allowing for fine-tuned control over how your applications respond to traffic demands. Developers can specify the number of concurrent requests per pod and control the ramp-up and cool-down behavior of their services.&lt;/p>
&lt;h2 id="example-knative-serving-configuration">Example: Knative Serving Configuration&lt;/h2>
&lt;p>Here&amp;rsquo;s a look at a Knative Service configuration that showcases the simplicity of getting a service up and running:&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: serving.knative.dev/v1
kind: Service
metadata:
name: example-service
namespace: default
spec:
template:
spec:
containers:
- image: gcr.io/my-project/my-app:latest
ports:
- containerPort: 8080
&lt;/code>&lt;/pre>
&lt;p>Note that this configuration is significantly shorter than an equivalent plain Kubernetes setup, which would require separate Deployment, Service, and Ingress objects.
Knative has sensible defaults for many of its parameters, allowing developers to get started quickly.
Knative will take care of the rest, including creating the necessary Kubernetes objects and managing the scaling of your service:&lt;/p>
&lt;ul>
&lt;li>A health check is performed against the container port.&lt;/li>
&lt;li>Deploying the service will create a new revision of the application.&lt;/li>
&lt;li>The revision will be scaled to zero if there are no requests for a specified period of time.&lt;/li>
&lt;li>A new revision will be created when the service is updated, allowing for seamless rollouts and rollbacks.&lt;/li>
&lt;li>Traffic splitting can be configured to allow for canary rollouts and A/B testing.&lt;/li>
&lt;/ul>
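&lt;p>Conceptually, traffic splitting assigns each request to a revision according to configured percentages. The Python sketch below illustrates this with a deterministic hash-based router. The revision names and the 90/10 split are hypothetical, and Knative performs the actual routing in its networking layer rather than like this.&lt;/p>
&lt;pre>&lt;code class="language-python">import hashlib

# Hypothetical split: 90% of requests to the stable revision, 10% to a canary
TRAFFIC = [
    ('example-service-00001', 90),
    ('example-service-00002', 10),
]


def pick_revision(request_id, traffic=TRAFFIC):
    # Knative requires the percentages of all traffic targets to add up to 100
    if sum(percent for _, percent in traffic) != 100:
        raise ValueError('traffic percentages must sum to 100')
    # Hash the request id into one of 100 buckets (0 to 99)
    bucket = int(hashlib.sha256(request_id.encode('utf-8')).hexdigest(), 16) % 100
    upper = 0
    for revision, percent in traffic:
        upper += percent
        if upper > bucket:
            return revision


counts = {}
for i in range(10000):
    revision = pick_revision(f'request-{i}')
    counts[revision] = counts.get(revision, 0) + 1
print(counts)
&lt;/code>&lt;/pre>
&lt;p>Shifting the percentages step by step (for example 90/10, then 50/50, then 0/100) gives a gradual canary rollout, and setting them back gives an instant rollback.&lt;/p>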
&lt;p>However, Knative Serving also provides the flexibility to customize many parameters to suit your needs.&lt;/p>
&lt;h3 id="autoscaling">Autoscaling&lt;/h3>
&lt;p>For example, the autoscaling configuration can be modified to specify the minimum and maximum number of pods, the number of concurrent requests each pod should handle, and the target utilization percentage.
The default autoscaler in vanilla Kubernetes is the Horizontal Pod Autoscaler (HPA), which typically scales based on CPU utilization. Knative Serving instead ships its own Knative Pod Autoscaler (KPA), which scales based on the number of concurrent requests; this suits serverless applications better and also makes scaling to zero possible.&lt;/p>
&lt;p>The default behavior of Knative Serving is equivalent to setting these annotations on the Service&amp;rsquo;s revision template:&lt;/p>
&lt;pre>&lt;code class="language-yaml">spec:
template:
metadata:
annotations:
autoscaling.knative.dev/metric: &amp;quot;concurrency&amp;quot;
autoscaling.knative.dev/target-utilization-percentage: &amp;quot;70&amp;quot;
&lt;/code>&lt;/pre>
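&lt;p>The concurrency-based scaling decision boils down to simple arithmetic. Assuming Knative&amp;rsquo;s default soft target of 100 concurrent requests per pod and the 70% utilization above, each pod is aimed at about 70 in-flight requests. The Python sketch below shows this calculation. It is a simplification, since the real autoscaler averages concurrency over stable and panic windows, and &lt;code>desired_pods&lt;/code> is a hypothetical helper.&lt;/p>
&lt;pre>&lt;code class="language-python">import math


def desired_pods(observed_concurrency, target_per_pod=100, utilization_pct=70,
                 min_scale=0, max_scale=None):
    # Effective number of in-flight requests each pod should handle
    effective_target = target_per_pod * utilization_pct / 100
    pods = math.ceil(observed_concurrency / effective_target)
    # Respect the configured lower and upper bounds on the pod count
    pods = max(pods, min_scale)
    if max_scale is not None:
        pods = min(pods, max_scale)
    return pods


print(desired_pods(250))  # 4
print(desired_pods(0))  # 0, i.e. scaled to zero
print(desired_pods(250, min_scale=1, max_scale=3))  # 3
&lt;/code>&lt;/pre>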
&lt;p>To switch back to plain CPU-based autoscaling with the Kubernetes HPA, you can use the following annotations instead:&lt;/p>
&lt;pre>&lt;code class="language-yaml">spec:
  template:
    metadata:
      annotations:
        autoscaling.knative.dev/class: &amp;quot;hpa.autoscaling.knative.dev&amp;quot;
        autoscaling.knative.dev/metric: &amp;quot;cpu&amp;quot;
        autoscaling.knative.dev/target: &amp;quot;100&amp;quot;
&lt;/code>&lt;/pre>
&lt;p>With these settings, the HPA adds pods once the average CPU utilization of the pods exceeds 100% of the CPU they request.&lt;/p>
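&lt;p>For comparison, the Kubernetes HPA follows the documented rule desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue) and skips scaling while the ratio stays within a tolerance (0.1 by default). A minimal Python sketch of that rule, with &lt;code>hpa_desired_replicas&lt;/code> as a hypothetical helper name:&lt;/p>
&lt;pre>&lt;code class="language-python">import math


def hpa_desired_replicas(current_replicas, current_value, target_value, tolerance=0.1):
    # Ratio of the observed metric (e.g. average CPU utilization) to its target
    ratio = current_value / target_value
    # Skip scaling when the ratio is close enough to 1
    if tolerance >= abs(ratio - 1.0):
        return current_replicas
    return math.ceil(current_replicas * ratio)


print(hpa_desired_replicas(2, 150, 100))  # 3
print(hpa_desired_replicas(2, 105, 100))  # 2, within tolerance
&lt;/code>&lt;/pre>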
&lt;p>More information on Knative Serving configuration can be found in the &lt;a href="https://knative.dev/docs/serving/autoscaling/" target="_blank" rel="noopener">official documentation&lt;/a>.&lt;/p>
&lt;h2 id="effortless-deployment-pipelines-with-argocd">Effortless Deployment Pipelines with ArgoCD&lt;/h2>
&lt;p>ArgoCD integrates well with Knative Serving to create a seamless deployment pipeline. Because this GitOps tool keeps the cluster in sync with manifests stored in Git, developers simply merge changes into specific branches, such as the main branch for integration or deployment branches for staging and production environments, to trigger automated deployments.&lt;/p>
&lt;p>A Continuous Integration (CI) process such as GitHub Actions can be triggered by a merge into the main branch to build the container image and tag it with a version.
A subsequent merge into a deployment branch then prompts ArgoCD to deploy the tagged image to the respective environment.&lt;/p>
&lt;h3 id="branching-strategy">Branching Strategy&lt;/h3>
&lt;p>To visualize the workflow, imagine a branching strategy resembling the following:&lt;/p>
&lt;pre>&lt;code class="language-plaintext">[main] ---- [development] ---- [feature branches]
\ /
\-- [staging] -- [QA] -- [production]
&lt;/code>&lt;/pre>
&lt;p>The only interface developers need is the GitHub UI; no special tools or knowledge of Kubernetes are required. This creates a clear separation of concerns: developers focus on writing code and leave deployment and scaling to Knative Serving and ArgoCD.&lt;/p>
&lt;h2 id="knative-serving-vs-aws-lambda">Knative Serving vs. AWS Lambda&lt;/h2>
&lt;p>Knative Serving offers a similar proposition to AWS Lambda in that it removes the need for developers to manage the underlying infrastructure. However, unlike the proprietary AWS Lambda environment, Knative runs on open-source Kubernetes, so it can be used across multiple cloud providers or on-premises. It also hooks into the Kubernetes ecosystem, allowing seamless integration with other tools and services.&lt;/p>
&lt;h2 id="in-conclusion">In Conclusion&lt;/h2>
&lt;p>Knative Serving stands as a robust solution for teams seeking the benefits of serverless architectures without the intricate knowledge of Kubernetes. It simplifies application deployment, automates scaling to match demand, and integrates easily with modern development workflows. By providing developers with tools that are easy to use and manage, Knative ensures that the focus remains on creating value through application functionality, not infrastructure complexity.&lt;/p>
&lt;p>For organizations already invested in Kubernetes, Knative Serving offers a way to streamline and enhance their deployment strategies without the need for extensive Kubernetes expertise, thus further democratizing the power of container orchestration.&lt;/p>
&lt;h4 id="ps-knative-eventing">PS: Knative Eventing&lt;/h4>
&lt;p>Knative not only offers the Serving component but also an event mesh and primitives to control delivery of async events. This allows for a more complete serverless experience, where events can trigger serverless functions and services. This is a topic for another post, but I wanted to mention it here as it is a powerful feature of Knative.&lt;/p></description></item><item><title>Efficient Machine Learning Model Deployment: Integrating Seldon into MLOps Workflows</title><link>https://lpfann.me/post/seldon/</link><pubDate>Sat, 14 Oct 2023 11:09:29 +0100</pubDate><guid>https://lpfann.me/post/seldon/</guid><description>&lt;h1 id="enhancing-mlops-with-seldon-advantages-and-practical-deployment-with-scikit-learn">Enhancing MLOps with Seldon: Advantages and Practical Deployment with Scikit-Learn&lt;/h1>
&lt;p>Deploying machine learning models can often be a complex process that extends beyond the model&amp;rsquo;s development. The ease with which these models transition into production can significantly impact their usefulness and applicability. In this context, Seldon Core offers a suite of features that cater to various aspects of MLOps with a particular emphasis on ease of monitoring, scaling, and deployment. In this article, I&amp;rsquo;ll outline some of the features I appreciate about Seldon and walk through the deployment of a Scikit-Learn classifier using Seldon&amp;rsquo;s tools.&lt;/p>
&lt;h2 id="advantages-of-using-seldon-in-mlops">Advantages of Using Seldon in MLOps&lt;/h2>
&lt;p>&lt;strong>Easy Monitoring with Prometheus:&lt;/strong>
One of the more tedious aspects of machine learning operations is setting up monitoring for deployed models. Seldon simplifies this by providing out-of-the-box integration with Prometheus, a powerful monitoring system that automatically collects and stores metrics in a time-series database. This integration allows for real-time monitoring of a wide array of model performance metrics, without the need for complex setup procedures.&lt;/p>
&lt;p>&lt;strong>Automatic Scaling with KEDA:&lt;/strong>
Maintaining the balance between resource allocation and cost-efficiency is key in production environments. Seldon integrates with Kubernetes Event-driven Autoscaling (KEDA) to facilitate automatic scaling of machine learning models. KEDA allows Seldon deployments to scale based on metrics from external sources like Kafka queues, providing a responsive and resource-efficient solution for handling variable workloads. This is especially useful for scaling to zero, which allows for significant cost savings when the model is not in use.&lt;/p>
&lt;p>&lt;strong>Seamless Deployments:&lt;/strong>
The need for smooth rollouts and updates to machine learning models cannot be overstated. Seldon supports seamless deployments, allowing for blue-green deployments, canary rollouts, and phased introductions of new model versions. This reduces downtime and improves the user experience, as new features or models can be tested and rolled out with minimal disruption to the production service.&lt;/p>
&lt;h2 id="practical-deployment-a-scikit-learn-classifier-example">Practical Deployment: A Scikit-Learn Classifier Example&lt;/h2>
&lt;p>To demonstrate the advantages mentioned above, let&amp;rsquo;s consider the deployment of a simple logistic regression classifier using Scikit-Learn, wrapped with Seldon&amp;rsquo;s Sklearn server.&lt;/p>
&lt;pre>&lt;code class="language-python">from sklearn import datasets
from sklearn.linear_model import LogisticRegression
import joblib
# Load Iris dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target
# Create and train the logistic regression model
model = LogisticRegression()
model.fit(X, y)
# Serialize the model to a file
joblib.dump(model, 'model.joblib')
&lt;/code>&lt;/pre>
&lt;p>After training and saving the model, we upload &lt;code>model.joblib&lt;/code> to a storage bucket and create a SeldonDeployment custom resource (defined by Seldon Core&amp;rsquo;s CRD) that outlines the deployment specifics:&lt;/p>
&lt;pre>&lt;code class="language-yaml">apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
name: iris-model
spec:
predictors:
- name: default
replicas: 1
graph:
name: iris-classifier
implementation: SKLEARN_SERVER
modelUri: gs://my-bucket/iris-model
envSecretRefName: seldon-init-container-secret
&lt;/code>&lt;/pre>
&lt;p>Using &lt;code>kubectl apply&lt;/code>, we apply the manifest to our Kubernetes cluster, which triggers the deployment process orchestrated by Seldon Core.
Seldon Core then creates a plain Kubernetes Deployment with a pod running our model. The pod is exposed via a Kubernetes Service, which can be used to send requests to the model.
Depending on the setup, the model can also be exposed via an Istio gateway, which allows for more advanced traffic management and monitoring.&lt;/p>
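&lt;p>To send a prediction request, a client POSTs JSON in Seldon Core&amp;rsquo;s v1 prediction protocol, with the features under &lt;code>data.ndarray&lt;/code>. The sketch below uses only the Python standard library. It assumes the model is reachable through an ingress such as Istio at the hypothetical host &lt;code>localhost:8080&lt;/code> and runs in the &lt;code>default&lt;/code> namespace, since the manifest does not set one; the helper names are hypothetical.&lt;/p>
&lt;pre>&lt;code class="language-python">import json
import urllib.request


def seldon_predict_url(host, namespace, deployment):
    # Seldon Core v1 REST path when the model is exposed through an ingress
    return f'http://{host}/seldon/{namespace}/{deployment}/api/v1.0/predictions'


def build_payload(rows):
    # Seldon v1 protocol: a batch of feature rows as a nested list
    return {'data': {'ndarray': rows}}


def predict(host, namespace, deployment, rows, timeout=10):
    request = urllib.request.Request(
        seldon_predict_url(host, namespace, deployment),
        data=json.dumps(build_payload(rows)).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())


# Requires a running deployment, so it is left commented out here:
# predict('localhost:8080', 'default', 'iris-model', [[5.1, 3.5, 1.4, 0.2]])
&lt;/code>&lt;/pre>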
&lt;p>To update the model, we publish a new model version and point the &lt;code>modelUri&lt;/code> of the SeldonDeployment resource to it. This triggers a rolling update of the underlying Deployment, which replaces pods gradually so that the model remains available during the update.&lt;/p>
&lt;h3 id="advanced-deployment-options">Advanced Deployment Options&lt;/h3>
&lt;p>For more advanced use cases, Seldon also supports other machine learning frameworks such as TensorFlow, PyTorch, and XGBoost, and integrates with tools like Kubeflow and Kubeflow Pipelines. It is also possible to wrap your own custom model code with Seldon&amp;rsquo;s Python language wrapper.
This gives great flexibility in deployment options and covers a wide range of use cases without extensive deployment code.&lt;/p>
&lt;h3 id="monitoring-and-predicting-with-your-deployed-model">Monitoring and Predicting with Your Deployed Model&lt;/h3>
&lt;p>With our model deployed, we can utilize Prometheus to monitor it closely. This setup allows us to keep track of our model&amp;rsquo;s performance and health with ease. Prometheus can be queried to fetch relevant metrics, which aids in maintaining the robustness of the deployed model. More information can be found in the &lt;a href="https://docs.seldon.io/projects/seldon-core/en/latest/analytics/analytics.html" target="_blank" rel="noopener">Seldon documentation&lt;/a>.&lt;/p>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>By providing built-in support for monitoring, scaling, and deployment, Seldon Core addresses three critical aspects of MLOps, making the journey from model development to production a lot smoother.
This simplifies many of the operational complexities and allows data scientists and ML engineers to focus more on model improvement and less on the intricacies of production environments.
As we&amp;rsquo;ve seen, leveraging Seldon Core with a Scikit-Learn model can be a very straightforward process, illustrating how practical and beneficial Seldon can be in real-world applications.&lt;/p></description></item><item><title>Reproducible Experiments in Machine Learning</title><link>https://lpfann.me/post/reproducible-experiments/</link><pubDate>Mon, 04 May 2020 17:10:53 +0200</pubDate><guid>https://lpfann.me/post/reproducible-experiments/</guid><description>&lt;h2 id="replication-crisis">Replication Crisis&lt;/h2>
&lt;p>Today, not only the economy but also science is moving at a breakneck pace.
Accelerated even further by the current pandemic, new research is published in short iterations, leaving little time for peer review.&lt;/p>
&lt;p>Good practice in science (and life in general) is the replication of results: to check for correctness or to facilitate understanding.
This is crucial in peer review, especially given the ever-increasing number of scientific papers of questionable quality.
One big problem is therefore the lack of replicable results, also known as the &lt;a href="https://en.wikipedia.org/wiki/Replication_crisis" target="_blank" rel="noopener">replication crisis&lt;/a>.
The term covers many facets of this problem in different scientific disciplines.
In machine learning specifically, many results and comparisons are questionable.
A recent &lt;a href="https://dl.acm.org/doi/10.1145/3298689.3347058" target="_blank" rel="noopener">study&lt;/a> tried to replicate the results of recent papers on neural recommendation approaches
and succeeded in only about 40% of the cases.&lt;/p>
&lt;h2 id="current-problems">Current problems&lt;/h2>
&lt;p>There are many frustrating aspects I encountered when reading papers in machine learning:&lt;/p>
&lt;h3 id="1-no-public-data-used">1. No public data used&lt;/h3>
&lt;p>While I understand the reasons for it, a study should not rely solely on private &lt;em>data&lt;/em> to highlight its merits.&lt;/p>
&lt;h3 id="2-no-source-code">2. No source code&lt;/h3>
&lt;p>While many scientists are not software engineers and are shy about sharing their &lt;em>paper-deadline scripts&lt;/em>, journals should at least require authors to &lt;em>share the code&lt;/em>.
It is not reasonable to expect peers to replicate results by reimplementing the algorithms themselves.&lt;/p>
&lt;h3 id="3-high-installusage-barriers">3. High install/usage barriers&lt;/h3>
&lt;p>Even if the source code is available, declarations of its &lt;em>dependencies&lt;/em> may be missing, which requires installing them manually (if that is possible at all).
This problem gets worse as the publication ages, because newer versions of programming languages or libraries are not always backwards compatible.&lt;/p>
&lt;h3 id="4-no-replicable-experimental-setup">4. No replicable experimental setup&lt;/h3>
&lt;p>Even if 2. and 3. are fulfilled, sometimes the experimental scripts are missing.
While most parameters should be part of the scientific manuscript itself, sometimes authors forget to mention crucial preprocessing steps or parameters.
If the complete experimental script used to produce the results in the paper were available, this problem could not occur.&lt;/p>
&lt;h3 id="5-lack-of-necessary-resources-to-replicate-models">(5. Lack of necessary resources to replicate models)&lt;/h3>
&lt;p>Even if all of the above is provided, another problem, specifically in deep learning, is the amount of resources needed to train a model.
Many big players in this area (Google et al.) have nearly unlimited GPU resources, which are unattainable for many research institutions.
&lt;em>This also raises the question whether reported improvements stem from architectural changes or simply from longer training.&lt;/em>&lt;/p>
&lt;h2 id="towards-reproducible-science-and-experiments">Towards reproducible science and experiments&lt;/h2>
&lt;p>While aspects 1. and 2. are improving in my opinion, a solution to 5. is still not clear to me.
Aspects 3. and 4., on the other hand, can be addressed but are still lacking in academia, as they require software engineering skills
that are more commonly found in industry.&lt;/p>
&lt;p>In the following I will describe how I made the experiments in my newest scientific &lt;a href="https://lpfann.me/publication/pfannschmidt-sequential-feature-classification-2020/">preprint&lt;/a> reproducible.&lt;/p>
&lt;p>The source code of the algorithm is available on &lt;a href="https://github.com/lpfann/squamish" target="_blank" rel="noopener">GitHub&lt;/a>.&lt;/p>
&lt;p>In my case, I am using Python for the implementation.
There are (too) many ways to declare dependencies in Python.
A relatively new alternative that tries to combine the best ideas of its predecessors is &lt;a href="https://python-poetry.org/" target="_blank" rel="noopener">Poetry&lt;/a>, which allows declaring both general version requirements and specific pinned versions.&lt;/p>
&lt;figure id="figure-excerpt-of-general-dependency-declaration-for-poetry">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Excerpt of general dependency declaration for Poetry." srcset="
/post/reproducible-experiments/poetry_hu0a6f7bc5515536603ab496f1898bfa93_70512_c146d73af2519588397b9ff918c1c965.png 400w,
/post/reproducible-experiments/poetry_hu0a6f7bc5515536603ab496f1898bfa93_70512_e2b6365c3f491665fb94c633b486a999.png 760w,
/post/reproducible-experiments/poetry_hu0a6f7bc5515536603ab496f1898bfa93_70512_1200x1200_fit_lanczos_3.png 1200w"
src="../../post/reproducible-experiments/poetry_hu0a6f7bc5515536603ab496f1898bfa93_70512_c146d73af2519588397b9ff918c1c965.png"
width="683"
height="695"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Excerpt of general dependency declaration for Poetry.
&lt;/figcaption>&lt;/figure>
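&lt;p>To illustrate the difference between a general requirement and a pinned version: Poetry&amp;rsquo;s caret constraint &lt;code>^1.2.3&lt;/code> allows any version from 1.2.3 up to, but excluding, 2.0.0, while the lock file records the one exact version that was resolved. The Python sketch below computes the bounds of a caret constraint following the rules in Poetry&amp;rsquo;s documentation; &lt;code>caret_bounds&lt;/code> is a hypothetical helper, not part of Poetry.&lt;/p>
&lt;pre>&lt;code class="language-python">def caret_bounds(spec):
    # '^1.2.3' -> ('1.2.3', '2.0.0'): lower bound inclusive, upper bound exclusive
    parts = [int(p) for p in spec.lstrip('^').split('.')]
    lower = parts + [0] * (3 - len(parts))
    # Bump the left-most non-zero component (or the last given one if all are zero)
    index = len(parts) - 1
    for i, value in enumerate(parts):
        if value != 0:
            index = i
            break
    upper = parts[:index] + [parts[index] + 1]
    upper += [0] * (3 - len(upper))
    return '.'.join(map(str, lower)), '.'.join(map(str, upper))


for spec in ['^1.2.3', '^0.2.3', '^0.0.3']:
    print(spec, caret_bounds(spec))
&lt;/code>&lt;/pre>
&lt;p>Note how a leading zero makes the constraint stricter: for 0.x releases, a minor version bump is treated as potentially breaking.&lt;/p>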
&lt;p>In addition, Poetry automatically creates virtual environments that encapsulate these specific dependencies, even if the user&amp;rsquo;s global Python environment differs widely from the original author&amp;rsquo;s.
The exact resolved versions, together with hashes of the package files, are recorded automatically in the &lt;a href="https://github.com/lpfann/squamish_experiments/blob/master/poetry.lock" target="_blank" rel="noopener">poetry.lock&lt;/a> file.&lt;/p>
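&lt;p>The hashes in the lock file make it possible to check that a downloaded package is byte-for-byte the one that was locked. A minimal Python sketch of such a check, using only &lt;code>hashlib&lt;/code>; the helper names are hypothetical, and package installers can perform this check automatically.&lt;/p>
&lt;pre>&lt;code class="language-python">import hashlib


def file_digest(path, chunk_size=65536):
    # Stream the file in chunks so large packages need not fit in memory
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    # Same sha256:HEX notation as the hash entries in poetry.lock
    return 'sha256:' + digest.hexdigest()


def matches_lock(path, expected_hash):
    # A mismatch means the file differs from the one that was locked
    return file_digest(path) == expected_hash
&lt;/code>&lt;/pre>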
&lt;p>For another layer of encapsulation, we also use &lt;a href="https://www.opencontainers.org/" target="_blank" rel="noopener">containers&lt;/a>, popularized under the name &lt;em>Docker&lt;/em>.
While Poetry encapsulates Python environments, containers can encapsulate the complete operating system userland (only the kernel is shared with the host).
This makes it possible to run the experiments even many years in the future with the same software stack.&lt;/p>
&lt;p>We provide a GitHub repository for all &lt;a href="https://github.com/lpfann/squamish_experiments" target="_blank" rel="noopener">experiments&lt;/a> with a &lt;a href="https://github.com/lpfann/squamish_experiments/blob/master/Dockerfile" target="_blank" rel="noopener">Dockerfile&lt;/a> included, which is essentially a recipe for all the software needed (including the OS).&lt;/p>
&lt;figure id="figure-excerpt-of-dockerfile-responsible-for-installing-poetry-and-its-dependencies">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Excerpt of Dockerfile responsible for installing Poetry and its dependencies." srcset="
/post/reproducible-experiments/dockerfile_hudcd8c965f5ba68d6fe68ed65a05b4a43_58406_5e694664c796bab3ccd79e2cc64480e3.png 400w,
/post/reproducible-experiments/dockerfile_hudcd8c965f5ba68d6fe68ed65a05b4a43_58406_cdb0152f1f9ee614faefe85f7abe29e0.png 760w,
/post/reproducible-experiments/dockerfile_hudcd8c965f5ba68d6fe68ed65a05b4a43_58406_1200x1200_fit_lanczos_3.png 1200w"
src="../../post/reproducible-experiments/dockerfile_hudcd8c965f5ba68d6fe68ed65a05b4a43_58406_5e694664c796bab3ccd79e2cc64480e3.png"
width="760"
height="350"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Excerpt of Dockerfile responsible for installing Poetry and its dependencies.
&lt;/figcaption>&lt;/figure>
&lt;p>In short, the Dockerfile instructs the container builder to start from a specific OS, install all Python dependencies with Poetry, and create an &lt;em>image&lt;/em>.
One can also execute these instructions beforehand to build a &lt;em>container image&lt;/em> and upload it to a public &lt;a href="https://hub.docker.com/repository/docker/mirek1337/squamish_experiments" target="_blank" rel="noopener">repository&lt;/a>, which makes building it yourself unnecessary.&lt;/p>
&lt;p>Now we can provide the potential reviewer or user with the following instructions which automatically perform replication:&lt;/p>
&lt;h2 id="instructions-for-replication">Instructions for replication&lt;/h2>
&lt;p>To replicate the experimental results of the paper (figure and tables)&lt;/p>
&lt;h3 id="1-get-container-image">1. Get container image&lt;/h3>
&lt;p>Build the image yourself with&lt;/p>
&lt;pre>&lt;code class="language-sh">docker build -t squamish_experiments .
&lt;/code>&lt;/pre>
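&lt;p>or pull it from &lt;code>DockerHub&lt;/code> and tag it with the same local name:&lt;/p>
&lt;pre>&lt;code class="language-sh">docker pull mirek1337/squamish_experiments
docker tag mirek1337/squamish_experiments squamish_experiments
&lt;/code>&lt;/pre>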
&lt;h3 id="2-run-container">2. Run container&lt;/h3>
&lt;pre>&lt;code class="language-sh">docker run -v ./tmp:/exp/tmp:Z -v ./output:/exp/output:Z -it squamish_experiments make
&lt;/code>&lt;/pre>
&lt;p>which calls &lt;code>make&lt;/code> inside the container to execute all experiments defined in the &lt;code>Makefile&lt;/code>.
After the experiments finish (this can take several hours), the results end up in the &lt;code>./output&lt;/code> folder.&lt;/p>
&lt;p>It&amp;rsquo;s also possible to change the following parameters as environment variables in the docker command via the &lt;code>-e&lt;/code> option:&lt;/p>
&lt;p>Defaults used in the paper:&lt;/p>
&lt;ul>
&lt;li>&lt;code>SEED&lt;/code> = 123&lt;/li>
&lt;li>&lt;code>REPEATS&lt;/code> = 10&lt;/li>
&lt;li>&lt;code>N_THREADS&lt;/code> = 1&lt;/li>
&lt;/ul>
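&lt;p>Environment variables like these are typically read by the experiment code at runtime. A minimal Python sketch of such a configuration step, assuming the variable names and defaults listed above, could look as follows; &lt;code>load_config&lt;/code> and &lt;code>repeat_seeds&lt;/code> are hypothetical helpers and not taken from the squamish_experiments repository.&lt;/p>
&lt;pre>&lt;code class="language-python">import os
import random

# Defaults used in the paper
DEFAULTS = {'SEED': 123, 'REPEATS': 10, 'N_THREADS': 1}


def load_config(environ=None):
    # Read overrides passed via docker run -e NAME=value, else use the defaults
    if environ is None:
        environ = os.environ
    return {name: int(environ.get(name, default)) for name, default in DEFAULTS.items()}


def repeat_seeds(config):
    # Derive one reproducible seed per repetition from the master seed
    rng = random.Random(config['SEED'])
    return [rng.randrange(2**32) for _ in range(config['REPEATS'])]


config = load_config()
print(config)
print(repeat_seeds(config))
&lt;/code>&lt;/pre>
&lt;p>Deriving all per-repetition seeds from a single master seed means that one variable is enough to reproduce every run exactly.&lt;/p>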
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>While working with containers is still new for many scientists, the advantages are considerable.
One cannot expect everybody to know these tools in detail, and more work on usable abstractions is needed.
For now, there are project &lt;a href="https://github.com/timtroendle/cookiecutter-reproducible-research" target="_blank" rel="noopener">templates&lt;/a> and &lt;a href="https://github.com/IDSIA/sacred" target="_blank" rel="noopener">libraries&lt;/a> available that make this work easier.&lt;/p></description></item><item><title>Decentralized Website</title><link>https://lpfann.me/post/decentralized-site/</link><pubDate>Tue, 25 Feb 2020 18:11:19 +0100</pubDate><guid>https://lpfann.me/post/decentralized-site/</guid><description>&lt;p>The website you are reading can be used entirely without a backend server.
Such a website is known as &lt;em>static&lt;/em>.&lt;/p>
&lt;p>&lt;em>Static&lt;/em> websites deliver all content and logic (JavaScript) to the browser.
All interaction, such as search or following internal links, is handled in the browser by the delivered files.
While this may be how a layperson imagines the web to work, it is far from the current state of the internet.&lt;/p>
&lt;p>In the early days, many websites consisted only of static HTML pages.
Today, many modern websites rely on a centralized backend server that has to keep running.
This enables dynamic experiences but also leads to &lt;a href="https://en.wikipedia.org/wiki/Link_rot" target="_blank" rel="noopener">link rot&lt;/a>, where websites (and their URLs) only have a limited lifespan.
Most people have come across dead links at least once, and the problem is expected to grow as the internet ages.&lt;/p>
&lt;h2 id="content-addressable-storage">Content-Addressable Storage&lt;/h2>
&lt;p>A recent push to decentralize the internet again has led to technologies such as content-addressable storage.&lt;/p>
&lt;p>Normal URLs on the internet, such as &lt;code>https://lpfann.me/&lt;/code>, are freely chosen names and bear no relation to the actual content.&lt;/p>
&lt;p>Content addressing uses a cryptographic hash function to condense the contents of a website into a short, fixed-length string called a &lt;em>hash&lt;/em>.
The great thing about such hash functions is that two different inputs produce the same output only with negligible probability, so the hash works as a practically unique address that also lets anyone verify the content they received.&lt;/p>
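&lt;p>The core idea can be sketched in a few lines of Python. This toy store is an illustration, not how IPFS works internally: it uses a plain hex-encoded SHA-256 digest as the address, whereas IPFS splits files into blocks and encodes the hash in its own address format. The class name &lt;code>ContentStore&lt;/code> is hypothetical.&lt;/p>
&lt;pre>&lt;code class="language-python">import hashlib


class ContentStore:
    """Toy content-addressed store (hypothetical, for illustration only)."""

    def __init__(self):
        self._blocks = {}

    def put(self, data):
        # The address is derived from the bytes themselves, not chosen by anyone.
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address):
        data = self._blocks[address]
        # Whoever receives the data can check it against the address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
first = store.put(b"Hello IPFS")
print(first == store.put(b"Hello IPFS"))   # True: same content, same address
print(first == store.put(b"Hello IPFS!"))  # False: changed content, new address
print(store.get(first))                    # b'Hello IPFS'
&lt;/code>&lt;/pre>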
&lt;p>This allows peers to serve and exchange content based on nothing but its hash.
One application of this is &lt;a href="https://ipfs.io" target="_blank" rel="noopener">IPFS&lt;/a> (InterPlanetary File System).&lt;/p>
&lt;p>IPFS introduced an addressing scheme for content and combines it with peer-to-peer exchange of data, so no central server is needed.
People running the IPFS application automatically act as servers for other peers whenever they hold data another node needs.&lt;/p>
&lt;p>This enables a more robust and decentralized web without the need for a big central server or a content distribution network.&lt;/p>
&lt;p>To host a website on IPFS, it needs to be static.&lt;/p>
&lt;h2 id="making-a-website-static">Making a website static&lt;/h2>
&lt;p>This website is built using &lt;a href="https://gohugo.io/" target="_blank" rel="noopener">Hugo&lt;/a>, which already produces static output.
The only required change is enabling &lt;a href="https://gohugo.io/content-management/urls/#relative-urls" target="_blank" rel="noopener">relativeURLs&lt;/a>, because IPFS gateways often serve a site below an &lt;code>/ipfs/&lt;/code> path prefix instead of at the domain root.&lt;/p>
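&lt;p>Why relative links matter can be illustrated with Python&amp;rsquo;s standard library. In this sketch, the gateway host &lt;code>gateway.example&lt;/code> and the hash placeholder &lt;code>QmSite&lt;/code> are made up; it only demonstrates how browsers resolve the two kinds of links and does not reproduce Hugo&amp;rsquo;s implementation of &lt;code>relativeURLs&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-python">import posixpath
from urllib.parse import urljoin

# Made-up address of this post when served through an IPFS path gateway.
PAGE = "https://gateway.example/ipfs/QmSite/post/decentralized-site/index.html"

# A root-relative link resolves against the domain root and loses the prefix.
absolute = urljoin(PAGE, "/css/style.css")

# A relative link is computed from the page's own folder ...
relative = posixpath.relpath("/css/style.css", start="/post/decentralized-site")
# ... and therefore stays inside /ipfs/QmSite/ when resolved.
resolved = urljoin(PAGE, relative)

print(absolute)  # https://gateway.example/css/style.css
print(relative)  # ../../css/style.css
print(resolved)  # https://gateway.example/ipfs/QmSite/css/style.css
&lt;/code>&lt;/pre>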
&lt;p>We are also using the &lt;a href="https://sourcethemes.com/academic/" target="_blank" rel="noopener">Academic&lt;/a> theme for Hugo.
Academic loads several external font and JavaScript resources to enhance the presentation of content.
While hosting an IPFS website that references non-IPFS resources is perfectly possible, such a site is not completely decentralized.&lt;/p>
&lt;p>Luckily, the Academic theme also provides a &lt;a href="https://github.com/sourcethemes/academic-admin/" target="_blank" rel="noopener">downloader tool&lt;/a>, which saves all external assets inside the website folder.&lt;/p>
&lt;p>At the time of writing, the main downloader does not support all assets yet, but an open &lt;a href="https://github.com/sourcethemes/academic-admin/pull/57" target="_blank" rel="noopener">pull request&lt;/a> adds support for most of the missing ones.
The fonts were also missing: they originally came from the &lt;a href="https://fonts.google.com/" target="_blank" rel="noopener">Google Fonts CDN&lt;/a>, so we downloaded them manually.&lt;/p>
&lt;p>Now we have a complete website running on local fonts and JavaScript assets&lt;sup id="fnref:1">&lt;a href="#fn:1" class="footnote-ref" role="doc-noteref">1&lt;/a>&lt;/sup>.
As a result, you could download the website files, disconnect from the internet, and still have the same experience.&lt;/p>
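&lt;p>To double-check that no external assets are left, one could scan the generated HTML. The sketch below is a hypothetical helper, not part of Hugo or the downloader tool. It assumes Hugo&amp;rsquo;s default output folder &lt;code>public&lt;/code> and only inspects &lt;code>script&lt;/code>, &lt;code>img&lt;/code> and &lt;code>link&lt;/code> tags; fonts referenced from inside CSS files are not detected.&lt;/p>
&lt;pre>&lt;code class="language-python">import pathlib
from html.parser import HTMLParser
from urllib.parse import urlparse

# Tags that load assets, mapped to the attribute holding the URL.
ASSET_ATTRS = {"script": "src", "img": "src", "link": "href"}


class ExternalAssetFinder(HTMLParser):
    """Collects asset URLs that point to a host other than our own."""

    def __init__(self, own_host):
        super().__init__()
        self.own_host = own_host
        self.external = []

    def handle_starttag(self, tag, attrs):
        wanted = ASSET_ATTRS.get(tag)
        for name, value in attrs:
            if name == wanted and value:
                # Relative URLs have no host, so they count as local.
                host = urlparse(value).netloc
                if host and host != self.own_host:
                    self.external.append(value)


def find_external_assets(site_dir="public", own_host="lpfann.me"):
    """Return (file, url) pairs for every external asset in the built site."""
    hits = []
    for page in sorted(pathlib.Path(site_dir).rglob("*.html")):
        finder = ExternalAssetFinder(own_host)
        finder.feed(page.read_text(encoding="utf-8"))
        hits += [(str(page), url) for url in finder.external]
    return hits
&lt;/code>&lt;/pre>
&lt;p>Running &lt;code>find_external_assets()&lt;/code> after building the site should then list only the references that are intentionally kept.&lt;/p>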
&lt;h2 id="hosting-an-ipfs-website">Hosting an IPFS website&lt;/h2>
&lt;p>If we add our website to IPFS, we get a content hash like this:&lt;/p>
&lt;pre>&lt;code class="language-sh">/ipfs/QmSPZuY3K1XieH7M9zh4qs9MEGFf4GZdBv3STaiJpBaC6o
&lt;/code>&lt;/pre>
&lt;p>Now somebody else can retrieve the website directly with their own IPFS client or with one of the available browser extensions.&lt;/p>
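&lt;p>Hashes starting with &lt;code>Qm&lt;/code> are version-0 content identifiers: a SHA-256 multihash (prefix byte &lt;code>0x12&lt;/code> for sha2-256 and &lt;code>0x20&lt;/code> for the 32-byte digest length) written in base58. The following sketch shows that encoding. Note that it hashes raw bytes only; &lt;code>ipfs add&lt;/code> first wraps files and folders in its own data structures, so the identifier it reports for the same content will differ. The function names are hypothetical.&lt;/p>
&lt;pre>&lt;code class="language-python">import hashlib

# Bitcoin-style base58 alphabet: no 0, O, I or l to avoid look-alikes.
B58 = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def b58encode(data):
    num = int.from_bytes(data, "big")
    out = ""
    while num:
        num, rem = divmod(num, 58)
        out = B58[rem] + out
    # Each leading zero byte is written as a "1".
    zeros = len(data) - len(data.lstrip(b"\0"))
    return "1" * zeros + out


def b58decode(text):
    num = 0
    for char in text:
        num = num * 58 + B58.index(char)
    zeros = len(text) - len(text.lstrip("1"))
    body = num.to_bytes((num.bit_length() + 7) // 8, "big")
    return b"\0" * zeros + body


def cidv0_for_bytes(data):
    # Multihash = code 0x12 (sha2-256) + length 0x20 (32 bytes) + digest.
    multihash = bytes([0x12, 0x20]) + hashlib.sha256(data).digest()
    return b58encode(multihash)


cid = cidv0_for_bytes(b"hello")
print(cid.startswith("Qm"), len(cid))  # True 46

# Decoding the hash shown above gives back 34 bytes (2 prefix bytes plus the digest).
raw = b58decode("QmSPZuY3K1XieH7M9zh4qs9MEGFf4GZdBv3STaiJpBaC6o")
print(len(raw))  # 34
&lt;/code>&lt;/pre>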
&lt;figure id="figure-draft-of-this-blog-post-hashed-and-pinned-to-local-ipfs-node">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="Draft of this blog post hashed and pinned to local IPFS node." srcset="
/post/decentralized-site/terminal_pin_hu46f3b4e5b2f023caedfbe9b2a2de6338_44064_034d2fe79fb91efce2cebbc810b12cfa.png 400w,
/post/decentralized-site/terminal_pin_hu46f3b4e5b2f023caedfbe9b2a2de6338_44064_f47f093261e036c8111e133d65bda01c.png 760w,
/post/decentralized-site/terminal_pin_hu46f3b4e5b2f023caedfbe9b2a2de6338_44064_1200x1200_fit_lanczos_3.png 1200w"
src="../../post/decentralized-site/terminal_pin_hu46f3b4e5b2f023caedfbe9b2a2de6338_44064_034d2fe79fb91efce2cebbc810b12cfa.png"
width="740"
height="355"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Draft of this blog post hashed and pinned to local IPFS node.
&lt;/figcaption>&lt;/figure>
&lt;p>For somebody else to retrieve the files of the website, we would have to keep an IPFS node running or ask somebody else to keep the files cached (&lt;em>pinned&lt;/em>).&lt;/p>
&lt;p>There are so-called pinning services (e.g. &lt;a href="https://pinata.cloud/" target="_blank" rel="noopener">Pinata&lt;/a>) which provide exactly this.
Another project is &lt;a href="https://filecoin.io/" target="_blank" rel="noopener">Filecoin&lt;/a>, which is built on top of IPFS.
It uses a blockchain to give nodes a monetary incentive to keep IPFS files pinned.&lt;/p>
&lt;p>&lt;blockquote class="twitter-tweet">&lt;p lang="en" dir="ltr">&lt;a href="https://twitter.com/hashtag/Dynamic?src=hash&amp;amp;ref_src=twsrc%5Etfw">#Dynamic&lt;/a> folders for pinning and managing data on &lt;a href="https://twitter.com/IpfSbot?ref_src=twsrc%5Etfw">@ipfsbot&lt;/a>: Introducing Textile Buckets. A tool to host your &lt;a href="https://twitter.com/hashtag/staticwebsite?src=hash&amp;amp;ref_src=twsrc%5Etfw">#staticwebsite&lt;/a>, app assets, &lt;a href="https://twitter.com/hashtag/opensource?src=hash&amp;amp;ref_src=twsrc%5Etfw">#opensource&lt;/a> code and more. &lt;a href="https://twitter.com/hashtag/commandline?src=hash&amp;amp;ref_src=twsrc%5Etfw">#commandline&lt;/a> tool, &lt;a href="https://twitter.com/hashtag/CI?src=hash&amp;amp;ref_src=twsrc%5Etfw">#CI&lt;/a> integration, and &lt;a href="https://twitter.com/hashtag/web3?src=hash&amp;amp;ref_src=twsrc%5Etfw">#web3&lt;/a> gateway. Check&amp;#39;em out: &lt;a href="https://t.co/K6RY5e1t2h">https://t.co/K6RY5e1t2h&lt;/a> &lt;a href="https://t.co/JyRgvknMAt">pic.twitter.com/JyRgvknMAt&lt;/a>&lt;/p>&amp;mdash; Recall Labs | re/acc (@RecallLabs_) &lt;a href="https://twitter.com/RecallLabs_/status/1231996985760051200?ref_src=twsrc%5Etfw">February 24, 2020&lt;/a>&lt;/blockquote>
In the last few days, we looked for ways to automatically pin this website whenever new content is added to the &lt;a href="https://github.com/lpfann/website" target="_blank" rel="noopener">git repository&lt;/a>.
Just yesterday, &lt;a href="https://blog.textile.io/first-look-at-textile-buckets-dynamic-ipfs-folders/" target="_blank" rel="noopener">Textile&lt;/a> announced dynamic &lt;em>buckets&lt;/em> built on top of IPFS.
While not the main focus of their blog post, they also presented new GitHub Actions which automatically deploy content to their free bucket hosting.
We adapted &lt;a href="https://github.com/textileio/gatsby-ipfs-blog" target="_blank" rel="noopener">their scripts&lt;/a>, written for a Gatsby-based demo site, to work with Hugo.
&lt;figure id="figure-github-action-building-and-pushing-files-to-textile-bucket">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img alt="GitHub Action building and pushing files to Textile bucket" srcset="
/post/decentralized-site/gh-action_hu246af52953b593e1d7597384993b4d21_56467_ed8cae45cadd05a602f5ab38fd9dfc3c.png 400w,
/post/decentralized-site/gh-action_hu246af52953b593e1d7597384993b4d21_56467_ac73292fa88adba6d95da85a9b131567.png 760w,
/post/decentralized-site/gh-action_hu246af52953b593e1d7597384993b4d21_56467_1200x1200_fit_lanczos_3.png 1200w"
src="../../post/decentralized-site/gh-action_hu246af52953b593e1d7597384993b4d21_56467_ed8cae45cadd05a602f5ab38fd9dfc3c.png"
width="760"
height="502"
loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
GitHub Action building and pushing files to Textile bucket
&lt;/figcaption>&lt;/figure>
Now, after every push and pull request, the GitHub Action builds the site with Hugo and pushes the output to a Textile bucket, which is pinned and accessible via IPFS.&lt;/p>
&lt;p>This way, our website content automatically becomes available under a new content hash after every change and &lt;a href="https://github.com/lpfann/website/blob/master/.github/workflows/bucket_publish.yml" target="_blank" rel="noopener">push&lt;/a> to the repository.&lt;/p>
&lt;h3 id="dns">DNS&lt;/h3>
&lt;p>To let people know that a site is also available via IPFS, one can use &lt;a href="https://docs.ipfs.io/guides/concepts/dnslink/" target="_blank" rel="noopener">DNSLinks&lt;/a>.
These are DNS TXT records attached to a domain that point to the corresponding IPFS content.
IPFS browser extensions can detect these records and automatically retrieve the content via IPFS when visiting such a site.&lt;/p>
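&lt;p>A DNSLink is an ordinary TXT record, conventionally published on the &lt;code>_dnslink&lt;/code> subdomain (for &lt;code>example.com&lt;/code> that would be &lt;code>_dnslink.example.com&lt;/code>), whose value is &lt;code>dnslink=/ipfs/&lt;/code> followed by the content hash. The two helpers below are a hypothetical sketch for building and reading such values; they do not query DNS.&lt;/p>
&lt;pre>&lt;code class="language-python">def dnslink_value(cid):
    """TXT value announcing that the site is available under this IPFS hash."""
    return "dnslink=/ipfs/" + cid


def parse_dnslink(txt):
    """Return the content path of a DNSLink TXT value, or None if it is none."""
    prefix = "dnslink="
    if not txt.startswith(prefix):
        return None
    path = txt[len(prefix):]
    # DNSLink may point to immutable /ipfs/ content or a mutable /ipns/ name.
    if path.startswith(("/ipfs/", "/ipns/")):
        return path
    return None


record = dnslink_value("QmSPZuY3K1XieH7M9zh4qs9MEGFf4GZdBv3STaiJpBaC6o")
print(record)                        # dnslink=/ipfs/QmSPZuY3K1XieH7M9zh4qs9MEGFf4GZdBv3STaiJpBaC6o
print(parse_dnslink(record))         # /ipfs/QmSPZuY3K1XieH7M9zh4qs9MEGFf4GZdBv3STaiJpBaC6o
print(parse_dnslink("v=spf1 -all"))  # None
&lt;/code>&lt;/pre>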
&lt;p>The scripts from Textile also included an &lt;a href="https://github.com/lpfann/website/blob/master/.github/workflows/update_dnslink.yml" target="_blank" rel="noopener">updater&lt;/a> for DNS records, which publishes the IPFS hash to Cloudflare&amp;rsquo;s DNS service.
This script updates the DNSLink after every manual release.&lt;/p>
&lt;h3 id="ethereum-name-service-ens">Ethereum Name Service (ENS)&lt;/h3>
&lt;p>For a completely decentralized solution, one can use technologies like &lt;a href="https://ens.domains/" target="_blank" rel="noopener">ENS&lt;/a>, which is an alternative to the DNS system.&lt;/p>
&lt;p>Our website is also available under the ENS domain &lt;a href="https://pfannschmidt.eth" target="_blank" rel="noopener">https://pfannschmidt.eth&lt;/a> or via the transition link &lt;a href="https://pfannschmidt.eth.link" target="_blank" rel="noopener">https://pfannschmidt.eth.link/&lt;/a> which uses the &lt;code>eth.link&lt;/code> service to allow browsers without ENS support to visit the site.&lt;/p>
&lt;p>For now, we update the IPFS hash stored in ENS manually, but we could automate this in the future.&lt;/p>
&lt;h2 id="backwards-compatibility">Backwards Compatibility&lt;/h2>
&lt;p>IPFS is still in its early stages.
Most popular browsers do not support the protocol natively, yet such support is necessary to reach the majority of web users.&lt;/p>
&lt;p>Until that changes, one additionally needs to host websites the traditional way, using web servers and DNS.
One can use &lt;a href="https://developers.cloudflare.com/distributed-web/ipfs-gateway/connecting-website/" target="_blank" rel="noopener">Cloudflare&amp;rsquo;s&lt;/a> IPFS gateway and DNS solution to automatically serve IPFS content over normal HTTP.&lt;/p>
&lt;p>For now, this blog is hosted by &lt;a href="https://www.netlify.com/" target="_blank" rel="noopener">Netlify&lt;/a> for visitors without IPFS support.&lt;/p>
&lt;h2 id="summary">Summary&lt;/h2>
&lt;p>Overall, this process is still complicated and tedious.
While IPFS and its ecosystem are steadily improving, there is still a lot to do.&lt;/p>
&lt;p>Luckily, new services such as &lt;code>fleek&lt;/code> &lt;del>Terminal.co&lt;/del> are emerging that provide end-to-end decentralized hosting solutions.&lt;/p>
&lt;h2 id="update">Update&lt;/h2>
&lt;p>We have since tried out &lt;a href="https://fleek.co/" target="_blank" rel="noopener">fleek&lt;/a>, which makes it a lot easier to deploy a static site to IPFS.
It automatically builds your site from your GitHub repository, pins it on IPFS, and also manages your DNSLinks so that people know your site is available under a content hash.&lt;/p>
&lt;section class="footnotes" role="doc-endnotes">
&lt;hr>
&lt;ol>
&lt;li id="fn:1" role="doc-endnote">
&lt;p>We have one&lt;sup id="fnref:2">&lt;a href="#fn:2" class="footnote-ref" role="doc-noteref">2&lt;/a>&lt;/sup> external reference left, which provides our visitor counting script. Without it, usability for visitors would not suffer. (You could argue it would improve the experience 😉)&amp;#160;&lt;a href="#fnref:1" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;li id="fn:2" role="doc-endnote">
&lt;p>After publishing this article, we added a new &lt;a href="https://commento.io/" target="_blank" rel="noopener">commenting system&lt;/a>. While it is self-hosted, it is not decentralized. Apparently, that is &lt;a href="https://fixingtao.com/2016/06/how-to-create-a-fairly-decentralized-commenting-system/" target="_blank" rel="noopener">still a non-trivial thing&lt;/a> to do.&amp;#160;&lt;a href="#fnref:2" class="footnote-backref" role="doc-backlink">&amp;#x21a9;&amp;#xfe0e;&lt;/a>&lt;/p>
&lt;/li>
&lt;/ol>
&lt;/section></description></item><item><title>FRI Quickstart Guide</title><link>https://lpfann.me/post/fri-user-guide/</link><pubDate>Thu, 02 May 2019 13:51:40 +0200</pubDate><guid>https://lpfann.me/post/fri-user-guide/</guid><description>&lt;h1 id="quick-start-guide">Quick start guide&lt;/h1>
&lt;p>In this guide, I describe how to use the FRI Python library to analyse arbitrary datasets.&lt;/p>
&lt;p>(This guide is a copy of the official documentation found &lt;a href="https://lpfann.github.io/fri/notebooks/Guide.html" target="_blank" rel="noopener">here&lt;/a>)&lt;/p>
&lt;h2 id="installation">Installation&lt;/h2>
&lt;h3 id="stable">Stable&lt;/h3>
&lt;p>FRI can be installed via the Python Package Index (PyPI).&lt;/p>
&lt;p>If you have &lt;code>pip&lt;/code> installed just execute the command&lt;/p>
&lt;pre>&lt;code>pip install fri
&lt;/code>&lt;/pre>
&lt;p>to get the newest stable version.&lt;/p>
&lt;p>The dependencies should be installed and checked automatically.
If you have problems installing, please open an issue on our &lt;a href="https://github.com/lpfann/fri/issues/new" target="_blank" rel="noopener">tracker&lt;/a>.&lt;/p>
&lt;h3 id="development">Development&lt;/h3>
&lt;p>To install a bleeding edge dev version of &lt;code>FRI&lt;/code> you can clone the GitHub repository using&lt;/p>
&lt;pre>&lt;code>git clone git@github.com:lpfann/fri.git
&lt;/code>&lt;/pre>
&lt;p>and then check out the &lt;code>dev&lt;/code> branch: &lt;code>git checkout dev&lt;/code>.&lt;/p>
&lt;p>To check if everything works as intended, you can use &lt;code>pytest&lt;/code> to run the unit tests.
Just run the command&lt;/p>
&lt;pre>&lt;code>pytest
&lt;/code>&lt;/pre>
&lt;p>in the main project folder.&lt;/p>
&lt;h2 id="using-fri">Using FRI&lt;/h2>
&lt;p>Now we showcase the workflow of using FRI on a simple classification problem.&lt;/p>
&lt;h3 id="data">Data&lt;/h3>
&lt;p>To have something to work with, we need some data first.
&lt;code>fri&lt;/code> includes a generation method for binary classification and regression data.&lt;/p>
&lt;p>In our case we need some classification data.&lt;/p>
&lt;pre>&lt;code class="language-python">from fri import genClassificationData
&lt;/code>&lt;/pre>
&lt;p>We want to create a small set with a few features.&lt;/p>
&lt;p>Because we want to showcase the all-relevant feature selection, we generate multiple strongly and weakly relevant features.&lt;/p>
&lt;pre>&lt;code class="language-python">n = 100
features = 6
strongly_relevant = 2
weakly_relevant = 2
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-python">X, y = genClassificationData(n_samples=n,
                             n_features=features,
                             n_strel=strongly_relevant,
                             n_redundant=weakly_relevant,
                             random_state=123)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>Generating dataset with d=6,n=100,strongly=2,weakly=2, partition of weakly=None
&lt;/code>&lt;/pre>
&lt;p>The method also prints out the parameters again.&lt;/p>
&lt;pre>&lt;code class="language-python">X.shape
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>(100, 6)
&lt;/code>&lt;/pre>
&lt;p>We created a binary classification set with 6 features of which 2 are strongly relevant and 2 weakly relevant.&lt;/p>
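&lt;p>To build intuition for what strongly and weakly relevant means, the following NumPy sketch constructs a comparable toy dataset by hand. It is a hypothetical illustration and not how &lt;code>genClassificationData&lt;/code> works internally: features 3 and 4 are near-copies of one latent signal, so each can stand in for the other (weakly relevant), while features 1 and 2 carry information no other feature has (strongly relevant).&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np

rng = np.random.default_rng(123)
n = 100

# Two strongly relevant features: no other feature carries their information.
strong = rng.normal(size=(n, 2))
# One latent signal copied (plus tiny noise) into two weakly relevant features:
# either copy can replace the other.
latent = rng.normal(size=(n, 1))
weak = latent + 0.01 * rng.normal(size=(n, 2))
# Two irrelevant noise features.
noise = rng.normal(size=(n, 2))

X_toy = np.hstack([strong, weak, noise])
# Labels depend only on the strong features and the latent signal.
y_toy = np.sign(strong @ np.array([2.0, -1.0]) + latent[:, 0])
&lt;/code>&lt;/pre>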
&lt;h4 id="preprocess">Preprocess&lt;/h4>
&lt;p>Because our method expects mean-centered data, we need to standardize it first.
This shifts each feature to zero mean and scales it to unit variance.&lt;/p>
&lt;pre>&lt;code class="language-python">from sklearn.preprocessing import StandardScaler
X_scaled = StandardScaler().fit_transform(X)
&lt;/code>&lt;/pre>
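&lt;p>In formula terms, each column is transformed as (value minus column mean) divided by the column standard deviation. A minimal NumPy equivalent could look like this, assuming no column is constant (&lt;code>StandardScaler&lt;/code> handles that case by leaving the scale at 1); like &lt;code>StandardScaler&lt;/code>, it uses the population standard deviation. The function name &lt;code>standardize&lt;/code> is hypothetical.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np


def standardize(X):
    """Zero mean and unit variance per column, like StandardScaler().fit_transform."""
    X = np.asarray(X, dtype=float)
    # np.std defaults to ddof=0 (population std), matching StandardScaler.
    return (X - X.mean(axis=0)) / X.std(axis=0)


Z = standardize([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])
# Column means are now close to 0 and standard deviations close to 1.
print(Z.mean(axis=0), Z.std(axis=0))
&lt;/code>&lt;/pre>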
&lt;h3 id="model">Model&lt;/h3>
&lt;p>Now we need to create a model.
We use the &lt;code>FRIClassification&lt;/code> class.&lt;/p>
&lt;p>For regression, one would use &lt;code>FRIRegression&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-python">from fri import FRIClassification
fri_model = FRIClassification()
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-python">fri_model
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>FRIClassification(C=None, debug=False, n_resampling=3,
optimum_deviation=0.001, parallel=False, random_state=None)
&lt;/code>&lt;/pre>
&lt;p>We used no parameters for creation so the defaults are active.&lt;/p>
&lt;p>&lt;code>C=None&lt;/code> means that &lt;code>FRI&lt;/code> itself chooses the regularization parameter &lt;code>C&lt;/code> using cross-validation on a fixed grid.&lt;/p>
&lt;p>By default, parallel computation is also disabled but can be enabled using &lt;code>parallel=True&lt;/code>.&lt;/p>
&lt;h4 id="fitting-to-data">Fitting to data&lt;/h4>
&lt;p>Now we can fit the model to the data using &lt;code>scikit-learn&lt;/code>-like commands.&lt;/p>
&lt;pre>&lt;code class="language-python">fri_model.fit(X_scaled,y)
&lt;/code>&lt;/pre>
&lt;p>The resulting feature relevance bounds are saved in the &lt;code>interval_&lt;/code> variable.&lt;/p>
&lt;pre>&lt;code class="language-python">fri_model.interval_
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>array([[0.45993233, 0.46169499],
[0.26954548, 0.27159876],
[0. , 0.25802293],
[0. , 0.25802293],
[0.00516909, 0.00711219],
[0.00446591, 0.00694219]])
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-python">fri_model.interval_.shape
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>(6, 2)
&lt;/code>&lt;/pre>
&lt;p>Each row holds the lower and upper relevance bound of one feature.&lt;/p>
&lt;p>To access the relevance bounds of feature 3 (index 2, since NumPy indexing starts at 0), we would use&lt;/p>
&lt;pre>&lt;code class="language-python">fri_model.interval_[2]
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>array([0. , 0.25802293])
&lt;/code>&lt;/pre>
&lt;p>The relevance classes are saved in the corresponding variable &lt;code>relevance_classes_&lt;/code>:&lt;/p>
&lt;pre>&lt;code class="language-python">fri_model.relevance_classes_
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>array([2, 2, 1, 1, 0, 0])
&lt;/code>&lt;/pre>
&lt;p>&lt;code>2&lt;/code> denotes strongly relevant features, &lt;code>1&lt;/code> weakly relevant and &lt;code>0&lt;/code> irrelevant.&lt;/p>
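&lt;p>How bounds and classes relate can be sketched with a simple thresholding rule. This is a hypothetical illustration with an arbitrary threshold of 0.01, not the procedure FRI uses internally, but on the bounds printed above it reproduces the same classes.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np


def classify_by_threshold(interval, threshold=0.01):
    """Hypothetical rule: 2 if the lower bound exceeds the threshold,
    1 if only the upper bound does, 0 otherwise."""
    interval = np.asarray(interval)
    lower, upper = interval[:, 0], interval[:, 1]
    classes = np.zeros(len(interval), dtype=int)
    classes[upper > threshold] = 1  # can contribute, but may be replaceable
    classes[lower > threshold] = 2  # must always contribute
    return classes


bounds = np.array([[0.45993233, 0.46169499],
                   [0.26954548, 0.27159876],
                   [0.0, 0.25802293],
                   [0.0, 0.25802293],
                   [0.00516909, 0.00711219],
                   [0.00446591, 0.00694219]])
print(classify_by_threshold(bounds))  # [2 2 1 1 0 0]
&lt;/code>&lt;/pre>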
&lt;h4 id="plot-results">Plot results&lt;/h4>
&lt;p>The bounds in numerical form are useful for postprocessing.
For human inspection, we recommend the plot function &lt;code>plot_relevance_bars&lt;/code>.&lt;/p>
&lt;p>We can also color the bars according to &lt;code>relevance_classes_&lt;/code>.&lt;/p>
&lt;pre>&lt;code class="language-python"># Import plot function
from fri.plot import plot_relevance_bars
import matplotlib.pyplot as plt
%matplotlib inline
# Create a new figure with one axis to draw on
fig, ax = plt.subplots(1, 1, figsize=(6, 3))
# Plot the bars on the axis, colored by the relevance classes found by FRI
out = plot_relevance_bars(ax, fri_model.interval_, classes=fri_model.relevance_classes_)
&lt;/code>&lt;/pre>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./Guide_28_0.png" alt="png" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>In the plot, the bars of the strongly relevant features 1 and 2 are narrow: their contribution can hardly change.
Features 3 and 4 are highly correlated, so one can substitute for the other; their relevance intervals are therefore wide.
Noise features 5 and 6 show a small nonzero minimum contribution, which can be attributed to numerical inaccuracies of the solver.&lt;/p>
&lt;h3 id="print-internal-parameters">Print internal Parameters&lt;/h3>
&lt;p>If we want to take a look at internal parameters, we can set the &lt;code>debug&lt;/code> flag when creating the model.&lt;/p>
&lt;pre>&lt;code class="language-python">fri_model = FRIClassification(debug=True)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-python">fri_model.fit(X_scaled,y)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>loss 0.517120931358002
L1 6.743126681372926
offset 0.32474176019022094
C 1
score 1.0
coef:
[[ 3.10516847]
[-1.82001413]
[ 0.86614471]
[-0.86614471]
[-0.03919911]
[-0.03971916]]
&lt;/code>&lt;/pre>
&lt;p>This prints out the parameters of the baseline model: &lt;code>loss&lt;/code> (sum of the slack variables), &lt;code>L1&lt;/code> ($L_1$ norm of the weight vector) and &lt;code>offset&lt;/code> (offset of the hyperplane from the origin).
&lt;code>coef&lt;/code> shows the coefficients (weights) of the baseline model.&lt;/p>
&lt;p>One can also see the best &lt;code>C&lt;/code> found by the grid search and the training score of the model in &lt;code>score&lt;/code>.&lt;/p>
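&lt;p>Assuming the baseline is an $L_1$-regularized linear SVM, which is the setting FRI builds on for classification, the printed values correspond to the terms of the following problem with training samples $x_i$, labels $y_i \in \{-1, 1\}$, weight vector $w$, offset $b$ and slack variables $\xi_i$:&lt;/p>
&lt;p>$$\min_{w,\, b,\, \xi} \; \|w\|_1 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i \left( w^\top x_i + b \right) \geq 1 - \xi_i, \quad \xi_i \geq 0, \quad i = 1, \dots, n.$$&lt;/p>
&lt;p>Under this assumption, &lt;code>L1&lt;/code> is $\|w\|_1$, &lt;code>loss&lt;/code> is $\sum_i \xi_i$, &lt;code>offset&lt;/code> is $b$ and &lt;code>C&lt;/code> is the regularization parameter chosen by the grid search.&lt;/p>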
&lt;p>These values can also be accessed through attributes of the fitted model.&lt;/p>
&lt;h5 id="print-out-hyperparameter-found-by-gridsearchcv">Print out hyperparameter found by GridSearchCV:&lt;/h5>
&lt;pre>&lt;code class="language-python">fri_model.tuned_C_
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>1
&lt;/code>&lt;/pre>
&lt;p>or the baseline parameters:&lt;/p>
&lt;pre>&lt;code class="language-python">fri_model.optim_L1_
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>6.743126681372926
&lt;/code>&lt;/pre>
&lt;h3 id="setting-constraints-manually">Setting constraints manually&lt;/h3>
&lt;p>Our model also allows to compute relevance bounds when the user sets a given range for the features.&lt;/p>
&lt;h4 id="presets">Presets&lt;/h4>
&lt;p>Presets are encoded using an array of the same shape as the &lt;code>interval_&lt;/code> attribute.
Each row holds the user-given minimum and maximum contribution of the feature.
If both values are the same, the feature is interpreted as fixed.&lt;/p>
&lt;p>Additionally, entries with &lt;code>np.nan&lt;/code> are interpreted as not-set or free.&lt;/p>
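&lt;p>These encoding rules can be spelled out with a small helper. The function &lt;code>describe_preset&lt;/code> below is hypothetical and only interprets a preset row in plain words; it is not part of FRI. The rules above do not say what a row with only one &lt;code>np.nan&lt;/code> means, so the sketch just reports that case.&lt;/p>
&lt;pre>&lt;code class="language-python">import numpy as np


def describe_preset(row):
    """Explain one preset row [min, max] according to the rules above."""
    low, high = row
    if np.isnan(low) and np.isnan(high):
        return "free"
    if np.isnan(low) or np.isnan(high):
        return "only one bound set"
    if low == high:
        return "fixed at {}".format(low)
    return "between {} and {}".format(low, high)


example = np.full((3, 2), np.nan)
example[0] = 0.2         # same minimum and maximum: fixed feature
example[1] = [0.1, 0.3]  # allowed range of contribution
print([describe_preset(row) for row in example])
# ['fixed at 0.2', 'between 0.1 and 0.3', 'free']
&lt;/code>&lt;/pre>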
&lt;pre>&lt;code class="language-python">import numpy as np
preset = np.full_like(fri_model.interval_,np.nan,dtype=np.double)
&lt;/code>&lt;/pre>
&lt;p>Now we have a preset array without any constraints:&lt;/p>
&lt;pre>&lt;code class="language-python">preset
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>array([[nan, nan],
[nan, nan],
[nan, nan],
[nan, nan],
[nan, nan],
[nan, nan]])
&lt;/code>&lt;/pre>
&lt;h4 id="example">Example&lt;/h4>
&lt;p>As an example, let us constrain feature 3 from our example to its minimum relevance bound.&lt;/p>
&lt;p>Note that NumPy indexing starts at 0, so feature 3 has index 2.&lt;/p>
&lt;pre>&lt;code class="language-python">preset[2] = fri_model.interval_[2, 0]
&lt;/code>&lt;/pre>
&lt;p>We use the method &lt;code>constrained_intervals_&lt;/code>.&lt;/p>
&lt;p>Note: the model has to be fitted before we can use this method.
We already did that, so we are fine.&lt;/p>
&lt;pre>&lt;code class="language-python">constrained_interval = fri_model.constrained_intervals_(preset=preset)
&lt;/code>&lt;/pre>
&lt;pre>&lt;code class="language-python">constrained_interval
&lt;/code>&lt;/pre>
&lt;pre>&lt;code>array([[0.45993233, 0.46169499],
[0.26954548, 0.27159876],
[0. , 0. ],
[0.25608488, 0.25802293],
[0.00516909, 0.00711219],
[0.00446591, 0.0069422 ]])
&lt;/code>&lt;/pre>
&lt;p>Feature 3 is set to its minimum (at 0).&lt;/p>
&lt;p>How does it look visually?&lt;/p>
&lt;pre>&lt;code class="language-python">fig, ax = plt.subplots(1, 1, figsize=(6, 3))
out = plot_relevance_bars(ax, constrained_interval)
&lt;/code>&lt;/pre>
&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="./Guide_48_0.png" alt="png" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;p>Feature 3 is reduced to its minimum (no contribution).&lt;/p>
&lt;p>In turn, its correlated partner feature 4 had to take its maximum contribution.&lt;/p></description></item></channel></rss>