In an era of growing privacy consciousness, Microsoft's recently introduced "Windows Recall" feature for Windows laptops has raised alarm bells among many users. The feature continuously captures the screen, extracts text from the captured images, and lets users ask questions about everything they did on that computer. Users are worried about the mandatory cloud transmission of their data.
These worries are well-founded, as transmitting personal data to the cloud inherently carries risks such as unauthorized access, data breaches, and misuse of sensitive information. To address these privacy concerns, and as a fun challenge, we have developed an open-source pilot demonstrating that comparable core functionality can be built with open-source tooling.
The focus of this post is user privacy and security: no data needs to be transferred to external servers. By storing and processing all user data on the user's own machine, we give users full control over their information and significantly reduce these risks.
Using the Power of Open-Source and Open Models
This project (see the source on GitHub) relies on open-source tools and open models to build a robust, privacy-conscious alternative to proprietary solutions. Using a suite of open-source libraries and models, we have developed a small tool that meets user needs in a privacy-preserving way.
Key Components:
Screen Capture: We use the PyAutoGUI library to capture screenshots of the user's screen at regular intervals, building a continuous record of their activities. The user interface is shown at the end of this post.
Optical Character Recognition (OCR): The DocTR library, a state-of-the-art OCR tool, is used to extract text from the captured screenshots.
Language Model: To provide contextual understanding and insights based on the extracted text, we use the Qwen2:1.5b language model, which runs locally and serves as a privacy-preserving alternative to cloud-hosted large language models (LLMs).
Semantic Search and Reranking: The core of our solution lies in the integration of JinaAI's embedding model and reranker. The embedding model creates high-quality vector representations of the extracted text, capturing the semantic meaning of the information. The reranker model then enhances the accuracy of the retrieved data, ensuring that the most relevant information is presented to the user.
Vector Database: We use ChromaDB, an open-source vector database, to store the text embeddings for efficient retrieval and querying.
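To make the flow between these components concrete, here is a minimal sketch of the capture, OCR, embed, store, retrieve, and rerank steps. It is not the project's actual code. The names ScreenMemory, grab_screen_text, toy_embed, and toy_rerank are hypothetical; a plain in-memory list stands in for ChromaDB, and the two toy scoring functions stand in for JinaAI's embedding model and reranker so that the example runs without downloading any model. The language model step is left out. Only grab_screen_text touches third-party libraries, and it assumes that pyautogui, numpy, and python-doctr are installed.

```python
import math
import time
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def grab_screen_text(ocr_model) -> str:
    """Capture the screen with PyAutoGUI and read its text with DocTR.

    `ocr_model` is assumed to be doctr.models.ocr_predictor(pretrained=True).
    Third-party imports are kept local so the rest of the sketch is stdlib-only.
    """
    import numpy as np
    import pyautogui

    page = np.array(pyautogui.screenshot().convert("RGB"))
    return ocr_model([page]).render()


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def toy_embed(text: str) -> List[float]:
    """Stand-in for a real embedding model: counts of the letters a-z."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec


def toy_rerank(query: str, text: str) -> float:
    """Stand-in for a real reranker: number of words shared with the query."""
    return float(len(set(query.lower().split()) & set(text.lower().split())))


@dataclass
class ScreenMemory:
    """In-memory stand-in for the vector database (ChromaDB in the project)."""

    embed: Callable[[str], Vector]
    rerank: Callable[[str, str], float]
    entries: List[Tuple[float, str, Vector]] = field(default_factory=list)

    def remember(self, text: str, timestamp: float = None) -> None:
        """Store the OCR text of one screenshot together with its embedding."""
        if not text.strip():
            return  # nothing readable on screen, skip this capture
        when = time.time() if timestamp is None else timestamp
        self.entries.append((when, text, self.embed(text)))

    def search(self, query: str, k: int = 10, top_n: int = 3) -> List[str]:
        """Fetch k candidates by vector similarity, then rerank and keep top_n."""
        q = self.embed(query)
        candidates = sorted(
            self.entries, key=lambda e: cosine(q, e[2]), reverse=True
        )[:k]
        ranked = sorted(
            candidates, key=lambda e: self.rerank(query, e[1]), reverse=True
        )
        return [text for _, text, _ in ranked[:top_n]]


if __name__ == "__main__":
    memory = ScreenMemory(embed=toy_embed, rerank=toy_rerank)
    memory.remember("invoice total due friday")
    memory.remember("python traceback in terminal")
    print(memory.search("invoice", top_n=1))
```

In a real run, a loop would call grab_screen_text at a fixed interval and pass the result to remember. The two-stage search reflects the design described above: a cheap vector lookup narrows the candidates, and the more precise reranker decides the final order.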
Running The Setup
To run the setup locally, follow the instructions in the video below:
Next Steps:
To further improve this open-source tool, one can implement the following features:
Image Compression: Reduces storage needs and speeds up processing by minimizing screenshot sizes.
Advanced Image Embedding: Enhances text extraction accuracy and contextual understanding from screenshots.
Simplified Installation: Makes setup easier for users of all technical levels, promoting broader adoption.
Memory Usage Optimization: Reduces memory usage for smoother performance on machines with limited resources.
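As an illustration of the image compression item above, the following sketch downsizes a screenshot and re-encodes it as JPEG before storage. It is one possible approach, not part of the current project. The function names, the 1280-pixel width limit, and the quality setting of 60 are assumptions chosen for the example; the image argument is assumed to be a Pillow image such as the one returned by pyautogui.screenshot().

```python
import io
from typing import Tuple


def scaled_size(width: int, height: int, max_width: int = 1280) -> Tuple[int, int]:
    """Return a size no wider than max_width, preserving the aspect ratio."""
    if width <= max_width:
        return width, height
    return max_width, max(1, round(height * max_width / width))


def compress_screenshot(image, max_width: int = 1280, quality: int = 60) -> bytes:
    """Downscale a Pillow image and encode it as JPEG bytes.

    `image` is assumed to be a PIL.Image.Image, for example the object
    returned by pyautogui.screenshot().
    """
    size = scaled_size(image.width, image.height, max_width)
    buffer = io.BytesIO()
    # JPEG has no alpha channel, so convert to RGB before saving.
    image.convert("RGB").resize(size).save(
        buffer, format="JPEG", quality=quality, optimize=True
    )
    return buffer.getvalue()
```

Aggressive downscaling or low JPEG quality can blur small text, so it is worth running OCR on the original capture, or checking OCR accuracy on compressed samples, before settling on these values.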
Summary
In summary, this pilot provides a privacy-conscious alternative to Windows Recall in which all data processing and storage remain under the user's control. Using open-source libraries and models, including JinaAI's embedding and reranking models, we have built a working tool that addresses the privacy concerns surrounding cloud-based data transmission. The project demonstrates the potential of open-source tools and models to deliver robust functionality while prioritizing user privacy.