
Background
One year ago, IBM Research open sourced Docling, a toolkit that transforms unstructured documents into usable, AI-ready data. In just 12 months, the project garnered ~40,000 GitHub stars (as of October 2025) and attracted a dedicated community of users and contributors worldwide. Docling's adoption made it one of IBM's most popular open-source projects and earned it acceptance into the Linux Foundation's LF AI & Data as an incubating project.
Docling originated as an ambitious research initiative within IBM to address a seemingly straightforward question: "How do you get clean, structured data from unstructured documents?" In late 2024, a team at IBM Research Zurich, led by Peter Staar (Principal Research Staff Member at IBM and now Chair of Docling's Technical Steering Committee), decided to open source their toolkit. The project quickly became a go-to solution for AI engineers working with PDFs, Word documents, spreadsheets, and other file types.
Docling converts diverse documents (e.g., PDF, DOCX, HTML, images, audio transcripts) into a unified, structured format ready for large language models and knowledge bases. It features advanced PDF layout understanding, optical character recognition (OCR) for scans, and an ultra-compact vision-language model (GraniteDocling) for document parsing.
The Docling team collaborated with Red Hat to launch a Docling OpenShift Operator for large-scale deployments, and the community kept growing. By its first anniversary, Docling had tens of thousands of users and ~40,000 GitHub stars (as of October 2025). "From a community perspective, we've always put a lot of emphasis on being fast at fixing bugs and adding new features, continuously keeping momentum and velocity very high," Dr. Staar noted.
The Challenge
Once Docling gained popularity, the maintainers were inundated with GitHub issues: installation problems, community questions, bug reports, and feature requests. Each month, they dealt with an average of 130 new issues and pull requests, with issues making up nearly two-thirds of the volume (83 issues vs 47 PRs), far more than they had the bandwidth to triage manually. In peak months, such as November 2024, the total reached 215 items. At just 15 minutes of triage per item, a peak month like that would take over 50 hours of work, with only four maintainers to share the load.
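The workload figures above can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this section; the variable and function names are illustrative, and the 15 minutes per item is the same rough assumption the text makes.

```python
# Back-of-the-envelope triage workload, using only figures quoted above.
AVG_ISSUES = 83          # average new issues per month
AVG_PRS = 47             # average new pull requests per month
PEAK_ITEMS = 215         # items in a peak month (e.g., November 2024)
MINUTES_PER_ITEM = 15    # assumed triage time per item
MAINTAINERS = 4


def triage_hours(items: int, minutes_per_item: int = MINUTES_PER_ITEM) -> float:
    """Total hours needed to triage `items` at a fixed time per item."""
    return items * minutes_per_item / 60


avg_items = AVG_ISSUES + AVG_PRS            # 130 items in an average month
issue_share = AVG_ISSUES / avg_items        # fraction of items that are issues
avg_hours = triage_hours(avg_items)         # hours in an average month
peak_hours = triage_hours(PEAK_ITEMS)       # hours in a peak month
peak_hours_each = peak_hours / MAINTAINERS  # peak hours per maintainer

print(f"Average month: {avg_items} items, {issue_share:.0%} issues, {avg_hours:.1f} h")
print(f"Peak month: {PEAK_ITEMS} items, {peak_hours:.2f} h total, {peak_hours_each:.1f} h each")
```

Under these assumptions, a peak month works out to 53.75 hours of triage, or roughly 13 hours per maintainer, before anyone writes a fix or reviews a line of code.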
Duplicate Questions
Many incoming issues were duplicates of earlier discussions. For example, multiple users would ask how to configure OCR or why a specific HTML section was missing after conversion. The team found themselves typing out the same answers over and over, or linking to FAQs and docs. It was an immense amount of work and not necessarily where the team wanted to focus their expertise.
Maintainer Burnout Risks
Each day, the Docling maintainers would have a new batch of GitHub notifications to respond to. Letting issues pile up unanswered was not an option, though keeping up with the pace of growth was becoming difficult for the team.
Global User Base
Docling's adopters span multiple time zones and languages. Queries came in from enterprise teams and individuals worldwide. With a traditional approach, users might wait a day or more for a response if maintainers were offline or busy. That response lag threatened the real-time, welcoming support the team wanted to provide.
Docling maintainer Michele Dolfi recalls coming out of meetings to find "tens of notification emails" waiting for the team, a daily reminder of the unsustainable pace.
Enter Dosu
Facing these challenges, the Docling team began to seek solutions. One tool kept coming up in their conversations: Dosu. Initially, there was some skepticism. Would it give accurate answers? Could it really understand a complex project like Docling?
After seeing Dosu handle real questions about projects like Langflow, the Docling team decided to test Dosu. They installed Dosu's GitHub app and configured it to start assisting on the Docling repositories.
We quickly realized we made a good decision on activating Dosu. The aha moment was the first time we saw a user closing an issue on their own after the Dosu reply!
— Michele Dolfi, Docling Maintainer
Once onboard, the Docling team enabled a range of Dosu's capabilities to maximize their team's impact.
Automated Issue Management
Dosu monitors every new issue and pull request, automatically labeling them based on content (bug, enhancement, or question) and correctly tagging high-priority items. One of Dosu's key strengths is answering questions by cross-referencing Docling's extensive issue history and linking to relevant past discussions. This lets Dosu detect duplicate issues, mark them, and point to existing resolutions, preventing the backlog from filling with redundant reports.
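The source does not describe how Dosu implements labeling or duplicate detection internally. Purely as an illustration of the general idea, here is a minimal sketch that assigns a label from keyword rules and flags a likely duplicate by word overlap (Jaccard similarity) with past issue titles. The keyword rules, issue numbers, titles, and threshold are all hypothetical, and this is not Dosu's method.

```python
import re
from typing import Optional

# Hypothetical keyword rules; not Dosu's actual logic.
LABEL_KEYWORDS = {
    "bug": ("error", "crash", "traceback", "fails"),
    "enhancement": ("feature", "support for", "would be nice"),
    "question": ("how do i", "how to", "why"),
}

# Hypothetical past issues (number -> title), for illustration only.
PAST_ISSUES = {
    101: "How to configure OCR for scanned PDF",
    102: "HTML section missing after conversion",
}


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def suggest_label(text: str) -> str:
    """Return the first label whose keywords appear; default to 'question'."""
    lowered = text.lower()
    for label, keywords in LABEL_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return label
    return "question"


def jaccard(a: set, b: set) -> float:
    """Share of words the two sets have in common (0.0 to 1.0)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_duplicate(new_title: str, past: dict, threshold: float = 0.5) -> Optional[int]:
    """Return the most similar past issue number, or None below the threshold."""
    new_tokens = tokenize(new_title)
    best_number, best_score = None, 0.0
    for number, title in past.items():
        score = jaccard(new_tokens, tokenize(title))
        if score > best_score:
            best_number, best_score = number, score
    return best_number if best_score >= threshold else None


new_issue = "Section of HTML missing after conversion"
print(suggest_label(new_issue), find_duplicate(new_issue, PAST_ISSUES))
```

Word overlap is a deliberately crude stand-in here. It catches near-identical titles but misses rephrasings, which is why a production assistant would need to draw on far more context, such as the full issue history and documentation mentioned above.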
Multi-Language Q&A Responses
For usage questions or known bugs, Dosu generates helpful replies that reference project documentation, relevant code, and prior issues. It handles multi-part technical errors and suggests fixes with impressive accuracy.
Early on, this capability proved itself when a user reported a cryptic runtime error while processing a 120-page PDF with picture descriptions enabled via a HuggingFace model. Within minutes, Dosu responded: "I'm Dosu and I'm helping the Docling team. This error is a known issue in Docling when using HuggingFace VLMs like SmolVLM for picture descriptions…" and then explained the cause.
In another case, a user complained that an entire section of their HTML document was missing after conversion. Dosu's reply not only pinpointed the likely reason (limitations in the HTML parser) but suggested a creative workaround: converting the HTML to PDF first and then running Docling on the PDF, which often yields better results for complex, nested structures. Dosu's ability to provide advice by combining product knowledge with pragmatic problem-solving showed the Docling team that Dosu could go beyond rote answers and actually help users succeed.
Because Docling's community is global, the team valued that Dosu can understand and respond in multiple languages.
Ecosystem Support
The Docling maintainers integrated Dosu's "Chat with Dosu" widget on their documentation site, allowing users to receive live assistance outside of GitHub. Dosu can answer questions interactively, guiding new users through Docling's setup and features in real time.
Dosu was initially set to require maintainer approval before posting any responses. Maintainers reviewed its draft answers and, in most cases, found them accurate and helpful enough to post as-is or with minor edits. Within just one week of testing, Dosu earned the trust of the Docling team by flagging duplicate issues and answering community questions. After this trust-building period, the team enabled Dosu to respond automatically to their issues.
The turning point came during a particularly busy day when the team was heads-down preparing a major release (coincidentally, the day IBM's Granite Docling model was announced). After a long coding session, they checked GitHub issues, expecting a pile of unanswered posts.
Instead, they found that Dosu had answered every single new issue that day, with users reacting positively. The maintainers knew then that Dosu had become essential in keeping up with their community's growth.
Dosu's Impact
Operational Efficiency
Dosu now automatically handles almost 70% of incoming issues, with project leads only engaging when human insight is required. This dramatic shift freed up hundreds of hours that maintainers previously spent on triage and basic support. The impact extended beyond new issues: Dosu's triage also helped close a significant backlog by identifying duplicates and answering older questions that had been sitting dormant for months.
Discoverable Knowledge
Because Dosu can read the entire project history and documentation, it crafts answers with context that maintainers may not immediately recall. It links to issues from months earlier and cites specific code commits, giving users comprehensive responses that would take a human significantly longer to assemble. This frees Docling's developers to focus on core development and complex problems, spending more time building new features (such as the upcoming structured data extraction module) and less on routine helpdesk tasks.
Community Self-Service
Dosu's presence has indirectly improved Docling's documentation and community knowledge base. As Dosu fields common questions, maintainers have identified gaps in the docs and created FAQ entries from real user inquiries. In some cases, Dosu itself suggests topics that require documentation updates, creating a feedback loop that makes the project more approachable for newcomers. The result is a more knowledgeable user community and fewer repetitive queries over time, as users can find answers more easily in the improved documentation.
Community
Happier, More Engaged Users
Docling's user community has responded very positively to the always-on support. Newcomers are delighted to receive prompt help, and many issues get resolved so quickly that users are unblocked and can continue building with Docling the same day, rather than waiting in frustration. Some community members have even started to interact with Dosu as if it were another team member, thanking it for solutions or asking follow-up questions in the thread. Dolfi noted, "Sometimes we found ourselves having fun reading the conversations," appreciating how users naturally engaged with Dosu to reach solutions. This dynamic was unexpected but welcome, as it shows that users feel served and heard.
Public Perception
By effectively managing issues, Docling sends a strong signal to its community that the project is well-maintained and responsive. Many open source projects suffer from "signal-to-noise" problems or languish with hundreds of open issues. Docling, by contrast, maintains a clean, actively-managed issue tracker that builds confidence among potential adopters evaluating the project for enterprise use.
Sustainable Community Growth
Thanks to Dosu, Docling has been able to scale its community support even as the user base doubled and tripled. The support experience remained smooth throughout this growth phase. The project's Discord and other channels also see less Q&A traffic, since Dosu often handles questions on GitHub before users reach out elsewhere. This has enabled the team to nurture a substantial user base with the warmth and attentiveness that would typically require a much larger team.
Community Insights
Working Smarter
The Docling case demonstrates that Dosu can integrate into a community without replacing the human touch. The maintainers still review complex issues and engage in design discussions, while Dosu handles the repetitive work. This augments the team's capacity and leaves more room for meaningful human interactions.
Sustainability at Scale
Reaching tens of thousands of users in a year would have been a concern for many OSS teams, often leading to burnout or abandoned issues. By embracing automation early, Docling built a sustainable workflow to support growth.
Knowledge Continuity
Docling's maintainers learned that Dosu can reference historical project context (commits, closed issues, design decisions) that a single team member may find challenging to recall. This is incredibly valuable for long-running projects. Dosu creates a rich knowledge base that new contributors and users can easily tap into, reducing the burden on project maintainers.
Looking Forward
With Dosu handling most day-to-day issues, the Docling team is excited about what comes next for the project. On the horizon for Docling is a push into more agentic capabilities, building systems that not only read documents but can dynamically generate and manipulate them. The maintainers are actively exploring on-the-fly structured content extraction and integrations with workflow automation (e.g., using Docling within AI agents and IBM's watsonx tools). These are challenging, forward-looking projects that will demand significant R&D and community coordination.
Docling plans to expand its usage of Dosu as new features become available. The team is particularly interested in Dosu's ability to classify issues, and in finding ways to encourage people to close tickets (or have Dosu close them) when a problem appears to have been resolved.
In short, the Docling team views Dosu as a long-term partner in its open source journey. As the project expands into new domains, the team expects Dosu to grow alongside it, learning new parts of the code, assisting with emerging question types, and continuing to lighten the maintainer workload.
Conclusion
Docling's first year has been a whirlwind, from zero to ~40,000 GitHub stars (as of October 2025), and from an internal prototype to a thriving open source community. The project remains as responsive and engaged as ever, and the maintainers can scale their documentation and community with confidence.
We are already advising other Linux Foundation projects to adopt Dosu as a maintainer assistant. We are sure they will not regret it.
— Michele Dolfi, Docling Maintainer
If you're working with unstructured data and need a robust solution, check out Docling. If you're an open source maintainer working with a growing user base (or hoping to grow one), check out Dosu! You can join our community on Discord if you want to chat with our team!


