Every year Wikimedia Ukraine brings the Ukrainian community together for its annual Wikiconference. This year’s edition took place in October and early November, drawing a record 120-plus people across two cities and online.

Because of Russia’s ongoing war against Ukraine, we decided to hold three smaller conferences in one to include as many participants as possible. Here’s a brief look at our biggest event of the year.

Collage of the conference’s group photos (image by multiple authors, in the public domain)

The format: three conferences in one

Since 2021, Wikimedia Ukraine has had a tradition of dividing our annual Wikiconference into several separate events, both online and offline. At first this was due to pandemic restrictions; now it is due to the ongoing Russian invasion of Ukraine.

Because of Russia’s war against Ukraine, millions of Ukrainians are abroad or unable to travel within Ukraine. Some are serving in the Armed Forces and can join online but cannot take leave to participate in person. However, we still want the conference to be inclusive of everyone.

Doing a truly hybrid event where online participants wouldn’t get an inferior experience is too difficult and expensive (note: live-streaming an offline event for people to tune in on YouTube is great, and we are also doing it, but it’s still an offline event). The solution for us is organizing a separate online-only event on a separate weekend, with its own program.

In 2024, our Wikiconference consisted of three events:

  • Online conference on October 19th-20th
  • The main offline conference in Kyiv on October 26th-27th
  • A smaller conference in Kharkiv on November 3rd

Both the online event and the Kyiv conference attracted over 60 people each, and the Kharkiv meetup gathered 25 people. Some people attended two or three events, but we had at least 124 confirmed unique participants overall (likely an undercount as we cannot capture all online participants).

Online conference: 60+ people, two days

The online edition took place on the weekend of October 19th and 20th. The conference featured 21 sessions, including 6 sessions from invited international speakers: our small “Wikimania at home”, as one organizing team member joked.

Participants rated the program highly, and the feedback form shows that the international program was particularly popular – especially two sessions about AI and Wikimedia from the Wikimedia Foundation’s Asaf Bartov and Wikimedia Polska’s Natalia Ćwik. Other highlights included a slate of sessions on the Ukrainian community during the war, interactive welcome sessions, and a review of Wikimedia Ukraine’s activities.

  • See full program in English
  • How to make an online conference engaging and tackle Zoom fatigue? It’s a big question for us, one that we’re hoping to tackle in a separate Diff post.
Image by Anton Protsiuk & Iryna Boiko, public domain

Kyiv conference: 60+ people from across Ukraine

A week later we gathered 63 people in Kyiv, Ukraine’s capital city. It was the biggest offline event of the year with people coming from all over Ukraine and even abroad. 

The two-day event included sessions on traditional topics like the Wikipedia Education Program, news and trends of the international wiki movement, practical workshops and project updates. We also had one major innovation – an award ceremony for “Wikipedia 20”, a newly created award to distinguish people who’ve helped build Ukrainian Wikipedia over the 20 years of its existence.

Although this event was offline-first, we also had a high-quality online stream on YouTube, which helped more people tune in virtually. 

Kharkiv conference: a meetup in a bomb shelter for 25 people

Despite being located close to the Russian border and suffering from constant Russian attacks, Ukraine’s second-largest city of Kharkiv has a vibrant Wikimedia community. On the first Sunday of November it gathered in a bomb shelter for a day-long conference for Wikimedians from the city and the surrounding region. 

Participants had a packed program, featuring everything from the experience of implementing a wikischool for high-school students held in Bohodukhiv last year to ideas for engaging young people in Wikipedia.

In the words of the Kharkiv event’s lead organizer, Wikiconference 2024 in Kharkiv showed the potential of community development in a frontline city and region, even under constant security threats.

Image by Serhii Bobok, CC BY-SA 4.0

This month, I had the opportunity to attend the first-ever Bangla Wikiconference, held in Gazipur, Bangladesh, from November 15 to 16, as both a participant and a member of the organizing team. I volunteered as part of the operations and communications team. The conference was organized by Wikimedia Bangladesh.

Group photo at Bangla Wikiconference. Photo: SHEIKH / CC BY-SA 4.0

Nearly 60 Bangla-speaking Wikimedians from South Asia (Bangladesh, India, and Nepal) gathered in Gazipur, including seven international participants.

I attended the event to connect with Wikimedians contributing to Bangla projects, to share and gather experiences, and to discuss new ideas, successes, and challenges related to Bangla projects. I presented two sessions, participated in a panel discussion, and awarded prizes for the Bangla Wikivoyage Article Contest 2024 at a side event during the conference. Additionally, I participated in the first Wikimedia hackathon in Bangladesh, where I developed and showcased a small project.

Panel Discussion with the Administrators of Bangla Wikipedia. Photo: Mayeenul Islam / CC BY-SA 4.0

During the panel discussion, I explored the uses, advantages, and disadvantages of Large Language Models (LLMs) on Bangla Wikipedia. Some participants asked insightful questions following my presentation. One of my sessions focused on the future of recent-changes patrolling, emphasizing the use of machine learning and artificial intelligence for anti-vandalism efforts and future tools. In another session, I teamed up with Maruf to discuss a GA review drive.

Audience listening to Mamunur Rashid. Photo: NahidHossain / CC BY-SA 4.0

At the conference, one participant reflected, “After attending this event, I found the answer to what the Wikimedia movement is and why I contribute to it.” Another participant, Mamunur Rashid, an AI researcher, remarked, “If you cut Wikipedia off from generative AIs, the AIs will go offline.” I learned a great deal from the knowledgeable discussions led by the speakers. I also attended a workshop on Wikidata Lexeme, discovering new tools I hadn’t encountered before.

Members of the organizing team. Photo: NahidHossain / CC BY-SA 4.0

As you can imagine, the two days of the conference were extremely busy. However, my busyness exceeded expectations! Each night after the main events, the organizers held meetings to review the day’s activities. Before dinner, we played football and basketball, followed by late-night chats. Most of our conversations centered around new ideas, the future of the Wikimedia movement, and various aspects of the Wikimedia ecosystem. Honestly, I felt I gained more insights and exchanged ideas more freely during these informal chats outside the main sessions.

Snack time. Photo: SHEIKH / CC BY-SA 4.0

This conference made me realize that the Bangla-speaking community is united, unbroken by borders. I left with renewed hope that the energy and dedication of our members can continue to support unity among Bangla speakers through volunteerism, knowledge sharing, and learning.

See more images of the event on Wikimedia Commons.

Booth “Wikipedia Exhibition” at Library Fair 2024

Wednesday, 20 November 2024 15:41 UTC

From November 5th to November 7th, 2024, the Library Fair 2024 was held in Yokohama, Kanagawa Prefecture, Japan, and the “Wikipedia Exhibition” booth was on display there.

What is the Library Fair?

The Library Fair is an exhibition for the library industry held every autumn. It takes place in the Pacifico Yokohama hall, and attracts approximately 10,000 visitors per day, for a total of 30,000 visitors over the three days.

Exhibitors include companies that undertake library management services, companies that sell library fixtures, publishers, bookstores, and university research groups. Visitors include librarians from public and school libraries, employees of various library-related companies, and students studying library and information science.

What is the “Wikipedia Exhibition”?

To spread awareness of the compatibility between libraries and Wikipedia among Library Fair participants, the “Wikipedia Exhibition” booth has been exhibited since 2022, led by Soseki-no-Neko. The booth is financially subsidized by the Wikimedia Foundation.

In addition to running a booth over the three days of the Library Fair, we also presented posters in the poster session. On the morning of November 6th, Yuriko Kadokura, author of the book “The 70-Year-Old Wikipedian,” gave a presentation titled “The Current State of the Wikimedia Movement” at the forum venue.

About 10 staff members were involved in managing the booth and broadcasting the forum via Zoom. The staff explained to visitors the basic structure of Wikipedia and how to edit it, introduced them to the Wikipedia Town events being held all over Japan, and even let them try editing Wikipedia.

Wikipedia exhibition booth

Many of the people who visited came across the Wikipedia exhibition booth by chance at the Library Fair. Many probably wondered, “Why is there a Wikipedia booth at a library exhibition?”

Wikipedia has a rule that “writing must be based on sources.” From Wikipedia’s perspective, editors rely on library materials when writing; from the library’s perspective, the materials it holds are put to use.

Because of this close relationship between libraries and Wikipedia, citizen-participation workshops called “Wikipedia Town” are held all over Japan. Wikipedia Town events typically involve editing local Wikipedia articles using materials from public libraries, but there are also edit-a-thons that specialize in subjects such as literature or art, or use materials from specialized libraries.

A variety of people visited our booth at the Wikipedia exhibition. Many of them worked in public or school libraries, but we also had people who worked at the National Diet Library, people from an encyclopedia publishing company, and people who teach library and information studies at universities.

When I talked to people who visited our booth, I got the sense that Wikipedia itself is well known, but most people still don’t realize that it can be used in more ways. So I think it was worth having a booth.

Wikipedia BUNGAKU 12 Abe Kobo

Wednesday, 20 November 2024 15:33 UTC

On Sunday, November 3, 2024, I participated in “Wikipedia BUNGAKU 12 Abe Kobo” held in Yokohama, Kanagawa Prefecture, Japan.

“Wikipedia Town” in Japan

Since 2013, “Wikipedia Town”, a citizen-participation workshop in which participants work on “Wikipedia” editing with “town” as the theme, has been held all over Japan.

Because it is an event that makes use of the local materials held by libraries and the reference skills of librarians, it won the Excellence Award at Library of the Year 2017. The short comment at the time of the award was “Co-creation activities of public information assets using local information resources”.

Bungaku means literature in Japanese. “Wikipedia BUNGAKU” is a literature-themed event derived from Wikipedia Town, which focuses on local themes.

“Abe Kobo Exhibition” at the Kanagawa Museum of Modern Literature

In the morning, I visited the special exhibition “Abe Kobo Exhibition” at the Kanagawa Museum of Modern Literature.

In Wikipedia BUNGAKU, participants are given the choice of which article to edit, and this time, the candidates for articles to be edited were people and regions that Abe Kobo was involved with, works he wrote, and projects he worked on.

Kanagawa Prefectural Library Wikipedia Editing Meeting

The venue for the afternoon was the multipurpose space on the 4th floor of the Kanagawa Prefectural Library.

Hundreds of books prepared by the executive committee were lined up. After receiving a lecture on Wikipedia from executive committee member and instructor Tago Tamaki, all participants declared the topic they wanted to edit and began editing.

There were various ways to participate, including a group of several people editing the existing article “Abe Kobo,” a group of experienced Wikipedians and first-time editors creating a new article “Monami (Higashinakano),” a person creating a new article “Tadeusz Kantor” alone, and a person participating remotely from Hokkaido creating a new article “Asahikawa City Chikabumi Daiichi Elementary School.”

Two librarians from Kanagawa Prefectural Library were on hand for the event, so even participants editing a topic that the executive committee had not anticipated could get reference help on site. In addition, many experienced Wikipedians took part, and the executive committee members also provided thorough support to first-time editors, making this an event that is easy for first-time editors to join.

Hopes for a ripple effect throughout Japan

There are many Good Articles (GA) on modern Japanese literature on the Japanese Wikipedia. However, the coverage of contemporary literature and foreign literature on the Japanese Wikipedia seems to be in its infancy.

A huge amount of literature on authors and literary works has been accumulated in various forms, including books, magazine articles, and academic papers. Wikipedia and literature have a very high affinity. I hope that projects similar to Wikipedia BUNGAKU will spread outside of Yokohama City.


Keywords: TAROCH Coalition

Wikimedia Australia is proud to announce its membership in the TAROCH Coalition, a global alliance dedicated to preserving, sharing and advocating for cultural heritage. By joining, we reaffirm our commitment to empowering communities to access, co-design, and contribute to celebrating and protecting cultural heritage. Wikimedia projects, including Wikimedia Commons, play a vital role in hosting and making public domain cultural heritage content accessible to all.  

The TAROCH Coalition, which stands for "Towards A Recommendation On Cultural Heritage," unites organisations and individuals passionate about humanity's diverse heritage. Its goal is to achieve the adoption of a UNESCO Recommendation on Open Cultural Heritage by 2029. This legal instrument will promote open solutions to remove barriers to accessing cultural heritage in the public domain, while respecting governance frameworks from local regions.

Joining the TAROCH Coalition aligns with our mission to empower communities across Australia and the wider ESEAP region to share knowledge and build connections across cultures. Wikimedia Australia can play a key role in international dialogue and be part of a national agenda advocating for the removal of barriers and the adoption of open access policies in and for the cultural heritage sector.

Through this partnership, Wikimedia Australia will:  

  • Support Coalition Goals: Advocate for a UNESCO Recommendation that recognises the essential role of cultural heritage in identity, education, and global understanding while addressing local and regional needs.
  • Champion Open Knowledge: Promote free and accessible information for all, ensuring cultural heritage is responsibly and ethically documented and shared.  
  • Collaborate with Stakeholders: Partner with cultural institutions, community leaders, and like-minded organisations to amplify and protect underrepresented voices in heritage conversations.  

We are excited to join other Wikimedia affiliates – including Wikimedia Indonesia, Wikimedia UK, and Wikimedia Deutschland – alongside significant organisations such as Creative Commons, Flickr, Communia, and the International Federation of Library Associations and Institutions (IFLA).

We look forward to contributing to the TAROCH Coalition's impactful work and invite our community and partners to support this vital initiative. Together, we can ensure cultural heritage remains accessible and celebrated for generations to come.  


Wikipedia:Administrators' newsletter/2024/12

Wednesday, 20 November 2024 11:23 UTC

News and updates for administrators from the past month (November 2024).

Administrator changes

Interface administrator changes

readded: Pppery

CheckUser changes

Guideline and policy news

Technical news

Arbitration

Miscellaneous



Episode 170: Stephen Harrison

Tuesday, 19 November 2024 22:36 UTC

🕑 1 hour

Stephen Harrison is a tech lawyer and journalist who has been writing about Wikipedia since 2018, including dozens of articles for the online magazine Slate as part of his Source Notes column. He is the author of the 2024 novel The Editors, which is about a group of editors of the fictional user-editable online encyclopedia "Infopendium" who are drawn together by dramatic events.


(Wikimedia CEE Meeting 2024 – Day 2, Müze Gazhane, Istanbul, Türkiye by Adem, CC BY-SA 4.0, Wikimedia Commons)

This year the CEE Meeting regional conference was larger than ever. Among 195 participants, there were many representatives of the Central Asian communities, including Kazakh, Karakalpak, Kyrgyz, Tajik and Uzbek language projects. We asked some of them what they took home from the conference and what their personal highlights were from this event.

Mustafaalmas

“I am from Kazakhstan and representing the Russian language community.  It was my first experience. It was absolutely great. I especially appreciate meeting colleagues from other Central Asian countries. I liked the session about using AI in the projects. I guess it may become a challenge and an opportunity for the community at the same time.”

Mamatkazy

“I am from the Kyrgyz Wikipedia language community. I mostly translate articles from Russian to Kyrgyz, and this year I was in the organizing team of the Wiki Loves Earth 2024 in Kyrgyzstan. As a civic activist, I translate articles on stereotyped topics into Kyrgyz, so that accurate and reliable knowledge becomes accessible to Kyrgyz-speaking people.

The opportunity to participate in this meeting, despite being a newbie to the Wikimedia movement, gave me the motivation to be more active. I’m very grateful that this meeting connected me with the global Wikimedia family and made me feel like a part of it. 

One of the most interesting sessions for me presented the results of a study conducted by the participants from Poland, “Youth – the end or the future of Wikimedia?”. The topic was why young people do not read or edit Wikipedia. Based on this, the Polish participants shared how they attract and engage young people.”

Darafsh

“I am part of the Tajik Wikimedians User Group and edit Wikimedia projects under user: Darafsh. This was my second time attending the CEE Meeting. The learning day, conducted by Asaf Bartov, was especially informative and useful for me. I gained valuable insights into capacity building and learned effective ways to facilitate discussions and meetings within our user group.

One key takeaway I’d like to share is the importance of patience in building a sustainable community. For small communities, such as the Tajik Wikimedians User Group, it’s crucial to spend time with each member, help them through challenges, gradually give them responsibilities, and guide them. This approach fosters a more sustainable and effective community.”

Muzaffar Turgunov

“I am a member of the Kazakh and the Uzbek Wikipedia groups. I have been editing for more than 2 years. I started first with the Uzbek Wikipedia. Now I write articles in both languages. My favourite subjects are medicine, politics and history.

At the CEE Meeting 2024, I had the opportunity to meet different people and learn new ideas. This is my first international conference. As for what I learned in this conference, it is mainly about the need to act in line with the times, the development of Wikipedia in the era of artificial intelligence, engaging new participants, creating and fostering Wikimedia communities, running a WikiClub in cooperation with each other, organizing edit-a-thons and competitions.”

Nataev

“It just so happened that I celebrated 15 years of being a Wikimedian during the conference in Istanbul. This was my second CEE meeting, and I absolutely loved it! Everything, from the program to the delicious food, was fantastic. I reconnected with old Wikimedia friends and met new fellow Wikimedians. Importantly, we held the first-ever meeting of representatives from the Karakalpak, Kazakh, Kyrgyz, Tajik, and Uzbek communities, where we exchanged ideas about learning from the CEE community and applying those lessons to foster regional cooperation in Central Asia.

I was glad to learn about both the dangers and opportunities that AI presents for wiki projects. My main takeaway is that AI is a double-edged sword: while we can harness it to accomplish many useful tasks, we must also stay vigilant about its potential to cause harm.

I really enjoyed the group facilitation session led by Asaf Bartov during the Learning Day, which took place before the main conference. The skills I learned will be useful not only for online and offline wiki activities but also for my professional career. I also found the AI workshop organized by John Cummings quite interesting. I also highly recommend checking out the recording of Marios Magioladitis’ session on AutoWikiBrowser, which is a tool that makes boring or repetitive editing tasks easier.

During the conference, Casual, a fellow Uzbek Wikimedian, and I presented on WikiStipendiya, a multi-year project led by the Uzbek Language User Group in partnership with the Youth Affairs Agency of Uzbekistan. We also discussed the challenges that a specific application of AI, namely machine translation, has created on the Uzbek Wikipedia, as well as how we have been addressing this issue. We greatly appreciate the feedback we received from those who attended our session, and we returned from the conference feeling even more motivated and energized to continue our important cleanup work.

For the future, I would love to see even more sessions on supporting and learning from small communities, as there are still quite a few of them in the CEE and CA region.”

Welcoming Tulu Wikisource!

Tuesday, 19 November 2024 17:21 UTC
Santhosh Notagar99, CC0, via Wikimedia Commons

30 October 2024 marked a significant landmark for the Tulu community with the launch of Tulu Wikisource, a platform dedicated to preserving and sharing its literary heritage. Tulu is a Dravidian language spoken by approximately 2.5 million people, predominantly along the coastal region of Karnataka, India. The Tulu language’s journey in the Wikimedia movement began in 2016 with the launch of Tulu Wikipedia. Since then, a group of volunteers also started working on Tulu Wiktionary in the Incubator, and it is now live as well. This achievement reflects years of dedication from Tulu speakers committed to making their language accessible in the digital age.

The journey towards Tulu Wikisource started in September 2023 when the Karavali Wikimedians User Group, led by organizer Kavitha G. Kana in collaboration with CIS-A2K, held a two-day workshop focused on relicensing and digitization. The relicensing by renowned author M. Prabhakara Joshi inspired the team. Technical support came from Prof. Sanjiv Bonde of Lek Ladki Abhiyan, Satara, and facilitation from Subodh Kulkarni of CIS-A2K.

Kavitha G. Kana, CC BY-SA 4.0

Soon after, Kavitha G. Kana participated in the 2023 Train the Trainer (TTT) program, where she actively engaged in discussions around collaboration with libraries, digitization, and relicensing published texts. Motivated by these learnings and backed by CIS-A2K, Kavitha continued her mission by organizing a Kannada Wikisource workshop in May 2024 under the She Leads campaign, sparking broader community interest in Wikisource.

Active engagement in creating a Wikisource in Tulu started a few months ago, with community members eager to learn how to digitize and share Tulu literature. In July 2024, CIS-A2K initiated and conducted a series of training sessions designed to introduce Wikisource processes to new volunteers and provide demonstrations on how to contribute effectively. These sessions offered hands-on experience and empowered community members to begin working on Tulu texts. 

Training Sessions and Community Involvement

A series of online training sessions was organized to lay the groundwork for Tulu Wikisource. Each session was designed to build essential skills and knowledge among participants, enabling them to contribute effectively. These sessions were supported by CIS-A2K’s Nitesh as coordinator, WMF staff member Satdeep Gill as a trainer, and community coordinator Kavitha G. Kana. The initial session introduced foundational topics, including the motivations of the Tulu community and the steps required to establish a Tulu Wikisource. The second session focused on updates from the community, tackling challenges, and utilising tools to track contributions. The third session delved into technical topics like translating system messages, creating Wikidata entries, and managing text transclusions.

Following the first training session, several volunteers, including Babitha Shetty, Vinoda mamatharai, Kavitha G. Kana, and ~aanzx, made significant contributions to the Tulu Wikisource project. In August, Shreelatha.Halemane stood out with an impressive 654 edits within the month, and her contributions have continued consistently since then. Many other dedicated editors actively contributed to multilingual Wikisource.

Community-driven training in local language

In addition to the above-mentioned sessions, Kavitha and active community members like Anoop (Kannada Wikisource admin) and Babitha Shetty organized local training in Tulu. These sessions, led by knowledgeable community members who had undergone prior training, provided an accessible and relatable learning environment. Anoop and Babitha Shetty, both deeply involved in the Karavali Wikimedians User Group, Kannada Wikisource, and other Wikimedia projects, played instrumental roles as trainers, addressing queries and offering personalized guidance to new volunteers and to Wikimedians who were unfamiliar with Wikisource editing.

Looking Ahead

Through this initiative, the Tulu language’s rich literary heritage will be preserved and disseminated in the digital realm, inviting new volunteers and ensuring free access for everyone. The birth of Tulu Wikisource is a testament to the collaborative spirit of the Tulu community and the support provided by CIS-A2K and WMF staff.

Here is what Kavitha G. Kana, who went on to become the founding admin of Tulu Wikisource, said about their plans for the future:

ತುಳು ವಿಕಿಮೀಡಿಯನ್ಸ್‌ನಕ್ಲೆಗ್  ವಿಕಿಸೋರ್ಸ್ ಕಲ್ತಿನೆಡ್‌ದ್ ಬೊಕ್ಕ ವಿಕಿಡೇಟಾ ಕಲ್ಪೆರೆ CIS-A2Kದ ಸಹಾಯ ಬೋಡು. ವಿಕಿಸೋರ್ಸ್‌ ಲೈವ್ ಆತಿನ ಪ್ರೋಗ್ರಾಂನ್ ಮಂಗಳೂರು ವಿಶ್ವವಿದ್ಯಾನಿಲಯಡ್ ಮಲ್ಫುಗಾಂದ್ ಉಂಡು. ತುಳು ಸಮುದಾಯದಕುಲೆಗ್ ಬರ್ಪಿನ ಆಜಿ ತಿಂಗೊಲುಡು ಒಂಜಿ ಸಾರ ಎಡಿಟ್ ಮಲ್ಪ್ಯೆರೆ ಉತ್ತೇಜನ ಮಲ್ಪುಗ ಅಂದ್ ಉಲ್ಲೆ.  Edu Wiki ಕಲ್ಫೊಡುಂದು ಉಂಡು. ಕುಡ್ಲಡ್ ಒಂಜಿ ಡಿಜಿಟೈಸೇಶನ್ ಸೆಂಟರ್ ಸೆಟಪ್ ಮಲ್ಪೊಡುಂದು ಉಂಡು. (ಇತ್ತೆನೆ ಎಣ್ಮ ಲೇಖಕೆ‌ರ್ನ ಬುಕುಲೆನ್ relicense ಮಲ್ತ್ ಆತ್‍ಂಡ್‌. ನಲ್ಪ ಪುಸ್ತಕಲೆನ್ digitization ಮಲ್ಪುನ ಕೆಲಸ ನಡತೊಂದು ಉಂಡು)

Tulu Wikimedians seek CIS-A2K’s support to learn Wikidata after mastering Wikisource. We are planning an in-person celebration at Mangalore University. I aspire to explore EduWiki and establish a digitization center in Mangalore, where eight authors’ books were relicensed, and 40 have been digitized.

WikiForAll is a series of training programs designed to train the Alumnae student chapter of iGiver in association with the Women In Tech collectives (WITc) India. iGiver is a not-for-profit organisation that helps women students from marginalised backgrounds pursue their skills in tech. As part of the Alumnae chapter initiative, iGiver and WITc have partnered with CIS-A2K to co-design, develop, research and implement training programs on Wikipedia and its sister projects.

The orientation program of WikiForAll was held on 20th October 2024 at Shri Krishnaswamy College for Women, and around 25 participants took part.

In the first phase of this program series (planned for the next 2-3 months), we will be introducing MediaWiki projects to the students. The program should help the students go beyond engaging with the Wiki platform: they will gain an understanding of web development languages and scripts. It should also help them improve how they learn fundamental programming languages and logical thinking.

We expect that this program will help the students build technical skills relevant to their careers and academics. Moreover, this is their first engagement with an Open Knowledge community. Their interest in volunteering to learn and contribute would strengthen the MediaWiki developer community in India.

My previous research with CIS-A2K, titled “Bridging the Gender gap: A report on Indian Language Wikimedia Communities”, includes an observation on outreach and training for women in technical aspects. This program therefore aims to address the skill and knowledge gap among women who lack access to resources.

The inaugural event included an orientation and introduction to Free and Open Source Software, Wikimedia projects and communities, and an explanation of how to participate in local Wiki events in India.

“WikiForAll is an initiative to bring together all the Alumnae to learn and contribute to the FOSS and Open Knowledge projects. This would help us learn how global initiatives work so that we can participate in an educational movement, and also share Open Knowledge. We also believe that this program will bring us a good exposure to many tools and programming languages such as JavaScript and Python, as well as GitHub and command line interfaces. This way it will help us in our career front and also to empower other fellow women in our family and surroundings to easily get introduced to learning new technologies. We look forward to this collaboration with the Wikimedia movement where we are keen to learn, educate, research and empower our community of first-gen of Women in Tech.” – Mahalakshmi, an alumna of the Vizhuthugal program by iGiver

We also discussed the challenges that the students face to learn the tech skills and the medium/platform that can work best for them to learn and contribute to FOSS projects. Mr. G. Srinivasan who is an experienced tech enthusiast also addressed the students and shared his insights about how students must choose their passion and follow them to aspire a career that they wish to pursue.

As a follow up to this event we are co-designing a curriculum and an interface along with the help of User:Ranjithsiji (contributor from Wikimedians of Kerala User Group) to mentor and help the students to contribute to MediaWiki projects. This being a recurrent event we are initially planning this for 2-3 months and the output will also include a research project on how to implement this prototype on a large scale.

Thanks to Mr. Raghav and Mrs. Vijaylakshmi from the iGiver team and also the management of the Shri Krishnaswamy College for Women, Chennai for bringing up this unique initiative to take forward FOSS to the suburbs.

Image credits: User:Reshmak0615, CC BY-SA 4.0

When it comes to rectifying the historical underrepresentation of Black and Brown women in various fields, we tend to focus on the achievement and visibility of women in the formal sector, corporations, governance, entertainment, and politics, among other fields.

However, indigenous women’s contributions and knowledge, particularly that of women in the informal sector and professions considered traditional, are often shoved into obscurity.

This year’s #VisibleWikiWomen campaign aims to acknowledge our place in history and the collectiveness of our struggles and resistance by highlighting images and stories of the work of our collective liberation.

However, how will history be complete without the inclusion of informal ways of knowing? What about the role and contributions of marginalized individuals based on race, gender, disability, and other factors?

Talk of black/brown women, persons with disability, and the aged.

All of these thought processes lingered while strategizing our approach for this year’s campaign.

As the lead for the #VisibleWikiWomen 2024 campaign in Ghana, my team and I initially planned to focus on curating festive images. However, we observed that even within the invisibility of marginalized black/brown women lies another layer that needs to be tackled to ensure that true visibility is achieved. As the Whose Knowledge? team puts it, nobody is free until everybody is free.

In addition to capturing festivals, we focused on marginalized women in Ghana, their critical role in social justice, and their efforts toward the local economy.

At the end of the campaign, we curated and uploaded over 500 quality images from three regions in Ghana.

Below are selected images from the curation that highlighted the role of Indigenous forms of knowledge and contributions from women who have committed to liberating their communities behind the scenes.

Amesuwo Kunugbe / CC BY-SA 4.0, by Pambelle12

Amesuwo Kunugbe, age 60, is a traditional birth attendant (TBA). She has offered free birth delivery services to several underprivileged women for over nine years in the Volta Region of Ghana. Traditional birth attendants are often based in rural and disadvantaged communities and assist women during pregnancy. They do not have formal training and practice based on skills passed down from a family member or acquired by working with other TBAs.

Esowoeka Bleku / CC BY-SA 4.0, by Pambelle12

84-year-old Esowoeka Bleku is also a traditional birth attendant (TBA) in Ghana. Over the last twenty years, she has offered free birth delivery services to women in the Volta region who cannot afford to deliver in a health facility, primarily due to financial constraints.

Emelia Okine / CC BY-SA 4.0, by esthee2010

During the protest against the Ghana Football Association to demand an end to corruption, we captured two incredible women: one physically challenged individual and Emelia Okine, a 75-year-old woman. Disability (disablism) and aging (ageism) both carry stigma. As such, individuals in these social groups are often marginalized and unrecognized for their contribution to their communities and nation-building. The two individuals mentioned above participated in the protest to advocate for better governance in the Ghana Football Association, thereby contributing to social change.

Enyonam, a single mom with five children, ventured into the coconut-selling business last year to enable her to care for her children. Her customer service skills have been her unique selling point and have kept her in a business considered the preserve of men.

Enyonam / CC BY-SA 4.0, by esthee2010

Aside from catering to her customers’ health needs by selling fresh coconut juice, her customer service has remained the main push for her growth. Will she ever be nominated for an award in customer service? Probably not, because she is in an informal business.

However, that does not diminish her contribution to her community. Her effort to ensure her children live dignified lives and her boost to the local economy are incredible feats!

Who is looking out for them (women like Enyonam)? How do we ensure that their efforts are not erased from history?

These are the questions that motivate us and keep us anticipating the #VisibleWikiWomen Campaign each year!

By Pamela Ofori-Boateng

Connect with #VisibleWikiWomen Ghana Team

Pamela Ofori-Boateng

Francis Quasie

Connect with WhoseKnowledge Team

Tech/News/2024/47

Tuesday, 19 November 2024 15:32 UTC

Latest tech news from the Wikimedia technical community. Please tell other users about these changes. Not all changes will affect you. Translations are available.

Updates for editors

  • Users of Wikimedia sites will now be warned when they create a redirect to a page that doesn’t exist. This will reduce the number of broken redirects to red links in our projects. [1]
  • View all 42 community-submitted tasks that were resolved last week. For example, Pywikibot, which automates work on MediaWiki sites, was upgraded to 9.5.0 on Toolforge. [2]
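
The redirect warning in the first item above can be pictured with a minimal sketch. This is not MediaWiki's implementation: the function name `redirect_warning` and the in-memory set of existing page titles are hypothetical, chosen only to illustrate the idea of checking a redirect target before saving.

```python
from __future__ import annotations


def redirect_warning(target: str, existing_titles: set[str]) -> str | None:
    """Return a warning if the redirect target is missing, else None.

    `existing_titles` is a hypothetical stand-in for the wiki's page table.
    """
    if target not in existing_titles:
        return f'Warning: redirect target "{target}" does not exist.'
    return None


# Hypothetical set of pages that already exist on the wiki.
pages = {"Kyiv", "Kharkiv"}

print(redirect_warning("Kyiv", pages))     # target exists, so no warning
print(redirect_warning("Odesa", pages))    # target missing, so a warning
```

Warning the editor at save time, rather than repairing the redirect later, is what keeps new broken redirects from being created in the first place.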

Updates for technical contributors

  • On wikis that use the FlaggedRevs extension, pages created or moved by users with the appropriate permissions are marked as flagged automatically. This feature has not been working recently, and changes fixing it should be deployed this week. Thanks to Daniel and Wargo for working on this. [3][4]

In depth

  • There is a new Diff post about Temporary Accounts, available in more than 15 languages. Read it to learn about what Temporary Accounts are, their impact on different groups of users, and the plan to introduce the change on all wikis.

Meetings and events

  • Technical volunteers can now register for the 2025 Wikimedia Hackathon, which will take place in Istanbul, Turkey. Applications for travel and accommodation scholarships are open from November 12 to December 10, 2024. Registration for the event will close in mid-April 2025. The Wikimedia Hackathon is an annual gathering that unites the global technical community to collaborate on existing projects and explore new ideas.
  • Join the Wikimedia Commons community calls this week to help prioritize support for Commons which will be planned for 2025–2026. The theme will be how content should be organised on Wikimedia Commons. This is an opportunity for volunteers who work on different things to come together and talk about what matters for the future of the project. The calls will take place November 21, 2024, 8:00 UTC and 16:00 UTC.
  • The next language community meeting will take place November 29 at 16:00 UTC to discuss updates and technical problem-solving.

Tech news prepared by Tech News writers and posted by bot.

The report “Open Movement’s Common(s) Causes” maps the current threats and opportunities facing the open movement, based on the ongoing work of the organisations behind the Common(s) Cause event, which took place in Katowice, Poland, as a pre-conference event for Wikimania 2024 on 6 August 2024.

The meeting was organised by Creative Commons, Open Knowledge Foundation, Open Future, and Wikimedia Europe in collaboration with the Wikimedia Foundation. The goal of the meeting was to create links between different advocacy efforts so that a shared advocacy strategy for the Knowledge Commons can be created.

One of the calls that jumped out for us was a call for defining new open principles – principles that could clarify what openness means in the context of today’s digital space and ensure its pro-public, democratic potential. Formulating such principles could help against several challenges, e.g. open washing.

Another clear call is the one confirming the assumptions behind the Common(s) Cause project: it is the call for a shared advocacy agenda, which could help ensure that Knowledge Commons are treated and sustained as critical digital infrastructures.

The event gathered over 55 participants from 20 countries, most of whom travelled to Katowice to attend the Wikimania conference. The majority of attendees were from open advocacy communities. The event enabled the organizers to build stronger working ties not only with one another, but also with the many other organisations represented at the event.

Participants acknowledged that the power of the open movement is only as strong as the bonds of the people working to advance an open, equitable agenda, and collective impact can only be achieved through individuals from different organisations working closely together.

The report identifies a few common causes that can be found at the intersection of open movement organisations’ strategies, the socio-technological zeitgeist, and current policy opportunities, such as: 

  1. (Re)defining openness in a new technological era.
  2. Creation of a shared advocacy strategy and enhanced regional and thematic cooperation across the organisations.
  3. Developing and testing governance approaches for our digital commons.
  4. Advancing openness and sustainability for the technology, data, content, and governance of Digital Public Infrastructure.

This report is a starting point and serves as an invitation to the wider open community to join these causes as well as to formulate their own, which could then be backed by other organisations. The next step in this process will be disseminating its findings, hopefully resulting in further backing and refinement of the causes and additional feedback from the wider community, which this small convening could not fully represent.

Read the full report

Wikipedia’s value in the age of generative AI

Tuesday, 19 November 2024 10:47 UTC

It may seem like a philosophical question, but today it is a very practical one, given the recent advances in generative artificial intelligence and large language models (LLMs). Because of the widespread use of generative AI technology, which is designed to predict and mimic human responses, it is now possible to create, almost effortlessly, text that looks as if it came from Wikipedia.

My answer to that question is simple: no, it would not be the same.

The process of freely creating knowledge, sharing it, and refining it over time, in public and with the help of hundreds of thousands of volunteers, is what has defined Wikipedia and the many other Wikimedia projects for 20 years. Wikipedia contains trustworthy, reliably sourced knowledge precisely because that content is created, debated, and curated by people. It is also grounded in an open, non-commercial model, which means that Wikipedia is free to access and share, and always will be. And in an internet flooded with machine-generated content, this means that Wikipedia is even more valuable.

In the past six months, dozens of LLMs that can read, summarize, and generate text have been released to the public, trained on vast data sets. Wikipedia is one of the largest open bodies of information on the internet, with versions in more than 300 languages. To date, every LLM is trained on Wikipedia content, and it is almost always the largest source of training data in their data sets.

An obvious thing to do with one of these new systems is to try to generate Wikipedia articles. Of course, people have already tried. And, as I am sure many readers have seen firsthand, these attempts reveal many challenges in using LLMs to produce what Wikipedians call knowledge: trustworthy, reliably sourced text and images in encyclopedic form. Some of these limitations include the following:

  • Currently, LLM outputs are not fact-checked, and there are already many known cases of people using generative AI to try to do their own jobs. There are countless low-stakes situations in which the outputs can be helpful without causing any harm, such as prompts for thank-you notes, plans for a fun vacation, or an outline to get started on an essay. In other situations, however, the results are not as good, as in the case where an LLM fabricated court cases and the lawyer who used those results in a courtroom ended up being fined. In another situation, a doctor demonstrated that a generative AI system gave poor diagnoses when analyzing the symptoms of patients seen in the emergency room. Over time, I believe these systems will get much better and become more reliable in a variety of contexts. One interesting possibility is that the demand for better sources will improve access to research and books online. But it will take time to get there, and probably significant pressure from regulators and the public for improvements that benefit everyone.
  • LLMs cannot draw on information that was not used in their training to respond to prompts. This means that all the books in the world that are not available in full online, research content that predates the internet, and information in languages other than English are not part of what a typical LLM “knows.” Consequently, the data sets used to train LLMs today can amplify existing inequities and biases in many areas, such as hiring, medicine, and criminal sentencing. Perhaps this will change one day, but we are a long way from being able to freely access and train LLMs on all the different kinds of information that people in every language currently use to write content for Wikipedia. And even then, more work will be needed to mitigate bias.
  • Finally, it has been shown that LLMs trained on the outputs of LLMs perform measurably worse and even forget things they once “knew,” a condition called “model collapse.” This means that for LLMs to be good and keep getting better, they will need a steady supply of original, human-written content, which makes Wikipedia and other sources of human-generated content even more valuable. It also means that generative AI companies around the world need to figure out how to keep sources of original human content, the most critical element of our information ecosystem, sustainable and growing over time.

These are just a few of the problems that need to be solved as internet users explore how LLMs can be used. We believe that internet users will place increasing value on reliable sources of information that have been vetted by people. Wikipedia’s policies and our more than a decade of experience using machine learning to support human volunteers offer valuable lessons about this future.

Principles for the use of generative AI

Machine-generated content and machine learning tools are not new to Wikipedia and the other Wikimedia projects. At the Wikimedia Foundation, we have developed machine learning and AI tools based on the same principles that have made Wikipedia such a useful resource for so many people: by centering human-led content moderation and human governance. We continue to experiment with new ways to meet people’s knowledge needs responsibly, including with generative AI platforms, aiming to bring human contribution and reciprocity to the forefront. Wikipedia editors are in control of all machine-generated content: they edit, improve, and audit any work done by AI, and they create policies and structures to govern the machine learning tools used to generate content for Wikipedia.

These principles can be a good starting point for the use of current and emerging LLMs. To begin with, LLMs should consider how their models support people in three main ways:

  1. Sustainability. Generative AI technology has the potential to negatively affect human motivation to create content. To preserve and encourage more people to contribute their knowledge to the commons, LLMs should seek to augment and support human participation in growing and creating knowledge. They should never impede or replace the human creation of knowledge. This can be achieved by always keeping humans in the loop and properly crediting their contributions. Continuing to support humans in sharing their knowledge is not only aligned with the strategic mission of the Wikimedia movement; it will also be needed to keep expanding our overall information ecosystem, which is what creates the up-to-date training data that LLMs depend on.
  2. Equity. At their best, LLMs can expand access to information and offer innovative ways to deliver information to knowledge seekers. To do so, these platforms need to build in checks and balances that do not reproduce information biases, widen knowledge gaps, perpetuate the erasure of traditionally excluded histories and perspectives, or contribute to human rights harms. LLMs should also consider how to identify, address, and correct biases in training data that can produce inaccurate and deeply inequitable results.
  3. Transparency. LLMs and their interfaces should allow humans to understand the source of model outputs and to verify and correct them. Greater transparency in how outputs are generated can help us understand and then mitigate harmful systemic biases. By allowing users of these systems to assess the causes and consequences of biases that may be present in the training data or in the outputs, creators and users can contribute to a better understanding and the thoughtful application of these tools.

A vision for a trustworthy future

Human contribution is an essential part of the internet. People are the engine that has driven the growth and expansion of the web, creating an incredible space for learning, business, and connecting with others.

Can generative AI replace Wikipedia? It can try, but that is a replacement no one really wants. There is nothing inevitable about new technologies. Instead, it is up to all of us to choose what matters most. We can prioritize human understanding and contribution of knowledge to the world, sustainably, equitably, and transparently, as a primary goal of generative AI systems, not as an afterthought. This would help mitigate the rise of disinformation and LLM hallucinations; ensure that human creativity is recognized for the knowledge it creates; and, most importantly, ensure that LLMs and people alike can continue to rely on an up-to-date, evolving, and trustworthy information ecosystem over the long term.

Selena Deckelmann is Chief Product and Technology Officer at the Wikimedia Foundation.

The post O valor da Wikipédia na era da IA generativa appeared first on Wikimedia Foundation.

In the last quarter, we expanded our existing partnerships with publishers, with extended terms and new collections.

  • The British Online Archives extended our access to a whole year after the initial pilot. Wikimedians can continue to access a wide range of primary source collections in humanities and social sciences from The British Online Archives. This collection is available to all eligible editors via the library bundle until June 2025. 
  • Springer Nature also extended our access for another year. We get continued access to their collections in science, technology, medicine, business, transport, architecture, and more.  
  • Cairn.Info has been providing access to scholarly materials in the humanities and social sciences, primarily in the French language. They have expanded The Wikipedia Library’s access to their newly released collections in law, science, technology, mathematics, and engineering. You can access Cairn.Info here.
  • Finally, we have moved Newspapers.com and Ancestry.com back to email-based access methods. The proxy-based access was brittle and prone to errors, so we worked with Ancestry.com, the parent group of Newspapers.com, to return to the older way of distributing access. Interested community members can apply here for access. More than 200 users have already been given access.

Over the last couple of months, we’ve also been taking The Wikipedia Library to Wikimedia community conferences and publishing events. 

  • In August, we conducted a workshop about The Wikipedia Library with Wikimedia Canada. The workshop gave an overview of the library and how the community can use it for their Wikimedia projects. If you would like to arrange a workshop or a training session about The Wikipedia Library, please let us know by sending an email to wikipedialibrary@wikimedia.org.
  • We delivered talks about The Wikipedia Library at Wikimania in Katowice (recording), WikiConference North America in Indianapolis, and WikiArabia in Muscat.
  • We attended the world’s largest trade fair for books, Frankfurt Book Fair 2024, to learn, network, and collaborate with the publishing industry. We met with more than 30 publishers, both strengthening our relationships with existing partners and engaging new partners to expand the library. We’ll be announcing new partners soon. 
  • We joined a London-based conference for product leaders in the publishing industry, addressing the topic of AI and the Future of Publishing: Predictions and Possibilities. There were animated discussions about both research integrity and the need for transparency and consent for training data.
  • We’ve also been invited to introduce the Wikimedia projects and movement at the Dubai International Library Conference, alongside the Wikimedians of the UAE User Group.

Sam Walton and Vipin SJ from The Wikipedia Library team at the Frankfurt Book Fair

The next Wikimedia+Libraries International Convention (WikiLibCon25) will be held in Mexico City on 15-17 January 2025, with the theme “Disinformation as a threat”. Registration is open now and we hope to see you there.

David-James Gonzales is an Assistant Professor of History at Brigham Young University and the host of New Books in Latino Studies. He is a historian of migration, urbanization, and social movements in the U.S., and specializes in Latina/o/x politics and social movements. 

I began teaching with the Wikipedia assignment in the spring of 2018. At the time, I sought an alternative to the standard term paper that had been, and likely remains, the staple of most college history courses. My motivation was to find an assignment that students would enjoy completing and that I would enjoy grading. Over my previous six years of university teaching, I developed a dread for grading term papers as it became apparent that most students either did not have the time or did not see the point in writing a well-researched argumentative paper. Moreover, I noticed that many of my students were developing bad habits in their rush to complete term papers, including committing to an argument before establishing a research question, cherry-picking sources that confirmed unfounded assumptions, and ignoring counterevidence. I desired an assignment that would reinforce the teaching of historical methodology and leverage the accessibility of the internet, allowing students to reach a broader audience, which I hoped would motivate them to take greater pride in their work.

David-James Gonzales. Image courtesy David-James Gonzales, all rights reserved.

After speaking with colleagues and searching the internet for ideas, I stumbled upon the Wiki Education website and found the Wikipedia assignment. Despite my lack of experience editing or authoring Wikipedia pages, I was drawn to the assignment because it facilitates experiential learning by requiring students to apply the knowledge acquired through course readings, lectures, and research to a public-facing project. In my US history survey course, for example, I use the Wikipedia assignment instead of a final paper to evaluate students’ ability to do the work of a historian by choosing a topic, developing a research question, selecting and evaluating sources, and writing a historical narrative. 

I also use the assignment to help students build social and professional skills applicable beyond the classroom. To promote peer collaboration in larger classes, I have students work in pairs. Admittedly, most groan when they hear this is a group project; however, by the end of the semester, they overwhelmingly express appreciation for their partner and the flexibility the assignment provides to capitalize on each person’s strengths. For example, those interested in computer programming and coding tend to enjoy learning about wikitext and the formatting aspects of the assignment. For others, conducting research, locating images, videos, and sound clips, or writing the text of the article is preferred. While I require them to work in pairs, students decide how to manage their workload by deciding who does what and evaluating each other’s performance at the end of the term.   

To facilitate student-teacher mentoring, I require students to meet with me throughout the semester to approve their topics and receive feedback on sources and drafts. These interactions help break down the reluctance and intimidation students feel towards interacting with authority figures and often lead to future opportunities to advise them about their degree progress, university resources, and career opportunities. To teach information and media literacy, I have students turn in an annotated bibliography halfway through the term. Although not a required part of the Wikipedia assignment, I find that it reinforces the dashboard’s trainings on evaluating sources according to the credibility of the author and publication. It also teaches students to pay as much attention, if not more, to the sources used in a publication as to the text itself.

I have used the Wikipedia assignment in thirteen courses over the past six years and have been thrilled by the results. Overall, my students have published 180 new articles, edited an additional 492 articles, and added 8,500 references to Wikipedia! Incredibly, their work has received over 13 million views as of spring 2024. But the best part is that my students admit they enjoy the assignment. 

Here are a few examples of what students appreciate about the Wikipedia assignment: 

“The Wikipedia project we had over the course of the semester was very effective in getting us all to participate in the learning process. It helped us to be more involved in research and in learning how to be historians.”

“I loved the Wikipedia project we worked on throughout the semester. We got to pick our own topic and I appreciated what it taught me about doing accurate historical research.”

“I loved the Wikipedia Assignment in this class and using our research skills to be able to put something useful out onto the internet.”

“The incorporation of making a Wikipedia article was the best way to actually be part of making and recording history.”

As reflected in the comments above, students relish the “hands-on” opportunity provided by the Wikipedia assignment to apply what they learn through a medium that allows them to create something that makes a public contribution beyond the classroom. And this is the primary reason why I continue to teach with Wikipedia; it encourages students to become more informed knowledge producers rather than passive consumers of information.


Interested in incorporating a Wikipedia assignment into your courses? Visit teach.wikiedu.org to learn more about the free resources, digital tools, and staff support that Wiki Education offers to postsecondary instructors in the United States and Canada. 

Wikipedia:Wikipedia Signpost/2024-11-18/Traffic report

Monday, 18 November 2024 00:00 UTC
Image: File:2024 US elections Donald Trump selection.jpg, by Oleg Yunakov, CC BY-SA 4.0

Well, let us share with you our knowledge, about the electoral college

This traffic report is adapted from the Top 25 Report, prepared with commentary by Igordebraga, Vestrian24Bio, and CAWylie (October 27 to November 2); and Igordebraga, Soulbust, Vestrian24Bio, and Rajan51 (November 3 to 9).

Oh, sweet mystery of life at last I've found you! (October 27 to November 2)

Rank | Article | Views | Notes/about
1 | Teri Garr | 1,355,055 | This American actress known for her comedic roles in film and television, such as Young Frankenstein, Tootsie, and playing the mother of Phoebe Buffay on Friends, died at the age of 79 last Tuesday after years fighting multiple sclerosis.
2 | 2024 Ballon d'Or | 1,273,764 | European champion Rodri was chosen by France Football as the best player of the season. Debates soon started over whether Vinícius Júnior, who was also European champion, would've been a more deserving winner.
3 | Rodney Alcala | 1,258,084 | Netflix brought attention to this reprehensible man who killed and assaulted at least 8 women (some of them minors), was sentenced to death, and died of natural causes after decades in prison. The distinction that got Alcala's story told in a movie, Woman of the Hour, is that in the middle of his killing spree he appeared on a matchmaking TV show and won a date, though the woman declined to go out with him and thus escaped a grisly fate.
4 | 2024 United States presidential election | 1,234,532 | At least it's over? I'll be catching up on sleep now. Next week's Report will have a lot to discuss on this.
5 | Tony Hinchcliffe | 1,121,021 | The 2024 Trump rally at Madison Square Garden (which was compared by the opposition's potential VP to the 1939 Nazi rally at Madison Square Garden, proving Godwin's law is alive and well) had a set by this comedian, to which the reaction wasn't pretty; Hinchcliffe's description of Puerto Rico as a "floating island of garbage" in particular drew much criticism.
6 | Rúben Amorim | 1,110,284 | Manchester United hired this Portuguese coach, who has just managed Sporting CP to a national title.
7 | Liam Payne | 1,069,395 | Two weeks after this musician's shocking death in a fall from a hotel balcony at just 31, readers want to learn if the Argentinian police have discovered more on what happened that night.
8 | Diwali | 1,053,976 | The Hindu festival of lights, symbolising the spiritual victory of Dharma over Adharma, light over darkness, good over evil, and knowledge over ignorance, is celebrated annually on Kartik Amavasya as per the Hindu lunisolar calendar, which usually falls between the second half of October and the first half of November.
9 | Deaths in 2024 | 1,005,464 | "From that fateful day when stinking bits of slime first crawled from the sea and shouted to the cold stars, 'I am man!', our greatest dread has always been the knowledge of our mortality."
10 | Freddie Freeman | 988,883 | As the Los Angeles Dodgers won their eighth MLB title, the World Series Most Valuable Player Award went to this first baseman, who had home runs in the first four games, including a walk-off grand slam in the first. Adding the 2021 finals that Freeman won with the Atlanta Braves, he had home runs in six consecutive World Series games.

For this could be the biggest sky, and I could have the faintest idea (November 3 to 9)

Rank Article Class Views Image Notes/about
1 2024 United States presidential election 9,045,895 U.S. election between Democrat Harris (#4) and Republican Trump (#3), who won both the Electoral College and the popular vote.
2 2020 United States presidential election 6,934,170 Previous U.S. election, between then-incumbent Trump (#3) and successful Democratic challenger Joe Biden.
3 Donald Trump 5,268,623 Republican elected as the 47th U.S. President, after emerging victorious in #1 against #5. He became the second President to win non-consecutive elections, after Grover Cleveland (1884 and 1892).
4 2016 United States presidential election 3,477,149 The erelast election, in which Trump (#3) defeated Democratic candidate Hillary Clinton.
5 Kamala Harris 3,378,730 Lost the 2024 U.S. presidential election (#1). Lots can be said about the defeat.
6 Susie Wiles 2,428,992 After leading #3 to two successful elections, this political consultant will become the first female White House Chief of Staff.
7 JD Vance 2,243,627 Recently elected Vice President, i.e. #2 to this week's #3.
8 Quincy Jones 1,747,761 One of the greatest music producers of all time, whose work included the best-selling album ever and the Austin Powers theme, and who also had a hand in television by helping make shows like The Fresh Prince of Bel-Air and Mad TV, died on November 3 at the age of 91. Former Presidents Clinton and Obama, as well as President Biden and VP Harris all paid their tributes.
9 Project 2025 1,736,612 To sum up the general reaction to this conservative plan for reforms, let's quote someone who didn't live to see #2:

I'm Afraid of Americans
I'm afraid of the world
I'm afraid I can't help it...

10 2024 United States elections 1,692,891 In addition to the presidential election (#1), the U.S. also saw elections in the Senate and House of Representatives, as well as gubernatorial and legislative elections.

Exclusions

  • These lists exclude the Wikipedia main page, non-article pages (such as redlinks), and anomalous entries (such as DDoS attacks or likely automated views). Since mobile view data became available to the Report in October 2014, we exclude articles that have almost no mobile views (5–6% or less) or almost all mobile views (94–95% or more) because they are very likely to be automated views based on our experience and research of the issue. Please feel free to discuss any removal on the Top 25 Report talk page if you wish.
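As a rough illustration of the mobile-share rule described above, the following Python sketch flags entries whose share of mobile views is suspiciously low or high. The cut-offs (6% and 94%), the function name and the sample view counts are assumptions chosen for illustration; this is not the Report's actual tooling.

```python
def is_likely_automated(mobile_views: int, total_views: int,
                        low: float = 0.06, high: float = 0.94) -> bool:
    """Return True when the mobile share is suspiciously low or high.

    The 6% / 94% cut-offs are assumed values taken from the ranges quoted above.
    """
    if total_views <= 0:
        return True  # nothing to measure, so treat the entry as anomalous
    share = mobile_views / total_views
    return share <= low or share >= high


# Hypothetical (article, mobile views, total views) rows, not real data.
sample = [
    ("Article A", 600_000, 1_000_000),   # 60% mobile: keep
    ("Article B", 20_000, 1_000_000),    # 2% mobile: exclude
    ("Article C", 990_000, 1_000_000),   # 99% mobile: exclude
]
kept = [title for title, mobile, total in sample
        if not is_likely_automated(mobile, total)]
print(kept)
```

Keeping the thresholds as parameters reflects that the text gives ranges rather than a single fixed value.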

Most edited articles

For the October 11 – November 11 period, per this database report.

Title Revisions Notes
Deaths in 2024 2084 Among the obituary's inclusions in the period, along with the three listed above, were Baba Siddique, Mitzi Gaynor, Paul Di'Anno and Tony Todd.
2024 United States presidential election 1675 We are citizens of this land
And we're here to lend a hand
We come together and we vote
Because we're all in the same boat...
Timeline of the Israel–Hamas war (27 September 2024 – present) 1600 The pain experienced in the Gaza Strip doesn't seem to end, and has extended to the West Bank and Lebanon.
2024 Maharashtra Legislative Assembly election 1332 A few months after choosing their federal representatives, India voted on their state assemblies. Maharashtra, the country's second most populous province (which houses their biggest city Mumbai), mostly went for the Bharatiya Janata Party that already rules the country.
Chromakopia 1242 One week after the single "Noid", Tyler, the Creator released his eighth album to critical acclaim, and it quickly became the most successful rap album of the year (its first day on Spotify alone is one of the 20 biggest).
Tropical Storm Trami (2024) 1170 The Philippines were ravaged by this cyclone (that caused lesser damage once it reached Vietnam and Thailand), with 178 deaths, 23 people reported missing, 151 others injured, and US$374 million in damages.
2024 World Series 1108 Major League Baseball came down to the biggest cities of the United States, and the New York Yankees' win in game 4 only delayed the title for the Los Angeles Dodgers. As mentioned above, the MVP was Freddie Freeman, and the Japanese designated hitter nicknamed "Shotime" justified the Dodgers paying him a record contract of $700 million over 10 years by helping them to a World Series title right in his first season with the team.
2024 Pacific typhoon season 928 Tropical cyclones form between June and November, so lots of storms to cover. The strongest were Milton and Helene in the Atlantic, and Yagi and Krathon in the Pacific.
2024 Atlantic hurricane season 905
Israel–Hamas war 887 Ever since Israel went to war with Hamas, its other enemy Hezbollah has taken the opportunity for attacks of its own. Israel eventually decided to extend its war on Palestine to Lebanon, with exploding pagers, an air strike on the Hezbollah headquarters and ultimately a ground invasion. The international community just can't wait for the ceasefires.
Timeline of the Israel–Hezbollah conflict (17 September 2024 – present) 883
Liam Payne 811 The One Direction member went to Buenos Aires to solve O visa problems that would prevent him from going to his girlfriend's home in Miami, and, while there, to watch a concert by former bandmate Niall Horan. Two weeks later he fell to his death from his hotel room. Lots of edits were made with updates on the investigation, and apparently he fainted on the balcony after a night of drugs.
Donald Trump 773 And can you hear the sound of hysteria?
The subliminal mind Trump America...
2024 Jharkhand Legislative Assembly election 770 Another of India's State Assembly elections, namely for Jharkhand. The BJP were tied for the most seats with the Jharkhand Mukti Morcha.
Bigg Boss (Hindi TV series) season 18 769 One of the Indian versions of Big Brother.

Wikipedia:Wikipedia Signpost/2024-11-18/Recent research

Monday, 18 November 2024 00:00 UTC
File:SPINACH (SPARQL-Based Information Navigation for Challenging Real-World Questions) logo.png
Liu, Shicheng; Semnani, Sina; Triedman, Harold; Xu, Jialiang; Zhao, Isaac Dan; Lam, Monica
CC BY 4.0
Recent research

SPINACH: AI help for asking Wikidata "challenging real-world questions"


A monthly overview of recent academic research about Wikipedia and other Wikimedia projects, also published as the Wikimedia Research Newsletter.

"SPINACH": LLM-based tool to translate "challenging real-world questions" into Wikidata SPARQL queries

SPINACH's logo or custom emoji (from the paper's title, which we regret not being able to reproduce faithfully here)

A paper[1] presented at last week's EMNLP conference reports on a promising new AI-based tool (available at https://spinach.genie.stanford.edu/ ) to retrieve information from Wikidata using natural language questions. It can successfully answer complicated questions like the following:

"What are the musical instruments played by people who are affiliated with the University of Washington School of Music and have been educated at the University of Washington, and how many people play each instrument?"

The authors note that "Wikidata is one of the largest publicly available knowledge bases [and] currently contains 15 billion facts", and claim that it is of significant value to many scientific communities. However, they observe that "Effective access to Wikidata data can be challenging, requiring use of the SPARQL query language."

This motivates the use of large language models to convert natural language questions into SPARQL queries, which could obviously be of great value to non-technical users. The paper is far from being the first such attempt (see also below for a more narrowly tailored effort). In fact, some of its authors (including Monica S. Lam and members of her group at Stanford) had already built such a system – "WikiSP" – themselves last year, obtained by fine-tuning an LLM; see our review: "Fine-tuned LLMs Know More, Hallucinate Less with Few-Shot Sequence-to-Sequence Semantic Parsing over Wikidata". (Readers of this column may also recall coverage of Wikipedia-related publications out of Lam's group, see "STORM: AI agents role-play as 'Wikipedia editors' and 'experts' to create Wikipedia-like articles" and "WikiChat, 'the first few-shot LLM-based chatbot that almost never hallucinates'" – a paper that received the Wikimedia Foundation's "Research Award of the Year".)

The SPINACH dataset

More generally, this kind of task is called "Knowledge Base Question Answering" (KBQA). The authors observe that many benchmarks have been published for it over the last decade, and that recently, the KBQA community has shifted toward using Wikidata as the underlying knowledge base for KBQA datasets. However, they criticize those existing benchmarks as "either contain[ing] only simple questions [...] or synthetically generated complex logical forms that are not representative enough of real-world queries." To remedy this, they

introduce the SPINACH dataset, an expert-annotated KBQA dataset collected from forum discussions on Wikidata's "Request a Query" forum with 320 decontextualized question-SPARQL pairs. Much more complex than existing datasets, SPINACH calls for strong KBQA systems that do not rely on training data to learn the KB schema, but can dynamically explore large and often incomplete schemas and reason about them.

In more detail, the researchers scraped the "Request a Query" forum's archive from 2016 up to May 2024, obtaining 2780 discussions that had resulted in a valid SPARQL query, which were then filtered by various criteria and sampled to a subset of 920 conversations spanning many domains for consideration. Those were then further winnowed down with a focus on end-users rather than Wikipedia and Wikidata contributors interested in obscure optimizations or formatting. The remaining conversations were manually annotated with a self-contained, decontextualized natural language question that accurately captures the meaning of the user-written SPARQL. These steps include disambiguation of terms in the question as originally asked in the forum ("For example, instead of asking 'where a movie takes place', we distinguish between the 'narrative location' and the 'filming location'", thus avoiding an example that had confused the authors' own WikiSP system). This might be regarded as attaching training wheels, i.e. artificially making the task a little bit easier. However, another step goes in the other direction, by "refrain[ing] from directly using [Wikidata's] entity and property names, instead using a more natural way to express the meaning. For instance, instead of asking 'what is the point of time of the goal?', a more natural question with the same level of accuracy like 'when does the goal take place?' should be used."

The SPINACH agent

The paper's second contribution is an LLM-based system, also called "SPINACH", that on the authors' own dataset outperforms all baselines, including the best GPT-4-based KBQA agent by a large margin, and also "achiev[es] a new state of the art on several existing KBQA benchmarks", although it narrowly remains behind the aforementioned WikiSP model on the WikiWebQuestions dataset (both also out of Lam's lab).

"unlike prior work, we design SPINACH with the primary goal of mimicking a human expert writing a SPARQL query. An expert starts by writing simple queries and looking up Wikidata entity or property pages when needed, all to understand the structure of the knowledge graph and what connections exist. This is especially important for Wikidata due to its anomalous structure (Shenoy et al., 2022). An expert then might add new SPARQL clauses to build towards the final SPARQL, checking their work along the way by executing intermediate queries and eyeballing the results."

This agent is given several tools to use, namely

  • searching Wikidata for the QID for a string (like a human user would using the search box on the Wikidata site). This addresses an issue that thwarts many naive attempts to use e.g. ChatGPT directly for generating SPARQL queries, which the aforementioned WikiSP paper already pointed out last year: "While zero-shot LLMs [e.g. ChatGPT] can generate SPARQL queries for the easiest and most common questions, they do not know all the PIDs and QIDs [property and item IDs in Wikidata]."
  • retrieving the Wikidata entry for a QID (i.e. all the information on its Wikidata page)
  • retrieving a few examples demonstrating the use of the specified property in Wikidata
  • running a SPARQL query on the Wikidata Query Service
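The first and last of these tools map onto public Wikidata endpoints. The sketch below only builds the request URLs, assuming the `wbsearchentities` action of the Wikidata API and the Wikidata Query Service SPARQL endpoint; the function names are hypothetical, and this is not the paper's implementation.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def search_url(text: str, entity_type: str = "item", limit: int = 8) -> str:
    """URL for looking up candidate QIDs (or PIDs with entity_type='property')."""
    params = {
        "action": "wbsearchentities",
        "search": text,
        "language": "en",
        "type": entity_type,
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"


def sparql_url(query: str) -> str:
    """URL for running a SPARQL query and asking for JSON results."""
    return f"{WDQS_ENDPOINT}?{urlencode({'query': query, 'format': 'json'})}"


print(search_url("University of Washington"))
# wdt:P31 wd:Q5 selects items that are instances of "human".
print(sparql_url("SELECT ?item WHERE { ?item wdt:P31 wd:Q5 } LIMIT 3"))
```

Fetching either URL (ideally with a descriptive User-Agent header) should return JSON that an agent could then condense before reading it.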

The authors note that "Importantly, the results of the execution of each action are put in a human-readable format to make it easier for the LLM to process. To limit the amount of information that the agent has to process, we limit the output of search results to at most 8 entities and 4 properties, and limit large results of SPARQL queries to the first and last 5 rows." That LLMs and humans have similar problems reading through copious Wikidata query results is a somewhat intriguing observation, considering that Wikidata was conceived as a machine-readable knowledge repository. (In an apparent effort to address the low usage of Wikidata in today's AI systems, Wikimedia Deutschland recently announced "a project to simplify access to the open data in Wikidata for AI applications" by "transformation of Wikidata’s data into semantic vectors.")
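The output limits quoted above can be sketched in a few lines of Python. The function names and the wording of the "rows omitted" marker are assumptions for illustration; only the numeric limits (8 entities, 4 properties, first and last 5 rows) come from the paper as quoted.

```python
def truncate_rows(rows: list, keep: int = 5) -> list:
    """Keep the first and last `keep` rows, marking the omitted middle."""
    if len(rows) <= 2 * keep:
        return list(rows)
    omitted = len(rows) - 2 * keep
    # The marker text is a hypothetical choice, not taken from the paper.
    return rows[:keep] + [f"... ({omitted} rows omitted) ..."] + rows[-keep:]


def truncate_search(entities: list, properties: list,
                    max_entities: int = 8, max_properties: int = 4) -> tuple:
    """Cap search output at 8 entities and 4 properties."""
    return entities[:max_entities], properties[:max_properties]


rows = [f"row {i}" for i in range(1, 21)]  # 20 hypothetical result rows
for line in truncate_rows(rows):
    print(line)
```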

The SPINACH system uses the popular ReAct (Reasoning and Acting) framework for LLM agents,[supp 1] where the model alternates between reasoning about its task (e.g. "It seems like there is an issue with the QID I used for the University of Washington. I should search for the correct QID") and acting (e.g. using its search tool: search_wikidata("University of Washington")).

The generation of these thought + action pairs in each turn is driven by an agent policy prompt

that only includes high-level instructions such as "start by constructing very simple queries and gradually build towards the complete query" and "confirm all your assumptions about the structure of Wikidata before proceeding" [...]. The decision of selecting the action at each time step is left to the LLM.

Successfully answering a question with a correct SPARQL query can require numerous turns. The researchers limit these by providing the agents with a budget of 15 actions to take, and an extra 15 actions to spend on [...] "rollbacks" of such actions. Even so, "Since SPINACH agent makes multiple LLM calls for each question, its latency and cost are higher compared to simpler systems. [...] This seems to be the price for a more accurate KBQA system."
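To make the thought/action loop and its budget concrete, here is a minimal Python sketch. It assumes a scripted stand-in for the LLM policy and a stub tool; the "stop" and "rollback" action names and the exact way rollbacks are counted are hypothetical simplifications, not the paper's design. Only the `search_wikidata` name and the two budgets of 15 come from the text above.

```python
def run_agent(policy, tools, max_actions=15, max_rollbacks=15):
    """Alternate thought -> action -> observation until 'stop' or the budget runs out."""
    history = []  # (thought, action, argument, observation) tuples
    actions_used = rollbacks_used = 0
    while actions_used < max_actions:
        thought, action, arg = policy(history)
        if action == "stop":
            return arg, history
        if action == "rollback":
            if rollbacks_used >= max_rollbacks or not history:
                break
            history.pop()  # discard the most recent step
            rollbacks_used += 1
            continue
        observation = tools[action](arg)
        history.append((thought, action, arg, observation))
        actions_used += 1
    return None, history  # budget exhausted without a final answer


def scripted_policy(history):
    """Stand-in for the LLM: replays a fixed script based on the history length."""
    script = [
        ("I need the QID for the university.", "search_wikidata", "University of Washington"),
        ("I have what I need.", "stop", "done"),
    ]
    return script[min(len(history), len(script) - 1)]


tools = {"search_wikidata": lambda text: f"(stub) search results for {text!r}"}
answer, history = run_agent(scripted_policy, tools)
print(answer, len(history))
```

In a real agent, the policy would be an LLM call that sees the high-level prompt plus the history, and each observation would be the condensed tool output.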

Still, for the time being, an instance is available for free at https://spinach.genie.stanford.edu/ , and also on-wiki as a bot (operated by one of the authors, a – now former – Wikimedia Foundation employee), which has already answered about 30 user queries since its introduction some months ago.

Example from the paper: "The sequence of 13 actions that the SPINACH agent takes to answer a sample question from the SPINACH validation set. Here, the agent goes through several distinct phases, only with the high-level instruction [prompt]. Note that every step includes a thought, action and observation, but some are omitted here for brevity."

Briefly

Other recent publications

Other recent publications that could not be covered in time for this issue include the items listed below. Contributions, whether reviewing or summarizing newly published research, are always welcome.

"SPARQL Generation: an analysis on fine-tuning OpenLLaMA for Question Answering over a Life Science Knowledge Graph"

From the abstract:[2]

"we evaluate several strategies for fine-tuning the OpenLlama LLM for question answering over life science knowledge graphs. In particular, we propose an end-to-end data augmentation approach for extending a set of existing queries over a given knowledge graph towards a larger dataset of semantically enriched question-to-SPARQL query pairs, enabling fine-tuning even for datasets where these pairs are scarce."

From the paper:

"Recently, the benchmark dataset so-called [sic] KQA Pro was released [...]. It is a large-scale dataset for complex question answering over a dense subset of the Wikidata1 KB. [...] Although Wikidata is not a domain specific KB, it contains relevant life science data."
"We augment an existing catalog of representative questions over a given knowledge graph and fine-tune OpenLlama in two steps: We first fine-tune the base model using the KQA Pro dataset over Wikidata. Next, we further fine-tune the resulting model using the extended set of questions and queries over the target knowledge graph. Finally, we obtain a system for Question Answering over Knowledge Graphs (KGQA) which translates natural language user questions into their corresponding SPARQL queries over the target KG."

A small number of "culprits" cause over 10 million "Disjointness Violations in Wikidata"

This preprint identifies 51 pairs of classes on Wikidata that should be disjoint (e.g. "natural object" vs. "artificial object") but aren't, with over 10 million violations, caused by a small number of "culprits". From the abstract:[3]

"Disjointness checks are among the most important constraint checks in a knowledge base and can be used to help detect and correct incorrect statements and internal contradictions. [...] Because of both its size and construction, Wikidata contains many incorrect statements and internal contradictions. We analyze the current modeling of disjointness on Wikidata, identify patterns that cause these disjointness violations and categorize them. We use SPARQL queries to identify each 'culprit' causing a disjointness violation and lay out formulas to identify and fix conflicting information. We finally discuss how disjointness information could be better modeled and expanded in Wikidata in the future."

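The idea of a disjointness check can be illustrated on a toy class hierarchy. The Python sketch below works on a small in-memory graph with hypothetical items and classes (only the "natural object" vs. "artificial object" pair is taken from the text above); the preprint itself works with SPARQL queries over Wikidata, so this is an analogy rather than the authors' method.

```python
# Toy data, hypothetical and far smaller than Wikidata.
subclass_of = {
    "sculpture": {"artificial object"},
    "rock": {"natural object"},
    "carved rock": {"sculpture", "rock"},  # the modelling "culprit"
}
instance_of = {
    "item 1": {"carved rock"},
    "item 2": {"rock"},
}
disjoint_pairs = [("natural object", "artificial object")]


def ancestors(cls: str) -> set:
    """All classes reachable from `cls` via subclass-of, including itself."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(subclass_of.get(current, ()))
    return seen


def violations() -> list:
    """Items whose classes fall under both sides of a disjoint pair."""
    found = []
    for item, classes in sorted(instance_of.items()):
        all_classes = set().union(*(ancestors(c) for c in classes))
        for a, b in disjoint_pairs:
            if a in all_classes and b in all_classes:
                found.append((item, a, b))
    return found


print(violations())
```

Here a single class ("carved rock") placed under both branches makes every one of its instances a violation, which mirrors how a few culprits can account for a very large number of violations.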

"Automatic Quality Assessment of Wikipedia Articles - A Systematic Literature Review"

From the abstract:[4]

"We review existing methods for automatically measuring the quality of Wikipedia articles, identifying and comparing machine learning algorithms, article features, quality metrics, and used datasets, examining 149 distinct studies, and exploring commonalities and gaps in them. The literature is extensive, and the approaches follow past technological trends. However, machine learning is still not widely used by Wikipedia, and we hope that our analysis helps future researchers change that reality."

References

  1. ^ Liu, Shicheng; Semnani, Sina; Triedman, Harold; Xu, Jialiang; Zhao, Isaac Dan; Lam, Monica (November 2024). "SPINACH: SPARQL-Based Information Navigation for Challenging Real-World Questions". In Yaser Al-Onaizan; Mohit Bansal; Yun-Nung Chen (eds.). Findings of the Association for Computational Linguistics: EMNLP 2024. Findings 2024. Miami, Florida, USA: Association for Computational Linguistics. pp. 15977–16001. Data and code Online tool
  2. ^ Rangel, Julio C.; de Farias, Tarcisio Mendes; Sima, Ana Claudia; Kobayashi, Norio (2024-02-07), SPARQL Generation: an analysis on fine-tuning OpenLLaMA for Question Answering over a Life Science Knowledge Graph, arXiv, doi:10.48550/arXiv.2402.04627 (accepted submission at SWAT4HCLS 2024: The 15th International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences)
  3. ^ Doğan, Ege Atacan; Patel-Schneider, Peter F. (2024-10-17), Disjointness Violations in Wikidata, arXiv, doi:10.48550/arXiv.2410.13707
  4. ^ Moás, Pedro Miguel; Lopes, Carla Teixeira (2023-09-22). "Automatic Quality Assessment of Wikipedia Articles - A Systematic Literature Review". ACM Computing Surveys. doi:10.1145/3625286. ISSN 0360-0300.
Supplementary references and notes:
  1. ^ Yao, Shunyu; Zhao, Jeffrey; Yu, Dian; Du, Nan; Shafran, Izhak; Narasimhan, Karthik; Cao, Yuan (2023-03-09), ReAct: Synergizing Reasoning and Acting in Language Models, doi:10.48550/arXiv.2210.03629


File:Institute_Dendrology_-_3.jpg
Fira Guli
CC BY-SA 4.0
News from the WMF

Wikimedia Foundation and Wikimedia Endowment audit reports: FY 2023–2024

Elena Lappen is the Wikimedia Foundation's Movement Communications Manager; some content in this post was previously published on Diff.

Highlights from the fiscal year 2023–2024 Wikimedia Foundation and Wikimedia Endowment audit reports

Every year, the Wikimedia Foundation shares our audited financial statements along with an explanation of what the numbers mean. Our goal is to make our finances understandable, so that community members, donors, readers and more have clear insight into how we use our funds to further Wikimedia's mission.

This post explains the audit reports for both the Wikimedia Foundation and the Wikimedia Endowment for fiscal year 2023–2024, providing key highlights and additional information for those who want to dive deeper.

What is an audit report?

An audit report presents details on the financial balances and financial activities of any organization, as required by US accounting standards. It is audited by a third party (in the Foundation's and Endowment's case, KPMG) in order to validate accuracy. The Foundation has received clean audits for the past 19 years. Each annual audit is an opportunity to evaluate the Foundation's activities and credibility as a responsible steward of donor funds.

The financial information found in the audit report is also then used to build an organization's Form 990, which is the form required by the United States government for organizations to maintain their nonprofit status. The Form 990 is released closer to the end of the current fiscal year.

Key takeaways from the Foundation's fiscal year 2023-2024 audit report

The Foundation's 2023-2024 Annual Plan laid out a number of financial goals for the fiscal year. Below are key takeaways from the audit report related to those goals:

  • Clean audit opinion: The external auditors, KPMG, issued their opinion that the Wikimedia Foundation's financial statements for FY 2023–2024 are presented accurately, marking the 19th consecutive year of clean audits since the Foundation's first audit in 2006.
  • Expense growth slowing in line with target: In anticipation of slower revenue growth, our 2023–2024 Annual Plan aimed to slow budget growth to around 5% after significant growth in the prior five years averaging 16%. We were able to reach that goal: during the fiscal year, expenses grew at 5.5% ($9.4M), from $169.1M to $178.5M. This came in at only slightly over our target of $177M. Growth in expenses was driven primarily by increases in movement funding (detailed below) and increases in personnel cost due mostly to cost of living adjustments. The Foundation is working to continue this trend of stabilizing growth in the current fiscal year. As outlined in the annual plan for fiscal year 2024–2025, the budget is expected to be $188.7M, which is 6% year-over-year growth.
→ During the year, we prioritized spending on a number of infrastructure-related projects; infrastructure is the largest area of the Foundation's work. Projects included a revamp of the Community Wishlist, new features for events and campaigns, improvements in moderation tools (e.g., EditCheck, Automoderator, Community Configuration etc.), and a new data center in Brazil.
→ Also during the year, we decided not to renew our lease of our San Francisco office and to instead move to a small administrative space. This move was aimed at both reducing expenses and responding to an increasingly global workforce, where the vast majority of employees (82%) are based outside the San Francisco Bay Area. This move will result in a rent cost savings of over 80% per month.
  • More budget shifted toward movement support: The Annual Plan aimed to increase the percentage of the budget that goes directly to supporting the mission. This means working to minimize both fundraising and administrative costs and increase support for things like platform maintenance, grants to communities, feature development and more. This year's percentage was 77.5%, up from 76% in the prior fiscal year. In real terms, this means that $9.8M more went to direct movement support in the 2023-2024 fiscal year than the prior fiscal year. While this percentage was just shy of our goal of 77.9%, it is well within the range of best practice for nonprofits, which recommends that at least 65% be devoted to programmatic work.
→ Progress was made on greater effectiveness in how we communicate with communities which collectively speak hundreds of languages. A new system for providing translations of core Foundation documentation enabled us to complete more than 650 requests for translations in a year. This has increased the number of languages supported from six to thirty-four languages in written translations. As an added benefit, the translations are provided by members of the Wikimedia volunteer community – whose experience and knowledge of the movement provides much higher quality translations.
  • Growth sustained in community grants: In spite of the Foundation's overall growth slowing to 5%, we increased community grants by $2.2M, or 9.9% from the previous fiscal year. Our Annual Plans have repeatedly prioritized growing community funding at a significantly higher rate than the overall budget–a goal we have continued to prioritize in the 2024-2025 Annual Plan.
→ We support our grantees by working closely with them to form strategic partnerships to close content gaps. An example is how we supported community gender gap campaigns in biographies and women's health during Women's History Month. This included running the Wikipedia Needs More Women campaign (14.5M unique people reached) and coordinating the global landing page and calendar for the Celebrate Women campaign.
  • Exploring diversified revenue streams for the movement: In order to ensure the movement's future financial sustainability, the Foundation has aimed to diversify our revenue streams over time. For several years, we have been anticipating a trend where fundraising revenue through banners would no longer represent the majority of our donations. During fiscal year 2023–2024, the Foundation's total revenue was $185.4M, of which $174.7M came from donations. This total number represents not only banner fundraising, but also increased percentages in email and major gift donations. Diversified donation income was complemented by increased investment income, income from the Wikimedia Endowment's cost-sharing agreement, and increased income from Wikimedia Enterprise. Investment income was $5.1M up from $3M in the prior year, primarily due to increased interest income from higher interest rates during the year. The new cost sharing agreement with the Wikimedia Endowment generated $2.1M in revenue to offset costs incurred by the Foundation to support the Endowment (Note: This is in addition to the $2.6 million the Foundation received from the Endowment to support technical innovation projects), and Wikimedia Enterprise brought in gross revenue of $3.4M, up slightly from $3.2M in FY 2022–2023. While diversification fell slightly short of our Annual Plan goals, we believe we are still on track over the medium-term: Enterprise contracts have since increased $400K year over year in monthly revenue so far in FY 2024–2025, and we anticipate more income to be generated from Enterprise in subsequent fiscal years.
→ More about Enterprise's financials and the work to diversify revenue streams is available in the Enterprise financial report. More information about the Endowment is detailed below.

You can read the full audit report on the Foundation's website, review the frequently asked questions on Meta-Wiki, or ask any additional questions on the FAQ talk page.

Key takeaways from the Wikimedia Endowment's fiscal year 2023–2024 audit report

The Wikimedia Endowment has completed its audit report covering the fiscal year (FY) 2023–2024, which was the nine-month period from 30 September 2023, when the Endowment began operations as a standalone 501(c)(3) organization, through the end of the fiscal year on 30 June 2024. This was the first year that the Wikimedia Endowment completed an independent audit report. The Endowment is a permanent fund that generates income for the Wikimedia projects in perpetuity with the aim of protecting Wikimedia projects far into the future. The work was overseen by the Endowment's Audit Committee, led by Chair Kevin Bonebrake. Here are a few key takeaways:

  • Clean audit opinion: The external auditors, KPMG, issued their opinion that the Wikimedia Endowment's financial statements for fiscal year 2023–2024 are presented fairly and in accordance with U.S. GAAP.
  • Revenue from Tides transfer, donations, and investment income: The Endowment's total revenue was $132.0M for fiscal year 2023–2024. However, the vast majority of this revenue came from the transfer of $116.2M of the Endowment fund from the Tides Foundation. Funds for the Endowment were held by the Tides Foundation from 2016–2023. In 2023, the Endowment became its own standalone 501(c)(3). At that point, all of the Endowment funds held by Tides were transitioned over to the new entity in the form of a one-time transfer. The Endowment received $13.4M in new donations during FY 2023-2024 and had $2.4M in investment income.
  • Funding to support Wikimedia projects: The Endowment provided $2.9M in funding in FY 2023–2024 to support technical innovation on the Wikimedia projects: $1.5M for MediaWiki upgrades, $600,000 for Abstract Wikipedia, $500,000 for efforts aimed at reaching new audiences, and $278,375 for Kiwix. More information about this round of Endowment funding can be found here.
  • Strong financial position: As of June 30, 2024, the Endowment's net assets were $144.3 million, made up primarily of cash of $20.1M and investments of $123.4M. These assets have generated $19.7M in returns on investment during FY 2023–2024, of which $6.1M has been used to fund technological innovation of the Wikimedia projects over the past two fiscal years.
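As a quick arithmetic check of the figures in the list above, the following Python sketch adds up the quoted components of the Endowment's revenue and project funding. The variable names are illustrative, and the inputs are the rounded figures from the text, so the results are only accurate to that rounding.

```python
# Figures in millions of US dollars, as quoted in the list above.
revenue_parts = {
    "Tides transfer": 116.2,
    "new donations": 13.4,
    "investment income": 2.4,
}
funding_parts = {
    "MediaWiki upgrades": 1.5,
    "Abstract Wikipedia": 0.6,
    "new audiences": 0.5,
    "Kiwix": 0.278375,
}

total_revenue = round(sum(revenue_parts.values()), 1)  # reported as $132.0M
total_funding = round(sum(funding_parts.values()), 1)  # reported as $2.9M
print(total_revenue, total_funding)
```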

You can read the full audit report, review the frequently asked questions on Meta-Wiki, or ask any additional questions on the FAQ talk page.

About the Wikimedia Endowment

Launched in 2016, the Wikimedia Endowment is a nonprofit charitable organization providing a permanent safekeeping fund to support the operations and activities of the Wikimedia projects in perpetuity. It aims to create a solid financial foundation for the future of the Wikimedia projects. As of June 30, 2024, the Wikimedia Endowment was valued at $144.3 million USD. The Wikimedia Endowment is a U.S.-based 501(c)(3) charity (Tax ID: 87-3024488). To learn more, please visit www.wikimediaendowment.org.

Wikipedia:Wikipedia Signpost/2024-11-18/News and notes

Monday, 18 November 2024 00:00 UTC
File:Narcisse Snake Dens 10.jpg
Jucá Costa
cc by-sa 4.0
News and notes

Open letter to WMF about court case breaks one thousand signatures, big arb case declined, U4C begins accepting cases

Arbitration declined in case with much private evidence

The opening statement in a new arbitration case request, titled "Covert canvassing and proxying in the Israel-Arab conflict topic area" read:

There is ongoing coordination of off-wiki editors for the purpose of promoting a pro-Palestinian POV, utilizing a discord group, as well as an EEML-style mailing list (Private Evidence A).
A significant participant in the discord group, as well as the founder of the mailing list (Private Evidence B), is a community banned editor (Private Evidence C), who since being banned has engaged in the harassment and outing of Wikipedia editors (Private Evidence D). This individual has substantial reach (Private Evidence E), and their list appears to have been joined by a substantial number of editors, although I am only confident of the identify of three.
The Discord group was previously public, but has now transitioned to a private form in order to better hide their activities (Private Evidence F). It is not compliant with policy, being used to organize non-ECP editors to make edits within the topic area, some of whom have now become extended-confirmed through these violations. In addition, it is used by the community-banned editor to make edit requests, edit requests that are acted upon (Private Evidence G).

Much of the ensuing discussion came from community members concerned about the public posting of such wide-reaching allegations. Some commenters downplayed or accepted the alleged off-wiki coordination, and others did not. Comments included:

Editor 1: another illustration that there are ugly undercurrents about conflicts involving the editing of articles on the Palestinian-Israeli conflict.
Editor 2: goalpost-moving ARBECR [extended confirmed restriction] enforcement creep... expanding ... into literally doxxing editors
Editor 3: public aspersions based on secret denunciations
Arb 1: Decline this publicity stunt
Arb 2: [The filer] shouldn't have just dumped a pile of private evidence in public. But I also don't see how we get out of dealing with the merits of this issue

At our deadline, five out of 10 active arbitrators had voted to decline the public case, which effectively kills the request according to current procedures. However, at approximately the same time as the consensus to decline this case emerged, arbs opened new motions regarding Palestine-Israel articles, "a case to examine the interaction of specific editors in the WP:PIA topic area ... Evidence from the related private matter, as alluded to in the Covert canvassing and proxying in the Israel-Arab conflict topic area case request, will be examined prior to the start of the case, and resolved separately."

– B

A petition in the form of an open letter addressed to the Wikimedia Foundation has been created regarding the ongoing lawsuit in India (see also In the media in this issue). Its signatories are profoundly concerned at the suggestion that the Foundation is considering disclosing identifying private information about volunteer editors to the Delhi High Court.

The most signed petition in Wikimedia history before this was the 2020 Community open letter on renaming, which successfully asked the Wikimedia Foundation to refrain from renaming itself to "Wikipedia". That one reached 1015 signatures after running for months. This petition has crossed 1015 signatures in 10 days, making it the strongest community consensus statement yet.

Separately, a site blackout was proposed, then closed with 2:1 opposition: Wikipedia:Requests for comment/2024 Wikipedia blackout. Some of the voters may have been persuaded by personal comments from Wikipedia's co-founder Jimbo Wales, who is privy to board discussions on the case and said "I am personally not worried and think that a protest is unwarranted." – B, Br, Q

U4C is accepting cases

The U4C is now accepting cases. See the relevant meta page for more information.

CheckUser and COI VRT appointments

Appointments to the Conflict-of-interest volunteer response team (COI VRT) and CheckUser privilege changes were announced by the Arbitration Committee. Spicy was added as a CheckUser. The COI VRT includes, in addition to CheckUsers and Oversighters, the following administrators: 331dot, Bilby, Extraordinary Writ, Robertsky.

Two administrator recalls, one RRFA

Wikipedia:Administrator recall/Graham87 and Wikipedia:Administrator recall/Fastily were closed as successful. Re-request for adminship (RRFA) remains an option for all recalled administrators, with lower thresholds than a regular RfA. As of our deadline, Graham87's RRFA is active. – B

Brief notes

  • Reminder to apply for Affcom and Ombuds Comm / Case Review committee. Applications for the Affiliations Committee close on November 18, and applications for the Ombuds commission and the Case Review Committee close on December 2. See meta:Wikimedia Foundation/Legal/Committee appointments for details.
  • New administrators: The Signpost welcomes the English Wikipedia's newest administrators, Voorts and Worm That Turned. Voorts said he had been planning an RfA before the election dates were announced, running the first traditional RfA after the October AELECT trial.
  • Arbitration committee election: Questions may be asked of the candidates at Wikipedia:Arbitration Committee Elections December 2024/Questions. Voting will open for eligible community members at 00:00 19 November. Up to nine vacancies will be filled according to the election results.
  • Articles for Improvement: The Article for Improvement is Diurnality (beginning 25 November). Please be bold in helping improve this article![1]

Footnotes

  1. ^ There was no AfI for the week of 17 November and The Signpost has been unable to determine why.

Wikipedia:Wikipedia Signpost/2024-11-18/In the media

Monday, 18 November 2024 00:00 UTC
Image: NSRW Map of Australia (cropped).png, public domain
In the media

Summons issued for Wikipedia editors by Indian court, "Gaza genocide" RfC close in news, old admin Gwern now big AI guy, and a "spectrum of reluctance" over Australian place names

Asian News International case against Wikimedia and Wikipedia editors

Background: Asian News International vs. Wikimedia Foundation blanked by court order, Litigation involving the Wikimedia Foundation, prior Signpost coverage

Summons issued for Wikipedia editors in ANI case

Commentary and facts involving the case were published by Bar and Bench, India Legal Live (ENC Network), The Hindu, and Hindustan Times. At least one source said that according to a summons issued by Delhi High Court, WMF had released or will release email addresses of three editors, "Defendants 2–4".

According to MediaNama, one of the defendants signed the on-wiki open letter protesting the case (see related Signpost coverage). – B

Should Wikipedia be treated like a publisher?

Aditi Agrawal covers the ANI case for Hindustan Times. The question of Wikipedia's publisher-like status is also addressed in India Today's Fiiber channel on MSN, "Why has the Indian government issued a notice to Wikipedia, explained in 5 points". – B

Bias complaint: the phantom menace / MIB is MIA

As we went to press on our last issue, abplive reported that "According to ANI, the government has written to Wikipedia highlighting a number of complaints of bias and inaccuracy. In the letter, the Centre pointed out that a small group of people have editorial control over the website." The "Centre" refers to the central Indian government or specifically the Indian Ministry of Information and Broadcasting (MIB).

The existence of this letter, or the timing of its issue, has itself been called into question. At The Signpost, we could not find a solid report to base a story on.

Some media just said there was "a notice" sent, another said unnamed government sources had spoken to one media outlet, and none we could find provided any real details (example, example). Since then, TechCrunch has also reported that its staff found no complaint either. – B

RfC closure noted

This closure of a more than month-long Request for comments (RfC) at List of genocides was noted in several press sources ...

The RfC confirming the page title follows a Requested move talkpage discussion which initially set the title earlier this year – see previous Signpost coverage. – B

Luckey Gaetz Wikipedia

There's a bizarre style of biography that commonly appears off-Wiki in the less-than-reliable press with headlines like John Doe Wiki. This week "GhanaCelebrities" provided the best example I've seen: "Ginger Luckey Gaetz Wiki, Age, Career, Husband". The article is so well-written – it doesn't seem to have been authored with either artificial intelligence or natural stupidity – that if provided with references it would take at least a week to delete if it were posted on-Wiki. Luckey Gaetz's main claims to fame – if not notability – are that she has a rich brother and is married to the former congressman and currently nominated U.S. Attorney General Matt Gaetz. Mrs. Gaetz, according to the article, is a KPMG manager who has taken some MBA courses through Harvard's online program and in person at UC Berkeley. Mr. Gaetz's notability includes accusations of drug use and paying for sex with minors.

A completely separate linking of Gaetz with Wikipedia was published as a trivia question in Above the Law. Kathryn Rubino asked "What law school did (Matt) Gaetz attend?" Despite a wealth of official sources that she could have linked to document the answer, she linked to Wikipedia. She told The Signpost that she did so "because Wikipedia is the easiest way to encapsulate multiple facts about a source with a single link. In this instance I wanted a reference that Matt Gaetz went to William & Mary Law as well as the other notable legal figures that went to the law school but never held the position of U.S. Attorney General." – S

Gwern interview: How a longtime Wikipedian became an influential voice in AI — and still remains anonymous

Dwarkesh Patel (a US podcaster who TIME magazine recently described as one of the 100 most influential people in AI) published an interview titled "Gwern Branwen - How an Anonymous Researcher Predicted AI's Trajectory". According to Patel, Gwern has "deeply influenced the people building AGI," and "If you've read his blog, you know he's one of the most interesting polymathic thinkers alive."

User:Gwern is also a longtime Wikipedian with almost 100k edits on English Wikipedia. While the interview mostly focused on AI and Gwern's life as an independent writer, it also discussed the pivotal role that editing Wikipedia had played for him:

Dwarkesh Patel

What is it that you are trying to maximize in your life?

Gwern

I maximize rabbit holes. I love more than anything else, falling into a new rabbit hole. That's what I really look forward to. Like this sudden new idea or area that I had no idea about, where I can suddenly fall into a rabbit hole for a while.
[...]

Dwarkesh Patel

What were you doing with all these rabbit holes before you started blogging? Was there a place where you would compile them?

Gwern

Before I started blogging, I was editing Wikipedia.
That was really gwern.net before gwern.net. Everything I do now with my site, I would have done on English Wikipedia. If you go and read some of the articles I am still very proud of—like the Wikipedia article on Fujiwara no Teika—and you would think pretty quickly to yourself, “Ah yes, Gwern wrote this, didn't he?”

Dwarkesh Patel

Is it fair to say that the training that required to make gwern.net happened on Wikipedia?

Gwern

Yeah. I think so. I have learned far more from editing Wikipedia than I learned from any of my school or college training. Everything I learned about writing I learned by editing Wikipedia. [...] For me it was beneficial to combine rabbit-holing with Wikipedia, because Wikipedia would generally not have many good articles on the thing that I was rabbit-holing on.

It was a very natural progression from the relatively passive experience of rabbit-holing—where you just read everything you can about a topic—to compiling that and synthesizing it on Wikipedia. You go from piecemeal, a little bit here and there, to writing full articles. Once you are able to write good full Wikipedia articles and summarize all your work, now you can go off on your own and pursue entirely different kinds of writing now that you have learned to complete things and get them across the finish line.

However, echoing concerns Gwern had already detailed in a 2009 essay titled In Defense of Inclusionism, he cautioned that

It would be difficult to do that with the current English Wikipedia. It's objectively just a much larger Wikipedia than it was back in like 2004. But not only are there far more articles filled in at this point, the editing community is also much more hostile to content contribution, particularly very detailed, obsessive, rabbit hole-y kind of research projects. They would just delete it or tell you that this is not for original research or that you're not using approved sources.

He also recalled other ways in which Wikipedia was different in its earlier years:

Gwern

I got started on Wikipedia in late middle school or possibly early high school.
It was kind of funny. I started skipping lunch in the cafeteria and just going to the computer lab in the library and alternating between Neopets and Wikipedia. I had Neopets in one tab and my Wikipedia watch lists in the other.

Dwarkesh Patel

Were there other kids in middle school or high school who were into this kind of stuff?

Gwern

No, I think I was the only editor there, except for the occasional jerks who would vandalize Wikipedia. I would know that because I would check the IP to see what edits were coming from the school library IP addresses. Kids being kids thought they would be jerks and vandalize Wikipedia.

For a while it was kind of trendy. Early on, Wikipedia was breaking through to mass awareness and controversy. It’s like the way LLMs are now. A teacher might say, “My student keeps reading Wikipedia and relying on it. How can it be trusted?”

"Gwern Branwen" is a pseudonym. Of interest to Wikipedians who are conscientious about keeping their real name separated from their public editing activity (see also coverage of a current open letter in this issue's News and notes), the interview also discusses benefits of maintaining anonymity. While it was conducted in person, responses were re-recorded by a different person, and for the customary video of the interview, an AI-generated avatar was created as a stand-in.

In other parts of the interview that might likewise resonate with Wikipedians who devote large amounts of unpaid work to their hobby, Patel asked various probing questions about Gwern's personal finances, again starting from his Wikipedia volunteering:

Dwarkesh Patel

When you were an editor on Wikipedia, was that your full-time occupation?

Gwern

It would eat as much time as I let it. I could easily spend 8 hours a day reviewing edits and improving articles while I was rabbit-holing. But otherwise I would just neglect it and only review the most suspicious diffs on articles that I was particularly interested in on my watchlist. I might only spend like 20 minutes a day. It was sort of like going through morning email.

and later

Dwarkesh Patel

How do you sustain yourself while writing full time?

Gwern

Patreon and savings. I have a Patreon which does around $900-$1000/month, and then I cover the rest with my savings. [...] So I try to spend as little as possible to make it last.
I should probably advertise the Patreon more, but I'm too proud to shill it harder.
[...]

I live in the middle of nowhere. I don't travel much, or eat out, or have health insurance, or anything like that. [...] I live like a grad student, but with better ramen. I don't mind it much since I spend all my time reading anyway.

The interview then took a rather consequential turn:

Dwarkesh Patel

It seems like you’ve enjoyed this recent trip to San Francisco [home of several AI labs mentioned earlier in the interview, like OpenAI and Anthropic]? What would it take to get you to move here?

Gwern

Yeah, it is mostly just money stopping me at this point. I probably should bite the bullet and move anyway. But I'm a miser at heart and I hate thinking of how many months of writing runway I'd have to give up for each month in San Francisco.

If someone wanted to give me, I don’t know, $50–100K/year to move to SF and continue writing full-time like I do now, I'd take it in a heartbeat.

Patel then encouraged him to share contact information for potential donors, and two days after the interview's release noted that these had indeed been found and that Gwern would be moving to San Francisco.

– H

In brief

Exploding whale coverage in The Signpost



Do you want to contribute to "In the media" by writing a story or even just an "in brief" item? Edit next week's edition in the Newsroom or leave a tip on the suggestions page.


Tech News issue #47, 2024 (November 18, 2024)

Monday, 18 November 2024 00:00 UTC

Tech News: 2024-47

As far as the English Wikipedia is concerned, there is neither a red nor a blue link for the 2024 awardees of the Brewster medal. Its information ends in 2021. The German Wikipedia is up to date. Neither Wikipedia has articles for Renée A. Duckworth or Juan C. Reboreda; the German one has two red links.

When you maintain information like this, there are three options. You can include an awardee as plain text or as a link, and as luck will have it the link will turn out red or blue. This is complicated because a link may have homonyms. With a red link you will only learn of a homonym issue once an article is created; with a blue link you may know immediately.

The Wikimedia Foundation solved a similar problem a long time ago for another type of link, the "interwiki link". The solution is Wikidata. It works because there is only one identifier for every topic and every article needs a link to a Wikidata item to have a more global relevance.

Thanks to the ongoing development of Wikidata, there is Wikibase. We should do a similar job for the red and blue links. It will do away with the false-friends problem in Wikipedia. It will improve quality for each Wikipedia and it will improve the quality of Wikidata. Any data-related updates that are not strictly local will remain at Wikidata because that helps us in the sharing of the sum of all knowledge.

When a new link is to be added in any of the 333+ Wikipedias, it starts with disambiguation. Is the subject already known in any of the other Wikipedias? If not, a new Wikidata item will be created, which extends the options in any future disambiguation. If it is, information and references are available from the start, and consequently a Scholia, a Reasonator or any other generated view of the information may become available, depending on the policies of a Wikipedia.

Implementing such a Wikibase is not really problematic because all the blue links still refer through the local Wikipedia article to Wikidata. The red links are the trickier bit. They are opened up once they are linked to a Wikidata item.

With such a Wikibase in place, we can start doing the smart things. The article on the Brewster medal, Q612041, could have a red or blue link to each of the awardees. When it does not, the article is to be reported for maintenance.
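
To make the idea a bit more concrete, here is a toy sketch in Python. It is an in-memory stand-in, not an actual Wikibase or Wikidata API: the `LinkRegistry` class, the `missing_awardees` helper and the `Q_DUCKWORTH` / `Q_REBOREDA` identifiers are made-up placeholders, and only Q612041 comes from the text above.

```python
from dataclasses import dataclass, field

BREWSTER_MEDAL = "Q612041"  # the only real identifier here, taken from the post


@dataclass
class LinkRegistry:
    """Hypothetical registry tying every link on a wiki to a Wikidata item."""
    # (wiki, link title) -> Wikidata item ID
    items: dict = field(default_factory=dict)
    # (wiki, link title) pairs that already have a local article
    articles: set = field(default_factory=set)

    def register(self, wiki, title, qid, has_article=False):
        self.items[(wiki, title)] = qid
        if has_article:
            self.articles.add((wiki, title))

    def colour(self, wiki, title):
        """Blue: article exists. Red: item known, no article yet."""
        key = (wiki, title)
        if key in self.articles:
            return "blue"
        if key in self.items:
            return "red"
        return "unregistered"


def missing_awardees(registry, wiki, awardee_qids, linked_titles):
    """Return awardee items that the award article does not link to at all."""
    linked_qids = {registry.items.get((wiki, t)) for t in linked_titles}
    return sorted(q for q in awardee_qids if q not in linked_qids)


registry = LinkRegistry()
# Placeholder item IDs; a red link on enwiki is still tied to an item.
registry.register("enwiki", "Renée A. Duckworth", "Q_DUCKWORTH")

awardees_2024 = {"Q_DUCKWORTH", "Q_REBOREDA"}  # would come from Wikidata
report = missing_awardees(
    registry, "enwiki", awardees_2024, ["Renée A. Duckworth"]
)
print(BREWSTER_MEDAL, "needs maintenance for:", report)
```

Because a red link carries an item ID from the moment it is added, the homonym question is settled before any article exists, and the maintenance report only needs to compare two sets of identifiers.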

Cool?

     GerardM

weeklyOSM 747

Sunday, 17 November 2024 11:02 UTC

07/11/2024-13/11/2024


Collage of some results from the Hackweekend Berlin, November 2024 [1] | © tordans, Hartmut, Christian, Wolfram | Map data © OpenStreetMap contributors

Mapping

  • Requests for comments have been made on these proposals:
    • amenity=travellers_lounge for mapping public seating areas in transport facilities, such as airport lounges or railway station waiting areas.
    • virtual_tour=* to tag virtual 3D tours of places such as museums, hotels, or shops.
    • rental:powerbank=yes, to map stations where users can rent portable power banks to charge mobile devices on the go.
    • addr:milestone=* to allow the tagging of street addresses that use the distance from a reference point as part of the address.
    • languages:official=* and languages:preferred=*, to enable the specification of languages for name rendering, for example the targeted display of street names in different languages or scripts in map applications.
  • The proposal to delete busway=* for bus lanes was accepted with 21 votes in favour, 0 against, and 0 abstentions.

Mapping campaigns

  • Julien Minet reported on the status of address completion in Wallonia, Belgium, on OpenStreetMap, noting a current coverage of 65.9% compared to the ICAR database, although progress has been slowed by recent adjustments to official address sources. A map shows completion rates per commune, highlighting areas with significant updates, and challenges due to pre-assigned addresses in undeveloped areas.
  • If you ever visit Mexico and want to try the delicious esquites/elotes which are sold mostly on street stalls, there’s a collaborative map, so you will know where to find them.
  • The 50,000th challenge on MapRoulette, created just this week, marked a significant milestone in the platform’s journey to enabling collaborative map improvement around the world.

Community

  • OpenStreetMap Belgium, an independent NGO since 2023, continued to work with key supporters such as TomTom and the Belgian National Crisis Centre, organising events including the first European State of the Map in a decade, and providing updated resources such as free Belgian map tiles twice a year.
  • Pieter Vander Vennet explained how OpenStreetMap users can verify their accounts on Mastodon by linking profiles between the two platforms (we reported earlier).

Events

  • [1] Lars Lingner summarised the OpenStreetMap community’s Hackweekend in Berlin, Germany, where over 20 participants worked on creative projects and technical challenges while engaging in an open exchange on social and cartographic issues.
  • OpenStreetMap Belgium is hosting a mapathon in Bruges on Friday 29 November to support the Lili app, which helps visually impaired users navigate safely by mapping essential infrastructure such as tactile paving and audio-enabled traffic signals in Bruges.
  • On Monday 18 November, during the annual ‘Geography Awareness Week’, HeiGIT, in partnership with Doctors Without Borders, the German Red Cross, and the University of Würzburg, is hosting a mapathon in Heidelberg to produce essential map data for humanitarian aid. The event welcomes participants of all experience levels.
  • Mapping USA 2025, a virtual OpenStreetMap conference, is taking place on 24 and 25 January and will feature two days of talks, workshops, and community-driven discussions to engage mappers and advocates from across the US.
  • Geomob’s recent London, England, event to celebrate OpenStreetMap’s 20th birthday, on 18 September, is now available as a video, with slides and audio, thanks to volunteer Andrew Braye.
  • Calling all creative minds: #SotM2025 needs a logo. Submission is via email. Deadline: Saturday 30 November at 23:59 UTC. More details can be found on the Wiki page.

OSM research

  • A new study presented a dataset of classified building footprints for the US derived from OpenStreetMap data, distinguishing between residential and non-residential buildings. The classification, performed using an unsupervised method based on OSM tags and ancillary geospatial data, has been validated with high accuracy across different regions of the US, indicating its usefulness for urban planning, emergency response, and population studies.

Maps

  • 2hu4u has detailed a straightforward way of creating beautiful time-lapse videos of your mapping progress from historical OSM data, all from the comfort of QGIS and without the arduous process of downloading planet files, generating tiles or running a server. In the finished video you can watch the amazing transition, from a nearly blank canvas to a comprehensively mapped Australian city, over 16 years.
  • The open-source DeFlock project shows the global locations of automated licence plate readers, with over 5,600 identified worldwide, to raise awareness and help people avoid surveillance; it uses OpenStreetMap to document camera directions and create warning signs.
  • Andy Townsend described the development of a ‘rural pedestrian’ vector map for England and Wales, focusing on offline usability. This new vector schema, created with Tilemaker and MapLibre, simplifies the original raster schema by reducing data layers and enhancing feature styling, such as UK/IE road shields and handling of previously missed features such as derelict canal bridges. The vector format reduces map size significantly, making it suitable for offline use.
  • Vector tiles are now available on OSMF hardware (we reported earlier). The usage policy is not yet final, but you can use them now, according to Paul Norman. The map style is accessible via the new domain vector.openstreetmap.org and the tiles are available through a MVT address (https://vector.openstreetmap.org/shortbread_v1/{z}/{x}/{y}.mvt). There is a demonstration of the tiles rendered as a map.
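
For readers who want to try the tile address above, here is a minimal Python sketch that fills in the `{z}/{x}/{y}` placeholders for a given coordinate. It assumes the tiles follow the usual Web Mercator XYZ ("slippy map") numbering, which the item above does not state; the function names `tile_for` and `tile_url` are ours, and the code only builds the URL without fetching anything.

```python
import math

# URL template as given in the item above
TILE_URL = "https://vector.openstreetmap.org/shortbread_v1/{z}/{x}/{y}.mvt"


def tile_for(lat_deg, lon_deg, zoom):
    """Return the (x, y) tile index containing a WGS84 coordinate.

    Assumes standard Web Mercator XYZ tiling: 2**zoom tiles per axis,
    x counted eastwards from 180°W, y counted southwards from about 85.05°N.
    """
    n = 2 ** zoom
    lat = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat)) / math.pi) / 2.0 * n)
    # Clamp so that edge cases (lon = 180, latitudes near the poles) stay valid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


def tile_url(lat_deg, lon_deg, zoom):
    """Fill the {z}/{x}/{y} placeholders of the template for one coordinate."""
    x, y = tile_for(lat_deg, lon_deg, zoom)
    return TILE_URL.format(z=zoom, x=x, y=y)


# Example: the tile covering central Berlin at zoom level 10
print(tile_url(52.52, 13.405, 10))
```

Since the usage policy is not yet final, it seems sensible to check it before requesting tiles in bulk.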

OSM in action

  • pl6025 has created a map of commercial POIs in Loire (France) on uMap, which can be reproduced with Overpass queries.
  • MetroDreamin’ is an interactive platform where users can design and share custom public transit maps, creating their ideal transit systems while connecting with a community of fellow map enthusiasts.

Open Data

  • Daylight Map Distribution v1.58 is the final release of this open geodata project (we reported earlier), concluding its efforts to provide curated and enhanced OSM data. More details about its closure are available in the official announcement.
  • Frederik Ramm, of Geofabrik, discussed access to historical OpenStreetMap data, explaining the tools and datasets available to researchers, the limitations due to OSM’s growth and historical changes, and offering assistance in extracting specific data.

Software

  • Daniel Schep introduced ‘Ultra v3’ (formerly Overpass Ultra), an enhanced mapping tool that now functions as a MapLibre GL JS IDE. New features include query providers for multiple GIS file formats, auto-sorting of map style directives, bundled icon sprite sheets, fallback glyphs, and an HTML control for adding map titles and custom controls, extending Ultra’s utility for geospatial queries and map customisation.
  • The latest update of Jake Coppinger’s Australian Cycleway Stats project improved the efficiency of data processing by adding features such as exclusive cycle lane detection for safer routes on low-speed roads, parallel processing for Australian and international data, and hard-coded Overpass API endpoints for different regions, improving the reliability and speed of data collection for cycleway infrastructure across Australia.
  • The Every Door project, funded by NLNet’s NGI0 Commons Fund, focuses on building the best OpenStreetMap mobile editor for point of interest and address capture, with planned features such as vector tiles and customisation to improve mapping and interoperability.
  • PinPoi is an app for managing and navigating to Points of Interest (POI) by importing files in various formats (e.g. GPX, GeoJSON, CSV) directly to mobile devices. It supports location-based POI searches and displays results in a list or on a map, integrating with users’ preferred navigation applications.
  • TripGeo is offering Map Snake, an interactive map-based game where players navigate a snake on a map to explore different locations, combining geography with classic game elements.
  • The VeloPlanner project is an interactive map focusing on European cycle routes and points of interest such as campsites, shelters, and historic sites, using OpenStreetMap data. It’s currently a map viewer, but planned updates include a route planner and detailed surface and infrastructure data. The platform uses a robust tech stack including MapLibre, Planetiler, osm2pgsql, and Elixir, hosted on Heroku and processed by a dedicated server.

Programming

  • zabop shared a streamlined workflow for editing OpenStreetMap tags, combining the Overpass API, Python, and MapProxy to efficiently identify, visualise, and edit features, emphasising simplicity and fun in the process.
  • ‘Overture to OSM’ is a Python package designed to translate map data from the Overture schema into OpenStreetMap compatible tags, supporting layers such as places, buildings, and addresses, while ensuring OSM compliance.
  • Gregory Peony shared a markdown-based validation feedback template for OpenStreetMap task managers, designed to streamline responses by including standard feedback, reasons for validation results, tips for revisiting tasks, and links to resources. This template aims to support efficient communication, guide contributors in accessing relevant data, and encourage constructive feedback through organised, reusable comments.
  • HOTOSM’s tech updates for November 2024 highlighted ongoing projects, including the alpha launch of Drone Tasking Manager with OpenDroneMap integration, testing of FastAPI for the Tasking Manager, and development of fAIr 2.0 with YOLOv8 model for building detection, as well as improvements to uMap authentication.

OSM in the media

  • For years, two families in Tannhausen suffered from unwanted through traffic from hikers, cyclists and motorists who used their garden as a shortcut. The reason was a map error on OpenStreetMap incorrectly identifying the private path as a public path. After the family discovered this, Florian Fränzl familiarised himself with the OpenStreetMap system and corrected the access to ‘private’. In addition, they have put up official ‘no passing through’ signs, which now ensure peace and quiet and protection of their privacy.

Other “geo” things

  • Berlin (Germany) has introduced a digital overview of all public car parks, which provides real-time information on availability and parking conditions and is intended to make it easier for citizens to find a parking space.
  • Lund University’s (Sweden) ‘mGPS’ system can identify locations with high precision using unique bacterial samples and offers new applications in medicine, epidemiology, and forensics.
  • In Ukraine, military GNSS spoofing to defend against Russian drones is causing problems for the civilian population, as smartphones switch to the wrong time zones and navigation services display inaccurate location data, leading to confusion and delays.
  • Grab has developed its own hyper-local mapping system across Southeast Asia, using input from drivers equipped with special cameras to overcome the challenges of narrow, unmapped roads and improve navigation accuracy, differentiating itself from Google Maps with its regional focus and real-time updates.
  • Google’s Open Buildings Dataset, now enhanced with AI-powered temporal updates, provides detailed building footprint data across Africa and Southeast Asia, supporting applications in urban planning, disaster response, and environmental research.
  • Last month, the US states of Oklahoma and Texas exchanged 0.54 hectares of territory within a reservoir so that a pipeline could resume sending drinking water to a water works without illegally transporting zebra mussels, an invasive species, across the state border.
  • South Korea has accused North Korea of using GNSS jamming signals to interfere with South Korean ships and aircraft, which represents a considerable security risk. The jamming is part of military actions that affect satellite navigation and has already disrupted civilian infrastructure and transport operations.
  • Transit App has introduced an offline feature to track the location of underground trains using motion detection and vibration patterns, allowing users to predict their location, update ETAs, and receive stop reminders without GPS or internet, while maintaining complete privacy.
  • Phoebe Yu explained, in an amusing video, how the problem of Indian addresses in Google Maps was solved. Spoiler: they include places of interest and shops in the route guidance to provide orientation points.

Upcoming Events

Where | What | When
Град Зрењанин | Okupljanje u Zrenjaninu | 2024-11-17
Hannover | OSM-Stammtisch Hannover | 2024-11-18
Grenoble | Atelier du groupe local de Grenoble | 2024-11-18
(no location listed) | Internationale GeoWoche – Online Mapathon von DRK, HeiGIT, MSF Deutschland & Österreich | 2024-11-18
England | OSM UK Online Chat | 2024-11-18
(no location listed) | Workshop: OSM tagging standards for informal settlements | 2024-11-19
(no location listed) | Missing Maps London: (Online) Mid-Month Mapathon [eng] | 2024-11-19
Lyon | Réunion du groupe local de Lyon | 2024-11-19
Bonn | 182. OSM-Stammtisch Bonn | 2024-11-19
City of Edinburgh | OSM Edinburgh Social Meet-up | 2024-11-19
(no location listed) | [Online] Map-py Wednesday | 2024-11-20
Karlsruhe | Stammtisch Karlsruhe | 2024-11-20
València | XI Jornadas Anuales de Wikimedia España | 2024-11-22 – 2024-11-24
Gent | Bewakingscamera’s op de kaart (wandeling) | 2024-11-22
Bangalore East | OSM Bengaluru Mapping Party | 2024-11-23
Lyon | Campus du Libre 2024 – Lyon – France | 2024-11-23
Gent | Bewakingscamera’s op de kaart (wandeling) | 2024-11-23
명동 | 국경없는의사회 2024 글로벌 지오위크 매파톤 | 2024-11-23
Saint-Étienne | Rencontre Saint-Étienne et sud Loire | 2024-11-25
San Jose | South Bay Map Night | 2024-11-27
Berlin | OSM-Verkehrswende #64 | 2024-11-26
Düsseldorf | Düsseldorfer OpenStreetMap-Treffen (online) | 2024-11-27
Lübeck | 148. OSM-Stammtisch Lübeck und Umgebung | 2024-11-28
Sint-Michiels | LiLi-app mapathon | 2024-11-29
Olomouc | SotM CZ+SK 2024 | 2024-11-29
ঢাকা | State of the Map Asia | 2024-11-29 – 2024-11-30

Note:
If you would like to see your event here, please add it to the OSM calendar. Only events entered there will appear in weeklyOSM.

This weeklyOSM was produced by Elizabete, LuxuryCoop, PierZen, Raquel Dezidério Souto, Strubbl, TheSwavu, YoViajo, barefootstache, derFred, mcliquid, tordans.
We welcome link suggestions for the next issue via this form and look forward to your contributions.

Eventually consistent plain text accounting

Wednesday, 13 November 2024 02:16 UTC
Spending for October, generated by piping hledger → R

Over the past six months, I’ve tracked my money with hledger—a plain text double-entry accounting system written in Haskell. It’s been surprisingly painless.

My previous attempts to pick up real accounting tools floundered. Hosted tools are privacy nightmares, and my stint with GnuCash didn’t last.

But after stumbling on Dmitry Astapov’s “Full-fledged hledger” wiki[1], it clicked: eventually consistent accounting. Instead of modeling your money all at once, take it one hacking session at a time.

It should be easy to work towards eventual consistency. […] I should be able to [add financial records] bit by little bit, leaving things half-done, and picking them up later with little (mental) effort.

– Dmitry Astapov, Full-Fledged Hledger

Principles of my system

I’ve cobbled together a system based on these principles:

  • Avoid manual entry – Avoid typing in each transaction. Instead, rely on CSVs from the bank.
  • CSVs as truth – CSVs are the only things that matter. Everything else can be blown away and rebuilt anytime.
  • Embrace version control – Keep everything under version control in Git for easy comparison and safe experimentation.

Learn hledger in five minutes

hledger concepts are heady, but its use is simple. I divide the core concepts into two categories:

  • Stuff hledger cares about:
    • Transactions – how hledger moves money between accounts.
    • Journal files – files full of transactions
  • Stuff I care about:
    • Rules files – how I set up accounts, import CSVs, and move money between accounts.
    • Reports – help me see where my money is going and if I messed up my rules.

Transactions move money between accounts:

2024-01-01 Payday
    income:work      $-100.00
    assets:checking   $100.00

This transaction shows that on Jan 1, 2024, money moved from income:work into assets:checking—Payday.

The sum of each transaction should be $0. Money comes from somewhere, and the same amount goes somewhere else—double-entry accounting. This is powerful technology—it makes mistakes impossible to ignore.
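
To make the "sums to $0" rule concrete, here is a minimal Python sketch of the check. It is not how hledger implements it; the list-of-postings representation and the `is_balanced` name are assumptions made up for illustration.

```python
from decimal import Decimal

# Hypothetical in-memory form of the "Payday" transaction above:
# a list of (account, amount) postings.
payday = [
    ("income:work", Decimal("-100.00")),
    ("assets:checking", Decimal("100.00")),
]


def is_balanced(postings):
    """Return True when the posting amounts sum to exactly zero."""
    return sum((amount for _, amount in postings), Decimal("0")) == 0


print(is_balanced(payday))

# A typo in one amount no longer sums to zero, so it cannot slip by.
typo = [
    ("income:work", Decimal("-100.00")),
    ("assets:checking", Decimal("10.00")),
]
print(is_balanced(typo))
```

Using `Decimal` rather than floats keeps cents exact, which matters when the test is "equals zero".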

Journal files are text files containing one or more transactions:

2024-01-01 Payday
    income:work              $-100.00
    assets:checking           $100.00
2024-01-02 QUANSHENG UVK5
    assets:checking          $-29.34
    expenses:fun:radio        $29.34

Rules files transform CSVs into journal files via regex matching.

Here’s a CSV from my bank:

Transaction Date,Description,Category,Type,Amount,Memo
09/01/2024,DEPOSIT Paycheck,Payment,Payment,1000.00,
09/04/2024,PizzaPals Pizza,Food & Drink,Sale,-42.31,
09/03/2024,Amazon.com*XXXXXXXXY,Shopping,Sale,-35.56,
09/03/2024,OBSIDIAN.MD,Shopping,Sale,-10.00,
09/02/2024,Amazon web services,Personal,Sale,-17.89,

And here’s a checking.rules to transform that CSV into a journal file so I can use it with hledger:

# checking.rules
# --------------
# Map CSV fields → hledger fields[0]
fields date,description,category,type,amount,memo,_
# `account1`: the account for the whole CSV.[1]
account1    assets:checking
account2    expenses:unknown
skip 1

date-format %m/%d/%Y
currency $

if %type Payment
    account2 income:unknown
if %category Food & Drink
    account2 expenses:food:dining

# [0]: <https://hledger.org/hledger.html#field-names>
# [1]: <https://hledger.org/hledger.html#account-field>
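
Roughly, the `if` blocks behave like a list of regex checks applied to each CSV row. The Python sketch below imitates that idea for the two rules above. It is an illustration only, not hledger's code: `RULES`, `classify`, and the "later match wins" ordering are my assumptions about how to model it.

```python
import csv
import io
import re

# The bank CSV from above, inlined so the sketch is self-contained.
BANK_CSV = """Transaction Date,Description,Category,Type,Amount,Memo
09/01/2024,DEPOSIT Paycheck,Payment,Payment,1000.00,
09/04/2024,PizzaPals Pizza,Food & Drink,Sale,-42.31,
09/03/2024,Amazon.com*XXXXXXXXY,Shopping,Sale,-35.56,
09/03/2024,OBSIDIAN.MD,Shopping,Sale,-10.00,
09/02/2024,Amazon web services,Personal,Sale,-17.89,
"""

# (CSV column, regex, account2) triples mirroring the two `if` blocks.
RULES = [
    ("Type", r"Payment", "income:unknown"),
    ("Category", r"Food & Drink", "expenses:food:dining"),
]


def classify(row, default="expenses:unknown"):
    """Return account2 for a row; a later matching rule overrides an earlier one."""
    account2 = default
    for column, pattern, account in RULES:
        if re.search(pattern, row[column], re.IGNORECASE):
            account2 = account
    return account2


rows = list(csv.DictReader(io.StringIO(BANK_CSV)))
for row in rows:
    print(row["Description"], "->", classify(row))
```

Anything that matches no rule falls through to the default account, which is what makes the later "what is still unknown?" review possible.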

With these two files (checking.rules and 2024-09_checking.csv), I can make the CSV into a journal:

$ > 2024-09_checking.journal \
    hledger print \
    --rules-file checking.rules \
    -f 2024-09_checking.csv
$ head 2024-09_checking.journal
2024-09-01 DEPOSIT Paycheck
    assets:checking        $1000.00
    income:unknown        $-1000.00

2024-09-02 Amazon web services
    assets:checking          $-17.89
    expenses:unknown          $17.89

Reports are interesting ways to view transactions between accounts.

There are registers, balance sheets, and income statements:

$ hledger incomestatement \
    --depth=2 \
    --file=2024-09_checking.journal

Revenues:
               $1000.00 income:unknown
-----------------------
               $1000.00


Expenses:
                 $42.31 expenses:food
                 $63.45 expenses:unknown
-----------------------
                $105.76
-----------------------
Net:            $894.24

At the beginning of September, I spent $105.76 and made $1000, leaving me with $894.24.
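
As a sanity check, the same totals can be recomputed straight from the five CSV amounts. This is a small sketch independent of hledger; the variable names are arbitrary.

```python
from decimal import Decimal

# Amounts copied from the September CSV above.
amounts = [Decimal(a) for a in ("1000.00", "-42.31", "-35.56", "-10.00", "-17.89")]

# Positive amounts are income; negative amounts are spending.
income = sum((a for a in amounts if a > 0), Decimal("0"))
expenses = -sum((a for a in amounts if a < 0), Decimal("0"))
net = income - expenses

print(income, expenses, net)
```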

But a good chunk is going to the default expense account, expenses:unknown. I can use hledger aregister to see what those transactions are:

$ hledger areg expenses:unknown \
    --file=2024-09_checking.journal \
    -O csv | \
  csvcut -c description,change | \
  csvlook
| description              | change |
| ------------------------ | ------ |
| OBSIDIAN.MD              |  10.00 |
| Amazon web services      |  17.89 |
| Amazon.com*XXXXXXXXY     |  35.56 |

Then, I can add some more rules to my checking.rules:

if OBSIDIAN.MD
    account2 expenses:personal:subscriptions
if Amazon web services
    account2 expenses:personal:web:hosting
if Amazon.com
    account2 expenses:personal:shopping:amazon

Now, I can reprocess my data to get a better picture of my spending:

$ > 2024-09_checking.journal \
    hledger print \
    --rules-file checking.rules \
    -f 2024-09_checking.csv
$ hledger bal expenses \
    --depth=3 \
    --percent \
    -f 2024-09_checking.journal
              40.0 %  expenses:food:dining
              33.6 %  expenses:personal:shopping
               9.5 %  expenses:personal:subscriptions
              16.9 %  expenses:personal:web
--------------------
             100.0 %

For the Amazon.com purchase, I lumped it into the expenses:personal:shopping account. But I could dig deeper—download my order history from Amazon and categorize that spending.

This is the power of working bit-by-bit—the data guides you to the next, deeper rabbit hole.

Goals and non-goals

Why am I doing this? For years, I maintained a monthly spreadsheet of account balances. I had a balance sheet. But I still had questions.

Spending over six months, generated by piping hledger → gnuplot

Before diving into accounting software, these were my goals:

  • Granular understanding of my spending – The big one. This is where my monthly spreadsheet fell short. I knew I had money in the bank—I kept my monthly balance sheet. I budgeted up-front the % of my income I was saving. But I had no idea where my other money was going.
  • Data privacy – I’m unwilling to hand the keys to my accounts to YNAB or Mint.
  • Increased value over time – The more time I put in, the more value I want to get out—this is what you get from professional tools built for nerds. While I wished for low-effort setup, I wanted the tool to be able to grow to more uses over time.

Non-goals—these are the parts I never cared about:

  • Investment tracking – For now, I left this out of scope. Between monthly balances in my spreadsheet and online investing tools’ ability to drill down, I was fine.[2]
  • Taxes – Folks smarter than me help me understand my yearly taxes.[3]
  • Shared system – I may want to share reports from this system, but no one will have to work in it except me.
  • Cash – Cash transactions are unimportant to me. I withdraw money from the ATM sometimes. It evaporates.

hledger can track all these things. My setup is flexible enough to support them someday. But that’s unimportant to me right now.

Monthly maintenance

I spend about an hour a month checking in on my money, which frees me to spend time making fancy charts, an activity I perversely enjoy.

Income vs. Expense, generated by piping hledger → gnuplot

Here’s my setup:

$ tree ~/Documents/ledger
.
├── export
│   ├── 2024-balance-sheet.txt
│   └── 2024-income-statement.txt
├── import
│   ├── in
│   │   ├── amazon
│   │   │   └── order-history.csv
│   │   ├── credit
│   │   │   ├── 2024-01-01_2024-02-01.csv
│   │   │   ├── ...
│   │   │   └── 2024-10-01_2024-11-01.csv
│   │   └── debit
│   │       ├── 2024-01-01_2024-02-01.csv
│   │       ├── ...
│   │       └── 2024-10-01_2024-11-01.csv
│   └── journal
│       ├── amazon
│       │   └── order-history.journal
│       ├── credit
│       │   ├── 2024-01-01_2024-02-01.journal
│       │   ├── ...
│       │   └── 2024-10-01_2024-11-01.journal
│       └── debit
│           ├── 2024-01-01_2024-02-01.journal
│           ├── ...
│           └── 2024-10-01_2024-11-01.journal
├── rules
│   ├── amazon
│   │   └── journal.rules
│   ├── credit
│   │   └── journal.rules
│   ├── debit
│   │   └── journal.rules
│   └── common.rules
├── 2024.journal
├── Makefile
└── README

Process:

  1. Import – download a CSV for the month from each account and plop it into import/in/<account>/<dates>.csv
  2. Make – run make
  3. Squint – Look at git diff. If it looks good, git add . && git commit -m "💸"; otherwise, review hledger areg to see details.

The Makefile generates everything under import/journal and export:

  • journal files from my CSVs using their corresponding rules.
  • reports in the export folder

I include all the journal files in the 2024.journal with the line: include ./import/journal/*/*.journal

Here’s the Makefile:

SHELL := /bin/bash
# Every CSV one directory below import/in.
RAW_CSV = $(wildcard import/in/**/*.csv)
# import/in/<account>/<dates>.csv → import/journal/<account>/<dates>.journal
JOURNALS = $(foreach file,$(RAW_CSV),$(subst /in/,/journal/,$(patsubst %.csv,%.journal,$(file))))

# Note: recipe lines must start with a tab character.
.PHONY: all
all: $(JOURNALS)
	hledger is -f 2024.journal > export/2024-income-statement.txt
	hledger bs -f 2024.journal > export/2024-balance-sheet.txt

.PHONY: clean
clean:
	rm -rf import/journal/**/*.journal

# Build one journal from its CSV, using the rules file named after the account directory.
import/journal/%.journal: import/in/%.csv
	@echo "Processing csv $< to $@"
	@echo "---"
	@mkdir -p $(shell dirname $@)
	@hledger print --rules-file rules/$(shell basename $$(dirname $<))/journal.rules -f "$<" > "$@"
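
The path juggling in that pattern rule is easier to see outside of make. Here is a rough Python equivalent of the two mappings it performs; `journal_for` and `rules_for` are hypothetical helper names, and the sketch only mirrors the Makefile's string substitutions rather than replacing them.

```python
from pathlib import PurePosixPath


def journal_for(csv_path):
    """Map import/in/<account>/<name>.csv to import/journal/<account>/<name>.journal."""
    # Same idea as $(subst /in/,/journal/,...) plus $(patsubst %.csv,%.journal,...).
    swapped = csv_path.replace("/in/", "/journal/")
    return str(PurePosixPath(swapped).with_suffix(".journal"))


def rules_for(csv_path):
    """Pick rules/<account>/journal.rules from the CSV's parent directory name."""
    # Same idea as basename $(dirname <csv>).
    account = PurePosixPath(csv_path).parent.name
    return f"rules/{account}/journal.rules"


example = "import/in/credit/2024-01-01_2024-02-01.csv"
print(journal_for(example))
print(rules_for(example))
```

The design consequence is that adding a new account only needs a new directory under import/in and a matching rules/<account>/journal.rules.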

If I find anything amiss (e.g., if my balances are different than what the bank tells me), I look at hledger areg. I may tweak my rules or my CSVs and then I run make clean && make and try again.

Simple, plain text accounting made simple.

And if I ever want to dig deeper, hledger’s docs have more to teach. But for now, the balance of effort vs. reward is perfect.


  1. While reading a blog post from Jonathan Dowland.

  2. Note: this is covered by Full-fledged hledger – Investments.

  3. Also covered in Full-fledged hledger – Tax returns.

Wikipedia mentioned, in the text of an article, a fellow of the Royal Zoological Society of New South Wales. Unlike many other awards, this fellowship does not have its own article and there is no category for these fellows; it only has a paragraph in the article about the fellows.

Wikidata did not know about the award.

The list of fellows on the RZS website is formatted in a "last name, first name" format. There are too many fellows so converting it by hand is inconvenient. As so many people are enamoured by ChatGPT, I gave it a spin. ChatGPT does NOT process websites for me. So I copy pasted the list and asked it to change the order of the surname and the first name. 
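
For comparison, the name reordering on its own is a small scripting job. A minimal Python sketch, assuming every entry is a simple "Last, First" string; the `flip_name` helper and the sample names are invented placeholders, not real fellows.

```python
def flip_name(entry):
    """Turn 'Last, First' into 'First Last'; leave other entries unchanged."""
    last, sep, first = entry.partition(",")
    if not sep or not first.strip():
        return entry.strip()
    return f"{first.strip()} {last.strip()}"


# Made-up placeholder names, not taken from the RZS list.
sample = ["Doe, Jane", "Smith, John A.", "Mononym"]
print([flip_name(s) for s in sample])
```

Checking which of the resulting names have a Wikipedia article or Wikidata item is the harder part, and is not covered by this sketch.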

I asked it which fellows had a Wikipedia article. It could not tell me, but it gave me a list of fellows who likely have one. For many of them I added the award in Wikidata, and for some fellows I added a new Wikidata item. For many of them I also linked publications, which results in a nice Scholia for the award.

It would be really cool if there were a Wikimedia AI that could answer questions like: "for the people in this list change the order of the name and check if these Australian award winners have a Wikipedia article or a Wikidata item". Maybe start with a tool for editors and then open it up to the general public.

Given that Wikipedia is multilingual, what would be the effect if the data behind the answers came from all Wikipedias AND Wikidata? Given that Wikifunctions is language agnostic, why not have functions that are a front end to such a Wikimedia AI?

Thanks,

       GerardM

Wiki Education welcomes Richard Gingras to Advisory Board

Tuesday, 12 November 2024 17:00 UTC

Wiki Education is pleased to announce the appointment of Richard Gingras, a long-time executive at Google focusing on news, to our Advisory Board. Gingras steps into his role with extensive experience in digital media, deep engagement in the evolution of internet policy relating to the open Internet and a free press, and a strong commitment to Wiki Education’s mission.

“There are few things more foundational than building a society’s communal knowledge,” said Gingras. “Wiki Education’s effort to evangelize and develop Wikipedia authorship are critical to achieving that objective.”

Richard Gingras. Image courtesy Richard Gingras, all rights reserved.

Throughout his 50-year career, Gingras has focused on the advance of news and information systems in an evolving digital society – from the evolution of search engines to enabling the next generation global news ecosystem. 

“Richard’s innovative spirit and deep knowledge about the web will bring invaluable insights to our work,” said Frank Schulenburg, Executive Director of Wiki Education. “His contributions will help us advance our goals and strengthen Wiki Education’s impact in the ever-evolving digital information landscape.”

For many years, Gingras served as the Global Vice President for News. In his current role, Gingras provides strategic guidance on how Google presents news to its users as well as advising Google’s efforts to enable a healthy, open ecosystem for quality journalism, including various programs to enable journalists and news providers to be effective and sustainable in our digital world. 

Gingras co-founded the Center for News, Technology, and Innovation, an independent global policy research center that seeks to encourage independent, sustainable media and foster informed public policy conversations to maintain a free press and an open internet. Gingras also served as a member of the Knight Commission on Trust, Media, and Democracy, and helped found the Trust Project.

His broad experience with digital ventures includes leading Salon.com, as well as positions at Apple, the @Home Network, and the Excite search engine. Gingras also serves on the boards of the First Amendment Coalition, the International Center for Journalists, the International Consortium of Investigative Journalists, the UC Berkeley School of Journalism and PRX, the public media podcast network.

 

Goodbye ASN

Tuesday, 12 November 2024 00:00 UTC

The shitpost ASN is (soon to be) no more

Tech News issue #46, 2024 (November 11, 2024)

Monday, 11 November 2024 00:00 UTC

Tech News: 2024-46

This Month in GLAM: October 2024

Sunday, 10 November 2024 15:21 UTC