A Zotero plugin that enhances your research workflow with intelligent metadata discovery and automated file management.
- Attachment Validation: Automatically detect and remove broken file links while preserving valid PDFs and weblinks
- Smart Cleanup: Bulk processing to maintain clean, working attachments across your library
- Multi-API Metadata Fetching: Comprehensive metadata updates using 6+ APIs (CrossRef, OpenAlex, Semantic Scholar, OpenLibrary, Google Books, DBLP)
- Automatic DOI/ISBN Discovery: Find missing identifiers through intelligent title and author matching
- Support for Multiple Item Types: Journal articles, conference papers, preprints, and books
- Fallback Strategies: Multiple search approaches when primary methods fail
- Multi-Source File Search: Access content from 8+ sources including:
  - Open Access: Unpaywall, CORE, Internet Archive
  - Preprint Servers: arXiv with high reliability
  - Academic Repositories: Library Genesis, Sci-Hub
  - Custom Resolvers: Multiple mirror support with automatic fallback
- Smart Download Logic: Only downloads when needed, avoids duplicates
- Stored File Creation: All downloads create local stored files (never links)
Retrieval Flow Diagram
[Figure: retrieval flow diagram]
This diagram was inspired by a Reddit post about accessing scientific papers.
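The "try each source, fall back to the next" idea behind file retrieval can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the `FileSource` interface, `retrieveWithFallback`, and the idea that every source exposes a single `findPdfUrl` method are all assumptions made for the example.

```typescript
// Hypothetical shape of a file source; not the plugin's real interface.
interface FileSource {
  name: string;
  // Resolves to a PDF URL, or null when the source has nothing for this DOI.
  findPdfUrl(doi: string): Promise<string | null>;
}

interface RetrievalResult {
  source: string;
  url: string;
}

// Try each source in order and stop at the first hit. A source that throws
// is treated as a miss, so one failing service does not end the search.
async function retrieveWithFallback(
  doi: string,
  sources: readonly FileSource[],
): Promise<RetrievalResult | null> {
  for (const source of sources) {
    try {
      const url = await source.findPdfUrl(doi);
      if (url !== null) {
        return { source: source.name, url };
      }
    } catch {
      // Ignore the error and fall through to the next source.
    }
  }
  return null;
}
```

The order of the `sources` array is the priority order, so placing open access sources first means later entries are only consulted when the earlier ones return nothing.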
- Published Version Discovery: Automatically find journal publications of arXiv preprints
- Smart Type Conversion: Convert arXiv items stored as journal articles to the proper preprint item type
- Version Management: Handle transitions from preprint to published versions
- Metadata Synchronization: Update bibliographic information when published versions are found
- Concurrent Processing: Handle multiple items simultaneously with intelligent rate limiting
- Progress Tracking: Real-time progress dialogs for large batch operations
- Error Resilience: Continue processing even when individual items fail
- Detailed Reporting: Comprehensive success/failure summaries with actionable insights
- One-Click Access: Right-click context menu integration
- Email Configuration: Simple setup for API access requirements
- Minimal Configuration: Works out-of-the-box with optional email for enhanced features
- Multilingual Support: English and Chinese locales included
- Download the latest release XPI file
- In Zotero 8/9, go to `Tools` → `Add-ons`
- Click the gear icon and select "Install Add-on From File..."
- Select the downloaded XPI file
- Restart Zotero
Note: This extension requires Zotero 8.0 or later. For Zotero 7.x compatibility, use an earlier version of this extension.
- Clone or download this repository
- Install dependencies: `npm install`
- Build the XPI: `npm run build`
- The XPI will be created at `.scaffold/dist/zotadata.xpi`
- Install as described above
Access Settings by:
- Right-click on any item in your Zotero library
- Select `Zotadata` → `Settings`
- Email for Unpaywall API: Required for Unpaywall access
  - Stored locally in Zotero preferences
  - Only used for API requests, never shared
- CORE API Key: Optional key for higher rate limits
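To show why the email is needed: Unpaywall's REST API takes the DOI in the request path and the address as an `email` query parameter. The sketch below only builds such a URL; the function name `buildUnpaywallUrl` is made up for this example and is not taken from the plugin.

```typescript
// Hypothetical helper: builds an Unpaywall v2 lookup URL for a DOI.
// Each DOI segment is percent-encoded separately so the "/" separators
// stay in place; the email goes into the query string.
function buildUnpaywallUrl(doi: string, email: string): string {
  const encodedDoi = doi
    .trim()
    .split("/")
    .map((segment) => encodeURIComponent(segment))
    .join("/");
  return `https://api.unpaywall.org/v2/${encodedDoi}?email=${encodeURIComponent(email)}`;
}
```

For example, `buildUnpaywallUrl("10.1038/nature12373", "me@example.org")` yields a URL ending in `/v2/10.1038/nature12373?email=me%40example.org`.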
Features:
- Enable/Disable: Toggle to allow Sci-Hub as a fallback source (disabled by default)
- Fallback Position: Only tried after legitimate sources (Unpaywall, arXiv, CORE) fail
- Error Handling: Automatically disables after configured number of failures (default: 2)
- Mirror Discovery: Automatically finds working mirrors via sci-hub.pub
Settings:
- Max attempts before fallback (1-3, default: 2)
- Global setting persists until manually changed
By enabling Sci-Hub, you acknowledge:
- Understanding of potential legal implications
- Responsibility for compliance with local regulations
- Use for legitimate research purposes only
The plugin will always prioritize legal sources before attempting Sci-Hub.
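The "disable after a configured number of failures" behaviour can be pictured as a small failure counter. This is a sketch under assumptions: the class name `FallbackGate` is invented, and counting consecutive failures (reset on success) is a guess, since the text above does not say whether failures are consecutive or cumulative.

```typescript
// Hypothetical helper: tracks failures of an optional fallback source and
// switches it off once the configured limit is reached.
class FallbackGate {
  private failures = 0;
  private enabled: boolean;
  private readonly maxFailures: number;

  // Defaults mirror the documented settings: disabled, limit of 2 (range 1-3).
  constructor(enabled = false, maxFailures = 2) {
    if (!Number.isInteger(maxFailures) || maxFailures < 1 || maxFailures > 3) {
      throw new RangeError("maxFailures must be an integer from 1 to 3");
    }
    this.enabled = enabled;
    this.maxFailures = maxFailures;
  }

  isEnabled(): boolean {
    return this.enabled;
  }

  // Assumption: a success clears the failure streak.
  recordSuccess(): void {
    this.failures = 0;
  }

  // Once the limit is hit the gate stays off until a new gate is created,
  // which stands in for the user re-enabling the setting manually.
  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.maxFailures) {
      this.enabled = false;
    }
  }
}
```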
Note: Your email is stored locally and only used for API requests to services like Unpaywall. The plugin will prompt you for an email the first time you use features that require it.
Right-click on selected items in your Zotero library to access:
- Validate References: Check and clean up attachments for selected items - removes broken file links while preserving valid PDFs and weblinks
- Update Metadata: Fetch and update metadata for journal articles, conference papers, preprints, and books using multiple APIs (CrossRef, OpenAlex, Semantic Scholar, OpenLibrary, Google Books) - can auto-discover missing DOIs/ISBNs
- Retrieve Files: Search and download missing PDF files from multiple sources (Unpaywall, arXiv, CORE, Library Genesis, Sci-Hub, Internet Archive) - only processes items without existing PDFs
- Process Preprints: Handle arXiv papers by finding published versions, updating metadata, downloading published PDFs, or converting to proper preprint format when no published version exists
Select multiple items to process them all at once. A progress dialog will show the status of each operation.
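Batch processing of this kind usually combines a cap on in-flight work with per-item error isolation. The following is a minimal sketch of that pattern, not the plugin's implementation; `processBatch` and `BatchSummary` are names invented for the example, and rate limiting between requests is deliberately left out.

```typescript
interface BatchSummary<T> {
  succeeded: T[];
  failed: { index: number; reason: string }[];
}

// Run `task` over `items` with at most `limit` tasks in flight. A rejected
// task is recorded in `failed` and does not stop the remaining items.
// Note: `succeeded` is filled in completion order, not input order.
async function processBatch<I, T>(
  items: readonly I[],
  limit: number,
  task: (item: I) => Promise<T>,
): Promise<BatchSummary<T>> {
  const summary: BatchSummary<T> = { succeeded: [], failed: [] };
  let next = 0;

  // Each worker claims the next unprocessed index until none remain.
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++;
      try {
        summary.succeeded.push(await task(items[index]));
      } catch (error) {
        const reason = error instanceof Error ? error.message : String(error);
        summary.failed.push({ index, reason });
      }
    }
  }

  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return summary;
}
```

The `failed` entries keep the original item index, which is enough to build a per-item success/failure report after the run.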
The plugin uses multi-factor validation to ensure correct metadata matching, especially for papers with identical titles:
- Author Overlap: Validates that search results share authors with the item
- Author Count Similarity: Rejects matches with drastically different author counts
- Year Proximity: Considers publication year in scoring
- Title Similarity: Uses word-based similarity scoring
- arXiv Fallback: Falls back to arXiv DOI when published DOI not found
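To make the validation factors concrete, here is one way they could be combined into a single score. Everything specific in this sketch is an assumption: the `BibRecord` shape, the Jaccard word overlap used for title similarity, the 0.5/0.3/0.2 weights, the 0.5 author-count ratio cutoff, and the 5-year window are illustrative choices, not values taken from the plugin.

```typescript
// Hypothetical record shape; authors are family names only, for simplicity.
interface BibRecord {
  title: string;
  authors: string[];
  year?: number;
}

// Lowercased alphanumeric words of a string, as a set.
function wordSet(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0));
}

// Number of elements of `a` that also occur in `b`.
function sharedCount(a: Set<string>, b: Set<string>): number {
  return Array.from(a).filter((x) => b.has(x)).length;
}

// Jaccard similarity of the two titles' word sets (shared words / all words).
function titleSimilarity(a: string, b: string): number {
  const wa = wordSet(a);
  const wb = wordSet(b);
  if (wa.size === 0 || wb.size === 0) return 0;
  const shared = sharedCount(wa, wb);
  return shared / (wa.size + wb.size - shared);
}

// Score in [0, 1]; 0 means the candidate is rejected outright.
function matchScore(item: BibRecord, candidate: BibRecord): number {
  const a = new Set(item.authors.map((n) => n.toLowerCase()));
  const b = new Set(candidate.authors.map((n) => n.toLowerCase()));
  let authorScore = 0.5; // neutral when either side has no authors
  if (a.size > 0 && b.size > 0) {
    const countRatio = Math.min(a.size, b.size) / Math.max(a.size, b.size);
    const shared = sharedCount(a, b);
    if (shared === 0 || countRatio < 0.5) return 0; // assumed rejection rules
    authorScore = shared / Math.min(a.size, b.size);
  }
  let yearScore = 0.5; // neutral when a year is missing
  if (item.year !== undefined && candidate.year !== undefined) {
    yearScore = Math.max(0, 1 - Math.abs(item.year - candidate.year) / 5);
  }
  return 0.5 * titleSimilarity(item.title, candidate.title) + 0.3 * authorScore + 0.2 * yearScore;
}
```

With these rules, two records with identical titles but no shared authors score 0, which is the case the author checks are meant to catch.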
For best results, ensure your items have:
- Complete author lists (not just first author)
- Publication year
- arXiv ID in Extra field (format: `arXiv: XXXX.XXXXX`)
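Reading the arXiv ID back out of the Extra field could look like the sketch below. It assumes the ID sits on its own line in the format shown above and handles only new-style identifiers (four digits, a dot, then four or five digits, with an optional version suffix); old-style IDs such as `hep-th/9901001` are not covered. The function name `extractArxivId` is invented for this example.

```typescript
// Hypothetical helper: finds a line like "arXiv: 1406.2661" in Zotero's
// multi-line Extra field and returns the bare ID without a version suffix.
function extractArxivId(extra: string): string | null {
  const match = /^\s*arXiv:\s*(\d{4}\.\d{4,5})(?:v\d+)?\s*$/im.exec(extra);
  return match ? match[1] : null;
}
```

The `m` flag makes `^` and `$` match at line boundaries, so other lines in the Extra field are ignored.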
A well-known example is "Generative Adversarial Networks" (arXiv:1406.2661), which exists in multiple versions and shares its title with other papers. The plugin correctly identifies it by:
- Matching multiple authors (Goodfellow, Bengio, etc.)
- Checking year (2014 vs 2023 for other papers)
- Falling back to arXiv DOI (10.48550/arxiv.1406.2661) if published DOI not found
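The arXiv DOI fallback works because arXiv registers DOIs of the form `10.48550/arXiv.<id>`. A minimal sketch of that fallback, with hypothetical function names (`arxivDoi`, `chooseDoi`):

```typescript
// Hypothetical helper: derives the arXiv-issued DOI from an arXiv ID,
// dropping an optional "arXiv:" prefix and version suffix.
function arxivDoi(arxivId: string): string {
  const bare = arxivId
    .trim()
    .replace(/^arxiv:\s*/i, "")
    .replace(/v\d+$/i, "");
  return `10.48550/arXiv.${bare}`;
}

// Prefer the published DOI; fall back to the arXiv DOI when none was found.
function chooseDoi(publishedDoi: string | null, arxivId: string | null): string | null {
  if (publishedDoi !== null && publishedDoi.trim() !== "") {
    return publishedDoi.trim();
  }
  return arxivId !== null ? arxivDoi(arxivId) : null;
}
```

DOIs are case-insensitive, so `10.48550/arXiv.1406.2661` and the lowercase form shown above refer to the same record.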
When using the Update Metadata feature:
- DOI is Critical: The feature heavily relies on a correct DOI for accurate metadata retrieval. If the DOI is missing or incorrect, results may be unreliable.
- Remove Authors First: For best results, consider removing the authors field before updating metadata. This allows the plugin to search and match based on title and DOI without being confused by incomplete or incorrect author information.
File retrieval success varies significantly by source type:
High Success Rate:
- arXiv Preprints: Very reliable due to arXiv's open access mandate and stable infrastructure
- Open Access Articles: Good success via Unpaywall for legitimately open access content
Moderate to Low Success Rate:
- Paywalled Journal Articles: More challenging due to publisher restrictions and legal considerations
- Books: Particularly difficult to obtain, especially recent publications
- Recent Papers: Sci-Hub has significantly reduced new uploads due to ongoing legal challenges
For difficult-to-find content, consider these community-recommended approaches:
- Anna's Archive: A promising free source; expect a wait of about 5 minutes for link generation.
- Google: A web search is often worthwhile, since the resource may have been shared on Reddit, GitHub, or niche forums.
Note: This plugin automates the search across legitimate and widely-used academic sources. For content not available through these channels, manual research through additional academic resources may be necessary.
This plugin integrates with several external APIs and services:
| Service | Purpose | Rate Limit | Authentication |
|---|---|---|---|
| CrossRef | Fetch metadata for DOIs | 50 requests/second (polite pool) | None required (email recommended) |
| OpenAlex | Comprehensive academic work metadata and DOI discovery | Very generous | None required |
| Semantic Scholar | AI-powered paper search and metadata | Reasonable limits for academic use | None required |
| OpenLibrary / Google Books | Book metadata and ISBN discovery | Standard API limits | None required for basic use |
| Unpaywall | Find open access PDF links | 100,000 requests/day | Email address required |
| arXiv | Search and download arXiv papers | 3 seconds between requests | None required |
| CORE | Search academic papers for full-text access | 10,000 requests/month (free tier) | API key optional for higher rate limits |
| Library Genesis | Academic paper and book repository | Subject to site availability | None required |
| Sci-Hub | Academic paper access service | Subject to site availability and blocking | None required |
| Internet Archive | Open access books and historical documents | Standard API limits | None required |
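A limit such as "3 seconds between requests" can be honoured with a minimum-interval throttle. The class below is a sketch rather than the plugin's code: `MinIntervalThrottle` is an invented name, and the injectable `now` and `sleep` functions exist only so the behaviour can be tested without real delays.

```typescript
// Hypothetical helper: spaces calls at least `intervalMs` apart.
class MinIntervalThrottle {
  private nextSlot = 0;
  private readonly intervalMs: number;
  private readonly now: () => number;
  private readonly sleep: (ms: number) => Promise<void>;

  constructor(
    intervalMs: number,
    now: () => number = () => Date.now(),
    sleep: (ms: number) => Promise<void> = (ms) =>
      new Promise<void>((resolve) => {
        setTimeout(resolve, ms);
      }),
  ) {
    this.intervalMs = intervalMs;
    this.now = now;
    this.sleep = sleep;
  }

  // Reserve the next free slot, then wait until it arrives. The slot is
  // reserved before awaiting, so concurrent callers queue up in order.
  async wait(): Promise<void> {
    const current = this.now();
    const slot = Math.max(current, this.nextSlot);
    this.nextSlot = slot + this.intervalMs;
    if (slot > current) {
      await this.sleep(slot - current);
    }
  }
}
```

A caller would create one throttle per service, for example `new MinIntervalThrottle(3000)` for arXiv, and `await throttle.wait()` before each request.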
zotero-zotadata/
├── src/ # TypeScript source code
│ ├── apis/ # External API integrations (CrossRef, OpenAlex, etc.)
│ ├── core/ # Core utilities, types, error management
│ ├── features/ # Feature modules (attachment, metadata)
│ ├── modules/ # Feature modules (MetadataFetcher, ArxivProcessor)
│ ├── services/ # Shared services (Cache, Download, API)
│ ├── shared/ # Shared utilities and core components
│ ├── ui/ # UI components (Menu, Dialog, Preferences)
│ ├── utils/ # Utility functions
│ ├── constants/ # Constants and configuration
│ ├── __tests__/ # Test files
│ ├── index.ts # Main plugin class
│ └── addon.ts # Entry point bridging to bootstrap.js
├── typings/ # Custom TypeScript declarations
├── addon/ # Zotero plugin scaffold
│ ├── bootstrap.js # Plugin bootstrap for Zotero 8
│ ├── manifest.json # Plugin metadata (Zotero 8 format)
│ └── locale/ # Localization files (en-US, zh-CN)
├── skin/ # Plugin assets (icons, legacy CSS)
├── assets/ # Documentation assets
│ ├── images/ # Screenshots and diagrams
│ └── workflows/ # Workflow diagrams and flowcharts
├── zotero-plugin.config.ts # Build configuration
├── package.json # Node.js package config
├── tsconfig.json # TypeScript configuration
├── AGENTS.md # Development guidelines and conventions
└── README.md # This file
- Node.js 22+ (for zotero-plugin-scaffold 0.8.x)
- Zotero 8.0 or later (supports Zotero 9.x)
- TypeScript 5.8+
- Modern IDE with TypeScript support (VS Code recommended)
| Category | Technology |
|---|---|
| Language | TypeScript 5.8 |
| Runtime | Zotero (Firefox/XULRunner) |
| Build | esbuild (via zotero-plugin-scaffold) |
| Testing | Vitest |
| Linting | ESLint 9 + typescript-eslint |
| Formatting | Prettier |
| Types | zotero-types, @types/node |
| Toolkit | zotero-plugin-toolkit |
- Clone the repository
- Install dependencies: `npm install`
- Make your changes in the `src/` directory (TypeScript only)
npm install # Install dependencies
npm run build # Build the XPI package and run type-check
npm run build:dev # Build in development mode (with source maps)
npm run type-check # Run TypeScript type checking
npm run lint:check # Check code style with Prettier and ESLint
npm run lint:fix # Auto-fix code style issues
npm test # Run unit tests with Vitest
npm run test:watch # Run tests in watch mode
npm run test:coverage # Run tests with coverage report
npm run test:live # Run integration tests with live APIs
npm start # Development server with hot reload

This project follows strict TypeScript standards:
- Strict type annotations for all function parameters and return types
- No `any` types - use `unknown` with proper type guards
- Path aliases: `@/core`, `@/modules`, `@/services`, `@/utils`, `@/apis`, `@/ui`
- Naming conventions:
  - PascalCase: Classes, types, interfaces, enums
  - camelCase: Variables, functions, methods
  - UPPER_SNAKE_CASE: Constants, enum values
- Styling: Tailwind CSS (no raw CSS files)
- Async patterns: Prefer `async/await` over `.then()` chains
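As an illustration of the "`unknown` with type guards" rule, the sketch below validates parsed JSON before treating it as a typed value. The `WorkSummary` shape and the function names are hypothetical and are not taken from the codebase.

```typescript
// Hypothetical response shape, used only for this example.
interface WorkSummary {
  doi: string;
  title: string;
  year?: number;
}

// Narrows `unknown` to a plain object so its properties can be inspected.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Type guard: checks every field before the value is trusted as WorkSummary.
function isWorkSummary(value: unknown): value is WorkSummary {
  if (!isRecord(value)) return false;
  return (
    typeof value.doi === "string" &&
    typeof value.title === "string" &&
    (value.year === undefined || typeof value.year === "number")
  );
}

// JSON.parse returns `any`; annotating the result as `unknown` forces the
// guard to run before any property access.
function parseWorkSummary(json: string): WorkSummary | null {
  const parsed: unknown = JSON.parse(json);
  return isWorkSummary(parsed) ? parsed : null;
}
```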
See AGENTS.md for detailed development guidelines.
The project uses Vitest for testing:
npm test # Run all unit tests
npm run test:watch # Watch mode for development
npm run test:coverage # Generate coverage report
npm run test:live # Integration tests with real APIs

Test structure:
- `src/__tests__/unit/` - Unit tests for individual components
- `src/__tests__/integration/` - Integration tests with live APIs
- `src/__tests__/setup.ts` - Test setup and mock configurations
- Type-check frequently: Run `npm run type-check` to catch TypeScript errors early
- Lint before commits: Run `npm run lint:check` to ensure code style compliance
- Write tests: Add tests in `src/__tests__/` for new functionality
- Build and test: Run `npm run build` before testing in Zotero
- Use hot reload: Run `npm start` for active development with automatic rebuilding
Recommended extensions:
- TypeScript and JavaScript Language Features (built-in)
- ESLint
- Prettier
- Vitest
Configure path aliases in your IDE to recognize @/* imports for better navigation and IntelliSense.
For active development, use the development server:
npm start # Starts Zotero with the plugin and watches for changes

This will:
- Build the plugin in development mode
- Launch Zotero with the plugin loaded
- Automatically rebuild and reload when files change
This version supports both Zotero 8 and Zotero 9:
- Module System: Bootstrap updated to use ESM modules (`ChromeUtils.importESModule`)
- Services Import: Uses `resource://gre/modules/Services.sys.mjs` instead of JSM
- Target Platform: Built for Firefox 140+ (Zotero 8) and Firefox 115+ (Zotero 9)
- Build System: Uses `zotero-plugin-scaffold` 0.8.6 for modern Node.js support
- Version Constraints: `strict_min_version: "8.0"` and `strict_max_version: "9.*"`
- Bootstrap.js: Updated from JSM to ESM imports
- File Structure: Plugin files moved to `addon/` directory for scaffold compatibility
- Build Tool: Replaced `build.sh` with `zotero-plugin-scaffold` npm package
- Node.js Requirement: Now requires Node.js 22+ (was 18+)
- Modular Design: MetadataFetcher refactored into separate services:
  - `DOIDiscoveryService` - DOI search across multiple APIs
  - `BookMetadataService` - ISBN and book metadata handling
  - `MetadataUpdateService` - Field update operations
- Dependency Injection: Services can be injected for testing
- Utility Modules: Shared utilities for ISBN, similarity, and field operations
- Rate Limit Optimization: Proper API call ordering with delays
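Constructor injection is one common way to make services swappable in tests. The sketch below shows the general pattern only: the `DoiDiscovery` interface, its `findDoi` method, and the `MetadataFetcherSketch` class are assumptions for illustration and may not match the real `DOIDiscoveryService` or `MetadataFetcher` signatures.

```typescript
// Hypothetical contract; the real DOIDiscoveryService may look different.
interface DoiDiscovery {
  findDoi(title: string): Promise<string | null>;
}

// The fetcher depends on the interface, not on a concrete network client.
class MetadataFetcherSketch {
  private readonly discovery: DoiDiscovery;

  constructor(discovery: DoiDiscovery) {
    this.discovery = discovery;
  }

  // Returns the item's DOI, discovering one from the title if it is missing.
  async resolveDoi(item: { title: string; doi?: string }): Promise<string | null> {
    return item.doi ?? (await this.discovery.findDoi(item.title));
  }
}

// In a test, a stub replaces the network-backed service.
const stubDiscovery: DoiDiscovery = {
  findDoi: async (title) => (title === "Known Paper" ? "10.1234/known" : null),
};
```

A unit test can then construct `new MetadataFetcherSketch(stubDiscovery)` and assert on `resolveDoi` without making any HTTP requests.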
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly with Zotero 8
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.

