CoreXLSX
Excel spreadsheet (XLSX) format parser written in pure Swift
CoreXLSX is a library focused on representing the low-level structure
of the XML-based XLSX spreadsheet format. It allows you to open a
spreadsheet archive with the .xlsx extension and map its internal
structure into model types expressed directly in Swift.
Note that this library provides read-only support, and only for the .xlsx
format. As the older legacy .xls spreadsheet format has completely different
internals, please refer to other libraries if you need to work with files of
that type.
If your .xlsx
files use ECMA-376 agile
encryption (which seems to be the most popular variety), have a look at the
CryptoOffice library.
Automatically generated documentation is available on our GitHub Pages.
Example
To run the example project, clone the repo, and run pod install
from the
Example directory first.
Model types in CoreXLSX map directly onto the internal structure of the XLSX
format, with more sensible naming applied to a few attributes. The API is
pretty simple:
import CoreXLSX

let filepath = "./categories.xlsx"
// The initializer returns nil when the archive can't be opened.
guard let file = XLSXFile(filepath: filepath) else {
  fatalError("XLSX file at \(filepath) is corrupted or does not exist")
}

for wbk in try file.parseWorkbooks() {
  // Each entry pairs an optional worksheet name with its path in the archive.
  for (name, path) in try file.parseWorksheetPathsAndNames(workbook: wbk) {
    if let worksheetName = name {
      print("This worksheet has a name: \(worksheetName)")
    }

    let worksheet = try file.parseWorksheet(at: path)
    // `data` is optional, so fall back to an empty list of rows.
    for row in worksheet.data?.rows ?? [] {
      for c in row.cells {
        print(c)
      }
    }
  }
}
This prints raw cell data from every worksheet in the given XLSX file. Please refer
to the Worksheet model for more attributes you might need to read from a parsed file.
Cell references
You should not address cells via their indices in the cells
array. Every
cell has a reference
property, which you can read to understand where exactly a given cell is located. Corresponding properties on the CellReference
struct give you the exact position of a cell.
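To make the idea of a cell reference concrete, here is a minimal, self-contained sketch of how an A1-style reference such as "Z1000000" splits into a column and a row. SimpleReference and its columnIndex property are hypothetical names used only for illustration; this is not CoreXLSX's CellReference type, and the library's own properties may be named and typed differently.

```swift
// Hypothetical stand-in for illustration; not CoreXLSX's CellReference type.
struct SimpleReference: Equatable {
  let column: String // column letters, e.g. "Z"
  let row: Int       // 1-based row number, e.g. 1000000

  /// Splits an A1-style reference such as "C7" into column letters and a row number.
  /// Returns nil when the text is not of the form <ASCII letters><ASCII digits>.
  init?(_ text: String) {
    let letters = text.prefix(while: { $0.isLetter && $0.isASCII })
    let digits = text.dropFirst(letters.count)
    guard !letters.isEmpty, !digits.isEmpty,
          digits.allSatisfy({ $0.isNumber && $0.isASCII }),
          let row = Int(digits), row > 0
    else { return nil }
    self.column = letters.uppercased()
    self.row = row
  }

  /// 1-based column index: A = 1, Z = 26, AA = 27.
  var columnIndex: Int {
    return column.unicodeScalars.reduce(0) { $0 * 26 + Int($1.value) - 64 }
  }
}

if let reference = SimpleReference("Z1000000") {
  print(reference.column, reference.row, reference.columnIndex)
}
```

Comparing parsed references like this, rather than array offsets, stays correct when rows or cells are missing from the file.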
Empty cells
The .xlsx
format makes a clear distinction between an empty cell and absence of a cell. If you're not getting a cell or a row when iterating through the cells
array, this means that there is no such cell or row in your document. Your .xlsx
document should have empty cells and rows written in it in the first place for you to be able to read them.
Making this distinction makes the format more efficient, especially for sparse spreadsheets. If you had a spreadsheet with a single cell Z1000000, it wouldn't contain millions of empty cells and a single cell with a value. The file only stores a single cell, which allows sparse spreadsheets to be quickly saved and loaded, also taking less space on the filesystem.
Finding a cell by a cell reference
Given how the .xlsx
format stores cells, you potentially have to iterate through all cells and build your own mapping from cell references to actual cell values. The CoreXLSX library does not currently do this automatically, and you will have to implement your own mapping if you need it. You're welcome to submit a pull request that adds such functionality as an optional step during parsing.
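One possible shape for such a mapping is sketched below. It is self-contained and uses a hypothetical SparseCell struct and makeIndex(of:) function as stand-ins; with CoreXLSX you would fill the dictionary from the cells you iterate over in a parsed worksheet instead. The sketch also shows that looking up a cell that was never written to the file yields nothing rather than an empty cell.

```swift
// Hypothetical stand-in for a parsed cell; not a CoreXLSX type.
struct SparseCell {
  let reference: String // A1-style reference, e.g. "B2"
  let value: String
}

/// Builds a lookup table in a single pass over only the cells stored in the file.
func makeIndex(of cells: [SparseCell]) -> [String: SparseCell] {
  var index = [String: SparseCell]()
  for cell in cells {
    // If a reference appears twice, the later cell replaces the earlier one.
    index[cell.reference] = cell
  }
  return index
}

// A sparse sheet: only two cells are stored, however far apart they are.
let cells = [
  SparseCell(reference: "A1", value: "Name"),
  SparseCell(reference: "Z1000000", value: "42"),
]
let index = makeIndex(of: cells)

print(index["Z1000000"]?.value ?? "absent") // prints "42"
print(index["B2"]?.value ?? "absent")       // prints "absent"
```

Building the dictionary costs one pass over the stored cells, after which each lookup by reference is a constant-time dictionary access on average.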
Shared strings
Strings in spreadsheet internals are frequently represented as strings
shared between multiple worksheets. To parse a string value from a cell,
use the stringValue(_: SharedStrings) function on Cell together with
parseSharedStrings() on XLSXFile.
Here's how you can get all strings in column "C" for example:
if let sharedStrings = try file.parseSharedStrings() {
  let columnCStrings = worksheet.cells(atColumns: [ColumnReference("C")!])
    .compactMap { $0.stringValue(sharedStrings) }
}
To parse a date value from a cell, use the dateValue property on the Cell type:
let columnCDates = worksheet.cells(atColumns: [ColumnReference("C")!])
  .compactMap { $0.dateValue }
Similarly, to parse rich strings, use the richStringValue function:
if let sharedStrings = try file.parseSharedStrings() {
  let columnCRichStrings = worksheet.cells(atColumns: [ColumnReference("C")!])
    .compactMap { $0.richStringValue(sharedStrings) }
}
Styles
Since version 0.5.0 you can parse style information from the archive with the
parseStyles() function. Please refer to the Styles model for more details.
Note that not all XLSX files contain style information, so you should be
prepared to handle the errors thrown from the parseStyles() function in that case.
Here's a short example that fetches a list of fonts used:
let styles = try file.parseStyles()
let fonts = styles.fonts?.items.compactMap { $0.name?.value }
To get formatting for a given cell, use the format(in:) and font(in:)
functions, passing them the result of parseStyles():
let styles = try file.parseStyles()
let format = worksheet.data?.rows.first?.cells.first?.format(in: styles)
let font = worksheet.data?.rows.first?.cells.first?.font(in: styles)
Reporting compatibility issues
If you stumble upon a file that can't be parsed, please file an
issue posting the exact error message. Thanks to the use of the standard Swift
Codable protocol, detailed errors are generated listing the missing attribute,
so it can be easily added to the model, enabling broader format support.
Attaching a file that can't be parsed would also greatly help in diagnosing
issues. If these files contain any sensitive data, we suggest obfuscating it or
generating fake data with the same tools that generated the original files,
assuming the issue can still be reproduced this way.
If the whole file can't be attached, try passing a sufficiently large value
(between 10 and 20 usually works well) to the errorContextLength argument of
the XLSXFile initializer. This will bundle the failing XML snippet with the
debug description of thrown errors. Please also attach the full debug
description if possible when reporting issues.
How does it work?
Since every XLSX file is a zip archive of XML files, CoreXLSX uses the
XMLCoder library and the standard Codable protocols to map XML nodes and
attributes into plain Swift structs. ZIPFoundation is used for
in-memory decompression of zip archives. A detailed description is available
here.
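As a rough illustration of the Codable mapping described above, the sketch below decodes a small payload into a plain Swift struct and shows how a missing field surfaces as a descriptive error. To keep it dependent on Foundation only, it swaps in JSONDecoder for XMLCoder, so it shows the general Codable mechanism rather than how CoreXLSX decodes XML; the Font struct here is a hypothetical model, not one of the library's types.

```swift
import Foundation

// Hypothetical model for illustration only. CoreXLSX's real models are decoded
// from XML with XMLCoder; JSONDecoder is a stand-in so this runs with Foundation alone.
struct Font: Codable, Equatable {
  let name: String
  let size: Double
}

let complete = Data(#"{"name": "Calibri", "size": 11}"#.utf8)
let incomplete = Data(#"{"name": "Calibri"}"#.utf8)

// A payload that matches the model decodes straight into the struct.
let font = try JSONDecoder().decode(Font.self, from: complete)
print(font.name) // prints "Calibri"

// A payload that lacks a required field throws an error naming that field.
var missingKey: String?
do {
  _ = try JSONDecoder().decode(Font.self, from: incomplete)
} catch DecodingError.keyNotFound(let key, _) {
  missingKey = key.stringValue
}
print(missingKey ?? "none") // prints "size"
```

Errors of this kind are what make it practical to find out which attribute a model is missing when a file fails to parse.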
Requirements
Apple Platforms
- Xcode 11.3 or later
- Swift 5.1 or later
- iOS 9.0 / watchOS 2.0 / tvOS 9.0 / macOS 10.11 or later deployment targets
Linux
- Ubuntu 16.04 or later
- Swift 5.1 or later
Installation
Swift Package Manager
Swift Package Manager is a tool for
managing the distribution of Swift code. It’s integrated with the Swift build
system to automate the process of downloading, compiling, and linking
dependencies on all platforms.
Once you have your Swift package set up, adding CoreXLSX as a dependency is as
easy as adding it to the dependencies value of your Package.swift.
dependencies: [
  .package(url: "https://github.com/CoreOffice/CoreXLSX.git",
           .upToNextMinor(from: "0.14.1"))
]
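For context, a complete manifest containing that dependency might look like the following sketch. The package and target name MyApp is a placeholder assumed for illustration, and the tools version is chosen to match the Swift 5.1 requirement listed above; adjust both to your project.

```swift
// swift-tools-version:5.1
import PackageDescription

let package = Package(
  name: "MyApp", // placeholder name, replace with your own package
  dependencies: [
    .package(url: "https://github.com/CoreOffice/CoreXLSX.git",
             .upToNextMinor(from: "0.14.1")),
  ],
  targets: [
    // The target must also list CoreXLSX so that `import CoreXLSX` resolves.
    .target(name: "MyApp", dependencies: ["CoreXLSX"]),
  ]
)
```

Declaring the package under dependencies only fetches it; the library becomes importable once a target names it in its own dependencies list.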
If you're using CoreXLSX in an app built with Xcode, you can also add it as a direct
dependency using Xcode's
GUI.
CocoaPods
CoreXLSX is available through CocoaPods on Apple's
platforms. To install it, simply add pod 'CoreXLSX', '~> 0.14.1'
to your Podfile as shown here:
source 'https://github.com/CocoaPods/Specs.git'
# Uncomment the next line to define a global platform for your project
# platform :ios, '9.0'
use_frameworks!
target '<Your Target Name>' do
  pod 'CoreXLSX', '~> 0.14.1'
end
Contributing
Sponsorship
If this library saved you any amount of time or money, please consider sponsoring
the work of its maintainer. While some of the
sponsorship tiers give you priority support or even consulting time, any amount is
appreciated and helps in maintaining the project.
Development Workflow
On macOS the easiest way to start working on the project is to open the
Package.swift file in Xcode 11 or later. There is an extensive test suite that
tests both whole files end-to-end and isolated snippets against their
corresponding model values.
If you prefer not to work with Xcode, the project fully supports SwiftPM and the
usual workflow with swift build and swift test should work. If it doesn't,
please report this as a bug.
Coding Style
This project uses SwiftFormat and SwiftLint to
enforce formatting and coding style. We encourage you to run SwiftFormat within
a local clone of the repository in whatever way works best for you, either
manually or automatically via an Xcode extension, a build phase, or a
git pre-commit hook.
To guarantee that these tools run before you commit your changes on macOS, you're encouraged
to run this once to set up the pre-commit hook:
brew bundle # installs SwiftLint, SwiftFormat and pre-commit
pre-commit install # installs pre-commit hook to run checks before you commit
Refer to the pre-commit documentation page for more details
and installation instructions for other platforms.
SwiftFormat and SwiftLint also run on CI for every PR and thus a CI build can
fail with inconsistent formatting or style. We require CI builds to pass for all
PRs before merging.
Code of Conduct
This project adheres to the Contributor Covenant Code of
Conduct.
By participating, you are expected to uphold this code. Please report
unacceptable behavior to conduct@coreoffice.org.
Maintainers
License
CoreXLSX is available under the Apache 2.0 license. See the
LICENSE file
for more info.