Takes a CSV file and transforms it into the DSpace Simple Archive Format.

lib-uoguelph-ca/dspace-csv-archive

DSpace CSV Archive

Takes a simple CSV spreadsheet and a set of files and converts them into the DSpace Simple Archive Format. Unicode characters are supported in metadata. The tool automatically strips non-ASCII characters out of filenames.

Requirements

Requires Python version 3.8 or greater

Some simple rules for the CSV spreadsheet

  • The first row should be your header, which defines the values you're going to provide.
  • Only one column is mandatory: 'files'. Files can be organized in any way you want; just provide the proper path relative to the CSV file's location.
  • Add one column for each metadata element (e.g. dc.title).
  • The order of the columns does not matter.
  • Only Dublin Core metadata elements are supported (for now).
  • Use the fully qualified Dublin Core name for each element (e.g. dc.contributor.author).
  • Languages can be specified by leaving a space after the element name and then listing the language (e.g. dc.title en).
  • Separate multiple values for an element with double pipes (||).
  • If your metadata value has a comma in it, put quotes around it, e.g. "Roses are red, violets are blue".
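The rules above can be checked before running the tool. The following is a minimal sketch, not part of dspace-csv-archive itself: the function name `check_csv` is hypothetical, and it assumes the CSV is UTF-8 encoded and that multiple files in the 'files' column are separated by double pipes, as in the example below.

```python
import csv
from pathlib import Path


def check_csv(csv_path):
    """Return a list of problems found in the spreadsheet (empty if none)."""
    csv_path = Path(csv_path)
    problems = []
    with csv_path.open(newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # 'files' is the only mandatory column.
        if "files" not in (reader.fieldnames or []):
            return ["missing mandatory 'files' column"]
        for row_number, row in enumerate(reader, start=2):  # row 1 is the header
            for name in (row["files"] or "").split("||"):
                name = name.strip()
                # File paths are resolved relative to the CSV file's location.
                if name and not (csv_path.parent / name).is_file():
                    problems.append(f"row {row_number}: file not found: {name}")
    return problems
```

Calling `check_csv("/path/to/input/file.csv")` returns an empty list when the 'files' column is present and every listed file exists.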

Example CSV structure

files                                 dc.title en             dc.contributor.author en   dc.subject   dc.type
something1.pdf||something_else1.pdf   title 1                 author 1                   subject 1    Report
directory/something2.pdf              "title 2, with comma"   author 2a||author 2b       subject 2    Article
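To illustrate how a header and a cell in this structure break down, here is a small sketch. It is an assumption about how such a spreadsheet could be read, not the tool's actual code; `parse_header` and `split_values` are hypothetical helper names.

```python
import csv
import io


def parse_header(header):
    """Split a column header such as 'dc.title en' into (element, language)."""
    element, _, language = header.strip().partition(" ")
    return element, language.strip() or None


def split_values(cell):
    """Split a cell on the double-pipe separator, dropping empty values."""
    return [value.strip() for value in cell.split("||") if value.strip()]


# The quoted title keeps its comma because the csv module honours the quotes.
sample = (
    "files,dc.title en,dc.contributor.author en\n"
    'directory/something2.pdf,"title 2, with comma",author 2a||author 2b\n'
)
for row in csv.DictReader(io.StringIO(sample)):
    for header, cell in row.items():
        print(parse_header(header), split_values(cell))
```

A header without a language, such as dc.subject, yields `None` for the language in this sketch.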

Usage

./dspace-csv-archive /path/to/input/file.csv

or

python3 ./dspace-csv-archive /path/to/input/file.csv

If successful, the script will place the processed files into a directory called output, created in the directory you ran the command from.

Note: The tool will overwrite any existing content in the output directory when it is run. If you want to save the results, copy them somewhere safe before you run the tool a second time.
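One way to keep earlier results is to copy the output directory to a timestamped folder before each run. This is only a sketch: the `backup_output` function and the `output-backups` folder name are hypothetical and not something the tool provides.

```python
import shutil
import time
from pathlib import Path


def backup_output(output_dir="output", backup_root="output-backups"):
    """Copy output_dir to a timestamped folder; return the new path or None."""
    source = Path(output_dir)
    if not source.is_dir():
        return None  # nothing to preserve yet
    # One folder per second; a second call within the same second would fail.
    destination = Path(backup_root) / time.strftime("%Y%m%d-%H%M%S")
    shutil.copytree(source, destination)
    return destination
```

Run it from the same directory in which you run the tool, so that the relative path output points at the right place.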

Importing into DSpace

If it is not there already, place the output directory in a location where the dspace user can read and write to it. I recommend putting the directory into /home/dspace/imported-data/ and leaving it there so the mapfile can be easily found if it is needed later, e.g. to remove or modify imported data. One way to do this is:

sudo cp -r [directory-name] /home/dspace/imported-data/
sudo chown -R dspace:dspace /home/dspace/imported-data/[directory-name]

Now we are ready to use the import command that comes with DSpace. Be sure to run this command as the dspace user. Something like:

[dspace]/bin/dspace import --add --eperson=[importer's email address] --collection=[collection handle] --source=[directory-name] --mapfile=[directory-name]/mapfile

Before running the import, you can validate your import by running the same command above along with the validate argument. This will test the import without actually importing anything and report any issues:

[dspace]/bin/dspace import --add --validate --eperson=[importer's email address] --collection=[collection handle] --source=[directory-name] --mapfile=[directory-name]/mapfile

Running the import command without the validate argument will add the items in the directory to the specified collection, and document the operations that were completed in the mapfile. If the import didn't work as you planned, you can use the mapfile to reverse the operations.
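If you script the import, the two variants of the command differ only in the validate flag. The sketch below builds the argument list using the options shown above. The function name `build_import_command` is hypothetical, and the path, email address and handle in the usage line are placeholders, not real values.

```python
import shlex


def build_import_command(dspace_bin, eperson, collection, source_dir, validate=False):
    """Build the argument list for the DSpace import command shown above."""
    command = [dspace_bin, "import", "--add"]
    if validate:
        # Test run: reports issues without importing anything.
        command.append("--validate")
    command += [
        f"--eperson={eperson}",
        f"--collection={collection}",
        f"--source={source_dir}",
        f"--mapfile={source_dir}/mapfile",
    ]
    return command


# Placeholder values for illustration only; substitute your own.
print(shlex.join(build_import_command(
    "/dspace/bin/dspace", "importer@example.org", "123456789/1",
    "/home/dspace/imported-data/output", validate=True)))
```

The returned list can be passed to `subprocess.run(command, check=True)`, provided the script itself runs as the dspace user.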

The mapfile generated by this import command is particularly important and should be kept along with the rest of the input files. You can use it to reverse or modify the import using the command-line tools. Please refer to the DSpace documentation for more information about the DSpace Simple Archive Format or the import/export commands.
