
The aim of {obus} is to provide users with tidy DATRAS tables with unambiguous variable names.

That said, {obus} is a temporary, experimental package used to explore various DATRAS data connections and wrapper functions that make life a little easier for the everyday user. Some of this may be taken up in a more official package. Or possibly not. So far {obus} actually does very little.

For purists, we regret to inform you that this package has quite a number of dependencies (see DESCRIPTION). It should, however, be possible to trim down that fat.

For more information, check the README.md.

Installation

You can install the development version of {obus} from GitHub by running:

remotes::install_github("einarhjorleifsson/obus")

In some cases {obus} uses wrapper functions that depend on {icesDatras} features that have not yet been taken up in the official ICES version (issues pending). Install that version via:

remotes::install_github("einarhjorleifsson/icesDatras", force = TRUE)

There are two ways to access the DATRAS data: either by importing the whole datasets into R or by opening an in-process DuckDB database connection.

Importing

The fastest way to import the full DATRAS data into R is:

system.time({
  hh <- dr_get("HH", from = "parquet")
  hl <- dr_get("HL", from = "parquet")
  ca <- dr_get("CA", from = "parquet")
})
#>    user  system elapsed 
#>   5.520   1.490   5.777

So we are talking about roughly six seconds if you are sitting on optic fibre. If you are connected via poor wifi this may take more than a minute. Whatever the case, one can assume that nobody will complain, given that the dimensions of the data just imported are as follows:

| type | rows     | cols |
|------|---------:|-----:|
| HH   |   146957 |   76 |
| HL   | 14092714 |   39 |
| CA   |  5800888 |   41 |

Number of records and variables
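The table above can be reproduced directly from the objects just imported; a minimal sketch, assuming the hh, hl and ca data frames from the previous chunk are still in memory:

```r
# Sanity check: number of rows and columns of each imported table.
# Returns a matrix with one column per table (rows, then cols).
sapply(list(HH = hh, HL = hl, CA = ca), dim)
```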

The fast access above is achieved by importing from parquet files hosted on a conventional HTTPS server. The parquet data source is, as of now, not a mirror of the data residing at ICES, but a recent copy. The ICES data centre is currently exploring ways to serve a mirror of the DATRAS data via parquet files hosted on a cloud service.

If one wants an up-to-date mirror, one can take a slower route via a new API from the ICES data centre. In R one can use the icesDatras::get_datras_unaggregated_data function. That function has been wrapped into dr_get so one can get data from many surveys with one command. E.g. all surveys from 2025 can be obtained by:

# Not run
hh <- obus::dr_get("HH", years = 2025, from = "new", quiet = TRUE)

Connecting

Although the DATRAS data cannot be considered big data, one can use techniques developed for such datasets. So instead of importing the full dataset into R, one can open a connection to the url-hosted parquet files (remember, these are not fully up-to-date) using an in-process DuckDB database.

HH data

hh <- dr_con("HH")
hh |> dplyr::glimpse()
#> Rows: ??
#> Columns: 76
#> Database: DuckDB 1.5.0 [root@Darwin 25.3.0:R 4.5.2/:memory:]
#> $ RecordHeader            <chr> "HH", "HH", "HH", "HH", "HH", "HH", "HH", "HH…
#> $ Survey                  <chr> "BITS", "BITS", "BITS", "BITS", "BITS", "BITS"…
#> $ Quarter                 <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ Country                 <chr> "DK", "DK", "DK", "DK", "DK", "DK", "DK", "DK"…
#> $ Platform                <chr> "26D4", "26D4", "26D4", "26D4", "26D4", "26D4"…
#> $ Gear                    <chr> "CAM", "CAM", "CAM", "EXP", "EXP", "GRT", "GRT…
#> $ SweepLength             <int> NA, NA, NA, 110, 110, NA, NA, NA, NA, NA, NA, …
#> $ GearExceptions          <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ DoorType                <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ StationName             <chr> "150", "151", "152", "149", "147", "10", "101"…
#> $ HaulNumber              <int> 67, 68, 69, 66, 65, 8, 54, 2, 55, 56, 57, 59, …
#> $ Year                    <int> 1991, 1991, 1991, 1991, 1991, 1991, 1991, 1991…
#> $ Month                   <int> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3…
#> $ Day                     <int> 20, 20, 20, 19, 19, 6, 17, 5, 17, 17, 17, 17, …
#> $ StartTime               <chr> "0514", "0644", "0923", "2128", "1829", "1417"…
#> $ DepthStratum            <chr> "11", "11", "12", "12", "12", "10", "11", "9",…
#> $ HaulDuration            <int> 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60, 60…
#> $ DayNight                <chr> "D", "D", "D", "N", "N", "D", "D", "D", "D", "…
#> $ ShootLatitude           <dbl> 55.6000, 55.6667, 55.5167, 55.4500, 55.5500, 5…
#> $ ShootLongitude          <dbl> 16.2500, 16.2667, 16.1667, 15.1167, 15.1833, 1…
#> $ HaulLatitude            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ HaulLongitude           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ StatisticalRectangle    <chr> "40G6", "40G6", "40G6", "39G5", "40G5", "38G4"…
#> $ BottomDepth             <int> 76, 71, 80, 83, 80, 47, 79, 34, 60, 80, 80, 73…
#> $ HaulValidity            <chr> "V", "V", "V", "V", "V", "V", "V", "V", "V", "…
#> $ HydrographicStationID   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, "104+5", N…
#> $ StandardSpeciesCode     <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "…
#> $ BycatchSpeciesCode      <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "…
#> $ DataType                <chr> "C", "C", "C", "C", "C", "C", "C", "C", "C", "…
#> $ NetOpening              <dbl> NA, 5, 5, 7, 16, 3, 3, 4, 3, 3, 3, 3, 3, 3, 3,…
#> $ Rigging                 <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ Tickler                 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ Distance                <dbl> 6111, 6482, 6482, 6667, 8519, 6667, 6482, 6296…
#> $ WarpLength              <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ WarpDiameter            <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ WarpDensity             <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ DoorSurface             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ DoorWeight              <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ DoorSpread              <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ WingSpread              <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ Buoyancy                <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ KiteArea                <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ GroundRopeWeight        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ TowDirection            <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ SpeedGround             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ SpeedWater              <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ SurfaceCurrentDirection <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ SurfaceCurrentSpeed     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ BottomCurrentDirection  <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ BottomCurrentSpeed      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ WindDirection           <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ WindSpeed               <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ SwellDirection          <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ SwellHeight             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ SurfaceTemperature      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ BottomTemperature       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ SurfaceSalinity         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ BottomSalinity          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ ThermoCline             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ ThermoClineDepth        <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ CodendMesh              <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ SecchiDepth             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ Turbidity               <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ TidePhase               <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ TideSpeed               <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ PelagicSamplingType     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ MinTrawlDepth           <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ MaxTrawlDepth           <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ SurveyIndexArea         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ EDOM                    <int> 20250401, 20250401, 20250401, 20250401, 202504…
#> $ ReasonHaulDisruption    <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ DateofCalculation       <chr> "", "", "", "", "", "", "", "", "", "", "", ""…
#> $ .id                     <chr> "BITS:1991:1:DK:26D4:CAM:150:67", "BITS:1991:1…
#> $ sur                     <chr> "BITS-1", "BITS-1", "BITS-1", "BITS-1", "BITS-…
#> $ date                    <date> 1991-03-20, 1991-03-20, 1991-03-20, 1991-03-1…
#> $ time                    <dttm> 1991-03-20 05:14:00, 1991-03-20 06:44:00, 199…

HL data

hl <- dr_con("HL", trim = FALSE)
hl |> dplyr::glimpse()
#> Rows: ??
#> Columns: 39
#> Database: DuckDB 1.5.0 [root@Darwin 25.3.0:R 4.5.2/:memory:]
#> $ RecordHeader          <chr> "HL", "HL", "HL", "HL", "HL", "HL", "HL", "HL",…
#> $ Survey                <chr> "BITS", "BITS", "BITS", "BITS", "BITS", "BITS", …
#> $ Quarter               <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ Country               <chr> "DE", "DE", "DE", "DE", "DE", "DE", "DE", "DE", …
#> $ Platform              <chr> "06S1", "06S1", "06S1", "06S1", "06S1", "06S1", …
#> $ Gear                  <chr> "H20", "H20", "H20", "H20", "H20", "H20", "H20",…
#> $ SweepLength           <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ GearExceptions        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ DoorType              <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ StationName           <chr> "48", "48", "48", "491", "491", "491", "491", "4…
#> $ HaulNumber            <int> 43, 43, 43, 42, 42, 42, 42, 42, 42, 42, 42, 42, …
#> $ Year                  <chr> "1991", "1991", "1991", "1991", "1991", "1991", …
#> $ SpeciesCodeType       <chr> "W", "W", "W", "W", "W", "W", "W", "W", "W", "W"…
#> $ SpeciesCode           <chr> "127143", "127143", "126440", "126417", "126417"…
#> $ SpeciesValidity       <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1"…
#> $ SpeciesSex            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ TotalNumber           <dbl> 6, 6, 2, 596, 596, 596, 596, 596, 596, 596, 596,…
#> $ SpeciesCategory       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ SubsampledNumber      <int> 3, 3, 1, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63,…
#> $ SubsamplingFactor     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ SubsampleWeight       <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ SpeciesCategoryWeight <int> 9, 9, 2, 240, 240, 240, 240, 240, 240, 240, 240,…
#> $ LengthCode            <chr> "1", "1", "1", "0", "0", "0", "0", "0", "0", "0"…
#> $ LengthClass           <int> 24, 25, 24, 150, 155, 160, 165, 170, 175, 210, 2…
#> $ NumberAtLength        <dbl> 2, 4, 2, 9, 9, 28, 57, 19, 19, 47, 38, 9, 9, 376…
#> $ DevelopmentStage      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ LengthType            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
#> $ ValidAphiaID          <int> 127143, 127143, 126440, 126417, 126417, 126417, …
#> $ ScientificName_WoRMS  <chr> "Pleuronectes platessa", "Pleuronectes platessa"…
#> $ DateofCalculation     <chr> "20250401", "20250401", "20250401", "20250401", …
#> $ .id                   <chr> "BITS:1991:1:DE:06S1:H20:48:43", "BITS:1991:1:DE…
#> $ DataType              <chr> "C", "C", "C", "C", "C", "C", "C", "C", "C", "C"…
#> $ HaulDuration          <int> 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, 30, …
#> $ sur                   <chr> "BITS-1", "BITS-1", "BITS-1", "BITS-1", "BITS-1"…
#> $ length_cm             <dbl> 24.0, 25.0, 24.0, 15.0, 15.5, 16.0, 16.5, 17.0, …
#> $ n_haul                <dbl> 1.0, 2.0, 1.0, 4.5, 4.5, 14.0, 28.5, 9.5, 9.5, 2…
#> $ n_hour                <dbl> 2, 4, 2, 9, 9, 28, 57, 19, 19, 47, 38, 9, 9, 376…
#> $ latin                 <chr> "Pleuronectes platessa", "Pleuronectes platessa"…
#> $ species               <chr> "European plaice", "European plaice", "green pol…

CA data

ca <- dr_con("CA", trim = FALSE)
ca |> dplyr::glimpse()
#> Rows: ??
#> Columns: 41
#> Database: DuckDB 1.5.0 [root@Darwin 25.3.0:R 4.5.2/:memory:]
#> $ RecordHeader         <chr> "CA", "CA", "CA", "CA", "CA", "CA", "CA", "CA", …
#> $ Survey               <chr> "BITS", "BITS", "BITS", "BITS", "BITS", "BITS", "…
#> $ Quarter              <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ Country              <chr> "SE", "SE", "SE", "SE", "SE", "SE", "SE", "SE", "…
#> $ Platform             <chr> "77AR", "77AR", "77AR", "77AR", "77AR", "77AR", "…
#> $ Gear                 <chr> "GOV", "GOV", "GOV", "GOV", "GOV", "GOV", "GOV", …
#> $ SweepLength          <int> 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 50, 5…
#> $ GearExceptions       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ DoorType             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ StationName          <chr> "71", "71", "71", "71", "71", "71", "71", "70", "…
#> $ HaulNumber           <int> 6, 6, 6, 6, 6, 6, 6, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
#> $ Year                 <chr> "1991", "1991", "1991", "1991", "1991", "1991", "…
#> $ SpeciesCodeType      <chr> "W", "W", "W", "W", "W", "W", "W", "W", "W", "W",…
#> $ SpeciesCode          <chr> "126436", "126436", "126436", "126436", "126436",…
#> $ AreaType             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ AreaCode             <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ LengthCode           <chr> "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",…
#> $ LengthClass          <int> 32, 34, 36, 39, 40, 43, 45, 30, 31, 34, 35, 38, 4…
#> $ IndividualSex        <chr> "M", "M", "F", "F", "F", "F", "F", "M", "M", "M",…
#> $ IndividualMaturity   <chr> "2", "2", "1", "2", "1", "2", "2", "1", "1", "1",…
#> $ AgePlusGroup         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ IndividualAge        <int> 2, 2, 2, 3, 3, 4, 4, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4…
#> $ CANoAtLngt           <int> 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
#> $ IndividualWeight     <dbl> 350, 410, 470, 640, 680, 770, 1100, 250, 280, 400…
#> $ FishID               <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ GeneticSamplingFlag  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ StomachSamplingFlag  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ AgeSource            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ AgePreparationMethod <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ OtolithGrading       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ ParasiteSamplingFlag <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ MaturityScale        <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ LiverWeight          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ ValidAphiaID         <int> 126436, 126436, 126436, 126436, 126436, 126436, 1…
#> $ ScientificName_WoRMS <chr> "Gadus morhua", "Gadus morhua", "Gadus morhua", "…
#> $ DateofCalculation    <chr> "20250401", "20250401", "20250401", "20250401", "…
#> $ .id                  <chr> "BITS:1991:1:SE:77AR:GOV:71:6", "BITS:1991:1:SE:7…
#> $ sur                  <chr> "BITS-1", "BITS-1", "BITS-1", "BITS-1", "BITS-1",…
#> $ length_cm            <dbl> 32, 34, 36, 39, 40, 43, 45, 30, 31, 34, 35, 38, 4…
#> $ latin                <chr> "Gadus morhua", "Gadus morhua", "Gadus morhua", "…
#> $ species              <chr> "Atlantic cod", "Atlantic cod", "Atlantic cod", "…

Data processing using a connection

For those familiar with dplyr verbs, most of those functions, as well as many base-R functions, can be used to process the data via the connection. E.g. one can get all survey stations from 2024 and add to them the number of cod observed, using the following script:

system.time({
  data <-
    # Process the data in DuckDB
    dr_con("HH") |> 
    dplyr::filter(Year == 2024,
                  Quarter %in% 1:4) |> 
    dplyr::select(.id, sur, lon = ShootLongitude, lat = ShootLatitude) |> 
    dplyr::left_join(dr_con("HL", trim = TRUE) |> 
                       dplyr::filter(latin == "Gadus morhua") |> 
                       dplyr::group_by(.id) |> 
                       dplyr::summarise(n_haul = sum(n_haul, na.rm = TRUE)),
                     by = dplyr::join_by(.id)) |> 
    # Import the data into R
    dplyr::collect() |> 
    dplyr::mutate(n_haul = tidyr::replace_na(n_haul, 0))
})
#>    user  system elapsed 
#>   0.385   0.118   1.087
data |> dplyr::glimpse()
#> Rows: 4,185
#> Columns: 5
#> $ .id    <chr> "BITS:2024:4:DK:26D4:TVL:97:21", "BITS:2024:4:DK:26D4:TVL:77:17…
#> $ sur    <chr> "BITS-4", "BITS-4", "BITS-4", "BITS-4", "BITS-4", "BITS-4", "BI…
#> $ lon    <dbl> 16.2580, 16.3018, 17.9585, 19.0617, 14.6079, 19.2450, 17.4817, …
#> $ lat    <dbl> 55.7969, 55.5288, 57.0215, 57.3341, 55.4832, 54.3817, 54.8517, …
#> $ n_haul <dbl> 919.0000, 6.0000, 1.0000, 12.0000, 551.5667, 86.0000, 328.0000,…

Here all the code steps prior to the collect command are automatically translated to SQL and passed to the in-process DuckDB database. It is only at the collect step that the data are actually downloaded and imported into R. More importantly, only the variables .id (unique station id), sur, lon, lat, latin and n_haul are ever passed over the web. In addition, only certain chunks of the parquet files (read: row groups), those that fall within the range of the filtered values, are passed over the web.
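If you are curious about the translation, dbplyr can print the SQL it generates before anything is sent to DuckDB. A minimal sketch, assuming (as above) that dr_con returns a lazy dbplyr table:

```r
# Show the SQL that dbplyr generates from the dplyr pipeline;
# nothing is downloaded until collect() is called
dr_con("HH") |>
  dplyr::filter(Year == 2024) |>
  dplyr::select(.id, sur, lon = ShootLongitude, lat = ShootLatitude) |>
  dplyr::show_query()
```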

Small print

This stuff is in development, thus bugs, snags and errors are expected. {obus} still has some experimental hangover functions that need to be pruned or removed.

Actually, as of now {obus} does not do very much; the initial focus has been on experimenting with fast access to DATRAS data. For all practical purposes one can live without it entirely. Importing the full HH, HL and CA data into R can be achieved by (anticipating that the ICES data centre will maintain an official path):

hh <- arrow::read_parquet("https://heima.hafro.is/~einarhj/datras/raw/HH.parquet")
hl <- arrow::read_parquet("https://heima.hafro.is/~einarhj/datras/raw/HL.parquet")
ca <- arrow::read_parquet("https://heima.hafro.is/~einarhj/datras/raw/CA.parquet")

And a DuckDB connection can be achieved by (these parquet files hold slightly augmented dataframes, produced with some simple {obus} wrapper functions):

hh <- duckdbfs::open_dataset("https://heima.hafro.is/~einarhj/datras/HH.parquet")
hl <- duckdbfs::open_dataset("https://heima.hafro.is/~einarhj/datras/HL.parquet")
ca <- duckdbfs::open_dataset("https://heima.hafro.is/~einarhj/datras/CA.parquet")
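Such a connection behaves as a lazy table, so summaries can be computed inside DuckDB before anything is imported. A sketch, assuming the hh connection opened above:

```r
# Count hauls per survey inside DuckDB; only the small
# summary table is pulled into R by collect()
hh |>
  dplyr::count(Survey) |>
  dplyr::collect()
```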

Specs

#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.5.2 (2025-10-31)
#>  os       macOS Tahoe 26.3.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Atlantic/Reykjavik
#>  date     2026-03-23
#>  pandoc   3.9.0.2 @ /opt/homebrew/bin/ (via rmarkdown)
#>  quarto   1.8.26 @ /usr/local/bin/quarto
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  arrow         23.0.1.1   2026-02-24 [1] CRAN (R 4.5.2)
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.5.0)
#>  bit           4.6.0      2025-03-06 [1] CRAN (R 4.5.0)
#>  bit64         4.6.0-1    2025-01-16 [1] CRAN (R 4.5.0)
#>  blob          1.3.0      2026-01-14 [1] CRAN (R 4.5.2)
#>  cachem        1.1.0      2024-05-16 [2] CRAN (R 4.5.0)
#>  cli           3.6.5      2025-04-23 [1] CRAN (R 4.5.0)
#>  curl          7.0.0      2025-08-19 [1] CRAN (R 4.5.0)
#>  data.table    1.18.2.1   2026-01-27 [1] CRAN (R 4.5.2)
#>  DBI           1.3.0      2026-02-25 [1] CRAN (R 4.5.2)
#>  dbplyr        2.5.2      2026-02-13 [1] CRAN (R 4.5.2)
#>  devtools      2.5.0      2026-03-14 [2] CRAN (R 4.5.2)
#>  digest        0.6.39     2025-11-19 [2] CRAN (R 4.5.2)
#>  dplyr         1.2.0      2026-02-03 [1] CRAN (R 4.5.2)
#>  duckdb        1.5.0      2026-03-14 [1] CRAN (R 4.5.2)
#>  duckdbfs      0.1.2      2025-10-12 [1] CRAN (R 4.5.0)
#>  ellipsis      0.3.2      2021-04-29 [2] CRAN (R 4.5.0)
#>  evaluate      1.0.5      2025-08-27 [2] CRAN (R 4.5.0)
#>  fastmap       1.2.0      2024-05-15 [2] CRAN (R 4.5.0)
#>  fs            1.6.7      2026-03-06 [1] CRAN (R 4.5.2)
#>  generics      0.1.4      2025-05-09 [1] CRAN (R 4.5.0)
#>  glue          1.8.0      2024-09-30 [1] CRAN (R 4.5.0)
#>  htmltools     0.5.9      2025-12-04 [2] CRAN (R 4.5.2)
#>  httr2         1.2.2      2025-12-08 [1] CRAN (R 4.5.2)
#>  icesDatras    1.4.1      2023-05-08 [1] CRAN (R 4.5.0)
#>  knitr         1.51       2025-12-20 [2] CRAN (R 4.5.2)
#>  lifecycle     1.0.5      2026-01-08 [1] CRAN (R 4.5.2)
#>  magrittr      2.0.4      2025-09-12 [1] CRAN (R 4.5.0)
#>  memoise       2.0.1      2021-11-26 [2] CRAN (R 4.5.0)
#>  obus        * 2026.01.30 2026-03-23 [1] local
#>  otel          0.2.0      2025-08-29 [2] CRAN (R 4.5.0)
#>  pillar        1.11.1     2025-09-17 [1] CRAN (R 4.5.0)
#>  pkgbuild      1.4.8      2025-05-26 [2] CRAN (R 4.5.0)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.5.0)
#>  pkgload       1.5.0      2026-02-03 [2] CRAN (R 4.5.2)
#>  purrr         1.2.1      2026-01-09 [1] CRAN (R 4.5.2)
#>  R6            2.6.1      2025-02-15 [1] CRAN (R 4.5.0)
#>  rappdirs      0.3.4      2026-01-17 [1] CRAN (R 4.5.2)
#>  rlang         1.1.7      2026-01-09 [1] CRAN (R 4.5.2)
#>  rmarkdown     2.30       2025-09-28 [2] CRAN (R 4.5.0)
#>  rstudioapi    0.18.0     2026-01-16 [2] CRAN (R 4.5.2)
#>  sessioninfo   1.2.3      2025-02-05 [2] CRAN (R 4.5.0)
#>  tibble        3.3.1      2026-01-11 [1] CRAN (R 4.5.2)
#>  tidyr         1.3.2      2025-12-19 [1] CRAN (R 4.5.2)
#>  tidyselect    1.2.1      2024-03-11 [1] CRAN (R 4.5.0)
#>  usethis       3.2.1      2025-09-06 [2] CRAN (R 4.5.0)
#>  vctrs         0.7.2      2026-03-21 [1] CRAN (R 4.5.2)
#>  withr         3.0.2      2024-10-28 [1] CRAN (R 4.5.0)
#>  xfun          0.57       2026-03-20 [2] CRAN (R 4.5.2)
#>  yaml          2.3.12     2025-12-10 [2] CRAN (R 4.5.2)
#> 
#>  [1] /private/var/folders/14/1_h9q5hn2h93byhrkzp8jfj00000gp/T/Rtmpm2bHHp/temp_libpath1154747dbcbea
#>  [2] /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/library
#>  * ── Packages attached to the search path.
#> 
#> ──────────────────────────────────────────────────────────────────────────────