REST API pagination best practices?

I'm not completely sure how your data is handled, so this may or may not work, but have you considered paginating with a timestamp field?

When you query /foos you get 100 results. Your API should then return something like this (assuming JSON, but if it needs XML the same principles can be followed):

{
    "data" : [
        {  data item 1 with all relevant fields    },
        {  data item 2   },
        ...
        {  data item 100 }
    ],
    "paging":  {
        "previous":  "http://api.example.com/foo?since=TIMESTAMP1" 
        "next":  "http://api.example.com/foo?since=TIMESTAMP2"
    }

}
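
A runnable, in-memory sketch of that scheme (FOOS, the field names, and get_foos_page are illustrative assumptions, not part of the answer):

# In-memory stand-in for the foos collection, already sorted by timestamp.
FOOS = [{"id": i, "timestamp": 1000 + i} for i in range(1, 251)]

def get_foos_page(since=0, limit=100):
    # Return up to `limit` items strictly newer than the cursor `since`.
    items = [f for f in FOOS if f["timestamp"] > since][:limit]
    if not items:
        return {"data": [], "paging": {}}
    return {
        "data": items,
        "paging": {
            # Cursors are taken from the first and last rows of this page,
            # mirroring the TIMESTAMP1/TIMESTAMP2 placeholders above.
            "previous": f"http://api.example.com/foo?since={items[0]['timestamp']}",
            "next": f"http://api.example.com/foo?since={items[-1]['timestamp']}",
        },
    }

page1 = get_foos_page()
page2 = get_foos_page(since=page1["data"][-1]["timestamp"])  # follow "next"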

Just a note, only using one timestamp relies on an implicit 'limit' in your results. You may want to add an explicit limit or also use an until property.

One problem may be if you add a data item, but based on your description it sounds like they would be added to the end (if not, let me know and I'll see if I can improve on this).

The timestamp can be dynamically determined using the last data item in the list. This seems to be more or less how Facebook paginates in its Graph API (scroll down to the bottom to see the pagination links in the format I gave above).

Timestamps are not guaranteed to be unique. That is, multiple resources can be created with the same timestamp, so this approach has the downside that the next page might repeat the last (few?) entries from the current page.

@jandjorgensen From your link: "The timestamp data type is just an incrementing number and does not preserve a date or a time. ... In SQL Server 2008 and later, the timestamp type has been renamed to rowversion, presumably to better reflect its purpose and value." So there's no evidence here that timestamps (those that actually contain a time value) are unique.

@jandjorgensen I like your proposal, but wouldn't you need some kind of information in the resource links, so we know whether to go previous or next? Something like: "previous": "api.example.com/foo?before=TIMESTAMP", "next": "api.example.com/foo?since=TIMESTAMP2". We would also use our sequence ids instead of a timestamp. Do you see any problems with that?

Another similar option is to use the Link header field specified in RFC 5988 (section 5): tools.ietf.org/html/rfc5988#page-6

Pagination is generally a "user" operation, and to prevent overload both on computers and the human brain you generally give a subset. However, rather than thinking that we don't get the whole list, it may be better to ask: does it matter?

If an accurate live scrolling view is needed, REST APIs, which are request/response in nature, are not well suited for this purpose. For this you should consider WebSockets or HTML5 Server-Sent Events to let your front end know when dealing with changes.

Now if there's a need to get a snapshot of the data, I would just provide an API call that provides all the data in one request with no pagination. Mind you, you would need something that streams the output without temporarily loading it all in memory if you have a large data set.

For my case I implicitly designate some API calls to allow getting the whole information (primarily reference table data). You can also secure these APIs so it won't harm your system.
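
One way that streaming might look, sketched here with Flask (the framework choice and iter_all_foos are my assumptions; any generator-friendly stack works the same way):

import json
from flask import Flask, Response

app = Flask(__name__)

def iter_all_foos():
    # Stand-in for a server-side DB cursor that yields one row at a time
    # instead of materializing the whole result set in memory.
    for i in range(1_000_000):
        yield {"id": i}

@app.route("/foos/all")
def all_foos():
    def generate():
        yield '{"data": ['
        for n, row in enumerate(iter_all_foos()):
            yield ("," if n else "") + json.dumps(row)
        yield "]}"
    # Flask streams the generator chunk by chunk; memory stays flat.
    return Response(generate(), mimetype="application/json")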

Approach 1: When the server is not smart enough to handle object states.

You could send all cached record unique ids to the server, for example ["id1","id2","id3","id4","id5","id6","id7","id8","id9","id10"], and a boolean parameter to know whether you are requesting new records (pull to refresh) or old records (load more).

Your server should be responsible for returning the new records (load more, or new records via pull to refresh) as well as the ids of records deleted from ["id1","id2","id3","id4","id5","id6","id7","id8","id9","id10"].

But in this case, if you have a lot of locally cached records, say 500, then your request string will be too long, like this:

{
        "isRefresh" : false,
        "cached" : ["id1","id2","id3","id4","id5","id6","id7","id8","id9","id10", ..., "id500"] // Too long request
}

Approach 2: You could send the id of the first record and the id of the last record, along with the previous request's epoch time. In this way your request is always small, even if you have a big amount of cached records:

{
        "isRefresh" : false,
        "firstId" : "id1",
        "lastId" : "id10",
        "last_request_time" : 1421748005
}

Your server is responsible for returning the ids of records deleted after last_request_time, as well as the records updated after last_request_time between "id1" and "id10".

Now suppose you are requesting old records (load more), and suppose the "id2" record is updated by someone and the "id5" and "id8" records are deleted from the server; then your server response should look something like this:

{
        "records" : [
                {"id" : "id2",  "more_key" : "updated_value"},
                {"id" : "id11", "more_key" : "more_value"},
                {"id" : "id12", "more_key" : "more_value"},
                {"id" : "id13", "more_key" : "more_value"},
                {"id" : "id14", "more_key" : "more_value"},
                {"id" : "id15", "more_key" : "more_value"},
                {"id" : "id16", "more_key" : "more_value"},
                {"id" : "id17", "more_key" : "more_value"},
                {"id" : "id18", "more_key" : "more_value"},
                {"id" : "id19", "more_key" : "more_value"},
                {"id" : "id20", "more_key" : "more_value"}
        ],
        "deleted" : ["id5","id8"]
}
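
A runnable sketch of the Approach 2 server side that reproduces exactly this scenario (the in-memory RECORDS/DELETIONS store and the "idN" key format are illustrative assumptions):

import time

# In-memory stand-ins for the server's store; every name here is an
# illustrative assumption, not part of the original answer.
RECORDS = {f"id{i}": {"id": f"id{i}", "more_key": "more_value", "updated_at": 0}
           for i in range(1, 31)}
DELETIONS = {}  # id -> epoch seconds at which it was deleted

def load_more(first_id, last_id, last_request_time, page_size=10):
    held = [f"id{i}" for i in range(int(first_id[2:]), int(last_id[2:]) + 1)]
    # 1. Versions of held records updated since the client's last request.
    updated = [RECORDS[i] for i in held
               if i in RECORDS and RECORDS[i]["updated_at"] > last_request_time]
    # 2. Held records deleted since the client's last request.
    deleted = [i for i in held if DELETIONS.get(i, 0) > last_request_time]
    # 3. The next page of older records, continuing past lastId.
    older = [RECORDS[f"id{i}"]
             for i in range(int(last_id[2:]) + 1, int(last_id[2:]) + 1 + page_size)
             if f"id{i}" in RECORDS]
    return {"records": updated + older, "deleted": deleted}

# Simulate the scenario above: id2 updated, id5 and id8 deleted.
now = time.time()
RECORDS["id2"].update(more_key="updated_value", updated_at=now)
for gone in ("id5", "id8"):
    del RECORDS[gone]
    DELETIONS[gone] = now

print(load_more("id1", "id10", last_request_time=now - 60))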

It may be tough to find best practices, since most systems with APIs don't accommodate this scenario; it is an extreme edge case, or they don't typically delete records (Facebook, Twitter). Facebook actually says each "page" may not have the number of results requested, due to filtering done after pagination. https://developers.facebook.com/blog/post/478/

If you really need to accommodate this edge case, you need to "remember" where you left off. The jandjorgensen suggestion is just about spot on, but I would use a field guaranteed to be unique, like the primary key. You may need to use more than one field.

Following Facebook's flow, you can (and should) cache the pages already requested, and just return those with deleted rows filtered out if someone requests a page they have already requested.

This is not an acceptable solution. It is considerably time and memory consuming. All the deleted data along with the requested data will need to be kept in memory, which might not be used at all if the same user doesn't request any more entries.

I disagree. Just keeping the unique IDs does not use much memory at all. You don't have to retain the data indefinitely, just for the "session". This is easy with memcache: just set the expire duration (e.g. 10 minutes).
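
A sketch of that "keep just the unique IDs for the session" idea, with a plain dict standing in for memcache and the 10-minute expiry from the comment (all names are assumptions):

import time

SESSION_TTL = 600  # seconds, i.e. the 10-minute expiry
_page_cache = {}   # (session_id, page_no) -> (expires_at, [primary keys])

def remember_page(session_id, page_no, keys):
    # Store only the primary keys of the rows served, not the rows.
    _page_cache[(session_id, page_no)] = (time.time() + SESSION_TTL, list(keys))

def replay_page(session_id, page_no, still_exists):
    """Re-serve an already-seen page, filtering out rows deleted since."""
    entry = _page_cache.get((session_id, page_no))
    if entry is None or entry[0] < time.time():
        return None  # expired or never served: caller runs a fresh query
    expires_at, keys = entry
    return [k for k in keys if still_exists(k)]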

You have several problems.

First, you have the example that you cited.

You also have a similar problem if rows are inserted, but in this case the user gets duplicate data (arguably easier to manage than missing data, but still an issue).

If you are not snapshotting the original data set, then this is just a fact of life.

You can have the user make an explicit snapshot:

POST /createquery
filter.firstName=Bob&filter.lastName=Eubanks

Result:

HTTP/1.1 301 Here's your query
Location: http://www.example.org/query/12345

Then you can page that all day long, since it's now static. This can be reasonably lightweight, since you can just capture the actual document keys rather than the entire rows.

If the use case is simply that your users want (and need) all of the data, then you can simply give it to them:

GET /query/12345?all=true

and just send the whole kit.

(Default sort of foos is by creation date, so row insertion is not a problem.)
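
A sketch of that snapshot flow: one call freezes the matching document keys, later calls page over the frozen list (the query ids, helper names, and in-memory store are assumptions):

import itertools

_snapshots = {}              # query id -> frozen list of document keys
_query_ids = itertools.count(12345)

def create_query(all_rows, matches):
    # POST /createquery: capture the keys (not the rows) matching the filter;
    # the real API would answer 301 with Location: /query/<qid>.
    qid = next(_query_ids)
    _snapshots[qid] = [row["key"] for row in all_rows if matches(row)]
    return qid

def get_query_page(qid, page_no, page_size=100):
    # GET /query/<qid>?page=<n>: page over the frozen keys all day long,
    # then look the full rows up by key at serve time.
    keys = _snapshots[qid]
    return keys[(page_no - 1) * page_size : page_no * page_size]

rows = [{"key": i, "lastName": "Eubanks" if i % 2 else "Smith"} for i in range(500)]
qid = create_query(rows, lambda r: r["lastName"] == "Eubanks")
first_page = get_query_page(qid, 1)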

I've thought long and hard about this and finally ended up with the solution I'll describe below. It's a pretty big step up in complexity, but if you do make this step, you'll end up with what you are really after, which is deterministic results for future requests.

Your example of an item being deleted is only the tip of the iceberg. What if you are filtering by color=blue but someone changes item colors in between requests? Fetching all items in a paged manner reliably is impossible... unless... we implement revision history.

I've implemented it and it's actually less difficult than I expected. Here's what I did:

  • I created a single table changelogs with an auto-increment ID column
  • My entities have an id field, but this is not the primary key
  • The entities have a changeId field which is both the primary key as well as a foreign key to changelogs.
  • Whenever a user creates, updates or deletes a record, the system inserts a new record in changelogs, grabs the id and assigns it to a new version of the entity, which it then inserts in the DB
  • A state field keeps track of whether an item is deleted
  • My queries select the maximum changeId (grouped by id) and self-join that to get the most recent versions of all records.
  • The max changeId is returned to the client and added as a query parameter in subsequent requests
  • Because only new changes are created, every single changeId represents a unique snapshot of the underlying data at the moment the change was created.
  • This means that you can cache the results of requests that have the parameter changeId in them forever. The results will never expire because they will never change.
  • This also opens up exciting features such as rollback / revert, syncing the client cache, etc. Any feature that benefits from change history.
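
Here's a runnable sketch of that changelog scheme using sqlite3 (the table and column names are illustrative stand-ins, not the author's actual schema):

import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE changelogs (id INTEGER PRIMARY KEY AUTOINCREMENT);
CREATE TABLE foos (
    change_id INTEGER PRIMARY KEY REFERENCES changelogs(id),
    id        INTEGER NOT NULL,               -- logical entity id, not the PK
    color     TEXT,
    state     TEXT NOT NULL DEFAULT 'active'  -- tracks whether an item is deleted
);
""")

def write_version(entity_id, color, state="active"):
    """Insert a changelog row, then a new immutable version of the entity."""
    cur = db.execute("INSERT INTO changelogs DEFAULT VALUES")
    change_id = cur.lastrowid
    db.execute("INSERT INTO foos (change_id, id, color, state) VALUES (?, ?, ?, ?)",
               (change_id, entity_id, color, state))
    return change_id

def snapshot(max_change_id):
    """Latest version of every entity as of max_change_id, deletions filtered."""
    return db.execute("""
        SELECT f.id, f.color FROM foos f
        JOIN (SELECT id, MAX(change_id) AS change_id
              FROM foos WHERE change_id <= ? GROUP BY id) latest
          ON f.id = latest.id AND f.change_id = latest.change_id
        WHERE f.state = 'active'
        ORDER BY f.id
    """, (max_change_id,)).fetchall()

write_version(1, "blue")
c2 = write_version(2, "red")
write_version(1, "green")                  # update: a new version of entity 1
c4 = write_version(2, "red", "deleted")    # delete entity 2

print(snapshot(c2))  # [(1, 'blue'), (2, 'red')] -- a stable, cacheable snapshot
print(snapshot(c4))  # [(1, 'green')]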

If you've got pagination, you also sort the data by some key. Why not let API clients include the key of the last element of the previously returned collection in the URL, and add a WHERE clause to your SQL query (or something equivalent, if you're not using SQL) so that it returns only those elements for which the key is greater than this value?
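
A minimal sketch of that keyset approach (sqlite and the foos table are stand-ins for whatever store you use):

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE foos (id INTEGER PRIMARY KEY, name TEXT)")
db.executemany("INSERT INTO foos VALUES (?, ?)",
               [(i, f"foo{i}") for i in range(1, 251)])

def fetch_page(after_id=0, limit=100):
    # The WHERE clause skips everything at or below the client's last-seen
    # key, so deletions elsewhere can't shift rows between pages.
    return db.execute(
        "SELECT id, name FROM foos WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, limit)).fetchall()

page = fetch_page()
next_page = fetch_page(after_id=page[-1][0])  # client echoes the last key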

I think currently your API is actually responding the way it should: the first 100 records on the page, in the overall order of the objects you are maintaining. Your explanation tells that you are using some kind of ordering ids to define the order of your objects for pagination.

Now, in case you want page 2 to always start from 101 and end at 200, then you must make the number of entries on the page variable, since they are subject to deletion.

You should do something like the below pseudocode:

page_max = 100

def get_page_results(page_no):
    # Fixed id ranges per page: deleted ids simply leave a page shorter.
    start = (page_no - 1) * page_max + 1
    end = page_no * page_max
    return fetch_results_by_id_between(start, end)

I agree. Rather than query by record number (which is not reliable) you should query by ID. Change your query(x, m) to mean "return up to m records SORTED by ID, with ID > x"; then you can simply set x to the maximum id from the previous query result.

True, either sort on ids, or if you have some concrete business field to sort on, like creation_date etc.
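
A runnable stand-in for the pseudocode's fetch_results_by_id_between, showing the variable page size in action (the in-memory RECORDS store and the deleted ids are illustrative):

RECORDS = {i: {"id": i} for i in range(1, 201)}
for gone in (7, 42, 150):        # deletions punch holes in the id space
    del RECORDS[gone]

def fetch_results_by_id_between(start, end):
    # Page boundaries stay fixed; deleted ids simply make a page shorter.
    return [RECORDS[i] for i in range(start, end + 1) if i in RECORDS]

page_max = 100

def get_page_results(page_no):
    start = (page_no - 1) * page_max + 1
    end = page_no * page_max
    return fetch_results_by_id_between(start, end)

assert len(get_page_results(1)) == 98   # ids 7 and 42 are gone from 1-100
assert len(get_page_results(2)) == 99   # id 150 is gone from 101-200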