Dataset

FAO Dataset

Source

The dataset can be found here: FAO Dataset.

Dataset

We believe that we have found two versions of the same dataset: one found on Kaggle ( https://www.kaggle.com/unitednations/global-food-agriculture-statistics ), as well as one found directly on the UNFAO’s website ( http://www.fao.org/faostat/en/#data ). The dataset found directly on the UNFAO’s website is more up-to-date, and differently structured from the one on Kaggle – it seems that latter is a cleaned and/or restructured version of the first. We would lean towards using the UNFAO’s data directly as we believe that it is better to use data which is closer to its source. We will also note that our datasets contain interpolated values in order to cover gaps in data collection, which we might choose to drop.

The dataset contains 78 csv files, each representing different statistics related to Food and Agriculture, ranging from a few KiB to a couple of GiB of data. We believe we won’t use all of them, as some describe numbers that are not needed such as Consumer prices, exchange rates food aid food balance, household surveys etc …

The deleted (unused) csv files are :

  • Development_Assistance_to_Agriculture_E_All_Data_(Normalized).csv
  • Exchange_rate_E_All_Data_(Normalized).csv
  • Food_Aid_Shipments_WFP_E_All_Data_(Normalized).csv
  • Prices_Monthly_E_All_Data_(Normalized).csv
  • ConsumerPriceIndices_E_All_Data_(Normalized).csv
  • Employment_Indicators_E_All_Data_(Normalized).csv
  • Indicators_from_Household_Surveys_E_All_Data_(Normalized).csv
  • Forestry_Trade_Flows_E_All_Data_(Normalized).csv
  • ASTI_Research_Spending_E_All_Data_(Norm).csv
  • Trade_Crops_Livestock_E_All_Data_(Normalized).csv
  • Trade_DetailedTradeMatrix_E_All_Data_(Normalized).csv
  • Trade_Indices_E_All_Data_(Normalized).csv
  • Trade_LiveAnimals_E_All_Data_(Normalized).csv
  • ASTI_Researchers_E_All_Data_(Norm).csv
  • CommodityBalances_LivestockFish_E_All_Data_(Normalized).csv
  • CommodityBalances_Crops_E_All_Data_(Normalized).csv
  • Deflators_E_All_Data_(Normalized).csv
  • FoodBalanceSheets_E_All_Data_(Normalized).csv
  • Food_Security_Data_E_All_Data_(Normalized).csv
  • FoodSupply_Crops_E_All_Data_(Normalized).csv
  • FoodSupply_LivestockFish_E_All_Data_(Normalized).csv
  • Macro-Statistics_Key_Indicators_E_All_Data_(Normalized).csv
  • Prices_E_All_Data_(Normalized).csv
  • PricesArchive_E_All_Data.csv
  • Price_Indices_E_All_Data_(Normalized).csv
  • Investment_CapitalStock_E_All_Data_(Normalized).csv
  • Investment_ForeignDirectInvestment_E_All_Data_(Normalized).csv
  • Investment_Machinery_E_All_Data_(Normalized).csv
  • Investment_CountryInvestmentStatisticsProfile__E_All_Data_(Normalized).csv
  • Investment_GovernmentExpenditure_E_All_Data_(Normalized).csv
  • Investment_CreditAgriculture_E_All_Data_(Normalized).csv
  • Investment_MachineryArchive_E_All_Data_(Normalized).csv
  • Value_of_Production_E_All_Data_(Normalized).csv
  • Inputs_FertilizersTradeValues_E_All_Data_(Normalized).csv
  • Inputs_Pesticides_Trade_E_All_Data_(Normalized).csv

This leaves us with 43 csv files, which have a very similar schema. It is a row based database, with each row having the following attributes (at least): Area Code, Area, Item Code, Item, Element Code, Element, Year Code, Year, Unit, Value, Flag.

The flag indicates the source of the data, and if it’s estimated, computed, or raw measurements. Elements and element codes are a one to one relationship, and the same applies for Area Code and Area, Item and Item code. The unit is the unit of measurement. If the data is monthly, the column Month is added